Comparison of Random Forest Model and Frequency Ratio Model for Landslide Susceptibility Mapping (LSM) in Yunyang County (Chongqing, China)

To compare the random forest (RF) model and the frequency ratio (FR) model for landslide susceptibility mapping (LSM), this research selected Yunyang County as the study area because of its frequent natural disasters, especially landslides. A landslide inventory was built from historical records, satellite images, and extensive field surveys. Subsequently, a geospatial database was established based on 987 historical landslides in the study area. Then, all the landslides were randomly divided into two datasets: 70% were used as the training dataset and 30% as the test dataset. Furthermore, under five primary conditioning factors (i.e., topography factors, geological factors, environmental factors, human engineering activities, and triggering factors), 22 secondary conditioning factors were selected to form an evaluation factor library for analyzing landslide susceptibility. On this basis, the RF model training and the FR model mathematical analysis were performed, and the established models were used for the landslide susceptibility simulation over the entire area of Yunyang County. Next, based on the analysis results, the susceptibility maps were divided into five classes: very low, low, medium, high, and very high. In addition, the importance of conditioning factors was ranked and the influence of landslides was explored by using the RF model. The area under the curve (AUC) value of the receiver operating characteristic (ROC) curve, precision, accuracy, and recall ratio were used to analyze the predictive ability of the two LSM models. The results indicated a difference in performance between the two models: the RF model (AUC = 0.988) performed better than the FR model (AUC = 0.716). Moreover, compared with the FR model, the RF model showed a higher degree of coincidence between the areas in the high and very low susceptibility classes, on the one hand, and the spatial distribution of historical landslides, on the other. Therefore, it was concluded that the RF model was more suitable for landslide susceptibility evaluation in Yunyang County because of its significant model performance, reliability, and stability. The outcome also provided a theoretical basis for the application of machine learning techniques (e.g., RF) in landslide prevention, mitigation, and urban planning, so as to respond adequately to the increasing demand for effective and low-cost tools in landslide susceptibility assessments.


Introduction
As frequently occurring geohazards in the world, landslides have features of slow movement but progressive deformation and destruction, often causing significantly severe damage in terms of losses both in human lives and properties. Landslides mainly develop in mountainous areas and cause serious threats to environments, settlements, and industrial facilities. In particular, after sliding into the river, landslides can block the river, form natural dams, and cause floods. In addition, they cause shipwrecks and casualties. On the other hand, landslides occurring in a reservoir can generate huge surges, turning over the dam and rushing downstream to destroy buildings, farmland, and roads. Sometimes, massive landslides can also trigger slight earthquakes. What is worse, it is not easy to implement monitoring and defense measures; thus, the losses tend to be extremely serious [1,2]. Seventy percent of China's territory consists of mountainous areas, with extensive and highly frequent landslide disasters, and this situation becomes increasingly serious year by year. Among them, rainfall-induced landslides are the most widely distributed, with the highest occurrence frequency and the most serious damage. Over the past 60 years, the deaths caused by landslides in China have exceeded 25,000 persons, with an average of more than 400 deaths per year; economic losses are as high as US $50 million [3]. As the most typical mountain city in China, Chongqing is ranked first among the 70 cities with severe geological disasters such as collapses and landslides [4]. From 1950 to 2011 in Chongqing, there were a total of 16,554 recorded landslide disasters, with an average of 271 times per year [5]. The grim situation makes the measures to prevent and forecast landslide disasters extremely urgent. 
As one such measure, landslide susceptibility mapping (LSM) usually serves as a foundation for landslide prevention and spatial planning because it depicts the likelihood of future landslides in a region based on the influence of terrain, environment, human activities, etc. [6]. It evaluates the geographical spatial distribution of potential landslide disasters by analyzing the internal and external factors affecting them, thus making potential disasters visible in space and providing a strong reference for relevant agencies to carry out preventive measures [7].
The effectiveness of LSM depends greatly on the modeling methodologies adopted. There are many methods for landslide susceptibility evaluation, such as qualitative, deterministic, statistical, machine learning, and other methods. In qualitative methods based on experience, engineering geologists and geomorphologists use expert experience and knowledge to directly or indirectly analyze and draw LSM on the existing topographic maps and engineering geological maps [8].
The disadvantage of such a method is its subjectivity and non-quantitative nature. The deterministic method is based on physical-mechanical models of slope stability: the physical, mechanical, and hydrological parameters of soils are input, the stability of areas is calculated in GIS (Geographic Information System) software, and the LSM is finally output. Guimarães et al. have studied LSM by using the deterministic method [9]. However, this method assumes that the parameters are uniform and that the landslide surface consists of loose soil, so the calculated results often differ considerably from the actual situation. On the other hand, various statistical analysis models were widely used in the early stage, including frequency ratio (FR), weights-of-evidence, the analytical hierarchy process, evidential belief function, and the information model [10][11][12][13]. Generally, these methods have become relatively mature in the field of landslide susceptibility. Comparisons of various statistical methods have found that the FR model generally performs better than the others. For example, Wang et al. [10] compared the FR model and the index of entropy model, finding that, in terms of the success rate curve, the areas under the curve (AUC) of the FR and index of entropy models were 0.8191 and 0.8109, respectively. Similarly, the prediction accuracy was 81.75% for the FR model and 81.44% for the index of entropy model. Bourenane et al. [14] compared five methods (FR, weighting factor, logistic regression, weights-of-evidence, and the analytical hierarchy process), concluding that the FR method provided a more accurate prediction (86.59%), while the logistic regression model had the lowest accuracy (70.45%).
Furthermore, as a bivariate statistical method, the FR model depends on the observed relationship between the distribution of landslides and each conditioning factor, and it is easy to implement and has accurate results. As a traditional method, the FR model may gradually fade with the rapid development of machine learning. However, it is widely used in landslide susceptibility evaluation [15,16] and has been proven effective. What is more, the input, output, and calculation process of the FR model are easy to understand, and even massive data can be processed quickly and easily in the GIS environment.
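The bivariate relationship described above can be illustrated with a minimal sketch. It assumes the usual textbook form of the frequency ratio (the share of landslide cells falling in a class divided by the share of all cells in that class); the function name, the class labels, and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def frequency_ratio(cell_classes, landslide_flags):
    """Return {class: FR} for one conditioning factor.

    cell_classes    -- class label of each grid cell for this factor
    landslide_flags -- True where the cell contains a mapped landslide
    Assumes the common definition:
    FR = (landslide cells in class / all landslide cells)
         / (cells in class / all cells)
    """
    total_cells = len(cell_classes)
    total_slides = sum(landslide_flags)
    cells_per_class = Counter(cell_classes)
    slides_per_class = Counter(
        cls for cls, hit in zip(cell_classes, landslide_flags) if hit
    )
    ratios = {}
    for cls, n_cells in cells_per_class.items():
        slide_share = slides_per_class[cls] / total_slides
        area_share = n_cells / total_cells
        ratios[cls] = slide_share / area_share
    return ratios


# Toy data (hypothetical): 4 "steep" cells, 6 "gentle" cells, 4 landslide cells.
classes = ["steep"] * 4 + ["gentle"] * 6
flags = [True, True, True, False] + [True] + [False] * 5
print(frequency_ratio(classes, flags))
```

An FR above 1 indicates that a class holds more landslides than its share of the area would suggest, and an FR below 1 indicates fewer.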
With the development of geographic information systems and artificial intelligence, more and more studies begin to apply various machine learning methods, including logistic regression [17], decision tree [18], support vector machine [2], and so on. Although these methods have been widely used in the field of landslides, they have certain deficiencies, such as complex modeling processes, unstable model performance, and weak interpretation [19]. To avoid such problems, the random forest (RF) model is proposed in combination with multiple decision trees to improve prediction accuracy, and the model's output is determined by the modes of various trees. This model can handle high-dimensional and large data sets, with strong generalization ability, thus superior to traditional methods such as logistic regression [20]. Compared with its application in other fields [21][22][23][24][25][26], the RF model has just begun to be used in landslides over recent years [27,28]. Li et al. [29] applied the RF model in landslide disaster susceptibility mapping and factor evaluation. Yu et al. [30] applied this model in an empirical analysis of the relationship between landslide occurrence and landslide factors in Fujian Province of China and explored its adaptability in spatial prediction of landslides in southern China. They have achieved good landslide susceptibility model performance.
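How an RF output is formed from the individual trees can be sketched as follows. This is only an illustration of majority voting over already-computed tree predictions. Using the fraction of trees that vote for the landslide class as a susceptibility score is an assumption (a common practice), not something stated in this paper, and the function names are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common class label (the mode) among the trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def landslide_vote_fraction(tree_predictions, landslide_label=1):
    """Fraction of trees voting for the landslide class (assumed score)."""
    hits = sum(1 for p in tree_predictions if p == landslide_label)
    return hits / len(tree_predictions)


# Hypothetical predictions of five trees for one grid cell (1 = landslide).
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes), landslide_vote_fraction(votes))
```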
In fact, because the occurrence of landslides is closely related to geological features, crustal movement, human activities, etc., landslides are affected by multiple factors to varying degrees. Therefore, they exhibit imbalance, nonlinearity, multi-scale behavior, randomness, etc., which have not been systematically resolved in regional landslide susceptibility evaluation research [31]. Each method has its advantages and disadvantages, and in general, its performance depends on the research area and the factors selected. Hence, it is meaningful to compare different methods for landslide susceptibility assessment. Although many studies have compared various methods, few have compared machine learning and statistical analysis methods.
This study analyzed landslide susceptibility by using the RF model, which performs better than other machine learning methods, and the FR model, which has also achieved good results compared with other statistical methods. Although the two methods differ in their algorithms, each has its own advantages and can achieve good results. Therefore, this study compares data mining techniques and bivariate statistical analysis methods based on the LSM acquired from the RF model and the FR model, and the results can provide theoretical and practical guidance for method selection. The evaluation of landslide susceptibility depends on regional differences, the amount of data, and the accuracy of conditions. Selecting appropriate models and conditioning factors can facilitate satisfactory results. Previous studies proposed many factors for evaluating the susceptibility of landslides, but gave less consideration to human activities and soil erosion factors.
As a typical karst area in the Three Gorges Reservoir area, Yunyang County faces serious soil erosion, and human construction activities are also important factors inducing landslides [32,33]. Therefore, based on the existing literature, 22 conditioning factors of five types (namely, topography factors, geological factors, environmental factors, human engineering activities, and triggering factors) were selected for LSM. Specifically, this study added the POI (point of interest) kernel density, sediment transport index (STI) [34], stream power index (SPI) [35], and terrain roughness index (TRI) [36] as factors affecting landslide susceptibility, and then established a landslide susceptibility spatial database. POI kernel density has not been used in previous studies. As one of the important factors in human engineering activities, it affects the occurrence of landslides. The advantages and disadvantages of the two models were comprehensively explored, and their results were validated and compared by using the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), precision, accuracy, and recall ratio. Finally, the distribution characteristics of new ...
Yunyang County is in a karst landform. Its strata include Permian, Triassic, Jurassic, and Quaternary formations. Limestone was widely formed in the Permian and Triassic periods, followed by sandstone and mudstone in the Jurassic. In addition to dissolution, mechanical erosion such as water erosion, rainfall erosion, and wind erosion aggravates the soil and water loss in the study area, providing developmental conditions for landslides. Human engineering activities, such as town construction, resettlement, water storage, and power generation in the reservoir areas, as well as continuous precipitation, lead to frequent landslide disasters; moreover, the large scale, variety, and wide range of landslides cause huge losses of life and property [38].
After investigation, it was found that there are geological disasters such as landslides, mudflows, and collapses, and especially 987 landslides, accounting for 97.3% of all geological disasters in Yunyang County. Although relevant agencies investigate, monitor, and issue early warnings for very large landslides, new landslide disasters occur every year in Yunyang County under the influence of heavy rains in the flood season and the changed water levels due to the Three Gorges Reservoir. Because these disasters are not monitored by the National Land Ministry, how to avoid and prevent geological disasters has become a key issue for local governments.

Landslide Inventory
A landslide inventory map records the location and, where known, the date of occurrence and other information about landslides in an area [39]. Preparing the landslide inventory is an essential basis for assessing landslide susceptibility [40,41]. This study identified and mapped a total of 987 landslide locations (2001-2016) (Figure 3) based on historical records, satellite images, and extensive field surveys, thus building up the landslide inventory. In Figure 3, landslides for training and testing are marked in different colors. The smallest landslide area is 100 m², while the largest is 3,200,000 m², with an average of 95,127 m². Hungr et al. [42] increased the 29 landslide types of Cruden et al. [43] to 32. Although a clear-cut boundary for landslide classification is always controversial, such classification still has important guiding significance for landslide research. Given the actual situation of the landslides in the study area, they were sorted on two bases, i.e., type and trigger, according to their material composition and inducing factors (Figure 4). In terms of type, most of the landslides in the study area were soil landslides (94.7%), while rock landslides and complex landslides (soil and rock landslides) accounted for only 3.3% and 2.0%, respectively. Although complex landslides were not included in the classification made by the above researchers, they can be described by using a combination of two or more types of landslides. On the other hand, many scholars have suggested that different types of landslides should be treated separately; however, soil landslides were by far the most frequent in the study area, and the same method has been proven effective for different types of landslides [44,45]. The landslide area is mainly underlain by residual and slope soils and by mudstone, shale, or siltstone, etc., without active faults.
Most are small and medium overburden soil landslides, developed mainly in low mountain areas and hilly landforms with slopes of 20-40 degrees, and their stability is poor. The bedrock of the overburden landslides is mainly J2s or J3s strata. The landslide materials are primarily composed of purple or brown silty clay and gravel. These materials favor the penetration of surface water and are easily softened in water, which is an important internal cause of landslide deformation. In terms of trigger, the landslides in the study area were caused by rainfall (84.8%), rainfall and reservoir water (14.4%), coupling (0.7%), and human engineering activities (0.1%, only one). Therefore, rainfall is the main trigger of landslides in Yunyang County.
Typical landslides in Yunyang County, such as the Dashiba Landslide, Jiuxianping Landslide, and Liekou Mountain Landslide (Figure 5a-c), are located around reservoirs. The Dashiba Landslide and Liekou Mountain Landslide occurred on 1 April 2014 (in spring), and the Jiuxianping Landslide occurred on 15 June 2012 (in summer). Because spring and summer are rainy in the study area, numerous fissures, favored by the strong permeability of limestone and clays, enable rainfall infiltration, river erosion, and rock spreading [46]. The Dashiba Landslide is a soil landslide with a volume of 1.365 × 10^8 km^3, which injured 432 people and caused a direct economic loss of up to 20.4 million CNY. The Jiuxianping Landslide is a rock landslide that is still poorly stable and has caused obvious, widening cracks in a new highway. In addition, a crematorium and a cemetery were evidently deformed (Figure 5b). The volume of the landslide is 2.7 × 10^6 km^3, and it caused property damage of up to 16 million CNY. The Liekou Mountain Landslide is a mixed one, combining the features of bedrock and accumulated-layer landslides, with a volume of 1.342 × 10^6 km^3, and it caused property damage of 450,000 CNY. The analysis of typical landslides helps to clarify the specific situation of landslides in the study area. The new landslides in 2017 are also considered in Section 4.2 for the accuracy evaluation of the RF and FR models.

Conditioning Factors of Landslides
The development of landslide disasters is controlled not only by the geological conditions of the slopes, but also by external factors such as hydrological conditions, climatic conditions, and human engineering activities [47]. Reichenbach et al. [48] identified 596 factors for landslide susceptibility and classified them into five types: geology, hydrology, land cover, landforms, and others. The selection of conditioning factors as input variables in models is a crucial step in LSM. Based on the existing research results and literature [49,50] as well as the overall characteristics of landslide development in Yunyang County, this study selected 22 secondary conditioning factors under five primary conditioning factors to construct a basic evaluation system for landslide susceptibility. The acquisition path and classification of the conditioning factors are directly related to their different natures [51], as described in Table 1.

Topographic Factors
Topographic factors include elevation, relief degree of land surface (RDLS), slope, aspect, slope position, curvature, plan curvature, profile curvature, micro-landform, topographic wetness index (TWI), terrain roughness index (TRI), sediment transport index (STI), and stream power index (SPI). These factors were all calculated from a digital elevation model (DEM). Aspect, slope position, and micro-landform are categorical factors and were therefore divided into classes first (Table 1). The others are continuous factors and were discretized. The conventional factors are elevation, RDLS, slope, aspect, slope position, curvature, plan curvature, and profile curvature (Figure 6a-h). Micro-landform is a relatively small geomorphological unit, of which there are 10 types, including canyons, deeply incised streams, mid-slope drainages, etc. (Figure 6i). TWI, TRI, STI, and SPI were extracted with ArcGIS 10.4 software. TWI describes the amount of water flow accumulated at any site in a catchment and the ability of the water to flow downward under gravity (Figure 6j). The TRI and STI maps (Figure 6k,l) were prepared and divided into five and six subclasses, respectively. SPI is the erosive power of water flow (Figure 6m). In the definitions of these indices, A is the flow accumulation (m²/m) and β is the slope (in degrees).
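As an illustration of how the flow-based indices depend on A and β, the sketch below uses the forms that are standard in the terrain-analysis literature: TWI = ln(A / tan β), SPI = A · tan β, and STI = (A / 22.13)^0.6 · (sin β / 0.0896)^1.3. These forms are an assumption here and may differ in detail from the equations used in this study. TRI is omitted because it is computed from neighboring elevations rather than from A and β. The function name is hypothetical.

```python
import math


def terrain_indices(a, beta_deg):
    """Return (TWI, SPI, STI) for one grid cell.

    a        -- flow accumulation per unit contour width (m^2/m)
    beta_deg -- slope in degrees; must be > 0 because TWI divides by tan(beta)
    Uses the standard literature forms (assumed, not taken from the paper).
    """
    if beta_deg <= 0:
        raise ValueError("slope must be positive; flat cells need special handling")
    beta = math.radians(beta_deg)
    twi = math.log(a / math.tan(beta))
    spi = a * math.tan(beta)
    sti = (a / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3
    return twi, spi, sti


# Hypothetical cell: A = 100 m^2/m on a 45-degree slope.
print(terrain_indices(100.0, 45.0))
```

In practice, flat cells are often given a small minimum slope before TWI is computed; that choice is left out of this sketch.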

Geological Factors
Three factors were used as geological factors: lithology, distance from fault, and combination reclassification of stratum dip direction and slope aspect (CRDS). Lithology is an important internal cause of landslides, as different lithologies differ greatly in physical and mechanical parameters. CRDS (Figure 7c) is a preliminary evaluation method of slope stability based on topographic maps and geological maps, and it was classified into seven types [52].

Environmental Factors
As external factors affecting landslides, the normalized difference vegetation index (NDVI) and land cover are the environmental factors. A positive NDVI value indicates coverage by active forests or other vegetation biomass. According to the NDVI values of the study area, its landslides generally occur in bare soils and grasslands. NDVI was classified into five subclasses (Figure 8a). Land cover is widely considered an important factor in small and medium landslides. In fact, the roots of vegetation can reinforce the soil and increase soil shear strength. There are many types of land cover in the study area, and they were used as factors for landslides (Figure 8b).

Triggering Factors
Rainfall and distance from rivers are the main triggering factors in the area. In total, 84.8% of the landslides are caused by rainfall, which is unevenly distributed in Yunyang County (Figure 9a). The surface water formed by rainfall not only washes the slope surfaces but also infiltrates and softens the rocks and soils. As a consequence, the sliding resistance of slopes is reduced. Riverbank erosion is another essential cause of landslides [53]. Owing to bank cutting and toe erosion, slope bodies near rivers are prone to landslides. Chen et al. [54] applied an equal-interval method to rivers, partitioning the distance into equal segments. Hong et al. [34] did similar work, using a 200 m interval to produce river buffer zones; their results show a higher correlation between landslides and rivers within 200 m. Figure 9b shows that most of the historical landslides are distributed along the rivers. The distance from rivers was divided into seven classes. The landslide density is highest within 200 m of the rivers, which is the main range over which reservoir water affects landslides, because the study area is in the center of the Three Gorges Reservoir and the periodic rise and fall of the water level is one of the main causes of landslides. The water level in the reservoir fluctuates repeatedly between 145 and 175 m, forming a riparian zone with a height difference of 30 m. Chongqing's riparian zone has an area of 306.3 km² and a shoreline of 4881.4 km, and Yunyang County is one of the four counties with the largest riparian zone. Long-term periodic fluctuation of the reservoir water level causes the water flow to wash away a large amount of soil, the river bank to become steeper, and both the weight and the supporting force of the front edge of the landslide to decrease, which reduces the stability of the landslide. In particular, the bank slope is soaked by the reservoir water for a long time, which softens the soil. Additionally, during the fall of the reservoir water level (January to May), the groundwater level drops more slowly than the reservoir level, which significantly increases the sliding force, after which landslides occur.
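The buffer-based classification of distance from rivers can be sketched as below. The text states only that seven classes were used and that the density peaks within 200 m. The 200 m equal interval and the open-ended last class are assumptions for illustration, and the function names and sample numbers are hypothetical.

```python
def distance_class(distance_m, interval_m=200.0, n_classes=7):
    """Map a distance to a 1-based equal-interval class.

    Class 1 covers [0, interval), class 2 covers [interval, 2*interval), ...
    The last class is open-ended (assumption).
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return min(int(distance_m // interval_m), n_classes - 1) + 1


def landslide_density(landslide_distances_m, class_areas_km2, interval_m=200.0):
    """Landslides per km^2 in each distance class."""
    n_classes = len(class_areas_km2)
    counts = [0] * n_classes
    for d in landslide_distances_m:
        counts[distance_class(d, interval_m, n_classes) - 1] += 1
    return [c / area for c, area in zip(counts, class_areas_km2)]


# Hypothetical example with three classes only.
print(landslide_density([50.0, 150.0, 250.0], [2.0, 1.0, 4.0]))
```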

Factors of Human Engineering Activities
POI kernel density and distance from roads are the factors related to human engineering activities. POI data are based on location services and usually contain the name, address, longitude, latitude, category, etc. If each POI site is regarded as a functional unit, then the higher the POI density, the more concentrated the urban functions in an area. POI kernel density analysis, which is often used to identify urban centers, economic vitality, and the intensity of human activities [55][56][57], was performed with ArcGIS software. However, in the field of LSM, no research using POI as an influencing factor for human engineering activities has been found thus far. The construction of large numbers of roads is a process in which humans transform the natural environment, involving the transportation, erosion, and accumulation of surface soil. Excessive digging, application of external loads, and vegetation destruction lead to steep slopes and loose soil. Precipitation and earthquakes can then trigger landslides. Wen et al. [58] used 100 m as the road buffer interval in the region along the highway of Mao County. Bourenane et al. [59] applied bivariate statistical and expert approaches to the landslides in the city of Constantine, Algeria. They indicated that the main range affecting landslides was within 200 m of the roads. Therefore, we categorized the distance from roads at equal intervals (Figure 10b). According to the statistics, the landslide density is the largest within 200 m of the roads in the study area. As the buffer distance increases, the landslide density shows a decreasing trend, which shows that the main range of landslide occurrence is within 200 m of the roads in Yunyang County.
Table 2 shows the data and their sources, types, and accuracy for the above 22 secondary conditioning factors. Historical landslide data and the related geographic, topographic, geological, POI, and other data come from the years between 2001 and 2016, so they are temporally and spatially consistent with the historical landslides. In summary, the factors were categorized based on the investigation, an extensive literature review, and manual classification (expert experience). All conditioning factors were converted into 30 m × 30 m grid units to establish a geospatial database of landslide conditioning factors. The common unit types used in LSM include grid units, drainage basin units, slope units, etc. Except for the first type, which is a regular unit, the others are irregular units. As the length and width of the landslides are relatively small, grid units are the most prevalent way to represent landslide datasets [60]. By contrast, it is better to use drainage basin units or slope units to evaluate mudflows, as they are mostly narrow and long.
To reduce data dispersion, all 22 factors were normalized after reclassification. Qualitative data such as lithology, land cover, slope position, micro-landform, aspect, and CRDS were divided into classes before normalization. After an integer value (starting from 1) was assigned to each class, the factors were transformed linearly so that their values fall within the [0, 1] interval. The normalization formula is:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where X* is the normalized data; X is the original data; X_min is the minimum value after each factor is assigned; and X_max is the maximum value after each factor is assigned.
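The normalization step can be illustrated with a minimal Python sketch. The function name `min_max_normalize` and the five example class codes are assumptions made for illustration; they are not taken from the study's data.

```python
def min_max_normalize(values):
    """Linearly rescale values to the [0, 1] interval (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant factor carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical factor with five classes coded 1..5 (integer codes start from 1).
class_codes = [1, 2, 3, 4, 5]
normalized = min_max_normalize(class_codes)
print(normalized)
```

For a factor with five classes, the codes 1 to 5 map to evenly spaced values between 0 and 1.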

Methodology
This study aims to compare a machine learning method and a statistical analysis method for LSM. The methodological framework mainly includes five parts, as shown in Figure 11: (1) data preparation, including the collection of information on historical landslides and non-landslides, the preparation of training and testing datasets before 10-fold cross-validation, and the selection of the landslide conditioning factors; (2) landslide susceptibility modeling with the RF model and the FR model; (3) drawing LSM maps with the two models; (4) validation and comparison, covering the ROC curve, the AUC value (the area under the ROC curve), precision, accuracy, and recall ratio; and (5) verification and comparison of the two models with respect to the new landslides in 2017 and the importance of the conditioning factors.

Figure 11. The methodological framework of the study.

Preparation of the Training and Testing Datasets
LSM with the RF model (machine learning) can be treated as a binary classification problem. First and foremost, an adequate number of valid samples (landslide and non-landslide data) is essential; machine learning methods in particular need adequate data to ensure a high learning performance. Because the information on historical landslides in the study area is limited, selecting more non-landslide samples can expand the quantity of sample data. Moreover, previous research [61] indicates that the model delivers a better prediction performance when the ratio of positive samples to negative samples is 1:10. Hence, 987 positive samples and 9870 negative samples were collected into a dataset in this study. To select the non-landslide samples as widely as possible, the non-landslide area was defined as the study area excluding the 500 m buffer zone around all landslides and excluding the rivers.
Overfitting is a problem that cannot be ignored in the RF model, although the model performs well when dealing with big data. A very useful technique for detecting and avoiding overfitting is cross-validation (rotation estimation). The dataset was randomly divided into two subsets: 70% of the samples were used for training and 30% for testing. To obtain a reliable and stable model, the data were divided into ten independent subsets, each of which included a 70% training set and a 30% testing set (so-called 10-fold cross-validation). The random forest function in the RStudio software was used to develop the RF model with the training dataset.
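The sampling scheme described above can be sketched in Python. This is one possible reading, assuming ten independent random 70/30 splits of the 1:10 dataset; the helper names `build_labels` and `random_split` and the seeds are hypothetical, and the study itself used R.

```python
import random


def build_labels(n_landslide, negative_ratio=10):
    """Label vector with 1 = landslide and 0 = non-landslide at a 1:negative_ratio ratio."""
    return [1] * n_landslide + [0] * (n_landslide * negative_ratio)


def random_split(n_samples, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices) for one random split."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


labels = build_labels(987)  # 987 positives and 9870 negatives
# Ten independent 70/30 subsets, one per seed (assumed interpretation).
splits = [random_split(len(labels), seed=s) for s in range(10)]
print(len(labels), len(splits[0][0]), len(splits[0][1]))
```

Each subset can then be used to train and test one candidate model, and the subset with the best test accuracy retained.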

Random Forest (RF)
RF is an ensemble method that builds multiple decision trees from different subsets of the data, combining the ideas proposed by Breiman [62] and the methods described by Ho [63]. Compared with traditional landslide classification methods, the RF method introduces two kinds of random sampling (of samples and of features). By randomly selecting samples and features, the ensemble of decision trees achieves better accuracy and stability than a single decision tree. The judgments of the individual decision trees are then combined by voting to obtain the final output. Many studies have shown that RF is highly tolerant of outliers and noise [64], can process multi-dimensional data without feature selection, and is easy to implement in parallel. In this study, the RF distinguishes two classes (landslide and non-landslide) and is built from the 22 conditioning factors as candidate features.
The key point of RF is to combine n independent decision trees [y(X, θ_k); k = 1, 2, ..., n] to build a model. Each decision tree in the model judges or predicts the samples. Different classification models y_1(X), y_2(X), ..., y_k(X) are obtained after sample training. These classification models are then combined to build the RF model:

$$Y(X) = \arg\max_{Z} \sum_{i=1}^{k} I\left(y_{i}(X) = Z\right)$$

where Y(X) represents the RF model, y_i(X) denotes a single decision tree model, Z is the output variable, and I(·) is the indicator function. Figure 12 shows the steps of the RF algorithm. The procedure of RF is summarized as follows:
(a) Determine the value of mtry, i.e., randomly generate mtry candidate variables at each node of the binary tree. The choice of the splitting variable needs to meet the principle of minimum impurity.
(b) Use the bootstrap method to randomly draw ntree sample sets from the original dataset and form ntree decision trees. The samples that are not drawn are used to test the corresponding single decision tree.
(c) The ntree decision trees constitute the RF model, and the samples are then predicted or classified with the generated RF. Classification is decided by voting, and regression by a simple average.
mtry and ntree are the two main parameters of the RF model. The mtry parameter refers to the number of candidate variables randomly drawn at each node, while ntree refers to the number of trees that the random forest contains [65]. mtry is sometimes given a default of 2; it is also commonly set to the square root (classification model) or one-third (regression model) of the number of variables. This study set mtry to 7, which is about one-third of the 22 factors. Once mtry was determined, it was used in the RF model for training. ntree was then taken as the smallest number of trees (the abscissa in Figure 13) at which the out-of-bag (OOB) error becomes stable. The OOB error is the proportion of misclassifications over all out-of-bag elements, and it is an unbiased estimate of the generalization error.
As the number of trees increases, the generalization error gradually becomes steady. The OOB error of the model tends to be stable when the number of trees approaches 860; hence, this study set ntree to 860. In building the decision trees, this study uses the Classification and Regression Tree (CART) algorithm to split the nodes. CART follows the principle of minimum Gini impurity. At node t, a randomly selected object is assigned to class i with probability p(i|t), while the estimated probability that the object actually belongs to class j is p(j|t). Under this rule, the estimated probability of misclassification is:

$$Gini(t) = \sum_{i \neq j} p(i \mid t)\, p(j \mid t) = 1 - \sum_{j} p(j \mid t)^{2}$$
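The two building blocks of this section, the Gini impurity used by CART and the majority vote that combines the trees, can be sketched in a few lines of Python. The function names are illustrative assumptions, and the mtry heuristics are the common rules of thumb mentioned above rather than the study's tuning procedure.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def majority_vote(tree_predictions):
    """Final RF class: the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


n_factors = 22
mtry_sqrt_rule = int(math.sqrt(n_factors))  # square-root rule (classification)
mtry_third_rule = n_factors // 3            # one-third rule (regression)

print(gini_impurity([1, 1, 0, 0]), majority_vote([1, 0, 1]))
print(mtry_sqrt_rule, mtry_third_rule)
```

A pure node has impurity 0, and a two-class node split evenly has impurity 0.5, which is why CART prefers splits that push child nodes toward a single class.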

Frequency Ratio (FR)
The FR model is a statistical analysis method for susceptibility evaluation. It is based on the classification of the states of each conditioning factor and calculates the degree of influence of each state on landslides [15]. The FR is defined as the ratio of the probability of landslide occurrence to the probability of non-occurrence in a given area. The model deduces the spatial relationship between the landslide locations and the various factors affecting landslide occurrence, improves the accuracy of state classification, and reveals the correlation between the landslide locations and these factors in the study area [66]. The frequency ratio (Fr_i) assesses the relative importance of each class with respect to landslides. To implement the FR method, this study converted each factor into classes (Table 6), and the ArcGIS software was used to count the number of cells (for landslides and non-landslides) and to compute the value of FR for each class. The Fr_i index indicates the importance of the states of the conditioning factors for the occurrence of landslides: FR > 1 indicates that the state has a high correlation with landslide occurrence, and FR < 1 indicates a low correlation.
The landslide susceptibility index (LSI) was calculated by summing the frequency ratios of the factors:

$$LSI = \sum_{i} Fr_{i}$$

where Fr_i is the FR of each factor's class (i = 1, 2, 3, ...) and LSI is the landslide susceptibility index over the entire study area.
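As an illustration, the sketch below computes frequency ratios and an LSI in Python. It assumes the commonly used form of the frequency ratio, in which the share of landslide cells falling in a class is divided by the share of all cells in that class; this form, the function names, and the toy cell counts are assumptions and may differ in detail from the definition used in this study.

```python
def frequency_ratios(landslide_cells, total_cells):
    """Per-class FR: (landslide share of the class) / (area share of the class)."""
    n_landslide = sum(landslide_cells)
    n_total = sum(total_cells)
    ratios = []
    for in_class, class_size in zip(landslide_cells, total_cells):
        landslide_share = in_class / n_landslide
        area_share = class_size / n_total
        ratios.append(landslide_share / area_share)
    return ratios


def landslide_susceptibility_index(fr_of_cell):
    """LSI of one grid cell: sum of the FR values of the classes it falls in."""
    return sum(fr_of_cell)


# Hypothetical factor with two equally sized classes holding 30 and 10 landslide cells.
fr = frequency_ratios(landslide_cells=[30, 10], total_cells=[500, 500])
# Hypothetical cell described by three factors whose class FRs are 1.5, 0.5 and 1.0.
lsi = landslide_susceptibility_index([1.5, 0.5, 1.0])
print(fr, lsi)
```

In the toy example the first class holds 75% of the landslide cells on 50% of the area, so its ratio exceeds 1, which matches the interpretation of FR > 1 given above.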

Evaluation of LSM Models
Model evaluation is the key to reflecting model performance, and different aspects of a model can be assessed. The accuracy, precision (positive predictive value), sensitivity (true positive rate), and specificity are usually considered effective indicators of fitting and predictive accuracy. Therefore, this paper applied these indicators to evaluate and compare the performances of the two models (Table 3). Moreover, the ROC curve is also a method to measure the effectiveness of a model, with the AUC value used as the basis for determination [67]. This value ranges from 0.5 (very poor performance) to 1.0 (perfect performance). When the AUC value is greater than 0.7, the closer it is to 1, the more accurate the model's prediction. The value of AUC can be computed by the trapezoidal rule of integral calculus, as shown in Equation (10):

$$AUC = \sum_{p} \left(X_{p+1} - X_{p}\right) \frac{S_{p} + S_{p+1}}{2} \qquad (10)$$

where X_p is specificity and S_p is sensitivity.
Table 4 shows the accuracy of the 10-fold cross-validation of the RF model. The average accuracy of the RF model on the test datasets was 0.907, and Subset 8 had the highest accuracy (0.918) among the ten folds. Hence, the RF model was constructed by using the training dataset of Subset 8. The trained RF model was applied to the geospatial database to simulate the probability of landslides for each grid in the study area. According to the expert experience method [44], the prediction results of the RF were divided into five classes [68]: very low (<0.06), low (0.06-0.12), medium (0.12-0.21), high (0.21-0.31), and very high (>0.31). Figure 9 and Table 4 show the resulting spatial probability of the landslide distribution maps derived from the RF model.
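The trapezoidal rule mentioned for the AUC can be sketched as follows. The sketch assumes the usual ROC convention in which the horizontal axis is the false positive rate (1 − specificity) and the vertical axis is the sensitivity; the function name and the toy ROC points are illustrative only.

```python
def auc_trapezoid(false_positive_rate, sensitivity):
    """Area under an ROC curve by the trapezoidal rule."""
    # Sort the ROC points by the horizontal coordinate before integrating.
    points = sorted(zip(false_positive_rate, sensitivity))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Diagonal ROC curve of a random classifier.
print(auc_trapezoid([0.0, 1.0], [0.0, 1.0]))
# Hypothetical three-point ROC curve.
print(auc_trapezoid([0.0, 0.5, 1.0], [0.0, 1.0, 1.0]))
```

The diagonal gives an area of 0.5, the lower end of the AUC range quoted above, while a curve that reaches the top-left corner gives 1.0.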

LSM Acquired by RF Model in the Study Area
According to the susceptibility evaluation map (Figure 14), most areas of Yunyang County showed low susceptibility, and these areas were concentrated in relatively flat terrain, such as the upper-middle and northeast regions. The areas with high susceptibility to landslides were concentrated on both sides of the Yangtze River and its tributaries, mainly in the southwest and northwest of Yunyang County. Because of scouring and soaking by the rivers, the soils become loose and are extremely prone to landslides under the influence of gravity. In addition, within the elevation range of the southwest and northwest parts, the population density is high and human engineering activities are intensive, which changes the surrounding geological environment and affects the occurrence of landslide disasters. Moreover, the distribution of areas with high susceptibility to landslides was almost consistent with that of the historical landslide locations (Figure 14a,b). LSM is a qualitative evaluation of model performance, while statistical analyses (the accuracy statistics approach) are more specific and quantitative. Table 5 shows the distribution of historical landslides in the five classes. The regions with high and very high susceptibility to landslides accounted for 12.8% of the total area, but 77.5% of the landslides were in these regions. The regions with low and very low susceptibility to landslides accounted for 62.6% of the total area, while only 8.5% of the landslides were in these regions. This means that the landslide locations have a high spatial correlation with the landslide susceptibility. The evaluation shows that the landslide density increased by approximately 276 times (from 0.021 to 5.806) from the very low to the very high class.

LSM Acquired by the FR Model in Study Area
Based on the analysis of the relationship between the 22 conditioning factors and landslide occurrence, the FR method produced the Fr_i index of each class (Table 6). The factors with the most significant correlations with landslides included elevation, slope, lithology, NDVI, annual average rainfall, land cover, distance from roads, and POI kernel density. In particular, elevation (DEM) showed a negative correlation across all classes, with the values of Fr_i decreasing as elevation increases. The first classes (<690 m) have an Fr_i of more than 1, showing a strong correlation with landslides. For slope, Fr_i first increases and then decreases. The values from the third to the fifth class are more than 1, which indicates that gentle and inclined slopes have a greater impact on landslides. The values of Fr_i are also strongly related to lithology: J2 and J3, the most widely distributed strata in the study area, show a strong correlation with landslides. NDVI shows a positive correlation in the fourth and fifth classes, which are covered with rich vegetation and play a key role in limiting the occurrence of landslides. The annual average rainfall is the main triggering factor of landslides in the study area. Although the numbers of grid cells and landslides decrease with increasing rainfall, their relationship with Fr_i is complex. Concerning land cover, the values of Fr_i indicate a positive correlation with 'Farmland' and 'Transportation,' while 'Residential land' has indexes slightly less than 1, possibly indicating an obvious impact of human activities on the occurrence of landslides. Lastly, distance from roads and POI kernel density are two factors typical of human engineering activities. Distances from 0 to 300 m and POI kernel densities from the third to the seventh class have a close correlation with landslides.
The development and construction in the county in recent years have reduced vegetation cover and damaged slope structures, resulting in an increase in landslides and other natural disasters.
With the help of the FR model, the spatial relations between the locations of historical landslides and the contributing factors for the occurrence of landslides, i.e., the LSM, were derived (Figure 15). The landslide susceptibility index (LSI) ranged from 16.50 to 27.99. As with the RF model, the LSI was divided into five categories based on expert classification: very low (<21.10), low (21.10-22.19), medium (22.19-22.93), high (22.93-24.54), and very high (>24.54).
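Assigning a susceptibility class from the break values reported for the two models can be sketched as below. The break values are those quoted in the text; how a value lying exactly on a break is assigned is not stated, so the choice made here (the upper class) is an assumption, as are the names used.

```python
from bisect import bisect_right

CLASS_NAMES = ["very low", "low", "medium", "high", "very high"]
RF_BREAKS = [0.06, 0.12, 0.21, 0.31]      # breaks for the RF landslide probability
FR_BREAKS = [21.10, 22.19, 22.93, 24.54]  # breaks for the FR-based LSI


def susceptibility_class(value, breaks):
    """Map a model output to one of the five classes.

    A value equal to a break falls in the upper class (assumed convention).
    """
    return CLASS_NAMES[bisect_right(breaks, value)]


print(susceptibility_class(0.25, RF_BREAKS))
print(susceptibility_class(22.5, FR_BREAKS))
```

The same helper serves both models because only the break values differ between the RF probability and the FR index.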
It can be seen from the susceptibility evaluation map (Figure 15) that most areas of Yunyang County have medium to high landslide susceptibility, and these areas are distributed almost entirely on both sides of the Yangtze River, which broadly conforms to the distribution of historical landslides. However, the pattern differs from the actual landslide distribution in Yunyang County in that the overall susceptible area is enlarged compared with the historical landslides' distribution. Moreover, statistical analysis was used to quantitatively evaluate the effectiveness of the model, including the percentage of each susceptibility class, the number of landslides, and the proportion and density of landslides in each class, as shown in Table 7. The regions with high and very high susceptibility to landslides accounted for 26.7% of the total area, but 52.7% of the landslides were in these regions. The regions with low and very low susceptibility to landslides accounted for 51.6% of the total area, while 24.0% of the landslides were in these regions. The evaluation shows that the landslide density increased by approximately 15 times (from 0.063 to 0.968) as the susceptibility class increased from very low to very high.
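A simple way to compare how strongly each map concentrates landslides is to divide the share of landslides by the share of area in the high and very high classes. The short Python sketch below applies this to the percentages reported for the two models; the ratio is an illustrative measure introduced here and is not the density column of Tables 5 and 7.

```python
def concentration_ratio(landslide_percent, area_percent):
    """Share of landslides divided by share of area for a group of classes."""
    return landslide_percent / area_percent


# High plus very high classes, using the percentages reported in the text.
rf_ratio = concentration_ratio(77.5, 12.8)
fr_ratio = concentration_ratio(52.7, 26.7)
print(round(rf_ratio, 2), round(fr_ratio, 2))
```

On this measure the RF map packs landslides into its high-susceptibility area about three times more tightly than the FR map, which is consistent with the enlarged susceptible area noted for the FR model.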

Validation and Comparison
Validation is very important for the generated LSI map, as it evaluates the reliability of the prediction results. After the landslide susceptibility models were trained and tested, the following four evaluation statistics were used to evaluate the two models: the accuracy, precision, recall rate, and AUC. Tables 8 and 9 present the confusion matrices of the RF model and the FR model. The samples were classified with a threshold selected by using the library 'Information Value' in the R statistical programming environment instead of the traditional threshold of 0.5. Regarding the accuracy, the RF model performed significantly better (0.992), while the accuracy of the FR model was 0.600. Concerning the precision, the landslide precision of RF (0.990) was much better than that of FR (0.147), showing that true landslides account for only a small share of the landslides predicted by the FR model. The non-landslide precision of RF (0.992) was also better than that of FR (0.954), although both models performed similarly for non-landslides. Because the number of non-landslides in this study is 10 times the number of landslides, non-landslides were classified better than landslides by both RF and FR. The landslide and non-landslide recall rates of RF were both above 0.9, which verifies that the classification outcomes are reasonable. For the FR model, the non-landslide recall rate was relatively low (0.538) because the areas of high and very high susceptibility were enlarged, resulting in many non-landslides falling in these areas. In fact, compared to the recall rate, the accuracy can better ...
Additionally, this study verified the LSM results of the two models on the sample database by computing the AUC of the ROC curve. According to Chen et al. [69], an AUC value can be rated as follows: poor (0.5-0.6), average (0.6-0.7), good (0.7-0.8), very good (0.8-0.9), and excellent (0.9-1).
Figure 16 shows the ROC curves of the RF model and the FR model, with AUC values of 0.988 and 0.716, respectively. Both models had an AUC value of more than 0.7 (good and above), so they can be used in analyzing landslide susceptibility of the study area. However, similar to other statistical results, the AUC value for RF was also observed to be better than for FR. Hence, it can be concluded that the predictive performance of RF was better than the FR.
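The accuracy, precision, and recall figures discussed in this section all follow from the four cells of a confusion matrix. The Python sketch below shows the relationships; the counts are hypothetical values chosen for illustration and are not those of Tables 8 and 9.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy plus per-class precision and recall from a binary confusion matrix.

    tp: landslides predicted as landslides, fn: landslides predicted as non-landslides,
    fp: non-landslides predicted as landslides, tn: non-landslides predicted as such.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "landslide_precision": tp / (tp + fp),
        "landslide_recall": tp / (tp + fn),
        "non_landslide_precision": tn / (tn + fn),
        "non_landslide_recall": tn / (tn + fp),
    }


# Hypothetical counts with roughly one landslide per ten non-landslides.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(metrics)
```

With imbalanced classes the accuracy is dominated by the non-landslide samples, which is why the per-class precision and recall are reported alongside it.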

The Comparison of the Two Models
In this study, the RF model and the FR model were selected for a comparative study of landslide susceptibility. The RF model is innovative, has been applied to landslide susceptibility only in recent years, and has shown good performance. The FR model is a traditional method for evaluating landslide susceptibility and can also deliver acceptable results. A qualitative assessment of Figures 14 and 15 indicates that the LSM results were broadly similar for both models. Nevertheless, the high and very high regions of FR covered an obviously larger area than those of RF, which means that the predicted landslide-prone area extends beyond the normal range. This suggests that RF is more robust than FR. Figure 17 shows the quantitative distribution of landslide susceptibility classes in the RF and FR models. Concerning the percentage of susceptibility regions (Figure 17a), the two models have both similarities and differences: the 'Low,' 'Medium,' and 'Very high' classes of RF and FR are similar, while the 'Very low' and 'High' classes differ considerably. Consequently, the FR performed worse than the RF because most areas of Yunyang County were placed in the very low susceptibility class by the RF model but in the low susceptibility class by the FR model. In addition, the percentage of susceptibility regions in RF decreases as the susceptibility class goes up (from 'Very low' to 'Very high'), but the FR does not follow this rule. The percentages of landslides in RF and FR (Figure 17b) had a positive correlation with the susceptibility class. Except for the 'Very high' class, the percentages of landslides in the other four classes are higher for FR than for RF, so a larger share of landslides falls in these four classes under the FR model.
The division of susceptibility regions should follow the guideline that the density of historical landslides is smallest in the low susceptibility area and largest in the high susceptibility area; the LSM is then consistent with the actual distribution of historical landslides, which is the basis of expert experience. Finally, the percentage of landslides and the percentage of susceptibility regions show a clear negative correlation. The accuracy, precision, recall rate, and AUC were used to evaluate and compare the capability of the two models. The results showed that all four indexes were higher for the RF model than for the FR model, which may suggest that the conventional method is limited in its capability for landslide prediction. These results are similar to those of a related study [70], which found that the FR is not highly selective in classification in the study area. This limitation arises because the FR is obtained by summing, for each cell, the Fr_i values derived by considering one causative factor at a time, whereas the RF can balance errors for unbalanced datasets and deals well with both classification and regression problems. Nevertheless, Hong et al. [71] evaluated and compared landslide susceptibility with four methods and found that the AUC value of FR (0.8134) was higher than that of RF (0.7172). This difference can be explained by their small sample size (a total of 163 landslide events): RF operates by constructing a multitude of decision trees and thus needs enough samples to perform well.
The results showed that the prediction accuracy for landslides was significantly higher in the RF model than in the FR model, suggesting that machine learning is usually a good complement to the statistical method when the research question requires special attention to predictability. Moreover, when the sample size is large, the prediction ability of the machine learning method improves greatly. Another reason why RF performed better than FR is that the RF was optimized with respect to two important parameters (mtry and ntree) and with 10-fold cross-validation, while the FR model was not optimized. To address this problem, Guo et al. [72] combined the FR model and the logistic regression model and improved the accuracy of landslide susceptibility evaluation by 4-9%. Subsequent research will need to consider such coupled models to improve the performance of landslide susceptibility evaluation.

Distribution Characteristics of New Landslide Events
The distribution map of new landslides was used as another means of evaluating the accuracy of the models [73]. Information on three new landslides in 2017 was collected in the region. After projecting the coordinates of their locations onto the landslide susceptibility maps of the two models, it was shown that all the new landslides were in high or very high susceptibility regions (Figure 18a,b).
The cases of the three new landslides were analyzed to compare the two models: (a) The Longwang Temple Landslide in Panlong Street, Yunyang County, occurred on 14 September 2017, during the typical rainy season in Yunyang County. Like most landslides in the County, the new landslide was a soil landslide induced by precipitation. In the susceptibility map of the RF, this new landslide fell in medium and high susceptibility regions, while in the FR map it fell only in the very high susceptibility region. (b) The Daowan Landslide was a soil landslide located in Lao Cao Town, Yunyang County, and occurred on 12 August 2017. Figure 18 shows that the Daowan Landslide was in the saddle between two mountains and occurred under rainfall erosion. In both RF and FR, this new landslide fell in high or very high susceptibility areas. (c) The Dalishu soil landslide occurred on 6 September 2017. Like the two new landslides above, it was caused by rainfall and fell in high or very high susceptibility regions in both the RF and FR models.
The three new landslides' locations were compared and analyzed in the two models. Longwang Temple Landslide was in medium and high susceptibility areas, while Daowan Landslide and Dalishu Landslide were in high or very high susceptibility areas. The above results indicated that both the RF model and the FR model had an acceptable prediction accuracy.

Importance of Contributing Factors
Effective contributing factors play an important role in the prediction accuracy of landslide susceptibility [74]. The RF model can rank the importance of the landslide factors by the mean decrease accuracy [44]. The importance of each evaluation factor and its impact on landslide susceptibility are different. Therefore, analyses of the importance and impact of the factors can provide important guidance for landslide disaster prediction and prevention. RStudio software (https://rstudio.com/) was used to calculate the mean decrease accuracy of each factor, i.e., the values of a factor are randomly permuted and the resulting reduction in the prediction accuracy of the RF model is measured. The larger the value, the greater the significance of the factor. However, noise in the variables and correlation between them affect the importance ranking, and if the ranking is made only once, the result is often inaccurate [75]. Therefore, this study used the average of 10 runs as the final ranking result (Figure 19). In addition, we retrained the model after removing each of the 22 factors in turn. Figure 20 shows the AUC value for each retraining. The two methods were combined to evaluate the importance of the factors. Figure 19 shows that the elevation, annual average rainfall, and slope are the top three factors affecting the susceptibility of landslides in Yunyang County, with a mean decrease accuracy of 56.00, 48.74, and 45.07, respectively, while the least significant factor was the distance from faults, with a mean decrease accuracy of 2.91. Similarly, Figure 20 shows that the elevation had the greatest influence on the AUC value of the RF model, followed by lithology. Furthermore, if the POI kernel density, distance from roads, and distance from rivers are removed, the predictive ability of the RF model is reduced by 0.003.
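The idea behind the mean decrease accuracy can be sketched in plain Python: shuffle one factor at a time, measure how much the accuracy drops, and average over several repetitions. This is a simplified illustration that permutes factors on a fixed dataset with a toy classifier; the R implementation used in the study works on the out-of-bag samples of each tree, and all names and data below are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the classifier returns the true label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop per factor after shuffling that factor's column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_predict(row):
    """Toy classifier that relies on factor 0 only."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 4.0], [0.3, 2.0], [0.7, 6.0]]
labels = [toy_predict(row) for row in rows]
importances = permutation_importance(toy_predict, rows, labels)
print(importances)
```

Because the toy classifier ignores the second factor, shuffling it never changes a prediction and its importance is zero, whereas shuffling the first factor lowers the accuracy. Averaging over repetitions mirrors the 10-run averaging described above.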
Although the POI kernel density, an innovative factor in this study, ranked lower among the factors, it contributes to the prediction accuracy of the model. Furthermore, the important factors were similar to the typical factors in the FR model and are key to the occurrence of landslides in Yunyang County. To better analyze the relationship between the factors and landslides, statistical maps of historical landslide density were drawn for the four top-ranked factors: elevation, annual average rainfall, slope, and lithology (Figure 21a-d). In contrast, the distance from faults was the least significant factor (Figure 21e). The abscissa corresponds to the factor value class, and the ordinate corresponds to the landslide density. Thus, the larger the ordinate value, the more easily a landslide occurs.
It can be seen from Figure 21a that the landslide density had a negative correlation with the elevation: the landslide density is higher at lower elevations. Yunyang County sits in a typical mountainous environment, with a large elevation difference and a low elevation. The areas at lower elevations have sparse vegetation coverage and loose soils. Most of these areas are in the watershed regions, with heavy human engineering activities; therefore, landslides occur frequently. The areas at higher elevations have thick vegetation coverage and few human activities; therefore, fewer landslides occur. Secondly, Figure 21b shows that the landslide density had a negative correlation with the average rainfall over the years. Rainfall is the main cause of landslides in the study area. Especially in the monsoon seasons, rainfall scours the slope surface, and unstable rocks and soil particles on the slope surface are carried away by the surface runoff that the rainfall forms. Erosion usually occurs in the slope body, and the average rainfall over many years also affects the development of vegetation, which in turn affects the development of landslides.
However, the annual average rainfall is the average of the rainfall over many years, and it is different from short-term rainfall, which includes antecedent rainfall and rainfall on the day of a landslide. On the one hand, the annual average rainfall affects not only the slope itself but also the development of vegetation. Generally, the more rainfall there is, the lusher the vegetation, which reduces the possibility of landslides. On the other hand, the areas with the most precipitation are generally in the middle of the mountains, and the precipitation decreases significantly near the top. Human activities are more significant in the low and middle mountains, which may greatly affect the occurrence of landslides. The relationship between landslide density and slope resembled a typical normal distribution (Figure 21c). The literature review revealed that a single conditioning factor such as slope may not necessarily always have high importance in landslide susceptibility evaluation [76]. However, the slope still plays a role in the landslide susceptibility model for Yunyang County. Landslides may be induced by steep slopes and the varying shear stresses associated with them, i.e., the external forces that drive the deformation and relative sliding of the landslide body [77]. Gentle slopes, by contrast, are expected to have a low frequency of landslides because of the lower shear stresses commonly associated with low gradients. Lithology is a categorical variable; J2 and J3 are the most widely distributed strata in the study area, and a high landslide density appears with them (Figure 21e). The distance to faults is of the lowest importance (Figure 21d), and there was no obvious correlation between the landslide density and faults. This is related to the fact that there is only one small fault in Yunyang County. Wen et al. [78] showed that faults have little effect on landslides, except in areas of strong earthquakes.

Conclusions
LSM indicates the likelihood of landslide occurrence and is a useful tool for the prevention and evaluation of landslides. This study took Yunyang County, a typical and severely affected area in the Chongqing Section of the Three Gorges Reservoir, as a research case, and employed the RF model and the FR model to conduct a comparative study of LSM. The main conclusions are as follows:
(1) A total of 987 historical landslides were identified in the landslide inventory, which draws on historical records, satellite images, and extensive field surveys; 94.7% of the landslides are soil landslides, and 84.8% were induced by rainfall. Subsequently, 70% of the landslides were used as the training dataset and 30% as the testing dataset. Twenty-two factors in five categories, including elevation, slope, slope position, aspect, and lithology, were selected as the contributing factors of landslides in Yunyang County. By optimizing two important parameters of RF and using 10-fold cross-validation in R to select the best sample subset, a more efficient RF model can be built to evaluate landslide susceptibility. As a result, the LSM was produced with the two models.
(2) In the mapping evaluation, the RF model had 77.5% of historical landslides falling in the regions with high or very high susceptibility, which account for about 12.8% of the total area. The regions with low or very low susceptibility to landslides accounted for 62.6% of the total area, while only 8.5% of landslides were in these areas. By contrast, the FR model had 52.7% of the landslides falling in the high or very high susceptibility regions, which account for 26.7% of the total area. The regions with very low or low susceptibility accounted for 51.6% of the total area, while 24.0% of the landslides were in these areas. The AUC values of the RF model and the FR model were 0.988 and 0.716, respectively. Similarly, the accuracy, precision, and recall ratio of RF were higher than those of FR.
Furthermore, in the high and very low classes, RF performed better. In addition, the susceptibility mapping results of both models had a high spatial correlation with the new landslides in 2017. The evaluation results above show that the RF model has higher accuracy, reliability, and stability, and is more suitable for landslide susceptibility evaluation in Yunyang County than the FR model. However, the performance of a model depends not only on the algorithm but also on the specific conditions of the study area and the selection of conditioning factors. Therefore, this study cannot conclude that the RF model is definitely the best.
Compared with the FR model, the RF model has higher prediction accuracy. This finding is similar to the results of Sun et al. [73], who used RF to study Fengjie County (a neighbor of Yunyang County, with a similar geographic environment).
(3) Finally, the importance ranking obtained from the factor importance analysis, together with the AUC values of the RF model retrained with individual factors removed, is in accordance with the basic laws of geology and consistent with previous research findings, and can provide guidance for landslide management. The elevation, annual average rainfall, slope, lithology, POI kernel density, distance from roads, and distance from rivers were the most important landslide contributors in Yunyang County, while the contribution of faults was the smallest. In particular, as the highlight of this study, the POI kernel density proves useful in landslide susceptibility models. There are complex relationships between the factors, and the occurrence of landslides is inseparable from the combined effects of human and natural factors.