Landslide Susceptibility Assessment Based on Different Machine Learning Methods in Zhaoping County of Eastern Guangxi

Abstract: In response to the increasingly frequent and serious landslide disasters in eastern Guangxi, this study applied support vector machine (SVM), particle swarm optimization support vector machine (PSO-SVM), random forest (RF), and particle swarm optimization random forest (PSO-RF) methods to assess landslide susceptibility in Zhaoping County. To this end, 10 landslide-related variables were prepared, including digital elevation model (DEM)-derived, meteorology-derived, Landsat 8-derived, geology-derived, and human activity factors. Of the 345 landslide locations identified, 70% were used to train the models and the remainder were used for model verification. The four models were run, and landslide susceptibility maps were produced. Receiver operating characteristic (ROC) curves, statistical analysis, and field investigation were then used to test and verify the models. Comparison of the results showed that all four models performed well, with area under the curve (AUC) values of the ROC curves ranging from 0.863 to 0.934. The PSO-RF model had the highest accuracy, followed by the PSO-SVM, RF, and SVM models. The results also showed that the PSO algorithm improved both the SVM and RF models. The landslide models developed in this study are promising methods that could be transferred to other regions for landslide susceptibility evaluation, and the evaluation results can inform disaster reduction and prevention in Zhaoping County of eastern Guangxi.


Introduction
The geological environment in eastern Guangxi is fragile and landslide disasters occur frequently, causing huge economic losses and ecological damage and seriously constraining human livelihoods and the sustainable development of society [1][2][3]. With the rapid economic development of recent decades, the frequency and intensity of landslide disasters have increased rapidly as natural resources have been over-exploited [4]. Therefore, objectively evaluating landslide susceptibility is of great significance for disaster reduction and prevention.
Over the past few decades, the methods most commonly used to assess landslide susceptibility in a specific region have fallen into two categories: knowledge-driven methods and data-driven methods. Knowledge-driven methods rely mainly on expert experience and include the expert scoring method [5], the analytic hierarchy process [5][6][7], the fuzzy logic method [5][6][7][8], and so on. They lack consistency and portability because they depend too heavily on individual experts' subjective experience and judgment. Data-driven methods can be divided into statistical analysis models and machine learning (ML) models. Statistical analysis models, e.g., weights-of-evidence [9,10], frequency ratio [7,9,11,12], certainty factor [13,14], index of entropy [1], spatial multi-criteria evaluation [15,16], and others, have been widely used to assess landslide susceptibility because they establish a quantitative relationship between landslide disasters and evaluation factors, but they do not handle the non-linearity of landslide disaster systems.
A landslide disaster system, however, is a non-linear, dynamic, open, and complex giant system with a multi-level structure, multiple time scales, and multiple internal and external interaction processes [17]. Statistical analysis methods struggle to handle accurately the multi-source, heterogeneous, dynamic, and massive landslide-related data accumulated through long-term landslide investigation [9]. ML methods have a strong learning ability and can identify the non-linear relationship between landslide susceptibility and influencing factors in a region [18][19][20][21][22][23][24][25].
In addition, many comparative studies of landslide susceptibility assessment using different ML methods have been performed. For example, Marjanović et al. [18] compared SVM with other models and found that SVM performed better than DT and LR for landslide susceptibility evaluation. Tien Bui et al. [19] likewise showed that SVM outperformed the decision tree and NB models. In a comparative study of landslide susceptibility mapping, Kavzoglu et al. [15] found experimentally that SVM performed better than LR. Trigila et al. [34] compared the LR and RF algorithms in an analysis of landslide susceptibility and found that RF performed better than LR. Another study reported that SVM had the highest prediction accuracy compared with LR, BN, NB, and FLDA for landslide susceptibility evaluation [27]. Likewise, Ada and San compared two ML algorithms, SVM and RF, for landslide susceptibility prediction based on two-level random sampling; their results indicated that the spatial performances of the SVM and RF classifications were almost equally accurate, with area under curve (AUC) values of the receiver operating characteristic (ROC) curves ranging between 0.82 and 0.87 [31].
In general, each of the above ML models has been widely applied to landslide prediction and evaluation. Among them, SVM and RF have repeatedly been shown to be useful for evaluating landslide susceptibility [15,18,19,27,31,34]. However, few studies have focused on optimizing SVM and RF models for landslide susceptibility prediction and evaluation and on comparing the optimized results. Therefore, the objectives of the present paper are to: (1) determine the landslide susceptibility assessment factors by multi-source data processing and correlation factor analysis; (2) optimize the SVM and RF models using a particle swarm optimization (PSO) algorithm; (3) analyze and evaluate the susceptibility levels of landslides in Zhaoping County using the SVM, PSO-SVM, RF, and PSO-RF models; and (4) compare the performance of the four ML models for landslide susceptibility evaluation using ROC curves, statistical analysis, and field verification.
The results provide valuable informational support for the prediction and evaluation of landslides in Zhaoping County, Guangxi.

Remote Sens. 2021, 13, x FOR PEER REVIEW

Study Areas
Zhaoping County is located between longitudes 110°34′ E and 111°19′ E and latitudes 23°39′ N and 24°24′ N in the eastern part of Guangxi, in the middle reaches of the Guijiang River, with a total area of about 3223.67 km² and a total population of 448,000, as shown in Figure 1. It lies in the humid subtropical monsoon climate region, with a mild climate and abundant rainfall. The annual average temperature is 19.8 °C and the annual rainfall is 2046 mm, making it one of the rainfall and heavy-rain centers of Guangxi. Zhaoping County has distinctive geomorphological characteristics: it is a mountainous region with intervening deep valleys, where mountains account for 87.6% of the total area, and the terrain is high in the northwest and low in the southeast. The main structures are a large, nearly NE-SW-trending fault and the northward-protruding Dayaoshan arc-shaped structural compression belt, along which a series of secondary arc-shaped folds and faults are distributed. In addition, the Dayaoshan uplift belt is cut by a series of nearly N-S-trending faults, forming many secondary depressions. Under the influence of multi-stage tectonic movements, joints and fissures are well developed in the rock mass and the rock is severely weathered, which provides the basic conditions for landslide formation. Long-term internal and external geological forces have thus produced an extremely fragile geological environment, and landslides occur frequently in Zhaoping County. According to the field investigation report of the geological disaster project by the Guangxi Geological Survey Bureau in 2018, there are 345 landslide disaster points in Zhaoping County [2].

Data Sources and Landslide Inventory Data
The main data sources adopted in this paper are as follows: (1) a digital elevation model (DEM) of Zhaoping with a spatial resolution of 30 m × 30 m, constructed from the ASTER Global DEM acquired from the United States Geological Survey (http://earthexplorer.usgs.gov, accessed on 7 September 2021), from which three geomorphic factors were generated: slope, aspect, and plan curvature; (2) annual precipitation data for 2015, collected from the Guangxi Meteorological Bureau and resampled to a 30 m resolution in ArcGIS; (3) a Landsat 8 OLI image (24 December 2017, 124/043) with a 30 m resolution, used to extract the normalized difference vegetation index (NDVI) and the land use and land cover (LULC) map; (4) a 1:50,000 topographic map, used to derive the densities of residents and of the road network; (5) a 1:50,000 geological map, used to extract the stratum lithology and tectonic complexity; and (6) a landslide inventory map of Zhaoping, prepared by Guangxi Geological Survey Bureau staff through image interpretation and field investigation based on historical data and remote sensing data from 2017 [2]. Together these data constitute a landslide disaster evaluation factor database, which records the ID number, scale, direction, location, (X, Y) coordinates, center point, slope, aspect, interpreter, and name of each landslide.

Classification of Evaluation Factors
Many factors affect the occurrence of landslides in Zhaoping, and these factors are not independent of each other. To assess landslide susceptibility more objectively, ten factors highly correlated with landslide occurrence were chosen on the basis of the field investigation report of the geological disaster project by the Guangxi Geological Survey Bureau and a correlation analysis of disaster factors in Zhaoping: slope, aspect, curvature, annual rainfall, NDVI, stratum lithology, tectonic complexity, LULC, residential density, and road network density [2]. These factors were classified into grades (Table 1) according to the analysis, carried out by Guangxi Geological Survey Bureau staff for Zhaoping, of the influence of each evaluation factor on landslide occurrence [2]. Following the classification standard of Table 1, the attribute value of each evaluation factor was obtained by overlay analysis of a 30 m × 30 m grid with the attributes of each factor; the results are shown in Figure 2a-j. The maps of slope (Figure 2a), aspect (Figure 2b), and curvature (Figure 2c) were extracted from the DEM with a 30 m × 30 m grid cell and represent the influence of topography on the development and distribution of landslides in Zhaoping.
Precipitation, especially heavy rain or continuous precipitation, is the external dynamic factor that triggers landslides [4]. Precipitation is plentiful in Zhaoping, and the annual average number of heavy-rain days is between 3 and 15. Under the action of precipitation infiltration, scouring, erosion, and so on, unstable mountain slopes readily fail as landslides. Moreover, landslides and heavy rain are both concentrated from May to August, indicating that landslide formation in Zhaoping is closely related to heavy rain. Figure 2d is the annual rainfall map of Zhaoping from the Guangxi Meteorological Bureau.
The ecological environment is closely related to the occurrence of landslides. Zhaoping has a warm and humid climate with a wide variety of vegetation. In this study, the NDVI map (Figure 2e) was extracted from a Landsat 8 OLI image to characterize the ecological environment of Zhaoping.
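NDVI is computed per pixel from near-infrared and red reflectance as (NIR − Red)/(NIR + Red); for Landsat 8 OLI these are bands 5 and 4. The following is a minimal sketch of that calculation, assuming the reflectance values are already available as plain Python lists. The function name and the sample values are illustrative and do not come from the study.

```python
def ndvi(nir, red):
    """Return per-pixel NDVI = (NIR - Red) / (NIR + Red).

    Pixels where NIR + Red == 0 are returned as None (no data).
    """
    out = []
    for n, r in zip(nir, red):
        total = n + r
        out.append(None if total == 0 else (n - r) / total)
    return out


# Hypothetical reflectance values for four pixels (not study data).
nir_band = [0.45, 0.30, 0.05, 0.0]
red_band = [0.05, 0.10, 0.04, 0.0]
values = ndvi(nir_band, red_band)
```

Dense vegetation gives values close to 1, while bare soil and built-up surfaces give values near 0, which is why the text later reads low NDVI as an indirect sign of human engineering activity.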
The strata of Zhaoping are mainly Cambrian and Devonian, with a small amount of Quaternary, and the main lithologies are clastic rocks, clastic rocks intercalated with siliceous rocks, sandstone and shale, carbonate rock, and a small amount of granite or basal rock, accounting for 55.89%, 34.11%, 4.54%, 3.96%, and 0.47% of the total area, respectively (Figure 2f). Clastic rocks are prone to landslides under the action of precipitation, especially heavy precipitation [4]. In addition, multi-stage tectonic movements and the long-term action of internal and external geological forces have produced a complex geological structure with interlaced folds and fractures, resulting in an extremely fragile geological environment. Figure 2g shows the tectonic complexity of Zhaoping.
In addition, human activities have become one of the major driving forces of environmental change and a trigger of landslides [4]. Human engineering activities such as land use change, steep slope reclamation, road and bridge building, forest and mineral resource development, and hydropower construction strongly disturb the topography and geomorphology and upset their equilibrium, making landslides far more likely than in the natural state. Therefore, the LULC map, residential density, and road network density were selected as representative factors to reflect the influence of human activities on the environment in Zhaoping, as shown in Figure 2h-j.
Based on the above, the database of landslide susceptibility evaluation factors in Zhaoping was established, with a total of 3,581,859 grid evaluation units. In view of the obvious imbalance between landslide points and non-landslide points in the study area, a random sampling method based on an environmental similarity strategy was adopted to construct the training and testing datasets and avoid biasing the ML models. From this database, 1493 grid units were selected as training samples, including 242 (70%) landslide disaster points and 1251 non-disaster points with low environmental similarity to landslide disaster points; 1042 grid units were selected as testing samples, including 103 (30%) landslide disaster points and 939 non-disaster points with low environmental similarity to landslide disaster points. The four ML models (SVM, PSO-SVM, RF, and PSO-RF) were trained using the training dataset, and their performance was verified using the testing dataset.
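The sample counts above can be reproduced with a simple random split. The sketch below covers only the random partition into training and testing subsets; the environmental similarity screening of non-disaster points is not reproduced, because its details are not given here. The point IDs are placeholders, and the function name and seed are assumptions.

```python
import random


def split_points(point_ids, n_train, seed=0):
    """Randomly split point IDs into training and testing subsets."""
    ids = list(point_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Counts reported in the text; the IDs themselves are placeholders.
landslide_ids = range(345)
non_landslide_ids = range(1000, 1000 + 1251 + 939)

ls_train, ls_test = split_points(landslide_ids, n_train=242)
non_train, non_test = split_points(non_landslide_ids, n_train=1251)

# Label 1 = landslide point, 0 = non-landslide point.
train = [(i, 1) for i in ls_train] + [(i, 0) for i in non_train]
test = [(i, 1) for i in ls_test] + [(i, 0) for i in non_test]
```

Splitting the two classes separately keeps the landslide share of each subset fixed at the reported counts, rather than leaving it to chance.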

Methods
Landslide susceptibility evaluation has been carried out in nine main processes

Support Vector Machine (SVM) Model
SVM is based on statistical learning theory and the structural risk minimization principle [43,44]. It uses a kernel function to map the input variables into a high-dimensional feature space and then finds the optimal hyperplane separating the two classes. Because the underlying optimization problem is convex, the extreme solution found by SVM is the global optimum [15]. SVM has been shown to have many advantages in handling small-sample, non-linear, and high-dimensional pattern recognition problems and has been successfully applied in disaster prediction and assessment [15,18,19,27,30,31,32].
In the landslide assessment of the current study, the training sample dataset is given as $\{x_i, y_i\}$, $i = 1, 2, \ldots, n$, with $x_i \in \mathbb{R}^m$ and $y_i \in \{-1, +1\}$. SVM seeks the optimal classification hyperplane in the feature space of the landslide data, which separates the two types of training samples: disaster points and non-disaster points. The optimal classification hyperplane is defined by Equation (1), where n represents the number of training samples, m represents the dimension of the input vector, $\|\omega\|$ represents the norm of the normal vector of the hyperplane, and b is the displacement term.
The Lagrangian multiplier method is introduced to find the extreme value, and the auxiliary function is constructed as Equation (2), where $\lambda_i$ is the Lagrange multiplier.
The dual minimization method given by Vapnik [44] and Tax and Duin [45] is used to solve for the values of $\omega$ and b in the equation.
For non-linear, non-separable disaster samples, non-negative slack variables ($\xi_i$) and a penalty factor C are introduced to relax the constraint conditions, and the problem is modified to Equation (3), where $\xi_i > 0$ denotes a sample classification error and C represents the degree of the penalty. In the landslide assessment, $C \in (0, 1]$, and the support vectors represent a percentage of the entire training dataset. Therefore, the smaller the value of $C \sum_{i=1}^{n} \xi_i$, the better for finding the classification hyperplane. Meanwhile, the radial basis kernel function $k(x, x_i)$ is adopted to handle the non-linear decision boundary when the SVM is constructed from the training sample dataset, as shown in Equation (4), where $\sigma^2$ represents the kernel parameter, which implicitly determines the distribution of the data after mapping to the new feature space. The number of support vectors affects the speed of training and prediction.
Substituting the kernel function into Equation (3) gives the final regression function (the optimal hyperplane), Equation (5). The landslide susceptibility evaluation results for Zhaoping are obtained by regression analysis with Equation (5) and parameter optimization. The natural breakpoint method is then adopted to divide the susceptibility into five levels: extremely high, high, medium, low, and extremely low (Figure 4a).
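For reference, the quantities described for Equations (1) to (5) match the textbook soft-margin SVM with a radial basis kernel. The block below states that standard formulation in the notation of this section. It is an assumption about the intended form, not a transcription of the authors' equations; in particular, the factor 2 in the kernel denominator and the exact form of the final function are conventional choices.

```latex
% Maximum-margin problem (cf. Equation (1))
\min_{\omega,\, b} \; \frac{1}{2}\lVert \omega \rVert^{2}
\quad \text{s.t.} \quad y_i\,(\omega \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n

% Lagrangian with multipliers \lambda_i \ge 0 (cf. Equation (2))
L(\omega, b, \lambda) = \frac{1}{2}\lVert \omega \rVert^{2}
  - \sum_{i=1}^{n} \lambda_i \bigl[\, y_i\,(\omega \cdot x_i + b) - 1 \,\bigr]

% Soft-margin problem with slack variables \xi_i and penalty C (cf. Equation (3))
\min_{\omega,\, b,\, \xi} \; \frac{1}{2}\lVert \omega \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i\,(\omega \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0

% Radial basis kernel (cf. Equation (4))
k(x, x_i) = \exp\!\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right)

% Kernelized output function (cf. Equation (5))
f(x) = \sum_{i=1}^{n} \lambda_i\, y_i\, k(x, x_i) + b
```

In this form, a larger C penalizes misclassified training samples more heavily, and a smaller $\sigma$ makes the kernel more local, which is why both parameters are the targets of the optimization described in the next subsection.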

Particle Swarm Optimization Support Vector Machine (PSO-SVM)
From the above analysis, it can be seen that the selection of the SVM parameters (the penalty factor C and the kernel parameter σ of the radial basis function) directly affects the prediction accuracy of the landslide susceptibility evaluation model [15]. Therefore, the PSO algorithm, which has a powerful global parameter search capability, was adopted to select the optimal C and σ, and the PSO-SVM model for landslide prediction and evaluation in Zhaoping was set up. The main steps of the PSO-SVM model are summarized in Table 2.
Table 2. The main steps of the particle swarm optimization support vector machine (PSO-SVM) model.

(1) Initialization: The initial parameters of the PSO-SVM model are set, including the population size, number of iterations, learning factors, inertia weight, initial particle positions, and initial particle velocities. Each particle vector represents an SVM model with a particular C and σ.
(2) Optimization: In the process of particle optimization, each candidate solution of the optimization problem is called a particle in the search space. The fitness value ($f_i$) of each particle is calculated according to the fitness function, which is the basis for evaluating and selecting individuals.
(3) Replacement: Based on the objective function, the fitness value of each particle ($f_i$), the individual best solution $f_i(p_{best})$, and the global best solution of the population $f_i(p_{gbest})$ are calculated and compared. If $f_i < f_i(p_{best})$, the best solution of the previous round is replaced with the new fitness value ($f_i$) and the particle of the previous round is replaced with the new particle; the $f_i(p_{best})$ of each particle is then compared with the $f_i(p_{gbest})$ of all particles. If $f_i(p_{best}) < f_i(p_{gbest})$, the best solution of that particle replaces the previous global best solution, and the current state of the particles is saved.
(4) Determination: If the $f_i$ of an individual in the population meets the requirements, or if the maximum number of generations is reached, the calculation ends and that particle corresponds to the optimal combination of C and σ; otherwise, go to step (2) and continue the iteration.
(5) Set Up the PSO-SVM Model: The globally optimal PSO-SVM model is obtained by training an SVM with the optimal C and σ combination on the training samples. The susceptibility of landslides is quantitatively evaluated and divided into five levels: extremely high, high, medium, low, and extremely low (Figure 4b).
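The search loop in steps (1) to (4) can be sketched as a basic PSO that minimizes a fitness function over (C, σ). This is a minimal illustration, not the authors' implementation: the swarm settings are assumed values, and `toy_error` is a hypothetical stand-in for the validation error of an SVM trained with the given parameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, target=1e-10, seed=0):
    """Minimize objective(position) within box bounds using a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)

    # (1) Initialization: random positions, zero velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            # (2) Optimization: evaluate the fitness of the moved particle.
            f = objective(pos[i])
            # (3) Replacement: update individual and global bests.
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        # (4) Determination: stop early once the fitness is good enough.
        if gbest_fit <= target:
            break
    return gbest, gbest_fit


def toy_error(params):
    """Hypothetical stand-in for the validation error of an SVM with (C, sigma)."""
    c, sigma = params
    return (c - 0.6) ** 2 + (sigma - 2.0) ** 2


best, best_err = pso_minimize(toy_error, bounds=[(0.01, 1.0), (0.1, 10.0)])
```

In practice the fitness function would train an SVM with the candidate C and σ and return its error on held-out data, so each fitness evaluation is the expensive part of the search.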

Random Forest (RF) Model
RF is an ensemble tree classifier proposed by Breiman [46], composed of multiple, largely uncorrelated decision trees. It samples from the original training dataset using the bagging algorithm to obtain multiple bootstrap training datasets. A decision tree is then trained on each of them, with m attributes randomly selected from all M decision attributes. Finally, the classification result for each test sample is determined by voting [22,31,34,35,38,39,40,41,47].
Suppose that, for a landslide sample x of Zhaoping, the output of the g-th decision tree is $f_{tree,g}(x) = i$, where $i = 1, 2, \ldots, n$ is the corresponding category, $g = 1, 2, \ldots, G$, and G is the number of decision trees in the RF. The output of the RF model is then given by Equation (6), where G(•) represents the number of samples that satisfy the expression in parentheses.
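The voting rule described for the RF output can be illustrated with a few lines of code. This sketch assumes the per-tree outputs are already available as class labels; the tree outputs shown are hypothetical.

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the class predicted by the largest number of trees.

    Ties are resolved in favour of the class that appears first.
    """
    counts = Counter(tree_outputs)
    return counts.most_common(1)[0][0]


# Hypothetical outputs of G = 5 trees for one grid cell
# (1 = landslide, 0 = non-landslide).
votes = [1, 0, 1, 1, 0]
label = majority_vote(votes)
share = votes.count(1) / len(votes)  # fraction of trees voting "landslide"
```

The fraction of trees voting for the landslide class is one common way to turn the vote into a continuous index, although the text does not state how the susceptibility index was derived.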
The construction process of the RF model for landslide susceptibility assessment in Zhaoping is summarized in Table 3.
For each training subset $D_i$ ($1 \le i \le K$), an unpruned decision tree is generated by the following procedure. First, let the number of predictive attributes in the training sample be M; F (F < M) attributes are randomly chosen from the M attributes to compose a random feature subspace $X_i$, which serves as the set of candidate split attributes for the current node of the decision tree. The value of F remains unchanged throughout the generation of the RF model. Second, each node is split on the optimal split attribute selected from the random feature subspace $X_i$ by the decision tree generation algorithm. Third, every tree is grown fully, without pruning, so that each training dataset $D_i$ yields a corresponding decision tree $h_i(D_i)$. Fourth, the RF model $\{h_1(D_1), h_2(D_2), \ldots, h_K(D_K)\}$ is generated by combining all the decision trees, and the corresponding classification results $\{C_1(X), C_2(X), \ldots, C_K(X)\}$ are obtained by testing each decision tree $h_i(D_i)$ with the test sample X. Finally, the classification result for the test sample X is the class output by the largest number of the K decision trees, determined by voting.
(4) Dividing Levels: According to the above steps, the landslide susceptibility of Zhaoping is divided into five levels (Figure 4c).

Weighted PSO-RF
To further compare the performance of different models in evaluating landslide susceptibility, the parameters of the weighted RF were optimized by the PSO algorithm; the main steps are shown in Table 4.
The data processing and visualization in this paper were undertaken using ArcGIS software, and the training and testing of the four ML models were completed in the R language.

Table 4.
The main steps of the particle swarm optimization random forest (PSO-RF) model.
(1) Initialization: The initial parameters of the PSO-RF model are set, including the number of decision trees R, pruning threshold ε, number of predicted test samples X, and initial value of random attributes m.
(2) Sampling: Using the Bootstrap algorithm, R training datasets are randomly produced, and X pre-test samples are selected in each training dataset. (

3) Generating Decision Tree:
A total of R decision trees are generated by using the rest of the samples of each training dataset.In the process of generating decision trees, m attributes are selected from all attributes as the decision attributes of the present node before each attribute is selected.
(4) Determination: When the number of samples included in the node is less than the threshold ε, the node is taken as the leaf node, and the mode of the target attributes is returned as the classification result of the decision tree.
(5) Setting Up the PSO-RFModel: When all decision trees are produced, each decision tree is pre-tested and its weights are calculated by using the equation ( 7): where X correct,r is the classified correct number of samples of r decision trees, and X is the number of pre-tested samples.

(6) Calculation of the Classification Results: The classification results of the model are calculated by Equation (8). Taking the classification results as the fitness values, the PSO algorithm is applied to optimize the parameters of Equation (6) iteratively and determine the parameters of the final RF model.

(8) Running: Finally, the optimized parameters are input into the model, and the output results of the model are obtained. According to the results, the susceptibility of landslides is divided into five levels (Figure 4d).
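The weighting idea behind steps (5) and (6) can be sketched as below. The sketch assumes that the weight of the r-th tree is the fraction of pre-test samples it classifies correctly, which is one natural reading of the definitions of X_correct,r and X but is not confirmed by the text available here, and that the model output is the label with the largest summed weight. The function names and the counts are hypothetical.

```python
from collections import defaultdict


def tree_weight(n_correct, n_pretest):
    """Assumed weight of one tree: share of pre-test samples classified correctly."""
    return n_correct / n_pretest


def weighted_vote(predictions, weights):
    """Return the label whose supporting trees carry the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Three hypothetical trees pre-tested on X = 50 samples.
weights = [tree_weight(correct, 50) for correct in (45, 30, 28)]
label = weighted_vote(["high", "low", "low"], weights)
```

With these counts the two weaker trees together outweigh the single stronger one, so a weighted vote does not simply follow the most accurate tree.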

Evaluation Results
The 3,581,859 grid cells of Zhaoping were input into the four trained ML models, and the corresponding landslide susceptibility indexes were obtained. Using the natural breaks classification method, the landslide susceptibility of Zhaoping was divided into five levels from low to high: extremely low, low, medium, high, and extremely high, as shown in Figure 4.
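As an illustration of how natural breaks classification assigns levels, the sketch below chooses class boundaries that minimize the within-class sum of squared deviations (the Fisher-Jenks criterion) with a small dynamic program. It is not the ArcGIS implementation used in the study: the function names and the sample index values are assumptions, and the cubic-time search is only practical for small samples, not for millions of grid cells.

```python
from bisect import bisect_left

LEVELS = ["extremely low", "low", "medium", "high", "extremely high"]


def natural_breaks(values, k):
    """Return the k-1 upper bounds of the lower classes (Fisher-Jenks criterion)."""
    xs = sorted(values)
    n = len(xs)
    # Prefix sums give the squared deviation of any slice in constant time.
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total deviation when xs[:j] is split into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    cut[c][j] = i
    # Walk back through the stored cut points to recover the boundaries.
    bounds, j = [], n
    for c in range(k, 1, -1):
        i = cut[c][j]
        bounds.append(xs[i - 1])
        j = i
    return bounds[::-1]


def classify(value, bounds):
    """Map a susceptibility index to one of the five levels."""
    return LEVELS[bisect_left(bounds, value)]


# Hypothetical susceptibility indexes forming five visible clusters.
sample = [0.05, 0.06, 0.07, 0.30, 0.31, 0.50, 0.52, 0.70, 0.71, 0.95, 0.96]
bounds = natural_breaks(sample, 5)
levels = [classify(v, bounds) for v in sample]
```

A value equal to a boundary is assigned to the lower class, so every boundary is the largest index observed in its class.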
Figure 4 shows that the extremely high susceptibility level for landslides is mainly distributed in the clastic rock areas along the Guijiang River and its tributaries, and the closer an area is to the riverbank, the higher its susceptibility index. Here the geological structure is complex: multi-period tectonic movement has developed joints and fractures in the rock mass, the rock is heavily weathered, and water erosion is strong. Under the action of precipitation, especially heavy precipitation, as well as undercutting and erosion by river water, clastic rocks readily give rise to landslide disasters.
Simultaneously, Figure 4 indicates that the high susceptibility level for landslides is mainly distributed around towns and trunk lines built near the mountains or the Guijiang River. Here the geological structure is relatively complex, the stability of the rock is poor, and weathering is strong, which supplies an adequate material basis for the development of landslide disasters. Meanwhile, the NDVI map of these regions indicates that the vegetation coverage is low, which indirectly reflects frequent human engineering activities and suggests that engineering construction strongly interferes with the geological and ecological environment of the region and contributes to the frequent occurrence of landslides. This also illustrates that the stability and bearing capacity of regional geological environment systems should be fully considered in engineering construction.
Figure 4 also indicates that the medium susceptibility level for landslides is mainly distributed along county roads, rural roads, and residential areas, in belts or patches. The rock mass here is stable, vegetation cover is good, and disturbance by human activities is limited.
The remaining areas have low and extremely low susceptibility levels for landslides; they are far away from the Guijiang River and its tributaries, with high vegetation coverage and few human engineering activities.

Evaluation Accuracy and Validation Analysis
Evaluation accuracy and validation analysis is an essential component of landslide susceptibility prediction and evaluation, as it confirms the validity and scientific significance of the adopted method [48]. Many research papers have confirmed that the AUC value of the ROC curve is an effective measure for checking the precision of a prediction model, and it is widely used across disciplines [8,20,27,36,39,49,50]. Therefore, the AUC values of the ROC curves, calculated from continuous susceptibility values, were used to evaluate the accuracy of landslide susceptibility in Zhaoping for the four ML models (SVM, PSO-SVM, RF, and PSO-RF), as shown in Figure 5.
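For readers unfamiliar with how an AUC value is obtained from continuous susceptibility values, the sketch below uses the rank interpretation of AUC: the fraction of (landslide, non-landslide) pairs in which the landslide location receives the higher score, with ties counted as one half. The function name and the six labelled scores are hypothetical and are not taken from the Zhaoping dataset.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# 1 = landslide location, 0 = non-landslide location (hypothetical test points).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = auc_from_scores(labels, scores)   # 8 of the 9 pairs are ranked correctly
```

Under this reading, an AUC of 0.934 means that a randomly chosen landslide location scores higher than a randomly chosen non-landslide location in about 93% of pairs.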
Figure 5 shows the ROC curves and the AUC values of the testing dataset for the PSO-RF, RF, PSO-SVM, and SVM models. The AUC values are 0.934, 0.886, 0.918, and 0.863, respectively, which indicates that all four ML methods achieve an AUC above 0.86 in the evaluation and prediction of landslide susceptibility in Zhaoping. At the same time, the AUC values of the PSO-SVM and PSO-RF models (0.918 and 0.934) are higher than those of the traditional SVM and RF models (0.863 and 0.886), which indicates that the PSO algorithm can effectively optimize the SVM and RF models, with the optimized models reaching AUC values above 0.915. This result further suggests that the PSO-RF and PSO-SVM models have stronger robustness and more stable performance [40]. Furthermore, the present study supports the view that PSO has a strong global parameter search ability and that its parameter adjustment is simple and easy to implement, which confirms that the PSO algorithm can be successfully applied in landslide evaluation and prediction [51]. Meanwhile, the results also demonstrate that the PSO-RF model has better prediction performance than the PSO-SVM model. This is mainly attributed to the large number of factors selected in this study: the PSO-RF model, a type of ensemble learning, has advantages over a traditional ML method because it not only accounts for different types of factors but also evaluates the relative importance of the factors in terms of landslide stability [47].
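The global parameter search performed by PSO can be illustrated with a minimal, generic implementation. The sketch below is not the authors' R code: the inertia and acceleration coefficients, the swarm size, the search bounds, and the toy objective (which stands in for a model's validation error over two hypothetical hyperparameters) are all assumptions chosen for the example.

```python
import random


def pso_minimize(objective, bounds, n_particles=30, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(params) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                  # personal best positions
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]      # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, personal best, and global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def toy_error(params):
    """Hypothetical stand-in for a validation error with its minimum at (10, 0.1)."""
    cost, gamma = params
    return (cost - 10.0) ** 2 + (gamma - 0.1) ** 2


best_params, best_error = pso_minimize(toy_error, [(0.1, 100.0), (0.001, 1.0)])
```

In an actual tuning run, `toy_error` would be replaced by a function that trains the SVM or RF model with the candidate parameters and returns a validation error, which makes each evaluation far more expensive than in this toy case.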
Remote Sens. 2021, 13, x FOR PEER REVIEW
Figure 5 indicates that the RF and PSO-RF models perform better than the SVM and PSO-SVM models in evaluating the susceptibility of landslides, because the AUC values for RF (0.886) and PSO-RF (0.934) are higher than those for SVM (0.863) and PSO-SVM (0.918), respectively, which confirms that the generalization performance of an ensemble learner is superior to that of a single learner [47]. At the same time, the research further confirmed that the RF and PSO-RF models have advantages in dealing with high-dimensional features and geological big data, such as fast classification speed, strong anti-noise ability, and resistance to over-fitting [20]. However, because the RF and PSO-RF models are sensitive to the landslide samples, sample screening is necessary before using them to evaluate the susceptibility of landslides.
One interesting observation from Figure 5 is that at (1 − specificity) = 0.1, RF has a higher sensitivity than PSO-SVM, indicating better performance at that operating point. This agrees with Table 5, where RF also performs better than PSO-SVM in the lower susceptibility regions (the low and extremely low levels combined). This is worth investigating, since PSO-SVM tends to have a better overall performance than RF.
To further verify the performance of the four ML models, all landslide points (including the training and test sample datasets) were overlaid on the evaluation results of the four ML models to calculate the percentages of landslide points falling into different susceptibility regions, as shown in Figure 6. Figure 6 indicates that the landslide susceptibility evaluation results of the four ML models in Zhaoping are in accordance with the distribution of landslide points.
In addition, the performance of the four ML models is demonstrated by quantitatively analyzing the percentage of all landslide disaster points falling into the different susceptibility regions, as shown in Table 5. Larger percentages in regions with extremely high and high susceptibility levels, together with lower percentages in regions with extremely low and low susceptibility levels, indicate higher accuracy.
Table 5 indicates that the percentages of landslide points falling into the extremely high and high susceptibility regions are 44.64% and 20.87%, 50.43% and 19.13%, 53.33% and 21.16%, and 54.78% and 21.74% for the SVM, RF, PSO-SVM, and PSO-RF models, respectively. The combined percentages are all higher than 65%, indicating high accuracy for the four ML models; ranked from high to low by accuracy in the extremely high and high susceptibility regions, the models are PSO-RF, PSO-SVM, RF, and SVM. Simultaneously, Table 5 also indicates that the proportions of landslide points falling into the low and extremely low susceptibility regions are 10.43% and 7.54%, 9.57% and 2.61%, 6.38% and 11.30%, and 4.35% and 4.06% for the SVM, RF, PSO-SVM, and PSO-RF models, respectively. Ranked from low to high by this misclassification rate, the models are PSO-RF, RF, PSO-SVM, and SVM.
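The two rankings can be reproduced from the percentages quoted above with a few lines of arithmetic. The snippet is only a check on the reported figures; the variable names are illustrative.

```python
# Percentages of all landslide points per susceptibility level, as quoted from Table 5.
table5 = {
    "SVM":     {"extremely high": 44.64, "high": 20.87, "low": 10.43, "extremely low": 7.54},
    "RF":      {"extremely high": 50.43, "high": 19.13, "low": 9.57,  "extremely low": 2.61},
    "PSO-SVM": {"extremely high": 53.33, "high": 21.16, "low": 6.38,  "extremely low": 11.30},
    "PSO-RF":  {"extremely high": 54.78, "high": 21.74, "low": 4.35,  "extremely low": 4.06},
}

# Share of landslides captured by the two highest levels (larger is better).
hit = {m: round(p["extremely high"] + p["high"], 2) for m, p in table5.items()}
# Share of landslides placed in the two lowest levels (smaller is better).
miss = {m: round(p["low"] + p["extremely low"], 2) for m, p in table5.items()}

rank_by_hit = sorted(hit, key=hit.get, reverse=True)
rank_by_miss = sorted(miss, key=miss.get)
```

The sums are 65.51%, 69.56%, 74.49%, and 76.52% in the two highest levels and 17.97%, 12.18%, 17.68%, and 8.41% in the two lowest levels for SVM, RF, PSO-SVM, and PSO-RF, respectively, which shows that RF and PSO-SVM swap places between the two rankings.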
Furthermore, the percentages of landslide points in the test sample dataset falling into different susceptibility regions were also counted to test the performance of the four ML models, as shown in Figure 7.

Figure 7 illustrates that the percentage of landslide disaster points falling into the extremely high susceptibility regions increases from left to right in the figure. This shows that the accuracy of the four ML models ranks, from high to low, as PSO-RF, PSO-SVM, RF, and SVM. In the PSO-RF model, 58.25% of the landslide points in the testing dataset fall into the extremely high region and 20.39% into the high region, for a total of 78.64%. By the same analysis, the totals for the PSO-SVM, RF, and SVM models are 75.73%, 74.81%, and 66.99%, respectively. All four models therefore place more than 66% of the test landslides in the high and extremely high regions, which agrees with Figure 5, where the AUC values of the ROC curves are all higher than 0.85. Thus, we can conclude that the four models have relatively high performance in terms of accuracy, with PSO-RF being the highest.
Overall, the ML models of the SVM, PSO-SVM, RF, and PSO-RF achieved excellent performance in predicting and evaluating the susceptibility levels of landslides in this study.

Conclusions
Improving the performance of landslide susceptibility models remains a focus of widespread concern in the disaster research community, because the capability of a model is dominated by the method adopted [20], although ML methods have been validated as efficient in terms of prediction and assessment performance [27]. Therefore, four widely used ML models, namely SVM, PSO-SVM, RF, and PSO-RF, were investigated to predict and evaluate the susceptibility levels of landslides for Zhaoping in Guangxi, southern China.
Analysis and comparison of the results showed that all four ML models performed well for landslide susceptibility evaluation and prediction, as the AUC values of the ROC curves are all greater than 0.86. Among them, the PSO-RF model (0.934) has the highest performance, followed by the PSO-SVM model (0.918), the RF model (0.886), and the SVM model (0.863). This is broadly consistent with the result of Ada and San's research: without optimization, the AUC values of the ROC curves of RF and SVM fall between 0.82 and 0.87 [31], and our unoptimized results range from 0.863 to 0.886. Moreover, the results also showed that the PSO algorithm has a good effect on the SVM and RF models [40]. In addition, our results revealed that the PSO-RF and PSO-SVM landslide models have strong robustness and stable performance, and those two models are prospective methods that could be applied to landslide susceptibility evaluation in regions with similar natural geological and ecological environmental backgrounds.
At the same time, the results described in the present study showed that the prediction results of the four ML models are consistent with the field survey results, by comparing Figures 4 and 6, which verified the validity of the four ML models again. This also indicated that the four ML models have excellent performance in evaluating and predicting the occurrence of landslides. Furthermore, the results can provide informational service and decision support for landslide early warning, land-use planning, and environmental management for local government departments.
In addition, our study found that the 10 disaster-related factors selected in this paper can fully reflect the natural geological and ecological environment characteristics of the study area. Simultaneously, our study also found that the selection of training samples affects the susceptibility evaluation results when the four ML methods are used for landslide susceptibility evaluation. It is worth mentioning that there is a great difference between the extremely low and extremely high susceptibility regions in the evaluation results of the RF and PSO-RF models, and the number of landslide occurrences in the extremely low susceptibility regions is almost 0. However, the fact that landslide disasters have not occurred in a region does not mean that landslides will not occur there, so future investigations should pay more attention to over-fitting when evaluating and predicting the susceptibility of landslides with the RF and PSO-RF models.
Author Contributions: All authors contributed to the study conception and design. Conceptualization and methodology were performed by C.K., K.X. and X.M.; material preparation, data collection and analysis were performed by C.K. and K.X.; formal analysis and investigation were performed by Y.T., Z.W. and Z.Z.; the first draft of the manuscript was written by C.K., and all authors commented on previous versions of the manuscript. Writing, review and editing were performed by X.M.; funding acquisition was performed by C.K. and Y.T. All authors have read and agreed to the published version of the manuscript.

Figure 1. Location of Zhaoping County in Guangxi Province (a) and China (b).

Figure 3: (1) according to the environmental characteristics of Zhaoping, all the evaluation factors related to landslides were collected; (2) evaluation units were divided into 30 m × 30 m grid cells by using ArcGIS; (3) the landslide susceptibility assessment factor system was determined; (4) the classification criterion for each evaluation factor was set according to the classification standard of the Guangxi Geological Survey Bureau; (5) spatial and attribute databases for each evaluation factor were set up based on 30 m × 30 m grid cells by nearest neighbor resampling; (6) training and testing datasets were selected; (7) landslide susceptibility evaluation models were established based on different ML methods, namely SVM, PSO-SVM, RF, and PSO-RF; (8) the evaluation accuracy of the four ML models was validated and compared with ROC curves, statistical analysis, and field survey; (9) the landslide susceptibility levels in Zhaoping were divided.

Figure 3. Flowchart of landslide susceptibility evaluation based on machine learning (ML).
the landslide assessment of the current study, the training sample dataset is given as {x_i, y_i}, i = 1, 2, …, n; x_i ∈ R^d, y_i ∈ {−1, +1}. SVM seeks the optimal classification hyperplane in the feature space of the landslide, which can separate the two types of training


(1) Initialization: Suppose D is an original training dataset of landslide susceptibility assessment factors, which is composed of M prediction attributes (M = 10) and a classification attribute Y (Y = 5). There are n (n = 3,581,859) different examples in D.
(2) Get Multiple Training Datasets: The K new training subsets {D_1, D_2, ..., D_K} are obtained by sampling K times at random with replacement from the original training dataset D using the Bagging algorithm. Each of the K training subsets contains n instances, some of which are repeated.
(3) Training to Generate Decision Tree:

Figure 5. Receiver operating characteristics (ROC) curves and area under the curve (AUC) values of testing dataset for the PSO-RF, RF, PSO-SVM, and SVM models.

Figure 7. Percentages of landslides in testing dataset falling into different susceptibility levels.

Funding: This research was funded by the National Natural Science Foundation of China, grant number U1711267; Science and Technology Plan Project of Guizhou Province, grant number [2020]4Y039; Project Funding of Investigation and Evaluation of Guizhou Provincial Geological 3D Spatial Strategy, grant number 2019-02; Geological Scientific Research Project of Geology and Mineral Exploration and Development Bureau Guizhou Province, grant number [2021]03 and [2018]07; the Open research project of key laboratory of Tectonics and Petroleum Resources, Ministry of Education, grant number TPR-2019-11; and the Open fund project of National-Local Joint Engineering Laboratory on Digital Preservation and Innovative Technologies for the Culture of Traditional Villages and Towns, grant number CTCZ19K01. The authors would like to thank the anonymous reviewers for providing valuable comments on the manuscript.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Table 1. Landslide affecting factors and their classes.

Table 3. The main steps of the random forest (RF) model.

Table 5. Percentages of landslide points falling into different susceptibility levels.