Landslide Susceptibility Mapping Based on Particle Swarm Optimization of Multiple Kernel Relevance Vector Machines : Case of a Low Hill Area in Sichuan Province , China

In this paper, we propose a multiple kernel relevance vector machine (RVM) method based on the adaptive cloud particle swarm optimization (PSO) algorithm to map landslide susceptibility in the low hill area of Sichuan Province, China. In the multi-kernel structure, the kernel selection problem can be solved by adjusting the kernel weight, which determines the single kernel contribution of the final kernel mapping. The weights and parameters of the multi-kernel function were optimized using the PSO algorithm. In addition, the convergence speed of the PSO algorithm was increased using cloud theory. To ensure the stability of the prediction model, the result of a five-fold cross-validation method was used as the fitness of the PSO algorithm. To verify the results, receiver operating characteristic curves (ROC) and landslide dot density (LDD) were used. The results show that the model that used a heterogeneous kernel (a combination of two different kernel functions) had a larger area under the ROC curve (0.7616) and a lower prediction error ratio (0.28%) than did the other types of kernel models employed in this study. In addition, both the sum of two high susceptibility zone LDDs (6.71/100 km2) and the sum of two low susceptibility zone LDDs (0.82/100 km2) demonstrated that the landslide susceptibility map based on the heterogeneous kernel model was closest to the historical landslide distribution. In conclusion, the results obtained in this study can provide very useful information for disaster prevention and land-use planning in the study area.


Introduction
Landslide susceptibility assessment always requires the consideration of many non-linear relation environmental factors, such as geomorphological, geological, hydrological and land cover data [1,2].Determining how to map regional landslide susceptibility from these factors has become a focus of geological research.In recent years, several approaches to mapping landslide susceptibility have been proposed, which can be grouped into two broad groups: qualitative and quantitative [3].Qualitative methods are somewhat subjective and depend on the judgment of experts, whereas quantitative methods are relatively objective and based on numerical expressions of the relationships between environmental factors and landslides [4].The quantitative methods are generally considered to be more scientific and accurate than qualitative methods.With the development of computing and geographic information system (GIS) technologies, quantitative methods have developed rapidly in recent decades; these methods include fuzzy logic, decision trees, logistic regression, artificial neural networks (ANNs) and support vector machines (SVMs) [5][6][7][8][9].Studies have shown that fuzzy logic can address missing values, but it depends on greater experience.Moreover, the fuzzy logic single factor evaluation matrix is difficult to establish, thus making the model prediction unstable.Decision trees can also handle missing values and have better local analysis capabilities than logistic regression does, which inevitably leads to over-fitting problems.Logistic regression models are capable of good global analyses, but are sensitive to extremes.ANN and SVM have been proven to be more effective than the above methods for landslide susceptibility assessments.However, ANN can easily fall into local minima and has slow convergence speeds.SVM performs better than ANN at generalization, but its kernel needs to satisfy the Mercer conditions.Additionally, it cannot directly estimate the prediction uncertainty.Therefore, better techniques are urgently needed.
The relevance vector machine (RVM) proposed by Tipping [10] is a new Bayes probability model based on SVM, and its kernel does not need to satisfy the Mercer conditions [11].Moreover, the RVM results include classification results and the probability distribution, which are very suitable for landslide susceptibility assessment [12].However, RVM has been seldom applied to the study of landslide fields, especially in regional landslide susceptibility analyses [13,14].Similar to the SVM model, the effect of the RVM model depends on the kernel function and kernel parameters.At present, the methods for choosing an effective kernel function and reasonable kernel parameters are still imperfect [15,16].
The multiple kernel learning method provides a proper solution for the kernel selection problem.It always combines some single kernels under certain rules to offer a suitable kernel mapping according to specific samples [16][17][18][19].In that way, the kernel selection problem is transformed into an optimization problem of kernel parameters and kernel weights in a multiple kernel structure.In recent years, the particle swarm optimization (PSO) algorithm [20][21][22][23] has been widely used to optimize the parameters of intelligent algorithms in many fields [24][25][26][27][28][29].It is a classic group optimization technique that Kennedy and Eberhart [20,21] designed on the basis of the actions of birds feeding.The optimization target can be searched by tracking individual and group particles of the population.At present, studies of the PSO algorithm can be divided into two groups: improvement research and applied research.Most of the improvement research has concentrated on setting the inertia weight and learning factors of the PSO algorithm [22][23][24][25].The adaptive cloud PSO (CPSO) algorithm is one of the most efficient methods in improvement research.It can both increase the convergence speed and ensure the diversity of a population [24,25,28].
In the present study, RVM models using five kernel types based on the CPSO algorithm were applied to landslide susceptibility mapping of the low hill area of China's Sichuan Province.The five-fold cross-validation method was used in the training processes of the five models to avoid the over-fitting problem [30,31], and cloud theory was used to improve the PSO convergence speed [24,25].In addition, the prediction performances of the five models were verified using a receiver operating characteristic (ROC) curve [32][33][34] and landslide dot density (LDD) [35,36].

Study Area
The study area (27 • 9 N-32 • 4 N and 102 • 9 E-108 • 1 E) is located in the eastern part of Sichuan Province, China (Figure 1), and covers approximately 112,000 km 2 (23% of Sichuan Province).It includes Chengdu, Zigong, Luzhou, Deyang, Mianyang, Guangyuan, Suining, Neijiang, Leshan, Nanchong, Meishan, Yibin, Guang'an, Dachuan, Ya'an, Bazhong, Ziyang and Liangshan, for a total of 18 cities or districts (Figure 1).It has a high population density and rapid economic development.According to statistics from 2009 [37][38][39][40], the population density was 437 persons per square kilometre, which was much higher than the provincial average (168 persons per square kilometre).The gross domestic product (GDP) was approximately 13,738.54 billion Yuan (96% of the province).However, its landslide disaster density, which is as large as one per 100 km 2 , is the highest in China [41].Therefore, research on the formation mechanisms of landslides and mapping the regional landslide susceptibility in this area are very useful for disaster prevention and land planning [38][39][40].
Landslides are induced by many factors that can be divided into two broad groups: geographical environment and human activities.First, the specific geographical environment of the study area provides favourable conditions for landslides: (1) low hills mainly cover the study area (98%), the altitudes of which primarily range from 0 m to 800 m (94%); (2) the topographic relief is large, and the altitude differs over 2000 meters from west to east; (3) the study area is located in the subtropical moist monsoon climate zone, and the annual rainfall is usually over 1000 mm from June to September (from records, 2664 landslides induced by rainfall had occurred in the low hilly area of Sichuan Province [42]); (4) many fractures, rivers (1.6 km/km 2 ) and sedimentary rocks (89%) are concentrated in this area.Second, human activities are usually the major factor for inducing landslides.There is a well-developed transportation system in the study area, and the total length of the road network (including national and provincial roads) exceeded 211,888 km by 2009 [42].
ISPRS Int.J. Geo-Inf.2016, 5,191 3 of 15 The gross domestic product (GDP) was approximately 13,738.54 billion Yuan (96% of the province).However, its landslide disaster density, which is as large as one per 100 km 2 , is the highest in China [41].Therefore, research on the formation mechanisms of landslides and mapping the regional landslide susceptibility in this area are very useful for disaster prevention and land planning [38][39][40].
Landslides are induced by many factors that can be divided into two broad groups: geographical environment and human activities.First, the specific geographical environment of the study area provides favourable conditions for landslides: (1) low hills mainly cover the study area (98%), the altitudes of which primarily range from 0 m to 800 m (94%); (2) the topographic relief is large, and the altitude differs over 2000 meters from west to east; (3) the study area is located in the subtropical moist monsoon climate zone, and the annual rainfall is usually over 1000 mm from June to September (from records, 2664 landslides induced by rainfall had occurred in the low hilly area of Sichuan Province [42]); (4) many fractures, rivers (1.6 km/km 2 ) and sedimentary rocks (89%) are concentrated in this area.Second, human activities are usually the major factor for inducing landslides.There is a well-developed transportation system in the study area, and the total length of the road network (including national and provincial roads) exceeded 211,888 km by 2009 [42].

Relevance Vector Machine
Given a set of nonlinear training samples , ( = 1,2, ⋯ , ), where is the number of samples, ∈ are the training influencing factors of landslides and ∈ 0,1 represents the state of the landslides (the landslide state is 1, and the non-landslide state is 0), assume that the objective function is independent and has additional noise; the output of RVM model can be expressed as Equations ( 1) and (2): where = (0, ) is the additive Gaussian noise, is the input vector that needs to be predicted, = [ , , , … , ] , is the model "weight", is the deviation and ( , ) is the selected kernel function.

Relevance Vector Machine
Given a set of nonlinear training samples {x i , t i } (i = 1, 2, • • • , M), where M is the number of samples, x i ∈ R n are the training influencing factors of landslides and t i ∈ {0, 1} represents the state of the landslides (the landslide state is 1, and the non-landslide state is 0), assume that the objective function is independent and has additional noise; the output of RVM model can be expressed as Equations ( 1) and (2): where ε i = N 0, σ 2 is the additive Gaussian noise, x is the input vector that needs to be predicted, ω = [ω 0 , ω 1 , ω 2 , . . . ,ω M ] T , ω i is the model "weight", ω 0 is the deviation and K (x, x i ) is the selected kernel function.
A Bayesian probability model is used to explain the effects of the noise (ε i ) on the predicted results.The model not only can improve the problem of setting the error parameter in SVM, but can also output the probability of the results, which is used to describe the landslide susceptibility.The Bayesian probability is: where t * is the prediction target of the new input vector x.
To prevent the over-fitting problem when evaluating the maximum likelihood estimation of ω and σ 2 , the automatic relevance determination (ARD) prior probability distribution can be defined as The prior distribution of the ω i is a Gaussian distribution with a mean of 0 and variance α −1 i .Following the Bayesian rule, the conditional probability can be written as: Using the Laplace method to make p ω, α, σ 2 t i approximate a Gaussian distribution: where µ i is the i-th element of the a posteriori mean vector (µ) of weights, ∑ i,i is the i-th diagonal element of the covariance matrix (∑) and γ i = 1 − ∂ i ∑ i,i .In the training process, Equation ( 6) continues to iterate until ∂ new i and (σ 2 ) new are approximately ∂ MP and σ 2 MP .In that case, most ∂ i tend to infinity, and the corresponding ω i tend to 0. Finally, a few ω tend to finite values, and their corresponding x i are the relevant vectors of the RVM model.
The logistic model of the regression method was applied to solve the problem of classification, which is written as: The probability of the prediction results can then be described as: where t = {t n } M n=1 , and the landslide discrimination criterion is:

Multiple Kernel RVM
The adaptive multi-kernel function of the linear combination is the most classic method [23] and is formed of a combination of an overall kernel function (polynomial) and a local kernel function (Gaussian).It can be written as: , σ > 0 and (11) where K gauss (x, x i ) is the Gaussian kernel function and K poly (x, x i ) is the polynomial kernel function.
The kernel weight β (β ∈ (0, 1)) is a regulatory factor that is used to adjust the contribution of each kernel.When β is 1, the multi-kernel function is the same as a Gaussian kernel function, and when β is 0, the multi-kernel function equates to a polynomial kernel function.The greater the kernel weight, the greater the contribution of the corresponding kernel to K mix (x, x i ).
After substituting Equation (10) into the original RVM model (Equation ( 2)), the output of the MKRVM model can be written as In this way, the multiple kernel RVM (MKRVM) model transforms the expression problem of the samples in the feature space into the setting problem of the kernel parameters and kernel weights [21].Three types of multi-kernel combinations are encountered in this paper: the combination of a Gaussian kernel and a polynomial kernel (Gauss and Poly), the combination of two polynomial kernels (Poly and Poly), and the combination of two Gaussian kernels (Gauss and Gauss).A combination of two same kernels is called a homogeneous kernel, and a combination of different kernels is called a heterogeneous kernel.

Particle Swarm Optimization
The iterative update strategy of original PSO algorithm is: where v i is the particle velocity, x i is an individual particle, ω is the inertia weight, P best (i) and G best (i) are the individual and global optimizations in the i-th iteration, respectively, r 1 and r 2 are random numbers (0-1) and c 1 and c 2 are the learning factors.In this paper, the original PSO was improved using cloud theory (CPSO).In the CPSO algorithm, the particle swarm is divided into three groups to calculate the inertia weight (ω) using different generation strategies.Suppose that the fitness (the prediction error of the RVM models) of the i-th particle in the k-th iteration is f k i ; the average of all of the fitnesses is is the population size); the average of the fitness that is lower than f k avg is set to f avg ; the average of the fitness that is higher than f k avg is set to f avg ; and the best fitness is set to f k best .The specific generation strategy is described as follows.
(1) When f k i ≤ f avg , the fitness of these particles is closer to the optimal solution (the lowest error rate).Therefore, set a low inertia weight value (ω = 0.2) to speed up local convergence.
(2) When f k i < f avg and f k i > f avg , these particles are relatively far from the best fitness, which can be improved by the cloud model.The expectation of the cloud model is Ex = f k best .The entropy can be calculated using the distance of the expectation and f avg : In addition, the hyper entropy was set using He = En/c 2 .
The value of the inertia weight can be described as: According to "3En" rules, the control parameters c 1 and c 2 were set to 3 and 10 [24]."normrnd" generates normally distributed data.(3) When f k i ≥ f avg , these particles need a higher inertia weight (ω = 0.9) to improve the global search capability.

PSO-MKRVM
Many of the PSO-MKRVM model parameters require initialization.For the study area, the RVM training samples were comprised of eight condition attributes (the landslides' predisposing factors) and one decision attribute (the states of the landslides).In the multi-kernel structure, one kernel weight β and two kernel parameters needed to be calculated.The optimization target of the PSO algorithm is to minimize the result of the five-fold cross-validation [33].The PSO population size was set to 20, and the number of iterations was set to 50.The iterative curves of the classic linear kernel function (Figure 2) show that the optimal solution could be found within 20 iterations.In addition, c 1 = c 2 = 2 was chosen to make the search region centre on P best (i) and G best (i).The particle of the PSO algorithm was set as x = (β, width1, width2).
The training process for the PSO-MKRVM models included the following steps.
Step 1: Initialization.Set some of the parameters of the PSO-MKRVM models, including the population size, iterations and learning factors.
Step 2: Optimization.Based on the training samples, the 5-fold cross-validation method and PSO algorithm are used to optimize the parameters and weights of the multi-kernel function.
Step 3: Building.Depending on the results of step 2, the MKRVM prediction models can be built.
Step 4: Prediction.The models built in Step 3 can be used to make distribution maps of the landslide susceptibility index (DMLSI).

PSO-MKRVM
Many of the PSO-MKRVM model parameters require initialization.For the study area, the RVM training samples were comprised of eight condition attributes (the landslides' predisposing factors) and one decision attribute (the states of the landslides).In the multi-kernel structure, one kernel weight and two kernel parameters needed to be calculated.The optimization target of the PSO algorithm is to minimize the result of the five-fold cross-validation [33].The PSO population size was set to 20, and the number of iterations was set to 50.The iterative curves of the classic linear kernel function (Figure 2) show that the optimal solution could be found within 20 iterations.In addition, = = 2 was chosen to make the search region centre on ( ) and ( ).The particle of the PSO algorithm was set as = ( , ℎ1, ℎ2).
The training process for the PSO-MKRVM models included the following steps.
Step 1: Initialization.Set some of the parameters of the PSO-MKRVM models, including the population size, iterations and learning factors.
Step 2: Optimization.Based on the training samples, the 5-fold cross-validation method and PSO algorithm are used to optimize the parameters and weights of the multi-kernel function.
Step 3: Building.Depending on the results of step 2, the MKRVM prediction models can be built.
Step 4: Prediction.The models built in Step 3 can be used to make distribution maps of the landslide susceptibility index (DMLSI).

Influencing Factors of Landslides
Overall, 382 landslide samples from the China Institute of Geo-Environmental Monitoring were used for this study; the landslides had occurred in the low hilly area of Sichuan in 2007.
Researchers have performed some studies on the susceptibility of landslide predisposing factors based on variable dimension fractal theory and a certainty factor probability model in this area [39,40].Eight landslide-predisposing factors have been demonstrated to be the major factors responsible for inducing landslides in this area.Based on the landslide inventories and thematic data available in

Influencing Factors of Landslides
Overall, 382 landslide samples from the China Institute of Geo-Environmental Monitoring were used for this study; the landslides had occurred in the low hilly area of Sichuan in 2007.
Researchers have performed some studies on the susceptibility of landslide predisposing factors based on variable dimension fractal theory and a certainty factor probability model in this area [39,40].Eight landslide-predisposing factors have been demonstrated to be the major factors responsible for inducing landslides in this area.Based on the landslide inventories and thematic data available in the study area, we used those factors to build landslide prediction models.The factors include landforms (altitude, slope and relief amplitude), geological lithology (lithology), geological structure (faults), cutting slope (river and road) and vegetation (normalized difference vegetation index (NDVI)).The values of the factors are listed in Table 1.Based on the above evaluation factor system (Table 1), eight maps of landslide predisposing factors (Figure 3) can be displayed using GIS technologies.Slope, altitude and relief amplitude were derived from the DEM.The slope (Figure 3A) cannot accurately reflect the landforms over equal intervals in the study area.Based on surface features, the slope was divided into 7 intervals.Most areas are in the range of the top four levels (0 • -25 • , 95.09%).However, many landslides are concentrated in a few areas (25 • -45 • , 4.68%).In addition, the landslides that occurred above 45 • are typically collapsed [5].The altitude (Figure 3B) was divided into six parts.Area (>800 m) accounts for only 5.65% of the study area, but it gathers a large number of landslides.The relief amplitude (Figure 3C) was derived from the GIS neighbourhood statistics module when the best statistics window radius was 1.1 km.The study area is mainly composed of hills (0-200 m, 71.81%) and low-mountain areas (200-600 m, 25.96%), where landslides have often occurred.Vegetation coverage was extracted from MODIS products and described by NDVI (Figure 3D).The NDVI values ranged from −1-1; the value increased with an increasing vegetation coverage rate and vice versa.Rivers and roads directly influence the regional geological environments of landslides [2].The buffer distance maps of rivers (Figure 3E) and roads (Figure 3F) show that landslides primarily occur in the lower buffer distance area.In addition, geological lithology and geological structure are important landslide factors and were extracted from geological maps.The lithology map (Figure 3G) was produced using the ArcGIS vector fusion (Dissolve) method.Limestone, basalt and carbonate rocks represent a small part of the lithology (4.45%), but a number of landslides (2.81, 3.2 and 3.9) occurred in them.Mudstone, sandstone and conglomerates make up 89% of the lithology, but only 1.7, 1.2 and 0.8 landslides per 100 square kilometres occurred in them.In this paper, the impact of faults on landslides was quantified by the buffer distance from the faults (Figure 3H).The fractured structure of the study area presents a spatial "chessboard form", where the lengths of the fracture zones vary from one kilometre to tens of kilometres.structure of the study area presents a spatial "chessboard form", where the lengths of the fracture zones vary from one kilometre to tens of kilometres.

Normalization Processing
Based on historical landslides and the distribution maps of the eight landslide predisposing factors, 750 training samples (382 landslide samples and 368 non-landslide samples) from the prediction model were extracted using the Arc Toolbox Extraction tool (Figure 4).Similarly, the prediction model inputs of the study area were extracted based on the 27,998 raster units in the study area (the widths and heights of the raster units were 2000 m).

Normalization Processing
Based on historical landslides and the distribution maps of the eight landslide predisposing factors, 750 training samples (382 landslide samples and 368 non-landslide samples) from the prediction model were extracted using the Arc Toolbox Extraction tool (Figure 4).Similarly, the prediction model inputs of the study area were extracted based on the 27,998 raster units in the study area (the widths and heights of the raster units were 2000 m).
These training samples and regional inputs had to be normalized into unified non-dimensional data using Equation ( 16): x = (x − x min ) / (x max − x min ) where x (x ∈ [x min , x max ]) represents the actual value of one factor, x min and x max are, respectively, the minimum and maximum of that factor and x ∈ (0 − 1) is the normalization result.The landslides were divided into two states: 1 (landslide) and 0 (non-landslide).To ensure the reliability and stability of the prediction models, a 5-fold cross-validation method was used to divide the 750 training samples into five equal parts.In the training process, four parts were used as training samples, and the other part was used as testing samples.Each part of the samples was used in the training role and testing role five times.The result of the 5-fold cross-validation method was used as the fitness of the PSO algorithm.

Normalization Processing
Based on historical landslides and the distribution maps of the eight landslide predisposing factors, 750 training samples (382 landslide samples and 368 non-landslide samples) from the prediction model were extracted using the Arc Toolbox Extraction tool (Figure 4).Similarly, the prediction model inputs of the study area were extracted based on the 27,998 raster units in the study area (the widths and heights of the raster units were 2000 m).These training samples and regional inputs had to be normalized into unified non-dimensional data using Equation ( 16): where ( ∈ [ , ]) represents the actual value of one factor, and are, respectively, the minimum and maximum of that factor and ∈ (0 − 1) is the normalization result.The landslides were divided into two states: 1 (landslide) and 0 (non-landslide).To ensure the reliability and stability of the prediction models, a 5-fold cross-validation method was used to divide the 750 training samples into five equal parts.In the training process, four parts were used as training samples, and the other part was used as testing samples.Each part of the samples was used

Model Training
Based on the 750 training samples and the 5-fold cross-validation method, the RVM model parameters were optimized using the PSO algorithm and are shown in Table 2.The "weight" in the multi-kernel structure represents the weight of the first kernel function."Width 1" and "Width 2" are the values of the first kernel and second kernel parameters, respectively.The search ranges of the two kernel parameters were [0.3, 1.5] and [0.6, 2.5].The "error rate" (ER) is the result of the 5-fold cross-validation.Table 2 shows that the "Gauss and Poly" kernel demonstrated the best prediction performance (ER = 0.28) when the three parameters were 0.6821, 0.6138 and 2.
These models were used to make maps of the distribution of the landslide susceptibility index (DMLSI) in the study area.Based on the DMLSI, a cluster analysis of the natural discontinuous points method was used to divide the study area into five landslide susceptibility zones: a very low susceptibility zone (VLS-zone), low susceptibility zone (LS-zone), moderate susceptibility zone (MS-zone), high susceptibility zone (HS-zone) and very high susceptibility zone (VHS-zone).The landslide susceptibility maps (LSM) of the study area developed using this method are presented in Figure 5.
(DMLSI) in the study area.Based on the DMLSI, a cluster analysis of the natural discontinuous points method was used to divide the study area into five landslide susceptibility zones: a very low susceptibility zone (VLS-zone), low susceptibility zone (LS-zone), moderate susceptibility zone (MS-zone), high susceptibility zone (HS-zone) and very high susceptibility zone (VHS-zone).The landslide susceptibility maps (LSM) of the study area developed using this method are presented in Figure 5.It is obvious that the LSM of the five models conforms well to the distribution of historical landslides.In addition, the models whose kernel function contained a Gaussian function performed better than those employing a polynomial function.This result means that the Gaussian function is more suitable for landslide susceptibility assessment in the study area.However, it is difficult to accurately evaluate the model prediction performances by using LSM.It is obvious that the LSM of the five models conforms well to the distribution of historical landslides.In addition, the models whose kernel function contained a Gaussian function performed better than those employing a polynomial function.This result means that the Gaussian function is more suitable for landslide susceptibility assessment in the study area.However, it is difficult to accurately evaluate the model prediction performances by using LSM.

Receiver Operating Characteristic Curve
ROC curves are a representative method for testing the accuracies of analysis methods and can reflect the relationship between specificity and sensitivity [32][33][34].They were initially applied in the assessment of the receptivity of radar signals and have recently been used to evaluate the prediction performance of medical diagnostic tests [32].The RIC index that is used to discriminate the accuracy of a "diagnosis" is the area under the ROC curve (AUC) [33].When the AUC value ranges from 0.5-1, the current method has a good "diagnosis" ability [9].The diagnostic accuracy can be divided into three levels: low (0.5-0.7), medium (0.7-0.9) and high (>0.9)[33,34].
In this paper, the landslide states were used as the state variable of the ROC curve, and the prediction results of the RVM training models were used as the test variable of the ROC curve.The ROC curves for the different kernel type analyses are shown in Figure 6.The curves indicate that the heterogeneous kernels achieved higher prediction performances (AUC = 0.7616) than did the other kernel types.In addition, the three multi-kernel types that contained a Gaussian function had stronger diagnostic abilities (AUC > 0.7) than did the other types (AUC < 0.7).The strong ability of Gaussian kernel functions for assessing landslide susceptibility in the study area has again been demonstrated.heterogeneous kernels achieved higher prediction performances (AUC = 0.7616) than did the other kernel types.In addition, the three multi-kernel types that contained a Gaussian function had stronger diagnostic abilities (AUC > 0.7) than did the other types (AUC < 0.7).The strong ability of Gaussian kernel functions for assessing landslide susceptibility in the study area has again been demonstrated.

Landslide Dot Density
The frequency ratios of the landslides in each zone of the five LSMs (Figure 5) were used to draw a landslide frequency ratio plot (Figure 7)).The plot results shows that the five frequency ratio lines largely conformed to the actual landslide distribution, in that the frequency ratios continued to grow from the VLS-zone to the VHS-zone.The figure also shows that the "Gauss and Gauss" kernel performed better between the HS-zone and VHS-zone than the other kernel types, but did not perform best in the two low susceptibility zones.In addition, the "Poly and Poly" and "Poly" kernels were almost the same in the five susceptibility zones.

Landslide Dot Density
The frequency ratios of the landslides in each zone of the five LSMs (Figure 5) were used to draw a landslide frequency ratio plot (Figure 7)).The plot results shows that the five frequency ratio lines largely conformed to the actual landslide distribution, in that the frequency ratios continued to grow from the VLS-zone to the VHS-zone.The figure also shows that the "Gauss and Gauss" kernel performed better between the HS-zone and VHS-zone than the other kernel types, but did not perform best in the two low susceptibility zones.In addition, the "Poly and Poly" and "Poly" kernels were almost the same in the five susceptibility zones.occurred mainly because the multi-kernel structure can adaptively adjust its kernel parameters to fit the specific study area.The classic linear kernel combination can not only maintain the interpolation ability of a local kernel function (Gaussian function), but also integrate the generalization ability of a global kernel function (polynomial function).As a final conclusion, the proposed method has a superior prediction skill and higher reliability for regional landslide susceptibility mapping.

Figure 1 .
Figure 1.Map of the distribution of hills in Sichuan.

Figure 1 .
Figure 1.Map of the distribution of hills in Sichuan.

Figure 2 .
Figure 2. Iterative curves of the classic linear kernel function.CPSO, cloud PSO.

Figure 2 .
Figure 2. Iterative curves of the classic linear kernel function.CPSO, cloud PSO.

Figure 4 .
Figure 4. Distribution map of landslides and non-landslides.

Figure 4 .
Figure 4. Distribution map of landslides and non-landslides.

15 Figure 5 .
Figure 5. Landslide susceptibility maps of the hilly area in Sichuan.(A) Landslide susceptibility map (LSM) by Gaussian; (B) LSM by polynomial; (C) LSM by Gaussian and polynomial; (D) LSM by Gaussian and Gaussian; (E) LSM by polynomial and polynomial.RVM, relevance vector machine.

Figure 5 .
Figure 5. Landslide susceptibility maps of the hilly area in Sichuan.(A) Landslide susceptibility map (LSM) by Gaussian; (B) LSM by polynomial; (C) LSM by Gaussian and polynomial; (D) LSM by Gaussian and Gaussian; (E) LSM by polynomial and polynomial.RVM, relevance vector machine.

Figure 6 .
Figure 6.ROC curves for the different kernel type analyses.

Figure 6 .
Figure 6.ROC curves for the different kernel type analyses.

Table 1 .
Evaluation factor system for landslides in the Sichuan Province hills.

Table 2 .
Kernel parameters and weight coefficient optimized using the PSO algorithm.Poly, polynomial.