Assessment of Landslide-Prone Areas and Their Zonation Using Logistic Regression, LogitBoost, and NaïveBayes Machine-Learning Algorithms

Landslides in the hilly regions of South Korea are a matter of serious concern. This study produces landslide susceptibility maps for Jumunjin Country in South Korea. Three machine-learning algorithms, namely Logistic Regression (LR), LogitBoost (LB), and NaïveBayes (NB), are used, and their final model outcomes are compared with each other. First, a landslide inventory map and the associated input data layers of the landslide conditioning factors were developed from field verification, historical records, and high-resolution remote-sensing data in a geographic information system (GIS) environment. Seventeen landslide conditioning factors were prepared: aspect, slope, altitude, maximum curvature, profile curvature, topographic wetness index (TWI), topographic positioning index (TPI), distance from fault, convexity, forest type, forest diameter, forest density, land use/land cover, lithology, soil, flow accumulation, and mid-slope position. The area under the curve (AUC) values of the LR, LB, and NB models were 84.2%, 85.2%, and 70.7%, respectively. The results revealed that the LR and LB models produced higher accuracy than the NB model in landslide susceptibility assessment. The final susceptibility maps would be useful for preliminary land-use planning and hazard mitigation purposes.


Introduction
A landslide is a very complex natural phenomenon that affects human life, natural resources, and property worldwide [1]. Annually, landslides cause 1000 deaths and about USD 4 billion in property losses worldwide [2]. A landslide is described as a wide range of ground movements, including rock falls, shallow debris flows, and deep slope failures [3]. The most effective way to reduce the magnitude and effect of landslides is an appropriate disaster management and mitigation plan. Landslide modeling with machine-learning models is considered an important tool for such plans [4]. A number of factors are correlated with landslide occurrence, such as geomorphological, geological, and hydrological factors [5,6], but these factors do not have a uniform impact in every event. Their impact varies from place to place [7]. For that reason, landslide susceptibility mapping at a regional scale is difficult and complex. However, in the last few decades a number of research institutes and governments have attempted to produce landslide susceptibility maps for disaster prevention [8,9]. Researchers have proposed many ways to delineate landslide susceptibility zones. For the development of landslide susceptibility maps, a key requirement is the spatial relationship between landslides and their conditioning factors [10]. Several researchers have applied numerous models to analyze this spatial relationship and then compared them in a geographic information system (GIS) environment [6,11].
In the present context, machine-learning algorithms, as powerful data-driven tools, are applied to learn the spatial relationship between landslide occurrence and landslide-affecting factors without having to start from an assumed structural model [34–36]. Statistical models require large amounts of data, whereas models based on machine-learning algorithms can effectively overcome this limitation of data dependency [9,37].
The aim of this research is to produce landslide susceptibility maps (LSM) based on machine-learning algorithms (i.e., the LR, LB, and NB models) for Jumunjin Country in South Korea by taking into consideration relevant parameters and grouping them based on the similarity of their characteristics. The model outcomes were also compared and validated by the receiver operating characteristic (ROC) curve method to select the best model for the study area, and finally the relative importance of the landslide conditioning factors was identified by the LVQ (Learning Vector Quantization) model [38].

Study Area
Gangneung city in Gangwon-do is mainly active in agriculture and tourism, and the average annual rainfall is about 1018.7 mm (Figure 1). The mountainous terrain is composed mostly of granite formed in the Jurassic period of the Mesozoic era, and the river basin, with its dense residential area, is composed of alluvial deposits laid down in the Quaternary period of the Cenozoic era.
Typhoon Rusa in 2002 and typhoon Maemi in 2003 brought record rainfall of more than 80 mm per hour. These typhoons caused more than 250 deaths and injuries in Korea. In Gangneung city, Gangwon province, the Gangneung area (70.84 km², population 4219, 1914 households) and Jumunjin-eup (area 60.55 km², population 21,291, 8917 households) are areas where many landslides occurred due to typhoons Rusa and Maemi. In particular, seven places in Gangneung city, including the study area, were cut off by landslides caused by typhoon Rusa in 2002, and three residents of Samjyunjin-eup were killed and many were isolated for more than 10 days.

Data Used
Landslide inventory data are considered the most important tool for foreseeing future landslide incidents [31]. An inventory map is required to study the spatial relationship between landslide distribution and the landslide effective factors. In this investigation, 548 landslide locations were identified in Jumunjin Country in South Korea (Figure 2). Different sources were used to identify the landslide locations, such as inventory reports, high-resolution satellite images, and extensive field surveys. In this study, 17 landslide effective factors are used based on the literature review and data availability [6,22,26,39–43]. These factors include aspect, slope, altitude, maximum curvature, profile curvature, topographic wetness index (TWI), topographic positioning index (TPI), distance from fault, convexity, forest type, forest diameter, forest density, land use and land cover (LULC), lithology, soil, flow accumulation, and mid-slope position. The data used in the present study are shown in Table 1.

Aspect
Aspect is regarded as a vital factor in landslide susceptibility mapping [1,39]. Physically, aspect-related factors include the orientation of the discontinuities controlling landslides, sunlight, wind, and precipitation [44–46]. In this work, a digital elevation model (DEM) of 5 m spatial resolution was used to compute the aspect map (Figure 3a).

Slope Gradient
The slope gradient has considerable influence on slope stability [47]. Many researchers have shown that slope gradient is the main factor for landslide susceptibility maps (LSM) [48,49]. The slope map was also developed from the DEM; this map reveals that most of the basin area falls under steep to very steep slopes (Figure 3b).

Altitude
Altitude is another important conditioning factor because it is affected by several geomorphic and geological processes [48,50]. Generally, landslides tend to occur more often at high altitude [45]. In this study area, the altitude ranges from 0.30 m to 956 m above mean sea level (Figure 3c).

Curvature (Maximum and Profile)
The curvature values represent slope shape and terrain morphology and can affect LSM in several ways, as proposed by Haigh and Rawat (2012) [51]. The maximum curvature represents flow acceleration and the erosion and deposition rate, and the profile curvature affects the variation of water flow velocity down the slope [9,52]. Therefore, maximum and profile curvature maps were generated with the spatial analysis tool (Figure 3d,e).

Topographic Wetness Index (TWI)
The topographic wetness index is derived from flow accumulation and describes how topography affects soil moisture and groundwater flow, which in turn affect the slope material [53,54]. It is expressed as:

$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right) \tag{1}$$

where α is the cumulative upslope area draining through a point and β is the slope angle at the point. In this study, the TWI was considered an important contributing factor (Figure 3f).
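The definition above can be illustrated with a minimal Python sketch, assuming the common form TWI = ln(α / tan β). The function name, the input values, and the small slope floor used to avoid division by zero on flat cells are illustrative assumptions, not part of the study.

```python
import math

def topographic_wetness_index(upslope_area, slope_deg, min_slope_deg=0.1):
    """Return ln(alpha / tan(beta)) for a single raster cell.

    upslope_area  : cumulative upslope area draining through the cell (alpha)
    slope_deg     : local slope angle in degrees (beta)
    min_slope_deg : hypothetical floor that keeps tan(beta) above zero on flat cells
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))

# Made-up cells with the same contributing area on a gentle and a steep slope
gentle = topographic_wetness_index(500.0, 5.0)
steep = topographic_wetness_index(500.0, 35.0)
print(round(gentle, 2), round(steep, 2))
```

For the same contributing area, the gentler cell receives the higher index, which matches the interpretation of TWI as a proxy for where water tends to collect.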

Topographic Positioning Index (TPI)
A positive TPI value indicates a location higher than the average of its surroundings, and a negative value indicates a location lower than its surroundings, as specified by the neighborhood. The TPI map of the present study area (Figure 3g) shows index values ranging from −30.86 to 50.83, indicating the coexistence of lowlands (depressions, channels, and valleys), highlands (ridges and residual hills), and constant slope areas (river plains and agricultural fields), which leads to rough topography over the whole catchment.

Distance from Fault
Landslides take place mainly along faults. Faults are zones of weakness that not only decrease rock strength but also cause fracturing and unstable slope conditions [55,56]. A Euclidean distance method was adopted to produce the distance from fault factor (Figure 3h).

Convexity
Convexity is also considered an important parameter for LSM because slope movement is a natural phenomenon that happens in a gravitational field; a greater degree of convexity indicates greater susceptibility to slope failure. Thus, convexity is an important factor affecting slope stability [57]. In this area, the convexity ranges from 0 to 80.64 (Figure 3i).

Forest Factors (Forest Type, Forest Diameter, and Forest Density)
Forest factors such as forest type, forest diameter, and forest density are among the most important factors for landslide susceptibility assessment. In the forest type, PK means Pinus koraiensis stand, D means Pinus densiflora stand, L means arable land, PL means Larix kaempferi stand, PD means artificial Pinus koraiensis stand, M means mixed stand, and H means hardwood stand. Tree diameter was classified as class 0 (less than 6 cm), class 1 (6~16 cm), class 2 (18~28 cm), and class 3 (more than 30 cm) (Figure 3j-l). Landslide occurrence is highest in areas covered by artificial pine and in non-forested areas, and has a low probability under paper palm trees and artificial rigid pine because the root systems of broad-leaf trees have more soil-holding capacity than those of needle-leaf trees. The landslide probability is lower for medium-diameter trees and higher for non-forest areas and small-diameter trees because medium-diameter trees have more roots and a greater capacity to maintain the water and soil-pore pressure during heavy rain periods [23]. In terms of tree age, the landslide probability is higher in younger timber and non-forested areas and lower under older trees, again because an older tree has more roots [23].

Land Use/Land Cover (LULC)
Land use and land cover are affected by human activities and by alterations in the environment. They also play a significant role in slope stability [58]. In this research, the LULC map was generated from field observation and satellite images and classified into seven classes: 100, 200, 300, 400, 500, 600, and 700 (Figure 3m).

Lithology
Lithology can be regarded as one of the most important factors in geo-hazard assessment [49,59] because rock strength, soil permeability, and weathering are influenced by lithological characteristics [60,61]. The characteristics of the component materials of a slope, such as permeability, porosity, and strength, play a crucial role in the stability of the slope and mostly differ between lithological units, so this factor greatly influences the probability of landslide occurrence [58]. In this work, the lithological map (Figure 3n) was prepared by the Geological Survey of South Korea at a scale of 1:50,000 (Table 2).

Soil
The soil is one of the most important effective factors for landslide assessment (Figure 3o). As soil depth and the tendency of the soil to absorb moisture increase, the surface runoff rate is reduced. Soil moisture directly affects the slope material, because pore water pressure diminishes soil stability. On the other hand, shallow soil is considered more unstable and prone to landslides [62]. In this area, the largest area falls under the SmF2 soil group, whose shallow depth on steep slopes represents a vulnerable condition for the occurrence of landslides [62].

Flow Accumulation
Flow accumulation is the quantity of water that moves to each pixel from its neighbors and eventually accumulates in it [63]. It is also an important factor for landslide susceptibility assessment [64] (Figure 3p). This map was generated with the ArcGIS 10.2.2 hydrology tool.

Mid-Slope Position
The mid-slope position indicates the relative vertical distance from the middle of the slope to the ridge or valley. A larger value means a greater distance [65]. It was chosen as an important factor for modeling [66].

Multicollinearity of Landslide Effective Factors
Multicollinearity is a statistical condition in which a number of independent factors in a multiple regression model are strongly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of accuracy [67,68].
Normally, two indicators are used to assess the multicollinearity of variables: the tolerance and the variance inflation factor (VIF), defined as follows (Equations (2) and (3)):

$$\mathrm{Tolerance} = 1 - R_J^2 \tag{2}$$

$$\mathrm{VIF} = \frac{1}{\mathrm{Tolerance}} \tag{3}$$

where $R_J^2$ is the coefficient of determination of a regression of explanatory variable J on all the other explanatory variables. A tolerance of less than 0.10 and a VIF greater than 5 indicate multicollinearity problems [69,70].
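As a rough illustration of these two indicators, the sketch below assumes the usual definitions, tolerance = 1 − R² and VIF = 1 / tolerance, and applies the thresholds quoted above. The function names are illustrative assumptions; the R² values would come from regressing each factor on all the others, which is not shown here.

```python
def tolerance_and_vif(r_squared):
    """Return (tolerance, VIF) for one factor, given the R^2 obtained when
    that factor is regressed on all the other factors."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance

def has_multicollinearity(tolerance, vif, tol_limit=0.10, vif_limit=5.0):
    """Apply the thresholds quoted in the text: tolerance < 0.10 or VIF > 5."""
    return tolerance < tol_limit or vif > vif_limit

# Made-up R^2 values for two hypothetical factors
moderate = tolerance_and_vif(0.75)   # tolerance 0.25, VIF 4.0
severe = tolerance_and_vif(0.95)     # tolerance about 0.05, VIF about 20

print(has_multicollinearity(*moderate), has_multicollinearity(*severe))
```

Because the two indicators are reciprocals of each other, the factor with the highest VIF always has the lowest tolerance.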

Modeling for Landslide Susceptibility Zonation
In the current research, the Logistic Regression (LR), LogitBoost (LB), and NaïveBayes (NB) models were used and their results compared. To compute these models, a landslide inventory map was generated first. Then, the three methods (i.e., LR, LB, and NB) were used to model the spatial correlation between landslide conditioning factors and landslide occurrence in each scenario using R statistical software. Finally, the accuracy and the variable contribution of all models were evaluated by the ROC curve and LVQ methods, respectively.

Logistic Regression (LR)
Logistic regression is well suited to exploring the relationship between a dependent variable and a set of independent explanatory variables, which can be categorical, continuous, or binary [71,72]. An advantage of the LR algorithm is that the variables do not need to be normally distributed [17]. The dependent variable in the logistic regression model must be coded as 0 and 1, representing landslide absence and presence. The model output is a probability value in the range of 0 to 1. Logistic regression is based on the logistic function, denoted as [42]:

$$P = \frac{1}{1 + e^{-z}} \tag{4}$$

where P represents the probability associated with a certain observation, and z is expressed as:

$$z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n \tag{5}$$

where $\beta_0$ represents the intercept of the function, $\beta_i$ (i = 1, 2, ..., n) represents the contribution of the independent variable $X_i$, and n represents the number of influencing factors [71].
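The relationship between the linear predictor z and the output probability can be sketched in a few lines of Python, assuming the standard logistic form P = 1 / (1 + e^(−z)). The intercept, coefficients, and factor values below are made up for illustration and are not the fitted values of this study.

```python
import math

def linear_predictor(intercept, coefficients, factors):
    """z = b0 + b1*X1 + ... + bn*Xn for one map cell."""
    return intercept + sum(b * x for b, x in zip(coefficients, factors))

def landslide_probability(z):
    """Logistic function mapping z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept and coefficients for two standardized factors
z = linear_predictor(-1.0, [0.5, 2.0], [2.0, 0.5])
p = landslide_probability(z)
print(z, round(p, 3))
```

A value of z equal to zero corresponds to a probability of 0.5, and increasingly positive values push the probability toward 1.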

LogitBoost (LB)
The LogitBoost algorithm is generally used to fit the linear logistic regression functions at the tree nodes [73,74]. It is designed to reduce over-fitting problems [75]. LogitBoost performs additive logistic regression with least-squares fits for each class, i.e., landslide occurrence and landslide non-occurrence, as in the following equation [76]:

$$L_c(x) = \sum_{i=1}^{D} \beta_i x_i + \beta_0 \tag{6}$$

where D denotes the number of landslide-dependent factors, $\beta_i$ is the coefficient of the i-th component of the input vector x, and $\beta_0$ is the intercept. The posterior probabilities are obtained by the linear logistic regression method with Equation (7) [73]:

$$P(c \mid x) = \frac{\exp\left(L_c(x)\right)}{\sum_{c'=1}^{C} \exp\left(L_{c'}(x)\right)} \tag{7}$$

where C is the number of classes and the least-squares fits $L_c(x)$ are resolved such that $\sum_{c=1}^{C} L_c(x) = 0$, which establishes the least number of instances per node of the logistic model trees.
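To make the posterior step concrete, the following sketch turns per-class scores L_c(x) into probabilities, assuming the usual normalized-exponential form and a sum-to-zero constraint on the scores. It does not implement the boosting iterations themselves; the function names and score values are illustrative assumptions.

```python
import math

def center_scores(scores):
    """Shift the class scores so that they sum to zero."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]

def posterior(scores):
    """P(c | x) = exp(L_c) / sum over classes of exp(L_c'), computed stably."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for the two classes (landslide, no landslide)
raw = [2.0, -1.0]
centered = center_scores(raw)      # [1.5, -1.5]
probs = posterior(centered)
print(centered, [round(p, 3) for p in probs])
```

Centering changes the scores but not the resulting probabilities, which is why the constraint can be imposed without loss of generality.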

NaïveBayes (NB)
A NaïveBayes classifier is a machine-learning classification system based on Bayes' theorem, which works under the assumption that the variables are independent [77]. The main advantage of the NB classifier is that it is easy to build, with no need for complicated iterative parameter estimation schemes [78]. Therefore, this model has been successfully applied to landslide susceptibility mapping [9,26,38,79]. In this research, the NB model was computed in the R 3.0.2 software environment using the "rminer" package [80].
Consider an observation with k attributes $x_i$, i = 1, 2, ..., k (where $x_i$ is a landslide-dependent factor), and an output class $y_j$, j = (landslide, no landslide). NaïveBayes estimates the probability $P(x_i \mid y_j)$ for every possible output class. The prediction is the class with the largest posterior probability, calculated with the following equation [36,76]:

$$y_{NB} = \underset{y_j \in \{\text{landslide},\ \text{no landslide}\}}{\arg\max}\; P(y_j) \prod_{i=1}^{k} P(x_i \mid y_j) \tag{8}$$

The conditional probability is calculated with the following equation [31]:

$$P(x_i \mid y_j) = \frac{1}{\sqrt{2\pi}\,\delta}\, e^{-\frac{(x_i - \mu)^2}{2\delta^2}} \tag{9}$$

where µ and δ are the mean and standard deviation of $x_i$, respectively.
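A compact Gaussian NaïveBayes sketch in Python may help to show how the prior and the per-factor conditional probabilities combine. It assumes a normal density for each factor within each class and works in log space for numerical stability. It is not the "rminer" implementation used in the study, and the training samples are made up.

```python
import math

def fit(samples, labels):
    """Estimate the prior and the per-factor (mean, std) for every class."""
    model = {}
    for cls in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(math.sqrt(var), 1e-9)))  # guard against zero std
        model[cls] = (len(rows) / len(samples), stats)
    return model

def log_gaussian(x, mean, std):
    """Logarithm of the normal density with the given mean and std."""
    return -((x - mean) ** 2) / (2.0 * std ** 2) - math.log(math.sqrt(2.0 * math.pi) * std)

def predict(model, sample):
    """Return the class with the largest log posterior."""
    def score(cls):
        prior, stats = model[cls]
        return math.log(prior) + sum(
            log_gaussian(x, m, s) for x, (m, s) in zip(sample, stats))
    return max(model, key=score)

# Made-up training cells: (slope in degrees, TWI); 1 = landslide, 0 = no landslide
samples = [(35, 5.1), (40, 4.8), (38, 5.5), (10, 9.0), (12, 8.5), (8, 9.4)]
labels = [1, 1, 1, 0, 0, 0]
model = fit(samples, labels)
print(predict(model, (36, 5.0)), predict(model, (9, 9.1)))
```

Summing log densities is equivalent to multiplying the conditional probabilities, so the independence assumption appears in the code as a simple sum over factors.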

Analysis of Spatial Relationship between Landslide Location and Effective Factors Based on Frequency Ratio (FR)
The frequency ratio (FR) is the ratio of the area where landslides have occurred to the total basin area, which can also be expressed as the ratio of the probability of landslide occurrence to non-occurrence for a given attribute [2]. The FR value of each class of an explanatory variable represents the degree of landslide occurrence in that class. The FR value is calculated for the different landslide conditioning factors with Equation (10):

$$FR = \frac{N_{pix}(SX_i) \Big/ \sum_{i=1}^{m} N_{pix}(SX_i)}{N_{pix}(X_j) \Big/ \sum_{j=1}^{n} N_{pix}(X_j)} \tag{10}$$

where $N_{pix}(SX_i)$ represents the number of landslide pixels within class i of factor X, $N_{pix}(X_j)$ represents the number of pixels within factor $X_j$, m is the number of classes in the parameter $X_i$, and n is the number of conditioning factors in the study area.
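A short worked example may clarify the calculation. The sketch below assumes the common per-class reading of the frequency ratio: the share of landslide pixels falling in a class divided by the share of all pixels falling in that class. The pixel counts are made up for illustration.

```python
def frequency_ratio(landslide_pixels, class_pixels):
    """Return the FR of every class of one conditioning factor.

    landslide_pixels : {class: number of landslide pixels in the class}
    class_pixels     : {class: total number of pixels in the class}
    """
    total_landslide = sum(landslide_pixels.values())
    total_pixels = sum(class_pixels.values())
    return {
        cls: (landslide_pixels[cls] / total_landslide) / (class_pixels[cls] / total_pixels)
        for cls in class_pixels
    }

# Hypothetical counts for four slope classes
class_pixels = {1: 4000, 2: 3000, 3: 2000, 4: 1000}
landslide_pixels = {1: 10, 2: 20, 3: 30, 4: 40}
fr = frequency_ratio(landslide_pixels, class_pixels)
print({cls: round(value, 2) for cls, value in fr.items()})
```

An FR above 1 means the class holds more landslides than its share of the area would suggest, and an FR below 1 means it holds fewer.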

Analysis of Independent Variable's Importance
Understanding how the various landslide factors influence the susceptibility predictions of the models, and therefore recognising their impact, is an important task. Learning vector quantization (LVQ) was implemented to analyze the importance of the conditioning factors in relation to landslide events [81]. The LVQ algorithm has been applied efficiently in different fields, such as landslide susceptibility mapping [82], groundwater potential mapping [83], mineral potential mapping [84], and environmental science [84]. Recently, Razavi Termeh et al. (2018) [81] and Pourghasemi and Rahmati (2018) [9] used the LVQ algorithm to quantify variable importance for landslide susceptibility modeling. The Euclidean distance is used as the main competition rule. The distance ($D_i$) of neuron i is computed as the difference between the training vector X and the reference vector $Z_i$ with the following equation [85]:

$$D_i = \sqrt{\sum_{j} \left(X_j - Z_{ij}\right)^2} \tag{11}$$

where $X_j$ and $Z_{ij}$ are the jth components of X and $Z_i$, respectively. The importance of the conditioning factors is analyzed by updating the learning equation of $Z_i$. If the neuron stands in the false class [81,86]:

$$Z_i(t+1) = Z_i(t) - \lambda(t)\left[X - Z_i(t)\right] \tag{12}$$

or, if the neuron stands in the correct class:

$$Z_i(t+1) = Z_i(t) + \lambda(t)\left[X - Z_i(t)\right] \tag{13}$$

These updates apply only to the winning neuron, that is, the indicator is 1 if the ith neuron is the winner and 0 otherwise, and λ(t) represents the learning intensity at time t. The LVQ algorithm was run in R statistical software. In the present study, the relative importance of the conditioning factors (independent factors) for landslide events (the dependent factor) was computed with the LVQ method. Details of the LVQ method can be found in Kohonen (1995) [87] and Kohonen et al. (1996) [86].
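The update rule described above can be sketched as a single training step in Python. This follows the standard LVQ1 scheme (the winning reference vector moves toward a correctly classified sample and away from a misclassified one) and is an assumption about the general form, not the R implementation used in the study. It also does not derive variable importance. The reference vectors and samples are made up.

```python
import math

def euclidean(x, z):
    """Euclidean distance between a training vector and a reference vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))

def lvq1_step(prototypes, prototype_labels, x, label, rate):
    """Update the winning reference vector in place and return its index."""
    winner = min(range(len(prototypes)), key=lambda i: euclidean(x, prototypes[i]))
    # Move toward the sample if the winner has the correct class, away otherwise
    sign = 1.0 if prototype_labels[winner] == label else -1.0
    prototypes[winner] = [
        z + sign * rate * (a - z) for a, z in zip(x, prototypes[winner])
    ]
    return winner

# Two hypothetical reference vectors, one per class (0 = no landslide, 1 = landslide)
prototypes = [[0.0, 0.0], [1.0, 1.0]]
prototype_labels = [0, 1]

first = lvq1_step(prototypes, prototype_labels, [0.2, 0.0], 0, rate=0.5)   # correct class
second = lvq1_step(prototypes, prototype_labels, [0.8, 1.0], 0, rate=0.5)  # false class
print(first, second, prototypes)
```

Only the nearest reference vector changes at each step, which is the role of the winner indicator in the update equations.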

Multicollinearity Analysis
In this research, multicollinearity among the landslide conditioning factors was assessed with the help of the VIF and tolerance (Table 3). The highest VIF value corresponds to the lowest tolerance level. The highest and lowest VIF values are related to slope gradient (4.182) and aspect (1.039), respectively. Because all VIF values are below the threshold of 5, the results reveal no multicollinearity problem among the 17 landslide conditioning factors in this study.

Spatial Relationship between Landslide Locations and Effective Factors
The spatial relationship between the effective factors and the landslide locations was calculated with the FR bivariate statistical technique (Table 4). The continuous factors extracted from the DEM were reclassified into five classes using quantiles for ease of analysis. The highest FR value (1.61) was found in the west-facing aspect class, followed by the north-facing aspect with an FR value of 1.36. Slope angle is strongly and positively correlated with the spatial occurrence of landslides, as steep slope areas (5th class) contain most of the identified landslides (FR = 2.38). The highest FR value (2.43) was found in the 5th elevation class, followed by the 4th class with an FR value of 1.90. For both maximum curvature and profile curvature, the FR value was highest for convex curvature (1.49 and 1.36, respectively), which indicates its great influence on the susceptibility to landslide occurrence. The influence of convex curvature on landslide occurrence is highest in areas where running water additionally converges and diverges during downhill flow [26]. The TWI naturally has a close spatial correlation with landslides, but it varies with the surface topography. In the present case, the 1st class of TWI has the greatest FR value (FR = 2.40), followed by the 2nd (FR = 1.14) and 3rd classes (FR = 0.91). The TPI represents lowland, highland, and constant slope areas. The FR value is highest (1.55) for the 5th TPI class, which indicates a high landslide occurrence rate. The distance from the fault is an important influencing factor for landslide occurrence because areas near a fault line are weaker zones characterized by heavily fractured rocks [88]. There is a tendency toward a negative relationship between distance from fault and landslide occurrence. The highest FR value (2.07) was found in the 3rd class of the distance from fault factor. Forest factors also play an important role in landslide occurrence; for example, the PL and PD forest types have high FR values of 7.36 and 5.22, respectively. For forest diameter, the FR values of the 2nd and 3rd classes are 2.12 and 1.45, and for forest density, class C has the greatest influence on landslide occurrence (FR = 1.44). In general, LULC types are very strong factors for landslide occurrence. Especially high FR values were found in the 400 (FR = 3.07) and 300 (FR = 1.23) classes. In addition, the highest FR value (2.10) was found in the Jbgr lithology class, followed by the Jjgr lithology class with an FR value of 0.71. The FR value of 4.17 for the RC soil can be interpreted as a high landslide conditioning factor, while the FR values of SmF2 and Muc are 2.32 and 1.53, respectively. Flow accumulation is one of the most important parameters controlling the stability of a slope and the degree of saturation of the slope material [89]. There is a tendency toward an inverse relationship between flow accumulation and landslide occurrence. The highest FR value for this factor, 1.47, is in the 4th class.
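The quantile reclassification of continuous factors into five classes mentioned above can be sketched as follows. This is one simple way to compute quantile breaks and is an assumption about the procedure; GIS packages may place the break values slightly differently. The slope values are made up.

```python
import bisect

def quantile_breaks(values, n_classes=5):
    """Return n_classes - 1 break values that split the sorted data into
    groups of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (n * k) // n_classes)] for k in range(1, n_classes)]

def assign_class(value, breaks):
    """Return the 1-based class of a value; a value equal to a break goes
    to the higher class."""
    return bisect.bisect_right(breaks, value) + 1

# Hypothetical slope values in degrees
slopes = [2, 5, 8, 11, 14, 17, 21, 26, 33, 41]
breaks = quantile_breaks(slopes)
classes = [assign_class(s, breaks) for s in slopes]
print(breaks, classes)
```

With ten values and five classes, each class receives two values, so the class number can be read as a rank-based quintile of the factor.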

Landslide Susceptibility Models
Landslide prediction through modeling and simulation is an important prospect for natural resources studies [9]. The prepared landslide susceptibility map can therefore be used as a valuable tool for conservation and sustainable planning of landslide-prone areas. For this reason, landslide susceptibility assessment needs to be as accurate as possible. For comparative visualization, the landslide susceptibility maps generated by the three machine-learning algorithms (LR, LB, and NB) are presented in Figures 5-7. The maps represent low to very high susceptibility to landslide occurrence; a higher index represents a higher susceptibility of the area to landslides. Classifying the landslide susceptibility index into different classes is necessary for the description and easier interpretation of spatial landslide estimation [36]. For this reason, the landslide susceptibility values were divided into five classes based on natural breaks [17,90], representing five zones in the landslide susceptibility maps: very low, low, medium, high, and very high. All the landslide susceptibility maps show that the most landslide-susceptible areas are found in the middle part of the basin. The final model outputs also show that logistic regression and NaïveBayes generate LSMs that are spatially discontinuous, while the LogitBoost (LB) model produces smoother patterns.

Accuracy Assessment and Their Comparison
The ROC curve was generated to compare the machine-learning models on the validation dataset [86]. For the prediction rate curve (using the validation dataset), LogitBoost displayed the highest AUC value of 85.2%, followed by the logistic regression (AUC = 84.2%) and NaïveBayes (AUC = 70.7%) models. The prediction rate curve reveals that the LB model is relatively better than the LR and NB models. The LB and LR models were reasonably satisfactory for predicting landslide susceptibility in the study area.
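For readers less familiar with the AUC, the sketch below computes it through its rank interpretation: the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell, with ties counted as one half. The scores and labels are made up and are unrelated to the values reported in this study.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons of the scores of
    positive (1) and negative (0) samples."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical susceptibility scores for validation cells
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
print(perfect, partial)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of landslide and non-landslide cells.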
Although, different machine-learning algorithms for landslide susceptibility assessment have been implemented, the prediction accuracy of these models is also debated [9,26].Previous research works revealed that some of the machine-learning techniques had equal model accuracy, although they are unique in respect to individual approaches for landslide susceptibility assessment and identifying the relevant spatial relationship between the landslide factors and landslide initiation [38].Exploring and gathering knowledge of these differences is necessary to implement a suitable model for a significant study goal for a proposed study area [91,92].The use of machine-learning algorithms is relatively easier than statistical model for identifying interactions between dependent and independent variables [24].The analyzed result shows a reasonable performance regarding predicting landslide susceptibility mappings as receiver operating characteristic (ROC) curve in Figure 8.
The result revels that the LB model was the most accurate model in comparison to the LR and NB models in this study area.The better results of the LB model due to it solving the data over fitting problems [76] and also this model uses maximum likelihood to find the smallest probable difference between predicted and observed values [93].Tsangaratos and Ilia (2018) [79] compared a logistic regression and NaïveBayes model in landslide susceptibility assessment.The result reveals that the NB model had great predication accuracy [79].But in this study area the NB model did not predict a better result in respect to LR and LB models because the NB classifier converges more accelerated than LB and LR models, meaning that it can run with the help of less training data set and still have lesser predictive power in this study area.However, if one assesses the performance of the two models i.e., NB and LR, the LR approaches provides a more appropriate result.When applying NaïveBayes to models with the help of small number of variables the increase of the training data size but still it's not affected the training accuracy, but this pattern is not shown in the cases of LR

Accuracy Assessment and Their Comparison
The ROC curve was generated to compare the machine-learning models on the validation dataset [86]. For the prediction rate curve (based on the validation dataset), LogitBoost achieved the highest AUC value of 85.2%, followed by logistic regression (AUC = 84.2%) and NaïveBayes (AUC = 70.7%). The prediction rate curve thus indicates that the LB model performs relatively better than the LR and NB models. Both the LB and LR models were reasonably satisfactory for predicting landslide susceptibility in the study area.
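How an AUC value of this kind is obtained can be sketched as follows. This is a minimal pure-Python illustration that builds the ROC points by sweeping the threshold over the predicted scores and integrates them with the trapezoidal rule; it is not the software used in this study, and the function names `roc_points` and `auc` as well as the labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs from sweeping the threshold from high to low.

    labels: 1 for a landslide cell, 0 for a non-landslide cell.
    scores: predicted susceptibility for the same cells.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one landslide and one non-landslide sample")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for index, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        is_last = index == len(ranked) - 1
        if is_last or ranked[index + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical validation samples, for illustration only.
validation_labels = [1, 1, 0, 1, 0, 0]
validation_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(validation_labels, validation_scores)))
```

An AUC of 0.5 corresponds to a model that ranks landslide and non-landslide cells no better than chance, and 1.0 to a perfect ranking, which is the scale on which the 85.2%, 84.2% and 70.7% values are read.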
Although different machine-learning algorithms have been implemented for landslide susceptibility assessment, the prediction accuracy of these models is still debated [9,26]. Previous research has shown that some machine-learning techniques achieve equal model accuracy even though each takes a distinct approach to assessing landslide susceptibility and to identifying the relevant spatial relationship between the landslide factors and landslide initiation [38]. Understanding these differences is necessary for selecting a model suited to the goal of a study and to the proposed study area [91,92]. Machine-learning algorithms make it relatively easier than statistical models to identify interactions between dependent and independent variables [24]. The receiver operating characteristic (ROC) curves in Figure 8 show that the models performed reasonably well in predicting landslide susceptibility.
The results reveal that the LB model was the most accurate of the three models in this study area. The better performance of the LB model is attributed to its handling of the overfitting problem [76] and to its use of maximum likelihood to find the smallest probable difference between predicted and observed values [93]. Tsangaratos and Ilia (2018) [79] compared logistic regression and NaïveBayes models in landslide susceptibility assessment and found that the NB model had high prediction accuracy [79]. In this study area, however, the NB model did not outperform the LR and LB models. The NB classifier converges faster than the LB and LR models, meaning that it can be trained with a smaller training dataset, yet it still had lower predictive power here. Comparing NB and LR directly, the LR approach provides the more appropriate result. When NaïveBayes is applied with a small number of variables, increasing the training data size does not affect the training accuracy, whereas this pattern is not seen for the LR models: LR showed a positive relationship between training data size and predictive accuracy. It can be concluded that the LB model performed best because it discriminates better between the presence and absence of landslide occurrence. For landslide susceptibility assessment in this study area, the NB model does not perform better than the LR and LB models.
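The maximum-likelihood fitting attributed to LogitBoost in this discussion can be written out explicitly. The block below is a sketch of the standard two-class LogitBoost update in the formulation of Friedman, Hastie and Tibshirani (2000), given for orientation only; it is not necessarily the exact configuration used in this study. The notation is introduced here and is an assumption: $y_i \in \{0, 1\}$ marks absence or presence of a landslide at sample $i$, $x_i$ is the vector of conditioning factors, $F$ is the additive model, and $f_m$ is the base learner fitted at iteration $m$.

```latex
\begin{align}
p(x) &= \frac{e^{F(x)}}{e^{F(x)} + e^{-F(x)}}, \\
\ell(F) &= \sum_{i=1}^{N} \Bigl[ y_i \log p(x_i) + (1 - y_i) \log\bigl(1 - p(x_i)\bigr) \Bigr], \\
z_i &= \frac{y_i - p(x_i)}{p(x_i)\bigl(1 - p(x_i)\bigr)}, \qquad
w_i = p(x_i)\bigl(1 - p(x_i)\bigr), \\
F(x) &\leftarrow F(x) + \tfrac{1}{2}\, f_m(x).
\end{align}
```

Starting from $F(x) = 0$ and $p(x_i) = 1/2$, each iteration fits $f_m$ by weighted least squares of the working response $z_i$ on $x_i$ with weights $w_i$ and then updates $F$ and $p$. Each step is a Newton-type step that increases the binomial log-likelihood $\ell(F)$, which is the sense in which the model reduces the difference between predicted and observed values.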

Conclusions
Landslide hazard mitigation and management planning play a crucial role in sustaining biotic society and the economy. In this research, three machine-learning models (LR, LB and NB) were used to identify possible landslide-prone areas at a regional scale. The landslide susceptibility maps can be used to (i) accurately identify the areas that are at present more vulnerable to landslide occurrence; (ii) identify the most important factors for landslide occurrence through the LVQ method, which can be useful for developing land use management strategies; and (iii) support more sustainable landslide planning with an emphasis on land degradation management.
To validate the prediction ability of the three proposed models, the prediction rate curve of the ROC was used. The AUC was calculated on the validation dataset. The AUC results showed prediction rates of 85.2%, 84.2% and 70.7% for the LB, LR and NB models, respectively. The LB model was thus found to have the highest prediction ability among the machine-learning models in this study area. The LogitBoost algorithm is generally used to fit linear logistic regression functions at the tree nodes, and its use addresses the overfitting problem. Finally, these landslide susceptibility maps can support planners and policy makers in hazard and disaster management planning.

Figure 1. The location of the Jumunjin area, from Google Maps.

Figure 2. Landslide inventory map with altitude of the study area.

Figure 4. Variable contribution of factors according to the Learning Vector Quantization (LVQ) method.

Sustainability 2018, 10

Figure 8. Receiver operating characteristic (ROC) curve of the landslide susceptibility maps using the validation dataset.

Table 1. Details of data used in landslide susceptibility assessment.

Table 2. Lithological units in the study area.

Table 3. Multicollinearity analysis for the landslide conditioning factors.

Table 4. Spatial relationship between each effective factor and gully erosion locations using the frequency ratio (FR) model.