Landslide Susceptibility Mapping and Comparison Using Decision Tree Models: A Case Study of Jumunjin Area, Korea

We assessed landslide susceptibility using Chi-square Automatic Interaction Detection (CHAID), exhaustive CHAID, and Quick, Unbiased, and Efficient Statistical Tree (QUEST) decision tree models in Jumunjin-eup, Gangneung-si, Korea. A total of 548 landslides were identified by interpreting aerial photographs. Half of the 548 landslides were selected for modeling, and the remaining half were used for verification. To determine landslide susceptibility, we used 20 landslide control factors classified into five categories: topographic elements, hydrological elements, soil maps, forest maps, and geological maps. The relationships between landslide occurrence and the landslide-inducing factors were analyzed using the CHAID, exhaustive CHAID, and QUEST models. The three models were then verified using the area under the curve (AUC) method. The verification results showed that the CHAID model (AUC = 87.1%) was more accurate than the exhaustive CHAID (AUC = 86.9%) and QUEST (AUC = 82.8%) models. Landslide susceptibility was high in mountainous areas and low in coastal areas. Analyzing the characteristics of the landslide control factors in advance will enable us to obtain more accurate results.


Introduction
Landslides are natural disasters that can cause serious loss of life and property. In general, landslides are closely related to terrain slope, and steep terrain generally occurs in mountainous areas. With 70% of its total area covered by mountainous terrain, Korea is particularly susceptible to landslides. In addition to these topographical conditions, heavy summer rainfall and intense rainfall from typhoons also increase the likelihood of landslides [1].
The Jumunjin area is located in Gangneung-si, Gangwon-do, South Korea (Figure 1a,b). The main economic activities in Gangneung-si are agriculture and tourism. The average annual rainfall is 1018.7 mm. The mountainous terrain is composed mostly of granite formed in the Jurassic period of the Mesozoic era, and the river basin, which contains the densely populated residential areas, is composed of alluvial deposits from the Quaternary period of the Cenozoic era. Typhoons Rusa in 2002 and Maemi in 2003 were accompanied by record rainfall of more than 80 mm/h. The two typhoons were responsible for more than 250 deaths and injuries. They caused numerous landslides in Sacheon-myeon (70.84 km², population 4219, 1914 households) and the Jumunjin area (60.55 km², population 21,291, 8917 households). In particular, seven landslides occurred in Gangneung-si, and the city, as well as the present study area, was cut off by a landslide caused by Typhoon Rusa in 2002. This storm also claimed the lives of three residents of the Jumunjin area, which was cut off for more than ten days.
To choose the most appropriate approaches for minimizing the direct and indirect damage caused by landslides, we must first identify the areas that are susceptible to landslides and where they are most likely to occur [3,4]. The most common approach used to identify landslide-susceptible areas is geographic information system (GIS)-based landslide susceptibility assessment, which includes various classification-based methods, such as statistical, machine learning, and probabilistic approaches [5,6]. Probabilistic models include the frequency ratio (FR) [6][7][8][9], the weight of evidence (WOE), and the evidential belief function (EBF). Statistical models include the statistical index, the analytical hierarchy process (AHP) [10], and logistic regression [11][12][13]. In recent years, fuzzy logic [14][15][16], fuzzy rule-based classifiers, neuro-fuzzy models [17][18][19], multivariate adaptive regression splines (MARS) [20], decision trees [21][22][23][24], neural networks [25][26][27][28], and support vector machines [29][30][31] have also been used. In this study, we applied decision tree models to the Jumunjin area of Gangneung-si in Gangwon-do using landslide data generated during the typhoons of 2002 and 2003. Because the accuracy of a given model may vary from region to region when mapping landslide susceptibility, it is important to select the most appropriate model for the region's characteristics. Decision tree models using CHAID, exhaustive CHAID, and QUEST were therefore applied to the study area to select the most accurate algorithm.

Data and Pre-Processing
Daum Map is useful because it provides past as well as current aerial photographs [30,31]. The landslide points in this study were identified from two aerial photographs, taken in 2008 and 2014, that had no ground control points (GCPs). The photographs have a resolution of 0.5 × 0.5 m. The 2008 photograph was used to analyze conditions after landslides, because it recorded many small-scale landslides spread across almost the entire region. Identifying small-scale landslides helps to prevent large-scale landslides when a typhoon occurs. The 2014 photograph was used to observe conditions before landslides occur, because no large-scale landslides occurred during this period. The acquired photographs have the disadvantage of lacking a coordinate system, but the advantage of a higher resolution than satellite images. To solve this problem, the 2008 and 2014 photographs were georeferenced using GCPs taken from recent aerial photographs that have a coordinate system [32][33][34][35]. This method can introduce errors, because the user identifies the reference points manually. To minimize this error, we took the GCPs on fixed structures in the study area and repeated the procedure until the root-mean-square error (RMSE) was less than 0.5. Landslide points were selected in areas of natural tree growth by comparing the photographs obtained in 2008 and 2014 (Figure 2). From the digital aerial photographs, we identified areas that had experienced landslides and marked every landslide as point data. In this way, a total of 548 landslide locations were identified in the study area and compiled as the landslide inventory (Figure 3). The 548 landslides were divided into two datasets: 50% were designated as training data and 50% as validation data. The training data were used to build the landslide susceptibility map, and the validation data were used for accuracy assessment.
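The georeferencing acceptance check described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `gcp_rmse`, the coordinates, and the assumption that the RMSE is measured in the same units as the coordinates are all hypothetical, since the text only states that the RMSE had to fall below 0.5.

```python
import math


def gcp_rmse(transformed, reference):
    """Root-mean-square error between transformed GCPs and their reference positions."""
    if len(transformed) != len(reference) or not transformed:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [
        (tx - rx) ** 2 + (ty - ry) ** 2
        for (tx, ty), (rx, ry) in zip(transformed, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical control points taken on fixed structures (invented values)
reference = [(100.0, 200.0), (150.0, 260.0), (180.0, 210.0)]
transformed = [(100.3, 200.0), (150.0, 260.4), (180.0, 210.0)]

rmse = gcp_rmse(transformed, reference)
accepted = rmse < 0.5  # threshold quoted in the text; units are an assumption
print(round(rmse, 3), accepted)
```

If `accepted` were false, the control points would be re-picked and the transformation refitted, which mirrors the "repeat until the RMSE is below 0.5" loop in the text.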
Remote Sens. 2018, 10, x FOR PEER REVIEW
To ensure validity, the factors were selected by calculating the frequency ratio (FR) of all candidate factors. A similar procedure was used for the Jumunjin area in Gangneung-si, Gangwon-do, where a total of 548 landslides were identified. The terrain data were derived from a digital elevation model (DEM) interpolated from a 1:5000 digital topographic map. The topographical factors extracted from the 10 × 10 m DEM for the Jumunjin area were slope, slope direction (aspect), maximum curvature, lateral curvature, convexity, structure, surface area, mid-slope position (MSP), and topographic position index (TPI); the hydrological factors were flow accumulation and the topographic wetness index (TWI) [36][37][38]. Rock type can affect landslide occurrence because of differences in rock strength and structural characteristics. Joints in the rock can contribute to landslides in areas that are susceptible to their occurrence. We used the distribution of rocks and a 1:25,000 monolayer taken from a 1:25,000 geological map [39]. Land cover, such as terrain and hydrological patterns, depends on the soil components. The soil components, including soil depth, were taken from 1:5000 soil maps, because they can be contributing factors in rainfall-induced landslides. In addition, the stability of a slope is affected by the distribution of plants on the slope [40]. Therefore, forest factors such as forest type and the diameter, density, and age of trees were used (Table 1) [41,42]. The landslide conditioning factors were divided into numerical and categorical factors. Numerical factors, such as slope, convexity, and surface area, take continuous values, so their class boundaries can be chosen. Categorical factors, such as aspect, the forest factors, and geology, are divided into classes by category.

Method
The first step in making the landslide susceptibility map was to analyze the relationship between landslide points and landslide occurrence using aerial photographs. We surveyed the locations of landslides in aerial photographs of Gangneung-si in Gangwon-do and selected the Jumunjin area as the study area because it is strongly affected by landslides. A total of 548 landslides were identified from the aerial photographs; 50% were designated as training data and 50% as validation data. The same number of non-landslide pixels was then randomly sampled from the landslide-free area; landslide pixels were assigned a value of 1 and non-landslide pixels a value of 0. To construct the landslide susceptibility maps and evaluate their performance, the landslide inventory and the landslide conditioning factor maps were converted into ASCII (American Standard Code for Information Interchange) format using ArcGIS. The converted data were analyzed using SPSS; each factor contained 1,046,756 values. The landslide training data and the landslide conditioning factors were then analyzed using the decision tree method to calculate the landslide susceptibility index and build a landslide susceptibility map of the study area (Figure 4).
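The sampling scheme described above (a 50/50 split of the inventory, plus an equal number of randomly drawn non-landslide pixels labelled 0) might look like the sketch below. The grid size, the random seeds, and the function name `build_training_table` are illustrative assumptions; the study itself prepared these data with ArcGIS and SPSS.

```python
import random


def build_training_table(landslide_cells, all_cells, seed=0):
    """Split landslide cells 50/50 and pair the training half with
    an equal number of randomly sampled landslide-free cells."""
    rng = random.Random(seed)
    cells = list(landslide_cells)
    rng.shuffle(cells)
    half = len(cells) // 2
    train, valid = cells[:half], cells[half:]

    # Non-landslide candidates exclude every known landslide cell
    landslide_set = set(landslide_cells)
    free_cells = [c for c in all_cells if c not in landslide_set]
    negatives = rng.sample(free_cells, len(train))

    # Label 1 = landslide, 0 = non-landslide
    labelled = [(c, 1) for c in train] + [(c, 0) for c in negatives]
    return labelled, valid


# Hypothetical 100 x 100 raster with 548 landslide cells (invented layout)
grid = [(row, col) for row in range(100) for col in range(100)]
landslides = random.Random(1).sample(grid, 548)

labelled, valid = build_training_table(landslides, grid)
print(len(labelled), len(valid))
```

Drawing the negatives only from cells outside the full inventory keeps validation landslides from being mislabelled as non-landslide examples.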
Decision trees are an analytical technique used for decision analysis. They are used to search for and model the relationships, patterns, and rules that exist in large datasets. A decision tree consists of nodes, starting with the root node and continuing through child nodes until each branch reaches a terminal node, as determined by a splitting criterion, a stopping rule, pruning, and so on. A node at the end of the tree, where no further branching occurs, is called a terminal (end) node, and the distance from the root node to the terminal node is referred to as the depth [16]. Decision tree analysis proceeds from tree formation through pruning and feasibility evaluation to interpretation and prediction [43,44]. The analysis uses dependent and independent variables: in this research, the dependent variable is the landslide inventory and the independent variables are the landslide conditioning factors. We used three decision tree algorithms to produce the landslide susceptibility maps: Chi-square Automatic Interaction Detection (CHAID), exhaustive CHAID, and Quick, Unbiased, and Efficient Statistical Tree (QUEST).
Chi-square Automatic Interaction Detection (CHAID) is an algorithm that performs multiway splitting using the chi-square test (categorical target variable) or the F-test (continuous target variable). When the target variable is categorical, the CHAID algorithm uses Pearson's chi-square statistic or the likelihood ratio chi-square statistic as the splitting criterion. The likelihood ratio chi-square statistic can be used if the target variable is an ordered or pre-grouped continuous variable [43][44][45][46]. Exhaustive CHAID is a modification of the basic CHAID algorithm. It continues to merge the categories of each predictor, regardless of their significance, until only two categories remain. It can therefore take a long time when large amounts of data or many variables are analyzed, and computing the splits takes longer than with the standard CHAID algorithm [47][48][49]. Both CHAID and exhaustive CHAID consist of three steps: merging, splitting, and stopping. The splitting and stopping steps in exhaustive CHAID are the same as those in CHAID, but the merging step uses an exhaustive search procedure that merges similar pairs until only a single pair remains.
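The merging step described above can be illustrated with a deliberately simplified sketch for a binary target (landslide / non-landslide). It repeatedly merges the pair of predictor categories whose difference is least significant. The category names, the counts, the `alpha_merge` value, and the function names are all assumptions made for illustration; full implementations also apply a Bonferroni adjustment and, for exhaustive CHAID, a later selection among the merge stages, both of which are omitted here.

```python
import math
from itertools import combinations


def chi2_pair(a, b):
    """Pearson chi-square and p-value (1 degree of freedom) for two categories.
    a and b are (landslide_count, non_landslide_count) tuples."""
    col_totals = (a[0] + b[0], a[1] + b[1])
    n = sum(col_totals)
    x2 = 0.0
    for row in (a, b):
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            if expected > 0:
                x2 += (row[j] - expected) ** 2 / expected
    # For 1 degree of freedom, Pr(chi2 > x2) = erfc(sqrt(x2 / 2))
    return x2, math.erfc(math.sqrt(x2 / 2))


def merge_categories(counts, alpha_merge=0.05, exhaustive=False):
    """Merge the least significantly different pair until every remaining pair
    differs (CHAID-like) or only two groups remain (exhaustive-like)."""
    groups = {(name,): tuple(c) for name, c in counts.items()}
    floor = 2 if exhaustive else 1
    while len(groups) > floor:
        pair = max(
            combinations(groups, 2),
            key=lambda p: chi2_pair(groups[p[0]], groups[p[1]])[1],
        )
        p_value = chi2_pair(groups[pair[0]], groups[pair[1]])[1]
        if not exhaustive and p_value <= alpha_merge:
            break  # all remaining pairs differ significantly
        first, second = pair
        merged = tuple(x + y for x, y in zip(groups.pop(first), groups.pop(second)))
        groups[first + second] = merged
    return groups


# Hypothetical aspect classes: (landslide cells, non-landslide cells)
aspect_counts = {"N": (30, 70), "NE": (28, 72), "S": (10, 90), "SW": (12, 88)}
merged_groups = merge_categories(aspect_counts)
print(merged_groups)
```

With these invented counts, the two north-facing classes behave alike and so do the two south-facing classes, so the sketch ends with two merged groups that would then form the branches of a split.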
Quick, Unbiased, and Efficient Statistical Tree (QUEST) is a binary-split decision tree algorithm for classification and data mining that selects its splits using statistical tests. The QUEST tree-growing process consists of selecting a split predictor, selecting the split point for the selected predictor, and stopping. The algorithm considers only univariate splits. It performs a binary split by identifying a split variable and a split point for the selected variable [50]. As the splitting criterion, the significance probability (p value) of the analysis of variance (ANOVA) F-statistic is calculated for continuous predictors and that of the chi-square test statistic for categorical predictors, and the predictor with the smallest p value is selected.
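For the continuous predictors, the ANOVA F-statistic mentioned above can be computed as in the following sketch. The factor values are invented, and the shortcut of ranking predictors by F rather than by p value is an assumption that only holds here because every predictor has the same group sizes, so the degrees of freedom are identical and a larger F implies a smaller p value.

```python
def anova_f(groups):
    """One-way ANOVA F-statistic for a list of value lists (one list per target class)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical factor values at [landslide cells, non-landslide cells]
predictors = {
    "slope": [[30.0, 35.0, 40.0], [10.0, 15.0, 20.0]],
    "twi": [[5.0, 6.0, 7.0], [6.0, 7.0, 8.0]],
}

f_values = {name: anova_f(groups) for name, groups in predictors.items()}
best_predictor = max(f_values, key=f_values.get)
print(f_values, best_predictor)
```

In this toy case the slope values separate the two classes far more clearly than the TWI values, so slope would be chosen as the split variable.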
Because the dependent variable used in the decision tree is nominal (categorical), the Pearson chi-squared statistic is used [51]:

$$X^2 = \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{(n_{ij} - m_{ij})^2}{m_{ij}}$$

where $n_{ij}$ is the observed cell frequency and $m_{ij}$ is the estimated expected cell frequency for cell $(x_n = i, y_n = j)$ under the independence model. The corresponding p value is given by $p = \Pr(\chi^2_d > X^2)$ [52].
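As a worked illustration of the Pearson statistic, consider a hypothetical 2 × 2 table (the counts are invented and are not taken from the study): one factor class contains 30 landslide and 70 non-landslide cells, and another contains 10 landslide and 90 non-landslide cells. The expected frequencies below assume the usual independence estimate, row total times column total divided by the grand total, and the degrees of freedom assume $d = (I-1)(J-1)$.

```latex
% Observed counts: n_{11} = 30, n_{12} = 70, n_{21} = 10, n_{22} = 90
% Row totals 100 and 100, column totals 40 and 160, grand total 200
\begin{align*}
m_{11} &= \frac{100 \times 40}{200} = 20, &
m_{12} &= \frac{100 \times 160}{200} = 80, \\
m_{21} &= \frac{100 \times 40}{200} = 20, &
m_{22} &= \frac{100 \times 160}{200} = 80, \\
X^2 &= \frac{(30-20)^2}{20} + \frac{(70-80)^2}{80}
     + \frac{(10-20)^2}{20} + \frac{(90-80)^2}{80} \\
    &= 5 + 1.25 + 5 + 1.25 = 12.5, \\
d &= (2-1)(2-1) = 1, \\
p &= \Pr\left(\chi^2_1 > 12.5\right) \approx 4.1 \times 10^{-4}.
\end{align*}
```

Such a small p value would indicate that the two classes differ in landslide frequency, so they would be kept as separate categories rather than merged.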

Results
We analyzed the relationships between the landslide occurrence sites and the landslide conditioning factors in the study area. We randomly selected 279 sites (roughly half of the 548 landslide occurrence sites) for the verification process. The numerical data were reclassified into five classes by quantile for ease of visual interpretation. The landslide susceptibility produced by the three algorithms was compared with the training data and verified against the validation data. For the CHAID, exhaustive CHAID, and QUEST analyses, the effective factors were selected by calculating the FR, i.e., the rate of landslide occurrence in each class of a factor. We then recategorized the values of each factor based on the results of the analysis and created a new factor map (Figure 5 and Figure S1). The resolution of the factor map is 10 × 10 m, the same as that of the DEM from which it was extracted.
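The frequency ratio used for factor selection can be sketched as below, assuming the common definition in which the FR of a class is the share of landslide points in that class divided by the share of the study area the class occupies. The class labels and counts are invented for illustration, and `frequency_ratio` is a hypothetical helper rather than a function from the study's workflow.

```python
def frequency_ratio(landslide_counts, cell_counts):
    """FR per class = (share of landslide points) / (share of study-area cells)."""
    total_landslides = sum(landslide_counts.values())
    total_cells = sum(cell_counts.values())
    ratios = {}
    for cls, cells in cell_counts.items():
        landslide_share = landslide_counts.get(cls, 0) / total_landslides
        area_share = cells / total_cells
        ratios[cls] = landslide_share / area_share if area_share > 0 else float("nan")
    return ratios


# Hypothetical slope classes (degrees): landslide points and raster cells per class
landslide_counts = {"0-10": 20, "10-20": 80, ">20": 100}
cell_counts = {"0-10": 5000, "10-20": 3000, ">20": 2000}

fr = frequency_ratio(landslide_counts, cell_counts)
print(fr)
```

An FR above 1 means the class holds more landslides than its area alone would suggest, and an FR below 1 means fewer; in this invented example the steepest class is the most landslide-prone.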
The results for each factor are shown in Table 2. In Table 2, % landslide (+) is the percentage of landslide points in the class, and % domain (+) is the percentage of the total area occupied by the class. The range of CHAID values was 430-4248. Using these calculated values, we generated maps of the landslide susceptibility index classified into five classes using the natural breaks method. Blue areas indicate low landslide susceptibility, and red areas indicate high landslide susceptibility. Spatial analysis was performed with reference to the landslide susceptibility maps generated using the three analytical methods. Most of the five rivers in the Jumunjin area are located in mountainous areas, which had the highest landslide susceptibility, while the coastal areas showed the lowest landslide susceptibility.
A landslide susceptibility map should effectively predict landslide areas, and existing landslide location data can be used to validate it. Therefore, a validity test was performed on the landslide susceptibility results. The landslide susceptibilities generated by the three algorithms were compared with the training data and verified using the validation data. For this, the 548 landslide occurrence points were randomly divided into 50% training data and 50% validation data. The training data were used for processing and the validation data were used for validation. A quantitative comparison of the three algorithms was performed using the AUC method to confirm the processing results. Figures 6-8 show the degree of landslide susceptibility obtained using the CHAID, exhaustive CHAID, and QUEST rating values.
The algorithm with the highest AUC was considered to be the best algorithm for the study. The AUC value is obtained by plotting the receiver operating characteristic (ROC) curve and calculating the area under the curve. In the ROC curve, the x and y axes represent the false positive rate (FPR) and the true positive rate (TPR), respectively, where TPR is the rate at which true values are correctly predicted and FPR is the rate at which false values are predicted as true. As the classification threshold is lowered, both TPR and FPR increase, and the ROC curve traces this trade-off. In this study, TPR is the proportion of landslide points that fall in areas classified as susceptible, and FPR is the proportion of non-landslide locations that fall in areas classified as susceptible. This curve was the result of comparing the training data with landslide susceptibility. Figure 9 shows that the CHAID algorithm had the highest AUC value (0.871), followed by the exhaustive CHAID (0.869) and QUEST (0.828) algorithms. The accuracies of the CHAID, exhaustive CHAID, and QUEST algorithms were therefore 87.1%, 86.9%, and 82.8%, respectively; the CHAID algorithm produced the best estimate of susceptibility to landslides. The difference between CHAID and exhaustive CHAID in the study area was very small, at 0.2 percentage points. This is because both algorithms use the chi-square test and the F-test as their basic procedure.
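The AUC calculation can be sketched with a threshold sweep and the trapezoidal rule, as below. The susceptibility scores and labels are invented, and `roc_auc` is a hypothetical helper; the study's own curves were presumably produced with its statistical software, which is not specified for this step.

```python
def roc_auc(scores, labels):
    """Return ROC points (FPR, TPR) and the area under the curve.
    labels: 1 = landslide, 0 = non-landslide; higher score = more susceptible."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        # Process tied scores together so they share a single threshold
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    # Trapezoidal rule over consecutive ROC points
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return points, auc


# Hypothetical susceptibility index values at six validation locations
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

roc_points, auc = roc_auc(scores, labels)
print(roc_points[-1], round(auc, 3))
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect one, which is why values such as 0.871 are read as 87.1% accuracy in the text.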

Discussion
In this study, we used high-resolution digital aerial photographs to identify landslides. It is very difficult to distinguish small landslides in the study area using satellite imagery. The use of aerial photo analysis avoids time-consuming and costly field surveys. Landslides were identified using pattern classification, and the factors related to landslides were identified and analyzed together with the landslide locations. The analysis used 20 factors derived from topographic, hydrological, soil, forest, and geological data. The landslide susceptibility maps were created using the CHAID, exhaustive CHAID, and QUEST decision tree algorithms.
In Korea, particularly in the rainy season, many landslides result from heavy rainfall over a short period.Therefore, it is necessary to analyze the susceptibility to landslides by selecting the appropriate factors that are related to landslides and classifying them by examining the relationships between these factors and landslide location.Twenty factors, including topography, hydrology, soil map, and clinical map, were selected to analyze landslide probability using CHAID, exhaustive CHAID, and QUEST algorithms in decision tree models.These algorithms are used in various fields as sophisticated modeling techniques and were applied to determine the effects of environmental factors on landslides and landslide susceptibilities.The landslide susceptibility maps were divided into five grades (very low, low, medium, high, and very high) for the ease of visual interpretation.The maps were verified against training and validation data.Specifically, 548 landslides were randomly divided into two sets of data; 50% of the landslides were used for trainning and the remaining 50% were used for validation using the ROC curve.The CHAID algorithm had the highest AUC (0.871), followed by the exhaustive CHAID algorithm (0.869), and the QUEST algorithm (0.828).The CHAID algorithm had the highest decision time (DT) accuracy (87.1%), followed by the exhaustive CHAID algorithm (86.9%), and the QUEST algorithm (82.8%).Hence, the accuracy was greater than 80% for all algorithms, and all of the algorithms were valid.The results from this study showed that slope, topographical solidity index, surface area, and convexity were positively correlated with landslide susceptibility in both of the study areas.These factors are considered related to landslide susceptibility because they increase the instability of the slope as the size of the area increases.In contrast, the TWI and the flow accumulation are negatively correlated with landslide susceptibility.The TWI and the flow aggregation 
correspond to hydrologic factors, and landslide susceptibility increases due to decreased cohesion from moisture as the slope becomes drier.Finally, our study revealed that those factors that increase the instability of the slope and those with less effect on the hydrological factors will increase landslide susceptibility.
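The validation procedure described above (a random 50/50 split of the 548 landslides followed by an AUC calculation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow used in the study: the helper names `split_half` and `auc`, the random seed, and the susceptibility scores are all hypothetical. The AUC is computed in its rank-based form, as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell.

```python
import random


def split_half(items, seed=0):
    """Randomly split items into two halves (training, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def auc(positive_scores, negative_scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# 548 landslide identifiers (hypothetical IDs), split 50/50 as in the text
landslide_ids = list(range(548))
training, validation = split_half(landslide_ids)

# Hypothetical susceptibility scores at validation landslide and non-landslide cells
landslide_scores = [0.9, 0.8, 0.4]
stable_scores = [0.7, 0.3, 0.2, 0.1]
example_auc = auc(landslide_scores, stable_scores)
print(len(training), len(validation), round(example_auc, 3))
```

An AUC of 0.5 corresponds to a map that ranks cells no better than chance, and 1.0 to a map that ranks every landslide cell above every non-landslide cell.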

Conclusions
Improving the performance of landslide susceptibility maps is of great interest in the landslide research community. New machine learning techniques have proven to be effective in terms of predictive performance. Therefore, in this study, we investigated the application of three decision tree methods (CHAID, exhaustive CHAID, and QUEST) to the assessment of landslide susceptibility. According to the literature, such investigations are rare, particularly as a case study of the Jumunjin area. The results of this study confirm that the performance of landslide mapping is improved by using the machine learning ensemble. For comparison, we also considered the traditional frequency ratio model, which had an AUC of 0.812 (Figure 10). Relative to this model, the decision tree predictions improved by 5.9 percentage points for CHAID, 5.7 for exhaustive CHAID, and 1.6 for QUEST. These results are reasonable because the techniques used in the classifier ensemble

Figure 1. Location of the study area from Daum map: (a) map of Korea and (b) map of the Jumunjin area, marked by the red boundary [2].

To choose the most appropriate approaches to minimize the damage caused both directly and indirectly by landslides, we must first identify the areas that are susceptible to landslides and where they are most likely to occur [3,4]. The most common approach used to identify landslide-susceptible areas is geographic information system (GIS)-based landslide susceptibility assessment, which includes various classification-based methods, such as statistical, machine learning, and probabilistic approaches [5,6]. Probabilistic models include the frequency ratio (FR) [6–9], the weight of evidence (WOE), and the evidential belief function (EBF). Statistical models include the statistical index, analytical hierarchy process (AHP) [10], and logistic regression [11–13]. In recent years, fuzzy logic [14–16], fuzzy rule-based classifiers, neuro-fuzzy models [17–19], multivariate adaptive regression splines (MARS) [20], decision trees [21–24], neural networks [25–28], and support vector machines [29–31] have also been used. In this study, we applied probabilistic models to Gangneung-si in Gangwon-do using landslide data generated during typhoons in 2002 and 2003. Because the accuracy of the same model may vary for the mapping of landslide susceptibilities, it is important to select the most appropriate model for the region's characteristics. Decision tree models using CHAID, exhaustive CHAID, and QUEST were applied to each study area to identify the most accurate algorithm.

Figure 3. Landslide points in the Jumunjin area, marked by green circles on the hillshade map.

Figure 4. Workflow of this study.

Figure 6. Landslide susceptibility map of the Chi-square Automatic Interaction Detection (CHAID) algorithm. Red areas indicate very high landslide susceptibility, orange high, yellow medium, light green low, and dark green very low.

Figure 7. Landslide susceptibility map of the exhaustive CHAID algorithm. Red areas indicate very high landslide susceptibility, orange high, yellow medium, light green low, and dark green very low.

Figure 8. Landslide susceptibility map of the QUEST algorithm. Red areas indicate very high landslide susceptibility, orange high, yellow medium, light green low, and dark green very low.

Table 1. The input factors used in this study.

Table 2. The calculated frequency ratio in the Jumunjin area.
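As an illustration of how a frequency ratio of the kind reported in Table 2 is typically obtained, the sketch below uses the commonly cited definition: the share of landslides falling in a factor class divided by the share of the study area occupied by that class. This definition, the function name `frequency_ratio`, the class labels, and all counts are assumptions for illustration; they are not values from Table 2.

```python
def frequency_ratio(landslides_in_class, total_landslides, cells_in_class, total_cells):
    """Share of landslides in a class divided by the share of area in that class."""
    landslide_share = landslides_in_class / total_landslides
    area_share = cells_in_class / total_cells
    return landslide_share / area_share


# Hypothetical slope classes: (landslide count, number of grid cells)
slope_classes = {
    "0-15 deg": (20, 5000),
    "15-30 deg": (100, 3000),
    ">30 deg": (154, 2000),
}
total_landslides = sum(count for count, _ in slope_classes.values())
total_cells = sum(cells for _, cells in slope_classes.values())

ratios = {
    name: frequency_ratio(count, total_landslides, cells, total_cells)
    for name, (count, cells) in slope_classes.items()
}
for name, value in ratios.items():
    print(name, round(value, 2))
```

Under this definition, a ratio above 1 means landslides are over-represented in the class relative to its area, and a ratio below 1 means they are under-represented.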
