Rainfall-Induced Shallow Landslide Susceptibility Mapping at Two Adjacent Catchments Using Advanced Machine Learning Algorithms

Pradhan, Ananta Man Singh; Kim, Yun-Tae

doi:10.3390/ijgi9100569


by Ananta Man Singh Pradhan ¹ and Yun-Tae Kim ²,*

¹ Water Resources Research and Development Center, Ministry of Energy, Water Resources and Irrigation, Government of Nepal, Pulchok, Lalitpur 44700, Nepal

² Department of Ocean Engineering, Geo-Systems Engineering Laboratory, Pukyong National University, 45 Yongso-ro, Nam-gu, Busan 48513, Korea

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2020, 9(10), 569; https://doi.org/10.3390/ijgi9100569

Submission received: 5 September 2020 / Revised: 20 September 2020 / Accepted: 28 September 2020 / Published: 29 September 2020

(This article belongs to the Special Issue Advances in Machine Learning and Statistical Analysis of Geographical Data)


Abstract:

Landslides affect human activities and socio-economic development, especially in mountainous areas. This study compares the prediction capability of advanced machine learning techniques for rainfall-induced shallow landslide susceptibility in the Deokjeokri and Karisanri catchments in South Korea. The influencing factors for landslides, i.e., topographic, hydrologic, soil, forest, and geologic factors, are prepared from various sources based on availability, and a multicollinearity test is performed to select relevant causative factors. The landslide inventory maps of both catchments are obtained from historical information, aerial photographs and field surveys. Deokjeokri catchment is used as the training area and Karisanri catchment as the testing area. The landslide inventories contain 748 landslide points in the training area and 219 points in the testing area. Three landslide susceptibility maps are prepared and compared using machine learning models, i.e., Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Deep Neural Network (DNN). The outcomes of the analyses are validated using the landslide inventory data, and the receiver operating characteristic (ROC) curve method is used to verify the results of the models. In terms of the area under the ROC curve, the training accuracy of RF is 0.756 and the testing accuracy is 0.703. Similarly, the training accuracy of XGBoost is 0.757 and the testing accuracy is 0.74. The DNN prediction shows acceptable agreement between the susceptibility map and the existing landslides, with a training accuracy of 0.855 and a testing accuracy of 0.802. The DNN model achieved lower prediction error and higher accuracy than the other models for shallow landslide modeling in the study area.

Keywords:

Deep Neural Network; Extreme Gradient Boosting; Random Forest; landslide susceptibility


1. Introduction

Landslides are one of the major natural hazards worldwide, driven by unfavorable geological conditions, weathering patterns, shallow soil deposits and heavy rainfall, and they pose a serious threat to the environment. The Korean Peninsula is currently experiencing the effects of climate change, including changes in annual temperature, precipitation and the rate of typhoon occurrence [1]. In the Korean Peninsula, typhoons are one of the main causes of heavy downpours. Every year, large-scale typhoons pass over South Korea, leaving a trail of devastation in their path. Shallow-seated landslides often result from the heavy downpours that accompany typhoons and torrential rains during the summer season [2].

Landslide prediction is a crucial task because the inherent causes of landslides are complex due to mechanical and chemical weathering, nonhomogeneous soil deposits, the orientation of geological lineament, and the erosion of a slope. The external causes, including rainfall, earthquakes, the excavation of a slope, and the loading of a slope or its crest, are the most powerful agents that trigger the landslide [3,4,5,6,7].

Landslide susceptibility is the occurrence probability of a landslide under certain geo-environmental conditions, and it estimates “where” landslides are most likely to occur in the future [8,9]. In the last three decades, various researchers have proposed and applied GIS-based qualitative and quantitative methods [10]. The qualitative methods are subjective and depend on expert decisions [11,12,13], and quantitative methods are objective and founded on the geographical distribution of landslides and influencing factors (IFs). The weight of evidence, frequency ratio, relative effect models and logistic regression models have been frequently used in quantitative methods [14,15,16,17,18,19]. In recent years, machine learning techniques have been increasingly employed for resolving the non-linear spatial relationship of landslide occurrences, using either regression or classification methods [20,21,22,23,24,25].

Advances in computational technologies and the rapid development of advanced machine learning algorithms have improved the understanding and management of many geo-environmental problems. The main advantage of machine learning over traditional techniques is its capability to handle high-dimensional and complex non-linear datasets [26]. Previous investigators have utilized machine learning in different fields of geosciences and hydrogeology, including spring potential mapping [27], gully erosion mapping [28], flood hazard mapping [29] and landslide studies [30,31]. Nevertheless, the application of advanced machine learning algorithms to landslide susceptibility modeling has been very limited, despite the variety of available algorithms and their potential advantages.

The main aim of this research is to evaluate landslide susceptibility based on machine learning techniques, i.e., Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Deep Neural Network (DNN), in Deokjeokri and Karisanri catchments. The study is designed so that the models are trained in Deokjeokri catchment and tested in the neighboring Karisanri catchment. These two adjacent catchments were selected as study sites because they have similar geological and geomorphological settings, and both have suffered severe rainfall-induced landslide problems.

2. Description of Study Area

Deokjeokri catchment and Karisanri catchment are located within the Inje region, Gangwon Province, in the northeastern part of Korea (inset of Figure 1). Deokjeokri catchment occupies about 33.4 km², and Karisanri catchment about 22.2 km². These catchments are confined by moderate to steep mountains. The predominant lithological types of this area include banded gneisses and granites, as shown in Figure 1. The study area consists of residual soil that includes silt and clay. Shallow landslides are the major land degradation process in both catchments. The channels are narrow and meandering. The major extrinsic cause of landslides in the study area was heavy rainfall. The landslides occurred under the indirect influence of two typhoons, Ewiniar and Bilis, in July 2006. The total rainfall from 14–16 July 2006 was 402 mm, and the maximum hourly rainfall intensity, which occurred on 15 July 2006, was 62 mm/h. The extreme rainfall that occurred on 14–16 July resulted in 17 deaths and 12 people unaccounted for in Inje County.

3. The Collected Dataset and Methods

3.1. Landslide Inventory

A landslide inventory map records the geographic location, size, date and type of landslide [32,33,34,35,36]. The precise detection of landslide locations is crucial for the prediction and assessment of landslide susceptibility models. A new generation of satellite images, such as WorldView and GeoEye, has enabled the quick production of landslide inventory maps [37,38]. Recently, the application of interferometry techniques to radar images has been extensively used in inventory mapping [39]. However, due to limited resources, in this study we (1) examined aerial photographs (provided by the web portals www.map.kakao.com and www.map.naver.com) to prepare a landslide inventory map. Both web portals give an excellent bird’s eye view with a 25 cm resolution. We also (2) performed fieldwork. The field observations showed that the predominant mass movements are shallow slides that sometimes developed into debris flows during periods of heavy rainfall, and that they occurred on natural and man-made slopes with highly weathered bedrock.

As shown in Figure 2, a total of 748 landslides were identified in Deokjeokri catchment (training area) and 219 landslides in Karisanri catchment (testing area); they were then digitized in the ArcGIS 10.3 environment for further analysis. Some landslides observed during field visits are presented in Figure 3. In this study, a single point feature from the center of each landslide’s initiation area was extracted and rasterized into a 10 × 10 m pixel. The areal extent of the initiation zone of each landslide was measured to be less than 100 m², so a single 10 × 10 m pixel, the smallest grid size of the digital elevation model (DEM), can represent each initiation area. A Gaussian kernel density function was used as the sampling strategy (an unbiased sampling technique) [40] for non-landslide pixels. Twice as many non-landslide pixels as landslide points (i.e., 1496 from Deokjeokri and 438 from Karisanri) were sampled in each catchment.

3.2. Landslide Influencing Factor (IF)

The main process for susceptibility modeling is the gathering and designing of a spatial database in a GIS environment [41]. The landslide influencing factors (IFs) considered in this study are intrinsic factors, namely topographic, hydrologic, forest, soil and geological factors, which were collected from available resources and fieldwork. Table 1 lists the spatial database obtained from different sources. In this study, based on a review of previous research [22,23,42,43] and the availability of data, 14 IFs were utilized to identify the hillslope features of landslide occurrence. Among the 14 IFs, the topographic (except the aspect map) and hydrologic factors are continuous, whereas forest, soil and geology are categorical factors. The selection of an appropriate pixel size is necessary to achieve high precision in landslide susceptibility mapping [44,45]. Tarolli and Tarboton (2006) found that a 10 × 10 m resolution is adequate to achieve better landslide susceptibility prediction performance, and thus a 10 × 10 m resolution DEM was created from 5 m interval contours. Similarly, all collected IFs (thematic layers) were rasterized into 10 × 10 m pixel size.

3.2.1. Topographic Factors

Aspect does not show a direct relation with landslide occurrence; however, the aspect of the terrain is related to parameters such as sunlight exposure and precipitation [46,47]. The aspect was divided into north, northeast, east, southeast, south, southwest, west, northwest and flat, as shown in Figure 4a. Elevation is another important element in landslide studies. One study shows that landslides tend to occur at higher altitudes [46]. The elevation of the study area ranges from 195.1 to 1513.67 m asl, as presented in Figure 4b. Slope is a very powerful driver of landslides: it affects the surface and subsurface flow, and thus the soil moisture content, the soil formation and the likelihood of soil erosion [48]. In the study area, the slope ranges from 0° to 62.3°, as shown in Figure 4c. Internal relief indicates the topographic breakage, i.e., the potential energy available for mass movements [49]. Internal relief was prepared by calculating the local height difference within nine pixels (unit area), and in the study area it ranges from 0.15 to 72.1 m, as shown in Figure 4d. Terrain curvature represents the morphology of the hillslope (Figure 4e), and it controls the slope erosion process through the convergence or divergence of water flow [50].
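The internal relief calculation described above can be sketched in a few lines. This is only an illustration: it assumes that "nine pixels" means a 3 × 3 moving window centered on each cell and that windows are clipped at the grid edge; the function name `internal_relief` and the toy elevations are hypothetical and not taken from the study.

```python
def internal_relief(dem):
    """Return, for every cell, max - min elevation in its 3 x 3 neighborhood.

    Assumption: the nine-pixel unit area is a 3 x 3 window; at the grid
    edge the window is clipped to the cells that exist.
    """
    rows, cols = len(dem), len(dem[0])
    relief = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                dem[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            relief[r][c] = max(window) - min(window)
    return relief


# Toy 3 x 3 DEM in meters (made-up values).
dem = [
    [200.0, 205.0, 212.0],
    [203.0, 210.0, 220.0],
    [207.0, 218.0, 231.0],
]
relief = internal_relief(dem)
print(relief[1][1])  # center cell sees all nine values: 231 - 200
```

In a real workflow the same operation would normally be done with a focal (moving-window) range tool in GIS software rather than nested Python loops.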

3.2.2. Hydrologic Factors

Hydrological factors are vital in rainfall-induced slope instability. Two types of drainage proximity IFs, i.e., horizontal and vertical, were used in this study. The drainage proximity (h) (Figure 5a) was obtained from the Euclidean distance function in the GIS environment. A new topo-hydrological factor, i.e., drainage proximity (v), was also used. It gives the height above the nearest drainage network (Figure 5b). This index considers the height differences along flow paths. Drainage density is the length of the drainage within a pixel (Figure 5c). A higher drainage density means less percolation but faster surface flow [51].

The stream power index (SPI) indicates the erosive processes caused by runoff. As the specific catchment area and slope gradient increase, the quantity of water contributed by the upslope area and the surface runoff also increase [52]. Equation (1) defines the SPI:

SPI = A_{s} \times \tan β

(1)

where A_s is the specific catchment area, and β is the slope gradient. The spatial distribution of SPI is presented in Figure 5d.

The sediment transport index (STI) reflects the erosive power of overland flow and the process of erosion and deposition. To calculate STI, two components of the slope which are responsible for soil loss, i.e., length (L) and steepness (S), were considered, as suggested by Moore and Burch (1986) [53] and given in Equation (2),

STI = {(\frac{A_{s}}{22.13})}^{0.6} {(\frac{\sin β}{0.0896})}^{1.3}

(2)

where A_s is the specific catchment area, and β is the slope gradient. The spatial distribution of STI is shown in Figure 5e.

The topographic wetness index (TWI) is related to soil conditions and surface runoff [54]. TWI describes the tendency of water to accumulate in the catchment and the tendency of gravitational forces to move that water downslope [55]. TWI is defined as

TWI = \log (\frac{α}{\tan β})

(3)

where α is the cumulative upslope area, and β is the slope angle. The distribution of TWI in the study area is shown in Figure 5f.
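As a quick check of Equations (1)–(3), the three indices can be evaluated for a single cell. The sketch below is illustrative only: the function names are hypothetical, the slope is assumed to be given in degrees, and the logarithm in Equation (3) is assumed to be the natural logarithm, which the text does not state.

```python
import math


def spi(a_s, beta_deg):
    """Equation (1): specific catchment area times tan(slope)."""
    return a_s * math.tan(math.radians(beta_deg))


def sti(a_s, beta_deg):
    """Equation (2): (A_s / 22.13)^0.6 * (sin(slope) / 0.0896)^1.3."""
    sin_beta = math.sin(math.radians(beta_deg))
    return (a_s / 22.13) ** 0.6 * (sin_beta / 0.0896) ** 1.3


def twi(alpha, beta_deg):
    """Equation (3), assuming a natural logarithm.

    A flat cell (slope of 0 degrees) makes tan(slope) zero and raises
    ZeroDivisionError; real workflows need a rule for such cells.
    """
    return math.log(alpha / math.tan(math.radians(beta_deg)))


# Made-up cell: upslope area of 100 (map units) on a 30 degree slope.
print(round(spi(100.0, 30.0), 2))
print(round(sti(100.0, 30.0), 2))
print(round(twi(100.0, 30.0), 2))
```

For a fixed slope, a larger upslope area raises all three indices; a steeper slope raises SPI and STI but lowers TWI, because steep cells drain rather than accumulate water.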

3.2.3. Forest and Soil Factors

Rickli et al. (2002) [56] addressed the effect of forests in hillslope reinforcement. The crucial role of trees and forests is to forestall mass movements by fortifying and drying soils. The catchments mainly consist of oak, Japanese larch, Japanese red pine, Japanese pine and Korean pine forest, as shown in Figure 6a. The non-forest area consists of agricultural land.

Soil properties affect degrees of drainage and erosion, which influence the landslide occurrence. The particle size and pore distribution affect water movement and the holding of infiltrated water [57]. Finer soils can hold higher volumes of water than coarse-textured soils [58]. In the study area, the predominant soil types are sandy loam soils (Figure 6b). However, downstream areas were covered by silty clay loam and clay loam. The northern and southern elevated areas of the catchments are rocky and covered by a very thin layer of soil.

3.2.4. Geologic Factor

Landslides are significantly determined by the lithological characteristics of the hillslope because the individual lithological unit relates to different weathering degrees [59,60]. As described earlier, the lithology of this area includes banded gneisses and granites (Figure 1). Prominent lineaments and dense joint spacing with weathering and erosions were observed in both catchments, which create weak zones. The drainage system follows the lineaments. The bulky granite peaks are one of the typical characteristics of the Inje area.

3.3. Modeling Approaches

As illustrated in Figure 7, the adopted methodology comprises four main steps: (1) the preparation of the landslide inventory and related database, (2) the preparation of the training and testing datasets (in this research Deokjeokri catchment was selected as the training area and Karisanri catchment as the testing area), (3) the testing and selection of landslide IFs using a multicollinearity test, and (4) the preparation of landslide susceptibility maps using the RF, XGBoost and DNN models and the comparison of their performance.

The number of IFs can be minimized via multicollinearity testing, which reduces the dimensionality of the data. For the susceptibility analysis, the IFs were examined through a multicollinearity test, which is useful for selecting applicable IFs. The variance inflation factor (VIF) and tolerance (TOL) are two indicators of the level of multicollinearity [61,62]. A VIF value greater than or equal to 5 and a TOL value below 0.2 indicate a serious multicollinearity problem [63,64]:

TOL = 1 - R^{2}

(4)

VIF = \frac{1}{TOL}

(5)

where the coefficient of determination (R²) is obtained by regressing a given IF on all the other IFs, i.e., it is the proportion of the variance in that IF that is explained by the remaining IFs [65,66,67].
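Equations (4) and (5) and the thresholds quoted above can be illustrated with a short sketch. The helper names `tol_vif` and `is_collinear` are hypothetical, and the R² value of 0.9045 is a made-up input chosen only because it yields a VIF close to the 10.47 reported in Section 4.1.

```python
def tol_vif(r_squared):
    """Return (TOL, VIF) from Equations (4) and (5)."""
    tol = 1.0 - r_squared
    vif = 1.0 / tol
    return tol, vif


def is_collinear(r_squared, vif_limit=5.0, tol_limit=0.2):
    """Flag an IF using the thresholds quoted in the text.

    Because VIF = 1 / TOL, the two criteria are nearly equivalent;
    either one being met is treated as a flag here.
    """
    tol, vif = tol_vif(r_squared)
    return vif >= vif_limit or tol < tol_limit


# Hypothetical R^2 of one IF regressed on all the others.
tol, vif = tol_vif(0.9045)
print(round(tol, 4), round(vif, 2))
print(is_collinear(0.9045), is_collinear(0.50))
```

Read the other way round, a VIF of about 10 means that roughly 90% of the variance of that factor is already explained by the other factors.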

The quality of the predictions was examined using the receiver operating characteristic (ROC) curve, which is one of the most popular methods for assessing model accuracy in landslide susceptibility assessment. The ROC curve is a useful method for describing the quality of probabilistic prediction systems [68]. The area under the curve (AUC) can be used as a measure of the overall performance of the model [69]: the larger the area, the better the performance of the model. A rough guide for classifying the accuracy is the traditional academic point system. The following rankings were considered for the accuracy test [70]: 0.90–1 (excellent), 0.80–0.90 (good), 0.70–0.80 (fair), 0.60–0.70 (poor) and 0.50–0.60 (fail).
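A minimal sketch of how an AUC value can be computed and ranked is given below. It uses the rank interpretation of the AUC (the probability that a randomly chosen landslide pixel receives a higher score than a randomly chosen non-landslide pixel) rather than any particular software used in the study. The function names, the toy scores, and the assignment of boundary values (for example, 0.80 counted as "good") are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rate_auc(auc):
    """Map an AUC value to the ranking scale quoted in the text."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    return "fail"  # values below 0.50 are also treated as "fail" here


# Toy example: 1 = landslide pixel, 0 = non-landslide pixel.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc, rate_auc(auc))
```

The pairwise loop is quadratic in the number of samples, which is fine for a few thousand points but would be replaced by a sort-based computation for full raster maps.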

In order to compare the reliability of the obtained landslide susceptibility maps, the three landslide susceptibility methods were evaluated using 2 × 2 contingency tables. The numbers of correctly predicted landslides and non-landslides are indicated by TP (true positives) and TN (true negatives), and the numbers of incorrectly predicted landslides and non-landslides are FN (false negatives) and FP (false positives), respectively. The aim of two-class prediction methods is to separate true cases from false ones. Based on the four descriptors, several further measures can be calculated. Sensitivity, also called the true positive rate (TPR), and specificity (true negative rate, TNR) give the proportions of landslide and non-landslide cases, respectively, that are correctly identified by the landslide susceptibility models. Accuracy (ACC) is the proportion of all cases that are correctly classified. The mathematical basis of these parameters [71] is given in Equations (6)–(8):

Sensitivity = \frac{TP}{TP + FN}

(6)

Specificity = \frac{TN}{FP + TN}

(7)

ACC = \frac{TP + TN}{TP + FP + FN + TN}

(8)
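Equations (6)–(8) can be checked against the RF test counts reported in Section 4.5 (155 of 219 landslides and 293 of 438 non-landslides correctly classified). The sketch below is a plain arithmetic check; the function names are hypothetical, and FN and FP are derived here by subtraction from the reported totals.

```python
def sensitivity(tp, fn):
    """Equation (6): share of landslides correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Equation (7): share of non-landslides correctly identified."""
    return tn / (fp + tn)


def accuracy(tp, tn, fp, fn):
    """Equation (8): share of all cases correctly classified."""
    return (tp + tn) / (tp + fp + fn + tn)


# RF model, testing catchment (counts from Section 4.5).
tp, tn = 155, 293
fn = 219 - tp  # landslides predicted as non-landslides
fp = 438 - tn  # non-landslides predicted as landslides

rf_sens = round(100 * sensitivity(tp, fn), 2)
rf_spec = round(100 * specificity(tn, fp), 2)
rf_acc = round(100 * accuracy(tp, tn, fp, fn), 2)
print(rf_sens, rf_spec, rf_acc)
```

The three percentages reproduce the RF values quoted in Section 4.5, which confirms that the reported ACC is computed over all 657 test samples.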

The machine learning methods that were applied in the present study are briefly described below.

3.3.1. Random Forest (RF)

Random Forest (RF), introduced by Breiman in 2001 [72], has been demonstrated to be a powerful method for classification and regression. It can deal with high-dimensional, continuous and categorical data. RF does not require an assumption about the statistical distribution of the data, and it can be considered robust with respect to changes in the composition of the dataset [73]. These properties are very useful in applications using variables that have mutual nonlinear interactions. In this work, we used the “ModelMap” package [74] in the R-statistical environment. The “ModelMap” package provides an interface to several existing R-packages to automate the construction of the maps.

The training algorithm for RF applies the general technique of bootstrap aggregating, or bagging, to tree learners. Training provides a simple and fast way of learning a function that maps the independent IF data (X₁, X₂, …, X_n) to the dependent variable (target) X₀, giving outputs Y_i, where (X₁, X₂, …, X_n) can be a mix of categorical and numerical variables. X₀ can be binary for classification, or numeric for regression. After training, for a query point x, each tree independently predicts the average of the Y_i in the leaf containing x, as shown in Equations (9) and (10)

f_{n}^{j} (x) = \frac{1}{N^{e} (A_{n} (x))} \sum_{\begin{array}{l} Y_{i} \in A_{n} (x) \\ I_{i} = e \end{array}} Y_{i}

(9)

and the forest averages the predictions of each tree

f_{n}^{(M)} (x) = \frac{1}{M} \sum_{j = 1}^{M} f_{n}^{j} (x)

(10)

where A_{n} (x) denotes the leaf containing x, and N^{e} (A_{n} (x)) denotes the number of estimation points it contains.
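The two averaging steps in Equations (9) and (10) can be illustrated with a toy forest. This is a sketch under simplifying assumptions, not the “ModelMap” implementation: each tree is a one-split stump on a single variable, the same bootstrap sample is used both to choose the split and to compute the leaf means, and all names and data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree; each leaf stores the mean target."""
    best = None
    for t in sorted(set(xs))[1:]:  # rule: x < t goes to the left leaf
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: a single leaf
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    """Equation (9): the mean of the targets in the leaf containing x."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_forest(xs, ys, n_trees=50, seed=0):
    """Bagging: fit each stump on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Equation (10): average the predictions of the M trees."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


# Toy data: slope angle in degrees versus landslide (1) or not (0).
slopes = [5, 10, 15, 20, 25, 30, 35, 40]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(slopes, labels)
low, high = predict_forest(forest, 8), predict_forest(forest, 38)
print(low, high)
```

With binary targets the forest average can be read as a susceptibility score between 0 and 1, which is how a classification forest yields a continuous map.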

3.3.2. Extreme Gradient Boosting (XGBoost)

XGBoost is based on the gradient boosting concept. It uses a more regularized model formalization to control over-fitting, which gives it better performance. It was developed by Chen and Guestrin (2016) [75]. In XGBoost, training is done using an additive strategy. Given samples i with independent IF data (X₁, X₂, …, X_n), a tree ensemble model uses K additive functions to predict the outputs Y_i:

Y_{i} = ϕ (x_{i}) = \sum_{k = 1}^{K} f_{k} (x_{i}), f_{k} \in F

(11)

where F is the set of all possible trees. Each f_{k} is a function, added at step k, that maps the descriptor values in x_{i} to a certain output. The differentiable loss function that measures the difference between the predictions and the target is simply a mean-square error. After each step of boosting, the algorithm scales the newly added weights. In this study, the open-source “xgboost” package for the R environment was utilized [76].
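The additive strategy in Equation (11) can be sketched with plain gradient boosting on a squared-error loss. This is not the “xgboost” package and omits its regularization terms; it only shows how each new tree is fitted to the current residuals and added with a scaling factor. One-split stumps, the name `eta` for the scaling factor, and the toy data are assumptions.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree; each leaf stores the mean target."""
    best = None
    for t in sorted(set(xs))[1:]:  # rule: x < t goes to the left leaf
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: a single leaf
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def boost(xs, ys, n_rounds=30, eta=0.1):
    """Add one stump per round, each fitted to the current residuals."""
    pred = [0.0] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Scale the newly added tree by eta before adding it.
        pred = [p + eta * predict_stump(stump, x) for p, x in zip(pred, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, pred)) / len(ys))
    return stumps, history


def predict(stumps, x, eta=0.1):
    """Equation (11): the prediction is the sum of the K scaled trees."""
    return sum(eta * predict_stump(s, x) for s in stumps)


# Toy data: slope angle in degrees versus landslide (1) or not (0).
slopes = [5, 10, 15, 20, 25, 30, 35, 40]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
stumps, history = boost(slopes, labels)
print(round(history[0], 4), round(history[-1], 4))
```

A small `eta` makes each tree contribute only a fraction of its fit, so more rounds are needed but each step is less likely to over-fit, which matches the role of the learning rate described in Section 4.3.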

3.3.3. Deep Neural Network (DNN)

DNN is inspired by the information processing and communication patterns in the biological nervous system [77,78]. In DNN, layered representations are learned via models called neural networks. From a technical point of view, DNN algorithms are multilayered neural networks. The main body of a DNN consists of an input layer that receives the data and an output layer that gives the prediction, and the core part is the hidden layers that process the data. With the rapid development of DNN, a great number of libraries and packages have been developed to set up a DNN with minimal effort. In this work, we used the “H2O” package [79], which is designed for the R-statistical environment and runs on both CPU and GPU devices.

Recently, DNN has been at the center of most of the advancements in artificial intelligence. A DNN is a multi-layered model for learning complex non-linear relationships in the input data. For a given dependent variable i and known model parameters θ, the Poisson distribution of the response variable can be predicted, as given in Equation (12)

λ_{m, θ} (x) = \exp (γ_{i}^{T} a_{m}^{N_{h, .}} (x, θ))

(12)

where a_{m}^{N_{h, .}} (x, θ) builds the shared part of the model. In this study, the ReLU activation function was chosen because it empirically displayed fewer optimization problems.

4. Results and Analysis

4.1. Selection of IFs

The 14 landslide IFs were analyzed for multicollinearity using the TOL and VIF parameters. As presented in Table 2, the highest VIF value is 10.47 and the lowest TOL value is 0.10, which shows that, among the 14 landslide IFs, internal relief (IR) has a multicollinearity problem. IR was therefore excluded from further analysis.

4.2. Application of RF in Landslide Susceptibility Mapping

Three training parameters need to be defined in the RF model, as follows: ntree, the number of bootstrap samples drawn from the original data (the default value is 500); mtry, the number of different predictors tested at each node, i.e., as many as the IFs; and node size, the minimum size of the terminal nodes of the trees [72]. The elements not included in a given bootstrap sample are referred to as out-of-bag (OOB) data for that bootstrap sample. The OOB data are used to estimate an error rate called the OOB error rate (Figure 8a). The result suggests that, when the training data were used to build the model, the error rate stayed constant after 100 trees.

The error rate was found to be 12%; therefore 88% of the model building result was accurate, which makes this a reasonably good model. Figure 8b depicts the variable importance by means of the mean decrease accuracy in percentage. The result shows that drainage proximity (h) has the highest importance followed by slope and forest.

4.3. Application of XGBoost in Landslide Susceptibility Mapping

Basic training using XGBoost is the most critical part of the modeling. It contains several hyper-parameters to be set [80]. We held the number of trees constant at the default of 1000 rounds. The learning rate was set to 0.1 to avoid overfitting the model, and the maximum depth of the trees was set to 3. Figure 9a shows the log loss during the modeling process. After tuning the model, it was found that the log loss is 0.48 for the training data and 0.56 for the testing data at 90 iterations. The model can also assess the importance of each IF. Figure 9b illustrates that drainage proximity (h) is the most important IF, followed by slope and curvature, in the XGBoost modeling.
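For readers unfamiliar with the log loss reported above, the sketch below shows the standard binary log loss formula. The function name and the toy probabilities are hypothetical; probabilities are clipped slightly away from 0 and 1 so that the logarithm stays finite.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary log loss: -[y*log(p) + (1 - y)*log(1 - p)]."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# A model that always predicts 0.5 scores ln(2), about 0.693.
baseline = log_loss([1, 0], [0.5, 0.5])
confident = log_loss([1, 0], [0.9, 0.1])
hesitant = log_loss([1, 0], [0.6, 0.4])
print(round(baseline, 3), round(confident, 3), round(hesitant, 3))
```

Lower is better: the reported values of 0.48 (training) and 0.56 (testing) both sit below the 0.693 obtained by a model that always predicts a probability of 0.5.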

4.4. Application of DNN in Landslide Susceptibility Mapping

There is no universal rule for the determination of an optimum neural network structure. In this study, three hidden layers with 13 neurons in each hidden layer were used. The input IFs were normalized, and then a “ReLU” activation function was used to run the model. The loss function is an important output of the model, which is used to measure the inconsistency between the dependent variable and the predicted outcome. Figure 10a presents the loss function during the modeling process. As the number of epochs increases, the loss values for the training and test data decrease. After 10 epochs, the curves were stable. The relative importance of the IFs in the model is presented in Figure 10b. Drainage proximity (h) is the most important IF, followed by slope and forest.
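The network shape described above can be sketched as a forward pass in plain Python. This is not the “H2O” implementation: the weights are random and untrained, the 13 inputs stand for the 13 retained IFs, and the single sigmoid output unit, the He-style weight scaling and all function names are assumptions made for illustration.

```python
import math
import random


def relu(values):
    """ReLU activation: negative values are set to zero."""
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    """One fully connected layer: weighted sum plus bias per neuron."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random weights with He-style scaling (an assumption), zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(sizes, seed=0):
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, x):
    """ReLU on the hidden layers, sigmoid on the single output unit."""
    h = x
    for weights, biases in network[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = network[-1]
    z = dense(h, weights, biases)[0]
    return 1.0 / (1.0 + math.exp(-z))


# 13 normalized inputs, three hidden layers of 13 neurons, one output.
sizes = [13, 13, 13, 13, 1]
network = build_network(sizes)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in network)

pixel = [i / 12 for i in range(13)]  # made-up IF values scaled to [0, 1]
score = forward(network, pixel)
print(n_params, round(score, 3))
```

Even this small architecture has 560 trainable parameters, so the 2244 training samples (748 landslide and 1496 non-landslide points) are not large relative to the model size.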

4.5. Evaluation Measures

Three landslide susceptibility maps obtained from RF, XGBoost and DNN were classified into five classes, namely very low, low, moderate, high and very high, using the Jenks natural breaks classification method, as presented in Figure 11a–c, respectively.

The outcomes show a common pattern: as the index value increases, the probability of landslide occurrence also increases, i.e., a higher index represents a higher susceptibility to landslide occurrence. The highly susceptible areas are concentrated near the drainage network and on steep slopes, whereas flat and gentle slopes are the areas of low susceptibility.

The ROC evaluation for the three methods in the landslide susceptibility mapping shows that the area under the curve (AUC) values of the training data are 0.756 (fair), 0.757 (fair) and 0.855 (good) for RF, XGBoost and DNN, respectively (Figure 12a). Similarly, the AUC values of testing data are 0.703 (fair), 0.74 (fair) and 0.802 (good) for RF, XGBoost and DNN, respectively (Figure 12b).

The comparison of the three methods shows that the prediction capabilities of these models are different. Among the three models, DNN shows the highest prediction capability, followed by the XGBoost and then the RF model. The landslide susceptibility models were also validated using the statistical evaluation parameters. The results during the testing stage for the implemented models are presented in Table 3, which clearly shows that all models performed very well. During the testing phase, the numbers of correctly captured landslide presences (TP) and absences (TN) are higher for the DNN model than for the RF and XGBoost models.

The DNN model exhibited the highest values of specificity (83.56%) and sensitivity (84.02%). The ACC of DNN was 83.71% (testing); on the other hand, this value was 74.73% (testing) for XGBoost and 68.19% (testing) for the RF model. The performance results show that the DNN model predicts the probable occurrence of a landslide considerably better than the other models, followed by XGBoost and RF.

The consistency of the performance and the model uncertainty were assessed using a fourfold plot, which is a visual representation of a confusion matrix that summarizes the numbers of TP, TN, FP and FN. Figure 13 shows the aggregated data of landslide presence or absence, i.e., the pixel frequencies, numerically in the corners of the plot. There were 219 landslides in the test catchment, of which the RF model correctly classified 155 (70.78%), and 438 non-landslides, of which 293 (66.89%) were correctly classified. In the case of XGBoost, 72.60% of the landslides and 75.80% of the non-landslide pixels were correctly classified. Similarly, the DNN model correctly identified 84.02% of the landslides and 83.56% of the non-landslide pixels.

5. Discussion

The most effective means of reducing casualties and losses ensuing from landslides is landslide risk management; therefore, high-quality landslide susceptibility maps are an important tool [81] for predicting the probable occurrence of landslides. Landslide susceptibility mapping has mostly been studied via qualitative and quantitative methods. Previous studies focused on improving the performance of landslide prediction using various machine learning methods. The sample size used for modeling was not sufficient for these methods. Therefore, we used the Gaussian kernel density function to select twice as many non-landslide pixels as landslide pixels. The purpose of the study was to apply and compare different machine learning models to obtain accurate landslide susceptibility maps. Therefore, in the present study, three classification algorithms (RF, XGBoost and DNN) were utilized and compared for landslide susceptibility mapping in the Inje area. The accuracy assessment of the results is the next step in comparing several models of landslide prediction. Without an accuracy assessment, the results could not be interpreted, the methods could not be supported, and the maps would have no scientific meaning. For this purpose, four metrics (sensitivity, specificity, ACC and the AUC of the ROC curve) were used in the evaluation of the models. The RF model is one of the more popular and widely used models for determining landslide susceptibility [73,82,83], while, on the other hand, XGBoost and DNN have been explored in only a limited number of studies for the spatial prediction of landslide occurrence; both showed promising results in the study area. An important step in landslide susceptibility modeling is determining the most significant IFs to decrease over-fitting. From the multicollinearity analysis, IR was identified as a variable having collinearity issues; therefore, IR was discarded in the modeling process.

With regard to the important IFs, the three models revealed that horizontal drainage proximity (h) is the most important IF among the 13 IFs. This finding is consistent with the statement by Nobre et al. (2011) [84] that the terrain nearer to drainage has a strong correlation with the soil–water saturation regime, and that it is a significant controlling factor for landslide occurrences, especially on soil slopes. Gökceoglu and Aksoy (1996) [85] also state that a stream adversely affects the hillslope because most landslides occur in proximity to the stream. Similarly, the slope is the second most important factor in the implemented models, as the slope is the most powerful agent of landslides [86]. On the other hand, SPI and TWI are less significant factors compared to the other IFs.

Although the applied machine learning models are “black box models”, interest in applying machine learning has been growing in recent years. The predicted outcomes do not identify the causes of the landslides; however, they do show the relation between landslides and terrain features. Such results may therefore improve the understanding of landslide events and of which IFs affect landslides in the study area. To obtain more definitive results, more experiments should be done to validate the model with time series landslide inventories.

Previous studies showed that the RF method can adequately depict the probable occurrence of landslides from spatial data [82,83,87,88]. Several studies of landslide susceptibility have used gradient boosting algorithms [89,90,91]. It was found that the boosted model performs better than the RF model. Recently, DNN has become popular among researchers [92,93]. Wang et al. (2017) [94] stated that the DNN framework achieved higher or comparable prediction accuracy, and this study supports the previous findings [95,96,97]. Thus, DNN can give better predictions and can improve the understanding of landslide-susceptible areas.

6. Conclusions

Landslide susceptibility mapping is an important tool for predicting the probability of landslide occurrence in hilly terrain, so high-quality landslide prediction models are of paramount importance. Two adjacent catchments were selected for the application and comparison of three machine learning models, i.e., RF, XGBoost and DNN. Deokjeokri catchment was selected as the training area and Karisanri catchment as the testing area. A landslide inventory consisting of 748 landslides in Deokjeokri catchment and 219 landslides in Karisanri catchment was prepared using GIS. In total, 14 thematic layers were collected as landslide-influencing factors. Among these, internal relief (IR) failed the multicollinearity test. The comparison shows that the three models reach different accuracy levels. The results of DNN (training AUC = 0.855, testing AUC = 0.802) were better than those of the XGBoost (training AUC = 0.757, testing AUC = 0.74) and RF (training AUC = 0.756, testing AUC = 0.703) models. The performance results of the testing phase also indicated that DNN has higher accuracy (83.71%) than XGBoost (74.73%) and RF (68.19%). All three models identified horizontal drainage proximity (h) as the most important IF. The DNN model is a very promising alternative for shallow landslide modeling in the study area. A high-quality landslide susceptibility map can reduce costs and time. The resulting model can therefore be used for spatial planning in the study area to reduce the risk of landslides and to support environmental management.

Author Contributions

Ananta Man Singh Pradhan performed the research, modified the codes, analyzed the data and wrote the manuscript. Yun-Tae Kim designed the research and extensively updated the manuscript. Both authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant from the Korea Agency for Infrastructure Technology Advancement (KAIA), funded by the Ministry of Land, Infrastructure and Transport (Grant 19TSRD-B151228-01).

Acknowledgments

We are thankful to the anonymous reviewers for their valuable comments and suggestions on the manuscript. We would like to thank Ji-Sung Lee for assistance during fieldwork. All views and interpretations expressed in this publication are those of the authors and are not necessarily attributable to Water Resource Research Center, Government of Nepal.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chung, Y.-S.; Yoon, M.-B.; Kim, H.-S. On Climate Variations and Changes Observed in South Korea. Clim. Chang. 2004, 66, 151–161.
2. Kim, Y.-T.; Lee, J.-S. Slope Stability Characteristic of Unsaturated Weathered Granite Soil in Korea considering Antecedent Rainfall. Geo Congr. 2013 2013, 349–401.
3. Miles, S.B.; Keefer, D.K. Evaluation of seismic slope-performance models using a regional case study. Environ. Eng. Geosci. 2000, 6, 25–39.
4. Tarolli, P.; Tarboton, D.G. A new method for determination of most likely landslide initiation points and the evaluation of digital terrain model scale in terrain stability mapping. Hydrol. Earth Syst. Sci. 2006, 10, 663–667.
5. Iida, T. A hydrological method of estimation of the topographic effect on the saturated throughflow. Jpn. Geomorph. Union Trans. 1984, 5, 1–12.
6. Keefer, D.K. Investigating landslides caused by earthquakes – A historical review. Surv. Geophys. 2002, 23, 473–510.
7. Moore, J.G.; Clague, D.A.; Holcomb, R.T.; Lipman, P.W.; Normark, W.R.; Torresan, M.E. Prodigious submarine landslides on the Hawaiian Ridge. J. Geophys. Res. 1989, 94, 17465–17484.
8. Brabb, E.E. Innovative Approaches to Landslide Hazard Mapping. In Proceedings of the 4th International Symposium on Landslides, Toronto, ON, Canada, 23–31 August 1984; pp. 307–324.
9. Furlani, S.; Ninfo, A. Is the present the key to the future? Earth-Sci. Rev. 2015, 142, 38–46.
10. Carrara, A.; Cardinali, M.; Detti, R.; Guzzetti, F.; Pasqui, V.; Reichenbach, P. GIS techniques and statistical models in evaluating landslide hazard. Earth Surf. Process. Landf. 1991, 16, 427–445.
11. Chung, C.-J.F.; Fabbri, A.G. Validation of Spatial Prediction Models for Landslide Hazard Mapping. Nat. Hazards 2003, 30, 451–472.
12. Corominas, J.; van Westen, C.; Frattini, P.; Cascini, L.; Malet, J.-P.; Fotopoulou, S.; Catani, F.; Van Den Eeckhaut, M.; Mavrouli, O.; Agliardi, F.; et al. Recommendations for the quantitative analysis of landslide risk. Bull. Eng. Geol. Environ. 2013, 73, 209–263.
13. Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth-Sci. Rev. 2018, 180, 60–91.
14. Dai, F.; Lee, C. Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology 2002, 42, 213–228.
15. Shrestha, S.; Kang, T.-S.; Choi, J.C. Assessment of co-seismic landslide susceptibility using LR and ANCOVA in Barpak region, Nepal. J. Earth Syst. Sci. 2018, 127, 38.
16. Aleotti, P.; Chowdhury, R. Landslide hazard assessment: Summary review and new perspectives. Bull. Eng. Geol. Environ. 1999, 58, 21–44.
17. Ko Ko, C.; Flentje, P.; Chowdhury, R. Quantitative Landslide Hazard and Risk Assessment: A Case Study. Q. J. Eng. Geol. Hydrogeol. 2003, 36, 261–272.
18. Pradhan, A.M.S.; Kim, Y.T. Relative effect method of landslide susceptibility zonation in weathered granite soil: A case study in Deokjeok-ri Creek, South Korea. Nat. Hazards 2014, 72, 1189–1217.
19. Pradhan, A.M.S.; Dawadi, A.; Kim, Y.T. Use of different bivariate statistical landslide susceptibility methods: A case study of Khulekhani watershed, Nepal. J. Nepal Geol. Soc. 2012, 44, 1–12.
20. Hong, H.; Liu, J.; Bui, D.T.; Pradhan, B.; Acharya, T.D.; Pham, B.T.; Zhu, A.-X.; Chen, W.; Ahmad, B.B. Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China). Catena 2018, 163, 399–413.
21. Pourghasemi, H.R.; Gayen, A.; Park, S.; Lee, C.-W.; Lee, S. Assessment of Landslide-Prone Areas and Their Zonation Using Logistic Regression, LogitBoost, and NaïveBayes Machine-Learning Algorithms. Sustainability 2018, 10, 3697.
22. Thai Pham, B.; Prakash, I.; Dou, J.; Singh, S.K.; Trinh, P.T.; Trung Tran, H.; Minh Le, T.; Tran, V.P.; Kim Khoi, D.; Shirzadi, A.; et al. A Novel Hybrid Approach of Landslide Susceptibility Modeling Using Rotation Forest Ensemble and Different Base Classifiers. Geocarto Int. 2018, 14, 1–38.
23. Pham, B.T.; Nguyen, M.D.; Bui, K.-T.T.; Prakash, I.; Chapi, K.; Bui, D.T. A novel artificial intelligence approach based on Multi-layer Perceptron Neural Network and Biogeography-based Optimization for predicting coefficient of consolidation of soil. Catena 2019, 173, 302–311.
24. Kornejady, A.; Pourghasemi, H.R.; Afzali, S.F. Presentation of RFFR New Ensemble Model for Landslide Susceptibility Assessment in Iran; Springer: Cham, Switzerland, 2019; pp. 123–143.
25. Nhu, V.H.; Hoang, N.D.; Nguyen, H.; Ngo, P.T.T.; Thanh Bui, T.; Hoa, P.V.; Samui, P.; Tien Bui, D. Effectiveness assessment of Keras based deep learning with different robust optimization algorithms for shallow landslide susceptibility mapping at tropical area. Catena 2020, 188, 104458.
26. Knudby, A.; Brenning, A.; LeDrew, E. New approaches to modelling fish–habitat relationships. Ecol. Modell. 2010, 221, 503–511.
27. Falah, F.; Ghorbani Nejad, S.; Rahmati, O.; Daneshfar, M.; Zeinivand, H. Applicability of generalized additive model in groundwater potential modelling and comparison its performance by bivariate statistical methods. Geocarto Int. 2017, 32, 1069–1089.
28. Arabameri, A.; Yamani, M.; Pradhan, B.; Melesse, A.; Shirani, K.; Tien Bui, D. Novel ensembles of COPRAS multi-criteria decision-making with logistic regression, boosted regression tree, and random forest for spatial prediction of gully erosion susceptibility. Sci. Total Environ. 2019, 688, 903–916.
29. Darabi, H.; Choubin, B.; Rahmati, O.; Torabi Haghighi, A.; Pradhan, B.; Kløve, B. Urban flood risk mapping using the GARP and QUEST models: A comparative study of machine learning techniques. J. Hydrol. 2019, 569, 142–154.
30. Kavzoglu, T.; Colkesen, I.; Sahin, E.K. Machine Learning Techniques in Landslide Susceptibility Mapping: A Survey and a Case Study; Springer: Cham, Switzerland, 2019; pp. 283–301.
31. Nguyen, V.; Pham, B.; Vu, B.; Prakash, I.; Jha, S.; Shahabi, H.; Shirzadi, A.; Ba, D.; Kumar, R.; Chatterjee, J.; et al. Hybrid Machine Learning Approaches for Landslide Susceptibility Modeling. Forests 2019, 10, 157.
32. Sameen, M.I.; Pradhan, B.; Bui, D.T.; Alamri, A.M. Systematic sample subdividing strategy for training landslide susceptibility models. Catena 2020, 187, 104358.
33. Pašek, J. Landslides inventory. Bull. Int. Assoc. Eng. Geol. 1975, 12, 73–74.
34. Ghosh, T.; Bhowmik, S.; Jaiswal, P.; Ghosh, S.; Kumar, D. Generating Substantially Complete Landslide Inventory Using Multiple Data Sources: A Case Study in Northwest Himalayas, India. J. Geol. Soc. India 2020, 95, 45–58.
35. Guzzetti, F.; Peruccacci, S.; Rossi, M.; Stark, C.P. The rainfall intensity–duration control of shallow landslides and debris flows: An update. Landslides 2008, 5, 3–17.
36. Du, J.; Glade, T.; Woldai, T.; Chai, B.; Zeng, B. Landslide susceptibility assessment based on an incomplete landslide inventory in the Jilong Valley, Tibet, Chinese Himalayas. Eng. Geol. 2020, 270, 105572.
37. Tofani, V.; Del Ventisette, C.; Moretti, S.; Casagli, N. Integration of Remote Sensing Techniques for Intensity Zonation within a Landslide Area: A Case Study in the Northern Apennines, Italy. Remote Sens. 2014, 6, 907–924.
38. Guerriero, L.; Confuorto, P.; Calcaterra, D.; Guadagno, F.M.; Revellino, P.; Di Martire, D. PS-driven inventory of town-damaging landslides in the Benevento, Avellino and Salerno Provinces, southern Italy. J. Maps 2019, 15, 619–625.
39. Rosi, A.; Tofani, V.; Tanteri, L.; Tacconi Stefanelli, C.; Agostini, A.; Catani, F.; Casagli, N. The new landslide inventory of Tuscany (Italy) updated with PS-InSAR: Geomorphological features and landslide distribution. Landslides 2018, 15, 5–19.
40. Phillips, S.J.; Dudík, M. Modeling of species distributions with Maxent: New extensions and a comprehensive evaluation. Ecography 2008, 31, 161–175.
41. Pradhan, A.M.S.; Kim, Y.T. Evaluation of a combined spatial multi-criteria evaluation model and deterministic model for landslide susceptibility mapping. Catena 2016, 140, 125–139.
42. Pradhan, A.M.S.; Lee, S.-R.; Kim, Y.-T. A shallow slide prediction model combining rainfall threshold warnings and shallow slide susceptibility in Busan, Korea. Landslides 2019, 16, 647–659.
43. Chen, W.; Pourghasemi, H.R.; Zhao, Z. A GIS-based comparative study of Dempster-Shafer, logistic regression and artificial neural network models for landslide susceptibility mapping. Geocarto Int. 2017, 32, 367–385.
44. Glade, T.; Crozier, M.; Smith, P. Applying Probability Determination to Refine Landslide-triggering Rainfall Thresholds Using an Empirical “Antecedent Daily Rainfall Model”. Pure Appl. Geophys. 2000, 157, 1059–1079.
45. Crozier, M.J.; Glade, T. A Review of Scale Dependency in Landslide Hazard and Risk Analysis. In Landslide Hazard and Risk; Wiley Online Library: Hoboken, NJ, USA, 2012; ISBN 9780471486633.
46. Ercanoglu, M.; Gokceoglu, C. Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area (West Black Sea Region, Turkey). Eng. Geol. 2004, 75, 229–250.
47. Dai, F.C.; Lee, C.F.; Li, J.; Xu, Z.W. Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environ. Geol. 2001, 40, 381–391.
48. Doornkamp, J.C.; Cooke, R.U. Geomorphology in Environmental Management: An Introduction; Clarendon Press: Oxford, UK, 1974.
49. Pradhan, A.M.S.; Lee, J.-S.; Kim, Y.-T. Effect of spatial soil depth distribution model on shallow landslide prediction: A case study from Korean Mountain. EGUA 2018, 20, 17502.
50. Erener, A.; Düzgün, H.S.B. Landslide susceptibility assessment: What are the effects of mapping unit and mapping method? Environ. Earth Sci. 2012, 66, 859–877.
51. Pachauri, A.K.; Pant, M. Landslide hazard mapping based on geological attributes. Eng. Geol. 1992, 32, 81–100.
52. Moore, I.D.; Grayson, R.B.; Ladson, A.R. Digital terrain modelling: A review of hydrological, geomorphological, and biological applications. Hydrol. Process. 1991, 5, 3–30.
53. Moore, I.D.; Burch, G.J. Physical Basis of the Length-slope Factor in the Universal Soil Loss Equation. Soil Sci. Soc. Am. J. 1986, 50, 1294.
54. He, Q.; Shahabi, H.; Shirzadi, A.; Li, S.; Chen, W.; Wang, N.; Chai, H.; Bian, H.; Ma, J.; Chen, Y.; et al. Landslide spatial modelling using novel bivariate statistical based Naïve Bayes, RBF Classifier, and RBF Network machine learning algorithms. Sci. Total Environ. 2019, 663, 1–15.
55. Beven, K.J.; Kirkby, M.J. A physically based, variable contributing area model of basin hydrology. Hydrol. Sci. Bull. 1979, 24, 43–69.
56. Rickli, C.; Zürcher, K.; Frey, W.; Lüscher, P. Wirkungen des Waldes auf oberflächennahe Rutschprozesse|Effects of forest on landslides. Schweiz. Z. Forstwes. 2002, 153, 437–445.
57. Kitutu, M.G.; Muwanga, A.; Poesen, J.; Deckers, J.A. Influence of soil properties on landslide occurrences in Bududa district, Eastern Uganda. Afr. J. Agric. Res. 2009, 4, 611–620.
58. Sidle, R.C.; Pearce, A.J.; O’Loughlin, C.L.; American Geophysical Union. Hillslope Stability and Land Use; American Geophysical Union: Washington, DC, USA, 1985; ISBN 0875903150.
59. Yalcin, A. The effects of clay on landslides: A case study. Appl. Clay Sci. 2007, 38, 77–85.
60. Duna, C.R.; D’Arcy, M.; McDonald, J.; Whittaker, C.A. Lithological controls on hillslope sediment supply: Insights from landslide activity and grain size distributions. Earth Surf. Process. Landf. 2018, 43, 956–977.
61. Kavzoglu, T.; Sahin, E.K.; Colkesen, I. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 2014, 11, 425–439.
62. Pradhan, A.M.S.; Kang, H.S.; Lee, J.S.; Kim, Y.T. An ensemble landslide hazard model incorporating rainfall threshold for Mt. Umyeon, South Korea. Bull. Eng. Geol. Environ. 2019, 78, 131–146.
63. O’Brien, R.M. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 2007, 41, 673–690.
64. Menard, S. Applied Logistic Regression Analysis; SAGE: Thousand Oaks, CA, USA, 1995.
65. Slinker, B.K.; Glantz, S.A. Multiple regression for physiological data analysis: The problem of multicollinearity. Am. J. Physiol. 1985, 249, R1–R12.
66. Slinker, B.K.; Glantz, S.A. Multiple linear regression: Accounting for multiple simultaneous determinants of a continuous dependent variable. Circulation 2008, 117, 1732–1737.
67. Belsley, D.; Kuh, E.; Welsch, R. Detecting and Assessing Collinearity. In Regression Diagnostics: Identifying Influential Data and Sources of Collinearity; Wiley: New York, NY, USA, 1980; pp. 85–91. ISBN 9780471725152.
68. Swets, J.; Pickett, R.; Whitehead, S.; Getty, D.; Schnur, J.; Swets, J.; Freeman, B. Assessment of diagnostic technologies. Science 1979, 205, 753–759.
69. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
70. Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression, 2nd ed.; Wiley-Blackwell: Hoboken, NJ, USA, 2000.
71. Baldi, P.; Brunak, S.; Chauvin, Y.; Andersen, C.A.F.; Nielsen, H. Assessing the accuracy of prediction algorithms for classification: An overview. Bioinformatics 2000, 16, 421–424.
72. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
73. Catani, F.; Lagomarsino, D.; Segoni, S.; Tofani, V. Landslide susceptibility estimation by random forests technique: Sensitivity and scaling issues. Nat. Hazards Earth Syst. Sci. 2013, 13, 2815–2831.
74. Freeman, E.; Frescino, T.; Moisen, G. ModelMap: An R Package for Modeling and Map Production Using Random Forest and Stochastic Gradient Boosting; USDA Forest Service/Rocky Mountain Research Station: Ogden, UT, USA, 2009; p. 507.
75. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16); ACM Press: New York, NY, USA, 2016; pp. 785–794.
76. Chen, T.; He, T.; Benesty, M. Xgboost: Extreme Gradient Boosting; R Package Version 0.3-1; Technical Report; 2015; pp. 1–4. Available online: http://cran.fhcrc.org/web/packages/xgboost/vignettes/xgboost.pdf (accessed on 4 September 2020).
77. Bengio, Y.; Lee, D.-H.; Bornschein, J.; Mesnard, T.; Lin, Z. Towards Biologically Plausible Deep Learning. arXiv 2015, arXiv:1502.04156.
78. Marblestone, A.H.; Wayne, G.; Kording, K.P. Toward an Integration of Deep Learning and Neuroscience. Front. Comput. Neurosci. 2016, 10, 94.
79. LeDell, E.; Gill, N.; Aiello, S.; Fu, A.; Candel, A.; Click, C.; Kraljevic, T.; Nykodym, T.; Aboyoun, P.; Kurka, M.; et al. H2O: R Interface for ‘H2O’; R Package Version 3.20.0.2; 2018; Available online: https://CRAN.R-project.org/package=h2o (accessed on 4 September 2020).
80. Sandino, J.; Pegg, G.; Gonzalez, F.; Smith, G. Aerial Mapping of Forests Affected by Pathogens Using UAVs, Hyperspectral Sensors, and Artificial Intelligence. Sensors 2018, 18, 944.
81. Klimeš, J. Landslide temporal analysis and susceptibility assessment as bases for landslide mitigation, Machu Picchu, Peru. Environ. Earth Sci. 2013, 70, 913–925.
82. Shrestha, S.; Kang, T.-S.; Suwal, M. An Ensemble Model for Co-Seismic Landslide Susceptibility Using GIS and Random Forest Method. ISPRS Int. J. Geo-Inf. 2017, 6, 365.
83. Pham, B.T.; Prakash, I.; Singh, S.K.; Shirzadi, A.; Shahabi, H.; Tran, T.-T.-T.; Bui, D.T. Landslide susceptibility modeling using Reduced Error Pruning Trees and different ensemble techniques: Hybrid machine learning approaches. Catena 2019, 175, 203–218.
84. Nobre, A.D.; Cuartas, L.A.; Hodnett, M.; Rennó, C.D.; Rodrigues, G.; Silveira, A.; Waterloo, M.; Saleska, S. Height Above the Nearest Drainage – A Hydrologically Relevant New Terrain Model. J. Hydrol. 2011, 404, 13–29.
85. Gökceoglu, C.; Aksoy, H. Landslide susceptibility mapping of the slopes in the residual soils of the Mengen region (Turkey) by deterministic stability analyses and image processing techniques. Eng. Geol. 1996, 44, 147–161.
86. Convertino, M.; Troccoli, A.; Catani, F. Detecting fingerprints of landslide drivers: A MaxEnt model. J. Geophys. Res. Earth Surf. 2013, 118, 1367–1386.
87. Lagomarsino, D.; Tofani, V.; Segoni, S.; Catani, F.; Casagli, N. A Tool for Classification and Regression Using Random Forest Methodology: Applications to Landslide Susceptibility Mapping and Soil Thickness Modeling. Environ. Model. Assess. 2017, 22, 201–214.
88. Xu, C.; Dai, F.; Xu, X.; Lee, Y.H. GIS-based support vector machine modeling of earthquake-triggered landslide susceptibility in the Jianjiang River watershed, China. Geomorphology 2012, 145–146, 70–80.
89. Kim, J.C.; Lee, S.; Jung, H.S.; Lee, S. Landslide susceptibility mapping using random forest and boosted tree models in Pyeong-Chang, Korea. Geocarto Int. 2018, 33, 1000–1015.
90. Lombardo, L.; Cama, M.; Conoscenti, C.; Märker, M.; Rotigliano, E. Binary logistic regression versus stochastic gradient boosted decision trees in assessing landslide susceptibility for multiple-occurring landslide events: Application to the 2009 storm event in Messina (Sicily, southern Italy). Nat. Hazards 2015, 79, 1621–1648.
91. Song, Y.; Niu, R.; Xu, S.; Ye, R.; Peng, L.; Guo, T.; Li, S.; Chen, T. Landslide susceptibility mapping based on weighted gradient boosting decision tree in Wanzhou section of the three gorges reservoir area (China). ISPRS Int. J. Geo-Inf. 2019, 8, 4.
92. Ghorbanzadeh, O.; Blaschke, T.; Gholamnia, K.; Meena, S.R.; Tiede, D.; Aryal, J. Evaluation of different machine learning methods and deep-learning convolutional neural networks for landslide detection. Remote Sens. 2019, 11, 196.
93. Xiao, L.; Zhang, Y.; Peng, G. Landslide susceptibility assessment using integrated deep learning algorithm along the China-Nepal highway. Sensors (Switzerland) 2018, 18, 4436.
94. Wang, F.; Xu, P.; Wang, C.; Wang, N.; Jiang, N. Application of a GIS-Based Slope Unit Method for Landslide Susceptibility Mapping along the Longzi River, Southeastern Tibetan Plateau, China. ISPRS Int. J. Geo-Inf. 2017, 6, 172.
95. Dou, J.; Yunus, A.P.; Merghadi, A.; Shirzadi, A.; Nguyen, H.; Hussain, Y.; Avtar, R.; Chen, Y.; Pham, B.T.; Yamagishi, H. Different sampling strategies for predicting landslide susceptibilities are deemed less consequential with deep learning. Sci. Total Environ. 2020, 720, 137320.
96. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
97. Ciregan, D.; Meier, U.; Schmidhuber, J. Multi-column deep neural networks for image classification. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012.

Figure 1. Geological map of the study area. The study sites are dominated by Mesozoic granites (shown in pink). The inset shows the location of the study area.

Figure 2. Landslide inventory map of Deokjeokri (Training data) and Karisanri catchment (Testing data).

Figure 3. (a)–(e): observed landslides and (f) debris deposit.

Figure 4. Topographic influencing factors: (a) aspect, (b) elevation, (c) slope, (d) internal relief and (e) curvature.

Figure 5. Hydrologic influencing factors: (a) drainage proximity (h), (b) drainage proximity (v), (c) drainage density, (d) stream power index (SPI), (e) sediment transport index (STI), and (f) topographic wetness index (TWI).

Figure 6. Influencing factors: (a) forest type and (b) soil type.

Figure 7. Illustrating study flow.

Figure 8. (a) Overall error rate of the RF model and (b) mean decrease in accuracy.

Figure 9. (a) Loss vs. iteration of XGBoost model and (b) variable importance.

Figure 10. (a) Loss vs. epoch of DNN model and (b) variable importance.

Figure 11. Landslide susceptibility maps: (a) RF, (b) XGBoost and (c) DNN.

Figure 12. Accuracy assessment using ROC: (a) training and (b) testing.

Figure 13. Four-fold plot summarizing the total number of TP, TN, FP and FN of the (a) RF, (b) XGBoost and (c) DNN models.

Table 1. Data type, scale and producer.

Spatial Data		Scale	Resolution	Producer
Landslide inventory		Point data		Satellite image, aerial photographs, field survey, ArcGIS 10.2
Topographic factor	Aspect	1:5000	10 × 10 m	ArcGIS 10.2
	Elevation
	Slope
	Internal relief
	Curvature
Hydrologic factor	Drainage proximity (h)	1:5000		ArcGIS 10.2
	Drainage proximity (v)
	Drainage density
	Stream Power Index		10 × 10 m
	Sediment Transport Index
	Topographic Wetness Index
Soil factor	Soil type	1:25,000	10 × 10 m	Korea Forest Service (KFS)
Forest factor	Forest type	1:25,000	10 × 10 m	Korea Forest Service (KFS)
Geological factor	Geology	1:50,000	10 × 10 m	Korean Institute of Geoscience and Mineral Resources (KIGAM)

Table 2. Multicollinearity test for the 14 landslide IFs.

Statistic	R²	Tolerance	VIF
Aspect	0.05	0.95	1.05
Elevation	0.40	0.60	1.67
Slope	0.70	0.30	3.33
Internal relief	0.90	0.10	10.47
Curvature	0.49	0.51	1.95
Drain prox. (h)	0.45	0.55	1.82
Drain prox. (v)	0.55	0.45	2.23
Drainage density	0.52	0.48	2.10
SPI	0.71	0.29	3.45
STI	0.78	0.22	4.55
TWI	0.74	0.26	3.85
Forest	0.25	0.75	1.33
Soil	0.02	0.98	1.02
Geology	0.09	0.91	1.09

Remark: Internal relief (tolerance 0.10, VIF 10.47) failed the multicollinearity test; its values are set in bold in the original table.
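
The columns of Table 2 are linked by the standard definitions tolerance = 1 − R² and VIF = 1 / tolerance. The sketch below recomputes both from the tabulated R² values and flags factors by their reported VIF. The cutoff of VIF > 10 is a widely used rule of thumb and is an assumption here, as are the names `TABLE2`, `VIF_LIMIT`, `tolerance` and `vif_from_r2`. Because the tabulated R² values are rounded to two decimals, the recomputed VIF matches the reported one only approximately (most visibly for internal relief).

```python
# (R², reported VIF) pairs copied from Table 2.
TABLE2 = {
    "Aspect": (0.05, 1.05),
    "Elevation": (0.40, 1.67),
    "Slope": (0.70, 3.33),
    "Internal relief": (0.90, 10.47),
    "Curvature": (0.49, 1.95),
    "Drain prox. (h)": (0.45, 1.82),
    "Drain prox. (v)": (0.55, 2.23),
    "Drainage density": (0.52, 2.10),
    "SPI": (0.71, 3.45),
    "STI": (0.78, 4.55),
    "TWI": (0.74, 3.85),
    "Forest": (0.25, 1.33),
    "Soil": (0.02, 1.02),
    "Geology": (0.09, 1.09),
}

VIF_LIMIT = 10.0  # assumed rule-of-thumb cutoff, not stated in this section


def tolerance(r2: float) -> float:
    """Tolerance is the share of a factor's variance not explained by the other factors."""
    return 1.0 - r2


def vif_from_r2(r2: float) -> float:
    """Variance inflation factor is the reciprocal of the tolerance."""
    return 1.0 / tolerance(r2)


# Flag factors on the reported VIF, which was computed from unrounded R².
flagged = [name for name, (_, reported_vif) in TABLE2.items() if reported_vif > VIF_LIMIT]

# Recomputed VIF next to the reported value, for a quick consistency check.
comparison = {
    name: (round(vif_from_r2(r2), 2), reported_vif)
    for name, (r2, reported_vif) in TABLE2.items()
}
```

Under these assumptions only internal relief is flagged, which agrees with the factor discarded in the study.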

Table 3. Performance results of implemented models in the testing phase.

Data Set	Model	TP	FP	FN	TN	Specificity (%)	Sensitivity (%)	ACC (%)
Testing	RF	155	145	64	293	66.89	70.77	68.19
	XGBoost	159	106	60	332	75.79	72.60	74.73
	DNN	184	72	35	366	83.56	84.01	83.71
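
The percentages in Table 3 follow from the four confusion-matrix counts. The sketch below recomputes them using the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP) and ACC = (TP + TN) / total. The class and variable names are illustrative only. Recomputed values can differ from the table in the last digit (for example, 155/219 rounds to 70.78%, while the table lists 70.77%), which suggests the tabulated values were truncated rather than rounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one model on the testing catchment."""
    tp: int  # landslide points predicted as landslide
    fp: int  # non-landslide points predicted as landslide
    fn: int  # landslide points predicted as non-landslide
    tn: int  # non-landslide points predicted as non-landslide

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Counts copied from Table 3 (testing phase).
TABLE3 = {
    "RF": Confusion(tp=155, fp=145, fn=64, tn=293),
    "XGBoost": Confusion(tp=159, fp=106, fn=60, tn=332),
    "DNN": Confusion(tp=184, fp=72, fn=35, tn=366),
}

# Percentages rounded to two decimals, keyed by model name.
summary = {
    name: {
        "specificity": round(100 * c.specificity(), 2),
        "sensitivity": round(100 * c.sensitivity(), 2),
        "acc": round(100 * c.accuracy(), 2),
    }
    for name, c in TABLE3.items()
}
```

Each row sums to 219 landslide points (TP + FN) and 438 non-landslide points (FP + TN), consistent with the two-to-one sampling of non-landslide pixels described in the discussion.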

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Pradhan, A.M.S.; Kim, Y.-T. Rainfall-Induced Shallow Landslide Susceptibility Mapping at Two Adjacent Catchments Using Advanced Machine Learning Algorithms. ISPRS Int. J. Geo-Inf. 2020, 9, 569. https://doi.org/10.3390/ijgi9100569
