Comparative Analysis of Tree-Based Ensemble Learning Algorithms for Landslide Susceptibility Mapping: A Case Study in Rize, Turkey

Yavuz Ozalp, Ayse; Akinci, Halil; Zeybek, Mustafa

doi:10.3390/w15142661

by Ayse Yavuz Ozalp ¹,*, Halil Akinci ¹ and Mustafa Zeybek ²

¹ Department of Geomatics Engineering, Artvin Coruh University, 08100 Artvin, Turkey

² Guneysinir Vocational School, Selcuk University, 42490 Konya, Turkey

* Author to whom correspondence should be addressed.

Water 2023, 15(14), 2661; https://doi.org/10.3390/w15142661

Submission received: 9 May 2023 / Revised: 27 June 2023 / Accepted: 20 July 2023 / Published: 22 July 2023

(This article belongs to the Special Issue Risk Analysis in Landslides and Groundwater-Related Hazards)

Abstract: The Eastern Black Sea Region is regarded as the most landslide-prone region in Turkey due to its geological, geographical, and climatic characteristics. Landslides in this region cause both fatalities and significant economic damage. The main objective of this study was to create landslide susceptibility maps (LSMs) using tree-based ensemble learning algorithms for the Ardeşen and Fındıklı districts of Rize Province, which is the second-most landslide-prone province in the Eastern Black Sea Region after Trabzon. Random Forest (RF), Gradient Boosting Machine (GBM), CatBoost, and Extreme Gradient Boosting (XGBoost) were used as tree-based machine learning algorithms, and comparing their prediction performances was the second aim of the study. For this purpose, 14 conditioning factors were used to create the LSMs: lithology, altitude, land cover, aspect, slope, slope length and steepness factor (LS-factor), plan and profile curvatures, tree cover density, topographic position index, topographic wetness index, distance to drainage, distance to roads, and distance to faults. The total data set, which includes landslide and non-landslide pixels, was split into two parts: a training data set (70%) and a validation data set (30%). The area under the receiver operating characteristic curve (AUC-ROC) method was used to evaluate the prediction performances of the models. The AUC values showed that CatBoost (AUC = 0.988) had the highest prediction performance, followed by XGBoost (AUC = 0.987), RF (AUC = 0.985), and GBM (AUC = 0.975). Although the AUC values of the models were close to each other, CatBoost performed slightly better than the other models. These results showed that the CatBoost and XGBoost models in particular can be used to reduce landslide damage in the study area.

Keywords: landslide susceptibility map; machine learning; RF; GBM; CatBoost; XGBoost

1. Introduction

Natural events and subsequent disasters constitute an important problem for Turkey and the whole world. Especially in recent years, when factors such as the rapid increase in population; increasing land need and subsequent unplanned and improper land use; global climate change; and deforestation are taken into consideration, there has been a significant increase in the number of natural events that turn into disasters, and it is predicted that this increasing trend will continue in the future [1,2,3]. The Sixth Assessment Report of the Intergovernmental Panel on Climate Change emphasized that “unplanned rapid urbanization is a significant risk where cities and settlements expand into lands prone to natural disasters such as coastal flooding or landslides” [4]. In this context, landslides are the type of disaster with the highest potential for damage and loss of life after earthquakes, considering the long-term averages worldwide [1,5]. The most effective way to deal with the threat of landslides and to reduce their negative impacts on human life, the environment, and the economy is to identify hazardous and risky areas and thus produce robust, up-to-date, and trustworthy landslide susceptibility maps (LSMs) [3]. According to Pardeshi et al. [6], if enough effort is paid to investigate trigger factors and determine high-risk areas, 90% of landslide-related losses can be managed and mitigated. The difficulty in predicting landslides makes it important to comprehend the factors that cause landslides and to identify and map areas vulnerable to landslide occurrence in the future [7,8]. LSMs include spatial rather than temporal and magnitude estimates of landslides [9] and are an important tool to prevent or mitigate disasters, as well as for environmental management and urban planning [5,8,10,11].

The methods used in the generation of LSMs can be categorized into two groups: qualitative approaches (such as Analytic Hierarchy Process—AHP), which are mostly based on expert knowledge, and quantitative approaches (such as logistic regression and the frequency ratio method), which are based on statistical theories or modeling [8,11,12]. Recently, the interest in machine learning (ML) techniques due to the developments in computer technologies has led to the use of ML algorithms in the production of LSMs and thus allowed us to obtain high-precision, more-accurate, and reliable results. As a matter of fact, Tien Bui et al. [7] reported that forecasting with high-performance models is very important in controlling landslide-prone areas. The main advantage of ML techniques over traditional statistical methods is their ability to deal with high-dimensional and complex nonlinear data sets and thus effectively solve the nonlinear nature of geographical problems [12]. The fundamental idea behind ML approaches is to investigate the functional link between existing landslides and conditioning factors [13].

During the past decade, hundreds of studies have utilized ML algorithms for landslide susceptibility (LS) evaluation. Looking at the previous studies using ML algorithms, it is seen that some studies produced LSMs with a single ML algorithm [14,15,16,17], while some studies produced LSMs using multiple ML algorithms together [18,19,20] and compared their performances. In these studies, algorithms such as Support Vector Machines (SVM) [21,22], K-Nearest Neighbor (KNN) [23,24], Naïve Bayes (NB) [25,26], Artificial Neural Network (ANN) [27,28], Multilayer Perceptron (MLP) [7,29], Classification and Regression Tree (CART) [30,31], Random Forest (RF) [16,32], Adaptive Boosting (AdaBoost) [33,34], Gradient Boosting Machine (GBM) [28,35], Light Gradient Boosting Machine (LightGBM) [36,37], Natural Gradient Boosting (NGBoost) [3], Extreme Gradient Boosting (XGBoost) [17] and categorical boosting (CatBoost) [36,38] are frequently used. Algorithms such as RF, GBM, LightGBM, AdaBoost, NGBoost, CatBoost, and XGBoost that use ensemble methods such as bagging, stacking, or boosting are called tree-based ensemble learning algorithms [36,39].

When previous studies in the literature are examined, it is seen that ML algorithms produce successful results in the production of LSMs; however, the performance of the algorithms varies depending on different topographies, geological formation, climate, and geographies, and there is disagreement over the ideal paradigm for LS mapping [7,11,40,41]. Therefore, it is very important to test different ML algorithms in different geographies and compare their performances in order to contribute to the LS literature. On the other hand, Merghadi et al. [23] noticed that tree-based ensemble learning algorithms outperform other ML algorithms. For example, in the LS mapping study conducted by Akinci and Zeybek [19], logistic regression (LR), SVM, and RF algorithms were compared, and it was determined that the RF algorithm gave better prediction results than other algorithms. Merghadi et al. [42] examined five landslide susceptibility models, including ANN, LR, RF, SVM, and GBM, and determined that GBM had the best predicting performance, followed by RF. In the study carried out by Sahin [35], the performances of GBM, RF, and XGBoost algorithms were compared. The researcher concluded that XGBoost is the optimum model when compared to other ensemble models.

Deep learning (DL) is a current trending approach in LS mapping in addition to ML. Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Deep Residual Networks (ResNets), and Recurrent Neural Networks (RNNs) have all been employed successfully in LS mapping applications [29,43,44,45,46]. It is possible to find LS mapping research that claims DL methods beat ML algorithms like ANN, MLP, SVM, and RF [7,43,47,48]. There has not been much research comparing DL algorithms to advanced ML algorithms like AdaBoost, CatBoost, LightGBM, and XGBoost. According to Lv et al. [45], boosting and stacking learning models outperform deep learning models such as DBN, CNN, and ResNet. This example suggests that DL algorithms do not always outperform ensemble learning algorithms in LS mapping, and the debate over the best algorithm remains open.

Trabzon and Rize are the two provinces in Turkey with the highest number of landslides. These two provinces account for around 13% of all landslides in Turkey. Therefore, the first aim of this study is to produce LSMs of the Ardeşen and Fındıklı districts of Rize Province by utilizing tree-based ensemble learning algorithms including RF, GBM, CatBoost, and XGBoost. The second aim of this research is to compare the predictive capabilities of these models. These two districts were selected as the study area for the following reasons: (i) no previous LS mapping study has been conducted with ML algorithms in the study area, (ii) the study area is one of the most landslide-prone regions in Turkey, (iii) researchers reported that the landslide risk remains in these districts, and there is a need for accurate and reliable LSMs [49,50], (iv) the slope in the region is high and shows sudden changes, (v) Rize is the province with the highest rainfall in Turkey and its geological and geomorphological characteristics as well as its climatic characteristics are suitable for landslide formation. While traditional ML algorithms are widely used in LS mapping applications, there have been few studies that examine the performance of ensemble learning algorithms. As stated by Yu et al. [34], more studies are needed to compare the performances of ensemble learning algorithms. To the best of our knowledge, no study has compared the performance of the four algorithms employed in this study when used simultaneously. As a result of this circumstance, our study will be able to contribute to the LS mapping literature.

2. Materials and Methods

2.1. Study Area

The Ardeşen and Fındıklı districts of Rize Province, in the eastern Black Sea Region, are the focus of this research (Figure 1). The overall area of the two districts, lying between 40°57′37.42″–41°19′37.57″ north latitude and 40°57′41.5″–41°23′44.21″ east longitude, is 76,001.20 ha. In the research area, where the average elevation is 1169.25 m, the elevation varies between 0 and 3497.38 m. The slope in the study area, a very hilly topography, is between 0° and 75.82°. The slope is below 10° in 5.83% of the study area, between 10° and 20° in 14.51%, and above 20° in 79.66%.

Rize has cool summers, mild winters, and a rainy climate in all seasons. According to the climate classification of Thornthwaite [51], Rize has a very humid, second-degree mesothermal climate, no or very little water deficit, a summer evaporation rate of 50.4%, and a climate type close to the ocean climate (A,B’2,r,b’4) [52]. Average temperatures in Rize vary between 6.8 °C and 23.3 °C. Based on measurements made by the General Directorate of Meteorology (GDM) from 1928 to 2022, the lowest average temperature is 3.7 °C in February and the highest average temperature is 26.5 °C in August. In Rize, where the average sunshine duration is 4.2 h, the average annual total precipitation is 2302 mm [53]. According to researchers [49,54,55], excessive and intense precipitation is the primary cause of landslides in the Ardeşen and Fındıklı districts, as well as in Rize in general.

According to the CORINE 2018 land cover data set (Copernicus Land Monitoring Service, European Environment Agency, EU), artificial surfaces cover 0.96% of the study area, agricultural areas cover 21.12%, forest covers 59.27%, natural grassland covers 8.07%, transitional woodland/shrub covers 2.02%, bare rocks cover 4.94%, sparsely vegetated areas cover 2.75%, and water bodies cover 0.87%. According to Karsli et al. [54], tea gardens cover 90% of the agricultural land in Ardeşen. Tea cultivation is the primary source of income for residents of the Ardeşen and Fındıklı districts. Yalcin [50] stated that 61.09% of the landslides in Ardeşen district occurred in tea gardens. Similarly, Akgun et al. [49] stated that 48.78% of the landslides in Fındıklı occurred in residential and agricultural areas.

Figure 2 shows the geological map of the study area. According to this 1:100,000 scale map, which was obtained from the General Directorate of Mineral Research and Exploration (GDMRE), there are nine lithological units in the research area. The Hamurkesen Formation (Jh), which consists of Triassic-aged basaltic, andesitic, dacitic lavas and pyroclastics and rock types such as sandstone, marl, limestone and shale, is the oldest unit in the research area. The Santonian-aged Kızılkaya Formation (Kk) consists of dacite, rhyolite, rhyolodacitic lavas and pyroclastics; the Campanian-Maastrichtian-aged Çağlayan Formation (Kça) consists of basaltic, andesitic lavas and pyroclastics and limestone, mudstone, sandstone, marl and tuff interlayers; and the Maastrichtian-aged Çayırbağ Formation (Kçb) consists of rhyolite, rhyodacite, dacitic lavas and pyroclastics. These volcanic units are overlain by the Maastrichtian-Danian (Early Palaeocene)-aged Cankurtaran Formation (KTc) consisting of sandy limestone, micritic limestone, tuff, marl and volcanic sandstone. These units were cut by the Kaçkar granitoid-I (Kk1), which is composed of acidic and basic intrusive rocks that continued its development during the Late Cretaceous and completed its intrusion at the end of the Palaeocene. The Early-Middle Miocene-aged Pazar Formation (Tmp) consists of a sandstone, marl, pebble and claystone succession. The Plio-Quaternary-aged Hamidiye Formation (plQh), consisting of terrestrial conglomerate with sand and clay lenses, and Quaternary-aged alluvium (Qal) are the youngest units in the study area [56].

2.2. Landslide Inventory Map

In ML-based landslide susceptibility assessment, landslide inventory maps (LIMs) are needed for the training and validation of the models. LIMs are maps that provide information about the spatial distribution, activity, type, and, if known, the time of occurrence of landslides in a certain region [57]. The GDMRE provided digital landslide inventory data for the research area. There are 76 landslide polygons in this 1:25,000 scale data set. The landslide polygons range from 0.74 ha to 54.67 ha in size and cover a total area of 621.06 ha. According to GDMRE, 16 of the landslides are classified as inactive landslides, 30 as active landslides, and the remaining 30 as active flow. According to the classification of Varnes [58], landslides in the study area are mostly of the flow and rotational slide type. The main factor triggering the occurrence of landslides in Rize Province is excessive and heavy rainfall [55,59]. The other factor causing landslides is the significant weathering of lithological units. Yalcin [59] found that about 95% of the landslides in Ardeşen occur in rocks that have been highly and fully weathered. Landslide polygons in the study area are represented by 62,089 pixels at 10 m spatial resolution. In line with previous studies in the literature, landslide pixels were divided into two parts: a training data set (70%) and a validation/test data set (30%) [28,46,60,61,62].
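The 70/30 partition described above can be sketched as follows. This is a minimal illustration only: the text does not state how pixels were sampled, so the random shuffle, the fixed seed, and the integer pixel identifiers are all assumptions.

```python
import random


def split_pixels(pixel_ids, train_fraction=0.7, seed=42):
    """Shuffle pixel identifiers and split them into training and validation sets.

    Random sampling and the seed value are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(pixel_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 62,089 landslide pixels are reported for the study area; the IDs are placeholders.
landslide_pixels = range(62_089)
train, validation = split_pixels(landslide_pixels)
print(len(train), len(validation))
```

With these assumptions, roughly 43,462 pixels go to training and the remaining 18,627 to validation.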

2.3. Data Preparation for Landslide Conditioning Factors

The literature, the study area’s local characteristics, and the availability of spatial data were all taken into account while deciding on the conditioning factors employed in the study. Although there is much research in the literature examining the relationship between landslide factors and landslides in different geographies, there is no universal guideline for the selection of conditioning factors. A certain factor may play an important role in the occurrence of landslides in one region, while it may not be important in another region. As a result, the study area’s characteristics and accessible data should be considered while determining the conditioning factors [5,16,63,64].

Therefore, in this study, 15 conditioning factors affecting landslide formation were identified by considering topographical, hydrological, geological, and some anthropogenic effects. These factors include lithology, altitude, land cover, aspect, slope, slope length and steepness factor (LS-factor), plan and profile curvatures, tree cover density (TCD), topographic position index (TPI), topographic ruggedness index (TRI), topographic wetness index (TWI), distance to drainage, distance to roads, and distance to faults (Figures 2–5). TRI was excluded based on multicollinearity analysis, and the remaining 14 factors were used in the ML-based LS mapping models. Raster-based factor maps with a spatial resolution of 10 m were produced with ArcGIS 10.5 and SAGA GIS 7.8.2 software.

Since the role and importance of conditioning factors in the formation of landslides have been explained in detail in the literature [16,21,38,65,66], only the data sources and characteristics of the conditioning factors used in this article are briefly presented in Table 1.

2.4. Multicollinearity Analysis

In ML-based LS mapping studies, verifying the independence of factors from each other and determining the proper factors for the model is usually performed by multicollinearity analysis [32,46]. Multicollinearity problems may cause misleading results and the misinterpretation of model results by reducing the prediction accuracy of the applied model [3,85]. With multicollinearity analysis, factors that are highly correlated and have a similar effect on landslide occurrence are identified and removed from the model. The most commonly used statistical indicators to determine multicollinearity are variance inflation factor (VIF) and tolerance (TOL) [26,86,87]. Using the following Equation (1) [26,39,85] and Equation (2) [26,43,65], TOL and VIF are calculated.

\mathrm{TOL} = 1 - R_{j}^{2} \quad (1)

\mathrm{VIF} = \frac{1}{1 - R_{j}^{2}} = \frac{1}{\mathrm{TOL}} \quad (2)

where R_j^2 is the coefficient of determination obtained by regressing the j-th conditioning factor on the remaining factors. Multicollinear factors should be excluded from the model if their VIF is larger than 10 or their TOL is below 0.1 [39,43,65,88].
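The screening rule in Equations (1) and (2) can be sketched in a few lines. The function names and the example R_j^2 values below are hypothetical; only the formulas and the thresholds (VIF > 10, TOL < 0.1) come from the text.

```python
def tolerance(r_squared):
    """TOL = 1 - R_j^2 (Equation 1)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R_j^2 must lie in [0, 1)")
    return 1.0 - r_squared


def vif(r_squared):
    """VIF = 1 / TOL (Equation 2)."""
    return 1.0 / tolerance(r_squared)


def is_multicollinear(r_squared, vif_limit=10.0, tol_limit=0.1):
    """Flag a factor when VIF exceeds 10 or TOL falls below 0.1."""
    return vif(r_squared) > vif_limit or tolerance(r_squared) < tol_limit


# Hypothetical R_j^2 values, for illustration only.
for name, r2 in {"factor_a": 0.50, "factor_b": 0.95}.items():
    print(name, round(tolerance(r2), 3), round(vif(r2), 3), is_multicollinear(r2))
```

Because VIF is the reciprocal of TOL, the two thresholds flag the same factors; reporting both is a convention rather than two independent tests.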

2.5. Model Validation

The last step of LS mapping studies is the validation of the models used in the study [89]. Wei et al. [39] stated that the effectiveness and reliability of the models cannot be evaluated scientifically without validation. The most commonly used validation method in LS mapping studies is the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC-ROC for short) [28,29,32,60,62]. The ROC curve displays the true positive rate (TPR) on the Y axis and the false positive rate (FPR) on the X axis. Its AUC value ranges from 0.5 to 1, and a value near 1 denotes outstanding model performance [65]. According to Chen et al. [31], Jiao et al. [90], and Wu et al. [91], the AUC value is generally classified into five classes: poor (0.5–0.6), average (0.6–0.7), good (0.7–0.8), very good (0.8–0.9), and excellent (0.9–1). As in previous studies in the literature, success rate and prediction rate curves are used in this study to evaluate the performance of landslide susceptibility algorithms [22,92,93]. While the success rate curve is produced using the training data set, the prediction rate curve is produced using the validation data set [13,16].
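To make the AUC measure concrete, the sketch below computes it as the probability that a randomly chosen landslide pixel receives a higher score than a randomly chosen non-landslide pixel, which is equivalent to the area under the ROC curve. The labels and scores are made up for illustration, and the handling of class boundaries (a value of exactly 0.9 counted as excellent) is an assumption, since the quoted ranges share their end points.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_class(auc):
    """Map an AUC value to the five-class scale quoted in the text."""
    for lower, label in [(0.9, "excellent"), (0.8, "very good"),
                         (0.7, "good"), (0.6, "average"), (0.5, "poor")]:
        if auc >= lower:
            return label
    return "below chance"  # not part of the quoted scale


# Made-up example: 1 = landslide pixel, 0 = non-landslide pixel.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3), auc_class(auc))
```

The pairwise loop is quadratic in the number of pixels, so it is suited to small illustrations; rank-based formulations are the usual choice for full raster data sets.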

2.6. Machine Learning Methods

2.6.1. Random Forest (RF)

One of the well-known tree-based machine learning algorithms used for classification, regression, and clustering issues is RF, developed by Breiman [94]. RF, which is essentially a collection of decision trees, is an ensemble learning method that combines the results or predictions of decision trees in the forest to make an accurate and stable prediction. The RF algorithm uses majority voting for classification and averaging for regression [95]. Combining the predictions from multiple decision trees in this way both reduces the variance of the model and greatly improves the performance of the model. RF has two hyperparameters: the number of trees (ntree) and the number of randomly selected features or variables (mtry) to train each decision tree in the forest [35,39]. RF has been widely and successfully used in susceptibility mapping studies for different types of natural disasters [84,95,96]. In this study, the “rf” method of the caret library [97] in R 4.2.2 software was used to implement the RF algorithm. For hyperparameter optimization, the tuneLength approach was applied and the value of the ntree parameter was set to 100 (same number of trees used as GBM’s and XGBoost’s default number of trees) and the value of the mtry parameter was set to 11.
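The two aggregation rules mentioned above, majority voting for classification and averaging for regression, can be illustrated with a short sketch. The per-tree outputs are invented, and the tie-breaking behaviour (first label seen wins) is an assumption of this example rather than a property of any particular RF implementation.

```python
from collections import Counter
from statistics import mean


def majority_vote(tree_predictions):
    """Return the class label predicted by most trees (first seen wins a tie)."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average_prediction(tree_predictions):
    """Return the mean of the per-tree numeric predictions."""
    return mean(tree_predictions)


# Invented outputs of five trees for one pixel: 1 = landslide, 0 = non-landslide.
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes))          # class chosen by the forest
print(sum(votes) / len(votes))       # share of trees voting "landslide"
```

The share of trees voting for the landslide class is one common way to turn the votes into a continuous susceptibility index.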

2.6.2. Gradient Boosting Machine (GBM)

GBM, proposed by Friedman [98], is an ensemble learning method that combines multiple weak learners in a sequential manner to form a strong learner. Unlike RF, which builds trees independently and in parallel, GBM builds trees sequentially, and each tree improves the model’s performance by reducing the errors of the preceding trees [11,29]. GBM has four main hyperparameters: number of trees (n.trees), learning rate (shrinkage), maximum tree depth (interaction.depth) and the minimum number of observations in terminal nodes (n.minobsinnode). In this study, the number of trees was set to 100 in order to make an objective comparison with the XGBoost algorithm, which was run with default values. The n.minobsinnode parameter was set to the default value (10). Grid search and random search are two strategies for optimizing hyperparameters in ML algorithms [18,34,36,39]. Here, the grid search approach was used to determine the values of the interaction.depth and shrinkage parameters, which were found to be 8 and 0.3, respectively. In this study, the “caret” library [97] and the “gbm” method of this library were used for landslide susceptibility modeling with GBM.
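The sequential idea can be sketched with one-split trees (stumps) on a single feature. This is a toy illustration under stated assumptions: it uses squared-error loss and made-up data, whereas the study is a classification task run through the “gbm” method of caret. Only the defaults n_trees = 100 and shrinkage = 0.3 mirror values given in the text.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:      # keep the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=100, shrinkage=0.3):
    """Fit stumps one after another, each on the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + shrinkage * stump(x) for x, p in zip(xs, predictions)]
    return lambda x: base + shrinkage * sum(s(x) for s in stumps)


# Made-up one-feature data set with a step between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
one_tree = boost(xs, ys, n_trees=1)
full_model = boost(xs, ys)
print(round(one_tree(5), 3), round(full_model(5), 3))
```

Each round removes only a fraction (the shrinkage) of the remaining error, which is why a single tree leaves a visible gap that later trees close.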

2.6.3. Extreme Gradient Boosting (XGBoost)

Extreme Gradient Boosting, or XGBoost for short, introduced by Chen and Guestrin [99], is a supervised machine learning algorithm for regression and classification problems. Designed to achieve faster and more-accurate predictions, XGBoost is an optimized version of GBM [3,17]. The main idea of the XGBoost algorithm is to transform several weak learners into stronger learners through multiple iterations to achieve better prediction performance [17,39]. In addition to a regularized objective, XGBoost uses two further techniques, “shrinkage” and “column (feature) subsampling”, to limit overfitting [99]. XGBoost exposes the following hyperparameters: nrounds (maximum number of iterations, i.e., number of trees), max_depth (maximum depth of the trees), eta (learning rate), gamma (regularization parameter), colsample_bytree (fraction of features supplied to a tree), min_child_weight (minimum sum of instance weight needed in a child) and subsample (fraction of observations supplied to a tree). These hyperparameters affect the accuracy, performance, and speed of the model. In this study, default values of the hyperparameters (nrounds = 100, max_depth = 6, eta = 0.3, gamma = 0, colsample_bytree = 1, min_child_weight = 1, subsample = 1) were used. The “xgbTree” method from the “caret” library was used to implement XGBoost for this research [97].
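The effect of the two sampling hyperparameters can be illustrated with a small sketch. The parameter values are those listed above; the helper function, the random draw, and the shortened feature list are hypothetical and do not reproduce XGBoost's internal sampling.

```python
import random

# Hyperparameter values as listed in the text for the XGBoost run.
xgb_params = {
    "nrounds": 100, "max_depth": 6, "eta": 0.3, "gamma": 0,
    "colsample_bytree": 1, "min_child_weight": 1, "subsample": 1,
}


def subsample_for_tree(n_rows, feature_names, params, rng):
    """Pick the rows and columns a single tree would see (illustrative helper)."""
    n_keep_rows = max(1, int(n_rows * params["subsample"]))
    n_keep_cols = max(1, int(len(feature_names) * params["colsample_bytree"]))
    rows = sorted(rng.sample(range(n_rows), n_keep_rows))
    cols = sorted(rng.sample(feature_names, n_keep_cols))
    return rows, cols


# Four of the conditioning factors, used here as a shortened example list.
features = ["slope", "altitude", "lithology", "distance_to_faults"]

rows, cols = subsample_for_tree(10, features, xgb_params, random.Random(0))
print(len(rows), cols)   # with both values at 1, every row and feature is kept

halved = dict(xgb_params, subsample=0.5, colsample_bytree=0.5)  # hypothetical setting
rows_half, cols_half = subsample_for_tree(10, features, halved, random.Random(0))
print(len(rows_half), len(cols_half))
```

With both values left at 1, as in this study, no subsampling takes place, so any regularizing effect comes from the learning rate and the tree constraints.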

2.6.4. Categorical Boosting (CatBoost)

CatBoost, a gradient boosting algorithm based on binary decision trees that was first introduced in 2017, uses ordered boosting to achieve high prediction accuracy [100]. Features such as GPU support for fast training, high learning speed, better handling of categorical features, visualization tools, overcoming gradient bias, and producing symmetric oblivious trees make CatBoost different from other gradient boosting algorithms [38,101]. Kang et al. [102] reported that CatBoost responds to the overfitting problem better than other gradient boosting algorithms. In this study, the “catboost” library [103] in R 4.2.2 software was used to implement the CatBoost algorithm. There are six commonly used hyperparameters in CatBoost: depth, learning_rate, iterations, l2_leaf_reg, rsm, and border_count. The iterations parameter, which represents the maximum number of trees, was set to 100 to be compatible with the other models. The values of the other hyperparameters were determined using the grid search method, which yielded depth = 8, learning_rate = 0.3, l2_leaf_reg = 0.1, rsm = 0.95, and border_count = 16.
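The symmetric oblivious trees mentioned above apply the same test to every node at a given depth, so a prediction reduces to building a leaf index one bit per level. The sketch below shows only that lookup; the features, thresholds, and leaf values are hypothetical and are not taken from the fitted model.

```python
def oblivious_tree_predict(sample, splits, leaf_values):
    """Evaluate a symmetric tree: one (feature, threshold) test per depth level."""
    if len(leaf_values) != 2 ** len(splits):
        raise ValueError("an oblivious tree of depth d has 2**d leaves")
    index = 0
    for feature, threshold in splits:
        # Each level contributes one bit to the leaf index.
        index = (index << 1) | (1 if sample[feature] > threshold else 0)
    return leaf_values[index]


# Hypothetical depth-2 tree; a depth of 8 as in the study would have 256 leaves.
splits = [("slope", 20.0), ("distance_to_faults", 500.0)]
leaf_values = [0.10, 0.05, 0.80, 0.40]

steep_near_fault = {"slope": 35.0, "distance_to_faults": 120.0}
gentle_far_from_fault = {"slope": 5.0, "distance_to_faults": 900.0}
print(oblivious_tree_predict(steep_near_fault, splits, leaf_values))
print(oblivious_tree_predict(gentle_far_from_fault, splits, leaf_values))
```

Because every sample follows the same sequence of tests, evaluation needs no branching on tree structure, which is one reason this tree shape is fast to apply.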

3. Results and Discussion

3.1. Multicollinearity Analysis of Conditioning Factors

Multicollinearity analysis results for the conditioning factors used in this investigation are presented in Table 2. The preliminary analysis showed a high correlation between slope and TRI. Therefore, TRI was removed from the model, and the multicollinearity analysis was repeated with the remaining 14 factors. The final results, summarized in Table 2, show that the highest VIF value was 3.61071, well below the threshold of 10, so no multicollinearity was found among these factors. Therefore, the 14 factors in Table 2 were used to produce LSMs of the study area.
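As a quick check, the highest reported VIF can be converted back into TOL and R_j^2 using Equations (1) and (2). This is a worked illustration derived from the single value quoted above, not a figure read from Table 2:

```latex
\mathrm{TOL} = \frac{1}{\mathrm{VIF}} = \frac{1}{3.61071} \approx 0.277,
\qquad
R_{j}^{2} = 1 - \mathrm{TOL} \approx 0.723
```

Both values lie on the acceptable side of the thresholds given in Section 2.4 (VIF below 10, TOL above 0.1).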

3.2. Landslide Susceptibility Maps

The landslide susceptibility index values calculated by the RF, GBM, CatBoost, and XGBoost algorithms were categorized into five classes (very low, low, moderate, high and very high) using the “natural breaks (Jenks)” algorithm [104], and LSMs of the study area were obtained (Figure 6). This choice follows current practice in the literature, where landslide susceptibility index values are mostly classified with the natural breaks (Jenks) algorithm [28,38,39,69].
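The natural breaks idea, choosing class limits that minimise the variance within classes, can be sketched with a small dynamic program. This is an illustrative implementation on made-up index values, not the routine used by the GIS software in the study; its cost grows with the square of the number of values, so it is only suitable for small samples.

```python
from bisect import bisect_left

LABELS = ["very low", "low", "moderate", "high", "very high"]


def jenks_breaks(values, n_classes):
    """Upper bound of each class, minimising the within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need between 1 and len(values) classes")
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Squared deviation from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return prefix_sq[j] - prefix_sq[i] - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = cost[k - 1][i] + ssd(i, j)
                if candidate < cost[k][j]:
                    cost[k][j], cut[k][j] = candidate, i
    breaks, j = [], n
    for k in range(n_classes, 0, -1):   # walk back through the stored cut points
        breaks.append(data[j - 1])
        j = cut[k][j]
    return breaks[::-1]


def classify(value, breaks):
    """Assign a susceptibility label using the class upper bounds."""
    return LABELS[min(bisect_left(breaks, value), len(LABELS) - 1)]


# Made-up susceptibility index values forming five loose groups.
index_values = [0.01, 0.02, 0.03, 0.20, 0.22, 0.45, 0.47, 0.70, 0.72, 0.95, 0.97]
breaks = jenks_breaks(index_values, 5)
print(breaks, classify(0.5, breaks))
```

Because the breaks depend on the distribution of each model's index values, the same index value can fall into different classes in different maps.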

In order to compare the produced LSMs, the areal distributions of landslide susceptibility classes were calculated. In this context, the areal distributions of the landslide susceptibility classes of the four models in percentage are given in Figure 7. It was determined according to the LSM produced by the RF model that 62.17% of the study area is very low, 19.27% is low, 9.55% is moderate, 5.45% is high and 3.56% is very high in terms of susceptibility to landslides. In the LSM produced according to the GBM model, the proportion of areas susceptible to very low, low, moderate, high, and very high degrees was determined as 20.76%, 40.48%, 19.85%, 12.06% and 6.85%, respectively. In the CatBoost model, it was calculated that 13.40% of the study area was very low, 52.48% was low, 18.47% was moderate, 10.29% was high, and 5.36% was very high in terms of landslide susceptibility. In the LSM produced according to the XGBoost model, the ratio of very low, low, moderate, high and very high landslide susceptibility areas was calculated as 14.08%, 53.77%, 17.53%, 9.36% and 5.26%, respectively. It is clearly seen in Figure 7 that there are significant differences in the distribution of landslide susceptibility classes. It is thought that this difference in susceptibility classes is due to the differences in the mechanisms or approaches used by the models while creating the decision trees.

In the study area, the proportion of areas with very high landslide susceptibility varies between 3.56% and 6.85%. Considering the sum of the areas susceptible to high and very high landslides, it was seen that GBM ranks first with a rate of 18.91%, followed by CatBoost (15.65%), XGBoost (14.62%) and RF (9.01%). It was determined in the LS mapping study carried out by Akgun et al. [49] using the likelihood ratio model in the Fındıklı region that the ratio of areas with very high landslide susceptibility was 2.88%.
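The combined shares quoted above follow directly from the class percentages reported for Figure 7. The short script below reproduces the arithmetic and the resulting ranking; the dictionary layout and variable names are choices made for this sketch.

```python
# Areal share (%) of each class per model, in the order
# very low, low, moderate, high, very high (values as reported in the text).
area_percent = {
    "RF": [62.17, 19.27, 9.55, 5.45, 3.56],
    "GBM": [20.76, 40.48, 19.85, 12.06, 6.85],
    "CatBoost": [13.40, 52.48, 18.47, 10.29, 5.36],
    "XGBoost": [14.08, 53.77, 17.53, 9.36, 5.26],
}

# Sum of the high and very high classes, rounded to two decimals.
high_plus_very_high = {
    model: round(shares[3] + shares[4], 2) for model, shares in area_percent.items()
}
ranking = sorted(high_plus_very_high, key=high_plus_very_high.get, reverse=True)

for model in ranking:
    print(model, high_plus_very_high[model])
```

Each model's five shares also add up to 100% within rounding, which is a useful consistency check on the reported figures.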

When the LS maps were visually examined, it was seen that areas with high and very high landslide susceptibility were located in the north and northwest parts of the study area, while low and very low landslide-susceptible areas were located in the south and southeast of the study area. The fact that the settlement and tea farming areas were mostly located in the northern parts close to the Black Sea coast has caused the landslide-susceptible areas to intensify in the north.

An accurate LS map should be able to accurately classify the existing landslides in the study area. When overlapping the LS maps produced in the study with the landslide inventory map, the distribution of landslide pixels according to the susceptibility classes in the LS maps was determined. The landslide pixel ratios coinciding with high and very high landslide-susceptible areas were determined as 99.64%, 99.5%, 98.38%, and 96.42% for the CatBoost, XGBoost, GBM, and RF models, respectively (Table 3). This evaluation revealed that the LS map produced with CatBoost was slightly more accurate than other models.
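The overlay described above amounts to counting, for each susceptibility class, how many inventory pixels fall into it. A minimal sketch on two tiny made-up rasters, flattened to lists, is given below; the data and the function name are hypothetical.

```python
from collections import Counter

CLASSES = ["very low", "low", "moderate", "high", "very high"]


def landslide_share_by_class(class_map, inventory):
    """Percentage of landslide pixels that fall in each susceptibility class."""
    counts = Counter(c for c, is_slide in zip(class_map, inventory) if is_slide)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the inventory contains no landslide pixels")
    return {name: 100.0 * counts.get(i, 0) / total for i, name in enumerate(CLASSES)}


# Made-up example: class index per pixel and landslide flag per pixel.
class_map = [0, 1, 4, 4, 3, 2, 4, 3, 0, 4]
inventory = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]

shares = landslide_share_by_class(class_map, inventory)
print(shares)
print(shares["high"] + shares["very high"])   # share in the two highest classes
```

The last line corresponds to the kind of figure reported in the text, for example 99.64% for CatBoost.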

3.3. Landslide Susceptibility Map Rationality

The rationality of the landslide susceptibility maps was evaluated using the data in Table 3. As stated by Guo et al. [105], the rationality of the susceptibility map can be assessed as follows: “(i) landslides should be located in areas of high susceptibility as much as possible; (ii) areas of very high susceptibility in the susceptibility map should be as small as possible”.

As a whole, the landslide susceptibility maps derived from the ML models displayed a similar trend: as susceptibility increased, the frequency ratios tended to increase as well. In all models, the class with the highest susceptibility was found to have the highest landslide incidence rate. The data in Table 3 demonstrate that the produced landslide susceptibility maps were plausible and that the percentage of landslide occurrence increased from regions with very low susceptibility to regions with very high susceptibility. Among the four models, the landslide susceptibility maps derived from CatBoost and XGBoost appear to be more reasonable than those derived from the other models. Considering the first criterion stated by Guo et al. [105], the fact that 99.64% of the landslides in CatBoost are in the high and very high susceptibility classes propels this model to the forefront. Regarding the second criterion, XGBoost comes out ahead of CatBoost because it has the smaller share of very high susceptibility area (5.26% versus 5.36%), although RF has the smallest share overall (3.56%). Nevertheless, the differences between CatBoost and XGBoost on both criteria are relatively small, indicating that these models are reasonable, applicable, and rational for the study area.
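The two rationality criteria can be checked side by side using the figures quoted in Section 3.2. The sketch below simply ranks the models on each criterion; it is an illustration, not the evaluation procedure of Guo et al. [105]. Note that on the second criterion alone, the quoted figures place RF first, with XGBoost ahead of CatBoost among the two best-performing models.

```python
# Criterion (i): share (%) of landslide pixels in the high and very high classes.
landslides_in_high_classes = {"CatBoost": 99.64, "XGBoost": 99.5, "GBM": 98.38, "RF": 96.42}

# Criterion (ii): share (%) of the study area mapped as very high susceptibility.
very_high_area = {"RF": 3.56, "GBM": 6.85, "CatBoost": 5.36, "XGBoost": 5.26}

# Higher is better for criterion (i); lower is better for criterion (ii).
by_criterion_1 = sorted(landslides_in_high_classes,
                        key=landslides_in_high_classes.get, reverse=True)
by_criterion_2 = sorted(very_high_area, key=very_high_area.get)

print("criterion (i): ", by_criterion_1)
print("criterion (ii):", by_criterion_2)
```

Reading the two rankings together shows the trade-off: a model that maps less area as very high susceptibility also tends to capture fewer of the known landslides in its top classes.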

3.4. Landslide Conditioning Factors Analysis

One of the most important analyses in landslide susceptibility assessment with ML models is the determination of the importance of conditioning factors used in the models [11]. In LS modeling, there may be some conditioning factors that have little or no effect on landslide formation [89]. For this study region, the contributions or importance levels of conditioning factors to the modeling are displayed in Figure 8. In this study, the caret library was used in R to implement LS models. The caret library provides functions that report the importance of variables in the training data. Figure 8 was produced using the varImp function, which is one of the built-in functions in the caret package. This function displays the importance of variables by scaling them to a maximum value of 100 unless the scale argument is set as “false” [97]. While the most important factor in RF, GBM, and XGBoost models is slope, the most important factor in the CatBoost model is distance to faults. In the study conducted by Kasahara et al. [106] in the Güneysu district of Rize, where the effects of land use on landslides were investigated, it was determined that the probability of landslides occurring in tea gardens in Rize was higher than in forest areas. In addition, it was determined that the probability of landslides occurring on high slopes (where the slope is between 30°–40°) in tea farming areas is 3.5 to 9.1 times higher than in forested areas. In the rainfall-induced LS mapping study conducted by Ye et al. [38] in the Fujian Province of China, the importance of conditioning factors was determined using the CatBoost algorithm, and distance to faults was determined as the most important factor affecting landslides among all factors. Figure 8 shows that slope, distance to faults, and altitude are the three most important factors for the GBM, CatBoost, and XGBoost models. On the other hand, plan and profile curvatures were found to be the least-important factors for all models. 
In the LS mapping study conducted by Youssef and Pourghasemi [66] in the Abha basin of Saudi Arabia, the researchers found that plan and profile curvature and land use/land cover factors were the least-effective factors. Similarly, in the study conducted by Kavzoglu and Teke [3] in Macka district of Trabzon Province in Turkey, elevation and slope were found to be the most effective factors, while plan curvature and NDVI were found to be the least-effective factors. Shahzad et al. [11] attributed the different maps produced by landslide susceptibility models to the different contributions of the factors in the models.
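The rescaling of importance scores to a maximum of 100, as described above for varImp, can be illustrated with a minimal Python sketch. This is not the caret implementation: the function names, the `shift_min` option (which also moves the smallest score to zero), and the raw scores are assumptions made purely for illustration.

```python
def scale_importance(raw, shift_min=True):
    """Rescale raw importance scores so that the largest becomes 100.

    shift_min=True also maps the smallest score to 0 (min-max scaling);
    shift_min=False only divides by the maximum. Which variant a given
    library uses is an assumption here, not a statement about caret.
    """
    values = list(raw.values())
    low = min(values) if shift_min else 0.0
    span = max(values) - low
    if span == 0:
        # All scores are identical (or all zero): no ranking information.
        return {name: 0.0 for name in raw}
    return {name: 100.0 * (value - low) / span for name, value in raw.items()}


def rank_factors(scaled):
    """Return factor names ordered from most to least important."""
    return sorted(scaled, key=scaled.get, reverse=True)


# Hypothetical raw scores, invented for illustration (not values from Figure 8).
raw_scores = {
    "slope": 0.5,
    "distance_to_faults": 0.375,
    "altitude": 0.25,
    "plan_curvature": 0.125,
}

scaled = scale_importance(raw_scores, shift_min=False)
ranking = rank_factors(scaled)
```

Because each model's scores are rescaled independently, the scaled values are suited to ranking factors within one model rather than to comparing magnitudes across models.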

3.5. Model Validation and Comparison

Landslide susceptibility maps produced using different ML models need to be validated before they can be accepted by the scientific community and used by the authorities in mitigation studies. Many studies in the literature have emphasized that unvalidated LSMs carry no scientific value [83,107,108]. In this study, the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) were used to evaluate the models and compare their performance. Success rate and prediction rate curves were generated using the training and test data sets, respectively. The success rate curve shows how well the landslide susceptibility models classify the existing landslide areas in the training data set, while the prediction rate curve shows how well the models can predict unknown or future landslides [109,110,111]. According to the success rate curves shown in Figure 9, the AUC value of GBM is 0.977 and that of RF is 0.989. The CatBoost and XGBoost models have the highest AUC value, 0.99.
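The distinction between the success rate and the prediction rate can be illustrated with a short Python sketch that computes AUC from its rank-based definition: the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample. The function name, labels, and scores below are hypothetical and are not taken from the study's data or its R workflow.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for landslide samples, 0 for non-landslide samples.
    scores: predicted susceptibility for each sample.
    Tied positive/negative pairs count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs, invented for illustration only.
train_labels = [1, 1, 1, 0, 0, 0]
train_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
test_labels = [1, 1, 0, 0]
test_scores = [0.9, 0.4, 0.5, 0.1]

# Success rate: AUC on the samples the model was trained on.
success_auc = auc_roc(train_labels, train_scores)
# Prediction rate: AUC on held-out samples the model has not seen.
prediction_auc = auc_roc(test_labels, test_scores)
```

The pairwise loop is quadratic in the number of samples, so it suits small illustrations; for full raster data sets a sorted, rank-based computation or an established library routine would be the practical choice.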

Considering the AUC values of the prediction rate curves, all models showed excellent performance for the study region (Figure 10). However, the CatBoost algorithm, with an AUC value of 0.988, performs slightly better than the other models. Dorogush et al. [112] and Prokhorenkova et al. [100] also state that CatBoost outperforms other gradient boosting algorithms. Sahin [35] compared the performances of GBM, CatBoost, XGBoost, and Light Gradient Boosting Machine (LightGBM) algorithms in a landslide susceptibility modeling study and concluded that CatBoost showed superior performance compared to the other gradient boosting algorithms. Finally, in the rainfall-induced LS mapping study by Ye et al. [38], where SVM, RF, CatBoost, LightGBM, and XGBoost algorithms were used, CatBoost showed the best performance with an AUC value of 0.917, while RF showed the lowest performance among all algorithms used, with an AUC value of 0.848.

4. Conclusions

In this study, we aimed to produce LS maps of the Ardeşen and Fındıklı districts of Rize Province using four tree-based ensemble learning algorithms and to evaluate the performances of these models. The study area is located in the Eastern Black Sea Region, which receives the highest rainfall in Turkey. As stated in the literature, the main factor triggering landslides in the study area is excessive and heavy rainfall. Tree-based ensemble learning algorithms provide more-accurate prediction results by combining predictions from many decision trees. Therefore, there is a tendency towards ensemble learning algorithms in LS mapping studies. In this regard, the tree-based ensemble learning algorithms RF, GBM, CatBoost, and XGBoost were used in this study. Considering the topographic, geological, and environmental conditions of the study area and previous studies in the Eastern Black Sea Region, 14 landslide conditioning factors were used in the LS models. The models’ performances were evaluated using the AUC-ROC approach. During the validation phase, the AUC values of the four models were found to be extremely similar. The AUC values of the prediction rate curves, which are used to evaluate the prediction capability of the models, range from 0.975 to 0.988. CatBoost has a somewhat greater prediction capacity than the other models, based on the characteristics of the study region and the conditioning factors. However, the resulting AUC values were very close to each other, showing that the CatBoost and XGBoost models are promising for LS mapping. Also, it was determined that the most effective factor in the occurrence of landslides in the study area was slope, while plan and profile curvatures were the least-effective factors. Moreover, tea production in Turkey is carried out in Giresun, Trabzon, Rize, and Artvin provinces in the Eastern Black Sea Region. Approximately 65% of tea production is carried out in Rize.
In addition to slope being the most effective factor in the study region, which has a very rugged topography, incorrect land-use activities carried out for tea cultivation also contribute to the occurrence of landslides in the region. The destruction of forested areas for tea cultivation and the lack of necessary drainage measures in tea-cultivation areas increase the number and frequency of landslides. As a result, it is thought that the LSMs produced in this study can make important contributions to future studies aimed at reducing the damage caused by landslides in the region and at creating landslide-resistant areas.

Author Contributions

Conceptualization, A.Y.O. and H.A.; methodology, A.Y.O., H.A. and M.Z.; software, H.A. and M.Z.; validation, H.A., M.Z. and A.Y.O.; formal analysis, H.A. and A.Y.O.; investigation, A.Y.O.; resources, H.A. and M.Z.; data curation, H.A. and M.Z.; writing—original draft preparation, A.Y.O.; writing—review and editing, A.Y.O. and H.A.; visualization, H.A. and A.Y.O. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Scientific Research Projects Office of Artvin Coruh University (AÇÜBAP) (Scientific Research Project Number: 2022.F40.02.01).

Data Availability Statement

All the data are reported in this study. Additional details are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Gökçe, O.; Özden, S.; Demir, A. Türkiye’de Afetlerin Mekânsal ve İstatistiksel Dağılımı Afet Bilgileri Envanteri; Bayındırlık ve İskân Bakanlığı Afet İşleri Genel Müdürlüğü, Afet Etüt ve Hasar Tespit Daire Başkanlığı: Ankara, Turkey, 2008. [Google Scholar]
Aydoğan, E.; Dağ, S. Landslide susceptibility analysis of the northeastern part of the upper Karasu Basin (Erzurum) using statistical methods. Turk. J. Remote Sens. GIS 2023, 4, 64–82. [Google Scholar] [CrossRef]
Kavzoglu, T.; Teke, A. Predictive performances of ensemble machine learning algorithms in landslide susceptibility mapping using random forest, extreme gradient boosting (XGBoost) and natural gradient boosting (NGBoost). Arab. J. Sci. Eng. 2022, 47, 7367–7385. [Google Scholar] [CrossRef]
IPCC. Climate change 2022: Impacts, adaptation and vulnerability. In Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Pörtner, H.-O., Roberts, D.C., Tignor, M., Poloczanska, E.S., Mintenbeck, K., Alegría, A., Craig, M., Langsdorf, S., Löschke, S., Möller, V., et al., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2022; p. 3056. [Google Scholar] [CrossRef]
Demirağ Turan, İ.; Özkan, B.; Türkeş, M.; Dengiz, O. Landslide susceptibility mapping for the Black Sea Region with spatial fuzzy multi-criteria decision analysis under semi-humid and humid terrestrial ecosystems. Theor. Appl. Climatol. 2020, 140, 1233–1246. [Google Scholar] [CrossRef]
Pardeshi, S.D.; Autade, S.E.; Pardeshi, S.S. Landslide hazard assessment: Recent trends and techniques. SpringerPlus 2013, 2, 523. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Tien Bui, D.; Tuan, T.A.; Klempe, H.; Pradhan, B.; Revhaug, I. Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 2016, 13, 361–378. [Google Scholar] [CrossRef]
Ullah, I.; Aslam, B.; Shah, S.H.I.A.; Tariq, A.; Qin, S.; Majeed, M.; Havenith, H.-B. An integrated approach of machine learning, remote sensing, and GIS data for the landslide susceptibility mapping. Land 2022, 11, 1265. [Google Scholar] [CrossRef]
Das, S.; Sarkar, S.; Kanungo, D.P. A critical review on landslide susceptibility zonation: Recent trends, techniques, and practices in Indian Himalaya. Nat. Hazards 2023, 115, 23–72. [Google Scholar] [CrossRef]
Taalab, K.; Cheng, T.; Zhang, Y. Mapping landslide susceptibility and types using random forest. Big Earth Data 2018, 2, 159–178. [Google Scholar] [CrossRef]
Shahzad, N.; Ding, X.; Abbas, S. A comparative assessment of machine learning models for landslide susceptibility mapping in the rugged terrain of northern Pakistan. Appl. Sci. 2022, 12, 2280. [Google Scholar] [CrossRef]
Pradhan, A.M.S.; Kim, Y.T. Rainfall-induced shallow landslide susceptibility mapping at two adjacent catchments using advanced machine learning algorithms. ISPRS Int. J. Geo-Inf. 2020, 9, 569. [Google Scholar] [CrossRef]
He, Q.; Shahabi, H.; Shirzadi, A.; Li, S.; Chen, W.; Wang, N.; Chai, H.; Bian, H.; Ma, J.; Chen, Y.; et al. Landslide spatial modelling using novel bivariate statistical based Naïve Bayes, RBF Classifier, and RBF Network machine learning algorithms. Sci. Total Environ. 2019, 663, 1–15. [Google Scholar] [CrossRef] [PubMed]
Mutlu, B.; Nefeslioglu, H.A.; Sezer, E.A.; Akcayol, M.A.; Gokceoglu, C. An experimental research on the use of recurrent neural networks in landslide susceptibility mapping. ISPRS Int. J. Geo-Inf. 2019, 8, 578. [Google Scholar] [CrossRef] [Green Version]
Shahri, A.A.; Spross, J.; Johansson, F.; Larsson, S. Landslide susceptibility hazard map in southwest Sweden using artificial neural network. Catena 2019, 183, 104225. [Google Scholar] [CrossRef]
Akinci, H.; Kilicoglu, C.; Dogan, S. Random Forest-Based Landslide Susceptibility Mapping in Coastal Regions of Artvin, Turkey. ISPRS Int. J. Geo-Inf. 2020, 9, 553. [Google Scholar] [CrossRef]
Can, R.; Kocaman, S.; Gokceoglu, C. A Comprehensive assessment of XGBoost algorithm for landslide susceptibility mapping in the upper basin of Ataturk Dam, Turkey. Appl. Sci. 2021, 11, 4993. [Google Scholar] [CrossRef]
Cao, J.; Zhang, Z.; Du, J.; Zhang, L.; Song, Y.; Sun, G. Multi-geohazards susceptibility mapping based on machine learning—A case study in Jiuzhaigou, China. Nat. Hazards 2020, 102, 851–871. [Google Scholar] [CrossRef]
Akinci, H.; Zeybek, M. Comparing classical statistic and machine learning models in landslide susceptibility mapping in Ardanuc (Artvin), Turkey. Nat. Hazards 2021, 108, 1515–1543. [Google Scholar] [CrossRef]
Ajim Ali, S.; Parvin, F.; Pham, Q.B.; Khedher, K.M.; Dehbozorgi, M.; Rabby, Y.W.; Anh, D.T.; Nguyen, D.H. An ensemble random forest tree with SVM, ANN, NBT, and LMT for landslide susceptibility mapping in the Rangit River watershed, India. Nat. Hazards 2022, 113, 1601–1633. [Google Scholar] [CrossRef]
Kavzoglu, T.; Sahin, E.K.; Colkesen, I. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 2014, 11, 425–439. [Google Scholar] [CrossRef]
Colkesen, I.; Sahin, E.K.; Kavzoglu, T. Susceptibility mapping of shallow landslides using kernel-based Gaussian process, support vector machines and logistic regression. J. Afr. Earth Sci. 2016, 118, 53–64. [Google Scholar] [CrossRef]
Merghadi, A.; Yunus, A.P.; Dou, J.; Whiteley, J.; ThaiPham, B.; Bui, D.T.; Avtar, R.; Abderrahmane, B. Machine learning methods for landslide susceptibility studies: A comparative overview of algorithm performance. Earth Sci. Rev. 2020, 207, 103225. [Google Scholar] [CrossRef]
Abu El-Magd, S.; Ali, S.A.; Pham, Q.B. Spatial modeling and susceptibility zonation of landslides using random forest, naïve bayes and K-nearest neighbor in a complicated terrain. Earth Sci. Inform. 2021, 14, 1227–1243. [Google Scholar] [CrossRef]
Tsangaratos, P.; Ilia, I. Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: The influence of models complexity and training dataset size. Catena 2016, 145, 164–179. [Google Scholar] [CrossRef]
Pourghasemi, H.R.; Gayen, A.; Park, S.; Lee, C.W.; Lee, S. Assessment of landslide-prone areas and their zonation using logistic regression, LogitBoost, and NaïveBayes machine-learning algorithms. Sustainability 2018, 10, 3697. [Google Scholar] [CrossRef] [Green Version]
Tsangaratos, P.; Benardos, A. Estimating landslide susceptibility through a artificial neural network classifier. Nat. Hazards 2014, 74, 1489–1516. [Google Scholar] [CrossRef]
Akinci, H. Assessment of rainfall-induced landslide susceptibility in Artvin, Turkey using machine learning techniques. J. Afr. Earth Sci. 2022, 191, 104535. [Google Scholar] [CrossRef]
Wang, Z.; Liu, Q.; Liu, Y. Mapping landslide susceptibility using machine learning algorithms and GIS: A case study in Shexian county, Anhui province, China. Symmetry 2020, 12, 1954. [Google Scholar] [CrossRef]
Youssef, A.M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Al-Katheeri, M.M. Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 2016, 13, 839–856. [Google Scholar] [CrossRef]
Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Tien Bui, D.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 2017, 151, 147–160. [Google Scholar] [CrossRef] [Green Version]
Chen, W.; Zhang, S.; Li, R.; Shahabi, H. Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci. Total Environ. 2018, 644, 1006–1018. [Google Scholar] [CrossRef]
Nhu, V.; Mohammadi, A.; Shahabi, H.; Ahmad, B.B.; Al-Ansari, N.; Shirzadi, A.; Clague, J.J.; Jaafari, A.; Chen, W.; Nguyen, H. Landslide susceptibility mapping using machine learning algorithms and remote sensing data in a tropical environment. Int. J. Environ. Res. Public Health 2020, 17, 4933. [Google Scholar] [CrossRef] [PubMed]
Yu, H.; Pei, W.; Zhang, J.; Chen, G. Landslide susceptibility mapping and driving mechanisms in a vulnerable region based on multiple machine learning models. Remote Sens. 2023, 15, 1886. [Google Scholar] [CrossRef]
Sahin, E.K. Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping using XGBoost, gradient boosting machine, and random forest. SN Appl. Sci. 2020, 2, 1308. [Google Scholar] [CrossRef]
Sahin, E.K. Comparative analysis of gradient boosting algorithms for landslide susceptibility mapping. Geocarto Int. 2022, 37, 2441–2465. [Google Scholar] [CrossRef]
Song, Y.; Yang, D.; Wu, W.; Zhang, X.; Zhou, J.; Tian, Z.; Wang, C.; Song, Y. Evaluating landslide susceptibility using sampling methodology and multiple machine learning models. ISPRS Int. J. Geo-Inf. 2023, 12, 197. [Google Scholar] [CrossRef]
Ye, P.; Yu, B.; Chen, W.; Liu, K.; Ye, L. Rainfall-induced landslide susceptibility mapping using machine learning algorithms and comparison of their performance in Hilly area of Fujian Province, China. Nat. Hazards 2022, 113, 965–995. [Google Scholar] [CrossRef]
Wei, A.; Yu, K.; Dai, F.; Gu, F.; Zhang, W.; Liu, Y. Application of tree-based ensemble models to landslide susceptibility mapping: A comparative study. Sustainability 2022, 14, 6330. [Google Scholar] [CrossRef]
Hong, H.; Pourghasemi, H.R.; Pourtaghi, Z.S. Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology 2016, 259, 105–118. [Google Scholar] [CrossRef]
Aditian, A.; Kubota, T.; Shinohara, Y. Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia. Geomorphology 2018, 318, 101–111. [Google Scholar] [CrossRef]
Merghadi, A.; Abderrahmane, B.; Tien Bui, D. Landslide susceptibility assessment at Mila basin (Algeria): A comparative assessment of prediction capability of advanced machine learning methods. ISPRS Int. J. Geo-Inf. 2018, 7, 268. [Google Scholar] [CrossRef] [Green Version]
Wang, Y.; Fang, Z.; Hong, H. Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China. Sci. Total Environ. 2019, 666, 975–993. [Google Scholar] [CrossRef] [PubMed]
Thi Ngo, P.T.; Panahi, M.; Khosravi, K.; Ghorbanzadeh, O.; Kariminejad, N.; Cerda, A.; Lee, S. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran. Geosci. Front. 2021, 12, 505–519. [Google Scholar] [CrossRef]
Lv, L.; Chen, T.; Dou, J.; Plaza, A. A hybrid ensemble-based deep-learning framework for landslide susceptibility mapping. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102713. [Google Scholar] [CrossRef]
Aslam, B.; Zafar, A.; Khalil, U. Comparative analysis of multiple conventional neural networks for landslide susceptibility mapping. Nat. Hazards 2023, 115, 673–707. [Google Scholar] [CrossRef]
Sameen, M.I.; Pradhan, B.; Lee, S. Application of convolutional neural networks featuring Bayesian optimization for landslide susceptibility assessment. Catena 2020, 186, 104249. [Google Scholar] [CrossRef]
Mandal, K.; Saha, S.; Mandal, S. Applying deep learning and benchmark machine learning algorithms for landslide susceptibility modelling in Rorachu river basin of Sikkim Himalaya, India. Geosci. Front. 2021, 12, 101203. [Google Scholar] [CrossRef]
Akgun, A.; Dag, S.; Bulut, F. Landslide susceptibility mapping for a landslide-prone area (Findikli, NE of Turkey) by likelihood-frequency ratio and weighted linear combination models. Environ. Geol. 2008, 54, 1127–1143. [Google Scholar] [CrossRef]
Yalcin, A. GIS-based landslide susceptibility mapping using analytical hierarchy process and bivariate statistics in Ardesen (Turkey): Comparisons of results and confirmations. Catena 2008, 72, 1–12. [Google Scholar] [CrossRef]
Thornthwaite, C.W. An approach toward a rational classification of climate. Geogr. Rev. 1948, 38, 55–94. [Google Scholar] [CrossRef]
General Directorate of Meteorology. Climate Classification Rize. 2023. Available online: https://www.mgm.gov.tr/iklim/iklim-siniflandirmalari.aspx?m=RIZE (accessed on 11 April 2023).
General Directorate of Meteorology. General Statistical Data of Our Provinces. 2023. Available online: https://www.mgm.gov.tr/veridegerlendirme/il-ve-ilceler-istatistik.aspx?k=A&m=RIZE (accessed on 11 April 2023).
Karsli, F.; Atasoy, M.; Yalcin, A.; Reis, S.; Demir, O.; Gokceoglu, C. Effects of land-use changes on landslides in a landslide-prone area (Ardesen, Rize, NE Turkey). Environ. Monit. Assess. 2009, 156, 241–255. [Google Scholar] [CrossRef]
Dağ, S.; Bulut, F. An example for preparation of GIS-based landslide susceptibility maps: Çayeli (Rize, NE Türkiye). J. Eng. Geol. 2012, 36, 35–62. (In Turkish) [Google Scholar]
Keskin, I. 1:100,000 Scale Geological Map of Turkey, No:178 Artvin-F46 Map Sheet; General Directorate of Mineral Research and Exploration, Geological Research Department: Ankara, Turkey, 2013. (In Turkish) [Google Scholar]
Parise, M. Landslide mapping techniques and their use in the assessment of the landslide hazard. Phys. Chem. Earth C 2001, 26, 697–703. [Google Scholar] [CrossRef]
Varnes, D.J. Slope movement types and processes. In Landslides Analysis and Control; Schuster, R.L., Krizek, R.J., Eds.; Special Report; Transportation Research Board, National Academy of Sciences: New York, NY, USA, 1978; Volume 176, pp. 12–33. [Google Scholar]
Yalçın, A. Use of analytical hierarchy method and GIS in the production of landslide susceptibility maps. J. Fac. Eng. Arch. Selcuk. Univ. 2007, 22, 1–14. (In Turkish) [Google Scholar]
Vakhshoori, V.; Pourghasemi, H.R.; Zare, M.; Blaschke, T. Landslide susceptibility mapping using GIS-based data mining algorithms. Water 2019, 11, 2292. [Google Scholar] [CrossRef] [Green Version]
Li, Y.; Chen, W. Landslide susceptibility evaluation using hybrid integration of evidential belief function and machine learning techniques. Water 2020, 12, 113. [Google Scholar] [CrossRef] [Green Version]
Pourghasemi, H.R.; Sadhasivam, N.; Amiri, M.; Eskandari, S.; Santosh, M. Landslide susceptibility assessment and mapping using state-of-the art machine learning techniques. Nat. Hazards 2021, 108, 1291–1316. [Google Scholar] [CrossRef]
Arca, D.; Keskin Citiroglu, H.; Tasoglu, I.K. A comparison of GIS-based landslide susceptibility assessment of the Satuk village (Yenice, NW Turkey) by frequency ratio and multi-criteria decision methods. Environ. Earth Sci. 2019, 78, 81. [Google Scholar] [CrossRef]
Shahabi, H.; Hashim, M.; Ahmad, B.B. Remote sensing and GIS-based landslide susceptibility mapping using frequency ratio, logistic regression, and fuzzy logic methods at the central Zab basin, Iran. Environ. Earth Sci. 2015, 73, 8647–8668. [Google Scholar] [CrossRef]
Dağ, S.; Akgün, A.; Kaya, A.; Alemdağ, S.; Bostancı, H.T. Medium scale earthflow susceptibility modelling by remote sensing and geographical information systems based multivariate statistics approach: An example from Northeastern Turkey. Environ. Earth Sci. 2020, 79, 468. [Google Scholar] [CrossRef]
Youssef, A.M.; Pourghasemi, H.R. Landslide susceptibility mapping using machine learning algorithms and comparison of their performance at Abha Basin, Asir Region, Saudi Arabia. Geosci. Front. 2021, 12, 639–655. [Google Scholar] [CrossRef]
Pourghasemi, H.R.; Pradhan, B.; Gokceoglu, C.; Moezzi, K.D. A comparative assessment of prediction capabilities of Dempster–Shafer and Weights-of-evidence models in landslide susceptibility mapping using GIS. Geomat. Nat. Hazards Risk 2013, 4, 93–118. [Google Scholar] [CrossRef]
Ding, Q.; Chen, W.; Hong, H. Application of frequency ratio, weights of evidence and evidential belief function models in landslide susceptibility mapping. Geocarto Int. 2017, 32, 619–639. [Google Scholar] [CrossRef]
Kilicoglu, C. Investigation of the effects of approaches used in the production of training and validation data sets on the accuracy of landslide susceptibility mapping models: Samsun (Turkey) example. Arab. J. Geosci. 2021, 14, 2106. [Google Scholar] [CrossRef]
Erener, A.; Mutlu, A.; Düzgün, H.S. A comparative study for landslide susceptibility mapping using GIS-based multi-criteria decision analysis (MCDA), logistic regression (LR) and association rule mining (ARM). Eng. Geol. 2016, 203, 45–55. [Google Scholar] [CrossRef]
Feizizadeh, B.; Blaschke, T.; Nazmfar, H. GIS based ordered weighted averaging and Dempster–Shafer methods for landslide susceptibility mapping in the Urmia Lake Basin, Iran. Int. J. Digit. Earth. 2014, 7, 688–708. [Google Scholar] [CrossRef]
Shahabi, H.; Khezri, S.; Ahmad, B.B.; Hashim, M. Landslide susceptibility mapping at central Zab basin, Iran: A comparison between analytical hierarchy process, frequency ratio and logistic regression models. Catena 2014, 115, 55–70. [Google Scholar] [CrossRef]
Ba, Q.; Chen, Y.; Deng, S.; Wu, Q.; Yang, J.; Zhang, J. An improved information value model based on gray clustering for landslide susceptibility mapping. ISPRS Int. J. Geoinf. 2017, 6, 18. [Google Scholar] [CrossRef]
Du, G.L.; Zhang, Y.S.; Iqbal, J.; Yang, Z.; Yao, X. Landslide susceptibility mapping using an integrated model of information value method and logistic regression in the Bailongjiang watershed, Gansu Province, China. J. Mt. Sci. 2017, 14, 249–268. [Google Scholar] [CrossRef]
Roy, D.; Sarkar, A.; Kundu, P.; Paul, S.; Sarkar, B.C. An ensemble of evidence belief function (EBF) with frequency ratio (FR) using geospatial data for landslide prediction in Darjeeling Himalayan region of India. Quat. Sci. Adv. 2023, 11, 100092. [Google Scholar] [CrossRef]
Hong, H.; Xu, C.; Tien Bui, D. Landslide susceptibility assessment at the Xiushui area (China) using frequency ratio model. Procedia Environ. Sci. 2015, 15, 513–517. [Google Scholar] [CrossRef] [Green Version]
Ghasemian, B.; Shahabi, H.; Shirzadi, A.; Al-Ansari, N.; Jaafari, A.; Kress, V.R.; Geertsema, M.; Renoud, S.; Ahmad, A. A robust deep-learning model for landslide susceptibility mapping: A case study of Kurdistan Province, Iran. Sensors 2022, 22, 1573. [Google Scholar] [CrossRef] [PubMed]
Trigila, A.; Iadanza, C.; Esposito, C.; Scarascia-Mugnozza, G. Comparison of logistic regression and random forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 2015, 249, 119–136. [Google Scholar] [CrossRef]
Yilmaz, I. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from Kat landslides (Tokat—Turkey). Comput. Geosci. 2009, 35, 1125–1138. [Google Scholar] [CrossRef]
Guillard, C.; Zezere, J. Landslide susceptibility assessment and validation in the framework of municipal planning in Portugal: The case of Loures municipality. Environ. Manag. 2012, 50, 721–735. [Google Scholar] [CrossRef] [PubMed]
Lee, S.; Ryu, J.H.; Kim, L.S. Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: Case study of Youngin, Korea. Landslides 2007, 4, 327–338. [Google Scholar] [CrossRef]
Son, J.; Suh, J.; Park, H.D. GIS-based landslide susceptibility assessment in Seoul, South Korea, applying the radius of influence to frequency ratio analysis. Environ. Earth Sci. 2016, 75, 310. [Google Scholar] [CrossRef]
Arabameri, A.; Saha, S.; Roy, J.; Chen, W.; Blaschke, T.; Tien Bui, D. Landslide susceptibility evaluation and management using different machine learning methods in the Gallicash river watershed, Iran. Remote Sens. 2020, 12, 475. [Google Scholar] [CrossRef] [Green Version]
Sahin, E.K.; Colkesen, I. Performance analysis of advanced decision tree-based ensemble learning algorithms for landslide susceptibility mapping. Geocarto Int. 2021, 36, 1253–1275. [Google Scholar] [CrossRef]
Chen, X.; Chen, W. GIS-based landslide susceptibility assessment using optimized hybrid machine learning methods. Catena 2021, 196, 104833. [Google Scholar] [CrossRef]
Nohani, E.; Moharrami, M.; Sharafi, S.; Khosravi, K.; Pradhan, B.; Pham, B.T.; Lee, S.; Melesse, A.M. Landslide susceptibility mapping using different GIS-based bivariate models. Water 2019, 11, 1402. [Google Scholar] [CrossRef] [Green Version]
Akinci, H.; Yavuz Ozalp, A. Landslide susceptibility mapping and hazard assessment in Artvin (Turkey) using frequency ratio and modified information value model. Acta Geophys. 2021, 69, 725–745. [Google Scholar] [CrossRef]
Hong, H.; Tsangaratos, P.; Ilia, I.; Loupasakis, C.; Wang, Y. Introducing a novel multi-layer perceptron network based on stochastic gradient descent optimized by a meta-heuristic algorithm for landslide susceptibility mapping. Sci. Total Environ. 2020, 742, 140549. [Google Scholar] [CrossRef] [PubMed]
Zhang, T.; Fu, Q.; Li, C.; Liu, F.; Wang, H.; Han, L.; Quevedo, R.P.; Chen, T.; Lei, N. Modeling landslide susceptibility using data mining techniques of kernel logistic regression, fuzzy unordered rule induction algorithm, SysFor and random forest. Nat. Hazards 2022, 114, 3327–3358. [Google Scholar] [CrossRef]
Jiao, Y.; Zhao, D.; Ding, Y.; Liu, Y.; Xu, Q.; Qiu, Y.; Liu, C.; Liu, Z.; Zha, Z.; Li, R. Performance evaluation for four GIS-based models purposed to predict and map landslide susceptibility: A case study at a World Heritage site in Southwest China. Catena 2019, 183, 104221. [Google Scholar] [CrossRef]
Wu, Y.; Ke, Y.; Chen, Z.; Liang, S.; Zhao, H.; Hong, H. Application of alternating decision tree with AdaBoost and bagging ensembles for landslide susceptibility mapping. Catena 2020, 187, 104396. [Google Scholar] [CrossRef]
Althuwaynee, O.F.; Pradhan, B.; Lee, S. Application of an evidential belief function model in landslide susceptibility mapping. Comput. Geosci. 2012, 44, 120–135. [Google Scholar] [CrossRef]
Pourghasemi, H.R.; Mohammady, M.; Pradhan, B. Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. Catena 2012, 97, 71–84. [Google Scholar] [CrossRef]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
Akıncı, H.A.; Akıncı, H. Machine learning based forest fire susceptibility assessment of Manavgat district (Antalya), Turkey. Earth Sci. Inform. 2023, 16, 397–414. [Google Scholar] [CrossRef]
Costache, R.; Arabameri, A.; Elkhrachy, I.; Ghorbanzadeh, O.; Pham, Q.B. Detection of areas prone to flood risk using state-of-the-art machine learning models. Geomat. Nat. Hazards Risk 2021, 12, 1488–1507. [Google Scholar] [CrossRef]
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 2008, 28, 1–26. [Google Scholar] [CrossRef] [Green Version]
Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef] [Green Version]
Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 2–8 December 2018.
Huang, G.; Wu, L.; Ma, X.; Zhang, W.; Fan, J.; Yu, X.; Zeng, W.; Zhou, H. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J. Hydrol. 2019, 574, 1029–1041.
Kang, Y.; Jang, E.; Im, J.; Kwon, C.; Kim, S. Developing a new hourly forest fire risk index based on Catboost in South Korea. Appl. Sci. 2020, 10, 8213.
Catboost. Available online: https://catboost.ai/en/docs/concepts/r-installation (accessed on 2 April 2023).
Jenks, G.F. The data model concept in statistical mapping. Int. Yearb. Cartogr. 1967, 7, 186–190.
Guo, X.; Fu, B.; Du, J.; Shi, P.; Chen, Q.; Zhang, W. Applicability of Susceptibility Model for Rock and Loess Earthquake Landslides in the Eastern Tibetan Plateau. Remote Sens. 2021, 13, 2546.
Kasahara, N.; Gonda, Y.; Huvaj, N. Quantitative land-use and landslide assessment: A case study in Rize, Türkiye. Water 2022, 14, 1811.
Chen, W.; Zhao, X.; Shahabi, H.; Shirzadi, A.; Khosravi, K.; Chai, H.; Zhang, S.; Zhang, L.; Ma, J.; Chen, Y.; et al. Spatial prediction of landslide susceptibility by combining evidential belief function, logistic regression, and logistic model tree. Geocarto Int. 2019, 34, 1177–1201.
Achour, Y.; Pourghasemi, H.R. How do machine learning techniques help in increasing accuracy of landslide susceptibility maps? Geosci. Front. 2020, 11, 871–883.
Chen, T.; Zhu, L.; Niu, R.-q.; Trinder, C.J.; Peng, L.; Lei, T. Mapping landslide susceptibility at the Three Gorges Reservoir, China, using gradient boosting decision tree, random forest, and information value models. J. Mt. Sci. 2020, 17, 670–685.
Rabby, Y.W.; Li, Y. Landslide Susceptibility Mapping Using Integrated Methods: A Case Study in the Chittagong Hilly Areas, Bangladesh. Geosciences 2020, 10, 483.
Wubalem, A. Landslide susceptibility mapping using statistical methods in Uatzau catchment area, northwestern Ethiopia. Geoenviron. Disasters 2021, 8, 1.
Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.

Figure 1. Location of the study area.

Figure 2. Units of lithology in the research area [56].

Figure 3. Conditioning factor maps: (a) altitude, (b) slope, (c) aspect, (d) LS-factor.

Figure 4. Conditioning factor maps: (a) plan curvature, (b) profile curvature, (c) land cover, (d) TCD.

Figure 5. Conditioning factor maps: (a) TPI, (b) TWI, (c) distance to drainage, (d) distance to roads, and (e) distance to faults.

Figure 6. Landslide susceptibility maps: (a) RF, (b) GBM, (c) CatBoost, (d) XGBoost.

Figure 7. Percentage of susceptibility classes for machine learning models.

Figure 8. Importance of conditioning factors: (a) RF, (b) GBM, (c) CatBoost, (d) XGBoost.

Figure 9. Success rate curve.

Figure 10. Prediction rate curve.

Table 1. Description and data source of the conditioning factors.

Factors	Source	Scale/Resolution	Sub-Class	Range/Category	Sub-Class	Range/Category	Reference
Altitude (m)	DEM	10 m	1	0–300	6	1500–1800	[31,67,68]
			2	300–600	7	1800–2100
			3	600–900	8	2100–2400
			4	900–1200	9	2400–2700
			5	1200–1500	10	2700–3497.38
Aspect	DEM	10 m	1	Flat	6	South	[16,29,36,69]
			2	North	7	South West
			3	North East	8	West
			4	East	9	North West
			5	South East
Distance to drainage (m)	DEM	10 m	1	0–100	6	500–600	[16,19,35,70]
			2	100–200	7	600–700
			3	200–300	8	700–800
			4	300–400	9	800–900
			5	400–500	10	900–1090.18
Distance to faults (m)	GDMRE, Türkiye	1:100,000	1	0–1000	6	5000–6000	[16,19,71,72]
			2	1000–2000	7	6000–7000
			3	2000–3000	8	7000–8000
			4	3000–4000	9	8000–9000
			5	4000–5000	10	9000–16,500.94
Distance to roads (m)	Digital road network (Basarsoft Inc., Ankara, Turkey)	10 m	1	0–200	6	1000–1200	[16,19,73,74]
			2	200–400	7	1200–1400
			3	400–600	8	1400–1600
			4	600–800	9	1600–1800
			5	800–1000	10	1800–8658.22
Land cover	ESRI Land Cover	10 m	1	Water	7	Built Area	[23,34,46,75]
			2	Trees	8	Bare ground
			3	Grass (Rangeland)	9	Snow/ice
			5	Crops
			6	Scrub/shrub
Lithology	GDMRE, Türkiye	1:100,000	Presented in Figure 2.				[56]
LS-factor	DEM	10 m	1	0.003–13.938	5	76.648–118.455	[40,67,76,77]
			2	13.938–30.196	6	118.455–190.456
			3	30.196–48.777	7	190.456–592.265
			4	48.777–76.648
Plan curvature	DEM	10 m	1	<0 (concave)			[3,69,78]
			2	0 (flat)
			3	>0 (convex)
Profile curvature	DEM	10 m	1	<0 (concave)			[3,69,78]
			2	0 (flat)
			3	>0 (convex)
Slope (°)	DEM	10 m	1	0–5	6	25–30	[16,69,79,80]
			2	5–10	7	30–35
			3	10–15	8	35–40
			4	15–20	9	40–45
			5	20–25	10	45–75.82
TCD (%)	Copernicus Land Monitoring Service	10 m	1	0–10	6	50–60	[26,81,82]
			2	10–20	7	60–70
			3	20–30	8	70–80
			4	30–40	9	80–90
			5	40–50	10	90–100
TPI	DEM	10 m	1	−58.711 to −15.402	5	5.113–11.381	[3,35,83]
			2	−15.402 to −6.854	6	11.381–20.499
			3	−6.854 to −0.586	7	20.499–86.602
			4	−0.586 to 5.113
TWI	DEM	10 m	1	0.869–4.627	5	9.548–12.591	[19,35,84]
			2	4.627–5.880	6	12.591–16.796
			3	5.880–7.401	7	16.796–23.686
			4	7.401–9.548

Table 2. Multicollinearity analysis results for landslide conditioning factors [34,39,83,87].

Conditioning Factors	VIF	TOL
Altitude (m)	2.29424	0.43587
Aspect	1.04426	0.95762
Distance to drainage (m)	1.10645	0.90379
Distance to faults (m)	1.18195	0.84606
Distance to roads (m)	2.25385	0.44369
Land cover	1.17820	0.84875
Lithology	1.11000	0.90090
LS-factor	3.01434	0.33175
Plan curvature	1.31931	0.75797
Profile curvature	1.15036	0.86929
Slope (°)	3.61071	0.27695
TCD	1.13473	0.88126
TPI	1.85399	0.53938
TWI	3.13184	0.31930
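
The two columns of Table 2 are linked by TOL = 1/VIF (for example, 1/2.29424 ≈ 0.43587 for altitude). The following is a minimal sketch of how such values can be computed, not the authors' workflow: it assumes the factors are held as columns of a numeric array and uses the identity that the VIFs are the diagonal of the inverse correlation matrix. The synthetic data and the helper name `vif_and_tolerance` are illustrative assumptions. A commonly used rule of thumb flags multicollinearity when VIF exceeds 10 (TOL below 0.1); all values in Table 2 are well within that limit.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (VIF, TOL) per column of X (rows = samples, columns = factors).

    Uses the identity VIF_j = [R^-1]_jj, where R is the correlation matrix
    of the columns; tolerance is the reciprocal of VIF.
    """
    corr = np.corrcoef(X, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    tol = 1.0 / vif
    return vif, tol

# Synthetic stand-ins for three conditioning factors (assumption, not study data):
# "ls_factor" is built to correlate with "slope"; "aspect" is independent.
rng = np.random.default_rng(0)
slope = rng.normal(size=1000)
ls_factor = 0.8 * slope + 0.6 * rng.normal(size=1000)
aspect = rng.normal(size=1000)

X = np.column_stack([slope, ls_factor, aspect])
vif, tol = vif_and_tolerance(X)

for name, v, t in zip(["slope", "ls_factor", "aspect"], vif, tol):
    print(f"{name:10s} VIF = {v:.3f}  TOL = {t:.3f}")
```

In this toy example the two correlated columns receive higher VIF values than the independent one, which mirrors the pattern in Table 2, where slope, LS-factor, and TWI have the largest VIFs.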

Table 3. Comparison of the ML models’ results.

ML Model	Susceptibility Level	Area Percentage (%)	Landslide Pixel	Landslide Percentage (%)	Frequency Ratio
RF	Very low	62.17	31	0.049	0.0008
	Low	19.27	501	0.807	0.0419
	Moderate	9.55	1695	2.729	0.2858
	High	5.45	5988	9.646	1.7699
	Very high	3.56	53,874	86.769	24.3733
GBM	Very low	20.76	2	0.003	0.0001
	Low	40.48	47	0.076	0.0019
	Moderate	19.85	960	1.546	0.0779
	High	12.06	5906	9.512	0.7887
	Very high	6.85	55,174	88.863	12.9727
XGBoost	Very low	14.08	0	0	0
	Low	53.77	7	0.011	0.0002
	Moderate	17.53	306	0.493	0.0281
	High	9.36	3383	5.449	0.5822
	Very high	5.26	58,393	94.047	17.8796
CatBoost	Very low	13.40	0	0	0
	Low	52.48	5	0.008	0.0001
	Moderate	18.47	219	0.353	0.0191
	High	10.29	3328	5.360	0.5209
	Very high	5.36	58,537	94.279	17.5894
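
The Frequency Ratio column in Table 3 appears consistent with dividing the landslide percentage of each class by its area percentage. As a hedged illustration, the sketch below recomputes the CatBoost rows from the area percentages and landslide pixel counts listed in the table; the variable names are illustrative, and small differences in the last digit relative to the published values can arise from rounding.

```python
# CatBoost rows of Table 3: susceptibility class, area share (%), landslide pixels
classes = ["Very low", "Low", "Moderate", "High", "Very high"]
area_pct = [13.40, 52.48, 18.47, 10.29, 5.36]
landslide_pixels = [0, 5, 219, 3328, 58537]

total_pixels = sum(landslide_pixels)

# Share of all landslide pixels that falls in each class (%)
landslide_pct = [100.0 * n / total_pixels for n in landslide_pixels]

# Frequency ratio = landslide share / area share;
# values above 1 mean landslides are over-represented in that class
frequency_ratio = [lp / ap for lp, ap in zip(landslide_pct, area_pct)]

for name, lp, fr in zip(classes, landslide_pct, frequency_ratio):
    print(f"{name:10s} landslide % = {lp:7.3f}  FR = {fr:8.4f}")
```

A ratio well above 1 in the very high class, together with ratios near 0 in the low classes, indicates that the mapped susceptibility classes concentrate the known landslide pixels in a small share of the study area.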

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yavuz Ozalp, A.; Akinci, H.; Zeybek, M. Comparative Analysis of Tree-Based Ensemble Learning Algorithms for Landslide Susceptibility Mapping: A Case Study in Rize, Turkey. Water 2023, 15, 2661. https://doi.org/10.3390/w15142661

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
