Random Forest-Based Landslide Susceptibility Mapping in Coastal Regions of Artvin, Turkey

Natural disasters such as landslides often occur in the Eastern Black Sea region of Turkey owing to its geological, topographical, and climatic characteristics. Landslide events occur nearly every year in the Arhavi, Hopa, and Kemalpaşa districts located on the Black Sea coast in the Artvin province. In this study, the landslide susceptibility map of the Arhavi, Hopa, and Kemalpaşa districts was produced using the random forest (RF) model, which is widely used in the literature and yields more accurate results compared with other machine learning techniques. A total of 10 landslide-conditioning factors were considered for the susceptibility analysis, i.e., lithology, land cover, slope, aspect, elevation, curvature, topographic wetness index, and distances from faults, drainage networks, and roads. Furthermore, 70% of the landslides on the landslide inventory map were used for training, and the remaining 30% were used for validation. The RF-based model was validated using the area under the receiver operating characteristic (ROC) curve. Evaluation results indicated that the success and prediction rates of the model were 98.3% and 97.7%, respectively. Moreover, it was determined that incorrect land-use decisions, such as transforming forest areas into tea and hazelnut cultivation areas, induce the occurrence of landslides.


Introduction
Landslide is a natural disaster that causes economic damage and fatalities, both in Turkey and many different regions in the world. In Turkey, the region most affected by landslides in terms of frequency of occurrence and resultant damage is the Eastern Black Sea region [1][2][3]. The main factors that predispose landslide triggering in the Eastern Black Sea region are rugged terrain structure, climatic conditions with heavy rainfall, weathering, deforestation, dense tea and hazelnut cultivation, dispersed rural settlements, low-quality transportation network owing to dispersed settlement, and uncontrolled surface drainage systems created by the transportation network [4].
One of the essential procedures to reduce damage and fatalities caused by landslides is landslide susceptibility mapping (LSM) [5][6][7][8][9]. Trigila et al. [10] reported that LSM is the first step in estimating landslide hazard and risk. Landslide susceptibility maps show the regions where future landslides can occur in the study area [11]. Generally, LSM involves the following steps: collecting spatial data regarding the factors affecting the occurrence of landslides in the selected study area, determining landslide susceptibility using the association between existing landslides on the inventory map and conditioning factors, and validating the results [12]. One of the important steps in this process is the determination of the method or model to be used for producing the landslide susceptibility map. According to the CORINE 2018 land cover data, 62.74% of the study area is covered with forest, 16.08% with agricultural lands, and 8.15% with natural grasslands. The main agricultural products include tea and hazelnut. Tea and hazelnut cultivation is conducted in 70% of agricultural lands in the Arhavi district and 53% of agricultural lands in the Hopa and Kemalpaşa districts.
According to the Turkish Statistical Institute's 2019 population data, Arhavi's population is 20,926, Hopa's population is 26,958, and Kemalpaşa's population is 9224 [56]. The total population of the study area is 57,108.
According to the data of the General Directorate of Meteorology (2009-2020), the total monthly precipitation in the study area is 2215.16 mm. In this region, the minimum average monthly precipitation of 94.25 mm is observed in May and the maximum average monthly precipitation of 345.2 mm is observed in September. According to the data of the General Directorate of Meteorology (2000-2020), the maximum average monthly temperature of 27.71 °C is observed in August and the average monthly minimum temperature of 3.63 °C is observed in January in the study area.
The study area has a rather steep and rough topography. In the region, there are sharp hillsides from coastal areas to interiors. Deep valleys stand out all across the study area. In accordance with the morphological structure of the region, there are many streams with regular and irregular flow regimes, forming a dendritic drainage network. The streams have a high base of slope, which increases the flow rate, causing more material to be transported [57,58]. Because of the morphology of the region, the materials carried by streams are deposited along with the flat areas near the Black Sea. Furthermore, it is seen that the settlement areas are also established in these wide and flat morphologic units.
In addition to this, the geologic formation in the region is formed from rocks belonging to various geological times. The study area is located in the Northern Zone of East Pontides within the Pontides Tectonic Assembly. This section of the Eastern Pontides is separated into two zones. In the north, the Adjara-Trialeti belt coming from the east continues along with the Black Sea coast. On the other hand, the Somcheti-Kafan (Karabakh) belt, constituting the south section of Transcaucasia, According to the CORINE 2018 land cover data, 62.74% of the study area is covered with forest, 16.08% with agricultural lands, and 8.15% with natural grasslands. The main agricultural products include tea and hazelnut. Tea and hazelnut cultivation is conducted in 70% of agricultural lands in the Arhavi district and 53% of agricultural lands in the Hopa and Kemalpaşa districts.
According to the Turkish Statistical Institute's 2019 population data, Arhavi's population is 20,926, Hopa's population is 26,958, and Kemalpaşa's population is 9224 [56]. The total population of the study area is 57,108.
According to the data of the General Directorate of Meteorology (2009-2020), the total monthly precipitation in the study area is 2215.16 mm. In this region, the minimum average monthly precipitation of 94.25 mm is observed in May and the maximum average monthly precipitation of 345.2 mm is observed in September. According to the data of the General Directorate of Meteorology (2000-2020), the maximum average monthly temperature of 27.71 • C is observed in August and the average monthly minimum temperature of 3.63 • C is observed in January in the study area.
The study area has a rather steep and rough topography. In the region, there are sharp hillsides from coastal areas to interiors. Deep valleys stand out all across the study area. In accordance with the morphological structure of the region, there are many streams with regular and irregular flow regimes, forming a dendritic drainage network. The streams have a high base of slope, which increases the flow rate, causing more material to be transported [57,58]. Because of the morphology of the region, the materials carried by streams are deposited along with the flat areas near the Black Sea. Furthermore, it is seen that the settlement areas are also established in these wide and flat morphologic units.
In addition to this, the geologic formation in the region is formed from rocks belonging to various geological times. The study area is located in the Northern Zone of East Pontides within the Pontides Tectonic Assembly. This section of the Eastern Pontides is separated into two zones. In the north, the Adjara-Trialeti belt coming from the east continues along with the Black Sea coast. On the other hand, the Somcheti-Kafan (Karabakh) belt, constituting the south section of Transcaucasia, corresponds to the Pondites Tectonic Assembly. The study area is rich in volcanic, volcano-sedimentary and intrusive rocks containing magmatism products that have developed in periods in the Early Jurassic and Late Cretaceous Eocene age range. During periods when the magmatism was not active, sedimentary stacks were accumulated [59].

Data Used
The first stage of landslide susceptibility assessment is data collection and creation of a spatial database in which thematic maps of the factors associated with landslides are produced [60]. In this study, 1:100,000 scale digital geology and 1:25,000 scale digital landslide inventory maps were obtained from the General Directorate of Mining Research and Exploration (GDMRE), 1:25,000 scale topographic maps from the General Directorate of Mapping, 1:100,000 scale CORINE 2018 land cover data from the European Union Copernicus Land Monitoring Service (https://land.copernicus.eu/pan-european/ corine-land-cover), and 1:25,000 scale digital road network data from the Artvin Regional Directorate of Forestry. Thematic maps of the factors associated with landslides were prepared using ESRI ArcGIS 10.5 software and converted in raster format (ESRI GRID) with a spatial resolution of 10 m. The modeling operations were conducted in R 3.6.3 software.

Landslide Inventory
It is necessary to use an accurate and complete landslide inventory for the training and validation of landslide models [10]. Landslide inventory maps providing basic information to assess landslide hazards on a regional scale are the most crucial data required to examine the relationship between landslide occurrence and factors affecting it [61]. However, in Yanar et al. [62], it is clearly expressed that dense buildings and constructions change the topography, and these areas dramatically prevent the visibility of the landslides in the urban areas. Therefore, in the residential areas, it is very difficult to prepare landslide inventory maps. On the other side, Dag and Bulut [58] stated that in the tea cultivation areas, such as our study area, the vegetation covers the relatively small-scale landslides (mainly flow debris and rotational slides); thus, making it very difficult to produce the inventory maps in these tea covered areas. Furthermore, the 1:25,000 scaled regional landslide inventory maps cannot represent the small-scale landslides as polygons since the small-scale landslide areas are too small to be represented on this scale. In this study, 1:25,000 scale landslide inventory maps produced by the GDMRE and updated by the Artvin Provincial Directorate of Disaster and Emergency were used. This inventory map, which we used in this paper, is the most accurate and the most confident landslide inventory map for this region. Landslide inventory maps used in Turkey were prepared within the scope of the "Turkey Landslide Inventory Project," which was initiated by GDMRE in 1997 on a national scale and completed in 2009. The landslides detected based on the field studies and aerial images (1:10,000-1:35,000 scale) were processed on 1:25,000 scale basic topographic maps [63].
The landslides were classified into three types based on their movements: flowing, sliding, and complex. Fall and tipping type landslides were not considered, owing to scale limitations. Furthermore, the landslides were classified into two groups based on their relative depths. Accordingly, landslides with sliding surface of less than 5-m depth were classified as shallow and those with sliding surface greater than 5-m depth were classified as deep. The landslides were further classified by dividing them into two groups based on their activities: active and inactive [63,64].
According to the inventory map, there are a total of 85 mapped landslide polygons in the study area. The landslides in the study area typically fall into the shallow landslide group. The landslides cover 0.94% of the study area. Moreover, 30 of these landslides belong to the inactive landslide class and 55 to the active (45 sliding and 11 flowing types) landslide class. The smallest and largest landslides on the inventory map have areas of 0.002 and 0.81 km 2 , respectively. The average area of the landslide polygons is 0.07 km 2 . The widths of the landslides in the study region range from 45 to 981 m, while their lengths range from 60 to 1947 m.
However, models used to produce landslide susceptibility maps require both "landslide (or positive)" and "non-landslide (or negative)" samples of the study area. Although there is no accepted rule in the creation of these two subsets, most researchers use a ratio of 70:30 in LSM studies, particularly in the selection of "landslide" samples. In this approach, 70% of the randomly selected landslides on the inventory map are used for model training and the remaining 30% are used for model validation [31,32,44,65]. Huang and Zhao [36] emphasized that the positive and negative samples in the training and test datasets should be the same, i.e., a ratio of 1:1. Thus, researchers use the same number of negative and positive samples in their studies [39,[66][67][68][69]. The landslide inventory map was converted to a raster format with a cell size of 10 m × 10 m, and the total number of positive samples in the study area, i.e., the number of pixels with landslides, was determined to be 58,537. The same number of "non-landslide" pixels was randomly selected in the study area. The positive samples were provided a value of "1," and the negative samples were provided a value of "0." Moreover, 70% of the dataset comprising a total of 117,074 pixels was used for model training and the remaining 30% was used for model validation.

Landslide-Conditioning Factors
One of the important stages of LSM is the determination of topographical, geological, environmental, and anthropogenic factors affecting the occurrence of landslides in the study area [70]. Vakhshoori et al. [71] reported that there is no standard rule on the selection of the conditioning factors, and the scale of the study area, geoenvironmental conditions, landslide occurrence mechanism in the study area, and data availability influence the selection of factors. Considering these criteria, a total of 10 landslide-conditioning factors, such as lithology, land cover, slope, aspect, elevation, curvature, TWI, and distances from faults, drainage networks, and roads, were used in the study. The factor maps prepared in ESRI ArcGIS 10.5 were converted to raster format with a spatial resolution of 10 m.
In this study, a digital elevation model (DEM) with a resolution of 10 m × 10 m was derived from 1:25,000 scale topographic maps with a 10-m contour interval obtained from the General Directorate of Mapping. The elevation, slope angle, aspect, curvature, TWI, and distance from drainage network parameters were derived from this DEM.
The elevation is often used in LSM studies [23,34,39,72]. The elevation of the study ranges from 0 to 3370 m ( Figure 2a). The elevation is divided into 10 equal parts at 337-m intervals ( Table 1).
As reported by Zhang et al. [72], the slope angle is a key factor for slope stability and is one of the most commonly used parameters in LSM studies [73][74][75]. The average slope of the study area, where the slope ranges from 0 • to 76.95 • , is 28.38 • . The slope is divided into 10 classes at 5 • intervals ( Figure 2b).
The aspect at a point on the land surface is the direction that faces the tangent plane passing through that point and is expressed in angles (in degrees). Similar to elevation and slope, aspect is also widely used in LSM studies [42,45,48]. Bragagnolo et al. [74] reported that aspect has an indirect effect on slope stability depending on rainfall, wind, and solar radiation. The aspect map of the study area was derived from DEM and divided into nine classes ( Figure 2c).
Curvature, one of the factors associated with the occurrence of landslides, represents slope shape and terrain morphology [22,76]. Negative curvatures represent concave surfaces, zero curvatures represent flat surfaces, and positive curvatures represent convex surfaces. In this study, the curvature was derived from DEM using ArcGIS 10.5 software and divided into three subclasses, i.e., concave, flat, and convex ( Figure 2d).
TWI is defined as a theoretical measure of flow accumulation at any point in a basin, and hence, soil wetness [74]. In other words, TWI measures the degree of water accumulation in one region [77]. In the study area, the TWI ranges from 2.12 to 27.92 (Table 1). TWI is divided into nine subclasses using the natural breaks classification method and used in landslide susceptibility analysis ( Figure 3a).
1 Figure 2. Topographic factor maps a) elevation; b) slope; c) aspect; and d) curvature.   One of the most important parameters affecting the stability of the slopes is lithology. This is because lithological and structural variations often cause a difference in the strength and permeability of rocks and soils [80]. The lithological units and faults used in this study were obtained from the The proximity of the slopes to a drainage network is another important factor in terms of stability. This is because rivers disrupt the stability of slopes by eroding the toe of the slopes or saturating the section of the material constituting the slope under the stream level [78]. The drainage network of the study area was derived from DEM using the hydrological analysis module in ArcGIS 10.5 software. Drainage buffers were calculated using the Euclidean distance function in ArcGIS 10.5 and reclassified at 100-m intervals and divided into nine subclasses ( Table 1). The maximum distance to the drainage network was calculated to be 926.55 m (Figure 3b).
In regions with rugged topography, landslides occur on the slopes near the road. Road constructions, particularly on hillsides with a high slope, may reduce the load on the slope toe, form stress cracks behind the slope, and lead to the occurrence of landslides owing to the disrupted slope structure [21]. Therefore, in this study, the distance from roads was considered as a conditioning factor. The road network data of the study area were digitally provided by the Forest Information System of Artvin Regional Directorate of Forestry. This 1:25,000 scale dataset includes highways, village roads, and forest roads. In ArcGIS 10.5 software, the road buffer map was produced using the Euclidean distance function. The road buffers were divided into 10 categories at 200-m intervals to obtain a map of the proximity from roads (Figure 3c).
Land cover is an indirect indicator of slope stability. Areas covered with barren and sparse vegetation show more instability than forests. Moderately sloping agricultural lands are susceptible to landslides because of repetitive irrigation processes [79]. In this study, CORINE land cover (CLC 2018) data, which were provided by Copernicus Land Monitoring Service (CLMS), a European Union Earth Observation Programme service, updated in 2018, were used. According to this dataset, the study area includes 17 different classes of land cover (Figure 3d). Moreover, 62.74% of the study area is covered with forests, 16.08% with agricultural lands, 8.15% with natural grasslands, and 7.99% with sparsely vegetated areas ( Table 1).
One of the most important parameters affecting the stability of the slopes is lithology. This is because lithological and structural variations often cause a difference in the strength and permeability of rocks and soils [80]. The lithological units and faults used in this study were obtained from the 1:100,000 scaled geology map obtained from the GDMRE [59]. According to the geological map, there are 24 different lithological units in the study area ( Figure 4). The Liyas-aged Hamurkesen formation (Jh), which is considered the oldest unit in the study area, consists of basaltic, andesitic, dacitic lava and pyroclastics and sandstone, marn, limestone, and shale. Among volcanic units, Late Cretaceous-aged (Turonian-Conasian) Çatak formation (Kç) consists of basaltic, andesitic lava and pyroclastics, and intermediate levels of clay limestone, marn, sandstone, siltstone, and claystone. The stack of dacitic lava-tuffs in the study area is discriminated by the symbol "Kkdt." The Santonian-aged Kızılkaya formation (Kk) consists of dacitic, rhyodacitic lava and pyroclastics. The Campanian-Maastrichtian aged Çaglayan formation (Kça), the main lithological unit in the study area, comprises basaltic and andesitic lava and pyroclastics along with mudstone, sandstone, argillaceous limestone, marn, and tuff interlayers. In the study area, the red limestones with sand-clay-Globotruncana, which are observed in the areas where the Çaglayan formation outcrops, is shown by the symbol "Kçt," and the stack of clay tuff, tuff sandstone, sandy limestone and breccia is shown by the symbol "Kçat1." In the unit that outcrops in the study area and is called Hematite dacite (Kçbh), the dacite is brown, since it contains hematite. The unit was effectively clayed. These units include the Maastrichtian-Danian (Early Paleocene) aged Cankurtaran formation (KTc) consisting of sandy limestone, micritic limestone, tuff, marn, and volcanic sandstone. The intercalation of tuff, marn, and fine-grained agglomerate is discriminated by the symbol KTct2; the grey and red limestone is discriminated by the symbol KTck; and the intercalation of tuff, marn, limestone, volcanic sandstone, and agglomerate is discriminated by the symbol KTct3. Accordingly, volcanic intercalations are seen to be present within the unit. The Bakırköy formation (Tpeb), consisting of siltstone, claystone, sandstone, conglomerate, clayey limestone, and marn, is Paleocene-Early Eocene aged [59].
limestone, tuff, marn, and volcanic sandstone. The intercalation of tuff, marn, and fine-grained agglomerate is discriminated by the symbol KTct2; the grey and red limestone is discriminated by the symbol KTck; and the intercalation of tuff, marn, limestone, volcanic sandstone, and agglomerate is discriminated by the symbol KTct3. Accordingly, volcanic intercalations are seen to be present within the unit. The Bakırköy formation (Tpeb), consisting of siltstone, claystone, sandstone, conglomerate, clayey limestone, and marn, is Paleocene-Early Eocene aged [59]. The Cankurtaran and Bakırköy formations were cut by Kaçkar granitoid-I (Kk1) consisting of acidic and basic intrusive rocks that have continued to develop in the Late Cretaceous period and completed their intrusion at the end of Paleocene. In the study area, the superficial adamellite is discriminated by (Kk1a), granodiorite by (Kk1gd), and quartz diorite, diorite by (Kk1kd). In the study The Cankurtaran and Bakırköy formations were cut by Kaçkar granitoid-I (Kk1) consisting of acidic and basic intrusive rocks that have continued to develop in the Late Cretaceous period and completed their intrusion at the end of Paleocene. In the study area, the superficial adamellite is discriminated by (Kk1a), granodiorite by (Kk1gd), and quartz diorite, diorite by (Kk1kd). In the study area, the unit consisting of the Middle Eocene aged (lower level of the Middle Eocene) is called the Erenler formation (Tee), which consists of the mudstone, claystone, and sandstone intercalation. The volcanics, which are discriminated as basalt and andesite and rest conformably on this unit, are shown by "Tekba" (Figure 4). In this unit, there is the Kabaköy formation (Tek), a volcano-sedimentary stack consisting of conglomerate, sandy limestone, sandstone, marn, and tuff, and Middle Eocene aged andesitic, basaltic lava and pyroclastics. The first of the two rocks that cut this formation consists of Diorite-Quartz Diorite-Granodiorite-Gabbro, which is a black and dark green color, with a compact appearance, and is not fractured, is shown by the symbol "Tkd." The other is a blackish-dark green rock, which consists of diorite-quartz diorite-granodiorite-gabbro, and shown as Kaçkar Granitoid-II (Tk2). Dacitic and andesitic volcanites and volcano-clastic Taşpınar formation (Tet) rest conformably on the Eocene aged rocks. The Quaternary-aged alluvial sediments (Qal) are the youngest units in the study area [59].
Regmi et al. [82], Pourghasemi et al. [22], and Wang et al. [70] reported that landslides mainly occur along faults. In fact, areas close to the faults are highly susceptible to landslides because the rock around the faults is severely broken and the strength of the rock decreases owing to tectonic breaks [34]. The fault buffers were also produced using the Euclidean distance function. The fault buffers were reclassified by dividing them into 10 classes at equal intervals of 1000 m to obtain a map of the distance from the fault lines ( Figure 5).
Regmi et al. [82], Pourghasemi et al. [22], and Wang et al. [70] reported that landslides mainly occur along faults. In fact, areas close to the faults are highly susceptible to landslides because the rock around the faults is severely broken and the strength of the rock decreases owing to tectonic breaks [34]. The fault buffers were also produced using the Euclidean distance function. The fault buffers were reclassified by dividing them into 10 classes at equal intervals of 1000 m to obtain a map of the distance from the fault lines ( Figure 5).

Landslide Susceptibility Assessment Using Random Forest (RF) Model
RF classification, which was originally developed by Breiman [83], is a machine learning algorithm for nonparametric multivariate classification [84]. RF is a popular ensemble learning method that is widely used for classification and regression. Generally, a single DT individually exhibits weak prediction performance because of a high variance or bias [64,85]. Therefore, RF creates numerous DTs for classification [86]. RF can also be perceived as a group of random DTs. Because the performance of an ensemble model is better than that of individual models, individually created DTs are combined to form a decision forest, i.e., an RF. Each tree in the forest has independent and identical distribution (iid), and thus, they are relatively uncorrelated with each other. This property makes the RF, which is formed by the iid DTs, far from the overtraining risk [83]. Here, the DTs are a subset that is randomly selected from the corresponding dataset. The results obtained from all DTs are combined to obtain the result of the RF.
To establish a classification model in RF, two parameters should be defined: ntree parameter, corresponding to the number of DTs produced by RF, and mtry parameter, referring to the number of factors or variables used in the node of each DT. Taalab et al. [85] reported that there is no rule regarding the number of trees that should be established in RF, and increasing the number of trees will not increase the accuracy of the model. However, Chen et al. [38] emphasized that the number of variables to be used in DTs should equal the square root of the number of causal factors. RF algorithm was explained in detail by Breiman [83], Catani et al. [84], and Taalab et al. [85]. In this study, the "caret" package [87] is used in R 3.6.3 to employ the RF model. Herein, the number of trees (ntree) is set to 50 and the mtry parameter is set to 8 after several attempts. RF is performed using a 10-fold cross-validation approach to reduce the variability of the model results and to limit overfitting. Figure 6 shows the landslide susceptibility map produced using the RF model in the R program and divided into five classes (very low, low, moderate, high, and very high) using the natural breaks classification method. According to this map, 61.25%, 17.70%, 10.46%, 6.05%, and 4.54% of the study area exhibit a very low, low, moderate, high, and very high susceptibility to landslide, respectively. Although only 10.59% of the study area is highly and very highly susceptible to landslides, approximately 96% of the existing landslides were in these two classes ( Table 2).

Number of Pixels
Relative Area (%)

Number of Landslide Pixels
Relative Area (%) Figure 6. Landslide susceptibility map produced using the random forest model.
According to this map, 61.25%, 17.70%, 10.46%, 6.05%, and 4.54% of the study area exhibit a very low, low, moderate, high, and very high susceptibility to landslide, respectively. Although only 10.59% of the study area is highly and very highly susceptible to landslides, approximately 96% of the existing landslides were in these two classes (Table 2). The importance levels of the factors used in the study are shown in Figure 7. According to Figure 7, slope, elevation, lithology, distance from faults, and land cover are the most important factors in the landslide susceptibility analysis in the study area, while TWI and curvature are the least important factors. Lin et al. [88] and Pourghasemi and Rahmati [33] reported that the relative importance of conditioning factors depends on the characteristics of the study area. The importance levels of the factors used in the study are shown in Figure 7. According to Figure  7, slope, elevation, lithology, distance from faults, and land cover are the most important factors in the landslide susceptibility analysis in the study area, while TWI and curvature are the least important factors. Lin et al. [88] and Pourghasemi and Rahmati [33] reported that the relative importance of conditioning factors depends on the characteristics of the study area. The relative importance levels of the factors as well as the probability of landslides in each subclass of each factor were evaluated using the frequency ratio (FR) values provided in Table 1. Table 1 shows the relationship between the subclasses of conditioning factors in the study area and the occurrence of landslides. Similar to the relative importance values, the slope factor was one of the factors with the greatest FR value. In terms of the slope factor, 5°-10° (FR = 4.0083), 10°-15° (FR = 3.9109), 15°-20° (FR = 2.3812), and 20°-25° (FR = 1.2971) slope classes exhibited the greatest FR values and the probability of occurrence of landslides decreased as the slope increased above 25° (Table 1). Wang and Li [89], Vakhshoori et al. [71], and Nohani et al. [90] reported that the probability of the occurrence of landslides increased to some extent with increasing slope and then decreased.
Kavzoglu et al. [80] reported that the effect of elevation on landslide susceptibility is a controversial topic and has not yet been clarified by researchers. However, as shown in Figure 7, the elevation was the second most important factor in the landslide susceptibility analysis in the study area. In terms of elevation, the elevation classes in the range of 1348-1685 m (FR = 2.2415), 0-337 m (FR = 1.6976), 1685-2022 m (FR = 1.5135), and 1011-1348 m (FR = 1.1962) were found to be susceptible to landslide (Table 1). The results of this research revealed that the elevation class of 0-337 m was particularly susceptible to landslide because the residential and tea cultivation areas in the study site are located within this elevation class. The study conducted by Çan and Duman [91] reported that The relative importance levels of the factors as well as the probability of landslides in each subclass of each factor were evaluated using the frequency ratio (FR) values provided in Table 1. Table 1 shows the relationship between the subclasses of conditioning factors in the study area and the occurrence of landslides. Similar to the relative importance values, the slope factor was one of the factors with the greatest FR value. In terms of the slope factor, 5 • -10 • (FR  (Table 1). Wang and Li [89], Vakhshoori et al. [71], and Nohani et al. [90] reported that the probability of the occurrence of landslides increased to some extent with increasing slope and then decreased.
Kavzoglu et al. [80] reported that the effect of elevation on landslide susceptibility is a controversial topic and has not yet been clarified by researchers. However, as shown in Figure 7 Table 1). The results of this research revealed that the elevation class of 0-337 m was particularly susceptible to landslide because the residential and tea cultivation areas in the study site are located within this elevation class. The study conducted by Çan and Duman [91] reported that the landslides induced by heavy rainfall in Hopa district in Artvin province on August 24, 2015 showed distributions along the coastline, and 80% of the landslides occurred in zones with an elevation of less than 350 m.
Approximately 83% of the landslides in the study area occurred in three lithological units: Çaglayan formation (Kça), Kabaköy formation (Tek), and Cankurtaran formation (KTc). These three units cover approximately 58% of the study area (Table 1). Gökçe et al. [52] reported that the Cretaceous and Eocene volcanisms in the Eastern Black Sea formed the source areas for the occurrence of landslides, and the landslides were primarily flowing type in areas where the Eocene and Cretaceous volcanic rocks formed as a result of the Pontid volcanism in the Eastern Black Sea.
Approximately 77% of landslides in the study area occurred at a distance of 2000 m from faults. Areas near the fault lines have high FR values, while the probability of the occurrence of landslides decreases with increasing distance from faults (Table 1). Similarly, Althuwaynee et al. [92] stated that the probability of landslide occurrence decreases as the distance from faults increases. In terms of proximity to faults, Chen et al. [68] and Wang et al. [70] found that the <1000-m class exhibited the greatest probability of the occurrence of a landslide, which decreased as the distance from faults increased.
When Table 1 is examined, it can be seen that 31.6% of landslides in the study area occurred in agricultural areas, while 61.01% occurred in forested areas. In terms of land cover, the classes with 131 (mineral extraction sites), 122 (road and rail networks and associated land), 243 (land principally occupied by agriculture, with significant areas of natural vegetation), 222 (fruit trees and berry plantations), 121 (industrial or commercial units), and 311 (broad-leaved forest) CORINE 2018 land cover codes exhibited the greatest probability of the occurrence of a landslide. In a study in which the landslide susceptibility of Arhavi and its surroundings was evaluated by Aksoy [2], it was determined that the landslides in this region generally occurred in tea plantation areas where the slope was relatively low and the decomposition depth was not considerably high. The study by Bulut et al. [93] investigated the causes of landslides occurring in the eastern parts of the Fındıklı district in the Rize province (which is located in the west of the Arhavi district and has similar topographical, climatic, and environmental factors as Arhavi), and reported that the landslides mostly occurred at hillsides with a slope ranging from 10 • to 25 • . The study also stated that the fact that 77% of landslides occurred in residential areas and tea cultivation areas was important owing to the effect of altered vegetation on the occurrence of a landslide. In a study conducted by Erener et al. [64] in theŞavşat district in the Artvin province in Turkey, the landslide activity was reported to increase in areas where the original vegetation was removed or altered.
Conversely, approximately 68% of the current landslides were found to have occurred at slopes with north, northwest, and northeast aspects. The slopes with these three aspects were found to be at a higher risk of landslides (Table 1). It is assumed that the general rainfall direction of the region and the general morphological structure of the study site play a significant role in the fact that landslides mostly occurred in the north aspect. In a study conducted by Dag and Bulut [58] in the Çayeli district in the Rize province in Turkey, most landslides in the study area occurred on the slopes with a north-northeast aspect. In the study by Bahrami et al. [94] in Sarv-Abad (Iran), the frequency rates of landslides with north, northeast, and northwest aspects were found to be higher than other directions.
Approximately 66% of landslides in the study area occurred at 100 m distance from drainage networks, and 70% of them at 200 m distance from roads. As distances from drainage networks and roads increase, the likelihood of landslides decreased (Table 1). These results have shown that uncontrolled road excavations in the region and carving of slope heels in areas close to drainage networks have been effective on landslides. Similarly, Yılmaz [95], in his study on the general characteristics and causes of landslides occurring in the Eastern Black Sea Region, found that two of the main factors affecting landslides in the region were associated with excavations made by locals for road and home constructions as well as erosion along the stream side.
The curvature results show that 51.90% of landslides occur on concave slopes and 46.31% on convex slopes (Table 1). Concave areas are seen to have the highest FR (1.0884) value. It was determined that based on TWI, the landslide occurrence probability has gradually increased up to the 9.50-11.33 subclass, reaching the maximum value in the 9.50-11.33 subclass (FR = 1.7555), which then gradually decreased (Table 1).

Model Performance and Validation
Validation is an important step in assessing the quality of the landslide susceptibility model. The quality or performance of the RF model used in the study was evaluated using the success and prediction rate curves based on the receiver operating characteristic (ROC)/area under the curve (AUC). Seventy percent of the dataset, consisting of landslide (positive) and non-landslide (negative) pixels, was used for model training, and the remaining 30% was used for model validation. The success rate curve was produced using examples from the training dataset ( Figure 8a). Pourghasemi et al. [12] noted that the success rate curve may help to determine how well landslide susceptibility maps classify existing landslide areas. In this study, the AUC value for the success rate curve was found to be 98.3%. However, several researchers have stated that there is no established method to evaluate the prediction capability of the model as the success rate curve is generated from the training dataset [12,66,92]. To explain how well landslide models and landslide-conditioning factors predict landslides, the prediction rate curve produced using the verification dataset should be used [12,66,92,96,97]. The AUC value for the prediction rate curve generated using the verification dataset was found to be 97.7% (Figure 8b). pixels, was used for model training, and the remaining 30% was used for model validation. The success rate curve was produced using examples from the training dataset (Figure 8a). Pourghasemi et al. [12] noted that the success rate curve may help to determine how well landslide susceptibility maps classify existing landslide areas. In this study, the AUC value for the success rate curve was found to be 98.3%. However, several researchers have stated that there is no established method to evaluate the prediction capability of the model as the success rate curve is generated from the training dataset [12,66,92]. To explain how well landslide models and landslide-conditioning factors predict landslides, the prediction rate curve produced using the verification dataset should be used [12,66,92,96,97]. The AUC value for the prediction rate curve generated using the verification dataset was found to be 97.7% (Figure 8b). Lin et al. [88] reported that the AUC values greater than 0.7 for the ROC curves are generally considered good, whereas values greater than 0.9 indicate excellent model compatibility. According to the success and prediction rate values, the model used in this study showed excellent performance in predicting landslide susceptibility in the study area. In the LSM study conducted by Dou et al. [40], using DT and RF models in Izu-Oshima Volcanic Island of Japan, the authors found that the RF model with 95.6% AUC exhibited better performance than the DT model. In the landslide susceptibility study conducted by Achour and Pourghasemi [45] using the RF, SVM, and boosted regression tree models, researchers found that the RF model with 97.2% AUC exhibited the highest prediction accuracy compared with other models.

Conclusions
In this study, LSM of the Arhavi, Hopa, and Kemalpaşa districts in the Artvin province was prepared using the RF machine learning technique and various factors, such as lithology, land cover, slope, aspect, elevation, curvature, TWI, and distances from faults, drainage networks, and roads. Lin et al. [88] reported that the AUC values greater than 0.7 for the ROC curves are generally considered good, whereas values greater than 0.9 indicate excellent model compatibility. According to the success and prediction rate values, the model used in this study showed excellent performance in predicting landslide susceptibility in the study area. In the LSM study conducted by Dou et al. [40], using DT and RF models in Izu-Oshima Volcanic Island of Japan, the authors found that the RF model with 95.6% AUC exhibited better performance than the DT model. In the landslide susceptibility study conducted by Achour and Pourghasemi [45] using the RF, SVM, and boosted regression tree models, researchers found that the RF model with 97.2% AUC exhibited the highest prediction accuracy compared with other models.

Conclusions
In this study, LSM of the Arhavi, Hopa, and Kemalpaşa districts in the Artvin province was prepared using the RF machine learning technique and various factors, such as lithology, land cover, slope, aspect, elevation, curvature, TWI, and distances from faults, drainage networks, and roads. The landslide susceptibility map was divided into five susceptibility classes: very low, low, moderate, high, and very high. The results show that approximately 96% of the landslides were observed in high and very high susceptibility classes. The landslide susceptibility map was validated using the success and prediction rate curves. The area below the prediction rate curve was found to be 97.7%. This value reveals the accuracy of the generated landslide susceptibility map. It was found that the most important factors in the occurrence of landslides in the study area were slope, elevation, lithology, distance from faults, and land cover. The majority of the tea and hazelnut cultivation areas in the region were obtained by transforming forests into agricultural lands. Therefore, incorrect land-use decisions, such as the removal of natural vegetation at hillsides with a high slope, transformation of these areas into tea and hazelnut cultivation areas, and building and road construction works performed by tea and hazelnut producers at these regions, lay the foundation for landslides, which are also triggered by heavy rainfall. Consequently, necessary measures should be taken to prevent landslide in the tea and hazelnut agricultural areas, and the establishment of new agricultural areas should not be allowed. Furthermore, in areas susceptible to landslides, forestation should be performed with deep-rooted forest trees and any human activities creating any additional burden on hillsides with a high slope should not be allowed.