A Robust Deep-Learning Model for Landslide Susceptibility Mapping: A Case Study of Kurdistan Province, Iran

We mapped landslide susceptibility in Kamyaran city of Kurdistan Province, Iran, using a robust deep-learning (DL) model that combines an extreme learning machine (ELM), a deep belief network (DBN), back propagation (BP), and a genetic algorithm (GA). A total of 118 landslide locations were recorded and divided into training and testing datasets. We selected 25 conditioning factors and, of these, identified the most important ones using the information gain ratio (IGR) technique. We assessed the performance of the DL model using statistical measures including sensitivity, specificity, accuracy, F1-measure, and area under the receiver operating characteristic curve (AUC). Three benchmark algorithms, i.e., support vector machine (SVM), REPTree, and NBTree, were used to check the applicability of the proposed model. The IGR results showed that of the 25 conditioning factors, only 16 were important for our modeling procedure, and of these, distance to road, road density, lithology, and land use were the four most significant. Results based on the testing dataset revealed that the DL model had the highest accuracy (0.926) of the compared algorithms, followed by NBTree (0.917), REPTree (0.903), and SVM (0.894). The landslide susceptibility map prepared from the DL model, with AUC = 0.870, performed the best. We consider the DL model a suitable tool for landslide susceptibility mapping.


Introduction
Landslides occur in a variety of materials and undergo various styles of movement at different rates [1]. Landslides play an important geomorphological role in the evolution of landscapes, impacting the natural (soils, ecosystems, aquatic habitat, etc.) and built (residential areas, roads, pipelines, etc.) environment [2,3]. Landslide hazards are often exacerbated by land use practices such as road building, and deforestation, and may be made worse by increases in precipitation [4]. Therefore, it is important to identify areas that have a high potential for landslides and mitigate landslide damage.

Description of the Study Area
Our study area, around Kamyaran city, is a mountainous area of nearly 150 km² in the southwest of the Kurdistan Province, Iran (Figure 1). The elevation ranges from 850 to 2328 m; the mean annual temperature varies between 11.3 °C and 17 °C, and the mean annual precipitation is 528 mm. Geologically, the study area is in the Sirvan drainage basin, located in the Sanandaj-Sirjan and Zagros structural zones. Bedrock lithologies include outcrops of Cretaceous to Quaternary rocks, the oldest of which include micritic limestone and dark gray shale. Most of the study area is covered by Mesozoic and Cretaceous formations, which include basaltic pillow lava and dark grey shale with intercalations of volcanic rocks. Holocene (Quaternary) sediments include alluvial fans and alluvial terraces. The predominant land covers in the study area are semi-dense forests and dry farming. In addition, dense pasture, semi-dense pasture, low-dense forest, and woodlands are other types of land cover/land use in the study area. The area is significantly prone to landslides associated with road developments.

Landslide Inventory Map
It is necessary to prepare a landslide distribution map for landslide modeling because the assumption of the modeling process is that future landslides occur under the same conditions as in the past [42]. That "the past and the present are key to the future" is one of the most important principles in earth science. This means landslides that have occurred in the past and present under specific topographic, geological, hydrogeological, and climatic conditions in an area can provide useful information to predict the potential for future landslides in that area [43]. A map showing such information is useful for studying the spatial relationships between landslide distribution and the factors affecting landslide occurrence [44]. Galli et al. [45] have noted that the quality of a landslide inventory map determines whether landslide modeling yields reasonable results. From a total of 118 landslide points detected in the study area, 94 points (~80%) were used as the training dataset, and 24 points (~20%) were used as the validation dataset. The 118 landslide locations used in this study were a subset of the 175 landslide locations of Asadi et al. [41].
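The 80/20 split described above can be sketched with numpy as follows; the landslide coordinates here are synthetic stand-ins for the real inventory, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in inventory: 118 landslide points as (x, y) coordinates
landslides = rng.uniform(0, 150, size=(118, 2))

# Shuffle, then take ~80% for training and ~20% for validation
idx = rng.permutation(len(landslides))
n_train = round(0.8 * len(landslides))      # 94 of 118 points
train_pts = landslides[idx[:n_train]]
test_pts = landslides[idx[n_train:]]        # remaining 24 points
```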

Landslide Conditioning Factors
The selection of the factors affecting the occurrence of landslides is one of the most important steps in landslide susceptibility studies [46]. In this study, we selected 25 conditioning factors: slope angle, aspect, elevation above sea level, curvature, profile curvature, plan curvature, solar radiation, valley depth (VD), terrain ruggedness index (TRI), vector ruggedness measure (VRM), stream power index (SPI), topographic wetness index (TWI), length slope (LS), topographic position index (TPI), land use, normalized difference vegetation index (NDVI), lithology, soil, distance to fault, distance to river, distance to road, fault density, river density, road density, and rainfall (Figure 2). Some of the conditioning factors for this study area were published earlier by Asadi et al. (2022).

Slope Angle
Landslide hazard is often linked to slope angle, with shear stresses increasing on steeper slopes [47]. The supply of soil available for sliding often thins dramatically on slopes steeper than 25 degrees [48]. In other words, on steep slopes the material is more often rock and outcrop, such that medium slopes are more prone to landslides. In the present study, this layer was extracted from the digital elevation model (DEM) and classified into five intervals: 0-13, 14-22, 23-30, 31-42, and >43 (Figure 2a).

Figure 2. Landslide conditioning factors used in this study: (a) slope angle, (b) aspect, (c) elevation, (d) curvature, (e) plan curvature, (f) profile curvature, (g) solar radiation, (h) VRM, (i) VD, (j) SPI, (k) TWI, (l) TRI, (m) TPI, (n) LS, (o) land use, (p) NDVI, (q) rainfall, (r) distance to fault, (s) distance to road, (t) distance to river, (u) fault density, (v) road density, (w) river density, (x) lithology, and (y) soil texture.

Curvature
Curvature maps show the extent to which a surface deviates from flatness, or in other words, the convexity and concavity of the slope [52]. The curvature of a slope represents the shape of the topography: positive values represent convex surfaces (noses), negative values indicate concave surfaces (coves, hollows), and zero indicates a flat, straight surface with no curvature. These three slope shapes have a great effect on slope instability by controlling the concentration and diffusion of surface and subsurface water on the slopes [53]. The curvature map was extracted from the DEM of the region in the GIS environment and classified into five classes: (1) highly concave (−51.

Profile Curvature
Profile curvature is an important factor affecting stresses acting along the slope; it indicates the intensity of water flow and of transportation and deposition processes [54]. Positive values of profile curvature indicate concavity (decreasing flow velocity) and negative values indicate convexity (increasing flow velocity) [55]. Profile curvature was extracted from the DEM and classified into five categories:

Solar Radiation
The average amount of solar radiation received per pixel over a year is called the solar radiation intensity, which is expressed in kilowatt-hours per square meter [56]. The importance of this index is that larger values indicate more evaporation from the soil surface in an area. This index also controls the amount of vegetation on a slope: the less solar radiation that reaches a slope, the more vegetation appears on it, and as a result the slope becomes more stable [57,58]. In the present study, the solar radiation layer was extracted from the DEM in ArcGIS and categorized into five classes: (1) 380,000-430,000, (2) 440,000-540,000, (3) 550,000-630,000, (4) 640,000-700,000, and (5) 710,000-810,000 (Figure 2g).

Vector Ruggedness Measure
The vector ruggedness measure (VRM) factor was suggested by Hobson et al. [59]. It measures terrain ruggedness as the variation in the three-dimensional orientation of grid cells within a neighborhood: slope and aspect are captured in a single measure, decoupling terrain ruggedness from slope or elevation alone [60]. The VRM map was created from the DEM in the SAGA GIS software environment and then divided into five classes:

Stream Power Index
Stream power index (SPI) is a criterion derived from the DEM that might affect landslide occurrence; it reflects the erosive power of slope surface run-off [61,62]. It can be formulated as follows:

SPI = A_s × tan β    (1)

where A_s is the specific catchment area and tan β represents the slope angle. In this study, it was prepared based on the DEM in the SAGA GIS software and then exported to ArcGIS software for mapping. The SPI layer was then classified into five intervals: (1) 0-1510, (2) 1520-1600,
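As a quick numerical illustration of Equation (1), SPI can be computed per pixel from two rasters; the specific-area and slope values below are made up for the example:

```python
import numpy as np

# Hypothetical rasters: specific catchment area (m^2/m) and slope (degrees)
specific_area = np.array([[120.0, 300.0], [50.0, 800.0]])
slope_deg = np.array([[10.0, 25.0], [5.0, 35.0]])

# SPI = A_s * tan(beta), evaluated per pixel
spi = specific_area * np.tan(np.radians(slope_deg))
```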

Topographic Wetness Index
Topographic wetness index (TWI) represents a theoretical measure of flow accumulation at any point in a watershed or region and is used to describe the spatial pattern of soil moisture [63]. This index generally reflects topographic control over hydrological processes, and high values are typically observed in landslide bodies. The TWI can be formulated as follows:

TWI = ln(A_s / tan β)    (2)

where A_s is the cumulative upstream drainage area at a point and tan β is the slope angle at that point. The TWI was prepared in five classes: (1)
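Equation (2) can likewise be evaluated per pixel; the small epsilon added to the slope term is a common practical workaround for flat cells, and the raster values are illustrative:

```python
import numpy as np

# Hypothetical rasters: specific catchment area and slope in degrees
specific_area = np.array([[120.0, 300.0], [50.0, 800.0]])
slope_deg = np.array([[10.0, 25.0], [0.0, 35.0]])

# TWI = ln(A_s / tan(beta)); epsilon avoids division by zero on flat cells
tan_b = np.tan(np.radians(slope_deg)) + 1e-6
twi = np.log(specific_area / tan_b)
```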

Terrain Ruggedness Index
Terrain ruggedness index (TRI) was introduced by Riley [64]; it is the difference in elevation between a pixel and the eight pixels around it. Equation (3) is used to calculate this index, where p is the number of pixels in the region and ZMD is the average elevation difference between each pixel and the eight pixels around it. The TRI map was prepared in five classes: (1) 0-2.64, (2) 2.65-4.75,

Topographic Position Index
Topographic position index (TPI) compares the elevation of each pixel in the digital elevation model with that of the pixels around it [65]. To calculate TPI (Equation (4)), the elevation of each cell in the digital elevation model is compared with the average elevation of the neighboring cells, and this neighborhood average is subtracted from the value at the center. Areas higher than the surrounding points (hills) are indicated by positive TPI values; negative TPI values denote areas lower than their surroundings (valleys). Zero and near-zero values illustrate flat areas (where the slope is close to zero) or areas with a constant slope [66].
TPI = Z_0 − (1/n) Σ Z_n    (4)

where Z_0 is the elevation of the point under evaluation, Z_n is the elevation of surrounding grid point n, and n is the total number of surrounding points considered in the evaluation. We prepared TPI in five classes: (1)
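Equation (4) with the usual 3 × 3 neighborhood can be sketched directly; the tiny DEM below is a made-up ridge used only to show the sign convention:

```python
import numpy as np

def tpi(dem):
    """TPI = Z0 - mean of the 8 neighbours, for interior cells of a DEM."""
    out = np.zeros_like(dem, dtype=float)
    for i in range(1, dem.shape[0] - 1):
        for j in range(1, dem.shape[1] - 1):
            window = dem[i - 1:i + 2, j - 1:j + 2]
            neighbour_mean = (window.sum() - dem[i, j]) / 8.0
            out[i, j] = dem[i, j] - neighbour_mean
    return out

# A small peak: the centre cell stands above its surroundings
dem = np.array([[1.0, 1.0, 1.0],
                [1.0, 5.0, 1.0],
                [1.0, 1.0, 1.0]])
center_tpi = tpi(dem)[1, 1]   # positive: higher than the surroundings
```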

Slope Length
The slope length (LS) factor, a combination of the slope angle and the length of the slope, is a fundamental factor in the study of landslides because it reflects the sediment transport capacity of surface flow. Carrara [67] stated that there is a relationship between landslide density and slope length; therefore, this factor is examined in this study [67]. Mathematically, this factor is expressed as:

LS = (A_s / 22.13)^0.6 × (sin β / 0.0896)^1.3    (5)

where A_s is the specific catchment area and β is the local slope gradient in degrees. This index was prepared based on the DEM in the SAGA GIS software and, after export to the GIS environment, was classified into five classes: (1) 0-6.88, (2) 6.89-13.1, (3) 13.2-19.6, (4) 19.7-28.2, and (5) 28.3-87.8 (Figure 2n).
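A one-function sketch of the LS formulation above (the Moore and Burch form, which matches the variable definitions in the text; the input values are hypothetical):

```python
import math

def ls_factor(specific_area, slope_deg):
    """LS factor from specific catchment area and slope angle in degrees."""
    beta = math.radians(slope_deg)
    return (specific_area / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3

# Example pixel: A_s = 100 m^2/m, slope = 20 degrees
ls = ls_factor(specific_area=100.0, slope_deg=20.0)
```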

Land Use/Land Cover
Land use is one of the important indicators in the instability of slopes, and it affects the characteristics of the land and changes its behavior [53]. In this study, the land use layer was prepared and extracted from an Iranian land use map. Land use/cover classes in the current research are dry-farming, semi-dense forest, low-dense forest, semi-dense pasture, dense pasture, and woodland ( Figure 2o).

Normalized Difference Vegetation Index
The normalized difference vegetation index (NDVI) reflects the ability to detect vegetation growth and cover in an area [68,69]. It is obtained from the reflectance values of the red band (Red) of the visible spectrum (0.6-0.7 µm) and the near-infrared band (NIR) (0.7-1.1 µm). Equation (6) is used to calculate this index:

NDVI = (NIR − Red) / (NIR + Red)    (6)

The minimum and maximum values of this index are −1 and +1, respectively. The NDVI map was produced in five classes:
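Equation (6) and a five-class reclassification can be sketched as follows; the reflectance values and class breaks are assumptions for illustration, as the paper's class boundaries are not given here:

```python
import numpy as np

# Hypothetical red and near-infrared reflectance rasters
red = np.array([[0.10, 0.30], [0.05, 0.25]])
nir = np.array([[0.50, 0.35], [0.45, 0.20]])

# NDVI = (NIR - Red) / (NIR + Red), bounded in [-1, 1]
ndvi = (nir - red) / (nir + red)

# Reclassify into five classes with assumed equal-width breaks
classes = np.digitize(ndvi, bins=[-0.6, -0.2, 0.2, 0.6]) + 1
```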

Distance to Fault
Large-scale structures such as faults and thrusts can influence the distribution of landslides [71]. In this study, distance to fault was calculated with the "Euclidean Distance" tool in ArcGIS software as the distance from each pixel of the study area to the nearest fault. Based on these results, buffers were constructed around the faults at 100-m intervals, and this map was extracted into five classes: (1) 0-100, (2) 101-200, (3) 201-300, (4) 301-400, and (5) >400 (Figure 2r).
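A brute-force numpy analogue of the "Euclidean Distance" operation (distance from every pixel to the nearest fault pixel); the 5 × 5 grid and the fault position are made up for the example:

```python
import numpy as np

# Boolean raster marking fault pixels: a fault running across row 2
fault_mask = np.zeros((5, 5), dtype=bool)
fault_mask[2, :] = True

ys, xs = np.nonzero(fault_mask)
fault_cells = np.stack([ys, xs], axis=1)

rows, cols = np.indices(fault_mask.shape)
pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)

# Pairwise pixel-to-fault distances, then the minimum per pixel
d = np.sqrt(((pixels[:, None, :] - fault_cells[None, :, :]) ** 2).sum(-1))
dist = d.min(axis=1).reshape(fault_mask.shape)
```

Dedicated GIS tools use a faster distance transform, but the result is the same per-pixel nearest-feature distance.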

Distance to Roads
Both cut-and-fill slopes and improper road drainage structures associated with road construction can contribute to slope instability [72]. In this study, distance to road was calculated with the "Euclidean Distance" tool in ArcGIS software as the distance from each pixel of the study area to the nearest road. Distance to roads was mapped in five categories: (1) 0-100, (2)

Distance to Rivers
Another conditioning factor that directly impacts landslide susceptibility is distance to river. Flowing water is one of the factors increasing the potential for instability in slopes, playing an effective role in mass movements. Distance to river was calculated with the "Euclidean Distance" tool in ArcGIS software as the distance in meters from each pixel of the study area to the nearest stream line. The map was created with five classes: (1) 0-100, (2)

Fault Density
Slope instabilities are more likely to occur in areas where the number of faults is high, particularly when the faults are active [73]. Fault density is the ratio of the total length of faults in a given watershed or area to the total area of the watershed or the area surrounding those faults [74]. The higher the density of faults in an area, the greater the fracturing of the rocks and the reduction in the shear strength of the rocks and slope materials due to weathering; as a result, the risk of slope instability and landslides increases [75]. Fault density was extracted in five classes: (1)

Road Density
Road density is the ratio of the total length of roads in a given watershed or a given area to the total area of the watershed or the area surrounding those roads [76]. Although the quality of roads and drainage control are important, road density can also influence landslide occurrence [77]. Road density was calculated using the "Line density" tool in the ArcGIS software for modeling, and the factor was classified into five classes:

River Density
Another influence controlling landslides is river density [78]. River density is the ratio of the total length of rivers in a given watershed or area to the total area of the watershed or area containing those rivers [79]. We used the "Line density" tool in the ArcGIS software to extract five classes of river density: (1) 0-0.5551, (2) 0.5551-1.4608, (3) 1.4608-2.4542, (4) 2.4542-3.7983, and (5) 3.7983-7.4505 (Figure 2w).

Lithology
Lithology often strongly influences slope stability [80], in part due to the variable strength characteristics of certain bedrock types [81]. Therefore, to determine the susceptibility of various lithological formations to producing landslides, we extracted the lithological units of the study area from the Kamyaran geology sheet at a scale of 1:100,000. The lithological units in the study area were divided into 10 classes (Figure 2x).

Soil Texture
Landslides that involve soils are influenced by the type of soil in which they occur [82]. Soil texture influences properties such as permeability and cohesion, which can influence the style of movement [83]. Primarily, landslides change soil features by exposing parent material (the C horizon) through the removal of organic mats and the A horizon [84]. Changes in soil texture occur when a landslide moves or deposits various materials at a specific location [85]. Twenty soil samples were collected from different lithological units in the study area to determine soil texture using the hydrometric method. We used the soil texture triangle to classify textural groups. The soil map was created in five classes: (1) silty loam, (2) clay loam, (3) loam, (4) sandy loam, and (5) silty clay (Figure 2y).

Figure 3 shows the workflow of our study. In step 1, we collect and interpret landslide-conditioning factors. In step 2, we divide landslide locations into the training and the validation datasets. In step 3, we conduct landslide modeling using the DL (deep learning) model and the three benchmark models (SVM, NBTree, and REPTree). In the DL model, we computed the landslide susceptibility index (LSI) for each pixel of the study area in five steps: (i) constructing the DBN using RBMs as pretraining on the dataset; (ii) parameter tuning in the ELM to obtain the weight matrix of the last restricted Boltzmann machine (RBM); (iii) fine-tuning the training of the whole network by BP; (iv) optimizing the weights obtained from the network by the genetic algorithm (GA); and (v) assigning the optimum weights to the pixels of the study area to map landslide susceptibility. In step 4, we generate the landslide susceptibility maps using the outcomes of step 3. Finally, we compare and validate the performance of the models using a suite of statistical measures.

Deep Belief Network
One of the most common techniques for training deep neural networks is unsupervised pretraining, which initializes the network using only unlabeled data. Such initialization has been shown to provide a good starting point for subsequent supervised fine-tuning and greatly reduces the risk of becoming trapped in a local minimum, according to Kustikova and Druzhkov [86]. One method used to train deep networks is the deep belief network (DBN). The DBN [87,88] has become a popular approach in machine learning due to advantages such as fast inference and the ability to encode richer and higher-order network structures. A DBN has a hierarchical structure of several restricted Boltzmann machines (RBMs) and operates through a layer-wise learning process [89]. A deep belief network with two restricted Boltzmann machines for a problem with n inputs and one output is shown in Figure 4. With pretraining, the deep belief network training process usually includes the following steps [90]:
Step 1: Pretraining: sequential, greedy training of the learning modules, one layer at a time, using unsupervised data;
Step 2: First fine-tuning: use random weights for the last layer (matrix W3 in Figure 4);
Step 3: Second fine-tuning: use back propagation to fine-tune the entire network using supervised data.
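The greedy, layer-wise pretraining in Step 1 can be sketched in numpy as a stack of binary RBMs trained with one-step contrastive divergence (CD-1); the layer sizes, learning rate, and random data below are arbitrary illustrations, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def rbm_cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One CD-1 (contrastive divergence) update for a binary RBM."""
    # Positive phase: hidden activations driven by the data
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to visibles, then hiddens
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Gradient approximation: difference of correlations
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

# Greedy layer-wise pretraining of a 16-8-4 stack on random binary data
X = (rng.random((100, 16)) > 0.5).astype(float)
layer_sizes = [16, 8, 4]
inp = X
for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.normal(0, 0.01, (n_vis, n_hid))
    b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(50):
        W, b_vis, b_hid = rbm_cd1_step(inp, W, b_vis, b_hid)
    inp = sigmoid(inp @ W + b_hid)   # propagate activations to the next RBM
```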

Extreme Learning Machine
The extreme learning machine (ELM) [91] was first proposed by Huang in 2004 for single hidden-layer feedforward neural networks (SLFNs), with the aim of avoiding the cost of iterative error back-propagation during training; it was later extended to generalized SLFNs in which the hidden-layer nodes need not be alike. Over the past decade, the extreme learning machine has been extensively studied due to its high efficiency, effectiveness, and easy implementation [92]. The ELM has the advantages of a fast learning rate and high generalizability [93]. In the ELM, the hidden layer does not need to be tuned; that is, the connection weights from the input layer to the hidden layer, as well as the hidden biases, are generated randomly without further adjustment. The efficient least-squares method is used to compute the connection weights from the hidden layer to the output layer [94].
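A minimal numpy sketch of the ELM idea: random, untuned input weights, with the hidden-to-output weights solved in one step by least squares via the Moore-Penrose pseudoinverse. The toy regression target is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def elm_fit(X, T, n_hidden=64):
    """Train an ELM: random input weights, least-squares output weights."""
    W_in = rng.normal(size=(X.shape[1], n_hidden))   # random, never tuned
    b = rng.normal(size=n_hidden)
    H = sigmoid(X @ W_in + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ T                     # least-squares solution
    return W_in, b, beta

def elm_predict(X, W_in, b, beta):
    return sigmoid(X @ W_in + b) @ beta

# Toy regression: learn y = sin(x) on [0, pi]
X = np.linspace(0, np.pi, 200).reshape(-1, 1)
T = np.sin(X)
W_in, b, beta = elm_fit(X, T)
pred = elm_predict(X, W_in, b, beta)
rmse = np.sqrt(np.mean((pred - T) ** 2))
```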

Structure of the Deep-Learning Model
In the proposed model, the network is trained using the deep belief network training process described in the DBN section; the difference is that in the first fine-tuning step, the ELM is applied to learn the weights between the last hidden layer and the output layer (W3 in Figure 4). The optimal network structure is also derived by the GA. The steps of the genetic algorithm are as follows: Step 1: Chromosome coding and population initialization. Chromosomes are encoded directly as positive integers (up to a predetermined population size N). The number of genes on each chromosome indicates the number of hidden layers of the deep network, and the value of each gene indicates the number of neurons in that layer. Chromosome genes are initialized randomly.
Step 2: Assessment. Each chromosome is trained by the proposed hybrid model using the training data. Then, the classification accuracy is calculated and taken as the fitness value of that chromosome.
Step 3: Selection. The well-known roulette-wheel selection mechanism is used to choose the parents for crossover and mutation.
Step 4: Crossover. To search the problem space, the one-point crossover operator, one of the most common crossover operators in the literature, is used.
Step 5: Mutation. The mutation operator produces a new chromosome by randomly selecting a gene/layer and decreasing or increasing its value. The purpose of this operator is to prevent the algorithm from being trapped in a local optimum by exploring new regions of the solution space.
Step 6: Selection of survivors. After ranking the chromosomes of the current population and those resulting from crossover and mutation by their fitness values, the top N chromosomes are selected as survivors.
Step 7: Stopping criterion. When the number of generations reaches the predetermined value, the algorithm stops and the best chromosome is returned as the answer; otherwise, it returns to Step 3. The flowchart of the deep-learning model used in this study is shown in Figure 5.
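The seven GA steps above can be sketched as follows. The `fitness` function is a placeholder standing in for training the hybrid model and measuring its classification accuracy, and all hyperparameters are illustrative:

```python
import random

random.seed(7)

# Hypothetical fitness: stands in for "train the hybrid DBN-ELM with this
# layer-size chromosome and return its accuracy" (here: peak at all-32)
def fitness(chrom):
    return 1.0 / (1.0 + sum((g - 32) ** 2 for g in chrom))

def roulette(pop, fits):
    """Roulette-wheel selection: pick proportionally to fitness."""
    pick, acc = random.uniform(0, sum(fits)), 0.0
    for c, f in zip(pop, fits):
        acc += f
        if acc >= pick:
            return c
    return pop[-1]

def one_point_crossover(a, b):
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def mutate(chrom, step=4):
    """Randomly nudge one gene (layer size) up or down."""
    c = chrom[:]
    i = random.randrange(len(c))
    c[i] = max(1, c[i] + random.choice([-step, step]))
    return c

N, n_layers, generations = 20, 3, 40
pop = [[random.randint(1, 64) for _ in range(n_layers)] for _ in range(N)]
for _ in range(generations):
    fits = [fitness(c) for c in pop]
    children = [mutate(one_point_crossover(roulette(pop, fits),
                                           roulette(pop, fits)))
                for _ in range(N)]
    # Survivor selection: keep the best N of parents plus children
    pop = sorted(pop + children, key=fitness, reverse=True)[:N]
best = max(pop, key=fitness)
```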

Support Vector Machine
The support vector machine (SVM) algorithm is based on statistical learning theory and uses the structural risk minimization principle, leading to a globally optimal solution [95,96]. In recent years, this algorithm has attracted much attention due to its good classification performance and generalizability. The SVM involves two operations: (i) nonlinear mapping of an input vector into a high-dimensional feature space that is hidden from both the input and the output, and (ii) construction of an optimal hyperplane to separate the features. The model can be described as follows. Consider training vectors x_i (i = 1, 2, ..., n) belonging to two classes y_i = ±1; the aim is to find a separating hyperplane with the maximum margin by minimizing:

(1/2) ||W||²    (8)

subject to the constraints

y_i (W · x_i + b) ≥ 1

where ||W|| is the norm of the normal of the hyperplane, (·) denotes the inner product, and b is a scalar bias.
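As an illustration of the optimization above, a linear SVM can be trained by sub-gradient descent on the regularized hinge loss (a soft-margin approximation of the constrained problem, not the paper's implementation; the two-cluster data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise (lam/2)||w||^2 + mean hinge loss by sub-gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                     # points violating the margin
        grad_w = lam * w - (X[mask] * y[mask, None]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Linearly separable toy data: class +1 near (2, 2), class -1 near (-2, -2)
X = np.vstack([rng.normal(2, 0.5, (30, 2)), rng.normal(-2, 0.5, (30, 2))])
y = np.hstack([np.ones(30), -np.ones(30)])
w, b = train_linear_svm(X, y)
acc = (np.sign(X @ w + b) == y).mean()
```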

REPTree
The reduced error-pruning tree (REPTree) is a fast decision-tree learning method that combines two algorithms in a hybrid approach: reduced-error pruning (REP) and the decision tree (DT) [97]. The method addresses both classification and regression problems. REP minimizes the complexity of the tree structure while preserving the DT's performance [98]. The REPTree method uses a pruning mechanism to counter the overfitting problem; additionally, this technique uses post-pruning to obtain the minimal version of the most accurate tree [99].

NBTree
The naïve Bayes tree (NBTree) was used due to its simplicity and linear runtime; it combines the J48 algorithm and the naïve Bayes algorithm [100]. This method is used for classification problems, especially to evaluate and pick the class that maximizes the posterior class likelihood. Hence, NBTree can address both the big-data problems associated with the naïve Bayes algorithm and the data fragmentation of the J48 algorithm. An important distinction of this model from other machine-learning methods is that it is based on a minimal training data structure that uses a classification system to evaluate important parameters [101]. To build a naïve Bayes classifier for detecting landslide occurrence points in the area, NBTree uses the information obtained from the root node down the tree to a given leaf node, and then utilizes the training cases that fall into that leaf node [102].

Information Gain Ratio
In the present study, the information gain ratio (IGR) was applied as the basis for factor selection and to determine the comparative importance of factors for modeling. For landslide susceptibility assessment, selecting the most effective factors as the input dataset is fundamental. IGR was proposed by Quinlan [103] to quantify the predictive strength of the effective parameters and to select important conditioning factors for modeling. The higher the IGR value, the higher the prediction utility of a factor for modeling [19]. This method enhances the power of landslide prediction by discarding noise factors with lower IGR. Assuming that the training data T contain n samples and that C_i (landslide, nonlandslide) is the classification of the sample data, the information entropy is calculated as follows:

Info(T) = −Σ_{i=1}^{2} (n(C_i, T)/n) log₂(n(C_i, T)/n)    (10)

Estimating the amount of information needed to divide T into subsets (T_1, T_2, ..., T_m) with respect to causal factor F takes the form of Equation (11):

Info(T, F) = Σ_{j=1}^{m} (|T_j|/|T|) Info(T_j)    (11)

Eventually, the IGR of the landslide causal factor F can be calculated by:

IGR(T, F) = (Info(T) − Info(T, F)) / SplitInfo(T, F)    (12)

where SplitInfo denotes the potential information produced by dividing the training data T into m subsets:

SplitInfo(T, F) = −Σ_{j=1}^{m} (|T_j|/|T|) log₂(|T_j|/|T|)    (13)

If IGR > 1, the probability of landslide incidence is higher than average; if IGR = 0, the probability of landslide is equal to average; and if IGR < 0, the probability of landslide incidence is less than average [104].
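The IGR computation above can be sketched for a single categorical factor as follows; the toy data stand in for a factor's classes and the landslide (1) / nonlandslide (0) labels at the training pixels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_ratio(factor_values, labels):
    """IGR of a categorical factor with respect to the class labels."""
    n = len(labels)
    info_t = entropy(labels)
    # Partition the samples by factor value
    subsets = {}
    for v, y in zip(factor_values, labels):
        subsets.setdefault(v, []).append(y)
    info_t_f = sum(len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum((len(s) / n) * math.log2(len(s) / n)
                      for s in subsets.values())
    if split_info == 0:
        return 0.0
    return (info_t - info_t_f) / split_info

# Toy factor that perfectly separates landslide (1) from nonlandslide (0)
factor = ["near_road", "near_road", "far", "far"]
labels = [1, 1, 0, 0]
igr = info_gain_ratio(factor, labels)
```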

Performance Metrics
To evaluate the performance of all the models, we used a number of statistical index-based metrics: sensitivity (SST), specificity (SPF), accuracy (ACC), F1-measure, and the area under the receiver operating characteristic curve (AUC). All statistical metrics were computed based on true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Table 1 shows these statistical index-based metrics and their descriptions. Table 1. Performance metrics and their descriptions to assess the performance of the models.

Metric | Formula | Description
TP (true positive) | — | Number of landslides (positive) that are correctly classified as landslide [105].
TN (true negative) | — | Number of nonlandslides (negative) that are correctly classified as nonlandslide [106].
FP (false positive) | — | Number of nonlandslides that are incorrectly classified as landslides [107].
FN (false negative) | — | Number of landslides that are incorrectly classified as nonlandslides [9].
Sensitivity (SST) | TP / (TP + FN) | The ratio of landslides that are correctly classified as landslide; indicates how well the model classifies landslides [108].
Specificity (SPF) | TN / (TN + FP) | The ratio of nonlandslides that are correctly classified as nonlandslide; indicates how well the model classifies nonlandslides [108].
Accuracy (ACC) | (TP + TN) / (TP + TN + FP + FN) | The ratio of landslides and nonlandslides that are correctly classified [109]; shows how well the model works overall.
F1-measure | 2TP / (2TP + FP + FN) | Combines and balances precision and recall into a single measure [110].
ROC curve | — | Plotted with sensitivity on the y-axis and 1 − specificity on the x-axis [111].
AUC | — | The area under the ROC curve; illustrates the predictive power of a model [112].
MSE, RMSE | MSE = (1/n) Σ (x_m − x_p)²; RMSE = √MSE | Measure the difference between measurements (x_m) and predictions (x_p) and indicate the modeling error [113].
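As a concrete illustration, the confusion-matrix metrics and error measures of Table 1 can be computed as follows (a minimal Python sketch; the counts and series in the test are illustrative, not the study's results):

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, and F1 from confusion-matrix counts."""
    sst = tp / (tp + fn)                    # sensitivity: recall on landslides
    spf = tn / (tn + fp)                    # specificity: recall on nonlandslides
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)        # F1-measure (harmonic mean form)
    return sst, spf, acc, f1

def mse_rmse(measured, predicted):
    """MSE and RMSE between measurements x_m and predictions x_p."""
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    return mse, math.sqrt(mse)
```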

The Most Important Conditioning Factors
We obtained the relative importance of the factors influencing landslide occurrence based on average merit (AM), i.e., the IGR score, through the k-fold cross-validation technique (Table 2). The results indicated that in the lower folds (folds 1 and 2), the number of removed factors with low predictive power was higher (13 factors) than in the higher folds (fold 10; 9 factors). According to Table 2, across the 1-fold to 10-fold cross-validation, distance to road (AM = 0.177), road density, lithology, and land use had the greatest predictive power. Further, the results showed that profile curvature, curvature, plan curvature, distance to river, VD, fault density, river density, TPI, and SPI were removed from the modeling process because they had AM = 0.

Figure 5 shows the results of training the DL model. Figure 6a,b illustrates how well the landslide (target) and nonlandslide (output) values fitted based on the training and testing datasets, respectively. A well-trained model with a high goodness of fit has high agreement between target and output on the training dataset, whereas high prediction accuracy of the model is inferred from the agreement between target and output on the testing dataset. The two quantitative statistical metrics of MSE (mean squared error) and RMSE (root-mean-square error) show the modeling error of the DL model (Figure 6c,e). The values of MSE and RMSE were 0.0435 and 0.208, respectively, for the training dataset (Figure 6c), and 0.079 and 0.281, respectively, for the testing dataset (Figure 6e). The StD (standard deviation) and mean for the training dataset were 0.04 and 0.280, respectively; for the testing dataset these values were 0.01 and 0.208, respectively (Figure 6d,f).

Parameter Tuning
The success rate of a model depends on selecting optimal values for its parameters. Parameters can be tuned by offline or online approaches. In the offline technique, the values of the parameters are fixed in advance, whereas in the online approach the parameters are dynamically or adaptively controlled and updated during modeling [114]. In this study, we used the online parameter tuning approach; the results are shown in Tables 3-5.
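The contrast between the two approaches can be sketched on a hypothetical one-parameter problem; the study does not detail its tuning scheme, so the loss function, the candidate values, and the halving rule below are illustrative assumptions only:

```python
def offline_tune(lrs, steps=50):
    """Offline tuning: try each fixed candidate value, keep the best.

    Minimizes the toy loss f(x) = (x - 3)^2 by gradient descent with a
    fixed step size lr; the lr with the lowest final loss wins."""
    best_lr, best_loss = None, float("inf")
    for lr in lrs:
        x = 0.0
        for _ in range(steps):
            x -= lr * 2 * (x - 3)        # gradient of (x - 3)^2 is 2(x - 3)
        loss = (x - 3) ** 2
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr

def online_tune(lr=0.5, steps=50):
    """Online tuning: adapt the parameter during the run itself.

    Halves the step size whenever an update fails to reduce the loss."""
    x, prev = 0.0, float("inf")
    for _ in range(steps):
        x -= lr * 2 * (x - 3)
        loss = (x - 3) ** 2
        if loss >= prev:
            lr *= 0.5                    # adaptive update of the parameter
        prev = loss
    return x
```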

Classification Performance
After selecting the optimal parameter values for each model, we ran the models to obtain the highest evaluation measures and the lowest error. Our findings for the validation phase using the testing dataset, briefly reported below and in Table 6, show that the DL model outperformed the three benchmark models. The sensitivity, specificity, accuracy, F1-measure, and AUC metrics were obtained from the four possibilities of the confusion matrix: TP, TN, FP, and FN (Table 6).
(1) The highest sensitivity (0.667) was obtained by the DL model, which correctly classified 66.7% of landslides as landslide, followed by the NBTree, REPTree, and SVM algorithms (Table 6).

Preparing Landslide Susceptibility Maps
We ran the DL model and the three benchmark models (SVM, NBTree, and REPTree) and computed a landslide susceptibility index (LSI) for each pixel of the study area. We then mapped the LSIs from the DL and benchmark machine-learning models over the study area to produce landslide susceptibility maps (Figure 7a-d). We classified the maps into five susceptibility classes: very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility (MS), high susceptibility (HS), and very high susceptibility (VHS). In DL, the range of the classes were, respectively, 0.0000875-0.
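The exact class breakpoints for each map are not fully reproduced here; as an illustration only, one common way to derive five susceptibility classes from pixel LSIs is quantile classification, sketched below (the quantile scheme is an assumption, since alternatives such as natural breaks are also widely used):

```python
def quantile_breaks(values, n_classes=5):
    """Break points that split the sorted LSI values into equal-count classes."""
    s = sorted(values)
    return [s[int(len(s) * k / n_classes)] for k in range(1, n_classes)]

def classify(value, breaks, labels=("VLS", "LS", "MS", "HS", "VHS")):
    """Assign one LSI value to a susceptibility class using the break points."""
    for b, lab in zip(breaks, labels):
        if value < b:
            return lab
    return labels[-1]
```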

Validation and Comparison of the Models
The prediction accuracy of the DL model and the benchmark machine-learning algorithms was assessed by AUC using the testing dataset (Figure 8). As shown in Figure 8, the DL model had a high prediction accuracy (AUC = 0.870). In contrast, the AUCs for the SVM, NBTree, and REPTree models were somewhat lower, at 0.861, 0.837, and 0.834, respectively (Figure 8). Overall, the DL model was superior to the other three benchmark machine-learning models in terms of prediction accuracy.
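AUC as used here can be computed without plotting the full ROC curve, via its rank-statistic interpretation: the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen nonlandslide sample. A small illustrative sketch with hypothetical scores:

```python
def auc(scores, labels):
    """AUC as the probability a random positive outscores a random negative.

    Ties count as half a win; labels are 1 (landslide) or 0 (nonlandslide)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every landslide pixel is scored above every nonlandslide pixel; 0.5 corresponds to random guessing.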

Discussion
In recent years, the demand for accurate prediction of landslides and the production of landslide susceptibility maps has increased, in part due to improvements in data processing techniques, but also due to the importance of landslide prediction and susceptibility mapping for more effective land-use planning and management. There are numerous approaches and methods for producing landslide susceptibility maps, but machine-learning methods based on GIS-automated techniques offer advantages such as low cost, wide scope, fast analysis, and the option of periodically updating outputs. Each machine-learning method has its specific advantages and disadvantages, and depending on the software capabilities and input data, its outputs may differ from those of other methods. The challenge many researchers face is selecting the most appropriate method. Thus, comparative analysis of the predictive performance of different machine-learning methods is a major topic in the landslide literature [115,116].

With the aim of producing a landslide susceptibility map with high prediction accuracy, we compared the predictive performance of four machine-learning methods. We first investigated the usefulness of the conditioning factors for the modeling using the information gain ratio technique with 10-fold cross-validation. The results revealed that the landslides in our study area were significantly associated with road networks, such that the distance to roads and road density factors were identified as the most influential landslide-conditioning factors. Jaafari et al. [117], Schlögl and Matulla [118], and Sultana and Tan [119] have also reported strong associations between road networks and the frequency of landslide occurrences. Therefore, these regions should be the priority targets for landslide mitigation measures [119,120].
We measured the predictive performance of the models using several widely used metrics [115,121,122] and found that the DL model ranked first on all metrics used, and therefore outperformed the benchmark models (i.e., NBTree, REPTree, and SVM) that have previously been used for landslide susceptibility mapping in many regions around the world [121][122][123][124][125].
DL algorithms are powerful types of machine-learning algorithms, which utilize numerous hidden layers to model complex relationships among data for pattern recognition and classification tasks, such as landslide prediction. In contrast to traditional shallow learning algorithms (e.g., backpropagation neural networks, logistic regression, and decision trees) that generate decision boundaries directly based on the original datasets [126], DL algorithms hierarchically analyze the original datasets to extract the most relevant features for the data classification [127].
Despite the infrequent application of DL models to the prediction of natural hazards, their superiority over other machine-learning models has been confirmed. For example, Wang et al. [128] reported the first application of DL to landslide prediction and achieved better prediction accuracy than SVM. Sameen et al. [123] reported that the DL model outperformed ANN and SVM for landslide prediction. Huang et al. [129] likewise reported that the DL model was superior to ANN and SVM. Dao et al. [130] showed that the DL model provided a more accurate prediction of landslide susceptibility than quadratic discriminant analysis, Fisher's linear discriminant analysis, and ANN. Dou et al. [131] reported that DL provided greater AUCs than the ANN and LR methods for landslide prediction. In a recent study, Mandal et al. [132] demonstrated improved accuracy for landslide prediction using the DL model compared to RF, ANN, and Bagging. Overall, the DL model has proven efficient for landslide modeling and has been identified as an attractive alternative to traditional machine-learning methods.

Conclusions
We illustrated the robustness of a deep-learning model against three benchmark models (SVM, NBTree, and REPTree) for the prediction of landslide susceptibility in Kamyaran city, Kurdistan Province, Iran. First, the landslide inventory map with 118 past landslides was produced using different sources and randomly divided into two groups: 80% for model training and 20% for model validation. Next, using the models, the past landslides were linked to 25 conditioning factors. The performance of the models was evaluated using sensitivity, accuracy, specificity, F1-measure, and AUC. The results showed that although all models had acceptable performance, the deep-learning model outperformed the other models. This indicates that the DL model can be considered a promising technique for preparing landslide susceptibility maps in mountainous areas prone to landsliding. Based on the results obtained from the deep-learning model, an accurate landslide susceptibility map was developed to complement previous research. The findings of this study can be used for future planning, land management, land-use allocation, and government policy making to prevent or reduce landslides in Kamyaran city. In future studies, we suggest integrating the current framework with other individual classifiers and metaclassifiers from machine learning to achieve a higher prediction accuracy for landslides, and perhaps other natural hazards.