Comparison between Deep Learning and Tree-Based Machine Learning Approaches for Landslide Susceptibility Mapping

: The efﬁciency of deep learning and tree-based machine learning approaches has gained im-mense popularity in various ﬁelds. One deep learning model viz. convolution neural network (CNN), artiﬁcial neural network (ANN) and four tree-based machine learning models, namely, alternative decision tree (ADTree), classiﬁcation and regression tree (CART), functional tree and logistic model tree (LMT), were used for landslide susceptibility mapping in the East Sikkim Himalaya region of India, and the results were compared. Landslide areas were delimited and mapped as landslide inventory (LIM) after gathering information from historical records and periodic ﬁeld investigations. In LIM, 91 landslides were plotted and classiﬁed into training (64 landslides) and testing (27 landslides) subsets randomly to train and validate the models. A total of 21 landslides conditioning factors (LCFs) were considered as model inputs, and the results of each model were categorised under ﬁve susceptibility classes. The receiver operating characteristics curve and 21 statistical measures were used to evaluate and prioritise the models. The CNN deep learning model achieved the priority rank 1 with area under the curve of 0.918 and 0.933 by using the training and testing data, quantifying 23.02% and 14.40% area as very high and highly susceptible followed by ANN, ADtree, CART, FTree and LMT models. This research might be useful in landslide studies, especially in locations with comparable geophysical and climatological characteristics, to aid in decision making for land use planning.


Introduction
In mountainous regions, landslides are regarded as one of the reoccurring natural hazards affecting human property and lives. Landside susceptibility (LS) indicates the spatial probability of landslides in an area [1]. Landslides are often triggered by earthquakes or heavy precipitations within certain geomorphological, geological and hydrological settings. However, other primary factors controlling landslide failure mechanisms, including in situ stresses, weathering and heave, play vital roles. In mountainous regions, the effect of landslides can change the topographic characteristic, forest, soil properties (consistence, structure, density, temperature, etc.), road and farming land, depending on the magnitude for modelling, and provide good results. In the Indian Himalayan context, comparison between DL techniques and tree-based models in landslide susceptibility mapping has not been considered in the literature. The accuracy of the same model may differ for landslide susceptibility modelling, and the characteristic selection of the best model is important for managing landslides. The main research questions are: (1) is a DL model (CNN) applicable in landslide susceptibility analysis in Indian Himalayan region, and, (2) can a CNN model provide a better result than tree-based ML models in the Indian Himalayan region. A research gap is found in this respect, as no comparisons have been made between the DL model and tree-based ML models in the Indian Himalayan region. This study encourages Indian researchers who are working in the landslide field. The findings of this study will provide a solid foundation for earth scientists, government officials and other stakeholders to enhance land management and disaster management.
Based on the aforementioned research gap, CNN, ANN, alternative DT (ADTree), CART, functional tree (FTree) and logistic model tree (LMT) DL, and hybrid tree ML, approaches were used for assessing landslide susceptibility in the East Sikkim Himalaya, India, in the present study. Twenty statistical measures and a receiver operating characteristic (ROC) curve were applied to evaluate the performance of the models.

Description of the Study Area
The study area of East Sikkim is a mountainous district of the state of Sikkim, in India ( Figure 1). The whole district is characterised by hilly terrain and rugged topography [35]. The highest and lowest altitudes of the study area are 4695 m and 264 m, respectively. Geographically, East Sikkim occupies the southeast corner of Sikkim, and the latitudinal/longitudinal extensions are 27 • 25 N to 27 • 8 N and 88 • 53 E to 88 • 26 E, respectively, with an area of 964 km 2 . The district is considered to be an extremely sensitive area, sharing an international border with China and Bhutan. The state capital, Gangtok, the hub of all administrative activities, is the main town of East Sikkim. In accordance with the census of India, 2011, East Sikkim district has a population of 283,583 with a population density of 295/km 2 (Census of India, 2011). Frequent occurrences of landslides are the main barrier to the development of the tourism industry and socioeconomic growth of this state. The preparation of LSM by sound techniques could be an important method to tackle this problem.

Materials and Methods
The steps and processes involved in the study ( Figure 2) are as follows: (i) Landslide inventory map (LIM) construction-Landslide locations were firstly identified by using Google Earth images to produce a LIM. Subsequently, a detailed field survey was conducted on November 2019, and historical records were acquired for verifying the location of landslides. (ii) After a literature review, and based on the geoenvironmental condition of the study area, landslide conditioning factors (LCFs) were selected, and thematic layers of LCFs were prepared on a geographical information system (GIS) platform. (iii) After selecting the LCFs based on previous literature and the geoenvironmental condition, a factor selection process was performed by using multicollinearity assessment and chi-square attribute evaluation (CSAE) techniques to choose appropriate

Materials and Methods
The steps and processes involved in the study ( Figure 2) are as follows: (i) Landslide inventory map (LIM) construction-Landslide locations were firstly identified by using Google Earth images to produce a LIM. Subsequently, a detailed field survey was conducted on November 2019, and historical records were acquired for verifying the location of landslides. (ii) After a literature review, and based on the geoenvironmental condition of the study area, landslide conditioning factors (LCFs) were selected, and thematic layers of LCFs were prepared on a geographical information system (GIS) platform. (iii) After selecting the LCFs based on previous literature and the geoenvironmental condition, a factor selection process was performed by using multicollinearity assessment and chi-square attribute evaluation (CSAE) techniques to choose appropriate LCFs for landslide susceptibility modelling. A total of 21 LCFs were chosen as appropriate factors. (iv) LSMs were then generated by using DL (CNN) and hybrid tree-based ML approach (ANN, ADtree, CART, FTree and LMT) models, and their results were compared. (v) After modelling the landslide susceptibility, ridge regression (RR) was applied to verify the importance of the selected LCFs in producing an LSM. (vi) The accuracy of each model was assessed by applying the ROC curve and 21 statistical measures. This process was performed to compare the results of the models and select the best amongst them.
Water 2021, 13, x FOR PEER REVIEW 5 of 33 LCFs for landslide susceptibility modelling. A total of 21 LCFs were chosen as appropriate factors. (iv) LSMs were then generated by using DL (CNN) and hybrid tree-based ML approach (ANN, ADtree, CART, FTree and LMT) models, and their results were compared. (v) After modelling the landslide susceptibility, ridge regression (RR) was applied to verify the importance of the selected LCFs in producing an LSM. (vi) The accuracy of each model was assessed by applying the ROC curve and 21 statistical measures. This process was performed to compare the results of the models and select the best amongst them.

Data Used
Primary and secondary datasets were used in this work. The primary data, such as size and location of the landslides, were collected through field observation by using a global positioning system (GPS) device on November 2019. The secondary data of the current study includes PALSAR DEM, geology map, soil map, Landsat 8 OLI/TIRS, topographical map and precipitation data collected from different organisations and government departments. The high-resolution phased array type L-band synthetic aperture radar (PALSAR) DEM with 12.5 m × 12.5 m spatial resolution, dated 2011, was downloaded from Alaska Satellite Facility (https://asf.alaska.edu/, accessed on 20 December 2020). The

Data Used
Primary and secondary datasets were used in this work. The primary data, such as size and location of the landslides, were collected through field observation by using a global positioning system (GPS) device on November 2019. The secondary data of the current study includes PALSAR DEM, geology map, soil map, Landsat 8 OLI/TIRS, topographical map and precipitation data collected from different organisations and government departments. The high-resolution phased array type L-band synthetic aperture radar (PALSAR) DEM with 12.5 m × 12.5 m spatial resolution, dated 2011, was downloaded from Alaska Satellite Facility (https://asf.alaska.edu/, accessed on 20 December 2020). The geology map with the scale of 1:500,000 was collected from the Geological Survey of India (https://www.gsi.gov.in/, accessed on 20 December 2020). Topographical maps with a scale of 1:50,000 were collected from the Survey of India. Rainfall data were obtained from the Indian Meteorological Department. The National Land Survey and Land Use Planning Bureau provided the soil map. Landsat 8 OLI images were collected from the USGS Earth-Explorer (https://earthexplorer.usgs.gov, accessed on 20 December 2020) with a resolution of 30 m × 30 m. The resolution of the collected data is different. For preparing the LSMs, a resolution of PALSAR DEM (12.5 m × 12.5 m) was selected as the base resolution for the collected data, and other factors having a higher or lower resolution were resampled to 12.5 m × 12.5 m.

Preparation of LIM
LIM was constructed with the integration of past and present landslide events [36]. Several sources, including primary and secondary combinations, were used to construct the LIM. Field investigation, identification of landslide location and coordinates of landslide using GPS were used as primary sources to create LIM. The field survey was conducted on November 2019. The secondary data sources, including historical landslide information (Administrative Department of Sikkim), were used to build the LIM map. High-resolution Google Earth images can easily detect landside locations in remote and inaccessible areas, and a total of 91 landslides were identified and mapped to produce the LIM layer. For training and testing the models, the same number of non-landslide locations were randomly selected. A total of 182 landslide and non-landslide samples were taken for modelling and validation purposes. Landslide sample points were divided into two subsets, namely, training and validation datasets. Various ratios were used to divide the inventory dataset. However, most studies followed the 70:30 ratio to prepare training and validation datasets [37], and a similar approach was used to divide the dataset. Dao et al. [38] and Nhu et al. [39] used DL NNs for landslide susceptibility assessment and prepared the LIM by considering 217 and 193 samples. These samples were divided into 70:30 ratio for training and testing the LSMs. Measurement of the landslides disclosed that the smallest was 153 m 2 , and the largest was 841 m 2 . The average maximum and minimum widths of the landslides were 404 m 2 and 94 m 2 , respectively. Four types of landslides were found, namely, rock falls (37.1%), debris slides (31.08%), rotational slides (20.28%) and complex and compound (combination in time and/or space of two or more principal types of movement) slides (11.54%). Some field photographs are shown in Figure 3.

Preparation of LCFs
Consideration of relevant LCFs in the modelling process is extremely essential to model the landslide susceptibility of a given area. Twenty-one LCFs were used for landslide modelling. Sixteen out of 21 factors, such as elevation, slope, aspect, plan curvature, profile curvature, general curvature, tangential curvature, terrain ruggedness index (TRI), topographic wetness index (TWI), cross sectional curvature, longitudinal curvature, topographic position index (TPI), valley depth (VD) and relative slope position (RSP), are topographical factors that were extracted from the PALSAR DEM. The other five factors, namely, rainfall, geology, soil, land use/land cover (LULC) and normalised difference vegetation index (NDVI), were collected from different sources. These factors were categorised as topographical factors, and other environmental factors are detailed below.

Topographical Factors
Elevation is regarded as an important factor because higher altitudes have been found to be a major cause of landslide occurrence in mountain regions [40]. The LCFs, such as slope, aspect, topographic position index, convergence index, plan curvature, terrain ruggedness index, profile curvature, general curvature, tangential curvature, topographic wetness index, longitudinal curvature, cross section curvature, surface area, VD, and RSP, were extracted from ALOS PALSAR DEM by using the SAGA GIS tool [38]. Slope is considered to be an important factor because it influences the occurrence of landslides and speed of sliding materials. The slope value in the study area varies from 0 (flat) to 80.16 • , indicating an extremely steep gradient ( Figure 4b) and the possibility of instability. Slope aspect (Figure 4c) indicates the direction of the slope, and has a strong relationship with landslide. It has an indirect influence because it determines the amount, duration, intensity of insolation intake, intensity of rainfall received, vegetation cover, and soil structure and texture [40]. Plan curvature is a horizontal plane and is considered the hypothetical line that crosses the contour of a given cell and regulates the stability of a landmass (Figure 4d). The surface curvature in the direction of the steepest slope is known as the profile curvature. The flow velocity of surface water, soil erosion and deposition are influenced by the profile curvature. The erosion prevails in the convex surface region, and the concave curvature is the area to be deposited [41]. The spatial distribution of the profile curvature ranges from −0.095 to 0.140 (Figure 4e). The general curvature, which was proposed by Wood [42], is defined as the total curvature that has been created by the intersection of the surface with a plan. It can be classified into three categories: convex, concave, and flat surface, corresponding to the peaks, valley, and plain areas, respectively. The spatial value of general curvature varies from −0.819 to 0.934 (Figure 4k). Wilson and Gallant [43] suggested a tangential curvature, which is curvature to a steep gradient along the orthogonal line. The value of tangential curvature in the present study ranges from −0.115 to 0.167 ( Figure 4l). A similar approach of tangential curvature is found in longitudinal curvature [42]. The curvature of the line between the surface and the plane is conceptually identical and defined by the direction of slope and aspect. It can be viewed in the same way as curvature of the profile, indicating how a liquid substance is going to accelerate or decelerate over a point. The spatial distribution of the longitudinal curvature ranges from −0.585 to 0.627 (Figure 4n).

Topographical Factors
Elevation is regarded as an important factor because higher altitudes have been found to be a major cause of landslide occurrence in mountain regions [40]. The LCFs, such as slope, aspect, topographic position index, convergence index, plan curvature, terrain ruggedness index, profile curvature, general curvature, tangential curvature, topographic wetness index, longitudinal curvature, cross section curvature, surface area, VD, and RSP, were extracted from ALOS PALSAR DEM by using the SAGA GIS tool [38]. Slope is considered to be an important factor because it influences the occurrence of landslides and speed of sliding materials. The slope value in the study area varies from 0 (flat)

Multicollinearity Assessment
A collinearity test must be performed between the LCFs because a linear association reduces the predictive precision of the model [47]. The multicollinearity test is an im- Cross section curvature was also introduced by Wood [42]. The spatial value of cross section curvature ranges from −0.386 to 0.550 ( Figure 4m). The calculation of the surface area was conducted by the slope and slope aspects of a specific cell applying Berry's approach [44]. The surface area is the same as the planimetric area. The surface topography inside the cell is defined by its value. The value of the surface area ranges from 156.25 to 855.81 in the study area ( Figure 4j). The convergence index is a terrain parameter showing the relief structure as a collection of convergent (channel) and divergent (ridge) areas (Equation (1)).
where θ denotes the average angle between the aspect of adjacent cells and the direction to the central cell. In this field of analysis, the CI spatial value varies from −100 to 100 ( Figure 4f). TWI represents the degree of wetness of the surface (Equation (2)). The TWI value ranges from 1.25 to 21.42 in the study area (4i).
where As is the upstream contributing area, and a is the slope gradient (in degrees). The TPI is the certain gap between the cell elevation value (Z 0 ) and the average surrounding cell elevation (Z) (Equation (3)).
The TPI varies from −23.91 to 26.23 ( Figure 4g), with positive values indicating that the cell is higher than its surrounding area, whereas negative values indicate that the cell is lower. TRI was defined by Riley et al. [45] to show the amount of elevation difference between adjacent cells in the digital elevation model (Equation (4)).
where x is the elevation of each neighbour cell to a specific cell, and max and min are the largest and smallest elevations amongst the nine neighbouring pixels. VD is measured as the vertical distance to the base level of the channel network. The algorithm consists of two key steps: the interpolation of the base level of the network channel, and the subtraction of the base level from the original elevations. The value of VD ranges from 0 to 787.27 ( Figure 4o). RSP is another important factor for determining the stability-instability of a land part (Figure 4p).

Other Environmental Factors
Rainfall is the triggering factor for the majority of landslides. The empirical relationship between landslides and rainfall type has been identified in many previous studies [46]. Five-year annual average rainfall data were used for the spatial mapping of rainfall by using the kriging method in GIS. The annual average rainfall is 2264 mm in this region ( Figure 4q). The spatial map of geology ( Figure 4u) was prepared with the digitisation process. Detailed geological descriptions are presented in Table 1. Ten soil categories were mapped for this study area (Figure 4t). Amongst them, coarse loamy humic dystrudepts associated with coarse loamy typic udorthents, and coarse loamy humicpachic dystrudepts associated with fine loamy type udorthents, are the dominant types of soil ( Table 2). The LULC map was prepared by using Sentinel-2 satellite images through supervised classification determined by a maximum likelihood classification algorithm ( Figure 4s). Water bodies (0.27%), wasteland (2.00%), settlement and built-up area (1.65%), grassland (5.47%), evergreen forest (47.18%), agriculture (31.92%) and plantation vegetation glacier (11.51%) are the major land use categories of the study area. The NDVI value ranges between −0.15 and 0.46 in the study area ( Figure 4r) and indicates the concentration of vegetative cover. The NDVI of the study area was calculated by using Equation (5).

Multicollinearity Assessment
A collinearity test must be performed between the LCFs because a linear association reduces the predictive precision of the model [47]. The multicollinearity test is an important step to investigate if a strong interrelationship may be found amongst the conditioning factors in multiple regression. In this work, a multicollinearity test was performed by applying the variance inflation factor (VIF) and tolerance (TOL) criteria to obtain the relevant LCFs [48]. The thresholds of the TOL and VIF values are ≤2 and ≥5, and values that exceed these thresholds suggest the existence of collinearity between or amongst the LCFs [48].

CSAE
CSAE method has been used to select the most predictive factors for LSM modelling. CSAE has been used in different artificial intelligence approaches to choose a small number of training datasets, to minimise the time and cost of the modelling process [49]. In this method, zero average merit (AM) values of conditioning factors denote the significance of less contribution to modelling. On the contrary, higher than zero average merit (AM) values indicate the most important and suitable conditioning factors for LSM modelling. The CSEA is calculated by using Equation (6).
where E and O are the expected and observed values. The higher the value of CSAE of a given LCF indicates greater significance for occurrences of landslides [43].

Evaluation of Factor Importance by Using RR
RR was firstly developed by Hoerl and Kennard [50]. It belongs to a class of regression that uses L2 regularisation to reduce the issue of overfitting. RR was developed to avoid the extensive problems of instability and collinearity caused by the least square estimator [51]. RR predictions tend to be robust in the sense that minor variations in the data, in which the fitted regression is found, are minimally influenced. At times, the ridge calculated regression function can have strong estimates of the mean responses or forecasts of new observations for independent variable thresholds beyond the area of the observations where the regression function is built. The cost function is changed in RR by adding a penalty equal to the square of the coefficients' (w) size.
Similarly, the cost function in Equation (8) should be minimised under the following conditions.
Thus, RR constrains the coefficients (w). The penalty term regularises the coefficients so that the optimisation function is punished if the coefficients take high values. In this research, the R programming language 'caret' package (https:/cran.r-project.org/web/ packages/caret/caret.pdf, accessed on 20 December 2020) was used to determine the importance of LCFs by using RR.
3.7. DL and Tree-Based ML Models 3.7.1. CNN DL approaches are computational frameworks that are based on NNs, inspired by human brains [18]. CNNs are widely applied in visual imagery processing [52]. They are recognised as shift invariant or space invariant ANNs due to their shared-weight architecture and translation invariance functions [53]. CNN was first developed in 1980. To date, the CNN model has been widely used in classification and prediction applications in several branches, including the earth science discipline [54]. A CNN is made up of input data, an output layer, and numerous hidden layers. The hidden layers of a CNN usually consist of a sequence of convolutional layers (CLs) converging with a multiplication. The activation function is normally a rectified linear unit (ReLU) layer, and additional convolutions, such as pooling layers, fully connected layers and normalisation layers, are subsequently adopted. These layers are referred to as hidden layers because the activation function and final convolution mask their inputs and outputs. The input data of CNN are images and can be interpreted as images. CLs train the convolutions and give the best performance for data categorisation. PLs provide stable conversion, reduce overfitting and increase computational performance by reducing the number of structures resulting in the convolutions [55]. Using ReLU activation function, a ReLU increases the nonlinear properties of the network. Various techniques based on the form, image and purpose of data have been used by various researchers. The detailed description of these layers, primary parameters and how the CNN handles training data can be found in the literature of LeCun et al. [18]. One-dimensional image data are CNN input data used for optimal initialisation performance. A 1D input data, which contain various characteristics, can be converted into 2D images. The CNN's linearity is decreased by using a ReLU for all linked layers. With the aid of Equation (9), the loss value of the parameters is reduced by utilising the loss function.
where x i and z i represent the predicted and true levels of the ith sample, respectively, and m denotes the total number of landslide data. The variables are updated regularly until the loss value converges. In the current study, this approach is used for susceptibility mapping of landslides because images are larger, no data are classified, and analysis is continuous. The CNN structures of the present study are shown in Figure 5. In this study, 3 convolution kernels, 2 pooling kernels, ReLU activation, 600 epochs, AdaGrad and a 0.001 learning rate are used for modelling the landslide by using the CNN DL method.

ANN
ANNs have been extensively used in classification operation [56]. The three main structures of ANN are input layers, hidden layers and output layers ( Figure 6). The input layers are LCFs, the outcome layers reflect the outcomes of model predictions as a landslide or non-landslide, and the hidden layers are the classifier layers that process and convert the data from input to output. In ANN, no presumptions are required for the training datasets. Determining the relative value of the different input measures is not needed, and most input measurements are chosen on the basis of weight change during the training process [57]. ANN is constructed with two main steps [58]. ( ) where f(x) WW shows the hidden functionality that is enhanced during the modelling by the adjustable weights to produce the ANN network structure. The sum of square difference between the actual and expected result of neuron, that is, known as error, is used to select the number of hidden layers and neuron in ANN. In each neuron, weight is adjusted to reduce the error during the training process. In the present analysis, the trial process was used to eliminate the overfitting and create the ANN model with 2 hidden layers, 500 epochs and 20 number validation thresholds.

ADTree
An ADTree is a classification ML strategy that generalises DTs and links to boosting. Freund and Mason [59] introduced this technique. An ADTree is an alternation of decision nodes describing a predicate position, and prediction nodes containing a single number. An instance is categorised by an ADTree by following all ways that all decision nodes are Let x = x i , i = 1, 2, .....21 the vector of the 21 LCFs, y = 1 or 0 indicates the landslide and non-landside class. A nonlinear sigmoid feature is frequently used to the weighted sum of input data until the data are passed to the next step. ANN function for classification is calculated by using Equation (10). (10) where f(x) WW shows the hidden functionality that is enhanced during the modelling by the adjustable weights to produce the ANN network structure. The sum of square difference between the actual and expected result of neuron, that is, known as error, is used to select the number of hidden layers and neuron in ANN. In each neuron, weight is adjusted to reduce the error during the training process. In the present analysis, the trial process was used to eliminate the overfitting and create the ANN model with 2 hidden layers, 500 epochs and 20 number validation thresholds.

ADTree
An ADTree is a classification ML strategy that generalises DTs and links to boosting. Freund and Mason [59] introduced this technique. An ADTree is an alternation of decision nodes describing a predicate position, and prediction nodes containing a single number. An instance is categorised by an ADTree by following all ways that all decision nodes are valid and by combining any estimation nodes. This varies from binary classification trees, such as CART or C4.5, in which an instance takes only one direction through the tree. The ADTree model is more accurate in the event of a classification task than other tree models [60]. Splitter node and prediction nodes are the two main types of ADTree model nodes. The function of the splitter node is that it classifies data in accordance with the selected attributes. The prediction node is the number score used to forecast [60]. The basic principle that maps from instances to actual numbers is a prediction d 1 , a base situation d 2 , with the result being p or q which are two real numbers. The prediction is p when d 1 ∩ d 2 or q when d 1 ∩ − d 2 . The values of p and q are calculated by using Equations (11) and (12).
where the best d 1 and d 2 are selected by minimising Z t (d 1 , d 2 ) and is defined as Equation (13).
Assuming M is the base rule, then a new rule can be defined as M t + 1 = M t + r t . r t (x) which shows the two prediction values (p and q) at each layer of the tree, and x is a set of instances. The grouping can be used as an indication that accumulates prediction values in M t + 1 such as: A set of routes is constructed for each pixel in the training dataset. When a path hits a decision node, it will continue with the offspring that corresponds to the decision outcome, but if the path encounters a prediction node, it will continue with all of the node's offspring [59]. A pixel's susceptibility index is calculated by adding all the values from any prediction nodes found as the pixel filters down through all applicable branches. A parameter that must be determined is the number of boosting iterations. Larger trees and overfitting may arise from a large number of boosting iterations, whereas a minimal number of boosting iterations may result in a small tree with poor performance. The classification accuracies of the training and validation datasets were estimated by changing from 1 to 14. The model was prepared by using the RWeka package in R environment.

CART
CART is a rule-based algorithm that constructs a binary tree by binary recursive partitioning. Binary recursive partitioning is a method that partitions a node into a yes/no response. The heterogeneity within each resultant subset is reduced on the basis of a single factor and the rule generated for each phase, which divides them depending on the various relationships of each division. Landslide susceptibility mapping using the CART technique has been used in several studies [61]. A "terminal" node's expected value is considered the average of the answer values in that node [62]. The predictor variables are extremely simple and can be comprised of different types: numeric, binary and categorical types. The model's results are not affected by monotonous transformations and different measurement scales between predictors. In regression trees, independent variables are insensitive to outliers and use surrogates to manage missing data [62]. The hierarchical structure of a regression tree indicates that the response to one input vector relies on higher input variables in the tree to model relationships between predictors automatically. Regression trees typically lead to an overcomplex decision tree where only the most relevant knowledge, that is, the nodes that illustrate the largest amount of deviance, needs to be 'pruned' to communicate [61]. CART, similar to other DT algorithms, does not need the identification of independent variables in advance because the most relevant variables are discovered during the selection of the optimal splitting characteristic in each node [62]. Thus, CART is appropriate for issues in which the correlation between input and output parameters is unknown in advance, making the CART model's outputs interpretable [61]. Depending on whether the output is qualitative or quantitative, CART can be used to solve classification and regression issues. CART is used in this study as a classifier for landslide susceptibility. "tree" package in R-studio was used for preparing the CART model.

FTree
FTrees were introduced by Gama [63]. FTree integrates a discriminating function and multivariate decision trees through constructive induction [63]. It is regarded to be a generalisation of multivariate trees. For learning classification trees, the FTree model adjoins with attributes at leaf nodes, decision node and leaves [63]. Decision nodes are built as the tree grows, and functional leaves are built as the tree prunes [63]. For prediction evaluation, the FTree can predict the value of dependent factors from the unclassified samples. The sample travels from the root node into a leaf over the tree in which the constructor function built at the node is used for collecting the attributes of the sample. Subsequently, the node's decision test is used to define the path by which the sample will take. With the help of the constructor function built on leaf and leaf-related constant, the sample is classified as leaf [63].
Classification tree nodes are created by comparing the values of specific input characteristics to a constant. Three important parts of the FTree are: the regression model (RM), which is utilised in FTree for internal nodes and leaves; for inner nodes, inner FTree uses RMs; and for leaves, FTree leaves utilise RM. FTree leaves were used in this study. FTree applied the gain ratio as a criterion for splitting to determine the input attribute to separate. To avoid overfitting, C4.5 pruning was used, and logit boost (iterative reweighting) was applied to resemble the leaves with the least squares for each class in accordance with the logistic regression functions [63].

LMT
LMT is a classification model in computer science that integrates logistic regression with DT learning, with the related supervised training algorithm [64]. The earlier concept of a model tree is used in logistic LMT. A DT on its leaves uses linear regression (LR) models where a piecewise constant model is generated by ordinary DTs with constants on their leaves [64]. This process is performed to obtain a piecewise linear regression model. The LogitBoost algorithm is used in the logistic version to generate a logistic regression model at each tree node. The model uses cross-validation for searching the multiple LogitBoost iterations to control overfitting of training data. For each Mi class, the LogitBoost model utilises least-squares fitting additive logistic regression, and the later likelihood of leaf nodes is measured by LR [65].
where β i is the coefficient of the ith element of vector x, n is the total factors, and D is the total classes. In the LMT model, the posterior probabilities of leaf nodes were computed by using the linear logistic regression technique [64].

Validation Methods
LSMs produced using CNN, ANN, CART, ADTree, FTree and LMT models were evaluated through 21 statistical measures mentioned in Table 3. The ROC curve is constructed with the true positive rate (TPR) versus the false positive rate (FPR). The TPR consists of landslide cells, which indicate the most susceptibility. The FPR consists of non-landslide cells, which indicate the non-susceptibility. If the area under the ROC curve remains extremely close to 100%, then the model reflects higher goodness-of-fit and excellent accuracy [66]. The AUC is indicative of the consistency of a calculated model [67]. The ROC curve consists of four elements, being, true positive (A), false positive (B), true negative (C) and false negative (D) [68] (Equations (17) and (18)).
where FPR and TPR are the false positive rate and true positive rate. The AUC is computed by using Equation (19).
where P and N are the total number of landslides and non-landslides, respectively. Amongst the statistical measures, lower values of false negative rate (miss rate) and misclassification rate designate a higher model accuracy. By contrast, higher values of the remaining measures denote high model accuracy. The computation methodology of these parameters is presented in Table 3. These various measures are computed to omit the statistical error of using a single dimension. Based on these measures, prioritisation of models in terms of accuracy is conducted by using the compound factor (CF) method to determine the hierarchy of best-fit for this study. In the CF method, consecutive rank is given in accordance with the relevance of the factors targeting the goal. The mean values of the assigned ranks of each factor represent their relative priority [69]. CF method can be formulated as: (20) where rank is represented by R of the factor, and F n is the number of factors. FPR or fall-out or 1-specificity TNR or specificity Miss rate Accuracy A+C T [67] Misclassification rate B+D T [70] PPV or precision A A+B [71] False discovery rate (FDR) Negative predictive value (NPV) C C+D [71] False omission rate (FOR) F-score 2A 2A+B+D [71] Matthews correlation coefficient (MCC) Bookmaker informedness (BM) TPR + TNR − 1 [71] Markedness (MK) PPV + NPV − 1 [71] Threat score (TS) A A+D+B [71] Equitable threat score

Analysis of Multicollinearity
TOL and the VIF were used in this study to test the problem of collinearity amongst the LCFs. The derived tolerance values were >0.2, and the VIF values were less than <5, thereby confirming the nonexistence of linearity between the factors. The outcome of the linearity test is shown in Table 4. A total of 21 LCFs were selected after the analysis ( Table 4). The highest TOI value and lowest VIF were derived from NDVI (Table 4).

Results of CSAE
The detailed results of CSAE are shown in Figure 7. Amongst the factors, rainfall (AM = 94.52) was identified as the most important predictive factor, followed by the slope aspect, elevation, NDVI, LULC, VD, soil map, TRI, slope, surface area, TPI, geology, RSP, cross sectional curvature, general curvature, longitudinal curvature, CI, tangential curvature, profile curvature, plan curvature and TWI (Figure 7).

Importance Analysis of LCFs by RR
A total of 21 LCFs were tested through RR to analyse their importance for landslide modelling. The outcome of RR revealed that rainfall (RR = 0.377) had the highest predictive capability in this study. Comparatively, other factors, such as elevation (RR = 0.077),

LSMs
LSMs were prepared by using DL and novel tree ML approaches, such as CNN, ANN, CART, ADTree, CART, FTree and LMT tree models. The LSMs are shown in Figure 8. The LSMs were divided into five susceptibility classes by using the natural break method of Jenk, as depicted in Figure 9. The LSM of CNN model assumed that the very high susceptibility class occupied an area of 23.02 and 14.40% under high, 23.07% under moderate, 24.34% under low, and 15.17% under very low susceptibility level of the study area (Figure 9). In the case of the ANN model, the very high, high, medium, low, and very low susceptibility zones encompassed the area of 13.36%, 13.22%, 19.55%, 31.53% and 22.34%, respectively ( Figure 9). In the case of the tree ML models, the results were relatively different. For instance, the LSM of ADTree model assumed only 11.80% of the district under very high susceptibility zone and 20.86% under very low, 28.33% under low, 23.83% under moderate, and 15.18% under high susceptibility classes (Figure 9). The percentage distributions of the susceptibility classes of LSM of the CART model for very high, high, moderate, low, and very low classes were 18.59, 28.38, 31.63, 18.17 and 3.23%, respectively (Figure 9). The FTree model predicted 4.77% of the total area as very low, 12.18% as low, 20.70% as moderate, 27.44% as high, and 34.90% as very high susceptibility zones of landslides ( Figure 9). The LSM of the LMT model categorised 6.54, 16.71, 26.05, 26.65 and 24.05% of the district under very low, low, moderate, high and very high susceptibility zones (Figure 9).

Importance Analysis of LCFs by RR
A total of 21 LCFs were tested through RR to analyse their importance for landslide modelling. The outcome of RR revealed that rainfall (RR = 0.377) had the highest predictive capability in this study. Comparatively, other factors, such as elevation (RR = 0.

Validation and Accuracy Assessment
The predictability of the models was validated by using the AUC of ROC and several other statistical measures ( Table 5). The success rate curve (using the training dataset) was drawn for each model. The result showed that CNN was the best fit with an AUC of 0.918, followed by CART (AUC = 0.910), LMT (AUC = 0.905), FTree (AUC = 0.900), ANN (AUC = 0.843) and ADTree (AUC = 0.745). The predictive capability of the models was assessed by using the prediction rate curve, which provided a similar result to the success rate curve. The AUC for the CNN model under the prediction rate curve was 0.933 and 0.925 for CART, 0.920 for LMT, 0.910 for FTree, 0.889 for ANN and 0.785 for ADTree model ( Figure 11). Table 5 contains the summary statistics of other measures. The outcome of CFbased prioritisation provided the relative priority ranking of models by considering all the exactness metrics calculated using the training and validation data sets. Using the training dataset, most priority was assigned to the CNN model because it ranked 1, followed by CART (rank 2), LMT (rank 3), ANN model (rank 4), FTree (rank 5) and ADTree (rank 6) model. In the validation dataset, the result was the same as the training set (Table 5)

Validation and Accuracy Assessment
The predictability of the models was validated by using the AUC of ROC and several other statistical measures ( Table 5). The success rate curve (using the training dataset) was drawn for each model. The result showed that CNN was the best fit with an AUC of 0.918, followed by CART (AUC = 0.910), LMT (AUC = 0.905), FTree (AUC = 0.900), ANN (AUC = 0.843) and ADTree (AUC = 0.745). The predictive capability of the models was assessed by using the prediction rate curve, which provided a similar result to the success rate curve. The AUC for the CNN model under the prediction rate curve was 0.933 and 0.925 for CART, 0.920 for LMT, 0.910 for FTree, 0.889 for ANN and 0.785 for ADTree model ( Figure 11). Table 5 contains the summary statistics of other measures. The outcome of CF-based prioritisation provided the relative priority ranking of models by considering all the exactness metrics calculated using the training and validation data sets. Using the training dataset, most priority was assigned to the CNN model because it ranked 1, followed by CART (rank 2), LMT (rank 3), ANN model (rank 4), FTree (rank 5) and ADTree (rank 6) model. In the validation dataset, the result was the same as the training set (Table  5).

Discussion
In this research, DL and tree ML models were chosen for the spatial susceptibility assessment of landslides in the East Sikkim region. A comparative analysis was performed with other state-of-the-art DL models and tree ML models. The DL model, namely, CNN, and ML models, such as ANN, ADTree, CART, FTree and LMT, were used to produce LSMs of the East Sikkim region. The results showed that the CNN model had the highest goodness-of-fit and excellent predictive capability, followed by ANN, ADTree, CART, FTree and LMT models. Several studies related to landsides have emphasised the different modelling approaches, such as ML, DL and ensemble ML. The DL and ML approaches have been considered as robust and efficient tools and have been used in different fields of geographical research, geotechnical application, and natural hazards, including LS mapping [72,73]. Wang et al. [74] applied DL and ML models, such as logistic regression, SVM and RF models, for LS assessment. Their research proved that CNN had the highest peak performance of predictive modelling, followed by the ML models. Moosavi et al. [75] compared the ANN and SVM by using pixel-based methods to produce the predictive model of landslides. CNN and texture shift recognition with pre and post landslide optical images for automated landslide detection were introduced by Ding et al. [54]. Ghorbanzadeh et al. [76] applied DL and ML approaches, such as ANN, SVM, RF and CNN models, to predict the most landslide-affected areas in Rasuwa District, Nepal. CNN performs several convolution and pooling operations to retrieve the characteristics. These features become more sophisticated and more abstract with the growth of convolutions and pooling. The degree of susceptibility to landslides, which was the deciding factor in assessing the susceptibility to landslides, is represented in these abstract features. CNN decreases the number of weights that need to be trained and the numerical complexity of the network [77]. The main advantages of the CNN model are that it considers all neighbourhood information and can determine manifold stages of representations from input data [77]. In agreement with the results of these prior studies, the present work confirms the CNN model as achieving the relatively highest adaptability, as demonstrated by the produced validation and accuracy measure results.
Using the mechanisms of factor selection is relevant to maximising the efficiency of landslide models by eliminating unwanted or trivial variables before training them [78]. Multicollinearity analysis (MA) and CSAE methods were applied, thereby confirming 21 LCFs, amongst the selected factors, were suitable and appropriate for this study. Effective application of the two methods can be found in various literature, such as [79]. Following Jenk's algorithm of natural breaks classification, LS maps were divided into five groups of susceptibility classes. In LSMs, these classification techniques have been widely used in the literature [80]. This strategy of clustering data helps to reduce the mean variance of each class from the mean within the class range and to increase the discrepancy between each class from the means of the other classes [81]. These studies have suggested the effective use of these classification techniques for this research. In accordance with the natural break classification method, 23.02 and 14.40% of areas were covered under very high to high susceptibility classes by the best fitting CNN model. The area covered by the high and very high susceptible classes of CNN model was greater than the ANN and ADTree models ( Figure 9).
An accurate LSM is a vital and reliable tool that helps to map the present, and predict the future, landslides of an area. In this research, LSMs are justified by using the ROC curve and several other accuracy measures. The ROC curve is a widely accepted validation technique in various fields of research [82]. Consideration of various techniques can be more effective and more justified in resolving the issue than a single approach of validation. Therefore, a prioritisation approach using CF was performed amongst the models by considering the ROC and 23 other measures to determine the best result. CF has been widely accepted for prioritisation studies, especially in watershed analysis [69]. Many researchers have emphasised the need to examine and compare susceptibility models for landslides with different approaches and techniques, because a small percentage of improvement in precision can control the resulting susceptibility areas of landslide [83,84]. Compared with the use of traditional models, the implementation of a CNN model may seem a more challenging task, but its slightly higher predictive efficiency is of considerable significance [84]. The accuracy of a CNN model with regard to AUC is greater than conventional ML models. In our study, CF-based prioritisation provided the relative priority amongst the models in which CNN model ranked first (Table 5), designating the best fit of this DL model, similar to previous studies.
Another significant part of this study is the factor contribution analysis in producing LSM. All factors in a region are not equally responsible for causing landslides. Several researchers have used different methods for the analysis of the importance of conditioning factors [85][86][87][88][89]. These methods include Relief F test, CSAE, information gain ratio and RR. All these methods have been applied successfully to determine the degree of association of the factors with the goal. In this study, RR was adopted, confirming factors such as rainfall, NDVI, LULC, slope aspect, elevation, and TRI, as the most important driving factors for landslides occurring in the study area. The outcome showed that the most vulnerable zones of the landslide are found in the northern portion of the district where soil condition, weak geology, torrent runoff, high altitude, steep sloping, rugged topography and heavy rainfall are the chief reasons for the occurrence of landslides. The East Sikkim district is covered with high hill-slope mountains. Therefore, the topographical ruggedness and steep sloping with certain geological structure, bears instability in landmasses in different parts of the study area. During the monsoon period, heavy precipitations and thunderstorms work as accelerating factors of landslides. Human activities, such as road construction, building construction, tea plantation and other agricultural activities are disturbing the balance of the slope gradient, thereby causing man-made landslides. The role of these factors, especially in the hillslope regions, were found to be significant in the previous literature [89,90]. However, every research work has a certain limitation. The limitation of the present study is the absence of some geological properties, such as joint, foliation and bedding. Only surface geology data were used. However, despite this limitation, the current study has good scope to accurately demarcate the landslide-susceptible area for future planning and management.

Conclusions
Every year, landslides cause tremendous damage to humans and their properties. Therefore, a comprehensive strategy is urgently needed to resolve this phenomenon. Identifying the places where landslides have previously occurred, and regions that are prone to landslides, is necessary to avoid future landslides. Different approaches and procedures may be used to geographically forecast landslide-prone regions, with some being more accurate than others. In this study, DL and hybrid tree ML models, such as CNN, ANN, ADTree, CART, FTree and LMT models, were used to generate LSMs. The results showed that DL models outperformed tree ML models in this analysis. These methods involved identifying possible landslide sites and implementing preventative and management measures. The CNN DL model had the best prediction performance amongst the models in this study, followed by ANN, ADTree, CART, FTree and LMT. Rainfall, NDVI and LULC were found to have greatly contributed to making an area prone to landslide based on RR. Therefore, future damage in mountainous areas or areas with comparable physiographic characteristics may be prevented by generating more precise LSMs using DL techniques, such as the CNN model, and selecting appropriate LCFs for generalisation. The major benefit of the three DL and DT models is that they automate the task of searching numerous databases for useful information. However, these methods have some drawbacks, including that pre-processing might take a long time due to the numerous operations involved. The LSMs developed in this study can be used for landslide mitigation strategies in the East Sikkim district region. LSMs can assist decision makers in making development and planning decisions. These maps can be used as a preliminary step for landslide risk management research. Landslide forecasting is relatively accurate because DL and tree-based modelling are paired with RS and GIS spatial data. We only considered the region of East Sikkim in the Indian Himalaya for our study. The occurrence of landslides in other regions of India should be investigated in future research to give data that are more relevant. If the proper variables for measuring landslide incidence are identified, further study will enable comprehensive management and control of landslides in vulnerable regions.