Comparative Study of Convolutional Neural Network and Conventional Machine Learning Methods for Landslide Susceptibility Mapping

Abstract: Landslide susceptibility mapping (LSM) is a useful tool for estimating the probability of landslide occurrence, providing a scientific basis for natural hazard prevention, land use planning, and economic development in landslide-prone areas. To date, a large number of machine learning methods have been applied to LSM, and recently the advanced convolutional neural network (CNN) has been gradually adopted to enhance the prediction accuracy of LSM. The objective of this study is to introduce a CNN-based model into LSM and to systematically compare its overall performance with the conventional machine learning models of random forest (RF), logistic regression (LR), and support vector machine (SVM). We selected Zhangzha Town in Sichuan Province, China, and Lantau Island in Hong Kong, China, as the study areas. Each landslide inventory and the corresponding predisposing factors were stacked to form spatial datasets for LSM. Receiver operating characteristic (ROC) analysis, the area under the curve (AUC), and several statistical metrics, such as accuracy, root mean square error (RMSE), the Kappa coefficient, sensitivity, and specificity, were used to evaluate model performance. Finally, the trained models were applied to the study areas and the landslide susceptibility zones were mapped. The results suggest that both the CNN and the conventional machine learning models achieve satisfactory performance. The CNN-based model not only exhibits excellent prediction capability and achieves the highest performance but also significantly reduces the salt-and-pepper effect, which indicates its great potential for application to LSM. Moreover, we present a detailed guide for generating training datasets for deep learning-based susceptibility modeling.


Introduction
Landslides are among the most serious hazards and are driven by geomorphology, geology, hydrology, and human activities [1]. Heavy rainfall, earthquakes, and anthropogenic activities can directly trigger catastrophic landslides. A review by Rawshan et al. [2] noted that more intense landslides are expected against the background of increasingly extreme weather events associated with climate change. When a large landslide occurs, it can cause substantial casualties and infrastructure destruction. The prediction and management of landslides are thus necessary to prevent and mitigate the losses caused by such hazards. However, due to the lack of reliable precursory data, it is generally difficult to predict landslides in real time with good precision, i.e., to forecast their timing.
The literature review shows that previous research focused on comparing machine learning and statistical methods, while only a few studies have attempted to compare the CNN with conventional machine learning methods [36][37][38][39]. To fill this gap, it is necessary to investigate these techniques and make a quantitative, systematic comparison. The comparative study is a common research form in the LSM literature, designed to explore and compare different models in order to obtain reliable modeling results [40,41]. In light of this, it is considered meaningful to carry out a comparative study of the CNN and conventional machine learning methods, two types of models with different data organization structures. To the best of our knowledge, this work is the first to propose a detailed comparative study on the application of a CNN-based model and conventional machine learning methods in LSM, which could provide an effective guide for researchers studying the susceptibility assessment of natural disasters.
The objective of this study is to introduce a CNN-based model into LSM and to systematically compare its overall performance with the conventional machine learning models of random forest, logistic regression, and support vector machine in two typical study areas constrained by different external triggers (earthquakes and rainfall). This work also compares these two types of models in detail in terms of dataset preparation and model effectiveness. More specifically, three typical machine learning models, i.e., random forest, logistic regression, and support vector machine, are selected for the pairwise comparison. These three models have been proven to perform well in LSM. Two landslide inventories containing earthquake-triggered landslides and rainfall-induced landslides are fed into the conventional machine learning models in matrix form and into the CNN-based model as three-dimensional (3D) tensors. A series of indices, including the receiver operating characteristic (ROC), root mean square error (RMSE), accuracy, and the Kappa coefficient, are used to evaluate the models on the training and testing sets.

Study Areas
There are many triggering factors that contribute to landslide occurrence, such as heavy rainfall, earthquakes, and anthropogenic activities. Of these, heavy rainfall and earthquakes are common triggers around the world. In this work, we selected two typical areas, Zhangzha Town and Lantau Island, as study areas that suffer from a catastrophic earthquake and heavy rainfall, respectively.
Zhangzha Town is in Jiuzhaigou County on the eastern edge of the Tibetan Plateau, located in Sichuan Province, southwest China, as shown in Figure 1a. The area covers approximately 1353 km², spanning 103°38′ E to 104°40′ E and 32°54′ N to 33°24′ N. Two major geomorphic units, the Tibetan Plateau and the Sichuan Basin, transition here, forming a typical valley zone with large undulations and steep slopes. About half of the study area has a slope angle greater than 30.58°. The major lithological units are Triassic (40%) and Carboniferous (38%). Tectonic activity has led to frequent earthquakes here: a total of 49 earthquakes with Mw > 5.0 have occurred around the region in the last 100 years [42]. The latest was the 8 August 2017 Mw 6.5 Jiuzhaigou earthquake, which triggered a large number of co-seismic landslides. These landslides caused severe damage to the natural landscape and disrupted local traffic.
Lantau Island is the largest outlying island in the southwest of the Hong Kong SAR, covering an area of around 147 km². As shown in Figure 1b, undulating and steep terrain occupies most of the island, with only a small amount of flat ground along the coastline, which exposes the majority of the area to a certain level of landslide risk. Meanwhile, Lantau Island has a subtropical monsoon climate that is warm and dry in winter and hot and humid in summer, accompanied by high annual average precipitation, frequent high-intensity storms, and typhoons [43]. Under the combined influence of topography and climate, Lantau Island is thus a landslide-prone area that has attracted researchers' attention. Moreover, the Hong Kong International Airport and the Disneyland Resort on Lantau Island receive tourists from all over the world every year, which has heightened the authorities' concern for landslide hazard prevention.

Landslide Inventories
Landslide inventory is fundamental to the research of landslide susceptibility. A complete landslide inventory map (LIM) records the location, size, and distribution of landslides, which can serve as basic data for further landslide studies.
Tian et al. [44] established a co-seismic landslide inventory around the epicenter of the 2017 Jiuzhaigou earthquake. In their work, a series of high-resolution images acquired before and after the earthquake and close to the seismic origin time, including pre-seismic images on the Google Earth (GE) platform and 0.5 m resolution GeoEye-1 post-seismic images, were chosen for visual interpretation. Combined with field investigation, a total of 4834 landslides were identified, dominated by small rockfalls and debris flows. Among these landslides, the largest, smallest, and average areas are approximately 236,336.2 m², 7.7 m², and 1993.2 m², respectively. In this work, considering the image resolution and the research purpose, we selected 710 locations from the original landslide inventory. In addition, to ensure that the distribution pattern of the selected landslides was consistent with the original inventory, we used the Subset tool in ArcMap to establish a sub-landslide inventory.
The Geotechnical Engineering Office (GEO) of the Hong Kong government has established a landslide inventory called the "Enhanced Natural Terrain Landslide Inventory (ENTLI)", which records landslides that occurred from 1974 to 2018. This inventory, compiled by visual interpretation, includes almost all the landslides that have occurred in Hong Kong. In this work, we selected only the 2492 landslides that occurred on Lantau Island in 2008 as the training inventory, without considering temporal variation, thus keeping it consistent with Zhangzha Town in the time dimension.
Note that all landslide locations in the two inventories were converted into raster data with a grid size of 30 × 30 m using the nearest neighbor resampling method. Subsequently, we divided each landslide inventory into two groups, including training (70%) and testing (30%) data for the construction and testing of the landslide susceptibility model, respectively. As the landslide susceptibility assessment is a binary classification problem that needs positive and negative samples, non-landslide locations with the same proportion as landslide data were randomly selected to balance the landslide inventory.
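The balanced sampling and 70/30 split described above can be sketched as follows. The raster inventory here is a hypothetical stand-in (grid size, landslide density, and random seed are all illustrative, not the study's actual data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical inventory raster: True marks a landslide cell.
grid = rng.random((200, 200)) < 0.02          # ~2% of cells are landslides
ls_rows, ls_cols = np.nonzero(grid)           # landslide locations
non_rows, non_cols = np.nonzero(~grid)        # candidate negative cells

# Randomly sample as many non-landslide cells as there are landslide cells.
idx = rng.choice(non_rows.size, size=ls_rows.size, replace=False)
samples = np.concatenate([
    np.stack([ls_rows, ls_cols], axis=1),
    np.stack([non_rows[idx], non_cols[idx]], axis=1),
])
labels = np.concatenate([np.ones(ls_rows.size), np.zeros(ls_rows.size)])

# 70/30 split after shuffling the sample order.
order = rng.permutation(labels.size)
cut = int(0.7 * labels.size)
train_idx, test_idx = order[:cut], order[cut:]
```

The same index arrays can then be used to pull rows from the factor matrix (for the conventional models) or window patches (for the CNN).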

Landslide Predisposing Factors
The occurrence of landslides is closely related to the combination of various factors, including topography, geology, and other environmental indexes. A proper combination of predisposing factors can make the model more competitive [45]. Because of the complexity of landslides and the diversity of triggering sources, the predisposing factors should be chosen based on the circumstances of the specific area.
In the study area of Zhangzha Town, 13 landslide predisposing factors, namely peak ground acceleration (PGA), topographic wetness index (TWI), distance to rivers, distance to roads, distance to faults, normalized difference vegetation index (NDVI), land use, lithology, elevation, slope, aspect, topographic ruggedness index (TRI), and yearly precipitation, were chosen for landslide susceptibility modeling. Table 1 outlines the data sources. PGA is an important indicator of the relationship between co-seismic landslide density and the earthquake and is a necessary factor in landslide susceptibility mapping [46]. In this study, the PGA data were adapted from the USGS (https://earthquake.usgs.gov/) (accessed on 6 November 2021), ranging from 0.12 to 0.26 g and classified into five groups: <0.08, 0.08-0.16, 0.16-0.20, 0.20-0.24, and 0.24-0.26 g (Figure 2a). In LSM, TWI is another frequently used factor derived from the digital elevation model (DEM) that quantifies the topographic control on hydrological processes [47]. As shown in Figure 2b, the TWI used in this work was divided into six categories: <5.19, 5.19-6.57, 6.57-8.29, 8.29-10.81, 10.81-15.21, and >15.21. The stability of a slope near a river is significantly affected by fluctuations of the water level [48,49], and roads can affect the spread and size of landslides. Therefore, the distances to rivers and roads, derived from a topographical map, were considered as predisposing factors (Figure 2c,d). Note that the buffer distances for roads and rivers were determined based on the characteristics of the study area. NDVI is a crucial factor related to slope stability, especially in mountainous areas. Figure 2e shows that the NDVI is divided into five categories, with values ranging from −0.41 to 0.80. Land use is another common factor contributing to landslides.
Using the supervised classification method of the SVM classifier on the Google Earth Engine (GEE), land use was classified into seven categories, namely water, construction land, bare land, dense forest, sparse forest, grass land, and others (Figure 2f). In this study, NDVI and land use were derived from pre-seismic Sentinel-2 data on the GEE. Geological factors, including lithology and distance to faults, also play a significant role in landslide susceptibility. The original lithology and fault data were provided by local authorities from a geological map at a 1:500,000 scale. The lithology of the study area mainly comprises five groups, including Triassic (T1, T2, T3), Carboniferous (C), Quaternary (Q, Qh), and Devonian units.
For modeling the landslide susceptibility of Lantau Island, ten predisposing factors, including elevation, slope aspect, slope angle, TRI, TWI, NDVI, distance to faults, lithology, land use, and yearly precipitation, were prepared. The maps of these landslide predisposing factors and further details are provided in Figure A1, Appendix A.
All predisposing factors of Zhangzha Town and Lantau Island were converted into raster data with a grid size of 30 × 30 m, in accordance with the available landslide inventory.

Methodology
In the present study, modeling and mapping landslide susceptibility in Zhangzha Town and Lantau Island involves five principal parts: (1) selecting landslide factors using the information gain ratio (IGR); (2) integrating the landslide inventory and predisposing factors to establish tensor datasets for the CNN-based model and vector datasets for the conventional machine learning methods; (3) modeling landslide susceptibility; (4) assessing the predictive capability of the models; and (5) mapping the landslide susceptibility of the study areas. A flowchart is shown in Figure 3. The CNN module was selected for modeling landslide susceptibility because it can effectively capture high-level features from the data and because it forms the basis of many other deep learning techniques. Moreover, three typical machine learning models, i.e., random forest, logistic regression, and support vector machine, were selected for the pairwise comparison because these models have been proven to perform well in LSM.

Information Gain Ratio
The importance of predisposing factors is associated with their contribution to landslide occurrence, and a factor combination that significantly affects the predictability of the susceptibility model is worth examining. In this work, we used the information gain ratio (IGR) method to quantify each factor's importance and select optimal combinations before modeling. The IGR values of each factor were calculated with Weka ver. 3.8.5, an open-source software package. A high IGR value indicates that a factor has high predictive ability for modeling, and vice versa [50].
Assuming that the training data D consist of n samples (landslide points) and can be divided into classes C_i (landslide, non-landslide), the information entropy of the dataset split by class C_i is

Info(D) = -\sum_{i=1}^{2} \frac{n(C_i, D)}{n} \log_2 \frac{n(C_i, D)}{n}

The information gain of the subsets D_1, D_2, \ldots, D_m, split from D with respect to the predisposing factor P, is estimated as

InfoGain(D, P) = Info(D) - \sum_{j=1}^{m} \frac{|D_j|}{|D|} Info(D_j)

However, the second term of this formula becomes smaller as the number of factor attribute categories increases, which may reduce the accuracy of the selection. Thus, the information gain ratio was used to estimate the predictive ability of a given predisposing factor:

InformationGainRatio(D, P) = \frac{InfoGain(D, P)}{SplitEnt(D, P)}

where SplitEnt quantifies the potential information generated by dividing the training data into m subsets [9]:

SplitEnt(D, P) = -\sum_{j=1}^{m} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}
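The IGR computation defined above can be sketched in a few lines of NumPy (assuming categorical factor values; the function names are ours for illustration, not Weka's API):

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (base 2) of a label array.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain_ratio(factor, labels):
    # factor: categorical attribute value per sample; labels: 0/1 classes.
    n = labels.size
    gain = entropy(labels)          # Info(D)
    split_ent = 0.0
    for v in np.unique(factor):
        mask = factor == v
        w = mask.sum() / n
        gain -= w * entropy(labels[mask])   # subtract weighted subset entropy
        split_ent -= w * np.log2(w)         # SplitEnt(D, P)
    return gain / split_ent if split_ent > 0 else 0.0
```

A perfectly predictive binary factor yields an IGR of 1, while a factor independent of the labels yields 0.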

Establishment of Spatial Datasets
Having the landslide inventory and predisposing factors, the next step is to format and integrate these data into datasets for further modeling. As shown in Figure 4a, all layers of landslide predisposing factors were stacked together to form a tensor of size n × w × h, where n, w, and h represent the number of predisposing factors, the length, and the width of the entire study area, respectively. Then, the specific pixel corresponding to each landslide location was obtained by overlaying the landslide inventory on the factor tensor. Note that the grid size of all the factor layers and the landslide inventory should be the same to ensure that they can be aligned pixel-by-pixel. As mentioned earlier, the grid size of all raster data in this study was 30 × 30 m. However, the factor tensor had a different numerical range in each dimension; for instance, the slope aspect in Zhangzha Town was divided into nine groups, while PGA was divided into six groups. Therefore, it was essential to normalize each dimension of the factor tensor to improve the machine learning algorithm's convergence speed and accuracy [51].
Figure 4b shows the landslide and non-landslide locations extracted from the factor tensor used in the deep learning model. The cell corresponding to each landslide location is taken as the center and then expanded into a raster of size s × s (this is the window size; the tuning curves of this parameter are provided in Figure A2, Appendix A). Each cell is assigned a value that contains the data of all the factor layers. In this way, more environmental information around the landslides can be considered for further modeling, as opposed to using just one grid cell [34]. Similarly, the raster of each non-landslide location is extracted from the tensor data. The size of the raster used for learning should be set according to the specific demands.
In this study, the size of the landslide and non-landslide rasters was 21 × 21 in Zhangzha Town and 31 × 31 on Lantau Island, determined using a trial-and-error approach. A total of 2i training and testing tensors with n dimensions (where i is the number of landslides) were generated from the landslide inventory. Table 2 summarizes the CNN datasets we produced. In this work, the GDAL Python package (ver. 2.3.2) was adopted to read the raster of each predisposing factor as an editable array. These data were then stacked and expanded via the NumPy package to feed the CNN-based model.
Table 2. Number of pixels in the dataset of the CNN-based model.

[Table 2 lists the landslide and non-landslide pixel counts for the training and testing sets of each study area.]

Figure 4c shows the diagram of dataset production for the conventional machine learning methods. The cell corresponding to each landslide location is extracted from all factor layers and converted into a one-dimensional array with n elements. With the addition of the non-landslide samples and the label column, we finally obtain a matrix of size (number of samples) × n for learning. The whole process of generating the matrix data was implemented in ArcMap 10.6.
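The s × s window extraction for the CNN dataset can be sketched as follows. This is a simplified stand-in for the GDAL/NumPy pipeline described above; edge padding for cells near the study-area boundary is our assumption:

```python
import numpy as np

def extract_patches(tensor, locations, s):
    # tensor: (n_factors, H, W) stacked, normalized factor layers.
    # locations: (k, 2) array of (row, col) landslide / non-landslide cells.
    # Returns a (k, n_factors, s, s) array of patches centered on each cell
    # (s should be odd, e.g., 21 or 31 as in this study).
    half = s // 2
    padded = np.pad(tensor, ((0, 0), (half, half), (half, half)), mode="edge")
    patches = [padded[:, r:r + s, c:c + s] for r, c in locations]
    return np.stack(patches)
```

Each patch carries the full factor stack for the neighborhood of one sample, matching the tensor form fed to the CNN.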

Convolutional Neural Network (CNN)
The convolutional neural network is perhaps the most popular and widely used deep learning algorithm and is composed of three key components: convolutional, down-sampling, and fully connected layers [52]. As the key component of a CNN, the convolutional layers contain multiple convolution kernels that extract finer feature information from the previous layer. Meanwhile, the shared-weights strategy in the convolutional layer allows the entire network to be trained with fewer parameters than a fully connected network. The down-sampling layer, also known as the pooling layer, is used to reduce the size of the feature maps and to improve the model's resistance to overfitting on unseen data. The fully connected layer acts as a classifier with the same structure as a conventional fully connected network; its input is the high-dimensional features extracted by the convolution and pooling operations. Various extended CNN architectures have been proposed and applied in many fields using these basic layers.
As we know, not all landslide predisposing factors carry the same amount of information; that is, the contribution of each predisposing factor to landslide occurrence is different. However, the basic CNN framework assumes that different bands contribute equally. Thus, we added a lightweight channel attention module, namely the squeeze-and-excitation network (SENet), to the pure CNN model. The SENet automatically calculates the importance of each channel (i.e., each landslide predisposing factor) and then, based on this importance, enhances the features that are more meaningful to landslide modeling [53]. The SENet is an embedded network that mainly contains global average pooling and fully connected layers, as shown in Figure 5, and it can be expressed as follows:

Z = F_{sq}(X) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X(i, j)

S = F_{ex}(Z) = \sigma(W_2 \, g(W_1 Z))

\tilde{X} = S \otimes X

where Z is the result of the global average pooling operator that converts each factor channel into a 1 × C vector, S is the weight vector, W_1 \in R^{(C/r) \times C}, W_2 \in R^{C \times (C/r)}, r is the scaling parameter, and \sigma and g represent the sigmoid and ReLU functions, respectively. Finally, the recalibrated tensor \tilde{X}, constrained by the attention mechanism, is obtained by multiplying the input data by S. Note that embedding the SENet does not alter the size of any tensor in the pure network and incurs only a small extra cost.
In landslide susceptibility mapping, the tensor dataset carrying information on all predisposing factors is input to the convolutional layer, and a series of features related to the landslide event is extracted by the convolution operation. The pooling layer after the convolutional layer filters out redundant information. The resulting features are then matched to the label of the input data in the fully connected layer. During training, the parameters of the CNN are continually updated with the backpropagation algorithm until an acceptable training accuracy is reached. The architecture of the CNN designed in this study is shown in Figure 5.
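The squeeze-and-excitation computation can be illustrated with a small NumPy sketch (a stand-in for the PyTorch module actually used in this study; the weight matrices W1 and W2 are taken as given):

```python
import numpy as np

def se_block(X, W1, W2):
    # X: (C, H, W) feature tensor; W1: (C//r, C); W2: (C, C//r).
    z = X.mean(axis=(1, 2))               # squeeze: global average pooling -> (C,)
    h = np.maximum(W1 @ z, 0.0)           # excitation: ReLU(W1 z)
    s = 1.0 / (1.0 + np.exp(-(W2 @ h)))   # sigmoid -> per-channel weights in (0, 1)
    return X * s[:, None, None]           # rescale each factor channel by its weight
```

The output has the same shape as the input, consistent with the note above that the SENet does not alter tensor sizes.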
One limitation of the CNN-based model is the setting of its parameters, which takes more time than for the conventional machine learning methods. In addition, its code is more complex because of the introduction of the pixel window. Considering these points, the computational efficiency of the CNN-based model is lower than that of the conventional machine learning methods. On the other hand, the salt-and-pepper effect in the prediction results is a limitation of the conventional machine learning methods. A detailed analysis of these limitations can be found in Sections 5.2 and 5.4.

Random Forest
Random forest (RF) is a popular machine learning method introduced by Breiman [54]. It is an ensemble algorithm that generates multiple decision trees with different classification capabilities to learn input data using the bootstrap sampling strategy. The classification is achieved by voting among all the independent decision trees in RF. Different from the single decision tree, the input data of each tree in RF are randomly selected from all input datasets. Each node is split using the subset of predictive features (i.e., landslide predisposing factors) that are randomly selected [55]. In such a way, RF increases diversity among the decision trees and improves the capability to handle the redundant data (over-fitting resistance). Thus, RF is a useful method for mining the useful but hidden association between features and targets within large amounts of data [28].
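A minimal scikit-learn sketch of this step follows; synthetic data stands in for the (samples × factors) matrix described above, and the parameter values are illustrative rather than the study's tuned settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the landslide / non-landslide factor matrix.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Ensemble of decision trees, each trained on a bootstrap sample of the data.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_tr, y_tr)
probs = rf.predict_proba(X_te)[:, 1]   # susceptibility-like probabilities
```

The class-1 probabilities play the role of susceptibility values when the model is later applied to every pixel of the study area.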

Logistic Regression
Logistic regression (LR), a generalized linear model, is widely used in mapping landslide susceptibility. In this study, the dependent variable of LR was a binary variable representing the presence or absence of a landslide (1 means landslide and 0 means non-landslide), and the independent variables were the 12 landslide predisposing factors. The goal of LR is to describe the relationship between the binary-coded landslide locations and the various landslide predisposing factors and to estimate the probability of landslide occurrence. The general expression of LR is

p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n)}}

where x_1, x_2, \ldots, x_n are the landslide predisposing factors, b_0 is the intercept of the model, b_1, b_2, \ldots, b_n are the regression coefficients to be determined, and p is the probability of landslide occurrence.
In a regression model, the term multicollinearity implies that a near-perfect linear relationship exists among two or more variables, preventing the model estimates from being uniquely computed [50]. There is thus a need to diagnose multicollinearity before modeling. The variance inflation factor (VIF) and tolerance are commonly used to quantify the multicollinearity among predisposing factors in landslide studies: when the VIF is greater than 10 or the tolerance is smaller than 0.1, potential multicollinearity exists in the dataset [9]. In this work, both indicators for the landslide datasets of the two study areas were within the expected range, implying no multicollinearity among the landslide predisposing factors.
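The VIF diagnosis can be sketched in NumPy as below (tolerance is simply 1/VIF; the function is our illustration, not the study's code):

```python
import numpy as np

def vif(X):
    # Variance inflation factor for each column of X, computed from the R^2
    # of an ordinary-least-squares regression of that column on the others:
    # VIF_j = 1 / (1 - R_j^2); tolerance_j = 1 / VIF_j.
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

Independent factors give VIF values near 1, while a factor that is a near-linear combination of the others exceeds the threshold of 10.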

Support Vector Machine
As a machine learning method based on the structural risk minimization (SRM) strategy, the support vector machine (SVM) has been widely used in various classification and regression tasks. For classification, similar to other algorithms, a boundary needs to be found to partition the data space. The SVM algorithm achieves this by finding the hyperplane with the maximum margin between the classes [56].
Given a training dataset D(x_i, y_i) with n samples, where x_i \in R^n and the labels y_i \in \{1, -1\}, a separating hyperplane is defined as

w \cdot x + b = 0

where w is the coefficient vector that determines the orientation of the hyperplane and b is the bias. The parameters w and b jointly determine the hyperplane, and their optimization can be posed as a quadratic programming problem:

\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0

where C is the penalty, \xi_i is the slack variable used for the soft-margin classifier, and l is the number of support vectors. The optimal hyperplane is determined after solving the optimization problem with the Lagrange multipliers a_i. The classification function is expressed as

f(x) = \mathrm{sign}\left( \sum_{i=1}^{l} a_i y_i (x_i \cdot x) + b \right)

For nonlinear classification, it is common to use nonlinear kernels to map the data into a high-dimensional feature space and then classify it with a separating hyperplane. Introducing a kernel function, the classification function becomes

f(x) = \mathrm{sign}\left( \sum_{i=1}^{l} a_i y_i K(x_i, x) + b \right)

In this study, the radial basis function kernel was chosen as the kernel function:

K(x_i, x) = \exp\left(-\gamma \|x_i - x\|^2\right)
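The radial basis function kernel can be written directly as a two-line NumPy helper (γ is a free parameter here, set purely for illustration):

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    # K(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    d = x1 - x2
    return np.exp(-gamma * np.dot(d, d))
```

The kernel equals 1 for identical inputs, is symmetric, and decays toward 0 as the two feature vectors move apart.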

Generation of Landslide Susceptibility Maps
According to previous studies, there are two main approaches to generating landslide susceptibility maps. One is first to calculate the susceptibility values of random points in the study area and then to perform inverse distance weighted (IDW) spatial interpolation, an effective method for assigning values to the entire area from a limited set of points; the susceptibility values are then classified into risk levels using the natural breaks method. The other approach is to calculate the susceptibility values of all pixels of the study area and then classify the values into five risk levels: very low, low, moderate, high, and very high [16]. The choice between the random point-based and pixel-based methods is driven primarily by the size of the study area and the available computing resources. An LSM generated by either method can reflect the distribution of potential landslide hazards with the corresponding probability; however, the pixel-based LSM may require more computing resources and code when the study area is large. In this study, the pixel-based method was used to generate the landslide susceptibility maps. GDAL was used to write the predicted value of each pixel into raster data with the same coordinate system and spatial resolution as the actual study area.
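The classification step might be sketched as follows; quantile breaks stand in here for the natural-breaks (Jenks) classifier available in GIS software, which is an assumption made purely for illustration:

```python
import numpy as np

def classify_susceptibility(prob, n_classes=5):
    # Bin per-pixel susceptibility probabilities into n_classes levels
    # (1 = very low ... 5 = very high) using quantile breaks as a simple
    # stand-in for the natural-breaks (Jenks) method.
    qs = np.quantile(prob, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(prob, qs) + 1
```

Applied to the flattened probability raster, this yields the five-level map that is then written back to a georeferenced raster.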

Model Performance Evaluation
This study overlays the landslide locations on the landslide susceptibility map for model evaluation. The receiver operating characteristic (ROC) curve is considered an effective metric for assessing the performance of predictive models, especially those developed for classification. A ROC curve is a graph whose two axes correspond to "1 − specificity" and "sensitivity". Sensitivity is the proportion of landslide pixels that are correctly predicted as landslides, and specificity is the proportion of non-landslide pixels that are correctly predicted as non-landslides [55]. The area under the ROC curve, indicated by the AUC value [57], is another index of model performance. AUC values lie between 0.5 and 1, and a larger AUC generally indicates that the corresponding model achieves better performance. In detail, the AUC value can be divided into four levels: poor (0.5-0.6), moderate (0.6-0.7), good (0.7-0.9), and excellent (0.9-1) [58].
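These evaluation metrics can be computed with scikit-learn's metrics module, which this study uses, as in the following toy sketch (the label and score arrays are invented for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0])                # 1 = landslide pixel
y_prob = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.7])    # model susceptibility scores
y_pred = (y_prob >= 0.5).astype(int)                 # hard labels at 0.5 threshold

auc = roc_auc_score(y_true, y_prob)
acc = accuracy_score(y_true, y_pred)
kappa = cohen_kappa_score(y_true, y_pred)
rmse = np.sqrt(np.mean((y_prob - y_true) ** 2))

# Sensitivity and specificity from the confusion-matrix counts.
tp = np.sum((y_pred == 1) & (y_true == 1))
fn = np.sum((y_pred == 0) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
```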
Furthermore, several statistical indices, including the root mean square error (RMSE), accuracy, and the Kappa index, were adopted for model validation. Generally, a landslide susceptibility model with a lower RMSE value performs better in predicting landslide occurrence. A full description of the Kappa index can be found in the literature [59]. The mathematical expressions of these indices are as follows (all metrics were calculated using the metrics module of the Sklearn package):

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

Sensitivity = \frac{TP}{TP + FN}

Specificity = \frac{TN}{TN + FP}

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{pred,i} - y_{act,i})^2}

Kappa = \frac{P_o - P_e}{1 - P_e}

Note: accuracy, sensitivity, and specificity are derived from the confusion matrix, a table presenting the counts of false positive (FP), false negative (FN), true positive (TP), and true negative (TN) observations [60]. In RMSE, y_{pred,i} and y_{act,i} represent the predicted and actual values, respectively. In Kappa, P_o is the observed agreement, P_e is the expected chance agreement, and TS is the total number of samples in the study area.

Figure 6 shows the contribution of each factor to the landslides as calculated by the information gain ratio (IGR); the figure was generated with OriginPro 2021. In the area of Zhangzha Town, PGA has the highest IGR value of 0.092, while the values of TWI, yearly precipitation, and slope aspect are less than 0.01, indicating that the weights of these factors are relatively small. These results reflect the background characteristics of the selected landslide inventory: all samples in the Zhangzha Town inventory are earthquake-triggered landslides. In turn, this also confirms the reliability of the prepared data. Interestingly, the weight of yearly precipitation is much lower than that of PGA. One explanation is that the strong earthquake ground motion is amplified by the complex topography of Zhangzha Town, exerting a negative impact on hillslope stability that is much more significant than that of rainfall [61,62].
Different from the IGR results for Zhangzha Town, the top three factors on Lantau Island are lithology, TRI, and yearly precipitation. Yearly precipitation becomes a governing factor, while the weights of elevation and related topographic factors are slightly reduced. As described in Section 2.1, Lantau Island experiences high annual precipitation and frequent high-intensity storms and typhoons, which results in the high weight of rainfall.

Selection of Predisposing Factors
In addition, the difference in the contribution of factors to landslides between Zhangzha Town and Lantau Island allows us to analyze the generalization ability of the susceptibility models.

Construction of Models
For the construction of the CNN, hyperparameter tuning is the fundamental step to obtain a model with high predictability. In this study, the related hyperparameters, including the convolutional kernel size, pooling size, loss function, optimizer, number of epochs, batch size, learning rate, and activation function, were determined using five-fold cross-validation. For the RF, LR, and SVM models, the GridSearchCV method was adopted for parameter tuning [63].
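A GridSearchCV sketch for one of the conventional models follows; the synthetic data and parameter grid are illustrative, not the study's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the training matrix of factors and labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Exhaustive search over a small, illustrative parameter grid with
# five-fold cross-validation, scored by AUC.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)
best = search.best_params_
```

`search.best_estimator_` can then be refit on the full training set and applied pixel-by-pixel to produce the susceptibility map.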
In the training process, the CNN operates on the dataset as follows: the tensor datasets containing comprehensive environmental information are first fed into two convolutional layers and two down-sampling layers for convolution and pooling operations, generating high-level feature maps; the feature maps are then flattened into the fully connected layer and matched to the landslide label. In this way, the entire model can learn the pattern linking landslide occurrence and the surrounding environment. In the conventional machine learning methods, the landslide factor vector (representing the predisposing factors of the training samples) is input into the model and matched to the landslide label for training. Figure 7 shows the training process of the CNN-based model, which can be used as an indicator of whether the model has converged. Intuitively, we consider the training complete when the model's loss converges (i.e., there are no significant fluctuations along the loss and accuracy curves). It can be observed that, in the area of Zhangzha Town, the model loss fluctuates strongly before 25 epochs and stabilizes around epoch 35. The training curves of the Lantau Island model level off from epoch 65. These results demonstrate that the constructed models converged after learning the hidden pattern between predisposing factors and landslides. We used the register_forward_hook function of PyTorch to obtain the outputs of each layer and then visualized them using the matplotlib package. Figure 8 shows the feature extraction process of the CNN-based landslide susceptibility model trained on Zhangzha Town. It can be observed that the selected predisposing factors were fed into the network, and the high-level features were then extracted layer by layer through different convolutional operations.
Finally, these high-level features reflect the contributions to landslide occurrence gathered at the output layers to generate the probability map of landslide occurrence. That is to say, the trained CNN-based model can leverage all the landslide predisposing factors simultaneously to map the landslide susceptibility.
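The two-convolution, two-pooling architecture described above can be sketched in PyTorch as follows; the channel counts, kernel sizes, and the 12-factor, 16 × 16 pixel window are assumptions for illustration, not the exact configuration of the trained model.

```python
import torch
import torch.nn as nn

class LandslideCNN(nn.Module):
    """Sketch of a two-convolution, two-pooling susceptibility network.
    n_factors and window size are assumed values, not the study's settings."""
    def __init__(self, n_factors=12, window=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_factors, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # first down-sampling layer
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # second down-sampling layer
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                           # flatten feature maps
            nn.Linear(64 * (window // 4) ** 2, 2),  # landslide / non-landslide
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LandslideCNN()
# A batch of four hypothetical pixel-window tensors.
probs = torch.softmax(model(torch.randn(4, 12, 16, 16)), dim=1)
```

The second column of `probs` would serve as the landslide occurrence probability for each input window.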
The neural network and machine learning methods were implemented in the Python environment, outside of the GIS software. The CNN and the conventional machine learning models were developed with PyTorch 1.9.0 and Sklearn 1.0.1, respectively. The hardware environment of this study was a personal computer with a 6 GB GTX 1660 Ti graphics card, a 2.6 GHz Intel® Core™ i7-9750H CPU, and 16 GB of RAM.

Model Comparison
To quantify the robustness and generalization ability of the susceptibility models, we compared each model's performance on two typical landslide inventories, earthquake-triggered and rainfall-induced, based on the testing dataset, the AUC, and the statistical measures. The results are shown in Figure 9 and Table 3. The ROC curves were drawn in OriginPro based on the outputs obtained from the metrics function in PyTorch. It can be seen that, in the area of Zhangzha Town, the AUC values vary from 84.65% to 91.23%, and the CNN-based model achieves the highest value (91.23%), followed by RF (89.92%), LR (85.59%), and SVM (84.65%). Notably, the AUC value of the CNN-based model is 6.58% higher than that of SVM. The CNN model has the highest probability of correctly predicting non-landslide pixels (specificity = 0.86), whereas the RF has the highest probability of correctly predicting landslide pixels as landslides. The Kappa coefficient varies from 0.53 to 0.67, which corresponds to the strength-of-agreement categories proposed by Landis and Koch [64]: 0.4-0.6 is moderate and 0.6-1 is almost perfect. For Lantau Island, the CNN-based model has the highest AUC value (92.70%), followed by RF (90.79%), LR (86.45%), and SVM (86.24%). A similar pattern is also observed for accuracy, RMSE, Kappa, sensitivity, and specificity, indicating a reasonable agreement between the predicted landslides and the real ones.
Overall, the CNN-based model shows a better performance than the three conventional machine learning (ML) models in predicting the landslide susceptibility of the two selected areas. Furthermore, the AUC and statistical measures of the RF model are close to those of the CNN-based model, and the predictive abilities of the SVM and LR are very similar. All of these susceptibility models achieved an acceptable performance, and the CNN-based model outperformed the other three ML models in the study areas of Zhangzha Town and Lantau Island thanks to its excellent feature extraction ability.

Landslide Susceptibility Mapping
This study used the CNN, RF, LR, and SVM models to generate the landslide susceptibility maps for Zhangzha Town and Lantau Island (Figures 10 and 11). All pixels in the study areas were fed into the trained models to calculate the landslide susceptibility index (LSI). The LSI was then divided into five susceptibility levels, namely very low (VLS), low (LS), moderate (MS), high (HS), and very high (VHS), using the natural break approach in ArcMap 10.6. The landslide susceptibility zoning, which indicates the ratio of each susceptibility level to the whole study area, was used to analyze the landslide susceptibility maps qualitatively.
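The classification step can be sketched as follows, assuming the class breaks have already been obtained (e.g., from the natural break tool); the breaks and the random LSI raster below are placeholders.

```python
import numpy as np

# Hypothetical LSI raster; the study derived its class breaks with the
# natural break (Jenks) tool in ArcMap, so the breaks below are placeholders.
lsi = np.random.default_rng(0).random((100, 100))
breaks = [0.2, 0.4, 0.6, 0.8]  # upper bounds of the VLS, LS, MS, and HS classes

# 0 = very low (VLS) ... 4 = very high (VHS) susceptibility level
levels = np.digitize(lsi, breaks)

# Ratio of each susceptibility level to the whole study area.
zone_share = np.bincount(levels.ravel(), minlength=5) / lsi.size
```

`zone_share` then gives the landslide susceptibility zoning used for the qualitative comparison of the maps.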
In the landslide susceptibility map of Zhangzha Town predicted by the CNN-based model, the VHS area is relatively concentrated, covering 13% of the study area and mainly distributed in the middle and northeast, and most landslide points fall within it. A similar spatial distribution of the VHS area was observed for the conventional ML models, although their VHS and VLS proportions were lower than those of the CNN-based model, with more than 50% of the area assigned to the three susceptibility levels from low to high, as shown in Figures 10 and 12a. Figure 11 shows the landslide susceptibility maps of Lantau Island. It can be observed that the western region of Lantau Island, with its steep terrain, is rated MS to VHS, while the eastern region is mostly VLS to LS. Additionally, as in the results for Zhangzha Town, a similar tendency regarding the total area of VHS and VLS can be noticed, i.e., that of the CNN-based model (79%) is significantly higher than those of RF (55%), SVM (50%), and LR (48%). The landslide susceptibility zoning of Lantau Island is shown in Figure 12b.

Discussion
Landslide susceptibility mapping (LSM) is a useful tool for predicting the spatial distribution of landslide occurrence. Over the past decades, LSM has gradually evolved from early qualitative analysis to data-driven quantitative methods. The development of machine learning methods, in conjunction with the increase in Earth observation data, allows us to effectively mine the hidden pattern between landslides and their predisposing factors, which is the basis for susceptibility assessment. Recently, the convolutional neural network (CNN), an effective feature extractor, has gradually been applied to the LSM problem. Compared with the conventional machine learning methods, the CNN-based LSM model differs in many aspects, especially in the organization of the training data and the spatial expression of the predicted results. Taking Zhangzha Town and Lantau Island as the study areas, the present work compares in detail the differences between the CNN-based model and conventional machine learning models in terms of dataset preparation and model effectiveness. To the best of our knowledge, this work is the first to comprehensively compare these two types of landslide susceptibility models, which adds value to the literature on LSM and on other natural hazards, such as floods and soil erosion. In this subsection, we further discuss the applicability of the CNN-based model from three perspectives: (1) modeling, (2) result analysis, and (3) limitations.

Model Parameter Analysis
Exploring the effects of hyperparameters on the accuracy of landslide susceptibility modeling helps us better understand the optimization process and extend its application in future research. In this subsection, four important hyperparameters of the CNN model are selected for discussion: the number of neurons in the hidden layer, the batch size, the activation function, and the optimizer (i.e., the optimization algorithm). Note that when a specific hyperparameter is analyzed, the others are set to their optimal values as obtained by cross-validation, which ensures that its independent effect is captured. Figure 13 shows the impact of the number of hidden layer neurons on the AUC value of the validation dataset. It can be observed that the CNN model of Zhangzha Town reached the highest AUC value with 700 neurons and the worst with 200. In terms of batch size, a series of batch sizes from 16 to 256, in power-of-two steps, was compared. The results are presented in Figure 13b, indicating that the optimal batch size is 30 and that the lowest AUC is associated with a batch size of 64. As can be observed from Figure 13c,d, the best activation function and optimizer for modeling are ReLU and AdaGrad, respectively.
Moreover, the mean deviation (M.D.) of the AUC values for each hyperparameter was calculated to measure how sensitive the modeling result is to that hyperparameter: the higher the M.D., the more sensitive the prediction accuracy. The activation function has the highest M.D. (3.77%), followed by the optimizer (1.24%), the batch size (1.19%), and the number of neurons in the hidden layer (0.57%). Similarly, for the CNN model of Lantau Island, the optimizer has the highest M.D. In light of these results, particular attention should be paid to tuning the activation function and the optimizer when constructing CNN-based landslide susceptibility models for Zhangzha Town and Lantau Island.
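The M.D. statistic itself is straightforward to reproduce; the AUC values below are hypothetical and only illustrate the calculation.

```python
import numpy as np

# Hypothetical AUC values obtained while varying a single hyperparameter
# (all other hyperparameters held at their cross-validated optima).
auc_values = np.array([0.9123, 0.8850, 0.8391, 0.8740])

# Mean absolute deviation from the average AUC, expressed in percent.
md = np.mean(np.abs(auc_values - auc_values.mean())) * 100
```

Repeating this for each hyperparameter and ranking the resulting M.D. values reproduces the sensitivity ordering reported above.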

Computational Efficiency
In recent years, machine learning methods have received extensive attention in LSM. In this paper, CNN was applied to generate the LSM, and the comparative results show that the CNN model achieves a more satisfactory performance than conventional machine learning methods. Model performance is a vital issue and the central goal of LSM; nevertheless, computational efficiency also needs to be considered. In the experiments, we found that, even with a GPU to accelerate the computation, the CNN-based model consumes much more time than the conventional machine learning models in both the training and prediction phases. Accordingly, the performance of conventional machine learning methods is acceptable when computational efficiency is taken into account. On the other hand, the CNN-based approach used in this study extends LSM from point-based to image-based processing, offering a greater operating space for model construction. Compared with conventional machine learning methods, the deep learning approach can more flexibly change the network and dataset structure to suit specific conditions. For instance, Fang et al. [65] integrated CNN with three conventional machine learning classifiers to assess landslide susceptibility; the features extracted from the convolutional layers were input into the conventional classifiers, obtaining satisfactory results. Yang et al. [30] proposed a hybrid CNN-based landslide susceptibility model that synchronously captures the spatial information and the correlated features among the environmental variables. Therefore, the decision to use the CNN method or conventional machine learning methods should be made according to the specific conditions, such as the available equipment, the required model performance, and the urgency of the disaster response.

Reliability Analysis of the Modeling Results
In this work, the performance of the CNN-based model trained on the two study areas is higher than that of the corresponding three benchmark models. However, the ultimate goal of susceptibility mapping is to provide scientific and practical advice on disaster reduction for civil protection departments, rather than simply to improve the modeling accuracy. Time cost is a significant issue in disaster prevention and management, which also applies to LSM [66]. Note that this time cost refers to the time it takes to determine the target area based on the predicted results, not the modeling time. An ideal landslide susceptibility model is capable of accurately predicting the extreme values of the very low and very high susceptibility areas [67]. Figure 14a,b shows the distribution of all sample points across the probability intervals. It can be observed that most of the landslide points and random points (i.e., non-landslide points) are distributed at the two ends of the probability interval, and the CNN-based model shows higher sensitivity to the very high and very low susceptibility levels (i.e., a probability of landslide occurrence of ~1 or ~0). Specifically, the CNN-based model trained on Zhangzha Town has the highest percentage of landslides (95%) in the top 20% of the landslide probability range, followed by RF (73%) and SVM (57%). On the contrary, most non-landslide points fall in the intervals with a lower occurrence probability. Similar distribution curves for Lantau Island can be observed in Figure 14c,d. These results demonstrate that the CNN-based model can better identify the very high and very low susceptibility areas, allowing decision-makers to focus on prevention targets from a disaster management perspective. Additionally, sensitivity and specificity are two indicators that quantify how well the models correctly predict landslide and non-landslide areas.
In this work, these two indicators of the CNN-based model are in the satisfactory range and are more balanced than those of the conventional ML models. All things considered, we can conclude that the prediction results of the CNN-based model are reliable and more in line with the requirements of landslide risk management.
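The probability-interval analysis behind Figure 14 can be reproduced with a simple histogram; the predicted probabilities below are synthetic (drawn from Beta distributions) and stand in for real model outputs.

```python
import numpy as np

# Synthetic stand-ins for predicted landslide probabilities at sample points.
rng = np.random.default_rng(1)
p_landslide = rng.beta(5, 1, 500)      # landslide points, skewed toward 1
p_non_landslide = rng.beta(1, 5, 500)  # non-landslide points, skewed toward 0

bins = np.linspace(0, 1, 11)           # ten equal probability intervals
ls_hist, _ = np.histogram(p_landslide, bins=bins)
nls_hist, _ = np.histogram(p_non_landslide, bins=bins)

# Share of landslide points falling in the top 20% of predicted probability.
top20_share = (p_landslide >= 0.8).mean()
```

Plotting `ls_hist` and `nls_hist` against the bin centers yields distribution curves of the kind shown in Figure 14, with mass concentrated at both ends of the probability interval.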

Analysis of the Salt-and-Pepper Effect in LSM
The salt-and-pepper effect often appears in image segmentation and classification; it arises because single pixels with differing values fail to form homogeneous areas [68]. Nevertheless, no previous study has analyzed the presence of the salt-and-pepper effect in landslide susceptibility maps. In image segmentation, the salt-and-pepper effect can make the results difficult to distinguish; for LSM, it may destroy the coherence of the susceptibility assessment across the study area and hinder subsequent hazard management. Therefore, it is necessary to eliminate the impact of the salt-and-pepper effect in LSM. Figure 15 shows the landslide susceptibility maps produced by the different models for the same region, demonstrating that the CNN-based model significantly reduces the salt-and-pepper effect compared with the conventional machine learning-based models. A similar result can be observed in Figure A3, Appendix A. The reason for this lies in the way the datasets are established. As stated in Section 3.2, the training data of the conventional machine learning models consist of single points and the corresponding predisposing factors. The CNN training data, however, are expanded with the help of the pixel window, in which more pixels are given class labels and more knowledge is learned by the model, thus producing more homogeneous areas. Therefore, the CNN-based model is more suitable for LSM when the salt-and-pepper effect is taken into account.
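The pixel-window expansion can be sketched as follows; the window size and the edge-padding strategy here are assumptions for illustration, while the actual implementation is available in the repository linked in the Data Availability Statement.

```python
import numpy as np

def extract_window(stack, row, col, size=16):
    """Cut a size x size patch, centred on (row, col), from a
    (n_factors, H, W) factor stack. The window size is an assumed value."""
    half = size // 2
    # Pad with edge values so windows near the raster border stay full-sized.
    padded = np.pad(stack, ((0, 0), (half, half), (half, half)), mode="edge")
    return padded[:, row:row + size, col:col + size]

# Hypothetical stack of 12 predisposing-factor rasters.
stack = np.random.default_rng(0).random((12, 200, 300))
patch = extract_window(stack, row=0, col=0)  # border pixel, still full-sized
```

Each labeled sample point thus contributes a full neighborhood of pixels rather than a single factor vector, which is what allows the CNN to learn more homogeneous areas.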

Limitations and Future Research
The results from the CNN-based model trained on the earthquake-triggered and rainfall-induced landslide datasets show that it offers high prediction ability on the testing datasets and achieved the desired accuracy. However, the limited scale of the case study areas should be acknowledged as a limitation of this paper. In landslide risk management, susceptibility assessment is usually conducted at the county level or above, whereas only a local area of Jiuzhaigou County and an outlying island in Hong Kong were selected as the study areas here, and the extent of these two areas is somewhat small. As described in Section 5.2, the CNN-based model takes longer to train and predict than conventional machine learning methods; thus, its application to a larger target area may require a trade-off between time efficiency and performance. Therefore, further research on larger-scale areas is required to explore the robustness and reliability of the CNN-based landslide susceptibility model. Another limitation of this study is that the DTM of Lantau Island, with a 5 × 5 m grid, was resampled to 30 × 30 m to keep the resolution of all factors consistent, which may have resulted in the loss of some spatial information.
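A minimal sketch of such a resampling step, using SciPy's bilinear interpolation on a synthetic tile; this only approximates the operation, which in practice would be performed with GIS resampling tools.

```python
import numpy as np
from scipy.ndimage import zoom

# Hypothetical 5 m DTM tile (600 x 600 cells = 3 km x 3 km).
dtm_5m = np.random.default_rng(0).random((600, 600))

# 5 m -> 30 m: a scale factor of 1/6 along each axis; order=1 is bilinear.
dtm_30m = zoom(dtm_5m, 5 / 30, order=1)
```

Each output cell now summarizes a 6 × 6 block of the original grid, which is precisely where fine-scale terrain detail can be lost.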
In addition, the framework of landslide susceptibility assessment based on the CNN model can, in theory, also be applied to landslide polygon recognition. The exact boundary between these two landslide risk reduction techniques is worth studying.

Conclusions
This work compared and analyzed the performance of the CNN-based model and conventional ML models for landslide susceptibility mapping (LSM) in two typical areas, Zhangzha Town and Lantau Island, which suffer from catastrophic earthquakes and heavy rainfall, respectively. The results demonstrate that the CNN-based model achieved a superior performance over the conventional RF, LR, and SVM models. Moreover, we presented a detailed guide for generating the training datasets for deep learning model-based susceptibility modeling. The following conclusions can be drawn:

1. Among the four landslide susceptibility models (i.e., CNN, RF, LR, and SVM), the CNN-based model exhibits the best predictive capability for LSM on the testing datasets.

2. Different from the datasets of the conventional ML methods, the 3D dataset allows more spatial information to be considered and learned by the CNN-based model. The LSM generated by the CNN-based model is not only sensitive to high-risk landslide zones but also significantly reduces the salt-and-pepper effect, which guarantees the consistency of the susceptibility assessment.

3. Although the CNN-based model achieved significant results, it consumed more time than the conventional ML models in both the training and prediction phases. When assessing landslide susceptibility for large areas, time efficiency is an issue that must be considered. Therefore, the choice of the LSM model should be a trade-off between time efficiency and performance.

4. The results of the LSM would assist in disaster management and policy making in the Jiuzhaigou region. This study also adds value to the literature on landslide susceptibility mapping through a comparative study of CNN-based and conventional ML models.
Additionally, we used only typical models and network architectures and did not incorporate engineering geology analysis methods into the model. In future research, a sophisticated, task-specific CNN architecture should be designed for LSM, which may lead to better performance.
Author Contributions: Conceptualization, R.L. and X.Y.; methodology, R.L. and X.Y.; software, X.Y.; validation, X.Z. and X.Y.; formal analysis, X.Y.; resources, C.X. and L.W.; data curation, C.X. and L.W.; writing-original draft preparation, R.L. and X.Y.; writing-review and editing, R.L., X.Y., C.X., L.W. and X.Z.; visualization, X.Y.; project administration, R.L.; funding acquisition, R.L. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: Landslide inventory data may be available from the corresponding author or the third author upon request. The core code for reading the predisposing factors and label data and generating the pixel window can be found in the GitHub repository: https://github.com/kkshiney/RS_ML_DeepLearning_LSM (accessed on 22 November 2021).

Acknowledgments:
The authors would like to thank the handling editors and the four anonymous reviewers for their valuable comments on the earlier version of the manuscript, as well as the lab's staff for their valuable comments and contributions.

Conflicts of Interest:
The authors declare no conflict of interest.