Accurate Prediction of Earthquake-Induced Landslides Based on Deep Learning Considering Landslide Source Area

Abstract: An earthquake-induced landslide (EQIL) is a rapidly changing process occurring at the Earth's surface that is strongly controlled by the triggering earthquake and predisposing conditions. Predicting locations prone to EQILs on a large scale is significant for managing rescue operations and disaster mitigation. We propose a deep learning framework that considers the source area feature of EQILs to model this complex relationship and enhance spatial prediction accuracy. Initially, we used high-resolution remote sensing images and a digital elevation model (DEM) to extract the source area of each EQIL. Then, 14 controlling factors were input to a stacked autoencoder (SAE) to search for robust features by sparse optimization, and the classifier took advantage of the high-level abstract features to identify EQILs spatially. Finally, the EQIL inventory collected from the Wenchuan earthquake was used to validate the proposed model. The results show that the proposed method significantly outperformed conventional methods, achieving an overall accuracy (OA) of 91.88%, while logistic regression (LR), support vector machine (SVM), and random forest (RF) achieved 80.75%, 82.22%, and 84.16%, respectively. Meanwhile, this study reveals that shallow machine learning models only take advantage of significant factors for EQIL prediction, whereas deep learning models can extract more effective information related to EQIL distribution from low-value-density data, which is why their prediction accuracy grows with an increasing number of input factors. There is hope that new knowledge of EQILs can be represented by the high-level abstract features extracted by the hidden layers of the deep learning model, which are typically acquired by statistical


Introduction
Globally, many earthquakes occur annually due to the release of accumulated stress in the Earth's crust. Catastrophic earthquakes (magnitude ≥ 6) in mountainous regions can trigger numerous landslides on steep mountain slopes [1,2]. Such earthquake-induced landslides (EQILs) can cause tragic economic losses and human casualties by damaging buildings, critical infrastructure, and the environment. Furthermore, some landslides block rivers to form dammed lakes that pose a potential risk of outburst floods and debris flows, threatening people and property downstream [3,4]. Therefore, it is valuable to describe and predict the spatial distribution of earthquake-induced landslides for the sake of disaster prevention and mitigation [5][6][7].
The occurrence of EQILs is primarily controlled by causative factors, including seismic parameters, topography, lithology, tectonics, and human activities, making the spatial patterns of EQILs complicated [8][9][10]. Thus, predicting where landslides are likely to occur after an earthquake is a challenging task [6,11]. In the past two decades, many predictive models have been developed to identify landslide-prone areas after earthquakes; these fall into two general categories: (1) physically and numerically based models [12][13][14][15]; (2) susceptibility assessment models [16,17].
The physically based models (PBMs) were deduced from the mechanics of slope-failure initiation and runout. Among these models, pseudostatic analysis was proposed first, assuming that the earthquake force is a permanent body force added to a static limit-equilibrium analysis [18,19]. Although pseudostatic analysis is conceptually simple, the selection of a pseudostatic coefficient lacks criteria, and the crude assumption often leads to conservative results [20]. Stress-deformation analysis, an extension of finite-element modeling, was later developed and provides a valuable tool for modeling the static and dynamic deformation of slopes. This method is based on mathematical computation, with the capability to solve physical problems with complex geometries, material properties, and boundary conditions, and it is suitable for critical projects researching the stability of artificial slopes [21]. Soon after, permanent-displacement analysis, whose complexity lies between that of the two methods above, was proposed to estimate seismically driven displacements of landslides, treating the landslide as a rigid-plastic body that slides on an inclined plane. The representative models of permanent-displacement analysis are the Newmark model and its variations [22]. Over the past few years, the accuracy of physically based models has been enhanced significantly with improvements in the analysis of EQIL mechanisms. However, PBMs require a large number of parameter values, which limits their application over large geographical areas [20].
Later, researchers gradually began exploring susceptibility assessment models (SAMs) that could represent a potential relationship between EQILs and causative variables and identify EQIL-prone areas [23]. In recent years, landslide susceptibility models have become increasingly popular due to the fast development of computer technology, geographic information systems (GIS), and data mining [24]. Those models can be divided into two categories: knowledge-driven models based on expert knowledge, and data-driven models based on historical landslide inventories and related spatial causative data of landslides [25]. The knowledge-driven models utilize expert knowledge to quantitatively express the relationship between landslide occurrence and causative factors, with the analytic hierarchy process (AHP) as the most representative approach [26]. For the data-driven models, many statistical methods and machine learning approaches have been introduced to model the probability of landslide occurrence; these mainly include multivariate logistic regression (MLR), artificial neural networks (ANN) [27], support vector machine (SVM) [28], random forests (RF), and decision tree approaches [29]. Researchers have indicated that data-driven models outperform knowledge-driven models in susceptibility mapping. Data-driven models pave the way for detecting the potential distribution pattern of EQILs, which is difficult to achieve by visual inspection alone [11].
However, most of the data-driven models mentioned above are traditional machine learning algorithms that can only represent a single layer of linear or nonlinear relations between causative factors and landslide occurrence [30]. Therefore, those models can easily be overfitted or become stuck in a local optimum when faced with complex data [31,32]. On the other hand, these landslide prediction models rarely consider the difference between landslide source and accumulation areas due to a limitation of EQIL inventories, which often record an EQIL only as points or polygons. In the last decade, many deep learning algorithms have achieved remarkable results in computer vision, speech recognition, and the intelligent control of robots, and their capacity for exploiting multiple potential relationships in massive data has gradually been noticed by geohazard specialists [33][34][35][36][37][38]. Consequently, some types of deep learning framework have been successfully applied to landslide detection and prediction, including deep neural networks (DNN), convolutional neural networks (CNNs), stacked autoencoders (SAE), and their derivative models [39,40].
In this paper, a deep learning framework that considers landslide source area is proposed for the spatial prediction of EQIL. Then, the EQIL inventory of the Wenchuan earthquake is used as a sample dataset to train and test the proposed approach. A comprehensive comparison with traditional machine learning models, such as MLR, SVM, and decision tree, is carried out.

Materials and Methods
As mentioned above, a new deep learning-based framework is applied for spatially predicting EQILs. In this framework, the stacked autoencoder is responsible for extracting the deep distribution features of EQIL that could represent the nonlinear and interacting relationship between causative factors and EQIL occurrence. Then, those extracted features are input into a classifier to identify an EQIL-prone area.

Background
An autoencoder (AE) is a kind of neural network construction that consists of an input layer x ∈ R^n, a hidden layer t ∈ R^p, and an output layer y ∈ R^m (Figure 1). In the input layer, the value of every unit denotes raw data that are a quantitative description of the object. After encoding and decoding, the raw data are transformed into a feature space, which is more suitable for pattern recognition or classification [41]. In detail, the AE first maps the input layer x into the latent vector t by the encoding process. Mathematically, this step is formulated as

t = f(W_x x + b_x)

where W_x denotes the weights of the input-hidden layers, b_x denotes the bias term, and f(·) is the Sigmoid activation function, f(x) = 1/(1 + exp(−x)). Then, the value of the output layer is calculated using

y = f(W_y t + b_y)

where W_y denotes the weights of the hidden-output layers and b_y denotes the bias term of the hidden and output units.
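As an illustrative sketch, the encode/decode pass above can be written in a few lines of NumPy; the layer sizes used here (14 inputs for the controlling factors, 8 hidden units) and the random weights are arbitrary choices for the example, not the paper's configuration:

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def ae_forward(x, W_x, b_x, W_y, b_y):
    """One encode/decode pass: t is the latent feature, y the reconstruction."""
    t = sigmoid(W_x @ x + b_x)   # encoding: input -> hidden
    y = sigmoid(W_y @ t + b_y)   # decoding: hidden -> output
    return t, y

# Illustrative shapes: 14 controlling factors, 8 hidden units
rng = np.random.default_rng(0)
x = rng.random(14)
W_x, b_x = rng.normal(size=(8, 14)), np.zeros(8)
W_y, b_y = rng.normal(size=(14, 8)), np.zeros(14)
t, y = ae_forward(x, W_x, b_x, W_y, b_y)
```

With untrained weights, y is of course a poor reconstruction of x; the training procedure below fixes that.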


There are some errors between the input layer and the output layer of an AE constructed with randomly initialized parameters. These errors limit the output layer's capacity to represent the input layer, causing information loss. Therefore, an unsupervised learning method is used to train the parameters θ = {W_x, W_y, b_x, b_y} by minimizing a loss function. For real-valued inputs, the mean squared error (MSE) is deemed a suitable loss function, written as

L(θ) = (1/m) Σ_{i=1}^{m} ||y_i − x_i||² + λΩ(W)

where m is the number of training samples, Ω(W) denotes a regularization term that helps avoid overfitting, such as L2 normalization, and λ is a decay parameter that balances the relative importance of the regularization term and the MSE.
With the loss function, the group of parameters θ is updated by an optimization algorithm, e.g., stochastic gradient descent (SGD), as follows:

θ ← θ − η ∂L(θ)/∂θ

where η is the learning rate, which can also be dynamically adjusted as the number of iterations increases. When the loss function reaches a relative minimum, the weight matrices and bias vectors are set as the AE's optimal parameters for the specified dataset.
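A minimal sketch of this training procedure for a single AE, using full-batch gradient descent in place of per-sample SGD for brevity (the function name train_ae and all hyperparameter values are illustrative, not the paper's settings):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ae(X, n_hidden, lr=0.3, lam=1e-4, epochs=500, seed=0):
    """Fit one AE by minimizing MSE + L2 penalty with full-batch
    gradient descent; X has shape (m samples, n features)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W_x = rng.normal(0.0, 0.1, (n_hidden, n)); b_x = np.zeros(n_hidden)
    W_y = rng.normal(0.0, 0.1, (n, n_hidden)); b_y = np.zeros(n)
    for _ in range(epochs):
        T = sigmoid(X @ W_x.T + b_x)            # hidden activations (m, p)
        Y = sigmoid(T @ W_y.T + b_y)            # reconstructions (m, n)
        dY = (Y - X) * Y * (1 - Y)              # delta through output sigmoid
        dT = (dY @ W_y) * T * (1 - T)           # delta through hidden sigmoid
        W_y -= lr * ((dY.T @ T) / m + lam * W_y)
        b_y -= lr * dY.mean(axis=0)
        W_x -= lr * ((dT.T @ X) / m + lam * W_x)
        b_x -= lr * dT.mean(axis=0)
    Y = sigmoid(sigmoid(X @ W_x.T + b_x) @ W_y.T + b_y)
    return (W_x, b_x, W_y, b_y), float(np.mean((Y - X) ** 2))
```

Calling train_ae with epochs=0 returns the reconstruction loss of the random initialization, which training should then reduce.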
A trained AE has the ability to reconstruct the input data by using its hidden layer. The hidden layer is regarded as an abstract feature representing one kind of property of the input data. However, a single AE is limited in feature representation, especially when it comes to highly complex data. The SAE, constructed by stacking AEs layer by layer, was proposed to generate deep abstract features and is expected to enhance data classification performance. In an SAE, the hidden units of each AE are set as the input units of the next AE. Thus, the output of the SAE can be expressed as follows:

t^(1) = f(W_1 x + b_1),  t^(k) = f(W_k t^(k−1) + b_k),  k = 2, 3, …

where W_1 and b_1 denote the weight matrix and bias vector of the first AE, respectively, and W_k and b_k those of the kth AE.
To give the SAE the ability to identify data, a classification layer, such as Softmax [42], naive Bayes, or support vector machine (SVM), is placed on top of the SAE. Consequently, two training processes are conducted to update the parameters. In the first step, the AEs are trained one by one with the unsupervised method to find the optimal parameters for data reconstruction, which is called "greedy layer-wise pretraining". Then, fine-tuning is adopted to train the whole network. In this step, the loss function is deduced from the difference between the true label and the predicted one. As noted above, all the parameters in the framework contribute to this loss function. Therefore, the error backpropagation algorithm and stochastic gradient descent are combined to optimize the loss function.
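The greedy layer-wise pretraining described above can be sketched as follows; train_single_ae stands in for any routine that fits one AE and returns its encoder parameters, and the random-projection dummy_trainer is only a placeholder to show the data flow, not a real trainer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_stack(X, layer_sizes, train_single_ae):
    """Greedy layer-wise pretraining: AE k is trained on the hidden
    activations of AE k-1; only the encoder halves are kept for the SAE."""
    encoders, H = [], X
    for p in layer_sizes:
        W, b = train_single_ae(H, p)          # fit one AE on current features
        encoders.append((W, b))
        H = sigmoid(H @ W.T + b)              # features fed to the next AE
    return encoders, H

# Stand-in trainer (random projection) just to exercise the data flow;
# in practice this would minimize the reconstruction loss of one AE.
def dummy_trainer(H, p, seed=0):
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 0.1, (p, H.shape[1])), np.zeros(p)

X = np.random.default_rng(2).random((10, 14))
encoders, features = pretrain_stack(X, [110, 110, 110], dummy_trainer)
```

The returned features would then feed the classification layer, and fine-tuning would update all encoders jointly via backpropagation.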

Prediction Framework
Through EQIL inventory statistics, many research works have proved that several related factors control the occurrence of EQIL. Based on these conclusions, traditional prediction models have been applied to represent the relationship between related factors and EQIL. However, these models pay more attention to each factor's individual contribution, rather than the combined effect of factors, which limits their performance in EQIL prediction. Therefore, we propose a deep learning framework considering landslide source area to model the complex relationship and enhance spatial prediction accuracy.
As shown in Figure 2, the raw features, including triggering and predisposing factors, are input to the model after normalization processing to avoid the convergence failure problem. The first feature layer is then deduced by a single-layer AE, which represents the raw dataset with minimum information loss. Several pretrained AEs are stacked to generate high-level abstract features layer by layer, mapping the relationship between related factors and EQIL occurrence; this is the so-called "greedy layer-wise training".
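The normalization step can be sketched as a simple column-wise min-max scaling; the helper name minmax_normalize is hypothetical, since the paper does not specify the exact normalization scheme:

```python
import numpy as np

def minmax_normalize(F):
    """Scale each factor column to [0, 1] so no factor dominates training."""
    mn, mx = F.min(axis=0), F.max(axis=0)
    span = np.where(mx > mn, mx - mn, 1.0)   # guard against constant columns
    return (F - mn) / span

# Illustrative factor matrix: columns could be elevation (m) and slope (deg)
F = np.array([[432.0, 20.0], [5993.0, 50.0], [3000.0, 35.0]])
Fn = minmax_normalize(F)
```

After scaling, every factor lies in [0, 1], which keeps the sigmoid units away from early saturation.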
At the top of the SAE, a classifier is added to decide the probability as to whether an input pixel belongs to an EQIL. Initially, the classifier without supervised training is unable to specify the label of pixels; thus, a backpropagation algorithm is employed to fine-tune the whole framework.
Many studies have indicated that different groups of factors influence the occurrence of EQIL; i.e., identifying EQIL could be considered a multivariate inverse inference problem, where the hidden units are composite indicators of the prediction model. Generally, the number of hidden units is larger than the number of composite indicators for the sake of complete feature extraction. However, more hidden units in the model increase the risk of feature redundancy and decrease its robustness. To limit the overabundance of information extraction, sparse optimization is applied to ensure that only the proper units are encouraged to activate (a unit is considered "active" when its output is close to 1). The loss function of the AE is rewritten as follows:

L_sparse(θ) = L(θ) + β Σ_{j=1}^{p} KL(ρ || ρ̂_j)

where KL(ρ || ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)) is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean ρ and a Bernoulli random variable with mean ρ̂_j, and β is the weight of the sparsity penalty term.
Here, ρ̂_j = (1/m) Σ_{i=1}^{m} a_j(x_i) is the average activation of hidden unit j over the training set, m is the number of samples, and a_j(x_i) denotes the activation of hidden unit j for a specified input x_i.
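The sparsity penalty can be sketched as follows, with rho as the target activation and rho_hat estimated from a batch of hidden activations; all names and numeric values here are illustrative:

```python
import numpy as np

def kl_sparsity(rho, rho_hat):
    """Sum over hidden units of KL(Bernoulli(rho) || Bernoulli(rho_hat_j))."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)   # numerical safety
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

# rho_hat_j: average activation of hidden unit j over m training samples
A = np.random.default_rng(3).random((100, 5))    # stand-in activations a_j(x_i)
rho_hat = A.mean(axis=0)
rho, beta = 0.05, 3.0                            # target sparsity, penalty weight
penalty = beta * kl_sparsity(rho, rho_hat)
```

The penalty is zero exactly when every unit's average activation matches the target ρ, and it grows as units become more active than intended.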

Model Evaluation
Predicting EQIL and non-EQIL is a typical binary classification problem. Therefore, we apply a confusion matrix to evaluate the performance of the model. As illustrated in Table 1, true positive (TP) denotes the number of correctly predicted EQIL samples, and false positive (FP) denotes the number of incorrect predictions that should be non-EQIL. Conversely, false negative (FN) denotes the number of incorrect predictions that should be EQIL, and true negative (TN) denotes the number of correctly predicted non-EQIL samples. Based on the confusion matrix, overall accuracy (OA), Precision, and Recall are derived as criteria for performance evaluation, written as follows:

OA = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

The OA is the probability that a sample is correctly classified, calculated by summing the number of correct predictions and dividing by the total number of samples. Precision is the probability that a predicted EQIL is indeed an EQIL, deduced by dividing the number of correctly predicted EQIL samples by the total number of predicted EQILs. A high Recall score indicates that EQILs are recognized well; it is defined as the ratio of correctly predicted EQILs to the total number of actual EQIL samples.
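These three criteria can be computed directly from the binary labels; a small self-contained sketch (1 = EQIL, 0 = non-EQIL):

```python
def confusion_metrics(y_true, y_pred):
    """Return (OA, Precision, Recall) from binary label sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    oa = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return oa, precision, recall
```

For example, with three actual EQIL pixels of which two are found, plus one false alarm among two non-EQIL pixels, OA is 0.6 and both Precision and Recall are 2/3.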

EQIL Inventory
EQIL inventories are essential datasets for researchers exploring the relationship between earthquakes and landslides. Previously, field investigation was commonly used to obtain an EQIL inventory, but it is time-consuming and incomplete in remote and inaccessible places. The popularity of remote sensing technology has made EQIL inventories more reliable, which significantly extends our knowledge of EQILs through statistical methods [43]. Thus far, more than 300 inventories have been constructed after catastrophic earthquakes that triggered large numbers of landslides. Among these, more than 60 EQIL inventories have been digitized and spread widely [44]. However, these digital EQIL inventories were collected from remote sensing images of different resolutions, leading to large variability in detail and data format and a lack of uniformity and standardization. In this paper, we chose the detailed Wenchuan EQIL inventory to test the model [45].

Study Area
The study area includes most regions affected by the Wenchuan earthquake and is located between longitudes 102.614-105.482°E and latitudes 30.507-32.701°N (Figure 3). It is dominated by alpine valleys, with elevation ranging from 432 m to 5993 m, and the slope gradient mainly ranges between 20° and 50°. Meanwhile, the mean annual precipitation is about 800-1400 mm, most of which falls during the period from June to September [46]. Abundant rainfall has promoted high-density drainage networks, with intense surface erosion.

This area has tectonically active mountain belts, namely the Longmenshan (LMS) thrust belt, which is the boundary zone between the Bayankala block of the eastern Tibetan Plateau and the Sichuan Basin of the South China block. Multistage tectonic events have led to the extremely complex lithology of the LMS. Due to tectonic stresses, the rock masses are highly fractured and weathered, and consequently prone to slope instability.
Based on GPS observations, it has been shown that the extrusion of crustal material from the Tibetan Plateau against the rigid blocks of the Sichuan Basin induces deformation of about 1 mm/yr and accumulates stress in the Longmenshan region. Therefore, many stress-release earthquakes occur on faults, especially the most active ones, including the Yingxiu-Beichuan Fault, the Maoxian-Wenchuan Fault, the Guanxian-Anxian Fault, and the Pingwu-Qingchuan Fault (Figure 3). On 12 May 2008, the devastating Wenchuan earthquake, which measured Mw 8.3 according to the China Earthquake Administration (CEA), occurred on the Yingxiu-Beichuan Fault. This earthquake caused more than 69,000 deaths and over 370,000 injuries, with 17,393 people still missing as of 25 April 2009.

Training and Testing Samples
The Wenchuan earthquake triggered numerous landslides that caused serious casualties and property damage, especially in high-relief mountain areas. To explore the distribution patterns and mechanisms of EQILs, many studies have focused on constructing EQIL inventories through remote sensing technology and field surveys. However, those inventories represent landslides in various manners, as polygons, centroid points, or top points, which could affect the results of statistical analysis.
Further, EQIL inventories for the same earthquake are sometimes inconsistent in landslide number because of different interpretation standards, remote sensing images, and experts. For those reasons, traditional inventories are not suitable for training and testing the proposed model.
In this paper, multisource remote sensing images acquired after the earthquake were used to map landslides, with reference to the existing inventory from [45], as shown in Figure 4. The landslides were mapped as polygons that separately document source areas and runout areas, based on image and topographic features (Figure 4b,c) [1]. Only the initiation areas were set as samples for model training and testing. It is worth noting that some small-scale rockfalls also leave scars on the images that confuse the interpretation map. Meanwhile, those small scars can hardly be divided into different parts, which might introduce errors into the prediction results. Thus, we concentrate on landslides with clear shapes and boundaries. In total, 23,029 EQILs were ultimately mapped in the study area. The proposed model is a pixelwise framework that takes as input the feature vector of an individual pixel. The landslide polygons were converted to raster data with a spatial resolution of 12.5 m, consistent with the resolution of the digital elevation model (DEM). Then, 819,389 pixels represent the whole set of landslides in the study area, and the same number of non-landslide pixels, regarded as adversarial samples, were randomly generated outside the landslide areas.
In total, a sample database with 1,638,778 pixels is used to support the proposed framework for learning the distribution pattern of EQIL. To obtain a robust prediction model, the samples are divided into training and testing sets at a ratio of 2:8. Detailed information is depicted in Table 2.
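A shuffled 2:8 split of the sample database can be sketched as below; split_samples is a hypothetical helper, since the paper does not state how the split was randomized:

```python
import random

def split_samples(samples, train_fraction=0.2, seed=42):
    """Shuffle, then split pixel samples 2:8 into training and testing sets."""
    s = list(samples)
    random.Random(seed).shuffle(s)   # deterministic shuffle for reproducibility
    k = int(len(s) * train_fraction)
    return s[:k], s[k:]

# Toy stand-in for pixel IDs; the real database holds 1,638,778 pixels
train, test = split_samples(range(1000))
```

Every sample lands in exactly one of the two sets, so training and testing pixels never overlap.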

Controlling Factors
In the past three decades, many studies have taken advantage of statistical methods to analyze the relationship between controlling factors and EQIL. In general, these factors are mainly categorized into six groups: (1) seismic property; (2) [...]. Considering the importance of factors, we select 14 factors as input variables for the model [1,5], as listed in Table 3. The raw data of the controlling factors differ in format, including raster, polyline, polygon, and point. To overlay all the data together, the raw data of the factors are quantized and converted to raster with a spatial resolution of 12.5 m, except for the topography factors.
For instance, epicenter directivity is denoted by the following index:

D = arctan((x_p − x_e)/(y_p − y_e)),       x_p > x_e and y_p > y_e
D = π + arctan((x_p − x_e)/(y_p − y_e)),   y_p < y_e
D = 2π + arctan((x_p − x_e)/(y_p − y_e)),  x_p < x_e and y_p > y_e

where (x_p, y_p) and (x_e, y_e) are the coordinates of a specific pixel and the epicenter, respectively. The direction of the surface rupture and fault controls the sliding direction of the landslide. To obtain quantitative input for the model, several steps are conducted to reprocess the raw distribution data of the surface rupture and fault. First, the fault is cut into subsections that can more accurately describe the changes in direction along the fault. Then, Thiessen polygons, which assign every location to its nearest subsection by creating boundary lines, are constructed to determine the affected area of every subsection. The direction of the subsection is ultimately applied to indicate the fault direction of the subregion.
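The piecewise index above can be sketched directly; note that the y_p = y_e branch is an added guard for a case the formula leaves undefined, not part of the source definition:

```python
import math

def epicenter_directivity(xp, yp, xe, ye):
    """Piecewise-arctan direction index of pixel (xp, yp) relative to the
    epicenter (xe, ye), in radians on [0, 2*pi)."""
    if yp == ye:
        # Added edge-case guard: due east -> pi/2, due west -> 3*pi/2
        return math.pi / 2 if xp > xe else 3 * math.pi / 2
    a = math.atan((xp - xe) / (yp - ye))
    if yp < ye:
        return math.pi + a          # pixel south of the epicenter
    return a if xp >= xe else 2 * math.pi + a   # north-east vs north-west
```

A pixel due north-east of the epicenter yields π/4, one due south yields π, matching a clockwise-from-north azimuth convention.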
The occurrence of aftershocks is a time series process and has an additive effect on slope failure. Therefore, the aftershocks factor is produced by the kernel density analysis method that generates the raster data to model the influence of aftershocks.
Lithology and soil are polygon data, whose relationship with landslides is defined by the frequency ratio (FR) method in this study. An FR value greater than 1 indicates a strong correlation between EQIL occurrence and the corresponding class. The FR is computed as

FR_i = (L_i / L_t) / (A_i / A)

where L_i is the area of landslides in the ith type, L_t is the total area of landslides, A_i is the area of the ith type, and A is the total area of the study area.
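The FR computation is a simple ratio of proportions; an illustrative sketch:

```python
def frequency_ratio(landslide_area_in_class, total_landslide_area,
                    class_area, total_area):
    """FR_i = (L_i / L_t) / (A_i / A); FR > 1 marks a class that hosts
    a disproportionate share of the landslide area."""
    return ((landslide_area_in_class / total_landslide_area)
            / (class_area / total_area))
```

For example, a lithology class covering 5% of the study area but containing 10% of the landslide area gets FR = 2, flagging a strong positive correlation.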
After the preprocessing mentioned previously, all the factors are acquired as shown in Figure 5.

Experiment and Results
Using sample data and control factors, the model was trained and tested to find the optimal model structure and parameters. Then, the proposed model was compared with traditional methods and its performance evaluated. Finally, the advantages of the model in EQIL prediction are discussed in relation to feature extraction and threshold segmentation.


Framework Setting
Many studies have proved that the structure of a deep learning model has a great influence on prediction results. For different datasets, the model must be analyzed to obtain optimal framework settings. In this study, four kinds of framework setting (including hidden units, hidden layers, learning rate, and number of iterations) were considered.
The number of hidden units determines the model parameters that decide the model's complexity and whether the model is prone to overfitting. Setting the optimal number of hidden units is challenging, as it is sensitive to many factors, such as the number of input and output features. To obtain a suitable number of hidden units for EQIL spatial prediction, candidate numbers in an ascending series with a step of two were tested one by one and assessed by OA. As shown in Figure 6, 110 hidden units proved suitable for representing the dataset and enhanced the performance of EQIL prediction in this study.
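The step-of-two search over hidden-unit counts can be sketched generically; evaluate_oa stands in for training and scoring one model, and the toy lambda below merely mimics an OA curve peaking at 110 units, as in Figure 6:

```python
def select_hidden_units(candidates, evaluate_oa):
    """Evaluate OA for each candidate hidden-unit count and keep the best."""
    scores = {p: evaluate_oa(p) for p in candidates}   # one model per candidate
    best = max(scores, key=scores.get)
    return best, scores

# Stand-in evaluator; a real evaluate_oa would train the SAE and score
# it on the testing set.
best, scores = select_hidden_units(range(100, 121, 2),
                                   lambda p: 1 - abs(p - 110) / 200)
```

The same loop applies unchanged to the other framework settings (hidden layers, learning rate, iterations), one hyperparameter at a time.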
The depth of the framework is denoted by the number of hidden layers, which map the input features to high-level abstract features that represent the multilayer relationship between EQIL occurrence and its controlling factors. The model then takes advantage of these mapping relationships to assess whether a slope is prone to landslides under an earthquake. For complex target recognition, more hidden layers can properly narrow the "semantic gap", defined as the difference between two kinds of representation systems of one object. However, the controlling factors are already a high-level description of EQIL from different aspects, compared with other raw data (such as a remote sensing image, which just records the features of objects in several channels). It is still necessary to extract more abstract features to learn the pattern of EQIL spatial distribution and then predict potential EQILs.
Therefore, a group of experiments that are in the same setting, except for the number of hidden layers, is conducted to establish how the final prediction accuracy is affected overall. As shown in Figure 6, the OA initially becomes higher with hidden layers increasing; it then decreases. In conclusion, three hidden layers are considered as an optimal setting in this study.
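To make the layer-stacking idea concrete, the sketch below runs the encoder half of an SAE forward over 14 controlling factors through three hidden layers of 110 units, matching the settings discussed here. The weights are random stand-ins, not the study's pretrained parameters, and the slope-unit inputs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StackedEncoder:
    """Forward pass of a stacked autoencoder's encoder half.
    Each layer maps its input to a higher-level abstract feature vector."""
    def __init__(self, layer_sizes):
        # random stand-ins for weights learned by layer-wise pretraining
        self.weights = [rng.standard_normal((m, n)) * 0.1
                        for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.biases = [np.zeros(n) for n in layer_sizes[1:]]

    def encode(self, x):
        for W, b in zip(self.weights, self.biases):
            x = sigmoid(x @ W + b)
        return x

# 14 controlling factors -> three hidden layers of 110 units
encoder = StackedEncoder([14, 110, 110, 110])
factors = rng.random((5, 14))        # 5 hypothetical slope units
abstract = encoder.encode(factors)
# a final classifier layer would map `abstract` to a landslide probability
```

In the full model, a softmax or logistic output layer on top of `abstract` produces the EQIL probability for each slope unit.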
The learning rate decides the convergence speed of the parameter optimization and whether the optimization reaches a global optimum. It is influenced by the size of the sample data and the gradient of the loss function. We considered several learning rates, including 0.0001, 0.001, 0.01, 0.1, and 0.8, to assess the performance, as shown in Table 4. It is clear that a larger learning rate is suitable for optimizing the constructed objective function in this study.
The number of iterations determines whether the parametric optimization process of the model converges. Too many iterations will cause the model to overfit and reduce its robustness, whereas too few iterations will result in underfitting and local optimality. Therefore, we searched for the optimal number of iterations through multiple sets of experiments; the results are shown in Figure 7. In the initial iterations, the loss function converges quickly and then slows down gradually. After 30,000 iterations, the accuracies of the model on the test set and the training set diverged, indicating that the model had begun to overfit.
Consequently, for this dataset and model setting, we adopted 30,000 iterations to optimize the model's parameters.
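The four settings above were tuned experimentally. A toy sketch of such a search is shown below, where `evaluate` is a hypothetical stand-in for training the SAE and measuring test-set OA (the study tuned each setting in turn rather than jointly, but the selection logic is the same):

```python
from itertools import product

def evaluate(units, layers, lr, iters):
    """Stand-in scoring function; in the real workflow this would train
    the SAE with the given setting and return overall accuracy (OA).
    This toy objective simply peaks at the settings found in this study."""
    score = 100.0
    score -= abs(units - 110) * 0.01
    score -= abs(layers - 3) * 1.0
    score -= abs(lr - 0.8) * 5.0
    score -= abs(iters - 30000) / 10000.0
    return score

grid = {
    "units":  [50, 110, 200],
    "layers": [1, 2, 3, 4],
    "lr":     [0.0001, 0.001, 0.01, 0.1, 0.8],
    "iters":  [10000, 20000, 30000, 50000],
}

# exhaustive search over all setting combinations, keeping the best OA
best = max(product(*grid.values()), key=lambda s: evaluate(*s))
```

With this toy objective the search recovers (110, 3, 0.8, 30000); with a real `evaluate`, each candidate would require a full training run, which is why settings are often tuned one at a time.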

Visualizing Result and Performance Assessment
After obtaining the optimal settings for the framework, the training and testing samples are used to optimize all parameters and then validate the deep learning framework considering the landslide source area. Three traditional machine learning algorithms, namely logistic regression (LR), support vector machine (SVM) [48], and random forest (RF) [49], are also compared with our method to analyze their capacity for data mining and EQIL spatial prediction. The LR model constructs the loss function with the L2-norm, and the regularization coefficient is 1. For SVM, the kernel is the radial basis function, the penalty coefficient is 1, and the kernel parameter is 0.001. For RF, the number of trees is 62 and the maximum depth is 13.
With the optimal settings, all the methods are trained and tested 20 times to eliminate the influence of the random initialization of parameters on the evaluation results. Table 5 lists the OA, Precision, and Recall of the proposed method and the other methods. Our method obtains the highest OA value of 91.88%, approximately 7% higher than that of RF in second place (84.16%), followed by SVM and LR, with OA values of 82.22% and 80.75%, respectively. In terms of EQIL prediction precision, the four models achieve 79.1%, 80.7%, 81.93%, and 87.56%, with our proposed method obtaining the highest value. Overall, the deep learning method has the best performance on the sample data compared with the other, shallow machine learning models.
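The reported OA, Precision, and Recall follow the standard confusion-matrix definitions; a minimal sketch with hypothetical counts (not the paper's actual confusion matrix):

```python
def classification_metrics(tp, fp, tn, fn):
    """Overall accuracy (OA), precision, and recall for binary EQIL prediction."""
    oa = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)  # of units predicted as landslide, the share that were
    recall = tp / (tp + fn)     # of true landslide units, the share that were found
    return oa, precision, recall

# hypothetical counts for illustration
oa, precision, recall = classification_metrics(tp=80, fp=10, tn=85, fn=25)
```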

Figure 8 shows the four kinds of predicted result for the whole study area when 0.5 is set as the discrimination threshold of probability. All the results reflect the phenomenon that EQILs are distributed along the surface rupture [50]. This proves that machine learning models, including shallow models, can learn and apply the basic distribution patterns to predict EQILs spatially. Furthermore, our proposed method has the advantage of elegantly representing the EQIL distribution, especially in the north tail of the surface rupture and the Wenchuan-Maoxian valley, which require more discriminative features to enhance the model's prediction ability.

As mentioned above, many shallow machine learning methods have been applied for EQIL spatial prediction, including LR, SVM, decision tree, RF, and AdaBoost. However, these methods depend heavily on the quality of the related factors and sample data, whereas the deep learning model is robust for pattern recognition and classification, especially for small sample sizes and highly coarse datasets. We used the trained models to produce receiver operating characteristic (ROC) curves to compare the EQIL prediction performance of the two kinds of machine learning method, as shown in Figure 9.
Our method demonstrated the best predictive power compared with the other methods in terms of the area under the curve (AUC); the other three methods obtained similar AUC values of 0.88, 0.88, and 0.89, respectively.
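An AUC value like those reported here can be computed from predicted probabilities without plotting the curve, using the rank-sum (Mann-Whitney U) formulation; a small sketch (assuming distinct scores, and hypothetical labels/scores):

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum formulation.
    labels: 0/1 ground truth; scores: predicted probabilities.
    Assumes distinct scores (ties would need average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = {i: r + 1 for r, i in enumerate(order)}
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rank_sum = sum(rank[i] for i in pos)
    n_pos, n_neg = len(pos), len(neg)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# perfectly separated hypothetical scores give AUC = 1.0
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.6, 0.9]
```

An AUC of 0.5 corresponds to random guessing; the closer to 1.0, the better the model ranks landslide units above non-landslide units.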

High-Level Feature Representation
An EQIL is an extremely complex earth surface process that is controlled by numerous predisposing factors and a triggering event. Analyzing the relationship between factors and EQIL distribution paves the way for recognizing EQIL occurrence and spatial prediction mechanisms. Deep learning, as a kind of effective data mining model, has been applied in many study areas, and it is well known for its discriminative feature extraction. As shown in Figure 10, the raw features are derived from topographical, geological, and hydrological data, which are then mapped to hidden layers that indicate the controlling effect from different factors. That is to say, a different group of factors is processed by the model and generates a synthetical index related to EQIL occurrence, which is different from traditional methods that consider nearly every related factor contributing to the object separately.


Performance of Rock Landslide Prediction
A large earthquake (magnitude > 6) can trigger shallow soil landslides as well as large, deep-seated rock landslides [51]. A landslide's initiation is controlled by different failure mechanisms associated with material types, topography, and geotechnical features [52][53][54]. Landslides in bedrock are especially strongly influenced by their structure and pre-accumulated failure, which consequently results in more complex failure mechanisms compared with soil landslides [55]. Thus, we selected one of the earthquake-affected areas where EQILs were dominated by rock materials to validate our model. As shown in Figure 11, the predictive results from both our proposed method and the FR method could cover the EQILs from the inventory data. However, the FR result includes several non-EQIL areas, leading to a high false alarm rate. This shows that conventional prediction models have a limited capacity for synthetical analysis when faced with a multiple-parameter problem. Therefore, the predictive result from these models is always controlled by the dominant EQIL-related factors that mathematically have high variance. However, our proposed method could construct high-level abstract features to represent the influence of a combination of different factors on EQIL occurrence and make the spatial prediction more consistent with the EQIL distribution.
It is concluded that our deep learning-based model has the potential ability to break through the bottleneck in EQIL predictions. Meanwhile, multilayer feature mapping could help researchers to update the knowledge of landslides that was acquired by statistical methods in the past.


Influence on Factor Importance
Many studies have confirmed that the initiation of an EQIL is an extremely complex process, which explains why hazard assessment and accurate prediction of EQILs remain challenging. Over the years, researchers have introduced a variety of conventional models for EQIL prediction (such as the LR, SVM, and RF algorithms) and successfully applied them to many EQIL inventories all over the world [44]. Lacking the capacity for deep feature transformation, conventional algorithms usually improve the accuracy of EQIL prediction by selecting high-importance factors to enhance model robustness [11]. However, this data preprocessing method cannot further extract hidden features of coarse and redundant data, and cannot improve the data utilization of EQIL inventories [56]. Our proposed accurate prediction framework considering the EQIL source area is not only superior in terms of data input (extracting sliding features as model input), but also introduces SAE to extract the complex features of redundant data. To explain why this model has advantages in terms of EQIL prediction, we conduct several comparative experiments on EQIL distribution with different numbers of controlling factors that have different information contributions, thus reflecting the model's predictive ability.
First, the information gain function (IG) is applied to calculate the feature importance index and rearrange the 14 predisposing factors (Figure 12). As mentioned in [57,58], IG is one of the quickest and easiest attribute ranking methods and is often used in feature selection and for identifying root nodes in tree-based models. As shown in Figure 12, the prediction performance of all methods initially improved with the increasing number of input controlling factors. The prediction accuracies of RF, SVM, and LR remain stable or even decline when the number of input controlling factors is greater than seven, because the additional factors contribute less to EQIL distribution. On the contrary, the proposed method continually obtains better prediction performance. Thus, two points can be concluded: (1) shallow machine learning models only take advantage of the significant factors for EQIL prediction, but deep learning models can extract more effective information on EQIL distribution from low-value density data, which is why prediction accuracy grows when input factors increase; (2) the EQIL distribution pattern is controlled by a combination of multiple factors; only when a slope unit satisfies specific combination conditions can a landslide be triggered by a catastrophic earthquake.
These two conclusions explain why the deep learning model is superior in EQIL prediction, and also hint that some new knowledge of EQILs can be represented by the high-level abstract features extracted by the hidden layers of our deep learning model.
Figure 12. Controlling factor importance and prediction accuracies of different models. The prediction performance of all methods improved with the increasing number of input-controlling factors; however, the performance of the shallow machine learning models then decreases or remains stable when low-value density data are input, except for the method presented in this work.
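The information gain ranking used for the factors has a compact form; a minimal sketch with a hypothetical discretized factor (e.g. a slope class) against landslide occurrence:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(feature) = H(labels) - sum_v p(v) * H(labels | feature = v)."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# hypothetical discretized factor vs. landslide occurrence (1 = landslide)
slope_class = ["steep", "steep", "gentle", "gentle"]
landslide = [1, 1, 0, 0]
# a perfectly informative factor has IG equal to H(landslide) = 1 bit
```

Ranking the 14 factors by this score and adding them to the model one at a time, from most to least informative, reproduces the experiment behind Figure 12.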

Conclusions
Earthquakes near human communities can result in significant loss of human life and damage to the environment, especially in high-altitude and steep mountain areas. Meanwhile, earthquakes can trigger many landslides that create secondary geological disasters, increasing the intensity of earthquake damage. Thus, spatially predicting EQILs based on their distribution pattern is essential for disaster mitigation. Over the last two decades, complete and normalized EQIL inventories have extended our knowledge of the relationship between earthquakes and the landslides they trigger by means of statistical analysis. However, an effective method to detect new EQIL distribution patterns in large, multiparameter datasets has been lacking, which has limited the performance of EQIL prediction for a long period. Furthermore, the source area features of EQILs, which can well represent the predisposing conditions for EQIL occurrence, were seldom taken into consideration in prediction models. Deep learning is known for feature extraction from complex and highly coarse datasets, and can be used to analyze the combinations of different factors that influence EQIL occurrence.
In this study, we constructed a deep learning framework based on the stacked autoencoder (SAE) and considered the source area features of EQILs to model their distribution. The complete EQIL inventory of the Wenchuan earthquake was then applied to assess the performance of the proposed method. After several controlled experiments, the optimal framework settings (the number of hidden units, depth of hidden layers, number of iterations, and learning rate) were 110, 3, 30,000, and 0.8, respectively. Ultimately, the proposed method was compared with traditional prediction models, showing that the deep learning-based model is superior in the spatial prediction of EQILs. Meanwhile, this study reveals that shallow machine learning models only take advantage of the significant factors for EQIL prediction, but deep learning models can extract more effective information about EQIL distribution from low-value density data, which is why their prediction accuracy grows when input factors increase. There is also hope that some new knowledge about EQILs can be represented by the high-level abstract features extracted by the hidden layers of our deep learning model; such knowledge was previously acquired by statistical methods. In future, our EQIL prediction model is expected to be promoted and applied in two scenarios: rapid positioning of secondary disaster-prone areas after an earthquake, based on a basic database, to provide a reference for disaster mitigation; and risk assessment of potential seismic areas to improve the efficiency of risk management.

Data Availability Statement: The data are not publicly available due to privacy issues.