Mapping Post-Earthquake Landslide Susceptibility: A U-Net Like Approach

Abstract: A serious earthquake can trigger thousands of landslides and leave some slopes more susceptible to failure in the future. Landslides threaten human lives and property, so mapping post-earthquake landslide susceptibility is very valuable for a rapid response to landslide disasters, both for relief resource allocation and for post-earthquake reconstruction. Previous researchers have proposed many methods to map landslide susceptibility but have seldom considered the spatial structure information of the factors that influence a slide. In this study, we first developed a U-net like model suitable for mapping post-earthquake landslide susceptibility. Post-earthquake high-spatial-resolution airborne images were used to produce a landslide inventory. Pre-earthquake Landsat TM (Thematic Mapper) images and influencing factors such as the digital elevation model (DEM), slope, aspect, multi-scale topographic position index (mTPI), lithology, faults, road network, stream network, and macroseismic intensity (MI) were prepared as the input layers of the model. Application of the model to the heavily hit area of the destructive 2008 Wenchuan earthquake resulted in high validation accuracy (precision 0.77, recall 0.90, F1 score 0.83, and AUC 0.90). The performance of this U-net like model was also compared with those of traditional logistic regression (LR) and support vector machine (SVM) models on both the model area and an independent testing area, with the U-net like model outperforming the two traditional models. The U-net like model introduced in this paper suggests that balancing the environmental influence of a pixel itself and its surrounding pixels to perform a better landslide susceptibility mapping (LSM) task is useful and feasible when using remote sensing and GIS technology.


Introduction
Serious earthquakes, particularly those occurring in mountainous regions, can trigger thousands of landslides because of unfavorable geomorphic environments [1]. These landslides can damage roads, crush buildings, block rivers, etc. [2,3]. Remote sensing images acquired by satellite, airborne, or UAV (unmanned aerial vehicle) platforms have been widely used to investigate landslides, especially high-spatial-resolution imagery [4]. A historical earthquake in the study area occurred on 28 August 1933, with Mw 7.3, in Mao County. On 12 May 2008, the Wenchuan earthquake occurred with Mw 7.9 (Ms 8.0) at a depth of 19 km. This earthquake was characterized mainly by thrust motion with right-lateral strike slip [40], with a maximum peak ground acceleration (PGA) of 1.15 g according to the USGS. It was the most destructive event since the 1976 Tangshan earthquake, leaving 69,200 dead, 18,195 missing, 374,216 injured, 5,362,500 houses collapsed, 21,426,600 houses badly damaged, and more than five million people homeless [41]. This earthquake originated from the collision between the Indian and Eurasian plates [42], and directly caused a huge number of landslides due to the high and steep slopes in the mountainous areas of 13 counties [2,43].

Figure 1. Location of the study area. The gray area, which covers 13 counties, is the heavily hit mountainous area of the Wenchuan earthquake. The abbreviation HE represents historical earthquake.

Pre-Earthquake Data
For the purpose of post-earthquake landslide susceptibility mapping, we should use pre-earthquake data to train the model so that we can predict the susceptibility of the whole study area. The original pre-earthquake Landsat remote sensing data were used to represent the land use/land cover and vegetation cover conditions, which are often used as influencing factors in LSM tasks.
Google Earth Engine (GEE) provides free and publicly accessible historical remote sensing images at a global scale, including those collected by the Landsat, MODIS, and Sentinel [44] sensors. In addition, Google provides an efficient web-based interactive development environment (IDE) that enables rapid and easy access to these data according to different requirements [45].
Given that the date of the main earthquake is 12 May 2008, remote sensing data before this date are pre-earthquake data. With the help of the Google Earth Engine IDE, we conveniently created a pre-earthquake Landsat dataset for the study area based on the U.S. Geological Survey (USGS) Landsat 5 Surface Reflectance Tier 1 dataset. The dataset had 278 scenes acquired from 12 May 2005 to 11 May 2008. We composited all the images into a single scene using a median method to minimize the effects of clouds and cloud shadows [45]. The final dataset contained 7 bands: 4 visible and near-infrared (VNIR) bands, 2 short-wave infrared (SWIR) bands, and one thermal infrared (TIR) band. The VNIR and SWIR bands have a resolution of 30 m/pixel. The original TIR band at 120 m/pixel has been resampled to 30 m/pixel (https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LT05_C01_T1_SR#description). Figure 2 shows a color composite of the pre-earthquake Thematic Mapper (TM) images.
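The median compositing step can be sketched in plain Python: for each pixel, taking the median across a stack of co-registered scenes suppresses transiently bright clouds and dark shadows. The values below are illustrative, not actual TM reflectances.

```python
from statistics import median

# Five co-registered "scenes" of a 2x2 single-band area (toy reflectances).
# Scene 3 has a bright cloud at (0, 0); scene 4 has a dark shadow at (1, 0).
scenes = [
    [[0.10, 0.12], [0.11, 0.13]],
    [[0.10, 0.12], [0.11, 0.13]],
    [[0.95, 0.12], [0.11, 0.13]],
    [[0.10, 0.12], [0.01, 0.13]],
    [[0.10, 0.12], [0.11, 0.13]],
]

# Per-pixel median across the scene stack suppresses both outliers.
composite = [
    [median(scene[r][c] for scene in scenes) for c in range(2)]
    for r in range(2)
]
print(composite)  # [[0.1, 0.12], [0.11, 0.13]]
```

In GEE, this whole step reduces to a single `median()` reducer applied to the filtered image collection.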

Post-Earthquake Data
Airborne images were acquired using Leica ADS40/80 digital cameras after the Wenchuan earthquake. The advanced airborne ADS40/80 camera has a large field of view (FOV) of 64 degrees and an instantaneous field of view (IFOV) of 0.1 mrad. The camera has a panchromatic band and four spectral bands: blue, green, red, and near infrared. The spatial resolution of these images ranges from 0.3 to 0.5 m, depending on the flight altitude. Eight image strips obtained from 15 May 2008 to 28 May 2008 were used for identifying the post-earthquake landslides. Images No. 1 to 7 (listed in Table 1 and shown in Figure 2) were used as the model area for developing the model, while image No. 8 was reserved for independent testing.

Landslide Inventory
The serious earthquake induced a large number of landslides. The good vegetation coverage in the study area and the high spatial resolution (0.3–0.5 m) of the post-earthquake airborne images make it easy to identify these landslides. More than five experienced researchers from the Chinese Academy of Sciences (CAS) performed visual interpretation of landslides for about one week. The interpretation result was verified in the field, showing a validation accuracy of more than 98% [46,47]. The total number of interpreted landslides was 1148, comprising slides (392), rock falls (535), and debris flows (221). Most of the interpreted landslides contain only depletion zones, but some also comprise track or accumulation zones, because these zones are sometimes mixed and difficult to distinguish. The average area of these landslides was 58,500 m², with the largest being 153,000 m² and the smallest 715 m². The landslide distribution and photographs of typical landslides in the field are shown in Figure 3.


Landslide Influencing Factors
The spatial distribution of the post-earthquake landslides is significantly controlled by the surrounding topography, geology, and human activity. These conditions, combined with the earthquake activity, were taken as the influencing factors of the post-earthquake landslides in different layers. All these influencing factors were prepared in GIS using ArcGIS 10.6.


Topography
Topography affects post-earthquake landslides in many ways. It is widely accepted that a steep slope is prone to landslides because a larger inclination increases the shear stress on the soil [48]. Elevation and aspect affect vegetation cover, moisture retention, and soil strength, which can influence landslide initiation [49,50]. In addition, because of the direction of seismic wave propagation, slopes at different altitudes and aspects may experience different ground motions [48,51]. In this paper, the slope and aspect were both computed from the SRTM digital elevation model (DEM) using ArcGIS 10.6. The computed slope values range from 0° to 87°. The aspect is measured clockwise in degrees from 0 (due north) to 360 (again due north), coming full circle; flat areas are given a value of −1.
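As an illustration of the slope computation, the sketch below uses simple central differences over a 3 × 3 DEM window at a 30 m cell size. This is a hypothetical stand-in for the ArcGIS Slope tool (which uses a similar neighborhood method); the elevation values are illustrative.

```python
import math

# Toy 3x3 DEM window (elevations in meters) around a center cell.
dem = [
    [100.0, 101.0, 102.0],
    [103.0, 104.0, 105.0],
    [106.0, 107.0, 108.0],
]
cell = 30.0  # cell size in meters

# Central differences give the elevation gradient at the center cell.
dz_dx = (dem[1][2] - dem[1][0]) / (2 * cell)  # east-west gradient
dz_dy = (dem[2][1] - dem[0][1]) / (2 * cell)  # north-south gradient

# Slope angle is the arctangent of the gradient magnitude.
slope_deg = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
print(round(slope_deg, 2))  # ~6.02 degrees for this gentle surface
```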
The global ALOS multi-scale topographic position index (mTPI) was also used. The mTPI can be used to distinguish ridges from valleys, which contributes to landslide occurrence. It is calculated by subtracting the mean elevation within a neighborhood from the elevation at each location. The neighborhood radii (km) are 115.8, 89.9, 35.5, 13.1, 5.6, 2.8, and 1.2 [52]. The dataset is available from GEE (https://developers.google.com/earth-engine/datasets/catalog/CSP_ERGo_1_0_Global_ALOS_mTPI#description). The topography information of the post-earthquake landslides is shown in Figure 4.
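The single-scale TPI computation underlying mTPI can be sketched as follows. The elevation grid and the 3 × 3 neighborhood are illustrative assumptions; the actual mTPI product averages TPI over the several neighborhood radii listed above.

```python
from statistics import mean

# Toy elevation grid (meters); the central cell sits in a depression.
elev = [
    [300, 310, 305],
    [290, 250, 295],
    [300, 308, 302],
]

def tpi(grid, r, c):
    """TPI = elevation at a cell minus the mean elevation of its neighborhood.
    Positive values indicate ridges, negative values indicate valleys."""
    neigh = [
        grid[i][j]
        for i in range(max(r - 1, 0), min(r + 2, len(grid)))
        for j in range(max(c - 1, 0), min(c + 2, len(grid[0])))
        if (i, j) != (r, c)  # exclude the center cell itself
    ]
    return grid[r][c] - mean(neigh)

print(tpi(elev, 1, 1))  # -51.25: the center lies well below its surroundings
```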

Lithology and Fault
Geological conditions such as lithology can affect the strength and permeability of surface and sub-surface material, which increases the probability of landslide occurrence [53]. Lithology and faults can affect rock fragmentation, which can be an important influencing factor of post-earthquake landslides [53,54]. The lithology and faults were extracted from the geological map at a scale of 1:200,000 (Figure 5 and Table 2).

Human Activity
In mountainous areas, road construction often results in excavation of the slope toe, while streams can erode the toe of a slope and saturate it through increased water infiltration. All of these can destabilize a slope and eventually cause sliding [53]. The road and stream networks were obtained from the National Geomatics Center of China (NGCC) at a scale of 1:50,000 (Figure 6).

Seismic Parameters
Within a very short time after a serious earthquake, the USGS can produce a series of earthquake products such as macroseismic intensity (MI), peak ground acceleration (PGA), and peak ground velocity (PGV) maps [55,56]. The MI map is calculated from PGA, PGV, and internet users' shaking and damage reports, and can therefore more effectively reflect the damage to the ground surface caused by the earthquake. Therefore, we used the MI map (Figure 7) as an influencing factor to evaluate post-earthquake landslide susceptibility (https://earthquake.usgs.gov/earthquakes/eventpage/usp000g650/shakemap/intensity).

Traditional CNN and U-Net Model
Traditional CNN models consist of one or more convolution, pooling, or fully connected layers. The convolution operation extracts different features from the input layer. The pooling operation reduces the dimensionality of the feature maps and makes the model more concerned with the existence of certain features than with their precise location. The fully connected layer reorganizes the extracted features and maps them to the final outputs. A CNN model was first trained by LeCun et al. using the backpropagation method for the task of classifying images of handwritten digits in 1989 [57,58]. Many scholars have since successfully developed different CNN models, such as LeNet [59], AlexNet [60], GoogLeNet [61], VGGNet [62], and ResNet [63]. These traditional CNN models have had great success on image classification tasks, where the typical output for an image is a single class label. However, in image segmentation tasks, a class label must be assigned to each pixel. For this purpose, a sliding window over each pixel [64] or a decoding path added to the CNN architecture [65] can be used. The latter approach has developed rapidly in recent years due to its higher efficiency, and the U-net model is one example. The U-net model (Figure 8) consists of an encoding path and a decoding path and connects (copy and crop) the two paths. The encoding path usually includes many convolutional and max pooling layers, while the decoding path includes convolution and up-convolution layers. U-net has been very successful in many semantic segmentation tasks [34–38], but no literature has reported the application of U-net to LSM.
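The pooling operation described above can be illustrated with a toy example: 2 × 2 max pooling halves each spatial dimension while keeping only the strongest activation in each window, so the model retains feature presence rather than exact location (the feature map values are illustrative).

```python
# Toy 4x4 feature map (illustrative activations).
fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 3, 2],
    [2, 0, 1, 4],
]

# 2x2 max pooling with stride 2: keep the maximum of each 2x2 window.
pooled = [
    [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
     for c in (0, 2)]
    for r in (0, 2)
]
print(pooled)  # [[4, 5], [2, 4]] -- a 2x2 map summarizing the 4x4 input
```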


Model Architecture
Unlike traditional semantic segmentation tasks, the input layers for LSM can be regarded as an image with several landslide influencing factors. Each pixel of the input layers is represented by a set of features defined by the landslide influencing factors. Due to the interaction of adjacent pixels, whether a pixel slides in the field is closely related to both the pixel itself and its surrounding pixels, which means that the spatial structure information of the possible influencing factors in adjacent pixels is important for the LSM task. A traditional CNN model can mine this spatial structure information for the LSM task, but the original influencing factors of each pixel may be weakened during the convolution and pooling operations. Therefore, in this paper, we develop a U-net like model for mapping post-earthquake landslide susceptibility, because the U-net model connects the encoding path and the decoding path to compensate for the information eliminated by encoding [35,39]. What we are concerned about is how to balance the influence of a pixel itself and its surrounding pixels to perform an optimal LSM task.
First, the model cannot have too many convolution and pooling layers. For each pixel in the image, a 3 × 3 kernel convolution means that the pixel is affected by the 8 pixels around it, so the more convolution and pooling layers there are, the more each pixel is affected by its surrounding pixels. For the LSM task, too many convolution and pooling layers may introduce a lot of noise and make the model difficult to train.
Second, the original input layers and the last convolution layer should be connected (copy and crop), because the original influencing factors of each pixel are the most important factors for landslide occurrence.
In this study, we finally constructed the U-net like model for post-earthquake LSM shown in Figure 9. The number of input landslide influencing factors in this study was 16, as described in Section 2.5.3. We set the number of filters in the first convolution layer to the same value and doubled it after the pooling operation, as the traditional U-net model does. In total, three convolution layers and one max pooling layer were included in this model. After each convolution operation, batch normalization was used to normalize the feature map, and dropout at a rate of 0.2 was applied to avoid overfitting. All the convolution layers use the ReLU activation function, while the final output layer uses the sigmoid function to ensure an output range between 0 and 1, indicating the landslide susceptibility.
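A shallow U-net like architecture of this kind might be sketched in TensorFlow/Keras as below. This is a minimal sketch under assumptions, not the authors' exact network: the filter counts, the 29-layer input stack, and the full-size (uncropped) output map are illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_unet_like(n_factors=29, size=10, filters=16, drop=0.2):
    """Sketch of a shallow U-net like LSM model: a short encoding path,
    one upsampling step, and a skip connection back to the raw factors."""
    inputs = tf.keras.Input(shape=(size, size, n_factors))

    # Encoding path: convolutions with batch normalization and dropout.
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(drop)(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(filters * 2, 3, padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(drop)(x)

    # Decoding path: upsample and concatenate the original input layers so
    # each pixel's own influencing factors are not washed out by convolution.
    x = layers.UpSampling2D(2)(x)
    x = layers.Concatenate()([x, inputs])
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)

    # Sigmoid output in [0, 1], interpreted as landslide susceptibility.
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss='binary_crossentropy')
    return model
```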

Input and Output
The input layers, detailed in Table 3, include the pre-earthquake Landsat TM images with 7 bands and the influencing factors listed in Section 2.4: DEM, slope, aspect, mTPI, lithology, faults, road network, stream network, and MI. Among these influencing factors, the faults, road network, and stream network were represented by polylines, and the Euclidean distance to the closest source was used to quantify their potential influence on landslides. The lithology is a categorical variable and was encoded as dummy variables as described in the literature [66]; in this way, 14 dummy variables were used to represent the 14 lithologies listed in Section 2.4.2. In total, there were 29 input layers. Their original data values are shown in Table 3, and all were normalized to the range 0–1 when training the model.
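The two preprocessing steps described above can be sketched as follows: min-max normalization of a continuous factor to the range 0–1, and dummy (one-hot) encoding of the categorical lithology layer. The values and lithology class names are illustrative assumptions.

```python
def min_max(values):
    """Normalize a list of values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Continuous factor, e.g. elevation in meters (illustrative values).
elev = [500.0, 1250.0, 2000.0]
print(min_max(elev))  # [0.0, 0.5, 1.0]

# A categorical lithology code becomes one binary dummy layer per class.
lith_classes = ['granite', 'limestone', 'sandstone']  # hypothetical classes

def dummies(code):
    return [1 if code == c else 0 for c in lith_classes]

print(dummies('limestone'))  # [0, 1, 0]
```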
For a fully convolutional network, the sizes of the input and output layers have no influence on the result of the model. In our U-net like model, the output label size was set to 2 × 2 pixels, meaning the input layer size was 10 × 10 pixels. The use of such a small layer size made it convenient to control the number of landslide samples when training the model.

Training, Validation, and Independent Testing
In the model area, the landslide pixels and non-landslide pixels were extremely unbalanced (the ratio was about 1:33) because of some very small landslides. To accommodate both landslide and non-landslide pixels, two sampling approaches were used. First, 5000 label images of 2 × 2 pixels in the model area were selected randomly. Second, another 5000 images with at least 2 landslide pixels in the label area were selected randomly. In this way, approximately one-third of the landslide pixels and 1/110 of the non-landslide pixels were included, so that the ratio of landslide to non-landslide pixels for modeling was about 1:2.26. These model data were split to create the training (75%) and validation (25%) datasets for the U-net like model. Precision, recall, F1 score, and the receiver operating characteristic (ROC) curve were used to evaluate the accuracy of the model. An alternative method would be to use all the landslide pixels in the model area and a corresponding number of non-landslide pixels as samples and split them to train and validate the model. However, because the model brings the surrounding pixels into the convolution operation, it might introduce redundancy if adjacent landslide pixels were both used as independent samples for training.
Precision (Equation (1)) is the fraction of pixels predicted as landslides that are true landslide pixels, p = TP/(TP + FP), and recall (Equation (2)) is the fraction of true landslide pixels that are predicted as landslides, r = TP/(TP + FN). The F1 score is a measure of a test's accuracy that takes into account both the precision p and the recall r. The F1 score (Equation (3)) is the harmonic mean of precision and recall, F1 = 2pr/(p + r), with 1 meaning perfect precision and recall.
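The three metrics (precision, recall, and F1) can be illustrated with a toy confusion matrix; the counts below are chosen for illustration only.

```python
# Illustrative confusion-matrix counts for a binary landslide classifier.
tp, fp, fn = 90, 27, 10  # true positives, false positives, false negatives

precision = tp / (tp + fp)                           # Eq. (1)
recall = tp / (tp + fn)                              # Eq. (2)
f1 = 2 * precision * recall / (precision + recall)   # Eq. (3)

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.77 0.9 0.83
```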

Receiver Operating Characteristic (ROC)
The receiver operating characteristic (ROC) curve is a graphical representation of the relationship between the sensitivity and specificity of a test over all possible diagnostic cutoff values. It plots "Sensitivity" (equivalent to recall) against "1 − Specificity" (Equation (4)), where Specificity = TN/(TN + FP) [8]. We generally use the area under the ROC curve (AUC) to reflect the total accuracy of these models; a greater AUC value means a higher prediction performance [33,67].
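The AUC can be approximated from a sampled ROC curve with the trapezoidal rule, as in this small sketch (the ROC points are illustrative, not from the models in this paper):

```python
# ROC polyline as (1 - specificity, sensitivity) points, from (0, 0) to (1, 1).
roc = [(0.0, 0.0), (0.1, 0.6), (0.3, 0.9), (1.0, 1.0)]

# Trapezoidal rule: sum the area of each segment under the polyline.
auc = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(roc, roc[1:])
)
print(round(auc, 3))  # 0.845
```

A curve hugging the upper-left corner yields an AUC near 1, while the diagonal from (0, 0) to (1, 1) corresponds to random guessing with an AUC of 0.5.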
To compare the performance of the model with other models, another two test datasets were prepared in a similar way. Test dataset 1 contained 10,000 landslide pixels and 10,000 non-landslide pixels. Test dataset 2 came from the airborne imagery shown as the yellow polygon in Figure 10 and contained 3000 pixels with equal numbers of landslide and non-landslide pixels. The ROC curve and confusion matrix (CM) were used to compare the performance of the U-net model with those of traditional logistic regression (LR) and support vector machine (SVM) models. LR is widely used for predicting the presence or absence of an outcome based on the values of a set of predictor variables [68]. The probability ρ of a positive outcome is transformed from the interval (0,1) to its logit ln(ρ/(1 − ρ)) [12]. The SVM algorithm is built upon a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space and can be used for separating landslide and non-landslide cases. The two models have been successfully used in many LSM tasks [68–73].

Spatial Analysis of Landslides
Spatial analysis between the different influencing factors and the post-earthquake landslides was performed. Figure 11 shows the relationship between landslides and the pre-earthquake Landsat TM data. Each Landsat TM band was reclassified into six sub-categories based on the quantile method in the ArcGIS 10.6 toolbox. The radar maps in Figure 11 were calculated using the ratio of landslide area to each sub-category area. Almost all of the seven Landsat TM bands show higher landslide susceptibility in the sub-categories with higher surface reflectance values, which mostly represent barren land. Figure 12 shows the relationship between landslides and the other influencing factors. Similarly, higher slope angles, higher MI values, and closer distances to faults, roads, and streams all show higher landslide susceptibility. East- and southeast-facing slopes have higher landslide susceptibility than the other slope aspects. Lower mTPI values, which represent valley bottoms, have higher landslide susceptibility. The Cambrian lithology, with metamorphic grit and limestone, also has high susceptibility to landslides.
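The quantile reclassification used for the radar maps can be sketched with the standard library: band values are split at equal-count cut points into six sub-categories. The band values below are illustrative, and `statistics.quantiles` is a stand-in for the ArcGIS quantile classifier.

```python
from statistics import quantiles

# Illustrative band values (e.g. surface reflectance) to be reclassified.
values = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18,
          0.20, 0.25, 0.30, 0.40, 0.45, 0.50]

# Six equal-count classes need five cut points.
breaks = quantiles(values, n=6)

def classify(v):
    """Return the sub-category (1..6) of a value given the quantile breaks."""
    return sum(v > b for b in breaks) + 1

print([classify(v) for v in values])
```

The per-class landslide ratio is then simply the landslide area inside each sub-category divided by that sub-category's total area.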


LSM Result of U-Net Like Model
Our U-net model was developed using TensorFlow 2.0 with Python. TensorFlow is a flexible and scalable software library that enables users to efficiently program and train neural networks and deploy them to production [74]. The hardware environment of this study was an NVIDIA Quadro P2000 graphics card, an Intel i7-8850H processor, and 32 GB of memory. The Adam optimizer with the default learning rate of 0.001 and binary cross-entropy loss were used for training the model. We set the batch size to 100, and it took 3 h 25 min to train our U-net like model with an early stop (patience = 10) at the 23rd epoch (the maximum epoch value was set to 100). The training and validation results are shown in Table 4. The validation precision of 0.77 means that, of the areas with susceptibility values above 0.5, 77% were true landslide pixels. The validation recall of 0.90 means that 90% of all existing landslide pixels were predicted to have a susceptibility above 0.5. Both the F1 score and the AUC are higher than 0.83, indicating a strong performance, with 1 being a perfect performance. All the training and validation indexes listed in Table 4 were based on a susceptibility threshold of 0.5, which is commonly used in binary classification. To examine the LSM result in more detail, we divided the result into six levels based on the landslide susceptibility value (denoted ls): extremely low (ls < 0.1), very low (0.1 ≤ ls < 0.3), low (0.3 ≤ ls < 0.5), high (0.5 ≤ ls < 0.7), very high (0.7 ≤ ls < 0.9), and extremely high (ls ≥ 0.9), as shown in Figure 13. The percentages of total area and landslide area for each level in the model area are shown in Figure 14, where the three low levels (ls < 0.5) and the three high levels (ls ≥ 0.5) represent the non-landslide and landslide areas, respectively. The most important application of a landslide susceptibility map is to guide users to the places prone to landslides.
Hence, the area at high susceptibility levels should not be too large, so that engineers can be dispatched efficiently to investigate these landslide-prone areas. In this result, the "extremely high" and "very high" level areas were 4.56% and 6.74% of the total model area, respectively, but accounted for 42.31% and 31.58%, respectively, of the total existing landslides. Therefore, we have strong confidence that the areas at the "extremely high" and "very high" levels are prone to landslides in the future and deserve more attention.
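The six-level split described above is a simple thresholding step on the predicted susceptibility; a minimal sketch using the bin edges from the text (the function name is illustrative):

```python
import numpy as np

# Bin edges from the text: <0.1, 0.1-0.3, 0.3-0.5, 0.5-0.7, 0.7-0.9, >=0.9
EDGES = [0.1, 0.3, 0.5, 0.7, 0.9]
LEVELS = ["extremely low", "very low", "low", "high", "very high", "extremely high"]

def susceptibility_level(ls):
    """Map a susceptibility value (scalar or array) to one of the six levels."""
    idx = np.digitize(ls, EDGES)  # right-open bins, so ls >= 0.9 -> index 5
    return np.asarray(LEVELS, dtype=object)[idx]
```

Applied to the full susceptibility raster, this yields the level map of Figure 13, with indices 0-2 (ls < 0.5) treated as non-landslide and 3-5 (ls ≥ 0.5) as landslide.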

Comparison with LR and SVM Models
In this study, the performance of the traditional LR and SVM models was compared with that of the U-net model. The LR model was implemented in TensorFlow based on the sigmoid function, while the SVM model was built using the Regression Learner app of MATLAB 2019b. A cubic kernel function and a box constraint value of 2.17 were found to be optimal for the SVM model based on a Bayesian optimizer. The input dataset for the LR and SVM models was the same as that for the U-net model but without convolution and pooling. The LR and SVM models each took less than 10 min to train, as they are based on different algorithm implementations.
The LSM results of the LR and SVM models for the whole study area are shown in Figure 15. Figure 16 shows the ROC curves of the three models tested on test dataset 1 (Figure 16a) and test dataset 2 (Figure 16b). The calculated AUC shows that the U-net model performed better than the LR and SVM models on both datasets. Figure 17 shows the airborne imagery and landslide distribution (Figure 17a) in test area 2, as well as the LSM results of the U-net (Figure 17b), LR (Figure 17c), and SVM (Figure 17d) models, respectively. Compared with the LR and SVM models, the U-net model produced an LSM map with higher accuracy and finer detail, which should be more reliable to use. For example, the southeast of the study area is the Chengdu Plain, where the stronger performance of the U-net model is evident. Figure 18 compares the LSM results of the three models in the Chengdu Plain. Both the LR and SVM models predicted that landslides in this area should have a "high" or even "extremely high" susceptibility level (Figure 18c,d). However, the area is very flat, with cities and farmland, and thus should not be favorable to the occurrence of landslides.

To further compare the model results, we used a confusion matrix (CM) based on the six landslide susceptibility levels (marked as ls), i.e., ls < 0.1, 0.1 ≤ ls < 0.3, 0.3 ≤ ls < 0.5, 0.5 ≤ ls < 0.7, 0.7 ≤ ls < 0.9, and ls ≥ 0.9, as shown in Table 5, where the three low levels (ls < 0.5) and three high levels (ls ≥ 0.5) represent the negative (non-landslide) and positive (landslide) areas, respectively.
The total accuracy (acc) was calculated as defined in Equation (5), where TP is the number of pixels with predicted value ls ≥ 0.5 and true value 1, TN is the number of pixels with predicted value ls < 0.5 and true value 0, FP is the number of pixels with predicted value ls ≥ 0.5 and true value 0, and FN is the number of pixels with predicted value ls < 0.5 and true value 1:

acc = (TP + TN) / (TP + TN + FN + FP) (5)

Although the U-net model did not show a significantly better performance in test area 2 based on total accuracy (79.70% vs. 78.90% and 79.20%), it achieved a higher TP and a lower FP when predicting the "extremely high" and "very high" levels, which could make the investigation of these landslide-prone areas more efficient for engineers. The predicted values of the LR model were more concentrated around 0.5 in test areas 1 and 2, which makes it difficult for engineers to arrange work according to urgency.
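Equation (5) can be verified with a few lines of code; a minimal sketch of the threshold-0.5 confusion counts (array and function names are illustrative):

```python
import numpy as np

def total_accuracy(ls_pred, y_true, threshold=0.5):
    """Total accuracy per Equation (5): acc = (TP + TN) / (TP + TN + FN + FP)."""
    pred_pos = ls_pred >= threshold   # predicted landslide (ls >= 0.5)
    true_pos = y_true == 1            # true landslide pixels
    tp = np.sum(pred_pos & true_pos)
    tn = np.sum(~pred_pos & ~true_pos)
    fp = np.sum(pred_pos & ~true_pos)
    fn = np.sum(~pred_pos & true_pos)
    return (tp + tn) / (tp + tn + fn + fp)
```

The same four counts also yield the precision (TP / (TP + FP)) and recall (TP / (TP + FN)) reported in Table 4.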

Discussion
For a deep learning model, many factors may affect the performance. In this section, we discuss several aspects that could benefit further study of the U-net like model on the LSM task.

Sample Balance for Model Input
The model area was about 2265 km2, covering about 2.5 million pixels at a spatial resolution of 30 m. The number of non-landslide pixels was more than 2.4 million, while the number of landslide pixels was less than 0.1 million. The unbalanced samples would bias the regression result toward the non-landslide samples; in the extreme case, the model would output 0 (non-landslide) for all pixels, which obviously makes no sense. Currently, two strategies are often used to deal with this problem: balancing the samples or adjusting the penalty weights of the two types of error [75]. In this paper, we simply reduced the number of non-landslide pixels. Finally, 10,000 2 × 2-pixel samples (40,000 pixels in total) with a landslide-to-non-landslide ratio of about 1:1.26 were used to train the U-net model and gave a good result. The method of adjusting penalty weights could also be considered to improve the performance. A simple way is to compute a weighted cross entropy with the weighted cross entropy with logits function of TensorFlow, which allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.
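The penalty-weight idea can be sketched independently of TensorFlow (whose `tf.nn.weighted_cross_entropy_with_logits` serves the same purpose); a minimal NumPy version, where a `pos_weight` greater than 1 makes a missed landslide cost more than a false alarm:

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight=1.0, eps=1e-7):
    """Binary cross entropy with the positive (landslide) term up-weighted.
    pos_weight > 1 penalizes false negatives more than false positives."""
    p = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    loss = -(pos_weight * y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(np.mean(loss))
```

Doubling `pos_weight` doubles the loss contribution of landslide pixels, shifting the trained model toward higher recall at the cost of precision.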

Total Convolutional Size of Model Architecture
A traditional U-net model usually uses many convolution and pooling layers to capture the high-level semantic features of a picture for efficiently identifying objects with a well-defined shape. However, for the purpose of landslide susceptibility mapping, too many convolutional and pooling layers may introduce a lot of noise and make the model difficult to train. In this study, our model used the convolution operation (3 × 3 kernel) three times and the pooling operation once, as shown in Figure 9. Hence, each output pixel is affected by pixels at most four pixels away from it. We consider this size the total convolution size (TCS) of the model, represented by the TCS4 red polygon in Figure 19a. It was hard to determine the best TCS for our study area, as the performance of the model could be affected by any change to it. In this study, we kept all parameters of the model fixed except the convolution kernel size of the first convolution layer (marked by the red dotted polygon in Figure 9). If the kernel is changed to 5 × 5, the TCS becomes five; if the kernel size is 1 × 1, the TCS equals three. Meanwhile, to keep the same 2 × 2-pixel output samples, the input image size becomes 12 × 12 pixels for TCS5 and 8 × 8 pixels for TCS3, respectively (Figure 19a). Figure 19b shows the binary cross entropy loss curves of the different TCS values on the model area, from which we find that TCS4 performed better than TCS3 and TCS5. The reason why TCS4 performs better for this study area could be complicated. An interesting finding is that TCS4 was closest to the average landslide area of the model area, which needs more investigation in the future.
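One consistent reading of these numbers is that the TCS is the one-sided border of context pixels around the output patch, so the input size equals the output size plus 2 × TCS. A sketch under that assumption, where the first-layer kernel contributes (k - 1)/2 pixels per side and the remaining two 3 × 3 convolutions plus the pooling step are taken to contribute a fixed three pixels (both the decomposition and the function names are interpretive, not stated in the text):

```python
def tcs(first_kernel):
    """Total convolution size for a first-layer kernel of size k x k,
    assuming the later layers contribute a fixed 3 pixels of context per side."""
    return (first_kernel - 1) // 2 + 3

def input_size(output_size, first_kernel):
    """Input patch size needed to produce an output_size x output_size sample."""
    return output_size + 2 * tcs(first_kernel)
```

This reproduces the reported 12 × 12 input for TCS5 and 8 × 8 input for TCS3 with 2 × 2 outputs, and implies a 10 × 10 input for the TCS4 configuration.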

Pixel Itself or Surrounding Pixels for LSM Task
Traditional heuristic, general statistical, and machine learning methods used for LSM tasks are mostly pixel-based, without considering surrounding pixels. A few other methods that treat a slope as a single unit instead of a single pixel usually produce a mapping result less smooth than that of pixel-based methods [76-78], because it is not always possible to divide different slope units. CNN models make it possible to mine spatial structure information from surrounding pixels through convolution and pooling. However, these operations may introduce noise and make a model difficult to train. The question, then, is how to balance the influence of a pixel itself and its surrounding pixels to perform an optimal LSM task. In this paper, an improved U-net model was developed to balance this influence by connecting (copy and crop) the original input layer and the last convolution layer. To demonstrate the advantage of the proposed U-net model, two changes to the model architecture were tested. First, the copy and crop operation was eliminated and replaced with the corresponding convolution channels. For example, the black layers in the blue dotted box (Figure 9) could be removed and the number of channels of the white convolution layer changed from 16 to 45. This results in a model architecture similar to the traditional CNN model described in the literature [33], which considers more the influence of surrounding pixels. The other change to the U-net model was to consider only the pixel itself by setting all convolution and pooling kernel sizes to 1 × 1, which leads to a model similar to the traditional ANN model described in the literature [26]. We call these two models CNN and ANN, respectively. Figure 20 shows how the binary cross entropy losses of the ANN, CNN, and U-net like models vary with the training epochs.
It is evident that the validation loss of the ANN model drops very slowly, indicating that more features need to be added, while the validation loss of the CNN model fluctuates significantly, indicating that the model was difficult to train and prone to overfitting. In contrast, the U-net model performed better in both loss value and time consumption than the ANN and CNN models. Although the model performance can be affected by other factors, such as the sample number/distribution or other hyperparameters depending on the study area, there is no doubt that more attention should be paid to balancing the influence of the pixel itself and its surrounding pixels in future studies on LSM with remote sensing.
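The copy-and-crop connection that distinguishes the U-net like model from the plain CNN variant amounts to cropping the center of the input patch and concatenating it with the features of the last convolution layer; a minimal NumPy sketch (all shapes and channel counts here are illustrative, not the paper's exact configuration):

```python
import numpy as np

def center_crop(x, size):
    """Crop the central size x size window of an (H, W, C) array."""
    h, w = x.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return x[top:top + size, left:left + size]

# Illustrative shapes: a 10 x 10 input patch with 14 factor channels,
# and 2 x 2 x 16 features from the last convolution layer.
inputs = np.random.rand(10, 10, 14)
features = np.random.rand(2, 2, 16)

# Copy and crop: each output pixel sees both its own raw factor values
# (the pixel itself) and the convolved context (surrounding pixels).
merged = np.concatenate([center_crop(inputs, 2), features], axis=-1)
```

The final 1 × 1 convolution then weighs both groups of channels, which is how the model balances the pixel's own environment against its surroundings.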

Conclusions
This is the first study to develop a U-net like model for mapping post-earthquake landslide susceptibility. The model architecture was built similarly to a traditional U-net model but used fewer convolutional and pooling layers and connected the original input layers with the last convolution layers. Pre-earthquake Landsat TM images and influencing factors including DEM, slope, aspect, multi-scale topographic position index (mTPI), lithology, fault, road network, stream network, and macroseismic intensity (MI) were prepared and used as the model input layers. Post-earthquake high-spatial-resolution airborne images were used to compile the landslide inventory, which served as the output for training the model. The relative distributions of the different influencing factors and post-earthquake landslide occurrence were analyzed. We trained the model on the heavy-hit area of the destructive 2008 Wenchuan earthquake and obtained a good validation accuracy, with precision 0.77, recall 0.90, F1 score 0.83, and AUC 0.90. Further detailed analysis of the result indicated that the "extremely high" and "very high" level areas covered only 4.56% and 6.74% of the total model area, respectively, but accounted for 42.31% and 31.58% (73.89% in total) of the total landslides, respectively, which could provide a reliable reference for engineers investigating landslides in the field. Comparison of the proposed U-net like model with the traditional LR and SVM models showed the stronger performance of the former on both the model area and the independent testing area. Through discussing the details of the model architecture, we found that balancing the environmental influence of a pixel itself and its surrounding pixels to perform a better LSM task is useful and feasible when using remote sensing and GIS technology.