Research on Post-Earthquake Landslide Extraction Algorithm Based on Improved U-Net Model

Abstract: Seismic landslides are the most common and highly destructive earthquake-triggered geological hazards. They are large in scale and occur simultaneously in many places. Therefore, obtaining landslide information quickly after an earthquake is the key to disaster mitigation and relief. The survey results show that most of the landslide-information extraction methods involve too much manual participation, resulting in a low degree of automation and the inability to provide effective information for earthquake rescue in time. In order to solve the abovementioned problems and improve the efficiency of landslide identification, this paper proposes an automatic landslide identification method named improved U-Net model. The intelligent extraction of post-earthquake landslide information is realized through the automatic extraction of hierarchical features. The main innovations of this paper include the following: (1) On the basis of the three RGB bands, three new bands, DSM, slope, and aspect, with spatial information are added, and the number of feature parameters of the training samples is increased. (2) The U-Net model structure is rebuilt by adding residual learning units during the up-sampling and down-sampling processes, to solve the problem that the traditional U-Net model cannot fully extract the characteristics of the six-channel landslide for its shallow structure. At the end of the paper, the new method is used in Jiuzhaigou County, Sichuan Province, China. The results show that the accuracy of the new method is 91.3%, which is 13.8% higher than the traditional U-Net model. It is proved that the new method is effective and feasible for the automatic extraction of post-earthquake landslides.


Introduction
In China, especially in mountainous areas, seismic landslides are the most common and highly destructive earthquake-triggered geological disasters [1]. Statistics show that landslides can be triggered when the earthquake magnitude (ML) is greater than 4.0 [2], and earthquake-triggered landslides are more prominent during strong earthquakes (ML ≥ 6.0), especially during large earthquakes (ML ≥ 7.0). Rough statistics show that, in some major earthquakes, the loss of life and property caused by earthquake-triggered landslides can account for more than 50% of the total earthquake losses [3].
Therefore, obtaining information about earthquake-triggered landslides quickly is the key to organizing rescues and protecting lives and property as early as possible [4]. However, landslide information is difficult to extract and analyze through traditional field surveys, which suffer from heavy workloads, low efficiency, high cost, and unintuitive information. With the rapid development of space technology and electronic information technology, remote sensing has been increasingly used in disaster emergency response and rescue because it offers rapid, macroscopic, all-weather, and nearly all-day monitoring [5], and it has become one of the most economical and effective solutions for landslide extraction [6]. It is of great significance to obtain landslide information as soon as possible in order to make a reasonable rescue plan, determine the potential hazards for key investigation, resettle the affected residents appropriately, and avoid secondary disasters [7].
At present, there are four main kinds of landslide information extraction methods based on remote sensing.
(1) Visual interpretation: Visual interpretation is an expert-driven landslide extraction method based on the hue, texture, shape, position, and other characteristics of the image. It uses aerial and high-resolution multispectral imagery to establish the corresponding landslide interpretation marks; the interpreter then inspects the features by eye and combines them with other non-remote-sensing data and geological knowledge to analyze the target [8]. Visual interpretation is the earliest method of extracting landslides using remote sensing and has achieved good results for geological disasters such as landslides, slides, and debris flows [9][10][11][12].
(2) Pixel-oriented landslide extraction methods: The visual interpretation method is widely used in high-precision remote-sensing image information recognition. However, it requires interpreters with rich knowledge and experience, and manual interpretation takes a lot of time. The heavy workload, long cycles, and low efficiency make it difficult to meet the needs of emergency earthquake-relief decision-making.
At present, pixel-oriented landslide extraction techniques have overcome these shortcomings of visual interpretation. There are many classification methods, such as maximum likelihood, support vector machine, and K-means clustering. They determine the category to which the target belongs mainly from each pixel value of the remote-sensing imagery and its transformed forms (such as the spectral derivative, the reciprocal, and the logarithm of reflectance) [13][14][15].
(3) Object-oriented landslide identification methods: The pixel-oriented landslide extraction methods are relatively simple, but they consider only the characteristics of a single pixel and ignore other attributes of related features (such as shape, texture, spatial structure, and context), which leads to the loss of correlation between pixels [13].
The essence of the object-oriented landslide identification methods is to use the main properties of image objects to classify remote-sensing images from high to low scales, which can reduce the error of information extraction by pixel classification and make the results more reasonable [16,17].
(4) Deep-learning landslide identification methods: With the rapid development of satellites, the volume of high-resolution remote-sensing imagery has increased significantly, and extracting targets from massive high-resolution data has become a key technology. The object-oriented identification methods are based on image segmentation, and the results depend on the selection of the segmentation scale, which is often determined by the spatial structure of the imagery [18]. A suitable segmentation scale needs to be determined through repeated experiments [19]. At present, the main segmentation algorithms cannot quickly process complex, large remote-sensing data, so the segmentation efficiency is low [20].
In recent years, deep-learning methods have provided an effective framework for automatic landslide extraction [21]. Compared with the traditional methods, they have the advantage of performing feature learning in an unsupervised or semi-supervised manner and of using hierarchical feature extraction to replace manual recognition [22]. The extraction speed is also greatly improved, which meets the requirements of emergency response to earthquake hazards: manual interpretation of a map covering 40 km² generally takes about three hours or more, whereas deep-learning methods can finish within an hour. Among deep-learning methods such as FCN (Fully Convolutional Networks), Mask R-CNN, U-Net, and the DeepLab series, the U-Net model has a simple and effective structure that can extract target features from a small number of samples. However, because landslides have variable shapes and complex colors, applying U-Net to RGB imagery alone easily leads to missed and false extractions. The model therefore needs to be improved to increase the precision of landslide extraction.
The main objective of this paper is to improve the automatic extraction of earthquake-triggered landslides by improving the U-Net model. The main improvements include the following: (1) adding parameters closely related to landslides (DSM, slope, and aspect) to enrich the feature parameters of the samples; and (2) adding the residual learning unit to the U-Net network structure to improve the accuracy of landslide extraction.

U-Net Model
The U-Net model was proposed by Olaf Ronneberger et al. in 2015. It is based on the improved Fully Convolutional Neural Network (FCN) [23] and was originally applied to medical imagery. Its structure is shown in Figure 1 [24]. The structure is named after the letter U that its diagram resembles. It consists of a compression channel on the left half and an expansion channel on the right. The compression channel is a typical convolutional neural network structure that repeatedly applies two convolutional layers followed by one maximum pooling layer. After each pooling operation, the number of feature channels doubles, from 64 to 128, 256, 512, and 1024. In the expansion channel, a deconvolution operation first halves the number of feature channels; the result is then stitched (concatenated) with the corresponding feature map from the compression channel, which reconstructs a feature map with twice as many channels. Then, two convolutional layers are adopted to extract features. In the final output layer, two convolutional layers are used to map the 64-dimensional features into a two-dimensional output map.
The U-Net model is an improvement and extension of FCN. It follows the FCN idea of image semantic segmentation, using convolutional layers and pooling layers to extract features and then using deconvolution layers to restore the imagery. However, U-Net combines the characteristics of the encoding-decoding structure and skip connections, which makes it more elegant in the following two respects: (1) The U-Net model has an encoding-decoding structure. The compression channel is an encoder used to extract the imagery features. The expansion channel is a decoder used to restore the imagery information. In addition, the U-Net hidden layers have more dimensions, which is conducive to learning more diverse and comprehensive features.
(2) The "U-shaped" structure of the U-Net model makes the cutting and stitching process more intuitive and reasonable. Stitching the high-level features with the low-level features and repeatedly applying convolutions enable the model to combine context and detailed information and to obtain a more accurate feature map using fewer training samples.

Improved U-Net Model
(1) Basic Principles
Because landslides are complex, extraction with the traditional U-Net model suffers from missed and falsely extracted landslides. To address this problem, two improvements are made:
(i) Increasing the spatial information of the samples: Landslides are gravitational landforms on sloping ground, and their formation is closely related to the topographical features, with certain slope requirements [25]. RGB images alone cannot distinguish them effectively from other features. Therefore, the input imagery is expanded to six channels: DSM, slope, and aspect, which carry more spatial information about landslides, are added to the input layers of the model to improve the landslide extraction accuracy.
(ii) Adding the residual network: For traditional deep-learning networks, it is generally thought that the deeper the network, the stronger its nonlinear expression ability and the more information it can learn. However, it was later found in practice that, once the number of layers exceeds a certain depth, making the network deeper makes its performance worse. To solve this problem, He et al. proposed the residual network in 2016 [26].
Its basic principle is as follows. Assume that the input of a neural network unit is x and the expected output is H(x). If x is passed directly to the output, the objective of the neural network unit becomes the residual mapping F(x) = H(x) − x. A neural network unit that learns the residual mapping is called a residual learning unit, and its structure is shown in Figure 2a. Residual learning units usually take two forms: a two-layer residual learning unit and a three-layer residual learning unit [27]. This paper uses a two-layer residual learning unit that contains two 3 × 3 convolutions with the same number of channels (as shown in Figure 2b). In order to extract the landslide features contained in the six channels, residual learning units were added to the U-Net model, which deepens the network so that the six-channel features of the samples can be fully extracted. Compared with the traditional U-Net model, the improved model has stronger feature extraction and classification capabilities. At the same time, it alleviates the problem of vanishing or exploding gradients when the network deepens, better controls the propagation of gradients, and reduces the difficulty of training a deeper U-Net model. The copy channel copies the low-level feature map from the down-sampling path to the up-sampling path and fuses it with the high-level feature maps, which completes the cross-layer connection of context information and enhances the network's ability to learn global features; this can effectively improve the accuracy of segmentation.
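The two-layer residual learning unit described above can be restated compactly as follows. This is a sketch in the notation of He et al. [26], not a formula taken from this paper: W_1 and W_2 stand for the two 3 × 3 convolutions, * for convolution, and σ for the ReLU activation; batch normalization and bias terms are omitted, and the placement of the final activation after the addition is an assumption.

```latex
\begin{aligned}
F(x) &= W_2 * \sigma\left(W_1 * x\right), \\
H(x) &= F(x) + x, \\
\frac{\partial H}{\partial x} &= \frac{\partial F}{\partial x} + I, \\
y &= \sigma\bigl(H(x)\bigr).
\end{aligned}
```

If the optimal mapping is close to the identity, the unit only has to drive F(x) toward zero, and the identity term I in the derivative gives the gradient a direct path through the shortcut. This is the usual argument for why such units make deeper networks easier to train.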
The structure of the improved U-Net network is shown in Figure 3.
(2) Flowchart of the Improved U-Net Model
The main objective of this study was to use the U-Net model to automatically extract the earthquake-triggered landslides. Its flowchart is shown in Figure 4.

Accuracy Evaluation
In order to quantitatively evaluate the performance of the model, the precision, recall, F1 score [28], and mean intersection over union (mIoU) were used.

Mean Intersection over Union
Mean intersection over union is a standard metric for semantic segmentation [29]. It calculates the ratio of the intersection to the union of two sets. In semantic segmentation, these two sets are the true value and the predicted value; in this paper, they are the interpreted landslides and the predicted landslides. The larger the ratio, the higher the accuracy. Its equation is as follows:

$$\mathrm{mIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \frac{P_{ii}}{\sum_{j=0}^{k} P_{ij} + \sum_{j=0}^{k} P_{ji} - P_{ii}}$$

in which k + 1 is the number of categories (in this paper, k = 1); i is the label of the ground truth value; j is the label of the predicted value; P_ii is the number of pixels labeled i and predicted as i; P_ij is the number of pixels labeled i but predicted as j; and P_ji is the number of pixels labeled j but predicted as i.
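As an illustration of how the quantities P_ii, P_ij, and P_ji defined above combine, the following minimal Python sketch computes mIoU from a confusion matrix. The function name `mean_iou`, the example counts, and the choice to skip classes that are absent from both the labels and the predictions are assumptions made for illustration; they are not taken from the paper.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] is the number of pixels labeled i and predicted as j,
    i.e. P_ij in the notation of the text.
    """
    ious = []
    for i in range(len(confusion)):
        p_ii = confusion[i][i]
        labeled_i = sum(confusion[i])                    # sum over j of P_ij
        predicted_i = sum(row[i] for row in confusion)   # sum over j of P_ji
        union = labeled_i + predicted_i - p_ii
        if union > 0:  # assumption: ignore classes absent from both sets
            ious.append(p_ii / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical two-class example (0 = background, 1 = landslide), so k = 1.
example = [[80, 5],
           [5, 10]]
print(mean_iou(example))
```

In this hypothetical example the background IoU is 80/90 and the landslide IoU is 10/20, which shows how a small foreground class can pull the mean down even when most pixels are classified correctly.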

F1 Score
The F1 score is defined as the harmonic mean of Precision and Recall. The higher the value of the F1 score, the better the performance of the model. Its equation is as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
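A short Python sketch of the harmonic-mean definition is given below. It assumes the standard pixel-count definitions Precision = TP / (TP + FP) and Recall = TP / (TP + FN), which the text does not spell out; the function name and the example counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative pixel counts (standard definitions, assumed here)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 landslide pixels found, 10 false alarms, 30 missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(p, r, f1)
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot reach a high F1 score by trading one of precision or recall away for the other.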

Study Area
On August 8th, 2017, a magnitude 7.0 earthquake struck Jiuzhaigou County, Aba Prefecture, Northern Sichuan Province. As shown in Figure 5, the epicenter (33.2° N, 103.82° E) was located in Jiuzhaigou County, Sichuan Province, and the focal depth was 20 km. Due to the steep terrain, a large number of landslides occurred, which blocked or damaged at least 29 roads; the total length of the damaged roads was about 4 kilometers. Investigating the spatial location of these landslides is essential for disaster reduction and relief, as well as for the reconstruction of scenic areas [30].
As shown in Figure 6a, 233 landslides in the area from Shangsizhai to Ganhaizi are used as the test set, which covers 12.6 km² at an imagery spatial resolution of 0.47 m. As shown in Figure 6b, 366 landslides near the Panda Sea, Wuhuahai, and Jianzhuhai are used as the training set, which covers 34.6 km² at an imagery spatial resolution of 0.14 m. In this paper, DSM, slope, and aspect are used as the spatial features of the imagery. Firstly, the pixel values of the three layers are normalized to 0-255. Secondly, the three normalized layers are superimposed to generate a three-channel image containing spatial information.
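The two preprocessing steps just described (normalizing each spatial layer to 0-255 and superimposing the layers) can be sketched as below. The paper does not state which normalization is used, so min-max scaling is an assumption, as are the function names and the tiny example grids.

```python
def normalize_to_byte(layer):
    """Min-max scale a 2-D grid (list of rows) to integers in 0-255.

    Min-max scaling is an assumption; the paper only states the target range.
    """
    values = [v for row in layer for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:  # constant layer: map everything to 0
        return [[0 for _ in row] for row in layer]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in layer]


def stack_channels(*layers):
    """Superimpose equally sized 2-D grids into per-pixel channel tuples."""
    height, width = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != height or any(len(row) != width for row in layer):
            raise ValueError("all layers must share the same shape")
    return [[tuple(layer[r][c] for layer in layers) for c in range(width)]
            for r in range(height)]


# Hypothetical 2 x 2 layers: elevation in metres, slope and aspect in degrees.
dsm = [[2000.0, 2100.0], [2200.0, 2400.0]]
slope = [[0.0, 15.0], [30.0, 45.0]]
aspect = [[0.0, 90.0], [180.0, 360.0]]

spatial = stack_channels(normalize_to_byte(dsm),
                         normalize_to_byte(slope),
                         normalize_to_byte(aspect))
print(spatial[0][0], spatial[1][1])
```

The same `stack_channels` helper could be called with the three RGB bands followed by the three spatial layers to form the six-channel input; how the authors actually combine them is not specified in the text.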
Due to the limited memory of the computer, large imagery cannot be fed directly into the model for training, so the remote-sensing imagery has to be split. Image blocks of 256 × 256 pixels were cut out of the large remote-sensing imagery, and these small blocks were fed into the model for training in batches. This process speeds up the training and accelerates the landslide extraction. Because convolution has a strong receptive field and can integrate contextual information, the dimensions of the split blocks have little effect on the landslide extraction. This paper uses the same method to split the original, labeled, and test imagery. The detailed splitting process of the test image is shown in Figure 7.
In initial training, the accuracy was high only on the training set, and it was difficult to improve the accuracy on the test set, so data enhancement (augmentation) was applied to the samples. After screening, the enhancement methods finally selected in this paper are 90° rotation, 270° rotation, and horizontal flipping.
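The splitting and enhancement steps can be sketched in plain Python as follows. Several details are assumptions, since the text does not specify them: blocks do not overlap, incomplete blocks at the right and bottom edges are padded with a fill value, and rotations are taken clockwise. All function names are hypothetical.

```python
def split_into_tiles(image, tile=256, fill=0):
    """Cut a 2-D grid into non-overlapping tile x tile blocks (row-major).

    Edge blocks are padded with `fill`; this padding is an assumption.
    """
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = []
            for r in range(top, top + tile):
                if r < height:
                    row = list(image[r][left:left + tile])
                    row += [fill] * (tile - len(row))
                else:
                    row = [fill] * tile
                block.append(row)
            tiles.append(block)
    return tiles


def rotate90(block):
    """Rotate a block by 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]


def rotate270(block):
    """Rotate a block by 270 degrees clockwise (90 degrees counterclockwise)."""
    return [list(row) for row in zip(*block)][::-1]


def flip_horizontal(block):
    """Mirror a block left to right."""
    return [row[::-1] for row in block]


def augment(block):
    """Return the original block plus the three selected variants."""
    return [block, rotate90(block), rotate270(block), flip_horizontal(block)]


# Small demonstration with 2 x 2 tiles instead of 256 x 256.
image = [[r * 5 + c for c in range(5)] for r in range(3)]
tiles = split_into_tiles(image, tile=2)
print(len(tiles), tiles[0], augment(tiles[0]))
```

The same transformation has to be applied to an image block and to its label block so that the two stay aligned; with these three variants each labeled sample yields four training samples.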

Experimental Environment
The hardware environment of this experiment is as follows: the graphics card is an RTX 2080 Ti, the processor is an Intel i7-8700K, and the memory is 32 GB.
The software environment of this experiment is as follows: the U-Net model and the improved model are implemented in PyTorch. PyTorch is a lower-level API that focuses on processing array expressions directly, which gives greater flexibility and better control.
Based on the software and hardware environment and the datasets, three experiments were designed for comparison: the traditional U-Net model trained and tested on the three-channel RGB dataset; the traditional U-Net model trained and tested on the six-channel dataset; and the improved U-Net model trained and tested on the six-channel dataset.

Results
Landslides from Shangsizhai to Ganhaizi were extracted with the three experimentally trained models, and the results are as follows. Figure 8a is the landslide distribution map obtained by manual visual interpretation, which provides a reference for the model extraction.
Figure 8b shows the results of the U-Net + three-channel extraction. The false and missed extractions are so obvious that the results are unsatisfactory. The main cause of false extraction is roads: some roads in mountainous areas, or slight accumulations of soil on the roads, are similar to landslides in color (see Figure 9a). Some houses of similar color are also incorrectly classified as parts of landslides (see Figure 9b). Missed extractions arise mainly because the boundaries of the landslides cannot be identified well, so most of them occur around the boundaries. For some larger landslides, a large amount of vegetation covering part of the surface also causes missed extraction (see Figure 9c).
Figure 8c shows the results of the U-Net + six-channel extraction. The number of falsely extracted landslides decreases because of the added spatial information. The ability to distinguish roads is enhanced, and some roads that had been mistaken for landslides are eliminated. However, missed landslides still exist at the boundaries of the landslides and of the image blocks.
Figure 8d shows the results of U-Net + six channels + ResNet. The overall extraction is satisfactory; adding spatial information has corrected the misclassified roads and boundaries to a large extent.
In order to quantitatively evaluate the performance of the model and the extraction results of the test area, each accuracy index is calculated from the confusion matrix of the extracted results. As shown in Table 2, Precision and Recall are obtained when the threshold is adjusted to maximize F1. The improved model proposed in this paper (U-Net + six channels + ResNet) has a precision of 91.3%, a recall of 95.4%, and an mIoU of 87.5%. These values are 13.8%, 13%, and 17.1% higher, respectively, than those of the traditional U-Net + three-channel model.
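The threshold adjustment mentioned above can be illustrated with the following sketch, which sweeps candidate thresholds over per-pixel landslide probabilities and keeps the one with the highest F1. The grid of thresholds (0.05 to 0.95 in steps of 0.05), the function name, and the toy probabilities are assumptions; the paper does not describe how the threshold was searched.

```python
def best_f1_threshold(probabilities, labels, thresholds=None):
    """Return the threshold with the highest F1 and its precision and recall.

    probabilities: predicted landslide probability per pixel (flattened).
    labels: ground-truth class per pixel (1 = landslide, 0 = background).
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(5, 100, 5)]  # assumed grid
    best = None
    for t in thresholds:
        tp = fp = fn = 0
        for p, y in zip(probabilities, labels):
            predicted = p >= t
            if predicted and y == 1:
                tp += 1
            elif predicted and y == 0:
                fp += 1
            elif not predicted and y == 1:
                fn += 1
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        if best is None or f1 > best["f1"]:  # keep the first best threshold
            best = {"threshold": t, "precision": precision,
                    "recall": recall, "f1": f1}
    return best


# Hypothetical six-pixel example.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
truth = [1, 1, 0, 1, 0, 0]
print(best_f1_threshold(probs, truth))
```

Reporting precision and recall at the F1-maximizing threshold puts the three models on a comparable operating point, rather than comparing them at an arbitrary fixed cut-off.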

Conclusions and Prospect
In this paper, a landslide dataset is labeled with post-earthquake aerial remote-sensing imageries, and an improved U-Net model is proposed for earthquake-triggered landslides extraction.
(1) Adding spatial information to the samples reduced the number of false and missed landslide extractions by the U-Net model.
In order to improve the accuracy of the extraction, the input imagery was extended from three channels to six channels. DSM, slope, and aspect, which contain spatial information, were added to the imagery so that landslides can be better distinguished from roads, houses, and other features.
(2) Adding a residual learning unit to the U-Net model improved the accuracy. In order to fully extract the landslide features from the six channels, a residual learning unit was added to the traditional U-Net model to deepen the network. The results show that the improved U-Net model (U-Net + six channels + ResNet) performs well: the precision on the test set is 91.3%, the recall is 95.4%, and the mIoU is 87.5%. Compared with the traditional U-Net model, these values are improved by 13.8%, 13%, and 17.1%, respectively.
(3) The improved U-Net model is feasible for extracting earthquake-triggered landslides. Taking the Jiuzhaigou earthquake-triggered landslides as an example, the results show that the improved U-Net model (U-Net + six channels + ResNet) is feasible and effective for landslide extraction from high-resolution remote-sensing imagery. Compared with other methods, the model needs only the unmanned aerial vehicle (UAV) remote-sensing data of the post-earthquake area, which saves the time of acquiring pre-earthquake remote-sensing imagery and avoids the situation in which no suitable data are available after the earthquake. After an earthquake such as the Jiuzhaigou event, the following three steps are taken for quick response. Firstly, emergency response: it takes about an hour to prepare the UAV and arrive at the area, and the first batch of data, including high-resolution remote-sensing imagery and DEM, is stored and distributed within three hours. Secondly, mosaicking and geometric correction: it takes no more than 30 minutes to mosaic 500 images (covering about 7-8 km²), and the geometric correction efficiency is 20 GB/h, with an error of less than 1 pixel in the plain and less than 3 pixels in the mountains. Thirdly, image processing and result extraction: using the proposed method, it takes 2-4 hours to make the landslide map of the severely afflicted area and 6-8 hours to complete the whole area.
This research mainly serves earthquake emergency departments, including the Sichuan Earthquake Administration, the Xinjiang Earthquake Administration, and the Gansu Earthquake Administration. They have used the high-resolution remote-sensing data and these results for quick response and have given improvement suggestions to the proposed method in this paper.
Due to the small amount of data on seismic landslides, there are still some errors in the extracted landslides. Expanding the training datasets to include remote-sensing imagery of different types and resolutions is left for future work.
Author Contributions: All authors contributed in a substantial way to the manuscript. P.L. and Q.W. conceived of, designed, performed the research and wrote the manuscript. Y.W., Y.C. and J.X. made contributions to the design of the research, data analysis and manuscript modifications. All authors read and approved the submitted manuscript.