Deep Learning for Landslide Detection and Segmentation in High-Resolution Optical Images along the Sichuan-Tibet Transportation Corridor

Abstract: Landslides pose a great potential risk to the Sichuan-Tibet Transportation Project, and extensive landslide inventory and mapping are essential to prevent and control geological hazards along the Sichuan-Tibet Transportation Corridor (STTC). Recently proposed landslide detection methods have mainly focused on new landslides in areas with high vegetation coverage, and the automatic detection of old landslides in optical images remains challenging. In this paper, two methods, namely mask region-based convolutional neural networks (Mask R-CNN) and transfer learning Mask R-CNN (TL-Mask R-CNN), are presented for detecting and segmenting new and old landslides, respectively. An optical remote sensing dataset for landslide recognition along the Sichuan-Tibet Transportation Corridor (LRSTTC) is constructed as an evaluation benchmark. Our experimental results show that the recall rate and F1-score of the proposed method for new landslide detection reach 78.47% and 79.80%, respectively. Transfer learning is adopted to detect old landslides, and our experimental results show that the evaluation indices can be further improved by about 10%. Furthermore, TL-Mask R-CNN has been applied to identify ice avalanches based on the characteristics of landslides. The results indicate that the proposed methods can detect and segment landslides effectively along the STTC with the constructed LRSTTC dataset, which is essential for studying and preventing landslide hazards in mountainous areas.


Introduction
Landslides are among the major geological disasters worldwide and are usually triggered by rainstorms, earthquakes, and human activities [1][2][3]. They cause significant damage to land, loss of natural resources, destruction of infrastructure, and even loss of human lives [4][5][6]. Landslides are widespread on the Qinghai-Tibet Plateau owing to its complex landforms and geological structures, posing potential risks to the construction and operational safety of the Sichuan-Tibet Transportation Project [7,8]. Therefore, it is vital to investigate and detect landslides along the Sichuan-Tibet Transportation Project.
The development of landslide detection methods has gone through two stages: human-centric approaches and compute-centric methods, the latter including pixel-based, object-oriented, and deep learning methods.
Traditional landslide survey methods are mainly human-centric approaches, including field surveys and visual interpretation [2]. Field investigation is time-consuming and difficult to apply over a large area. With advances in satellite imagery, high-resolution (HR) images such as Sentinel-2 [9], WorldView [10], Gaofen-2 [11], and QuickBird [12] have been widely used in landslide interpretation. Visual interpretation is based on the hue, texture, and shape characteristics of landslides in HR images [13,14]. It avoids dangerous field surveys but relies on expert experience. Overall, human-centric approaches provide reliable results, but they are laborious and inefficient.
Pixel-based [15] and object-oriented approaches [16][17][18], combined with machine learning methods including support vector machines (SVM) [19,20], random forests (RF) [21][22][23], and Markov random fields [12,24], have been proposed for automatic landslide detection in HR images. Pixel-based methods process images pixel by pixel with machine learning classifiers such as maximum likelihood [25,26] and K-nearest neighbor classification [27]. They are sensitive to noise and cannot use the spatial features of HR remote sensing images effectively. Object-oriented classification makes full use of the spatial, texture, and spectral information of images by grouping adjacent pixels into objects to identify elements of interest [28]. However, with object-oriented approaches it is difficult to determine a reasonable segmentation scale for different regions, which affects the landslide extraction results.
Deep learning is now increasingly used in landslide detection [29,30], where convolutional neural networks (CNN) [31,32] and U-Net [33,34] are the most common choices. Moreover, Long et al. [35] used deep belief networks (DBN) and convolutional deep belief networks (CDN) to monitor high-level landslides in the Jinsha River. Liu et al. [36] applied long short-term memory (LSTM) neural networks, RF, and gated recurrent units to predict slope displacement in the Three Gorges reservoir area. In addition, geological and terrain data have been combined with deep learning to extract landslides [37,38], such as terrestrial laser scanning (TLS) [39], digital terrain models (DTM) [40], and digital elevation models (DEM) [41][42][43]. Overall, deep learning can extract landslides with robust spatial and spectral features using a hierarchical learning framework [44]. Compared with human-centric, pixel-based, and object-oriented approaches, deep learning methods, with more hidden layers and stronger feature extraction ability, have great potential for landslide identification across large regions.
The above automatic methods usually detect new landslides with high vegetation coverage but are rarely applied to other types such as ice avalanches [45][46][47], and their performance still needs to be improved. Furthermore, it remains a challenge to detect old landslides, which are more concealed than new landslides in HR images and may become unstable again when the external environment changes (Figure 1). It is therefore necessary to detect old landslides, rather than only new landslides, in optical images. Although airborne light detection and ranging (LiDAR) can remove surface vegetation and make old landslides easier to detect [48,49], it is too expensive to be applied over a large area. In addition, training deep learning models requires large amounts of data. However, to the best of our knowledge, the only publicly available remote sensing dataset for landslide detection is the Bijie Landslide Dataset [50], which is not sufficient for automatic landslide detection in optical images.
To fill this gap, this study constructed an optical remote sensing dataset for Landslide Recognition along the Sichuan-Tibet Transportation Corridor (LRSTTC). In addition to introducing mask region-based convolutional neural networks (Mask R-CNN) to detect new landslides with high vegetation coverage, we proposed a transfer learning Mask R-CNN (TL-Mask R-CNN) to detect old landslides and ice avalanches with distinctive features in optical images based on the characteristics of new landslides; it uses only a small amount of data to fine-tune the network parameters on a new task. The main contributions of this paper are as follows:
1) An optical remote sensing dataset along the Sichuan-Tibet Transportation Corridor (STTC) was constructed as a benchmark for landslide detection and segmentation, addressing the need for available landslide identification datasets.
2) Transfer learning was applied to identify old landslides and ice avalanches based on the model trained for new landslides. Previous studies usually focused on identifying landslides that occurred recently, and there are few related studies on the automatic identification of old landslides and ice avalanches in optical images.
3) Landslides along the STTC were detected and segmented using Mask R-CNN and the proposed TL-Mask R-CNN. This is a challenging task across different geological structures in such a large area, and it is of great significance for the operation of the Sichuan-Tibet Railway and for the safety of people's lives.
The remainder of this paper is organized as follows. The study area and the LRSTTC dataset are described in Sections 2.1 and 2.2, respectively. Two methods for new and old landslide detection and segmentation along the STTC are presented in Section 2.3. The results and analysis are given in Section 3. The discussion and the conclusion are given in Sections 4 and 5, respectively.

Study Area
The study area lies along the STTC from Chengdu to Lhasa in southwest China, which is the most important channel connecting Sichuan Province and Tibet (Figure 2). Covering an area of 779,627.3 km², the study area crosses three levels of terrain in China: the Chengdu Plain, the Hengduan Mountains, and the Tibetan Plateau, with altitudes ranging from 72 m to 7388 m. In addition to steep terrain, the area is affected by well-developed faults, high rainfall variability, and fragile ecosystems [51]. Hence, landslides occur frequently in this area and seriously affect the normal operation of the traffic lines. Through visual interpretation of Google Earth, Sentinel-2, and Landsat 8 imagery and a long period of field investigation, 924 landslides were identified, as shown in Figure 2. These landslides range in size from 1232 m² to 216,231,680 m², with an average elevation of 3573 m, and most of them are located near rivers and railway lines.

Constructing LRSTTC Dataset
Given the strong demand for reliable landslide datasets to support automated and efficient landslide recognition, early warning, risk assessment, and post-disaster recovery, we created the LRSTTC dataset. Its samples can be roughly divided into two types, new landslides and old landslides, based on the time of occurrence and on the spectral and texture characteristics of the landslides in the images [52][53][54][55].
(1) New landslides: These landslides occurred recently, and their main scarp, body, and toe are clearly visible. There is a clear sliding surface, and the color of the landslide is obviously different from that of the surrounding features (Figure 3a1-a8).
(2) Old landslides: These landslides occurred earlier. The color of the landslide body is not significantly different from that of the surrounding features, and vegetation has even grown on some old landslides (Figure 3b2). However, the general shape of the landslide, the back wall (Figure 3b5), and the deposits at the front of the landslide (Figure 3b4) can still be seen in the optical image. Some man-made buildings are located on these deposits, which poses a serious hazard.

In this study, Google Earth images were used to generate the LRSTTC dataset through human visual interpretation. (1) New landslides often correspond to areas with obvious spatial and/or temporal changes in the textural and/or spectral features of optical images, and hence they can be easily identified from Google Earth images. (2) Old landslides often exhibit certain geomorphological features (e.g., scarps, flanks, cracks, and ridges) and can be identified on that basis [53]. Every sample in the LRSTTC dataset was double-checked by landslide experts, and some landslides were also confirmed in the field (Figure 4).
Remote Sens. 2022, 14, 5490
Based on the landslides delineated by expert visual interpretation and field investigation, the LRSTTC dataset was built from Google Earth images. It contains a total of 924 landslide samples, including 740 new landslides and 184 old landslides. Every sample contains the landslide image, its mask, and the marked coordinate positions, which are stored in a JSON file. At present, very few geological disaster datasets are publicly available for deep learning. The LRSTTC dataset provided by this study can be used for developing and evaluating methods for landslide classification, detection, and segmentation. In this paper, we mainly focus on detecting new landslides and old landslides with obvious features in optical images.

Mask R-CNN for Landslide Detection and Segmentation along the STTC
Deep learning models for object detection are mainly divided into two-stage and one-stage methods. The performance of two-stage frameworks, including region-based convolutional neural networks (R-CNN) [56], Fast R-CNN [57], and Faster R-CNN [58], has steadily improved. Mask R-CNN [59] builds on Faster R-CNN by adding an object mask as a third output for each region of interest (ROI), which enables instance segmentation. The proposed landslide recognition Mask R-CNN has five parts: the feature extraction network, the feature pyramid network (FPN) [60], the region proposal network (RPN), ROI alignment, and the functional network, as shown in Figure 5.
As the backbone of the feature extraction network, the residual neural network (ResNet) [61] trains deep networks efficiently owing to its shortcut connection (Figure 6). In the shortcut connection, the input x is added directly to the output, which preserves the integrity of the information and mitigates vanishing or exploding gradients in deep network training. With the bottom-up and top-down structure of the FPN, higher-level features are passed down to complement the lower-level features, so that high-resolution, strongly semantic features can be obtained, which facilitates the detection of small targets.
The RPN, where region proposals are generated, is divided into two branches (Figure 7). In the top branch of Figure 7, the anchors are classified as positive or negative by softmax; in the bottom branch, the bounding box regression offsets for the anchors are calculated to obtain accurate proposals. In the proposal layer, the positive anchors and the corresponding box offsets are combined, and proposals that are too small or out of bounds are eliminated.
ROI alignment addresses the misalignment caused by quantization. It uses bilinear interpolation and pooling to transform features into maps of the same size, which avoids the errors caused by the two coordinate-rounding (quantization) steps and improves the accuracy of bounding box regression (Figure 8).
The functional network consists of two parts: the backbone, which extracts features, and the head network, which performs classification, box regression, and mask prediction for each ROI. First, the ROI is input into the backbone, which generates 7 × 7 × 1024 ROI features. The ROI features are then sampled to 2048 channels and input to the head, which consists of two branches. One branch is for classification and box regression, and the other generates the corresponding masks. Therefore, the loss function of each ROI consists of classification, bounding box, and mask terms, as shown in Equation (1).
where $L$ represents the overall loss function value of the Mask R-CNN model, $N_{cla}$ represents the total number of samples, and $N_{reg}$ represents the size of the feature layer. $p_i$ and $p_i^*$ represent the probability of anchor prediction to the target and background, respectively. $L_{mask}$ represents the average binary cross-entropy loss, $L_{cla}$ represents the logarithmic loss of $p_i$ and $p_i^*$, and $L_{box}$ represents the loss value of the bounding box regression. $t_i$ is a vector representing the offset predicted by the anchor during the training phase of the RPN, and $t_i^*$ is the actual offset.
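Equation (1) itself does not appear in this text. As a hedged reconstruction from the symbols defined above, and assuming the model follows the usual Faster R-CNN and Mask R-CNN formulation, the loss may take a form like the following. The balancing weight $\lambda$ and the summation over anchors $i$ are assumptions that are not stated in the source.

```latex
L = \frac{1}{N_{cla}} \sum_{i} L_{cla}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_{i} p_i^* \, L_{box}(t_i, t_i^*)
  + L_{mask}
```

Under this reading, the first two terms are the classification and box regression losses, and $L_{mask}$ is the mask term added by the mask branch.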

Transfer Learning for Old Landslide Recognition along the STTC
Transfer Learning
Machine learning generally assumes that training data and test data are drawn from the same feature space with the same distribution. When the distribution of the test data changes, the model has to be retrained with new data. Because labeling data is expensive, this makes machine learning, especially supervised learning, difficult to use in practical applications. Therefore, how to make full use of previously annotated data while maintaining model accuracy on a new task is a major challenge.
Transfer learning (TL) was first proposed in discriminability-based transfer (DBT) to solve the above problems [62]. It assumes that the target domain and the source domain have different distributions, and the knowledge extracted from the source domain is applied to the target domain. In Equations (2)-(3), $D_s$ represents the source domain and $D_t$ represents the target domain.
where $x$ denotes the feature space, $r$ is the label of $x$, $y$ denotes the target space, $m$ and $n$ are the numbers of samples, and $i$ and $j$ are the indices of the current samples.
According to the form of knowledge transfer, transfer learning is divided into four categories: instance-based, feature-based, parameter-based, and relational-based [63]. Instance-based transfer learning reuses part of the data in $D_s$ by reweighting it in $D_t$. Feature-based transfer learning obtains typical features from $D_s$, encodes the knowledge in the form of features, and transfers it from $D_s$ to $D_t$. Parameter-based transfer learning assumes that if $D_s$ and $D_t$ follow a similar prior distribution, part of the parameters and the model structure can be shared. Relational-based transfer learning assumes that the relationships in $D_s$ and $D_t$ are the same, so that knowledge can be transferred between related domains.

TL-Mask R-CNN
The number of landslide samples we collected along the STTC is small, whereas training a deep learning model requires a large amount of data. Although old landslide samples are insufficient in $D_t$, a large amount of new landslide data is available in $D_s$. TL emphasizes the transfer of knowledge between different but similar domains, tasks, and distributions. New and old landslides have many similarities, for example, their shapes in HR remote sensing images, but there are still differences between them. Compared with new landslides, old landslides have a color similar to that of the surrounding ground objects, and some old landslides are even covered by vegetation. Considering these factors, we combined TL and Mask R-CNN for old landslide detection and proposed the transfer learning Mask R-CNN (TL-Mask R-CNN).
First, the weights trained on new landslides were selected as the initial weights, because they encode features more similar to landslides than the weights trained on the COCO dataset. Second, all parameters except those of the functional network are frozen, and only the functional network is trained for detection, classification, and segmentation (Figure 9), which effectively reduces the number of trainable parameters and preserves more common landslide characteristics in the trained model. The common features extracted from new landslides can thus be used effectively for training even though the old landslide samples collected along the STTC are insufficient. Third, shape features extracted from new landslides were added to model training in the target domain, because new landslides, old landslides, and ice avalanches have similar shapes even though they are composed of different materials. Finally, given the lack of sample data in the target domain, we carried out data augmentation, including flipping, rotation, scaling, cropping, translation, and changes in image brightness and contrast.
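The freezing step described above can be illustrated with a minimal, framework-independent sketch. It only shows how parameters might be marked as frozen or trainable by name. The parameter names, the `heads.` prefix, and the parameter sizes are hypothetical and are not taken from the authors' implementation; in a framework such as PyTorch, the resulting flags would typically be applied through each parameter's `requires_grad` attribute.

```python
def select_trainable(param_names, trainable_prefixes=("heads.",)):
    """Map each parameter name to True (fine-tuned) or False (frozen).

    Only parameters whose name starts with one of `trainable_prefixes`
    are trained; everything else keeps the weights learned on new landslides.
    """
    prefixes = tuple(trainable_prefixes)
    return {name: name.startswith(prefixes) for name in param_names}


def count_trainable(param_sizes, flags):
    """Sum the sizes of all parameters that are marked as trainable."""
    return sum(size for name, size in param_sizes.items() if flags[name])


# Hypothetical parameter groups and sizes, for illustration only.
param_sizes = {
    "backbone.conv1.weight": 9408,
    "fpn.lateral.weight": 65536,
    "rpn.cls.weight": 768,
    "heads.classifier.weight": 2048,
    "heads.box.weight": 8192,
    "heads.mask.weight": 256,
}

flags = select_trainable(param_sizes)
trainable = count_trainable(param_sizes, flags)
total = sum(param_sizes.values())
print(f"training {trainable} of {total} parameters")
```

In this toy layout only the three `heads.` groups are updated, which mirrors the idea of training the functional network while the backbone, FPN, and RPN stay fixed.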

Experimental Environment
The hardware configuration in this study was as follows: an Intel(R) Core(TM) i7-8700K CPU, 64 GB of RAM, and an NVIDIA RTX 2080Ti GPU. The deep learning frameworks were TensorFlow and PyTorch, and other software included PyCharm, VC++, Anaconda, and Python. During image preprocessing, the slice size was set to 512 × 512 and the label name was set to landslide. The dataset was divided into training, validation, and test sets at a ratio of 8:1:1. The LRSTTC dataset will be submitted at https://github.com/Jiang-CHD-YunNan/LRSTTC (accessed on 31 November 2022). In the experiment, the number of training epochs was 10, the number of iterations per epoch was 1000, and the initial learning rate was 0.001. The weight decay coefficient was 0.005, and the momentum factor was 0.9.
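As a minimal sketch of the 8:1:1 division described above, the following function shuffles a list of samples and cuts it into training, validation, and test subsets. The random shuffle, the fixed seed, and the rounding rule are assumptions; the source does not state how samples were assigned to each subset or whether new and old landslides were split separately.

```python
import random


def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle samples and split them into train/val/test by integer ratios."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # assumed: random assignment with a fixed seed
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# 740 is the number of new landslide samples reported for the dataset.
train, val, test = split_dataset(range(740))
print(len(train), len(val), len(test))
```

With 740 samples this yields subsets of 592, 74, and 74 samples.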

Figure 9. Architecture of the TL-Mask R-CNN.

Evaluation Indices
The confusion matrix, as the basic index, intuitively counts the detection results, but on its own it cannot accurately evaluate the quality of a model. Two secondary evaluation indicators, Precision (P) and Recall (R), are therefore used to further evaluate the proposed methods. P is the proportion of detections that are correct, and R is the proportion of all positive samples that are correctly detected. They are defined in Equations (4) and (5):

$$P = \frac{TP}{TP + FP} \qquad (4)$$

$$R = \frac{TP}{TP + FN} \qquad (5)$$

where TP (True Positive) indicates that the actual sample is positive and the prediction is positive, FP (False Positive) indicates that the actual sample is negative but the prediction is positive, and FN (False Negative) indicates that the actual sample is positive but the prediction is negative. Recall tends to be low when precision is high, while precision tends to be low when recall is high. Only on a relatively simple dataset will both precision and recall be high. To measure the quality of the detection model comprehensively, the F1-score is used, as shown in Equation (6):

$$F1 = \frac{2 \times P \times R}{P + R} \qquad (6)$$

where R represents the recall and P represents the precision.
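The detection indices defined above can be computed directly from the TP, FP, and FN counts. The following is a minimal sketch: the function names are illustrative, and returning 0.0 when a denominator is zero is an assumed convention that the text does not specify.

```python
def precision(tp, fp):
    """Proportion of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 on empty input is an assumption


def recall(tp, fn):
    """Proportion of positive samples that are detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    tp, fp, fn = 80, 20, 10
    p, r = precision(tp, fp), recall(tp, fn)
    print(f"P={p:.4f} R={r:.4f} F1={f1_score(p, r):.4f}")
```

For example, `f1_score(0.8118, 0.7847)` evaluates to approximately 0.7980, which matches the F1-score reported for new landslide detection.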
In this paper, landslide shape segmentation is carried out at the same time as landslide detection. The mean pixel accuracy (mPA) and the mean intersection over union (mIoU) are introduced to evaluate the segmentation results, as shown in Equations (7) and (8). mPA is the ratio of correctly classified pixels to the total number of pixels in each category, averaged over all categories, which reflects the accuracy of the segmentation model. mIoU is the average, over all categories, of the ratio between the intersection and the union of the prediction result and the real mask.

$$mPA = \frac{1}{n} \sum_{i=1}^{n} \frac{P_{ii}}{\sum_{j=1}^{n} P_{ij}} \qquad (7)$$

$$mIoU = \frac{1}{n} \sum_{i=1}^{n} \frac{P_{ii}}{\sum_{j=1}^{n} P_{ij} + \sum_{j=1}^{n} P_{ji} - P_{ii}} \qquad (8)$$

where n is the number of predicted categories, P_{ii} is the number of pixels of original category i predicted as category i, P_{ij} is the number of pixels of original category i predicted as category j, and P_{ji} is the number of pixels of original category j predicted as category i.
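The two segmentation indices can be sketched from a pixel-level confusion matrix. This is an illustration under stated assumptions: `conf[i][j]` is taken to count pixels of original category i predicted as category j, the mean is taken over all rows of the matrix (so whether the background counts as a category depends on how the matrix is built), and a category with no pixels contributes 0.0.

```python
def mean_pixel_accuracy(conf):
    """Average over categories of correct pixels / all pixels of that category."""
    n = len(conf)
    per_class = []
    for i in range(n):
        row_total = sum(conf[i])  # pixels whose original category is i
        per_class.append(conf[i][i] / row_total if row_total else 0.0)
    return sum(per_class) / n


def mean_iou(conf):
    """Average over categories of intersection / union of prediction and truth."""
    n = len(conf)
    per_class = []
    for i in range(n):
        row_total = sum(conf[i])                       # pixels truly in category i
        col_total = sum(conf[j][i] for j in range(n))  # pixels predicted as category i
        union = row_total + col_total - conf[i][i]
        per_class.append(conf[i][i] / union if union else 0.0)
    return sum(per_class) / n


if __name__ == "__main__":
    # Hypothetical 2-category matrix (background, landslide), for illustration only.
    conf = [[90, 10],
            [20, 80]]
    print(f"mPA  = {mean_pixel_accuracy(conf):.4f}")
    print(f"mIoU = {mean_iou(conf):.4f}")
```

For the hypothetical matrix above, the per-category pixel accuracies are 0.90 and 0.80, giving an mPA of 0.85, while the per-category IoU values are 90/120 and 80/110.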

New Landslide Detection Results
Two backbone networks, ResNet-50 and ResNet-101, were considered for Mask R-CNN. The landslide extraction performance of the two backbones was tested separately, and the results are shown in Figure 10 and Table 1. Figure 10 shows that Mask R-CNN can both identify landslides and segment the shape of each landslide individually. Table 1 shows that the model trained with the ResNet-101 backbone performs better overall than the model trained with the ResNet-50 backbone. ResNet-101 was therefore chosen as the Mask R-CNN backbone for landslide detection and segmentation. The pixel accuracy of landslide segmentation is 87.71%, the mIoU is 77.94%, the precision of landslide detection is 81.18%, the recall is 78.47%, and the F1-score is 79.80%.

Old Landslide Detection Results
With geological movement and human activities, an old landslide may be reactivated and cause great damage because the accumulation body on its surface is unstable. Traditional detection of old landslides relies mainly on expert experience, expensive LiDAR data, etc. TL-Mask R-CNN was applied to detect old landslides with some visible features, and the results are shown in Figure 11 and Table 2.
As Figure 11a1-a3 shows, it is difficult to identify old landslides manually without sufficient remote sensing interpretation experience. Many old landslides are hard to distinguish from the surrounding ground objects because they occurred a long time ago and are covered by heavy vegetation. Figure 11c1-c3 shows that TL-Mask R-CNN can detect old landslides with some visible features, but the landslide boundaries are not precise. The reason is that manually drawing landslide shapes during labeling is a systematic task that requires a comprehensive and integrated analysis of all geological conditions, such as pore water, slope structure, location, nature of fractures, excavation, physical exploration, ground investigation, etc. Different data need to be cross-checked carefully before a reasonable landslide surface, not only the sliding part, is drawn as the labeled ground truth. It is therefore difficult to segment the landslide shape accurately, as in Figure 11b2,c2.
We compared the proposed TL-Mask R-CNN with typical semantic segmentation deep learning models, including Deeplabv3+ [64], Unet [33], and Unet++ [65]. The experimental parameters were set as follows: the number of epochs was 100, the batch size was 4, and the learning rate was 0.0001. Adam (Adaptive Moment Estimation) was used as the optimizer, and the images were cropped to 512 × 512 pixels as input for model training.
As shown in Table 2, TL-Mask R-CNN performed best for old landslide detection. All indicators increased by 10% or more with the transfer learning method compared with the Mask R-CNN method. The scores of the pixel-based indicators are difficult to improve further because the labeled landslide shapes were usually larger than the actual shapes of the sliding part. Another reason is that the Sichuan-Tibet Transportation Project crosses the first and second terrain steps of China, with a total length of more than 1500 km, and the shape, color, and texture of landslides differ greatly among geological areas, which increases the difficulty of landslide identification.

Validation in the Ya'an-Kangding Section of the STTC
In this paper, landslides recorded in field surveys along the STTC were used as training samples. However, the scope of manual field investigation is limited. To further detect unknown landslides along the STTC automatically and to verify our proposed method, a verification experiment was carried out in the Ya'an-Kangding section of the Sichuan-Tibet Transportation Project. The verification area extends from 29°12′N to 30°00′N and from 101°24′E to 103°07′E, covering an area of 13,487.41 km².
First, high-resolution images of the verification area were obtained from Google Earth and cropped into 684 images with a spatial resolution of 1 m × 1 m. Then, we selected a small number of samples to fine-tune the trained model and carried out landslide detection in the verification area. A total of 588 landslides were automatically detected, of which 470 were correct (TP) and 137 were incorrectly detected as landslides (FP). In addition, 69 landslides were not detected (FN). Finally, the precision and recall were calculated to be 79.9% and 87.2%, respectively. The detection results are shown in Figure 12.
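The precision and recall reported for the verification area can be reproduced from the stated counts. The sketch below is a worked check rather than the evaluation code used in this study; it assumes that precision is computed over the 588 total detections, which is the reading that yields the reported 79.9%.

```python
# Counts reported for the Ya'an-Kangding verification area.
detected = 588   # total automatic detections
tp = 470         # detections confirmed as landslides
fn = 69          # landslides that were not detected

# Assumption: precision is taken over all detections (TP / detected).
precision = tp / detected
# Recall is taken over all actual landslides (TP / (TP + FN)).
recall = tp / (tp + fn)

print(f"precision = {precision:.1%}")
print(f"recall    = {recall:.1%}")
```

Running it prints a precision of 79.9% and a recall of 87.2%, in agreement with the values given above.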
Areas A and B contain new landslides that have just occurred and show a significant color difference from the surrounding ground features. In area A, where there is less vegetation, two landslides were not detected (framed in blue) and two ground objects were incorrectly detected (framed in yellow) in Figure 12a2. In area B, with high vegetation cover, all landslides were detected automatically. Compared with new landslides, some old landslides were covered in vegetation and were hard to recognize. In the verification area, TL-Mask R-CNN was applied to detect old landslides with obvious collapse areas or accumulation bodies in high-resolution images. Figure 12c1 shows that the old landslide in area C has obviously collapsed on the optical image. Similarly, Figure 12d1 shows that the old landslide has an obvious accumulation body. Figure 12c2,d2 shows that our model detected the locations of the old landslides, but it is difficult to segment their shapes accurately. To summarize, our model effectively detects landslides with distinct features on optical images.

Ice Avalanche Detection
To the best of our knowledge, there are almost no results on the identification of ice avalanches using deep learning in previous studies. A total of 103 ice avalanches were found along part of the Sichuan-Tibet Transportation Project through visual interpretation and field investigation. Because of the small number of samples, we adopted TL-Mask R-CNN for training: the landslide model previously trained on the Sichuan-Tibet Transportation Project served as the base model, and only 80 samples were used to train the head network. Some of the detection and segmentation results for ice avalanches are shown in Figure 13.
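Training only the head network on top of a previously trained model amounts to freezing the remaining parameters. The following framework-independent sketch illustrates the idea; the function `select_trainable` and all parameter names and prefixes are hypothetical and do not reflect the layer names of the implementation used in this study.

```python
def select_trainable(param_names, head_prefixes):
    """Return {name: trainable}, marking only head parameters as trainable."""
    prefixes = tuple(head_prefixes)
    return {name: name.startswith(prefixes) for name in param_names}


if __name__ == "__main__":
    # Hypothetical parameter names; a real model exposes its own naming scheme.
    names = [
        "backbone.layer1.conv1.weight",
        "backbone.layer4.conv3.weight",
        "rpn.conv.weight",
        "box_head.fc.weight",
        "mask_head.conv.weight",
    ]
    # Assumption: the RPN, box head, and mask head together form the "head network".
    flags = select_trainable(names, head_prefixes=("rpn.", "box_head.", "mask_head."))
    for name, trainable in flags.items():
        print(f"{'train ' if trainable else 'frozen'}  {name}")
```

In a deep learning framework, the resulting flags would be applied to the corresponding parameters before fine-tuning, so that the backbone weights learned from landslides stay fixed while the heads adapt to the ice avalanche samples.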

Features of the TL-Mask R-CNN Method
In this paper, deep learning was used to detect landslides along the STTC, which has positive implications for the safe construction and operation of the STTC. Firstly, the LRSTTC dataset generated in this study has been made freely available to the public, which can significantly reduce the time that other researchers spend on data collection and labeling. Few, if any, geohazard datasets for deep learning have been available in this study area. Secondly, the TL-Mask R-CNN method presented in this paper can be used to detect old landslides and ice avalanches with better performance than previously reported landslide detection methods (e.g., Mask R-CNN, Unet, Unet++, and Deeplabv3+). Given that new and old landslides have similar shapes, and that landslides and ice avalanches can both be seen as slides along slopes with different materials, it is feasible to use transfer learning to detect different geological hazards. Thirdly, the TL-Mask R-CNN method is able to segment landslides one by one, which appears to be a big challenge for most previously published methods.

Limitations of the TL-Mask R-CNN Method
It should be pointed out that the TL-Mask R-CNN method has two major limitations at the moment.
(1) Limited sample size: Deep learning typically requires large sample sizes, but the sample size of the LRSTTC dataset is still small. The TL-Mask R-CNN method is expected to perform better as the sample size of the LRSTTC dataset increases.
(2) Model transferability: Geological and weather conditions vary considerably along the STTC, and the key influencing factors of landslides can differ from one place to another [66], which makes the transferability of the TL-Mask R-CNN method a challenge. Increasing the sample size of the LRSTTC dataset would also help to address this issue.

Conclusions
The landslide hazards along the STTC pose a great risk to its operational safety. This paper introduced Mask R-CNN to extract new landslides along the STTC. Furthermore, TL-Mask R-CNN was proposed to recognize old landslides, for which only a small number of samples are available in the area along the STTC. With similarly few samples, we detected ice avalanches using TL-Mask R-CNN, considering that ice avalanches and landslides are both slides along slopes that differ in material. Because an effective remote sensing image dataset for landslide detection was lacking, an image dataset for Sichuan and Tibet, LRSTTC, was constructed on the basis of visual interpretation and on-site investigation as an evaluation benchmark. The experimental results show that the pixel accuracies of new and old landslide segmentation reach 87.71% and 75.86%, respectively. To further detect unknown landslides along the STTC and verify the proposed method, we selected the section from Ya'an to Kangding of the Sichuan-Tibet Transportation Project for experiments, which showed the effectiveness of the proposed method. Compared with previous studies, this paper presents a landslide identification dataset in the field of geological hazards, where datasets for deep learning are rare, which significantly reduces the time and effort of sample collection and data labeling for related researchers. Furthermore, we have explored the automatic identification of old landslides and ice avalanches, for which relatively few approaches exist. The results of disaster identification will directly help with the construction of the Sichuan-Tibet Railway and reduce casualties.
Currently, our approach achieves transfer among three types of geohazards: new landslides, old landslides, and ice avalanches. Our future research will study the transfer of knowledge across different regions and landscapes, exploring the transfer of typical features under diverse scenarios. Finding such effective features is a key underpinning of transfer learning. In addition, a reasoned integration of geological knowledge is important, rather than simply adding all kinds of geological data to train the model. We will also combine multi-source remote sensing data and geohydrological data to detect landslides, and we will continuously update the LRSTTC dataset.

Figure 2. Landslide Study Area of the Sichuan-Tibet Transportation Project.

Figure 3. The landslide instances in the LRSTTC dataset. (a1-a8) are new landslides that occurred recently; (b1-b8) are old landslides that occurred a long time ago.

Figure 5. The architecture of the Mask R-CNN for landslide recognition.

Figure 7. The architecture of the RPN.

Figure 9. Architecture of the TL-Mask R-CNN.

Figure 10. New landslide detection and segmentation results using Mask R-CNN with different backbones. (a1-a4) represent different landslides: (a1) is a single intact landslide with vegetation cover; (a2) is a broken-shaped landslide with vegetation cover; (a3) is a landslide close to the surrounding ground features; (a4) shows multiple landslides in one image. (b1-b4) are the results using ResNet-50 as the detection backbone. (c1-c4) are the results using ResNet-101 as the detection backbone.

Table 1. New landslide detection results using Mask R-CNN with different backbones.

Table 2. Evaluation of old landslide detection results.