Matrix SegNet: A Practical Deep Learning Framework for Landslide Mapping from Images of Different Areas with Different Spatial Resolutions

Abstract: Practical landslide inventory maps covering large-scale areas are essential in emergency response and geohazard analysis. Recently proposed techniques in landslide detection generally focused on landslides in pure vegetation backgrounds and on images with radiometric correction. Robust methods that automatically detect landslides from images acquired on multiple platforms and without radiometric correction remain a challenge, and this is a significant issue in practical application. In order to detect landslides from images over different large-scale areas with different spatial resolutions, this paper proposes a two-branch Matrix SegNet to semantically segment input images by change detection. The Matrix SegNet learns landslide features in multiple scales and aspect ratios. The pre- and post-event images are captured directly from Google Earth, without radiometric correction. To evaluate the proposed framework, we conducted landslide detection in four study areas with two different spatial resolutions. Moreover, two other widely used frameworks, U-Net and SegNet, were adapted to detect landslides from the same data by change detection. The experiments show that our model substantially improves performance in terms of recall, precision, F1-score, and IOU. It is a good starting point for developing a practical deep learning landslide detection framework for large-scale application, using images from different areas with different spatial resolutions.


Introduction
Landslides have become devastating hazards around the globe, especially in mountainous regions. They are associated with climate change, and can cause substantial loss of life and property [1,2]. Diving into the mechanisms of landslides for accurate landslide prediction calls for practical landslide mapping from large scale areas, with complicated background objects. Many researchers have developed advanced technologies in landslide detection, but they still face challenges in transferring the technology for practical application [3][4][5][6].
Generally, landslide detection technology from remotely sensed images can be grouped into two categories: image enhancement based [7] and machine learning based [8]. The image enhancement based category enhances landslide information over other background objects through typical image processing techniques, such as Markov random field [5,9], morphological operations [10], object-oriented image segmentation [11,12], and the image enhancement method [8]. One main drawback of the methods abovementioned is the
The rest of our manuscript is organized as follows: the related works with the proposed network architecture and recent developments in semantic segmentation are presented in Section 2. Section 3 introduces our proposed framework and experimental settings. The study area and data collection are demonstrated in Section 4. Sections 5 and 6 illustrate the evaluations and discussions, respectively. Our conclusions are presented in the final section.

Semantic Segmentation Development
Generally, semantic segmentation has two executing paradigms [34]: two-stage segmentation and one-stage segmentation. The two-stage method segments images after detecting the bounding box of each object. Mask R-CNN [35] is a typical method, and lays the foundation for the development of other methods, such as PANet [36] and Mask Scoring R-CNN [35]. Two-stage segmentation can achieve high accuracy, but at low efficiency. The one-stage end-to-end network architecture is more practical in actual application cases, and we adopt it in this study. It mostly takes an encoder-decoder structure: the input images are encoded into feature maps by spatial pooling or atrous convolution, and the feature maps are decoded to restore segmentation detail by deconvolution or up-sampling operations. SegNet [37], U-Net [29], PSPNet [28], and the series of DeepLab models [38,39] all take such typical network structures. In terms of landslide detection, U-Net is a commonly used network architecture [40] because it is easy to train and is highly efficient [41].

Matrix Nets (xNets)
The xNets module was originally proposed in an object detection framework [21]. It was designed to detect key-points by learning heat maps, corner regression, or center regression, simultaneously. xNets, inspired by the Feature Pyramid Networks proposed for object detection [42], down-samples the input feature maps by convolution in the horizontal, vertical, and diagonal directions, separately. Therefore, it can learn features with multiple scales and aspect ratios, which are necessary in landslide detection from images with different spatial resolutions. Synthesizing the works in [21,42], a two-branch Matrix SegNet is proposed in this manuscript, modifying the xNets module to detect landslides. The images before and after the landslide event are used as inputs of the two-branch Matrix SegNet and encoded into feature maps separately, without the radiometric correction required in [5].

Proposed Architecture
Due to the large proportion of background objects in remotely sensed images, we first applied potential landslide detection to the pre- and post-event images of each study area, to remove the background objects that were easily separable. A raw image example with a spatial resolution of 19 m is shown in Figure 1. Since a landslide has spectral characteristics similar to those of bare soil in the image, we labeled every pixel whose intensity in the green channel was smaller than in the red channel as a potential landslide, and all other pixels as background objects. This rule was used because the red channel is more sensitive to bare soil than the green and blue channels [43]. By eliminating a large number of background pixels, potential landslide detection saves considerable computation.
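The thresholding rule described above can be illustrated with a minimal sketch. It assumes the image is held as nested lists of (R, G, B) tuples; the function name and the toy pixel values are hypothetical and are not taken from the paper.

```python
def potential_landslide_mask(rgb_image):
    """Return a binary mask: 1 where green < red (potential landslide), else 0.

    rgb_image is a list of rows; each pixel is an (R, G, B) tuple.
    """
    return [[1 if g < r else 0 for (r, g, _b) in row] for row in rgb_image]


# Hypothetical 2 x 3 image: bare-soil-like pixels are redder than they are green.
image = [
    [(120, 90, 60), (40, 110, 50), (200, 180, 150)],
    [(30, 30, 30), (90, 140, 70), (150, 100, 80)],
]
mask = potential_landslide_mask(image)
print(mask)
```

Pixels with equal red and green intensities are treated as background here, following the strict "smaller than" wording of the text.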
Based on the detected potential landslide image, we calculated a connective contour for each group of potential landslide pixels, according to their distribution. Each contour is treated as a potential landslide. Among the potential landslide contours, bare soil was a major background object that was difficult to recognize. Therefore, we proposed a two-branch Matrix SegNet to enhance the landslide feature learning ability in distinguishing landslides from confusable background objects. The proposed model is trained on image patches that are cropped from the original images based on each connective contour. This strategy enhances the distinguishing ability of the proposed model more directly and avoids the unbalanced sample distribution issue commonly confronted in natural hazard detection, where landslide pixels are outnumbered by background pixels by orders of magnitude.
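One way to group potential landslide pixels into regions is connected-component labeling. The sketch below is an assumption about how such grouping could be done (4-connectivity, breadth-first search, bounding boxes as output); the paper does not state which contour algorithm was used, and the function name is hypothetical.

```python
from collections import deque


def connected_regions(mask):
    """Group 4-connected foreground pixels and return one bounding box per group.

    Each box is (top, left, bottom, right), inclusive, in pixel coordinates.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over one connected group of pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    in_bounds = 0 <= ny < height and 0 <= nx < width
                    if in_bounds and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical binary mask with two separate pixel groups.
mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
boxes = connected_regions(mask)
print(boxes)
```

Each returned box could then serve as the reference region from which training patches are cropped.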
As shown in Figure 2, our proposed semantic segmentation network takes a twin-tower structure to explore the land cover change between the images before and after the landslide event. The twin-tower architecture is employed following the conclusion in [44] that it performs better than concatenating the input images directly. The two input images are each encoded into a feature map with a ResNet-50 backbone [45]. ResNet-50 is a 50-layer residual network, a widely used network backbone in learning features [29,37,42]. It was proposed to solve the problem of model degradation, a common issue that arises as the number of network layers keeps increasing, by adding an identity mapping in each network building block. The feature maps encoded by the backbone are then concatenated and enhanced by the squeeze-and-excitation (SE) module, as demonstrated in Figure 3. SE enhances features by learning a weight for each channel of the feature maps and multiplying each channel by its learnt weight. The enhanced feature maps are then used to learn features with multiple scales and aspect ratios through the matrix convolution module. Figure 4 shows the detailed network structure of the matrix convolution module. It consists of a 5 × 5 matrix of convolution operations, wherein vertical convolution (shown in yellow) down-samples the input feature maps vertically, horizontal convolution (shown in red) down-samples the input feature maps horizontally, and diagonal convolution (shown in green) down-samples the input feature maps in both directions.
By sampling the feature maps in three different ways simultaneously, the matrix convolution can extract features in multiple scales and aspect ratios and generate an output feature map Fmo, which has the same size as the input feature map Fmi. To enlarge the range of feature scales extracted, we concatenate the feature maps Fmi and Fmo for a final convolution that produces the semantic segmentation result image.
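To make the multi-scale, multi-aspect-ratio idea concrete, the following sketch lists the feature-map shape at every cell of a 5 × 5 matrix of layers. It covers only the down-sampling shapes, not how the cells are merged back into Fmo. The stride of 2, the mapping of rows to vertical steps and columns to horizontal steps, and the 512 × 512 input size are all assumptions made for illustration.

```python
def matrix_grid_shapes(height, width, size=5, stride=2):
    """Feature-map shape (height, width) at every cell of a size x size matrix.

    Moving one row down applies a vertical down-sampling (height / stride),
    moving one column right applies a horizontal one (width / stride),
    and a diagonal step applies both. The stride value is an assumption.
    """
    return [
        [(height // stride ** i, width // stride ** j) for j in range(size)]
        for i in range(size)
    ]


# Hypothetical 512 x 512 input feature map.
shapes = matrix_grid_shapes(512, 512)
for row in shapes:
    print(row)
```

Cells on the diagonal keep a square aspect ratio at decreasing scales, while off-diagonal cells give elongated receptive fields, which is the property the text relies on for landslides of varying shape.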
Remote Sens. 2021, 13, x FOR PEER REVIEW 5 of 16
Figure 4. Network structure of the matrix convolution layers in our framework, wherein Fmi is the input feature map, Fmo is the output feature map, yellow feature maps are obtained by vertical convolution from the layer above, pink feature maps are calculated by horizontal convolution from the layer above, and green feature maps are achieved by diagonal convolutions.
Inspired by RetinaNet [46], the focal loss function is adopted in our framework training pipeline. It was proposed to deal with the unbalanced sample distribution between positive and negative samples, which is commonly confronted in landslide detection.
Focal loss is modified from the binary cross entropy loss, as stated in Equation (1), wherein y_gt indicates the ground truth label, and p_pred stands for the probability, calculated by the model, that the input sample belongs to label 1 (binary 0/1 classification task). The weight of negative samples is reduced in the convergence of the training model by adding the weight factor α, as shown in Equation (2). Moreover, focal loss focuses on difficult samples by adding the focusing index β, through which easy examples are assigned a small weight. Following the work in [46], β is set as 2 and α is set as 0.25.
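As an illustration of the loss described above, here is a minimal per-sample sketch. It assumes the common RetinaNet-style form, in which α weights the positive class and 1 − α the negative class, and in which β plays the role of the focusing exponent; the exact form of Equations (1) and (2) in the paper may differ. The function names and the clamping constant eps are hypothetical.

```python
import math


def binary_cross_entropy(y_gt, p_pred, eps=1e-7):
    """Binary cross entropy for one sample with label y_gt in {0, 1}."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if y_gt == 1 else -math.log(1.0 - p)


def focal_loss(y_gt, p_pred, alpha=0.25, beta=2.0, eps=1e-7):
    """Focal loss for one sample (assumed RetinaNet-style form).

    alpha is the class weight factor and beta the focusing index;
    the defaults are the values quoted in the text.
    """
    p = min(max(p_pred, eps), 1.0 - eps)
    if y_gt == 1:
        return -alpha * (1.0 - p) ** beta * math.log(p)
    return -(1.0 - alpha) * p ** beta * math.log(1.0 - p)


easy = focal_loss(1, 0.9)  # confident and correct prediction
hard = focal_loss(1, 0.1)  # confident and wrong prediction
print(easy, hard)
```

With β = 2, a well-classified positive sample (p_pred = 0.9) has its cross entropy scaled by 0.25 × 0.1², so easy samples contribute far less to the total loss than hard ones.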

Study Area and Dataset Preparation
In order to evaluate the transferability and robustness of our proposed two-branch Matrix SegNet, it was applied to detect landslides in four research areas: the Lushan earthquake impacted area (Lushan in short), the Jiuzhaigou earthquake impacted area (Jiuzhaigou in short), the Central Nepal area (Nepal in short), and Southern Taiwan (Taiwan in short). All images before and after the landslide events were collected from Google Earth because they are public and free. The collected images were used directly for model training, without radiometric correction, which makes the evaluation of the proposed model more demanding. Due to different imaging times and the limited data provided by Google Earth, some high spatial resolution images were still missing after the landslide events. Therefore, we adopted different resolutions for different study areas, as long as the images generally covered the impacted area, with an acquisition time within 2 years after the event. The unified resolution for each study area was selected to be as high as possible. Table 1 presents general information on the images used for each landslide event, and the detailed construction process of each corresponding dataset is described in the following subsections.

Lushan Earthquake-Induced Landslide
The Ms 7.0 Lushan earthquake occurred in Lushan County, Sichuan Province, on 20 April 2013 (shown in Figure 5). It triggered tens of thousands of landslides [47] and led to severe casualties and property loss. The landslides were manually interpreted from high spatial resolution images on the GIS platform [47]. With the landslide inventories interpreted in [47], we collected Google Earth images covering the impacted area before and after the earthquake, with a spatial resolution of 19 m. The images were acquired on 31 December 2010 and 31 December 2013, respectively. From Figure 5, we can recognize that forestry and road networks occupy the majority of the study area. Landslides are distributed intensively along the sides of road networks. Since the landslide inventories in [47] were visually interpreted from images with resolutions of 1 to 15 m, they cover more details than can be obtained from 19 m resolution images. Therefore, we adjusted the landslide inventories and removed the landslides that could not be visually interpreted from the 19 m resolution images. After modification, the number of landslide inventories reached 11,754, which could still provide abundant training landslide pixel samples to build up a landslide detection model.

Jiuzhaigou Earthquake-Induced Landslide
On 8 August 2017, an earthquake with a magnitude of 6.5 Mw occurred in Jiuzhaigou County. Synthesizing field investigations and visual interpretations from high spatial resolution images, 4834 earthquake-triggered landslide inventories were mapped in [48]. The images covering the Jiuzhaigou earthquake-impacted area before and after the event were downloaded with a resolution of 2.39 m. However, this resolution still differs from that used to interpret the landslide inventories in [48]. Therefore, the landslide inventories were adjusted, again by visual interpretation, to match the images we captured from Google Earth. The number of landslide inventories retained in our study images is 3817. Moreover, the post-event images collected from Google Earth are largely covered by clouds, as shown in Figure 6a,b. We collected one more image, taken on 27 September 2019, shown in Figure 6c. Some parts are updated in Figure 6c, while the majority remain the same as in Figure 6b. To reduce the disturbance from clouds, we synthesized the three images by selecting the smallest pixel intensity in the green channel, which maintains soil information and removes clouds. The synthesized image is shown in Figure 6d, and is directly used for landslide detection, although some clouds still remain. From Figure 6d, we can recognize that landslides are distributed intensively in the central part of the study area and sparsely along the sides of road networks. The main background objects comprise road networks, forestry, vegetation, bare soil, rocks, and clouds. This complicated object distribution pattern can be used to evaluate the transferability of the proposed model.
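The cloud-suppressing synthesis described above can be sketched as a per-pixel selection. The sketch assumes that the whole RGB pixel is taken from whichever image has the lowest green intensity at that position, and that the images are co-registered and equally sized; the paper does not spell out these details, and the function name and pixel values are hypothetical.

```python
def min_green_composite(images):
    """Per pixel, keep the RGB value from the image whose green channel is lowest.

    images is a list of equally sized images; each image is a list of rows,
    and each pixel is an (R, G, B) tuple.
    """
    height, width = len(images[0]), len(images[0][0])
    return [
        [min((img[y][x] for img in images), key=lambda px: px[1])
         for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 1 x 2 images: very bright pixels stand in for clouds.
img_a = [[(250, 250, 250), (120, 90, 60)]]
img_b = [[(110, 95, 70), (245, 248, 250)]]
img_c = [[(115, 100, 72), (240, 240, 240)]]
composite = min_green_composite([img_a, img_b, img_c])
print(composite)
```

Because clouds are bright in every channel, choosing the darkest green value tends to pick the cloud-free observation wherever at least one of the images is clear.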


Central Nepal Landslide
Nepal is a country prone to multiple geohazards. In 2015, Nepal went through a series of deadly earthquakes and experienced thousands of landslides [49]. As shown in Figure 7, we randomly selected one mountainous spot with numerous landslides in Central Nepal and collected the corresponding images from Google Earth with a spatial resolution of 2.39 m. The landslides are mainly distributed along the sides of road networks. The ground truth landslide polygons used for training and evaluating models were visually interpreted by two experienced experts.


Southern Taiwan Landslide
Southern Taiwan is vulnerable to rainfall-induced landslides [50]. Typhoon Morakot hit Taiwan on 7 August 2009, and resulted in about 18,000 landslides. There has been considerable research on detecting landslides for that event [51], but the study area there is cut out purely from rural areas, and the background objects are generally pure vegetation. In contrast, our study area (shown in Figure 8) is a rectangle that includes oceans, an urban area with construction, and a rural area with bare soil and road networks. The landslides are distributed intensively in the central part of the study area. This adds more difficulty in detecting landslides, but allows the model to be evaluated more practically and objectively. The images we collected from Google Earth have a resolution of 19 m, and the corresponding ground truth landslide polygons were visually interpreted by two experienced experts.


Experimental Settings
Our experiment was conducted on the PyTorch deep learning framework, and the proposed model was trained on three NVIDIA TITAN X GPUs, each with 12 GB of memory. Random scaling, random colorization, and random cropping were adopted to enlarge the data variability and enhance the generalization ability of the model. As introduced in Section 3, potential landslides were first detected from the collected image to remove as many background objects as possible. Based on the detected potential landslide image, four patches with a size of 512 × 512 pixels were generated in four directions, respectively, for each connective contour to cover more neighboring background objects, as demonstrated in Figure 9. Where a patch region exceeds the image boundaries, the exceeding part is filled with zero intensity to maintain a size of 512 × 512. Where a potential landslide contour exceeds 512 × 512, the patch takes the size of the bounding box of the corresponding connective contour. The total number of cropped patches is 7140 for Lushan, 19,088 for Jiuzhaigou, 6037 for Nepal, and 1160 for Taiwan.
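The zero-filling rule for patches that cross the image border can be sketched as follows. This is a minimal single-channel illustration with a hypothetical function name; how the four patch positions are chosen around each contour is not reproduced here, since the text only refers to Figure 9 for that.

```python
def crop_with_zero_padding(image, top, left, size=512):
    """Crop a size x size patch; positions outside the image are filled with 0.

    image is a single-channel image given as a list of rows of intensities.
    top and left may be negative or may run past the image border.
    """
    height, width = len(image), len(image[0])
    patch = []
    for y in range(top, top + size):
        row = []
        for x in range(left, left + size):
            inside = 0 <= y < height and 0 <= x < width
            row.append(image[y][x] if inside else 0)
        patch.append(row)
    return patch


# Hypothetical 3 x 3 image and a 2 x 2 patch hanging over the top-left corner.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
patch = crop_with_zero_padding(image, top=-1, left=-1, size=2)
print(patch)
```

The patch always has the requested size, so batches of patches keep a uniform shape regardless of where the contour lies in the image.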
The Adam (adaptive moment estimation) optimizer [52] is used to optimize our proposed framework. Following the work in [21], the initial learning rate is set to 5 × 10^−5, and has a decay rate of 10. We trained our model on 70% of the cropped patches generated from the Lushan earthquake-induced landslide images and evaluated the model structure on the remaining patches. In order to evaluate the transferability of the proposed network structure, we randomly selected 30% of the cropped patches from each of the other three events (the Jiuzhaigou earthquake-induced landslide event, the Nepal landslide event, and the Taiwan rainfall-induced landslide event) to fine-tune the model, and we evaluated the model on the remaining patches, respectively. The numbers of patches used for evaluation in the four datasets are 4998 (Lushan), 13,362 (Jiuzhaigou), 4226 (Nepal), and 812 (Taiwan), and the corresponding numbers of ground truth landslide inventories are 654 (Lushan), 1286 (Jiuzhaigou), 483 (Nepal), and 62 (Taiwan).
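The random 30%/70% split used for fine-tuning and evaluation on the three transfer datasets can be sketched as below. The function name, the fixed seed, and the use of integer patch ids are assumptions for illustration; only the patch counts come from the text.

```python
import random


def split_patches(patch_ids, first_fraction, seed=0):
    """Shuffle patch ids and split them into two disjoint subsets."""
    ids = list(patch_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * first_fraction)
    return ids[:cut], ids[cut:]


# Total patch counts reported in the text for the three transfer datasets.
patch_counts = {"Jiuzhaigou": 19088, "Nepal": 6037, "Taiwan": 1160}
sizes = {}
for name, count in patch_counts.items():
    fine_tune, evaluate = split_patches(range(count), first_fraction=0.3)
    sizes[name] = (len(fine_tune), len(evaluate))
    print(name, sizes[name])
```

With these counts, the evaluation subsets contain 13,362, 4226, and 812 patches, which agrees with the evaluation sizes quoted for Jiuzhaigou, Nepal, and Taiwan.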

Evaluations
To verify our proposed model, U-Net and SegNet were adopted to detect landslides on the four evaluation areas. Since U-Net and SegNet are both originally proposed with a single branch, they are designed to process input images with a single time domain. In order to conduct change detection from images with two different time domains, both U-Net and SegNet were adapted to take the same twin-tower structures before the backbone of ResNet-50, to carry on a fair performance comparison with our proposed model. Moreover, the modified U-Net and SegNet were trained on the same data, with the same strategies and loss functions. We randomly selected two test sites for each evaluation study area and demonstrated the corresponding detection results by U-Net, SegNet, and the proposed two-branch Matrix SegNet in Figure 10, respectively.
A visual comparison in Figure 10 reveals that the matrix convolution module and the squeeze-and-excitation module adopted in the two-branch Matrix SegNet work well in capturing multi-scale features and enhancing useful features from images with different spatial resolutions. Most landslides are well detected by the two-branch Matrix SegNet in all four study areas, except for some minor landslides omitted in each image, especially in the case of Lushan. As shown in Figure 10(c1,c2), SegNet fails to detect most landslides in the Lushan images. U-Net omits more small landslides than our two-branch Matrix SegNet, as shown by the yellow circle in Figure 10(d1). In Jiuzhaigou, SegNet detects some landslides but omits more of them (see Figure 10(c3,c4)). U-Net performs much better than SegNet and detects most landslides; nevertheless, it misclassifies more tiny landslides as background objects than the two-branch Matrix SegNet does (Figure 10(a3-d3,a4-d4)). The landslide distributions in the Nepal and Taiwan images are comparatively simpler, and the detection performance of all three methods is considerably better. However, omission is still an important issue in small landslide detection, especially for SegNet and U-Net. Moreover, in Taiwan (Figure 10(a7-d7,a8-d8)), SegNet and U-Net also produce considerably more commission errors. Faced with complicated background objects at different spatial resolutions, the proposed two-branch Matrix SegNet generally performs well in landslide detection by learning multi-scale features through the matrix convolution module and enhancing useful features through the squeeze-and-excitation module. This further verifies the effectiveness of our model in detecting various landslides against complicated background objects without radiometric correction.
To present a general statistical and objective evaluation of each of the four study areas, we calculated recall, precision, F1-measure, and intersection over union (IOU) according to Equations (3)-(6) for each study site, and listed the statistics in Tables 2-5, respectively. TP indicates the number of ground truth landslide pixels correctly classified as landslides. TN indicates the number of ground truth background object pixels correctly classified as background objects. FP indicates the number of ground truth background object pixels misclassified as landslides. FN indicates the number of ground truth landslide pixels misclassified as background object pixels. IOU and F1-measure are commonly used as comprehensive evaluation indexes that balance recall and precision.

$$\text{F1-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{6}$$

It is apparent that the proposed two-branch Matrix SegNet has the best general performance, with the highest F1-measure and IOU among the three methods in detecting landslides for the four study areas. U-Net and SegNet both obtain low recall and high precision on the Lushan dataset. Taken together with the images in Figure 10, this shows that U-Net and SegNet are more prone to errors on small landslides: they are more likely to misclassify small landslide areas as background objects. We should note that the spatial resolution of the Lushan and Taiwan datasets is lower than that of the Nepal and Jiuzhaigou datasets. As shown in Figure 10, the landslides in the Lushan dataset in particular are more densely distributed and smaller in size. This indicates that small landslides can easily be lost through the successive convolution operations in the encoding parts of both U-Net and SegNet, since both follow the typical encoder-decoder network structure.
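The four evaluation metrics can be computed from pixel counts as in the sketch below. It assumes the standard definitions of recall, precision, and IOU in terms of TP, FP, and FN, since Equations (3)-(5) are not reproduced in this text; the function names and the toy labels are hypothetical and only serve the illustration.

```python
def confusion_counts(truth, pred):
    """Count TP, TN, FP, FN from flat sequences of 0/1 labels (1 = landslide)."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def segmentation_metrics(tp, fp, fn):
    """Recall, precision, F1-measure, and IOU for the landslide class.

    Assumed standard definitions; TN does not enter any of the four metrics.
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "iou": iou}


# Toy example: 8 pixels, flattened ground truth and prediction masks.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(truth, pred)
metrics = segmentation_metrics(tp, fp, fn)
```

Because neither F1-measure nor IOU uses TN, the large number of correctly classified background pixels in a landslide scene does not inflate either score, which is why both are used here as comprehensive indexes.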
Moreover, comparing the landslide distributions of the different datasets in the ground truth images in Figure 10, we can see that the landslide distribution is most complicated in the Lushan dataset, which has the densest distribution of the smallest landslides. As a result, the F1-measure and IOU of both U-Net and our two-branch Matrix SegNet are no higher than 35% on this dataset. Even so, our proposed Matrix SegNet still has a slightly higher accuracy, by 0.2%. On the Taiwan dataset, which has the same spatial resolution as the Lushan dataset, U-Net still omits small landslides (as shown in Figure 10(c7-d7)), while SegNet misclassifies many small background areas as landslides (as shown in Figure 10(c8-d8)). This further validates the high sensitivity of a typical encoder-decoder network architecture in detecting small landslides from high spatial resolution images.
The performance of the three methods is clearly better on the Nepal and Jiuzhaigou datasets than on the other two datasets, with F1-measure and IOU at least 15% higher. This indicates that higher spatial resolution images provide more detailed ground object information for landslide detection, especially for small landslides. Our proposed Matrix SegNet achieves an improvement of up to 20% in detection accuracy on the Jiuzhaigou and Nepal datasets.

Discussions
Taking together the evaluation performance of U-Net, SegNet, and our proposed two-branch Matrix SegNet on the test images of the four study areas, our model has better transferability than U-Net and SegNet. On evaluation images with different spatial resolutions, the IOU of our proposed two-branch Matrix SegNet is at least 0.21% higher than that of U-Net and SegNet on the Lushan and Taiwan datasets, which have a spatial resolution of 19 m, and over 7% higher on the Nepal and Jiuzhaigou datasets, which have a spatial resolution of 2.35 m. Similar to IOU, the F1-measure also improves with our proposed two-branch Matrix SegNet, by almost 0.2% on the Lushan and Taiwan datasets and by 7% on the Nepal and Jiuzhaigou datasets. Our model thus improves more on images with a spatial resolution of 2.35 m than on those with a spatial resolution of 19 m. This is mostly attributable to the spectral and textural details of landslides that are visible in images with higher spatial resolution. The matrix learning module used in our model is better suited to detecting multiple landslides in high spatial resolution images because it captures landslide features horizontally, vertically, and diagonally. In contrast, the traditional layer-wise convolution used by U-Net and SegNet is likely to omit multi-scale features of landslides.
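A rough worked calculation illustrates why the two resolutions behave so differently for small landslides. The sketch below is an illustration under stated assumptions, not a result from the experiments: the 50 m landslide width is a hypothetical value, and the total downsampling stride of 32 is that of a standard ResNet-50 without dilation, which may differ from the encoder configuration actually used here.

```python
# Assumption: total downsampling factor of a standard ResNet-50 (no dilation).
ASSUMED_TOTAL_STRIDE = 32


def pixels_across(width_m, resolution_m):
    """Number of image pixels spanned by an object of the given ground width."""
    return width_m / resolution_m


def feature_cells_across(width_m, resolution_m, total_stride=ASSUMED_TOTAL_STRIDE):
    """Number of cells spanned on the coarsest encoder feature map."""
    return pixels_across(width_m, resolution_m) / total_stride


# Hypothetical 50 m wide landslide at the two spatial resolutions in this study.
for resolution_m in (19.0, 2.35):
    px = pixels_across(50.0, resolution_m)
    cells = feature_cells_across(50.0, resolution_m)
    print(f"{resolution_m} m/pixel: {px:.1f} pixels, {cells:.2f} feature cells")
```

Under these assumptions, the hypothetical landslide spans fewer than 3 pixels at 19 m but about 21 pixels at 2.35 m, and in both cases it occupies less than one cell of the coarsest feature map, which is consistent with small landslides being lost in the encoder unless multi-scale features are retained.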
From the evaluation statistics, we can see that the precision and recall of our proposed two-branch Matrix SegNet are more balanced than those of U-Net and SegNet on all of the evaluation datasets. One possible driver of this is the high omission of small landslides by U-Net and SegNet, most of which are well detected by our Matrix SegNet. However, despite this improvement, the recall of our proposed model is lower than 71% in all evaluation cases. Interference from bare soil among the background objects is also an issue for our model. This may be overcome in future work by modifying the network structure to better discriminate between the shapes of landslides and background objects.
We should note that the accuracy obtained in our experiments is lower than that obtained in [5], mainly because of the large differences between the experimental images. Images in [5] were mostly captured from areas with pure vegetation, where there was high spectral contrast between landslides and background objects. In contrast, the images in our experiments were mostly taken with various imaging radiances from different study areas, and the ground objects had complicated distributions, mimicking practical applications.

Conclusions
This study presented a practical trial of landslide detection using raw pre- and post-event images captured directly from Google Earth. In our proposed two-branch Matrix SegNet, the matrix convolution module was adopted to learn landslide features with multiple aspect ratios and scales, horizontally and vertically. Landslide features were further enhanced by incorporating squeeze-and-excitation modules. The proposed model structure was applied to detect landslides from images with different spatial resolutions from four different study areas with various ground object distribution patterns. Two widely used semantic segmentation frameworks, U-Net and SegNet, were adopted for comparison with the proposed model. The statistical evaluations verify the effectiveness of our proposed two-branch Matrix SegNet in detecting multiple landslides. Its improvement over U-Net and SegNet in F1-measure and IOU is larger on the datasets with a spatial resolution of 2.39 m than on the datasets with a spatial resolution of 19 m. The matrix convolution module has a stronger ability to capture multi-scale landslide features. However, a high landslide omission rate remains an issue for our proposed model, especially for small landslides among complicated background objects. In future work, we will modify the model structure to better distinguish the shape characteristics of landslides from those of bare soil, in order to reduce the omission error. Our proposed framework opens a path toward reliable and applicable methods for detecting landslides in large-scale research areas with complicated background objects.

Conflicts of Interest:
The authors declare no conflict of interest.