Article

Lightweight Landslide Detection Network for Emergency Scenarios

1
The Key Laboratory of Natural Resources Monitoring in Tropical and Subtropical Area of South China, Ministry of Natural Resources, Guangzhou 510670, China
2
Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610032, China
3
Surveying and Mapping Institute Lands and Resource Department of Guangdong Province, Guangzhou 529001, China
4
The Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(4), 1085; https://doi.org/10.3390/rs15041085
Submission received: 14 December 2022 / Revised: 4 February 2023 / Accepted: 6 February 2023 / Published: 16 February 2023

Abstract

Landslides are geological disasters that can cause severe damage to property and lead to the loss of human lives. The application of deep learning technology to optical remote sensing images can help in the detection of landslide areas. Traditional landslide detection models usually have complex structural designs to ensure accuracy. However, this complexity leads to slow detection, and these models often do not satisfy the rapid response required for the emergency monitoring of landslides. Therefore, we designed a lightweight landslide target detection network based on a CenterNet and a ResNet50 network. We replaced the BottleNeck in the backbone network of ResNet50 with a Ghost-BottleNeck structure to reduce the number of parameters in the model. We also introduced an attention mechanism module based on channel attention and spatial attention between the adjacent GhostModule modules to enrich the landslide features. We introduced a lightweight multiscale fusion method in the decoding process that adds a cross-layer sampling operation to the Feature Pyramid Network: feature maps from the encoding process are down-sampled directly from a high resolution to a low resolution and up-sampled directly from a low resolution to a high resolution, thus skipping the medium-resolution levels in the path. We added the feature maps obtained in this way to the feature fusion. The Conv module that adjusts the number of channels in the multiscale feature fusion operation was replaced with the GhostModule to achieve lightweight capability. At the end of the network, we introduced a state-of-the-art Yolov5x as a teacher network for feature-based knowledge distillation to further improve the accuracy of our student network. We used challenging datasets including multiple targets and multiscale landslides in the western mountains of Sichuan, China (e.g., Danba, Jiuzhaigou, Wenchuan, and Maoxian) to evaluate the proposed lightweight landslide detection network. 
The experimental results show that our model satisfied landslide emergency requirements in terms of both accuracy and speed; the parameter size of the proposed lightweight model is 18.7 MB, namely, 14.6% of the size of the original CenterNet containing the ResNet50 network. The single image detection time is 52 ms—twice as fast as the original model. The detection accuracy is 76.25%, namely, 12% higher than that of the original model.

1. Introduction

Landslide disasters are among the most widespread and severe natural disasters in the world. A landslide is the deformation and failure of slope rock and soil mass, which moves mainly horizontally along a certain surface under the action of gravity or other forces (e.g., superimposed seismic force or water pressure) [1]. Landslides pose a major threat to residential buildings, roads, service facilities, and personal safety at the foot of the mountain [1]. Moreover, the flow of landslides can destroy farmland and cultivated land, cause economic losses, block rivers to form barrier lakes, and leave latent hazards. For example, the Zemun settlement on the northern outskirts of Belgrade has experienced a number of landslides in the last three decades, endangering buildings and roads [2]. Shallow landslides due to the soil saturation induced by intense rainfall events are very common in northern Italy, particularly in the Alps and Prealps [3]. How to reduce the huge losses caused by landslide disasters and ensure the safety of disaster relief personnel is an urgent problem. Therefore, it is necessary to explore methods of rapid response after a landslide disaster that efficiently identify the landslide hazard and locate the landslide area in a short time. Previous research on landslides includes the following. Focusing on the Colombian Andes, Grima [4] aimed to identify differences in the frequency of landslides in forested and non-forested areas. Tonini [5] analyzed the spatio-temporal pattern distribution of landslides in Switzerland, applying global and local clustering indicators for landslide detection. Al-Umar [6] developed a geographic information system (GIS)-based modeling approach to assess and predict rainfall-induced landslides in sensitive clay soils in the Ottawa region. 
With the popularization of optical remote sensing technologies such as high-resolution satellites and unmanned aerial vehicles, high-resolution optical remote sensing images are widely used in the field of geological disaster monitoring. Advanced remote sensing techniques may be considered important tools for the early detection and sustainable monitoring of ongoing or possible landslide disasters.
The detection of landslides based on remote sensing images can be regarded as an image processing problem. Mathematical statistical and machine learning methods were mainly used in the early stages of research in this field [7]. Xia et al. [8] used logistic regression to predict landslide areas using morphological features (e.g., the terrain slope, aspect, and curvature). Chen et al. [9] proposed a landslide detection method based on the fusion of spatiotemporal spectral features; by extracting the spatial shape features of landslides (e.g., the axial aspect ratio), they used a support vector machine classifier to detect and identify the landslide area. Hu et al. [10] used support vector machine, artificial neural network, and random forest models to identify landslides from satellite images, performing cross-validation on the three machine learning models to improve the accuracy. However, the processing efficiency of traditional machine learning methods is low when faced with a large amount of data.
With the development of deep learning, a deep neural network can automatically detect landslide areas from high-dimensional features extracted from optical images. Long et al. [11] established two landslide detection models: the deep belief network (DBN) and the convolutional neural-deep belief network (CDN). Their models achieved high accuracy and recall, showing that the image segmentation of the landslide area was better when using a deep neural network. Tanatipuknon et al. [12] realized target detection in landslide areas based on the Faster R-CNN [13] detector, which has a higher accuracy than traditional object-oriented classification methods. Cheng et al. [14] designed a lightweight target detection network based on the YOLO target detection framework, which had a high recognition accuracy while maintaining a small size. Ji et al. [15] introduced a 3D attention mechanism that greatly improved the accuracy of the traditional network architecture. Ullo et al. [16] used the MaskRCNN method to segment high-resolution landslide images and achieved high precision.
In the field of object detection, the R-CNN detector [17] is based on the candidate box mechanism and uses CNN to calculate features in the candidate region for object recognition. The Faster R-CNN detector first calculates the full-image features based on R-CNN. It then obtains candidate regions through a region proposal network and calculates features for object recognition. However, this two-stage object detection method is based on the search mechanism for the candidate regions, which leads to highly complex algorithms and redundant feature calculations that make real-time detection difficult. End-to-end one-stage object detection methods are gradually becoming mainstream with the use of YOLO [18], SSD [19], and other target detection methods that detect object categories and positions through the one-shot output of convolutional neural networks. Various lightweight neural network frameworks based on depthwise separable convolutions have been proposed [20], which can maintain good feature extraction effects while maintaining a low parameter value.
In the field of knowledge distillation, Mehta [21] studied network structures, loss function design, and training data, and applied knowledge distillation to the one-stage target detection model YOLOv2 [22]. The student network structure was optimized through an FPN-like structure. During distillation, the feature map non-maximum suppression algorithm FM-NMS was used to filter redundant frames, and an unlabeled dataset was added for training. However, the improvement obtained from distillation was still limited. In 2019, Zhang et al. [23] applied an attention transfer algorithm of knowledge distillation and a normalized loss function to the target detection task of understanding the driving environment of intelligent vehicles, which effectively improved the accuracy of the small network.
In 2020, Zhang [24] proposed an attention-guided distillation method and a non-local distillation method to address the fact that existing distillation methods did not distill the relationships between different pixels. Improvements in accuracy were obtained for both one-stage and two-stage object detectors. In the same year, Liu et al. [25] showed that the existing pixel-by-pixel distillation method cannot extract structural information from features and proposed the construction of a static graph and adversarial training to distill the student network, which achieved better distillation results in both object detection and semantic segmentation.
Although new algorithms continue to improve the performance of landslide detection, there are still some issues that need to be resolved:
(1)
The feature extraction ability of traditional landslide detection networks is often improved by increasing the depth of the model. However, while improving the accuracy of detection, this also increases the number of model parameters. Large deep neural networks are limited by their operating memory footprint, making them difficult to apply in emergency response scenarios.
(2)
Lightweight landslide detection models have a reduced accuracy because the network depth and the number of parameters are both reduced. Figure 1 shows the error detection results for a lightweight network, which includes missing detection (Figure 1a), false detection (Figure 1b), failure to detect (Figure 1c), and a detection range that is too small (Figure 1d).
Based on the single-stage target detection CenterNet framework, we designed a lightweight network backbone (CBAM-Mini-bone) based on the attention mechanism to reduce the number of parameters. The lightweight backbone feature extraction network (Figure 2, red box), which was designed using 16 Ghost-BottleNecks, was used to replace ResNet50 (Figure 2, red box). ResNet50 generates higher-dimensional feature maps that need to use convolution kernels to perform convolution operations based on all dimensions of the original feature maps, which leads to a large increase in the number of parameters when facing high-dimensional features. The G-Bneck network uses simple operations (e.g., linear mapping) to obtain high-dimensional feature maps. This substantially reduces the overall number of calculations and required parameters without changing the depth of the feature map. We also introduced a convolutional attention module into the G-Bneck structure. This module contains a channel attention module and a spatial attention module that calculate the weights in the channel and space directions, respectively. We used these weights to adaptively optimize the feature map to enrich the information available in it. Figure 2b shows a simplified structure of CenterNet with CBAM-Mini-bone.
We propose a lightweight multiscale fusion method to reduce the loss of accuracy in the CBAM-Mini, as this method enhances the multiscale features. We replaced Conv with the GhostModule in the convolution operations to achieve the lightweight capability and multiscale fusion. Based on the combination of high-level semantic information and low-level location information achieved by only up-sampling adjacent layers in the feature pyramid network, we add a cross-layer sampling method and make predictions in the last layer, which not only makes the expression of target features clearer but also improves the network’s ability to detect objects of different scales. We have three types of feature fusion modules. Figure 3 shows the lightweight multiscale fusion structure.
Combined with Figure 3, we took the following measures. For the low-resolution feature maps (32 × 32 × 120) in the decoding process, we directly down-sampled the high-resolution feature maps (128 × 128 × 24) to a size of (32 × 32) (i.e., we skipped the medium-resolution levels in the path). The (16 × 16 × 480) feature maps from the previous layer were also up-sampled to a size of (32 × 32). We concatenated these maps with the same-scale feature maps (32 × 32 × 112) along the channel and then adjusted the number of channels through the GhostModule for feature fusion. The specific technical route is shown by the orange arrow in Figure 3. For the mid-resolution feature maps (64 × 64 × 60) in the decoding process, the output result from the previous step was up-sampled and then fused with the (64 × 64 × 40) feature maps. The specific technical route is shown by the green arrow in Figure 3. Similarly, for the high-resolution feature maps (128 × 128 × 30) in the decoding process, the output result from the previous step was up-sampled, and we directly up-sampled the low-resolution (32 × 32) feature maps to a size of (128 × 128). These maps were then fused with the (128 × 128 × 24) feature maps to obtain the final feature maps. The specific technical route is shown by the purple arrow in Figure 3. We then used the feature-based knowledge distillation method to migrate the features learned using the Yolov5x model to the small model. This operation further improved the accuracy of the lightweight model. In summary, the main contributions of this study are as follows:
We designed a lightweight network backbone based on the attention mechanism. We used the Ghost-Bneck to rebuild a lightweight network backbone and introduced an attention mechanism into the Ghost-Bneck structure.
We propose a lightweight multiscale fusion method that contains three fusion modules to enhance feature extraction. We used Yolov5x as the teacher network to perform feature-based knowledge distillation on the obtained network.
The remainder of this paper is organized as follows. Section 2 summarizes the study area used for landslide hazard detection. Section 3 details the lightweight landslide target detection network based on CenterNet. Section 4 describes and analyzes our results. Section 5 details our conclusions.

2. Study Area

The main data collection area of this experiment is located in the western region of Sichuan Province, China (e.g., Danba, Jiuzhaigou, Wenchuan, and Maoxian). Its latitude and longitude span 26°03′–34°19′N and 97°21′–102°31′E, respectively. The study area is about 537 km long from east to west and 921 km wide from north to south, as shown in Figure 4. The terrain in the western part of Sichuan Province is complex, mostly comprising plateaus, high mountains, and deep valleys, which lead to great fluctuations in altitude. The exposed formation lithology is also very complex. There are many slippery and brittle strata such as nitrate rock and layered metamorphic rock, and the geological activity in the fold fault structure is relatively strong. Coupled with frequent seismic activity and abundant rainfall, these factors make Sichuan one of the provinces most seriously affected by landslide disasters in the country.
Surrounded by mountains in western Sichuan, the mountain roads pass through a large number of slopes. Sudden landslides and collapses occur often due to geological and landform conditions, resulting in traffic interruption, as shown in Figure 5a. The water conservancy project has irrigated a large area of fertile fields in the Chengdu Plain. Numerous hydropower stations also provide sufficient electricity for economic development. However, the special geographical environment in the distribution area of these engineering facilities is also conducive to the development of landslides, as shown in Figure 5b, which, when mismanaged, can cause economic loss and loss of life. Therefore, providing fast and timely detection results for a large amount of image data can quickly characterize the situation of landslides, provide accurate solutions for landslide disasters, and avoid greater losses of personal safety and property.

3. Methods

3.1. Architecture Overview

We established a light landslide detection network for use in emergency scenarios via a process of simplification and improvements in accuracy based on a neural network. The goal of this project was to obtain a landslide detection model capable of fast detection with less loss of accuracy. We split the technical route into two stages: (1) a Mini-backbone network stage based on the attention mechanism; and (2) a stage to improve the accuracy of the lightweight network (Figure 6).

3.2. Mini Network Backbone of Attention Mechanism

Deep neural networks usually extract high-dimensional feature maps with thousands of dimensions of information from landslide images. However, high-dimensional feature maps often lead to redundancy of information and increase the amount of calculation. Taking ResNet50 as an example, Figure 7 shows that there are many similar feature maps. The key to designing a lightweight model while maintaining its accuracy is to effectively avoid extracting redundant feature maps and to instead extract effective feature maps with less computation.
The GhostModule is used to build the G-Bneck structure shown in Figure 8, which constitutes the backbone of the Mini feature extraction network. Among the high-dimensional feature maps extracted by G-Bneck, some of the features dominate the visual detection task, whereas others contribute less to the final detection result. Therefore, we introduced an attention mechanism structure module (CBAM) into the G-Bneck residual structure. For the input feature map $F_{in}$, we used the channel attention mechanism to apply a weight, which can be learned independently, to each channel of $F_{in}$. This operation improves the contribution of the main features to the final detection results and thus enriches the semantic features. The calculation formula is $F' = F_{in} \otimes M_c(F_{in})$, where $M_c$ represents the calculation operation of the channel attention mechanism. We then used the spatial attention mechanism based on $F'$ to learn the weight corresponding to each position in the width and height dimensions of the feature map. This operation increases the contribution of the main area of the image to the final detection result and strengthens the detection effect of the network on local areas. We obtained the final output feature $F_{out} = M_s(F') \otimes F'$, where $M_s$ is the spatial attention mechanism calculation operation. Figure 9 shows the two G-Bneck structures into which CBAM was introduced.
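The order of the two reweighting steps can be illustrated with a small pure-Python sketch. This is not the network's implementation: the learned layers inside CBAM (the shared MLP in the channel branch and the convolution in the spatial branch) are deliberately omitted, so the attention weights here come only from average and max pooling followed by a sigmoid. The function names and the toy feature map are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def channel_attention(feat):
    """One weight per channel from its average- and max-pooled response.
    Simplification (assumption): the learned shared MLP is left out."""
    weights = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weights.append(sigmoid(sum(values) / len(values) + max(values)))
    return weights

def spatial_attention(feat):
    """One weight per (row, col) from the mean and max across channels.
    Simplification (assumption): the learned convolution is left out."""
    rows, cols = len(feat[0]), len(feat[0][0])
    weights = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            column = [channel[i][j] for channel in feat]
            weights[i][j] = sigmoid(sum(column) / len(column) + max(column))
    return weights

def cbam(f_in):
    # Step 1: F' = F_in (x) M_c(F_in), one scalar weight per channel.
    mc = channel_attention(f_in)
    f_prime = [[[mc[c] * v for v in row] for row in ch]
               for c, ch in enumerate(f_in)]
    # Step 2: F_out = M_s(F') (x) F', one scalar weight per spatial position.
    ms = spatial_attention(f_prime)
    return [[[ms[i][j] * v for j, v in enumerate(row)]
             for i, row in enumerate(ch)] for ch in f_prime]

# Toy feature map laid out as [channel][row][col]: 2 channels of size 2 x 2.
f_in = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
f_out = cbam(f_in)
```

The output keeps the shape of the input, and every value is scaled by two factors in (0, 1), first per channel and then per position.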
Based on the original CenterNet with ResNet50, we used the G-Bneck to design the backbone of the lightweight landslide detection network CBAM-Mini. The image input size was (512 × 512 × 3). We divided the backbone network into five stages. Apart from the last residual structure, the size of the feature map was unchanged in each stage. In the last step before going to the next stage, we set the convolution stride to 2. We achieved the goal of continuously extracting high-dimensional features by reducing the size of the feature map and expanding the number of channels at the same time. Table 1 gives the parameters of each stage of the backbone feature extraction network based on the attention mechanism of Mini (CBAM-Mini). At this time, the total number of parameters in the network model is 3.17M.
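As a rough illustration of why a backbone built from Ghost modules stays small, the sketch below compares the weight count of an ordinary convolution with that of a Ghost module producing the same number of output channels. It follows the general Ghost module idea (a primary convolution generates part of the output and cheap depthwise operations generate the rest). The ratio of 2, the 3 × 3 cheap kernel, and the 112 to 480 channel example are assumptions chosen for illustration, not values reported for CBAM-Mini, and biases and normalization layers are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k=1, ratio=2, cheap_k=3):
    """Weights of a Ghost module under the assumptions stated above.

    The primary convolution produces c_out // ratio intrinsic maps; the
    remaining maps come from depthwise cheap_k x cheap_k operations,
    which need one small kernel per generated map.
    """
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = (c_out - intrinsic) * cheap_k * cheap_k
    return primary + cheap

# Hypothetical layer: 112 input channels, 480 output channels, 1 x 1 kernel.
standard = conv_params(112, 480, 1)
ghost = ghost_params(112, 480, 1)
saving = 1 - ghost / standard
```

Under these assumptions the Ghost module needs 29,040 weights against 53,760 for the ordinary convolution, a saving of roughly 46%.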
Figure 10 shows that, although the Mini network reduces the amount of computation required for the convolution operations, too few feature maps contain less semantic information, and it is difficult to fully represent the features of the original image. Figure 10c shows the poor generalization ability and fuzzy features of the model. After adding CBAM, we used the channel-based attention mechanism and the space-based attention mechanism to learn the weight of the landslide features, which enhanced the capture of landslide features (Figure 10b). Although the feature map was still blurred, it retained more features of the landslide.

3.3. Lightweight Network Accuracy Improvement Module

The CBAM-Mini network is limited by a smaller number of parameters than the large model, which will lead to a loss of accuracy. Therefore, we need to improve its ability to extract features. Based on the feature pyramid network, we used a lightweight multiscale feature fusion method to improve the feature extraction ability of the CBAM-Mini network and output the feature in the last layer. We then used the knowledge distillation method to transfer the knowledge of the large model to the small model to improve the detection accuracy of the latter.

3.3.1. Lightweight Multiscale Feature Fusion

In the decoding part of the CBAM-Mini network structure introduced in Figure 2b, the high-dimensional landslide feature map was only obtained directly from the last layer network of the backbone of the CBAM-Mini network. A rough landslide prediction result was obtained, and its characteristic map is shown in Figure 11c. Therefore, we modified the decoder part of the CBAM-Mini network. The method of up-sampling directly from the last layer of the backbone network was modified to combine feature maps of various sizes during encoding into different modules for feature fusion. A comprehensive feature map at multiple scales was then obtained, which improved the prediction results. Figure 11b shows the feature maps after feature fusion, where it can be seen that the feature maps not only capture the landslide target clearly but also display some of the background information.
There are three fusion modules for decoding different partial-resolution feature maps. Combined with Table 1, the (16 × 16) feature maps in the encoding process were up-sampled to a size of (32 × 32) through transposed convolution, and the (128 × 128) feature maps of stage 1 were down-sampled to a size of (32 × 32) through transposed convolution. We concatenated these maps with the (32 × 32) size feature map output from Stage 4 along the channel direction and then adjusted the number of channels through the GhostModule for feature fusion; the basic structure is shown in Figure 12a. The output result was then up-sampled and fused with the (64 × 64) feature map from Stage 2; the basic structure is shown in Figure 12b. Finally, this map was fused with the (128 × 128) feature map from Stage 1 and the (32 × 32) feature map from Stage 5 to obtain the final feature map; the basic structure is shown in Figure 12c. Figure 13 shows the CBAM-Mini multiscale feature fusion network, which improved the detection ability of objects on different scales.
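The shape bookkeeping of the first fusion module can be sketched as follows, using the feature map sizes quoted in the Introduction ((128 × 128 × 24), (32 × 32 × 112), and (16 × 16 × 480), fused to (32 × 32 × 120)). The sketch only tracks shapes. It assumes that resampling leaves the channel count unchanged, which the text does not state, and the helper names are hypothetical.

```python
def resample(shape, size):
    """Shape of an (H, W, C) map after resizing to size x size.
    Assumption: resampling keeps the number of channels."""
    _, _, channels = shape
    return (size, size, channels)

def concat_channels(shapes):
    """Shape after concatenating same-sized maps along the channel axis."""
    height, width, _ = shapes[0]
    if any(s[0] != height or s[1] != width for s in shapes):
        raise ValueError("spatial sizes must match before concatenation")
    return (height, width, sum(s[2] for s in shapes))

def adjust_channels(shape, c_out):
    """Stand-in for the GhostModule that sets the output channel count."""
    return (shape[0], shape[1], c_out)

high = (128, 128, 24)   # high-resolution encoder map, down-sampled across levels
same = (32, 32, 112)    # encoder map already at the target scale
low = (16, 16, 480)     # last encoder map, up-sampled by one level

stacked = concat_channels([resample(high, 32), same, resample(low, 32)])
fused = adjust_channels(stacked, 120)
```

Under these assumptions the concatenated tensor has 616 channels before the GhostModule reduces it to 120, which is why a cheap channel-adjustment layer matters at this point in the decoder.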

3.3.2. Feature-Based Knowledge Distillation

Yolov5x is a single-stage target detection algorithm with three prediction branches. The output feature map of the backbone network and decoder part of the teacher network contains a lot of information, which, if passed to the student network as knowledge, will then help it achieve a model generalization ability that is close to that of the teacher network. However, this information also includes some useless knowledge, which, if directly transferred to the student network, will then reduce the effect of knowledge distillation. Our proposed filter map filters out the useless information in the knowledge during the process of transferring knowledge from the teacher network. In the first stage of the calculation process for the filter map, the mask map [26] is obtained from the predicted box and the real label of the teacher network. The value of part of the prediction frame in the mask map [26] is 1, indicating that the foreground area of the image is the area containing the target. The parts of the image that are not within the prediction box have a value of 0, representing the background area. The confidence output of the teacher network contains the importance of the foreground information: the closer to the core part of the target, the greater the confidence, whereas the further away the core part of the target, the smaller the confidence. We based the confidence on the output of the teacher network. The calculation formula of the filter map is as follows:
$$\text{filter map} = \text{confidence} \times \text{mask map}, \tag{1}$$
where “filter map” represents the filter map, and “confidence” represents the confidence of the teacher network output. Multiplying the confidence and the mask map [26] gives the filter map.
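A minimal sketch of this computation on a toy 4 × 4 grid is given below. The box format (inclusive pixel indices), the confidence values, and the function names are assumptions made for illustration only.

```python
def make_mask(height, width, boxes):
    """Binary mask map: 1 inside any box, 0 elsewhere.
    boxes are (x_min, y_min, x_max, y_max) with inclusive pixel indices."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(y0, 0), min(y1, height - 1) + 1):
            for x in range(max(x0, 0), min(x1, width - 1) + 1):
                mask[y][x] = 1
    return mask

def make_filter_map(confidence, mask):
    """Element-wise product: filter map = confidence x mask map."""
    return [[c * m for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(confidence, mask)]

# Made-up teacher confidence that peaks near the centre of the target.
confidence = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.6, 0.1],
    [0.1, 0.7, 0.5, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
mask = make_mask(4, 4, [(1, 1, 2, 2)])
filter_map = make_filter_map(confidence, mask)
```

Background positions are zeroed regardless of their confidence, while foreground positions keep the teacher's confidence as their weight.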
We used Yolov5x as the teacher network and CBAM-Mini after lightweight feature fusion as the student network. The feature-based distillation process was divided into three steps. First, we input the feature layer output by Yolov5x into an adaptive layer consisting of (1 × 1) convolutions. This step kept the dimensionality of the output feature layer of the teacher network consistent with that of the CBAM-Mini network after lightweight multiscale fusion. The feature extraction layer dimensions of the CBAM-Mini network backbone part (Encoder) after multiscale fusion are 480, 80, and 40, respectively. The dimensions of the feature extraction layer of the CBAM-Mini network decoder part (Decoder) after multiscale fusion are 240 and 60, respectively. Second, by assigning discriminative weights to the regions covering objects in the filter map, we computed the squared difference loss between the feature layers guided by the filter map. We then back-propagated the loss.
The loss function is:
$$L_{loss} = L_{hard}(s, T) + \alpha L_{soft}. \tag{2}$$
The loss function in knowledge distillation is mainly composed of the real loss of detection and the distillation loss. $L_{loss}$ is the total loss value, and $L_{hard}$ is the real loss between the student network and the real label, where $s$ represents the student network, $T$ represents the real label, and $t$ represents the teacher network. $L_{soft}$ is the distillation loss between the student network and the teacher network. $\alpha$ is a hyperparameter that can be used to adjust the weight ratio of the real loss and the distillation loss, which was set to 1 here. In this experiment, the distillation loss was the squared difference loss between the output of the feature layer of the student network and the teacher network:
$$L_{soft} = \frac{1}{2N} \sum_{x=1}^{W} \sum_{y=1}^{H} \sum_{z=1}^{C} conf_t \, H_{xy} \left( f_{backbone}^{adap}(t)_{xyz} - s_{xyz} \right)^2, \tag{3}$$
where $N$ is the number of non-zero pixels in the filter map, which is used to normalize the result of the squared difference loss; $W$, $H$, and $C$ represent the width, height, and number of channels of each feature layer; $H_{xy}$ is the value of the mask map, with the foreground area marked as 1 and the background area marked as 0; $conf_t$ is the confidence level output by the teacher network, which is multiplied by the mask map to obtain the filter map that guides the distillation of the feature layer; and $f_{backbone}^{adap}(t)_{xyz}$ is the feature layer output by the teacher network after processing by the adaptive layer, which has the same dimensions as the feature layer $s_{xyz}$ output by the student network, so that the student features can be compared directly with the teacher features during distillation.
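The distillation loss and the combined loss can be written out directly as a small pure-Python sketch. The features are laid out as [channel][row][col], the teacher features are assumed to have already passed through the adaptive layer, and the filter map is assumed to be the product of the teacher confidence and the mask map. The numeric values, including the placeholder hard loss of 0.5, are invented for illustration.

```python
def soft_loss(teacher, student, filter_map):
    """Filter-map-weighted squared difference between teacher and student
    features, normalized by 2N, where N counts non-zero filter-map pixels."""
    n = sum(1 for row in filter_map for v in row if v != 0)
    if n == 0:
        return 0.0  # no foreground: nothing to distill
    total = 0.0
    for t_channel, s_channel in zip(teacher, student):
        for t_row, s_row, f_row in zip(t_channel, s_channel, filter_map):
            for t, s, f in zip(t_row, s_row, f_row):
                total += f * (t - s) ** 2
    return total / (2 * n)

def total_loss(hard, soft, alpha=1.0):
    """L_loss = L_hard + alpha * L_soft, with alpha = 1 as in the text."""
    return hard + alpha * soft

# Toy example: one channel, 2 x 2 feature maps.
filter_map = [[0.0, 0.5],
              [1.0, 0.0]]
teacher = [[[1.0, 2.0], [3.0, 4.0]]]
student = [[[0.0, 0.0], [1.0, 4.0]]]

soft = soft_loss(teacher, student, filter_map)   # (0.5*4 + 1.0*4) / (2*2)
loss = total_loss(hard=0.5, soft=soft)           # 0.5 is a placeholder value
```

Only the two foreground positions contribute, so the background mismatch at the top-left position does not affect the loss.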
Figure 14 shows three feature maps output by the student network backbone (Encoder) and the teacher network backbone (Encoder) before and after distillation. The feature maps (part b) before distillation have gradually lost the contour features of the target. The feature maps (part d) output by the teacher network maintain the outline of the target and can also distinguish the background information. Compared with the feature map before distillation (part b), the feature maps after distillation (part c) not only preserve the detailed features of the target but are also closer to the output feature map (part d) of the teacher network. When the distilled student network makes the final prediction based on the feature map, it is easier to obtain detection results that are similar to the teacher network. Thus, the detection performance of the student network is improved.

3.4. Evaluation Metrics

The AP (average precision) is commonly used as an indicator in the field of target detection to evaluate accuracy. It is a comprehensive evaluation index that combines the intersection over union, the confidence threshold, the precision, the recall, and the P–R curve; see Formulas (4)–(7).
  • Precision and recall:
Precision and recall are two common metrics for evaluating classification accuracy. The confusion matrix is used to represent the result of the classification. For a sample with a positive predicted value, if the actual value is positive, then it is referred to as a true positive (TP) value. If the true value is negative, then it is referred to as a false positive (FP) value. Similarly, for samples with a negative predicted value, if the true value is negative, then it is referred to as a true negative (TN) value. If the true value is positive, then it is referred to as a false negative (FN) value.
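The formula for calculating precision is:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$
The formula for calculating recall is:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$
The intersection over union (IoU) is an indicator used to measure the accuracy of the prediction frame. The calculation formula is:
$$\text{IoU} = \frac{|A \cap B|}{|A \cup B|} \tag{6}$$
where $A$ and $B$ denote the prediction box and the ground-truth box.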
In the experiment, it is generally believed that the prediction box is correctly predicted when IoU > 0.5, which is recorded as a positive class. When IoU < 0.5, it is considered that the prediction frame does not correctly predict the position of the object and it is recorded as a negative class.
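The three quantities above can be computed with a few lines of Python. The sketch assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in continuous coordinates, and the example boxes and counts are invented.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision(tp, fp):
    """Share of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0

pred = (0, 0, 10, 10)
truth = (5, 0, 15, 10)
overlap = iou(pred, truth)          # intersection 50, union 150
is_positive = overlap > 0.5         # threshold used in the text
```

Here the prediction covers half of the ground-truth box but the IoU is only one third, so it would be recorded as a negative at the 0.5 threshold.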
  • P-R curve:
Based on the set IoU threshold, the model judges a result larger than the threshold as a positive sample and a result smaller than the threshold as a negative sample. At this time, the corresponding recall rate and the precision rate of the result are returned. The entire P-R curve is generated by shifting the threshold from high to low.
  • Average precision:
The AP calculates the area enclosed by the P-R curve:
$$AP = \int_0^1 P(r)\,dr \tag{7}$$
This experiment calculated the average precision with an IoU threshold of 0.5, denoted as $AP_{50}$. Unless otherwise specified, the AP index in the accuracy evaluation reported here defaults to $AP_{50}$.
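One common way to evaluate this integral numerically is sketched below. It is an assumption that the experiments used this particular variant: detections are ranked by confidence, precision is made non-increasing in recall (all-point interpolation), and the area is summed over the recall steps. Each detection is assumed to have already been marked as a true or false positive at the 0.5 IoU threshold, and the number of labelled objects is assumed to be greater than zero.

```python
def average_precision(detections, num_truth):
    """Area under the P-R curve.

    detections: list of (confidence, is_true_positive) pairs.
    num_truth: number of labelled objects (assumed > 0).
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_truth)
        precisions.append(tp / (tp + fp))
    # Interpolation: precision at each point becomes the best precision
    # achieved at that recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Invented example: four detections, two labelled landslides.
detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
ap50 = average_precision(detections, num_truth=2)
```

Lowering the confidence threshold step by step is what the loop over the ranked detections does, which matches the description of the P-R curve being traced by shifting the threshold from high to low.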

4. Results and Discussion

4.1. Experimental Platform

Our hardware platform was based on an Intel i5-10400f CPU and an NVIDIA RTX2060 GPU with 16 GB memory. The software platform implemented the construction and training of the deep neural network models based on the PyTorch framework. PyTorch is an open-source deep learning framework using the Python programming language. It can process tensor data and it encapsulates common basic operation units (e.g., convolution, pooling, and fully connected layers), which is convenient for users customizing deep neural network structures. It also provides automatic differentiation of tensors and the optimization algorithms used in most model training.

4.2. Experimental Settings and Datasets

The model was trained with the Adam optimizer. The initial learning rate was set to 0.001. The weight_decay parameter was set to 1 × 10⁻⁵ to introduce a regularization term that mitigates overfitting. The learning rate followed an exponential decay schedule with a decay coefficient of 0.98 and a step size of 3.
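The decay schedule can be illustrated in plain Python. This sketch assumes that the step size is counted in epochs and that the learning rate is multiplied by the decay coefficient once every 3 epochs, which is how PyTorch's `torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.98)` behaves; the text does not say which scheduler class was used, so this reading is an assumption.

```python
def learning_rate(epoch, initial_lr=0.001, gamma=0.98, step_size=3):
    """Learning rate at a given epoch under step-wise exponential decay.

    Assumption: the rate is multiplied by `gamma` every `step_size` epochs.
    """
    return initial_lr * gamma ** (epoch // step_size)


# The rate stays constant within each block of three epochs.
for epoch in (0, 2, 3, 30):
    print(epoch, f"{learning_rate(epoch):.6f}")
```

Under this assumption the rate decays slowly: after 30 epochs it is still about 82% of its initial value (0.98 raised to the power 10).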
The dataset consists of 1200 optical remote sensing images of landslides in western Sichuan Province, China; Figure 15 shows part of it. The dataset was divided into a training set, a validation set, and a test set at a ratio of 8:1:1. Data augmentation was applied to the training set. The validation set was neither augmented nor involved in training and was used only to evaluate the accuracy after each training epoch. The test set was used to test the model.
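An 8:1:1 split of 1200 images gives 960 training, 120 validation, and 120 test images. The sketch below shows one way to produce such a split; the shuffling, the fixed seed, the function name, and the image identifiers are hypothetical, since the text does not describe how images were assigned to each subset.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical image identifiers standing in for the 1200 images.
image_ids = [f"img_{i:04d}" for i in range(1200)]
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))
```

Only the training subset would then be passed to the augmentation pipeline, in line with the description above.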

4.3. Experimental Analysis

Figure 16 shows part of the prediction results of the CBAM-Mini network after knowledge distillation and multiscale feature fusion (Light-Net). Recognition is best for block landslides with a clearly defined body. Narrow and long landslides can be detected by calculating the smallest circumscribed rectangle. The teacher network Yolov5x has an excellent detection ability for small targets; therefore, the lightweight network also has a better detection ability for small target landslides after knowledge distillation.
The current model still produces some missed and false detections (Figure 17). The image in Figure 17a is dark because of the lighting conditions, and some landslide areas (blue boxes) could not be detected. The landslide in the blue box in Figure 17b was not detected because it was texturally similar to the surrounding bare land. The bare land and the house in Figure 17c were incorrectly detected because they were similar in color to the landslide.

4.4. Model Comparison

We used CenterNet with ResNet50 as the original basic network (model 1, Figure 18, ①), the CBAM-Mini network (model 2, Figure 18, ②), and Yolov5x as the reference network (model 3, Figure 18, ③). Yolov5x is a classic one-stage algorithm and a widely used target detection network. Therefore, we chose it for comparison with our network. The proposed network (Light-Net; model 4, Figure 18, ④) was based on the CBAM-Mini network and added lightweight multiscale fusion and knowledge distillation.
  • Comparison of precision and recall curves:
Figure 18 shows the P-R curves, the accuracy-confidence threshold curves, and the recall-confidence threshold curves of the four models. Model 2 (Figure 18, ②) achieves a relatively high accuracy at a high confidence threshold, but its recall is low. Introducing feature extraction modules and knowledge distillation raises the recall of the model and therefore its AP. Model 1 (Figure 18, ①) balances accuracy and recall when the confidence threshold is set to 0.2–0.3. Model 3 (Figure 18, ③) performs well in terms of both accuracy and recall and is therefore suitable as a teacher network. The accuracy of the proposed network (model 4, Figure 18, ④) approaches 100% at a high confidence threshold, but its recall is then low. The proposed model performs best when the confidence threshold is about 0.3.
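The trade-off between the two confidence-threshold curves can be reproduced on toy data. The sketch below sweeps the confidence threshold and reports precision, recall, and F1 at each value; using F1 to pick the operating point is an assumption made here for illustration (the text does not name the criterion), and the detections are invented values that do not reproduce Figure 18.

```python
def sweep_thresholds(scores, is_tp, num_gt, thresholds):
    """Return (threshold, precision, recall, f1) for each confidence threshold."""
    rows = []
    for t in thresholds:
        # Keep only detections whose confidence reaches the threshold.
        kept = [tp for s, tp in zip(scores, is_tp) if s >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        # Convention chosen here: precision is 1.0 when nothing is detected.
        precision = tp / (tp + fp) if kept else 1.0
        recall = tp / num_gt
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom > 0 else 0.0
        rows.append((t, precision, recall, f1))
    return rows


# Toy detections, for illustration only.
scores = [0.95, 0.8, 0.6, 0.4, 0.3, 0.2]
is_tp = [True, True, False, True, True, False]
rows = sweep_thresholds(scores, is_tp, num_gt=5, thresholds=[0.1, 0.3, 0.5, 0.7, 0.9])
best = max(rows, key=lambda row: row[3])
for t, p, r, f1 in rows:
    print(f"t={t:.1f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print("best threshold by F1:", best[0])
```

As in the curves described above, raising the threshold pushes precision toward 1 while recall falls, so the best operating point lies at a moderate threshold.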

4.5. Model Size, Calculation Amount, and Comparison of Accuracy

Model 1 has a parameter size of 127.9 MB, and its average inference time in the experiment was 100 ms per image. Model 2 (Figure 18, ②) has a parameter size of 12.7 MB and took about 40 ms to infer an image. Model 3 (Figure 18, ③) has a parameter size of 86.7 MB and took about 67.79 ms to infer an image. We introduced lightweight multiscale fusion and knowledge distillation on the basis of model 2 (Figure 18, ②). By reducing the number of channels and replacing Conv with GhostModule in the decoding part, the parameter size was kept to 18.7 MB. However, because the backbone features are reused, the amount of calculation increased in feature fusion and knowledge distillation; thus, the inference time of the proposed network (model 4, Figure 18, ④) increased to about 52 ms. The specific data are shown in Table 2.
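The relative figures implied by Table 2 can be checked with a few lines of arithmetic. The sketch below uses only the values reported for model 1 (CenterNet with ResNet50) and model 4 (Light-Net); the dictionary layout and variable names are illustrative choices.

```python
# Values for model 1 and model 4 as reported in Table 2.
models = {
    "CenterNet-ResNet50": {"params_mb": 127.9, "ap": 67.55, "time_ms": 100.0},
    "Light-Net": {"params_mb": 18.7, "ap": 76.25, "time_ms": 52.0},
}
base = models["CenterNet-ResNet50"]
light = models["Light-Net"]

size_ratio = light["params_mb"] / base["params_mb"]  # model size relative to baseline
time_ratio = light["time_ms"] / base["time_ms"]      # inference time relative to baseline
ap_gain_points = light["ap"] - base["ap"]            # absolute AP gain (percentage points)
ap_gain_relative = ap_gain_points / base["ap"]       # relative AP gain

print(f"size: {size_ratio:.1%} of baseline")
print(f"inference time: {time_ratio:.0%} of baseline")
print(f"AP gain: {ap_gain_points:.2f} points ({ap_gain_relative:.1%} relative)")
```

On these numbers, Light-Net is about 14.6% of the baseline size, needs about half the baseline inference time, and gains 8.7 AP points (roughly 12.9% in relative terms).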

4.6. Landslide Emergency Detection

Figure 19 gives a flow chart of landslide emergency detection using the Light-Net model, showing the main types of landslides and their deformation characteristics. Unmanned aerial vehicles are used to collect images of landslides in disaster areas, and the Light-Net network quickly identifies them to generate valuable information.

5. Conclusions

Deep learning technology can extract rich semantic information from remote-sensing images to achieve target detection. We designed a lightweight landslide detection network for emergency scenarios. The network is based on the CenterNet object detection framework, and we used GhostModule to replace Conv to build a lightweight backbone feature extraction network. We also used the attention mechanism, lightweight multiscale feature fusion, and feature-based knowledge distillation to enhance feature extraction.
The conclusions drawn from this study are as follows:
  • The lightweight backbone feature extraction network constructed by GhostModule can reduce the number of parameters by 80–90% compared with traditional large networks such as ResNet50.
  • The use of the CBAM module increased the robustness of the extracted features. Instead of using CenterNet with ResNet50 decoding for direct up-sampling, we spliced the features of the decoding part with the feature maps of different modules of the encoding part. We then used Yolov5x for knowledge distillation, thus improving the prediction effect of the model. These operations improved the detection accuracy of the model.
  • Using the lightweight model directly leads to a large loss of accuracy because of its poor feature extraction ability. However, after combining lightweight feature enhancement modules and knowledge distillation modules, the feature extraction ability of the lightweight model can be improved with only a small increase in the number of parameters, even achieving a higher precision than the original large model.
The final size of the lightweight model is 18.7 MB, namely, 14.6% of the size of the original large model (CenterNet with ResNet50). The inference time of the lightweight model is 52 ms per image, about half that of the original model (100 ms for CenterNet with ResNet50). The model achieved an AP of 76.25%, which is 8.7 percentage points (about 12.9% in relative terms) higher than that of the original model (CenterNet with ResNet50). The model still showed some missed and false detections as a result of the limited datasets, manual labeling errors, lighting conditions, and bare land and houses that resemble landslides in color and texture.

Author Contributions

X.G.: Conceptualization, Methodology, Software, Investigation, Writing—Original draft and Review, Data curation, Funding acquisition. Q.Z.: Formal analysis, Data acquisition, Tests, Writing—Original draft. B.W.: Investigation, Formal analysis, Writing—Review. M.C.: Formal analysis, Writing—Review. All authors have read and agreed to the published version of the manuscript.

Funding

This study was jointly supported by the Key Laboratory of Natural Resources Monitoring in Tropical and Subtropical Area of South China, Ministry of Natural Resources No. 2022NRM005.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhou, Y.; Wang, H.; Yang, R.; Yao, G.; Xu, Q.; Zhang, X. A Novel Weakly Supervised Remote Sensing Landslide Semantic Segmentation Method: Combining CAM and cycleGAN Algorithms. Remote Sens. 2022, 14, 3650. [Google Scholar] [CrossRef]
  2. Lukić, T.; Bjelajac, D.; Fitzsimmons, K.E.; Marković, S.B.; Basarin, B.; Mlađan, D.; Micić, T.; Schaetzl, R.J.; Gavrilov, M.B.; Milanović, M.; et al. Factors Triggering Landslide Occurrence on the Zemun Loess Plateau, Belgrade Area, Serbia. Environ. Earth Sci. 2018, 77, 519. [Google Scholar] [CrossRef]
  3. Luino, F.; De Graff, J.; Biddoccu, M.; Faccini, F.; Freppaz, M.; Roccati, A.; Ungaro, F.; D’Amico, M.; Turconi, L. The Role of Soil Type in Triggering Shallow Landslides in the Alps (Lombardy, Northern Italy). Land 2022, 11, 1125. [Google Scholar] [CrossRef]
  4. Grima, N.; Edwards, D.; Edwards, F.; Petley, D.; Fisher, B. Landslides in the Andes: Forests Can Provide Cost-Effective Landslide Regulation Services. Sci. Total Environ. 2020, 745, 141128. [Google Scholar] [CrossRef] [PubMed]
  5. Tonini, M.; Cama, M. Spatio-temporal pattern distribution of landslides causing damage in Switzerland. Landslides 2019, 16, 2103–2113. [Google Scholar] [CrossRef]
  6. Al-Umar, M.; Fall, M.; Daneshfar, B. GIS based assessment of rainfall-induced landslide susceptibility of sensitive marine clays: A case study. Geomech. Geoeng.-Int. J. 2021, 17, 1458–1484. [Google Scholar] [CrossRef]
  7. Ma, Z.; Mei, G.; Piccialli, F. Machine learning for landslides prevention: A survey. Neural Comput. Appl. 2020, 33, 10881–10907. [Google Scholar] [CrossRef]
  8. Xia, W.; Chen, J.; Liu, J.; Ma, C.; Liu, W. Landslide Extraction from High-Resolution Remote Sensing Imagery Using Fully Convolutional Spectral–Topographic Fusion Network. Remote Sens. 2021, 13, 5116. [Google Scholar] [CrossRef]
  9. Chen, S.; Xiang, C.; Kang, Q.; Wu, T.; Liu, K.; Feng, L.; Tao, D. Multi-Source Remote Sensing Based Accurate Landslide Detection Leveraging Spatial-Temporal-Spectral Feature Fusion. J. Comput. Res. Dev. 2020, 57, 1877–1887. [Google Scholar]
  10. Hu, Q.; Zhou, Y.; Wang, S.; Wang, F.; Wang, H. Improving the Accuracy of Landslide Detection in “Off-site” Area by Machine Learning Model Portability Comparison: A Case Study of Jiuzhaigou Earthquake, China. Remote Sens. 2019, 11, 2530. [Google Scholar] [CrossRef]
  11. Long, L.; He, F.; Liu, H. The use of remote sensing satellite using deep learning in emergency monitoring of high-level landslides disaster in Jinsha River. J. Supercomput. 2021, 77, 8728–8744. [Google Scholar] [CrossRef]
  12. Tanatipuknon, A.; Aimmanee, P.; Watanabe, Y.; Murata, K.T.; Wakai, A.; Sato, G.; Hung, H.V.; Tungpimolrut, K.; Keerativittayanun, S.; Karnjana, J. Study on Combining Two Faster R-CNN Models for Landslide Detection with a Classification Decision Tree to Improve the Detection Performance. J. Disaster Res. 2021, 16, 588–595. [Google Scholar] [CrossRef]
  13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  14. Cheng, L.; Li, J.; Duan, P.; Wang, M. A small attentional YOLO model for landslide detection from satellite remote sensing images. Landslides 2021, 18, 2751–2765. [Google Scholar] [CrossRef]
  15. Ji, S.; Yu, D.; Shen, C.; Li, W.; Xu, Q. Landslide detection from an open satellite imagery and digital elevation model dataset using attention boosted convolutional neural networks. Landslides 2020, 17, 1337–1352. [Google Scholar] [CrossRef]
  16. Ullo, S.; Mohan, A.; Sebastianelli, A.; Ahamed, S.; Kumar, B.; Dwivedi, R.; Sinha, G.R. A New Mask R-CNN-Based Method for Improved Landslide Detection. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3799–3810. [Google Scholar] [CrossRef]
  17. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  19. Zhao, H.; Li, Z.W.; Zhang, T.Q. Attention Based Single Shot Multibox Detector. J. Electron. Inf. Technol. 2021, 43, 2096–2104. [Google Scholar]
  20. Ge, D.H.; Li, H.S.; Zhang, L.; Liu, R.Y.; Shen, P.Y.; Miao, Q.G. Survey of Lightweight Neural Network. J. Softw. 2020, 31, 2627–2653. [Google Scholar]
  21. Mehta, R.; Ozturk, C. Object detection at 200 Frames Per Second. arXiv 2018, arXiv:1805.06361. [Google Scholar]
  22. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 June 2017; pp. 6517–6525. [Google Scholar]
  23. Zhang, L.F.; Dong, R.P.; Tai, H.S.; Ma, K.S. PointDistiller: Structured Knowledge Distillation towards Efficient and Compact 3D Detection. arXiv 2022, arXiv:2205.11098. [Google Scholar]
  24. Zhang, J.; Ma, K. Improve object detection with feature-based knowledge distillation:towards accurate and efficient detectors. In Proceedings of the International Conference on Learning Representations, Online, 3–7 May 2021. [Google Scholar]
  25. Liu, Y.F.; Shu, C.Y.; Wang, J.D.; Shen, C.H. Structured knowledge distillation for dense prediction. arXiv 2019, arXiv:1903.0419. [Google Scholar] [CrossRef] [PubMed]
  26. Wang, T.; Yuan, L.; Zhang, X.; Feng, J. Distilling object detectors with fine-grained feature imitation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4933–4942. [Google Scholar]
Figure 1. Errors in detection using a lightweight landslide detection network. (a) Missing detection, (b) false detection, (c) failure to detect, and (d) a detection range that is too small. The red squares and the green squares are the landslide areas predicted by the model.
Figure 2. Two detection network frameworks. (a) CenterNet with ResNet50 and (b) CenterNet with CBAM-Mini-bone (hereinafter referred to as CBAM-Mini).
Figure 3. Lightweight multiscale fusion structure.
Figure 4. Location and remote sensing imagery of the study area. (a) Sichuan’s location in China. (b) The latitude and longitude extent of the study area. (c) Remote sensing images of the study area.
Figure 5. Landslide remote sensing images. (a) Image of landslides near highways. (b) Images of landslides near rivers. (c) Images of landslides of different shapes. The red squares are the landslide areas.
Figure 6. Flow chart of the technology used in this study.
Figure 7. The connected pairs of feature maps marked in this figure are very similar.
Figure 8. The two G-Bneck structures. (a) Stride = 1 bottleneck. (b) Stride = 2 bottleneck.
Figure 9. Schematic diagram of the G-Bneck structure introduced into CBAM (G-Bneck-CBAM).
Figure 10. Comparison of the (64 × 64) size features extracted by the Mini network with and without the attention mechanism (CBAM). (a) The original images, (b) the features extracted with CBAM, and (c) the features extracted without CBAM.
Figure 11. Comparison of the features extracted by the CBAM-Mini network with and without lightweight multiscale fusion. (a) Original image, (b) features extracted using lightweight multiscale fusion, and (c) features extracted without using lightweight multiscale fusion.
Figure 12. Lightweight multiscale fusion module. (a) The (32 × 32) feature maps in decoding, (b) the (64 × 64) feature maps in decoding and (c) the (128 × 128) feature maps in decoding.
Figure 13. The lightweight multiscale fusion CBAM-Mini network.
Figure 14. The feature maps. (a1–a3) are the original landslide images. (b1–b3) are feature maps of the student network (CBAM-Mini after lightweight feature fusion) in encoding. (c1–c3) are feature maps of the student network after distillation in decoding. (d1–d3) are feature maps of the teacher network (Yolov5x) in encoding.
Figure 15. Partial dataset of landslides.
Figure 16. Part of the detection results of landslide detection using our model.
Figure 17. Illustration of the results with partial error detection. (a) Landslide detection in a dark environment, (b) detection of landslides similar to the surrounding environment, and (c) detection of landslides in residential areas.
Figure 18. Evaluation index curve of each model. (A) P-R curve; (B) the accuracy-confidence threshold curve; (C) the recall-confidence threshold curve.
Figure 19. Landslide emergency detection flow chart.
Table 1. Parameters in each stage of the CBAM-Mini network backbone feature extraction network.

| Stage | Input | Convolution | Output Channels | Stride |
|---|---|---|---|---|
| | 512 × 512 × 3 | Conv2d 3 × 3 | 16 | 2 |
| Stage one | 256 × 256 × 16 | G-bneck-CBAM | 16 | 1 |
| | 256 × 256 × 16 | G-bneck-CBAM | 24 | 2 |
| Stage two | 128 × 128 × 24 | G-bneck-CBAM | 24 | 1 |
| | 128 × 128 × 24 | G-bneck-CBAM | 40 | 2 |
| Stage three | 64 × 64 × 40 | G-bneck-CBAM | 40 | 1 |
| | 64 × 64 × 40 | G-bneck-CBAM | 80 | 2 |
| Stage four | 32 × 32 × 80 | G-bneck-CBAM | 80 | 1 |
| | 32 × 32 × 80 | G-bneck-CBAM | 80 | 1 |
| | 32 × 32 × 80 | G-bneck-CBAM | 80 | 1 |
| | 32 × 32 × 112 | G-bneck-CBAM | 112 | 1 |
| | 32 × 32 × 112 | G-bneck-CBAM | 112 | 1 |
| | 32 × 32 × 112 | G-bneck-CBAM | 112 | 2 |
| Stage five | 16 × 16 × 160 | G-bneck-CBAM | 160 | 1 |
| | 16 × 16 × 160 | G-bneck-CBAM | 160 | 1 |
| | 16 × 16 × 160 | G-bneck-CBAM | 160 | 1 |
| | 16 × 16 × 160 | G-bneck-CBAM | 160 | 1 |
| | 16 × 16 × 160 | Conv2d 1 × 1 | 480 | 1 |
| Output | 16 × 16 × 480 | | | |
Table 2. Comparison of parameters of the four models.

| Number | Method | Params (MB) | Precision (%) | Recall (%) | AP (%) | Inference Time (ms) |
|---|---|---|---|---|---|---|
| ① | ResNet50 | 127.9 | 80.5 | 67.8 | 67.55 | 100 |
| ② | CBAM-Mini | 12.7 | 65.43 | 52.86 | 53.4 | 40 |
| ③ | Yolov5x | 86.7 | 91 | 94 | 92.01 | 67.8 |
| ④ | Light-Net | 18.7 | 82.8 | 78.1 | 76.25 | 52 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ge, X.; Zhao, Q.; Wang, B.; Chen, M. Lightweight Landslide Detection Network for Emergency Scenarios. Remote Sens. 2023, 15, 1085. https://doi.org/10.3390/rs15041085
