Shape-Constrained ResU-Net for Old Landslides Detection in the Loess Plateau

Peng, Lulu; Ding, Mingtao; Xue, Qiang; Dong, Ying; Li, Yunlong; Zhou, Pengxiang; Li, Zhenhong

doi:10.3390/app16010546

by Lulu Peng ^1,2,3, Mingtao Ding ^1,2,3,*, Qiang Xue ^4,5, Ying Dong ^4,5,*, Yunlong Li ^6, Pengxiang Zhou ^1,2,3 and Zhenhong Li ^1,2,3

¹ State Key Laboratory of Loess, Chang’an University, Xi’an 710054, China

² College of Geological Engineering and Geomatics, Chang’an University, Xi’an 710054, China

³ Big Data Center for Geosciences and Satellites, Chang’an University, Xi’an 710054, China

⁴ Key Laboratory for Geo-Hazards in Loess Area, Ministry of Natural Resources, Xi’an 710119, China

⁵ Xi’an Center of China Geological Survey, Xi’an 710119, China

⁶ Liyuan Power Generation Branch of Yunnan Huadian Jinsha River Middle Reaches Hydropower Development Co., Ltd., Kunming 650228, China

^* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(1), 546; https://doi.org/10.3390/app16010546

Submission received: 25 November 2025 / Revised: 24 December 2025 / Accepted: 2 January 2026 / Published: 5 January 2026


Abstract

The Loess Plateau is highly susceptible to landslides due to its fragile geological structure and frequent human activities, particularly old landslides with historical structural damage. The features of these landslides in remote sensing images become blurred over time, leading to huge challenges in detection. Considering that old landslides exhibit obvious shape characteristics, we propose ResU-SPMNet, a deep learning model that integrates shape characteristics into the baseline ResU-Net. The proposed model consists of three components: ResU-Net, shape prior module (SPM), and the atrous spatial pyramid pooling (ASPP) module, which jointly enhance segmentation performance from the perspectives of shape constraints and multi-scale feature representation. To validate the effectiveness of the proposed approach, old landslides in representative regions of the Loess Plateau were selected as the study targets. Results show that the proposed model outperforms ResU-Net, SegNet, MultiResUnet, and DeepLabv3+ in old landslide segmentation, achieving an F1-score of 0.6669 and an MCC of 0.6167. Moreover, generalization tests conducted in independent regions indicate that the model exhibits strong robustness across different seasons. The best performance is achieved in summer, whereas performance declines in winter due to adverse factors such as reduced illumination and snow or ice cover.

Keywords: deep learning; old landslides; shape prior module; atrous spatial pyramid pooling; ResU-SPMNet

1. Introduction

The Loess Plateau is one of the regions in China most severely affected by frequent landslide disasters, among which old landslides are particularly typical, characterized by large volume, high hazard potential, and strong concealment [1,2]. Old landslides are the products of long-term geomorphological evolution under complex geological conditions and are generally defined as landslides that have previously occurred, forming loose deposits with significant structural damage [3]. In recent years, the intensification of extreme weather events and increased geological activity [4] have substantially increased the probability of reactivation in old landslides, posing serious threats to human life and property [5,6]. Precise detection of old landslides in the Loess Plateau is crucial for accurately assessing the risk of reactivation and ensuring regional safety.

Traditional landslide detection methods primarily rely on manual visual interpretation and field surveys, which depend on expert knowledge to analyze landslide morphology and boundaries in remote sensing images. Although relatively accurate, these methods are inherently subjective and inefficient, rendering them unsuitable for large-scale disaster monitoring [4,7]. With advances in remote sensing technology, machine learning methods have been increasingly applied to landslide detection. Common methods include support vector machines (SVMs) [8,9,10], random forests (RFs) [11,12], and k-means clustering algorithms [13,14]. Compared with manual interpretation, machine learning offers a higher degree of automation. However, the effectiveness of these models largely hinges on feature selection and engineering, which require extensive domain expertise and are often time-consuming.

As an advanced extension of neural networks, deep learning exhibits superior feature learning capabilities and has achieved remarkable success in landslide detection in recent years [15,16,17,18]. Liu et al. [19] proposed FFS-Net, a feature fusion semantic segmentation network that integrates heterogeneous high-resolution remote sensing images and digital elevation models, significantly improving the detection accuracy of visually ambiguous old landslides with similar color and texture to their surroundings. Lu et al. [20] introduced an iterative classification semantic segmentation network (ICSSN) based on multi-task learning, improving the F1-score from 0.5054 to 0.5448. Jiang et al. [17] proposed a landslide detection method simulating hard samples, achieving a 10% improvement in F1-score over the baseline Mask-RCNN model. Although these methods have advanced the accuracy of old landslide detection, their F1-scores remain below 0.6. Hence, reliable detection of old landslides remains a significant challenge.

The morphological characteristics of landslides are largely influenced by the downslope movement of materials and the terrain-controlled source-travel-deposition process, resulting in obvious shape characteristics that have played an important role in field investigations of landslides [21,22,23,24]. Existing studies demonstrate that landslides exhibit distinct geometric structural features in their morphology. For instance, Taylor et al. [25] utilized ellipticity and aspect ratios to model landslide morphology, revealing statistical patterns of landslide shapes through large-scale landslide databases. Lan et al. [26,27] noted that although the traditional length-to-width ratio can characterize the overall extent of a landslide, it is insufficient to reflect the width variation along the sliding path. They thus proposed a new method for generating the landslide centerline based on digital elevation models and polygon boundaries, which quantifies the landslide’s geometric morphology from the longitudinal profile perspective. Liu et al. [28] systematically classified landslide samples in the Moxi Town area, identifying nine typical morphological types, including tabular, droplet-shaped, fan-shaped, Y-shaped, and rhomboidal forms. Furthermore, they revealed the characteristic distribution patterns of landslide morphology, which are controlled by multiple factors such as topographic relief, slope gradient conditions, and the initiation location. Niculita et al. [29] proposed an algorithm for automated estimation of landslide length and width to address inherent limitations of conventional GIS methods in morphological analysis. Collectively, these studies show that landslides typically exhibit distinct morphological patterns, which not only reflect the formation and evolution processes of the landslides but can also serve as effective criteria for identification.

Although shape features have played an important role in many field investigations of landslides, research on their effective application in deep learning to enhance detection performance remains relatively scarce. Moreover, due to prolonged natural evolution, old landslides gradually acquire surface color and texture characteristics that resemble the surrounding environment, leading to high visual ambiguity and significantly increasing the difficulty of detection [15,30].

To address these challenges, this study focuses on old landslides in the Loess Plateau and proposes an improved model, ResU-SPMNet, which integrates shape priors and multi-scale features. The principal contributions are as follows: Firstly, a shape prior module is introduced [31], which collaboratively extracts the global contours and local details of landslides through self-update and cross-update blocks, and transmits the learned shape information to the decoder, effectively enhancing the model’s perception and constraint of landslide geometries. Secondly, an atrous spatial pyramid pooling module is incorporated to strengthen the model’s ability to extract landslide features at multiple scales, thereby minimizing the loss of valuable information during training [32,33,34,35,36]. Finally, systematic performance evaluation experiments are conducted, and the results show that ResU-SPMNet significantly outperforms other semantic segmentation models on several metrics, markedly enhancing the model’s ability to capture landslide shapes, and demonstrating higher segmentation accuracy and generalization ability.

2. Study Area and Data

2.1. Overview of the Study Area

The study area for old landslides is situated in the Loess Plateau, as illustrated in Figure 1. The Loess Plateau is situated in northwestern China and represents the largest and thickest loess deposit region in the world. The region exhibits an overall west-to-east descending topographic trend, with elevations ranging from approximately 800 to 3000 m, and is characterized by highly dissected terrain with dense gully networks. The climate is classified as a warm temperate continental monsoon climate, featuring hot and humid summers and cold, dry winters. Annual precipitation is relatively low and unevenly distributed, with the majority occurring during the summer season. The soils are loose, vegetation cover is sparse, and soil erosion is severe. Under the combined influence of these distinctive geomorphological and hydroclimatic conditions, the Loess Plateau has become one of the most landslide-prone regions in China.

The study area consists of four typical regions: A, B, C, and D. Among them, regions A, B, and C are used to construct the old landslide dataset, while region D is used for testing the model’s generalization ability. The regions exhibit significant differences in topography and geomorphology. Region A is located in Haidong City, Qinghai Province, with geographic coordinates ranging from 102°23′ to 103°39′ E and 35°47′ to 36°12′ N. This region lies at the transitional zone between the Loess Plateau and the Tibetan Plateau, with an elevation ranging from 1800 to 3900 m. Landslide bodies in this region are medium to large in scale and are commonly found on steep slopes and the edges of valleys. Region B is located in Tianshui City, Gansu Province, with geographic coordinates ranging from 105°20′ to 106°20′ E and 34°44′ to 35°11′ N. It is situated in the Loess Hills of the Weihe River basin in the southeastern part of the Loess Plateau, characterized by loess hills and river valley landforms. The elevation ranges from approximately 1200 to 2000 m. Landslides in this region are generally small to medium in scale but occur with high frequency. They predominantly occur at the foot of slopes and on steep banks. Region C is located in Xianyang City, Shaanxi Province, with geographic coordinates ranging from 108°29′ to 108°58′ E and 34°26′ to 34°50′ N. The elevation is relatively low, ranging from approximately 390 to 1600 m. Landslides mainly occur at the edges of the loess tableland, with small-scale landslide bodies usually triggered by rainfall or artificial slope cutting. Region D is located in Xiji County, Guyuan City, Ningxia Hui Autonomous Region, with geographic coordinates ranging from 105°20′ to 106°04′ E and 35°35′ to 36°14′ N. It lies in the southern part of Ningxia in the Loess Hills, with an elevation ranging from approximately 1500 to 2800 m.

Li et al. [27] proposed the concept of landslide longitudinal shape, suggesting that variations in landslide width along the movement direction are not merely geometric characteristics but also reflect, to some extent, the kinematic constraints and evolutionary processes of landslides. In that study, landslide shapes were classified into rectangular, widening, narrowing, spindle, and hourglass. Building upon this theoretical framework and integrating extensive field investigations with remote sensing interpretations, our study classifies landslides into two major types: widening and rectangular, as illustrated in Figure 2. Further subdivision was not pursued because the remaining landslide types in the study area are relatively scarce, and the sample size is insufficient to support stable training of deep learning models.

2.2. Production of Experimental Data and Sample Set

The dataset includes optical remote sensing imagery and DEM data. Optical remote sensing images present texture and shape features of the landslides, while DEM data quantitatively constrain sliding directions and depositional patterns via terrain parameters including elevation, slope gradient, and curvature indices, thereby determining ultimate morphological configurations. Integration of these data sources enables models to concurrently capture spectral features and geomorphological characteristics of landslides during training, thus enhancing comprehensive discrimination capabilities for landslide boundaries and morphological structures.

The optical remote sensing images used in this study were obtained from Google Earth, and the DEM data were obtained from local geological institutions. The spatial resolution of the optical images is 2 m and that of the DEM data is 8 m; the imagery spans a temporal range from 2015 to 2024. During dataset construction, historical images from different years and seasons were manually inspected in Google Earth, and those with the clearest landslide morphological characteristics were selected. The dataset construction process is summarized as follows. First, old landslide locations were identified by integrating optical remote sensing imagery with field survey results. Optical remote sensing images and DEM data covering the study area were then collected. The DEM data were resampled to match the spatial resolution of the optical imagery. After resampling, the DEM values were normalized to a range of 0–255. Next, the optical remote sensing imagery and the normalized DEM data were band-composited to form four-band image data. Subsequently, the “Data Sample Production” tool in the SVM-LSM sample production toolbox [5] was used for batch cropping to obtain image patches. For annotation, the open-source image labeling tool Labelme was employed. The cropped image patches were imported into Labelme, and semantic regions corresponding to each class were manually delineated using the drawing tools provided by the software. Labelme supports multiple annotation types, including polygons, rectangles, and free-form shapes, and each annotated region was assigned a corresponding semantic class label. After annotation, the results were saved as JSON files, which contain the coordinate information and class labels for each annotated region. Finally, the JSON-format annotations were converted into TIFF-format label files required for this study. All images and corresponding labels were resized to 512 × 512 pixels. Each image consists of four channels, while each label is a single-channel raster.
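The normalization and band-compositing steps described above can be sketched in a few lines. This is a minimal illustration only: the text states that DEM values were normalized to 0–255 but does not give the formula, so min-max scaling is an assumption here, and the function names `normalize_dem` and `stack_bands` are hypothetical. Resampling the DEM from 8 m to 2 m is assumed to have been done beforehand.

```python
def normalize_dem(dem, out_max=255):
    """Min-max scale a 2-D DEM (list of rows) to integers in [0, out_max].

    Min-max scaling is an assumption; the source only states the target range.
    """
    flat = [v for row in dem for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # perfectly flat terrain: avoid division by zero
        return [[0 for _ in row] for row in dem]
    return [[round((v - lo) * out_max / (hi - lo)) for v in row] for row in dem]


def stack_bands(rgb, dem_band):
    """Append the DEM value to each RGB pixel, giving four values per pixel."""
    return [
        [tuple(pixel) + (d,) for pixel, d in zip(rgb_row, dem_row)]
        for rgb_row, dem_row in zip(rgb, dem_band)
    ]


# Toy 2 x 2 tile (invented values): elevations in metres and matching RGB pixels.
dem = [[1800.0, 2000.0], [2300.0, 2600.0]]
rgb = [[(120, 110, 90), (130, 115, 95)], [(125, 112, 92), (140, 120, 100)]]

dem_255 = normalize_dem(dem)      # lowest cell maps to 0, highest to 255
tile = stack_bands(rgb, dem_255)  # each pixel is now (R, G, B, DEM)
```

Scaling per tile, as in this toy example, would discard absolute elevation differences between tiles; whether the authors normalized per tile or over the whole study area is not stated.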

Owing to long-term fluvial erosion in the Loess Plateau region, extensive gully landforms and steep slopes have developed. These landforms exhibit optical characteristics similar to those of landslides, which can easily lead to misclassification and reduced detection accuracy. Therefore, typical non-landslide samples, such as eroded gullies and slope-foot deposits, were deliberately included in the dataset to strengthen the model’s ability to discriminate landslides from morphologically similar landforms, as shown in Figure 3. The dataset includes 1300 positive samples and 650 negative samples. The dataset was divided into training, validation, and testing subsets in a ratio of 7:2:1, as detailed in Table 1. Due to the limited number of landslide samples, data augmentation techniques including horizontal flipping, vertical flipping, and rotation were applied to expand the dataset.
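The augmentation step can be illustrated with a small sketch on nested lists. The key point is that each geometric transform must be applied identically to the image and its label so that the mask stays aligned. The rotation angle is not given in the text; a 90° clockwise rotation is assumed here, and the helper names are hypothetical.

```python
def hflip(grid):
    """Mirror a 2-D grid left-right."""
    return [row[::-1] for row in grid]


def vflip(grid):
    """Mirror a 2-D grid top-bottom."""
    return grid[::-1]


def rot90(grid):
    """Rotate a 2-D grid 90 degrees clockwise (assumed angle)."""
    return [list(row) for row in zip(*grid[::-1])]


def augment(image, label):
    """Return the original pair plus flipped and rotated copies.

    The same transform is applied to image and label so they stay aligned.
    """
    pairs = [(image, label)]
    for transform in (hflip, vflip, rot90):
        pairs.append((transform(image), transform(label)))
    return pairs


# Toy single-band 2 x 2 patch and its binary landslide mask (invented values).
image = [[1, 2], [3, 4]]
label = [[0, 1], [0, 0]]
pairs = augment(image, label)  # four (image, label) pairs
```

In the rotated pair, the pixel with value 2 and its landslide label both move from the top-right to the bottom-right corner, which is the alignment property the augmentation must preserve.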

3. Research Methods

3.1. Shape Prior Module

Conventional convolutional neural networks exhibit limited receptive fields, leading to insufficient utilization of contextual information. Although existing approaches such as channel/spatial attention mechanisms [37,38,39] can enhance local features relevant to specific tasks, they still fail to effectively expand the model’s global receptive field. Furthermore, traditional shape models, including atlas-based models [40] and statistical shape models [41], provide geometric constraints but suffer from high computational costs and are difficult to integrate into deep learning frameworks. In contrast, the SPM [31] simultaneously expands receptive fields and enables seamless integration with existing networks.

The shape prior module consists of two components: a self-update block and a cross-update block. Its inputs are the original shape prior S_{o} and the original skipped features F_{o}. The original shape prior is a learnable parameter that is optimized during training: its vectors are randomly initialized, treated as part of the model parameters, and updated together with the network weights via backpropagation, gradually converging toward optimal shape representations. The original skipped features are the image features extracted by the encoder. The self-update block employs a self-attention mechanism to extract, from the learnable shape prior, a global shape prior S_{G} that represents the overall contour and contextual information of the target class. The global shape prior is then fed into the cross-update block to guide and enhance the encoder-extracted image features, while the detail-rich image features are used to refine the coarse global shape prior, thereby generating a fine-grained local shape prior S_{L}. Finally, the local shape prior containing detailed information is combined with the global shape prior containing contextual information to produce the enhanced shape prior S_{e}. In this process, the self-update block provides global shape guidance, and the cross-update block uses this guidance to enhance texture features and, in turn, refines the shape prior with rich texture details. The updated shape prior can be used as the input to the next SPM iteration, enabling continuous refinement of shape representations throughout training. This design allows the SPM to simultaneously model global structure and local details, significantly enhancing the segmentation network’s ability to perceive and represent target shapes. The SPM network structure is illustrated in Figure 4.

3.1.1. Self-Update Block

The self-update block updates the shape prior information. Its input is the original shape prior S_{o}, which is an initialized, learnable parameter. Each channel of the shape prior corresponds to a class-specific shape prototype, providing the network with an explicit geometric structural prior. The self-update block utilizes a self-attention mechanism to dynamically reweight the input features, enabling the extraction of more salient information and enhancing the model’s representational capacity. The core of the self-attention mechanism lies in the interaction among the query (Q), key (K), and value (V) components. The original shape prior is projected into three distinct feature spaces to generate Q_{S}, K_{S}, and V_{S}, which form the basis for self-attention computation. Through matrix multiplication, scaling, and Softmax normalization, the similarities between the shape priors of different semantic classes are computed to generate the attention map S_{map}, an N × N matrix that captures the global shape dependencies within the semantic space. A weighted summation of the value vectors V_{S} using the attention map S_{map} yields an initial representation containing coarse-grained shape information. The aggregated features are then passed through a multilayer perceptron to apply nonlinear transformations, enhancing the model’s representational capacity and enabling the learning of more complex shape patterns. Layer normalization is employed to stabilize the training process and accelerate convergence. The original shape prior is fused with the learned global information through residual connections to ensure training stability and prevent gradient vanishing, ultimately producing the global shape prior S_{G}. This procedure effectively extracts coarse structural shape information spanning the entire input image from the original prior, providing global guidance for subsequent enhancement of skip-connection features. In this manner, a randomly initialized vector is refined through self-attention interactions into a global shape prior that encodes the statistical regularities of target shapes. The corresponding formulation is given as follows:

S_{map} = \mathrm{Softmax}\left(\frac{Q_{S}(S_{o}) \times K_{S}(S_{o})^{T}}{\sqrt{N}}\right)

(1)

S^{'} = \mathrm{LN}(S_{map} \times V_{S}(S_{o})) + S_{o}

(2)

S_{G} = \mathrm{LN}(\mathrm{MLP}(S^{'})) + S^{'}

(3)

where Q_{S}, K_{S}, and V_{S} denote the convolutional transformations that project S_{o} into the query, key, and value vectors, respectively; LN denotes layer normalization; MLP denotes the multilayer perceptron; T is the transpose operator; N is the number of classes; and S_{G} is the global shape prior.
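To make the data flow of Equations (1)–(3) concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the shape prior is assumed to be stored as an N × L matrix (one flattened prototype per class), the projections Q_{S}, K_{S}, and V_{S} are replaced by identity maps, the MLP defaults to an element-wise ReLU stand-in, and layer normalization has no learnable scale or shift. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable Softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def layer_norm(a, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learnable affine)."""
    out = []
    for row in a:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(a):
    return [[max(0.0, v) for v in row] for row in a]


def self_update(s_o, mlp=relu):
    """Sketch of Eqs. (1)-(3) with identity projections (an assumption)."""
    n = len(s_o)                      # N: number of classes
    q, k, v = s_o, s_o, s_o           # stand-ins for Q_S, K_S, V_S
    scores = matmul(q, transpose(k))  # N x N similarity between class prototypes
    s_map = softmax_rows([[x / math.sqrt(n) for x in row] for row in scores])  # Eq. (1)
    s_prime = add(layer_norm(matmul(s_map, v)), s_o)                           # Eq. (2)
    s_g = add(layer_norm(mlp(s_prime)), s_prime)                               # Eq. (3)
    return s_map, s_g


# Toy prior (invented values): N = 2 classes, each prototype flattened to length 4.
s_o = [[0.2, -0.1, 0.4, 0.0], [0.5, 0.3, -0.2, 0.1]]
s_map, s_g = self_update(s_o)
```

Each row of `s_map` sums to one, so every class prototype in the output is a convex combination of all prototypes before the residual connection adds the original prior back.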

3.1.2. Cross-Update Block

The cross-update block combines the input feature map with the shape prior information. Although the global shape prior provides valuable contextual information, it lacks the granularity necessary for accurate representation of detailed shapes and contours. To enhance the model’s capacity to capture fine-grained morphological structures of landslides, a cross-update block is introduced. Its inputs are the global shape prior and the image features extracted by the encoder. The cross-update block first processes the input features using two residual convolutional blocks to progressively enhance feature representations. Subsequently, the global shape prior S_{G} is upsampled to match the spatial resolution of the encoder’s original skipped features F_{o}. F_{o} and the upsampled S_{G} are then projected into a query vector Q_{C} and a key vector K_{C}, respectively, in preparation for cross-attention computation. Through matrix multiplication, scaling, and Softmax normalization, the correlation between each pixel and the shape prototypes of all semantic classes is computed, resulting in the cross-attention map C_{map}, which is a C × N matrix. This map establishes spatial correspondences between encoder-extracted features and the shape prior, determining how shape knowledge guides the enhancement of image features. By applying C_{map} to the value vectors V_{C} projected from the upsampled S_{G}, a shape-guided signal is generated. This signal is then added to the original skipped features F_{o} to produce the enhanced skipped features F_{e}, which preserve the original fine-grained texture details while incorporating global contextual information consistent with the target shape. The enhanced skipped features are subsequently convolved and downsampled to extract a finer local shape prior S_{L}. The global shape prior S_{G}, which encodes overall contours, is combined with the local shape prior S_{L}, which captures fine details, to obtain the refined shape prior S_{e}. The refined shape prior S_{e} represents a complete update of the original shape prior S_{o}, embedding richer shape information. The corresponding formulation is given as follows:

C_{map} = \mathrm{Softmax}\left(\frac{Q_{C}(F_{o}) \times K_{C}(\mathrm{Upsample}(S_{G}))^{T}}{\sqrt{N}}\right)

(4)

F_{e} = C_{map} \times V_{C}(\mathrm{Upsample}(S_{G})) + F_{o}

(5)

S_{L} = \mathrm{Downsample}(\mathrm{Conv}(F_{e}))

(6)

S_{e} = S_{L} + S_{G}

(7)

where Q_{C} denotes the convolutional transformation that projects F_{o} into the query vector; K_{C} and V_{C} denote the convolutional transformations that project the upsampled S_{G} into the key and value vectors, respectively; T is the transpose operator; and N is the number of classes.
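The cross-attention step in Equations (4) and (5) can be traced with a small plain-Python sketch. It rests on several assumptions that are not stated in the text: F_{o} is stored as a C × L matrix (C channels, L flattened spatial positions), the global shape prior has already been upsampled to an N × L matrix, and the projections Q_{C}, K_{C}, and V_{C} are replaced by identity maps. Equations (6) and (7) are omitted because they depend on a learned convolution. The function name `cross_attend` is hypothetical.

```python
import math


def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(a):
    """Numerically stable Softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attend(f_o, s_g_up):
    """Sketch of Eqs. (4)-(5) with identity projections (an assumption).

    f_o    : C x L skipped features
    s_g_up : N x L global shape prior, assumed already upsampled
    """
    n = len(s_g_up)                           # N: number of classes
    q, k, v = f_o, s_g_up, s_g_up             # stand-ins for Q_C, K_C, V_C
    k_t = [list(col) for col in zip(*k)]      # L x N
    scores = matmul(q, k_t)                   # C x N
    c_map = softmax_rows([[x / math.sqrt(n) for x in row] for row in scores])  # Eq. (4)
    guided = matmul(c_map, v)                 # C x L shape-guided signal
    f_e = [[g + f for g, f in zip(g_row, f_row)]
           for g_row, f_row in zip(guided, f_o)]                               # Eq. (5)
    return c_map, f_e


# Toy inputs (invented values): C = 3 channels, N = 2 classes, L = 4 positions.
f_o = [[0.1, 0.4, -0.3, 0.2], [0.0, -0.2, 0.5, 0.1], [0.3, 0.3, 0.0, -0.1]]
s_g_up = [[0.2, -0.1, 0.4, 0.0], [0.5, 0.3, -0.2, 0.1]]
c_map, f_e = cross_attend(f_o, s_g_up)
```

Under this layout the attention map has C rows and N columns, matching the C × N size given in the text, and the enhanced features keep the shape of F_{o} because the shape-guided signal is added residually.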

3.2. Atrous Spatial Pyramid Pooling

Traditional convolutional neural networks progressively downsample feature maps by stacking convolutional and pooling layers, which leads to the loss of spatial details and hampers precise localization of object boundaries. The atrous spatial pyramid pooling module addresses the information loss and reduced spatial resolution caused by pooling and downsampling operations in conventional convolutional neural networks for semantic segmentation tasks. The core design of ASPP replaces traditional pooling operations with atrous convolutions, which expand the receptive field while preserving the resolution of the feature maps. Atrous convolution adjusts the dilation rate to enable the convolutional kernel to capture a broader contextual region without increasing the number of parameters or performing downsampling. However, a single dilation rate limits the convolution to capturing context at only one fixed scale, which is insufficient given the significant variation in object sizes in real-world scenes. To address this, ASPP introduces a multi-branch parallel structure that combines atrous convolution layers with varying dilation rates and a global average pooling branch to form a spatial pyramid. The structure of the atrous spatial pyramid pooling module is illustrated in Figure 5. In this study, dilation rates of 6, 12, and 18 are employed. The resulting kernels of different effective sizes enable the extraction of multi-scale features to adapt to varying object sizes within the images. The formula for calculating the effective kernel size of an atrous convolution is as follows:

k + (k - 1)(rate - 1)

(8)

where k denotes the original kernel size and rate denotes the dilation rate. A small dilation rate is suitable for capturing fine-grained local details, while a large dilation rate allows the model to gather broader contextual information. Finally, the ASPP module fuses the features obtained from the various atrous convolution branches and the global pooling branch. This multi-scale feature fusion not only preserves fine local features but also enriches the global contextual understanding, leading to more accurate image segmentation results.
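As a worked example of Equation (8), the snippet below evaluates the effective kernel size for the three dilation rates used in this study. The original kernel size k = 3 is an assumption (it is the common choice in ASPP designs but is not stated in this section), and the function name is hypothetical.

```python
def effective_kernel_size(k, rate):
    """Effective kernel size of an atrous convolution, Eq. (8)."""
    return k + (k - 1) * (rate - 1)


# k = 3 is assumed; the dilation rates 6, 12, and 18 are those given in the text.
sizes = {rate: effective_kernel_size(3, rate) for rate in (6, 12, 18)}
# sizes == {6: 13, 12: 25, 18: 37}
```

Under this assumption, each branch still uses only nine weights per channel pair while covering windows of 13, 25, and 37 pixels, which is how the module widens context without adding parameters.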

3.3. ResU-SPMNet

Deep neural networks can extract complex abstract features from data through multi-level nonlinear transformations, demonstrating strong performance in tasks such as image segmentation and object detection. However, as network depth increases, training often suffers from vanishing or exploding gradients, which hinder effective parameter updates and degrade model performance. In 2015, He et al. [42] proposed the residual neural network (ResNet), which introduced residual blocks to reconstruct the basic building units of neural networks, effectively addressing the training difficulties of deep architectures. ResU-Net is a deep learning architecture that combines the strengths of ResNet and U-Net [43]. It integrates residual learning with U-Net’s feature extraction and skip connection mechanisms, overcoming the gradient vanishing and network degradation issues faced by traditional deep learning models, while enhancing the ability to capture complex image features. ResU-Net retains the encoder–decoder structure of U-Net. The encoder extracts features through a series of convolution and pooling operations, reducing the spatial resolution of feature maps to capture contextual information. The decoder progressively upsamples the low-resolution feature maps to restore spatial dimensions and precisely localize each pixel class, while skip connections fuse encoder feature maps with decoder feature maps to preserve detailed spatial information.

To enhance segmentation performance, this study proposes an improved model, ResU-SPMNet, which extends the ResU-Net architecture by introducing a shape prior module and an atrous spatial pyramid pooling module, as illustrated in Figure 6. The SPM is integrated into the skip connections at each level, while the ASPP module is integrated into the bridge connection between the encoder and decoder. The primary function of the SPM is to process the input feature maps along with shape prior information through a series of update and fusion operations to extract both global and local shape information of landslides. The extracted shape priors are then combined with features learned by the network to enhance the feature representation in the decoder, facilitating more accurate generation of segmentation masks. The core idea of the ASPP is to insert gaps between kernel elements through dilated (atrous) convolutions, allowing the convolution operations to cover a larger receptive field. By setting multiple dilation rates in parallel, features at different scales are extracted to capture landslide information of varying sizes. Finally, the network outputs the landslide segmentation map through a 1 × 1 convolution followed by a sigmoid activation. Table 2 lists the detailed information of the ResU-SPMNet model.

3.4. Accuracy Evaluation

The confusion matrix is a widely used tool for evaluating the performance of classification models, particularly in machine learning fields such as deep learning. It is a tabular representation that summarizes the relationship between the predicted results and the actual labels, as illustrated in Table 3.

In the confusion matrix: TP (True Positive) refers to instances predicted as landslides that are indeed landslides; FP (False Positive) refers to instances predicted as landslides but are actually non-landslides; FN (False Negative) refers to instances predicted as non-landslides but are actually landslides; and TN (True Negative) refers to instances predicted as non-landslides that are indeed non-landslides. Based on the confusion matrix, this study calculates four performance metrics: precision, recall, F1-score, and Matthews Correlation Coefficient (MCC), using the following formulas:

P = \frac{TP}{TP + FP}

(9)

R = \frac{TP}{TP + FN}

(10)

F1 = \frac{2PR}{P + R}

(11)

MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}}

(12)

Among them, precision, also known as positive predictive value, represents the probability that a sample predicted as a landslide is indeed a landslide. A high precision indicates the model effectively reduces false positives when identifying landslide regions. Recall, also referred to as sensitivity, denotes the probability that a sample that is actually a landslide is correctly predicted as such. A high recall indicates that the model can identify more landslide areas, thereby reducing false negatives. F1-score and MCC serve as comprehensive evaluation metrics. The F1-score, defined as the harmonic mean of precision and recall, balances the performance of these two metrics and ranges from 0 to 1, with values closer to 1 indicating better model performance. The Matthews correlation coefficient (MCC) ranges from −1 to 1, where 1 indicates perfect prediction, 0 indicates performance equivalent to random guessing, and −1 indicates completely incorrect prediction.
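Equations (9)–(12) translate directly into code. The sketch below computes the four metrics from pixel-level confusion-matrix counts. The counts in the example are invented for illustration and are not results from this study; returning 0 when a denominator is zero is a common convention assumed here, not something the text specifies.

```python
import math


def segmentation_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0            # Eq. (9)
    recall = tp / (tp + fn) if tp + fn else 0.0               # Eq. (10)
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0   # Eq. (11)
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0       # Eq. (12)
    return precision, recall, f1, mcc


# Invented counts for a toy 1000-pixel mask, dominated by background pixels.
p, r, f1, mcc = segmentation_metrics(tp=60, fp=20, fn=40, tn=880)
# p = 0.75, r = 0.6, f1 is about 0.667, mcc is about 0.639
```

Because MCC uses all four cells of the confusion matrix, including true negatives, it is less easily inflated by the large non-landslide background than accuracy would be, which is one reason to report it alongside the F1-score.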

4. Experiment Results

4.1. Experiment Setup

All experiments in this study were conducted on the same workstation with the following hardware configuration: an Intel(R) Core(TM) i5-14600KF processor at 3.50 GHz (Intel, Santa Clara, CA, USA), 64.0 GB of RAM (Dell, Round Rock, TX, USA), and an NVIDIA GeForce RTX 4090 GPU (NVIDIA, Santa Clara, CA, USA), running Windows 11 (Microsoft, Redmond, WA, USA). The deep learning framework used was PyTorch (version 2.2.2) with Python 3.8. Additional software included Anaconda (version 23.11.0), PyCharm (version 2023.3.2), and ArcGIS 10.8. During training, the number of epochs was set to 200, the batch size to 4, and the learning rate to 0.0001. The Dice loss function was employed instead of binary cross-entropy, and the network parameters were updated using the Adam optimizer. All models were trained from scratch without using any pre-trained weights. Table 4 lists the configuration information of the experiment.
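For readers unfamiliar with the Dice loss, the following is a minimal sketch of one common formulation on flattened masks. The exact variant used in this study is not given; the smoothing constant `smooth` and the function name are assumptions, and an actual training loop would apply the same arithmetic to framework tensors rather than Python lists.

```python
def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for one flattened mask.

    pred   : predicted probabilities in [0, 1] (e.g. sigmoid outputs)
    target : ground-truth labels, 0 or 1
    smooth : assumed constant that avoids division by zero on empty masks
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    dice = (2.0 * intersection + smooth) / (total + smooth)
    return 1.0 - dice


# A perfect prediction gives zero loss; a fully inverted one gives a high loss.
perfect = dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])   # 0.0
inverted = dice_loss([0.0, 1.0, 0.0, 1.0], [1, 0, 1, 0])  # 0.8 with smooth = 1
```

The loss depends only on the overlap between predicted and true landslide pixels, so the large number of background pixels does not dominate it in the way it can with per-pixel binary cross-entropy.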

4.2. Old Landslide Segmentation Results

4.2.1. Comparative Results of Different Models

To validate the effectiveness of the proposed ResU-SPMNet model for old landslide segmentation, ResU-Net [44], SegNet [45], MultiResUnet [46], and DeepLabv3+ [47] were selected as baseline models for comparative experiments under identical parameter settings and evaluation criteria. As shown in Table 5, the proposed model achieves superior performance across four evaluation metrics, namely precision, recall, F1-score, and MCC. In terms of precision, ResU-SPMNet attains a score of 0.6822, an improvement of 28 percentage points over the original ResU-Net, indicating higher reliability of the predicted landslide pixels. For recall, although the differences among the models are relatively small, ResU-SPMNet still achieves competitive performance. Notably, ResU-SPMNet exhibits the most balanced performance between precision and recall, with a difference of only 0.0298. In contrast, ResU-Net, SegNet, MultiResUnet, and DeepLabv3+ show noticeably poorer balance, with differences between recall and precision of 0.2861, 0.3125, 0.1365, and 0.2475, respectively. This indicates that ResU-SPMNet neither aggressively misclassifies non-landslide areas as landslides nor conservatively overlooks actual landslide regions, achieving an effective balance between landslide and non-landslide feature representation. In terms of the comprehensive metrics, the proposed model achieves the highest F1-score and MCC, demonstrating stronger overall robustness and more reliable segmentation results.
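The balance figures quoted above can be reproduced directly from the reported scores. The short check below uses only values stated in the text; the variable names are illustrative.

```python
# Reported ResU-SPMNet scores (Table 5, as quoted in the text).
precision, recall = 0.6822, 0.6524

# Gap between precision and recall for the proposed model.
gap = precision - recall

# F1-score recomputed as the harmonic mean of the rounded scores.
f1 = 2 * precision * recall / (precision + recall)

# Recall/precision gaps quoted for ResU-Net, SegNet, MultiResUnet, DeepLabv3+.
baseline_gaps = [0.2861, 0.3125, 0.1365, 0.2475]
smallest_baseline_gap = min(baseline_gaps)
```

The recomputed harmonic mean is about 0.6670, which agrees with the reported F1-score of 0.6669 to within the rounding of the published precision and recall. The gap of 0.0298 is less than a quarter of the smallest baseline gap (0.1365).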

Further comparison of ResU-SPMNet with and without DEM input reveals that incorporating DEM data increases the MCC to 0.6167, achieving the best performance in terms of overall evaluation metrics. DEM data provide terrain parameters such as elevation, slope, aspect, and curvature, which quantitatively constrain landslide movement directions and accumulation patterns from a physical perspective, serving as key indicators for landslide morphological control. Integrating DEM with optical imagery enables the model to simultaneously learn spectral and geomorphological features, thereby enhancing its discriminative capability. Although the introduction of DEM improves metrics such as precision and MCC, the overall performance gain remains relatively limited. This limitation is mainly attributed to the relatively coarse spatial resolution of the DEM used in this study. Even after resampling and co-registration, the terrain information it contains is insufficient to capture subtle topographic variations along old landslide boundaries, limiting its ability to represent fine-scale landslide features.
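To make concrete how a terrain parameter such as slope is obtained from a DEM grid, a minimal central-difference sketch is shown below. The text does not state how, or whether, such derivatives were computed explicitly in the experiments, so the function `slope_degrees`, the cell size, and the toy grid are assumptions for illustration only.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees at interior cells of a DEM given as nested lists.

    Uses central differences in the x (column) and y (row) directions.
    Border cells are left at 0.0 because they lack a neighbor on one side.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            dz_dx = (dem[i][j + 1] - dem[i][j - 1]) / (2 * cell_size)
            dz_dy = (dem[i + 1][j] - dem[i - 1][j]) / (2 * cell_size)
            gradient = math.hypot(dz_dx, dz_dy)
            slope[i][j] = math.degrees(math.atan(gradient))
    return slope


# Toy 3 x 3 DEM: elevation rises 10 m per 10 m cell toward the east,
# so the only interior cell has a 45 degree slope.
toy_dem = [
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
]
toy_slope = slope_degrees(toy_dem, cell_size=10.0)
```

Since the differences are taken over two cell widths, a coarse DEM averages out relief that is narrower than a few cells, which is consistent with the limited gain attributed above to DEM resolution.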

Figure 7 illustrates the old landslide segmentation results of different models, where clear visual differences can be observed in boundary integrity and overall landslide morphology representation. The segmentation results of ResU-Net, SegNet, MultiResUnet, and DeepLabv3+ exhibit pronounced fragmentation, with discontinuous landslide boundaries. These models commonly suffer from boundary expansion or contraction, leading to overestimation or underestimation of landslide extents and producing numerous false positives in surrounding areas. In contrast, the segmentation results produced by ResU-SPMNet are visually closest to the ground-truth labels. The model not only accurately captures the overall landslide outline but also maintains strong boundary continuity and integrity, with fewer misclassifications and omissions. With the inclusion of DEM data, ResU-SPMNet further reduces boundary-related false detections, producing more compact landslide extents and morphologies that better conform to real-world geomorphological characteristics.

4.2.2. Performance Comparison Across Different Regions

The dataset used in this study consists of three regions (A, B, and C) with different elevation and geomorphological characteristics. To evaluate the stability and adaptability of ResU-SPMNet under varying terrain conditions, comparative experiments were conducted separately in the three regions. As shown in Table 6, the model achieves stable performance across all regions, with F1-scores above 0.58 and MCC values exceeding 0.53, demonstrating its robustness in old landslide identification. Region A demonstrates the best overall performance, with the F1-score and MCC reaching 0.6326 and 0.5724, respectively, after the introduction of DEM data. This can be attributed to the fact that Region A has the highest elevation and pronounced topographic relief, where landslides typically develop along steep slopes and form clearly defined scarps and accumulation zones. Such significant elevation differences provide strong topographic constraints, resulting in relatively well-preserved landslide morphologies with clear boundaries and distinct geometric structures, which are highly favorable for model recognition. Compared with Region A, the segmentation performance in Regions B and C shows a slight decline. Region B is characterized by moderate elevation, where landslides are mainly distributed in hilly or gentle slope transition zones. The relatively weakened topographic relief, together with more diverse landslide scales and irregular shapes, increases the difficulty of accurate identification. In Region C, the overall terrain is relatively flat, and long-term natural evolution and anthropogenic modification have caused landslide boundaries to become smoother. Original sliding traces are often obscured by erosion, deposition, or human activities, significantly reducing landslide distinguishability in remote sensing imagery.

Further comparison between experiments with and without DEM integration reveals that, in Regions A and B, incorporating DEM data leads to noticeable improvements in both F1-score and MCC. This indicates that topographic information provides effective auxiliary constraints for landslides with relatively intact morphologies and pronounced terrain relief. In contrast, in Region C, the F1-score slightly decreases after DEM integration. This can be explained by the weak topographic control in this region, where DEM-derived features contribute limited discriminative information and may even introduce noise due to insufficient spatial resolution, thereby negatively affecting model accuracy. These results suggest that the effectiveness of DEM integration is highly dependent on regional geomorphological conditions, and its applicability should be carefully evaluated in practical applications to achieve optimal model performance.

The visual results in Figure 8 further support the quantitative analysis. In Region A, ResU-SPMNet accurately delineates the overall landslide outlines, producing continuous boundaries and well-preserved shapes. Although the segmentation results in Regions B and C remain generally coherent, boundary blurring and local over-segmentation can be observed in certain areas. Moreover, in Regions A and B, the introduction of DEM data leads to more compact and refined landslide boundaries, with a clear reduction in local misclassification. In contrast, in Region C, DEM integration causes slight boundary expansion or irregular shapes for some small-scale landslides, indicating that the topographic constraints provided by DEM do not yield effective performance gains in this region.

4.3. Ablation Experiment

4.3.1. Model Structure Ablation

To evaluate the contribution of the SPM and the ASPP module to landslide segmentation performance, ablation studies were designed, with the results presented in Table 7. The experimental results reveal that after incorporating ASPP, precision increased by 14 percentage points, indicating that multi-scale feature fusion effectively reduces false positives. However, recall showed only a marginal improvement, suggesting that ASPP has limited effectiveness in mitigating omission errors. The improvements in F1-score and MCC further demonstrate that ASPP, through its convolutional layers with multiple dilation rates, effectively extracts and integrates features from different receptive fields.
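The link between dilation rate and receptive field can be illustrated with the standard expression for the effective size of a dilated kernel, k + (k - 1)(r - 1). The rates used below follow the common DeepLab-style choice and are an assumption; they are not necessarily the rates configured in ResU-SPMNet.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by one dilated convolution kernel.

    A k x k kernel with dilation rate r samples positions spread over
    k + (k - 1) * (r - 1) pixels along each axis.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Assumed, illustrative dilation rates for parallel 3 x 3 ASPP branches.
ASSUMED_RATES = (1, 6, 12, 18)
extent_by_rate = {r: effective_kernel_size(3, r) for r in ASSUMED_RATES}
```

Under these assumed rates, the parallel branches cover 3, 13, 25, and 37 pixels per axis while each still uses only nine weights, which is how ASPP gathers multi-scale context without reducing feature-map resolution.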

When the SPM was introduced alone, precision, F1-score, and MCC metrics all exhibited improvements to some extent. This indicates that shape prior information constrained the model’s outputs to better align with landslide morphology, thereby reducing both false positives and false negatives. When ASPP and SPM were applied jointly, the model achieved optimal performance, with precision increasing to 0.6822. Although recall experienced a slight decrease compared to using ASPP alone, both F1-score and MCC reached the highest levels observed in the ablation study. This result demonstrates the complementary nature of ASPP’s multi-scale features and SPM’s shape constraints. Their combination significantly enhances the model’s capability to segment old landslides within complex scenes.

4.3.2. Structural Ablation of SPM

The SPM comprises two components: the self-update block and the cross-update block, each playing distinct roles in feature enhancement and shape information refinement. Specifically, the self-update block produces a global shape prior, which provides essential morphological guidance for the cross-update block to enhance local features. Guided by this prior, the cross-update block dynamically adjusts the weights of local features, strengthening responses consistent with landslide morphological patterns while suppressing noise or background responses that deviate significantly from the prior shape.

To further investigate the effectiveness of these two components, an ablation study focusing on the SPM structure was conducted, with results presented in Table 8. When the self-update block is removed, the SPM reduces to a self-attention-based local feature enhancement module. This module relies solely on the input features and enhances important regions through weight adjustments. Introducing only the cross-update block into the baseline model resulted in a significant increase in precision but a noticeable decrease in recall. This indicates that the cross-update block enhances the model’s attention to key regions by strengthening local features, thereby reducing false positives. However, excessive focus on local information may overlook global context, leading to the omission of certain true landslide areas. In contrast, when the self-update and cross-update blocks were jointly introduced, all four evaluation metrics (precision, recall, F1-score, and MCC) showed significant improvements. In particular, the F1-score and MCC increased by 14 and 15 percentage points, respectively, compared with using the cross-update block alone. This demonstrates that the self-update block effectively compensates for the excessive localization of the cross-update block: it guides the cross-update block to enhance critical local features without losing focus on the overall morphological integrity of landslides. The two blocks therefore act synergistically, yielding a significant improvement in old landslide segmentation accuracy.

5. Discussion

5.1. Comparison Between the SPM and Attention Mechanisms

To verify the effectiveness of the shape prior module (SPM), this study integrates SPM as well as several mainstream attention modules, including SE, CBAM, and SK, into the same backbone network, ResU-Net, for comparative experiments. Conventional attention modules mainly output feature-importance weights, indicating which features are more informative, but they do not explicitly encode what the target object should look like. In contrast, the shape prior produced by SPM is a class-specific shape attention map that directly represents the potential spatial distribution and contour of each category in the image, thereby possessing explicit geometric meaning. Rather than simply reweighting feature responses, SPM incorporates learned shape knowledge as an additional and interpretable guiding signal, which is jointly updated with encoder features, enabling bidirectional and complementary enhancement between shape information and texture information. As shown by the quantitative results in Table 9, the performances of different attention mechanisms in old landslide segmentation differ markedly. Although generic attention modules such as SE, CBAM, and SK improve model performance to some extent, they remain clearly inferior to the shape prior module. The model incorporating SPM achieves substantial improvements across all four metrics, including precision, recall, F1-score, and MCC, with the F1-score reaching 0.6658 and the MCC increasing to 0.6068, significantly outperforming the other attention mechanisms.

Grad-CAM (Gradient-weighted Class Activation Mapping) is a widely used visualization technique in deep learning for interpreting model predictions by highlighting regions in the input image that contribute most to the network’s output. It computes the weighted combination of the feature maps from convolutional layers and their corresponding gradients, superimposing the result onto the original input to visually indicate the model’s focus areas. We visualized the final layer of the decoder using Grad-CAM. As shown in Figure 9, attention mechanisms such as SE, CBAM, and SK mainly focus on enhancing features along the channel or spatial dimensions, but they fail to effectively incorporate the shape knowledge of target objects. Consequently, when identifying old landslides, these models often suffer from issues such as blurred boundaries or incomplete contours. Compared with the heatmaps of other models, the key regions in the ResU-SPMNet model are more distinct and better aligned with the actual landslide areas. The model can focus more effectively on landslide features while suppressing irrelevant information, thereby improving overall accuracy.
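The core Grad-CAM computation described above can be sketched in a few lines. The version below works on nested Python lists so that it stays self-contained; the function name `grad_cam` and the toy inputs are illustrative, and the final upsampling and overlay onto the input image are omitted.

```python
def grad_cam(feature_maps, gradients):
    """Minimal Grad-CAM on K channels, each an H x W nested list.

    feature_maps : activations of the chosen convolutional layer
    gradients    : gradients of the target score w.r.t. those activations
    Returns an H x W map scaled to [0, 1] (all zeros if nothing is positive).
    """
    # Channel weights: global average of each channel's gradient map.
    weights = []
    for grad in gradients:
        count = sum(len(row) for row in grad)
        weights.append(sum(sum(row) for row in grad) / count)

    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, feature_maps):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]

    # ReLU keeps only regions that increase the target score.
    cam = [[max(value, 0.0) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy example: channel 0 supports the target class, channel 1 opposes it.
toy_features = [[[1, 0], [0, 0]], [[0, 1], [0, 0]]]
toy_gradients = [[[2, 2], [2, 2]], [[-1, -1], [-1, -1]]]
toy_cam = grad_cam(toy_features, toy_gradients)
```

In the toy example, only the location activated by the positively weighted channel survives the ReLU, which mirrors how the heatmaps highlight regions that push the prediction toward the landslide class.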

5.2. Visualization of Global and Local Shape Priors

The shape prior module comprises two key components: the self-update block and the cross-update block, which are designed to extract global shape features and local detailed information of landslides, respectively. To validate whether the two sub-modules can effectively capture morphological features of different landslide types, widening and rectangular landslide samples were selected from the joint dataset for independent training and visual analysis. The joint dataset comprises 148 widening landslides and 68 rectangular landslides. Table 10 presents a comparative analysis of performance differences across landslide types. The results indicate that models trained independently on the widening and rectangular landslide datasets outperform those trained on the joint dataset across the four evaluation metrics. Specifically, the F1-score increases by 2 percentage points for widening landslides and by 6 percentage points for rectangular landslides.

To further evaluate the modeling efficacy of shape priors for different landslide morphologies, visualization results from models trained independently on widening and rectangular landslides are presented in Figure 10 and Figure 11, respectively. The results reveal that the global shape priors effectively capture the macroscopic morphological structure of landslides, ensuring that segmentation results align with the geometric prototypes of landslides. However, because they do not sufficiently compensate for local details, they exhibit limitations in edge precision and the delineation of internal structures. In contrast, the local shape priors employ a cross-attention mechanism to integrate and optimize global shape priors and encoder features, highlighting the detailed features of landslides. A comparison of the global shape priors obtained from independently and jointly trained models reveals clear differences. For the independently trained models, the learned global shape priors exhibit more focused and coherent heatmap responses and provide a more specific characterization of the corresponding morphologies. The main reason is that during independent training, the model focuses solely on learning one landslide morphology, thereby extracting a more typical global shape. In joint training, however, the model learns multiple morphologies simultaneously, inevitably blurring the most discriminative features of any specific morphology and reducing the specificity of the generated global shape priors.
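For readers unfamiliar with cross-attention, a generic scaled dot-product version is sketched below. It is not the exact formulation of the cross-update block: the pairing assumed here (queries from encoder features, keys and values from the global shape prior) and all names and toy vectors are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention on lists of vectors.

    Each query attends over all keys; the output for a query is the
    attention-weighted sum of the corresponding value vectors.
    """
    dim = len(keys[0])
    outputs = []
    for query in queries:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[c] for w, value in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Assumed roles: one encoder-feature vector queries two shape-prior tokens.
encoder_queries = [[10.0, 0.0]]
prior_keys = [[1.0, 0.0], [0.0, 1.0]]
prior_values = [[1.0, 0.0], [0.0, 1.0]]
attended = cross_attention(encoder_queries, prior_keys, prior_values)
```

In this toy case the query is strongly aligned with the first prior token, so the output is dominated by that token's value. This is the general mechanism by which responses consistent with a prior are strengthened and others are suppressed.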

5.3. Generalization Ability Test

To objectively evaluate the generalization capability of the ResU-SPMNet model in previously unseen areas, Region D was selected as an independent test area. Region D was not involved in any training or validation processes, which avoids data leakage and allows a reliable evaluation of the model’s adaptability to new regions. Region D is characterized by pronounced terrain relief, intense erosion, and diverse landslide types. This complexity makes it a suitable area for evaluating the robustness and generalization capability of the model under varying geomorphological conditions.

Remote sensing imagery is strongly influenced by seasonal variations in illumination conditions, vegetation cover, and shadow length and orientation, which can alter the visual appearance of landslides and consequently affect model recognition performance. In terms of illumination conditions, summer is characterized by abundant sunlight, more uniform surface brightness, and higher overall image contrast, which facilitates the extraction of stable texture and structural features. In contrast, reduced solar radiation in autumn and winter often leads to insufficient brightness in local areas, decreasing spectral differences between landslides and background regions and increasing classification difficulty. From the perspective of solar elevation angle, winter and early spring are characterized by low solar angles, resulting in significantly elongated slope shadows whose orientations are strongly controlled by slope aspect, often producing extensive shadow coverage within landslide bodies or along their boundaries. Such shadow interference disrupts the original texture continuity of landslides, leading to errors in boundary delineation and area identification by the model. By contrast, the higher solar elevation angle in summer leads to more uniform surface illumination, allowing the geometric structural features of landslides, including head scarps, lateral boundaries, and depositional morphology, to be more clearly represented in the imagery. As multi-seasonal images from the same year were unavailable, the remote sensing images representing the four seasons were collected over the period from 2015 to 2024.
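The effect of solar elevation on shadow length follows from simple trigonometry: on level ground, an object of height h casts a shadow of length h / tan(elevation). The sketch below assumes flat ground and uses illustrative elevation angles that are not taken from the study area.

```python
import math


def shadow_length(height_m, solar_elevation_deg):
    """Shadow length on level ground for an object of the given height.

    Assumes flat terrain; on real slopes the length also depends on
    slope angle and aspect relative to the solar azimuth.
    """
    return height_m / math.tan(math.radians(solar_elevation_deg))


# Illustrative elevation angles (assumed, not measured for the study area).
low_sun_shadow = shadow_length(10.0, 30.0)   # about 17.3 m
high_sun_shadow = shadow_length(10.0, 60.0)  # about 5.8 m
```

In this example, halving the solar elevation from 60 to 30 degrees triples the shadow cast by a 10 m scarp, which illustrates why low-sun imagery contains far more shadow within and around landslide bodies.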

Table 11 presents the seasonal generalization test results of ResU-SPMNet and the baseline ResU-Net model in Region D. ResU-SPMNet consistently outperforms the baseline ResU-Net across all four seasons in terms of precision, recall, F1-score, and MCC, indicating that the introduction of shape priors enables the model to maintain strong stability and robustness under unseen regional and seasonal variations. Specifically, ResU-SPMNet achieves the best generalization performance in summer, while the weakest performance is observed in winter, with an absolute F1-score difference of 0.016 between the two seasons. This can be mainly attributed to summer conditions, which are typically characterized by lower cloud cover, longer daylight duration, and higher solar elevation angles, resulting in more uniformly illuminated imagery with minimal shadow interference and facilitating the capture of clear landslide morphology by the shape prior module. Although vegetation is sparse or absent in winter and landslide morphology may appear relatively clear due to exposed surfaces, snow cover, frozen soil, low solar angles, and elongated cast shadows can obscure or alter exposed landslide surfaces, increasing the difficulty of accurately identifying true landslide morphology. In spring and autumn, reduced solar radiation and lower solar elevation angles relative to summer likewise lead to declines in recognition accuracy and overall performance.

6. Limitations and Future Study

Despite the progress made in improving landslide segmentation performance, certain limitations remain. First, the training dataset is relatively small and primarily covers a single landform type of the Loess Plateau, limiting the model’s ability to generalize and its robustness to variations in landslide morphology. Future work should focus on building a multi-source, multi-region old landslide sample set to enhance the model’s generalizability and adaptability. Second, the current study relies solely on optical remote sensing images and DEM data, without fully integrating multi-temporal and multi-source data such as synthetic aperture radar (SAR), which could further enhance the model’s ability to capture the complexity of landslide morphology and improve detection accuracy. Third, compared with the detection of new and deformed landslides, the current method’s accuracy in identifying old landslides remains relatively low. This is primarily due to the lack of sufficient old landslide samples and the limitations of existing lightweight models in extracting complex features. In the future, attention should be given to expanding the high-quality old landslide sample repository and exploring the integration of larger models to enhance the representation of subtle features and complex forms of old landslides.

7. Conclusions

To address the challenges of old landslide detection in the Loess Plateau, including weak geomorphic expression and limited accuracy of traditional methods, this study proposes a deep learning model integrating shape priors and multiscale features, termed ResU-SPMNet. Built upon the ResU-Net architecture, the proposed model incorporates a shape prior module and an atrous spatial pyramid pooling (ASPP) structure, enabling effective modeling of landslide morphology and accurate segmentation. The main conclusions are summarized as follows:

(1): The proposed model demonstrates superior performance in old landslide segmentation over the Loess Plateau, achieving a precision of 0.6822, a recall of 0.6524, an F1-score of 0.6669, and an MCC of 0.6167, representing substantial absolute improvements over conventional baseline models.
(2): Comparative experiments across different regions indicate that the model performs best in areas with pronounced elevation variations and well-preserved landslide morphology. Although performance decreases in relatively flat terrains where landslide features are highly degraded, the model maintains overall stability, demonstrating its adaptability to diverse geomorphic conditions.
(3): Generalization tests conducted in an independent region reveal that ResU-SPMNet exhibits robust performance across multispectral remote sensing images from different seasons. The best results are achieved under summer conditions, whereas performance declines in winter due to changes in illumination conditions and related factors.

Author Contributions

Conceptualization, L.P. and Y.L.; methodology, L.P. and Y.L.; software, L.P.; validation, L.P.; formal analysis, Q.X.; investigation, L.P. and P.Z.; resources, Y.D. and Z.L.; data curation, L.P. and Y.L.; writing—original draft preparation, L.P. and Y.L.; writing—review and editing, L.P., P.Z. and M.D.; visualization, L.P.; funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China under Grant 42374027 and by the Geological Survey Projects of China Geological Survey under Grant No. DD20230436.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

We sincerely thank the No. 149 Team of Gansu Coalfield Geology Bureau for providing the digital elevation model data.

Conflicts of Interest

Author Yunlong Li was employed by the company Liyuan Power Generation Branch of Yunnan Huadian Jinsha River Middle Reaches Hydropower Development Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Zhuang, J.Q.; Peng, J.B.; Wang, G.H.; Javed, I.; Wang, Y.; Li, W. Distribution and characteristics of landslide in Loess Plateau: A case study in Shaanxi province. Eng. Geol. 2018, 236, 89–96.
Peng, J.B.; Wang, S.K.; Wang, Q.Y.; Zhuang, J.Q.; Huang, W.L.; Zhu, X.H.; Leng, Y.Q.; Ma, P.H. Distribution and genetic types of loess landslides in China. J. Asian Earth Sci. 2019, 170, 329–350.
Zhang, Y.S.; Ren, S.S.; Liu, X.Y.; Guo, C.B.; Li, J.Q.; Bi, J.B.; Ran, L.A. Reactivation mechanism of old landslide triggered by coupling of fault creep and water infiltration: A case study from the east Tibetan Plateau. Bull. Eng. Geol. Environ. 2023, 82, 291.
Gorum, T.; Carranza, E.J.M. Control of style-of-faulting on spatial pattern of earthquake-triggered landslides. Int. J. Environ. Sci. Technol. 2015, 12, 3189–3212.
Huang, W.B.A.; Ding, M.T.; Li, Z.H.; Zhuang, J.Q.; Yang, J.; Li, X.L.; Meng, L.E.; Zhang, H.Y.; Dong, Y. An Efficient User-Friendly Integration Tool for Landslide Susceptibility Mapping Based on Support Vector Machines: SVM-LSM Toolbox. Remote Sens. 2022, 14, 3408.
Ma, S.Y.; Shao, X.Y.; Xu, C.; Xu, Y.R. Insight from a Physical-Based Model for the Triggering Mechanism of Loess Landslides Induced by the 2013 Tianshui Heavy Rainfall Event. Water 2023, 15, 443.
Fiorucci, F.; Ardizzone, F.; Mondini, A.C.; Viero, A.; Guzzetti, F. Visual interpretation of stereoscopic NDVI satellite images to map rainfall-induced landslides. Landslides 2019, 16, 165–174.
Mabu, S.; Hirata, S.; Kuremoto, T. Landslide Area Detection from Synthetic Aperture Radar Images Using Convolutional Adversarial Autoencoder and One-class SVM. In Proceedings of the 2021 International Conference on Artificial Life and Robotics (ICAROB 2021), Hiroshima, Japan, 21–24 January 2021; pp. 575–580.
Hu, Z.X.; Wang, C.L.; Zhou, Z.G.; Li, C.R. Using Recovery Rate and SVM to Detect Landslides in Medium Satellite Images Time Series: A Case Study in Ludian, China. In Proceedings of the 2017 International Conference on Wireless Communications, Networking and Applications (WCNA2017), Shenzhen, China, 20–22 October 2017; pp. 240–244.
Bai, H.; Yu, G. An Improved Classification Algorithm Based on Support Vector Machine and Ridge Regression-Applied on Landslide Dam Disaster Events Detection. In Proceedings of the 2015 International Conference on Management Science & Engineering-22ND Annual Conference Proceedings, Vols I and II, Karlsruhe, Germany, 21–23 July 2015; pp. 301–306.
Huang, J.R.; Zekkos, D. Effect of Machine Learning Algorithms on Detection of Landslides Caused by the 2015 Lefkada Earthquake. In Proceedings of the Geo-Congress 2023: Geotechnical Data Analysis and Computation, Los Angeles, CA, USA, 26–29 March 2023; pp. 347–356.
Meghanadh, D.; Maurya, V.K.; Kumar, M.; Dwivedi, R. Automatic Detection of Landslides Based on Machine Learning Framework. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 8460–8463.
Haciefendioglu, K.; Adanur, S.; Demir, G. Automatic Landslide Segmentation Using a Combination of Grad-CAM Visualization and K-Means Clustering Techniques. Iran. J. Sci. Technol. Trans. Civ. Eng. 2024, 48, 943–959.
Tehrani, F.S.; Santinelli, G.; Herrera, M.H. Multi-Regional landslide detection using combined unsupervised and supervised machine learning. Geomat. Nat. Hazards Risk 2021, 12, 1015–1038.
Sun, H.; Yang, S.G.; Wang, R.; Yang, K.X. Study on a Landslide Segmentation Algorithm Based on Improved High-Resolution Networks. Appl. Sci. 2024, 14, 6459.
Liu, Q.; Wu, T.T.; Deng, Y.H.; Liu, Z.H. Intelligent identification of landslides in loess areas based on the improved YOLO algorithm: A case study of loess landslides in Baoji City. J. Mt. Sci. 2023, 20, 3343–3359.
Jiang, W.D.; Xi, J.B.; Li, Z.H.; Zang, M.H.; Chen, B.; Zhang, C.L.; Liu, Z.J.; Gao, S.Y.; Zhu, W. Deep Learning for Landslide Detection and Segmentation in High-Resolution Optical Images along the Sichuan-Tibet Transportation Corridor. Remote Sens. 2022, 14, 5490.
Gao, S.Y.; Xi, J.B.; Li, Z.H.; Ge, D.Q.; Guo, Z.C.; Yu, J.C.; Wu, Q.; Zhao, Z.; Xu, J.H. Optimal and Multi-View Strategic Hybrid Deep Learning for Old Landslide Detection in the Loess Plateau, Northwest China. Remote Sens. 2024, 16, 1362.
Liu, X.; Peng, Y.; Lu, Z.; Li, W.; Yu, J.; Ge, D.; Xiang, W. Feature-Fusion Segmentation Network for Landslide Detection Using High-Resolution Remote Sensing Images and Digital Elevation Model Data. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14.
Lu, Z.; Peng, Y.; Li, W.; Yu, J.; Ge, D.; Han, L.; Xiang, W. An Iterative Classification and Semantic Segmentation Network for Old Landslide Detection Using High-Resolution Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13.
Ma, S.Y.; Shao, X.Y.; Xu, C.; Niu, P.F. Geometry and mobility characteristics of landslides triggered by the 2018 Mw 7.5 Palu earthquake in Indonesia. Landslides 2025, 22, 3973–3988. [Google Scholar] [CrossRef]
Wei, J.B.; Wang, D.K.; Yang, Z.K.; Wang, J.X.; Li, Y.M.; Hu, W.Y. Characteristics and mechanism of large deformation of a reservoir colluvial landslide-a case study of the Yulinerzu landslide in Xiluodu Reservoir, China. Front. Earth Sci. 2024, 11, 1337998. [Google Scholar] [CrossRef]
Zhou, Y.Y.; Shi, Z.M.; Qiu, T.; Yu, S.B.; Zhang, Q.Z.; Shen, D.Y. Experimental study on morphological characteristics of landslide dams in different shaped valleys. Geomorphology 2022, 400, 108081. [Google Scholar] [CrossRef]
Orris, G.J.; Williams, J.W. Landslide length-width ratios as an aid in landslide identification and verification. Bull. Assoc. Eng. Geol. 1984, 21, 371–375. [Google Scholar] [CrossRef]
Taylor, F.E.; Malamud, B.D.; Witt, A.; Guzzetti, F. Landslide shape, ellipticity and length-to-width ratios. Earth Surf. Process. Landf. 2018, 43, 3164–3189. [Google Scholar] [CrossRef]
Li, L.P.; Lan, H.X.; Strom, A.; Macciotta, R. Landslide length, width, and aspect ratio: Path-dependent measurement and a revisit of nomenclature. Landslides 2022, 19, 3009–3029. [Google Scholar] [CrossRef]
Li, L.P.; Lan, H.X.; Strom, A.; Macciotta, R. Landslide longitudinal shape: A new concept for complementing landslide aspect ratio. Landslides 2022, 19, 1143–1163. [Google Scholar] [CrossRef]
Liu, X.M.; Su, P.C.; Li, Y.; Xia, Z.X.; Ma, S.Y.; Xu, R.; Lu, Y.; Li, D.H.; Lu, H.; Yuan, R.M. Spatial distribution of landslide shape induced by Luding Ms6.8 earthquake, Sichuan, China: Case study of the Moxi Town. Landslides 2023, 20, 1667–1678. [Google Scholar] [CrossRef]
Niculita, M. Automatic landslide length and width estimation based on the geometric processing of the bounding box and the geomorphometric analysis of DEMs. Nat. Hazards Earth Syst. Sci. 2016, 16, 2021–2030. [Google Scholar] [CrossRef]
Ju, Y.Z.; Xu, Q.; Jin, S.C.; Li, W.L.; Su, Y.J.; Dong, X.J.; Guo, Q.H. Loess Landslide Detection Using Object Detection Algorithms in Northwest China. Remote Sens. 2022, 14, 1182.
You, X.; He, J.J.; Yang, J.; Gu, Y. Learning With Explicit Shape Priors for Medical Image Segmentation. IEEE Trans. Med. Imaging 2025, 44, 927–940.
Ding, P.; Qian, H.M.; Zhou, Y.P.; Yan, S.Y.; Feng, S.B.; Yu, S. Real-time efficient semantic segmentation network based on improved ASPP and parallel fusion module in complex scenes. J. Real-Time Image Process. 2023, 20, 41.
Hu, L.; Zhou, X.; Ruan, J.C.; Li, S.P. ASPP+-LANet: A Multi-Scale Context Extraction Network for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2024, 16, 1036.
Liu, R.R.; Tao, F.; Liu, X.T.; Na, J.M.; Leng, H.J.; Wu, J.J.; Zhou, T. RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens. 2022, 14, 3109.
Shi, J.F.; Gao, Z.M.; Wang, A.C. Multi-scale image semantic segmentation based on ASPP and improved HRNet. Chin. J. Liq. Cryst. Disp. 2021, 36, 1497–1505.
Zhao, S.; Feng, Z.Z.; Chen, L.; Li, G.D. DANet: A Semantic Segmentation Network for Remote Sensing of Roads Based on Dual-ASPP Structure. Electronics 2023, 12, 3243.
Woo, S.H.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision-ECCV 2018, PT VII, Munich, Germany, 8–14 September 2018; pp. 3–19.
Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E.H. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.
Wang, Q.L.; Wu, B.G.; Zhu, P.F.; Li, P.H.; Zuo, W.M.; Hu, Q.H. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 14–19 June 2020; pp. 11531–11539.
Konuthula, N.; Perez, F.A.; Maga, A.M.; Abuzeid, W.M.; Moe, K.; Hannaford, B.; Bly, R.A. Automated atlas-based segmentation for skull base surgical planning. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 933–941.
Tai, Y.W.; Jia, J.Y.; Tang, C.K. Local color transfer via probabilistic segmentation by expectation-maximization. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 747–754.
He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Identity Mappings in Deep Residual Networks. In Proceedings of the Computer Vision-ECCV 2016, PT IV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645.
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention, PT III, Munich, Germany, 5–9 October 2015; pp. 234–241.
Qi, W.W.; Wei, M.F.; Yang, W.T.; Xu, C.; Ma, C. Automatic Mapping of Landslides by the ResU-Net. Remote Sens. 2020, 12, 2487.
Song, C.G.; Wu, L.J.; Chen, Z.C.; Zhou, H.F.; Lin, P.J.; Cheng, S.Y.; Wu, Z.H. Pixel-Level Crack Detection in Images Using SegNet. In Proceedings of the Multi-Disciplinary Trends in Artificial Intelligence, Kuala Lumpur, Malaysia, 17–19 November 2019; pp. 247–254.
Zhao, Z.Y.; Tan, S.C.; Zhang, Q.H.; Chen, H. Automatic Identification Model for Landslide Disaster Using Remote Sensing Images Based on Improved Multiresunet. IEEE Access 2025, 13, 10653–10662.
Chen, X.F.; Wang, S.W.; Dinavahi, V.; Yang, L.J.; Wu, D.B.; Shen, M.Y. Landslide Recognition Based on DeepLabv3+ Framework Fusing ResNet101 and ECA Attention Mechanism. Appl. Sci. 2025, 15, 2613.

Figure 1. Distribution of landslide points in the Loess Plateau region: (A) Haidong City, Qinghai Province; (B) Tianshui City, Gansu Province; (C) Xianyang City, Shaanxi Province; (D) Xiji County, Guyuan City, Ningxia Hui Autonomous Region. The yellow arrow indicates the sliding direction.

Figure 2. Two types of old landslides: (a,b) widening types, weakly constrained by terrain; (c,d) rectangular types, strongly constrained by terrain. Arrows indicate the direction of movement.

Figure 3. Examples of landslide samples. (a,b) represent cliffs, while (c,d) correspond to erosional gullies. The landslide is within the red box.

Figure 4. SPM network structure.

Figure 5. ASPP network structure.

Figure 6. ResU-SPMNet network structure.

Figure 7. Segmentation results of old landslides: (a) landslide image; (b) labels; (c) ResU-Net; (d) SegNet; (e) MultiResUnet; (f) DeepLabv3+; (g) ResU-SPMNet (RGB); (h) ResU-SPMNet (RGB + DEM).

Figure 8. Segmentation results of old landslides in different regions: (A) Region A; (B) Region B; (C) Region C. Arrows indicate the direction of movement.

Figure 9. Heat map visualizations for old landslides: (a) landslide image; (b) labels; (c) SE; (d) CBAM; (e) SK; (f) SPM. Brighter colors indicate areas to which the model pays more attention.

Figure 10. Visualization results of global and local shape priors for widening landslides: (a) landslide image; (b) local shape prior; (c) global shape prior obtained from independent training; (d) global shape prior obtained from joint training. Arrows indicate the direction of movement. Brighter colors indicate areas to which the model pays more attention.

Figure 11. Visualization results of global and local shape priors for rectangular landslides: (a) landslide image; (b) local shape prior; (c) global shape prior obtained from independent training; (d) global shape prior obtained from joint training. Arrows indicate the direction of movement. Brighter colors indicate areas to which the model pays more attention.

Table 1. Dataset partitioning.

	Training	Validation	Testing
Landslide	910	260	130
Non-landslide	455	130	65
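
The counts in Table 1 correspond to a 7:2:1 split of 1300 landslide and 650 non-landslide samples. The following is a minimal sketch of such a split, not the authors' procedure: the ratio is inferred from the table counts, and the seeded random shuffle and the function names `split_counts` and `split_samples` are assumptions made for illustration.

```python
import random


def split_counts(n, ratios=(7, 2, 1)):
    """Return (train, val, test) sizes for n samples using integer ratios.

    Integer arithmetic avoids floating-point rounding; the test set
    receives whatever remains after the train and validation shares.
    """
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return n_train, n_val, n - n_train - n_val


def split_samples(samples, ratios=(7, 2, 1), seed=0):
    """Shuffle samples reproducibly and cut them into three disjoint subsets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # assumed: random assignment to subsets
    n_train, n_val, _ = split_counts(len(items), ratios)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    print(split_counts(1300))  # landslide samples
    print(split_counts(650))   # non-landslide samples
```

Applying the split to each class separately, as sketched here, keeps the 2:1 ratio of landslide to non-landslide samples identical across the three subsets.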

Table 2. Details of the ResU-SPMNet model.

Stage Name	Kernel Size	Stride	Padding	Output Dimensions
Conv	3 × 3	1	1	512 × 512 × 64
ResBlock1	3 × 3	2	1	256 × 256 × 128
ResBlock2	3 × 3	2	1	128 × 128 × 256
ResBlock3	3 × 3	2	1	64 × 64 × 512
ASPP	-	-	-	64 × 64 × 512
SPM2	-	-	-	128 × 128 × 256
Concat1	-	-	-	128 × 128 × 768
ResBlock4	3 × 3	1	1	128 × 128 × 256
SPM1	-	-	-	256 × 256 × 128
Concat2	-	-	-	256 × 256 × 384
ResBlock5	3 × 3	1	1	256 × 256 × 128
SPM0	-	-	-	512 × 512 × 64
Concat3	-	-	-	512 × 512 × 192
ResBlock6	3 × 3	1	1	512 × 512 × 64
Conv	1 × 1	1	1	512 × 512
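
The output dimensions in Table 2 can be checked with the standard convolution size formula, floor((n + 2p − k) / s) + 1, where k is the kernel size, s the stride (step size), and p the padding (fill). The sketch below is only a consistency check under stated assumptions: the 512 × 512 input size is inferred from the first row, and the channel counts of the concatenation stages are assumed to be the sum of the upsampled incoming features and the corresponding SPM output.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Encoder rows of Table 2: (stage name, kernel, stride, padding, output channels)
ENCODER = [
    ("Conv", 3, 1, 1, 64),
    ("ResBlock1", 3, 2, 1, 128),
    ("ResBlock2", 3, 2, 1, 256),
    ("ResBlock3", 3, 2, 1, 512),
]


def encoder_shapes(input_size=512):
    """Propagate an (assumed) square input through the encoder stages."""
    shapes = {}
    size = input_size
    for name, kernel, stride, padding, channels in ENCODER:
        size = conv_out(size, kernel, stride, padding)
        shapes[name] = (size, size, channels)
    return shapes


# Assumed composition of the concatenation stages:
# channels of the upsampled incoming features + channels of the SPM output.
CONCAT_CHANNELS = {
    "Concat1": 512 + 256,  # ASPP output + SPM2
    "Concat2": 256 + 128,  # ResBlock4 output + SPM1
    "Concat3": 128 + 64,   # ResBlock5 output + SPM0
}

if __name__ == "__main__":
    for name, shape in encoder_shapes().items():
        print(name, shape)
    print(CONCAT_CHANNELS)
```

Under this formula, a 1 × 1 convolution with stride 1 preserves a 512 × 512 map only with zero padding (`conv_out(512, 1, 1, 0)`), so the padding of 1 listed for the final Conv row may not follow the same convention as the other rows.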

Table 3. Confusion Matrix.

Actual Class	Predicted Landslide	Predicted Non-Landslide
Landslide	TP	FN
Non-landslide	FP	TN
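
The four evaluation indexes reported in this article (precision, recall, F1-score, and MCC) follow from the confusion-matrix entries through their standard definitions. The sketch below shows that calculation; the function name `segmentation_metrics`, the zero-division handling, and the pixel counts in the usage example are illustrative assumptions, not values or code from the study.

```python
import math


def segmentation_metrics(tp, fn, fp, tn):
    """Compute precision, recall, F1-score, and MCC from confusion-matrix counts.

    Ratios with a zero denominator are returned as 0.0 (an assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical pixel counts, chosen only to exercise the formulas.
    for name, value in segmentation_metrics(tp=60, fn=40, fp=20, tn=880).items():
        print(f"{name}: {value:.4f}")
```

Unlike the F1-score, MCC also depends on TN, which makes it informative when non-landslide pixels dominate the image.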

Table 4. Configuration information.

Configuration Item	Value
Processor	Intel(R) Core(TM) i5-14600KF
GPU	NVIDIA GeForce RTX 4090
Operating system	Windows 11
Framework	PyTorch 2.2.2
Programming language	Python 3.8

Table 5. Evaluation indexes of landslide segmentation results.

Model	Precision	Recall	F1-Score	MCC
ResU-Net	0.3974	0.6635	0.4971	0.4396
SegNet	0.3275	0.6400	0.4333	0.3692
MultiResUnet	0.3949	0.5314	0.4531	0.3817
DeepLabv3+	0.4093	0.6568	0.4931	0.4465
ResU-SPMNet (RGB)	0.6817	0.6654	0.6734	0.5870
ResU-SPMNet (RGB + DEM)	0.6822	0.6524	0.6669	0.6167