1. Introduction
The Loess Plateau is one of the regions in China most severely affected by frequent landslide disasters, among which old landslides are particularly typical, characterized by large volume, high hazard potential, and strong concealment [
1,
2]. Old landslides are the products of long-term geomorphological evolution under complex geological conditions and are generally defined as landslides that have previously occurred, forming loose deposits with significant structural damage [
3]. In recent years, the intensification of extreme weather events and increased geological activity [
4] have substantially increased the probability of reactivation in old landslides, posing serious threats to human life and property [
5,
6]. Precise detection of old landslides in the Loess Plateau is crucial for accurately assessing the risk of reactivation and ensuring regional safety.
Traditional landslide detection methods primarily rely on manual visual interpretation and field surveys, which depend on expert knowledge to analyze landslide morphology and boundaries in remote sensing images. Although relatively accurate, these methods are inherently subjective and inefficient, rendering them unsuitable for large-scale disaster monitoring [
4,
7]. With advances in remote sensing technology, machine learning methods have been increasingly applied to landslide detection. Common methods include support vector machines (SVMs) [
8,
9,
10], random forests (RFs) [
11,
12], and k-means clustering algorithms [
13,
14]. Compared with manual interpretation, machine learning offers a higher degree of automation. However, the effectiveness of these models largely hinges on feature selection and engineering, which require extensive domain expertise and are often time-consuming.
As an advanced extension of neural networks, deep learning exhibits superior feature learning capabilities and has achieved remarkable success in landslide detection in recent years [
15,
16,
17,
18]. Liu et al. [
19] proposed FFS-Net, a feature fusion semantic segmentation network that integrates heterogeneous high-resolution remote sensing images and digital elevation models, significantly improving the detection accuracy of visually ambiguous old landslides with similar color and texture to their surroundings. Lu et al. [
20] introduced an iterative classification semantic segmentation network (ICSSN) based on multi-task learning, improving the F1-score from 0.5054 to 0.5448. Jiang et al. [
17] proposed a landslide detection method simulating hard samples, achieving a 10% improvement in F1-score over the baseline Mask-RCNN model. Although these methods have advanced the accuracy of old landslide detection, their F1-scores remain below 0.6. Hence, reliable detection of old landslides remains a significant challenge.
The morphological characteristics of landslides are largely controlled by the downslope movement of material and the terrain-controlled source-travel-deposition process, resulting in distinctive shapes that have long played an important role in field investigations of landslides [
21,
22,
23,
24]. Existing studies demonstrate that landslides exhibit distinct geometric structural features in their morphology. For instance, Taylor et al. [
25] utilized ellipticity and aspect ratios to model landslide morphology, revealing statistical patterns of landslide shapes through large-scale landslide databases. Lan et al. [
26,
27] noted that although the traditional length-to-width ratio can characterize the overall extent of a landslide, it is insufficient to reflect the width variation along the sliding path. They thus proposed a new method for generating the landslide centerline based on digital elevation models and polygon boundaries, which quantifies the landslide’s geometric morphology from the longitudinal profile perspective. Liu et al. [
28] systematically classified landslide samples in the Moxi Town area, identifying nine typical morphological types, including tabular, droplet-shaped, fan-shaped, Y-shaped, and rhomboidal forms. Furthermore, they revealed the characteristic distribution patterns of landslide morphology, which are controlled by multiple factors such as topographic relief, slope gradient, and initiation location. Niculita et al. [
29] proposed an algorithm for automated estimation of landslide length and width to address inherent limitations of conventional GIS methods in morphological analysis. Collectively, landslides typically exhibit distinct morphological patterns, which not only reflect the formation and evolution processes of the landslides but can also serve as effective criteria for identification.
Although shape features have played an important role in many field investigations of landslides, research on their effective application in deep learning to enhance detection performance remains relatively scarce. Moreover, due to prolonged natural evolution, old landslides gradually acquire surface color and texture characteristics that resemble the surrounding environment, leading to high visual ambiguity and significantly increasing the difficulty of detection [
15,
30].
To address these challenges, this study focuses on old landslides in the Loess Plateau and proposes an improved model, ResU-SPMNet, which integrates shape priors and multi-scale features. The principal contributions are as follows: Firstly, a shape prior module is introduced [
31], which collaboratively extracts the global contours and local details of landslides through self-update and cross-update blocks, and transmits the learned shape information to the decoder, effectively enhancing the model’s perception and constraint of landslide geometries. Secondly, an atrous spatial pyramid pooling module is incorporated to strengthen the model’s ability to extract landslide features at multiple scales, thereby minimizing the loss of valuable information during training [
32,
33,
34,
35,
36]. Finally, systematic performance evaluation experiments are conducted, and the results show that ResU-SPMNet significantly outperforms other semantic segmentation models on several metrics, markedly enhancing the model’s ability to capture landslide shapes, and demonstrating higher segmentation accuracy and generalization ability.
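The multi-scale idea behind atrous spatial pyramid pooling can be illustrated with a minimal one-dimensional sketch. It is not the module used in ResU-SPMNet: the kernel, the dilation rates (1, 2, 4), and the function names are assumptions chosen for illustration. It shows only that the same three-tap kernel covers a wider span of the input as the dilation rate grows, without adding parameters.

```python
def effective_kernel_size(kernel_size, rate):
    """Span of input covered by a kernel of `kernel_size` taps at dilation `rate`."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D cross-correlation whose taps are spaced `rate` samples apart."""
    span = effective_kernel_size(len(kernel), rate)
    output = []
    for start in range(len(signal) - span + 1):
        output.append(
            sum(k * signal[start + i * rate] for i, k in enumerate(kernel))
        )
    return output


if __name__ == "__main__":
    signal = list(range(20))   # hypothetical input: a linear ramp
    kernel = [1, 0, -1]        # hypothetical difference kernel
    for rate in (1, 2, 4):     # assumed rates, not taken from the paper
        result = atrous_conv1d(signal, kernel, rate)
        print(rate, effective_kernel_size(len(kernel), rate), len(result))
```

In a two-dimensional network, parallel branches of this kind at several rates are typically concatenated, so that small and large landslides are described at comparable levels of context.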
2. Study Area and Data
2.1. Overview of the Study Area
The study area for old landslides is situated in the Loess Plateau, as illustrated in
Figure 1. The Loess Plateau is situated in northwestern China and represents the largest and thickest loess deposit region in the world. The region exhibits an overall west-to-east descending topographic trend, with elevations ranging from approximately 800 to 3000 m, and is characterized by highly dissected terrain with dense gully networks. The climate is classified as a warm temperate continental monsoon climate, featuring hot and humid summers and cold, dry winters. Annual precipitation is relatively low and unevenly distributed, with the majority occurring during the summer season. The soils are loose, vegetation cover is sparse, and soil erosion is severe. Under the combined influence of these distinctive geomorphological and hydroclimatic conditions, the Loess Plateau has become one of the most landslide-prone regions in China.
The study area consists of four typical regions: A, B, C, and D. Among them, regions A, B, and C are used to construct the old landslide dataset, while region D is used for testing the model’s generalization ability. The regions exhibit significant differences in topography and geomorphology. Region A is located in Haidong City, Qinghai Province, with geographic coordinates ranging from 102°23′ to 103°39′ E and 35°47′ to 36°12′ N. This region lies at the transitional zone between the Loess Plateau and the Tibetan Plateau, with an elevation ranging from 1800 to 3900 m. Landslide bodies in this region are medium to large in scale and are commonly found on steep slopes and the edges of valleys. Region B is located in Tianshui City, Gansu Province, with geographic coordinates ranging from 105°20′ to 106°20′ E and 34°44′ to 35°11′ N. It is situated in the Loess Hills of the Weihe River basin in the southeastern part of the Loess Plateau, characterized by loess hills and river valley landforms. The elevation ranges from approximately 1200 to 2000 m. Landslides in this region are generally small to medium in scale but occur with high frequency. They predominantly occur at the foot of slopes and on steep banks. Region C is located in Xianyang City, Shaanxi Province, with geographic coordinates ranging from 108°29′ to 108°58′ E and 34°26′ to 34°50′ N. The elevation is relatively low, ranging from approximately 390 to 1600 m. Landslides mainly occur at the edges of the loess tableland, with small-scale landslide bodies usually triggered by rainfall or artificial slope cutting. Region D is located in Xiji County, Guyuan City, Ningxia Hui Autonomous Region, with geographic coordinates ranging from 105°20′ to 106°04′ E and 35°35′ to 36°14′ N. It lies in the southern part of Ningxia in the Loess Hills, with an elevation ranging from approximately 1500 to 2800 m.
Li et al. [
27] proposed the concept of landslide longitudinal shape, suggesting that variations in landslide width along the movement direction are not merely geometric characteristics but also reflect, to some extent, the kinematic constraints and evolutionary processes of landslides. In that study, landslide shapes were classified into five types: rectangular, widening, narrowing, spindle, and hourglass. Building upon this theoretical framework and integrating extensive field investigations with remote sensing interpretation, our study classifies landslides into two major types, widening and rectangular, as illustrated in
Figure 2. Further subdivision was not pursued because the remaining landslide types in the study area are relatively scarce, and the sample size is insufficient to support stable training of deep learning models.
2.2. Production of Experimental Data and Sample Set
The dataset includes optical remote sensing imagery and DEM data. Optical remote sensing images present texture and shape features of the landslides, while DEM data quantitatively constrain sliding directions and depositional patterns via terrain parameters including elevation, slope gradient, and curvature indices, thereby determining ultimate morphological configurations. Integration of these data sources enables models to concurrently capture spectral features and geomorphological characteristics of landslides during training, thus enhancing comprehensive discrimination capabilities for landslide boundaries and morphological structures.
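As an illustration of how a terrain parameter such as slope gradient can be derived from a DEM, the following minimal sketch uses central finite differences on a regular grid. The function name, the toy elevation grid, and the choice of central differences are assumptions. The study does not state how its terrain parameters were computed.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees at interior cells of a DEM given as a list of rows.

    Border cells are left at 0.0 because central differences need
    a neighbour on both sides.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope


if __name__ == "__main__":
    # Hypothetical 8 m DEM: an inclined plane rising 8 m per cell eastwards.
    plane = [[8.0 * c for c in range(4)] for _ in range(4)]
    print(slope_degrees(plane, cell_size=8.0)[1][1])
```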
The optical remote sensing images used in this study were obtained from Google Earth, and the DEM data were obtained from local geological institutions. The spatial resolution is 2 m for the optical images and 8 m for the DEM data, and the imagery spans the period from 2015 to 2024. During dataset construction, historical images from different years and seasons were manually inspected in Google Earth, and those with the clearest landslide morphological characteristics were selected. The dataset construction process is summarized as follows. First, old landslide locations were identified by integrating optical remote sensing imagery with field survey results. Optical remote sensing images and DEM data covering the study area were then collected. The DEM data were resampled to match the spatial resolution of the optical imagery, and the resampled DEM values were normalized to a range of 0–255. Next, the optical remote sensing imagery and the normalized DEM data were band-composited to form four-band image data. Subsequently, the “Data Sample Production” tool in the SVM-LSM sample production toolbox [
5] was used for batch cropping to obtain image patches. For annotation, the open-source image labeling tool Labelme was employed. The cropped image patches were imported into Labelme, and semantic regions corresponding to each class were manually delineated using the drawing tools provided by the software. Labelme supports multiple annotation types, including polygons, rectangles, and free-form shapes, and each annotated region was assigned a corresponding semantic class label. After annotation, the results were saved as JSON files, which contain the coordinate information and class labels for each annotated region. Finally, the JSON-format annotations were converted into the TIFF-format label files required for this study. All images and corresponding labels were resized to 512 × 512 pixels. Each image consists of four channels, while each label is a single-channel raster.
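The resampling, normalization, and band-compositing steps can be sketched as follows. This is a toy illustration on nested lists, not the processing chain actually used. Nearest-neighbour resampling, min-max scaling over the tile, and all function names are assumptions, since the text specifies only the target resolution and the 0–255 range.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling: each cell becomes a factor x factor block."""
    return [
        [value for value in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def normalize_to_uint8(grid):
    """Min-max scale a grid to integers in 0..255 (assumed normalization)."""
    values = [v for row in grid for v in row]
    low, high = min(values), max(values)
    if high == low:
        return [[0 for _ in row] for row in grid]
    scale = 255.0 / (high - low)
    return [[int(round((v - low) * scale)) for v in row] for row in grid]


def stack_bands(rgb, dem):
    """Append the DEM value to each RGB pixel, giving H x W x 4 data."""
    if len(rgb) != len(dem) or len(rgb[0]) != len(dem[0]):
        raise ValueError("optical and DEM grids must have the same shape")
    return [
        [list(pixel) + [elevation] for pixel, elevation in zip(rgb_row, dem_row)]
        for rgb_row, dem_row in zip(rgb, dem)
    ]


# Hypothetical 2 x 2 DEM tile at 8 m and a matching 8 x 8 optical tile at 2 m.
dem_8m = [[1200.0, 1300.0], [1400.0, 1500.0]]
rgb_2m = [[(10, 20, 30) for _ in range(8)] for _ in range(8)]

dem_2m = upsample_nearest(dem_8m, factor=4)   # 8 m -> 2 m grid
composite = stack_bands(rgb_2m, normalize_to_uint8(dem_2m))
print(len(composite), len(composite[0]), len(composite[0][0]))
```

In practice a geospatial library would handle georeferencing and interpolation. The sketch only makes the order of operations explicit.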
Owing to long-term fluvial erosion, extensive gully landforms and steep slopes have developed in the Loess Plateau region. These landforms exhibit optical characteristics similar to those of landslides, which can easily lead to misclassification and reduced detection accuracy. Therefore, typical non-landslide samples, such as eroded gullies and slope-foot deposits, were deliberately included in the dataset to strengthen the model’s ability to discriminate landslides from morphologically similar landforms, as shown in
Figure 3. The dataset includes 1300 positive samples and 650 negative samples. The dataset was divided into training, validation, and testing subsets in a ratio of 7:2:1, as detailed in
Table 1. Due to the limited number of landslide samples, data augmentation techniques including horizontal flipping, vertical flipping, and rotation were applied to expand the dataset.
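A minimal sketch of a 7:2:1 split and of the flip and rotation augmentations is given below. The shuffling seed, the 90° rotation angle, and the function names are assumptions, and the subset sizes printed here follow from integer arithmetic on 1950 samples rather than from Table 1. The text does not state whether augmentation was applied before or after the split. Each geometric transform must be applied identically to an image and its label.

```python
import random


def split_dataset(samples, ratios=(7, 2, 1), seed=0):
    """Shuffle and split samples into training, validation, and test subsets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]   # remainder goes to the test subset
    return train, val, test


def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the row order (up-down flip)."""
    return [row[:] for row in image[::-1]]


def rotate90(image):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


if __name__ == "__main__":
    sample_ids = range(1300 + 650)   # positive plus negative samples
    train, val, test = split_dataset(sample_ids)
    print(len(train), len(val), len(test))
    print(rotate90([[1, 2], [3, 4]]))
```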
5. Discussion
5.1. Comparison Between the SPM and Attention Mechanisms
To verify the effectiveness of the shape prior module (SPM), this study integrates SPM as well as several mainstream attention modules, including SE, CBAM, and SK, into the same backbone network, ResU-Net, for comparative experiments. Conventional attention modules mainly output feature-importance weights, indicating which features are more informative, but they do not explicitly encode what the target object should look like. In contrast, the shape prior produced by SPM is a class-specific shape attention map that directly represents the potential spatial distribution and contour of each category in the image, thereby possessing explicit geometric meaning. Rather than simply reweighting feature responses, SPM incorporates learned shape knowledge as an additional and interpretable guiding signal, which is jointly updated with encoder features, enabling bidirectional and complementary enhancement between shape information and texture information. As shown by the quantitative results in
Table 9, the performances of different attention mechanisms in old landslide segmentation differ markedly. Although generic attention modules such as SE, CBAM, and SK improve model performance to some extent, they remain clearly inferior to the shape prior module. The model incorporating SPM achieves substantial improvements across all four metrics, including precision, recall, F1-score, and MCC, with the F1-score reaching 0.6658 and the MCC increasing to 0.6068, significantly outperforming the other attention mechanisms.
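For reference, the four reported metrics can be computed from pixel-level confusion counts as in the sketch below. The formulas are the standard definitions. The function name and the example counts are hypothetical and are not results from this study.

```python
import math


def segmentation_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and MCC from binary confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(segmentation_metrics(tp=60, fp=20, fn=40, tn=880))
```

Unlike the F1-score, the MCC also takes true negatives into account, which makes it informative when landslide pixels are a small fraction of each image.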
Grad-CAM (Gradient-weighted Class Activation Mapping) is a widely used visualization technique in deep learning for interpreting model predictions by highlighting regions in the input image that contribute most to the network’s output. It computes the weighted combination of the feature maps from convolutional layers and their corresponding gradients, superimposing the result onto the original input to visually indicate the model’s focus areas. We visualized the final layer of the decoder using Grad-CAM. As shown in
Figure 9, attention mechanisms such as SE, CBAM, and SK mainly focus on enhancing features along the channel or spatial dimensions, but they fail to effectively incorporate the shape knowledge of target objects. Consequently, when identifying old landslides, these models often suffer from issues such as blurred boundaries or incomplete contours. Compared with the heatmaps of other models, the key regions in the ResU-SPMNet model are more distinct and better aligned with the actual landslide areas. The model can focus more effectively on landslide features while suppressing irrelevant information, thereby improving overall accuracy.
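The weighted combination underlying Grad-CAM can be sketched in a few lines. The feature maps and gradients below are toy values, and the function name is an assumption. In practice both tensors are taken from the chosen layer of the trained network, and the resulting map is upsampled and overlaid on the input image.

```python
def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from K feature maps and their gradients (K x H x W lists).

    Each channel weight is the spatial mean of its gradient. The heatmap is
    the ReLU of the weighted sum of feature maps, scaled to a maximum of 1.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(feature_maps, gradients):
        weight = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * fmap[i][j]
    heatmap = [[max(0.0, value) for value in row] for row in heatmap]  # ReLU
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


if __name__ == "__main__":
    # Two toy channels: the first supports the class, the second opposes it.
    maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(maps, grads))
```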
5.2. Visualization of Global and Local Shape Priors
The shape prior module comprises two key components, the self-update block and the cross-update block, which are designed to extract global shape features and local detailed information of landslides, respectively. To validate whether the two sub-modules can effectively capture the morphological features of different landslide types, widening and rectangular landslide samples were selected from the joint dataset for independent training and visual analysis. The joint dataset comprises 148 widening landslides and 68 rectangular landslides.
Table 10 presents a comparative analysis of performance differences across landslide types. The results indicate that models trained independently on widening and rectangular landslide datasets outperform those trained on the joint dataset across four evaluation metrics. Specifically, the F1-score increases by 2 percentage points for widening-type landslides and by 6 percentage points for rectangular landslides.
To further evaluate the modeling efficacy of shape priors for different landslide morphologies, visualization results from the models trained independently on widening and rectangular landslides are presented in
Figure 10 and
Figure 11, respectively. The results reveal that the global shape priors effectively capture the macroscopic morphological structure of landslides, ensuring that segmentation results align with the geometric prototypes of landslides. However, because they do not sufficiently compensate for local details, they exhibit limitations in edge precision and in the delineation of internal structures. In contrast, the local shape priors employ a cross-attention mechanism to integrate and optimize global shape priors and encoder features, highlighting the detailed features of landslides. A comparison of the global shape priors obtained from independently and jointly trained models reveals clear differences. In the independently trained models, the learned global shape priors exhibit more focused and coherent heatmap responses and provide a more specific characterization of the corresponding morphologies. The main reason is that during independent training, the model focuses solely on learning one landslide morphology, thereby extracting a more typical global shape. In joint training, however, the model learns multiple morphologies simultaneously, which inevitably blurs the most discriminative features of any specific morphology and reduces the specificity of the generated global shape priors.
5.3. Generalization Ability Test
To objectively evaluate the generalization capability of ResU-SPMNet in previously unseen areas, region D was selected as an independent test area. Region D was not involved in any training or validation process, which avoids data leakage and allows a reliable evaluation of the model’s adaptability to new regions. The region is characterized by pronounced terrain relief, intense erosion, and diverse landslide types, making it well suited to evaluating the robustness and generalization capability of the model under varying geomorphological conditions.
Remote sensing imagery is strongly influenced by seasonal variations in illumination conditions, vegetation cover, and shadow length and orientation, which can alter the visual appearance of landslides and consequently affect model recognition performance. In terms of illumination conditions, summer is characterized by abundant sunlight, more uniform surface brightness, and higher overall image contrast, which facilitates the extraction of stable texture and structural features. In contrast, reduced solar radiation in autumn and winter often leads to insufficient brightness in local areas, decreasing spectral differences between landslides and background regions and increasing classification difficulty. From the perspective of solar elevation angle, winter and early spring are characterized by low solar angles, resulting in significantly elongated slope shadows whose orientations are strongly controlled by slope aspect, often producing extensive shadow coverage within landslide bodies or along their boundaries. Such shadow interference disrupts the original texture continuity of landslides, leading to errors in boundary delineation and area identification by the model. By contrast, the higher solar elevation angle in summer leads to more uniform surface illumination, allowing the geometric structural features of landslides, including head scarps, lateral boundaries, and depositional morphology, to be more clearly represented in the imagery. As multi-seasonal images from the same year were unavailable, the remote sensing images representing the four seasons were collected over the period from 2015 to 2024.
Table 11 presents the seasonal generalization test results of ResU-SPMNet and the baseline ResU-Net model in region D. ResU-SPMNet consistently outperforms the baseline ResU-Net across all four seasons in terms of precision, recall, F1-score, and MCC, indicating that the introduction of shape priors enables the model to maintain strong stability and robustness under unseen regional and seasonal variations. Specifically, ResU-SPMNet achieves the best generalization performance in summer, while the weakest performance is observed in winter, with an absolute F1-score difference of 0.016 between the two seasons. This can be mainly attributed to summer conditions, which are typically characterized by lower cloud cover, longer daylight duration, and higher solar elevation angles, resulting in more uniformly illuminated imagery with minimal shadow interference and facilitating the capture of clear landslide morphology by the shape prior module. In winter, vegetation is sparse or absent, so landslide morphology may appear relatively clear on exposed surfaces; however, snow cover, frozen soil, low solar angles, and elongated cast shadows can obscure or alter those surfaces, increasing the difficulty of accurately identifying true landslide morphology. In spring and autumn, reduced solar radiation and lower solar elevation angles relative to summer likewise lead to declines in recognition accuracy and overall performance.