Improved UCTransNet by Integrating Pyramid Kernel Interaction with Triplet Attention for Identifying Multi-Scale Landslides from GF-2 Imagery

Wang, Miao; Ding, Weicui; Liu, Meiling; Liu, Zujian; Liu, Xiangnan; Wen, Yanan; Li, Hao

doi:10.3390/rs18030492



by Miao Wang ^1,2, Weicui Ding ^1,2,*, Meiling Liu ^2,3, Zujian Liu ^4,5, Xiangnan Liu ^2,3, Yanan Wen ^2,3 and Hao Li ^2,3

¹ Chinese Academy of Geological Sciences, Beijing 100037, China
² School of Artificial Intelligence, China University of Geosciences Beijing, Beijing 100083, China
³ Hebei Key Laboratory of Geospatial Digital Twin and Collaborative Optimization, China University of Geosciences Beijing, Beijing 100083, China
⁴ China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing 100083, China
⁵ Key Laboratory of Airborne Geophysics and Remote Sensing Geology, Ministry of Natural Resources, Beijing 100083, China
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(3), 492; https://doi.org/10.3390/rs18030492

Submission received: 22 December 2025 / Revised: 25 January 2026 / Accepted: 27 January 2026 / Published: 3 February 2026


Highlights

What are the main findings?

The proposed UCTransNet-TPKI model integrates Pyramid Kernel Interaction and Triplet Attention to effectively distinguish small-scale landslides from bare soil interference.
The method achieved state-of-the-art accuracy (F1-score 0.9008) on high-resolution GF-2 imagery, outperforming MFFENet, TransLandSeg, and Segformer++ in complex mountainous terrains.

What are the implications of the main findings?

This framework provides a robust automated solution for rapid post-disaster landslide mapping and geological hazard risk assessment in spectrally complex regions.
The successful synergistic integration of multi-scale convolution and cross-dimensional attention offers a transferable technical reference for solving the “same spectrum, different objects” challenge in remote sensing.

Abstract

Landslides in mountainous regions threaten infrastructure and human safety, making high-accuracy landslide inventories crucial for disaster management. However, fine-grained landslide identification from high-resolution remote sensing imagery is hindered by low detection accuracy for small landslides and by spectral interference from bare soil. This study proposes a lightweight deep learning model, UCTransNet with Triplet Attention and Pyramid Kernel Interaction (UCTransNet-TPKI), for accurate multi-scale landslide extraction. The study area is Wushan County, Chongqing. GF-2 imagery from 2022 and field sampling data were collected, and the Mengdong dataset was used as validation data. UCTransNet-TPKI is based on an improved UCTransNet architecture, and its key innovations are two modules: the Pyramid Kernel Interaction (PKI) module and the Triplet Attention mechanism. The PKI module captures multi-scale local contextual information in parallel under different receptive fields, significantly enhancing the network’s ability to extract landslide features. The Triplet Attention mechanism refines feature representations by capturing interaction dependencies across the three dimensions of a feature map. This enables the model to focus more precisely on key areas, such as the main body and edges of a landslide, while suppressing interference from background noise. The experimental results show that UCTransNet-TPKI achieved the highest F1-score of 0.9008 and an IoU of 0.8252, outperforming MFFENet, TransLandSeg, and Segformer++. Ablation studies confirmed the contribution of each component: the PKI module improved IoU by 0.72%, the Triplet Attention mechanism increased IoU by 0.9%, and their combination yielded a clear synergistic improvement in overall performance. 
Furthermore, UCTransNet-TPKI demonstrated strong generalization on the Mengdong dataset, achieving an F1-score of 0.9230 and an IoU of 0.8560. These results demonstrate that UCTransNet-TPKI provides an accurate automated landslide mapping solution, offering significant value for post-disaster emergency response and geological hazard management.

Keywords:

multi-scale landslide extraction; UCTransNet-TPKI; Pyramid Kernel Interaction; Triplet Attention mechanism

1. Introduction

Landslides are frequent and highly hazardous natural disasters in mountainous areas. They not only jeopardize infrastructure, property, and human life, but also severely affect natural ecosystems [1,2,3,4]. Landslide identification focuses on the rapid detection and mapping of the spatial distribution of landslides to construct accurate disaster inventories [5,6]. Such inventories provide critical data support for formulating scientific mitigation strategies and serve as important cornerstones for early warning, susceptibility analysis, and risk management [7,8,9]. Rapidly producing large-scale landslide distribution maps is essential for post-disaster emergency response, resource allocation, and reconstruction planning, especially after triggering events such as intense rainfall or major earthquakes [10,11,12,13,14].

With the rapid development of remote sensing technology, high-resolution remote sensing images (HRSI) have become increasingly available. Their rapid, wide-coverage, all-weather data acquisition makes them widely applicable to landslide identification, mapping, and susceptibility analysis [15,16,17,18].

HRSI-based landslide identification techniques currently fall into three main categories: statistical analysis-based methods, traditional machine learning methods, and deep learning methods [19,20,21,22]. Statistical analysis methods typically rely on manually extracted texture, spectral, and morphological features to build identification models. However, these methods involve complex feature engineering, and manual intervention can introduce subjective bias, which limits model generalization [23]. Traditional machine learning algorithms include Bayesian methods [24,25], logistic regression [26], support vector machines [27], and random forests [28], among others. Nevertheless, these methods are limited in scalability and in their ability to integrate multi-source heterogeneous data, and they often encounter computational efficiency bottlenecks when processing massive remote sensing datasets. In contrast, deep learning techniques have shown considerable advantages in multi-scale landslide identification tasks owing to their strong automatic feature extraction capabilities [29,30]. They reduce reliance on manual feature engineering, adaptively learn complex features, better integrate multi-source data, and process large-scale remote sensing data efficiently. As a data-driven approach, deep learning shows great potential given the rapidly increasing volume of geological hazard observation data [31,32]. For example, Xu et al. presented FCDU-Net, a feature-constrained deep U-Net model. They combined auxiliary features from the Gray-Level Co-occurrence Matrix (GLCM) and the Normalized Difference Vegetation Index (NDVI), used the Relief-F algorithm together with a deep U-Net for feature selection, and thereby significantly improved the accuracy of landslide boundary identification [33]. 
AMU-Net, a feature enhancement framework developed by Wei et al., combines multi-scale and attention methods with U-Net [34]. By employing a shifted-window mechanism to enlarge the receptive field during pixel prediction, the framework effectively reduced misclassification at landslide boundaries. These studies validate the effectiveness of deep learning algorithms for landslide identification. They also highlight the critical role of the U-Net architecture and its variants in this field.

However, current deep learning-based landslide identification still faces two prominent challenges: difficulty in identifying small landslides and significant interference from bare land. Small landslides occupy few pixels in an image and lack sufficient spatial, textural, and contextual information. This makes them easily confused with background noise or other small objects [35,36,37]. Meanwhile, bare land is a major source of interference because its spectral and textural characteristics are highly similar to those of early-stage or small landslides, leading to the “same spectrum, different objects” phenomenon. This phenomenon is a primary source of false detections and further complicates the accurate identification of small landslides [38,39,40].

Recent research offers fresh perspectives on these issues. The PKINet method proposed by Xinhao Cai et al. [41] incorporates non-dilated multi-scale convolutional kernels. In general remote sensing object detection, this design effectively mitigates feature extraction biases caused by background interference and scale variation, which greatly improves the precision of small-object detection in challenging scenes. The approach is therefore an important reference for small landslide identification, although its effectiveness in specific landslide scenarios remains to be verified. In land use classification tasks involving bare land, many studies have used attention mechanisms to focus on target features while suppressing interference from bare land, improving classification accuracy and efficiency. For instance, Mei et al. [42] proposed a Spectral-Spatial Attention Network (SSAN) that combines a convolutional neural network (CNN) for spatial attention with a bidirectional recurrent neural network (Bi-RNN) for spectral attention. Because the attention mechanism adaptively focuses on important features and reduces the influence of background noise, the network achieves outstanding results in high-resolution image classification tasks. Nevertheless, its cross-dimensional information fusion remains limited.

To address these limitations, this study proposes the UCTransNet-TPKI model. The model uses UCTransNet, a U-Net variant, as its backbone [43] and adopts the non-dilated multi-scale convolutional kernel design of PKINet to strengthen feature representation for small targets. It also integrates the Triplet Attention module [44], which explicitly captures dependencies between the channel and spatial dimensions through rotation operations and a three-branch structure. This further improves the model’s ability to discriminate landslide features in complex scenes.

2. Study Area and Data

2.1. Study Area

Chongqing is a highly landslide-prone city within the Three Gorges Reservoir Area (TGRA) [45]. Wushan County is situated in northeastern Chongqing, between 30°46′ and 31°28′ N and between 109°33′ and 110°11′ E [46], and covers a total area of approximately 2958 km². The county has an average annual temperature of roughly 18 °C and an average annual precipitation of 1049 mm, and it lies within a complex tectonic stress field. The physical geography of Wushan County is depicted in Figure 1.

The complex geological structures, active hydrological processes, and frequent human activities in the Wushan region collectively create a setting highly prone to landslide hazards. It is particularly noteworthy that, according to research statistics, small-scale, shallow, and soil-based landslides constitute the majority of disasters, accounting for up to 85.68% of historical landslide events [47]. Morphologically, these small-scale landslides often exhibit faint boundary features and limited surface deformation. Their spectral response is highly similar to that of surrounding land cover types, such as bare land and fallow farmland. This coupling effect of scale differences and interference from bare land significantly hinders the fine-scale identification of landslides using traditional remote sensing methods. To visually represent this key challenge, this study selected a typical area to create a high-resolution image interpretation map, with specific details illustrated in Figure 2. Figure 2c shows the GF-2 satellite imagery of Wushan County, where the circular points represent the distribution of historical landslide points.

2.2. Experimental Data

We obtained Gaofen-2 (GF-2) satellite imagery of Wushan County acquired in 2022 to investigate landslide features in the area. The GF-2 satellite provides panchromatic imagery at 0.8 m resolution and multispectral imagery at 3.2 m resolution. Detailed parameters of the imagery are presented in Table 1. By fusing the panchromatic and multispectral images (pan-sharpening), we obtained multispectral imagery at a spatial resolution of 0.8 m. This markedly improved the representation of surface details and intricate structures, providing high-quality data for fine-scale landslide extraction.
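The paper does not state which pan-sharpening algorithm was used. As an illustration only, the sketch below applies a Brovey-style transform to a 4-band multispectral patch and a panchromatic patch at four times its resolution, which matches the 3.2 m to 0.8 m ratio. Several choices are assumptions and do not describe the authors' processing chain: the Brovey method itself, the nearest-neighbour upsampling, the band-mean intensity, and the function name `brovey_pansharpen`.

```python
import numpy as np

def brovey_pansharpen(ms, pan, eps=1e-6):
    """Brovey-style fusion (illustrative, hypothetical helper).

    ms:  multispectral array, shape (bands, h, w), e.g. 3.2 m pixels
    pan: panchromatic array, shape (h * r, w * r), e.g. 0.8 m pixels
    """
    r = pan.shape[0] // ms.shape[1]               # resolution ratio (4 for GF-2)
    # Nearest-neighbour upsampling of every band to the pan grid
    ms_up = ms.repeat(r, axis=1).repeat(r, axis=2).astype(float)
    intensity = ms_up.mean(axis=0)                # synthetic intensity image
    # Scale each band by pan / intensity to inject panchromatic detail
    return ms_up * (pan / (intensity + eps))[None, :, :]

rng = np.random.default_rng(0)
ms = rng.uniform(0.1, 1.0, size=(4, 2, 2))        # blue, green, red, NIR
pan = rng.uniform(0.1, 1.0, size=(8, 8))
fused = brovey_pansharpen(ms, pan)
print(fused.shape)                                # (4, 8, 8)
```

By construction, the band mean of the fused result reproduces the panchromatic image. Spectral ratios between bands are preserved, which is why Brovey-type methods keep colour balance but can distort absolute radiometry.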

To reduce the inherent subjectivity of visual interpretation and ensure high-fidelity ground truth labels, the landslide inventory in this study was constructed through a multi-stage annotation and validation process. Historical landslide point records were first collected from the Geographic Remote Sensing Ecological Network Platform. These records served as spatial references to guide polygon-level landslide delineation on high-resolution GF-2 imagery, enabling accurate conversion from point-based records to detailed landslide boundaries. The delineated samples were then checked against multi-temporal Google Earth historical imagery to verify their spatial rationality and temporal consistency. In addition, the inventory was supplemented and cross-checked with authoritative geological hazard datasets provided by the Chongqing Bureau of Geology and Mineral Exploration and Development, which were derived from professional geological investigation campaigns and interpreted by experienced geologists. This integrated workflow reduces interpretation subjectivity and ensures the consistency and geological reliability of the ground truth used for model training and evaluation. The dataset was randomly divided into training, validation, and test sets at a 7:2:1 ratio, with no overlap between subsets. In the following sections, we refer to this dataset as the Wushan Dataset.
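A random 7:2:1 split with disjoint subsets can be sketched as follows. The sketch assumes that each of the 1000 patches reported in Section 3.2 is identified by an integer ID. The function name `split_dataset` and the seed value are hypothetical choices made for reproducibility.

```python
import random

def split_dataset(sample_ids, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle sample IDs and cut them into disjoint train/val/test lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)            # reproducible random order
    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]                # remainder goes to test
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))          # 700 200 100
```

Splitting the IDs, rather than sampling each subset independently, guarantees that no patch appears in more than one subset.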

We used the Mengdong dataset [48] to evaluate the generalization capability of our deep learning model. The environmental conditions covered by this dataset differ significantly from Wushan County’s subtropical monsoon climate and mixed forest vegetation. They include a subtropical low-latitude plateau monsoon climate, subtropical evergreen broad-leaved forests, and semi-humid vegetation in the Nangunhe River basin. This provides diverse test scenarios for assessing the model’s adaptability to different environmental settings. The Mengdong dataset has a resolution of 0.5 m and is derived from SuperView-1 imagery. For details, see Table 1.

3. Method

Figure 3 depicts the overall technical workflow of this study. First, in the data preprocessing phase, high-resolution GF-2 satellite imagery undergoes systematic preprocessing, followed by manual visual interpretation to generate landslide labels. Second, during model construction, we employ the UCTransNet-TPKI framework. It integrates a PKI module to address landslide scale variation and a Triplet Attention mechanism to suppress irrelevant background noise while enhancing edge features. Third, in the model evaluation stage, the performance of the proposed model is examined through Ablation Studies (AB), which evaluate the contribution of each module, and Comparative Studies (CP), which benchmark our approach against existing methods. Finally, in the results analysis phase, the trained model is applied to the study area to generate the final landslide distribution map of Wushan. This workflow is designed to detect landslides accurately while addressing the challenge of background interference.

The proposed model, UCTransNet-TPKI, builds on an enhanced UCTransNet architecture. Its key innovations are two modules: the Pyramid Kernel Interaction (PKI) module and the Triplet Attention mechanism. The PKI module captures multi-scale local contextual information in parallel under different receptive fields, substantially improving the network’s capacity to extract landslide features at various scales. The Triplet Attention mechanism refines feature representations by capturing interaction dependencies across the three dimensions of a feature map (channel, height, and width). This allows the model to concentrate more accurately on critical regions, such as the main body and margins of a landslide, while mitigating interference from background noise. Figure 4 depicts the overall architecture of the UCTransNet-TPKI model.

3.1. Model Construction

3.1.1. Original UCTransNet Architecture

The UCTransNet network model employs U-Net as its backbone architecture [43]. However, to enhance the model’s representation capability for complex features and to make the channel interaction between the encoder and decoder more flexible and efficient, it incorporates a Channel Transformer (CTrans) module to supplant the original skip connections in the U-Net.

The CTrans module comprises two sub-modules: the Channel-wise Cross Fusion Transformer (CCT), utilized for fusing multi-scale encoder features, and the Channel-wise Cross-Attention (CCA), which integrates decoder features with the augmented features from the CCT module.

The CCT module itself comprises three parts. The multi-scale feature embedding (MFE) part gathers visual features at different scales, denoted T_i (i = 1, 2, 3, 4), so that features from different levels can be fused more effectively to improve segmentation accuracy. These features first undergo Layer Normalization (LN) to standardize the input distribution. The multi-head channel-wise cross attention (MCA) mechanism then uses multiple attention heads to strengthen interactions between features, enabling the model to capture more intricate feature interrelationships. Finally, a multilayer perceptron (MLP) applies non-linear transformations to the attended features to produce the enhanced feature representation.
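The key idea of the MCA step is that attention is computed between channels rather than between spatial tokens. The single-head sketch below illustrates this idea, assuming that the tokens of every scale T_i have the same token count N after embedding. Random matrices stand in for the learned query, key, and value projections. The sketch omits the normalization of the similarity map, the multiple heads, and the MLP of the full CCT. The helper names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(tokens, i, rng):
    """Single-head channel-wise cross attention for scale i (illustrative).

    tokens: list of arrays T_j of shape (N, C_j), all with the same N.
    Returns attended tokens (N, C_i) and the channel attention map (C_i, C_sum).
    """
    t_i = tokens[i]
    t_sum = np.concatenate(tokens, axis=1)      # all scales stacked on channels
    c_i, c_sum = t_i.shape[1], t_sum.shape[1]
    # Random matrices stand in for learned projections W_Q, W_K, W_V
    w_q = rng.standard_normal((c_i, c_i)) / np.sqrt(c_i)
    w_k = rng.standard_normal((c_sum, c_sum)) / np.sqrt(c_sum)
    w_v = rng.standard_normal((c_sum, c_sum)) / np.sqrt(c_sum)
    q, k, v = t_i @ w_q, t_sum @ w_k, t_sum @ w_v
    # Channel-to-channel similarity (C_i x C_sum), normalized per query channel
    attn = softmax(q.T @ k / np.sqrt(c_sum), axis=-1)
    out = (attn @ v.T).T                        # back to (N, C_i)
    return out, attn

rng = np.random.default_rng(0)
tokens = [rng.standard_normal((16, c)) for c in (8, 16, 32, 64)]
out, attn = channel_cross_attention(tokens, 0, rng)
print(out.shape, attn.shape)                    # (16, 8) (8, 120)
```

The attention map has size C_i × C_sum, independent of image size. This is one reason channel-wise attention is cheaper than token-wise self-attention on high-resolution feature maps.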

3.1.2. Pyramid Kernel Interaction (PKI) Module for Multi-Scale Feature Fusion

To tackle the considerable scale variability of landslides in remote sensing imagery—such as the morphological differences between large and small landslides, which are challenging for traditional single-scale convolutional kernels to handle—we introduced the PKI module. Drawing inspiration from the Inception architecture, this module utilizes parallel multi-branch depthwise separable convolutions to adaptively integrate multi-scale local contextual information. This design also helps prevent the feature sparsity issues that can arise from the use of dilated convolutions, as demonstrated in Figure 5 [41].

In the first stage, a small convolutional kernel (3 × 3 by default) captures basic local texture features:

L = F_{3 \times 3}^{conv}(X)

(1)

where X ∈ ℝ^{C×H×W} denotes the input feature map and F_{k×k}^{conv} denotes a standard k × k convolution.

The multi-scale context capture stage then applies N = 4 parallel depthwise separable convolutions (Depthwise Conv) to L. Their kernel sizes k_m (m = 1, ..., N) increase in an arithmetic progression: 5 × 5, 7 × 7, 9 × 9, and 11 × 11. The output of the m-th branch is denoted Z^{(m)}. The computational cost of these branches is as follows:

Computational Cost = \sum_{m = 1}^{N} (C \times k_{m}^{2} \times H \times W + C \times 1 \times 1 \times H \times W)

(2)
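To make Eq. (2) concrete, the snippet below evaluates it for the four kernel sizes listed above on a hypothetical 64-channel, 64 × 64 feature map. It compares the result with the cost of standard (non-depthwise) C → C convolutions that use the same kernels. The feature-map size and the dense baseline are illustrative assumptions. Costs are counted as multiply-accumulate operations.

```python
def pki_cost(c, h, w, kernel_sizes):
    """Eq. (2): per branch, a k x k depthwise term plus a C x 1 x 1 term."""
    return sum(c * k * k * h * w + c * 1 * 1 * h * w for k in kernel_sizes)

def dense_cost(c, h, w, kernel_sizes):
    """Hypothetical baseline: standard C -> C convolutions with the same kernels."""
    return sum(c * c * k * k * h * w for k in kernel_sizes)

kernels = (5, 7, 9, 11)
pki = pki_cost(64, 64, 64, kernels)
dense = dense_cost(64, 64, 64, kernels)
print(pki, dense, round(dense / pki, 1))   # 73400320 4630511616 63.1
```

Because the depthwise term grows linearly in C rather than quadratically, even the 11 × 11 branch remains affordable. The ratio here equals C × 276 / 280, where 276 is the sum of the squared kernel sizes.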

The multi-scale features are then fused by a 1 × 1 convolution. It reweights the channel dimension to emphasize informative scale-specific features and suppress redundant information:

P = F_{1 \times 1}^{conv} \left( L + \sum_{m = 1}^{4} Z^{(m)} \right)

(3)
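Eqs. (1) and (3) can be sketched as a forward pass in NumPy. The sketch uses random placeholder weights, stride 1, and zero "same" padding. It omits any normalization or activation layers that a full implementation may include. Following Eq. (3), each branch here is a plain depthwise convolution, and a single shared 1 × 1 convolution fuses L + ΣZ^{(m)}. All function names are hypothetical.

```python
import numpy as np

def conv2d(x, w):
    """'Same'-padded standard convolution (cross-correlation, stride 1).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))      # zero padding keeps H and W
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]   # shifted view, (C_in, H, W)
            out += np.einsum('oc,chw->ohw', w[:, :, dy, dx], patch)
    return out

def depthwise_conv2d(x, w):
    """'Same'-padded depthwise convolution: one k x k filter per channel.
    x: (C, H, W); w: (C, k, k) with odd k."""
    _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros(x.shape)
    for dy in range(k):
        for dx in range(k):
            out += w[:, dy, dx][:, None, None] * xp[:, dy:dy + h, dx:dx + wd]
    return out

def pki_block(x, rng, kernel_sizes=(5, 7, 9, 11), scale=0.1):
    """Eqs. (1) and (3) with random placeholder weights."""
    c = x.shape[0]
    local = conv2d(x, scale * rng.standard_normal((c, c, 3, 3)))         # L, Eq. (1)
    branches = [depthwise_conv2d(local, scale * rng.standard_normal((c, k, k)))
                for k in kernel_sizes]                                    # Z^(m)
    fused = local + sum(branches)                                         # L + sum of Z^(m)
    return conv2d(fused, scale * rng.standard_normal((c, c, 1, 1)))      # P, Eq. (3)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32, 32))      # hypothetical 8-channel feature map
p = pki_block(x, rng)
print(p.shape)                            # (8, 32, 32)
```

The kernels are enlarged directly rather than dilated. As a result, every pixel inside each k × k window contributes, which is the property the text credits with avoiding the sparse sampling of dilated convolutions.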

In the context of landslide detection, the PKI module uses the 5 × 5 kernels to enhance the edge features of small landslides while including little surrounding background. It uses the 11 × 11 kernels to capture the overall morphology of large landslides. Together, these kernels allow the module to adapt to landslide bodies of different scales.

3.1.3. Triplet Attention for Suppressing Ambiguous Backgrounds and Enhancing Weak Features

To address the “same spectrum, different objects” problem caused by the high spectral similarity between landslides and bare land, we introduce the Triplet Attention mechanism [44]. This mechanism employs a three-branch parallel architecture that establishes cross-dimensional spectral-spatial correlations through rotational operations across C-W, C-H, and H-W interaction branches, achieving lossless spatial feature enhancement, as shown in Figure 6. Compared to traditional attention mechanisms, it avoids information loss caused by dimensionality reduction operations, effectively enhances the perception of weak features such as landslide edges, and simultaneously suppresses background interference from bare land and other sources.

The core of the Triplet Attention mechanism lies in three parallel branches, with each branch specializing in capturing the interactive relationship between a different pair of dimensions.

The C-W interaction branch first rotates the input feature map X by 90° along the width (W) axis, yielding X₁ with dimensions H × C × W. A Z-Pool operation concatenates max pooling and average pooling along the first dimension, compressing X₁ into the feature map X_1p of size 2 × C × W. A K × K convolution followed by a Sigmoid activation then produces the attention weights ω₁. The rotated features are enhanced by element-wise multiplication: Y₁ = X₁ ⊗ ω₁.

The C-H interaction branch operates in the same way but rotates X by 90° along the height (H) axis, yielding X₂ with dimensions W × H × C. Z-Pool compresses X₂ into X_2p of size 2 × H × C. A K × K convolution and Sigmoid function then generate the attention weights ω₂, giving Y₂ = X₂ ⊗ ω₂.

The H-W interaction branch, unlike the two branches above, processes the original input feature map X directly. Z-Pool compresses X into X₃ of size 2 × H × W, and a K × K convolution and Sigmoid generate the spatial attention weights ω₃. The original input features are then enhanced by element-wise multiplication: Y₃ = X ⊗ ω₃.

The final output Y is the simple average of the enhanced features from the three branches. Before aggregation, the inverse rotation is applied to Y₁ and Y₂ (denoted Ȳ₁ and Ȳ₂) to restore their original C × H × W dimensions:

Y = \frac{1}{3} (\bar{Y_{1}} + \bar{Y_{2}} + Y_{3})

(4)
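The three-branch computation and Eq. (4) can be sketched in NumPy for a single (C, H, W) feature map. Following common Triplet Attention implementations, each "rotation" is modelled as an axis swap. Each branch is named by the two dimensions spanned by its attention map. The sketch makes several assumptions: random placeholder kernels, K = 7 (the default in the original Triplet Attention paper, which this work may not use), and no batch normalization inside the attention gate. All function names are hypothetical.

```python
import numpy as np

def z_pool(x):
    """Z-Pool: stack max and mean over axis 0, so (D, A, B) -> (2, A, B)."""
    return np.stack([x.max(axis=0), x.mean(axis=0)])

def conv_2to1(x, w):
    """'Same'-padded K x K convolution from 2 channels to 1.
    x: (2, A, B); w: (2, K, K) with odd K. Returns (A, B)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    a, b = x.shape[1:]
    out = np.zeros((a, b))
    for dy in range(k):
        for dx in range(k):
            out += np.tensordot(w[:, dy, dx], xp[:, dy:dy + a, dx:dx + b], axes=1)
    return out

def attention_gate(x, w):
    """Z-Pool -> K x K conv -> sigmoid; the (A, B) weights rescale x along axis 0."""
    omega = 1.0 / (1.0 + np.exp(-conv_2to1(z_pool(x), w)))
    return x * omega[None, :, :]

def triplet_attention(x, w_cw, w_ch, w_hw):
    """x: (C, H, W) -> same shape."""
    # Swap C and H -> (H, C, W); pooling over H leaves a C x W attention map
    y1 = attention_gate(x.transpose(1, 0, 2), w_cw).transpose(1, 0, 2)
    # Swap C and W -> (W, H, C); pooling over W leaves an H x C attention map
    y2 = attention_gate(x.transpose(2, 1, 0), w_ch).transpose(2, 1, 0)
    # No swap: pooling over C leaves an H x W (spatial) attention map
    y3 = attention_gate(x, w_hw)
    return (y1 + y2 + y3) / 3.0                  # Eq. (4): simple average

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 20, 24))            # hypothetical (C, H, W) map
kernels = [0.1 * rng.standard_normal((2, 7, 7)) for _ in range(3)]   # K = 7
y = triplet_attention(x, *kernels)
print(y.shape)                                   # (16, 20, 24)
```

No branch uses a channel bottleneck (unlike squeeze-and-excitation style MLPs). Each branch learns only 2 × K × K weights, which keeps the module lightweight.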

The Triplet Attention mechanism suppresses interference from bare land by establishing spectral–spatial correlations through its C-H and C-W branches. It uses the dimensionality-reduction-free spatial enhancement of its H-W branch to strengthen the weak edge features of small landslides. Furthermore, its lightweight design adds little computational overhead, which supports efficient processing of high-resolution remote sensing imagery. As a result, this mechanism significantly improves the network’s ability to focus on landslide morphology while mitigating interference from spectral confusion.

3.1.4. The Encoder

The encoder adopts a hierarchical feature extraction strategy with multiple downsampling stages. Each stage consists of basic convolutional blocks enhanced with Pyramid Kernel Interaction (PKI) modules and Triplet Attention mechanisms. The PKI module integrates parallel convolutional kernels of different sizes to capture information from diverse receptive fields. The Triplet Attention mechanism computes attention weights across the channel, height, and width dimensions to enhance feature representation. At the deepest level of the encoder, a Transformer component models global contextual dependencies through self-attention. This enables the network to capture long-range spatial relationships that are crucial for accurate segmentation boundaries.

3.1.5. The Decoder

The decoder implements a progressive upsampling strategy with feature fusion. Each decoder stage incorporates skip connections from the corresponding encoder level, where features are aligned and fused through concatenation followed by convolution. The decoder retains the Triplet Attention mechanism at each level to preserve the enhanced feature representations established in the encoder. Upsampling uses transposed convolutions to gradually restore spatial resolution while preserving semantic information. The final output layer applies a 1 × 1 convolution followed by a sigmoid activation to produce pixel-wise probability maps for binary segmentation. This encoder–decoder design enables effective multi-scale feature extraction in the encoder and precise spatial localization through the decoder’s progressive reconstruction. The result is accurate segmentation with well-preserved boundary details.
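The output layer described above (a 1 × 1 convolution to a single channel followed by a sigmoid) can be sketched as follows. The 0.5 threshold used to turn probabilities into a binary landslide mask is an assumption, because the paper does not state its decision threshold. The function name is hypothetical.

```python
import numpy as np

def segmentation_head(features, weight, bias=0.0, threshold=0.5):
    """1 x 1 convolution to one channel, sigmoid, then thresholding.

    features: (C, H, W) decoder output; weight: (C,) 1 x 1 kernel; bias: scalar.
    """
    logits = np.tensordot(weight, features, axes=1) + bias    # (H, W)
    prob = 1.0 / (1.0 + np.exp(-logits))                        # pixel-wise probability
    mask = (prob >= threshold).astype(np.uint8)                 # 1 = landslide
    return prob, mask

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 4, 4))           # hypothetical decoder features
prob, mask = segmentation_head(feats, rng.standard_normal(16))
print(prob.shape, mask.dtype)                     # (4, 4) uint8
```

A 1 × 1 convolution is simply a per-pixel weighted sum over channels, which is why `np.tensordot` over the channel axis suffices here.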

3.2. Model Training and Implementation

During data sampling, tiles were generated centered on the centroid of each landslide vector feature and cut into non-overlapping 256 × 256 pixel patches. This process yielded 1000 spatially scattered landslide samples for the training, validation, and test sets.
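Cutting a 256 × 256 window centred on a landslide centroid, and checking that two windows do not overlap, can be sketched as below. The sketch assumes that centroids have already been converted from map coordinates to pixel (column, row) coordinates and that the image is at least 256 pixels in each dimension. The paper does not state how windows near image edges were handled, so shifting them inward is a further assumption. Both function names are hypothetical.

```python
def centered_window(cx, cy, img_w, img_h, size=256):
    """Pixel window (col_off, row_off, width, height) centred on a centroid.

    The window is shifted inward where needed so that it stays inside the image.
    """
    half = size // 2
    col = min(max(int(round(cx)) - half, 0), img_w - size)
    row = min(max(int(round(cy)) - half, 0), img_h - size)
    return col, row, size, size

def windows_overlap(a, b):
    """True if two (col, row, width, height) windows share any pixel."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2] and
            a[1] < b[1] + b[3] and b[1] < a[1] + a[3])

print(centered_window(1000.4, 1000.6, 5000, 5000))   # (872, 873, 256, 256)
print(centered_window(10, 10, 5000, 5000))           # (0, 0, 256, 256)
```

In practice, a candidate window would be kept only if `windows_overlap` returns False for every window already accepted.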

To further enhance the generalization capability of the model, data augmentation strategies were incorporated during training to increase sample diversity. In addition to geometric transformations such as random flipping, rotation, and scaling, mild photometric variations were introduced to account for differences in illumination conditions commonly encountered in high-resolution remote sensing imagery. These augmentations help improve model robustness while preserving the underlying spatial characteristics of landslide features.
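The geometric and photometric augmentations can be sketched as follows. The key constraint is that geometric transforms must be applied identically to the image and its label mask, whereas photometric jitter touches only the image. The sketch uses 90-degree rotations and horizontal flips on square patches and omits scaling. The jitter ranges are hypothetical values, not the paper's settings.

```python
import numpy as np

def augment(image, mask, rng, gain_range=(0.9, 1.1), bias_range=(-0.05, 0.05)):
    """Random 90-degree rotation and flip (shared by image and mask) plus mild
    brightness/contrast jitter (image only).

    image: (C, H, W) floats in [0, 1]; mask: (H, W) 0/1 labels; H == W assumed.
    """
    k = int(rng.integers(4))                        # number of 90-degree turns
    image = np.rot90(image, k, axes=(1, 2))
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:                          # horizontal flip
        image = image[:, :, ::-1]
        mask = mask[:, ::-1]
    gain = rng.uniform(*gain_range)                 # photometric change: image only
    bias = rng.uniform(*bias_range)
    image = np.clip(image * gain + bias, 0.0, 1.0)
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)

rng = np.random.default_rng(0)
image = rng.random((4, 256, 256))                   # hypothetical 4-band patch
mask = (rng.random((256, 256)) > 0.9).astype(np.uint8)
aug_image, aug_mask = augment(image, mask, rng)
```

Restricting rotations to multiples of 90 degrees avoids interpolating the binary mask, so label pixels are never blurred.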

To train UCTransNet-TPKI, we set the initial learning rate to 1 × 10⁻⁴ and the batch size to 6, and used the Adam optimizer to update the parameters. The model was trained for a total of 500 epochs, during which the loss decreased steadily as the number of iterations increased. The corresponding loss and IoU curves are provided in Appendix A. All training was performed on an NVIDIA GeForce RTX 4070 GPU. Table 2 lists the hyperparameters of the UCTransNet-TPKI model.
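The training budget implied by these settings can be worked out directly. The calculation assumes that the 700 training patches come from the 7:2:1 split of the 1000 samples, and that augmentation is applied on the fly without enlarging the epoch. Whether the last incomplete batch was dropped is not stated, so both cases are shown. The helper name is hypothetical.

```python
import math

def training_steps(n_samples, train_frac, batch_size, epochs, drop_last=False):
    """Count optimizer updates implied by dataset size, batch size and epochs."""
    n_train = round(n_samples * train_frac)
    if drop_last:
        per_epoch = n_train // batch_size          # incomplete last batch discarded
    else:
        per_epoch = math.ceil(n_train / batch_size)
    return n_train, per_epoch, per_epoch * epochs

print(training_steps(1000, 0.7, 6, 500))                  # (700, 117, 58500)
print(training_steps(1000, 0.7, 6, 500, drop_last=True))  # (700, 116, 58000)
```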

3.3. Performance Evaluation Metrics

To evaluate model performance quantitatively, we computed four metrics from the confusion-matrix counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

(1): Precision: the proportion of correctly detected positive samples among all samples predicted as positive for a given class, calculated as follows:

\text{Precision} = \frac{TP}{TP + FP}

(5)

(2): Recall: the proportion of actual positive samples in the labeled dataset that are correctly predicted as positive, calculated as follows:

\text{Recall} = \frac{TP}{TP + FN}

(6)

(3): F1-Score: the harmonic mean of precision and recall, which balances the two measures. The formula is as follows:

F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(7)

(4): Intersection over Union (IoU): the degree of overlap between the labeled region and the predicted region, that is, the ratio of their intersection to their union, calculated as follows:

\text{IoU} = \frac{TP}{TP + FP + FN}

(8)
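Eqs. (5)–(8) can be computed from a pair of binary masks as sketched below. The sketch pools pixels into a single confusion matrix. Whether the reported scores were pooled over the whole test set or averaged per patch is not stated, so pooling is an assumption. Pixel accuracy, which the results report but this section does not define, is included using its standard definition (TP + TN) / (TP + TN + FP + FN). The function name is hypothetical.

```python
def segmentation_metrics(pred, truth):
    """Binary metrics from two equal-length 0/1 sequences (1 = landslide)."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0        # Eq. (5)
    recall = tp / (tp + fn) if tp + fn else 0.0           # Eq. (6)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                 # Eq. (7)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0    # Eq. (8)
    accuracy = (tp + tn) / len(pairs)                     # standard pixel accuracy
    return {"precision": precision, "recall": recall, "f1": f1,
            "iou": iou, "accuracy": accuracy}

pred  = [1, 1, 0, 0, 1, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1, 1, 0]
print(segmentation_metrics(pred, truth))
# {'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'iou': 0.6, 'accuracy': 0.75}
```

Flattened 2-D masks (for example, `mask.ravel()` on a NumPy array) can be passed directly.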

4. Results

This section demonstrates the superiority of the proposed model for multi-scale landslide recognition through comprehensive ablation and comparative experiments. In the ablation study, we systematically removed the PKI module, which handles landslide size variation, and the Triplet Attention module, which mitigates bare land interference. In the comparative experiments, we evaluated classification performance on the Mengdong dataset and the self-constructed Wushan dataset and compared the results of UCTransNet-TPKI with those of alternative deep learning models. Finally, we evaluated the overall extraction results for Wushan County.

4.1. Ablation Study

We conducted a series of ablation experiments to assess the efficacy of the individual components within the UCTransNet framework for landslide identification. The full UCTransNet-TPKI model served as the reference for subsequent evaluations.

(1): Full model experiment

The full UCTransNet-TPKI model, including both the Triplet Attention and PKI modules, was trained and evaluated on the dataset. Its performance served as the benchmark for the following ablation experiments.

(2): Without the Triplet Attention module

To assess the role of Triplet Attention in suppressing background interference, this module was removed, and the model was constructed using UCTransNet combined with the PKI module. Training and testing were then conducted under the same conditions to evaluate the module’s contribution.

(3): Without the PKI module

To examine the impact of the PKI module on multi-scale feature representation, we removed this component while retaining UCTransNet with the Triplet Attention mechanism. The resulting model was adjusted accordingly and evaluated under the same experimental protocol.

4.1.1. Ablation Study on Wushan Dataset

The ablation results on the Wushan dataset are summarized in Table 3 and illustrated in Figure 7. Compared with the baseline UCTransNet, incorporating either the Triplet Attention mechanism or the PKI module individually yields consistent improvements in Precision, F1-score, and IoU. This confirms that Triplet Attention enhances feature discrimination while the PKI module strengthens multi-scale feature representation. When both modules are integrated in UCTransNet-TPKI, the model achieves the best overall performance: the F1-score increases from 0.8888 to 0.9008, IoU improves from 0.8151 to 0.8252, and Accuracy reaches 0.9806. The visualization results corroborate these findings. The baseline UCTransNet exhibits substantial missed and false detections, whereas adding the PKI module reduces such errors through richer multi-scale feature capture. Likewise, the Triplet Attention mechanism sharpens boundaries and improves segmentation accuracy, particularly for fine structures. UCTransNet-TPKI produces the sharpest segmentation boundaries, with predictions that closely match the ground truth. This demonstrates the complementary and synergistic effect of the two modules on overall segmentation performance.

4.1.2. Ablation Study on Mengdong Dataset

To assess the model's generalization capability, we repeated the same ablation experiments on the Mengdong dataset. The results, compiled in Table 4, show the additional contributions of the Triplet Attention and PKI modules relative to the baseline UCTransNet. The baseline achieves a Precision of 0.8825, an F1-score of 0.9103, an IoU of 0.8365, and an Accuracy of 0.9631. Adding the Triplet Attention mechanism raises these to 0.8912, 0.9150, 0.8437, and 0.9658, respectively, improving every metric. Adding the PKI module yields larger gains, raising Precision to 0.8950, F1-score to 0.9180, IoU to 0.8455, and Accuracy to 0.9765. Integrating both modules in UCTransNet-TPKI gives the best overall performance, with a Precision of 0.9015, an F1-score of 0.9230, an IoU of 0.8560, and an Accuracy of 0.9780. The visualization results in Figure 8 corroborate these findings: the baseline UCTransNet produces large omission and commission errors, the PKI module reduces missed detections through improved multi-scale feature integration, and the Triplet Attention mechanism improves boundary delineation and reduces false positives. The full UCTransNet-TPKI model produces predictions that align most closely with the ground truth, minimizing both error types, sharpening boundaries, and capturing fine-scale topographic structures. These results confirm the complementary roles of Triplet Attention and PKI and their combined effectiveness for cross-regional landslide extraction.
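The absolute gain of each ablation variant over the baseline can be computed directly from the Mengdong values quoted in the paragraph above. The sketch only transcribes those numbers and reports differences in percentage points; the dictionary layout and the helper name `gains_over_baseline` are illustrative.

```python
# Values transcribed from the Mengdong ablation paragraph (Table 4).
ablation = {
    "UCTransNet":         {"precision": 0.8825, "f1": 0.9103, "iou": 0.8365, "accuracy": 0.9631},
    "+Triplet Attention": {"precision": 0.8912, "f1": 0.9150, "iou": 0.8437, "accuracy": 0.9658},
    "+PKI":               {"precision": 0.8950, "f1": 0.9180, "iou": 0.8455, "accuracy": 0.9765},
    "UCTransNet-TPKI":    {"precision": 0.9015, "f1": 0.9230, "iou": 0.8560, "accuracy": 0.9780},
}

def gains_over_baseline(table, baseline="UCTransNet"):
    """Absolute change of every metric relative to the baseline, in percentage points."""
    base = table[baseline]
    return {
        variant: {k: round(100 * (v - base[k]), 2) for k, v in metrics.items()}
        for variant, metrics in table.items()
        if variant != baseline
    }

gains = gains_over_baseline(ablation)
for variant, delta in gains.items():
    print(variant, delta)
```

Comparing the full model's gain with the gains of the single-module variants is a quick, informal way to inspect how the two modules combine on each metric; for IoU, for example, the full model gains 1.95 points against 0.72 and 0.90 points for the Triplet Attention and PKI variants.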

4.1.3. Visual Analysis of Feature Attention

To further explore changes in the model’s attention to landslide boundaries and bare soil interference regions, Grad-CAM heatmap visualization is employed [49,50]. Figure 9 shows a visual comparison of Grad-CAM activation maps for landslide segmentation before and after introducing Triplet Attention. From top to bottom, the figure shows the original remote sensing images, the corresponding ground truth masks, the Grad-CAM activation maps of the baseline model, and those of the baseline model enhanced with Triplet Attention. Compared with the baseline, the model incorporating Triplet Attention exhibits more spatially concentrated activation within landslide regions and reduced responses in surrounding background areas across different landslide shapes and scene complexities. This visualization provides feature-level evidence that Triplet Attention guides the model to focus more on landslide-related regions while suppressing background interference.
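Grad-CAM weights each feature map of a chosen layer by the spatial average of the gradient of a target score with respect to that map, then keeps the positive part of the weighted sum [50]. The NumPy sketch below assumes the activations and gradients have already been extracted from the network (for example with framework hooks). For segmentation, a common choice of target score, assumed here, is the sum of landslide-class logits over the pixels of interest; the layer and score used for Figure 9 are not specified in this section, and the function name is hypothetical.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from one layer's activations and gradients.

    activations, gradients: arrays of shape (K, H, W); gradients holds
    d(target score)/d(activations). Returns an (H, W) map scaled to [0, 1].
    """
    # Channel weights alpha_k: global-average-pooled gradients.
    alpha = gradients.mean(axis=(1, 2))                                # (K,)
    # Weighted sum of the K feature maps, keeping positive evidence (ReLU).
    cam = np.maximum(np.tensordot(alpha, activations, axes=1), 0.0)    # (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy example: channel 0 supports the target score, channel 1 opposes it.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 1.0]]])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
cam = grad_cam(acts, grads)
```

Upsampling such a map to the input resolution and overlaying it on the image yields heatmaps of the kind compared in Figure 9.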

4.2. Comparative Study

To assess the overall performance of UCTransNet-TPKI, we compared it with other state-of-the-art segmentation methods, namely MFFENet [51], TransLandSeg [52], and Segformer++ [53], on both the Wushan and Mengdong datasets. These models were chosen as baselines because they are well known in remote sensing image segmentation and represent typical frameworks for evaluating segmentation performance.

(1): MFFENet: The Multi-scale Feature Fusion Encoder–Decoder Network (MFFENet) integrates an Adaptive Triangle Fork (ATF) module, which selectively combines features from multiple scales, with a dense top-down feature pyramid structure. This design improves the network's ability to capture both fine local details and global context, addressing large intra-class variation and substantial scale differences in remote sensing image segmentation.
(2): TransLandSeg: TransLandSeg is a transfer learning–based landslide segmentation framework built on a vision foundation model. It introduces an Adaptive Transfer Learning module to adapt the general segmentation capability of SAM to landslide scenes by training only a small fraction of parameters, enabling efficient knowledge transfer and competitive segmentation performance.
(3): Segformer++: Segformer++ is an efficient transformer-based segmentation architecture that extends Segformer by introducing token-merging strategies to reduce computational complexity. Adaptively merging similar tokens within the hierarchical encoder improves efficiency for high-resolution semantic segmentation while largely preserving global contextual representation.
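The general idea of token merging can be illustrated with bipartite soft matching in the style of ToMe: tokens are split into two alternating sets, each token in the first set finds its most similar partner in the second, and the r most redundant pairs are averaged into one token. This NumPy sketch is a simplified illustration of that general technique, not the specific merging strategy of Segformer++; the function name and the unweighted averaging are assumptions.

```python
import numpy as np

def bipartite_merge(tokens: np.ndarray, r: int) -> np.ndarray:
    """Merge r tokens by bipartite soft matching, averaging merged features.

    tokens: (N, D) array; returns an (N - r, D) array.
    """
    a, b = tokens[0::2], tokens[1::2]          # alternate split into sets A and B
    r = min(r, len(a))
    eps = 1e-12
    # Cosine similarity between every token in A and every token in B.
    an = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    bn = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    sim = an @ bn.T                             # (|A|, |B|)
    best_b = sim.argmax(axis=1)                 # closest B token for each A token
    order = np.argsort(-sim.max(axis=1))        # most redundant A tokens first
    merged_a, kept_a = order[:r], order[r:]
    # Average each merged A token into its matched B token.
    b_sum = b.astype(float)
    counts = np.ones(len(b))
    np.add.at(b_sum, best_b[merged_a], a[merged_a])
    np.add.at(counts, best_b[merged_a], 1.0)
    return np.concatenate([a[kept_a], b_sum / counts[:, None]], axis=0)

# Two pairs of identical tokens: merging one pair removes a redundant token.
tokens = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
merged = bipartite_merge(tokens, r=1)           # 4 tokens -> 3 tokens
```

Because attention cost grows quadratically with the number of tokens, removing even a fraction of redundant tokens in high-resolution feature maps can reduce computation substantially, which is the motivation the description above attributes to Segformer++.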

4.2.1. Comparative Study on Wushan Dataset

On the Wushan dataset, UCTransNet-TPKI outperforms all compared methods, as shown in Table 5. It achieves the highest F1-score (0.9008) and IoU (0.8252), exceeding MFFENet, TransLandSeg, and Segformer++ in IoU by 4.62%, 4.29%, and 2.10%, respectively. Segformer++ yields a slightly higher Precision (0.9056 versus 0.8936 for our model), but its Accuracy (0.9387) is substantially lower than that of UCTransNet-TPKI (0.9806). Since the F1-score is the harmonic mean of Precision and Recall, a Precision above the F1-score implies a lower Recall; Segformer++ therefore trades recall for precision and tends to under-segment, whereas our method maintains a better balance between the two. The visual comparisons in Figure 10 support this quantitative advantage: MFFENet and TransLandSeg exhibit extensive omission errors (highlighted in green), and Segformer++ produces fragmented boundaries. In contrast, UCTransNet-TPKI produces cohesive predictions that closely match the ground truth, minimizing both omission and commission errors while preserving fine-scale topographic structures in complex terrain.
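The link between a higher Precision and under-segmentation rests on one step: the F1-score is the harmonic mean of Precision P = TP/(TP + FP) and Recall R = TP/(TP + FN), so it always lies between them. Assuming all three metrics come from the same pooled confusion matrix:

```latex
\mathrm{F1} = \frac{2PR}{P + R}
\;\Longrightarrow\;
R = \frac{P\,\mathrm{F1}}{2P - \mathrm{F1}},
\qquad
\min(P, R) \le \mathrm{F1} \le \max(P, R),
\qquad
R < P \iff \mathrm{FN} > \mathrm{FP}.
```

Segformer++ has P = 0.9056 on Wushan but an F1-score below 0.9008, so its Precision exceeds its F1 and hence R < F1 < P. In confusion-matrix terms, R < P means FN > FP: more landslide pixels are missed than falsely added, which is what under-segmentation denotes. Under the same pooling assumption, UCTransNet-TPKI (P = 0.8936, F1 = 0.9008) has R ≈ 0.908.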

4.2.2. Comparative Study on Mengdong Dataset

Similar trends are observed on the Mengdong dataset (Table 6), where UCTransNet-TPKI delivers the strongest overall results, with an F1-score of 0.9230, an IoU of 0.8560, and an Accuracy of 0.9780. It surpasses MFFENet, Segformer++, and TransLandSeg, with IoU gains of 1.26%, 3.46%, and 4.72%, respectively. Although Segformer++ and TransLandSeg record higher Precision (0.9137 and 0.9104, respectively, versus 0.9015 for our method), their lower F1-scores and IoU values indicate less consistent and less complete segmentation. This is visible in Figure 11, where TransLandSeg and Segformer++ exhibit substantial commission errors (highlighted in red), particularly in boundary regions such as those in Row 4, while MFFENet shows noticeable omission errors (shown in green) within landslide interiors. In contrast, UCTransNet-TPKI produces accurate segmentation masks that align closely with the ground truth and minimize both false positives and false negatives, supporting its generalization capability across different geological environments.

4.3. Model Efficiency and Robustness Analysis

To comprehensively evaluate the practical viability of the proposed method, we conducted a two-fold analysis focusing on computational efficiency and statistical robustness.

4.3.1. Computational Efficiency and Complexity

To further address the trade-off between segmentation accuracy and computational complexity, we evaluate the efficiency of different UCTransNet variants in terms of the number of parameters, floating-point operations (FLOPs), and inference time. This analysis aims to verify whether the performance improvements of the proposed method are achieved at the expense of increased computational cost.

Table 7 summarizes the efficiency comparison on the Wushan dataset. Compared with the original UCTransNet, UCTransNet-TPKI has a substantially smaller model, with the number of parameters decreasing from 66.24 M to 19.56 M. At the same time, the inference time drops from 23.68 ms to 16.79 ms, while IoU improves by 1.01 percentage points (from 0.8151 to 0.8252). These results indicate that the performance gains of UCTransNet-TPKI are not obtained by simply increasing model complexity.
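The relative size of these changes can be checked with simple arithmetic on the figures quoted above. The snippet is a worked example only; the variable and helper names are illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change (after - before) / before."""
    return (after - before) / before

# Figures quoted in the text (Table 7 and the Wushan ablation results).
params_m = (66.24, 19.56)     # UCTransNet -> UCTransNet-TPKI, millions of parameters
latency_ms = (23.68, 16.79)   # inference time (ms)
iou = (0.8151, 0.8252)        # IoU on the Wushan dataset

param_cut = -relative_change(*params_m)        # fraction of parameters removed
latency_cut = -relative_change(*latency_ms)    # fraction of inference time saved
speedup = latency_ms[0] / latency_ms[1]
iou_gain_pp = 100 * (iou[1] - iou[0])          # absolute gain in percentage points

print(f"parameters: -{param_cut:.1%}, latency: -{latency_cut:.1%}, "
      f"speed-up x{speedup:.2f}, IoU +{iou_gain_pp:.2f} pp")
```

It reports a parameter reduction of about 70.5%, a latency reduction of about 29.1% (roughly 1.41 times faster inference), and an IoU gain of 1.01 percentage points, which matches the absolute difference between 0.8151 and 0.8252.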

Although the Pyramid Kernel Interaction module moderately increases FLOPs because of its multi-scale kernel interactions, this increase does not translate into higher inference latency, suggesting that the architecture remains computationally efficient in practice. In addition, the Triplet Attention mechanism adds very few parameters owing to its lightweight, parameter-efficient design.
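This asymmetry can be illustrated by counting. A Triplet Attention block applies one small convolution per branch to a two-channel pooled map, so its parameter count does not grow with the number of feature channels, whereas parallel depthwise convolutions cost compute proportional to k² × H × W per channel. The sketch assumes PKINet-style kernel sizes 3, 5, 7, 9, and 11 [41], one single-channel BatchNorm per Triplet Attention branch as in the original module [44], and a 64-channel, 64 × 64 feature map purely for illustration; the actual UCTransNet-TPKI configuration is not given in this section, and the helper names are hypothetical.

```python
def triplet_attention_params(kernel: int = 7, with_bn: bool = True) -> int:
    """Parameters of one Triplet Attention block.

    Each of the three branches maps the 2-channel Z-pool output (max and mean)
    to 1 channel with a kernel x kernel convolution (no bias); an optional
    single-channel BatchNorm adds a scale and a shift. Independent of C.
    """
    per_branch = 2 * kernel * kernel + (2 if with_bn else 0)
    return 3 * per_branch

def depthwise_params(channels: int, kernels) -> int:
    """Weights of parallel depthwise convolutions (biases ignored)."""
    return sum(channels * k * k for k in kernels)

def depthwise_macs(channels: int, height: int, width: int, kernels) -> int:
    """Multiply-accumulates of the same branches (stride 1, same padding)."""
    return sum(channels * height * width * k * k for k in kernels)

# Assumed configuration, for illustration only.
C, H, W = 64, 64, 64
pki_kernels = (3, 5, 7, 9, 11)

ta = triplet_attention_params()                     # independent of C
dw_params = depthwise_params(C, pki_kernels)        # 64 * (9 + 25 + 49 + 81 + 121)
dw_macs = depthwise_macs(C, H, W, pki_kernels)
```

Under these assumptions the five depthwise branches hold 18,240 weights but perform about 74.7 million multiply-accumulates on one feature map, while a Triplet Attention block adds 300 parameters regardless of channel count. Whether extra FLOPs raise latency also depends on hardware and implementation, which is why measured inference time is reported separately.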

Overall, the efficiency analysis shows that UCTransNet-TPKI achieves a favorable balance between accuracy and computational cost. The segmentation gains stem mainly from more efficient feature representation rather than from a heavier computational burden, supporting the practicality of the method for large-scale landslide mapping and real-world deployment.

4.3.2. Statistical Significance and Robustness

Beyond comparing mean performance metrics, we further validated the reliability of our proposed method. Considering that small fluctuations can arise from random initialization, we employed statistical testing to confirm that the superiority of our model is genuine and robust, rather than an artifact of randomness.

Table 8 reports the performance of both models across five random seeds. To assess the statistical significance of the difference, we conducted an independent two-sample t-test, which yielded t-values of 11.41 for IoU and 9.18 for the F1-score, with both p-values below 0.001. These results confirm that UCTransNet-TPKI improves significantly over the baseline UCTransNet and that the improvement is consistent across random initializations, supporting reliable performance in practical deployment.
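The test can be reproduced from per-seed scores. The sketch below computes the pooled-variance (Student) two-sample t statistic with the Python standard library; whether the authors used the pooled or the Welch form is not stated, so the pooled form is an assumption. With five seeds per model the test has 5 + 5 − 2 = 8 degrees of freedom, for which the two-sided critical value at p = 0.001 is about 5.04, so t-values of 11.41 and 9.18 both fall well inside the p < 0.001 region. The per-seed numbers in the example are invented for illustration and are not the values in Table 8.

```python
import math
from statistics import mean, variance

def students_t(a, b):
    """Independent two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). A p-value can then be read from the
    t distribution, e.g. with scipy.stats.ttest_ind(a, b), whose default
    (equal_var=True) is this same pooled test.
    """
    n1, n2 = len(a), len(b)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t_stat = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2

# Hypothetical per-seed IoU values, for illustration only (not Table 8).
proposed = [0.826, 0.824, 0.827, 0.823, 0.825]
baseline = [0.815, 0.816, 0.814, 0.817, 0.813]
t_stat, df = students_t(proposed, baseline)
```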

4.4. Extraction Results in Wushan County

Figure 12 presents the final landslide inventory map of Wushan County generated with the UCTransNet-TPKI model. Spatially, the identified landslides overlap closely with historical landslide records in many areas, supporting the model's accuracy in identifying known landslide locations. More importantly, the map contains many landslide areas detected by the model that are absent from the historical records. These detections indicate that the method can produce a more comprehensive and detailed inventory than existing records and help fill their gaps. Overall, the map both validates the model's reliability and highlights its ability to discover and map uncatalogued or recent landslides, which is of significant practical value for geological hazard assessment and risk management in the region.
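One simple way to quantify the agreement between an inventory map and historical landslide records is a point hit rate: the fraction of recorded landslide points that fall on a predicted landslide pixel, optionally within a small buffer to absorb positional error. This is an illustrative sketch, not the procedure used to produce Figure 12; it assumes the points have already been projected to (row, column) indices of the prediction raster, and the function name and `buffer` parameter are hypothetical.

```python
import numpy as np

def point_hit_rate(mask: np.ndarray, points_rc, buffer: int = 0) -> float:
    """Fraction of historical points lying on, or within `buffer` pixels of,
    a predicted landslide pixel.

    mask: (H, W) binary prediction; points_rc: iterable of (row, col) indices.
    """
    h, w = mask.shape
    points = list(points_rc)
    hits = 0
    for r, c in points:
        # Square window around the point, clipped to the raster extent.
        r0, r1 = max(r - buffer, 0), min(r + buffer + 1, h)
        c0, c1 = max(c - buffer, 0), min(c + buffer + 1, w)
        hits += bool(mask[r0:r1, c0:c1].any())
    return hits / len(points) if points else 0.0

mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:3, 1:3] = 1                                  # one predicted landslide patch
points = [(1, 1), (3, 3), (4, 4)]
exact = point_hit_rate(mask, points)                # only (1, 1) lies on the patch
buffered = point_hit_rate(mask, points, buffer=1)   # (3, 3) is one pixel away
```

A hit rate only measures agreement with known records; it says nothing about detections that lie outside them.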

5. Discussion

The core contribution of this study is the integration of the PKI module and the Triplet Attention mechanism into the UCTransNet architecture, forming an optimized solution for landslide identification. Using the PKI module alone mainly improves the model's ability to identify multi-scale targets, whereas using Triplet Attention alone substantially improves the suppression of background interference. Notably, combining the two modules produces a synergistic effect: the multi-scale features from the PKI module provide richer input for Triplet Attention, while the cross-dimensional interaction mechanism of Triplet Attention further refines the representation of these multi-scale features. The ablation results support this synergy.
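The cross-dimensional interaction referred to above comes from rotating the feature tensor so that each of three branches pools away a different axis (H, W, or C), applies a single k × k convolution to the two-channel max/mean pooled map, and uses the sigmoid output to rescale the features; the three results are averaged [44]. The NumPy sketch below follows that structure for a single (C, H, W) feature map with untrained random kernels, no batch dimension, and the BatchNorm layers of the original module omitted. Function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def _conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Cross-correlate a (Cin, A, B) input with a (Cin, k, k) kernel -> (A, B)."""
    k = kernel.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))        # zero padding keeps size
    a, b = x.shape[1:]
    out = np.zeros((a, b))
    for i in range(k):
        for j in range(k):
            out += (kernel[:, i, j][:, None, None] * xp[:, i:i + a, j:j + b]).sum(axis=0)
    return out

def _gate(t: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """One branch: Z-pool over axis 0, k x k conv, sigmoid, rescale t."""
    zpool = np.stack([t.max(axis=0), t.mean(axis=0)])   # (2, A, B)
    weights = 1.0 / (1.0 + np.exp(-_conv2d_same(zpool, kernel)))
    return t * weights[None]                             # broadcast over axis 0

def triplet_attention(x: np.ndarray, kernels) -> np.ndarray:
    """x: (C, H, W) feature map; kernels: three (2, k, k) arrays."""
    # Branch 1: C-W interaction (H rotated to axis 0 and pooled away).
    y1 = _gate(x.transpose(1, 0, 2), kernels[0]).transpose(1, 0, 2)
    # Branch 2: H-C interaction (W rotated to axis 0 and pooled away).
    y2 = _gate(x.transpose(2, 1, 0), kernels[1]).transpose(2, 1, 0)
    # Branch 3: spatial H-W attention (channels pooled away).
    y3 = _gate(x, kernels[2])
    return (y1 + y2 + y3) / 3.0

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))                # C=8, H=W=16
kernels = [0.1 * rng.standard_normal((2, 7, 7)) for _ in range(3)]
y = triplet_attention(x, kernels)
n_weights = sum(k.size for k in kernels)            # 3 * 2 * 7 * 7 = 294
```

With all-zero kernels every sigmoid gate equals 0.5, so the block simply halves its input; any departure from that comes from the learned cross-dimensional weights.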

To verify the generalization capability of the proposed method, we further tested it on the Mengdong dataset, which differs markedly from the Wushan dataset in geographic environment and disaster triggers. UCTransNet-TPKI maintains excellent performance on the Mengdong dataset, demonstrating strong cross-domain adaptability. This generalization capability stems mainly from the generic design of the two components: the multi-scale feature extraction of the PKI module does not rely on a specific geological environment, and the cross-dimensional interaction mechanism of Triplet Attention can adapt to different combinations of spectral and spatial features. This provides a technical basis for extending the integrated approach to other regions and other types of geological hazard identification. Furthermore, the approach can be applied to other remote sensing recognition tasks with similar challenges, such as multi-scale targets and complex background interference in debris flow identification [54], or precise extraction of buildings at varying scales in urban change detection [55]. The multi-scale perception of the PKI module and the interference suppression of Triplet Attention provide a practical technical reference for such applications.

Although UCTransNet-TPKI demonstrates promising performance in landslide identification, several limitations remain that warrant further investigation. First, the current approach primarily relies on optical remote sensing imagery and does not fully incorporate multi-source environmental information such as topography, geological conditions, and meteorological factors [56]. Since landslide occurrence is inherently influenced by a combination of geomorphological, geological, and climatic conditions, reliance on a single data modality may limit the model’s discriminative capability in complex scenarios, particularly in regions with subtle surface expressions or severe spectral ambiguity. Second, the present framework focuses on static landslide identification and does not explicitly consider temporal evolution characteristics. Landslides are dynamic processes that often exhibit progressive deformation prior to failure, and neglecting temporal information may restrict the model’s ability to capture early-stage instability signals and transitional patterns [57].

To address these limitations, several directions merit future research. Multi-modal data fusion is an important avenue: optical imagery can be analyzed jointly with SAR data, digital elevation models (DEMs), and geological maps to represent landslide-prone environments more comprehensively. In addition, extending the current static segmentation framework to time-series modeling could enable continuous monitoring of landslide development and improve early warning. By incorporating temporal dynamics and multi-source environmental constraints, future models may achieve greater robustness and reliability across diverse geological settings, further enhancing the practical applicability of automated landslide identification and strengthening technical support for geological hazard prevention and mitigation.

6. Conclusions

To address the challenges of multi-scale landslide recognition in high-resolution remote sensing imagery, particularly the low accuracy for small-scale landslides, whose features are weak and easily confused with backgrounds such as bare land, this study introduces an improved UCTransNet model (UCTransNet-TPKI) that integrates a Pyramid Kernel Interaction (PKI) module and a Triplet Attention mechanism.

The following are the study’s primary conclusions:

(1): Through its module integration, the UCTransNet-TPKI model enhances landslide recognition. The PKI module captures the morphological features of landslides of different sizes via parallel multi-scale convolutions, addressing scale variation. Owing to its cross-dimensional interaction design, the Triplet Attention mechanism improves the detection of weak edges and substantially increases the model's ability to distinguish landslides from spectrally similar backgrounds such as bare land.
(2): On the Wushan County dataset, which is dominated by small-scale landslides, UCTransNet-TPKI outperformed the baseline UCTransNet model and the single-module variants across all key evaluation metrics. The ablation studies and visualization results show that the synergy between the PKI module and Triplet Attention is key to this improvement.
(3): The model also showed consistent performance advantages on the Mengdong dataset, whose geographical environment and disaster triggers differ significantly from those of Wushan. This indicates that the model is not limited to a specific region and has strong robustness and potential for broader application.
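The parallel multi-scale convolutions described in conclusion (1) can be illustrated with a small NumPy sketch modeled on the Poly Kernel Inception block cited for this module [41]: a small depthwise convolution extracts local features, parallel depthwise convolutions with progressively larger kernels gather context at several scales, and a 1 × 1 convolution fuses the summed result across channels. The kernel sizes (3 for the local convolution, 5 to 11 for the branches) are assumed from that reference; any attention, normalization, or activation layers in the UCTransNet-TPKI version are not described in this section and are omitted. Function names are hypothetical.

```python
import numpy as np

def depthwise_conv(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Per-channel 'same' cross-correlation. x: (C, H, W); kernels: (C, k, k)."""
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, w = x.shape[1:]
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def pki_block(x, local_kernel, branch_kernels, pointwise):
    """Assumed structure: local depthwise conv -> parallel larger depthwise
    convs -> sum with the local features -> 1x1 conv for channel mixing.

    x: (C, H, W); local_kernel: (C, k0, k0); branch_kernels: list of (C, k, k);
    pointwise: (C_out, C) weight matrix of the 1x1 convolution.
    """
    local = depthwise_conv(x, local_kernel)
    fused = local + sum(depthwise_conv(local, kb) for kb in branch_kernels)
    return np.einsum("oc,chw->ohw", pointwise, fused)

def delta_kernel(channels: int, k: int) -> np.ndarray:
    """Identity kernel: 1 at the centre, so depthwise_conv returns its input."""
    d = np.zeros((channels, k, k))
    d[:, k // 2, k // 2] = 1.0
    return d

C = 4
x = np.random.default_rng(0).standard_normal((C, 12, 12))
y = pki_block(x, delta_kernel(C, 3),
              [delta_kernel(C, k) for k in (5, 7, 9, 11)], np.eye(C))
# With identity kernels every branch returns x, so y equals 5 * x.
```

Using depthwise rather than dense convolutions keeps the weight count of the large kernels linear in the number of channels, which is how such blocks can widen the receptive field without a large parameter budget.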

Author Contributions

Conceptualization, M.W., W.D. and M.L.; Methodology, M.W., W.D., M.L. and X.L.; Validation, M.W., W.D. and M.L.; Formal analysis, M.W.; Investigation, M.W., W.D., M.L. and Z.L.; Data curation, M.W. and H.L.; Writing—original draft, M.W., W.D. and M.L.; Writing—review & editing, M.W., W.D., M.L. and Y.W.; Visualization, M.W., W.D., M.L., Y.W. and H.L.; Funding acquisition, W.D., M.L. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Deep Earth Probe and Mineral Resources Exploration—National Science and Technology Major Project of China (No. 2024ZD1001100), Key Laboratory of Airborne Geophysics and Remote Sensing Geology Foundation (No. 2023YFL18), and the Programs of China Geological Survey (No. DD20230800402).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the anonymous reviewers for their insightful comments, which improved the scientific quality of the paper. Historical landslide point records were provided by the Geographic Remote Sensing Ecological Network Platform (www.gisrs.cn) (accessed on 1 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The training dynamics of UCTransNet-TPKI on the Wushan dataset are shown in Figure A1. The loss curve on the left displays a rapid decline in the initial phase, followed by a plateau, with training loss and validation loss closely tracking each other. Meanwhile, the IoU trajectory on the right shows that the model achieved effective convergence around the 300th epoch and maintained stability throughout the remaining 500 epochs.

Figure A1. Training convergence behavior of UCTransNet-TPKI (Loss and IoU curves).

References

Shrestha, M.; Sharma, S.; Pradhan Shrestha, R. Landslides in the Himalayas: A Comprehensive Review of Hazards, Impacts, and Adaptive Strategies. Rural Reg. Dev. 2025, 3, 10002.
Alcántara-Ayala, I. Landslides in a Changing World. Landslides 2025, 22, 2851–2865.
Alimohammadlou, Y.; Najafi, A.; Yalcin, A. Landslide Process and Impacts: A Proposed Classification Method. Catena 2013, 104, 219–232.
Geertsema, M.; Highland, L.; Vaugeouis, L. Environmental Impact of Landslides. In Landslides—Disaster Risk Reduction; Sassa, K., Canuti, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 589–607. ISBN 978-3-540-69970-5.
Kumari, S.; Agarwal, S.; Agrawal, N.K.; Agarwal, A.; Garg, M.C. A Comprehensive Review of Remote Sensing Technologies for Improved Geological Disaster Management. Geol. J. 2024, 60, 223–235.
Ma, Z.; Mei, G.; Piccialli, F. Machine Learning for Landslides Prevention: A Survey. Neural Comput. Appl. 2021, 33, 10881–10907.
He, H.; Wang, W.; Wang, Z.; Li, S.; Chen, J. Enhancing Seismic Landslide Susceptibility Analysis for Sustainable Disaster Risk Management through Machine Learning. Sustainability 2024, 16, 3828.
Guzzetti, F.; Mondini, A.C.; Cardinali, M.; Fiorucci, F.; Santangelo, M.; Chang, K.-T. Landslide Inventory Maps: New Tools for an Old Problem. Earth-Sci. Rev. 2012, 112, 42–66.
Fan, X.; Yunus, A.P.; Scaringi, G.; Catani, F.; Siva Subramanian, S.; Xu, Q.; Huang, R. Rapidly Evolving Controls of Landslides After a Strong Earthquake and Implications for Hazard Assessments. Geophys. Res. Lett. 2021, 48, e2020GL090509.
Dell’Acqua, F.; Gamba, P. Remote Sensing and Earthquake Damage Assessment: Experiences, Limits, and Perspectives. Proc. IEEE 2012, 100, 2876–2890.
Prakash, N.; Manconi, A.; Loew, S. A New Strategy to Map Landslides with a Generalized Convolutional Neural Network. Sci. Rep. 2021, 11, 9722.
Liu, R.; Li, L.; Pirasteh, S.; Lai, Z.; Yang, X.; Shahabi, H. The Performance Quality of LR, SVM, and RF for Earthquake-Induced Landslides Susceptibility Mapping Incorporating Remote Sensing Imagery. Arab. J. Geosci. 2021, 14, 259.
Guo, X.; Fu, B.; Du, J.; Shi, P.; Li, J.; Li, Z.; Du, J.; Chen, Q.; Fu, H. Monitoring and Assessment for the Susceptibility of Landslide Changes After the 2017 Ms 7.0 Jiuzhaigou Earthquake Using the Remote Sensing Technology. Front. Earth Sci. 2021, 9, 633117.
Casagli, N.; Intrieri, E.; Tofani, V.; Gigli, G.; Raspini, F. Landslide Detection, Monitoring and Prediction with Remote-Sensing Techniques. Nat. Rev. Earth Environ. 2023, 4, 51–64.
Achariyaviriya, W.; Kondo, T.; Karnjana, J.; Nishio, T. Landslide Semantic Segmentation Using Satellite Imagery. In Proceedings of the 2022 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Prachuap Khiri Khan, Thailand, 24–27 May 2022; pp. 1–4.
Carle, E.; Sirguey, P.; Cox, S.C. Measuring Landslide-Driven Ground Displacements with High-Resolution Surface Models and Optical Flow. Comput. Geosci. 2023, 178, 105378.
Li, Z.; Guo, Y. Semantic Segmentation of Landslide Images in Nyingchi Region Based on PSPNet Network. In Proceedings of the 2020 7th International Conference on Information Science and Control Engineering (ICISCE), Changsha, China, 18–20 December 2020; pp. 1269–1273.
Chen, D.; Kang, J.; Wang, L.; Yu, Y.; Zhou, W.; Guan, H.; Karim, M. SACNet: A Novel Self-Supervised Learning Method for Shadow Detection from High-Resolution Remote Sensing Images. J. Geovisualization Spat. Anal. 2025, 9, 14.
Fang, Z.; Wang, Y.; Peng, L.; Hong, H. Integration of Convolutional Neural Network and Conventional Machine Learning Classifiers for Landslide Susceptibility Mapping. Comput. Geosci. 2020, 139, 104470.
Huang, F.; Cao, Z.; Guo, J.; Jiang, S.-H.; Li, S.; Guo, Z. Comparisons of Heuristic, General Statistical and Machine Learning Models for Landslide Susceptibility Prediction and Mapping. Catena 2020, 191, 104580.
Tehrani, F.S.; Calvello, M.; Liu, Z.; Zhang, L.; Lacasse, S. Machine Learning and Landslide Studies: Recent Advances and Applications. Nat. Hazards 2022, 114, 1197–1245.
Zhou, C.; Yin, K.; Cao, Y.; Ahmed, B.; Li, Y.; Catani, F.; Pourghasemi, H.R. Landslide Susceptibility Modeling Applying Machine Learning Methods: A Case Study from Longju in the Three Gorges Reservoir Area, China. Comput. Geosci. 2018, 112, 23–37.
Holloway, J.; Mengersen, K. Statistical Machine Learning Methods and Remote Sensing for Sustainable Development Goals: A Review. Remote Sens. 2018, 10, 1365.
Sameen, M.I.; Pradhan, B.; Lee, S. Application of Convolutional Neural Networks Featuring Bayesian Optimization for Landslide Susceptibility Assessment. Catena 2020, 186, 104249.
Guarnieri, A.; Masiero, A.; Vettore, A.; Pirotti, F. Evaluation of the Dynamic Processes of a Landslide with Laser Scanners and Bayesian Methods. Geomat. Nat. Hazards Risk 2015, 6, 614–634.
Nhu, V.-H.; Mohammadi, A.; Shahabi, H.; Ahmad, B.B.; Al-Ansari, N.; Shirzadi, A.; Geertsema, M.; Kress, V.R.; Karimzadeh, S.; Valizadeh Kamran, K.; et al. Landslide Detection and Susceptibility Modeling on Cameron Highlands (Malaysia): A Comparison between Random Forest, Logistic Regression and Logistic Model Tree Algorithms. Forests 2020, 11, 830.
Dou, J.; Yunus, A.P.; Bui, D.T.; Merghadi, A.; Sahana, M.; Zhu, Z.; Chen, C.-W.; Han, Z.; Pham, B.T. Improved Landslide Assessment Using Support Vector Machine with Bagging, Boosting, and Stacking Ensemble Machine Learning Framework in a Mountainous Watershed, Japan. Landslides 2020, 17, 641–658.
Chen, F.; Yu, B.; Li, B. A Practical Trial of Landslide Detection from Single-Temporal Landsat8 Images Using Contour-Based Proposals and Random Forest: A Case Study of National Nepal. Landslides 2018, 15, 453–464.
Jiang, P.; Ma, Z.; Mei, G. Review Article: Deep Learning for Potential Landslide Identification: Data, Models, Applications, Challenges, and Opportunities. Nat. Hazards Earth Syst. Sci. 2026, 26, 487–529.
Yu, B.; Li, J.; Huang, X. STSNet: A Cross-Spatial Resolution Multi-Modal Remote Sensing Deep Fusion Network for High Resolution Land-Cover segmentation. Inf. Fusion 2025, 114, 102689. [Google Scholar] [CrossRef]
Wang, J.; Sun, P.; Chen, L.; Yang, J.; Liu, Z.; Lian, H. Recent Advances of Deep Learning in Geological Hazard Forecasting. CMES—Comput. Model. Eng. Sci. 2023, 137, 1381–1418. [Google Scholar] [CrossRef]
Ma, Z.; Mei, G. Deep Learning for Geological Hazards Analysis: Data, Models, Applications, and Opportunities. Earth-Sci. Rev. 2021, 223, 103858. [Google Scholar] [CrossRef]
Xu, G.; Wang, Y.; Wang, L.; Soares, L.P.; Grohmann, C.H. Feature-Based Constraint Deep CNN Method for Mapping Rainfall-Induced Landslides in Remote Regions with Mountainous Terrain: An Application to Brazil. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2644–2659. [Google Scholar] [CrossRef]
Wei, R.; Ye, C.; Sui, T.; Zhang, H.; Ge, Y.; Li, Y. A Feature Enhancement Framework for Landslide Detection. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103521. [Google Scholar] [CrossRef]
Wang, Q.; Sun, L.; Chen, Y. The Influence and Improvement of a Deep Learning-Based Uncertainty Model Integrating Multi-Scale Information in Landslide Detection. In Proceedings of the IGARSS 2024—2024 IEEE International Geoscience and Remote Sensing Symposium, Athens, Greece, 7–12 July 2024; pp. 10413–10416. [Google Scholar]
Yu, B.; Zhu, M.; Chen, F.; Wang, N.; Zhao, H.; Wang, L. Multi-Scale Differential Network for Landslide Extraction from Remote Sensing Images with Different Scenarios. Int. J. Digit. Earth 2024, 17, 2441920. [Google Scholar] [CrossRef]
Liu, X.; Xu, L.; Zhang, J. Landslide Detection with Mask R-CNN Using Complex Background Enhancement Based on Multi-Scale Samples. Geomat. Nat. Hazards Risk 2024, 15, 2300823. [Google Scholar] [CrossRef]
Guo, S.; Li, B.; Wu, X.; Niu, R.; Wu, W. Landslide Detection Based on Differential Fusion of Multi-Level Features From Optical Remote Sensing Images and Topographical Data. Trans. GIS 2025, 29, e70046. [Google Scholar] [CrossRef]
Zhong, C.; Liu, Y.; Gao, P.; Chen, W.; Li, H.; Hou, Y.; Nuremanguli, T.; Ma, H. Landslide Mapping with Remote Sensing: Challenges and Opportunities. Int. J. Remote Sens. 2019, 41, 1555–1581. [Google Scholar] [CrossRef]
Martha, T.R.; Kerle, N.; Jetten, V.; van Westen, C.J.; Kumar, K.V. Characterising Spectral, Spatial and Morphometric Properties of Landslides for Semi-Automatic Detection Using Object-Oriented Methods. Geomorphology 2010, 116, 24–36. [Google Scholar] [CrossRef]
Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly Kernel Inception Network for Remote Sensing Detection. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 27706–27716. [Google Scholar]
Mei, X.; Pan, E.; Ma, Y.; Dai, X.; Huang, J.; Fan, F.; Du, Q.; Zheng, H.; Ma, J. Spectral-Spatial Attention Networks for Hyperspectral Image Classification. Remote Sens. 2019, 11, 963. [Google Scholar] [CrossRef]
Wang, H.; Cao, P.; Wang, J.; Zaiane, O.R. UCTransNet: Rethinking the Skip Connections in U-Net from a Channel-Wise Perspective with Transformer. In Proceedings of the AAAI Conference on Artificial Intelligence; PKP PS: Burnaby, BC, Canada, 2022; Volume 36, pp. 2441–2449. [Google Scholar] [CrossRef]
Misra, D.; Nalamada, T.; Arasanipalai, A.U.; Hou, Q. Rotate to Attend: Convolutional Triplet Attention Module. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 3139–3148. [Google Scholar]
Wen, H.; Huang, J.; Qian, L.; Li, Z.; Zhang, Y.; Zhang, J. The Spatial-Temporal Evolution Patterns of Landslide-Oriented Resilience in Mountainous City: A Case Study of Chongqing, China. J. Environ. Manag. 2024, 370, 122963. [Google Scholar] [CrossRef]
Guo, Y.; Song, W. Spatial Distribution and Simulation of Cropland Abandonment in Wushan County, Chongqing, China. Sustainability 2019, 11, 1367. [Google Scholar] [CrossRef]
Liao, M.; Wen, H.; Yang, L. Identifying the Essential Conditioning Factors of Landslide Susceptibility Models under Different Grid Resolutions Using Hybrid Machine Learning: A Case of Wushan and Wuxi Counties, China. Catena 2022, 217, 106428. [Google Scholar] [CrossRef]
Xu, Y.; Ouyang, C.; Xu, Q.; Wang, D.; Zhao, B.; Luo, Y. CAS Landslide Dataset: A Large-Scale and Multisensor Dataset for Deep Learning-Based Landslide Detection. Sci. Data 2024, 11, 12.
Li, J.; Li, Q.; Lu, J.; Zheng, K.; Wei, L.; Xiang, Q. A Transfer Learning Remote Sensing Landslide Image Segmentation Method Based on Nonlinear Modeling and Large Kernel Attention. Appl. Sci. 2025, 15, 3855.
Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 618–626.
Xu, Q.; Ouyang, C.; Jiang, T.; Yuan, X.; Fan, X.; Cheng, D. MFFENet and ADANet: A Robust Deep Transfer Learning Method and Its Application in High Precision and Fast Cross-Scene Recognition of Earthquake-Induced Landslides. Landslides 2022, 19, 1617–1647.
Hou, C.; Yu, J.; Ge, D.; Yang, L.; Xi, L.; Pang, Y.; Wen, Y. A Transfer Learning Approach for Landslide Semantic Segmentation Based on Visual Foundation Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 11561–11572.
Kienzle, D.; Kantonis, M.; Schön, R.; Lienhart, R. Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation. In Proceedings of the 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 7–9 August 2024; pp. 75–81.
Liu, Q.; Wang, T.; Zheng, Z.; Wang, B. A Method for Identifying Gully-Type Debris Flows Based on Adaptive Multi-Scale Feature Extraction. Geomat. Nat. Hazards Risk 2025, 16, 2502593.
Liu, Z.; Cui, S.; Yan, Q. Building Extraction from High Resolution Satellite Imagery Based on Multi-Scale Image Segmentation and Model Matching. In Proceedings of the 2008 International Workshop on Earth Observation and Remote Sensing Applications, Beijing, China, 30 June–2 July 2008; pp. 1–7.
Mantovani, J.R.; Bueno, G.T.; Alcântara, E.; Park, E.; Cunha, A.P.; Londe, L.; Massi, K.; Marengo, J.A. Novel Landslide Susceptibility Mapping Based on Multi-Criteria Decision-Making in Ouro Preto, Brazil. J. Geovisualization Spat. Anal. 2023, 7, 7.
Cai, J.; Liu, G.; Jia, H.; Zhang, B.; Wu, R.; Fu, Y.; Xiang, W.; Mao, W.; Wang, X.; Zhang, R. A New Algorithm for Landslide Dynamic Monitoring with High Temporal Resolution by Kalman Filter Integration of Multiplatform Time-Series InSAR Processing. Int. J. Appl. Earth Obs. Geoinf. 2022, 110, 102812.

Figure 1. Study area location map. (a) The location of Chongqing in China; (b) The geographical location of Wushan County in Chongqing; (c) The GF-2 satellite coverage map of Wushan County and the locations of historical landslide points.

Figure 2. Examples of landslides and bare land. (a,c) GF-2 satellite imagery; (b,d) corresponding landslide and bare-land labels.

Figure 3. Technical workflow for multi-scale landslide detection using UCTransNet-TPKI.

Figure 4. Overall architecture of the UCTransNet-TPKI model.

Figure 5. Principle of the Pyramid Kernel Interaction (PKI) module.

Figure 6. Principle of the Triplet Attention module.

Figure 7. Ablation experiment results on the Wushan dataset.

Figure 8. Ablation experiment results on the Mengdong dataset.

Figure 9. Comparison of Grad-CAM activation maps showing changes in model focus after incorporating Triplet Attention. The white areas represent the landslide boundaries.

Figure 10. Comparison experiment results on the Wushan dataset.

Figure 11. Comparison experiment results on the Mengdong dataset.

Figure 12. Landslide inventory map of Wushan County produced by the UCTransNet-TPKI model.
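Figure 9 uses Grad-CAM (Selvaraju et al., in the reference list) to show where the network's evidence for the landslide class is concentrated. The pure-Python sketch below shows the standard Grad-CAM computation for one target score: each feature map A^k is weighted by the spatial mean of its gradient, the weighted maps are summed, and a ReLU keeps only positive evidence. Which layer the authors hooked and how they normalized the maps is not stated here; the final rescaling to [0, 1] and the function name `grad_cam` are assumptions, and in a real model the gradients would come from the framework's autograd rather than being passed in by hand.

```python
from typing import List

FeatureMap = List[List[float]]  # one H x W map


def grad_cam(activations: List[FeatureMap], gradients: List[FeatureMap]) -> FeatureMap:
    """Grad-CAM heat map for one target score from K feature maps of size H x W.

    activations[k][i][j] is A^k_ij; gradients[k][i][j] is d(score)/dA^k_ij.
    """
    if not activations or len(activations) != len(gradients):
        raise ValueError("need one gradient map per activation map")
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        # Channel weight alpha_k: global average of the gradient over the H x W grid
        alpha_k = sum(sum(row) for row in g_k) / (h * w)
        # Accumulate the alpha-weighted feature map
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha_k * a_k[i][j]
    # ReLU: keep only locations that increase the target score
    cam = [[max(v, 0.0) for v in row] for row in cam]
    # Assumed display convention: rescale to [0, 1] when the map is not all zero
    peak = max(max(row) for row in cam)
    if peak > 0.0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

Upsampling the map to the input tile size and overlaying it on the GF-2 image, as in Figure 9, would be a separate display step.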

Table 1. Information on the datasets used in this study.

Dataset Name	Number of Samples	Sensor	Resolution
Wushan Dataset	1000	GF-2	0.8 m
Mengdong Dataset	1155	SuperView-1	0.5 m

Table 2. Hyperparameters of the UCTransNet-TPKI model.

Hyperparameter	Value
Batch size	6
Optimizer	Adam
Adam β₁, β₂	0.9, 0.999
Initial learning rate	1 × 10⁻⁴
Epochs	500
Learning rate schedule	ReduceLROnPlateau
Loss function	Weighted Dice-BCE loss
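Table 2 names a weighted Dice-BCE loss but does not give the weights. The pure-Python sketch below shows one common form of such a loss, a weighted sum of pixel-averaged binary cross-entropy and soft Dice loss on the landslide class. The 0.5/0.5 weights, the Dice smoothing constant, and the function name `weighted_dice_bce_loss` are assumptions for illustration, not values reported in the paper.

```python
import math
from typing import Sequence


def weighted_dice_bce_loss(
    probs: Sequence[float],
    targets: Sequence[float],
    bce_weight: float = 0.5,   # hypothetical weight, not reported in the paper
    dice_weight: float = 0.5,  # hypothetical weight, not reported in the paper
    smooth: float = 1.0,       # hypothetical Dice smoothing constant
    eps: float = 1e-7,         # clamp so that log() stays finite
) -> float:
    """Weighted BCE + soft Dice loss over flattened pixels (1 = landslide, 0 = background)."""
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equally long")
    # Binary cross-entropy, averaged over all pixels
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    bce /= len(probs)
    # Soft Dice coefficient on the positive (landslide) class
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return bce_weight * bce + dice_weight * (1.0 - dice)
```

In an actual training loop the same expression would be written with tensor operations so it can be back-propagated. A Dice term is often added because landslide pixels are usually a small fraction of each tile, which a purely per-pixel BCE loss tends to under-weight.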

Table 3. Results of the ablation study model (Wushan dataset).

Dataset	Model	Precision	F1	IoU	Acc
Wushan Dataset	UCTransNet	0.8850	0.8888	0.8151	0.9790
	UCTransNet-Triplet	0.8925	0.8919	0.8180	0.9790
	UCTransNet-PKI	0.8930	0.8925	0.8185	0.9785
	UCTransNet-TPKI	0.8936	0.9008	0.8252	0.9806
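Tables 3 to 6 report precision, F1, IoU and overall accuracy. As a reminder of the standard binary-segmentation forms, the sketch below computes them from pooled pixel counts (TP, FP, FN, TN) with landslide as the positive class. Whether the paper pools counts over the whole test set or averages per image is not stated in this section, and the function name `segmentation_metrics` is hypothetical.

```python
from typing import Dict


def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Pixel-level binary metrics computed from pooled confusion counts."""

    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)        # correct share of predicted landslide pixels
    recall = safe_div(tp, tp + fn)           # share of true landslide pixels that are found
    f1 = safe_div(2 * tp, 2 * tp + fp + fn)  # harmonic mean of precision and recall
    iou = safe_div(tp, tp + fp + fn)         # intersection over union of the two masks
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1,
            "iou": iou, "accuracy": accuracy}
```

With pooled counts, IoU = F1 / (2 − F1) holds exactly, so IoU can never exceed F1 when both come from the same counts.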

Table 4. Results of the ablation study model (Mengdong dataset).

Dataset	Model	Precision	F1	IoU	Acc
Mengdong Dataset	UCTransNet	0.8825	0.9103	0.8365	0.9631
	UCTransNet-Triplet	0.8912	0.9150	0.8437	0.9658
	UCTransNet-PKI	0.8950	0.9180	0.8455	0.9765
	UCTransNet-TPKI	0.9015	0.9230	0.8560	0.9780

Table 5. Performance comparison of different models (Wushan dataset).

Dataset	Model	Precision	F1	IoU	Acc
Wushan Dataset	MFFENet	0.8992	0.8758	0.7790	0.9724
	TransLandSeg	0.8985	0.8778	0.7823	0.9725
	Segformer++	0.9056	0.8915	0.8042	0.9387
	UCTransNet-TPKI	0.8936	0.9008	0.8252	0.9806
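Table 5 lists precision and F1 but not recall. If both were computed from the same pooled pixel counts, recall can be back-solved from F1 = 2PR / (P + R), which rearranges to R = P · F1 / (2P − F1). The sketch below applies that identity to the Table 5 rows; it is a consistency check under that assumption, not a result reported by the authors, and `implied_recall` is a hypothetical helper name.

```python
def implied_recall(precision: float, f1: float) -> float:
    """Back-solve recall from F1 = 2PR / (P + R); assumes P and F1 share pooled counts."""
    denominator = 2.0 * precision - f1
    if denominator <= 0.0:
        raise ValueError("F1 must be smaller than 2 * precision")
    return precision * f1 / denominator


# (precision, F1) copied from Table 5 (Wushan dataset)
TABLE_5 = {
    "MFFENet": (0.8992, 0.8758),
    "TransLandSeg": (0.8985, 0.8778),
    "Segformer++": (0.9056, 0.8915),
    "UCTransNet-TPKI": (0.8936, 0.9008),
}

if __name__ == "__main__":
    for name, (p, f1) in TABLE_5.items():
        print(f"{name:16s} implied recall = {implied_recall(p, f1):.4f}")
```

Under that assumption, UCTransNet-TPKI comes out at roughly 0.91 recall against roughly 0.88 for Segformer++, which would mean its F1 lead on Wushan comes from recall rather than precision. If the metrics were averaged per image, the identity no longer holds exactly.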

Table 6. Performance comparison of different models (Mengdong dataset).

Dataset	Model	Precision	F1	IoU	Acc
Mengdong Dataset	MFFENet	0.8945	0.9149	0.8434	0.9700
	TransLandSeg	0.9104	0.8943	0.8088	0.9762
	Segformer++	0.9137	0.9019	0.8214	0.9468
	UCTransNet-TPKI	0.9015	0.9230	0.8560	0.9780

Table 7. Model efficiency comparison of UCTransNet variants on the Wushan dataset.

Model	Parameters	FLOPs	Inference Time
UCTransNet	66.24 M	43.058 G	23.68 ms
UCTransNet-Triplet	16.83 M	40.82 G	16.05 ms
UCTransNet-TPKI	19.56 M	50.90 G	16.79 ms
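The trade-offs in Table 7 are easier to compare as relative changes against the UCTransNet baseline. The short sketch below computes them from the tabulated values; `relative_change` is a hypothetical helper, and the inference times depend on hardware and batch size, which Table 7 does not specify.

```python
def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline` (negative means a reduction)."""
    return (new - baseline) / baseline * 100.0


# Values copied from Table 7 (Wushan dataset)
BASELINE = {"params_M": 66.24, "flops_G": 43.058, "latency_ms": 23.68}  # UCTransNet
TPKI = {"params_M": 19.56, "flops_G": 50.90, "latency_ms": 16.79}       # UCTransNet-TPKI

changes = {key: relative_change(TPKI[key], BASELINE[key]) for key in BASELINE}

if __name__ == "__main__":
    for key, pct in changes.items():
        print(f"{key:10s} {pct:+.1f}%")
```

On these numbers, UCTransNet-TPKI uses about 70% fewer parameters and runs about 29% faster than the baseline, at the cost of about 18% more FLOPs.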

Table 8. Performance comparison across 5 random seeds on the Wushan dataset test set.

Model	F1-Score (Mean ± Std × 10⁻²)	IoU (Mean ± Std × 10⁻²)
UCTransNet	0.8886 ± 0.11	0.8150 ± 0.09
UCTransNet-TPKI	0.9007 ± 0.21	0.8251 ± 0.23

Standard deviations are given in units of 10⁻² (e.g., 0.11 denotes 0.0011).
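Table 8 reports mean ± standard deviation over five seeds but no test statistic. A minimal sketch of Welch's unequal-variance t-test computed from those summary statistics follows. It reads the table note as meaning that the printed standard deviations are in units of 10⁻², and it assumes they are sample standard deviations over n = 5 runs. Metric names are assigned by matching the means to Table 3: 0.8886/0.9007 agree with the F1 scores there and 0.8150/0.8251 with the IoU scores. SciPy's `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same computation and also returns a p-value.

```python
import math
from typing import Dict, Tuple


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> Tuple[float, float]:
    """Welch's t statistic (b minus a) and Welch-Satterthwaite degrees of freedom."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_b - mean_a) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


SCALE = 1e-2  # assumed: printed standard deviations are in units of 10^-2
N_SEEDS = 5

# ((mean, sd) for UCTransNet, (mean, sd) for UCTransNet-TPKI), from Table 8
ROWS = {
    "F1": ((0.8886, 0.11), (0.9007, 0.21)),
    "IoU": ((0.8150, 0.09), (0.8251, 0.23)),
}

results: Dict[str, Tuple[float, float]] = {}
for metric, ((m_a, s_a), (m_b, s_b)) in ROWS.items():
    results[metric] = welch_t(m_a, s_a * SCALE, N_SEEDS, m_b, s_b * SCALE, N_SEEDS)

if __name__ == "__main__":
    for metric, (t, df) in results.items():
        print(f"{metric}: t = {t:.2f}, df = {df:.1f}")
```

Both statistics (about 11.4 for F1 and 9.1 for IoU) are far above the two-sided 5% critical values of roughly 2.45 to 2.57 for 5 to 6 degrees of freedom. That conclusion depends on the assumed scaling of the standard deviations.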

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


MDPI and ACS Style

Wang, M.; Ding, W.; Liu, M.; Liu, Z.; Liu, X.; Wen, Y.; Li, H. Improved UCTransNet by Integrating Pyramid Kernel Interaction with Triplet Attention for Identifying Multi-Scale Landslides from GF-2 Imagery. Remote Sens. 2026, 18, 492. https://doi.org/10.3390/rs18030492



