Article

Intelligent Landslide Susceptibility Assessment Framework Using the Swin Transformer Technique: A Case Study of Changbai County, Jilin Province, China

College of Earth Sciences, Jilin University, Changchun 130061, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 301; https://doi.org/10.3390/app16010301
Submission received: 20 November 2025 / Revised: 23 December 2025 / Accepted: 26 December 2025 / Published: 27 December 2025
(This article belongs to the Section Earth Sciences)

Abstract

Frequent geological hazards such as landslides and rockfalls, intensified by human activities and extreme rainfall, highlight the urgent need for rapid, accurate, and interpretable susceptibility assessment. However, existing methods often struggle with insufficient characterization of spatial heterogeneity, fragmented spatial structures, and limited mechanistic interpretability. To overcome these challenges, this study proposes an intelligent landslide susceptibility assessment framework based on the Swin-UNet architecture, which combines the window-based self-attention mechanism of the Swin Transformer with the encoder–decoder structure of U-Net. Eleven conditioning factors derived from remote sensing data were used to characterize the influencing conditions. Comprehensive experiments conducted in Changbai County, Jilin Province, China, demonstrate that the proposed Swin-UNet framework outperforms traditional models, including the information value method and the standard U-Net. It achieves a maximum overall accuracy of 99.87% and consistently yields higher AUROC, AUPRC, F1-score, and IoU metrics. The generated susceptibility maps exhibit enhanced spatial continuity, improved geomorphological coherence, and greater interpretability of contributing factors. These results confirm the robustness and generalizability of the proposed framework and highlight its potential as a powerful and interpretable tool for large-scale geological hazard assessment, providing a solid technical foundation for refined disaster prevention and mitigation strategies.

1. Introduction

Geological hazards are dynamic geological phenomena induced by both natural processes and human activities, and they are characterized by sudden onset and destructive effects [1]. They manifest in various forms, typically including landslides, rockfalls, ground subsidence, regional settlement, and debris flows [2]. These hazards not only pose direct threats to human life and property but also exert profound impacts on ecosystems, infrastructure, and the sustainable development of social and economic systems [3]. In recent years, under the combined pressures of accelerated climate warming, intensive large-scale engineering construction, high-intensity mineral resource exploitation, and excessive groundwater extraction, the spatial and temporal extent of geological hazards has expanded significantly and continues to expand [4]. Against this backdrop, accurately predicting the risk of geological hazard occurrence in different regions through scientific approaches and constructing refined susceptibility zoning maps have become key components of regional disaster early-warning systems and critical foundations for disaster prevention and mitigation decision-making [5].
Currently, geological hazard susceptibility assessment models can be broadly categorized into four main types. The first category comprises qualitative evaluation models based on expert knowledge systems, such as the Analytic Hierarchy Process (AHP) [6] and fuzzy comprehensive evaluation [7]. Although these approaches can integrate multiple influencing factors, they are often limited by high subjectivity and poor reproducibility. These limitations mainly arise from the reliance on expert judgment for factor weighting and rule definition [8], which may vary significantly among different analysts and study areas, thereby reducing the robustness and transferability of the results [9]. The second category involves process-based physical and mechanical simulation models, which are suitable for small-scale, fine-grained analyses. However, their application to large-scale regional assessments is constrained by the high cost of acquiring high-precision data and the substantial computational demands involved [10]. The third category consists of statistical quantitative models, such as the information value method and frequency ratio method [11], which quantify the spatial correlations between conditioning factors and hazard occurrences to reveal potential patterns [12]. These models are generally straightforward to implement and computationally efficient [13], making them suitable for regional-scale applications where rapid assessment is required [14]. The fourth category consists of data-driven models represented by machine learning techniques, including random forests [15], support vector machines [16], and gradient boosting algorithms [17], that have become a research hotspot in recent years. With their strong capability for modeling nonlinear relationships, these methods have significantly enhanced the level of intelligence in geological hazard susceptibility assessment and enabled rapid evaluations [18]. 
By learning complex interactions among multiple conditioning factors directly from data [19], machine learning approaches reduce the need for explicit assumptions regarding functional relationships [20].
However, when applied to high-resolution grids and complex geomorphological patterns, these approaches still face three common challenges. First, spatial heterogeneity is often insufficiently characterized, i.e., the same factor may exhibit “context-dependent” effects across different geomorphic units or neighborhood scales, making global parameters or fixed-bandwidth kernel functions difficult to adapt [21]. Second, the representation of spatial structural continuity and boundary geometry remains inadequate: pixel-based independent modeling can lead to “salt-and-pepper noise” and fragmented patches, thereby reducing the practical utility of the results [22]. Such fragmented susceptibility patterns are often inconsistent with the actual spatial organization of landslides [23] and limit their usefulness for hazard management and land use planning [24]. Third, interpretability and credibility remain limited: as models incorporate increasing nonlinearity and interactions, it becomes difficult to trace and quantify factor contributions and associated uncertainties. These limitations collectively highlight an urgent need to enhance spatial pattern-learning capabilities while simultaneously ensuring mechanistic interpretability and probabilistic calibration [25].
In recent years, deep learning–based semantic segmentation frameworks have been widely applied in fields such as computer vision, medical imaging, and transportation. With the substantial improvement in the spatial resolution of remote sensing imagery and digital elevation models (DEMs), new technical opportunities have emerged to address the aforementioned challenges [26]. These advances enable the extraction of detailed spatial patterns [27] and morphological features that were previously difficult to capture using traditional approaches [28]. Convolutional neural networks (CNNs), with their local receptive fields and weight-sharing mechanisms, are adept at learning morphological cues from local neighborhoods—such as ridge–gully structures, flow convergence channels, and patch textures [29]. Such characteristics make CNNs particularly suitable for modeling localized geomorphic signatures associated with landslide initiation. Among them, the representative U-Net model, featuring an “encoder–decoder with skip connections” architecture, has achieved remarkable performance in pixel-level segmentation tasks and has become a fundamental baseline for natural hazard mapping and geomorphological interpretation [30]. Its skip-connection design effectively preserves high-resolution spatial information while incorporating deep semantic features. The Transformer architecture, by introducing self-attention mechanisms, enables the modeling of long-range dependencies and contextual interactions over larger spatial extents [31]. Building on this, the Swin-Transformer employs a window-based, shifted-window strategy, which maintains the capacity to capture long-range relationships while controlling computational complexity [32]. This hierarchical attention mechanism allows global contextual information to be gradually aggregated across multiple spatial scales. 
When integrated with a U-Net decoder, the Swin Transformer offers the combined advantages of global contextual perception and fine-grained spatial reconstruction [33].
To address the challenges of insufficient characterization of spatial heterogeneity, inadequate spatial structural continuity, and weak mechanistic interpretability in geological hazard susceptibility assessment, this study proposes an intelligent landslide susceptibility evaluation method based on the Swin-UNet model. Taking Changbai County, Jilin Province, China, as the study area and landslides as the representative hazard type, the proposed approach utilizes the existing remote sensing data to extract eleven conditioning factors: elevation, slope, aspect, plan curvature, profile curvature, the topographic wetness index (TWI), the normalized difference vegetation index (NDVI), land use, and distances to roads, rivers, and faults. The Swin-UNet model is constructed by introducing a window-based self-attention mechanism to enhance the recognition of complex geomorphic zonation and long-range spatial dependencies, enabling the effective extraction of local morphological features around known landslide occurrences. The model’s performance is comprehensively evaluated using metrics such as area under the receiver operating characteristic curve (AUROC), area under the precision–recall curve (AUPRC), F1-score and Intersection over Union (IoU), confirming its effectiveness. Based on this method, a 30 m-resolution landslide susceptibility zonation map for the study area is generated.
This paper is organized as follows. Section 1 reviews the current state of research, and Section 2 describes the geological setting of the study area. Section 3 presents the data used in the proposed approach and details the Swin-UNet method. Section 4 reports the experimental results, and Section 5 concludes the study.

2. Geological Settings

Changbai Korean Autonomous County is located in the southeastern part of Jilin Province, China, along the border with North Korea. The eastern part of the study area faces North Korea across the Yalu River, while the western and northern parts border Linjiang City and Fusong County of Baishan City, respectively (Figure 1). The region experiences a temperate continental humid monsoon climate with distinct seasonal variations, characterized by long, cold winters and short, rainy summers. The mean annual temperature is approximately 2.1 °C, with extreme highs reaching 34.8 °C and extreme lows dropping to −36.4 °C. The average frost-free period is about 110 days, the seasonal frozen-soil depth is around 1.5 m, and the mean annual evaporation is 1101.5 mm.
Changbai County is located on the southern flank of the Changbai Mountains, where the overall terrain exhibits a northeast–southwest gradient, descending from higher elevations in the northeast to lower elevations in the southwest. The average elevation is approximately 1570 m. Most of the county is characterized by a lava plateau landform, with the terrain gradually declining from the highlands around Mount Changbai and Sidingfang toward the southwest and southern regions. The area features isolated middle mountains and residual hills, and valleys are well developed, typically exhibiting a characteristic V-shaped cross-section. The eastern and southern parts of the study area consist of small plains formed by alluvial deposits from the Yalu River and its tributaries, with an average elevation of about 589 m. Slope gradients vary significantly across the region: steep slopes with average gradients exceeding 30° are found along the Yalu River and its tributary valleys, whereas the lava plateau areas are relatively gentle, with average slopes generally below 25°. Overall, Changbai County is situated within the Changbai Mountain region, where fluvial incision is intense and topographic relief is pronounced. Based on geomorphological genesis, the region can be divided into volcanic landforms, tectonic-denudational landforms, erosional–depositional landforms, and accumulational landforms.
The geographic and climatic characteristics of the study area were summarized based on the Geological Hazard Risk Survey Report of Changbai Korean Autonomous County.

3. Materials and Methods

Artificial intelligence (AI) applications are typically built using large-scale training datasets, enabling computers to learn labeled feature patterns and perform various intelligent tasks. Accordingly, this study collected the existing DEM, remote sensing, and landslide inventory data for Changbai County, Jilin Province. A total of 11 conditioning factors were extracted: elevation, slope, aspect, plan curvature, profile curvature, the TWI, the NDVI, land use type, distance to roads, distance to rivers, and distance to faults. They were analyzed from four perspectives: topographic characteristics, environmental conditions, geological conditions, and human activities. All factors were rasterized and standardized to a spatial resolution of 30 m × 30 m, forming the dataset used to construct an AI-based intelligent landslide susceptibility evaluation model.
The digital elevation model (DEM) was derived from the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM, 30 m resolution; source: NASA, Washington, DC, USA). The NDVI was generated from Copernicus Sentinel-2 Level-2A surface reflectance products (source: European Space Agency (ESA), Paris, France). Land use data were obtained from the GlobeLand30 global land cover dataset at a 30 m spatial resolution (source: National Geomatics Center of China, Beijing, China). Vector layers representing roads, river networks, and faults were obtained from the Geological Hazard Risk Survey Results Report of Changbai Korean Autonomous County (source: Jilin Provincial Coalfield Geological Exploration and Design Institute, Changchun, Jilin, China) and were used to derive distance-based conditioning factors.

3.1. Materials

An official landslide inventory was adopted in this study to support model training, validation, and regional landslide susceptibility analysis in Changbai Korean Autonomous County, Jilin Province. The inventory was obtained from the Geological Hazard Risk Survey Results Report of Changbai Korean Autonomous County and represents the most complete and authoritative landslide dataset currently available for the study area.
The adopted inventory was produced through a county-scale landslide risk survey and integrates multiple data sources, including historical landslide records, high-resolution satellite image interpretation, and systematic field investigations. Field surveys were conducted in strict accordance with national and provincial technical specifications for landslide investigations, with particular emphasis on areas posing potential threats to settlements, transportation infrastructure, and important public facilities. All suspected landslide sites were verified in the field, and stabilized or engineering-treated slopes, as well as sites lacking clear geomorphological evidence of slope instability, were excluded through a standardized verification procedure.
To ensure the spatial accuracy and currency of the inventory, high-resolution GF-6 (Gaofen-6) optical satellite imagery was used during the risk survey for remote sensing interpretation. Two GF-6 scenes acquired on 5 April 2021 were employed. The 8 m multispectral imagery was fused with the 2 m panchromatic band and resampled to 2 m and 1 m spatial resolutions, which met the requirements for landslide interpretation at mapping scales of 1:50,000 and 1:10,000. Image processing included geometric correction, image fusion, DEM-based orthorectification, false-color composition, and digital image enhancement. The interpreted results were subsequently validated through field inspections.
During the interpretation process, boundary delineation uncertainty was controlled within two pixels, and only landslide features exceeding the minimum interpretable unit defined by the survey specifications were retained. Multiple levels of quality control, including self-checking, cross-checking, supervisory sampling, and final inspection, were implemented during the survey to ensure data reliability.
As a result, the official inventory contains a total of 388 landslide locations, which were directly adopted as positive samples in this study. These locations include both historically documented landslides and newly identified or updated sites confirmed during the 2021 remote sensing interpretation and field survey campaign, among which 103 locations were directly interpreted from GF-6 imagery and verified in the field. All landslide locations were converted into point representations corresponding to source areas and spatially aligned with the conditioning factor layers. This authoritative, multi-source inventory provides a reliable reference dataset for landslide susceptibility modeling.

3.1.1. Topographic Characteristics

Based on the DEM data, topographic characteristics were derived through sink-filling and subsequent terrain analysis in ArcGIS 10.8. Six factors, including elevation, slope, aspect, plan curvature, profile curvature, and the TWI, were extracted to characterize the topography of the study area (Figure 2). Elevation (Figure 2a) reflects geomorphological zonation and overall relief patterns, showing a general trend of “low in the west and high in the east”, with the highest elevations occurring in the northeast and the lowest in the west. Slope (Figure 2b), which directly determines the gravitational component and potential shear stress, exhibits a heterogeneous distribution: steep slopes dominate the central part of the region, become gentler on both sides, and increase again along the southern margin. Aspect (Figure 2c) influences solar radiation and soil moisture content. Plan curvature (Figure 2d) represents the convergence and divergence of surface flow in the horizontal direction, whereas profile curvature (Figure 2e) characterizes variations in steepness along the vertical profile. The TWI (Figure 2f) approximates areas with potentially high water accumulation and elevated pore-water pressure controlled by topographic conditions.
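The TWI is commonly defined as ln(a / tan β), where a is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope angle. The paper derives the factor in ArcGIS and does not state its exact formula or settings, so the following is only a minimal sketch of the usual definition. The function name `twi` and the small slope floor used to avoid division by zero on flat cells are assumptions for illustration.

```python
import math


def twi(specific_catchment_area_m, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(a / tan(beta)) for a single grid cell.

    specific_catchment_area_m: upslope area per unit contour width (m).
    slope_deg: local slope in degrees.
    min_slope_deg: assumed floor so that flat cells do not divide by zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area_m / math.tan(beta))


# A gentle cell with a large contributing area is "wetter" than a steep one.
gentle = twi(specific_catchment_area_m=3000.0, slope_deg=5.0)
steep = twi(specific_catchment_area_m=3000.0, slope_deg=35.0)
print(gentle > steep)
```

Higher values flag cells where water is expected to accumulate, which matches the interpretation of the TWI given above.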

3.1.2. Environmental Factors

Remote sensing imagery of the study area provides an intuitive representation of vegetation coverage and hydrographic distribution. The NDVI reflects the extent and vigor of surface vegetation, with higher coverage generally enhancing soil shear strength and reducing pore-water pressure. The distance to rivers characterizes the hydrodynamic and geomorphic effects associated with valleys and river channels; areas closer to river networks experience more concentrated shear stress, thereby increasing the likelihood of landslide initiation.
During data processing, remote sensing images with the lowest cloud cover were first selected in ArcGIS. Cloud and shadow masking and image mosaicking were performed, followed by reprojection and resampling to a uniform spatial resolution of 30 m. Two environmental conditioning factors, the NDVI and the distance to rivers, were extracted for the study area (Figure 3). The NDVI (Figure 3a), calculated from optical remote sensing imagery in ArcGIS, shows that vegetation coverage in the central part of the region is significantly higher than that on the eastern and western sides. The distance-to-river raster (Figure 3b), expressed in meters, was generated by applying the Euclidean distance tool to the existing river vector data. The results indicate that river networks are generally dispersed across the study area, with a higher concentration of watercourses in the central region.
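For reference, the NDVI is the normalized difference between near-infrared and red reflectance, (NIR − Red) / (NIR + Red). The sketch below assumes two co-registered reflectance arrays; for Sentinel-2 these are usually bands B8 and B4, although the paper does not state which bands or tool settings were used. The function name `ndvi` is hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    """Per-pixel NDVI; pixels where NIR + Red == 0 are returned as NaN."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero (e.g. skip nodata pixels).
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


# Toy 2 x 2 reflectance tiles: dense vegetation, sparse cover, bare, nodata.
nir_band = np.array([[0.50, 0.30], [0.20, 0.00]])
red_band = np.array([[0.10, 0.20], [0.20, 0.00]])
print(ndvi(nir_band, red_band))
```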

3.1.3. Geological Factors

Fault zones and their surrounding influence areas are typically characterized by fractured rock masses, well-developed joints and fissures, and enhanced weathering. Areas closer to faults are more likely to meet the conditions for disaster occurrence, resulting in higher geological susceptibility. In this study, the distance to faults was used to represent the controlling influence of geological structures on landslide occurrence. Based on the existing fault line data, a distance raster was generated in ArcGIS by calculating the Euclidean distance between the center points of grid cells and the nearest fault line, producing a spatial distribution map of geological influence intensity (Figure 4). As shown in Figure 4, the northeastern part of the study area is significantly farther from fault zones, indicating that most of the faults are concentrated in the southwestern region.
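The distance raster described above can be sketched without GIS software as the minimum point-to-segment distance from each cell center to the fault polylines. This is an illustrative stand-in for the ArcGIS tool, not the authors' workflow. The function names, the local coordinate system (origin at the raster corner, in meters), and the toy fault segment are assumptions.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the line segment A-B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_raster(n_rows, n_cols, cell_size, segments):
    """Distance from every cell center to the nearest segment (same units)."""
    raster = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            x = (c + 0.5) * cell_size  # cell-center coordinates
            y = (r + 0.5) * cell_size
            row.append(min(point_segment_distance(x, y, *s) for s in segments))
        raster.append(row)
    return raster


# One hypothetical fault segment along y = 0, on a 2 x 3 grid of 30 m cells.
fault_segments = [(0.0, 0.0, 90.0, 0.0)]
print(distance_raster(2, 3, 30.0, fault_segments))
```

The same routine applies unchanged to road or river line features.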

3.1.4. Human Activity Factors

Road excavation and embankment construction can reduce slope stability, alter surface runoff pathways, and concentrate drainage, thereby increasing the risk of landslides in areas closer to roads. Land use also influences runoff generation, infiltration capacity, and shear strength through vegetation coverage and soil disturbance intensity. For example, construction land and cropland generally present higher susceptibility than forested areas. In this study, the impacts of human activities on landslide susceptibility were characterized using the distance to roads (Figure 5a) and land use types (Figure 5b). Similarly to the distance-to-fault calculation, the distance-to-road raster was generated in ArcGIS by computing the Euclidean distance between the center points of grid cells and the road line features, while land use information was directly extracted from the classified raster data.
The analysis of human activity factors (Figure 5) shows that roads are primarily concentrated in the central–western and eastern parts of the study area, with fewer roads distributed in the western and southern regions. Land use is predominantly characterized by forest cover and cropland, with scattered patches of artificial surfaces in the southern part of the region and small areas of grassland in the northeast. As the only categorical variable, land use information was represented as an integer-valued categorical raster and incorporated as an independent input channel together with the continuous factors.

3.1.5. Correlation Analysis

Correlation analysis is used to quantify the strength and direction of dependence between two or more variables and serves as a fundamental step in feature selection and multi-collinearity diagnostics. If the correlation between factors is excessively high, it may lead to parameter space inflation. When multiple features exhibit strong multi-collinearity, gradient updates along redundant directions within the network can become unstable or oscillatory, resulting in slower convergence or even stagnation at local saddle points. Highly correlated inputs can also cause the network to focus unnecessarily on nearly identical information channels, thereby reducing its sensitivity to a small number of critical triggering factors and ultimately impairing its generalization performance across different regions or temporal scenarios.
In this section, factor correlations are analyzed from three perspectives: the Pearson correlation coefficient, variance inflation factor (VIF), and tolerance (TOL). The Pearson correlation coefficient (r) measures the strength and direction of the linear association between two continuous variables, with values ranging from −1 to 1. A positive r indicates a positive correlation, while a negative r indicates a negative correlation. The closer |r| is to 1, the stronger the linear relationship, whereas values close to 0 indicate a weak or no linear relationship.
As shown in Table 1, the absolute values of all correlation coefficients are below 0.8, indicating that the variables meet the requirements for subsequent analysis.
TOL and VIF are complementary indicators commonly used to assess multi-collinearity. TOL describes the proportion of variance in a given variable that cannot be linearly explained by the other variables, i.e., a lower TOL indicates higher redundancy. VIF, in turn, represents the degree to which multi-collinearity inflates the variance of a regression coefficient. The two metrics are reciprocals of each other (VIF = 1/TOL).
As shown in Table 2, all parameters have TOL values greater than 0.2 and VIF values less than 5, indicating that the variables meet the requirements for subsequent analysis.
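As an illustration of how such diagnostics can be computed, the sketch below uses the standard identity that the VIF of each variable equals the corresponding diagonal entry of the inverse Pearson correlation matrix, with TOL as its reciprocal. The sample matrix is synthetic and the function name `vif_and_tol` is hypothetical; the paper does not specify which software produced its tables.

```python
import numpy as np


def vif_and_tol(samples):
    """VIF and TOL per column of `samples` (rows = pixels, columns = factors)."""
    samples = np.asarray(samples, dtype=np.float64)
    corr = np.corrcoef(samples, rowvar=False)  # Pearson correlation matrix
    vif = np.diag(np.linalg.inv(corr))         # VIF_j = [R^-1]_jj = 1 / (1 - R_j^2)
    tol = 1.0 / vif                            # TOL_j = 1 - R_j^2
    return vif, tol


# Synthetic example: the third factor is almost a copy of the first.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
vif, tol = vif_and_tol(np.column_stack([a, b, c]))
print(np.round(vif, 2), np.round(tol, 3))
```

Under the thresholds used in this section (TOL > 0.2 and VIF < 5), the near-duplicate pair in this synthetic example would be flagged, while the independent factor would pass.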

3.2. Methods

U-Net is a classical encoder–decoder–based semantic segmentation network [34], named after its symmetric “U-shaped” architecture (Figure 6). The left encoder path extracts multi-scale contextual information through convolution and downsampling, while the right decoder path progressively upsamples the feature maps to recover spatial resolution [35]. A key component of U-Net is the use of multi-level skip connections, which directly fuse fine-grained features from the encoding stage with the corresponding decoding layers [36]. This design preserves precise spatial localization while leveraging global semantic information. Originally developed for biomedical image segmentation, U-Net has demonstrated excellent performance even in small-sample scenarios when combined with data augmentation. It is now widely used in pixel-level segmentation tasks across fields such as remote sensing [37], medical imaging, and industrial inspection, and has inspired many improved variants, including 3D U-Net, Attention U-Net [38], and U-Net++ [39].
Essentially, U-Net is a fully convolutional architecture built on top of CNNs [40]. It employs a standard CNN encoder to progressively extract multi-scale features, paired with a symmetric decoder composed of upsampling and convolution layers to gradually restore spatial resolution [41]. Through skip connections, shallow-layer detail features are directly passed to their corresponding deeper layers, extending CNNs’ image-level classification capability to dense, pixel-level prediction and segmentation [42]. Thus, while U-Net shares the same operators and training approach as conventional CNNs, its encoder–decoder symmetry and skip-connection design make it particularly effective for tasks involving small datasets, fine boundary delineation, and precise spatial localization.
Figure 6 shows a typical U-Net architecture and its component modules. Conv denotes a 2D convolution operation, while DoubleConv refers to two consecutive Conv2d layers followed by BatchNorm2d and ReLU activation. Batch normalization stabilizes the feature distribution, and ReLU introduces nonlinearity while mitigating vanishing gradients. ConvTranspose2d represents transposed convolution (deconvolution), which performs upsampling to enlarge spatial resolution. Cat denotes feature concatenation, and MaxPool2d refers to max pooling, which downsamples features by selecting the maximum value within a spatial window.
The Transformer is a deep neural network architecture built around the self-attention mechanism [43], designed for sequence modeling and representation learning [44]. It replaces recurrent and convolutional operations with multi-head self-attention and feed-forward networks, and introduces positional encoding to incorporate sequential information [45]. This enables the modeling of long-range dependencies at a global scale while supporting efficient parallel computation [46]. Compared with recurrent neural networks (RNNs) and CNNs, Transformers excel at capturing long-range contextual relationships and exhibit stronger scalability and transferability, which have led to their dominant success in machine translation, pre-trained language models, computer vision, and multimodal generation [47]. However, the primary drawback of the standard Transformer is that the computational cost of the attention mechanism grows quadratically with sequence length, motivating the development of efficient variants such as sparse attention and linear attention for long-sequence scenarios.
The Swin Transformer is a hierarchical self-attention framework tailored for visual tasks [48]. It restricts self-attention to fixed-size local windows and applies window shifting across adjacent layers, enabling information exchange between windows [49]. This design preserves the model’s global context modeling capacity while significantly reducing computational and memory costs [50]. The network first performs patch embedding using small-stride convolutions and then progressively downsamples through patch merging while increasing the channel dimensions, forming a pyramid-like feature hierarchy that facilitates integration with a U-shaped decoder for segmentation tasks. Each stage alternates between window-based and shifted-window attention and incorporates relative position bias to enhance spatial relationship representation. With the design of “local computation combined with hierarchical representation”, the Swin Transformer achieves a strong balance between fine-grained boundary details and global semantic understanding, making it a robust encoder for classification, detection, and segmentation tasks.
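The saving from window-based attention can be made concrete with the complexity estimates given in the original Swin Transformer paper: global multi-head self-attention costs about 4hwC² + 2(hw)²C operations, whereas window attention with window size M costs about 4hwC² + 2M²hwC. The sketch below simply evaluates these two expressions. The token-grid size, channel width, and window size in the example are illustrative values, not settings reported in this study.

```python
def msa_cost(h, w, c):
    """Approximate cost of global self-attention on an h x w token grid."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def wmsa_cost(h, w, c, m):
    """Approximate cost of window attention with m x m windows."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


# Illustrative setting: 56 x 56 tokens, 96 channels, 7 x 7 windows.
h, w, c, m = 56, 56, 96, 7
print(msa_cost(h, w, c) / wmsa_cost(h, w, c, m))

# Doubling the grid side quadruples the window-attention cost (linear in h*w),
# while the global-attention cost grows much faster (quadratic in h*w).
print(wmsa_cost(2 * h, 2 * w, c, m) / wmsa_cost(h, w, c, m))
print(msa_cost(2 * h, 2 * w, c) / msa_cost(h, w, c))
```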
In this study, combining the Swin Transformer and U-Net architectures, an intelligent landslide susceptibility assessment method based on Swin-UNet is proposed. The approach first performs unified projection and resolution standardization of multi-source factors, including topography, geology, hydrology, and land use, and calculates clipping and normalization parameters for continuous variables in the training area. Buffered spatial partitioning is then applied to obtain training, validation, and test datasets, with sliding-window balanced sampling employed in the training phase to mitigate class imbalance. The Swin-UNet model (Figure 7) is subsequently trained to learn the spatial characteristics of known landslide occurrences.
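One way to compute clipping and normalization parameters from the training area only is sketched below. The paper does not report which clipping bounds or normalization formula were used, so the 1st and 99th percentiles and the z-score scaling here are assumptions, as are the function names. The point of the sketch is that the statistics are fitted on training pixels and then reused unchanged for validation, test, and inference.

```python
import numpy as np


def fit_clip_norm(factor, train_mask, lower_pct=1.0, upper_pct=99.0):
    """Estimate clip bounds and mean/std from training pixels only."""
    values = factor[train_mask]
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    clipped = np.clip(values, lo, hi)
    return {"lo": float(lo), "hi": float(hi),
            "mean": float(clipped.mean()), "std": float(clipped.std())}


def apply_clip_norm(factor, params, eps=1e-6):
    """Apply the training-area parameters to the whole raster."""
    clipped = np.clip(factor, params["lo"], params["hi"])
    return (clipped - params["mean"]) / (params["std"] + eps)


# Synthetic elevation-like raster; the left three quarters act as training area.
rng = np.random.default_rng(0)
elevation = rng.normal(1000.0, 300.0, size=(64, 64))
train_mask = np.zeros((64, 64), dtype=bool)
train_mask[:, :48] = True

params = fit_clip_norm(elevation, train_mask)
normalized = apply_clip_norm(elevation, params)
print(params)
```

Categorical layers such as land use would be excluded from this step, since the text applies it to continuous variables only.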
To reduce spatial dependence and prevent spatial leakage between training and evaluation samples, a block-based spatial partitioning strategy was adopted. The study area was first divided into a regular grid of non-overlapping blocks with a spatial size of 128 × 128 pixels, corresponding to approximately 3.84 km × 3.84 km under the 30 m spatial resolution. For each block, the number of valid pixels was calculated based on the global valid-area mask. Blocks were then randomly assigned, using current time as a random seed, to the test and validation subsets by cumulatively allocating valid pixels until the predefined proportions of 10% for the test set and 10% for the validation set were reached. The remaining blocks were used to construct the training set.
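The block assignment described above might look roughly like the following. It is a sketch under stated assumptions rather than the authors' code: the function name, the label encoding (0 = train, 1 = validation, 2 = test, -1 = outside the valid area), and the use of a fixed seed in the example (the paper seeds from the current time) are all illustrative choices.

```python
import numpy as np


def assign_blocks(valid_mask, block=128, test_frac=0.10, val_frac=0.10, seed=None):
    """Label each pixel by subset using randomly ordered non-overlapping blocks."""
    rng = np.random.default_rng(seed)  # seed=None draws fresh entropy each run
    h, w = valid_mask.shape
    n_by, n_bx = -(-h // block), -(-w // block)  # ceiling division

    # Count valid pixels per block.
    counts = {}
    for by in range(n_by):
        for bx in range(n_bx):
            tile = valid_mask[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            counts[(by, bx)] = int(tile.sum())
    total = sum(counts.values())
    candidates = [key for key, n in counts.items() if n > 0]

    labels = np.full((h, w), -1, dtype=np.int8)
    labels[valid_mask] = 0  # everything valid starts as training

    test_px = val_px = 0
    for idx in rng.permutation(len(candidates)):
        by, bx = candidates[idx]
        sl = (slice(by * block, (by + 1) * block), slice(bx * block, (bx + 1) * block))
        # Fill the test quota first, then the validation quota, then stop.
        if test_px < test_frac * total:
            subset, test_px = 2, test_px + counts[(by, bx)]
        elif val_px < val_frac * total:
            subset, val_px = 1, val_px + counts[(by, bx)]
        else:
            break
        view = labels[sl]                # basic slicing returns a view
        view[valid_mask[sl]] = subset    # so this writes into `labels`
    return labels


valid = np.ones((512, 512), dtype=bool)
labels = assign_blocks(valid, block=128, seed=0)
for name, code in (("train", 0), ("val", 1), ("test", 2)):
    print(name, float((labels == code).mean()))
```

Because whole blocks are allocated, each quota is met or slightly exceeded rather than hit exactly.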
To further eliminate spatial adjacency between different subsets, a no-contact buffer zone was introduced along the boundaries of each partition. This buffer was generated by applying a morphological dilation with a radius of 8 pixels, equivalent to approximately 240 m in the horizontal direction. Pixels located within this buffer zone were excluded from all subsets and stored separately as a void mask. As a result, the final training, validation, and test regions are spatially disjoint and separated by an explicit gap, effectively mitigating spatial leakage caused by near-neighbor similarity across subset boundaries.
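A no-contact buffer of this kind can be sketched with a plain binary dilation. The version below marks every valid pixel that lies within the given radius of a pixel belonging to a different subset, which removes a strip on both sides of each boundary. Whether the original implementation buffers one side or both is not stated, so this is one possible reading. The function names, the disk-shaped structuring element, and the label encoding (non-negative integers for subsets, -1 for invalid) are assumptions.

```python
import numpy as np


def dilate(mask, radius):
    """Binary dilation with a disk-shaped structuring element (numpy only)."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx <= radius * radius:
                out |= padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
    return out


def buffer_mask(labels, radius=8):
    """True where a valid pixel lies within `radius` pixels of another subset."""
    void = np.zeros(labels.shape, dtype=bool)
    for subset in np.unique(labels):
        if subset < 0:
            continue
        near_subset = dilate(labels == subset, radius)
        void |= near_subset & (labels >= 0) & (labels != subset)
    return void


# Toy layout: training (0) on the left, test (2) on the right.
labels = np.zeros((40, 40), dtype=np.int8)
labels[:, 20:] = 2
void = buffer_mask(labels, radius=8)
labels_buffered = labels.copy()
labels_buffered[void] = -1  # excluded from every subset
print(int(void.sum()))
```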
During both training and large-scale inference, a sliding-window strategy was employed. At the edges of the study area, sliding windows that partially extended beyond the valid-area boundary were retained to ensure full spatial coverage. Model predictions were computed only for pixels within the valid-area mask, while pixels outside the study area were ignored. Overlapping predictions, including those near the study area boundary, were merged using the same overlap-weighted averaging scheme as applied in interior regions, thereby reducing edge artifacts and ensuring spatial continuity. No artificial padding or spatial extension beyond the study area boundary was introduced; instead, the valid-area mask was consistently used to constrain model input and output.
As illustrated in Figure 7, Conv2d denotes a 2D convolution operation, which aggregates weighted information within a local receptive field and extracts spatial features such as texture and edges. BatchNorm2d + ReLU normalizes feature distributions based on batch statistics, introduces nonlinearity, stabilizes training, and enhances feature representation. LayerNorm performs normalization across feature dimensions for individual samples. Swin represents a window-based self-attention block that applies multi-head self-attention and an MLP within a fixed window and exchanges information across blocks via the shifted-window mechanism, enabling mid- and long-range dependency modeling. Patch merging refers to the downsampling process, in which adjacent 2 × 2 patches are concatenated and linearly projected, reducing spatial dimensions (H and W) by half while doubling the number of channels. Interpolate denotes upsampling using bilinear or nearest-neighbor interpolation to restore spatial resolution. Cat refers to feature concatenation, which merges two feature maps of the same spatial size along the channel dimension.
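The shape bookkeeping of patch merging can be illustrated with a short NumPy sketch. It assumes a channel-last layout and omits the normalization that Swin implementations usually apply before the projection; `patch_merging` and `weight` are illustrative names, not identifiers from the study's code.

```python
import numpy as np

def patch_merging(x, weight):
    """x: (H, W, C) feature map with even H and W; weight: (4*C, 2*C) projection."""
    # Stack the four pixels of every 2x2 neighbourhood along the channel axis.
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]], axis=-1
    )  # shape (H/2, W/2, 4C)
    # Linear projection from 4C to 2C channels.
    return merged @ weight  # shape (H/2, W/2, 2C)
```

For an 8 × 8 × 3 input and a 12 × 6 weight matrix, the output has shape 4 × 4 × 6: height and width are halved and the channel count is doubled.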
The Swin-UNet model adopts the classic encoder–decoder structure of U-Net, with the Transformer serving as the encoder to extract four levels of hierarchical pyramid features. These multi-scale representations are then channel-aligned using 1 × 1 convolutions and provided as skip connections to the U-Net decoder. The decoder progressively reconstructs the spatial resolution through the sequence of “upsampling-concatenation-convolution”, ultimately producing pixel-level logits. Functionally, the Transformer excels at modeling long-range dependencies and the global context, while U-Net focuses on fine-grained spatial details and boundary reconstruction. Their combination offers a clear advantage over conventional U-Net models by preserving global semantic information while enhancing local texture and edge representation, resulting in improved stability and accuracy, particularly for high-resolution imagery and elongated or small-scale targets.
The Swin-UNet model follows a U-Net-style encoder–decoder structure and is trained with a composite loss function of binary cross-entropy (BCE) and Dice loss. Model performance is monitored using validation metrics including Dice coefficient, IoU, AUROC, AUPRC, and F1 score–based optimal threshold.
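A composite BCE and Dice loss of this kind can be written compactly. The sketch below uses NumPy for clarity instead of a deep learning framework, and the equal weights `w_bce` and `w_dice` as well as the smoothing term `eps` are assumptions, since the weighting is not given in the text.

```python
import numpy as np

def bce_dice_loss(logits, target, valid, w_bce=0.5, w_dice=0.5, eps=1e-6):
    """Weighted sum of BCE and soft Dice loss over valid pixels only."""
    z = logits[valid].astype(np.float64)
    y = target[valid].astype(np.float64)
    # Numerically stable BCE on logits: max(z, 0) - z*y + log(1 + exp(-|z|)).
    bce = np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z))))
    # Stable sigmoid: exp(-log(1 + exp(-z))).
    p = np.exp(-np.logaddexp(0.0, -z))
    dice = (2.0 * np.sum(p * y) + eps) / (np.sum(p) + np.sum(y) + eps)
    return w_bce * bce + w_dice * (1.0 - dice)
```

The BCE term drives per-pixel probability estimates, while the Dice term rewards overlap with the sparse landslide class, which is why the two are often combined under class imbalance.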
The hyperparameters of the Swin-UNet model are listed in Table 3.
Overall, the hyperparameter settings of the Swin-UNet model (Table 3) were determined through multiple trial experiments. Different configurations were tested, and the final parameter combination was selected based on achieving the best overall performance on the validation set while ensuring stable convergence and reasonable computational efficiency.

4. Results and Discussion

The software and hardware configurations used in the experiments are summarized in Table 4.

4.1. Comparative Analysis of Model Performance

To evaluate the performance of the proposed approach in geological hazard susceptibility assessment, the information value method and the U-Net model were implemented as baselines and compared with the Swin-UNet model. The results are visualized using precision–recall (PR) curves, as shown in Figure 8.
As illustrated in Figure 8, both the U-Net and Swin-UNet models outperform the traditional information value method. The U-Net model was optimized by selecting the best-performing weights based on the Dice coefficient of the validation set. When evaluated on the independent test set, the U-Net achieved an AUROC of 0.996, AUPRC of 0.926, F1-score of 0.941, IoU of 0.889, and overall accuracy of 0.999. As shown in Figure 8, the PR curve maintains a high precision level even in regions of high recall, with an average precision (AP) of 0.926. This indicates that the model effectively controls false positives while increasing the detection rate.
The PR curve of the Swin-UNet model on the test set exhibits a “long plateau followed by a sharp decline” pattern. Within a wide recall range of approximately 0–0.90, precision remains nearly constant at 1.0, indicating very few false positives and highly stable ranking performance. As recall approaches 1.0, the curve drops sharply, suggesting that further improving recall comes at the cost of a significant reduction in precision. The area under the curve reaches 0.933, demonstrating strong discriminative capability and good generalization even under pixel-level and highly imbalanced class conditions. The overall trend is consistent with that observed on the validation set, reflecting the model’s robust performance on unseen samples.
Based on the performance of all three models on the test set, receiver operating characteristic (ROC) curves were generated to analyze the false positive rate (FPR) and true positive rate (TPR), as shown in Figure 9, providing a more comprehensive evaluation of their overall effectiveness.
As shown in Figure 9, both the U-Net and Swin-UNet models outperform the information value method on the test set. The ROC curves of the two deep learning models closely follow the coordinate axes and remain relatively flat at high true positive rates, clearly deviating from the diagonal line of a random classifier. This indicates that the models maintain both high true positive rates and low false positive rates across a wide range of decision thresholds. The steep initial rise in the curves suggests that significant improvements in recall can be achieved even in regions with extremely low false-positive rates. The Swin-UNet model achieves an AUROC of 0.988, close to a perfect score, reflecting its strong discriminative ability and stable ranking performance. Combined with the “long plateau” behavior observed in the PR curve, these results consistently demonstrate the model’s robust generalization to unseen data and reliable discrimination between positive and negative pixels.
The overall performance results of the three models on the same test dataset are summarized in Table 5.
As shown in Table 5, the Swin-UNet model achieves an AUROC of 0.988 and an AUPRC of 0.933 on the test dataset, indicating excellent discriminative capability and stable ranking performance under class-imbalanced conditions. At the operational threshold of approximately 0.68, the overall accuracy reaches 0.9987, with a precision of 0.984, a recall of 0.876, and corresponding F1-score and IoU values of 0.927 and 0.864, respectively. These results suggest that the model achieves high recall while maintaining an extremely low false-positive rate, producing predictions with high spatial overlap and boundary consistency relative to ground truth. Notably, the overall accuracy is inflated by the large proportion of background pixels; thus, metrics such as AUPRC, F1-score, and IoU provide a more objective evaluation of segmentation quality. Overall, the model demonstrates stable performance and good generalization potential for practical applications.
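The reported threshold metrics are mutually consistent, which can be checked from the standard identities F1 = 2PR/(P + R) and IoU = F1/(2 − F1) for binary masks. The short script below is only a sanity check on the rounded values quoted above; the function names are illustrative.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def iou_from_f1(f1):
    """For binary masks, F1 = 2TP/(2TP+FP+FN) and IoU = TP/(TP+FP+FN),
    which gives IoU = F1 / (2 - F1)."""
    return f1 / (2 - f1)

f1 = f1_from_pr(0.984, 0.876)
iou = iou_from_f1(f1)
print(round(f1, 3), round(iou, 3))  # 0.927 0.864
```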
To further evaluate the ability of the U-Net and Swin-UNet models to prioritize high-risk regions, which is a key requirement in real-world disaster management, a Top-K precision analysis was conducted. Pixels across the entire study area were ranked in a descending order according to the predicted susceptibility probability. The top 1%, 2%, 5%, 10%, and 20% of the pixels (Top-K subsets) were then selected as priority inspection areas, and the proportion of true landslide pixels within each subset was calculated as the corresponding precision. Each bar in Figure 10 represents the precision at a given area percentage, with taller bars indicating a higher hit rate and cleaner selection under the same area constraints.
Unlike global metrics such as AUROC and AUPRC, Top-K precision focuses on the “head” of the ranked predictions and directly reflects a model’s ability to push truly high-risk pixels to the top of the list. This makes it a more practical indicator for early warning and mitigation scenarios under limited resources. A higher Top-K precision at small area ratios (e.g., Top-1% or Top-2%) indicates that the model is better suited for prioritized hazard mitigation and rapid field inspection.
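The Top-K precision described above can be computed in a few lines. The sketch assumes flattened arrays of predicted probabilities and binary labels restricted to valid pixels; `top_k_precision` is a hypothetical helper name.

```python
import numpy as np

def top_k_precision(prob, label, fraction):
    """Share of true landslide pixels among the top `fraction` of ranked pixels."""
    flat_prob = prob.ravel()
    k = max(1, int(round(fraction * flat_prob.size)))
    # Indices of the k highest probabilities (order inside the top-k is irrelevant).
    top = np.argpartition(-flat_prob, k - 1)[:k]
    return float(label.ravel()[top].mean())
```

Note that this quantity is the fraction of selected pixels that are landslides, not the fraction of all landslide pixels that are found; the latter is a recall.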
As shown in Figure 10, the Top-1% precision of the U-Net and Swin-UNet models is 0.83 and 0.832, respectively, indicating that roughly 83% of the pixels within the top 1% of the ranked area are true landslide pixels. The Swin-UNet model is marginally better than U-Net in this regard. When the coverage expands to 2%, the precisions decrease to 0.43 and 0.431, with Swin-UNet still performing slightly better. However, when the coverage further increases to 5%, 10%, and 20%, the precision of the Swin-UNet model declines and becomes lower than that of the U-Net model. This “high-then-sharp-decline” pattern suggests that the Swin-UNet model strongly concentrates the most hazardous pixels at the very top of the ranking, which is highly advantageous for “priority inspection under limited resources”. Therefore, compared with U-Net, the Swin-UNet model shows a slightly stronger capability to concentrate true landslide pixels in the highest-ranked areas.
Furthermore, a recall-at-area-proportion (RA) curve was employed to evaluate the detection capability of the U-Net and Swin-UNet models under a “limited inspection area” constraint. Pixels across the entire study area were ranked in descending order based on the predicted susceptibility probability. The top α proportion of the area was then iteratively selected, and the proportion of true landslide pixels within this subset relative to the total number of landslide pixels was calculated as the recall. By plotting the area proportion on the x-axis and the recall on the y-axis (Figure 11), the curve illustrates how effectively each model prioritizes high-risk regions. A curve closer to the upper-left corner indicates that more true landslide pixels can be recalled with a smaller inspection area, demonstrating more effective prioritization.
Unlike global discriminative metrics, such as AUROC and AUPRC, the RA curve directly answers the following practical question: “If only the top α% of high-risk areas can be inspected first, how many true landslide pixels can be detected?” This makes it a more application-oriented metric for early warning and field investigation scenarios. For example, when α = 10%, the corresponding y-axis value represents the detection rate achieved by inspecting only the top 10% of the study area, thereby quantifying the model’s focusing ability in the top-ranked regions.
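The recall at a given area proportion can be obtained from a single sort followed by a cumulative sum. The sketch below assumes flattened probability and label arrays over valid pixels; `recall_at_area` is an illustrative name.

```python
import numpy as np

def recall_at_area(prob, label, fractions):
    """Recall of landslide pixels when only the top `fraction` of area is inspected."""
    order = np.argsort(-prob.ravel(), kind="stable")  # descending probability
    hits = np.cumsum(label.ravel()[order])            # landslide pixels found so far
    total = hits[-1]                                  # all landslide pixels
    recalls = []
    for a in fractions:
        k = max(1, int(round(a * order.size)))
        recalls.append(float(hits[k - 1] / total))
    return recalls
```

Plotting the returned values against `fractions` gives the curve of Figure 11 for one model; the denominator is the total number of landslide pixels, which distinguishes this recall from the Top-K precision.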
As shown in Figure 11, approximately 90.1% of landslide occurrences can be recalled by inspecting only the top 1% of the area using the U-Net model, whereas the Swin-UNet model achieves a slightly higher recall of 90.4% within the same area. This indicates that Swin-UNet is more advantageous for prioritizing high-risk zones and enabling graded early warning under limited mitigation resources. Moreover, without compromising overall discriminative capability, the Swin-UNet model provides a cleaner and more structurally coherent susceptibility distribution, making it better suited as a foundational layer for planning and field inspection.

4.2. Susceptibility Zonation

To construct the landslide susceptibility maps for the study area, the information value method, the U-Net model, and the proposed Swin-UNet model were applied for large-scale inference over the entire region. The workflow is summarized as follows:
First, the multi-channel image stack was read in the exact channel order and spatial resolution used during the training phase, along with the valid-area mask and the reference raster’s projection, resolution, and dimensions.
Next, standardization identical to that used during training was applied exclusively to valid pixels. Specifically, pixel values were clipped according to channel-specific upper and lower bounds, followed by logarithmic transformation for selected channels. Mean–variance normalization was then performed using the statistical parameters derived from the training dataset. Pixels outside the valid mask were set to zero to avoid interference in convolution operations.
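The standardization step might look like the following sketch. The exact logarithmic transform is not specified in the text, so a shifted `log1p` is assumed here, and the statistics `lo`, `hi`, `mean`, and `std` are taken to be per-channel values computed on the training area after clipping and transformation.

```python
import numpy as np

def standardize(stack, valid, lo, hi, mean, std, log_channels=()):
    """stack: (C, H, W) raster; valid: (H, W) bool mask; lo/hi/mean/std: length-C."""
    out = np.zeros(stack.shape, dtype=np.float32)
    for c in range(stack.shape[0]):
        x = np.clip(stack[c].astype(np.float32), lo[c], hi[c])
        if c in log_channels:
            x = np.log1p(x - lo[c])      # assumed form; argument is non-negative
        x = (x - mean[c]) / std[c]       # mean-variance normalization
        out[c] = np.where(valid, x, 0.0) # zero outside the valid-area mask
    return out
```

Reusing the training statistics at inference time, rather than recomputing them over the whole region, keeps the input distribution identical to the one the network was trained on.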
Since the entire study area covers a large spatial extent, a full-coverage sliding-window strategy was adopted for inference. The lower-right boundary of the moving window was aligned to ensure complete coverage without gaps. The predicted probability map for each window was multiplied by a two-dimensional weighting matrix and accumulated into a global prediction map, along with a cumulative weight map. A seamless probability map for the entire area was then generated using weighted averaging, while pixels outside the valid mask were assigned null values for compatibility with GIS visualization.
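The overlap-weighted stitching can be sketched as below. The form of the two-dimensional weighting matrix is not given in the text, so a separable Hann-like taper is assumed; `predict` stands for any callable that returns a window of probabilities, and the raster is assumed to be at least one window wide and high.

```python
import numpy as np

def window_starts(size, win, stride):
    """Start offsets covering [0, size); the last window is aligned to the far edge."""
    if size < win:
        raise ValueError("raster dimension is smaller than the window")
    starts = list(range(0, size - win + 1, stride))
    if starts[-1] != size - win:
        starts.append(size - win)
    return starts

def blend_windows(predict, h, w, win=128, stride=64):
    """Accumulate weighted window predictions and normalize by the summed weights."""
    ramp = np.hanning(win + 2)[1:-1]   # strictly positive taper (assumed form)
    weight = np.outer(ramp, ramp)      # peaks at the window centre
    acc = np.zeros((h, w), dtype=np.float64)
    wsum = np.zeros((h, w), dtype=np.float64)
    for y in window_starts(h, win, stride):
        for x in window_starts(w, win, stride):
            acc[y:y + win, x:x + win] += predict(y, x, win) * weight
            wsum[y:y + win, x:x + win] += weight
    return acc / wsum
```

Down-weighting window borders reduces visible seams because each pixel is dominated by the prediction in which it lies closest to the window centre.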
To enhance the interpretability of the predicted probabilities, temperature scaling was applied using two calibration parameters, a temperature and a bias, both estimated from the validation set. Increasing the temperature smooths the probability distribution, while the bias shifts the overall probability upward or downward, thereby improving the alignment between predicted probabilities and actual occurrence frequencies.
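One common way to express this calibration is shown below. The exact functional form is not given in the text; the sketch assumes p = sigmoid(z / T + b), where z is the raw logit, T the temperature, and b the bias.

```python
import math

def calibrate(logit, temperature, bias):
    """Assumed calibration form: p = sigmoid(logit / temperature + bias)."""
    s = logit / temperature + bias
    # Sigmoid evaluated in a numerically stable way for either sign of s.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)
```

With T > 1 the probabilities are pulled toward 0.5, and a positive b raises them uniformly in logit space. Because this transform is monotonic for T > 0, it leaves pixel rankings, and therefore AUROC and AUPRC, unchanged.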
Finally, a georeferenced probability raster was generated for the entire study area, and a binary landslide susceptibility map was derived by applying the optimal threshold selected from the validation set. For susceptibility zonation and spatial interpretation, the continuous probability maps were independently classified into five susceptibility classes using the natural breaks method, without affecting the quantitative evaluation results.
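Selecting the binarization threshold on the validation set can be done with a simple scan over candidate values. The sketch below assumes that the F1-score is the selection criterion, as stated for the validation metrics; `best_f1_threshold` and the candidate grid are illustrative.

```python
import numpy as np

def best_f1_threshold(prob, label, thresholds):
    """Return the candidate threshold with the highest F1-score and that score."""
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        pred = prob >= t
        tp = int(np.sum(pred & (label == 1)))
        fp = int(np.sum(pred & (label == 0)))
        fn = int(np.sum(~pred & (label == 1)))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
        if f1 > best_f1:  # ties keep the earlier (lower) threshold
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

The threshold is fixed on validation data and then applied unchanged to the test set and to regional inference, so that the test metrics are not tuned on the data they evaluate.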
Through the combined strategies of consistent standardization, overlap-weighted seamless stitching, and probability calibration, the large-scale inference process ensures numerical consistency, spatial continuity, and probabilistic interpretability. The final regional-scale landslide susceptibility assessment results are shown in Figure 12.
As shown in Figure 12, the spatial consistency between susceptibility classes and landslide occurrences is generally high across all three models: most landslide points fall within medium- to high-susceptibility zones, with pronounced clustering observed along rivers and roads, indicating that the models respond appropriately to key controlling factors.
Compared with the traditional information value method, the U-Net results (Figure 12b) exhibit more continuous high-value strips and finer local structures along valleys and marginal zones, while low-susceptibility areas are cleaner. This reflects the deep learning model’s enhanced capability to capture nonlinear interactions among multiple factors and integrate spatial contextual information.
The susceptibility map generated by the Swin-UNet model (Figure 12c) shows a clearer spatial differentiation. High-susceptibility zones predominantly form linear and band-like patterns, continuously extending along major flow paths, valleys, and slope toes, with edge-aligned high-value belts appearing near coastal zones and steep escarpments. These high-susceptibility belts show a strong spatial co-location with known landslide points, which cluster densely within these areas, indicating that the model effectively captures terrain–hydrological structures associated with slope instability. Plateau and gentle-slope areas are mostly characterized by low susceptibility, where the density of landslide occurrences is significantly reduced, suggesting effective background suppression. A small number of missed landslides are mainly distributed within low-to-medium transition zones, which may be related to micro-geomorphology, near-surface disturbances, or sample location inaccuracies. Some narrow high-susceptibility belts near the boundary require further interpretation to verify whether they correspond to real geomorphological features such as marine erosion cliffs or scarps, in order to avoid misinterpreting boundary effects as risk areas.
Overall, the strong correspondence between landslide occurrences and high-susceptibility areas demonstrates the discriminative power and application potential of the proposed model. The Swin-UNet model produces susceptibility maps with superior spatial continuity and geomorphological consistency: high-risk zones form continuous bands following flow paths and ridge–gully structures, while salt-and-pepper noise in low-susceptibility areas is significantly reduced. This suggests that the self-attention mechanism in the encoder effectively models long-range dependencies, enhancing the model’s ability to represent the combined influence of geomorphological, structural, and hydrological factors.
Moreover, the Swin-UNet model aggregates high-risk pixels into coherent strips and patches while maintaining cleaner non-hazard regions, making the results more readily convertible into management zones, inspection corridors, and early-warning belts. This advantage stems from the self-attention mechanism’s capacity to capture long-range dependencies and integrate topographic, structural, and hydrological constraints over broader spatial contexts, while conventional convolutional networks tend to produce fragmented patterns and boundary fluctuations. In other words, the susceptibility map produced by Swin-UNet is more structured and more actionable. Even if its pixel-level composite score is only slightly higher, it offers substantially greater practical value in planning and decision-making scenarios.
To further substantiate this practical advantage, we analyzed the spatial distribution of correctly and incorrectly predicted landslides with respect to the Swin-UNet susceptibility map (Figure 13). In this analysis, when the high and very high susceptibility classes are treated as the high-risk zone, 282 out of 388 mapped landslides (72.7%) fall within these areas, whereas 106 events (27.3%) occur in lower-susceptibility zones.
As shown in Figure 13a, the undetected landslides (false negatives) are mainly concentrated in cropland and grassland on gently sloping hilly terrain. These failures typically occur on anthropogenically modified slopes with relatively weak relief and moderate gradients, where long-term cultivation, local slope cutting at the toe, and rudimentary drainage works interact with fine-scale microtopographic irregularities to promote instability. As the conditioning factors used in this study are primarily derived from 30 m-resolution DEM-based variables and conventional land use data, such small-scale engineering disturbances and micro-geomorphic features cannot be explicitly represented. Thus, the model is less sensitive to the failure conditions in these areas and tends to assign them to low- or moderate-susceptibility classes, leading to missed events.
In contrast, as illustrated in Figure 13b,c, most false positives (pixels classified as highly susceptible but without mapped landslides) form clusters along major road corridors and valley-side slopes near river channels. This pattern indicates a strong sensitivity of the model to combined settings such as “steep slopes + linear infrastructure” or “steep slopes + fluvial erosion”. Although no landslides have yet been recorded in these high-susceptibility pixels, the geomorphological and engineering geological conditions suggest that these locations possess inherent potential instability. Therefore, these false positives are not purely random noise but are clearly associated with specific geomorphological units and land use types, and they may provide useful guidance for prioritizing field inspections and monitoring along road networks and riverbanks.

4.3. Model Interpretability Analysis

Shapley additive explanations (SHAP) is a model interpretation framework based on cooperative game theory. It treats each feature as a player and defines its contribution to a single prediction (the Shapley value) as the weighted average marginal gain across all possible feature coalitions. This approach satisfies three key properties, namely additivity, local accuracy, and consistency, and allows for the attribution of positive or negative contributions and their magnitudes at the sample level, which can further be aggregated into global importance scores. In practice, several SHAP explainers (algorithms included in the SHAP module) have been developed for different model classes: TreeSHAP provides an efficient and exact solution for tree-based models, while DeepSHAP and KernelSHAP offer approximate solutions for deep or general models. SHAP values are often visualized using bar plots, dependence plots, and force plots. Compared with traditional feature importance metrics, SHAP offers superior comparability and local interpretability, though it is sensitive to the choice of background samples and can be computationally expensive.
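The definition of the Shapley value can be made concrete with an exact computation on a toy game. The sketch below enumerates all coalitions, which is feasible only for a handful of features; the payoff function and the feature names in the usage note are invented for illustration and are unrelated to the study's data.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a payoff."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                # Probability that exactly coalition s precedes player i.
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_value(s):
    """Hypothetical payoff: two additive effects plus an interaction term."""
    v = 0.0
    if "road" in s:
        v += 2.0
    if "slope" in s:
        v += 1.0
    if "road" in s and "slope" in s:
        v += 3.0
    return v
```

Calling `shapley_values(["road", "slope"], toy_value)` splits the interaction term equally between the two features, and the attributions sum to the payoff of the full coalition, which is the local accuracy property. KernelSHAP approximates the same quantity by sampling coalitions instead of enumerating them.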
To investigate the decision-making mechanisms of the two segmentation models, SHAP was integrated into the Swin-UNet framework. The trained model was encapsulated as a callable function, with the mean logit value within a target pixel or candidate region used as the explanatory output. Feature coalitions were defined as the set of input spectral bands and topographic factors, and the same masking rules as those used in training were applied. A subset of background samples was selected from the validation set, and Shapley values were approximated using KernelSHAP. This process yielded sample-level contribution heatmaps and global feature importance rankings.
To enhance spatial interpretability, pixel-level contributions were aggregated based on superpixels or geomorphological partitions and then mapped back to geographic space, visually indicating which factors influence the prediction, where, and in what direction. This procedure is fully applicable to both networks. During sliding-window inference with the Swin-UNet model, SHAP values were computed for each window and averaged with overlap weighting to ensure comparability and stability of interpretation across locations and scales. As aspect was represented by three raster layers (sine, cosine, and a defined/undefined flag), these layers were grouped and evaluated jointly to avoid diluting the importance of a single physical variable.
During interpretation, several random sample patches were drawn from the test set, and the mean of the absolute SHAP values across all samples was used to calculate global feature importance.
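The aggregation into global importance, including the joint treatment of the three aspect layers, can be sketched as follows. It assumes that the channels of a group are summed per sample before the absolute value is taken, which is one reasonable reading of "evaluated jointly"; the function and group names are illustrative.

```python
import numpy as np

def global_importance(shap_values, groups):
    """shap_values: (n_samples, n_channels); groups: name -> list of channel indices.

    Returns the mean absolute SHAP value per group, normalized to sum to 1.
    """
    scores = {}
    for name, idx in groups.items():
        # Sum the channels of a group per sample, then average the magnitudes.
        grouped = shap_values[:, idx].sum(axis=1)
        scores[name] = float(np.abs(grouped).mean())
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}
```

Summing before taking the magnitude lets opposing contributions of the sine and cosine layers cancel within a sample; summing the magnitudes instead would be an alternative that never cancels and therefore yields a larger score for the grouped variable.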
As shown in Figure 14, the model’s discriminative capability is primarily driven by distance-related factors and terrain elevation. The four most influential variables, i.e., distance to roads, elevation, distance to rivers, and distance to faults, contribute to more than 80% of the total feature importance. These are followed by aspect, slope, profile curvature, land use, the TWI, plan curvature, and the NDVI, all of which have considerably lower contributions. This indicates that these factors provide complementary rather than decisive information in the susceptibility assessment.
The beeswarm plot (Figure 15) further reveals the direction of influence and the distribution of feature values. For the three distance-related factors, most data points are located to the left of the zero line, indicating negative contributions when values are high (red) and positive contributions when values are low (blue). In other words, landslide susceptibility increases as the distance to roads, rivers, or faults decreases, which forms a pattern consistent with the mechanisms of engineering disturbance, fluvial erosion, and tectonic activity. Elevation exhibits an overall negative contribution, suggesting that high-altitude regions have relatively low susceptibility. Slope shows a more concentrated positive contribution on the right side, indicating that increasing slope generally elevates landslide risk. Aspect has a relatively small and bidirectional influence, reflecting localized variability. Curvature, land use, the TWI, and the NDVI are distributed mostly near the zero line, indicating a weaker overall effect. Among these factors, the NDVI and TWI tend to show slight negative contributions, while land use exhibits bidirectional effects depending on the land cover type.
Notably, the elevation and distance to faults exhibit a strong positive correlation (r = 0.794), as shown in Table 1. This pattern is consistent with the regional geomorphic–tectonic setting of Changbai County, where major fault zones are predominantly distributed within uplifted and strongly dissected mountainous terrain, while low-lying plains and valley bottoms are generally located farther from tectonic structures. Accordingly, elevation and distance to faults capture partially overlapping tectono-geomorphic information rather than representing a direct causal relationship between the two variables.
Under such collinearity, SHAP attributions may be shared or redistributed between correlated predictors, leading to partial attenuation or exchange of importance across samples. In this study, elevation mainly reflects the regional topographic background and relief energy, whereas distance to faults represents the structural control associated with fractured rock masses and potential weak zones. Therefore, their SHAP contributions are more appropriately interpreted jointly as a coupled tectono-geomorphic influence on landslide susceptibility, rather than as strictly independent causal effects.
In summary, the model’s responses to key factors exhibit strong interpretability and physical consistency. Distance-related variables, including those closely associated with surface disturbance and erosion processes, combined with elevation, are the most critical determinants of susceptibility, while slope provides a clear risk-enhancing signal. Other factors contribute less to the overall prediction result but still offer complementary information at the local scale. These findings are consistent with the established landslide formation mechanisms and provide practical insights for disaster mitigation, emphasizing the need for focused monitoring and control in areas adjacent to roads, major rivers, and active faults. While the bar plot quantifies “which factors matter most”, the beeswarm plot illustrates “how each factor influences the prediction across its value range”. Together, they provide complementary and mutually reinforcing evidence supporting the interpretation of susceptibility patterns in the study area.

5. Conclusions

To address the insufficient characterization of spatial heterogeneity, inadequate spatial continuity, and limited mechanistic interpretability in geological hazard susceptibility assessments, an intelligent landslide susceptibility evaluation method based on the Swin-UNet model is proposed. This method is focused on landslides and integrates 11 conditioning factors—elevation, slope, aspect, plan curvature, profile curvature, the TWI, the NDVI, land use, distance to roads, distance to rivers, and distance to faults—derived from remote sensing data. By incorporating window-based self-attention into the Swin-UNet architecture, the model’s ability to recognize complex geomorphic zones and long-range dependencies is enhanced, effectively capturing local morphological features around known landslide occurrences.
On the same test dataset, the Swin-UNet model achieved the highest overall accuracy (99.87%) when compared with the information value method and the standard U-Net model, demonstrating its strong capability for performing landslide susceptibility assessments. The 30 m resolution susceptibility map generated by the Swin-UNet exhibits improved spatial continuity and geomorphological consistency, with high-susceptibility pixels forming continuous belts and coherent patches while non-target regions remain cleaner. This improvement stems from the self-attention mechanism in the encoder, which captures long-range dependencies and integrates topographic, structural, and hydrological constraints, thereby enhancing the model’s practical utility in applications such as planning and field inspection.
A SHAP-based interpretability analysis was conducted to interpret the proposed method. The SHAP global importance analysis and beeswarm plots elucidate the underlying mechanisms of feature influence. Distance-related variables, including distance to roads, rivers, and faults, together with elevation, rank among the most significant contributors, followed by morphological indicators such as aspect, slope, and curvature. Steeper slopes and shorter distances to roads, rivers, and faults generally exert a positive influence on the model output, whereas higher elevation tends to lower it, aligning with the physical understanding of geomorphic processes and triggering conditions in the region. This interpretability evidence not only enhances the credibility of the model’s predictions but also provides a quantitative basis for susceptibility zonation and for prioritizing inspection and mitigation efforts along roads, valleys, and other high-risk zones.

Author Contributions

Conceptualization, X.R.; Methodology, X.R. and J.L.; Software, J.L.; Validation, J.L. and X.W.; Investigation, X.W.; Data curation, J.L.; Writing—original draft preparation, J.L.; Writing—review and editing, X.R. and J.L.; Visualization, J.L.; Supervision, X.W.; Funding acquisition, X.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jilin Province Science and Technology Department Project (Grant No. 20220203197SF), and the Jilin Provincial Department of Education Scientific Research Project (Grant No. JJKH20251217SK).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Acknowledgments

The authors would like to thank Linfu Xue and Yanyan Zhang for their work on validation and visualization. The authors also appreciate the valuable comments from the editors and reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pacheco Quevedo, R.; Velastegui-Montoya, A.; Montalván-Burbano, N.; Morante-Carballo, F.; Korup, O.; Daleles Rennó, C. Land use and land cover as a conditioning factor in landslide susceptibility: A literature review. Landslides 2023, 20, 967–982. [Google Scholar] [CrossRef]
  2. Casagli, N.; Intrieri, E.; Tofani, V.; Gigli, G.; Raspini, F. Landslide detection, monitoring and prediction with remote-sensing techniques. Nat. Rev. Earth Environ. 2023, 4, 51–64. [Google Scholar] [CrossRef]
  3. Huang, F.; Xiong, H.; Jiang, S.-H.; Yao, C.; Fan, X.; Catani, F.; Chang, Z.; Zhou, X.; Huang, J.; Liu, K. Modelling landslide susceptibility prediction: A review and construction of semi-supervised imbalanced theory. Earth-Sci. Rev. 2024, 250, 104700. [Google Scholar] [CrossRef]
  4. Reichenbach, P.; Rossi, M.; Malamud, B.D.; Mihir, M.; Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth-Sci. Rev. 2018, 180, 60–91. [Google Scholar] [CrossRef]
  5. Zhang, J.; Ma, X.; Zhang, J.; Sun, D.; Zhou, X.; Mi, C.; Wen, H. Insights into geospatial heterogeneity of landslide susceptibility based on the SHAP-XGBoost model. J. Environ. Manag. 2023, 332, 117357. [Google Scholar] [CrossRef]
  6. Forman, E.H.; Gass, S.I. The Analytic Hierarchy Process—An Exposition. Oper. Res. 2001, 49, 469–486. [Google Scholar] [CrossRef]
  7. Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353. [Google Scholar] [CrossRef]
  8. Liu, X.; Shao, S.; Shao, S. Landslide susceptibility zonation using the analytical hierarchy process (AHP) in the Great Xi’an Region, China. Sci. Rep. 2024, 14, 2941. [Google Scholar] [CrossRef]
  9. Ahmad, M.S.; Lisa, M.; Khan, S. Comparative analysis of analytical hierarchy process (AHP) and frequency ratio (FR) models for landslide susceptibility mapping in Reshun, NW Pakistan. Kuwait J. Sci. 2023, 50, 387–398. [Google Scholar] [CrossRef]
  10. Shano, L.; Raghuvanshi, T.K.; Meten, M. Landslide susceptibility evaluation and hazard zonation techniques—A review. Geoenviron. Disasters 2020, 7, 18. [Google Scholar] [CrossRef]
  11. Yilmaz, I. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from Kat landslides (Tokat—Turkey). Comput. Geosci. 2009, 35, 1125–1138. [Google Scholar] [CrossRef]
  12. Yu, L.; Wang, Y.; Pradhan, B. Enhancing landslide susceptibility mapping incorporating landslide typology via stacking ensemble machine learning in Three Gorges Reservoir, China. Geosci. Front. 2024, 15, 101802. [Google Scholar] [CrossRef]
  13. Ba, Q.; Chen, Y.; Deng, S.; Wu, Q.; Yang, J.; Zhang, J. An Improved Information Value Model Based on Gray Clustering for Landslide Susceptibility Mapping. ISPRS Int. J. Geo-Inf. 2017, 6, 18. [Google Scholar] [CrossRef]
  14. Wang, Q.; Guo, Y.; Li, W.; He, J.; Wu, Z. Predictive modeling of landslide hazards in Wen County, northwestern China based on information value, weights-of-evidence, and certainty factor. Geomat. Nat. Hazards Risk 2019, 10, 820–835. [Google Scholar] [CrossRef]
  15. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  16. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  17. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  18. Guo, Z.; Tian, B.; Zhu, Y.; He, J.; Zhang, T. How do the landslide and non-landslide sampling strategies impact landslide susceptibility assessment?—A catchment-scale case study from China. J. Rock. Mech. Geotech. Eng. 2024, 16, 877–894. [Google Scholar] [CrossRef]
  19. Taalab, K.; Cheng, T.; Zhang, Y. Mapping landslide susceptibility and types using Random Forest. Big Earth Data 2018, 2, 159–178. [Google Scholar] [CrossRef]
  20. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. CATENA 2017, 151, 147–160. [Google Scholar] [CrossRef]
  21. Li, Y.; Liu, X.; Han, Z.; Dou, J. Spatial Proximity-Based Geographically Weighted Regression Model for Landslide Susceptibility Assessment: A Case Study of Qingchuan Area, China. Appl. Sci. 2020, 10, 1107. [Google Scholar] [CrossRef]
  22. Blaschke, T. Object based image analysis for remote sensing. ISPRS J. Photogramm. Remote Sens. 2010, 65, 2–16. [Google Scholar] [CrossRef]
  23. Zhang, K.; Wu, X.; Niu, R.; Yang, K.; Zhao, L. The assessment of landslide susceptibility mapping using random forest and decision tree methods in the Three Gorges Reservoir area, China. Environ. Earth Sci. 2017, 76, 405. [Google Scholar] [CrossRef]
  24. Trigila, A.; Iadanza, C.; Esposito, C.; Scarascia-Mugnozza, G. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 2015, 249, 119–136. [Google Scholar] [CrossRef]
  25. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar] [CrossRef]
  26. Li, J.; Cai, Y.; Li, Q.; Kou, M.; Zhang, T. A review of remote sensing image segmentation by deep learning methods. Int. J. Digit. Earth 2024, 17, 2328827. [Google Scholar] [CrossRef]
  27. Azarafza, M.; Azarafza, M.; Akgün, H.; Atkinson, P.M.; Derakhshani, R. Deep learning-based landslide susceptibility mapping. Sci. Rep. 2021, 11, 24112. [Google Scholar] [CrossRef]
  28. Thi Ngo, P.T.; Panahi, M.; Khosravi, K.; Ghorbanzadeh, O.; Kariminejad, N.; Cerda, A.; Lee, S. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran. Geosci. Front. 2021, 12, 505–519. [Google Scholar] [CrossRef]
  29. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
  30. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  31. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 87–110. [Google Scholar] [CrossRef]
  32. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030. [Google Scholar] [CrossRef]
  33. Khan, B.A.; Jung, J.-W. Semantic Segmentation of Aerial Imagery Using U-Net with Self-Attention and Separable Convolutions. Appl. Sci. 2024, 14, 3712. [Google Scholar] [CrossRef]
  34. Kugelman, J.; Allman, J.; Read, S.A.; Vincent, S.J.; Tong, J.; Kalloniatis, M.; Chen, F.K.; Collins, M.J.; Alonso-Caneiro, D. A comparison of deep learning U-Net architectures for posterior segment OCT retinal layer segmentation. Sci. Rep. 2022, 12, 14888. [Google Scholar] [CrossRef] [PubMed]
  35. Zhang, H.; Jiang, Z.; Zheng, G.; Yao, X. Semantic Segmentation of High-Resolution Remote Sensing Images with Improved U-Net Based on Transfer Learning. Int. J. Comput. Intell. Syst. 2023, 16, 181. [Google Scholar] [CrossRef]
  36. Alom, M.Z.; Hasan, M.; Yakopcic, C.; Taha, T.M.; Asari, V.K. Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation. arXiv 2018, arXiv:1802.06955. [Google Scholar] [CrossRef]
  37. Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef]
  38. Oktay, O.; Schlemper, J.; Folgoc, L.L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N.Y.; Kainz, B. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar] [CrossRef]
  39. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Stoyanov, D., Taylor, Z., Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.E.S., Bradley, A., Paulo Papa, J., Belagiannis, V., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 3–11. [Google Scholar] [CrossRef]
  40. Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016; Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 424–432. [Google Scholar] [CrossRef]
  41. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.-W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: New York, NY, USA, 2020; pp. 1055–1059. [Google Scholar] [CrossRef]
  42. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Trans. Med. Imaging 2020, 39, 1856–1867. [Google Scholar] [CrossRef]
  43. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need n.d. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  44. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 5718–5729. [Google Scholar] [CrossRef]
  45. Zheng, W.; Lu, S.; Yang, Y.; Yin, Z.; Yin, L. Lightweight transformer image feature extraction network. PeerJ Comput. Sci. 2024, 10, e1755. [Google Scholar] [CrossRef]
  46. Wang, W.; Xie, E.; Li, X.; Fan, D.-P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. PVT v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424. [Google Scholar] [CrossRef]
  47. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021; IEEE: New York, NY, USA, 2021; pp. 1833–1844. [Google Scholar] [CrossRef]
  48. Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Ning, J.; Cao, Y.; Zhang, Z.; Dong, L.; et al. Swin Transformer V2: Scaling Up Capacity and Resolution. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 11999–12009. [Google Scholar] [CrossRef]
  49. Liu, Z.; Ning, J.; Cao, Y.; Wei, Y.; Zhang, Z.; Lin, S.; Hu, H. Video Swin Transformer. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: New York, NY, USA, 2022; pp. 3192–3201. [Google Scholar] [CrossRef]
  50. Jiang, J.; Zhu, J.; Bilal, M.; Cui, Y.; Kumar, N.; Dou, R.; Su, F.; Xu, X. Masked Swin Transformer Unet for Industrial Anomaly Detection. IEEE Trans. Ind. Inform. 2023, 19, 2200–2209. [Google Scholar] [CrossRef]
Figure 1. The location of the study area (a) and the DEM characteristics (b).
Figure 2. The topographic characteristics of 6 different factors: (a) elevation; (b) slope; (c) aspect; (d) profile curvature; (e) plan curvature; and (f) TWI.
Figure 3. The characteristics of 2 different environmental factors: (a) the NDVI and (b) distance to rivers.
Figure 4. The distribution of fault influence intensity.
Figure 5. The distribution of human activities: (a) distance to roads and (b) land use.
Figure 6. The architecture of the U-Net model. Different types of modules are represented by different colors, and the arrows indicate the direction of feature propagation through the network.
Figure 7. The architecture of the Swin-UNet model. Different types of modules are represented by different colors, and the arrows indicate the direction of feature propagation through the network.
Figure 8. The PR curves of different models.
Figure 9. The ROC curves of different models. The dashed diagonal line indicates the performance of a random classifier (TPR = FPR; AUC = 0.5).
Figure 10. The comparison of Top-K precision: (a) the U-Net model and (b) Swin-UNet model.
Figure 11. The comparison of RA curves: (a) the U-Net model and (b) Swin-UNet model.
Figure 12. Landslide susceptibility maps: (a) the information value method; (b) U-Net model and (c) Swin-UNet model.
Figure 13. The spatial analysis of prediction errors: (a) land use; (b) distance to roads and (c) distance to rivers.
Figure 14. SHAP bar chart.
Figure 15. SHAP beeswarm plot.
Table 1. The Pearson correlation coefficients (reported to three decimals): I. distance to roads; II. distance to faults; III. elevation; IV. slope; V. aspect; VI. profile curvature; VII. plan curvature; VIII. the TWI; IX. land use; X. the NDVI; and XI. distance to rivers.

|  | I | II | III | IV | V | VI | VII | VIII | IX | X | XI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I | 1.00 | 0.076 | 0.180 | −0.182 | 0.044 | −0.013 | −0.009 | 0.071 | −0.001 | 0.122 | 0.376 |
| II | 0.076 | 1.00 | 0.794 | −0.040 | −0.128 | −0.008 | −0.006 | −0.007 | 0.073 | −0.110 | 0.092 |
| III | 0.180 | 0.794 | 1.00 | −0.133 | −0.090 | −0.087 | 0.005 | −0.032 | 0.002 | 0.007 | 0.363 |
| IV | −0.182 | −0.040 | −0.133 | 1.00 | −0.030 | −0.005 | 0.037 | −0.453 | 0.068 | 0.192 | −0.227 |
| V | 0.044 | −0.128 | −0.090 | −0.030 | 1.00 | 0.003 | −0.001 | −0.003 | −0.004 | 0.048 | 0.019 |
| VI | −0.013 | −0.008 | −0.087 | −0.005 | 0.003 | 1.00 | −0.386 | 0.292 | 0.040 | −0.071 | −0.025 |
| VII | −0.009 | −0.006 | 0.005 | 0.037 | −0.001 | −0.386 | 1.00 | −0.394 | −0.024 | 0.009 | −0.017 |
| VIII | 0.071 | −0.007 | −0.032 | −0.453 | −0.003 | 0.292 | −0.394 | 1.00 | 0.020 | −0.166 | 0.054 |
| IX | −0.001 | 0.073 | 0.002 | 0.068 | −0.004 | 0.040 | −0.024 | 0.020 | 1.00 | −0.256 | −0.045 |
| X | 0.122 | −0.110 | 0.007 | 0.192 | 0.048 | −0.071 | 0.009 | −0.166 | −0.256 | 1.00 | 0.076 |
| XI | 0.376 | 0.092 | 0.363 | −0.227 | 0.019 | −0.025 | −0.017 | 0.054 | −0.045 | 0.076 | 1.00 |
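As an illustration of how each entry of Table 1 is defined, the following minimal sketch computes a Pearson correlation coefficient in plain Python. The function name `pearson_r` and the sample values are hypothetical and are not taken from the study data; in practice the coefficients would be computed over all raster cells of two factor layers.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-cell values for two factors (illustrative only, not study data).
elevation = [520.0, 610.0, 700.0, 845.0, 990.0]
fault_distance = [0.4, 0.9, 1.1, 1.9, 2.2]
r = pearson_r(elevation, fault_distance)
print(f"r = {r:.3f}")
```

The matrix is symmetric with ones on the diagonal, so only the upper or lower triangle needs to be inspected; the largest off-diagonal value in Table 1 is 0.794, between distance to faults (II) and elevation (III).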
Table 2. The TOL and VIF of each factor (reported to three decimals).

| Factor | VIF | TOL |
|---|---|---|
| Distance to roads | 1.208 | 0.828 |
| Distance to faults | 3.275 | 0.305 |
| Elevation | 3.718 | 0.269 |
| Slope | 1.481 | 0.675 |
| Aspect | 1.023 | 0.978 |
| Profile curvature | 1.241 | 0.806 |
| Plan curvature | 1.342 | 0.745 |
| TWI | 1.626 | 0.615 |
| Land use type | 1.096 | 0.912 |
| NDVI | 1.207 | 0.829 |
| Distance to rivers | 1.487 | 0.673 |
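The two columns of Table 2 are linked by TOL = 1/VIF, and VIF itself is 1/(1 − R²), where R² comes from regressing one factor on all the others. The sketch below checks the first relation against the tabulated values and, as a rough plausibility check, derives the lower bound on VIF implied by the single strongest pairwise correlation in Table 1 (0.794). The helper names are hypothetical and the check is not part of the authors' workflow.

```python
# Values transcribed from Table 2: factor -> (VIF, TOL).
table2 = {
    "Distance to roads": (1.208, 0.828),
    "Distance to faults": (3.275, 0.305),
    "Elevation": (3.718, 0.269),
    "Slope": (1.481, 0.675),
    "Aspect": (1.023, 0.978),
    "Profile curvature": (1.241, 0.806),
    "Plan curvature": (1.342, 0.745),
    "TWI": (1.626, 0.615),
    "Land use type": (1.096, 0.912),
    "NDVI": (1.207, 0.829),
    "Distance to rivers": (1.487, 0.673),
}

def tolerance(vif):
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif

def vif_from_r_squared(r_squared):
    """VIF = 1 / (1 - R^2), with R^2 from regressing one factor on the others."""
    return 1.0 / (1.0 - r_squared)

# Largest deviation between the tabulated TOL and 1 / VIF (rounding only).
max_gap = max(abs(tolerance(vif) - tol) for vif, tol in table2.values())

# Regressing elevation on distance to faults alone gives R^2 = r^2 (r = 0.794).
# Adding further regressors cannot lower R^2, so this is a lower bound on VIF.
pairwise_vif = vif_from_r_squared(0.794 ** 2)

print(f"max |1/VIF - TOL| = {max_gap:.4f}")
print(f"pairwise lower bound on VIF = {pairwise_vif:.3f}")
```

Under a commonly used rule of thumb (VIF below 10, equivalently TOL above 0.1), none of the eleven factors in Table 2 would be flagged for multicollinearity.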
Table 3. The hyperparameter settings of the Swin-UNet model.

| Parameter | Value | Function | Selection Rationale |
|---|---|---|---|
| Optimizer | AdamW | Parameter update | Provides stable convergence for segmentation tasks and is commonly paired with Transformer encoders. |
| Learning rate | 3 × 10⁻⁴ | Gradient step size | Determined through multiple trials; larger values cause oscillations, while smaller values slow convergence. |
| Weight decay | 0.05 | Overfitting suppression | Tuned through multiple experiments; 0.05 effectively suppresses overfitting. |
| Epochs | 50 | Total training iterations | After tuning, 50 epochs are sufficient to achieve stable performance. |
| Batch size | 8 | Number of samples per training step | A trade-off between Graphics Processing Unit (GPU) memory usage and gradient stability. |
| Data loading workers | 4 | DataLoader parallelism | Typical configuration considering Windows Operating System (OS) and Solid State Drive (SSD) performance. |
| Loss function | BCE + Dice (weighted) | Optimizes pixel classification and region overlap | Dice is more stable under class imbalance, while BCE preserves probability separability. |
| BCE weight | 0.3 | Controls the contribution of BCE | Emphasizing region overlap can improve spatial continuity and IoU. |
| Dice weight | 0.7 | Controls the contribution of Dice | Same as above |
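To make the weighted loss in Table 3 concrete, the following plain-Python sketch combines binary cross-entropy and a soft Dice loss with the tabulated weights (0.3 and 0.7). It is an illustration under stated assumptions rather than the authors' implementation: the exact Dice formulation, the smoothing constant `smooth`, the clamping constant `eps`, and the sample pixel values are all assumed, and the study itself used PyTorch tensors rather than Python lists.

```python
import math

BCE_WEIGHT = 0.3   # from Table 3
DICE_WEIGHT = 0.7  # from Table 3

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; eps (assumed) keeps log() away from zero."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*sum(p*y) + smooth) / (sum(p) + sum(y) + smooth)."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

def combined_loss(probs, targets):
    """Weighted sum of BCE and Dice, using the weights listed in Table 3."""
    return (BCE_WEIGHT * bce_loss(probs, targets)
            + DICE_WEIGHT * dice_loss(probs, targets))

# Hypothetical flattened patch: 1 = landslide pixel, 0 = background.
targets = [0, 0, 0, 0, 0, 0, 1, 1]
good = [0.05, 0.10, 0.05, 0.02, 0.10, 0.05, 0.90, 0.85]
poor = [0.40, 0.30, 0.50, 0.20, 0.60, 0.40, 0.30, 0.20]

print(f"good prediction loss: {combined_loss(good, targets):.3f}")
print(f"poor prediction loss: {combined_loss(poor, targets):.3f}")
```

Because the Dice term is normalised by the size of the predicted and true regions rather than by the total pixel count, it stays informative when landslide pixels are a small minority, which matches the rationale given in the table for weighting it more heavily.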
Table 4. The hardware and software configurations used in the experiments.

| Experimental Platform | Specification |
|---|---|
| Central Processing Unit (CPU) | AMD Ryzen 9 7945HX with Radeon Graphics (Manufacturer: Advanced Micro Devices, Inc., Santa Clara, CA, USA) |
| GPU | NVIDIA GeForce RTX 4060 (Manufacturer: Nvidia Corporation, Santa Clara, CA, USA) |
| GPU Memory (GRAM) | 8 GB |
| OS | Windows 11 |
| Development Environment | Python 3.10.18 + PyTorch 2.7.1 |
Table 5. The evaluation results of the information value method and the U-Net and Swin-UNet models.

| Evaluation Metric | Information Value Method | U-Net Model | Swin-UNet Model |
|---|---|---|---|
| AUROC | 0.136 | 0.996 | 0.988 |
| AUPRC | 0.18 | 0.926 | 0.933 |
| F1 | 0.155 | 0.941 | 0.927 |
| IoU | 0.084 | 0.889 | 0.864 |
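For a binary map, F1 and IoU computed from the same confusion matrix are tied by the identity IoU = F1 / (2 − F1). The sketch below defines both metrics from pixel counts and checks that the F1 and IoU rows of Table 5 satisfy the identity to within rounding. The function names and the pixel counts `tp`, `fp`, `fn` are hypothetical and serve only to demonstrate the relationship.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

def iou(tp, fp, fn):
    """IoU (Jaccard index) = TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)

def iou_from_f1(f1):
    """Identity that holds when both metrics share one confusion matrix."""
    return f1 / (2.0 - f1)

# (F1, IoU) pairs transcribed from Table 5.
table5 = {
    "Information value method": (0.155, 0.084),
    "U-Net": (0.941, 0.889),
    "Swin-UNet": (0.927, 0.864),
}

# Hypothetical pixel counts, used only to demonstrate the identity.
tp, fp, fn = 80, 10, 10
print(f"F1 = {f1_score(tp, fp, fn):.3f}, IoU = {iou(tp, fp, fn):.3f}")

for name, (f1, reported_iou) in table5.items():
    implied = iou_from_f1(f1)
    print(f"{name}: reported IoU {reported_iou:.3f}, implied by F1 {implied:.3f}")
```

Both F1 and IoU depend on the probability threshold used to binarise the susceptibility map, whereas AUROC and AUPRC summarise performance across all thresholds, so the two groups of rows in Table 5 need not rank the models in the same order.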
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, J.; Ran, X.; Wang, X. Intelligent Landslide Susceptibility Assessment Framework Using the Swin Transformer Technique: A Case Study of Changbai County, Jilin Province, China. Appl. Sci. 2026, 16, 301. https://doi.org/10.3390/app16010301