1. Introduction
Permafrost, covering approximately 24% of the Northern Hemisphere’s land surface, acts as a critical stabilizer for the cryosphere but is undergoing rapid degradation due to global climate warming [1,2]. This thermal disturbance has triggered a surge in thermokarst landforms, most notably retrogressive thaw slumps (RTSs), which are dynamic, horseshoe-shaped slope failures driven by the ablation of massive ground ice [3,4]. Unlike gradual thawing, RTSs represent an abrupt permafrost collapse that significantly alters surface topography, releases stored carbon, and disrupts hydrological systems. The Qinghai–Tibet Plateau (QTP), characterized by warm and sensitive high-altitude permafrost, has become a hotspot for such geohazards under the dual influence of warming and wetting trends [5,6,7,8]. In particular, the Yangtze River Source Region faces increasing threats from expanding RTSs, which not only jeopardize the fragile alpine ecosystem but also pose severe risks to the stability of the Qinghai–Tibet engineering corridor [9], highlighting an urgent need for precise spatial prediction and assessment.
The identification and monitoring of retrogressive thaw slumps are prerequisites for assessing their environmental impacts. Generally, detection methods fall into three categories: field surveys, Unmanned Aerial Vehicle (UAV) photogrammetry, and satellite remote sensing [10,11,12]. Field surveys [13] provide the most detailed ground truth but are severely limited by harsh environmental conditions and poor accessibility in high-altitude regions. UAV photogrammetry offers ultra-high resolution and 3D terrain reconstruction capabilities [14,15], making it ideal for monitoring the micro-topographic changes in individual slumps; however, it is impractical for regional-scale mapping due to limited flight range and high costs. Consequently, satellite remote sensing, with its advantage of extensive spatial coverage and high temporal frequency, has become the dominant tool for identifying RTSs at regional to continental scales [16,17,18]. Historically, manual visual interpretation of optical satellite imagery and aerial photography has been regarded as the standard method for RTS inventorying. This approach relies on the expert recognition of distinct geomorphological features, such as the arcuate headwall and the muddy slump floor. Numerous studies have demonstrated the efficacy of this method. For instance, in the Arctic, Lantz and Kokelj (2008) [19] manually digitized over 500 slumps in the Mackenzie Delta using aerial photos to study their density and distribution. Similarly, Balser et al. (2014) [20] utilized time-series SAR and optical images to visually determine the onset timing of RTSs in Alaska. On the Qinghai–Tibet Plateau, Luo et al. (2019) [4] and Yang et al. (2023) [15] successfully compiled long-term RTS inventories for the Beiluhe region using multi-source high-resolution imagery (e.g., PlanetScope, SPOT-5). The primary advantage of manual interpretation lies in its high accuracy; human experts can effectively distinguish active RTSs from stabilized ones or other confusing landforms by integrating contextual information like vegetation succession and terrain texture [1,14,21].
In recent years, to overcome the labor-intensive nature of manual mapping, automated methods based on machine learning and deep learning have emerged. Studies have explored algorithms ranging from Random Forest and Support Vector Machines [17] to advanced deep learning architectures like U-Net and DeepLabv3+ [22,23]. While these automated methods show promise in specific test areas, they face significant challenges in transferability and robustness when applied to complex terrains like the QTP. As noted by Nitze et al. (2018) [24] and Xia et al. (2022) [25], the diversity in RTS morphology, scale, and spectral characteristics often leads to misclassification. Furthermore, the reliance on medium-resolution open-source data, such as Landsat and Sentinel-2, presents inherent limitations. Firstly, their spatial resolution (10–30 m) is often insufficient to detect small-scale or incipient RTSs, leading to omission errors. Secondly, the QTP is frequently subject to persistent cloud cover and seasonal snow, which contaminate optical signals and obscure surface features. Thirdly, the QTP landscape contains many features that are easily confused with RTSs, such as erosional gullies, landslides, and barren patches, whose spectral signatures resemble those of RTSs and cause high false-positive rates at medium resolution. These factors, combined with complex topographic effects (e.g., mountain shadows) and the temporal variability of thaw processes, limit the effectiveness of automated detection in this region. Given these limitations, although deep learning represents a future trend, manual visual interpretation remains the most reliable method for generating high-precision RTS datasets in complex permafrost terrains [26,27]. It avoids the “black box” uncertainties of AI models and ensures that the identified features are geomorphologically valid. Therefore, considering the dynamic nature of RTSs and the complex background of the Yangtze River Source Region, this study adopts expert-based visual interpretation of high-resolution satellite imagery to ensure the accuracy and reliability of the RTS inventory.
While field investigations and remote sensing have successfully identified the regional occurrence and dynamics of RTSs [20,28,29], understanding the complex environmental controls behind these distributions remains a challenge. This necessitates a shift from identifying existing slumps to assessing landslide susceptibility, which is defined as the spatial likelihood of slope failure occurrence determined by intrinsic controlling factors (e.g., topography, lithology, and hydrology) [30]. Unlike hazard assessment, which incorporates temporal triggers, susceptibility focuses on predicting “where” potential instability exists based on the fundamental assumption that future failures are likely to occur under conditions similar to past events [31,32]. To quantify this spatial probability, Machine Learning (ML) approaches have rapidly advanced. Compared to physically based models, ML algorithms offer distinct advantages in computational efficiency and the ability to handle non-linear relationships in high-dimensional datasets [33,34,35], making them highly effective for modeling permafrost landform dynamics [36,37] and assessing regional RTS susceptibility [38,39]. However, compared to the mature field of general landslide susceptibility mapping, research specifically targeting RTSs remains underdeveloped. In general landslide studies, robust methodologies for feature selection and model construction are well-established, utilizing a wide array of geological and topographical variables. Current trends have also moved towards ensemble learning techniques to boost prediction accuracy by combining multiple weak classifiers [39,40]. Specifically, algorithms based on Bagging (e.g., Random Forest) and Boosting (e.g., XGBoost, LightGBM) frameworks have become the standard. These ensemble methods effectively mitigate the bias-variance trade-off inherent in single classifiers and demonstrate superior robustness in capturing complex, non-linear interactions among environmental variables. In contrast, RTS susceptibility modeling has not fully exploited these advancements. A critical oversight is the underutilization of spectral features. The development of RTSs involves specific surface processes, namely, the stripping of vegetation and the thawing of ice-rich permafrost, that manifest as distinct spectral signatures. These include the reflectance patterns of exposed soil, significant moisture signals from thawing ice, and clear indicators of vegetation disturbance compared to stable slopes. Capturing these water-related indicators and surface changes is crucial for physically interpreting susceptibility, yet they are often overlooked. The Qinghai–Tibet Plateau is characterized by extensive exposed terrain and sparse vegetation, providing an ideal background for optical remote sensing to capture subtle variations in soil moisture, texture, and composition. While such multi-dimensional spectral indices are widely used in environmental monitoring, they are rarely incorporated into RTS susceptibility frameworks [41,42]. Furthermore, the potential of ensemble learning to mitigate the bias and variance of single classifiers, a standard practice in landslide research, has not been systematically explored for RTSs.
To address the aforementioned limitations and bridge the gap between general landslide modeling and specific permafrost hazard assessment, this study proposes a comprehensive framework for RTS susceptibility mapping on the Qinghai–Tibet Plateau. Specifically, this research focuses on three innovative aspects: (i) constructing a high-dimensional feature space that, for the first time, integrates optical spectral features (e.g., indices of soil moisture and texture) with traditional topographical and hydrological factors, leveraging the unique exposed terrain of the QTP; (ii) deploying advanced ensemble learning strategies to overcome the performance bottlenecks and variance limitations of individual classifiers; and (iii) conducting a quantitative analysis of factor importance to disentangle the complex environmental controls driving RTS occurrence. Ultimately, this study aims to establish a high-precision susceptibility model and produce a reliable zonation map, providing critical insights for disaster risk management and infrastructure protection in fragile permafrost regions.
2. Study Area
The Yangtze River Source Region (YRSR), covering an area of 90,463.88 km², is located in the central Tibetan Plateau (
Figure 1a) and constitutes one of the three headwater regions of the Sanjiangyuan (Three-River-Source) area. The region is characterized by high-elevation terrain ranging from 4000 to 6800 m a.s.l. (
Figure 1b), with a mean elevation exceeding 4500 m. Geomorphologically, the landscape exhibits typical plateau basin-and-range topography, dominated by broad intermontane valleys, gentle alluvial terraces, and degraded alpine surfaces interspersed with residual mountain ridges. Geologically, the area lies within the Qiangtang–Qinghai composite terrane, featuring late Paleozoic to Mesozoic marine sedimentary sequences (limestone, sandstone, mudstone) overlain by Cenozoic lacustrine and fluvial deposits. Quaternary unconsolidated sediments—particularly ice-rich silty clays and peaty deposits exceeding 5–10 m thickness in valley bottoms—provide the critical substrate for thermokarst development. Permafrost conditions are predominantly continuous (>90% coverage), with active layer thickness ranging from 1.5 to 3.5 m and mean annual ground temperatures between −3 °C and −1 °C in degradation-prone zones. The region serves as the primary water source for the Yangtze River, with major tributaries including the Tuotuo, Dangqu, and Chumar rivers, which collectively contribute approximately 25% of the Yangtze’s annual discharge through seasonal snowmelt and permafrost baseflow.
In recent decades, accelerated climate warming and humidification across the Tibetan Plateau—with regional temperatures increasing at rates 2–3 times the global average (0.3–0.5 °C/decade since 1980)—have triggered widespread thermokarst landscape transformation. The YRSR has experienced exponential proliferation of retrogressive thaw slumps and thermokarst lakes, with RTS numbers increasing from fewer than 200 documented features prior to 2000 to over 2000 active slumps by 2024 (
Figure 1c). The spatial distribution of RTSs exhibits strong clustering along the Qinghai–Tibet Engineering Corridor (Qinghai–Tibet Highway and Railway) and within the Hoh Xil Uninhabited Zone, reflecting both anthropogenic disturbance and natural climate-driven degradation hotspots. The rapid thawing of permafrost not only releases significant amounts of stored organic carbon, exacerbating greenhouse gas emissions, but also threatens the stability of critical infrastructure within the Qinghai–Tibet engineering corridor (
Figure 1d–g), posing challenges to regional ecological security and sustainable development.
3. Materials and Methods
This study adopts a systematic three-stage methodological framework (
Figure 2) to evaluate RTS susceptibility in the Yangtze River Source Region.
- (i) Phase I (RTS inventory development and feature dataset construction) established a high-precision 2024 RTS inventory through rigorous visual interpretation of sub-meter satellite imagery (Jilin-1/Beijing-2), overcoming the resolution limitations of publicly available data. A multi-dimensional feature dataset was compiled, integrating static topographic controls with dynamic time-series spectral indices (2019–2024) to effectively capture surface thermal processes associated with permafrost degradation.
- (ii) Phase II (machine learning modeling) implemented and compared eight distinct algorithms, ranging from traditional statistical methods to state-of-the-art ensemble learning frameworks (e.g., XGBoost, LightGBM, CatBoost). Rigorous hyperparameter optimization and cross-validation were employed to identify the model with the highest generalization capability for the plateau environment.
- (iii) Phase III (susceptibility analysis and mapping) conducted systematic feature importance experiments to quantify the incremental value of spectral data. The SHAP framework was utilized to interpret nonlinear environmental drivers and mechanistic controls, and the optimal model was deployed to generate a spatially continuous, stratified RTS susceptibility map for regional hazard assessment.
3.1. Phase I: RTS Inventory Development and Feature Dataset Construction
The primary objective of Phase I was to construct a comprehensive and reliable inventory of retrogressive thaw slumps across the entire Yangtze River Source Region for the year 2024. We compiled the RTS inventory using high-resolution satellite imagery from the Jilin-1 and Beijing-2 platforms, accessed through the freely available Omap portal. The baseline inventory was derived from open-source datasets provided by Luo et al. (2019) [4] and Xia et al. (2024) [43]. Building upon these foundations, we systematically verified all existing records and identified new RTSs through detailed visual interpretation of high-resolution imagery.
While semantic segmentation algorithms have been increasingly employed for automated RTS detection in recent studies, their reliable application fundamentally depends on the availability of high-resolution imagery. Currently, commercial high-resolution satellites (e.g., Jilin-1, Beijing-2, GF series) typically require substantial financial investment for data acquisition, whereas freely accessible alternatives like PlanetScope imagery (3 m spatial resolution) exhibit limited effectiveness for RTS identification on the Tibetan Plateau. While RTSs possess distinct morphological boundaries, including well-defined headwalls, exposed ground ice, and lobate debris tongues, these features become increasingly difficult to delineate at moderate spatial resolutions (≥3 m), particularly for smaller-scale slumps where spectral mixing and insufficient spatial detail hinder accurate boundary detection.
Figure 3 illustrates this critical resolution dependency through three representative examples (Sites A, B, C) comparing PlanetScope (3 m) and Beijing-2 (0.8 m) imagery for the same locations in 2024. In PlanetScope imagery, RTS boundaries appear ambiguous and are often indistinguishable from surrounding erosional features, making it difficult to confidently identify slump extent or even confirm RTS presence. In contrast, Beijing-2 imagery clearly reveals diagnostic morphological characteristics—sharp headwall scarps, smooth mudflow surfaces, and distinct color contrasts between exposed mineral soil and adjacent vegetation, enabling accurate delineation of RTS boundaries and confident differentiation from other mass wasting processes. This visual comparison validates our methodological decision to prioritize sub-meter resolution imagery for inventory construction.
To ensure systematic coverage and data quality, we implemented a rigorous two-stage verification protocol (
Figure 3). First, the entire YRSR was subdivided into fourteen spatially distinct subregions to facilitate systematic interpretation and prevent positional ambiguity arising from navigating such an extensive study area without spatial partitioning. Each subregion was independently assigned to one of five professional researchers with expertise in geomorphology and remote sensing for initial RTS identification through visual interpretation within the Omap platform. Second, following the initial identification phase, a different team member performed independent verification and supplementary screening within the same subregion to identify overlooked features and confirm uncertain cases. This cross-validation approach, combined with high-resolution imagery access, minimized both omission errors (missing actual RTSs) and commission errors (misclassifying other features as RTSs), ensuring the completeness and reliability of the final 2024 RTS inventory.
To facilitate susceptibility analysis of RTSs, we constructed a comprehensive multi-dimensional feature dataset integrating traditional topographic variables, land cover characteristics, and dynamic spectral indices. The feature selection was designed to capture the multifaceted environmental controls on RTS occurrence and activity. Traditional topographic and environmental features included digital elevation model (DEM)-derived variables and categorical land surface characteristics (
Figure 4). The SRTM DEM (30 m spatial resolution) served as the foundation for extracting aspect, slope, plan curvature, and profile curvature using spatial analysis functions in ArcGIS Pro 3.17. Vegetation type classification was obtained from Zhou et al. (2025) [
44], who developed an innovative methodology for long-term continuous annual vegetation mapping based on reference vegetation maps with annual updates, producing a 500 m resolution vegetation map of the Qinghai–Tibet Plateau for 2022 using MOD09A1 data. Landform classification was derived from the SRTM Landform dataset, which provides landform classes created by integrating the Continuous Heat-Insolation Load Index and the multi-scale Topographic Position Index available in Google Earth Engine [
45]. Surface water distribution was characterized using the Global Surface Water dataset, which contains spatiotemporal maps of surface water extent from 1984 to 2021 [
46]. We utilized the total observation count layer to calculate Euclidean distance to water bodies using the spatial analyst tools in ArcGIS Pro, capturing the proximity of potential RTS sites to hydrological features that may influence ground thermal regimes.
Given that exposed headwalls and disturbed surfaces constitute defining characteristics of RTSs, spectral indices provide valuable metrics for tracking and quantifying surface conditions associated with permafrost degradation. Accordingly, we constructed a comprehensive suite of spectral indices using multi-sensor satellite data for the period 2019–2024, focusing on summer acquisitions to capture dynamic surface processes during the peak thaw season (
Figure 5). Based on Sentinel-2 MSI imagery, we derived standard indices including the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), Normalized Difference Moisture Index (NDMI), and Normalized Burn Ratio (NBR). Additionally, the Tasseled Cap transformation components, Brightness (TCB), Greenness (TCG), and Wetness (TCW), were calculated using the specific coefficients for Sentinel-2 proposed by Shi et al. (2019) [
47]. Land Surface Temperature (LST) was extracted from Landsat 8/9 Thermal Infrared Sensor data, providing direct measurements of thermal forcing driving ground ice thaw. Net Degree-Days (NDD), calculated as the annual net balance of Thawing and Freezing Degree-Days, were derived from MODIS MOD11A1 thermal data to quantify the cumulative surface energy balance influencing active layer dynamics. All spectral processing and data integration were executed on the Google Earth Engine platform. To facilitate research reproducibility and data reuse, the complete dataset of derived spectral features has been open-sourced. These data, provided by Taorui Zeng, are accessible via the National Tibetan Plateau Data Center and the Third Pole Environment Data Center (e.g.,
https://cstr.cn/18406.11.Terre.tpdc.303004 (accessed on 16 November 2025)).
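As an illustration of the index definitions above, the following minimal sketch computes the normalized-difference indices, EVI, and NDD for a single pixel. It is not the Google Earth Engine workflow used in this study: the function names are hypothetical, the band arguments are assumed to be surface reflectance on a 0–1 scale, the EVI coefficients are the commonly used standard ones, and NDD is assumed here to be thawing degree-days minus freezing degree-days.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else None


def spectral_indices(blue, red, nir, swir1, swir2):
    """Compute common indices from surface reflectance values (0-1 scale)."""
    evi_denominator = nir + 6.0 * red - 7.5 * blue + 1.0
    return {
        "NDVI": normalized_difference(nir, red),    # vegetation vigor
        "NDMI": normalized_difference(nir, swir1),  # canopy/soil moisture
        "NBR": normalized_difference(nir, swir2),   # surface disturbance
        "EVI": (2.5 * (nir - red) / evi_denominator
                if evi_denominator != 0 else None),
    }


def net_degree_days(daily_mean_temps_c):
    """Assumed definition: thawing degree-days minus freezing degree-days."""
    thawing = sum(t for t in daily_mean_temps_c if t > 0)
    freezing = sum(-t for t in daily_mean_temps_c if t < 0)
    return thawing - freezing


# Hypothetical reflectance values for one pixel.
pixel = spectral_indices(blue=0.05, red=0.10, nir=0.30, swir1=0.20, swir2=0.10)
```

A sparsely vegetated, disturbed surface would typically show lower NDVI and NBR than the surrounding undisturbed slope, which is the contrast these indices are intended to capture.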
All feature layers were resampled and standardized to a uniform 30 m × 30 m spatial resolution and reprojected to a consistent coordinate reference system to ensure geometric consistency across the dataset. The polygon boundaries of the RTS inventory were converted to point features at 30 m spacing, yielding a total of 62,401 RTS presence points. To construct a balanced dataset for machine learning applications, an equal number of non-RTS points were randomly sampled from areas located beyond a 300 m buffer zone surrounding mapped RTSs, ensuring spatial independence between presence and absence samples. The final dataset was partitioned into training (70%) and testing (30%) subsets using stratified random sampling to maintain representative distributions of environmental conditions in both subsets.
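The sampling logic described above can be sketched as follows. This is a simplified illustration rather than the procedure actually run in this study: the helper names are hypothetical, coordinates are assumed to be in a projected system with metre units, the brute-force distance check would need a spatial index at the scale of 62,401 points, and stratification is assumed to be by class label.

```python
import math
import random


def beyond_buffer(point, presence_points, buffer_m=300.0):
    """True if the point lies farther than buffer_m from every RTS point."""
    return all(
        math.hypot(point[0] - p[0], point[1] - p[1]) > buffer_m
        for p in presence_points
    )


def stratified_split(samples, labels, test_fraction=0.3, seed=42):
    """Split samples into train/test lists while preserving class ratios."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    train, test = [], []
    for label, members in sorted(by_class.items()):
        members = members[:]          # copy so the caller's data is untouched
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test += [(m, label) for m in members[:n_test]]
        train += [(m, label) for m in members[n_test:]]
    return train, test


# Hypothetical coordinates: two RTS points and four candidate absence points.
presence = [(0.0, 0.0), (30.0, 0.0)]
candidates = [(100.0, 0.0), (400.0, 0.0), (0.0, 500.0), (1000.0, 1000.0)]
absence = [c for c in candidates if beyond_buffer(c, presence)]
```

Splitting each class separately keeps the 1:1 presence-to-absence ratio identical in the training and testing subsets.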
3.2. Phase II: Machine Learning Modeling
To evaluate RTS susceptibility across the YRSR, we implemented and compared eight widely used machine learning algorithms, each offering distinct advantages for capturing complex nonlinear relationships between environmental features and RTS occurrence. The selected algorithms encompassed both traditional statistical methods and state-of-the-art ensemble learning techniques.
Logistic Regression (LR) served as the baseline statistical model, providing interpretable probability estimates through a linear combination of input features transformed by the logistic function. Despite its simplicity, LR offers computational efficiency and serves as a benchmark for assessing the performance gains achieved by more sophisticated algorithms. Multi-Layer Perceptron Neural Network (MLPNN) represented the neural network approach, capable of approximating arbitrary nonlinear functions through multiple hidden layers with nonlinear activation functions. The MLPNN architecture enables the learning of hierarchical feature representations from input data. K-Nearest Neighbors (KNN) employed an instance-based learning strategy, classifying samples based on the majority class among their k nearest neighbors in feature space. This non-parametric approach makes no assumptions about underlying data distributions and can capture localized patterns in the feature space.
Gradient Boosting Decision Tree (GBDT) implements the foundational gradient boosting framework, sequentially constructing an ensemble of decision trees where each subsequent tree corrects the residual errors of the previous ensemble. GBDT employs gradient descent optimization in function space to minimize a differentiable loss function, providing a robust and flexible approach to capturing complex feature interactions and nonlinear relationships. Random Forest (RF) utilized an ensemble of decision trees trained on bootstrap samples with random feature selection at each split, combining predictions through majority voting. This bagging approach reduces overfitting and provides inherent feature importance measures through mean decrease in impurity.
XGBoost, LightGBM, and CatBoost represented advanced gradient boosting frameworks that extend the foundational GBDT approach with algorithmic innovations. XGBoost implements regularized gradient boosting with tree pruning, parallel computation, and built-in handling of missing values, achieving improved computational efficiency and predictive performance. LightGBM employs a histogram-based algorithm with leaf-wise tree growth strategy and Gradient-based One-Side Sampling, achieving superior computational efficiency for large-scale datasets while maintaining accuracy. CatBoost incorporates ordered boosting and native categorical feature processing with symmetric tree structures, reducing prediction shift and overfitting risks through novel algorithmic design. These gradient boosting methods have demonstrated superior performance in numerous geoscience applications due to their ability to handle feature interactions and nonlinear relationships.
All models were implemented in Python 3.7 using scikit-learn, XGBoost, LightGBM, and CatBoost libraries. Hyperparameter optimization was performed using grid search with 5-fold cross-validation on the training dataset to identify optimal configurations for each algorithm.
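The tuning procedure can be illustrated with a library-free sketch of grid search combined with k-fold cross-validation. It is not the scikit-learn code used in this study: `k_fold_indices`, `grid_search`, and the `fit_and_score` callback are hypothetical names, and the folds are contiguous, so the data are assumed to have been shuffled beforehand.

```python
from itertools import product


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def grid_search(param_grid, fit_and_score, n_samples, k=5):
    """Return the parameter combination with the best mean validation score.

    fit_and_score(params, train_idx, val_idx) is a user-supplied callback that
    trains a model on train_idx and returns its score on val_idx.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [
            fit_and_score(params, train_idx, val_idx)
            for train_idx, val_idx in k_fold_indices(n_samples, k)
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Because every candidate configuration is scored only on folds drawn from the training subset, the independent test subset remains untouched until the final evaluation.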
Model performance was assessed using five complementary evaluation metrics calculated from the confusion matrix of predictions on the independent test dataset. Area Under the Receiver Operating Characteristic Curve (AUC) quantifies the overall discriminative ability of the model across all classification thresholds, with values ranging from 0.5 (random classifier) to 1.0 (perfect classifier). AUC provides a threshold-independent measure particularly valuable for imbalanced datasets. Accuracy measures the proportion of correctly classified samples (both RTS and non-RTS) relative to the total number of samples, providing a general assessment of classification correctness. Precision (positive predictive value) calculates the proportion of true RTS predictions among all samples predicted as RTS, indicating the model’s ability to avoid false alarms. Recall (sensitivity or true positive rate) quantifies the proportion of actual RTS samples correctly identified by the model, reflecting the completeness of RTS detection. F1-Score represents the harmonic mean of precision and recall, providing a balanced measure that accounts for both false positives and false negatives, particularly useful when class distributions are unequal. These metrics collectively provide a comprehensive evaluation framework, enabling assessment of model performance from multiple perspectives including overall accuracy, classification balance, and threshold-independent discriminative power.
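For reference, the five metrics can be written out directly from their definitions. The sketch below is illustrative only (the function names are hypothetical); AUC is computed with the pairwise-ranking formulation, which is adequate for small examples but slower than library implementations on large datasets.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from binary labels (1 = RTS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_score(y_true, scores):
    """Probability that a random RTS sample outranks a random non-RTS sample."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

Accuracy, precision, recall, and F1 depend on the threshold used to convert probabilities into class labels, whereas AUC is computed from the predicted probabilities themselves.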
3.3. Phase III: Susceptibility Analysis and Mapping
3.3.1. Feature Dataset Analysis
To systematically evaluate the contribution of spectral features to RTS susceptibility modeling, we designed four comparative experiments with progressively increasing feature complexity. Each experiment was specifically designed to test a distinct hypothesis regarding the role of static versus dynamic environmental variables.
Experiment 1 (Conventional Features Only): This baseline experiment utilized only traditional topographic and environmental variables, including DEM-derived metrics (aspect, slope, curvatures), landform, vegetation types, and distance to water body. The purpose of this experiment is to establish a performance benchmark representing the standard approach employed in most existing landslide and permafrost hazard susceptibility studies, against which the added value of spectral data can be measured.
Experiment 2 (Spectral Features Only): This experiment employed exclusively multi-temporal spectral indices (NDVI, EVI, NDMI, NBR, TCB, TCG, TCW, LST, NDD) derived from satellite observations during 2019–2024. This configuration aims to test the hypothesis that dynamic surface spectral characteristics alone—capturing processes like moisture change and vegetation loss—contain sufficient information to identify RTS susceptibility zones, even without explicit topographic data.
Experiment 3 (Conventional and 2024 Spectral Features): This experiment combined conventional features with spectral indices from 2024 only. This setup is designed to evaluate whether a “snapshot” of recent surface conditions is sufficient to enhance model performance, thereby testing the trade-off between model simplicity (fewer temporal dimensions) and predictive accuracy.
Experiment 4 (All Features): This comprehensive experiment integrated all conventional features with multi-temporal spectral indices from the entire 2019–2024 period. This final configuration tests the core hypothesis of this study: that the integration of static environmental controls (topography) with long-term dynamic process indicators (multi-temporal spectral trends) yields the highest predictive performance by capturing the complete spatiotemporal context of RTS evolution. All eight machine learning algorithms were trained and evaluated under each experimental configuration using identical training-testing data partitions. Performance metrics were compared across experiments to quantify the incremental value of spectral information and identify the optimal feature combination for RTS susceptibility assessment.
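The four feature configurations can be summarized as a simple lookup of column names. The sketch below is illustrative: the column names are hypothetical, and it assumes one value per spectral index per year, which the text does not state explicitly.

```python
# Hypothetical column names for the conventional (static) features.
CONVENTIONAL = [
    "aspect", "slope", "plan_curvature", "profile_curvature",
    "landform", "vegetation_type", "distance_to_water",
]
INDICES = ["NDVI", "EVI", "NDMI", "NBR", "TCB", "TCG", "TCW", "LST", "NDD"]
YEARS = range(2019, 2025)  # 2019-2024 inclusive


def spectral_columns(years):
    """One column per index per year (assumed annual summer composites)."""
    return [f"{index}_{year}" for year in years for index in INDICES]


EXPERIMENTS = {
    "exp1_conventional": CONVENTIONAL,
    "exp2_spectral": spectral_columns(YEARS),
    "exp3_conventional_plus_2024": CONVENTIONAL + spectral_columns([2024]),
    "exp4_all": CONVENTIONAL + spectral_columns(YEARS),
}

for name, columns in EXPERIMENTS.items():
    print(name, len(columns))
```

Under these assumptions the feature count grows from 7 conventional variables in Experiment 1 to 61 variables in Experiment 4.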
To identify the primary environmental controls governing RTS occurrence, we conducted feature importance analysis using the four advanced ensemble learning models (RF, XGBoost, LightGBM, CatBoost), which provide inherent mechanisms for quantifying feature contributions. Each algorithm employs distinct importance calculation methods: RF utilizes mean decrease in impurity, while the gradient boosting frameworks (XGBoost, LightGBM, CatBoost) compute importance based on the frequency and gain of feature usage in tree splits, offering complementary perspectives on feature contributions. Recognizing that different algorithms may yield importance scores on different scales and with varying magnitudes, we developed an ensemble importance metric to synthesize insights across models. For each of the four models, feature importance scores were normalized to the range [0, 1] using Min-Max scaling, eliminating scale differences and enabling cross-model comparison. The normalized importance scores from all four models were averaged for each feature, producing a consensus importance metric that integrates the perspectives of multiple algorithms. Standard deviation, minimum, and maximum values of normalized importance scores across models were recorded to assess inter-model consistency. Features with low standard deviation indicate robust importance across different modeling approaches, while high variability suggests algorithm-specific sensitivity.
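The consensus metric described above amounts to a per-model Min-Max rescaling followed by a feature-wise summary. A minimal sketch is given below; the function names and the toy scores are hypothetical, and the population standard deviation is used as an assumption because the text does not specify which estimator was applied.

```python
from statistics import mean, pstdev


def min_max_scale(scores):
    """Rescale one model's importance scores to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {f: (v - low) / span if span else 0.0 for f, v in scores.items()}


def ensemble_importance(per_model_scores):
    """Summarize normalized importance per feature across all models."""
    normalized = [min_max_scale(s) for s in per_model_scores.values()]
    summary = {}
    for feature in normalized[0]:
        values = [scores[feature] for scores in normalized]
        summary[feature] = {
            "mean": mean(values),    # consensus importance
            "std": pstdev(values),   # inter-model consistency
            "min": min(values),
            "max": max(values),
        }
    return summary


# Toy raw scores on deliberately different scales (not results of this study).
raw = {
    "model_a": {"slope": 10.0, "NDVI": 30.0, "LST": 20.0},
    "model_b": {"slope": 0.1, "NDVI": 0.5, "LST": 0.5},
}
consensus = ensemble_importance(raw)
```

In this toy case NDVI receives a mean of 1.0 with zero spread, while LST shows a nonzero spread because the two models rank it differently.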
3.3.2. SHAP-Based Model Interpretability Analysis
While feature importance rankings indicate which variables most strongly influence model predictions, they do not reveal the directional nature or functional form of these relationships. To address this limitation and provide mechanistic insights into RTS susceptibility drivers, we employed SHapley Additive exPlanations (SHAP), a unified framework for interpreting model predictions based on cooperative game theory. SHAP values quantify the marginal contribution of each feature to individual predictions by considering all possible feature combinations, offering several key advantages: (i) local interpretability through instance-level explanations, (ii) global interpretability via aggregation of local explanations, (iii) consistent and theoretically grounded attribution of feature effects, and (iv) ability to reveal nonlinear relationships and feature interactions. We applied SHAP analysis to the best-performing model identified through the comparative evaluation. SHAP summary plots were used to visualize the distribution of feature effects across all samples, revealing which features most consistently drive predictions and the magnitude of their impacts. SHAP dependence plots were used to illustrate how feature values influence predicted RTS susceptibility, identifying positive or negative relationships.
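To make the idea of a marginal contribution averaged over all feature combinations concrete, the following sketch computes exact Shapley values for a toy three-feature function. It does not use the SHAP library and is not the analysis performed in this study: features absent from a coalition are simply replaced by a single baseline value, the enumeration is exponential in the number of features, and the toy model is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature coalition."""
    n = len(instance)

    def coalition_value(subset):
        # Features in the coalition take the instance value, others the baseline.
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                gain = coalition_value(members | {i}) - coalition_value(members)
                total += weight * gain
        contributions.append(total)
    return contributions


def toy_model(x):
    """Hypothetical model: one additive term and one interaction term."""
    return 2.0 * x[0] + x[1] * x[2]


phi = shapley_values(toy_model, instance=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

Here the additive feature receives its full effect, the interaction is shared equally between the two interacting features, and the contributions sum to the difference between the prediction for the instance and the prediction for the baseline.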
The optimal model, selected based on comprehensive performance evaluation across all metrics and experimental configurations, was deployed to generate a spatially continuous RTS susceptibility map for the entire YRSR. All feature layers were prepared at 270 m spatial resolution covering the study domain, and the trained model was applied in a pixel-by-pixel manner to produce susceptibility probability estimates ranging from 0 to 1. The resulting continuous susceptibility values were classified into four discrete hazard zones based on probability thresholds that reflect the likelihood of RTS occurrence [
30,
34]: very low susceptibility (0–0.05), representing areas with minimal probability of RTS development; low susceptibility (0.05–0.35), indicating relatively stable conditions with limited thaw-induced disturbance potential; moderate susceptibility (0.35–0.75), denoting areas with elevated risk where permafrost degradation processes may trigger RTS formation under favorable conditions; and high susceptibility (0.75–1.00), identifying critical zones with the highest probability of RTS occurrence where permafrost is most vulnerable to thermal disturbance and slope failure.
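The thresholding step can be sketched as follows. The class breaks are those given above; the treatment of values that fall exactly on a break (assigned here to the higher class) is an assumption, as are the function names and the sample probabilities.

```python
from bisect import bisect_right
from collections import Counter

# Class breaks from the text; labels ordered from lowest to highest class.
BREAKS = [0.05, 0.35, 0.75]
LABELS = ["very low", "low", "moderate", "high"]


def classify(probability):
    """Map a susceptibility probability in [0, 1] to one of four zones.

    Assumption: a value exactly on a break belongs to the higher class.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return LABELS[bisect_right(BREAKS, probability)]


def zone_fractions(probabilities):
    """Share of pixels per zone for an iterable of pixel probabilities."""
    zones = [classify(p) for p in probabilities]
    counts = Counter(zones)
    return {label: counts[label] / len(zones) for label in LABELS}


# Hypothetical pixel probabilities, not values from the study.
pixels = [0.01, 0.02, 0.04, 0.10, 0.30, 0.50, 0.80, 0.99]
fractions = zone_fractions(pixels)
print(fractions)
```

Applied to every 270 m pixel of the probability raster, the same mapping yields the categorical susceptibility map, and the zone fractions give the areal share of each class.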
5. Discussion
This study establishes a robust framework for RTS hazard assessment in the Yangtze River Source Region by integrating a rigorously verified high-resolution inventory with an advanced multi-dimensional machine learning workflow. By synergizing static topographic controls with dynamic time-series spectral indices (2019–2024), we demonstrated that temporal spectral signatures are critical for capturing the thermal-hydrological evolution of permafrost degradation, thereby enabling the optimized CatBoost ensemble to achieve exceptional predictive accuracy (AUC = 0.994) compared to traditional static-feature models. This methodological innovation not only facilitated the identification of a substantially expanded RTS inventory, revealing a 113% increase in abundance relative to the 2018–2020 baseline, but also, through SHAP-based interpretation, successfully quantified the non-linear geophysical thresholds and complex environmental interactions driving the accelerated thaw slumping on the Tibetan Plateau.
5.1. RTS Characteristics and Environmental Controls
The spatiotemporal analysis of the 2024 inventory reveals a dramatic acceleration in permafrost degradation across the Yangtze River Source Region. The documented 83.5% increase in RTS abundance and 53% expansion in area between 2022 and 2024 significantly outpace the growth rates observed in the 2018–2022 period. This surge, characterized by a proliferation of small-scale features (median area 7744 m²) alongside the progressive expansion of mega-slumps (>40 ha), suggests the region has crossed critical thermal thresholds associated with the 0.5 °C/decade warming trend. The dominance of early-stage, small features indicates a continuous initiation of new instabilities, while the substantial total area increase confirms that once initiated, retrogressive growth is fueled by positive feedback loops. This non-linear intensification underscores that RTS development in the region is no longer an episodic phenomenon but has transitioned into a phase of systemic, rapid permafrost collapse.
Spatially, RTS distribution exhibits a stringent dependence on topoclimatic niches and is effectively confined to a narrow zone defined by elevation (4693–4812 m) and aspect (northeast-to-east) ranges. Our analysis indicates that topography acts as the fundamental template for susceptibility by regulating solar radiation and ground ice preservation. The concentration on gentle slopes (mean 6°) and specific aspects reflects an asymmetric thermal regime: these locations receive sufficient summer insolation to trigger active layer thickening yet remain cool enough in winter to preserve the massive ground ice necessary for headwall retreat. The absence of RTSs on cooler aspects or steeper gradients supports the hypothesis that susceptibility is governed by a delicate balance between gravitational potential energy and thermal input, where topographic variables serve as proxies for these underlying energy budgets.
Crucially, the integration of multi-temporal spectral analysis with SHAP-based model interpretation elucidates the hydro-thermal mechanisms driving this degradation. While topography defines where RTSs can occur, dynamic spectral signals reveal when and how they develop. The sharp decline in vegetation and moisture indices (NDVI: −15.06%, NDMI: −71.20%) over the 2019–2024 period provides convergent evidence of physiological stress caused by ground ice thaw and subsequent drainage. This is corroborated by the SHAP feature importance ranking, where thermal (LST) and moisture (NDWI) indices emerged as the dominant predictors. The high predictive value of LST series underscores that cumulative thermal forcing is the proximate trigger for instability, while the strong signal from moisture indices captures the coupled hydro-thermal disruption. Thus, the RTS characteristic profile is defined by a dual control mechanism: static topographic preconditions that permit ground ice existence, activated by dynamic thermal forcing that exceeds the resilience of the overlying vegetation and active layer.
5.2. Feature Application in RTS Susceptibility Modeling
The exceptional predictive performance achieved in this study (AUC = 0.994) is fundamentally attributed to the high fidelity of the input data rather than the complexity of the modeling architecture alone. Our comparative experiments demonstrate that the “All Features” configuration yielded the most robust results, indicating that the constructed feature space successfully captured the multifaceted environmental controls of RTSs without introducing detrimental noise. This success validates our methodological emphasis on Phase I: by utilizing sub-meter resolution imagery (Jilin-1/Beijing-2) to construct the inventory, we eliminated the label noise often present in coarse-resolution datasets. The precise delineation of RTS boundaries ensured that the training samples extracted for modeling were spectrally and topographically pure, allowing the machine learning algorithms to learn the distinct signatures of permafrost degradation with minimal ambiguity.
Furthermore, the inclusion of multi-source features, integrating static geomorphometric variables with dynamic spectral indices, proved to be a comprehensive strategy that negated the need for aggressive feature reduction. This approach advances beyond recent RTS susceptibility assessments [
37,
38] which predominantly rely on static topographic and lithological variables to infer instability. While traditional modeling approaches often require extensive feature selection (e.g., RFE or PCA) to mitigate multicollinearity, our results suggest that advanced tree-based ensemble models like CatBoost can effectively manage feature redundancy while leveraging the subtle information contained within correlated variables. The dynamic spectral features (e.g., LST trends, NDVI anomalies), in particular, provided critical temporal dimensionality that static topographic factors lack. Unlike models limited to static snapshots, these features acted as proxies for the active thermal erosion processes, complementing the gravitational potential information provided by slope and elevation, thereby enabling a holistic representation of the RTS mechanism within the feature space.
Consequently, the high efficacy of the full feature set suggests that the “bottleneck” in regional RTS susceptibility modeling lies in data quality rather than algorithm optimization. The extensive effort invested in the preliminary stages, specifically the rigorous visual interpretation of high-resolution imagery and the systematic calculation of time-series spectral indices, provided a “clean” and information-rich dataset. This robust data foundation allowed the model to achieve near-perfect classification capabilities with standard hyperparameter tuning, proving that complex post-processing or elaborate model ensemble strategies are unnecessary when the underlying physical and spectral characterization of the hazard is sufficiently accurate. This finding advocates for a paradigm shift in future research: prioritizing high-precision inventory construction and domain-knowledge-based feature engineering over purely algorithmic complexity.
5.3. Implications for Hazard Management in the Yangtze River Source Region
The Yangtze River Source Region serves as a critical “Water Tower” for Asia, sustaining fragile alpine ecosystems and downstream livelihoods. However, the rapidly warming climate has transformed this stable permafrost zone into a hotspot of geomorphic instability. Our study confirms that permafrost degradation is intensifying, manifesting not only as the widespread proliferation of retrogressive thaw slumps but also through the expansion of thermokarst lakes, which collectively disrupt the hydrological balance and carbon cycle [
48]. The susceptibility map generated in this study provides a vital tool for navigating this escalating crisis. By delineating the spatially heterogeneous risk distribution, our results reveal that high-susceptibility zones are tightly clustered within specific topographic niches, predominantly north-to-east aspect at mid-elevations, rather than being randomly distributed.
This precise spatial targeting has significant implications for linear infrastructure planning and ecological conservation. The “very high” susceptibility areas, though covering a small fraction of the total landscape, overlap with potential routes for engineering projects, including those within the Qinghai–Tibet engineering corridor. The identification of these hotspots allows for proactive avoidance strategies during the planning phase of future infrastructure, such as highways or pipelines. Furthermore, the strong correlation between RTS clusters and drainage networks observed in our results suggests a direct threat to water quality through sediment release and nutrient mobilization. Consequently, to maximize its practical value, we recommend integrating this susceptibility map into existing disaster risk management frameworks through a hierarchical strategy. First, within the Environmental Impact Assessment protocols for new infrastructure, this map should serve as the primary “coarse-scale screening layer” to enforce mandatory route optimization away from high-risk thermal niches. Second, for unavoidable high-susceptibility zones, site-specific geotechnical investigations (e.g., ground-penetrating radar, borehole thermal monitoring) must be mandated to quantify subsurface ground ice conditions. Finally, these identified hotspots should be prioritized for inclusion in the national early warning network, where high-frequency remote sensing (e.g., InSAR) is selectively deployed to detect pre-failure deformation, effectively transitioning management from passive static zoning to active dynamic prevention.
5.4. Limitations and Future Perspectives
While this study successfully demonstrates the efficacy of integrating high-resolution inventories with time-series spectral features, several limitations warrant consideration.
First, the temporal resolution of our spectral analysis, though improved, relies on annual composites to mitigate cloud cover interference common on the Tibetan Plateau. This aggregation may smooth out short-term phenological changes or rapid slump initiation events that occur within a single thaw season. Second, this study relies on a comprehensive RTS inventory derived from a single time slice (2024). We have not yet constructed a systematic multi-temporal RTS inventory. Consequently, this limits our ability to quantitatively assess the dynamic evolution of RTSs, such as calculating precise headwall retreat rates or quantifying area expansion over time. Without multi-temporal boundary data, it remains challenging to strictly correlate morphological changes with specific climatic drivers, such as temperature anomalies and extreme precipitation events. Third, although the model achieves high internal validation accuracy, the lack of extensive geotechnical data (e.g., ground ice content, borehole temperature) limits our ability to physically validate the “very high” susceptibility zones against subsurface reality. The model currently relies on surface proxies (topography and vegetation) to infer subsurface conditions.
Future research will focus on constructing high-resolution, multi-temporal RTS inventories to bridge this gap [
49]. By integrating these inventories with meteorological data (e.g., temperature and precipitation), we aim to quantitatively analyze the spatiotemporal evolution of RTSs and identify the critical climatic thresholds triggering their expansion. Crucially, to address the temporal resolution constraints of annual composites, we plan to incorporate high-frequency remote sensing data, such as daily imagery from CubeSat constellations (e.g., PlanetScope). This will allow us to capture intra-seasonal phenological variations and pinpoint the precise timing of rapid slump initiation. Additionally, combining these observations with InSAR deformation measurements and extending the “dynamic feature” concept will help move from static susceptibility mapping to dynamic risk assessment, which is essential for adapting to accelerating permafrost degradation.
6. Conclusions
This study established a comprehensive framework for assessing permafrost instability in the Yangtze River Source Region by integrating a high-resolution inventory derived from sub-meter satellite imagery with advanced machine learning susceptibility modeling. By analyzing the spatiotemporal evolution of retrogressive thaw slumps alongside a rich dataset of topographic, climatic, and time-series spectral features, we identified the critical drivers governing permafrost degradation in this high-altitude environment.
Our analysis yields three primary conclusions. First, the YRSR is experiencing a rapid acceleration of permafrost degradation, evidenced by an 83.5% surge in RTS abundance between 2022 and 2024. This proliferation is dominated by small-scale slumps, indicating that recent warming has triggered widespread instability in previously stable permafrost zones. Second, RTS distribution exhibits a distinct dependency on both topographic and spectral characteristics. Topographically, hazards are clustered within a narrow elevation band (4693–4812 m) on north-to-east aspects. Spectrally, RTSs are characterized by significant anomalies in vegetation and moisture indices; specifically, the decline in vegetation vigor (NDVI) and variations in surface moisture (TCW) act as critical surface proxies for the underlying active thermal erosion and ground ice melt. Third, the susceptibility modeling achieved exceptional predictive accuracy (AUC = 0.994) using the CatBoost algorithm on the full feature set. The SHAP analysis further confirmed that dynamic spectral features (e.g., LST trends and vegetation anomalies) contribute significantly to the model’s decision-making, providing essential temporal dimensionality that complements static geomorphometric factors.
Collectively, this research not only delineates the high-risk zones threatening the “Water Tower” of Asia but also demonstrates that integrating high-fidelity visual interpretation with dynamic spectral monitoring is the most effective strategy for capturing the complex mechanism of permafrost hazards.