1. Introduction
Rainfall-induced landslides are a major natural hazard, posing severe risks to lives and infrastructure in regions characterized by steep terrain and frequent rainfall. Accurate prediction of landslide displacement is therefore essential for early warning systems and effective emergency planning [1]. China, with its complex geological structures, active crustal movements, and uneven precipitation patterns, is among the countries most vulnerable to landslide disasters worldwide [2,3], making advanced monitoring and forecasting technologies a national priority. In 2019, the Ministry of Natural Resources launched a nationwide initiative to develop and deploy standardized landslide monitoring equipment, prompting the China Institute of Geo-Environment Monitoring (CIGEM) to establish universal monitoring networks across the country under the Universal Landslide Monitoring Project (ULMP). In 2024, CIGEM issued 622 hazard alerts, enabling the evacuation of 10,235 residents and preventing an estimated 725 million yuan in direct losses [4]. These results highlight the effectiveness of low-cost, large-scale monitoring when coupled with robust predictive methods to strengthen disaster preparedness and risk mitigation. Despite these achievements, significant challenges persist. Many monitoring networks remain under construction, resulting in short displacement time series; regional coverage is limited by funding compromises that affect instrument accuracy; and overall performance still lags behind benchmark projects such as the Three Gorges Dam [5,6]. As a result, available datasets often lack the duration and resolution necessary for reliable, slope-specific prediction.
Landslide displacement is a key indicator of landslide evolution and stability [7]. In ULMP, monitoring often begins only after visible deformation has occurred, producing incomplete time-series records that omit early-stage behavior and give rise to diverse displacement curve morphologies. Since the 1960s, mechanical analyses of landslide displacement curves have consistently supported the Saito evolutionary model, which describes three stages of deformation [8,9]: initial activation, steady-state movement, and accelerated displacement. Xu later subdivided the acceleration stage into initial, moderate, and advanced phases [10], providing a more detailed theoretical framework for deformation modeling and hazard assessment. Based on extensive case studies, Xu classified these curves into three types: sudden-onset, gradual-progression, and stable-equilibrium [11]. Complementing this, Liu, drawing on field-based hazard management, proposed a parallel classification into slowly stabilizing, terraced (stepwise) evolution, and stability-losing types [12]. The stability-losing type is marked by continuously increasing gradients that signal imminent failure. The terraced type alternates between rapid movement and quiescence, introducing uncertainty into prediction. The stable type suggests long-term stability unless reactivated by external disturbances such as rainfall.
Landslide displacement prediction remains a central research focus, requiring both high predictive accuracy and interpretability to support actionable early warnings and targeted risk mitigation. Landslide initiation and evolution result from a complex interplay between static geological controls, such as lithology, slope angle, and structural conditions, and dynamic triggers, most notably short-duration, high-intensity rainfall. While medium-term modeling has advanced considerably, short-term prediction remains relatively underexplored, with most studies constrained to single-slope monitoring datasets [13,14,15]. Existing approaches can be grouped into deterministic, statistical, and nonlinear models [16]. Deterministic models, grounded in landslide mechanics, offer theoretical reliability but demand detailed geological data and complex parameter calibration [8]. Statistical models are simple and transparent but highly sensitive to data completeness and quality [17]. Nonlinear models capture complex variable interactions and generally outperform traditional methods [18,19,20], yet their “black-box” nature limits mechanistic insight and complicates integration into risk-governance practice. Recent advances in deep learning, spanning convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and Transformers, have accelerated predictive modeling across image recognition, natural language processing, and time-series forecasting [21,22]. Applied to landslide displacement prediction, deep learning can extract complex spatiotemporal patterns from multi-source monitoring data, enabling high-precision forecasts and automation of early warning systems [23,24,25]. Nevertheless, conventional deep learning approaches in this field often rely on a single displacement variable, struggle to incorporate domain-specific geological knowledge, and remain largely opaque, limiting interpretability for risk assessment and engineering decision-making. In addition, ULMP data typically exhibit short time series, small sample sizes, and high noise levels, further complicating model training and limiting generalizability.
These gaps underscore the need for a predictive framework that integrates high short-term accuracy, resilience to sparse and noisy data, and interpretability rooted in domain knowledge. Developing such a framework is essential for enhancing early warning systems and enabling targeted mitigation in landslide-prone regions. To this end, this study introduces a short-term displacement prediction framework that integrates transfer learning with factor integration, combining both dynamic and static variables based on landslide classification. This approach is designed to address the technical challenges of early warning in complex geological settings while balancing predictive accuracy with interpretability. By striking this balance, the framework seeks to advance the theoretical understanding of rainfall-induced landslide dynamics and strengthen the operational effectiveness of hazard monitoring and early warning systems.
2. Materials
The dataset used in this study was obtained from ULMP, led by the CIGEM. This national initiative seeks to establish a cost-effective, large-scale monitoring network for rainfall-induced landslides across China by integrating geotechnical observations with hydrometeorological data to enhance disaster prevention and mitigation efforts. However, datasets generated through this network are typically characterized by short time series, limited sample sizes, and high noise levels, which present significant challenges for model training.
A total of 274 landslide monitoring sites were provided by CIGEM, with the complete dataset totaling 2.7 GB. These sites are distributed across 17 provincial-level administrative regions in China (Figure 1), reflecting diverse spatial and geological conditions. Each site is equipped with synchronized rain gauges and geotechnical monitoring instruments, enabling detailed investigation of rainfall-induced landslide mechanisms. All time-series data were desensitized and resampled at six-hour intervals.
The dataset spans the period from the initial deployment of monitoring equipment through 1 January 2023, with each site monitored for at least six months. Detailed monitoring durations for all 274 sites are presented in Table 1. For predictive modeling, data collected before 1 July 2022 were used for model training, while records from 1 July 2022 onward were reserved for validation to support short-term landslide displacement forecasting.
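Under the stated cutoff, the chronological split can be sketched with pandas; the column name and six-hourly index below are hypothetical stand-ins for the actual monitoring records:

```python
import pandas as pd

def split_by_cutoff(df: pd.DataFrame, cutoff: str = "2022-07-01"):
    """Split a time-indexed monitoring DataFrame at a calendar date,
    mirroring the study's 1 July 2022 training/validation boundary."""
    train = df[df.index < cutoff]
    valid = df[df.index >= cutoff]
    return train, valid

# Hypothetical six-hourly displacement series spanning the cutoff
idx = pd.date_range("2022-06-29", periods=16, freq="6h")
df = pd.DataFrame({"displacement_mm": range(16)}, index=idx)
train, valid = split_by_cutoff(df)
```

A chronological split (rather than a random one) is essential here, since the goal is genuine short-term forecasting on unseen future records.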
CIGEM also provided static attribute data for each slope, derived from field investigations conducted by technical staff. These attributes were extracted and desensitized from each site's construction plan, plan layout diagrams, and landslide cross-sections. In addition to geological attributes, climate type [26] and annual average rainfall [27] were selected. Climate type captures the long-term environmental conditions of a region, while annual rainfall provides a quantitative measure of precipitation intensity directly related to rainfall-induced landslide triggering. As rainfall cannot be directly inferred from climate [28], these two variables offer complementary perspectives. Certain attributes, such as lithology, planar shape, profile shape, and climate type, were obtained directly from field surveys or extracted from existing site documentation and required no further categorization. The remaining data types, however, mostly exist as estimated ranges without precise, measurable values, so classification levels had to be established for them based on existing research findings. The classification criteria for the static data are summarized in Table 2, with categorization levels and threshold values based on existing landslide classification standards and prior research findings [29,30].
3. Methodology
This section outlines the methodological framework for short-term landslide displacement prediction, which integrates both static and dynamic factors. The framework consists of three main components: data preprocessing, transfer learning, and static–dynamic factor integration. A detailed workflow of the proposed approach is illustrated in Figure 2.
This study leverages data from the ULMP, where monitoring typically begins after visible deformation, producing incomplete, noisy, and low-precision time series. To improve data quality, the raw records were processed through anomaly removal, gap filling, and denoising. To enhance predictive capability, a transfer learning framework was adopted, comparing conventional single-slope models with multi-slope ensemble approaches. Two datasets were constructed: a full-slope dataset and a grouped dataset, the latter informed by landslide multi-stage developmental characteristics. To ensure domain-informed interpretability while achieving high short-term accuracy, predictive modeling integrates static geological attributes with dynamic triggers to capture multi-factor mechanisms. Model performance was evaluated using accuracy metrics and statistical significance tests to ensure a robust assessment.
3.1. Data Preprocessing
High-quality monitoring data are critical for accurately modeling landslide displacement. However, universal landslide monitoring networks face persistent challenges, including instrument instability, equipment malfunction, and environmental interference, that often result in outliers, missing values, and noise within the dataset. To ensure data reliability for predictive modeling, this study implements a comprehensive preprocessing workflow that integrates anomaly detection, gap recovery, and advanced noise-reduction techniques.
The anomaly handling workflow developed in this study consists of three key stages: instrument anomaly clearance, statistical outlier detection, and anomaly recovery. The process begins with instrument anomaly clearance, in which measurements outside sensor ranges or containing error codes are flagged and removed. Next, statistical outlier detection is applied using the Yamamoto method [31,32] to identify change points in first-order differenced data, thereby segmenting the time series into statistically stable intervals. Within each interval, anomalous points are detected and further assessed for local abrupt changes, with invalid points subsequently eliminated. Finally, anomaly recovery is performed to restore data continuity: short gaps are filled using linear interpolation, while longer gaps are reconstructed through ARIMA-based predictions [33].
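The gap-recovery step can be sketched as follows. This is a minimal illustration on a hypothetical series: short gaps are filled by linear interpolation with pandas, while gaps longer than a chosen threshold are left as NaN for a separate ARIMA-based reconstruction, which is omitted here for brevity:

```python
import numpy as np
import pandas as pd

def fill_short_gaps(series: pd.Series, max_gap: int = 4) -> pd.Series:
    """Linearly interpolate gaps of at most `max_gap` consecutive
    missing samples; longer gaps are left as NaN so they can be
    reconstructed separately (the study uses ARIMA forecasts there)."""
    filled = series.interpolate(method="linear", limit_area="inside")
    isna = series.isna()
    run_id = (~isna).cumsum()                        # constant within each NaN run
    run_len = isna.groupby(run_id).transform("sum")  # length of each NaN run
    filled[isna & (run_len > max_gap)] = np.nan      # undo fills in long gaps
    return filled

# Hypothetical displacement series with one short and one long gap
s = pd.Series([1.0, np.nan, 3.0, 4.0,
               np.nan, np.nan, np.nan, np.nan, np.nan, 10.0])
filled = fill_short_gaps(s)
```

Interpolating first and then masking out long runs keeps the logic simple while guaranteeing that only gaps of at most `max_gap` samples are filled.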
While anomaly detection removes extreme outliers, minor fluctuations and high-frequency noise often persist due to sensor limitations and environmental interference. To mitigate this issue, we applied a hybrid denoising method that combines complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and the wavelet transform [34,35]. Figure 3 illustrates the denoising performance, including the raw data and the denoised signal.
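The CEEMDAN stage requires a dedicated EMD library, but the wavelet-thresholding idea can be sketched in plain NumPy. The single-level Haar transform with a universal soft threshold below is an illustrative simplification, not the study's exact implementation:

```python
import numpy as np

def haar_soft_denoise(x: np.ndarray) -> np.ndarray:
    """One-level Haar wavelet soft-threshold denoising (illustrative
    stand-in for the CEEMDAN + wavelet pipeline). Assumes even length."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)       # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)       # detail (noise-dominated)
    sigma = np.median(np.abs(d)) / 0.6745        # robust noise estimate
    t = sigma * np.sqrt(2.0 * np.log(len(x)))    # universal threshold
    d = np.sign(d) * np.maximum(np.abs(d) - t, 0.0)  # soft thresholding
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2.0)             # inverse Haar transform
    y[1::2] = (a - d) / np.sqrt(2.0)
    return y

# Hypothetical smooth trend corrupted by sensor noise
rng = np.random.default_rng(0)
clean = np.linspace(0.0, 1.0, 256)
noisy = clean + rng.normal(0.0, 0.05, size=256)
denoised = haar_soft_denoise(noisy)
```

A production pipeline would first decompose the signal into intrinsic mode functions with CEEMDAN and threshold only the noise-dominated modes, but the thresholding principle is the same.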
This integrated preprocessing workflow, encompassing anomaly detection, data recovery, and advanced denoising, substantially improves data quality. The resulting time series are continuous, statistically reliable, and effectively cleansed of misleading noise, thereby providing a robust foundation for subsequent modeling and displacement prediction experiments.
3.2. Monitoring Data Grouping
Manual interpretation remains the predominant approach for assessing the morphological characteristics of landslide displacement curves. Although effective for a limited number of slopes, this method is impractical for universal monitoring networks that require morphological classification across hundreds of sites. To overcome this challenge, this study introduces an automated grouping framework that classifies landslides based on displacement curve morphology. The framework integrates derivative dynamic time warping (DDTW) [36], to quantify curve similarity, with K-means clustering [37] for automated classification.
Dynamic time warping (DTW) is a widely used technique for measuring time-series similarity, particularly effective for sequences with varying lengths or local distortions. It identifies an optimal warping path that minimizes the cumulative alignment cost, defined as the DTW distance [38]. DDTW extends DTW by incorporating first-order derivatives to capture slope information, thereby mitigating alignment singularities and enhancing morphological similarity assessment.
In implementation, displacement data were normalized to the range [0, 1], rescaled, and standardized to a one-year window. DDTW-derived distances were then used as the similarity metric for K-means clustering. Clustering was performed with three groups (K = 3) and a maximum of 100 iterations, producing three distinct morphological categories and enabling consistent, automated classification of displacement curves.
3.3. Model Methodology
An adapted temporal fusion transformer (TFT) was implemented as a unified framework to jointly model displacement through dynamic and static variables. For comparative evaluation, three widely used baseline models with proven effectiveness in prior studies were selected: support vector regression (SVR), extreme gradient boosting (XGBoost), and LSTM networks.
3.3.1. SVR
SVR extends the support vector machine framework to regression by constructing a hyperplane that approximates the target function while penalizing only data points lying outside a defined margin [39]. This approach provides a balance between predictive accuracy and generalization. Hyperparameters were optimized using particle swarm optimization (PSO) [14] with a swarm size of 1000, a maximum of 300 iterations, an inertia weight of w = 1, and learning factors c1 = 1.5 and c2 = 1.7. Further details are provided in Table 3.
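The PSO loop with the reported settings can be sketched as a generic minimizer; the same settings are reused for XGBoost in the next subsection. In the study the objective would be the model's validation error over candidate hyperparameters — the quadratic objective below is a hypothetical stand-in for illustration:

```python
import numpy as np

def pso_minimize(f, bounds, n_particles=1000, n_iter=300,
                 w=1.0, c1=1.5, c2=1.7, seed=0):
    """Minimal particle swarm optimizer with the settings reported in
    the study. `f` maps an (n_particles, dim) array to fitness values;
    `bounds` is a (dim, 2) array of [low, high] box constraints."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    x = rng.uniform(lo, hi, size=(n_particles, bounds.shape[0]))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), f(x)                 # personal bests
    g = pbest[np.argmin(pbest_val)].copy()            # global best
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                    # keep particles in bounds
        val = f(x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())

# Hypothetical 2-D objective standing in for SVR validation error
best_x, best_val = pso_minimize(lambda x: ((x - 1.0) ** 2).sum(axis=1),
                                bounds=[[-5, 5], [-5, 5]])
```

With an undamped inertia weight (w = 1) the swarm keeps exploring rather than collapsing, so tracking the personal and global bests is what actually locks in the optimum found.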
3.3.2. XGBoost
XGBoost is an optimized implementation of gradient boosting decision trees that incorporates regularization terms and employs a second-order Taylor expansion to approximate the objective function [40]. As a tree-based ensemble model, XGBoost sequentially trains weak learners and aggregates their predictions to enhance overall accuracy, offering strong robustness, interpretability, and generalization. In this study, XGBoost hyperparameters were tuned using PSO, with a particle population of 1000, a maximum of 300 iterations, an inertia weight of w = 1, and learning factors c1 = 1.5 and c2 = 1.7. Additional details are provided in Table 4.
3.3.3. LSTM
LSTM networks, a specialized form of RNNs, employ a gated architecture consisting of forget, input, and output gates to regulate information flow across time steps [6]. This design mitigates the vanishing gradient problem and improves the modeling of long sequential dependencies. To control computational cost, core hyperparameters such as the number of layers and hidden state dimensions were fixed, as summarized in Table 5.
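The gating mechanism described above can be made concrete with a single LSTM cell in NumPy; the dimensions, initialization, and gate packing order below are illustrative assumptions, not the study's configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) biases, packed in gate order
    [forget, input, cell-candidate, output]."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])              # forget gate: what to discard from c
    i = sigmoid(z[H:2 * H])          # input gate: what new info to admit
    g = np.tanh(z[2 * H:3 * H])      # candidate cell state
    o = sigmoid(z[3 * H:4 * H])      # output gate: what to expose as h
    c = f * c_prev + i * g           # additive path mitigates vanishing gradients
    h = o * np.tanh(c)
    return h, c

# Tiny demo with hypothetical dimensions (D = 3 inputs, H = 2 hidden units)
rng = np.random.default_rng(0)
D, H = 3, 2
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):                   # unroll over a 5-step input sequence
    h, c = lstm_cell(rng.normal(size=D), h, c, W, U, b)
```

The additive update of the cell state `c` is the key detail: gradients flow through it without repeated squashing, which is why LSTMs handle longer dependencies than plain RNNs.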
3.3.4. TFT
To incorporate both static and dynamic variables while preserving interpretability, the TFT was selected and adapted. The TFT integrates the strengths of LSTM networks and Transformer architectures [41], and has been widely applied in domains such as transportation [42], power systems [43], and stock price forecasting [44], where its predictive effectiveness has been consistently validated in practical applications. LSTM layers preserve local temporal encoding for short-term sequential patterns, while the Transformer’s encoder–decoder structure with multi-head attention captures long-range dependencies and multivariate interactions. Furthermore, attention weights and variable selection outputs enhance interpretability by highlighting influential time periods and critical variables.
We employ the TFT to predict landslide displacement by jointly modeling static and dynamic factors. Static covariates denote slope-inherent attributes that remain constant over time, with geological characteristics as a representative example, whereas dynamic inputs consist of displacement time series and rainfall records. The model architecture integrates a static covariate encoder (SCE), LSTM-based sequence modeling, and a temporal fusion decoder (TFD). The SCE generates context vectors that initialize recurrent states and feed into the static enrichment layer (SEL), thereby ensuring consistent incorporation of static information. Historical displacement and rainfall series are processed by the LSTM encoder, while the decoder incorporates known future features. Within the TFD, static enrichment, multi-head temporal self-attention, and position-wise feed-forward layers jointly capture long-term dependencies, integrate slope-specific geological context, and refine temporal representations.
TFT employs a quantile loss function for multi-step forecasting, applied to predictions of the form

$$\hat{y}(q, t, \tau) = f_q\left(\tau,\ y_{t-k:t},\ \mathbf{z}_{t-k:t},\ \mathbf{x}_{t-k:t+\tau},\ \mathbf{s}\right),$$

where $\hat{y}(q, t, \tau)$ denotes the $q$-quantile prediction at forecast horizon $\tau$; $y_{t-k:t}$ represents the historical target variables; $\mathbf{z}_{t-k:t}$ are past-observed time-dependent variables; $\mathbf{x}_{t-k:t+\tau}$ are known future inputs; and $\mathbf{s}$ denotes the static variables.
To prioritize the rapid displacement phase of rainfall-induced landslides, which is critical for disaster management, higher weights were assigned to samples with elevated displacement velocities to emphasize their importance during model training. Accordingly, the TFT model was adapted to incorporate a weighted root mean square error (RMSE) loss function, defined as

$$L_{\mathrm{wRMSE}} = \sqrt{\frac{\sum_{i=1}^{N} w_i\,(\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} w_i}},$$

where $w_i$ is the sample weight, $\hat{y}_i$ is the predicted value, $y_i$ is the true value, and $N$ is the total number of samples. To prevent excessive influence from outliers, the sample weights were defined as

$$w_i = g\!\left(\min\!\left(\max\!\left(v_i,\ v_{5}\right),\ v_{95}\right)\right).$$

In this study, $v_i$ is the displacement velocity of sample $i$, and $v_{95}$ and $v_{5}$ denote the 95th and 5th percentiles of displacement velocity, respectively. $g(\cdot)$ is a mapping function that assigns weights within the range $[w_{\min}, w_{\max}]$ according to the value of $v_i$. The function is defined as follows:

$$g(v) = w_{\min} + \left(w_{\max} - w_{\min}\right)\frac{v - v_{5}}{v_{95} - v_{5}}.$$
This weighting strategy prioritizes critical high-velocity events while reducing the influence of outliers. Hyperparameter optimization was performed using Optuna, with the key settings summarized in Table 6.
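The velocity-based weighting and the weighted loss can be sketched as follows. The percentile clipping comes from the text above; the linear form of the mapping g and the [w_min, w_max] values are assumptions for illustration:

```python
import numpy as np

def velocity_weights(v, w_min=1.0, w_max=5.0):
    """Map displacement velocities to sample weights in [w_min, w_max].
    Velocities are clipped to the 5th/95th percentiles so outliers
    cannot dominate; the linear mapping g is an assumed form."""
    v5, v95 = np.percentile(v, 5), np.percentile(v, 95)
    v_clipped = np.clip(v, v5, v95)
    return w_min + (w_max - w_min) * (v_clipped - v5) / (v95 - v5)

def weighted_rmse(y_pred, y_true, w):
    """Weighted RMSE: high-velocity samples contribute more to the loss."""
    return float(np.sqrt(np.sum(w * (y_pred - y_true) ** 2) / np.sum(w)))
```

With uniform weights the expression reduces to the plain RMSE, which is a convenient sanity check when wiring the loss into training.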
3.4. Evaluation Metrics
3.4.1. Prediction Accuracy
To ensure consistent performance evaluation across datasets, RMSE [45] and the mean error of the rapid displacement period (MERDP) were adopted as the primary metrics. RMSE, defined as the square root of the mean squared differences between predicted and observed values, provides a direct measure of model fit and overall predictive accuracy.
In early warning and forecasting, the rapid displacement phase of landslides poses the greatest risk, making model accuracy during this stage a priority. When overall RMSE values are similar and thus less discriminative, evaluation shifts to performance within this critical phase. Building on the combined error of the rapid displacement period [46], this study introduces the mean error of the rapid displacement period, which measures the average RMSE specifically during periods of rapid deformation. MERDP is formally defined as

$$\mathrm{MERDP} = \frac{1}{v_{\max} - v_{\min}} \int_{v_{\min}}^{v_{\max}} \mathrm{RMSE}(v)\, \mathrm{d}v.$$

In practice, $v_{\max}$ is defined as the maximum observed displacement velocity, while $v_{\min}$ is derived from the velocity distribution to ensure that the primary deformation phase is captured within the interval $[v_{\min}, v_{\max}]$. In this study, $v_{\min}$ corresponds approximately to the 10th percentile of displacement velocities at each monitored site. MERDP is then computed as the mean RMSE within this velocity interval, obtained by integrating the RMSE curve across $[v_{\min}, v_{\max}]$ and normalizing by the interval length. Lower MERDP values reflect smaller average errors during rapid displacement phases, thereby providing a focused and reliable measure of predictive accuracy in the landslide’s most critical high-risk period.
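A numerical sketch of the metric follows. Reading RMSE(v) as the RMSE over samples whose velocity meets or exceeds the threshold v is an assumption here, as is the trapezoidal discretization of the integral:

```python
import numpy as np

def merdp(y_pred, y_true, velocity, v_min, v_max, n_grid=50):
    """Mean error of the rapid displacement period: integrate the
    velocity-thresholded RMSE curve over [v_min, v_max] and normalize
    by the interval length."""
    grid = np.linspace(v_min, v_max, n_grid)
    curve = []
    for v in grid:
        mask = velocity >= v
        if mask.any():
            err = y_pred[mask] - y_true[mask]
            curve.append(float(np.sqrt(np.mean(err ** 2))))
        else:                                  # no samples this fast:
            curve.append(curve[-1] if curve else 0.0)   # carry last value
    c = np.asarray(curve)
    integral = np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(grid))  # trapezoidal rule
    return float(integral / (v_max - v_min))
```

For a model whose error is constant across all velocities, MERDP simply recovers that constant, which makes the normalization easy to verify.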
3.4.2. Performance Difference Significance Assessment
This study applies the paired Wilcoxon signed-rank test [47] to evaluate whether prediction results differ significantly between models for the same slope. As a non-parametric method, it assesses whether paired differences are statistically meaningful without requiring the assumption of normality, making it robust to skewed distributions and less sensitive to outliers. The null hypothesis states that no significant difference exists between model predictions, while a p-value below 0.01 is taken as evidence of a statistically significant difference.
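The test can be sketched with SciPy; the per-slope error values below are synthetic stand-ins for the paired model results:

```python
import numpy as np
from scipy.stats import wilcoxon

def models_differ(errors_a, errors_b, alpha=0.01):
    """Paired Wilcoxon signed-rank test on per-slope prediction errors.
    Returns (p_value, significant): the null hypothesis of no
    difference is rejected when p < alpha."""
    stat, p = wilcoxon(errors_a, errors_b)
    return float(p), bool(p < alpha)

# Hypothetical per-slope RMSEs for two models on the same 20 slopes
rng = np.random.default_rng(0)
base = rng.uniform(1.0, 3.0, size=20)
err_a = base + 0.5 + rng.normal(0.0, 0.05, size=20)  # model A consistently worse
p, sig = models_differ(err_a, base)
```

Because the test ranks paired differences rather than raw values, a model that is consistently worse by even a modest margin on every slope yields a very small p-value.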
5. Discussion
We further explored the relationship between rainfall-induced landslides and rainfall factors through model interpretability. A key advantage of the TFT model over baseline approaches is its ability to directly provide both temporal attention distributions and the relative importance of static variables. Because grouped dataset modeling is conducted independently for each group, the interpretability outputs vary across landslide categories.
Figure 8 illustrates the temporal attention distributions for the model outputs of each group.
The temporal attention distribution of the constant-velocity displacement group differs markedly from the other two groups. In this group, attention declines sharply with time, indicating that the model relies mainly on short-term information and is relatively insensitive to longer-term data. By contrast, the stepwise and accelerated displacement groups display a more gradual decline in attention, followed by a moderate recovery that peaks around day ten. These patterns suggest that displacement in the constant-velocity group has only a weak correlation with historical rainfall, whereas displacement in the stepwise and accelerated groups is more strongly influenced by precipitation.
Figure 9 illustrates the relative importance of static variables across different landslide groups. In the stepwise and accelerated displacement groups, annual mean precipitation emerges as the dominant driver, whereas in the constant-velocity group, landslide volume exerts the strongest influence. The results indicate that rainfall remains a primary trigger for rainfall-induced landslides. Among the top five most influential static factors in the accelerated displacement group, all variables except annual precipitation also appear in the top five of the constant-velocity group, highlighting that rainfall is the key factor driving the transition from stable to accelerated displacement. Moreover, the findings underscore the critical role of slope-inherent attributes in understanding landslide dynamics. Notably, four static variables, volume, planar shape, profile shape, and lithology, consistently rank among the top five factors for both the accelerated and constant-velocity groups, indicating that these background geological characteristics provide essential context for slope stability [48,49] and significantly enhance predictive model performance. At the same time, volume and lithology remain highly influential across all groups, reflecting the persistent impact of slope geometry [50,51] and mass on displacement behavior, consistent with the conclusions of previous mechanistic studies on landslides [52,53].
Analysis of the TFT model’s temporal attention suggests that constant-velocity displacement is primarily governed by internal factors and shows limited sensitivity to historical rainfall. By contrast, the stepwise and accelerated displacement groups are influenced by a combination of internal and external drivers [54], consistent with the high importance of both landslide volume and annual mean precipitation.
Overall, this analysis demonstrates that grouping landslides by displacement curve morphology effectively captures underlying deformation mechanisms. The integrated dynamic–static TFT model built on this framework delivers superior predictive performance compared with models relying solely on dynamic factors. Moreover, its attention distributions and factor-importance rankings correspond closely with established geological processes, reinforcing both the architectural soundness and the credibility of its predictions.
This study is constrained by the limited precision and completeness of baseline geological survey data, due to budgetary and temporal limitations of the universal landslide monitoring program. Nevertheless, our findings indicate that static attributes, particularly geological factors, significantly enhance predictive performance, with their relative importance varying across landslide groups, reflecting potential differences in underlying mechanisms. Future studies incorporating more detailed and comprehensive static survey data could not only improve the accuracy of displacement predictions but also provide deeper insights into the interactions between static slope characteristics and dynamic triggers, thereby advancing our understanding of landslide evolution and underlying driving processes.
6. Conclusions
This study proposed a short-term landslide displacement prediction framework that integrates transfer learning with static–dynamic factor integration based on landslide classification. The framework was validated using deformation monitoring data from 274 landslide sites across China. The group-based data augmentation strategy, informed by displacement curve morphology, improved both dataset representativeness and modeling performance. Implemented via a TFT, the integrated framework combining static and dynamic factors yielded substantial gains in predictive accuracy and interpretability.
Experimental results, validated through significance testing, demonstrate that group-based modeling substantially outperforms both single-slope models and models trained on aggregated datasets, enhancing predictive robustness and generalizability. Building on this foundation, a predictive framework integrating static geological attributes with dynamic triggering factors was implemented using a TFT. Comparative analyses show that the proposed approach achieves superior predictive accuracy, structural coherence, and interpretability compared with baseline models, including SVR, XGBoost, and LSTM. Moreover, within each group, TFT models trained on combined static–dynamic datasets consistently outperformed those using only dynamic factors, with improvements confirmed through significance testing.
These results highlight the framework’s potential to improve short-term landslide forecasting and strengthen the reliability of early warning systems. By integrating static and dynamic factors, the approach enhances both the theoretical soundness and structural consistency of the model, providing a robust foundation for accurate short-term displacement prediction and practical early-warning applications.