1. Introduction
Landslides are sudden, far-reaching, and often cascading disasters that cause loss of life, property damage, infrastructure disruption, and secondary hazards [1, 2]. Improving landslide prediction is fundamental to risk identification, spatial planning, engineering protection, and emergency management. Among candidate indicators, the run-out distance (the horizontal travel from the source to the distal deposit) most directly delineates the affected footprint. The magnitude of run-out distance depends on the geometry of the landslide source area and the slope, including crown–toe relief, source area, source volume, and mean inclination [3, 4]. Developing a robust, interpretable predictor of run-out distance from widely available descriptors enables rapid regional screening and supports site-scale engineering decisions, strengthening both scientific rigor and operational viability.
Four research lines have shaped the understanding of landslide run-out distance [5]. Analytical and semi-analytical models derive travel bounds from dynamics and energy considerations through effective friction, path roughness, and conversion between potential and kinetic energy; they offer clear physical interpretability and modest data requirements, but rely on simplifying assumptions and can struggle to represent complex topography or heterogeneous materials [6, 7, 8, 9, 10]. Numerical simulations reconstruct motion and deposition under detailed topography and material parameters. They provide high spatial fidelity and process richness, yet demand substantial input data and computational resources and can be sensitive to rheology choices and parameter uncertainty [11, 12, 13]. Physical modeling uses scaled experiments with controlled boundary conditions to test mechanisms and to provide calibration evidence; it enables direct observation and repeatable hypothesis testing, while remaining constrained by scaling laws, boundary effects, and practical limits on the range of conditions explored [14]. Empirical and statistical approaches fit transferable relations from historical cases, using geometric and topographic descriptors to quantify run-out under diverse settings; they are efficient to implement and validate across large datasets, but may face extrapolation risk, confounding from omitted variables, and reduced performance when regime heterogeneity is pronounced [15, 16, 17, 18, 19, 20]. Despite their theoretical rigor, traditional physically based and empirical models rely on simplified mechanical assumptions and often require parameters that are difficult to measure at regional scales. These approaches are typically developed under specific geomorphic or material conditions, which may limit their applicability when event characteristics vary substantially. Moreover, heterogeneity in landslide size, slope geometry, and material properties can challenge the generalizability of single-regime formulations.
Building on these foundations, data-driven learning has emerged as a complementary avenue that leverages growing event inventories to provide flexible function approximation, scalable inference, and reproducible validation [21, 22, 23]. Machine learning methods impose fewer explicit assumptions regarding functional form and can flexibly capture nonlinear interactions among predictors [24, 25]. Such approaches are particularly advantageous when dealing with heterogeneous, multi-regime datasets compiled from diverse environments [26, 27]. However, data-driven models may suffer from reduced interpretability and limited extrapolation capability beyond the training domain [28]. Machine learning for landslide prediction spans several families: linear and generalized linear methods, kernel and distance-based methods, neural networks and deep learning, and tree-based ensembles [29, 30, 31]. For the nonlinearity, scale effects, and interactions typical of run-out, tree ensembles offer strong engineering practicality. Random Forest, gradient boosting, CatBoost, and LightGBM capture nonlinear structure and higher-order interactions, are largely insensitive to differences in feature scale, and train efficiently [32, 33, 34, 35]. Crucially, the SHAP framework provides theory-grounded attributions for trees, with TreeSHAP delivering exact Shapley values in polynomial time while satisfying local accuracy, missingness, and consistency [36, 37, 38]. This pairing unifies high predictive performance and rigorous interpretability within one framework.
In addition, most existing machine learning studies implicitly assume a single, homogeneous population and therefore overlook heterogeneity in geomorphic behavior. Here, we adopt a cluster-aware modeling framework in which statistically identified groups are interpreted as regime-like behavioral partitions rather than strictly physics-defined process regimes. Moreover, interpretability analyses are often conducted at the global level, making it difficult to assess whether feature attributions remain consistent and physically meaningful across different landslide types. These limitations motivate the need for a regime-aware and explainable modeling framework.
In this study, we address two core scientific questions. First, can a cluster-aware approach that partitions the population and models each group separately improve generalization and stability over a single-population baseline built from widely available descriptors? Second, can the resulting models provide principled, cross-cluster-comparable, and physically consistent additive explanations that support mechanism insight and decision transparency?
To address these questions, we develop a cluster-aware and explainable pipeline grounded in widely available predictors and a large, vetted dataset. We adopt four widely used, readily obtainable predictors: crown–toe relief H, source area A, source volume V, and the mean inclination of the source slope. The dataset comprises 10,159 rainfall-induced landslides compiled from official inventories and peer-reviewed publications, enabling stable clustering and cluster-wise modeling. The methodological framework is structured as follows. We standardize the four predictors, determine the number of clusters by information criteria under a spherical-mixture approximation, and fit k-means clustering [39, 40]. Before any tuning, we benchmark Random Forest (RF), eXtreme Gradient Boosting (XGB), CatBoost, and vanilla LightGBM on consistent splits using R², RMSE, and MAE, then select LightGBM as the base learner [41, 42]. Within each cluster, we perform a constrained hyperparameter search with the Alpha Evolution (AE) algorithm and use Particle Swarm Optimization (PSO) and Bayesian Optimization (BO) as budget-matched baselines [43]. From the optimized configurations, we select AE-LightGBM as the final model. For interpretability, we apply TreeSHAP with cluster-specific background distributions to decompose predictions into a baseline plus additive contributions from the four predictors, yielding explanations that are consistent and comparable across clusters.
The remainder of the paper is organized as follows. Section 2 describes the dataset and variable definitions. Section 3 presents the methodology, including the clustering criteria and the k-means method, the principles of LightGBM, the Alpha Evolution (AE) optimization, and the TreeSHAP implementation. Section 4 reports cluster-wise predictions and compares them with PSO- and BO-optimized baselines. Section 5 discusses the effects of cluster imbalance on performance and the rationale for choosing tree ensembles with TreeSHAP. Section 6 concludes with key findings and outlines directions for future work.
3. Methods
As presented in Section 2, we model the landslide run-out distance L as a function of four predictors describing the source-region geometry and slope: crown–toe relief H, source area A, source volume V, and mean inclination of the source-slope section. The workflow comprises three stages. First, we standardize the four predictors, determine the optimal number of clusters using information criteria, and separate the dataset into several groups using the k-means method. Second, within each cluster, we train a LightGBM model to predict L from the four predictors, with hyperparameters optimized by the Alpha Evolution (AE) algorithm and benchmarked against Particle Swarm Optimization (PSO) and Bayesian Optimization (BO). Third, we explain the model using TreeSHAP. Before cluster-wise optimization, we compare Random Forest (RF), eXtreme Gradient Boosting (XGB), CatBoost, and LightGBM using consistent data splits and the metrics R², RMSE, and MAE. LightGBM offers the best performance and is therefore selected as the base learner. See Figure 3 for the overall pipeline of the method.
3.1. k-Means Clustering
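The dataset is partitioned using k-means clustering [48, 49]. All predictors are standardized to zero mean and unit variance. For a given K, k-means partitions the feature space of the standardized predictor vectors $\mathbf{x}_i$ by minimizing the within-cluster sum of squares

$$W_K = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2,$$

where $C_k$ is the set of samples assigned to cluster k and $\boldsymbol{\mu}_k$ is its centroid. Adopting the spherical-Gaussian-mixture approximation of k-means, the shared variance is estimated as

$$\hat{\sigma}^2 = \frac{W_K}{N d},$$

with N the number of samples and $d = 4$ the number of predictors. The maximized log-likelihood is

$$\ln \hat{\mathcal{L}}_K = \sum_{k=1}^{K} n_k \ln \frac{n_k}{N} - \frac{N d}{2} \ln\left(2 \pi \hat{\sigma}^2\right) - \frac{N d}{2},$$

with $n_k = |C_k|$. Counting $K d$ centroid parameters, $K - 1$ mixture weights, and the shared variance gives

$$p_K = K d + (K - 1) + 1 = K (d + 1).$$

The information criteria are then computed as

$$\mathrm{BIC}(K) = -2 \ln \hat{\mathcal{L}}_K + p_K \ln N, \qquad \mathrm{AIC}(K) = -2 \ln \hat{\mathcal{L}}_K + 2 p_K.$$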
We evaluate these scores over a range of candidate K and prioritize BIC when the two disagree. In our data, the information-criterion analysis supports K = 4, and we therefore fit k-means with K = 4 and assign each sample a cluster label for downstream modeling. It is important to clarify that the clustering step is not intended to identify physically discrete geomorphic process classes. Landslide behavior is expected to follow complex, anisotropic, and heavy-tailed distributions, which cannot be strictly represented by spherical mixture models. In this study, k-means serves as a pragmatic partitioning operator that separates the heterogeneous population into statistically comparable subsets, enabling conditional learning and stable model interpretation. Therefore, the clusters should be interpreted as behavioral partitions in feature space rather than strict physical regimes, and the predictive conclusions do not rely on the generative validity of the spherical assumption.
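The information-criterion step can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a hard partition is already available, that no cluster is empty, and that the likelihood is the shared-variance spherical Gaussian with empirical mixture weights. The function name `kmeans_information_criteria` is hypothetical.

```python
import math


def kmeans_information_criteria(points, labels, n_clusters):
    """Return (BIC, AIC) for a hard partition of standardized points.

    Assumptions (illustrative): shared-variance spherical Gaussian
    likelihood, empirical mixture weights, and no empty cluster.
    """
    n, d = len(points), len(points[0])

    # Cluster sizes and centroids.
    sizes = [0] * n_clusters
    sums = [[0.0] * d for _ in range(n_clusters)]
    for x, k in zip(points, labels):
        sizes[k] += 1
        for j in range(d):
            sums[k][j] += x[j]
    centroids = [[s / sizes[k] for s in sums[k]] for k in range(n_clusters)]

    # Within-cluster sum of squares.
    wcss = sum(
        sum((x[j] - centroids[k][j]) ** 2 for j in range(d))
        for x, k in zip(points, labels)
    )

    sigma2 = wcss / (n * d)  # shared variance estimate
    log_lik = (
        sum(m * math.log(m / n) for m in sizes)
        - 0.5 * n * d * math.log(2.0 * math.pi * sigma2)
        - 0.5 * n * d
    )

    # K*d centroid coordinates, K-1 mixture weights, 1 shared variance.
    n_params = n_clusters * d + (n_clusters - 1) + 1
    bic = -2.0 * log_lik + n_params * math.log(n)
    aic = -2.0 * log_lik + 2.0 * n_params
    return bic, aic
```

In use, one would run k-means for each candidate K, pass the resulting labels to this function, and compare the scores, with lower values preferred.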
3.2. Cluster-Wise Modeling Based on AE-LightGBM
3.2.1. Principles of LightGBM
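Within each cluster k, we predict L from the four predictors using LightGBM, which constructs an additive ensemble of regression trees [50, 51]:

$$\hat{y}_i = F_k(\mathbf{x}_i) = \sum_{t=1}^{T} f_t(\mathbf{x}_i).$$

At boosting round t, LightGBM minimizes a second-order Taylor approximation of the regularized objective around the current predictions $\hat{y}_i^{(t-1)}$:

$$\mathcal{J}^{(t)} \approx \sum_{i} \left[ g_i f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i f_t(\mathbf{x}_i)^2 \right] + \Omega(f_t),$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss, $f_t(\mathbf{x}_i)$ is the contribution of the new tree, and the tree complexity penalty is

$$\Omega(f_t) = \gamma J + \tfrac{1}{2} \lambda \sum_{j=1}^{J} w_j^2,$$

with $w_j$ the leaf values, $\gamma$ a penalty on the leaf count $J$, and $\lambda$ the $L_2$ regularization strength. For a leaf j with gradient sum $G_j$ and Hessian sum $H_j$, the optimal leaf value is

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}.$$

For a candidate split into left/right children with statistics $(G_L, H_L)$ and $(G_R, H_R)$, the improvement is

$$\Delta = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma.$$

LightGBM adopts leaf-wise growth by selecting at each step the split that maximizes $\Delta$ while controlling complexity via $\gamma$, $\lambda$, and constraints such as maximum depth or maximum leaves. Figure 4 illustrates the principle of LightGBM.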
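The leaf-value and split-improvement rules described above can be checked numerically with a short sketch. It uses the textbook second-order formulas for gradient-boosted trees, with an L2 penalty `lam` and a per-leaf penalty `gamma`; it is not LightGBM's internal code, and the function names are illustrative.

```python
def leaf_value(grad_sum, hess_sum, lam):
    """Optimal leaf value for given gradient and Hessian sums."""
    return -grad_sum / (hess_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Objective improvement from splitting one leaf into two children."""

    def score(g, h):
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


def squared_error_stats(targets, predictions):
    """Gradient and Hessian sums for the loss 0.5 * (pred - y)^2.

    For this loss the per-sample gradient is (pred - y) and the
    per-sample Hessian is 1.
    """
    grads = [p - y for y, p in zip(targets, predictions)]
    return sum(grads), float(len(grads))
```

For example, with current predictions of 2 everywhere, targets [1, 1] on the left and [3, 3] on the right give leaf values of -1 and +1, so the split moves each child's prediction onto its targets.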
3.2.2. Alpha Evolution Hyperparameter Optimization
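The Alpha Evolution (AE) algorithm, proposed in 2024 [43], has been reported to outperform many mainstream optimizers across diverse benchmarks. For each cluster k, we tune the LightGBM hyperparameter vector $\boldsymbol{\theta}$ within a box-constrained search space $\Theta$ by minimizing the cross-validated RMSE:

$$\boldsymbol{\theta}_k^{*} = \arg\min_{\boldsymbol{\theta} \in \Theta} \mathrm{RMSE}_{\mathrm{CV}}^{(k)}(\boldsymbol{\theta}).$$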
A population of candidate vectors is initialized uniformly within the search space. At each generation, a domain-scaled exploratory step is formed from two random terms with i.i.d. entries and a Bernoulli(0.5) mask S, combined by element-wise multiplication (⊙). The progression factor decreases with the number of function evaluations up to a fixed budget. A global reference P is updated by weighted aggregation of a sampled subpopulation B, with nonnegative weights summing to one. For contrastive guidance, each candidate forms a trial point from elite and non-elite references, combined through an element-wise random weight drawn from one of two ranges. The trial is projected back into the search space and accepted greedily if it does not worsen the cross-validated RMSE. The final model is then trained in the cluster using the selected hyperparameters. See Figure 5 for the flowchart of the Alpha Evolution (AE) algorithm.
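Three mechanics mentioned above (an evaluation budget, projection back into the box, and greedy acceptance) are common to many population-based optimizers and can be illustrated with a deliberately simplified stand-in. The sketch below is not the Alpha Evolution algorithm: it replaces AE's reference and contrastive updates with a masked Gaussian perturbation whose scale shrinks as the budget is consumed. All names and the 0.1 step factor are illustrative assumptions, and `pop_size` is assumed not to exceed `max_evals`.

```python
import random


def project_to_box(vector, lower, upper):
    """Clip each coordinate into [lower_j, upper_j]."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(vector, lower, upper)]


def budgeted_greedy_search(objective, lower, upper, pop_size, max_evals, seed=0):
    """Simplified stand-in optimizer (NOT Alpha Evolution).

    Returns (best_vector, best_fitness, evaluations_used).
    """
    rng = random.Random(seed)
    dim = len(lower)

    # Uniform initialization inside the box; each evaluation is counted.
    population = [
        [rng.uniform(lower[j], upper[j]) for j in range(dim)]
        for _ in range(pop_size)
    ]
    fitness = [objective(x) for x in population]
    evals = pop_size

    while evals < max_evals:
        # Step scale decays linearly with the consumed budget.
        scale = 1.0 - evals / max_evals
        for i in range(pop_size):
            if evals >= max_evals:
                break
            # Bernoulli(0.5) mask: perturb roughly half of the coordinates.
            step = [
                scale * (upper[j] - lower[j]) * rng.gauss(0.0, 0.1)
                if rng.random() < 0.5
                else 0.0
                for j in range(dim)
            ]
            trial = project_to_box(
                [x + s for x, s in zip(population[i], step)], lower, upper
            )
            trial_fit = objective(trial)
            evals += 1
            # Greedy acceptance: keep the trial only if it is not worse.
            if trial_fit <= fitness[i]:
                population[i], fitness[i] = trial, trial_fit

    best = min(range(pop_size), key=fitness.__getitem__)
    return population[best], fitness[best], evals
```

Counting objective calls rather than generations is what makes optimizers with different population sizes comparable under one budget; in the study's setting the objective would be the cross-validated RMSE of a LightGBM configuration.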
3.3. Principles of TreeSHAP
To explain how the predictors affect L, we use TreeSHAP to express, for any instance i with features $\mathbf{x}_i$, the prediction as an additive Shapley decomposition

$$F_k(\mathbf{x}_i) = \phi_0 + \sum_{j=1}^{M} \phi_{ij},$$

where $\phi_0 = \mathbb{E}[F_k(\mathbf{x})]$ is the baseline under a background distribution taken as the cluster-wise training distribution, and $\phi_{ij}$ is the attribution for feature j. In Shapley form with $M = 4$ features,

$$\phi_{ij} = \sum_{S \subseteq \mathcal{M} \setminus \{j\}} \frac{|S|! \, (M - |S| - 1)!}{M!} \left[ F_{k, S \cup \{j\}}(\mathbf{x}_i) - F_{k, S}(\mathbf{x}_i) \right],$$

where $\mathcal{M}$ is the full feature set, S is a coalition (subset), and $F_{k,S}(\mathbf{x}_i)$ denotes the model output for instance i when only features in S are considered and the remaining features are integrated over the background distribution. TreeSHAP computes these quantities exactly in polynomial time by dynamic programming along decision-tree paths, using path probabilities to avoid explicit enumeration of all feature subsets, thereby yielding attributions that satisfy local accuracy, missingness, and consistency for the cluster-wise LightGBM models $F_k$ [52, 53]. Figure 6 illustrates the framework of the AE-LightGBM and TreeSHAP combination.
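With only four features, the Shapley decomposition can also be computed by brute force over all coalitions, which makes the local-accuracy property easy to verify. The sketch below is an illustrative enumeration, not the TreeSHAP algorithm: it assumes an arbitrary callable `model` and integrates absent features by averaging over a small list of background rows. The function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, subset):
    """Mean model output with features in `subset` fixed to x.

    Features outside `subset` take their values from each background row.
    """
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def exact_shapley(model, x, background):
    """Return (baseline, attributions) by enumerating every coalition."""
    m = len(x)
    baseline = coalition_value(model, x, background, frozenset())
    phi = []
    for j in range(m):
        others = [f for f in range(m) if f != j]
        value = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for combo in combinations(others, size):
                s = frozenset(combo)
                with_j = coalition_value(model, x, background, s | {j})
                without_j = coalition_value(model, x, background, s)
                value += weight * (with_j - without_j)
        phi.append(value)
    return baseline, phi
```

The enumeration costs on the order of 2^M model sweeps per feature, which is fine for M = 4 but is exactly the cost that tree-specific algorithms avoid.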
4. Results
4.1. Clustering Results Based on k-Means Method
Figure 7 reports information-criterion diagnostics for k-means clustering based on the four predictors. Both BIC and AIC, under full and diagonal covariance approximations shown for comparability, decrease steeply as the number of clusters increases from 1 to 3, and then exhibit a clear elbow at K = 4. Beyond K = 4, additional clusters yield only marginal gains. Consistent behavior across criteria supports selecting 4 clusters as the optimal number.
Figure 8 displays pairwise scatter plots with diagonal marginal distributions in the
space for the four identified clusters. The corresponding sample sizes are: Cluster 1 (
), Cluster 2 (
), Cluster 3 (
), and Cluster 4 (
). Cluster centers are indicated by yellow crosses. Overall, the geomorphic size variables
H, A, and V exhibit pronounced right skewness and heavy-tailed behavior, while A and V show a strong positive correlation, indicating that source area and volume scale jointly with event magnitude. The slope angle is approximately unimodal but displays systematic shifts in central tendency across clusters. From a physical perspective, Cluster 4 is characterized by low-to-moderate H, A, and V values together with relatively small slope angles, representing the most frequent small-to-moderate, lower-energy events. Cluster 1 comprises events with moderate geometric scales but comparatively larger slope angles, consistent with steeper slopes and potentially enhanced runout efficiency for medium-scale failures. Cluster 2 exhibits substantial dispersion in H, A, and V combined with intermediate slope angles, forming a transitional group that spans predominantly small-to-moderate events while extending toward larger magnitudes. In contrast, Cluster 3, despite its limited sample size, is concentrated at high A and V with relatively large H, representing rare, high-energy, large-scale events that dominate the distributional tails. These results demonstrate that the clustering effectively captures heterogeneity in both geometric and dynamical characteristics of rainfall-induced landslides, providing robust physical interpretation and statistical justification for subsequent cluster-specific modeling.
4.2. Prediction Results Based on AE-LightGBM
Based on the clustering results in
Section 4.1, the dataset was partitioned into four groups (Clusters 1–4), and separate predictive models were developed for each cluster to account for cluster-specific behavior and distributional heterogeneity. Because each record represents an independent historical landslide event compiled from multiple regions rather than spatially contiguous observations within a single study area, the samples approximate event-based independence. Accordingly, data within each cluster were randomly split into 80% for training and 20% for independent testing to ensure objective and comparable model evaluation. Spatial cross-validation commonly required for susceptibility mapping is therefore not directly applicable in this context.
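The cluster-wise 80/20 random split can be sketched as follows. This is an illustrative helper rather than the authors' code; the function name, the default seed, and the rounding rule for the test-set size are assumptions.

```python
import random


def split_by_cluster(cluster_labels, test_fraction=0.2, seed=42):
    """Randomly split sample indices into train/test within each cluster.

    Returns {label: {"train": [...], "test": [...]}} with sorted indices.
    """
    rng = random.Random(seed)

    # Group sample indices by cluster label.
    by_cluster = {}
    for index, label in enumerate(cluster_labels):
        by_cluster.setdefault(label, []).append(index)

    splits = {}
    for label, indices in by_cluster.items():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(test_fraction * len(shuffled))
        splits[label] = {
            "train": sorted(shuffled[n_test:]),
            "test": sorted(shuffled[:n_test]),
        }
    return splits
```

Splitting inside each cluster, rather than once over the pooled data, keeps the train/test proportion fixed even for the smallest cluster.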
To establish a robust and computationally efficient baseline, four widely used tree-based algorithms, namely Random Forest (RF), eXtreme Gradient Boosting (XGB), CatBoost, and LightGBM, were first evaluated. Based on preliminary performance metrics, including the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE), LightGBM was selected as the base learner for further optimization. Subsequently, the Alpha Evolution (AE) algorithm was employed to optimize the LightGBM hyperparameters. For benchmarking purposes, two well-established automated optimization methods, Particle Swarm Optimization (PSO) and Bayesian Optimization (BO), were also implemented. For fair comparison, AE, PSO, and BO were executed under an identical optimization budget defined by the same number of objective function evaluations and the same hyperparameter search space, rather than identical iteration counts.
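For reference, the three evaluation metrics can be written out directly. The sketch below uses the standard definitions, with R² computed as one minus the ratio of the residual to the total sum of squares; whether the study's software computes R² in exactly this way is an assumption.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because RMSE squares the errors before averaging, a few large misses in a heavy-tailed cluster raise it much more than they raise MAE, which is one reason to report both.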
Figure 9 illustrates the convergence behavior of the best fitness value during the AE-based hyperparameter optimization of LightGBM. The fitness value decreases rapidly from 66 to 43 within the first two iterations and further declines to approximately 37 before entering a plateau phase characterized by minor fluctuations between 36.7 and 37.5. The global optimum is attained at the 18th iteration, with a fitness value of 36.8468, as indicated by the red circle. This convergence pattern suggests that the algorithm efficiently approaches the optimal region in the early stages and maintains stable performance thereafter.
Figure 10 compares AE-LightGBM model outputs with observations of landslide run-out distance L on the training set. Points generally align with the 1:1 reference line and the black fit line has a slope close to unity, indicating limited systematic bias. Residual histograms are narrowly centered at zero, suggesting predominantly small random errors. Importantly, the green fit line represents the AE-LightGBM model’s learned response rather than a simple linear regression: it is a data-driven curve produced by the tree ensemble’s piecewise nonlinear combinations of predictors. This indicates that the proposed model captures salient nonlinearities and interaction effects among covariates, consistent with the high R², low RMSE and MAE, and concentrated residuals. Overall, Clusters 1, 2, and 4 exhibit stronger agreement (R² larger than 0.95), whereas Cluster 3 is more dispersed but still coherent.
Figure 11 shows the corresponding test-set comparisons, where predictions remain tightly distributed around the 1:1 line and residuals stay concentrated near zero, indicating good generalization. As in the training dataset, Cluster 3 displays a looser pattern with heavier tails, which is likely due to its smaller sample size and higher variability. Clusters 1, 2, and 4 maintain higher consistency and narrower residual ranges. Collectively, the two figures show that AE-LightGBM reproduces
L robustly, with the learned nonlinear structure playing a key role in its accuracy and stability. Due to space constraints, we present in the main text only the training and testing dataset comparisons and residual distributions for the best-performing AE-LightGBM model. Prediction results of testing data for the six benchmark models including RF, XGB, CatBoost, LightGBM, PSO-LightGBM, and BO-LightGBM are provided in
Appendix B.
Figure 12 illustrates the R², RMSE, and MAE of the seven models for each cluster. In Cluster 1, errors are very close, with RMSE varying between 5.92 and 6.48. AE-LightGBM leads with an RMSE of 5.9168 and an R² of 0.9593, while CatBoost attains a marginally better MAE of 3.5771. Cluster 2 shows moderate separation, with RMSE ranging between 44.91 and 50.25: AE-LightGBM is best on all three metrics, with PSO-LightGBM and BO-LightGBM next, and XGB and CatBoost trailing. Cluster 3 is the hardest and most differentiated, with RMSE ranging between 20.32 and 83.67: AE-LightGBM dominates, with about 64% lower RMSE than vanilla LightGBM, while the baselines show poor fit. Cluster 4 is stable with small gaps, with RMSE between 6.03 and 6.80: AE-LightGBM again performs best, with about 4.8% lower RMSE than LightGBM. Overall, CatBoost and AE-LightGBM are nearly tied in Cluster 1, and AE-LightGBM attains the lowest RMSE in all four clusters and the best MAE and R² in Clusters 2, 3, and 4. For the mean evaluation metrics over the four clusters, AE-LightGBM attains the best overall performance. Relative to standard LightGBM, AE-LightGBM reduces error by roughly one third and improves R² by about 0.08. The other optimized LightGBM variants, BO-LightGBM and PSO-LightGBM, form the next tier, trailing AE by a modest margin while clearly outperforming the untuned baselines. In contrast, XGB, CatBoost, and RF yield substantially larger errors and lower goodness of fit. Overall, these cross-cluster averages indicate that coupling LightGBM with the Alpha Evolution (AE) algorithm yields consistently stronger and more robust generalization than both vanilla LightGBM and alternative baselines. See Table 1 for the average RMSE, MAE, and R² across all clusters.
The high predictive accuracy mainly reflects strong geometric constraints on runout distance rather than model memorization. Because mobility approximately follows energy-scaling relationships, a substantial fraction of variance is physically explainable, leading to a higher R² than is typical of site-specific hazard models.
Figure 13 compares the residual distributions of the seven models across the four clusters. AE-LightGBM consistently produces the narrowest, highest peaks centered near zero, indicating superior predictive stability and accuracy. Vanilla LightGBM performs better than XGB, RF, and CatBoost, and among the optimized variants, AE-LightGBM outperforms both PSO-LightGBM and BO-LightGBM. Across clusters, the residuals are approximately symmetric and bell-shaped around zero, consistent with a near-normal distribution. Notably, Cluster 3 exhibits a much more dispersed residual pattern with heavier tails, implying lower predictive accuracy than Clusters 1, 2, and 4; this behavior is likely attributable to its smaller sample size and greater data heterogeneity. Synthesizing the evaluation metrics R², RMSE, and MAE with these residual analyses, we conclude that AE-LightGBM is the most competitive model. Accordingly, we adopt AE-LightGBM as the final predictor and develop a TreeSHAP-based explainer in the following section to quantify feature contributions and analyze the principal drivers of the response, including cross-cluster heterogeneity, interaction effects, and local and global attributions.
4.3. Explanatory Analysis Based on TreeSHAP
For each cluster, an AE-LightGBM model was trained independently, and feature importance was quantified using the mean absolute SHAP value. As shown in Figure 14, across all four clusters, the results exhibit a consistent hierarchy: the vertical drop H is the dominant predictor of runout distance L, with an importance substantially exceeding that of the other covariates. The slope angle and source area A show comparatively minor and broadly similar contributions, whereas the source volume V has a negligible effect within the fitted models. Cluster 3, which represents large, high-energy events, is the only notable deviation from this baseline: although H remains predominant, the relative importance of A increases, indicating that planform extent provides additional explanatory power for extreme cases. Clusters 1, 2, and 4 conform to the general pattern: H dominant, the slope angle and A secondary, and V minimal. Overall, the TreeSHAP analysis is consistent with physical expectations: the gravitational-potential proxy H governs runout across regimes, with source geometry becoming more influential in the tail of large events, while the slope angle and V contribute only marginal improvements in predictive performance.
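The global importance measure used here, the mean absolute SHAP value per feature, reduces to a column-wise average. The sketch below assumes the per-instance attributions are already available as rows of numbers; the function name and the values in the usage note are illustrative only, not results from the study.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, largest first.

    shap_rows: one list of attributions per instance, ordered as
    feature_names.
    """
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)
```

For instance, two made-up attribution rows `[10, -1, 0.5, 0]` and `[-8, 2, -0.5, 0]` for the features `["H", "slope", "A", "V"]` rank H first with a mean absolute value of 9.0. Taking absolute values before averaging prevents positive and negative contributions from cancelling.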
Figure 15 summarizes the global contribution of each input variable. In all clusters, H exhibits SHAP values that are predominantly positive with a pronounced right tail, indicating a consistently strong and positive impact on L. The contributions of the slope angle and A are comparatively modest and centered near zero, reflecting weaker and directionally mixed effects. A displays a longer positive tail in Cluster 3, implying increased relevance for large events. The SHAP values of V are tightly concentrated around zero across clusters, indicating negligible marginal influence within the present modeling framework. Differences in the horizontal scale among panels further indicate that Clusters 2 and 3 are associated with greater output sensitivity than Clusters 1 and 4.
As H was identified as the most influential predictor, Figure 16 focuses specifically on the SHAP dependence of H in each cluster. Across all clusters, the SHAP value of H increases approximately monotonically with H, indicating a robust positive marginal effect of H on the predicted runout distance L. Clusters with larger event scales (notably Cluster 2 and Cluster 3) display wider ranges of SHAP values, consistent with stronger output sensitivity to variation in H. Color gradients suggest limited but non-negligible interactions: the slope angle modulates the effect of H in Clusters 1, 2, and 4, whereas A plays a more pronounced interacting role in Cluster 3. Overall, the dependence curves corroborate the conclusion that H is the primary driver of L across regimes, while interactions become more evident for large events.
Figure 17 shows two-dimensional response surfaces from AE-LightGBM, giving the predicted L over selected feature pairs. Surfaces involving H (the A–H, slope angle–H, and V–H panels) display a dominant gradient along the H axis: predicted L increases markedly with H, with a steeper rise in the mid-to-high range of H. By comparison, variations along A, the slope angle, or V at fixed H produce smaller changes in L, indicating weaker marginal effects. The A–V and slope angle–V surfaces are largely flat along V, corroborating the minimal role of V observed in the SHAP analyses. Moderate shifts along the slope angle and, in specific ranges, along A yield secondary adjustments to L, consistent with their status as auxiliary modulating factors. These bivariate patterns align with the univariate SHAP findings and reinforce the conclusion that H governs runout across clusters, while the slope angle and A provide limited modulation and V contributes negligibly.
Beyond confirming variable dominance, the response surfaces indicate regime-dependent sensitivity rather than purely monotonic dependence. The gradient of predicted L with respect to relief H increases at larger values, suggesting a progressive transition from friction-limited motion toward inertia-enhanced mobility. This behavior implies a continuous mechanical transition rather than a discrete threshold. Similarly, the increasing influence of source area in high-energy cases reflects enhanced lateral spreading and internal deformation processes. These patterns suggest that landslide mobility evolves gradually with scale, and the model captures nonlinear changes in response intensity rather than simple proportional scaling.
The SHAP attribution patterns can be interpreted in terms of landslide mobility mechanics rather than purely statistical importance. The dominance of the relief H indicates that run-out distance is primarily controlled by gravitational potential energy. This is consistent with classical mobility scaling relationships in which the ratio H/L approximates an effective friction coefficient. Under this framework, the relatively weak contribution of slope angle suggests that local inclination mainly modulates motion once the total drop height is fixed.
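The link between the relief-to-runout ratio and an effective friction coefficient can be made explicit with the classical sliding-block energy balance. The derivation below is a simplified sketch under assumptions that are ours rather than the study's: a rigid block of mass $m$ with constant Coulomb friction coefficient $\mu$, starting and ending at rest, with $\beta$ the local slope angle, $s$ the distance along the path, and $x$ the horizontal coordinate.

```latex
% Potential energy released equals the work done against friction.
% The normal force is m g cos(beta), and cos(beta) ds = dx.
\begin{align}
  m g H &= \int_{\text{path}} \mu \, m g \cos\beta \, \mathrm{d}s
         = \mu \, m g \int_{0}^{L} \mathrm{d}x
         = \mu \, m g L, \\
  \mu   &= \frac{H}{L}.
\end{align}
```

The mass cancels, so in this idealization the ratio depends on neither volume nor path shape, which is in line with the weak attributions found for V and for the slope angle once H is fixed.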
The negligible influence of volume V implies that mobility is not directly mass-controlled but energy-controlled, consistent with observations that landslides of different magnitudes often follow similar run-out scaling. In contrast, the increased contribution of source area A within the extreme-event cluster suggests enhanced lateral spreading and internal deformation in high-energy failures. Therefore, the explainable machine learning results recover physically meaningful mobility behavior rather than providing only statistical correlations.
5. Discussion
5.1. Cluster-Aware Modeling Implications Under Sample Imbalance
The clustering procedure yielded groups with markedly different sample sizes. In particular, Cluster 3 contains substantially fewer observations than the remaining three clusters. This imbalance is reflected in model performance: the cluster-wise AE-LightGBM fitted in Cluster 3 exhibits lower predictive accuracy and higher variance relative to the models trained in the larger clusters. We interpret this discrepancy as a property of the data rather than a methodological artifact. From a geomorphic perspective, the clustering partitions the dataset into statistically inferred regimes representing distinct ranges of event scale and slope configuration. The compiled sample is dominated by small- to medium-sized landslides, whereas Cluster 3 corresponds to a comparatively rare high-magnitude regime. Consequently, the model in this subset observes a narrower and less representative span of predictor–L relationships and has fewer opportunities to learn stable nonlinear interaction patterns.
It should be noted that the imbalance is intrinsic to the physical occurrence of landslides rather than a sampling artifact. The rare-event cluster represents high-magnitude failures occupying the heavy tail of the distribution. Applying resampling or synthetic data augmentation would artificially modify the frequency structure and potentially distort the relationship between predictors and run-out distance. Therefore, instead of enforcing balanced performance across clusters, the reduced accuracy in this regime is interpreted as increased epistemic uncertainty, which is consistent with the objective of regime-aware analysis rather than benchmark optimization.
Although cross-validated training and early stopping mitigate overfitting, residual epistemic uncertainty remains larger in this regime. This behavior also illustrates a key distinction between the present cluster-aware framework and conventional clustering-based modeling strategies. Rather than treating clustering as a preprocessing step alone, regime identification directly governs model construction, hyperparameter optimization, and interpretability analysis. As a result, data imbalance influences not only statistical representation but also regime-specific predictive stability and attribution consistency. From a practical standpoint, risk communication and decision thresholds should acknowledge the wider predictive intervals associated with the rare-event regime, and future data collection should prioritize underrepresented regimes to improve balance and reduce uncertainty.
5.2. Roles of Predictors and Modeling Implications
The predictors area A and volume V exhibit substantial positive correlation, as shown in Figure 18. A reductionist approach might remove one of the two to limit redundancy. However, landslide looseness, internal structure, and thickness vary across events, implying that bulk density and vertical extent can differ even for similar planform areas. Volume therefore conveys information about effective mass and potential energy that is not fully captured by area, while area may reflect lateral spreading potential and roughness interactions not encoded by volume alone. For these reasons, both A and V are retained. The cluster-wise AE-LightGBM can exploit their partially overlapping but distinct signals, and TreeSHAP then quantifies their conditional, marginal contributions relative to the other predictors. In parallel, we monitor attribution stability to guard against interpretability distortions arising from multicollinearity; where strong redundancy is detected, future extensions may incorporate composite features or penalization schemes that preserve physically meaningful signal.
The use of crown–toe relief H highlights an important distinction between explanatory and predictive modeling. Although H is generally unknown prior to failure, it is a physically meaningful descriptor of potential energy and event scale within historical datasets. Its inclusion in this study was motivated by its widespread use in empirical and statistical run-out analyses as a proxy for gravitational potential energy and the overall geometric scale of the event. This situation is common in mobility studies of long run-out mass movements, where the geometric drop height becomes available only after the event. Consistent with this perspective, the TreeSHAP results indicate that H plays a dominant role in explaining model outputs, reinforcing its physical relevance for describing mobility behavior within historical-event datasets. Therefore, the present framework should not be interpreted as a pre-event forecasting tool but as a post-event mobility characterization and comparative analysis approach consistent with established empirical run-out studies [54].
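The role of H as an energy proxy can be illustrated with the classical energy-line (sliding-block) idealization, in which the potential energy released over the drop height is dissipated by Coulomb friction along the path. This is a textbook simplification offered for orientation, not a result derived in this study; the symbols m (moving mass), g (gravitational acceleration), \mu (effective friction coefficient), and \alpha (travel angle) are introduced here as assumptions.

```latex
% Assumed symbols (not from the source): m = moving mass, g = gravitational
% acceleration, \mu = effective friction coefficient, \alpha = travel angle.
% H = crown-toe relief, L = horizontal run-out distance.
\begin{align}
  E_p &= m g H, \\
  W_f &= \mu\, m g L, \\
  E_p = W_f \;&\Rightarrow\; \frac{H}{L} = \mu = \tan\alpha
  \;\Rightarrow\; L = \frac{H}{\mu}.
\end{align}
```

In this idealization the mass cancels, so L scales directly with H, and any dependence on A or V can enter only through the effective friction coefficient.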
5.3. Limitations and Directions for Future Work
The proposed approach has defined applicability limits. The predictors describe geometric controls but do not explicitly represent material strength, path roughness, or hydrological conditions, and deviations may occur where rheological effects dominate. The compiled inventory contains observational bias typical of historical datasets, where accessible and moderate-sized events are preferentially recorded. Consequently, the learned relationships represent a conditional empirical mobility relation valid within the range of geomorphic conditions present in the training data. The framework is most reliable for interpolation within comparable environments, while extrapolation to regions with substantially different lithology, climate, or topographic configuration should be undertaken cautiously.
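The distinction between interpolation and extrapolation can be made operational with a simple range check against the training inventory. The following is a rough sketch under stated assumptions: the dictionary keys (including `slope_deg`) and the sample values are hypothetical, and a per-predictor min/max envelope is only a crude proxy for the sampled geomorphic conditions, since it ignores joint structure among predictors.

```python
def fit_ranges(rows):
    """Per-predictor (min, max) envelope from a list of training records."""
    keys = rows[0].keys()
    return {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in keys}


def out_of_range(sample, ranges):
    """Names of predictors whose value falls outside the training envelope."""
    return [k for k, (low, high) in ranges.items()
            if not low <= sample[k] <= high]


if __name__ == "__main__":
    # Hypothetical training records; not data from the study.
    train = [
        {"H": 120.0, "A": 2.0e3, "V": 1.5e4, "slope_deg": 32.0},
        {"H": 450.0, "A": 8.0e4, "V": 9.0e5, "slope_deg": 41.0},
        {"H": 60.0, "A": 5.0e2, "V": 2.0e3, "slope_deg": 25.0},
    ]
    ranges = fit_ranges(train)
    new_event = {"H": 900.0, "A": 4.0e4, "V": 3.0e5, "slope_deg": 30.0}
    flagged = out_of_range(new_event, ranges)
    print("extrapolating in:", flagged if flagged else "none")
```

A prediction for an event flagged in any predictor would be an extrapolation in the sense described above and would warrant the cautious treatment recommended there. Passing the check does not guarantee comparable lithology or climate, which the geometric predictors do not encode.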
6. Conclusions
This study introduced a cluster-aware and explainable framework for predicting landslide run-out distance L from four predictors . Information-criterion diagnostics supported a four-cluster partition, reflecting the dominance of small- to medium-sized landslides in the compiled dataset. We first established a baseline by comparing Random Forest, eXtreme Gradient Boosting, CatBoost, and LightGBM on consistent splits, selected LightGBM as the best learner, and then optimized it within each cluster using the Alpha Evolution (AE) algorithm. PSO- and BO-optimized LightGBM variants served as benchmarks. The final predictor, AE-LightGBM, achieved the best cross-cluster averages with mean . Relative to vanilla LightGBM, the error was reduced by approximately one third and increased by about on average. Cluster-wise testing results corroborated this pattern. Residual distributions across clusters were approximately symmetric and narrowest for AE-LightGBM, indicating improved stability.
The explainability analysis indicates that run-out distance is primarily controlled by gravitational potential energy, represented by the relief H, while slope inclination and source geometry act as secondary modulators. The increased contribution of source area in the rare-event regime suggests enhanced spreading processes in large, high-energy failures. These findings support an energy-dominated mobility scaling rather than a mass-dominated one, and indicate that predictive uncertainty is itself scale dependent, with extreme events inherently less predictable.
The framework should be interpreted as a conditional empirical mobility model rather than a universal predictive law. Because only geometric descriptors are included, deviations may occur in environments where material strength, rheology, or hydrological conditions dominate. In addition, the compiled inventory contains observational bias typical of historical datasets, and the model is therefore most reliable for interpolation within comparable geomorphic conditions rather than extrapolation to entirely different regions.
Overall, the study shows that combining regime-aware learning with interpretable attribution can provide both predictive capability and physical insight into landslide mobility. Future work should expand observations of rare events and incorporate proxies for material and path properties to improve generalization and reduce epistemic uncertainty.