Accurate prediction of landslide run-out distance is fundamental to hazard mapping, emergency planning, and risk-informed engineering design. However, many data-driven studies implicitly treat landslides as a homogeneous population and offer little physically interpretable insight into how geomorphic factors govern run-out behavior. To address these limitations, we propose a cluster-aware and explainable modeling framework that predicts run-out distance L from four source-region and slope descriptors: crown–toe relief H, source area A, source volume V, and mean source-slope inclination. The dataset consists of 10,159 rainfall-induced landslides compiled from official inventories and peer-reviewed literature. After the predictors are standardized, the optimal number of clusters is selected using information criteria (AIC/BIC), and k-means clustering is then applied to identify distinct landslide regimes. We first benchmark Random Forest, eXtreme Gradient Boosting, CatBoost, and LightGBM on identical data splits without hyperparameter tuning, using R², RMSE, and MAE as performance metrics. LightGBM consistently outperforms the alternatives and is therefore selected as the base learner. Within each cluster, LightGBM is further optimized using the Alpha Evolution (AE) algorithm, with Particle Swarm Optimization and Bayesian Optimization serving as benchmarks. The resulting AE-LightGBM model achieves the highest predictive accuracy across clusters. Interpretability is provided by TreeSHAP, which decomposes each prediction into a cluster-specific baseline plus additive contributions from H, A, V, and the mean source-slope inclination. By integrating regime-sensitive learning with robust explainability, the proposed framework improves run-out distance prediction while providing transparent, physically meaningful insights to support scenario analysis and engineering decision-making.
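The cluster-count step can be illustrated with a short sketch. It assumes the AIC/BIC values come from a diagonal-covariance Gaussian mixture fitted by EM for each candidate k, which is a common way to attach a likelihood to a clustering; the abstract does not state which likelihood the authors used. The helper names, the synthetic two-descriptor data, and the settings (`k_max`, `restarts`, `var_floor`) are hypothetical.

```python
import math
import random


def standardize(X):
    """Z-score each column so every predictor has zero mean and unit variance."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    sd = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return [[(row[j] - mu[j]) / sd[j] for j in range(d)] for row in X]


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def e_step(X, weights, means, variances):
    """Responsibilities and total log-likelihood, using log-sum-exp for stability."""
    resp, loglik = [], 0.0
    for x in X:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        lse = top + math.log(sum(math.exp(s - top) for s in logs))
        loglik += lse
        resp.append([math.exp(s - lse) for s in logs])
    return resp, loglik


def fit_gmm(X, k, rng, iters=40, var_floor=1e-3):
    """EM for a diagonal-covariance Gaussian mixture; returns the fitted log-likelihood."""
    n, d = len(X), len(X[0])
    # k-means++-style (D^2) seeding of the component means
    means = [list(rng.choice(X))]
    while len(means) < k:
        d2 = [min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means) for x in X]
        means.append(list(rng.choices(X, weights=d2)[0]))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp = e_step(X, weights, means, variances)[0]
        for c in range(k):  # M-step: re-estimate weight, mean, and variance per component
            rc = [r[c] for r in resp]
            nc = sum(rc) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(g * x[j] for g, x in zip(rc, X)) / nc for j in range(d)]
            variances[c] = [max(var_floor,
                                sum(g * (x[j] - means[c][j]) ** 2 for g, x in zip(rc, X)) / nc)
                            for j in range(d)]
    return e_step(X, weights, means, variances)[1]


def select_k(X, k_max=5, restarts=3, seed=0):
    """Score k = 1..k_max by AIC and BIC; return the BIC-minimizing k and all scores."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    scores = {}
    for k in range(1, k_max + 1):
        loglik = max(fit_gmm(X, k, rng) for _ in range(restarts))
        p = k * (2 * d + 1) - 1  # means + variances + (k - 1) free mixing weights
        scores[k] = (2 * p - 2 * loglik, p * math.log(n) - 2 * loglik)  # (AIC, BIC)
    return min(scores, key=lambda k: scores[k][1]), scores


# Synthetic stand-in for two descriptors drawn from three regimes.
rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
raw = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)] for cx, cy in centers for _ in range(40)]
Z = standardize(raw)
best_k, scores = select_k(Z)
print(best_k, {k: round(bic, 1) for k, (aic, bic) in scores.items()})
```

With three well-separated synthetic groups, BIC should typically bottom out at k = 3. AIC charges 2 per parameter instead of ln n, so on large inventories it penalizes extra clusters less heavily than BIC and tends to favor larger k.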
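Once k is fixed, the k-means step might look like the following. This is a minimal sketch of Lloyd's algorithm with k-means++ seeding on already-standardized predictors; it also assumes, for illustration, that a new event is routed to the nearest centroid before that cluster's model is applied. `kmeans`, `assign`, and the toy data are hypothetical, not the authors' code.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X, k, iters=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(rng.choice(X))]
    while len(centroids) < k:
        # D^2 seeding: far-away points are more likely to become the next centroid
        d2 = [min(sq_dist(x, c) for c in centroids) for x in X]
        centroids.append(list(rng.choices(X, weights=d2)[0]))
    labels = None
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid
        new_labels = [assign(x, centroids) for x in X]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are already stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def assign(x, centroids):
    """Index of the nearest centroid, e.g. to route a new event to its regime's model."""
    return min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))


X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
     [10.0, 0.0], [10.1, 0.0], [10.0, 0.1]]
centroids, labels = kmeans(X, k=3)
print(labels)
print(assign([9.9, 0.05], centroids) == labels[6])
```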
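For reference, the three benchmark metrics can be computed as below, with R² = 1 - SS_res/SS_tot, RMSE the square root of the mean squared residual, and MAE the mean absolute residual. In a benchmark like the one described, the same function would be applied to every model's predictions on the identical held-out split; the sample values are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted run-out distances."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_obs = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)             # residual sum of squares
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Made-up observed vs. predicted run-out distances (m) on a shared test split.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]
print(regression_metrics(observed, predicted))  # R2 = 0.98, RMSE ≈ 15.81, MAE = 15.0
```

RMSE weights large misses more heavily than MAE, so reporting both shows whether a model's errors are dominated by a few long-run-out outliers.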
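Particle Swarm Optimization, one of the benchmark tuners, can be sketched in a few lines: each particle's velocity is pulled toward its own best position and toward the swarm's best. Here the objective is a hypothetical stand-in for a cluster's LightGBM validation error over two hyperparameters rescaled to [0, 1]. The inertia and acceleration coefficients (`w`, `c1`, `c2`) are common textbook values rather than the paper's settings, and this is not the Alpha Evolution algorithm.

```python
import random


def pso_minimize(loss, bounds, n_particles=20, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO over a box; returns (best_position, best_loss)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position so far
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                lo, hi = bounds[j]
                pos[i][j] = min(hi, max(lo, pos[i][j] + vel[i][j]))  # stay inside the box
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def stand_in_loss(params):
    """Hypothetical smooth surrogate for validation error, minimized at (0.3, 0.7)."""
    return (params[0] - 0.3) ** 2 + (params[1] - 0.7) ** 2


best, best_val = pso_minimize(stand_in_loss, [(0.0, 1.0), (0.0, 1.0)])
print([round(b, 3) for b in best], best_val)
```

In a real tuning loop each `loss` call would train and validate a model, so the evaluation budget (`n_particles * iters`) is usually what differs most between PSO, Bayesian Optimization, and other tuners.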
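The additive decomposition that TreeSHAP reports can be checked on a toy model with a brute-force Shapley computation. This sketch uses the interventional definition, in which features absent from a coalition are filled in from background rows. If those rows are drawn from one cluster, the baseline v(∅) becomes that cluster's expected prediction, which is one way to read "cluster-specific baselines". TreeSHAP obtains Shapley values for tree ensembles in polynomial time instead of enumerating all 2^d coalitions, and it can also run in a path-dependent mode without an explicit background set. The toy model and data are hypothetical, with features ordered as standardized (H, A, V, slope).

```python
import itertools
import math


def shapley_values(f, x, background):
    """Exact interventional Shapley values of f at x.

    v(S) averages f over the background rows after overwriting the features in S
    with x's values. Cost grows as 2^d * len(background), so this is for tiny d only.
    Returns (baseline, phi) where baseline = v(empty set).
    """
    d = len(x)

    def value(S):
        rows = [[x[j] if j in S else b[j] for j in range(d)] for b in background]
        return sum(f(z) for z in rows) / len(rows)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # classic Shapley weight |S|! (d - |S| - 1)! / d!
            weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return value(set()), phi


# Hypothetical toy model on standardized (H, A, V, slope): additive in H and slope,
# with an A x V interaction to show how Shapley splits shared credit.
def toy_model(z):
    return 3.0 * z[0] + z[1] * z[2] - 0.5 * z[3]


background = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [-1.0, 0.5, 2.0, 0.0]]
x = [1.0, 2.0, 1.5, -1.0]
baseline, phi = shapley_values(toy_model, x, background)
print(baseline, [round(p, 3) for p in phi])
print(abs(baseline + sum(phi) - toy_model(x)) < 1e-9)  # efficiency: pieces add up to f(x)
```

The last line checks the efficiency property, baseline + Σ φᵢ = f(x), which is what makes each prediction readable as a baseline plus per-descriptor contributions.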