Improving the Generalization Performance of Debris-Flow Susceptibility Modeling by a Stacking Ensemble Learning-Based Negative Sample Strategy
Abstract
1. Introduction
2. Study Area and Data
2.1. Study Area
2.2. Selection of Conditioning Factors
3. Methods
3.1. Negative Sample Screening via Stacking Ensemble Learning
3.2. Stacking–Random Forest Model
3.3. Validation Metrics
3.4. SHAP Model Explanation
- Global explanation: Using SHAP summary plots to identify overall feature importance.
- Local explanation: Applying force plots and waterfall charts to explain specific sample predictions.
- Dependency analysis: Revealing nonlinear relationships and interactions through dependence plots, especially for top features like distance from roads or POI density.
- Theoretical foundation: SHAP values adhere to game theory axioms—efficiency, symmetry, dummy, and additivity—ensuring local accuracy and fairness.
- Additive decomposition: SHAP decomposes any prediction into base value and feature contributions.
- Model agnosticism: SHAP works with tree models, neural networks, and more, and is highly effective in revealing nonlinear interactions in random forests.
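The additive decomposition and the efficiency axiom listed above can be illustrated with a brute-force Shapley computation. This is a minimal sketch, not the paper's pipeline: `toy_model`, its coefficients, and the all-zero baseline are hypothetical, and exact enumeration of coalitions is only practical for a handful of features (libraries such as SHAP use faster approximations or tree-specific algorithms).

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating all feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(features):
    """Hypothetical scoring function with one interaction term (not the paper's model)."""
    road_proximity, poi_density, elevation = features
    return 0.2 + 0.3 * road_proximity + 0.1 * poi_density * road_proximity - 0.05 * elevation


x = [1.0, 2.0, 3.0]          # hypothetical sample
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)

# Efficiency: base value plus all contributions reproduces the prediction
reconstructed = base_value + sum(phi)
```

In this toy case the interaction term is split equally between the two features that take part in it, which is the symmetry axiom at work.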
4. Results
4.1. Model Performance Comparison
4.1.1. Baseline Model Performance Evaluation
4.1.2. Performance Evaluation of the Stacking Ensemble Model and Stacking–Random Forest Model
4.2. Negative Sample Screening via Stacking Ensemble Model
4.2.1. Negative Sample Screening Mechanism
- Integrating predictions from four base learners (random forest, logistic regression, decision tree, and gradient boosting decision tree).
- Employing a multi-factor collaborative filtering mechanism through the dynamic adjustment of sample weights, guided by prior geological knowledge. This mechanism comprehensively analyzes all 19 conditioning factors, including strata, topographic relief, and distance from faults.
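The screening idea can be sketched as follows. This is a simplified illustration under stated assumptions: the four base-learner probabilities are fused with a plain mean, which stands in for the stacking meta-learner and the knowledge-guided sample weighting described above, and the `threshold` of 0.1 and all probability values are hypothetical.

```python
from statistics import mean


def screen_negative_candidates(base_probabilities, threshold=0.1):
    """Return indices of candidate cells whose fused susceptibility is below threshold.

    base_probabilities maps a learner name to its predicted debris-flow
    probability for each candidate cell (all lists must have equal length).
    The plain mean is an assumed stand-in for a trained meta-learner.
    """
    per_cell = zip(*base_probabilities.values())
    fused = [mean(cell) for cell in per_cell]
    selected = [i for i, p in enumerate(fused) if p < threshold]
    return selected, fused


# Hypothetical probabilities for five candidate cells
base_probabilities = {
    "RF":   [0.02, 0.40, 0.05, 0.90, 0.08],
    "LR":   [0.05, 0.35, 0.10, 0.80, 0.20],
    "DT":   [0.00, 0.50, 0.00, 1.00, 0.00],
    "GBDT": [0.03, 0.45, 0.05, 0.95, 0.16],
}
selected, fused = screen_negative_candidates(base_probabilities)
```

Only cells that every learner rates as very unlikely survive the cut, so a single confident learner cannot admit a doubtful cell as a negative sample on its own.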
4.2.2. Validation of Geographical Characteristics of Very Low-Susceptibility Zones
4.3. Susceptibility Zonation Using the Stacking–Random Forest Model
4.4. Model Cross-Regional Validation in Qingchuan County
- (1) Conventional RF model:
  - Over 70 percent of historical disasters occurred within zones classified as very low to low susceptibility, indicating a high false negative rate (missed events);
  - Fewer than 5 percent of disasters were located in high- or very high-susceptibility zones, demonstrating severe underprediction;
  - No events fell in very high-susceptibility zones.
- (2) Stacking-RF model:
  - The proportion of disasters in very low- to low-susceptibility zones fell to 22.73%;
  - The proportion in high- to very high-susceptibility zones rose to 53.41%;
  - The disaster density in very high-susceptibility zones increased from 0 to 0.0469 locations/km².
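The percentages quoted in this comparison follow directly from the event counts reported per susceptibility level in the Qingchuan table (88 historical events for each model). A short arithmetic check, with variable names chosen here for illustration:

```python
levels = ["Very low", "Low", "Medium", "High", "Very high"]

# Historical disaster events per susceptibility level, from the Qingchuan table
events = {
    "RF": [17, 45, 22, 4, 0],
    "Stacking-RF": [0, 20, 21, 39, 8],
}


def share(counts, start, stop):
    """Percentage of all events that fall in levels[start:stop]."""
    return 100.0 * sum(counts[start:stop]) / sum(counts)


summary = {
    model: {
        "very_low_to_low": round(share(counts, 0, 2), 2),
        "high_to_very_high": round(share(counts, 3, 5), 2),
    }
    for model, counts in events.items()
}
```

For the RF model this gives 70.45% and 4.55%, and for the Stacking-RF model 22.73% and 53.41%, matching the figures in the list.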
4.5. SHAP Explanation
4.5.1. Global Explanation
- Feature importance ranking plot (ordered by descending mean |SHAP| value);
- SHAP summary plot (depicting the distributional relationship between feature values and SHAP values).
Feature Importance Ranking
Analysis of Feature Influence Direction and Patterns
- (1) Key linear features
- (2) Non-linear features
- (3) Secondary features
4.5.2. Local Explanation
- (1) Distance from the road (Figure 12a)
- (2) POI kernel density (Figure 12b)
- (3) Elevation (Figure 12c)
4.5.3. Cross-Regional Mechanism Stability Validation: Qingchuan
- Factor importance rankings and SHAP summary plots in Qingchuan County align closely with those of Wenchuan County.
- The top six features exhibit particularly high consistency.
- The distributions of factors such as distance from the road and POI kernel density further corroborate the universal principle: “Human activity intensity dominates debris-flow risk.”
- Samples with the highest predictions are primarily driven by features like the distance from the road = 0 and distance from the river ≈ 0, collectively elevating the model’s high-risk probability assessment.
- Conversely, samples with the lowest predictions display numerous features, including elevation = 3671.92 m and POI kernel density = 0, exerting an overall negative contribution to debris-flow prediction and thereby substantially suppressing the output probabilities.
- Marked contrasts in the key factor values validate the model’s sensitivity to risk-level transitions between engineering-disturbed zones and sparsely populated areas, confirming the stability and transferability of feature threshold effects across regions.
- The negative sample strategy derived from ensemble learning establishes a more reliable modeling foundation for new regions, significantly improving high-susceptibility zone identification.
- The human activity–terrain environment synergy mechanism remains highly consistent across the regions, reflecting the model’s profound capability to characterize geological disaster formation.
5. Discussion
5.1. Methodological Innovation: Integrated Negative Sample Screening Enhances Model Generalization
5.2. Mechanistic Analysis: Nonlinear Coupling Mechanisms of Human Activities and Topography
5.3. Uncertainty: Boundary Ambiguity and Dynamic Factor Deficiencies
- A notable source of uncertainty stems from model ambiguity near the critical thresholds, particularly where NDVI values approach 0.1—indicating sparse vegetation—and when slope angles approximate 35°, which is commonly recognized as a geomorphic instability threshold. Under these conditions, the model demonstrates inconsistent prediction confidence, which is likely due to complex nonlinear interactions and overlapping susceptibility characteristics. While SHAP analysis reveals the sensitivity of predictions in such regions, it does not resolve classification ambiguity. Future improvements could involve integrating NDVI time-series datasets to better reflect seasonal vegetation variability, or implementing fuzzy classification and hybrid threshold-based models to address boundary uncertainty and improve decision-making clarity.
- Static data constraints weaken the model’s temporal predictive capability by excluding dynamic factors such as hourly rainfall peaks, seasonal vegetation fluctuations, and slope deformation trends. These missing dynamics reduce the model’s responsiveness to event-based triggers (e.g., > 50 mm/h of rainfall) or temporal slope instabilities. Future work should consider integrating satellite-based InSAR deformation time series and TRMM/GPM rainfall datasets to model the short-term triggering mechanisms. Additionally, multi-temporal NDVI products can enhance the characterization of vegetation dynamics. Temporal sequence modeling techniques, such as attention-based LSTM or Transformer frameworks, will be explored to incorporate these dynamic factors and improve the model’s sensitivity to time-varying debris-flow triggers.
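The fuzzy classification suggested in the first item could start from membership functions that replace the hard cut-offs at a slope of 35° and an NDVI of 0.1. The sketch below is one hypothetical form, not something the study implemented: the linear ramp shape and the half-widths (5° for slope, 0.05 for NDVI) are assumptions chosen only for illustration.

```python
def ramp_membership(value, center, half_width):
    """Linear ramp: 0 at or below center - half_width, 1 at or above center + half_width."""
    low = center - half_width
    high = center + half_width
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def steep_slope_membership(slope_deg):
    """Degree to which a slope counts as steep; 35 degrees maps to 0.5 (assumed half-width 5)."""
    return ramp_membership(slope_deg, center=35.0, half_width=5.0)


def sparse_vegetation_membership(ndvi):
    """Degree to which vegetation counts as sparse; lower NDVI means sparser cover."""
    return 1.0 - ramp_membership(ndvi, center=0.1, half_width=0.05)


# Hypothetical cells near and away from the thresholds
near_threshold = (steep_slope_membership(36.0), sparse_vegetation_membership(0.11))
clear_case = (steep_slope_membership(45.0), sparse_vegetation_membership(0.02))
```

Cells close to a threshold receive intermediate memberships instead of flipping class, which makes the boundary ambiguity explicit rather than hiding it behind a binary label.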
5.4. Practical Implications: Precision Mitigation and Spatial Planning
- Delineate priority avoidance zones encompassing 500-meter buffer zones along roads and medium- to low-altitude areas where POI kernel density exceeds 50 units/km².
- Enhance engineering standards to require that slope reinforcement along roads be designed to withstand rainfall events with a 50-year return period.
- Enforce development restrictions in very high-susceptibility zones, which cover 15.86% of the study area and account for 92.3% of historical debris-flow events, through residential construction bans and the relocation of populations to low-susceptibility areas, characterized by topographic relief below 60 m.
5.5. Model Selection and Performance Divergence: Mechanistic Interpretations of Predictive Discrepancies
5.6. Future Work: Toward Robust Validation and Field Integration
6. Conclusions
- (1) Negative sample screening significantly improves model generalization and reliability.
- (2) The Stacking-RF model enables high-precision early warning and cross-regional applicability.
- (3) Human activity factors are dominant in the nonlinear driving mechanism of debris flows.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
SHAP | Shapley Additive Explanations |
LR | Logistic Regression |
DT | Decision Tree |
GBDT | Gradient Boosting Decision Tree |
RF | Random Forests |
Stacking-RF | Stacking–Random Forest |
SVM | Support Vector Machine |
NB | Naïve Bayes |
ROC | Receiver Operating Characteristic |
AUC | Area Under the ROC Curve |
DEM | Digital Elevation Model |
TWI | Topographic Wetness Index |
NDVI | Normalized Difference Vegetation Index |
STI | Sediment Transport Index |
SPI | Stream Power Index |
InSAR | Interferometric Synthetic Aperture Radar |
TRMM | Tropical Rainfall Measuring Mission |
Models | Dataset | ACC | Recall | F1-Score | Precision | AUC |
---|---|---|---|---|---|---|
LR | Training set | 0.8766 | 0.8846 | 0.8790 | 0.8734 | 0.8765 |
LR | Testing set | 0.8182 | 0.8750 | 0.8235 | 0.7778 | 0.8704 |
RF | Training set | 0.9935 | 1.0000 | 0.9936 | 0.9873 | 0.9934 |
RF | Testing set | 0.8030 | 0.9062 | 0.8169 | 0.7436 | 0.8952 |
NB | Training set | 0.7013 | 0.4615 | 0.6102 | 0.9000 | 0.7045 |
NB | Testing set | 0.7273 | 0.5000 | 0.6400 | 0.8889 | 0.8575 |
DT | Training set | 0.9935 | 0.9872 | 0.9935 | 1.0000 | 0.9936 |
DT | Testing set | 0.8030 | 0.7812 | 0.7937 | 0.8065 | 0.8024 |
GBDT | Training set | 0.9935 | 1.0000 | 0.9936 | 0.9873 | 0.9934 |
GBDT | Testing set | 0.8030 | 0.9062 | 0.8169 | 0.7436 | 0.9035 |
SVM | Training set | 0.6299 | 0.3333 | 0.4771 | 0.8387 | 0.6338 |
SVM | Testing set | 0.6061 | 0.2188 | 0.3500 | 0.8750 | 0.6769 |
Models | Dataset | ACC | Recall | F1-Score | Precision | AUC |
---|---|---|---|---|---|---|
Stacking ensemble model | Training set | 0.9805 | 1.0000 | 0.9811 | 0.9630 | 0.9979 |
Stacking ensemble model | Testing set | 0.8182 | 0.8750 | 0.8235 | 0.7778 | 0.9044 |
Stacking-RF model | Training set | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Stacking-RF model | Testing set | 0.9697 | 0.9375 | 0.9677 | 1.0000 | 0.9931 |
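The tabulated metrics can be cross-checked with a few lines of arithmetic. The sketch below recomputes the F1-score from the reported precision and recall on the testing set and derives the gap between training and testing accuracy as a rough indicator of overfitting. The values are copied from the two performance tables; the dictionary layout and the 0.001 rounding tolerance are choices made here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Testing-set values copied from the performance tables
test_metrics = {
    "LR": {"precision": 0.7778, "recall": 0.8750, "f1": 0.8235},
    "RF": {"precision": 0.7436, "recall": 0.9062, "f1": 0.8169},
    "Stacking-RF": {"precision": 1.0000, "recall": 0.9375, "f1": 0.9677},
}

# Reported F1 agrees with the recomputed value up to four-decimal rounding
f1_consistent = {
    model: abs(f1_score(m["precision"], m["recall"]) - m["f1"]) < 1e-3
    for model, m in test_metrics.items()
}

# Accuracy on (training set, testing set)
accuracy = {
    "RF": (0.9935, 0.8030),
    "Stacking-RF": (1.0000, 0.9697),
}
generalization_gap = {model: train - test for model, (train, test) in accuracy.items()}
```

The accuracy gap shrinks from about 0.19 for the baseline RF model to about 0.03 for the Stacking-RF model, which is the improvement in generalization that the comparison reports.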
Susceptibility Levels | RF: Historical Disaster Events | RF: Proportion of Historical Disasters | RF: Disaster Events Density (locations/km²) | Stacking-RF: Historical Disaster Events | Stacking-RF: Proportion of Historical Disasters | Stacking-RF: Disaster Events Density (locations/km²) |
---|---|---|---|---|---|---|
Very low | 17 | 19.32% | 0.0194 | 0 | 0.00% | 0.0000 |
Low | 45 | 51.14% | 0.0225 | 20 | 22.73% | 0.0186 |
Medium | 22 | 25.00% | 0.0348 | 21 | 23.86% | 0.0244 |
High | 4 | 4.55% | 0.0626 | 39 | 44.32% | 0.0268 |
Very high | 0 | 0.00% | 0.0000 | 8 | 9.09% | 0.0469 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, J.; Zhang, J.; Yu, J.; Chu, Y.; Wen, H. Improving the Generalization Performance of Debris-Flow Susceptibility Modeling by a Stacking Ensemble Learning-Based Negative Sample Strategy. Water 2025, 17, 2460. https://doi.org/10.3390/w17162460