Mitigating Overfitting and Physical Inconsistency in Flood Susceptibility Mapping: A Physics-Constrained Evolutionary Machine Learning Framework for Ungauged Alpine Basins
Abstract
1. Introduction
- (1) The “Accuracy Paradox” in Ungauged Basins: In regions lacking ground monitoring, hydrological models driven by satellite-based or merged precipitation products (e.g., TRMM/CMFD) typically exhibit systematic amplitude errors (underestimation or overestimation). Existing frameworks lack a strategy for converting these quantitatively inaccurate absolute fluxes into qualitatively reliable relative indicators (e.g., temporal phasing and flow routing paths), so physical biases risk propagating into the susceptibility model.
- (2) Trivial Learning in Sampling: Traditional negative sampling strategies draw non-flood points at random from the entire domain, including vast high-altitude mountain areas. This practice leads to a “dominance of simple negative samples”: the model only needs to learn the simple topographic rule that “high altitude equals safety” to achieve high accuracy, and thus falls into the trap of trivial learning. Without targeted “hard negatives” (such as low-lying but safe terraces), the model struggles to distinguish flood-prone floodplains from safe zones within topographically similar valley regions, which blurs its decision boundaries [21].
- (3) Overfitting Dilemma in Small-Sample Scenarios: Alpine cryosphere basins typically suffer from a severe scarcity of historical flood records, so the sample size is small relative to the feature dimension. In such data-starved environments, unconstrained ML models (e.g., deep decision trees) tend to memorize high-frequency stochastic noise rather than learn generalizable relationships. Adaptive structural regularization strategies tailored to small datasets, which would enforce the Principle of Parsimony and thereby preserve spatial continuity, are still lacking.
- (1) Construct a hybrid assessment framework that integrates the dynamic runoff generation and flow routing mechanisms of the BTOP model into machine learning classifiers, using runoff depth as a “Refinement Factor” that supplies basin-scale hydrological connectivity constraints and thereby corrects topographic artifacts produced by static topographic factors.
- (2) Propose and validate the Physiographically Constrained Negative Sampling (PCNS) strategy, which introduces “hard negatives” to prevent the model from relying on spatial memorization, thereby improving its specificity in delineating safe zones within complex valley environments.
- (3) Employ Particle Swarm Optimization (PSO) to identify the optimal “shallow tree” configuration of the Random Forest, demonstrating how structural regularization prevents overfitting and preserves spatial continuity in data-scarce basins.
- (4) Analyze the model’s physical interpretability with the SHAP (SHapley Additive exPlanations) method, verifying that the model’s decision rules are consistent with hydrological physical principles.
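The trivial-learning problem and the PCNS objective described above can be illustrated with a short Python sketch. It is not the authors' PCNS implementation (the actual criteria are defined in Section 3.2.2): the `Cell` record, the `pcns_sample` helper, and the elevation, slope, and distance thresholds are all hypothetical. The idea shown is that candidate negatives are restricted to low, gentle terrain away from recorded flood points, so topography alone cannot separate the two classes.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One candidate grid cell (hypothetical record layout)."""
    x: float          # easting (m)
    y: float          # northing (m)
    elevation: float  # m a.s.l.
    slope: float      # degrees


def pcns_sample(cells, flood_points, n_neg, max_elev, max_slope, min_dist,
                seed=0):
    """Draw hard negatives from low, gentle terrain away from flood points.

    max_elev, max_slope and min_dist are hypothetical thresholds chosen
    for illustration; they are not the PCNS criteria used in the paper.
    """
    def far_from_floods(c):
        # Exclude cells within min_dist of any recorded flood location.
        return all(math.hypot(c.x - fx, c.y - fy) >= min_dist
                   for fx, fy in flood_points)

    # Physiographic constraint: keep only cells that look flood-prone
    # topographically, so "high altitude equals safety" cannot be learned.
    candidates = [c for c in cells
                  if c.elevation <= max_elev and c.slope <= max_slope
                  and far_from_floods(c)]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_neg, len(candidates)))


if __name__ == "__main__":
    cells = [Cell(0, 0, 3600, 2), Cell(100, 0, 3610, 3),
             Cell(5000, 0, 3620, 1), Cell(6000, 0, 5200, 30),
             Cell(7000, 0, 3650, 25)]
    print(pcns_sample(cells, flood_points=[(0, 0)], n_neg=10,
                      max_elev=3700, max_slope=5, min_dist=500))
```

With these toy cells, only the low, gentle cell 5 km from the flood point survives the filter; the high-altitude and steep cells that random sampling would readily draw are never offered as negatives.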
2. Study Area and Data
2.1. Study Area
2.2. Data Acquisition and Processing
2.2.1. Hydrological and Meteorological Data (For BTOP)
2.2.2. Static Conditioning Factors (For Machine Learning)
3. Methodology
3.1. Physical Process Modeling via BTOP
3.1.1. Model Principles and Runoff Generation Mechanism
3.1.2. Parameter Optimization and Calibration Configuration
3.1.3. Construction of the Relative Hydraulic Intensity Index
- (1) Temporal Aggregation: For each grid cell, the maximum daily runoff depth simulated during historical flood events is extracted, capturing the extreme hydrological state most representative of disaster-causing conditions.
- (2) Logarithmic Smoothing: Because extreme runoff values have a heavily right-skewed spatial distribution (very large values in main channels, near-zero values on hillslopes), a base-10 logarithmic transformation is applied, with 1 added to each value so that zero-runoff pixels map to 0 instead of an undefined logarithm. This step compresses extreme outliers and enhances the continuous spatial texture of the flow network.
- (3) Resampling and Normalization: To align with the 30 m static environmental variables, the processed 1 km raster is resampled to 30 m (spatially downscaled) using bilinear interpolation. Finally, to eliminate magnitude differences among the inputs to the machine learning algorithms, the interpolated raster is normalized to a dimensionless [0, 1] interval using Min-Max scaling (as defined in Equation (5)), yielding the final runoff depth input feature.
3.2. Dataset Construction and Sampling Strategy
3.2.1. Selection and Quantification of Flood Conditioning Factors
3.2.2. Physiographically Constrained Negative Sampling (PCNS) Strategy
3.2.3. Hybrid Dataset Allocation and Validation Scheme
3.2.4. Ablation Study Design
3.3. Hybrid Model Construction: Optimization and Comparison
3.3.1. Benchmark Models
3.3.2. Proposed Model: Physics-Constrained PSO-RF
3.4. Evaluation and Interpretation
3.4.1. Statistical Performance Metrics
3.4.2. Physical Interpretability
- (1) Mean Decrease Impurity (MDI)
- (2) SHAP (SHapley Additive exPlanations)
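To make the SHAP attribution concrete, the sketch below computes exact Shapley values by enumerating every feature subset, for a hypothetical three-input score (normalized runoff depth, normalized elevation, and a “gentleness” term), with a single baseline point standing in for absent features. The score, its coefficients, and the baseline are assumptions for illustration only. SHAP estimates the same kind of quantity for the fitted model, using background data rather than one baseline point and, for tree ensembles, an efficient tree-specific algorithm instead of brute-force enumeration. MDI, by contrast, is a global, unsigned importance accumulated from the impurity decreases at the splits on each feature.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition S that excludes i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_score(v):
    """Hypothetical score of [runoff, elevation, gentleness], each in [0, 1]."""
    runoff, elevation, gentle = v
    return 0.5 * runoff - 0.3 * elevation + 0.2 * runoff * gentle


if __name__ == "__main__":
    x = [1.0, 0.0, 1.0]         # high runoff, low elevation, gentle slope
    baseline = [0.0, 1.0, 0.0]  # hypothetical reference point
    print(shapley_values(toy_score, x, baseline))  # approx. [0.6, 0.3, 0.1]
```

The attributions sum to f(x) − f(baseline) = 1.0, and the interaction term is split evenly between runoff and gentleness. Their signs illustrate the kind of directional check discussed in Section 4.4.2: high runoff and low elevation both push the score toward flooding.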
4. Results and Discussion
4.1. Hydrological Simulation and Physical Constraints
4.2. Optimization Landscape and Structural Regularization
4.3. Comparative Model Performance
4.3.1. Statistical Evaluation and Structural Regularization Effects
4.3.2. Sampling Strategy Verification
4.3.3. Limitations of the AHP Model
4.4. Physical Mechanism Verification
4.4.1. Superiority of Dynamic Constraints over Static Indices
4.4.2. Directional Consistency via SHAP
4.5. Spatial Susceptibility Mapping and Artifact Correction
4.5.1. General Spatial Pattern and Risk Zone Statistical Characteristics
4.5.2. Correction of Topographic Artifacts
4.5.3. Suppression of Random Noise
4.6. Uncertainty Analysis and Limitations
5. Summary and Conclusions
- (1) Dynamic Constraints Correct Topographic Artifacts: Despite the systematic amplitude bias in the physical simulation, the BTOP-derived runoff depth proved to be a robust “Relative Hydraulic Intensity Index.” Ablation studies confirmed that runoff depth carries nearly twice the feature importance of the static Topographic Wetness Index (TWI) (0.15 vs. 0.08). It acts as a critical “Refinement Factor,” incorporating hydrological connectivity to correctly identify “dry lowlands” (hydrologically isolated depressions) that static models often misclassify.
- (2) A key finding is that the “shallow tree” (maximum depth = 4) selected via PSO outperformed the structurally complex, unconstrained Standard RF (AUC 0.942 vs. 0.919). Although greater model complexity is generally assumed to help capture nonlinear patterns, particularly in deep learning, in geological hazard assessment scenarios characterized by sample scarcity and significant noise, excessive complexity leads the model to memorize high-frequency spatial noise rather than learn generalizable physical laws [18]. We hypothesize that the core mechanisms controlling flood occurrence (gravity-driven runoff, topographic convergence) inherently lie on a “low-dimensional physical manifold.” Limiting tree depth effectively imposes a structured prior, forcing the model to base its decisions only on the most dominant physical factors (elevation and runoff depth), which aligns with the Principle of Parsimony in environmental modeling.
- (3) The P-PDRF framework achieved the highest specificity among the machine learning models (0.861, versus 0.833 for both Standard RF and SVM), supporting the effectiveness of the Physiographically Constrained Negative Sampling (PCNS) strategy. Unlike traditional random sampling, PCNS intentionally retained a large number of “hard negatives” (safe points located in low-altitude, gentle terrain such as river terraces) in the training set. This deliberately reduced the separability of positive and negative samples on topographic factors alone, forcing the classifier to rely on hydraulic factors (e.g., runoff depth) to identify the physical differences between “absolute safe zones” (terraces) and “absolute flood-prone zones” (floodplains). The over-generalization of the Support Vector Machine (SVM) at valley bottoms, where large areas were falsely flagged as high risk, highlights the limitations of kernel-based methods on such hard samples and, conversely, underscores the advantage of rule-based tree models combined with PCNS in precisely delineating high-risk corridors.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Intergovernmental Panel on Climate Change. Climate Change 2007: The Physical Science Basis; IPCC: Geneva, Switzerland, 2007.
- Hirabayashi, Y.; Mahendran, R.; Koirala, S.; Konoshima, L.; Yamazaki, D.; Watanabe, S.; Kim, H.; Kanae, S. Global flood risk under climate change. Nat. Clim. Change 2013, 3, 816–821.
- Winsemius, H.C.; Aerts, J.C.; Van Beek, L.P.; Bierkens, M.F.; Bouwman, A.; Jongman, B.; Kwadijk, J.C.; Ligtvoet, W.; Lucas, P.L.; Van Vuuren, D.P. Global Drivers of Future River Flood Risk. Nat. Clim. Change 2016, 6, 381–385.
- Costache, R.; Pham, Q.B.; Sharifi, E.; Linh, N.T.T.; Abba, S.I.; Vojtek, M.; Vojteková, J.; Nhi, P.T.T.; Khoi, D.N. Flash-Flood Susceptibility Assessment Using Multi-Criteria Decision Making and Machine Learning Supported by Remote Sensing and GIS Techniques. Remote Sens. 2019, 12, 106.
- Choubin, B.; Moradi, E.; Golshan, M.; Adamowski, J.; Sajedi-Hosseini, F.; Mosavi, A. An Ensemble Prediction of Flood Susceptibility Using Multivariate Discriminant Analysis, Classification and Regression Trees, and Support Vector Machines. Sci. Total Environ. 2019, 651, 2087–2096.
- Immerzeel, W.W.; Van Beek, L.P.H.; Bierkens, M.F.P. Climate Change Will Affect the Asian Water Towers. Science 2010, 328, 1382–1385.
- Hrachowitz, M.; Savenije, H.H.G.; Blöschl, G.; McDonnell, J.J.; Sivapalan, M.; Pomeroy, J.W.; Arheimer, B.; Blume, T.; Clark, M.P.; Ehret, U.; et al. A Decade of Predictions in Ungauged Basins (PUB)—A Review. Hydrol. Sci. J. 2013, 58, 1198–1255.
- Blöschl, G. Runoff Prediction in Ungauged Basins: Synthesis Across Processes, Places and Scales; Cambridge University Press: Cambridge, UK, 2013.
- Yue, J.; Zhou, L.; Du, J.; Zhou, C.; Nimai, S.; Wu, L.; Ao, T. Runoff Simulation in Data-Scarce Alpine Regions: Comparative Analysis Based on LSTM and Physically Based Models. Water 2024, 16, 2161.
- Aronica, G.; Bates, P.D.; Horritt, M.S. Assessing the Uncertainty in Distributed Model Predictions Using Observed Binary Pattern Information within GLUE. Hydrol. Process. 2002, 16, 2001–2016.
- Beven, K.J.; Kirkby, M.J. A Physically Based, Variable Contributing Area Model of Basin Hydrology/Un Modèle à Base Physique de Zone d’appel Variable de l’hydrologie Du Bassin Versant. Hydrol. Sci. Bull. 1979, 24, 43–69.
- Beven, K. A Manifesto for the Equifinality Thesis. J. Hydrol. 2006, 320, 18–36.
- Yalcin, A. GIS-Based Landslide Susceptibility Mapping Using Analytical Hierarchy Process and Bivariate Statistics in Ardesen (Turkey): Comparisons of Results and Confirmations. Catena 2008, 72, 1–12.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood Susceptibility Mapping Using a Novel Ensemble Weights-of-Evidence and Support Vector Machine Models in GIS. J. Hydrol. 2014, 512, 332–343.
- Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.-B.; Gróf, G.; Ho, H.L. A Comparative Assessment of Flood Susceptibility Modeling Using Multi-Criteria Decision-Making Analysis and Machine Learning Methods. J. Hydrol. 2019, 573, 311–323.
- Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat, F. Deep Learning and Process Understanding for Data-Driven Earth System Science. Nature 2019, 566, 195–204.
- Valavi, R.; Elith, J.; Lahoz-Monfort, J.J.; Guillera-Arroita, G. Modelling Species Presence-only Data with Random Forests. Ecography 2021, 44, 1731–1742.
- Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-Informed Machine Learning. Nat. Rev. Phys. 2021, 3, 422–440.
- Nearing, G.S.; Kratzert, F.; Sampson, A.K.; Pelissier, C.S.; Klotz, D.; Frame, J.M.; Prieto, C.; Gupta, H.V. What Role Does Hydrological Science Play in the Age of Machine Learning? Water Resour. Res. 2021, 57, e2020WR028091.
- Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel Forecasting Approaches Using Combination of Machine Learning and Statistical Models for Flood Susceptibility Mapping. J. Environ. Manag. 2018, 217, 1–11.
- Lin, X.; Zhang, Y.; Yao, Z.; Gong, T.; Wang, H.; Chu, D.; Liu, L.; Zhang, F. The Trend on Runoff Variations in the Lhasa River Basin. J. Geogr. Sci. 2008, 18, 95–106.
- He, J.; Yang, K.; Tang, W.; Lu, H.; Qin, J.; Chen, Y.; Li, X. The First High-Resolution Meteorological Forcing Dataset for Land Process Studies over China. Sci. Data 2020, 7, 25.
- Gao, J.; Shi, Y.; Zhang, H.; Chen, X.; Zhang, W.; Shen, W.; Xiao, T.; Zhang, Y. China Regional 250m Fractional Vegetation Cover Data Set (2000–2023); National Tibetan Plateau/Third Pole Environment Data Center: Beijing, China, 2022.
- Shi, G.; Sun, W.; Shangguan, W.; Wei, Z.; Yuan, H.; Zhang, Y.; Liang, H.; Li, L.; Sun, X.; Li, D. A China Dataset of Soil Properties for Land Surface Modeling (Version 2). Earth Syst. Sci. Data Discuss. 2024, 2024, 1–35.
- Yang, J.; Huang, X. 30 m Annual Land Cover and Its Dynamics in China from 1990 to 2019. Earth Syst. Sci. Data Discuss. 2021, 13, 3907–3925.
- Takeuchi, K.; Hapuarachchi, P.; Zhou, M.; Ishidaira, H.; Magome, J. A BTOP Model to Extend TOPMODEL for Distributed Hydrological Simulation of Large Basins. Hydrol. Process. Int. J. 2008, 22, 3236–3251.
- Tianqi, A.; Takeuchi, K.; Ishidaira, H.; Yoshitani, J.; Fukami, K. Development and Application of a New Algorithm for Automated Pit Removal for Grid DEMs. Hydrol. Sci. J. 2003, 48, 985–997.
- Duan, Q.; Sorooshian, S.; Gupta, V.K. Optimal Use of the SCE-UA Global Optimization Method for Calibrating Watershed Models. J. Hydrol. 1994, 158, 265–284.
- Rahmati, O.; Pourghasemi, H.R.; Zeinivand, H. Flood Susceptibility Mapping Using Frequency Ratio and Weights-of-Evidence Models in the Golastan Province, Iran. Geocarto Int. 2016, 31, 42–70.
- Saxton, K.E.; Rawls, W.J. Soil water characteristic estimates by texture and organic matter for hydrologic solutions. Soil Sci. Soc. Am. J. 2006, 70, 1569–1578.
- Cosby, B.J.; Hornberger, G.M.; Clapp, R.B.; Ginn, T.R. A Statistical Exploration of the Relationships of Soil Moisture Characteristics to the Physical Properties of Soils. Water Resour. Res. 1984, 20, 682–690.
- Cronshey, R. Urban Hydrology for Small Watersheds; US Department of Agriculture, Soil Conservation Service, Engineering Division: Washington, DC, USA, 1986.
- Mishra, S.K.; Singh, V.P. Soil Conservation Service Curve Number (SCS-CN) Methodology; Springer Science & Business Media: Dordrecht, The Netherlands, 2013; Volume 42.
- Han, J.; Pei, J.; Tong, H. Data Mining: Concepts and Techniques; Morgan Kaufmann: San Francisco, CA, USA, 2022.
- Tien Bui, D.; Pradhan, B.; Nampak, H.; Bui, Q.-T.; Tran, Q.-A.; Nguyen, Q.-P. Hybrid Artificial Intelligence Approach Based on Neural Fuzzy Inference Model and Metaheuristic Optimization for Flood Susceptibilitgy Modeling in a High-Frequency Tropical Cyclone Area Using GIS. J. Hydrol. 2016, 540, 317–330.
- Bui, D.T.; Shirzadi, A.; Shahabi, H.; Chapi, K.; Omidavr, E.; Pham, B.T.; Asl, D.T.; Khaledian, H.; Pradhan, B.; Panahi, M.; et al. A Novel Ensemble Artificial Intelligence Approach for Gully Erosion Mapping in a Semi-Arid Watershed (Iran). Sensors 2019, 19, 2444.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009.
- Kohavi, R. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 20–25 August 1995; Volume 14, pp. 1137–1145.
- Saaty, T.L. The Analytic Hierarchy Process; McGraw-Hill: New York, NY, USA, 1980.
- Hong, H.; Tsangaratos, P.; Ilia, I.; Liu, J.; Zhu, A.-X.; Chen, W. Application of Fuzzy Weight of Evidence and Data Mining Techniques in Construction of Flood Susceptibility Map of Poyang County, China. Sci. Total Environ. 2018, 625, 575–588.
- Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: New York, NY, USA, 2013.
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
- Kavzoglu, T.; Colkesen, I. A Kernel Functions Analysis for Support Vector Machines for Land Cover Classification. Int. J. Appl. Earth Obs. Geoinf. 2009, 11, 352–359.
- Yao, X.; Tham, L.G.; Dai, F.C. Landslide Susceptibility Mapping Based on Support Vector Machine: A Case Study on Natural Slopes of Hong Kong, China. Geomorphology 2008, 101, 572–582.
- Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood Susceptibility Analysis and Its Verification Using a Novel Ensemble Support Vector Machine and Frequency Ratio Method. Stoch. Environ. Res. Risk Assess. 2015, 29, 1149–1165.
- Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks; IEEE: New York, NY, USA, 1995; Volume 4, pp. 1942–1948.
- Poli, R.; Kennedy, J.; Blackwell, T. Particle Swarm Optimization: An Overview. Swarm Intell. 2007, 1, 33–57.
- Geman, S.; Bienenstock, E.; Doursat, R. Neural Networks and the Bias/Variance Dilemma. Neural Comput. 1992, 4, 1–58.
- Nguyen, P.T.; Ha, D.H.; Jaafari, A.; Nguyen, H.D.; Van Phong, T.; Al-Ansari, N.; Prakash, I.; Le, H.V.; Pham, B.T. Groundwater Potential Mapping Combining Artificial Neural Network and Real AdaBoost Ensemble Technique: The DakNong Province Case-Study, Vietnam. Int. J. Environ. Res. Public Health 2020, 17, 2473.
- Pham, B.T.; Jaafari, A.; Prakash, I.; Bui, D.T. A Novel Hybrid Intelligent Model of Support Vector Machines and the MultiBoost Ensemble for Landslide Susceptibility Modeling. Bull. Eng. Geol. Environ. 2019, 78, 2865–2886.
- Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
- Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46.
- Landis, J.R.; Koch, G.G. The Measurement of Observer Agreement for Categorical Data. Biometrics 1977, 33, 159–174.
- Hossin, M.; Sulaiman, M.N. A Review on Evaluation Metrics for Data Classification Evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1–11.
- Youden, W.J. Index for Rating Diagnostic Tests. Cancer 1950, 3, 32–35.
- Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
- Molnar, C. Interpretable Machine Learning; Lulu.com: Morrisville, NC, USA, 2020.
- Štrumbelj, E.; Kononenko, I. Explaining Prediction Models and Individual Predictions with Feature Contributions. Knowl. Inf. Syst. 2014, 41, 647–665.
- Moriasi, D.N.; Arnold, J.G.; Van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model Evaluation Guidelines for Systematic Quantification of Accuracy in Watershed Simulations. Trans. ASABE 2007, 50, 885–900.
- Sun, M.; Liu, A.; Zhao, L.; Wang, C.; Yang, Y. Evaluation of Multi-Source Precipitation Products in the Hinterland of the Tibetan Plateau. Atmosphere 2024, 15, 138.
- Kratzert, F.; Klotz, D.; Herrnegger, M.; Sampson, A.K.; Hochreiter, S.; Nearing, G.S. Toward Improved Predictions in Ungauged Basins: Exploiting the Power of Machine Learning. Water Resour. Res. 2019, 55, 11344–11354.
- Mallick, J.; Hang, H.T.; Das, A.; Poddar, S.; Singh, C.K. Integrating Ensemble Machine Learning and SAR-Based Geospatial Modelling for Inclusive and Equitable Urban Flood Resilience. Sustain. Cities Soc. 2026, 137, 107158.
- Tien Bui, D.; Hoang, N.-D.; Martínez-Álvarez, F.; Ngo, P.-T.T.; Hoa, P.V.; Pham, T.D.; Samui, P.; Costache, R. A Novel Deep Learning Neural Network Approach for Predicting Flash Flood Susceptibility: A Case Study at a High Frequency Tropical Storm Area. Sci. Total Environ. 2020, 701, 134413.
- Costache, R.; Tin, T.T.; Arabameri, A.; Crăciun, A.; Ajin, R.S.; Costache, I.; Islam, A.R.M.T.; Abba, S.I.; Sahana, M.; Avand, M.; et al. Flash-Flood Hazard Using Deep Learning Based on H2O R Package and Fuzzy-Multicriteria Decision-Making Analysis. J. Hydrol. 2022, 609, 127747.
- DeLong, E.R.; DeLong, D.M.; Clarke-Pearson, D.L. Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 1988, 44, 837–845.
- Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Bui, D.T.; Pham, B.T.; Khosravi, K. A Novel Hybrid Artificial Intelligence Approach for Flood Susceptibility Assessment. Environ. Model. Softw. 2017, 95, 229–245.
- Zhao, G.; Pang, B.; Xu, Z.; Cui, L.; Wang, J.; Zuo, D.; Peng, D. Improving Urban Flood Susceptibility Mapping Using Transfer Learning. J. Hydrol. 2021, 602, 126777.












| Factor Name | Resolution (m) | Source |
|---|---|---|
| Elevation | 30 | ASTER GDEM (https://www.gscloud.cn/sources/accessdata/310?pid=302, accessed on 11 December 2025) |
| Aspect | 30 | Derived from ASTER GDEM |
| Slope | 30 | Derived from ASTER GDEM |
| FVC | 250 | National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn) |
| SCS_CN | 30 | Calculated from soil and land use data; land use from the 30 m annual land cover dataset and its dynamics in China from 1985 to 2022 (https://zenodo.org/record/8176941, accessed on 15 December 2025) |
| Soil_Ksat | 90 | National Tibetan Plateau/Third Pole Environment Data Center (http://data.tpdc.ac.cn) |
| Runoff_Depth | 1000 | Simulated with the BTOP model |
| Model | Hyperparameter | Description | Search Space/Method | Optimal Value |
|---|---|---|---|---|
| PSO-RF | | Number of trees | PSO search: [20, 200] | 60 |
| | | Maximum depth of trees | PSO search: [2, 20] | 4 |
| | Swarm size | Particle population | Fixed | 20 |
| | Iterations | Optimization rounds | Fixed | 50 |
| Standard RF | | Number of trees | Fixed (default) | 1000 |
| | | Maximum depth of trees | Fixed (unconstrained) | None |
| | Criterion | Splitting rule | Default | Gini impurity |
| SVM | Kernel | Kernel function type | Fixed (non-linear) | Radial basis function |
| | C | Regularization parameter | Grid search: [0.1, 1, 10, 50, 100, 200] | 1 |
| | γ | Kernel coefficient | Grid search: [1, 0.1, 0.01, 0.001, ‘scale’] | ‘scale’ |
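The PSO-RF rows above (number of trees searched in [20, 200], maximum depth in [2, 20], 20 particles, 50 iterations) can be illustrated with a plain global-best PSO in Python. This is a sketch under stated assumptions, not the authors' optimizer: the inertia weight w = 0.7 and acceleration coefficients c1 = c2 = 1.5 are common textbook settings rather than reported values, positions are rounded to integers only when evaluated, and `toy_validation_loss` is a hypothetical stand-in objective. In the actual workflow the objective would be a validation score of a Random Forest trained with each candidate pair (for example, 1 − AUC), which this section does not specify.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; particle positions are rounded to ints for evaluation."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    def evaluate(p):
        return objective(*[int(round(v)) for v in p])

    pbest = [p[:] for p in pos]
    pbest_val = [evaluate(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = evaluate(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return [int(round(v)) for v in gbest], gbest_val


def toy_validation_loss(n_trees, max_depth):
    """Hypothetical smooth stand-in for an RF validation loss."""
    return (n_trees - 100) ** 2 / 1e4 + (max_depth - 5) ** 2 / 10


if __name__ == "__main__":
    best, loss = pso_minimize(toy_validation_loss, [(20, 200), (2, 20)])
    print(best, loss)
```

Rounding only at evaluation time lets particles move continuously while every evaluated configuration is a valid integer pair; the search box itself bounds how deep the trees can grow.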
| Factors | Weights | CI | RI | CR |
|---|---|---|---|---|
| Elevation | 0.2413 | 0.0339 | 1.32 | 0.0257 |
| Slope | 0.1604 | |||
| Aspect | 0.0333 | |||
| FVC | 0.0705 | |||
| Soil_Ksat | 0.1058 | |||
| SCS_CN | 0.0333 | |||
| Runoff_Depth | 0.3555 |
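The CI, RI, and CR columns summarize the consistency check of the AHP pairwise comparison matrix, which is not reproduced in this section. In Saaty's method, CI = (λmax − n)/(n − 1) and CR = CI/RI, where RI = 1.32 is the random index for a 7 × 7 matrix; with the reported CI = 0.0339 this gives CR ≈ 0.0257, well below the conventional 0.10 acceptance threshold, and implies λmax ≈ 7.2034. The sketch below assumes the row geometric-mean method for the priority vector (a common approximation of Saaty's principal-eigenvector weights; the paper's exact procedure is not stated here), and the 3 × 3 matrix in the usage example is hypothetical.

```python
import math

# Saaty's random consistency index (RI) by matrix order n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Priority vector by the row geometric-mean method, plus lambda_max."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    w = [g / total for g in gm]
    # lambda_max estimated as the mean of (A w)_i / w_i.
    aw = [sum(a * wj for a, wj in zip(row, w)) for row in matrix]
    lam = sum(awi / wi for awi, wi in zip(aw, w)) / n
    return w, lam


def consistency_ratio(lambda_max, n):
    """CI = (lambda_max - n) / (n - 1); CR = CI / RI (defined for n >= 3)."""
    if n not in RANDOM_INDEX:
        raise ValueError("RI table here covers matrix orders 3 to 9 only")
    ci = (lambda_max - n) / (n - 1)
    return ci, ci / RANDOM_INDEX[n]


if __name__ == "__main__":
    # Reported values: CI = 0.0339 for n = 7 implies lambda_max = 7.2034.
    lam = 0.0339 * (7 - 1) + 7
    print(consistency_ratio(lam, 7))  # CI ~ 0.0339, CR ~ 0.0257 (< 0.10)
    # Hypothetical, perfectly consistent 3 x 3 matrix.
    print(ahp_weights([[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]))
```

The published weights sum to 1.0001 rather than exactly 1, which is consistent with rounding each weight to four decimals.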
| Model | AUC | Accuracy | Recall | Specificity | Kappa |
|---|---|---|---|---|---|
| PSO-RF | 0.942 | 0.875 | 0.889 | 0.861 | 0.750 |
| Standard RF | 0.919 | 0.861 | 0.889 | 0.833 | 0.722 |
| SVM | 0.913 | 0.861 | 0.889 | 0.833 | 0.722 |
| AHP | 0.853 | 0.639 | 0.389 | 0.889 | 0.278 |
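The metrics in the table follow standard confusion-matrix definitions, with Cohen's kappa correcting observed agreement for chance agreement. The counts in the sketch are hypothetical: TP = 32, FN = 4, TN = 31, FP = 5 is one confusion matrix that reproduces the PSO-RF row, assuming a balanced test set of 36 flood and 36 non-flood points, which this section does not state. Every Kappa value in the table equals 2 × Accuracy − 1, which is what one would expect with balanced true classes, because the chance agreement p_e is then exactly 0.5.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, recall, specificity and Cohen's kappa from a 2 x 2 matrix."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)       # true positive rate (sensitivity)
    specificity = tn / (tn + fp)  # true negative rate
    # Chance agreement: P(pred=1)P(true=1) + P(pred=0)P(true=0).
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "kappa": kappa}


if __name__ == "__main__":
    # Hypothetical counts consistent with the PSO-RF row (36 + 36 test points).
    m = binary_metrics(tp=32, fn=4, tn=31, fp=5)
    print({k: round(v, 3) for k, v in m.items()})
    # {'accuracy': 0.875, 'recall': 0.889, 'specificity': 0.861, 'kappa': 0.75}
```

The same function with TP = 14, FN = 22, TN = 32, FP = 4 reproduces the AHP row, showing how a high specificity can coexist with a low recall and a low kappa.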
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yan, C.; Wu, L.; Huang, P.; Yue, J.; Li, H.; Zhou, C.; Fan, C.; Guo, Y.; Zhou, L. Mitigating Overfitting and Physical Inconsistency in Flood Susceptibility Mapping: A Physics-Constrained Evolutionary Machine Learning Framework for Ungauged Alpine Basins. Water 2026, 18, 882. https://doi.org/10.3390/w18070882

