Coupled Risk Assessment of Flood Before and During Disaster Based on Machine Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Materials
2.1.1. Study Area
2.1.2. Sample
2.1.3. Impact Factors for Flood Assessment
2.2. Methods
2.2.1. Model Selection
2.2.2. Flood Risk Index
2.2.3. Shapley Additive Explanations (SHAP)
3. Results
3.1. Index Distribution in Research Region
3.2. Flood Risk Forecast
3.3. Flood Inundation Forecast
3.4. FRI
3.5. SHAP Results
3.5.1. Histogram of SHAP Values
3.5.2. Interaction Summary Plot
4. Discussion
4.1. Rationalization of the Resolution Used in the Current Study
4.2. SHAP-Based Indicator Correlation Analysis
4.3. References for Flood Management
4.4. Limitations
5. Conclusions
- XGBoost performs best in predicting flood occurrence risk. It outperformed the other models (random forest and GBDT) on every evaluation metric; averaged over the sampling schemes, its precision, recall, F1-score, accuracy, and AUC reached 0.823398, 0.831667, 0.827090, 0.826435, and 0.871062, respectively.
- Areas of high flood risk, long flood duration, and high FRI are concentrated in large river basins and coastal areas. Flood risk ranges from 0.000073 to 0.998483 (mean 0.237031), flood duration from 0.223598 to 2.077040 (mean 0.940050), and FRI from 0 to 0.934256 (mean 0.091711).
- At the provincial level, Shanghai Municipality, Jiangsu Province, Anhui Province, and Zhejiang Province have the highest percentages of medium- and high-value zones. In ten cities (Suzhou, Jiaxing, Yangzhou, Suqian, Changzhou, Wuxi, Lianyungang, Yancheng, Huai’an, and Bengbu), medium- and high-value zones exceed 40%, with shares of 48.99%, 48.07%, 46.87%, 44.19%, 43.43%, 43.20%, 42.21%, 40.88%, 40.73%, and 40.06%, respectively.
- The SHAP analysis indicates that elevation and rainfall are the critical factors influencing flood occurrence. Both strongly affect the likelihood of flooding and should be key considerations in future flood management. In particular, urban planning should fully account for the effect of elevation on flood risk, especially in coastal and floodplain areas, where flood protection infrastructure such as dikes and urban drainage systems is essential.
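The attribution idea behind SHAP can be illustrated with a minimal sketch that computes exact Shapley values for a toy score. Everything in it is an assumption made for illustration: the function `toy_risk`, the feature names `elevation_m`, `rain_mm`, and `ndvi`, and all numeric values are hypothetical and are not the study's model or data. The sketch also uses a single baseline point for "absent" features, which is simpler than the background-data expectation used by SHAP explainers for tree models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a single baseline point.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {f: (x[f] if f in coalition else baseline[f]) for f in names}
                with_target = dict(without, **{target: x[target]})
                total += weight * (predict(with_target) - predict(without))
        phi[target] = total
    return phi


def toy_risk(f):
    """Hypothetical flood-risk score; NDVI is deliberately ignored."""
    low_lying = 1.0 if f["elevation_m"] < 10 else 0.0
    wet = f["rain_mm"] / 2000.0
    return 0.2 + 0.3 * low_lying + 0.3 * wet + 0.2 * low_lying * wet


x = {"elevation_m": 4.0, "rain_mm": 1600.0, "ndvi": 0.3}
baseline = {"elevation_m": 50.0, "rain_mm": 1000.0, "ndvi": 0.5}
phi = shapley_values(toy_risk, x, baseline)
print(phi)
```

In this toy case the contributions sum to the difference between the score at `x` and the score at the baseline, and the unused feature receives a contribution of zero, which are the two properties that make Shapley-based rankings of factors such as elevation and rainfall interpretable.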
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
XGBoost | extreme gradient boosting |
LSTM | long short-term memory |
RF | random forest |
GBDT | gradient boosted decision trees |
FRI | flood risk index |
YRD | Yangtze River Delta |
DEM | digital elevation model |
CLCD | Annual China Land Cover Dataset |
SPI | stream power index |
STI | sediment transport index |
TWI | topographic wetness index |
TCL | topographic control index |
NDVI | normalized difference vegetation index |
DIS | distance from river |
DPN | density of pipeline networks |
ISR | impervious surface ratio |
SHAP | Shapley additive explanations |
AUC | area under the curve |
ROC | receiver operating characteristic curve |
MSE | mean squared error |
MAE | mean absolute error |
RMSE | root mean squared error |
MAPE | mean absolute percentage error |
References
- Salman, M.; Li, Y. Flood Risk Assessment, Future Trend Modeling, and Risk Communication: A Review of Ongoing Research. Nat. Hazards Rev. 2018, 19, 4018011.
- Sajid, T.; Maimoon, S.K.; Waseem, M.; Ahmed, S.; Khan, M.A.; Tränckner, J.; Pasha, G.A.; Hamidifar, H.; Skoulikaris, C. Integrated Risk Assessment of Floods and Landslides in Kohistan, Pakistan. Sustainability 2025, 17, 3331.
- Kumar, V.; Sharma, K.; Caloiero, T.; Mehta, D.J.; Singh, K. Comprehensive Overview of Flood Modeling Approaches: A Review of Recent Advances. Hydrology 2023, 10, 141.
- Wu, X.; Zhu, H.; Hu, L.; Meng, J.; Sun, F. Analysis of Short-Term Heavy Rainfall-Based Urban Flood Disaster Risk Assessment Using Integrated Learning Approach. Sustainability 2024, 16, 8249.
- Šiljeg, S.; Milošević, R.; Mamut, M. Pluvial Flood Susceptibility in the Local Community of the City of Gospić (Croatia). Sustainability 2024, 16, 1701.
- Karymbalis, E.; Andreou, M.; Batzakis, D.-V.; Tsanakas, K.; Karalis, S. Integration of GIS-Based Multicriteria Decision Analysis and Analytic Hierarchy Process for Flood-Hazard Assessment in the Megalo Rema River Catchment (East Attica, Greece). Sustainability 2021, 13, 10232.
- Fraehr, N.; Wang, Q.J.; Wu, W.; Nathan, R. Assessment of surrogate models for flood inundation: The physics-guided LSG model vs. state-of-the-art machine learning models. Water Res. 2024, 252, 121202.
- Mosavi, A.; Ozturk, P.; Chau, K.-W. Flood Prediction Using Machine Learning Models: Literature Review. Water 2018, 10, 1536.
- Wang, Z.L.; Lai, C.G.; Chen, X.; Yang, B.; Zhao, S.; Bai, X. Flood hazard risk assessment model based on random forest. J. Hydrol. 2015, 527, 1130–1141.
- Tehrany, M.S.; Pradhan, B.; Mansor, S.; Ahmad, N. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. CATENA 2015, 125, 91–101.
- Mojaddadi, H.; Pradhan, B.; Nampak, H.; Ahmad, N.; Ghazali, A.H.B. Ensemble machine-learning-based geospatial approach for flood risk assessment using multi-sensor remote-sensing data and GIS. Geomat. Nat. Hazards Risk 2017, 8, 1080–1102.
- Zhao, G.; Pang, B.; Xu, Z.; Peng, D.; Xu, L. Assessment of urban flood susceptibility using semi-supervised machine learning model. Sci. Total Environ. 2019, 659, 940–949.
- Chen, J.; Huang, G.; Chen, W. Towards better flood risk management: Assessing flood risk and investigating the potential mechanism based on machine learning models. J. Environ. Manag. 2021, 293, 112810.
- Kanani-Sadat, Y.; Safari, A.; Nasseri, M.; Homayouni, S. A novel explainable PSO-XGBoost model for regional flood frequency analysis at a national scale: Exploring spatial heterogeneity in flood drivers. J. Hydrol. 2024, 638, 131493.
- Zhu, K.; Wang, Z.; Lai, C.; Li, S.; Zeng, Z.; Chen, X. Evaluating Factors Affecting Flood Susceptibility in the Yangtze River Delta Using Machine Learning Methods. Int. J. Disaster Risk Sci. 2024, 15, 738–753.
- Zhu, S.; Feng, H.; Shao, Q. Evaluating Urban Flood Resilience within the Social-Economic-Natural Complex Ecosystem: A Case Study of Cities in the Yangtze River Delta. Land 2023, 12, 1200.
- Deng, M.; Li, Z.; Tao, F. Rainstorm Disaster Risk Assessment and Influence Factors Analysis in the Yangtze River Delta, China. Int. J. Environ. Res. Public Health 2022, 19, 9497.
- Zhang, Y.; Yao, R.; Zhu, Z.; Jin, H.; Zhang, S. Spatiotemporal evolution of population exposure to multi-scenario rainstorms in the Yangtze River Delta urban agglomeration. J. Geogr. Sci. 2024, 34, 654–680.
- Saha, A.; Chandra Pal, S. Application of machine learning and emerging remote sensing techniques in hydrology: A state-of-the-art review and current research trends. J. Hydrol. 2024, 632, 130907.
- Tellman, B.; Sullivan, J.A.; Kuhn, C.; Kettner, A.J.; Doyle, C.S.; Brakenridge, G.R.; Erickson, T.A.; Slayback, D.A. Satellite imaging reveals increased proportion of population exposed to floods. Nature 2021, 596, 80–86.
- Vanama, V.S.K.; Mandal, D.; Rao, Y. GEE4FLOOD: Rapid mapping of flood areas using temporal Sentinel-1 SAR images with Google Earth Engine cloud platform. J. Appl. Remote Sens. 2020, 14, 034505.
- Bui, Q.D.; Luu, C.; Mai, S.H.; Ha, H.T.; Ta, H.T.; Pham, B.T. Flood risk mapping and analysis using an integrated framework of machine learning models and analytic hierarchy process. Risk Anal. 2023, 43, 1478–1495.
- Liu, Z.; Felton, T.; Mostafavi, A. Interpretable machine learning for predicting urban flash flood hotspots using intertwined land and built-environment features. Comput. Environ. Urban Syst. 2024, 110, 102096.
- Fang, Z.; Wang, Y.; Peng, L.; Hong, H. Predicting flood susceptibility using LSTM neural networks. J. Hydrol. 2021, 594, 125734.
- Huang, H.; Chen, X.; Wang, X.; Wang, X.; Liu, L. A Depression-Based Index to Represent Topographic Control in Urban Pluvial Flooding. Water 2019, 11, 2115.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
- Dtissibe, F.Y.; Ari, A.A.A.; Abboubakar, H.; Njoya, A.N.; Mohamadou, A.; Thiare, O. A comparative study of Machine Learning and Deep Learning methods for flood forecasting in the Far-North region, Cameroon. Sci. Afr. 2024, 23, e02053.
- Atashi, V.; Kardan, R.; Gorji, H.T.; Lim, Y.H. Comparative Study of Deep Learning LSTM and 1D-CNN Models for Real-time Flood Prediction in Red River of the North, USA. In Proceedings of the 2023 IEEE International Conference on Electro Information Technology (eIT), Romeoville, IL, USA, 18–20 May 2023.
- Le, X.-H.; Nguyen, D.-H.; Jung, S.; Yeon, M.; Lee, G. Comparison of Deep Learning Techniques for River Streamflow Forecasting. IEEE Access 2021, 9, 71805–71820.
- Rahimzad, M.; Moghaddam Nia, A.; Zolfonoon, H.; Soltani, J.; Mehr, A.D.; Kwon, H.-H. Performance Comparison of an LSTM-based Deep Learning Model versus Conventional Machine Learning Algorithms for Streamflow Forecasting. Water Resour. Manag. 2021, 35, 4167–4187.
- Khairudin, N.B.M.; Mustapha, N.B.; Aris, T.N.B.M.; Zolkepli, M.B. Comparison of Machine Learning Models For Rainfall Forecasting. In Proceedings of the 2020 International Conference on Computer Science and Its Application in Agriculture (ICOSICA), Bogor, Indonesia, 16–17 September 2020.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Qi, M.; Huang, H.; Liu, L.; Chen, X. An Integrated Approach for Urban Pluvial Flood Risk Assessment at Catchment Level. Water 2022, 14, 2000.
- Apel, H.; Martínez Trepat, O.; Hung, N.N.; Chinh, D.T.; Merz, B.; Dung, N.V. Combined fluvial and pluvial urban flood hazard analysis: Concept development and application to Can Tho city, Mekong Delta, Vietnam. Nat. Hazards Earth Syst. Sci. 2016, 16, 941–961.
- Sanders, W.; Li, D.; Li, W.; Fang, Z.N. Data-Driven Flood Alert System (FAS) Using Extreme Gradient Boosting (XGBoost) to Forecast Flood Stages. Water 2022, 14, 747.
- Chen, Z.; Zeng, Y.; Shen, G.; Xiao, C.; Xu, L.; Chen, N. Spatiotemporal characteristics and estimates of extreme precipitation in the Yangtze River Basin using GLDAS data. Int. J. Climatol. 2021, 41, E1812–E1830.
- Dong, G.; Jiang, Z.; Wang, Y.; Tian, Z.; Liu, J. Evaluation of extreme precipitation in the Yangtze River Delta Region of China using a 1.5 km mesh convection-permitting regional climate model. Clim. Dyn. 2022, 59, 2257–2273.
- Zhao, J.C.; Zhang, C.B.; Wang, J.; Abbas, Z.; Zhao, Y. Machine learning and SHAP-based susceptibility assessment of storm flood in rapidly urbanizing areas: A case study of Shenzhen, China. Geomat. Nat. Hazards Risk 2024, 15, 2311889.
- Wang, Q.; Xu, Y.; Wang, J.; Lin, Z.; Dai, X.; Hu, Z. Assessing sub-daily rainstorm variability and its effects on flood processes in the Yangtze River Delta region. Hydrol. Sci. J. 2019, 64, 1972–1981.
- Gulshad, K.; Yaseen, A.; Szydłowski, M. From Data to Decision: Interpretable Machine Learning for Predicting Flood Susceptibility in Gdańsk, Poland. Remote Sens. 2024, 16, 3902.
- Starzec, M.; Kordana-Obuch, S. Evaluating the Utility of Selected Machine Learning Models for Predicting Stormwater Levels in Small Streams. Sustainability 2024, 16, 783.
- Pradhan, B.; Lee, S.; Dikshit, A.; Kim, H. Spatial flood susceptibility mapping using an explainable artificial intelligence (XAI) model. Geosci. Front. 2023, 14, 101625.
Batch | Part | Categories | Parallel Sample 1 | Parallel Sample 2 | Parallel Sample 3 |
---|---|---|---|---|---|
First batch | Part 1 | Flood points (12,000 sample points) | A11_1_1 | A12_1_1 | A13_1_1 |
First batch | Part 1 | Non-flood points (12,000 sample points) | A11_0_1 | A12_0_1 | A13_0_1 |
Second batch | Part 1 | Flood points (6000 sample points) | A21_1_1 | A22_1_1 | A23_1_1 |
Second batch | Part 1 | Non-flood points (6000 sample points) | A21_0_1 | A22_0_1 | A23_0_1 |
Second batch | Part 2 | Flood points (6000 sample points) | A21_1_2 | A22_1_2 | A23_1_2 |
Second batch | Part 2 | Non-flood points (6000 sample points) | A21_0_2 | A22_0_2 | A23_0_2 |
Third batch | Part 1 | Flood points (4000 sample points) | A31_1_1 | A32_1_1 | A33_1_1 |
Third batch | Part 1 | Non-flood points (4000 sample points) | A31_0_1 | A32_0_1 | A33_0_1 |
Third batch | Part 2 | Flood points (4000 sample points) | A31_1_2 | A32_1_2 | A33_1_2 |
Third batch | Part 2 | Non-flood points (4000 sample points) | A31_0_2 | A32_0_2 | A33_0_2 |
Third batch | Part 3 | Flood points (4000 sample points) | A31_1_3 | A32_1_3 | A33_1_3 |
Third batch | Part 3 | Non-flood points (4000 sample points) | A31_0_3 | A32_0_3 | A33_0_3 |
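The naming pattern in the sample table can be sketched in a few lines. The reading assumed here is that an identifier has the form `A<batch><parallel sample>_<class>_<part>`, with class 1 for flood and 0 for non-flood points, and that batch b splits 12,000 points per class into b equal parts. The names `POINTS_PER_CLASS`, `PARALLEL_SAMPLES`, and `sample_sets` are hypothetical helpers, not taken from the study.

```python
POINTS_PER_CLASS = 12_000  # per class, as listed for the first batch
PARALLEL_SAMPLES = 3


def sample_sets(batch):
    """Yield (sample_id, n_points) for one batch, assuming batch b has b equal parts."""
    part_size = POINTS_PER_CLASS // batch
    for part in range(1, batch + 1):
        for label in (1, 0):  # 1 = flood points, 0 = non-flood points
            for parallel in range(1, PARALLEL_SAMPLES + 1):
                yield f"A{batch}{parallel}_{label}_{part}", part_size


# Map every sample identifier in the three batches to its number of points.
scheme = {sid: n for batch in (1, 2, 3) for sid, n in sample_sets(batch)}

for sid in ("A11_1_1", "A22_0_2", "A33_0_3"):
    print(sid, scheme[sid])
```

Under these assumptions the sketch yields the 36 identifiers listed in the table, with 12,000, 6000, and 4000 points per set in the first, second, and third batch.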
Dimension | Index | Data Name Used | Year | Data Source | Spatial Resolution |
---|---|---|---|---|---|
Geography | DEM | National DEM 250 m data (SRTM 90 m) | 2000 | Resource and Environmental Science Data Platform (https://www.resdc.cn/data.aspx?DATAID=123, accessed on 3 June 2024) | 250 m × 250 m |
Geography | SLOPE | -- | -- | Derived from the DEM in ArcMap 10.2 | 1000 m × 1000 m |
Geography | RAIN | Annual precipitation data at 1 km resolution for China | 1982–2023 | National Earth System Science Data Center (http://www.geodata.cn, accessed on 27 September 2024) | 1000 m × 1000 m |
Geography | CLCD | Annual China Land Cover Dataset | 1985–2023 | Zenodo (https://doi.org/10.5281/zenodo.5816591, accessed on 29 October 2024) | 30 m × 30 m |
Geography | SPI | 1 | -- | [24] | 1000 m × 1000 m |
Geography | STI | | -- | [24] | 1000 m × 1000 m |
Geography | TWI | | -- | [24] | 1000 m × 1000 m |
Geography | TCL | 3 | -- | [25] | 1000 m × 1000 m |
Vegetation | NDVI | Annual NDVI, EVI 250 m dataset of China | 2023 | Resource and Environmental Science Data Platform (https://www.resdc.cn/DOI/doi.aspx?DOIid=160, accessed on 6 June 2024) | 250 m × 250 m |
River | DIS | -- | 2024 | OpenStreetMap (https://download.geofabrik.de/asia/china.html, accessed on 10 December 2024) | 1000 m × 1000 m |
Subsurface | DPN | -- | 2024 | OpenStreetMap (https://download.geofabrik.de/asia/china.html, accessed on 10 December 2024) | 1000 m × 1000 m |
Subsurface | ISR | Global 30 m impervious surface dataset | 2024 | Zenodo (https://doi.org/10.5281/zenodo.3505079, accessed on 18 April 2024) | 30 m × 30 m |
Models | Parameter | Search Space | Values Used in Research |
---|---|---|---|
GBDT | n_estimators | 50, 100, 200 | 200 |
GBDT | Max depth | 3, 5, 7, 10 | 7 |
GBDT | Learning rate | 0.01, 0.1, 0.2 | 0.2 |
GBDT | Subsample | 0.8, 1.0 | 1.0 |
GBDT | Random seed | 42 | 42 |
XGBoost | n_estimators | 50, 100, 200 | 200 |
XGBoost | Max depth | 3, 5, 7, 10 | 7 |
XGBoost | Learning rate | 0.01, 0.1, 0.2 | 0.2 |
XGBoost | Subsample | 0.8, 1.0 | 1.0 |
XGBoost | Random seed | 42 | 42 |
RF | n_estimators | 50, 100, 200 | 200 |
RF | Max depth | 3, 5, 7, 10 | 5 |
RF | Max features | ‘sqrt’, ‘log2’, None | log2 |
RF | Bootstrap | True, False | True |
RF | Random seed | 42 | 42 |
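To give a sense of the size of the search spaces in the table above, the following sketch enumerates every combination with the standard library. It assumes an exhaustive grid search, which the table does not state explicitly, and the snake_case parameter names follow common scikit-learn and XGBoost spelling rather than the study's code. `SEARCH_SPACE`, `grid`, and `chosen_rf` are hypothetical names.

```python
from itertools import product

BOOSTING_SPACE = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 7, 10],
    "learning_rate": [0.01, 0.1, 0.2],
    "subsample": [0.8, 1.0],
}

SEARCH_SPACE = {
    "GBDT": BOOSTING_SPACE,
    "XGBoost": BOOSTING_SPACE,  # same lattice as GBDT in the table
    "RF": {
        "n_estimators": [50, 100, 200],
        "max_depth": [3, 5, 7, 10],
        "max_features": ["sqrt", "log2", None],
        "bootstrap": [True, False],
    },
}


def grid(space):
    """Return every parameter combination of a search space as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


candidates = {model: grid(space) for model, space in SEARCH_SPACE.items()}

# Values reported as used for RF; they should be one of the enumerated candidates.
chosen_rf = {"n_estimators": 200, "max_depth": 5, "max_features": "log2", "bootstrap": True}

for model, combos in candidates.items():
    print(model, len(combos))
```

Each model has 3 × 4 × 3 × 2 = 72 candidate settings under this reading, so a full grid with cross-validation remains inexpensive for tree ensembles of this size.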
Models | Mean Values of Precision | Mean Values of Recall | Mean Values of F1-Score | Mean Values of Accuracy | Mean Values of AUC |
---|---|---|---|---|---|
Random Forest | 0.806245 | 0.826472 | 0.803528 | 0.815102 | 0.862843 |
GBDT | 0.802759 | 0.801549 | 0.814619 | 0.814629 | 0.860472 |
XGBoost | 0.823398 | 0.831667 | 0.827090 | 0.826435 | 0.871062 |
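The threshold-based metrics compared in this table follow directly from a binary confusion matrix, as the sketch below shows. The counts are not taken from the paper: they are hypothetical values chosen so that the results agree with the A11 row of the per-scheme XGBoost table to six decimals. AUC is omitted because it depends on the ranked scores rather than on a single confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: tp = flood points predicted as flood, fn = flood points missed,
# fp = non-flood points predicted as flood, tn = non-flood points predicted as non-flood.
metrics = classification_metrics(tp=1898, fp=417, fn=502, tn=1983)

for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```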
Sampling Scheme | Precision | Recall | F1-Score | Accuracy | AUC | Optimal Threshold | Kappa | Confusion Matrix |
---|---|---|---|---|---|---|---|---|
A11 | 0.819870 | 0.790833 | 0.805090 | 0.808542 | 0.874733 | 0.527443 | 0.635699 | |
A12 | 0.822997 | 0.796250 | 0.809403 | 0.812501 | 0.875227 | 0.476919 | 0.640000 | |
A13 | 0.831808 | 0.795417 | 0.813206 | 0.817292 | 0.872167 | 0.508648 | 0.642083 | |
A21 | 0.841923 | 0.810000 | 0.825653 | 0.828958 | 0.866946 | 0.534385 | 0.656667 | |
A22 | 0.812826 | 0.845000 | 0.828601 | 0.825208 | 0.872687 | 0.562978 | 0.648750 | |
A23 | 0.809957 | 0.854167 | 0.831474 | 0.826875 | 0.871285 | 0.600911 | 0.658333 | |
A31 | 0.815432 | 0.876250 | 0.844748 | 0.838958 | 0.866999 | 0.470983 | 0.681250 | |
A32 | 0.843686 | 0.854583 | 0.849100 | 0.848125 | 0.870202 | 0.581675 | 0.702853 | |
A33 | 0.812083 | 0.862500 | 0.836533 | 0.831458 | 0.869311 | 0.536382 | 0.669583 |
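The mean values reported for XGBoost can be reproduced from the nine sampling schemes in the table above. The short sketch below copies the per-scheme precision, recall, F1-score, accuracy, and AUC and averages each column; `XGB_RESULTS` and `METRICS` are hypothetical names introduced for the example.

```python
from statistics import fmean

# (precision, recall, F1-score, accuracy, AUC) per sampling scheme, copied from the table.
XGB_RESULTS = {
    "A11": (0.819870, 0.790833, 0.805090, 0.808542, 0.874733),
    "A12": (0.822997, 0.796250, 0.809403, 0.812501, 0.875227),
    "A13": (0.831808, 0.795417, 0.813206, 0.817292, 0.872167),
    "A21": (0.841923, 0.810000, 0.825653, 0.828958, 0.866946),
    "A22": (0.812826, 0.845000, 0.828601, 0.825208, 0.872687),
    "A23": (0.809957, 0.854167, 0.831474, 0.826875, 0.871285),
    "A31": (0.815432, 0.876250, 0.844748, 0.838958, 0.866999),
    "A32": (0.843686, 0.854583, 0.849100, 0.848125, 0.870202),
    "A33": (0.812083, 0.862500, 0.836533, 0.831458, 0.869311),
}
METRICS = ("precision", "recall", "f1", "accuracy", "auc")

# Average each metric over the nine schemes and round to the six decimals used in the paper.
means = {
    name: round(fmean(row[i] for row in XGB_RESULTS.values()), 6)
    for i, name in enumerate(METRICS)
}
print(means)
```

The rounded averages are 0.823398, 0.831667, 0.827090, 0.826435, and 0.871062, matching the XGBoost row of the model comparison table.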
Hyperparameter | Search Space | Values Used in Research |
---|---|---|
Hidden size | [32, 64, 100, 128, 256] | 100 |
Layers | [1, 2, 3, 4] | 3 |
Dropout | [0.1, 0.2, 0.3, 0.5, 0.7] | 0.5 |
Optimizer | [ADAM, SGD, RMSProp, Adagrad] | ADAM |
Learning rate | [0.0001, 0.001, 0.01, 0.1] | 0.001 |
Epochs | [100, 300, 500, 1000] | 500 |
Batch size | [8, 16, 32, 64, 128, 256] | 32 |
Batch Size | 8 | 16 | 32 | 64 | 128 | 256 |
---|---|---|---|---|---|---|
MSE | 0.036824 | 0.052417 | 0.034917 | 0.030901 | 0.045924 | 0.154982 |
MAE | 0.033416 | 0.033205 | 0.042230 | 0.053729 | 0.065872 | 0.103242 |
RMSE | 0.191825 | 0.228934 | 0.291326 | 0.175788 | 0.229389 | 0.245873 |
MAPE | 6.81% | 6.23% | 10.77% | 5.72% | 15.03% | 22.12% |
Explained Variance Score | 0.964239 | 0.949013 | 0.987244 | 0.844917 | 0.970787 | 0.986521 |
R-squared | 0.964123 | 0.948920 | 0.987243 | 0.844756 | 0.970749 | 0.935416 |
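For reference, the error measures in this table can be written out with their standard definitions. The sketch below is a generic illustration, not the study's evaluation code; the observed and predicted values are hypothetical, and the explained variance score follows the usual definition of one minus the variance of the residuals divided by the variance of the observations.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE, MAPE (%), explained variance score, and R-squared."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    explained_variance = 1.0 - var_err / (ss_tot / n)
    return {
        "mse": mse,
        "mae": mae,
        "rmse": sqrt(mse),
        "mape": mape,
        "explained_variance": explained_variance,
        "r2": r2,
    }


# Hypothetical observed and predicted values, for illustration only.
observed = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 1.9]
scores = regression_metrics(observed, predicted)
for name, value in scores.items():
    print(f"{name}: {value:.6f}")
```

By these definitions RMSE is the square root of MSE, and the explained variance score equals R-squared only when the mean residual is zero; otherwise it is larger.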
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, H.; Jiang, X.; Peng, S.; Zhou, K.; Xu, Z.; Wang, X. Coupled Risk Assessment of Flood Before and During Disaster Based on Machine Learning. Sustainability 2025, 17, 4564. https://doi.org/10.3390/su17104564