Bayesian Network-Based Earth-Rock Dam Breach Probability Analysis Integrating Machine Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection and Preprocessing
2.2. Identification of Risk Factors for Dam Failure in Reservoirs
2.3. Construction of Risk Networks for Earth-Rock Dams Integrating Domain Knowledge
2.3.1. Construction of Risk Network Structures
2.3.2. Bayesian Network Parameter Learning
2.4. Evaluation and Inference of Risk Network Models for Earth-Rock Dams
2.4.1. Predictive Performance Evaluation of Reservoir Dam-Break Models
2.4.2. Bayesian Network Inference
3. Results
3.1. Dam-Break Network Models for Earth-Rock Dams
3.1.1. Domain Knowledge-Based Network Structure
3.1.2. Structure Learning Algorithm-Based Network Structure
- (1) Direct connections between risk factors and the safety state, shown by the green arrows in Figure 4. Five risk factors in the model (gate failure, exceedance flood, collapse, poor construction quality, and abutment deterioration) point directly to the “Safety State of Earth-Rock Dams” node. From an engineering-mechanism perspective, however, these factors typically affect the safety state indirectly, by triggering a specific failure mode. These connections therefore need to be adjusted so that the factors act through failure modes.
- (2) Connections lacking a causal basis, shown by the pink arrows in Figure 4. Some variables joined by directed edges are statistically correlated but have no clear causal link. Examples are reservoir over-filling pointing to exceedance flood, gate failure pointing to slope instability, and abutment deterioration pointing to collapse. These connections need to be corrected using domain knowledge.
- (3) Connections with the wrong causal direction, shown by the red arrows in Figure 4. In the model, seepage failure points to collapse, whereas collapse is in fact a factor that can lead to seepage failure, so the direction of this arrow is wrong. The arc between “piping” and “seepage protection compromise” is also misdirected: piping does not cause seepage protection compromise. Rather, damage to the impervious facilities creates seepage pathways, along which fine soil particles are gradually carried away by the flow, giving rise to piping.
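
The three kinds of correction listed above (rerouting direct edges, deleting edges without a causal basis, and reversing misdirected edges) can be sketched as operations on an edge list. This is a minimal illustration, not the authors' implementation: the edge set is a small hypothetical subset, the node labels `SeepageFailure`, `Overtopping` and `SafetyState` are invented for the example, and the remaining labels reuse the abbreviations from the risk-factor table.

```python
from collections import defaultdict


def has_cycle(edges):
    """Return True if the directed graph of (parent, child) pairs contains a cycle."""
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))
    state = {}  # node -> 1 while on the DFS stack, 2 once fully explored

    def visit(node):
        state[node] = 1
        for nxt in children[node]:
            if state.get(nxt) == 1:
                return True  # back edge: cycle found
            if nxt not in state and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(visit(n) for n in nodes if n not in state)


def apply_expert_edits(edges, remove=(), reverse=(), add=()):
    """Apply domain-knowledge corrections and check the result is still a DAG."""
    result = set(edges)
    for edge in remove:
        result.discard(edge)
    for parent, child in reverse:
        if (parent, child) in result:
            result.remove((parent, child))
            result.add((child, parent))
    result.update(add)
    if has_cycle(result):
        raise ValueError("expert edits produced a cyclic graph")
    return result


# Hypothetical subset of a learned structure (illustration only).
learned = {
    ("GF", "SafetyState"),      # type (1): direct edge to the safety state
    ("ROF", "EFD"),             # type (2): no causal basis
    ("GF", "SPI"),              # type (2): no causal basis
    ("ATF", "SE"),              # type (2): no causal basis
    ("SeepageFailure", "SE"),   # type (3): wrong direction
    ("PG", "SPC"),              # type (3): wrong direction
}

corrected = apply_expert_edits(
    learned,
    remove=[("GF", "SafetyState"), ("ROF", "EFD"), ("GF", "SPI"), ("ATF", "SE")],
    reverse=[("SeepageFailure", "SE"), ("PG", "SPC")],
    add=[("GF", "Overtopping"), ("Overtopping", "SafetyState")],
)
```

The acyclicity check matters because reversing or adding an edge by hand can silently close a loop, which would make the result invalid as a Bayesian network.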
3.1.3. Integrated Network Structure Combining Domain Knowledge and Structure Learning Algorithms
3.2. Model Evaluation Results
3.3. Inference Results
4. Discussion
- (1) Model G3, which integrates domain knowledge with the structure learning algorithm, clearly outperforms the domain knowledge-based model G1 on the key performance metrics (G3: AUC = 0.887, OA = 0.895, F1-Score = 0.899; G1: AUC = 0.797, OA = 0.784, F1-Score = 0.762), as detailed in Table 3. This result indicates that the structure learning algorithm, by mining objective associations from the data, compensates for the structural simplifications of a purely domain knowledge-based model that relies heavily on expert experience. Those simplifications include neglecting coupling effects among risk factors and depending on subjective causal assumptions. Structure learning improves the completeness and causal rationality of the network structure and, in turn, the model's ability to discriminate the dam breach state. The integrated strategy uses data-driven learning to capture complex associations and domain knowledge to keep the network topology sound from an engineering standpoint. It thereby overcomes the subjectivity and limitations of modeling from domain knowledge alone and provides methodological support for building accurate, interpretable quantitative risk models of earth-rock dam breach.
- (2) As detailed in Table 3, the AUC, OA, and F1-Score of the G3 model, which integrates domain knowledge, are 0.887, 0.895, and 0.899, respectively. These values clearly exceed those of the purely data-driven model G2 (AUC = 0.835, OA = 0.861, F1-Score = 0.841). The AUC indicates that the model discriminates well between states of earth-rock dams, and the high accuracy and F1-Score show that its overall classification performance is stable and well balanced. This result supports the effectiveness of integrating domain knowledge with machine learning algorithms for improving model accuracy and reliability, and it establishes a credible foundation for practical engineering risk quantification and assessment.
- (3) Forward inference with the Bayesian network, shown quantitatively in Figure 7, identifies the systemic predisposing factors for dam breach and their coupling pathways. Exceedance flood, with an occurrence probability of 12.80%, and insufficient spillway discharge capacity, at 5.40%, are the root factors with the highest occurrence probability and the most significant risk contribution. These two factors also raise the probability of dam breach through a cascading effect: “exceedance flood → insufficient spillway discharge capacity → overtopping”. In the routine management of earth-rock dams, it is therefore essential to strengthen early warning systems for extreme weather events and to optimize reservoir operation plans. Regularly assessing the actual discharge capacity and ensuring the reliable operation of key facilities, such as gates, are critical. The coupling and amplification effects between factors should also be given priority, which calls for coordinated prevention and control measures.
- (4) Figure 8 uses Bayesian backward inference to trace quantitatively the contributions of the direct failure modes under dam-break conditions. The posterior probability increase for flood overtopping is 53.2%, surpassing that of seepage failure (36.09%) and structural failure (24.08%), which shows that flood overtopping is the most critical direct failure mode. This result provides a basis for targeted prevention and control: preventing flood overtopping must be prioritized, while seepage failure and structural failure also require attention. The quantified contributions establish an objective priority ranking for allocating risk management resources.
- (5) Despite its demonstrated utility, the proposed model has limitations that should be acknowledged and that point to future research. Its performance depends on the representativeness and quality of the historical dataset, which, although substantial for the domain, remains limited in size and may be subject to reporting biases and regional specificity. The model is static, as is common for standard BNs, and does not explicitly capture the temporal evolution of risk factors, such as the progressive nature of internal erosion or real-time hydrological changes. Furthermore, the structure learning process, although enhanced by expert knowledge, involves assumptions and discretionary adjustments that influence the final topology. Future work should therefore prioritize augmenting the database with more diverse and granular case histories. A pivotal advance would be a dynamic BN framework that assimilates real-time monitoring data, such as seepage pressures, deformation, and reservoir levels, to enable continuous risk updating and proactive early warning. Exploring more advanced hybrid structure learning methods and integrating principles from physics-informed machine learning could further improve the model's causal fidelity and robustness, ultimately evolving it into a comprehensive, adaptive decision-support tool for dam safety management.
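
The forward and backward inference discussed in items (3) and (4) can be illustrated on a toy version of the cascade “exceedance flood → insufficient spillway discharge capacity → overtopping”. This is a sketch under stated assumptions rather than the paper's model: the chain has only three binary nodes, the prior for exceedance flood reuses the 12.80% quoted above, and both conditional probability tables are hypothetical values chosen for illustration.

```python
from itertools import product

# Prior of the root node (value quoted in the text).
P_EFD = {True: 0.128, False: 0.872}
# HYPOTHETICAL conditional probabilities, for illustration only.
P_ISC_GIVEN_EFD = {True: 0.30, False: 0.02}  # P(ISC = yes | EFD)
P_OT_GIVEN_ISC = {True: 0.40, False: 0.01}   # P(OT = yes | ISC)

NAMES = ("EFD", "ISC", "OT")


def bernoulli(p_yes, value):
    """Probability of a binary outcome given the probability of 'yes'."""
    return p_yes if value else 1.0 - p_yes


def joint(efd, isc, ot):
    """Joint probability from the chain factorisation P(EFD) P(ISC|EFD) P(OT|ISC)."""
    return (P_EFD[efd]
            * bernoulli(P_ISC_GIVEN_EFD[efd], isc)
            * bernoulli(P_OT_GIVEN_ISC[isc], ot))


def probability(query, evidence=None):
    """P(query | evidence), computed by enumerating all joint states."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product((True, False), repeat=len(NAMES)):
        state = dict(zip(NAMES, values))
        if any(state[k] != v for k, v in evidence.items()):
            continue  # state contradicts the evidence
        p = joint(*values)
        denominator += p
        if all(state[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


# Forward inference: from cause to effect.
prior_ot = probability({"OT": True})
forward = probability({"OT": True}, {"EFD": True})

# Backward inference: from observed effect to likely cause.
backward = probability({"EFD": True}, {"OT": True})
increase = backward - P_EFD[True]  # posterior increase of the cause
```

Exhaustive enumeration is only practical for very small networks. For a network of the size described in the paper, a dedicated inference algorithm such as variable elimination would be the usual choice.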
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Failure Modes | Influencing Factors | Abbreviation | State Count | State Classification |
|---|---|---|---|---|
| Overtopping | Exceedance Flood | EFD | 2 | Yes/No |
| Overtopping | Spillway Failure | SF | 2 | Yes/No |
| Overtopping | Management Deficiency | MDA | 2 | Yes/No |
| Overtopping | Gate Failure | GF | 2 | Yes/No |
| Overtopping | Reservoir Over-Filling | ROF | 2 | Yes/No |
| Overtopping | Insufficient Spillway Capacity | ISC | 2 | Yes/No |
| Seepage Failure | Exceedance Flood | EFD | 2 | Yes/No |
| Seepage Failure | Piping | PG | 3 | Serious/Minor/Normal |
| Seepage Failure | Collapse | SE | 3 | Serious/Minor/Normal |
| Seepage Failure | Cracks | CK | 3 | Serious/Minor/Normal |
| Seepage Failure | Internal Erosion | ITE | 3 | Serious/Minor/Normal |
| Seepage Failure | Seepage Protection Compromise | SPC | 2 | Yes/No |
| Seepage Failure | Animal Activities or Plant Root Growth | AMT | 3 | Serious/Minor/Normal |
| Structural Instability | Settlement | ST | 3 | Serious/Minor/Normal |
| Structural Instability | Abutment Deterioration | ATF | 2 | Yes/No |
| Structural Instability | Foundation Deterioration | FDI | 2 | Yes/No |
| Structural Instability | Slope Instability | SPI | 2 | Yes/No |
| | Poor Construction Quality | PCQ | 2 | Yes/No |

| Actual State | Predicted Positive | Predicted Negative | Total |
|---|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) | Actual Positives |
| Negative | False Positive (FP) | True Negative (TN) | Actual Negatives |
| Total | Predicted Positives | Predicted Negatives | Total Samples |

| Model | AUC | OA | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| G1 | 0.797 | 0.784 | 0.782 | 0.741 | 0.762 |
| G2 | 0.835 | 0.861 | 0.844 | 0.834 | 0.841 |
| G3 | 0.887 | 0.895 | 0.897 | 0.883 | 0.899 |
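
The metrics reported for G1 to G3 can be related to the confusion matrix with a short sketch. It assumes the standard definitions of overall accuracy, precision, recall and F1-Score, and computes AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The counts and scores in the example are hypothetical and are not taken from the paper's dataset.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Overall accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "OA": (tp + tn) / total,
        "Precision": precision,
        "Recall": recall,
        "F1": 2 * precision * recall / (precision + recall),
    }


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n_pos * n_neg), which is
    fine for an illustration but slow for large samples.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# HYPOTHETICAL counts and predicted breach probabilities, for illustration only.
metrics = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
area = auc(scores_pos=[0.9, 0.8, 0.4], scores_neg=[0.7, 0.3, 0.4])
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which gives a quick consistency check on any row of reported metrics.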
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Z.; Shi, Q.; Sun, H.; Zhou, Y.; Ma, F.; Wang, J.; van Gelder, P. Bayesian Network-Based Earth-Rock Dam Breach Probability Analysis Integrating Machine Learning. Water 2025, 17, 3085. https://doi.org/10.3390/w17213085

