Explainable Hybrid Intelligence for Predicting Tunnel Water Inrush Quantity Under Small-Sample, High-Heterogeneity Conditions: GAN Augmentation and Swarm-Optimized CatBoost
Abstract
1. Introduction
2. Research Methodology
2.1. Optimization Algorithms
2.1.1. AsyLnCPSO (Asymmetric Linearly Varying Acceleration PSO)
2.1.2. BreedPSO (PSO with Breeding Operator)
2.1.3. CLSPSO (Chaotic Local Search PSO)
2.2. Interpretability Techniques
2.2.1. SHAP-Based Global Attribution and Directional Effects
2.2.2. PDP and ICE for Nonlinear Response Diagnostics
2.2.3. Interaction Exploration via Two-Variable (3D) PDP Surfaces
2.3. GAN Data Augmentation Technology
2.4. System Framework
2.4.1. Train/Test Split
2.4.2. Statistical Diagnosis on Original Data
2.4.3. Baseline Modeling on Original Training Set
2.4.4. Training-Set-Only Augmentation
2.4.5. Critical Distance-Based Model Ranking
2.4.6. Benchmark Evaluation of Optimizers
2.4.7. Construction of AsyLnCPSO–CatBoost Hybrid Model
2.4.8. Explainability Analysis
3. Project Overview
4. Performance Evaluation Indicators
5. Results and Analysis
5.1. Performance of Baseline Models on the Original Dataset
5.2. Effect of Training-Only Data Augmentation
5.3. Optimization Algorithm and Baseline Model Selection
5.4. Developing Hybrid Intelligence
5.5. Comprehensive Comparison of Model Performance
5.6. Interpretability Evaluation
6. Discussion
6.1. Limitations
6.2. Future Research Directions
7. Conclusions
- The original dataset of 55 samples showed clear small-sample characteristics for tunnel water inrush quantity prediction, and all selected tree-based baseline models overfitted severely as a result. Each baseline fitted the original training set closely but lost substantial predictive accuracy on the independent test set, indicating that the models memorized the limited sample patterns rather than learning the underlying relationship between the hydrogeological–structural indicators and water inrush quantity.
- The GAN-based data augmentation, applied exclusively to the training set, enlarged the effective training sample and mitigated sample scarcity under the present dataset setting. The augmentation appeared to preserve the main statistical distributions and inter-variable coupling patterns of the raw dataset, and the test set was kept untouched throughout model development to minimize the risk of information leakage.
- Critical distance (CD) analysis was used for a statistical comparison of six mainstream tree-based baseline models, and the results indicated that CatBoost achieved the best overall ranking. It showed more stable and competitive predictive performance than the other baseline models on both the original small-sample dataset and the GAN-augmented dataset, and was therefore selected as the base learner for the subsequent hybrid model.
- Among the three enhanced PSO variants investigated (AsyLnCPSO, BreedPSO, and CLSPSO), AsyLnCPSO showed the best benchmark performance and was therefore adopted for CatBoost hyperparameter tuning in this study.
- The proposed AsyLnCPSO–CatBoost hybrid model achieved the best numerical performance among the tested models under the current data split and project-specific dataset. However, because the real test set contained only 11 samples and the validation was not repeated across multiple splits or external tunnel projects, this result should be regarded as preliminary evidence of potential usefulness rather than conclusive proof of robust generalization.
- The multi-level interpretability suite integrating SHAP, PDP, and ICE curves provided transparent post hoc evidence for the feature–response relationships learned by the hybrid model. This analysis helped identify dominant driving factors, nonlinear response patterns, and key interaction effects, thereby improving model interpretability and offering project-specific decision-support insights for tunnel engineering.
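
The leakage control described in the conclusions (hold out the test set first, then augment only the training rows) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the generator is a Gaussian-jitter stand-in rather than a GAN, the function name `split_then_augment` and the `noise_scale` parameter are hypothetical, the demo data are random placeholders, and it is an assumption that the 880 training samples reported in the results include the original training rows.

```python
import numpy as np

def split_then_augment(X, y, n_test=11, target_train_size=880,
                       noise_scale=0.05, seed=0):
    """Hold out the test rows first, then enlarge only the training rows."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # Inputs and target are stacked so that synthetic rows keep their coupling.
    rows = np.column_stack([X[train_idx], y[train_idx]])

    # Stand-in generator (NOT a GAN): resample real training rows and add
    # Gaussian jitter scaled by each column's standard deviation.
    n_extra = target_train_size - len(train_idx)
    picks = rng.integers(0, len(train_idx), size=n_extra)
    jitter = rng.normal(0.0, 1.0, size=(n_extra, rows.shape[1]))
    synth = rows[picks] + jitter * noise_scale * rows.std(axis=0)

    # Assumption: the augmented set keeps the original training rows.
    augmented = np.vstack([rows, synth])
    X_aug, y_aug = augmented[:, :-1], augmented[:, -1]
    return X_aug, y_aug, X[test_idx], y[test_idx], train_idx, test_idx

# Placeholder data with the paper's shape: 55 samples, 6 inputs, 1 output.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.random((55, 6))
y_demo = demo_rng.random(55)

X_aug, y_aug, X_test, y_test, train_idx, test_idx = split_then_augment(X_demo, y_demo)
print(X_aug.shape, X_test.shape)
```

Because the split happens before any synthetic rows are generated, no test row can influence the generator, which is the property the conclusions rely on.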
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zheng, S.; Zhang, Q.; Yang, Y.; Liu, X. Research and application of reliability evaluation model for water inrush risk during tunnel construction. Tunn. Undergr. Space Technol. 2026, 168, 107121.
- Li, X.; Li, S.; Wang, B.; Qu, J.; Zhao, J.; Zhao, S. Water inrush risk assessment during karst tunnel construction based on knowledge decision and data-driven methods. Tunn. Undergr. Space Technol. 2026, 168, 19.
- Mahmoodzadeh, A.; Mohammadi, M.; Noori, K.M.G.; Khishe, M.; Ibrahim, H.H.; Ali, H.F.H.; Abdulhamid, S.N. Presenting the best prediction model of water inflow into drill and blast tunnels among several machine learning techniques. Autom. Constr. 2021, 127, 103719.
- Feng, X.; Lu, Y.; He, J.; Lu, B.; Wang, K. Bayesian-network-based predictions of water inrush incidents in soft rock tunnels. KSCE J. Civ. Eng. 2024, 28, 5934–5945.
- Zhou, J.; Zhang, Y.; Li, C.; Yong, W.; Qiu, Y.; Du, K.; Wang, S. Enhancing the performance of tunnel water inflow prediction using random forest optimized by grey wolf optimizer. Earth Sci. Inform. 2023, 16, 2405–2420.
- Zhang, N.; Niu, M.; Wan, F.; Lu, J.; Wang, Y.; Yan, X.; Zhou, C. Hazard prediction of water inrush in water-rich tunnels based on random forest algorithm. Appl. Sci. 2024, 14, 867.
- Zhuo, Y.; Chao, M. Risk prediction of water inrush of karst tunnels based on BP neural network. In Proceedings of the 2016 4th International Conference on Mechanical Materials and Manufacturing Engineering; Atlantis Press: Dordrecht, The Netherlands, 2016; Volume 36, pp. 1337–1342.
- Li, S.; He, P.; Li, L.; Shi, S.; Zhang, Q.; Zhang, J.; Hu, J. Gaussian process model of water inflow prediction in tunnel construction and its engineering applications. Tunn. Undergr. Space Technol. 2017, 69, 155–161.
- Ma, D.; Duan, H.; Cai, X.; Li, Z.; Li, Q.; Zhang, Q. A global optimization-based method for the prediction of water inrush hazard from mining floor. Water 2018, 10, 1618.
- Li, Z.; Wang, Y.; Olgun, C.G.; Yang, S.; Jiao, Q.; Wang, M. Risk assessment of water inrush caused by karst cave in tunnels based on reliability and GA-BP neural network. Geomat. Nat. Hazards Risk 2020, 11, 1212–1232.
- Liu, D.; Xu, Q.; Tang, Y.; Jian, Y. Prediction of water inrush in long-lasting shutdown karst tunnels based on the HGWO-SVR model. IEEE Access 2021, 9, 6368–6378.
- Zhang, Y.; Yang, L. A novel dynamic predictive method of water inrush from coal floor based on gated recurrent unit model. Nat. Hazards 2021, 105, 2027–2043.
- Yin, H.; Wu, Q.; Yin, S.; Dong, S.; Dai, Z.; Soltanian, M.R. Predicting mine water inrush accidents based on water level anomalies of borehole groups using long short-term memory and isolation forest. J. Hydrol. 2023, 616, 17.
- Pi, Y.; Sun, Z.; Lu, Y.; Xu, J. A novel model for risk prediction of water inrush and its application in a tunnel in Xinjiang, China. Front. Earth Sci. 2024, 12, 14.
- Xu, Z.; Kong, F.; Cao, C.; Zhang, Z. Prediction and analysis of tunnel water inrush disasters in Chinese karst area based on variable weight-weighted Bayesian network model. Carbonates Evaporites 2024, 40, 19.
- Shen, Q.; Yang, H.; Zhou, Z.; Chen, Z.; Zhang, Y. Simulation and parameter identification of water inrush in tunnel construction using physics-informed neural networks. Bull. Eng. Geol. Environ. 2025, 84, 370.
- Huo, G.; Wang, H.; Zhang, J.; Xue, Y.; Fu, B.; Kong, F.; Yan, Z. Research on risk evaluation of tunnel water inrush based on multi-source geophysical exploration data fusion of MLP-transformer model. Appl. Geophys. 2025, 23, 1–14.
- Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; p. 488968.
- Bao, G.Q.; Mao, K.F. Particle swarm optimization algorithm with asymmetric time varying acceleration coefficients. In 2009 IEEE International Conference on Robotics and Biomimetics, ROBIO 2009, Guilin, China, 19–23 December 2009; IEEE Computer Society: Washington, DC, USA, 2009; pp. 2134–2139.
- Løvbjerg, M.; Rasmussen, T.K.; Krink, T. Hybrid particle swarm optimiser with breeding and subpopulations. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2001; pp. 469–476.
- Gao, S.; Yu, Y.; Wang, Y.; Wang, J.; Cheng, J.; Zhou, M. Chaotic local search-based differential evolution algorithms for optimization. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 3954–3967.
- Mosca, E.; Szigeti, F.; Tragianni, S.; Gallagher, D.; Groh, G. SHAP-based explanation methods: A review for NLP interpretability. In Proceedings of the 29th International Conference on Computational Linguistics; Association for Computational Linguistics (ACL): Gyeongju, Republic of Korea, 2022; pp. 4593–4603.
- Zhang, Y.; Qiu, Y.; Du, K.; Nguyen, H.; Armaghani, D.J.; Zhou, J. Optimizing flyrock forecasting in open-pit blasting using hybrid machine learning models. Rock Mech. Rock Eng. 2025, 58, 12523–12550.
- Qi, H.; Zhou, J.; Khandelwal, M.; Onifade, M.; Lawal, A.I.; Li, C.; Bada, S.O.; Genc, B. An optimized machine learning framework for prediction of coal abrasive index: Leveraging supervised learning, metaheuristic optimization, and interpretability analysis. Fuel 2026, 403, 136065.
- Zhou, J.; Zhang, Y.; Qiu, Y.; Peng, K.; Khandelwal, M. Enhancing tunnel safety with machine learning models for ground behavior prediction. Tunn. Undergr. Space Technol. 2025, 165, 106888.
- Creswell, A.; White, T.; Dumoulin, V.; Arulkumaran, K.; Sengupta, B.; Bharath, A.A. Generative adversarial networks: An overview. IEEE Signal Process. Mag. 2018, 35, 53–65.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
- Cai, X.; Chen, L.; Zhou, Z.; Cheng, R.; Yuan, J. A feature fusion-based framework for robust prediction of underground pillar stability under small-sample conditions. Rock Mech. Rock Eng. 2025, 58, 13565–13586.
- Zhang, Y.; Zhou, J.; Li, J.; He, B.; Armaghani, D.J.; Huang, S. Advancing overbreak prediction in drilling and blasting tunnel using MVO, SSA and HHO-based SVM models with interpretability analysis. Geomech. Geophys. Geo-Energ. Geo-Resour. 2025, 11, 53.
- Cai, X.; Wang, C.; Zhou, Z.; Cheng, R.; Gao, J.; Liu, B. Dynamic optimization of powder factor in extreme-cold region bench blasting considering temperature effects on single-hole blasting. Rock Mech. Rock Eng. 2025, 1–26, Correction in Rock Mech. Rock Eng. 2026, 1. https://doi.org/10.1007/s00603-026-05347-9.
- Zhang, Y.; Li, E.; Gu, J.; Du, K.; Zhou, J. Residential building cooling load prediction with optimized KELM models and interpretability insights. Appl. Therm. Eng. 2025, 272, 126421.

| Study | Number of Samples | AI Methods | Interpretability Analysis |
|---|---|---|---|
| Zhuo and Chao [7] (2016) | 16 | BP | No |
| Li et al. [8] (2017) | 36 | GP-SVM-ANN | No |
| Ma et al. [9] (2018) | 18 | GA-SVM | No |
| Li et al. [10] (2020) | 100 | GA-BP | No |
| Liu et al. [11] (2021) | 181 | HGWO-SVR | No |
| Zhang and Yang [12] (2021) | 180 | GRU | No |
| Yin et al. [13] (2023) | 36 | LSTM, iForest | Yes |
| Zhou et al. [5] (2023) | 600 | GWO-RF | No |
| Pi et al. [14] (2024) | 70 | SMF | No |
| Zhang et al. [6] (2024) | 185 | RF | No |
| Feng et al. [4] (2024) | 70 | BN | No |
| Xu et al. [15] (2024) | 91 | VW-WBN | No |
| Shen et al. [16] (2025) | 26 | PINN | No |
| Huo et al. [17] (2025) | 7 | MLP-Transformer | No |
| Li et al. [2] (2026) | 52 | DE-GWO-ELM | No |
| Zheng et al. [1] (2026) | 55 | RF | No |

| Category | Variable Name | Abbreviation | Unit | Average |
|---|---|---|---|---|
| Input | Reflector distribution coefficient | RDC | - | 0.39 |
| Input | Groundwater development coefficient | GDC | - | 0.46 |
| Input | Attitude of rock | AR | ° | 62.49 |
| Input | Fracture opening | FO | mm | 9.6 |
| Input | Stratum lithologic coefficient | SLC | - | 0.50 |
| Input | Depth of tunnel | TD | m | 46.41 |
| Output | Water inrush quantity | WIQ | m³/h | 20.10 |

| Parameter | Configuration |
|---|---|
| Lower bound for depth | 1 |
| Upper bound for depth | 10 |
| Lower bound for l2_leaf_reg | 1 |
| Upper bound for l2_leaf_reg | 20 |
| Lower bound for learning_rate | 0.01 |
| Upper bound for learning_rate | 0.30 |
| Maximum number of iterations | 500 |
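
To make the search space above concrete, the following is a minimal sketch of a PSO loop whose cognitive and social coefficients vary linearly and asymmetrically over the iterations, which is the idea named by AsyLnCPSO. It is not the authors' implementation. The coefficient schedules `c1` and `c2`, the inertia weight `w`, the function name `asy_ln_pso`, and the toy objective with its `TARGET` point are all illustrative assumptions. A real objective would round `depth` to an integer and return a validation error from a trained CatBoost model.

```python
import numpy as np

# Bounds from the configuration table: depth, l2_leaf_reg, learning_rate.
LOWER = np.array([1.0, 1.0, 0.01])
UPPER = np.array([10.0, 20.0, 0.30])

def asy_ln_pso(objective, lower, upper, n_particles=30, n_iter=100,
               w=0.6, c1=(2.5, 1.0), c2=(0.5, 2.25), seed=0):
    """PSO with linearly varying, asymmetric acceleration coefficients.

    c1 and c2 are (start, end) pairs; the values here are illustrative.
    """
    rng = np.random.default_rng(seed)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = int(pbest_val.argmin())
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])

    for t in range(n_iter):
        frac = t / max(n_iter - 1, 1)
        # Cognitive weight decays while social weight grows, on different ranges.
        c1_t = c1[0] + (c1[1] - c1[0]) * frac
        c2_t = c2[0] + (c2[1] - c2[0]) * frac
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1_t * r1 * (pbest - pos) + c2_t * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside the bounds

        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(pbest_val.argmin())
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    return gbest, gbest_val

# Hypothetical stand-in objective: squared distance to an arbitrary point,
# normalized by the width of each bound so the three axes are comparable.
TARGET = np.array([6.0, 3.0, 0.1])

def toy_objective(p):
    return float(np.sum(((p - TARGET) / (UPPER - LOWER)) ** 2))

best, best_val = asy_ln_pso(toy_objective, LOWER, UPPER)
print(best, best_val)
```

Normalizing by the bound width matters in practice because `learning_rate` spans 0.29 units while `l2_leaf_reg` spans 19, so an unscaled distance would be dominated by the wider axes.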

| Dataset | Population Size | R² | MAPE (%) | MAE | RMSE |
|---|---|---|---|---|---|
| Training (880 samples) | 100 | 0.99884 | 0.5381 | 0.1083 | 0.16081 |
| Testing (11 samples) | 100 | 0.97736 | 1.9207 | 0.3779 | 0.67357 |
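
The four indicators reported above can be computed as in the sketch below, assuming the standard textbook definitions of R², MAPE, MAE, and RMSE. The exact formulas used in the paper's Section 4 are not reproduced here, and the function name `regression_metrics` and the sample values are illustrative only.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, MAPE (%), MAE, and RMSE under their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        # MAPE is undefined when any true value is zero.
        "MAPE (%)": float(100.0 * np.mean(np.abs(err / y_true))),
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }

# Made-up values for illustration; these are not data from the study.
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
print(metrics)
```

With only 11 test samples, each of these statistics is sensitive to a single large residual, which is consistent with the caution expressed in the conclusions about treating the test results as preliminary.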

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Huang, R.; Chen, Y.; Wang, L.; Zhan, J.; Ji, Y.; Huang, T.; Yang, Y. Explainable Hybrid Intelligence for Predicting Tunnel Water Inrush Quantity Under Small-Sample, High-Heterogeneity Conditions: GAN Augmentation and Swarm-Optimized CatBoost. Infrastructures 2026, 11, 183. https://doi.org/10.3390/infrastructures11060183
