Article

Intelligent and Sustainable Classification of Tunnel Water and Mud Inrush Hazards with Zero Misjudgment of Major Hazards: Integrating Large-Scale Models and Multi-Strategy Data Enhancement

1 Key Laboratory of Urban Underground Engineering of the Ministry of Education, Beijing 100044, China
2 School of Civil Engineering, Beijing Jiaotong University, Beijing 100044, China
3 Department of Electrical Engineering, Tsinghua University, Beijing 100084, China
4 College of Environmental Science and Engineering, Ocean University of China, Qingdao 266100, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(24), 11286; https://doi.org/10.3390/su172411286
Submission received: 16 November 2025 / Revised: 8 December 2025 / Accepted: 15 December 2025 / Published: 16 December 2025
(This article belongs to the Section Sustainable Engineering and Science)

Abstract

Water and mud inrush hazards pose significant threats to the safety, environmental stability, and resource efficiency of tunnel construction, representing a critical barrier to the development of sustainable transportation infrastructure. Misjudgment—especially missed detections of severe hazards—can lead to extensive geological disturbance, excessive energy consumption, and severe socio-environmental impacts. However, pre-trained large-scale models still face two major challenges when applied to tunnel hazard classification: limited labeled samples and the high cost associated with misclassifying severe hazards. This study proposes a sustainability-oriented intelligent classification framework that integrates a large-scale pre-trained model with multi-strategy data augmentation to accurately identify hazard levels during tunnel excavation. First, a Synthetic Minority Over-Sampling Technique (SMOTE)-based multi-strategy augmentation method is introduced to expand the training set, mitigate class imbalance, and enhance the model’s ability to recognize rare but critical hazard categories. Second, a deep feature extraction architecture built on the robustly optimized BERT pretraining approach (RoBERTa) is designed to strengthen semantic representation under small-sample conditions. Moreover, a hierarchical weighting mechanism is incorporated into the weighted cross-entropy loss to emphasize the identification of severe hazard levels, thereby ensuring zero missed detections. Experimental results demonstrate that the proposed method achieves an accuracy of 99.26%, representing a 27.96% improvement over the traditional SVM baseline. Importantly, the recall for severe hazards (Levels III and IV) reaches 100%, ensuring zero misjudgment of major hazards. By effectively reducing safety risks, minimizing environmental disruptions, and promoting resilient tunnel construction, this method provides strong support for sustainable and low-impact underground engineering practices.

1. Introduction

With the rapid advancement toward sustainable and resilient transportation infrastructure, an increasing number of tunnels are being constructed in geologically complex mountainous regions. Water and mud inrush has become not only one of the most frequent and hazardous disasters during tunnel construction [1,2], accounting for more than 40% of geological hazards [3,4], but also a major threat to environmental stability, resource efficiency, and green construction practices. This disaster, characterized by the sudden eruption of groundwater or mud–water mixtures through joints, faults, or karst conduits [5], may cause severe economic losses, substantial environmental disturbance, and even casualties [5,6,7]. Therefore, accurate classification of water and mud inrush hazards is essential for ensuring construction safety, minimizing ecological impact, and promoting sustainable tunnel development. However, the complex hydrogeological conditions of underground environments still pose significant challenges to accurate hazard classification.
Various methods have been proposed for evaluating the hazard levels of water and mud inrush. Early approaches such as two-level fuzzy comprehensive evaluation [8], set pair analysis [9], and attribute mathematics [10] offered preliminary risk assessments but failed to capture multi-factor interactions. Improved methods—including extension theory integrated with fuzzy evaluation [11], uncertain-measure–based multi-factor assessment [12], and combined weight cloud models [13]—enhanced uncertainty treatment but still suffered from subjective weighting schemes. Fuzzy theory [4] and entropy weighting [14] alleviated part of this issue but remained limited when handling conflicting multi-factor conditions. Approaches integrating AHP with TOPSIS [15], improved game theory with uncertainty measures [16], and ideal point interval recognition methods [17] achieved better handling of factor conflicts, but their accuracy was still insufficient for large-scale engineering data required in sustainable construction. These limitations highlight the potential of machine learning methods.
Machine learning has increasingly been applied to tunnel hazard prediction [18,19]. Shallow models such as SVM [20], RF [21], KNN [22], and ELM [23] offer advantages in efficiency and interpretability but struggle to model nonlinear patterns in complex datasets. Deep learning models, including CNNs [24,25], HRNet [24], and EfficientNet [26], have demonstrated stronger feature extraction capabilities. However, studies focusing on classifying hazard levels of water and mud inrush remain limited, with most relying on Bayesian Networks [27,28] or RF [29], whose shallow feature representations restrict performance and robustness—key requirements for sustainable and low-risk underground engineering.
Large-scale pre-trained models, such as RoBERTa and ChatGPT, provide powerful end-to-end feature extraction capabilities without manual feature engineering [30,31,32]. Yet fine-tuning these models typically requires large labeled datasets. In tunnel engineering, data acquisition is costly and challenging, and limited samples may lead to hallucination effects and reduced model reliability. Multi-strategy data augmentation has thus become an effective approach to enhance model generalization. Although GAN-based methods are widely used in image enhancement [33,34,35,36,37,38] and SMOTE-based techniques perform well in addressing class imbalance in tunnel engineering [39,40,41,42], augmentation strategies specifically designed for water and mud inrush hazards have not been fully explored.
Misclassification—especially missed detection of severe hazards—remains a critical challenge in sustainable risk management [43,44,45]. Failure to accurately identify major hazard levels may lead to catastrophic consequences, including environmental degradation, equipment damage, and secondary geological disasters, severely undermining the goals of green, safe, and resilient tunnel construction. However, existing studies lack effective solutions to achieve zero misclassification of major hazards.
To address these challenges, this paper proposes a sustainability-oriented intelligent classification method for water and mud inrush hazards by integrating large-scale pre-trained models with multi-strategy data augmentation. First, SMOTE, ADASYN, and Borderline-SMOTE are employed to expand minority classes and improve dataset balance. Second, a RoBERTa-based classification framework is developed to enhance semantic representation under small-sample conditions. Third, a weighted cross-entropy strategy is designed to achieve zero misclassification of major hazards while maintaining high accuracy for minor hazards. Finally, engineering data from the DD tunnel in western China are used to validate the method, demonstrating superior accuracy, robustness, and sustainability-oriented reliability.

2. Methodology

This study proposes an intelligent classification method for tunnel water and mud inrush hazards by integrating multi-strategy data augmentation with a large pre-trained model (RoBERTa). By transforming structured engineering data into textual representations, the model can effectively learn complex hydro-geological feature interactions. Combined with a weighted cross-entropy loss function, the method achieves zero misclassification of major hazards. The overall framework is shown in Figure 1.

2.1. Multi-Strategy Data Enhancement Method Based on SMOTE

2.1.1. Problem Description

When using pre-trained large-scale models for tunnel hazard classification prediction, a substantial amount of data is required to train the model and ensure its predictive accuracy. However, due to the complexity of engineering geology, collecting field data is both challenging and costly. To address this issue, this paper employs a multi-strategy data augmentation approach based on SMOTE, integrating three techniques: SMOTE, ADASYN, and Borderline-SMOTE.

2.1.2. Introduction of Multi-Strategy Data Enhancement Method

Data augmentation can significantly improve model prediction accuracy and enhance generalization capability, enabling better performance in complex and dynamic real-world environments [33]. Among these methods, SMOTE is one of the most representative algorithms.
SMOTE generates new minority class samples through interpolation. Its core idea involves calculating the k-nearest neighbors of each minority class sample, randomly selecting a sample within this neighborhood, and generating new samples using linear interpolation [39]. The computation equation for this method is as follows:
$x_{\mathrm{new}} = x_i + \lambda \cdot (x_{nn} - x_i)$ (1)
where xnew represents the newly generated minority class sample, xi denotes a selected minority class sample, and xnn is another minority class sample chosen from the k-nearest neighbors of xi. The term λ∼U(0, 1) represents a randomly generated number following a uniform distribution.
The ADASYN method improves upon SMOTE by introducing an adaptive sampling strategy. Its core idea is to generate more minority class samples near the decision boundary, thereby enhancing the classifier’s ability to recognize difficult-to-learn samples [46]. ADASYN calculates the proportion of majority class samples among the k-nearest neighbors of each minority class sample xi. The calculation process is shown in Equation (2).
$r_i = \dfrac{n_{\mathrm{maj}}}{k}$ (2)
where ri represents the learning difficulty of sample xi, which is defined as the proportion of majority class samples among its k-nearest neighbors. k denotes the predefined number of neighbors, and nmaj refers to the number of majority class samples within the neighborhood of xi. Subsequently, based on the learning difficulty of all minority class samples, the total number of new samples to be generated is determined. The calculation formula is shown in Equation (3).
$G = d \times (n_{\mathrm{maj}} - n_{\mathrm{min}})$ (3)
where G represents the total number of new samples to be generated, d is a user-defined sampling ratio parameter (0 ≤ d ≤ 1) that controls the desired degree of class balance after synthesis, and nmin denotes the number of minority class samples.
$g_i = \dfrac{r_i}{\sum r_i} \times G$ (4)
where gi represents the number of new samples to be generated for sample xi, and ∑ri denotes the total learning difficulty across all minority class samples. Finally, new minority class samples are generated using the SMOTE sampling strategy. Compared to SMOTE, ADASYN employs an adaptive approach to ensure that more minority class samples are generated in regions where classification is more challenging, enabling the classifier to better learn complex decision boundaries [47].
The Borderline-SMOTE method addresses the limitation of SMOTE, which uniformly generates samples across the entire minority class sample space, by proposing an oversampling approach that focuses on the decision boundary [48]. The core idea of this method is to first identify “borderline samples” within the minority class—those minority class instances with a high proportion of majority class samples in their k-nearest neighbors [49]. The decision criteria for identifying these borderline samples are defined in Equation (5).
$\dfrac{n_{\mathrm{maj}}}{k} > T$ (5)
where T is a predefined threshold (typically set to 0.5 or higher). If the proportion of majority class samples among the k-nearest neighbors of a minority class sample exceeds T, the sample is classified as a “borderline sample.” Once the borderline samples are identified, SMOTE sampling is applied exclusively to these samples. Compared to SMOTE, Borderline-SMOTE enhances the representation of minority class samples near the decision boundary, improving the classifier’s ability to differentiate in critical regions and preventing the generation of ineffective samples that may occur with SMOTE.
In summary, this study employs SMOTE, ADASYN, and Borderline-SMOTE for data augmentation. By synthesizing new minority class samples, these methods enhance the model’s ability to learn from minority class instances, thereby improving the classifier’s generalization capability and predictive accuracy.
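To make the combination of the three oversamplers concrete, the sketch below shows one way the augmentation step could be implemented with the imbalanced-learn library. The rule used to merge the three resampled outputs (pooling only the synthetic rows from each run onto the original training data) is an assumption; the paper does not specify how the strategies are combined.

```python
# Minimal sketch of multi-strategy oversampling with imbalanced-learn.
# Assumption: the three resampled sets are merged by keeping the original
# training rows once and appending the synthetic rows from each sampler.
import numpy as np
from imblearn.over_sampling import SMOTE, ADASYN, BorderlineSMOTE

def multi_strategy_augment(X_train, y_train, k=5, random_state=42):
    """Return the original training data plus synthetic minority samples
    generated by SMOTE, ADASYN, and Borderline-SMOTE (X_train as ndarray)."""
    samplers = [
        SMOTE(k_neighbors=k, random_state=random_state),
        ADASYN(n_neighbors=k, random_state=random_state),
        BorderlineSMOTE(k_neighbors=k, kind="borderline-1",
                        random_state=random_state),
    ]
    X_parts, y_parts = [X_train], [y_train]
    n_orig = len(X_train)
    for sampler in samplers:
        X_res, y_res = sampler.fit_resample(X_train, y_train)
        # fit_resample returns the original samples first, followed by the
        # synthetic ones; keep only the synthetic part of each run.
        X_parts.append(X_res[n_orig:])
        y_parts.append(y_res[n_orig:])
    return np.vstack(X_parts), np.concatenate(y_parts)
```

In line with the evaluation protocol described later in Section 3.2.1, such a helper would be fitted on the training split only, leaving the test set untouched.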

2.2. Construction of Hazard Classification Model Based on Large-Scale Models

2.2.1. Dataset Description

The occurrence of tunnel water and mud inrush hazards results from the combined influence of geological conditions, groundwater, and other factors. To accurately predict the severity of such hazards, it is essential to consider the key contributing factors. Based on prior knowledge and long-term on-site tunnel monitoring, this study collected data on tunnel water and mud inrush hazard classification (WMIHC) along with its corresponding influencing factors, including water head height (WH), rock quality designation (RQD), water yield property (WYP), rock stratum occurrence (RSO), aquifer thickness (AT), and unfavorable geology (UG), as illustrated in Figure 2. WMIHC is determined based on the “Technical Specification for Geology Forecast of Railway Tunnel (Q/CR 9217-2015)” [50]. Considering that both large-scale and extra-large-scale water inrush events during tunnel construction can lead to catastrophic consequences, they are classified into the same category. Therefore, WMIHC is divided into four levels. Deep learning models require input parameters to be quantitatively processed. Specifically, WH, RQD, WYP, and AT are quantitative indicators, while RSO and UG are represented by rock dip angles and surrounding rock quality levels, respectively. This study collected the aforementioned data and applied the data augmentation methods described in Section 2.1 to construct a dataset for building an effective hazard classification prediction model. The following sections provide further details on these parameters.
(1)
WH
WH directly determines groundwater pressure. When the water head is high, water can easily seep through fractures into the tunnel, increasing the risk of water inrush. An excessively high WH significantly elevates the hazard severity, particularly in water-rich layers and fractured rock masses.
(2)
RQD
RQD reflects the integrity of the rock mass. A low RQD value indicates a highly fractured rock mass with numerous fissures, increasing water permeability and the likelihood of water inrush. In areas with poor rock quality, the hazard severity is higher, particularly under high water head conditions.
(3)
WYP
WYP represents the water-bearing capacity of rock formations. In water-rich layers, significant water flow often occurs during construction, leading to frequent water inrush events. Areas with high water yield properties exhibit higher hazard severity, especially where aquifers are thick, and rock fractures are well-developed.
(4)
Rock stratum occurrence (RSO)
The dip of rock strata influences groundwater flow paths. When rock formations exhibit complex structures or steep inclinations, water can easily infiltrate the tunnel through fractures, increasing the risk of water inrush. Under poor rock conditions, hazard severity tends to be higher.
(5)
Aquifer thickness (AT)
AT determines the groundwater storage capacity. A greater aquifer thickness enhances groundwater permeability, exacerbating water inrush events. Areas with thick aquifers exhibit a higher hazard severity, particularly under high water head conditions and within fractured rock masses.
(6)
Unfavorable geology (UG)
UG includes geological structures such as faults and fracture zones, which have high permeability. Water can easily infiltrate tunnels through these unfavorable geological zones, significantly increasing the risk of water inrush. Areas with adverse geological conditions tend to have higher hazard severity.

2.2.2. Construction of the Intelligent Classification Model Based on RoBERTa

(1)
Introduction of RoBERTa model
Bidirectional Encoder Representations from Transformers (BERT) is a bidirectional Transformer-based language model proposed by Google AI in 2018 [51], including two standard configurations, BERT-base and BERT-large. Its core innovation lies in its dual-task pretraining mechanism, consisting of the Masked Language Model (MLM) and Next Sentence Prediction (NSP), enabling the model to capture contextual semantic information simultaneously [52,53]. The MLM task learns grammatical and semantic features by predicting randomly masked words, while the NSP task determines sentence coherence to understand inter-sentence relationships. This pretraining strategy endows BERT with powerful semantic representation capabilities, allowing it to be fine-tuned for various downstream tasks, including text classification [54], question answering [55], text generation [56], and semantic similarity computation [57].
In 2019, Facebook AI introduced RoBERTa [58], which incorporated several optimizations based on the BERT architecture. First, it removed the NSP task to simplify the training process [59]. Second, it adopted a dynamic masking strategy—randomly generating mask patterns during each training iteration—to enhance model generalization. Additionally, it expanded the training dataset and extended the training duration [59]. Experimental results show that by adjusting batch parameters and increasing training steps, RoBERTa demonstrated superior text comprehension performance in benchmark tests such as GLUE, particularly excelling in long-text modeling and complex semantic relationship processing [60].
To address the limitation that RoBERTa can only process textual inputs, this study proposes an innovative framework for converting structured engineering data into textual representations. The six engineering parameters related to tunnel water and mud inrush hazards (e.g., RQD, WYP) and their corresponding hazard levels are jointly encoded into text through a “feature description + numerical concatenation” strategy, enabling a direct mapping between numerical values and textual sequences. Specifically, all samples are encoded using a unified sentence template, and their numerical values are preserved with full precision, without rounding or reformatting. The generated text sequences are then processed by RoBERTa’s native BPE tokenizer, which represents integers and decimals as intact tokens or stable subword combinations, ensuring consistent expression of continuous numerical values. Based on this text-constructed dataset, a RoBERTa-based intelligent classification model integrating engineering features with a pre-trained language model is developed through fine-tuning. This approach overcomes the intrinsic limitation of traditional language models in handling numerical data and provides a novel deep-learning solution for engineering hazard assessment. The overall RoBERTa-based intelligent classification framework is shown in Figure 3.
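As an illustration of the “feature description + numerical concatenation” strategy, the sketch below converts one structured sample into a sentence and tokenizes it with RoBERTa’s BPE tokenizer. The exact sentence template, field order, and example values are assumptions made for illustration; the paper only states that a unified template is used and that numerical values are preserved at full precision.

```python
# Illustrative sketch of encoding one structured sample as text for RoBERTa.
# The template wording and the example values below are assumptions.
from transformers import RobertaTokenizerFast

TEMPLATE = ("Water head height is {WH}, rock quality designation is {RQD}, "
            "water yield property is {WYP}, rock stratum occurrence is {RSO}, "
            "aquifer thickness is {AT}, unfavorable geology is {UG}.")

def sample_to_text(sample: dict) -> str:
    # Values are inserted verbatim, without rounding or reformatting.
    return TEMPLATE.format(**sample)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
example = {"WH": 42.5, "RQD": 61.0, "WYP": 3.2, "RSO": 35.0, "AT": 12.8, "UG": 4}
encoding = tokenizer(sample_to_text(example), truncation=True, max_length=64)
```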
(2)
Design of weighted cross entropy loss function
Hazard classification prediction during tunnel construction is a critical aspect of hazard prevention and control, as its outcomes directly influence structural design and construction safety decisions. In predicting the severity of tunnel water and mud inrush hazards, any misclassification of major hazards (Class III and IV) can result in significant safety risks and wasted resources. Therefore, ensuring zero misclassification of major hazards is crucial. Traditional hazard classification methods often struggle with misclassification risks. To address this technical challenge, this study introduces a zero-misclassification strategy for major hazard classification. This approach ensures that while maintaining high classification accuracy for minor hazards (Class I and II), there are no misclassifications for major hazards (Class III and IV). To achieve this goal, a weighted cross-entropy loss function is introduced to optimize the RoBERTa model. During training, a hierarchical weight allocation strategy is designed, assigning higher penalty weights to misclassification of major hazards. This forces the model to prioritize major hazard classification, shifting its optimization boundary toward zero misclassification. This approach not only enhances the model’s classification performance but also improves the reliability of the prediction system in real-world applications. As a result, it provides more precise and robust support for decision-making in the prevention and control of tunnel water and mud inrush hazards.
In the classification task for tunnel water and mud inrush hazards, the imbalanced distribution of hazard severity levels—where major hazards (Class III and IV) have fewer samples—poses a challenge, particularly due to the high cost of misclassifying severe events. To address this, this study employs a weighted cross-entropy loss function [61,62] to optimize the RoBERTa model, as shown in Equation (6). This approach utilizes a hierarchical weight allocation strategy to adjust penalty intensities differently for each hazard level. By doing so, it maintains classification accuracy for minor hazards while prioritizing the identification of major hazards.
$L_{\mathrm{WCE}} = -\dfrac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c \cdot y_{i,c} \log p_{i,c}$ (6)
where N is the number of training samples, C is the number of hazard classes, yi,c is the one-hot ground-truth indicator of sample i for class c, pi,c is the predicted probability of sample i belonging to class c, and wc is the weight assigned to class c.
The weight vector w = [w1, w2, w3, w4] is positively correlated with the importance of each category. When wc > 1, misclassification of category c incurs a higher loss value, amplifying the corresponding parameter gradient by a factor of wc during backpropagation. This mechanism ensures that the model prioritizes correcting misclassifications of high-weight categories during training. Such asymmetric reinforcement significantly alters the topology of the loss surface, pushing the decision boundary toward better recognition of major hazards.
During the fine-tuning of the RoBERTa model, this study replaces the standard cross-entropy loss function with the weighted cross-entropy loss function. A hierarchical weight allocation strategy is designed based on practical engineering requirements to maintain classification accuracy for minor hazards while prioritizing the recognition of major hazards.
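The weighted cross-entropy of Equation (6) can be realized directly with PyTorch’s class-weight argument, as in the minimal sketch below; the batch shapes and the mapping of hazard levels I–IV onto class indices 0–3 are assumptions made for illustration.

```python
# Minimal sketch of the hierarchical class-weighted cross-entropy loss.
# Assumption: hazard levels I-IV are mapped to class indices 0-3, and the
# weight vector [1, 1, 3, 3] follows the setting reported in Section 3.2.1.
import torch
import torch.nn as nn

class_weights = torch.tensor([1.0, 1.0, 3.0, 3.0])
weighted_ce = nn.CrossEntropyLoss(weight=class_weights)

# logits: raw scores from the RoBERTa classification head (batch_size, 4);
# labels: integer hazard-level indices (batch_size,).
logits = torch.randn(8, 4, requires_grad=True)   # placeholder batch
labels = torch.randint(0, 4, (8,))
loss = weighted_ce(logits, labels)
loss.backward()   # misclassified high-weight classes contribute larger gradients
```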

2.3. Evaluation Indicators

Common evaluation metrics for classification tasks include Accuracy (ACC), Precision (PRE), Recall (REC), and F1-score (F1) [21,63]. As a simple binary classification example, Figure 4 illustrates the definitions of key evaluation components, including True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The relationships between evaluation metrics and these classification components are defined by Equations (7)–(10).
$\mathrm{ACC} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (7)
$\mathrm{PRE} = \dfrac{TP}{TP + FP}$ (8)
$\mathrm{REC} = \dfrac{TP}{TP + FN}$ (9)
$F1 = \dfrac{2 \times \mathrm{PRE} \times \mathrm{REC}}{\mathrm{PRE} + \mathrm{REC}}$ (10)
The F1-score represents a balance between REC and PRE, treating both as equally important. Additionally, the weighted F1-score (F1W) and macro-averaged F1-score (F1M) [24] are used to evaluate the predictive performance of models in imbalanced classification tasks, as defined in Equations (11) and (12). A cost-based metric, HRMC, is further defined in Equation (13) to quantify the misclassification of major hazards (Class III and IV) as minor ones.
$F1_W = \sum_{j=1}^{m} w_j \cdot F1_j$ (11)
$F1_M = \dfrac{1}{m}\sum_{j=1}^{m} F1_j$ (12)
$\mathrm{HRMC} = \dfrac{\sum_{j=3}^{4}\sum_{i=1}^{2} c_{i,j} \cdot \mathrm{ConfusionMatrix}_{j,i}}{\sum_{j=3}^{4} N_j}$ (13)
where wj represents the weight of category j; m denotes the total number of categories; Nj is the number of samples in category j; and ci,j represents the cost of misclassifying category j as category i, set to 1 when a Class III hazard is misclassified as Class I or II and to 3 when a Class IV hazard is misclassified as Class I or II. ConfusionMatrixj,i denotes the number of samples from category j that were misclassified as category i.
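For reference, the sketch below computes ACC, F1W, F1M, and the HRMC of Equation (13) from a set of predictions. The use of scikit-learn helpers and the mapping of hazard levels I–IV onto class indices 0–3 are assumptions; the paper does not document its metric implementation.

```python
# Minimal sketch of the evaluation metrics from Equations (7)-(13).
# Assumption: class indices 0-3 correspond to hazard levels I-IV; the cost
# values (1 for Level III -> I/II, 3 for Level IV -> I/II) follow the text.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

def evaluate(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    f1_weighted = f1_score(y_true, y_pred, average="weighted")
    f1_macro = f1_score(y_true, y_pred, average="macro")

    cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
    cost = {2: 1.0, 3: 3.0}  # misclassification cost for Levels III and IV
    penalty = sum(cost[j] * cm[j, i] for j in (2, 3) for i in (0, 1))
    n_major = cm[2].sum() + cm[3].sum()
    hrmc = penalty / n_major if n_major else 0.0
    return acc, f1_weighted, f1_macro, hrmc

# Toy usage: one Class IV sample misjudged as Class II incurs a cost of 3.
y_true = [0, 1, 2, 3, 3, 2, 1, 0]
y_pred = [0, 1, 2, 3, 1, 2, 1, 0]
print(evaluate(y_true, y_pred))
```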

3. Engineering Application and Result Analysis

In this section, a real-world tunnel construction case study is used to validate the effectiveness of the proposed tunnel water and mud inrush hazard classification method, which integrates large-scale models and multi-strategy data enhancement. Comparative analysis is conducted to demonstrate the superior predictive performance of this approach. All models are implemented in a Python 3.12-based development environment using PyCharm 2023.3.7. The experiments are conducted on a Windows system with hardware specifications of an Intel(R) Core(TM) i7-14700HX@5.50 GHz CPU and 16 GB RAM.

3.1. Engineering Background Introduction

3.1.1. Project Profile

The DD railway tunnel is located in Ganzi Tibetan Autonomous Prefecture, Sichuan Province, China, with a total length of approximately 3.4 km. The tunnel cross-section measures 7.5 m in both height and width and is constructed using the drill-and-blast method. The primary lithology consists of Yanshanian granite with well-developed joints and fractures. The tunnel locally intersects high-pressure water-rich fault zones and small-scale fault fracture zones, resulting in highly complex and variable hydrogeological conditions. The geological stratigraphy is illustrated in Figure 5.

3.1.2. Sample Set Construction and Analysis

This study aims to apply pre-trained large language models (LLMs) for the automated intelligent classification of tunnel water and mud inrush hazards. To ensure data quality and representativeness, a strict cleaning process was applied to the DD tunnel dataset. Samples outside engineering-reasonable ranges were first removed. Outliers in WY, RQD, WYP, and AT were then identified using the Interquartile Range (IQR) method. A multivariate consistency check based on Mahalanobis distance was finally performed to eliminate unrealistic hydrological–geological combinations. As a result, a total of 1000 valid samples were obtained for subsequent analysis and model development. Each sample consists of six input parameters and one output parameter (WMIHC). The input factors are categorized into two groups: hydrological factors (WH, WYP, and AT) and geological factors (RQD, RSO, and UG). The characteristics of the collected dataset are summarized in Table 1, which presents the classification of tunnel water and mud inrush hazards (WMIHC) ranging from Class I to Class IV. It is important to note that tunnel water and mud inrush hazards result from the combined influence of hydrological and geological factors. Additionally, to facilitate the development of the intelligent classification model, both input and output parameters were normalized using the min-max scaling method [64].
Figure 6 presents violin plots of the influencing parameters and WMIHC, providing insights into the distribution and magnitude of each parameter. To gain a comprehensive understanding of the dataset, the Pearson correlation coefficient (PCC) is used to evaluate the relationship between each parameter and WMIHC [63]. PCC measures the linear relationship between two variables, with values ranging from −1 to 1: a value of 0 indicates no linear correlation, while values approaching ±1 indicate a strong linear relationship and thus a substantial mutual influence. As shown in Figure 7, the PCC values between all parameters and WMIHC are greater than 0.5, indicating that each parameter has a substantial impact on WMIHC.

3.2. Analysis of the Prediction Results of the Proposed Model

3.2.1. Model Training and Parameter Optimization

This study applies a large-model-enhanced and multi-strategy data augmentation method for classifying tunnel water and mud inrush hazards. As described in Section 2.2, the original dataset contains 1000 samples, each with 6 feature variables and 1 target variable. The dataset is first split into a training set and a test set using an 80/20 ratio. Only the training set is augmented using three methods—SMOTE, ADASYN, and Borderline-SMOTE—to increase sample size and reduce class imbalance, while the test set remains unchanged. The augmented training set and the original test set are kept strictly separated throughout all experiments to ensure objective and reliable model evaluation. After augmentation, the training set contains 5410 samples, resulting in a total of 6410 samples in the final dataset.
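The leakage-free protocol described above can be sketched as follows; the stratified split and the placeholder data are assumptions used only to illustrate that augmentation is applied after the split and to the training portion alone.

```python
# Sketch of the evaluation protocol: split first, then augment only the
# training portion so that no synthetic sample influenced by test data can
# leak into training. X and y are placeholders for the real dataset.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((1000, 6))            # placeholder: 1000 samples, 6 features
y = rng.integers(0, 4, size=1000)    # placeholder hazard levels 0-3

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Augment only the training portion, e.g., with the multi-strategy helper
# sketched in Section 2.1.2; the test set stays untouched throughout.
# X_train_aug, y_train_aug = multi_strategy_augment(X_train, y_train)
```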
The proposed research model consists of two main components: multi-strategy data augmentation and RoBERTa optimization. For data augmentation, the threshold T in Borderline-SMOTE is set to 0.5. For RoBERTa optimization, the RoBERTa-Base model is used, featuring 12 Transformer encoder layers, each with 768 hidden units and 12 self-attention heads, totaling approximately 110 million parameters. The model employs a weighted cross-entropy loss function, assigning weight values of w = [1, 1, 3, 3] for hazard levels I, II, III, and IV, respectively. This ensures greater emphasis on major hazards during training. As shown in Figure 8, the optimal model performance is achieved with 64 epochs, a batch size of 16, and a learning rate of 1 × 10−5. The specific hyperparameters of the model in this article are shown in Table 2.
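A minimal sketch of how the reported configuration (RoBERTa-Base, 64 epochs, batch size 16, learning rate 1 × 10−5, class weights [1, 1, 3, 3]) could be assembled with the Hugging Face Transformers Trainer is shown below. Overriding compute_loss to inject the weighted loss, and the dataset objects themselves, are assumptions rather than the authors’ exact implementation.

```python
# Sketch of the fine-tuning setup with the hyperparameters reported in
# Table 2. Dataset construction is omitted; the compute_loss override is one
# plausible way to plug in the weighted cross-entropy, not the authors' code.
import torch
from transformers import (RobertaForSequenceClassification, Trainer,
                          TrainingArguments)

model = RobertaForSequenceClassification.from_pretrained("roberta-base",
                                                         num_labels=4)

class WeightedLossTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        weights = torch.tensor([1.0, 1.0, 3.0, 3.0],
                               device=outputs.logits.device)
        loss = torch.nn.functional.cross_entropy(outputs.logits, labels,
                                                 weight=weights)
        return (loss, outputs) if return_outputs else loss

args = TrainingArguments(output_dir="roberta-wmihc", num_train_epochs=64,
                         per_device_train_batch_size=16, learning_rate=1e-5)

# trainer = WeightedLossTrainer(model=model, args=args,
#                               train_dataset=train_ds, eval_dataset=test_ds)
# trainer.train()
```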

3.2.2. Comparative Model Analysis

To demonstrate the superior predictive performance of the proposed model, nine machine learning models—Support Vector Machine (SVM), RF, Decision Tree (DT), Logistic Regression (LR), Light Gradient Boosting Machine (LGB), Gradient Boosting Tree (GBT), Extreme Gradient Boosting (XGB), K-Nearest Neighbors (KNN), and Automatic Feature Interaction (AutoInt)—were selected for comparative analysis. The prediction results are summarized in Table 3. Based on the evaluation metrics in Table 3, the proposed model demonstrates significant superiority across multiple key performance indicators, particularly in terms of accuracy and F1W. The proposed model achieves an ACC of 99.26%, significantly outperforming traditional machine learning models such as SVM (77.57%), with an improvement of 27.96%. Additionally, the weighted F1-score of the proposed model reaches 99.25%, further demonstrating its strong capability in accounting for class-wise importance. These findings highlight the clear advantages of large-scale models in enhancing the accuracy of tunnel water and mud inrush hazard classification. In particular, they prove highly effective in handling complex hazard data, delivering more precise predictive outcomes.
Additionally, the proposed model also excels in REC and HRMC, particularly achieving a 100% REC rate for Class III and IV major hazards. This demonstrates its exceptional sensitivity to severe hazards. By incorporating a weighted cross-entropy loss function and assigning different penalty weights (1, 1, 3, 3) for each hazard level, the model effectively mitigates the occurrence of catastrophic hazards and achieves zero misclassification of major hazards. The final HRMC of the proposed model is 0.00%, indicating exceptional accuracy and reliability in hazard classification. This significantly enhances the credibility of the tunnel water and mud inrush hazard prediction system in real-world applications.
As shown in Figure 9, the proposed model exhibits significant superiority in multiple key metrics, particularly in the accuracy of hazard level predictions. The model performs well across all hazard levels, especially for Class III and IV hazards, where all samples are correctly classified. This indicates that the model achieves zero misclassification for major hazards, meeting the zero-misjudgment requirement. Such performance is not observed in traditional models like SVM, RF, and KNN, where the recall for Class III and IV hazards is significantly lower. This highlights the reliability of the proposed model in ensuring accurate predictions.
Additionally, by optimizing the model with a weighted cross-entropy loss function, the proposed approach assigns higher penalty weights to major hazards. This encourages the model to focus more on these severe events during the optimization process. As a result, the model maintains its performance on minor hazards, with slight improvements in some cases, while achieving zero misclassification for major hazard levels. This demonstrates its high reliability in practical applications. A comprehensive analysis indicates that the specialized large-scale model developed in this study effectively overcomes the limitations of traditional methods in hazard classification. In particular, it shows significant advantages in enhancing the prediction accuracy of major hazards and reducing misclassification rates.

3.2.3. Effectiveness Analysis of Model Improvement

Based on practical engineering requirements, this study designs a hierarchical weight allocation strategy: baseline weights of w1 = w2 = 1 are assigned to Class I and II hazards to maintain fundamental performance, while enhanced weights of w3 = w4 = 3 are applied to Class III and IV hazards to improve model sensitivity. As shown in Table 4, the RoBERTa model demonstrates significant superiority after being optimized with a weighted cross-entropy loss function. Compared to the original RoBERTa model (with an ACC of 99.08% and F1M of 98.69%), the optimized RoBERTa model, incorporating the weighted cross-entropy function, achieves improvements in both ACC (99.26%) and F1M (98.87%). This enhancement highlights the effectiveness of the weighted cross-entropy approach in refining the overall model performance, particularly in improving hazard classification accuracy.
Additionally, the precision (PRE) and recall (REC) metrics in Table 4 further validate the optimized model’s effectiveness in accurately classifying major hazards. Specifically, the optimized RoBERTa model achieves a 100% REC rate for Class III and IV hazards, indicating that it completely eliminates misclassification of major hazards. In contrast, the unoptimized model attains recall rates of 99.13% and 99.81% for Class III and IV hazards, respectively. Although relatively high, these values indicate minor misclassification errors. After applying weighted cross-entropy optimization, the model’s ability to identify these two hazard levels significantly improves, effectively preventing the underestimation of risks. This enhancement strengthens the safety and reliability of the hazard early warning system.
The confusion matrices in Figure 10a,b further validate the prediction accuracy of the optimized model across different hazard levels. The optimized RoBERTa model outperforms the original RoBERTa in accuracy, particularly for Class III and IV hazards, where it achieves zero misclassification. This demonstrates that optimizing with a weighted cross-entropy loss function not only improves classification accuracy but also enhances the model’s ability to predict major hazards. As a result, HRMC is significantly reduced, ensuring the practical applicability of the model.
In conclusion, the weighted cross-entropy optimization method proposed in this study plays a crucial role in improving the classification accuracy of tunnel water and mud inrush hazard prediction while reducing misclassification rates. It effectively balances classification precision with engineering safety requirements, significantly enhancing the model’s practicality and reliability in real-world hazard prevention and control.

3.3. Analysis of Data Enhancement Effect

3.3.1. Analysis of Enhanced Data Distribution Characteristics

Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) [65,66,67] were used to analyze the distribution characteristics of the augmented data, assessing its similarity to the original data. Figure 11 indicates a high degree of similarity between the distributions of the original and augmented data. The original data samples are primarily concentrated in the negative regions of PC1 and PC2, whereas the augmented data samples extend across a broader area, particularly in the positive direction of PC1 and PC2, demonstrating an expansion in data diversity. In the low-dimensional space, the augmented data distribution retains the structure of the original dataset while introducing reasonable feature-space expansion, contributing to improved model generalization. The loadings of key variables (e.g., WH and RQD) in the principal components remain consistent, indicating that data augmentation has not altered the relative importance of the variables. Overall, the augmented data preserves the core characteristics of the original dataset while enhancing sample diversity. This ensures the effectiveness of the training data, providing the specialized large-scale model with richer input and improving its predictive performance.
The t-SNE visualization in Figure 12 compares the distribution characteristics of the original and augmented data. The results indicate that while the original data exhibits a more concentrated distribution, the augmented data spans a broader range. This demonstrates that the data augmentation method effectively expands the data space, increasing diversity and variability, particularly in generating synthetic samples. This augmentation strategy effectively balances the distribution of different classes, particularly improving the model’s learning capability for minority or complex categories, such as major hazards. Additionally, the augmented data exhibits more dispersed points and clusters, indicating that the synthetic samples retain the characteristics of the original data while effectively capturing variations within the dataset. By applying these augmentation strategies, the model maintains its original accuracy while successfully reducing the misclassification rate of major hazards, thereby improving hazard severity classification accuracy. Thus, the proposed multi-strategy data augmentation method not only refines data distribution but also enhances the predictive capability of the hazard classification model, demonstrating significant advantages and practical value.

3.3.2. Effectiveness Analysis of Data Augmentation Strategy

To evaluate the effectiveness of the multi-strategy data augmentation method, a comparative analysis was conducted to assess model performance under different data augmentation strategies. As shown in Figure 13, the total number of samples increased to 2472 and 6416 when using only SMOTE and the multi-strategy data augmentation method, respectively, representing approximately 2.5-fold and 6.4-fold expansions compared to the original dataset. Figure 14 illustrates the performance of three model settings—no augmentation, SMOTE, and multi-strategy augmentation—across three evaluation metrics: ACC, F1 for Class III (F1-III), and F1-score for Class IV (F1-IV). The results indicate that the multi-strategy data augmentation method achieves the best performance across all metrics, particularly attaining a high level of 0.99 in ACC and F1-III. This demonstrates its strong capability in data processing and model optimization. In comparison, the SMOTE method also shows some improvement, achieving 0.88 in F1-III. However, its overall performance remains inferior to the multi-strategy data augmentation approach. The model without augmentation performs the worst across all metrics, particularly in F1-IV, where it scores only 0.11. This highlights its limitations in handling imbalanced data. Therefore, the multi-strategy data augmentation method is proven to be a more effective and robust data processing approach.
Figure 15 presents the confusion matrices of the proposed model under different data augmentation strategies. The analysis clearly highlights the superiority of the multi-strategy data augmentation approach. Without data augmentation, the model exhibits relatively poor classification performance, particularly in predicting Class I and IV hazards, where the misclassification rate is notably high. The accuracy for Class IV hazard is especially low. When applying the SMOTE data augmentation strategy, the model’s predictive accuracy for Class IV hazard improves significantly. By integrating SMOTE, ADASYN, and Borderline-SMOTE, the multi-strategy data augmentation approach significantly enhances classification accuracy across all hazard levels. Notably, the prediction accuracy for Class IV hazards is nearly perfect, while also preventing any decline in the accuracy of lower-level hazard classifications. The multi-strategy augmentation effectively balances class distribution within the dataset, optimizes the model’s adaptability to different hazard levels, enhances its ability to classify severe hazards, and maintains high predictive accuracy for other categories. Therefore, the multi-strategy data augmentation approach not only mitigates class imbalance but also improves overall model classification performance, demonstrating significant advantages in handling complex hazard classification tasks.

3.3.3. Analysis of Model Performance Improvement After Data Enhancement

To further demonstrate the reliability and superiority of the proposed multi-strategy data augmentation method, the improvement in predictive performance after applying multi-strategy data augmentation was analyzed for various models, including SVM, RF, DT, LR, LGB, GBT, XGB, KNN, AutoInt, and the proposed model, as shown in Table 5. The experimental results indicate that the proposed method exhibits significant advantages in handling imbalanced data. The proposed model achieves a 22.26% increase in ACC and a 22.33% improvement in F1W. Additionally, its REC for major hazards (Class III and IV) improves by 38.64% and 81%, respectively, representing an improvement of more than 20% over traditional models such as SVM and LR. The multi-strategy augmentation approach systematically addresses the long-tail distribution issue in hazard classification by leveraging SMOTE for global sample balancing, ADASYN for adaptively generating hard-to-classify samples, and Borderline-SMOTE for strengthening classification boundaries. This approach significantly enhances the recognition capability for minority classes, particularly for Class IV hazards.
Compared to single augmentation strategies (e.g., AutoInt achieving a 90% recall for Class III but suffering a decline in overall performance), the proposed method balances both global and local data characteristics, mitigating the risk of overfitting and highlighting the comprehensive advantages of an integrated strategy. Additionally, by incorporating a weighted cross-entropy loss function (with a penalty weight of 3 for Class III and IV hazards), the model maintains classification accuracy for minor hazards (Class I and II) while achieving zero misclassification for major hazards. This confirms the synergistic effect of data augmentation and loss function optimization. Comparative experiments further demonstrate that tree-based models (RF and XGB) and ensemble methods (LGB) benefit significantly from the multi-strategy augmentation, with improvements exceeding 10% in both ACC and F1W. In contrast, LR, which is highly sensitive to sample distribution, experiences a performance decline, underscoring the proposed method’s adaptability to complex models. In conclusion, the proposed framework employs a three-pronged optimization approach—multi-dimensional data generation, large-scale model fine-tuning, and loss function design—effectively enhancing the robustness and reliability of hazard classification. This provides a novel perspective for intelligent hazard diagnosis in small-sample engineering scenarios.

3.4. Model Robustness Analysis

To assess the robustness of the model, its predictive performance was analyzed under varying levels of Gaussian noise added to the test set. As shown in Figure 16, when the noise variance increases from 0.0 to 0.3, the global evaluation metrics ACC and F1-score decrease from 99.2% and 98.5% to 96.1% and 96.3%, respectively. The reduction is limited to 3.1% and 2.2%, demonstrating the model’s resilience to noise interference. Notably, hazard severity-specific metrics exhibit significant differences: Class III recall (REC-III) decreases from 97.8% to 95.5%, while Class IV recall (REC-IV) remains consistently above 99.0%. This characteristic arises from the weighted cross-entropy loss function’s reinforced weighting strategy across all four hazard levels, along with the pre-trained RoBERTa parameters’ implicit noise-filtering capability. As a result, the model ensures zero misclassification of severe hazards while maintaining overall classification stability.
The model’s noise resistance can be attributed to a multi-dimensional technical synergy: SMOTE-based algorithms improve the continuity of geological parameter distributions (e.g., RQD and WYP) by synthesizing minority class samples, while RoBERTa’s attention mechanism dynamically focuses on key features (e.g., WH and RSO) to suppress noise propagation. Experimental results show that when the noise variance reaches 0.3, REC-IV remains at 99.0%, validating the model’s reliability in identifying major hazards, which are critical for engineering safety. However, the slight decline in REC-III (Δ = 2.3%) reveals the sensitivity of mid-level hazard boundary features to noise disturbances, which may be related to representational bias in the nonlinear coupling of geological parameters. The experiments demonstrate that the proposed intelligent classification model for tunnel water and mud inrush hazards exhibits strong robustness under Gaussian noise interference by integrating large-scale models with multi-strategy data augmentation.
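The noise-perturbation test can be reproduced along the lines of the sketch below, which adds zero-mean Gaussian noise of increasing variance to the normalized test features before they are re-encoded and re-scored. Applying the noise in feature space prior to text encoding, and clipping back to the normalized range, are assumptions about the experimental protocol; X_test and FEATURE_NAMES are placeholders.

```python
# Sketch of the Gaussian-noise robustness test. X_test is a placeholder for
# the min-max-normalized test features; in the real experiment the perturbed
# features would be re-encoded as text and passed through the fine-tuned model.
import numpy as np

rng = np.random.default_rng(0)
X_test = rng.random((200, 6))   # placeholder: 200 samples, 6 features in [0, 1]

def perturb(X, variance):
    """Add zero-mean Gaussian noise with the given variance to each feature."""
    noise = rng.normal(loc=0.0, scale=np.sqrt(variance), size=X.shape)
    return np.clip(X + noise, 0.0, 1.0)   # stay within the normalized range

for variance in (0.0, 0.1, 0.2, 0.3):
    X_noisy = perturb(X_test, variance)
    # texts = [sample_to_text(dict(zip(FEATURE_NAMES, row))) for row in X_noisy]
    # ... tokenize, run the fine-tuned classifier, and recompute ACC, F1,
    #     REC-III, and REC-IV for this noise level
```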
According to the results in Table 6, different class-weight settings have a clear impact on model performance. When all class weights are set to 1, the model accuracy is 98.75%, and the recall of Levels III and IV is relatively low, indicating that the model tends to focus on the more frequent minor-disaster samples. When the weights begin to shift toward the severe classes (e.g., [1, 1, 2, 2]), the ACC increases to 99.05%, and the recall values for Levels III and IV rise to 99.5% and 99.1%, showing that the model becomes more sensitive to high-risk categories. With the proposed weight vector [1, 1, 3, 3], the model achieves the best overall performance, reaching 99.26% ACC and 99.25% F1W, while both REC-III and REC-IV reach 100%, meaning there are no missed severe-disaster cases. This indicates that the proposed weight setting provides the best balance between accuracy and safety. When the weight is further increased to [1, 1, 5, 5], the recall remains high, but ACC and F1W slightly decrease, suggesting that overly large weights cause the model to overemphasize severe disasters and reduce overall performance. Therefore, the weight setting [1, 1, 3, 3] is the most effective choice for balancing prediction accuracy, robustness, and engineering safety.

4. Conclusions and Outlook

This study addresses the challenges of small sample constraints and safety control requirements in intelligent classification of tunnel water and mud inrush hazards. An innovative approach integrating multi-strategy data augmentation and large-scale model optimization is proposed. The effectiveness of this method is comprehensively evaluated using data from the DD tunnel project in the mountainous regions of western China. The main conclusions are as follows:
(1)
Expansion of sample space through multi-strategy data augmentation: By integrating SMOTE, ADASYN, and Borderline-SMOTE, the proposed multi-strategy data augmentation method expands the dataset by 6.4 times and improves model ACC by 22.26%. This approach achieves density balancing and local feature enhancement, effectively addressing the issue of low prediction accuracy in tunnel water and mud inrush hazard classification caused by insufficient samples. It provides a generalizable solution for small-sample challenges in engineering applications.
(2)
Specialized large-scale models for water and mud inrush prediction: This study fine-tunes the pre-trained language model RoBERTa using a tunnel water and mud inrush hazard classification dataset, transforming its “language understanding ability” into “numerical prediction capability” to enable intelligent hazard classification. Engineering application results show that, compared to nine other machine learning models, the proposed model achieves an ACC of 99.26% and an F1W of 99.25%, demonstrating superior prediction accuracy and strong robustness.
(3)
Zero-misclassification strategy for major hazard classification: The RoBERTa model is optimized using a weighted cross-entropy loss function, with a hierarchical weight allocation strategy designed to enhance the model’s learning capability for major hazards. Experimental results indicate that this method maintains classification accuracy for minor hazards while achieving zero misclassification for major hazards. Specifically, PRE-I and PRE-II reach 99.33% and 99.34%, respectively, REC-III and REC-IV both reach 100%, and HRMC remains at 0.00%. This approach effectively balances classification accuracy with engineering safety requirements, significantly enhancing the model’s practicality and reliability in real-world hazard prevention and control.
Although the proposed multi-strategy data augmentation and large-model fusion method performs well for classifying tunnel water and mud inrush hazards, several limitations remain. First, the model is trained on data from a single tunnel in Sichuan, and variations in lithology, groundwater conditions, and unfavorable geology across different tunnels may cause the model to rely on site-specific patterns, limiting its cross-regional generalization. Future work should therefore include cross-tunnel validation and introduce transfer learning and domain-adaptation techniques (such as fine-tuning, MMD-based alignment, and semi-supervised transfer) to improve adaptability across diverse geological environments. Second, RoBERTa still exhibits a degree of “black-box” behavior, and attention weights alone cannot reveal the complex hydro-geological mechanisms behind hazard evolution; more advanced interpretability methods such as integrated gradients and Shapley-value attributions will be explored. In addition, small-sample conditions may still lead to hidden overfitting risks, requiring more robust data-generation strategies and uncertainty-quantification techniques. Finally, the relatively slow inference speed of RoBERTa may limit real-time early-warning applications, and lightweight strategies such as model compression, knowledge distillation, and edge deployment will be necessary to enhance practical usability. Overall, developing a hazard-classification model that is transferable, interpretable, and deployable will be an important direction for future research.

Author Contributions

Conceptualization, X.Y. and M.H.; methodology, X.Y.; software, F.S.; validation, X.Y.; formal analysis, X.Y.; investigation, X.Y.; resources, M.H.; data curation, L.Y.; writing—original draft preparation, X.Y.; writing—review and editing, X.Y., M.H., F.S. and L.Y.; visualization, L.Y.; supervision, M.H.; project administration, M.H.; funding acquisition, M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Research and Development Plan of China State Railway Group Co., Ltd. (Grant No. P2019G055) and the Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region (Grant No. 2020A03003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that this study received funding from Science and Technology Research and Development Plan of China State Railway Group Co., Ltd. (Grant No. P2019G055). The funder had the following involvement with the study: Mingli Huang.

References

  1. Wang, S.; Li, L.; Cheng, S.; Yang, J.; Jin, H.; Gao, S.; Wen, T. Study on an improved real-time monitoring and fusion prewarning method for water inrush in tunnels. Tunn. Undergr. Space Technol. 2021, 112, 103884. [Google Scholar] [CrossRef]
  2. Wu, X.; Feng, Z.; Yang, S.; Qin, Y.; Chen, H.; Liu, Y. Safety risk perception and control of water inrush during tunnel excavation in karst areas: An improved uncertain information fusion method. Autom. Constr. 2024, 163, 105421. [Google Scholar] [CrossRef]
  3. Dong, J.; Shen, Z.; Cao, L.; Mi, J.; Li, J.; Zhao, Y.; Mu, H.; Liu, L.; Dai, C. Water-sand inrush risk assessment method of sandy dolomite tunnel and its application in the Chenaju tunnel, southwest of China. Geomat. Nat. Hazards Risk 2023, 14, 2196369. [Google Scholar] [CrossRef]
  4. Wang, Y.; Chen, F.; Yin, X.; Geng, F. Study on the risk assessment of water inrush in karst tunnels based on intuitionistic fuzzy theory. Geomat. Nat. Hazards Risk 2019, 10, 1070–1083. [Google Scholar] [CrossRef]
  5. Xie, Q.; Cao, Z.; Sun, W.; Fumagalli, A.; Fu, X.; Wu, Z.; Wu, K. Numerical simulation of the fluid-solid coupling mechanism of water and mud inrush in a water-rich fault tunnel. Tunn. Undergr. Space Technol. 2023, 131, 104796. [Google Scholar] [CrossRef]
  6. Liu, N.; Pei, J.; Cao, C.; Liu, X.; Huang, Y.; Mei, G. Geological investigation and treatment measures against water inrush hazard in karst tunnels: A case study in Guiyang, southwest China. Tunn. Undergr. Space Technol. 2022, 124, 104491. [Google Scholar] [CrossRef]
  7. Lan, Q.; Zhang, Z.; Xu, P. Research on disaster-causing characteristics of water and mud inrush and combined prevention-control measures in water-rich sandstone and slate interbedded strata tunnel. Tunn. Undergr. Space Technol. 2025, 156, 106250. [Google Scholar] [CrossRef]
  8. Chu, H.; Xu, G.; Yasufuku, N.; Yu, Z.; Liu, P.; Wang, J. Risk assessment of water inrush in karst tunnels based on two-class fuzzy comprehensive evaluation method. Arab. J. Geosci. 2017, 10, 179. [Google Scholar] [CrossRef]
  9. Wang, Y.; Jing, H.; Yu, L.; Su, H.; Luo, N. Set pair analysis for risk assessment of water inrush in karst tunnels. Bull. Eng. Geol. Environ. 2017, 76, 1199–1207. [Google Scholar] [CrossRef]
  10. Yang, X.; Zhang, S. Risk assessment model of tunnel water inrush based on improved attribute mathematical theory. J. Cent. South Univ. 2018, 25, 379–391. [Google Scholar] [CrossRef]
  11. Zhang, K.; Zheng, W.; Xu, C.; Chen, S. An improved extension system for assessing risk of water inrush in tunnels in carbonate karst terrain. KSCE J. Civ. Eng. 2019, 23, 2049–2064. [Google Scholar] [CrossRef]
  12. Li, S.; Wu, J. A multi-factor comprehensive risk assessment method of karst tunnels and its engineering application. Bull. Eng. Geol. Environ. 2019, 78, 1761–1776. [Google Scholar] [CrossRef]
  13. Xue, Y.; Li, Z.; Li, S.; Qiu, D.; Su, M.; Xu, Z.; Zhou, B.; Tao, Y. Water inrush risk assessment for an undersea tunnel crossing a fault: An analytical model. Mar. Georesour. Geotechnol. 2019, 37, 816–827. [Google Scholar] [CrossRef]
  14. Wu, B.; Chen, H.; Huang, W.; Meng, G. Dynamic evaluation method of the EW–AHP attribute identification model for the tunnel gushing water disaster under interval conditions and applications. Math. Probl. Eng. 2021, 2021, 6661609. [Google Scholar] [CrossRef]
  15. Kong, H.; Zhang, N. Risk assessment of water inrush accident during tunnel construction based on FAHP–I–TOPSIS. J. Clean. Prod. 2024, 449, 141744. [Google Scholar] [CrossRef]
  16. Zhao, R.; Zhang, L.; Hu, A.; Kai, S.; Fan, C. Risk assessment of karst water inrush in tunnel engineering based on improved game theory and uncertainty measure theory. Sci. Rep. 2024, 14, 20284. [Google Scholar] [CrossRef]
  17. Wang, S.; Ding, H.; Huang, F.; Wei, Q.; Li, T.; Wen, T. Ideal point interval recognition model for dynamic risk assessment of water inrush in karst tunnel and its application. Pol. J. Environ. Stud. 2024, 33, 1875–1886. [Google Scholar] [CrossRef]
  18. Cheng, S.; Yin, X.; Gao, F.; Pan, Y. Surrounding rock squeezing classification in underground engineering using a hybrid paradigm of generative artificial intelligence and deep ensemble learning. Mathematics 2024, 12, 3832. [Google Scholar] [CrossRef]
  19. Wang, M. Intelligent classification model of surrounding rock of tunnel using drilling and blasting method. Undergr. Space 2021, 6, 539–550. [Google Scholar] [CrossRef]
  20. Bao, G.; Hou, K.; Sun, H. Rock burst intensity-grade prediction based on comprehensive weighting method and Bayesian optimization algorithm–improved-support vector machine model. Sustainability 2023, 15, 15880. [Google Scholar] [CrossRef]
  21. Zhang, C.; Wang, Y.; Wu, L.; Dong, Z.; Li, X. Physics-informed and data-driven machine learning of rock mass classification using prior geological knowledge and TBM operational data. Tunn. Undergr. Space Technol. 2024, 152, 105923. [Google Scholar] [CrossRef]
  22. Zeng, Y.; Wei, Y.; Yang, Y. A novel identification technology and real-time classification forecasting model based on hybrid machine learning methods in mixed weathered mudstone–sand–pebble formation. Tunn. Undergr. Space Technol. 2024, 153, 106545. [Google Scholar] [CrossRef]
  23. Li, M.; Li, K.; Qin, Q. A rockburst prediction model based on extreme learning machine with improved Harris Hawks optimization and its application. Tunn. Undergr. Space Technol. 2023, 134, 104978. [Google Scholar] [CrossRef]
  24. Ma, J.; Ma, C.; Li, T.; Yan, W.; Shirani Faradonbeh, R. Real-time classification model for tunnel surrounding rocks based on high-resolution neural network and structure–optimizer hyperparameter optimization. Comput. Geotech. 2024, 168, 106155. [Google Scholar] [CrossRef]
  25. Yang, Z.; He, B.; Liu, Y.; Wang, D.; Zhu, G. Classification of rock fragments produced by tunnel boring machine using convolutional neural networks. Autom. Constr. 2021, 125, 103612. [Google Scholar] [CrossRef]
  26. Zhuang, X.; Fan, W.; Guo, H.; Chen, X.; Wang, Q. Surrounding rock classification from onsite images with deep transfer learning based on EfficientNet. Front. Struct. Civ. Eng. 2024, 18, 1311–1320. [Google Scholar] [CrossRef]
  27. Song, Q.; Xue, Y.; Li, G.; Su, M.; Qiu, D.; Kong, F.; Zhou, B. Using Bayesian network and intuitionistic fuzzy analytic hierarchy process to assess the risk of water inrush from fault in subsea tunnel. Geomech. Eng. 2021, 27, 605–614. [Google Scholar] [CrossRef]
  28. Feng, X.; Lu, Y.; He, J.; Lu, B.; Wang, K. Bayesian-network-based predictions of water inrush incidents in soft rock tunnels. KSCE J. Civ. Eng. 2024, 28, 5934–5945. [Google Scholar] [CrossRef]
  29. Zhang, N.; Niu, M.; Wan, F.; Lu, J.; Wang, Y.; Yan, X.; Zhou, C. Hazard prediction of water inrush in water-rich tunnels based on random forest algorithm. Appl. Sci. 2024, 14, 867. [Google Scholar] [CrossRef]
  30. Jibril, E.C.; Tantug, A.C. ANEC: An Amharic named entity corpus and transformer based recognizer. IEEE Access 2023, 11, 15799–15815. [Google Scholar] [CrossRef]
  31. Mustafa, A.; Naseem, U.; Rahimi Azghadi, M. Large language models vs human for classifying clinical documents. Int. J. Med. Inform. 2025, 195, 105800. [Google Scholar] [CrossRef] [PubMed]
  32. Liu, J.; Zhao, Z.; Lv, C.; Ding, Y.; Chang, H.; Xie, Q. An image enhancement algorithm to improve road tunnel crack transfer detection. Constr. Build. Mater. 2022, 348, 128583. [Google Scholar] [CrossRef]
  33. He, B. Applying data augmentation technique on blast-induced overbreak prediction: Resolving the problem of data shortage and data imbalance. Expert Syst. Appl. 2024, 237, 121616. [Google Scholar] [CrossRef]
  34. Qin, H.; Zhang, D.; Tang, Y.; Wang, Y. Automatic recognition of tunnel lining elements from GPR images using deep convolutional networks with data augmentation. Autom. Constr. 2021, 130, 103830. [Google Scholar] [CrossRef]
  35. Li, P. Generative adversarial network for optimization of operational parameters based on shield posture requirements. Autom. Constr. 2024, 165, 105553. [Google Scholar] [CrossRef]
  36. Zhou, Z.; Zhang, J.; Gong, C.; Wu, W. Automatic tunnel lining crack detection via deep learning with generative adversarial network-based data augmentation. Undergr. Space 2023, 9, 140–154. [Google Scholar] [CrossRef]
  37. Zhao, N.; Song, Y.; Yang, A.; Lv, K.; Jiang, H.; Dong, C. Accurate classification of tunnel lining cracks using lightweight ShuffleNetV2-1.0-SE model with DCGAN-based data augmentation and transfer learning. Appl. Sci. 2024, 14, 4142. [Google Scholar] [CrossRef]
  38. Yu, H.; Sun, H.; Tao, J.; Qin, C.; Xiao, D.; Jin, Y.; Liu, C. A multi-stage data augmentation and AD-ResNet-based method for EPB utilization factor prediction. Autom. Constr. 2023, 147, 104734. [Google Scholar]
  39. Chen, J.; Huang, H.; Cohn, A.G.; Zhang, D.; Zhou, M. Machine learning-based classification of rock discontinuity trace: SMOTE oversampling integrated with GBT ensemble learning. Int. J. Min. Sci. Technol. 2022, 32, 309–322. [Google Scholar] [CrossRef]
  40. Liu, Q.; Wang, X.; Huang, X.; Yin, X. Prediction model of rock mass class using classification and regression tree integrated AdaBoost algorithm based on TBM driving data. Tunn. Undergr. Space Technol. 2020, 106, 103595. [Google Scholar] [CrossRef]
  41. Xue, Y.; Zhang, W.; Wang, Y.; Luo, W.; Jia, F.; Li, S.; Pang, H. Serviceability evaluation of highway tunnels based on data mining and machine learning: A case study of continental United States. Tunn. Undergr. Space Technol. 2023, 142, 105418. [Google Scholar] [CrossRef]
  42. Katuwal, T.B.; Panthi, K.K.; Basnet, C.B. Machine learning approach for rock mass classification with imbalanced database of TBM tunnelling in Himalayan geology. Rock Mech. Rock Eng. 2024, 58, 11293–11318. [Google Scholar] [CrossRef]
  43. Yang, H.; Li, H.; Chen, C.; Liu, X. Rapid stability assessment of barrier dams based on the extreme gradient boosting model. Nat. Hazards 2025, 121, 3047–3072. [Google Scholar] [CrossRef]
  44. Wang, L.; Guo, S.; Wang, J.; Chen, Y.; Qiu, H.; Zhang, J.; Wei, X. A novel multi-scale standardized index analyzing monthly to sub-seasonal drought-flood abrupt alternation events in the Yangtze River basin. J. Hydrol. 2024, 633, 130999. [Google Scholar] [CrossRef]
  45. Li, Z.; Liu, T.; Guan, C.; Liu, L.; Han, M. Prediction for rock conditions in a tunnel area using advanced geological drilling predictions based on multiwavelet analysis and modified evidence reasoning. Int. J. Geomech. 2024, 24, 04024027. [Google Scholar] [CrossRef]
  46. Zhong, K.; Tan, X.; Liu, S.; Lu, Z.; Hou, X.; Wang, Q. Prediction of slope failure probability based on machine learning with genetic-ADASYN algorithm. Eng. Geol. 2025, 346, 107885. [Google Scholar] [CrossRef]
  47. Li, Y.; Chen, J.; Fang, Q.; Zhang, D.; Huang, W. Towards automated lithology classification in NATM tunnel: A data-driven solution for multi-dimensional imbalanced data. Rock Mech. Rock Eng. 2024, 58, 2349–2366. [Google Scholar] [CrossRef]
  48. Wang, H.; Meng, Y.; Xu, H.; Wang, H.; Guan, X.; Liu, Y.; Liu, M.; Wu, Z. Prediction of flood risk levels of urban flooded points though using machine learning with unbalanced data. J. Hydrol. 2024, 630, 130742. [Google Scholar] [CrossRef]
  49. Li, K.; Ren, B.; Guan, T.; Wang, J.; Yu, J.; Wang, K.; Huang, J. A hybrid cluster-borderline SMOTE method for imbalanced data of rock groutability classification. Bull. Eng. Geol. Environ. 2022, 81, 39. [Google Scholar] [CrossRef]
  50. Q/CR 9217-2015; Technical Specification for Geology Forecast of Railway Tunnel. China Railway Publishing House Co., Ltd.: Beijing, China, 2015.
  51. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2019. [Google Scholar] [CrossRef]
  52. Sun, J.; Huang, S.; Wei, C. A BERT-based deontic logic learner. Inf. Process. Manag. 2023, 60, 103374. [Google Scholar] [CrossRef]
  53. Zhu, X.; Zhu, Y.; Zhang, L.; Chen, Y. A BERT-based multi-semantic learning model with aspect-aware enhancement for aspect polarity classification. Appl. Intell. 2023, 53, 4609–4623. [Google Scholar] [CrossRef]
  54. Briskilal, J.; Subalalitha, C.N. An ensemble model for classifying idioms and literal texts using BERT and RoBERTa. Inf. Process. Manag. 2022, 59, 102756. [Google Scholar] [CrossRef]
  55. Yang, Y.; Kang, S. Common sense-based reasoning using external knowledge for question answering. IEEE Access 2020, 8, 227185–227192. [Google Scholar] [CrossRef]
  56. Pan, W.; Jiang, P.; Li, Y.; Wang, Z.; Huang, J. Research on automatic pilot repetition generation method based on deep reinforcement learning. Front. Neurorobot. 2023, 17, 1285831. [Google Scholar] [CrossRef] [PubMed]
  57. Martín, A.; Huertas-Tato, J.; Huertas-García, Á.; Villar-Rodríguez, G.; Camacho, D. FacTeR-Check: Semi-automated fact-checking through semantic similarity and natural language inference. Knowl.-Based Syst. 2022, 251, 109265. [Google Scholar] [CrossRef]
  58. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019. [Google Scholar] [CrossRef]
  59. Malik, M.S.I.; Nazarova, A.; Jamjoom, M.M.; Ignatov, D.I. Multilingual hope speech detection: A robust framework using transfer learning of fine-tuning RoBERTa model. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101736. [Google Scholar] [CrossRef]
  60. Areshey, A.; Mathkour, H. Exploring transformer models for sentiment classification: A comparison of BERT, RoBERTa, ALBERT, DistilBERT, and XLNet. Expert Syst. 2024, 41, e13701. [Google Scholar] [CrossRef]
  61. Ahmed, J.; Green, R.C. Cost aware LSTM model for predicting hard disk drive failures based on extremely imbalanced SMART sensors data. Eng. Appl. Artif. Intell. 2024, 127, 107339. [Google Scholar] [CrossRef]
  62. Ho, Y.; Wookey, S. The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling. IEEE Access 2020, 8, 4806–4813. [Google Scholar] [CrossRef]
  63. Chen, J.; Zhou, M.; Zhang, D.; Huang, H.; Zhang, F. Quantification of water inflow in rock tunnel faces via convolutional neural network approach. Autom. Constr. 2021, 123, 103526. [Google Scholar] [CrossRef]
  64. Tao, M.; Hong, Z.; Zhao, H.; Zhao, M.; Wang, D. Intelligent prediction method for underbreak extent in underground tunnelling. Int. J. Rock Mech. Min. Sci. 2024, 176, 105728. [Google Scholar] [CrossRef]
  65. Demetgul, M.; Zheng, Q.; Tansel, I.N.; Fleischer, J. Monitoring the misalignment of machine tools with autoencoders after they are trained with transfer learning data. Int. J. Adv. Manuf. Technol. 2023, 128, 3357–3373. [Google Scholar] [CrossRef]
  66. de Curtò, J.; de Zarzà, I.; Roig, G.; Calafate, C.T. Signature and log-signature for the study of empirical distributions generated with GANs. Electronics 2023, 12, 2192. [Google Scholar] [CrossRef]
  67. Asre, S.; Anwar, A. Synthetic energy data generation using time variant generative adversarial network. Electronics 2022, 11, 355. [Google Scholar] [CrossRef]
Figure 1. Overall framework.
Figure 2. Tunnel construction parameters.
Figure 3. Flow chart of the intelligent grading model based on RoBERTa.
Figure 4. Relationship diagram of the evaluation-index calculation process.
Figure 5. Tunnel longitudinal profile.
Figure 6. Data distribution of all parameters in the dataset. (a) WH; (b) RQD; (c) WYP; (d) RSO; (e) AT; (f) UG; (g) WMIHC.
Figure 7. Correlation heat maps of all variables.
Figure 8. Model parameter optimization results. (a) Epoch; (b) Batch size; (c) Learning rate.
Figure 9. The confusion matrix of different model prediction results. (a) SVM; (b) RF; (c) DT; (d) LR; (e) LGB; (f) GBT; (g) XGB; (h) KNN; (i) AutoInt; (j) Proposed.
Figure 10. The confusion matrix of model prediction results before and after optimization. (a) Original RoBERTa; (b) Optimized RoBERTa.
Figure 11. PCA visualization results of original data and enhanced data.
Figure 12. t-SNE visualization results of raw data and enhanced data.
Figure 13. Total number of samples under different data augmentation strategies.
Figure 14. Model prediction results under different data augmentation strategies.
Figure 15. Confusion matrix of model prediction results under different data augmentation strategies. (a) No augmentation; (b) SMOTE; (c) multi-strategy augmentation.
Figure 16. Prediction performance of the model under different levels of Gaussian noise.
Table 1. Statistical characteristics of input and output parameters.
Type | Indicator | Unit | Min–Max | Mean | Std. Dev.
Inputs | WH | m | 11–141 | 77.26 | 29.06
Inputs | RQD | % | 9–92 | 37.36 | 17.00
Inputs | WYP | m³·h⁻¹·m⁻¹ | 0.1–11.7 | 4.11 | 1.98
Inputs | RSO | ° | 10–88 | 37.73 | 11.44
Inputs | AT | m | 0–36 | 13.81 | 7.16
Inputs | UG | – | 3–5 | 4.37 | 0.57
Output | WMIHC | – | 1–4 | 2.05 | 0.66
Table 2. Hyperparameter settings.
Name | Value
Optimizer | AdamW
Learning rate | 1 × 10⁻⁵
β1 | 0.9
β2 | 0.999
Weight decay | 0.01
Dropout | 0.1
Warm-up steps | 500
Max sequence length | 128
Epochs | 64
Batch size | 16
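For reproducibility, the following minimal sketch shows how the settings in Table 2 could be assembled with PyTorch and the Hugging Face transformers library. It is an illustrative configuration rather than the authors' released implementation; the checkpoint name (roberta-base), the number of labels, the example input text, and the placeholder step count are assumptions.

```python
import torch
from torch.optim import AdamW
from transformers import (RobertaForSequenceClassification, RobertaTokenizer,
                          get_linear_schedule_with_warmup)

# Assumed checkpoint and label count (four hazard levels I-IV); dropout mirrors Table 2.
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=4,
    hidden_dropout_prob=0.1,
)
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# Optimizer settings from Table 2: AdamW, lr = 1e-5, betas = (0.9, 0.999), weight decay = 0.01.
optimizer = AdamW(model.parameters(), lr=1e-5, betas=(0.9, 0.999), weight_decay=0.01)

EPOCHS, BATCH_SIZE, MAX_LEN = 64, 16, 128
steps_per_epoch = 100  # placeholder; depends on the size of the augmented training set
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,                       # warm-up steps from Table 2
    num_training_steps=EPOCHS * steps_per_epoch,
)

# Hypothetical textualized monitoring sample encoded with the 128-token limit.
enc = tokenizer("water yield per meter 4.1 m3/h/m; RQD 37%; attitude 14 m",
                truncation=True, max_length=MAX_LEN,
                padding="max_length", return_tensors="pt")
```

The number of training steps is left as a placeholder because it depends on the dataset size; all other values follow Table 2 directly.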
Table 3. Performance evaluation results of different models.
Model | ACC/% | F1W/% | F1M/% | REC-III/% | REC-IV/% | HRMC
SVM | 77.57 | 72.90 | 61.47 | 73.48 | 89.29 | 0.32
RF | 97.04 | 96.91 | 95.10 | 99.57 | 100.00 | 0.11
DT | 95.56 | 95.53 | 93.48 | 96.09 | 98.87 | 0.21
LR | 69.32 | 66.39 | 57.91 | 66.09 | 80.08 | 0.45
LGBM | 97.29 | 97.19 | 95.52 | 99.13 | 100.00 | 0.14
GBT | 95.26 | 94.98 | 92.12 | 98.04 | 100.00 | 0.15
XGB | 97.10 | 97.00 | 95.28 | 98.70 | 99.81 | 0.18
KNN | 97.04 | 96.87 | 95.02 | 99.78 | 100.00 | 0.08
AutoInt | 73.26 | 73.05 | 70.94 | 90.00 | 51.88 | 0.63
Proposed | 99.26 | 99.25 | 98.87 | 100.00 | 100.00 | 0.00
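The accuracy, weighted F1 (F1W), macro F1 (F1M), and per-class recall indices reported in Tables 3 and 4 can be computed with scikit-learn as sketched below; the integer label encoding (0–3 for Levels I–IV) and the toy prediction arrays are assumptions used only for illustration.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score

# y_true and y_pred are assumed integer-encoded hazard levels: 0..3 for Levels I..IV.
y_true = [0, 1, 2, 3, 2, 3, 1, 0]
y_pred = [0, 1, 2, 3, 2, 3, 0, 0]

acc  = accuracy_score(y_true, y_pred)                     # ACC
f1_w = f1_score(y_true, y_pred, average="weighted")       # F1W
f1_m = f1_score(y_true, y_pred, average="macro")          # F1M
rec  = recall_score(y_true, y_pred, average=None,
                    labels=[0, 1, 2, 3])                   # per-class recall
rec_iii, rec_iv = rec[2], rec[3]                           # recall for Levels III and IV
print(f"ACC={acc:.4f}, F1W={f1_w:.4f}, F1M={f1_m:.4f}, "
      f"REC-III={rec_iii:.4f}, REC-IV={rec_iv:.4f}")
```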
Table 4. Comparative analysis before and after model optimization.
Model | ACC/% | F1M/% | PRE-I/% | PRE-II/% | REC-III/% | REC-IV/% | HRMC
Original RoBERTa | 99.08 | 98.69 | 99.78 | 96.17 | 99.13 | 99.81 | 0.15
Optimized RoBERTa | 99.26 | 98.87 | 99.33 | 99.42 | 100.00 | 100.00 | 0.00
Table 5. Improvement in the prediction performance of different models after data augmentation.
Model | ACC/% | F1W/% | REC-III/% | REC-IV/%
SVM | −4.1 | −7.45 | +1.48 | +89.29
RF | +10.04 | +10.24 | +22.3 | +50
DT | +13.89 | +13.36 | +27.91 | +48.87
LR | −10.68 | −13.02 | +0.18 | +30.08
LGBM | +10.96 | +11.09 | +21.86 | 0
GBT | +8.26 | +8.2 | +20.77 | 0
XGB | +10.77 | +10.83 | +28.25 | +1.55
KNN | +10.71 | +10.7 | +22.51 | +72
AutoInt | −2.74 | +5.08 | +90 | +51.88
Proposed | +22.26 | +22.33 | +38.64 | +81
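For context on the comparison in Table 5 and Figures 13–15, the snippet below illustrates the baseline SMOTE oversampling step with the imbalanced-learn package. It is a sketch on synthetic data with an assumed class distribution, not the tunnel dataset or the full multi-strategy pipeline.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for the six monitoring features (WH, RQD, WYP, RSO, AT, UG)
# with an assumed imbalanced distribution over the four hazard levels.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = np.concatenate([np.zeros(120), np.ones(50),
                    np.full(20, 2), np.full(10, 3)]).astype(int)

print("Before:", Counter(y))
X_res, y_res = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)
print("After: ", Counter(y_res))   # all four classes balanced to the majority count
```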
Table 6. Effect analysis under different weights for the four hazard levels.
Weight Vector (I, II, III, IV) | ACC | REC-III | REC-IV | F1W
[1, 1, 1, 1] | 96.75% | 96.2% | 96.8% | 96.74%
[1, 1, 2, 2] | 98.05% | 98.5% | 98.1% | 98.04%
[1, 1, 3, 3] (Proposed) | 99.26% | 100% | 100% | 99.25%
[1, 1, 5, 5] | 99.18% | 100% | 100% | 99.17%
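The hierarchical weighting examined in Table 6 corresponds to a class-weighted cross-entropy loss. A minimal PyTorch sketch with the adopted weight vector [1, 1, 3, 3] is given below; the logits and labels are placeholders rather than model outputs.

```python
import torch
import torch.nn as nn

# Weight vector from Table 6: severe Levels III and IV (indices 2 and 3) weighted 3x.
class_weights = torch.tensor([1.0, 1.0, 3.0, 3.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(8, 4, requires_grad=True)   # placeholder classifier outputs
labels = torch.randint(0, 4, (8,))               # placeholder hazard-level labels (0..3)
loss = criterion(logits, labels)                 # misclassifying Levels III/IV costs more
loss.backward()
print(float(loss))
```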