1. Introduction
Shield tunnels are widely used in urban underground transportation systems because of their high construction efficiency and limited disturbance to surface activities. However, as groundwater fluctuations intensify under the combined effects of extreme weather and construction activities, shield tunnels are increasingly exposed to leakage-related hazards because their prefabricated lining system contains numerous segmental joints. Large transverse convergence may lead to joint opening or segment misalignment [1,2]. When such deformation exceeds the allowable limit, groundwater may transport surrounding soil into the tunnel, thereby triggering water–soil gushing accidents [3,4,5]. The associated soil loss can induce severe ground disturbance and even large-scale collapse, thereby threatening tunnel serviceability and the overburden strata (Figure 1). For instance, in 2016, a water-leakage accident occurred during construction of the Fukuoka Subway Nanakuma Line in Japan: continuous inflow of sandy soil into the tunnel triggered a sudden large-scale ground collapse, forming a sinkhole of about 30 × 27 × 15 m. In 2024, a similar water-and-sand gushing incident occurred between Kaiyuanmen Station and Tumen Station on Xi’an Metro Line 8, causing extensive surface collapse. In 2025, a metro line in Bangkok also experienced a water–sand inrush, which abruptly generated a large sinkhole on Samsen Road; the void was estimated to extend nearly 30 m across and up to 50 m deep. As illustrated in Figure 1b, water–soil inflow may first disturb the surrounding ground and form internal voids above the tunnel crown; continued soil gushing then aggravates upward propagation of the disturbed zone, eventually resulting in progressive soil collapse and sudden surface settlement. These typical accident manifestations indicate that water–soil gushing is not merely a local seepage defect, but a coupled stratum–structure hazard with potentially catastrophic consequences. Efficient prediction of the gushing-induced ground disturbance is therefore of clear practical significance for hazard assessment and preventive maintenance [6].
Considerable effort has been devoted to understanding leakage- and gushing-induced disturbance around tunnels through field investigation, laboratory testing, and numerical simulation. Previous studies have shown that the evolution of water–soil gushing is governed by strongly coupled hydro-mechanical processes, including seepage-driven particle migration, progressive formation of flow channels, stiffness degradation of the surrounding ground, and stress redistribution between the stratum and tunnel lining [
7,
8,
9]. Physical model tests and case studies have provided valuable insight into the development of cavities, disturbed zones, and settlement troughs, and have clarified the influence of burial depth, leakage position, soil type, and hydraulic conditions. Karoui et al. (2018) [
10] investigated the settlement behaviour of deep soils induced by tunnel sand leakage under both constant and varying hydraulic heads. Liang et al. (2025) [
11] studied the collapse behaviour of composite strata with upper silt and lower sand under tunnel gushing by means of physical model tests. Nevertheless, laboratory tests are constrained by scale effects, simplified boundary conditions, and limited parameter combinations. Numerical simulation has therefore become a principal tool for investigating the mechanism and consequences of gushing hazards. Zheng et al. (2024) [
12] examined the deformation and failure of shield tunnels induced by contact loss using the Coupled Eulerian–Lagrangian method. Zhang et al. (2026) [
13] established a closed-loop governing framework for the transition from suffusion to leakage in seepage erosion based on CFD-DEM simulations and particle-scale upscaling. Other large-deformation methods, such as smoothed particle hydrodynamics (SPH) and coupled DEM–SPH models [
14] and large-deformation finite-element (LDFE) analysis [
15], have likewise been applied to landslide surge waves and tunnelling-induced sinkholes. In this study, the Material Point Method (MPM) is preferred because its material points retain history-dependent state variables (effective stress, pore pressure, and porosity) while the background grid avoids severe mesh distortion and naturally enforces the soil–structure contact. It has been widely applied to a range of geotechnical problems, including landslides [
16], tunnelling [
17], and soil–structure interaction [
18]. These MPM-based studies have demonstrated its capability in reproducing the staged development of water–soil gushing and capturing key response characteristics such as ground settlement and subsurface ground movement.
Despite these advances, numerical simulation remains computationally demanding, especially when a broad parameter space must be explored. In practical engineering, decision-makers often require rapid estimates of key response indices under varying conditions rather than a small number of detailed forward simulations. Repeated simulations for these purposes are costly and are therefore not well suited to emergency evaluation or large-scale parametric assessment. This limitation motivates the introduction of machine-learning surrogate models [19,20]. For instance, Zhang et al. (2025) [20] applied machine-learning models to predict the maximum seismic response of pile-supported structures in liquefiable soils and showed that XGBoost achieved the best overall predictive performance. Compared with empirical correlations and simplified analytical approaches, such surrogate models are better able to capture strong nonlinearity and multi-parameter interaction, both of which are intrinsic to gushing-induced ground disturbance. In recent years, data-driven approaches have shown considerable promise in underground engineering applications, including deformation prediction [21], settlement assessment [19], and parameter inversion [22].
However, several gaps remain in the prediction of gushing-induced ground disturbance around shield tunnels. First, most existing studies have focused on numerical simulation of leakage-induced deformation [23], while an integrated framework linking gushing conditions to both surface and subsurface disturbance characteristics is still lacking. Second, the hydro-mechanical response induced by water–soil gushing is strongly nonlinear and governed by the coupled effects of gushing location, burial depth, and soil properties, yet the applicability and comparative performance of different machine-learning models for this problem have not been systematically clarified. Third, although interpretability is essential for engineering adoption, limited attention has been paid to identifying the governing factors behind different disturbance descriptors and verifying whether the learned relationships are physically meaningful rather than purely statistical.
To address these issues, this study develops an interpretable machine-learning surrogate framework for predicting gushing-induced ground disturbance around shield tunnels, based on a validated two-phase MPM database. The framework links six governing variables to the key surface and subsurface descriptors of disturbance, and combines multi-model comparison with SHAP-based interpretation to achieve both predictive accuracy and physical transparency. The main novelties of this study are summarized as follows. First, in contrast to previous numerical studies on water–soil gushing, which mostly rely on a limited number of computationally expensive forward simulations, a validated two-phase MPM model is used to construct a systematic physics-based database that links gushing conditions to both surface and subsurface disturbance characteristics. Second, soil gushing mass (SGM) is introduced as a physically meaningful intensity variable to quantify the development stage of gushing; compared with empirical correlations that rely mainly on elapsed time or a single volume-loss measure, this enables the strongly nonlinear and coupled hydro-mechanical response to be represented more directly. Third, unlike most existing machine-learning studies on tunnel-induced ground response, which predict surface settlement alone using a single algorithm, five machine-learning models are systematically compared for three complementary descriptors, including maximum ground settlement, flow-zone width, and flow-zone centroid angle, so that the intensity, extent, and geometric pattern of disturbance are predicted simultaneously. Fourth, SHAP-based interpretation is employed to quantify the contribution of each governing variable and to verify that the learned input–output relationships are physically meaningful rather than purely statistical.
3. Machine Learning
Given the high computational cost of the numerical model for large-scale parametric analysis and rapid prediction, machine learning is adopted to establish efficient surrogate models for gushing-induced ground disturbance.
3.1. Candidate Algorithms
3.1.1. Multi-Layer Perceptron
A multi-layer perceptron (MLP), as a class of artificial neural networks, comprises an input layer, one or more hidden layers, and an output layer, as illustrated in
Figure 7a. By iteratively updating the network parameters during training, the MLP establishes a nonlinear function $f: \mathbb{R}^m \rightarrow \mathbb{R}^o$, in which $m$ and $o$ are the input and output dimensions, respectively. For a single-hidden-layer MLP, the output of the $j$-th hidden neuron, $h_j$, is expressed as:
$$h_j = s\left(\sum_{i=1}^{m} w_{j,i}\, x_i + \theta_j\right)$$
where $x_i$ is the $i$-th input variable; $s(\cdot)$ represents the activation function in the hidden layer; $w_{j,i}$ and $\theta_j$ denote the connection weight and bias associated with the $j$-th hidden neuron, respectively. The predicted value of the $k$-th output neuron, $y_k$, is given by:
$$y_k = f\left(\sum_{j=1}^{N_h} w_{k,j}\, h_j + \theta_k\right)$$
Here, $N_h$ is the number of hidden neurons; $w_{k,j}$ denotes the weight from hidden neuron $j$ to output neuron $k$; $\theta_k$ is the bias of the output layer; and $f(\cdot)$ represents the activation function at the output layer. Common activation functions include Identity ($f(x) = x$), Tanh ($f(x) = \tanh(x)$), Logistic ($f(x) = 1/(1 + e^{-x})$), and ReLU ($f(x) = \max\{0, x\}$). Here, ReLU was employed. The model performance was further enhanced by tuning the network architecture, including the number of hidden layers and the number of neurons per layer; the associated weights and biases are learned during training.
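As an illustration of the hidden-layer and output-layer expressions above, the following minimal sketch evaluates a single-hidden-layer MLP in plain Python. The function and argument names (`mlp_forward`, `w_hidden`, `theta_hidden`, `w_out`, `theta_out`) are hypothetical, and an identity output activation is assumed here because the targets are continuous; the trained networks of this study are not reproduced.

```python
def relu(z):
    """ReLU activation, max{0, z}."""
    return max(0.0, z)


def mlp_forward(x, w_hidden, theta_hidden, w_out, theta_out):
    """Forward pass of a single-hidden-layer MLP (illustrative sketch).

    w_hidden[j][i] plays the role of w_{j,i}, theta_hidden[j] of theta_j,
    w_out[k][j] of w_{k,j}, and theta_out[k] of theta_k.
    """
    # Hidden layer: h_j = s(sum_i w_{j,i} * x_i + theta_j), with s = ReLU
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + th)
         for row, th in zip(w_hidden, theta_hidden)]
    # Output layer: identity activation assumed (regression setting)
    return [sum(w * hj for w, hj in zip(row, h)) + th
            for row, th in zip(w_out, theta_out)]


# Two inputs, two hidden neurons, one output (arbitrary illustrative weights)
y = mlp_forward([1.0, 2.0],
                [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0],
                [[2.0, 3.0]], [0.5])
print(y)
```

In this toy case the first hidden neuron receives a negative pre-activation and is switched off by ReLU, so only the second neuron contributes to the output.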
3.1.2. Random Forest
Random forest (RF) is a supervised ensemble-learning method that can be applied to both classification and regression tasks. It builds a collection of decision trees [
32] based on bootstrap samples and random subsets of input features, which helps suppress overfitting and improve prediction reliability. As illustrated in
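Figure 7b, the final output is produced by combining the predictions of individual trees, by majority voting for classification or by mean averaging for regression. For the regression tasks considered here:
$$y = \frac{1}{N}\sum_{n=1}^{N} y_n(X)$$
where $y$ represents the prediction output; $y_n(X)$ is the prediction of the $n$-th decision tree; $X = [x_1, x_2, \ldots, x_m]^{T}$; and $N$ denotes the number of decision trees in the forest.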
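The bootstrap-and-average idea can be sketched with a deliberately simplified forest. Each "tree" below is a depth-1 regression stump fitted to one randomly chosen feature, which is an assumption made to keep the example short; real random forests grow deeper trees and sample a feature subset at every split. All names (`fit_stump`, `fit_forest`, `predict_forest`) and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Fit a depth-1 regression tree on one feature by minimising squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        ml, mr = mean(left), mean(right)
        sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:  # feature is constant in this bootstrap sample
        m = mean(y)
        return lambda row: m
    _, thr, ml, mr = best
    return lambda row: ml if row[feature] <= thr else mr


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample (with replacement)
        feature = rng.randrange(len(X[0]))          # random feature choice
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees


def predict_forest(trees, row):
    """Regression output: mean of the individual tree predictions."""
    return mean(tree(row) for tree in trees)


X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [0.0, 0.0, 10.0, 10.0]
forest = fit_forest(X, y, n_trees=10, seed=1)
print(predict_forest(forest, [3.0, 0.0]))
```

Because every stump predicts a mean of training targets, the averaged output always stays within the range of the observed responses, which is one reason tree ensembles do not extrapolate beyond the training data.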
3.1.3. XGBoost
XGBoost, developed by Chen and Guestrin (2016) [
33], is an efficient tree-based ensemble learning algorithm that has been widely applied in regression and classification tasks. Owing to its high predictive accuracy, computational efficiency, and built-in regularisation, it is particularly suitable for engineering prediction problems. Methodologically, XGBoost is an implementation of gradient boosting decision trees (GBDT) [
34], in which decision trees are added sequentially and each new tree is trained to fit the residuals, i.e., the negative gradients of the loss function. In this study, XGBoost was adopted instead of a conventional GBDT framework because it provides a more efficient and robust solution for large-scale modelling and repeated hyperparameter optimisation. A key advantage of XGBoost is that it incorporates regularisation into the additive tree model to control model complexity and reduce overfitting. The regularised objective function is expressed as:
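$$\mathrm{Obj} = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega\left(f_k\right), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}$$
where $l(\cdot)$ denotes the training loss function; $\Omega(\cdot)$ is the regularisation term; $T$ represents the number of leaves in a tree; and $w_j$ is the score of leaf $j$. The regularisation strength is controlled by the parameters $\gamma$ and $\lambda$.
At the $t$-th boosting iteration, the objective function is formulated as:
$$\mathrm{Obj}^{(t)} = \sum_{i} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega\left(f_t\right)$$
By applying a second-order Taylor expansion of the loss around $\hat{y}_i^{(t-1)}$, the objective can be rewritten in a leaf-wise summation form:
$$\mathrm{Obj}^{(t)} \approx \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^{2}\right] + \gamma T$$
where $I_j = \{\, i \mid q(x_i) = j \,\}$ denotes the set of samples assigned to leaf $j$, $q(\cdot)$ is the leaf index function, and $g_i$ and $h_i$ are the first- and second-order derivatives of the loss with respect to the prediction, evaluated at $\hat{y}_i^{(t-1)}$. In addition, XGBoost adopts an efficient greedy strategy to search for tree splits, which, together with the regularized formulation above, contributes to its strong generalization ability and high training efficiency in practice.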
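The leaf-wise form of the objective is quadratic in each leaf score, so minimising it gives a closed-form optimal score and a closed-form gain for a candidate split, as derived by Chen and Guestrin. The sketch below evaluates both for a toy case; it assumes a squared-error loss l = ½(y − ŷ)², for which the gradient is g = ŷ − y and the Hessian is h = 1, and uses λ and γ for the two regularisation parameters. Function names and numbers are illustrative only.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf score w* = -G / (H + lambda), where G and H are the
    sums of gradients and Hessians of the samples in the leaf."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam, gamma):
    """Objective reduction obtained by splitting one leaf into left/right
    children; gamma is the penalty paid for the extra leaf."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Toy data: targets and current predictions (all zero before the first tree)
y_true = [1.0, 1.0, 3.0, 3.0]
y_hat = [0.0, 0.0, 0.0, 0.0]
g = [p - t for t, p in zip(y_true, y_hat)]   # gradients for squared-error loss
h = [1.0] * len(y_true)                       # Hessians for squared-error loss

left, right = [0, 1], [2, 3]                  # candidate split of the samples
GL, HL = sum(g[i] for i in left), sum(h[i] for i in left)
GR, HR = sum(g[i] for i in right), sum(h[i] for i in right)

print(leaf_weight(GL, HL, lam=1.0), leaf_weight(GR, HR, lam=1.0))
print(split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0))
```

With λ > 0 the leaf scores are shrunk relative to the plain leaf means (here 2/3 and 2 instead of 1 and 3), which is how the regularisation term limits the step taken by each new tree.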
3.1.4. Support Vector Regression
SVR is adopted in this study as a kernel-based learning approach for regression prediction [35]. Built on the structural risk minimization principle, SVR seeks a regression function:
$$f(x) = \sum_{i=1}^{m} w_i\, K(x_i, x) + b$$
which achieves a trade-off between fitting accuracy and model complexity by introducing an $\varepsilon$-insensitive tolerance in the optimization:
$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2} \quad \text{subject to} \quad \left| y_i - f(x_i) \right| \le \varepsilon, \quad i = 1, \ldots, m$$
where $w$ and $b$ denote the weight vector and bias term, respectively; $m$ is the number of samples; $K(\cdot)$ is the selected kernel function; $w_i$ is the coefficient of the $i$-th sample; $y_i$ is the corresponding target value; and $\varepsilon$ represents the tolerance parameter. SVR is chosen here because it can capture the nonlinear coupling between the input factors and the gushing-induced ground responses while maintaining stable generalization through regularization. By introducing kernel functions in place of inner products, the input variables are implicitly projected into a higher-dimensional feature space, where linear regression can be carried out (Figure 7d). In this study, the radial basis function (RBF) kernel is adopted:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)$$
where $\gamma$ controls the kernel width and thus governs the flexibility of the SVR model.
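The three ingredients described above (RBF kernel, kernel-expansion prediction, and the ε-insensitive tolerance) can be written in a few lines. This is a sketch under stated assumptions: the coefficients and support samples are arbitrary illustrative values rather than a fitted model, and the tolerance is expressed as the usual ε-insensitive loss, max(0, |y − f(x)| − ε).

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def svr_predict(x, support_x, coeffs, b, gamma):
    """Kernel expansion f(x) = sum_i w_i * K(x_i, x) + b."""
    return sum(w * rbf_kernel(xi, x, gamma) for w, xi in zip(coeffs, support_x)) + b


def eps_insensitive(y_true, y_pred, eps):
    """Errors smaller than eps are ignored; larger ones are penalised linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


# A larger gamma narrows the kernel, so distant samples influence f(x) less
print(rbf_kernel([0.0], [1.0], gamma=0.5), rbf_kernel([0.0], [1.0], gamma=5.0))
print(svr_predict([1.0, 2.0], [[1.0, 2.0], [3.0, 0.0]], [3.0, -1.0], 0.5, gamma=0.5))
print(eps_insensitive(1.0, 1.05, eps=0.1), eps_insensitive(1.0, 1.5, eps=0.1))
```

The kernel-width parameter and the tolerance are therefore the natural hyper-parameters to expose to the tuning procedure, since they jointly set how flexible and how error-tolerant the fitted function is.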
3.2. Ridge Regression Baseline
In addition to the nonlinear ML models in
Section 3.1, a linear regression baseline based on Ridge regression [
36] is introduced for comparison. Ridge regression augments ordinary least squares with an L2 regularisation term, which penalises large coefficients and thus improves numerical stability and generalisation, particularly when input variables exhibit multicollinearity. Given its linear functional form, Ridge provides an interpretable and computationally efficient benchmark; however, its capacity to capture complex nonlinear input–output relationships is inherently limited. Therefore, the performance gap between Ridge and the proposed ML models can be used to quantify the added value of nonlinear learning for predicting gushing-induced ground disturbance.
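The shrinkage effect of the L2 penalty is easiest to see in one dimension. For a single feature and no intercept, minimising Σ(y − wx)² + αw² gives w = Σxy / (Σx² + α). The sketch below uses this special case only; the symbol α for the penalty strength, the function name, and the data are assumptions for illustration.

```python
def ridge_slope(x, y, alpha):
    """Closed-form ridge solution for one feature without intercept:
    w = sum(x*y) / (sum(x^2) + alpha)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]   # exactly y = 2x

for alpha in (0.0, 1.0, 14.0):
    print(alpha, ridge_slope(x, y, alpha))
```

With α = 0 the ordinary least-squares slope is recovered, and increasing α pulls the coefficient towards zero. The fitted relation remains a straight line for any α, which is why Ridge serves here as the linear reference against which the nonlinear models are judged.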
3.3. Hyper-Parameter Tuning
Appropriate hyper-parameters are essential for achieving reliable predictive performance of classical machine-learning models. In this study, a unified hyper-parameter optimisation framework based on Optuna [
37] is adopted for all surrogate models, as shown in
Figure 8. Optuna employs the Tree-structured Parzen Estimator (TPE) sampler, a Bayesian optimisation method that updates a probabilistic surrogate of the objective using accumulated trial results and proposes new hyper-parameter configurations with higher expected improvement. This strategy is well suited to mixed search spaces that commonly arise in classical machine-learning models. The optimisation objective is defined as the validation mean squared error (MSE), and the optimal hyper-parameter set is subsequently used for model retraining and final testing. The universal workflow of Optuna used in this study can be summarised in five steps:
(1) Step 1: Define the objective function (validation MSE), the hyper-parameter search space, and the maximum number of trials Nt. The dataset is first divided at the group level into a training set and an independent testing set, and a fixed random seed is specified for reproducibility.
(2) Step 2: For each trial, Optuna suggests a candidate hyper-parameter set $\lambda_i$, trains the corresponding model on the training subset, and evaluates its performance on the validation subset.
(3) Step 3: Compute the trial loss $L_i$ and update the current best objective value as its fitness $T_L$:
$$T_L = \min\{L_1, L_2, \ldots, L_i\}$$
(4) Step 4: Repeat Steps (2)–(3) until $N_t$ is reached. For models equipped with built-in early stopping (e.g., XGBoost boosting rounds or MLP training), early stopping is activated during training to reduce overfitting and unnecessary computations.
(5) Step 5: Select λ* corresponding to the minimum validation loss and retrain the model using the optimal hyper-parameters, followed by final evaluation on the independent test set. The trial history is recorded for further analysis (e.g., optimisation curves and best-trial statistics).
With the assistance of Optuna, we develop a hybrid algorithm termed “Optuna-ML” to calibrate the hyper-parameters of the adopted machine-learning models for predicting gushing-induced ground disturbance. For each Optuna-ML algorithm, the key hyper-parameters to be tuned are specified together with their data types and searching ranges. The selection of these critical hyper-parameters and their bounds is determined according to modelling experience and widely used practices in previous machine-learning studies.
Before running the optimisation, several Optuna settings are predefined to ensure a consistent and reproducible tuning process. Specifically, the sampler is set to a TPE-based Bayesian optimiser with a fixed random seed, and the maximum number of trials is limited to Nt as the termination condition. During optimisation, Optuna iteratively proposes candidate hyper-parameter sets, trains the corresponding model on the training subset, and evaluates its performance on the validation subset using the MSE as the objective value. Once the trial budget is exhausted, the best trial is selected as the optimal result and is used for subsequent model retraining and testing.
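The five-step workflow can be mimicked with a short standard-library loop. To keep the example dependency-free, the TPE sampler is replaced here by plain uniform random search, which is a deliberate simplification and not the procedure used in this study; the proposal step is the only part that differs. The search space, the toy objective standing in for the validation MSE, and all names are hypothetical.

```python
import random


def tune(objective, space, n_trials, seed=0):
    """Trial loop: propose -> evaluate -> update the best-so-far fitness.

    `space` maps each hyper-parameter name to a (low, high) range.
    Random search is used as a stand-in for the TPE sampler.
    """
    rng = random.Random(seed)                      # fixed seed for reproducibility
    best_params, best_loss, history = None, float("inf"), []
    for _ in range(n_trials):
        # Step 2: propose a candidate hyper-parameter set
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        # Step 3: evaluate the trial loss and update the best objective value
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
        history.append(best_loss)                  # fitness after this trial
    # Step 5: the best trial is returned for retraining and final testing
    return best_params, best_loss, history


def toy_validation_mse(params):
    """Hypothetical smooth objective with its minimum at (0.1, 0.8)."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["subsample"] - 0.8) ** 2


space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
best_params, best_loss, history = tune(toy_validation_mse, space, n_trials=50, seed=42)
print(best_params, best_loss)
```

The `history` list is the quantity plotted in an optimisation curve: it is non-increasing by construction and flattens once additional trials no longer improve on the best configuration found so far.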
3.4. Auxiliary Methods
3.4.1. Group k-Fold Cross-Validation
To enhance the generalisation ability of the prediction model while avoiding information leakage caused by correlated samples, Group k-fold cross-validation (GroupKFold) is adopted on the model-development set in this study. Unlike standard k-fold cross-validation [
38], which randomly splits individual samples, GroupKFold partitions the data according to a predefined group label. The model-development set is divided into
$k$ folds at the group level, such that samples belonging to the same group are always assigned to the same fold. In each iteration, $k-1$ folds are used for training and the remaining fold is used for validation, and the procedure repeats $k$ times so that each fold serves as the validation set once. The overall cross-validation performance is taken as the average of the fold-wise errors, expressed as:
$$\mathrm{MSE}_{\mathrm{CV}} = \frac{1}{k}\sum_{i=1}^{k} \mathrm{MSE}_i$$
where $\mathrm{MSE}_i$ denotes the validation error of the $i$-th fold. By enforcing group-wise separation, GroupKFold provides a more realistic assessment of model performance when samples within the same group share similar boundary conditions or physical settings, which is consistent with the data structure in this work.
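A group-level split can be sketched as follows. Groups are assigned to folds in round-robin order, which is a simplification (library implementations such as scikit-learn's `GroupKFold` additionally balance fold sizes); the group labels below are hypothetical scenario identifiers, not the labels used in this study.

```python
def group_k_fold(groups, k):
    """Yield (train_indices, val_indices) so that no group is split across folds."""
    unique = sorted(set(groups))
    fold_of = {g: i % k for i, g in enumerate(unique)}   # round-robin assignment
    for fold in range(k):
        val = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, val


def cv_score(fold_errors):
    """Cross-validation score: average of the fold-wise validation errors."""
    return sum(fold_errors) / len(fold_errors)


# Each sample carries the label of the simulation scenario it came from
groups = ["s1", "s1", "s2", "s2", "s3", "s3", "s4"]
for train, val in group_k_fold(groups, k=2):
    print("train:", train, "val:", val)
```

Samples from one scenario typically differ only in the gushing stage, so placing them on both sides of a split would let the model interpolate within a scenario it has already seen and would overstate its accuracy on unseen conditions.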
3.4.2. Evaluation Metrics
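To quantitatively evaluate the predictive performance of the proposed models, four evaluation indices are adopted in this study, including the root mean square error (RMSE), mean absolute error (MAE), symmetric mean absolute percentage error (sMAPE), and goodness of fit ($R^2$). Their computational definitions are given as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^{2}}$$
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$
$$\mathrm{sMAPE} = \frac{100\%}{m}\sum_{i=1}^{m}\frac{2\left|y_i - \hat{y}_i\right|}{\left|y_i\right| + \left|\hat{y}_i\right| + \varepsilon}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^{2}}$$
where $m$ is the number of samples, $y_i$ and $\hat{y}_i$ denote the measured (true) and predicted values of the $i$-th sample, respectively; $\bar{y}$ is the mean of the measured values; and $\varepsilon$ is a small constant introduced to avoid division by zero in sMAPE. In general, smaller RMSE/MAE/sMAPE and a larger $R^2$ indicate better model performance.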
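The four indices can be computed together in a few lines. The sMAPE form used below, 2|y − ŷ| / (|y| + |ŷ| + ε) averaged and expressed in percent, is one common convention and is an assumption here; the default value of `eps` and the function name are likewise illustrative.

```python
import math


def regression_metrics(y_true, y_pred, eps=1e-8):
    """Return RMSE, MAE, sMAPE (in percent) and R^2 for paired samples."""
    m = len(y_true)
    err = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in err)
    mean_true = sum(y_true) / m
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    smape = 100.0 / m * sum(
        2.0 * abs(e) / (abs(t) + abs(p) + eps)      # eps guards against 0/0
        for e, t, p in zip(err, y_true, y_pred)
    )
    return {
        "RMSE": math.sqrt(ss_res / m),
        "MAE": sum(abs(e) for e in err) / m,
        "sMAPE": smape,
        "R2": 1.0 - ss_res / ss_tot,
    }


print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

RMSE and MAE carry the units of the target, so they cannot be compared across settlement, width, and angle; sMAPE and R² are dimensionless and are the indices to use when comparing accuracy between outputs.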
3.4.3. SHAP-Based Interpretation
To enhance the transparency and interpretability of the proposed machine-learning models, the SHapley Additive exPlanations (SHAP) method is employed to quantify the contribution of each input feature to the model predictions. SHAP is grounded in cooperative game theory and attributes the prediction of a given sample to individual features by computing Shapley values, thereby providing a consistent and locally accurate explanation. For a trained model, the prediction can be decomposed as:
$$f(x) = \phi_0 + \sum_{j=1}^{p} \phi_j$$
where $x$ is the input feature vector with $p$ features, $\phi_0$ denotes the base value (i.e., the expected model output over the background dataset), and $\phi_j$ represents the SHAP value of the $j$-th feature. A positive
ϕj indicates that the feature increases the prediction relative to the base value, whereas a negative
ϕj implies an opposite effect. By aggregating SHAP values across all samples (e.g., using the mean absolute SHAP value), the global importance ranking of features can be obtained, while dependence plots further reveal the nonlinear influence patterns and interaction trends among key variables. This SHAP-based interpretation facilitates identifying the dominant factors governing gushing-induced ground disturbance and supports a mechanistic understanding of the data-driven models.
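For a model with only a few inputs, Shapley values can be computed exactly by enumerating all feature subsets, which makes the additive decomposition easy to verify. The sketch below makes two simplifying assumptions: "absent" features are replaced by a single background point rather than averaged over a background dataset, and the model is a hypothetical toy function with one additive term and one interaction term. Practical SHAP implementations rely on model-specific or sampling-based approximations instead of full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x` relative to one background point.

    Returns (base_value, [phi_1, ..., phi_p]).
    """
    p = len(x)

    def value(subset):
        # Features in `subset` take their actual value, the rest the background
        z = [x[j] if j in subset else background[j] for j in range(p)]
        return model(z)

    phi = []
    for j in range(p):
        others = [i for i in range(p) if i != j]
        total = 0.0
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return value(set()), phi


def toy_model(z):
    """Hypothetical response: additive in z[0], interaction between z[1] and z[2]."""
    return 2.0 * z[0] + z[1] * z[2]


base, phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(base, phi, base + sum(phi), toy_model([1.0, 2.0, 3.0]))
```

The purely additive feature receives exactly its own contribution, whereas the interaction term is shared equally between the two features involved. This is worth keeping in mind when reading global importance rankings, since a variable can rank highly partly through its coupling with another one.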
5. Results and Discussion
5.1. Optimal Hyper-Parameters
Using Optuna to automatically search the hyper-parameter space, the optimisation trajectories of the four candidate ML models are visualised in
Figure 13, where the fitness
TL is tracked against the trial number. Overall, all models reach a stable fitness plateau well before the pre-defined termination at 50 iterations, indicating that Optuna can efficiently identify near-optimal configurations for this dataset. A clear distinction is observed in the convergence behaviours of the four Optuna-based optimisers. Among them, Optuna-SVR shows the most pronounced early-stage improvement across all three targets, with the objective value dropping sharply within the first few iterations and then rapidly reaching a plateau. This pattern indicates that SVR is highly sensitive to hyper-parameter selection, and that appropriate tuning can quickly shift the model from a poor initial configuration to a near-optimal solution. In contrast, Optuna-RF exhibits a comparatively smooth and flat convergence trajectory, with only limited reductions in the objective value throughout the search process, suggesting that RF is less reliant on fine hyper-parameter adjustment and remains relatively robust within a reasonable parameter range.
The other two models display intermediate behaviours. Optuna-MLP generally converges in a progressive manner, with the objective value decreasing steadily before stabilising after approximately 15–30 iterations. This trend reflects the continuous search landscape associated with neural-network hyper-parameters and the gradual refinement of network architecture and training settings. By comparison, Optuna-XGBoost shows a more stepwise convergence pattern, characterised by several discrete reductions at specific iterations prior to stabilisation. Such behaviour is consistent with the tree-based boosting mechanism, in which changes in key hyper-parameters may produce non-continuous gains in predictive performance.
It is also evident that the final converged objective values vary across the three output targets, indicating that the relative suitability of the models is output-dependent. For instance, RF achieves the lowest objective value for the flow-zone centroid angle and XGBoost performs best for the flow-zone width, while SVR and RF remain particularly competitive for the maximum ground settlement. Once the optimisation curve becomes essentially stationary, the corresponding hyper-parameter combination is taken as the optimal setting and is used to retrain the final model. The resulting optimal hyper-parameters for all models are listed in Table 3 for reproducibility and subsequent comparison.
It should be noted that the hyperparameters were selected using only the GroupKFold validation MSE on the development set, whereas the independent scenario-level test set was reserved exclusively for the final evaluation and was never used during tuning. To examine the robustness of the optimization, the tuning–retraining–evaluation procedure was repeated under five random seeds and with two validation objectives (MSE and MAE); the results are summarized in
Table 4. The model ranking was preserved in all cases. RF was essentially invariant (test-RMSE coefficient of variation ≤ 0.9%), whereas MLP and XGBoost showed only small-to-moderate seed-to-seed variation (CV ≤ 8.7%) arising from stochastic weight initialization and boosting; switching the objective from MSE to MAE changed the test RMSE by at most 9.7% without altering the ranking. These results confirm that the conclusions are robust to the random seed and the optimization objective.
5.2. Performance of ML Models
Using the optimal hyperparameters summarised in
Table 3, the Optuna-ML models are developed for prediction. Model performance is evaluated according to the metrics defined in
Section 3.4.2. As illustrated in
Figure 14, the predictive accuracy of the candidate models is compared using three complementary metrics, namely MAE, RMSE, and sMAPE. A highly consistent trend is observed across all tasks: the nonlinear models (MLP, RF, and XGBoost) markedly outperform the linear baseline (Ridge), and the model ranking implied by the three metrics is largely coherent. In particular, Ridge yields the largest MAE/RMSE and the highest sMAPE in all three outputs, indicating that the gushing-induced ground disturbance is strongly nonlinear with pronounced feature coupling, and therefore cannot be adequately characterised by a purely linear regression model.
Across the three prediction targets, a consistent performance hierarchy can be observed. For the maximum ground settlement, Sm, MLP, RF, and XGBoost all maintain relatively low MAE and RMSE, with sMAPE remaining in a narrow range, indicating stable predictive capability for settlement response, whereas SVR and especially Ridge show markedly larger errors and much higher sMAPE, suggesting limited generalisation. For the centroid angle of flow zone θf, the three nonlinear models again outperform SVR and Ridge, with relatively low and stable errors, indicating their stronger ability to capture the nonlinear evolution of flow-zone geometry. A similar trend is observed for df, where MLP, RF, and XGBoost remain within the lower error range, while SVR and Ridge perform noticeably worse. Overall, MLP, RF, and XGBoost all demonstrate favourable predictive performance across the three outputs, whereas Ridge, as the linear baseline, is consistently inferior, further supporting the use of nonlinear surrogate models for predicting gushing-induced ground disturbance.
Figures 15–17 present a comparison between the predicted and actual values of the five models, with the 45° line representing perfect prediction. The coefficient of determination R² is also reported for each model. For Sm, MLP, RF, and XGBoost generate point clouds that closely follow the 1:1 line, indicating agreement over the full response range. SVR also achieves a high R², but with slightly greater scatter and mild deviation at the extremes. By contrast, Ridge shows a wider dispersion and a lower R², indicating clear underfitting. A similar pattern is observed for θf and df, where MLP, RF, and XGBoost again maintain strong agreement with the reference line, while SVR exhibits increased dispersion and Ridge performs substantially worse, with segmented prediction patterns that reflect limited ability to capture the nonlinear evolution of flow-zone geometry. Overall, the parity plots consistently indicate that MLP provides accurate and reliable surrogate predictions across all three outputs, whereas SVR is slightly less stable and Ridge is unable to adequately represent the strong nonlinearity and feature coupling involved in gushing-induced ground disturbance.
To exploit the complementary strengths of the models, a validation-error-weighted average and a non-negative linear stacking model of MLP, RF, and XGBoost were constructed on the out-of-fold development-set predictions and evaluated on the independent test set (
Table 5). The weighted average did not surpass the best single model, whereas the non-negative stacking ensemble matched or slightly improved on the best single model (MLP) for all three targets, providing the most reliable overall predictor.
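A validation-error-weighted average can be sketched as below. The weighting rule, weights proportional to the inverse validation MSE and normalised to sum to one, is an assumption chosen for illustration, since the exact rule is not stated here; the MSE values and predictions are made-up numbers, not results from this study.

```python
def inverse_error_weights(val_mse):
    """Weights proportional to 1 / validation MSE, normalised to sum to 1."""
    inv = {name: 1.0 / mse for name, mse in val_mse.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def weighted_average(preds, weights):
    """Combine per-model prediction lists into one ensemble prediction list."""
    n = len(next(iter(preds.values())))
    return [sum(weights[name] * preds[name][i] for name in preds) for i in range(n)]


# Hypothetical out-of-fold validation errors and test predictions
val_mse = {"MLP": 1.0, "RF": 2.0, "XGBoost": 4.0}
preds = {"MLP": [10.0, 20.0], "RF": [12.0, 18.0], "XGBoost": [11.0, 21.0]}

weights = inverse_error_weights(val_mse)
print(weights)
print(weighted_average(preds, weights))
```

A convex combination of this kind always lies between the individual predictions, so it can reduce variance but cannot correct a bias shared by all members. A stacking model with fitted non-negative coefficients has more freedom, which is consistent with the observation that stacking matched or improved on the best single model while simple weighting did not.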
5.3. Distribution of Prediction Errors
To further evaluate the reliability of different predictors beyond the aggregated metrics, the prediction residuals are post-processed in terms of the absolute percentage error (APE), defined as
$$\mathrm{APE}_i = \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%$$
Following common engineering tolerance criteria, the APE is classified into five intervals, i.e., 0–5%, 5–10%, 10–15%, 15–20%, and >20%. The corresponding frequencies for all models are summarised in
Figure 18 for the three outputs.
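The interval classification can be reproduced with a short routine. It assumes the usual definition APE = |y − ŷ| / |y| × 100% and treats each upper bound as inclusive (for example, exactly 5% falls in the 0–5% interval); the boundary convention, the function name, and the sample values are assumptions for illustration.

```python
def ape_histogram(y_true, y_pred):
    """Count samples per APE interval: 0-5, 5-10, 10-15, 15-20 and >20 percent."""
    edges = [5.0, 10.0, 15.0, 20.0]
    labels = ["0-5%", "5-10%", "10-15%", "15-20%", ">20%"]
    counts = dict.fromkeys(labels, 0)
    for t, p in zip(y_true, y_pred):
        ape = abs(t - p) / abs(t) * 100.0      # requires a non-zero true value
        for edge, label in zip(edges, labels):
            if ape <= edge:
                counts[label] += 1
                break
        else:                                   # no interval matched
            counts[">20%"] += 1
    return counts


y_true = [100.0, 100.0, 100.0, 100.0, 100.0]
y_pred = [102.0, 108.0, 112.0, 118.0, 130.0]
print(ape_histogram(y_true, y_pred))
```

Because APE divides by the true value, samples with near-zero targets (for instance, very small settlements at an early gushing stage) can produce very large percentages from small absolute errors, so the share of samples above 20% should be read alongside MAE and RMSE.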
A clear and consistent trend is observed across all three targets. The nonlinear models, particularly RF and MLP, concentrate most samples in the low-error ranges (APE < 10%), with a dominant proportion falling within 0–5%, indicating both high predictive accuracy and strong robustness. XGBoost shows a similar but slightly less favourable distribution. In contrast, Ridge exhibits the weakest performance, characterised by a reduced proportion of low-error samples and the highest frequency of APE > 20%, reflecting its limited ability to capture the nonlinear gushing-induced disturbance. SVR generally lies between these two groups but still retains a noticeable medium-to-high error tail.
From an engineering prediction perspective, the practical value of a surrogate model depends not only on its average accuracy, but also on its ability to suppress large-error cases. In this respect, MLP and RF demonstrate the most favourable overall behaviour, combining a high proportion of low-APE predictions with a low occurrence of unacceptable errors. These distributional characteristics are consistent with the MAE, RMSE, and sMAPE results and further confirm the advantage of nonlinear machine-learning models for predicting gushing-induced ground disturbance.
5.4. Global Feature Importance by SHAP
SHAP values are employed to interpret the trained models and quantify the relative contribution of each input variable to the three target responses.
Figure 19 shows the global feature importance measured by the mean absolute SHAP value, mean(|SHAP|). A broadly consistent ranking is observed across the five models, indicating that the identified controlling factors are robust and physically meaningful.
For Sm, the gushing mass SGM is the dominant predictor, followed by tunnel depth H/D, showing that settlement magnitude is governed mainly by gushing severity and overburden confinement. Soil modulus E has only a secondary effect, whereas k and φ contribute relatively little within the investigated ranges. For df, SGM and the gushing location are the two most influential variables, indicating that the scale of the disturbed zone is jointly controlled by gushing intensity and leakage geometry, with smaller contributions from H/D and soil properties. For θf, the gushing location is the dominant factor, while SGM and H/D provide noticeable secondary effects through their influence on flow-zone morphology. Overall, the SHAP results are consistent with the hydro-mechanical behaviour of gushing-induced ground disturbance, suggesting that the models have captured physically meaningful relationships.
6. Conclusions
This study developed an interpretable machine-learning surrogate framework for predicting gushing-induced ground disturbance around shield tunnels by integrating a validated two-phase MPM database with data-driven modelling. Within the scope of this work, the following conclusions can be drawn:
(1) A physics-based numerical database containing 39,810 samples was established using the validated two-phase MPM framework. By selecting the maximum ground settlement, flow zone centroid angle, and flow zone width as output variables, the database characterises gushing-induced disturbance from both surface and subsurface perspectives, thereby providing a physically meaningful basis for surrogate modelling.
(2) Among the five candidate algorithms, the nonlinear models consistently outperform the linear Ridge baseline for all three targets, confirming that the mapping between the governing factors and gushing-induced disturbance is strongly nonlinear. In particular, MLP, RF, and XGBoost achieve the best overall predictive accuracy, whereas Ridge shows clear underfitting and SVR exhibits relatively weaker robustness.
(3) In terms of comprehensive performance, MLP and RF are the most reliable models. They not only achieve low MAE, RMSE, and sMAPE values, but they also produce the most favourable error distributions, with a high proportion of predictions concentrated in the low-error intervals and a low frequency of large-error cases. This indicates that these two models are more suitable for practical engineering prediction where robustness is as important as average accuracy.
(4) SHAP-based interpretation demonstrates that the learned input–output relationships are physically consistent. For maximum ground settlement, the soil gushing mass SGM is the dominant controlling factor, followed by tunnel depth ratio H/D, indicating the primary roles of gushing severity and overburden confinement. For flow zone width, SGM and gushing location jointly govern the scale of the disturbed zone. For flow zone centroid angle, leakage location is the dominant factor, while SGM and H/D provide measurable secondary effects through their influence on flow-zone morphology.
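For reference, the error measures named in conclusion (3) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the evaluation code of the study: sMAPE is taken here in its common form, the mean of 2|y − ŷ| / (|y| + |ŷ|) expressed in percent, which may differ from the definition used in the paper, and `share_within` is a hypothetical helper that reports the fraction of predictions whose relative error falls below a chosen tolerance. The sample values are invented.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalises large errors more than MAE."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed definition, range 0 to 200)."""
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        # Treat an exact 0/0 pair as zero error to avoid division by zero.
        terms.append(0.0 if denom == 0 else 2.0 * abs(t - p) / denom)
    return 100.0 * sum(terms) / len(terms)


def share_within(y_true, y_pred, rel_tol):
    """Fraction of predictions with |error| <= rel_tol * |true value|."""
    hits = sum(1 for t, p in zip(y_true, y_pred)
               if abs(t - p) <= rel_tol * abs(t))
    return hits / len(y_true)


# Invented values for illustration only.
y_true = [10.0, 20.0, 40.0]
y_pred = [12.0, 19.0, 40.0]

print("MAE  :", mae(y_true, y_pred))
print("RMSE :", rmse(y_true, y_pred))
print("sMAPE:", smape(y_true, y_pred))
print("share within 10%:", share_within(y_true, y_pred, 0.10))
```

Reporting the share of predictions inside a low-error band alongside MAE and RMSE separates a model that is accurate on average from one that is also free of occasional large errors, which is the distinction conclusion (3) draws.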
From an engineering standpoint, the proposed surrogate framework provides an efficient alternative to repeated high-cost numerical simulations and can support several practical tasks. First, it enables pre-event scenario evaluation: by sweeping the governing variables (burial depth ratio, leakage location, and soil parameters), engineers can identify tunnel sections and ground conditions that are most susceptible to severe gushing-induced disturbance before any incident occurs. Second, once leakage information becomes available during an incident, the model allows rapid assessment of gushing severity, translating an estimated soil gushing mass into the expected maximum settlement and flow-zone geometry within seconds rather than hours of simulation. Third, the predicted flow-zone width and centroid angle provide a preliminary identification of the subsurface disturbance zone—its lateral extent and preferential propagation direction—which can guide targeted monitoring, prioritisation of inspection, and emergency-response planning. These capabilities make the framework suitable for risk-informed decision-making and for integration into smart-city underground infrastructure management systems.
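The pre-event scenario evaluation described above amounts to evaluating the surrogate over a grid of governing variables and ranking the results. A minimal sketch of such a sweep is given below. The function `toy_surrogate` is a deliberately arbitrary placeholder and is not the trained model of this study; the variable grid, the units, and the names `sweep` and `Sm_pred` are likewise hypothetical. In practice, `predict` would be replaced by a call to a trained regressor.

```python
from itertools import product


def toy_surrogate(h_over_d, location_deg, sgm):
    """Placeholder predictor, NOT the paper's model.

    Returns a made-up settlement index that grows with gushing mass and
    shrinks with depth ratio; the location argument is ignored here.
    """
    return sgm / h_over_d


def sweep(predict, h_over_d_values, location_values, sgm_values, top_n=3):
    """Evaluate `predict` on the full grid and return the worst scenarios."""
    scenarios = []
    for h, loc, sgm in product(h_over_d_values, location_values, sgm_values):
        scenarios.append({
            "H/D": h,
            "location_deg": loc,
            "SGM": sgm,
            "Sm_pred": predict(h, loc, sgm),
        })
    # Largest predicted settlement first; ties keep grid order (stable sort).
    scenarios.sort(key=lambda s: s["Sm_pred"], reverse=True)
    return scenarios[:top_n]


# Hypothetical grid of governing variables.
worst = sweep(
    toy_surrogate,
    h_over_d_values=[1.0, 2.0, 3.0],
    location_values=[0.0, 45.0, 90.0],
    sgm_values=[0.5, 1.0],
)
for scenario in worst:
    print(scenario)
```

Passing the predictor as an argument keeps the sweep independent of the model type, so the same loop could in principle be reused for any of the surrogate models, provided the queried values stay within the parameter ranges covered by the training database.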
It should nevertheless be emphasised that the present framework must be interpreted within its modelling scope. First, the database is generated from two-dimensional plane-strain MPM simulations, so three-dimensional effects such as longitudinal gushing development and spatial arching are not captured. Second, the ground is idealised as homogeneous sandy soil with simplified boundary conditions and a prescribed leakage-channel location and width, rather than layered or composite strata with an explicitly modelled waterproof-system failure. Third, the surrogate models are trained and validated against MPM results, and direct calibration with field monitoring data is still limited. Consequently, the trained models should be applied primarily within the investigated parameter ranges and to similar geological and geometric settings. Future work will incorporate field monitoring data for direct calibration, extend the database to three-dimensional and heterogeneous-stratum conditions and different tunnel geometries through transfer learning and adaptive retraining, introduce variables describing the waterproof-system state, and add uncertainty quantification, so as to broaden the applicable scope of the framework for practical tunnel engineering.