3.1. Overall Prediction Performance of the Proposed Model
To evaluate the overall predictive capability of the proposed ensemble model on the test set, this section first analyzes the model performance from two aspects: prediction consistency and residual distribution. The actual-versus-predicted plot is used to determine whether the model can capture the overall variation trend of the target variable, while the residual plot is further employed to identify whether the model exhibits obvious systematic overestimation, underestimation, or structural errors that vary with the predicted values.
The overall predictive behavior of the proposed ensemble model was examined by comparing the actual and predicted values of the fracture asymmetry index η. As shown in Figure 4a, most sample points lie close to the 1:1 reference line, indicating that the ensemble model can capture the general variation trend of fracture asymmetry within the investigated simulation space. The samples are mainly concentrated in the near-zero region, where the predicted values are generally consistent with the actual values; the coefficient of determination R² reaches 0.8484 (test set, n = 52), indicating that the model provides a stable approximation within the main range of the target variable.
However, several sample points deviate from the 1:1 reference line, especially near the margins of the sample distribution. These deviations indicate that local prediction errors still exist for some individual simulation cases. Under the current small-sample condition, this is reasonable, as the available training data may not fully cover all local response patterns of fracture asymmetry.
The residual distribution is further shown in Figure 4b. Most residuals are scattered around the zero-error line, and no obvious monotonic trend or systematic pattern is observed as the predicted values increase. This indicates that the model does not exhibit a clear overall tendency toward overestimation or underestimation. Although a few residuals are relatively large, they are limited in number and do not dominate the overall error distribution.
Overall, the actual-versus-predicted plot and the residual plot indicate that the proposed ensemble model can reasonably reproduce the simulated fracture asymmetry index η and maintain acceptable prediction stability within the predefined parameter space.
3.2. Expanded Model Comparison
To further evaluate the effectiveness of the proposed ensemble model, several representative baseline models and advanced regression models were selected for extended comparison. The compared models cover simple linear models, regularized linear models, kernel-based models, probabilistic models, and tree-based nonlinear models. By including models with different assumptions and function approximation capabilities, this comparison provides a more comprehensive evaluation of the proposed model in terms of nonlinear relationship capture, interaction-feature utilization, and prediction stability under small-sample conditions.
Figure 5 presents the summarized predicted-versus-actual scatter plots of different models, where “interactions” denotes the second-order interaction features constructed from well spacing, discharge rate, and natural fracture angle. LR, RF, GPR, GBRT, and SVR represent linear regression, random forest, Gaussian process regression, gradient boosted regression trees, and support vector regression, respectively. SVR-RBF denotes support vector regression using the radial basis function kernel. The proposed ensemble model consists of GBRT, RF, and Ridge regression.
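As a minimal sketch of the interaction-feature construction described above, the following snippet builds the three pairwise product terms from the three input variables. The feature names and the numerical values are illustrative assumptions, not taken from the dataset.

```python
from itertools import combinations

# Hypothetical column names for the three inputs named in the text.
BASE_FEATURES = ("well_spacing", "rate", "fracture_angle")

def add_pairwise_interactions(sample):
    """Return a copy of `sample` extended with all pairwise product terms."""
    extended = dict(sample)
    for a, b in combinations(BASE_FEATURES, 2):
        extended[f"{a}*{b}"] = sample[a] * sample[b]
    return extended

# Illustrative values only.
example = add_pairwise_interactions(
    {"well_spacing": 20.0, "rate": 6.0, "fracture_angle": 45.0}
)
```

With three original variables, this yields three product terms, so each sample is described by six features in total.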
As shown in Figure 5, the proposed ensemble model achieves the best overall performance, with the highest R² value of 0.874. These results indicate that, among all compared models, the proposed model provides the most accurate approximation of the fracture asymmetry index.
Table 5 below summarizes the predictive performance metrics of the compared models.
As shown in Table 5, the proposed ensemble model achieved the best overall performance among all compared models, with an R² of 0.874, RMSE of 0.0079, NRMSE of 0.0508, and MAE of 0.0044. These results indicate that the proposed ensemble model provides the most accurate approximation of the fracture asymmetry index. The GBRT model also achieved competitive performance, with an R² of 0.866, RMSE of 0.0082, NRMSE of 0.0524, and MAE of 0.0037, confirming its ability to capture nonlinear relationships in the current dataset. By integrating the complementary strengths of GBRT, RF, and Ridge regression, the ensemble model further improved the overall R², RMSE, and NRMSE, although its MAE (0.0044) remained slightly higher than that of GBRT (0.0037). GPR also showed strong predictive performance, with an R² of 0.842, RMSE of 0.0089, NRMSE of 0.0570, and MAE of 0.0044. This result is consistent with the suitability of Gaussian process regression for small-sample regression problems. Nevertheless, its performance in terms of R², RMSE, and NRMSE was still lower than that of the proposed ensemble model, indicating that combining multiple learners can provide additional robustness. RF achieved an R² of 0.821, showing that tree-based ensemble learning can capture part of the nonlinear coupling among well spacing, discharge rate, and natural fracture angle. SVR-RBF achieved an R² of 0.780, but its prediction accuracy remained lower than that of the proposed ensemble model, GBRT, GPR, and RF.
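For reference, the four metrics reported above can be computed as in the following sketch. The normalization used for NRMSE is not stated in this section; dividing RMSE by the range of the actual values is an assumption here, although the reported pairs are at least consistent with a single normalizing constant (for example, 0.0079/0.0508 ≈ 0.156 and 0.0082/0.0524 ≈ 0.156). The input values are illustrative.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R2, RMSE, NRMSE, and MAE for paired actual/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    # Assumption: NRMSE is RMSE divided by the range of the actual values.
    nrmse = rmse / (max(y_true) - min(y_true))
    mae = sum(abs(r) for r in residuals) / n
    return {"R2": 1.0 - ss_res / ss_tot, "RMSE": rmse, "NRMSE": nrmse, "MAE": mae}

# Illustrative values only.
metrics = regression_metrics([0.0, 0.1, 0.2, 0.3], [0.0, 0.1, 0.2, 0.2])
```

Because MAE weights all residuals linearly while RMSE penalizes large residuals more strongly, two models can rank differently under the two metrics.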
In contrast, the linear and regularized linear models produced substantially lower prediction accuracy. To compare the effect of interaction features, a basic LR model without interaction features was designed, achieving an R² of 0.421; an LR model with interaction features was also implemented, which slightly improved the R² to 0.446. Ridge, Elastic Net, and Lasso showed similar performance, with R² values of 0.437, 0.429, and 0.429, respectively. These results indicate that, although second-order interaction features can provide additional information, linear models are still insufficient to describe the nonlinear relationship between engineering and geological parameters and fracture asymmetry. Therefore, nonlinear learners are necessary for this prediction task.
The performance of LightGBM and XGBoost was also lower than that of the proposed ensemble model. XGBoost achieved an R² of 0.738, while LightGBM achieved an R² of 0.722. This may be related to the limited sample size, under which highly flexible boosting models may not fully exploit their advantages and can become sensitive to the training data distribution. Overall, the comparison demonstrates that the proposed ensemble model achieves a better balance among nonlinear fitting capability, prediction robustness, and model stability. The results also confirm the necessity of combining nonlinear learners with regularized components under small-sample conditions.
3.3. Robustness and Generalization Analysis
Although the above results indicate that the proposed ensemble learning framework can predict fracture asymmetry reasonably well, the reliability of a small-sample surrogate model cannot be judged solely by the accuracy metrics obtained from a single training–testing split. For limited sample data, prediction results may be jointly affected by the data splitting method, model complexity, coverage of the training samples, and prediction uncertainty. In particular, after introducing data augmentation and nonlinear ensemble learning, a high test accuracy may arise either from effective feature learning or from local sample distribution artifacts and model overfitting. Therefore, further examination of the model is required from the perspectives of robustness, complexity control, generalization to different operating conditions, and uncertainty characterization.
Based on the above considerations, this section presents the analysis from four aspects. First, the sensitivity of the model to random data splitting is evaluated through repeated validation to assess whether the prediction performance is statistically stable. Second, a regularization control experiment is conducted to examine the influence of model complexity on prediction results, thereby distinguishing effective fitting from potential overfitting. Third, a leave-one-simulation-out cross-validation is performed to further test the extrapolation capability of the model under unseen operating conditions, evaluating its generalization potential with respect to changes in engineering parameter combinations. Finally, uncertainty assessment is introduced to analyze the error distribution and prediction interval characteristics beyond point predictions, providing supplementary evidence for evaluating model reliability under small-sample conditions.
3.3.1. Repeated Validation
To examine the sensitivity of model performance to random data splitting, a repeated validation analysis was conducted. In small-sample regression tasks, different combinations of training and test samples may lead to noticeable fluctuations in evaluation metrics; therefore, the results from a single training–test split are insufficient to fully characterize the statistical reliability of the model. To reduce the uncertainty caused by such data splitting, multiple random splits of the original samples were performed. In each repetition, the processes of feature construction, training sample augmentation, and model training were all re-executed independently. Meanwhile, the test samples were consistently kept as the original unaugmented data to avoid information leakage.
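The leakage-safe ordering described above (split first, then augment only the training portion) can be sketched as follows. The perturbation scheme, the number of copies, the noise level, and the test fraction are illustrative assumptions; the JS-divergence constraint and the actual model are omitted.

```python
import random

def augment(train, n_copies, sigma, rng):
    """Toy Gaussian perturbation of training inputs only (targets are copied)."""
    out = list(train)
    for x, y in train:
        for _ in range(n_copies):
            out.append(([v + rng.gauss(0.0, sigma) for v in x], y))
    return out

def repeated_splits(data, n_repeats=50, test_fraction=0.2, seed=0):
    """Yield (augmented_train, original_test) pairs, one per random split."""
    rng = random.Random(seed)
    n_test = max(1, round(len(data) * test_fraction))
    for _ in range(n_repeats):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        test = [data[i] for i in idx[:n_test]]
        train = [data[i] for i in idx[n_test:]]
        # Augmentation happens after the split, so no perturbed copy of a
        # test sample can reach the training set.
        yield augment(train, n_copies=2, sigma=0.01, rng=rng), test

# Illustrative data: (inputs, target) pairs.
data = [([float(i), float(i % 3)], 0.01 * i) for i in range(10)]
splits = list(repeated_splits(data, n_repeats=5))
```

In each repetition the test list contains only original samples, while the training list contains the remaining originals plus their perturbed copies.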
Based on the results of 50 repeated validations, the prediction performance was statistically summarized using R², RMSE, NRMSE, and MAE. The mean, standard deviation, and 95% confidence interval of each evaluation metric were further calculated to simultaneously quantify the average prediction accuracy of the model and its degree of fluctuation under different random splits. If these metrics remain stable across multiple splits, it indicates that the model performance is not dominated by any single specific sample split and that the model has good robustness under repeated data splitting conditions.
As shown in Table 6, the model achieved an average R² of 0.8484 over 50 repeated validations, with a 95% confidence interval of [0.8179, 0.8790]. Meanwhile, the mean values of NRMSE, RMSE, and MAE were 0.0687, 0.0079, and 0.0043, respectively, indicating that the model maintained a low prediction error overall under repeated random splits. Furthermore, the standard deviation of R² was 0.1075, reflecting that the model performance is still inevitably affected by data splitting under the small-sample condition. Nevertheless, the confidence interval remains at a consistently high level, suggesting that the proposed prediction framework exhibits good robustness to data splitting.
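The section does not state how the confidence interval was computed. As a plausibility check, the sketch below assumes a Student's t interval for the mean over the 50 repetitions; under that assumption, the reported mean and standard deviation give approximately [0.818, 0.879], which agrees with the reported interval to within rounding.

```python
import math

# Tabulated 97.5% quantile of Student's t with 49 degrees of freedom.
T_CRIT_DF49 = 2.0096

def mean_confidence_interval(mean, std, n, t_crit):
    """Two-sided confidence interval for a mean from summary statistics."""
    half_width = t_crit * std / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = mean_confidence_interval(0.8484, 0.1075, 50, T_CRIT_DF49)
```

Note that this interval describes the uncertainty of the average R², not the spread of R² across individual splits, which is governed by the standard deviation itself.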
3.3.2. Regularization-Control Experiment
For small-sample surrogate models, performance improvement brought by data augmentation does not necessarily equate to enhanced generalization ability. Since Gaussian perturbations mainly generate local samples in the neighborhood of the original samples, their effect is closer to local densification of the existing sample distribution than to the introduction of new physical information or additional engineering scenarios. Therefore, it is necessary to determine whether the accuracy improvement of the final model can be reproduced by conventional complexity control strategies on the original data. To this end, a regularization control experiment was set up. In this experiment, no Gaussian data augmentation or interaction feature expansion was applied; instead, shrinkage regularization models (RidgeCV, LassoCV, and ElasticNetCV) were constructed solely on the original data as the control group. This design separates the regularization effect within the original sample space from the effects of JS-divergence-constrained data augmentation and nonlinear interaction features, thereby clarifying the sources of model performance improvement.
Among these cases, Case A is the baseline model using only the original data. Cases B1–B3 introduce RidgeCV, LassoCV, and ElasticNetCV, respectively, on the basis of the original data. RidgeCV shrinks the model coefficients via an L2 penalty to reduce coefficient instability; LassoCV imposes sparsity constraints and variable selection through an L1 penalty; ElasticNetCV combines L1 and L2 penalties to balance feature selection and coefficient shrinkage. Thus, Cases B1–B3 serve as the regularization control group under purely original data conditions. In contrast, Case E represents the modeling strategy adopted in this paper, namely JS-divergence-constrained Gaussian data augmentation combined with second-order interaction features. Since Case E does not introduce new physical variables or additional simulation scenarios, the focus of this comparison is on determining whether the final accuracy improvement can be reproduced solely by conventional regularization methods on the original data.
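To illustrate the shrinkage effect of the L2 penalty mentioned above, the following sketch uses the closed-form ridge solution on synthetic data. It is not the RidgeCV procedure itself, which additionally selects the penalty strength by cross-validation; the data, coefficients, and penalty values are assumptions made for illustration.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge solution on centered data: (X'X + lam*I)^-1 X'y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=30)

# Coefficient norm for increasing penalty strength.
norms = [np.linalg.norm(ridge_coefficients(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
```

The coefficient norm decreases as the penalty grows, which stabilizes the fit but cannot add nonlinear structure that the linear model lacks.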
As shown in Table 7, regularization processing of the original data improved the model’s predictive performance to some extent. Compared with Case A, Case B1 (Original data + RidgeCV) increased the average R² from 0.6125 to 0.6782 while reducing the RMSE from 0.0186 to 0.0168. LassoCV and ElasticNetCV yielded moderate improvements, with average R² values of 0.6416 and 0.6539, respectively. This indicates that shrinkage regularization can partially alleviate the instability inherent in small-sample fitting, but performance gains remain limited when modeling solely with the original data.
Compared with the original-data regularization control group, Case E exhibits superior predictive performance across all evaluation metrics. Its average R² reaches 0.8484, with a 95% confidence interval of [0.8179, 0.8790]; the RMSE, NRMSE, and MAE are 0.0079, 0.0687, and 0.0043, respectively. Relative to Case B1, the best-performing original-data regularization model, Case E achieves an average R² increase of about 25.1%, while RMSE and MAE are reduced by approximately 53.0% and 65.6%, respectively. This suggests that the performance improvement does not stem solely from coefficient shrinkage or conventional regularization on the original data but rather benefits further from local sample augmentation under JS-divergence constraints, second-order interaction feature representation, and the nonlinear mapping capability of ensemble learning.
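The relative changes in R² and RMSE quoted above follow directly from the Case B1 and Case E values given in the text, taking Case B1 as the reference:

```latex
\frac{R^2_{E} - R^2_{B1}}{R^2_{B1}} = \frac{0.8484 - 0.6782}{0.6782} \approx 0.251,
\qquad
\frac{\mathrm{RMSE}_{B1} - \mathrm{RMSE}_{E}}{\mathrm{RMSE}_{B1}} = \frac{0.0168 - 0.0079}{0.0168} \approx 0.530
```

The MAE reduction is computed in the same way from the Case B1 MAE, which is not restated in this section.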
From a mechanistic perspective, the JS-divergence-constrained Gaussian data augmentation does not introduce new physical variables, additional simulation scenarios, or external engineering information. Its function is to generate controlled perturbed samples in the neighborhood of the original samples and to use distributional similarity constraints to prevent the augmented samples from deviating significantly from the original data distribution. Therefore, this augmentation process does not create new physical information but rather increases the local coverage density within the limited sample space. Meanwhile, second-order interaction features explicitly represent the coupling among well spacing, injection rate, and natural fracture angle, while ensemble learning reduces the sensitivity of a single model to local sample noise by blending diverse base learners. If the accuracy improvement mainly came from an interpolation structure made to more easily fit the augmented data, then the regularization models based on the original data (Cases B1–B3) should have achieved comparable performance. However, their improvements are clearly limited, indicating that simple shrinkage regularization cannot reproduce the predictive performance of Case E.
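The distributional similarity constraint can be illustrated with the following one-dimensional sketch, which perturbs a set of values with Gaussian noise and accepts the augmented set only if the Jensen-Shannon divergence between the binned original and perturbed distributions stays below a threshold. The binning, noise level, threshold, and acceptance rule are all assumptions for illustration; the augmentation procedure actually used may differ.

```python
import math
import random

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def histogram(values, lo, hi, bins):
    """Normalized histogram of `values` on [lo, hi] with equal-width bins."""
    counts = [0] * bins
    for v in values:
        k = min(bins - 1, max(0, int((v - lo) / (hi - lo) * bins)))
        counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts]

def augment_with_js_check(values, sigma, js_max, bins=5, seed=0):
    """Perturb 1-D values with Gaussian noise; keep the perturbed copies only
    if the binned distribution stays within `js_max` of the original."""
    rng = random.Random(seed)
    perturbed = [v + rng.gauss(0.0, sigma) for v in values]
    lo, hi = min(values + perturbed), max(values + perturbed)
    jsd = js_divergence(histogram(values, lo, hi, bins),
                        histogram(perturbed, lo, hi, bins))
    return (values + perturbed if jsd <= js_max else list(values)), jsd

# Illustrative values and threshold.
augmented, jsd = augment_with_js_check([0.0, 0.1, 0.2, 0.3, 0.4], sigma=0.01, js_max=0.5)
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (disjoint supports), which makes a fixed threshold easy to interpret.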
Thus, this comparison demonstrates that the performance gain of the final model does not merely arise from conventional regularization or the convenience of fitting due to altered data morphology, but more likely from the combined effect of distribution-controlled data augmentation, expression of parametric interaction relationships, and the stabilization mechanism of ensemble learning. The subsequent leave-one-simulation-condition-out validation will further test whether this performance improvement extends to unseen simulation conditions, thereby providing a more rigorous assessment of the model’s cross-scenario generalization ability.
3.3.3. Leave-One-Simulation-Condition-Out Validation
In Section 3.3.1, repeated random partitioning validation was used to evaluate the sensitivity of model performance to different splits of training and test sets. The results showed that the proposed model maintains relatively stable predictive accuracy under different random splits. However, the training and test sets under random partitioning may still contain similar combinations of simulation conditions, so the results mainly reflect the model’s prediction stability within the distribution of the existing samples. To further examine the model’s generalization ability to unseen original simulation conditions, this section adopts a leave-one-simulation-condition-out validation.
In this validation, one specific simulation condition is held out from the original simulation database each time as an independent test sample. This condition consists of a particular combination of well spacing, injection rate, and natural fracture angle. The remaining original conditions are used as the training basis, and only under these training conditions are JS-divergence-constrained data augmentation and feature construction performed. The held-out test condition does not participate in data augmentation, feature construction, or model training and remains as an unaugmented original sample. If the original database contains N simulation conditions, the process is repeated N times so that each original condition sequentially serves as an independent test sample. This procedure prevents augmented samples generated from the same original condition from entering both the training and test sets, thereby reducing the risk of information leakage due to data augmentation.
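The fold structure of this procedure can be sketched as follows. The condition tuples are hypothetical placeholders; in each fold, augmentation, feature construction, and model training would be applied to the training conditions only.

```python
def leave_one_condition_out(conditions):
    """Yield (train_conditions, held_out_condition) for every original condition."""
    for i, held_out in enumerate(conditions):
        train = conditions[:i] + conditions[i + 1:]
        yield train, held_out

# Hypothetical (well spacing, injection rate, natural fracture angle) combinations.
conditions = [(15, 4, 30), (15, 6, 60), (20, 4, 45), (25, 8, 90)]

folds = list(leave_one_condition_out(conditions))
```

Each of the N conditions appears exactly once as the held-out sample, and it never appears in its own training set.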
Holding out one specific combination of conditions at a time matches the structure of the current small-sample database and, while preserving the size of the training set, allows the model’s predictive ability to be tested on original conditions not involved in training. Finally, R², RMSE, NRMSE, and MAE are used to evaluate the overall predictive performance of the model across all held-out conditions.
As shown in Table 8, under this leave-one-simulation-condition-out validation, the Ensemble model achieves the best overall performance, with R², RMSE, NRMSE, and MAE of 0.58, 0.0142, 0.1235, and 0.0100, respectively. The prediction accuracies of RF and GBRT are slightly lower than that of the ensemble model, indicating that nonlinear tree models can still capture some of the asymmetric fracture response patterns. In contrast, the Ridge model yields an R² of 0.47, with RMSE and MAE of 0.0158 and 0.0112, respectively, suggesting that while the linear regularized model exhibits some stability, its ability to characterize nonlinear responses under varying complex conditions is relatively limited.
These results further indicate that the ensemble model maintains reasonably good predictive capability under strictly unseen condition validation, but its accuracy is notably lower than the previously reported predictive performance. This shows that JS-divergence-constrained data augmentation and interaction features can improve local learning stability within the sampled parameter space but cannot fully replace the physical response information provided by adding new real simulation conditions. Therefore, for boundary conditions or parameter combinations with strong response variations, further calibration using additional numerical simulations or uncertainty analysis is still necessary.
3.3.4. Uncertainty Assessment
The repeated validation described above provides the mean, standard deviation, and 95% confidence intervals of evaluation metrics such as R², RMSE, NRMSE, and MAE over 50 random splits, which are used to characterize the statistical stability of the overall model predictive performance. However, the object of such confidence intervals is the evaluation metrics themselves, not the possible range of variation in individual sample predictions. Therefore, even if the model exhibits a high average R² and low error levels under repeated validation, it is still necessary to further analyze the uncertainty of individual prediction results.
To this end, this section introduces prediction intervals as a supplementary evaluation of the reliability of model outputs. A prediction interval indicates the possible range of values that the model gives for a prediction on a given sample. A wider interval generally implies greater uncertainty for that sample’s prediction, whereas a narrower interval indicates a more concentrated prediction range, though an overly narrow interval may fail to cover the true response. PI coverage denotes the proportion of samples for which the true value falls within the prediction interval and is used to assess the coverage capability of the prediction interval. Mean PI width is the average width of the prediction intervals, and PINAW is the ratio of the average prediction interval width to the range of the target variable, serving as a measure of the relative width of the prediction intervals. In general, a higher PI coverage indicates that the prediction interval more adequately encloses the true response, while a lower PINAW indicates a relatively narrower interval; however, the two need to be analyzed together, because an overly narrow interval, despite having a low PINAW, may lead to insufficient coverage.
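The three interval metrics defined above can be computed as in the following sketch, which follows the definitions given in the text (coverage as the fraction of true values inside the interval, PINAW as mean width divided by the range of the target). The numerical values are illustrative only.

```python
def interval_metrics(y_true, lower, upper):
    """PI coverage, mean PI width, and PINAW for a set of prediction intervals."""
    n = len(y_true)
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    target_range = max(y_true) - min(y_true)
    return {
        "coverage": covered / n,
        "mean_width": mean_width,
        "PINAW": mean_width / target_range,
    }

# Illustrative values: two of the four true values fall inside their intervals.
result = interval_metrics(
    y_true=[0.00, 0.05, 0.10, 0.20],
    lower=[-0.01, 0.00, 0.11, 0.15],
    upper=[0.01, 0.04, 0.15, 0.25],
)
```

In this toy case the intervals are fairly narrow relative to the target range (PINAW of 0.25) yet cover only half of the true values, which is the kind of trade-off the two metrics are meant to expose when read together.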
This paper adopts two methods for prediction interval estimation: Bootstrap ensemble and Quantile GBRT. The Bootstrap ensemble repeatedly resamples the training data with replacement and retrains the model on each resampled dataset, thereby obtaining a set of predictions for the same sample; the dispersion of this set of predictions can be used to construct empirical prediction intervals. Thus, the Bootstrap ensemble primarily reflects the prediction fluctuation of the model under sample perturbations. Quantile GBRT, on the other hand, employs quantile regression to estimate the lower and upper quantiles of the target response, directly yielding a prediction interval at a given confidence level. Unlike the Bootstrap ensemble, which relies on the prediction distribution from resampling, Quantile GBRT focuses more on characterizing the upper and lower bounds of the conditional response distribution. Therefore, these two methods provide complementary perspectives for evaluating the credible range of model outputs under small-sample conditions.
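A minimal sketch of the Bootstrap ensemble idea is given below. For brevity, a one-dimensional least-squares line stands in for the actual ensemble model, and the interval is taken from the empirical percentiles of the resampled predictions; the base learner, number of resamples, and data are assumptions.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (toy base learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def bootstrap_interval(xs, ys, x_new, n_boot=200, level=0.95, seed=0):
    """Percentile interval of predictions from models refit on resampled data."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a + b * x_new)
    preds.sort()
    lo_k = int((1 - level) / 2 * (n_boot - 1))
    hi_k = int((1 + level) / 2 * (n_boot - 1))
    return preds[lo_k], preds[hi_k]

# Synthetic data (illustrative only).
rng = random.Random(1)
xs = [i / 10 for i in range(20)]
ys = [0.02 * x + rng.gauss(0.0, 0.01) for x in xs]
lo, hi = bootstrap_interval(xs, ys, x_new=1.0)
```

An interval built this way reflects only the variation of the fitted models across resamples and contains no term for the residual noise of individual observations, so it tends to be narrower than an interval targeting the conditional response distribution.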
Table 9 presents a comparison of point prediction and prediction-interval performance.
Table 9 summarizes the point prediction accuracy and prediction interval reliability under five-fold cross-validation on the augmented data. Point ensemble achieves the highest deterministic prediction accuracy, with an R² of 0.8637, RMSE of 0.0082, NRMSE of 0.0528, and MAE of 0.0058. These numerical values of the evaluation metrics lie within the intervals obtained from repeated validation, lending credibility to the results. This indicates that the constructed ensemble model can approximate the values of the fracture asymmetry indicator reasonably well. However, Point ensemble provides only a single predicted value for each sample and cannot directly reflect the uncertainty range of individual predictions; therefore, its reliability needs to be further evaluated using prediction intervals.
Bootstrap ensemble also maintains high point prediction accuracy, with an R² of 0.8290 and an NRMSE of 0.0592. However, its 95% prediction interval coverage is only 0.3594, while the mean PI width and PINAW are 0.0123 and 0.0792, respectively. This indicates that the prediction intervals given by Bootstrap ensemble are rather narrow, but the true values of many validation samples fall outside these intervals. Therefore, under the current small-sample conditions, constructing prediction intervals solely from the dispersion among Bootstrap resampling models may underestimate the uncertainty of model predictions. In contrast, the point prediction accuracy of the Quantile GBRT 95% prediction interval is lower than that of Point ensemble, with an R² of 0.6168, RMSE of 0.0138, NRMSE of 0.0886, and MAE of 0.0078, but its prediction interval coverage reaches 0.9102. The corresponding mean PI width and PINAW are 0.0515 and 0.3306, respectively, indicating that this method provides wider and more conservative uncertainty ranges. This result suggests that, under conditions of limited sample size and nonlinear fluctuations in the local response, wider prediction intervals can more adequately encompass the true responses of the validation samples, thereby avoiding overly optimistic interpretations of the model predictions.
Figure 6 further shows the 95% prediction intervals obtained by the Quantile GBRT method. The validation samples are sorted by the actual values of the fracture asymmetry indicator η, where the blue curve represents the actual values, the orange curve represents the predicted values, and the light blue shaded area represents the 95% prediction intervals. It can be seen that most of the actual values fall within the prediction intervals, which is consistent with the actual coverage (95% PI coverage = 0.9102) of Quantile GBRT in Table 9. The mean prediction interval width with this method is 0.0515 (PINAW = 0.3306), indicating that it covers about 91.02% of the true responses of the validation samples through a relatively conservative interval range. In contrast, the PINAW of the Bootstrap ensemble is only 0.0792, but its coverage is only 0.3594, indicating that its prediction intervals are too narrow and underestimate the prediction uncertainty under small-sample conditions.
Overall, Point ensemble provides high-accuracy point predictions, while Quantile GBRT can provide more reliable sample-level uncertainty ranges. Therefore, this paper not only uses R², RMSE, NRMSE, and MAE to evaluate the deterministic prediction accuracy of the model but also incorporates prediction interval information to further assess the reliability of the model in predicting fracture asymmetry.
3.4. Interpretability and Feature-Importance Robustness Analysis
3.4.1. Correlation and Multicollinearity Diagnostics
After evaluating the model’s prediction accuracy and prediction uncertainty, it is still necessary to further analyze the mechanism of feature influence underlying the model’s predictions. The previous results show that the proposed model can predict the fracture asymmetry indicator (η) well, but the error metrics and prediction intervals alone cannot explain how different input variables affect the model output. Therefore, this section proceeds to analyze feature relationships and model interpretability to reveal the contributions of well spacing, injection rate, natural fracture angle, and their interactions to fracture asymmetry. To avoid interference from input variable correlation or multicollinearity in the subsequent feature contribution analysis, a correlation and multicollinearity analysis of the input variables is first conducted.
Figure 7 below presents the Pearson correlation coefficients and Spearman rank correlation coefficients among well spacing, injection rate, and natural fracture angle. Pearson correlation mainly reflects the degree of linear correlation between variables, while Spearman rank correlation further determines whether a monotonic relationship exists between variables.
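The distinction between the two coefficients can be made concrete with a small sketch: Spearman correlation is the Pearson correlation of the ranks, so a monotonic but nonlinear relationship gives a Spearman coefficient of 1 and a Pearson coefficient below 1. The data are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic but nonlinear
r_pearson, r_spearman = pearson(x, y), spearman(x, y)
```

Reporting both coefficients therefore guards against overlooking a monotonic dependence that is not linear.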
As shown in Figure 7a, the Pearson correlation coefficients among the three original input variables are all close to zero. Specifically, the correlation coefficient between well spacing and injection rate is 0.00256, that between well spacing and natural fracture angle is −0.00334, and that between injection rate and natural fracture angle is 0.041. The Spearman rank correlation coefficients in Figure 7b also remain at low levels, with a maximum absolute value of only 0.0825. These results indicate that there is no significant linear or monotonic correlation among the original input variables, suggesting that the main engineering control parameters in the dataset are essentially uncorrelated.
Furthermore, multicollinearity is diagnosed using the variance inflation factor (VIF). The definition of the variance inflation factor is given by the following formula:
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the coefficient of determination obtained from a linear regression using the i-th feature as the dependent variable and the remaining features as independent variables.
According to commonly used empirical criteria in regression diagnostics, VIF < 5 generally indicates no significant harmful multicollinearity; 5 ≤ VIF < 10 suggests that some degree of multicollinearity may exist but is still acceptable; and VIF ≥ 10 typically indicates strong multicollinearity, requiring further consideration of variable selection, combination, or model re-specification.
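The VIF definition can be implemented directly by regressing each feature on the remaining ones, as in the following sketch. The synthetic data are assumptions chosen to contrast nearly independent columns with a deliberately collinear pair; they are not the study data.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_i = 1 / (1 - R_i^2), regressing column i on the other columns plus an intercept."""
    n, p = X.shape
    vifs = []
    for i in range(p):
        target = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r2 = 1.0 - residual @ residual / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
independent = rng.normal(size=(200, 3))
collinear = independent.copy()
collinear[:, 2] = collinear[:, 0] + 0.05 * rng.normal(size=200)

vif_independent = variance_inflation_factors(independent)
vif_collinear = variance_inflation_factors(collinear)
```

Nearly independent columns give VIF values close to 1, whereas the two columns that almost duplicate each other give values far above the threshold of 10.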
Figure 8 below shows the VIF plot for the original data in this paper.
As can be seen from the results in Figure 8, the VIF values of the three original input variables are all close to 1, far below the commonly used threshold of 5, indicating that there is no serious multicollinearity problem in the original feature space. Therefore, the high contributions exhibited by natural fracture angle, well spacing, and their interaction terms in the subsequent model interpretation are not caused by strong correlations or multicollinearity among the original input variables but more likely reflect the actual controlling effects of these engineering parameters on the fracture asymmetry response.
At the level of the original input variables, the Pearson and Spearman correlation analyses as well as the VIF results all indicate that there is no significant correlation or severe multicollinearity among well spacing, injection rate, and natural fracture angle. These findings suggest that the original feature space has good statistical independence, and that the subsequent feature contribution analysis is unlikely to be directly disturbed by strong correlations among the original variables. However, because this paper further introduces second-order interaction terms to characterize the coupling effects among engineering parameters, and these interaction terms are constructed by multiplying the original engineering parameters, they may introduce new variable correlations and feature redundancy while enhancing the nonlinear representation capability of the model. Therefore, this paper further conducts a correlation analysis between the original features and the interaction features to evaluate the impact of feature construction on model stability and interpretability.
Figure 9 below shows the correlation diagnostics of second-order interaction features.
On this basis, the correlation structure after constructing the second-order interaction features is further analyzed.
Figure 9 shows the Pearson and Spearman correlation matrices for the original variables together with their second-order interaction terms. Unlike the near-zero correlations among the original variables, moderate to strong correlations appear between some product features and their corresponding original variables after introducing the interaction terms. The Pearson correlation results show that the correlation coefficients between well spacing and “well spacing × injection rate” and between well spacing and “well spacing × natural fracture angle” are 0.721 and 0.641, respectively; between injection rate and “well spacing × injection rate” and between injection rate and “injection rate × natural fracture angle” they are 0.750 and 0.663, respectively; and between natural fracture angle and “well spacing × natural fracture angle” and between natural fracture angle and “injection rate × natural fracture angle” they are 0.719 and 0.781, respectively. The Spearman rank correlations show similar patterns: for example, the rank correlation coefficients between well spacing and its related interaction terms are approximately 0.710–0.714; between injection rate and its related interaction terms, approximately 0.615–0.696; and between the natural fracture angle and its related interaction terms, approximately 0.550–0.568.
Such correlations do not imply severe collinearity among the original variables; rather, they reflect structural correlations introduced by the construction of second-order interaction features. Because a product term inherently contains information about its constituent original variables, it is expected to correlate with them. For example, “well spacing × natural fracture angle” carries information about both well spacing and natural fracture angle and therefore correlates with both (Pearson coefficients of 0.641 and 0.719, respectively). Similarly, the Pearson correlation coefficient between “injection rate × natural fracture angle” and natural fracture angle reaches 0.781, indicating that this interaction term contains strong information about the fracture angle. From a statistical perspective, these results indicate that the second-order interaction terms do not simply increase the number of variables but explicitly embed coupling information among engineering parameters into the feature space.
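The structural origin of these correlations can be illustrated with a small simulation. The sketch below is not the study's data or code: the variable ranges are hypothetical, and the two inputs are assumed to be independent with positive means. Under those assumptions, the correlation between X and the product X·Y has the closed form 1/sqrt(1 + CV_Y² (1 + 1/CV_X²)), where CV denotes the coefficient of variation, so a sizeable correlation arises even when X and Y are uncorrelated.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def cv_uniform(lo, hi):
    """Coefficient of variation (std / mean) of a Uniform(lo, hi) variable."""
    return ((hi - lo) / math.sqrt(12.0)) / ((lo + hi) / 2.0)

def expected_corr(cv_x, cv_y):
    """Closed-form corr(X, X*Y) for independent X, Y with positive means."""
    return 1.0 / math.sqrt(1.0 + cv_y ** 2 * (1.0 + 1.0 / cv_x ** 2))

rng = random.Random(0)
n = 20000
# Hypothetical ranges, chosen only for illustration (not the study's design).
spacing = [rng.uniform(100.0, 300.0) for _ in range(n)]
rate = [rng.uniform(6.0, 18.0) for _ in range(n)]
product = [s * r for s, r in zip(spacing, rate)]

r_inputs = pearson(spacing, rate)       # near zero: inputs are independent
r_product = pearson(spacing, product)   # clearly positive: structural correlation
r_theory = expected_corr(cv_uniform(100.0, 300.0), cv_uniform(6.0, 18.0))

print(f"corr(spacing, rate)           = {r_inputs:+.3f}")
print(f"corr(spacing, spacing * rate) = {r_product:+.3f}")
print(f"closed-form value             = {r_theory:+.3f}")
```

With equal coefficients of variation, as in this toy setup, the closed form reduces to 1/sqrt(2 + CV²), which is about 0.69 here. A value of this size arises purely from how the product feature is constructed, without any dependence between the inputs themselves.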
3.4.2. SHAP-Based Feature Contribution Analysis
Building on the correlation and multicollinearity diagnostics, this section investigates the mechanisms controlling fracture asymmetry in greater detail by analyzing SHAP-based feature importance, the interaction between natural fractures and well spacing, and feature dependence patterns.
Figure 10 presents the SHAP feature importance ranking. SHAP (SHapley Additive exPlanations) explains black-box models by quantifying the contribution of each input feature to individual predictions, thereby providing both global and local interpretability [69, 70, 71].
Figure 10 quantifies the influence of each input feature on fracture asymmetry using the mean absolute SHAP value (ranging from 0 to 0.012). This analysis covers all core input parameters considered in this study (see Table 5). The mean absolute SHAP value is the average magnitude of a feature’s contribution to the model’s predictions across samples; a higher value therefore indicates a stronger influence and provides a direct ranking of feature importance.
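As a minimal illustration of this ranking metric, the sketch below averages the absolute SHAP values per feature and sorts the result. The SHAP values are made up for the example and are not results from this study; the function name `mean_abs_shap` is likewise hypothetical.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up SHAP values: one row per sample, one column per feature.
features = ["well_spacing", "injection_rate", "natural_fracture_angle"]
shap_rows = [
    [0.002, -0.001, 0.020],
    [-0.003, 0.001, -0.010],
    [0.001, -0.001, 0.015],
]

ranking = mean_abs_shap(shap_rows, features)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Taking the absolute value before averaging matters because positive and negative contributions would otherwise partly cancel, which would understate the importance of a feature that pushes predictions in both directions.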
As Figure 10 shows, the natural fracture angle is the most influential feature in the global ranking, with a mean absolute SHAP value of approximately 0.012, substantially higher than those of the other input variables. It is therefore the dominant controlling factor for the predicted fracture asymmetry.
Figure 11 presents the SHAP summary plot, which is used to analyze the direction in which different feature values contribute to the predicted fracture asymmetry index η. The horizontal axis represents the SHAP value: positive values indicate that the feature increases the predicted value of η, and negative values indicate that it decreases η. The color scale from blue to purple represents feature values from low to high. The natural fracture angle exhibits the widest SHAP value distribution, indicating that it has the strongest impact on the model predictions. Low natural fracture angles mainly correspond to positive SHAP values, while high angles are distributed mostly near zero or in the negative SHAP region, suggesting that, under the current signed index definition, a smaller natural fracture angle tends to increase η, whereas a larger angle tends to decrease η or weaken its positive contribution. The interaction term between well spacing and the natural fracture angle ranks second in importance, indicating that well spacing modulates the influence of natural fracture orientation on the fracture asymmetry response. It should be noted that, since η is a signed index, “decrease” here refers to a reduction in the algebraic value of the prediction and does not necessarily imply a reduction in the degree of fracture asymmetry.
The above SHAP results are consistent with the interaction mechanism between hydraulic fractures and natural fractures. The natural fracture angle determines the intersection relationship between hydraulic fractures and pre-existing weak planes, thereby affecting whether the hydraulic fracture crosses the natural fracture, deflects along the weak plane, or forms local branches. Renshaw and Pollard [72] proposed an experimentally verified criterion for fracture propagation across unbounded frictional interfaces in brittle linear elastic materials, providing a mechanical basis for evaluating whether hydraulic fractures can cross or be arrested by pre-existing weak interfaces. Gu et al. [73] further extended this criterion to non-orthogonal intersections between hydraulic fractures and natural fractures, pointing out that the intersection angle is a key factor controlling whether the hydraulic fracture crosses or deflects. The multi-branch hydraulic fracture model by Dahi-Taleghani and Olson [74] shows that interactions between induced fractures and natural fractures lead to complex fracture geometries and asymmetric propagation behavior. The emergence of the natural fracture angle as the dominant feature in the SHAP analysis is therefore physically plausible and suggests that the model captures the controlling effect of natural fracture guidance on fracture propagation paths and asymmetric responses.
Well spacing mainly affects fracture asymmetry by modulating the stress shadow effect between fractures from adjacent wells. A smaller well spacing enhances the superposition of stress fields induced by fractures from neighboring wells, alters the local principal stress direction and the stress state near the fracture tip, and thus further influences the effective intersection relationship between hydraulic fractures and natural fractures. Existing complex fracture network models and three-dimensional hydraulic fracture stress shadow studies have shown that mechanical interactions between fractures significantly modify the local stress perturbation range, fracture propagation direction, and competitive fracture growth behavior [75, 76]. Consequently, the effect of well spacing on fracture asymmetry is not primarily an independent control of fracture paths but rather a modulating role by changing the local stress environment in which natural-fracture-guided propagation occurs. This also explains why the “well spacing × natural fracture angle” interaction term emerges as a secondary but significant interactive feature in the SHAP analysis: it reflects the coupled modulation mechanism between inter-well stress interference and the guiding effect of natural fractures.
To test whether the SHAP feature ranking is sensitive to a single data split or to small-sample fluctuations, a bootstrap resampling procedure is used to analyze the stability of the mean absolute SHAP values. The procedure constructs multiple resampled datasets by sampling the training data with replacement, retrains the model on each resampled dataset, and recomputes the SHAP values. The resulting distribution of mean absolute SHAP values characterizes how much the feature contributions fluctuate under sample perturbations and thus indicates whether the feature importance ranking is stable.
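The resampling loop described above can be sketched as follows. This is an illustrative, dependency-free sketch rather than the procedure used in this study: a linear least-squares surrogate is swapped in for the ensemble model, because for a linear model the SHAP value of feature j on sample i has the simple closed form w_j (x_ij − mean_j), so no SHAP library is needed. The synthetic data, the number of resamples (`n_boot=200`), and the percentile construction of the 95% interval are all assumptions.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (Gauss-Jordan elimination)."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented system [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return M[0][p], [M[i][p] for i in range(1, p)]  # intercept, weights

def mean_abs_shap_linear(X, weights):
    """Mean |SHAP| per feature for a linear model: phi_ij = w_j * (x_ij - mean_j)."""
    n = len(X)
    result = []
    for j, w in enumerate(weights):
        mean_j = sum(x[j] for x in X) / n
        result.append(sum(abs(w * (x[j] - mean_j)) for x in X) / n)
    return result

def bootstrap_mean_abs_shap(X, y, n_boot=200, seed=0):
    """Return (mean, lower, upper) of mean |SHAP| per feature over bootstrap refits."""
    rng = random.Random(seed)
    n = len(X)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        _, weights = fit_linear(Xb, yb)             # retrain on the resample
        draws.append(mean_abs_shap_linear(Xb, weights))
    summary = []
    for j in range(len(X[0])):
        vals = sorted(d[j] for d in draws)
        lower = vals[int(0.025 * (n_boot - 1))]     # approximate 2.5th percentile
        upper = vals[int(0.975 * (n_boot - 1))]     # approximate 97.5th percentile
        summary.append((sum(vals) / n_boot, lower, upper))
    return summary

# Synthetic data: feature_c dominates by construction (assumption for the demo).
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [0.2 * a + 0.05 * b + 1.0 * c + rng.gauss(0, 0.1) for a, b, c in X]
names = ["feature_a", "feature_b", "feature_c"]

summary = bootstrap_mean_abs_shap(X, y)
for name, (mean, lower, upper) in zip(names, summary):
    print(f"{name}: mean |SHAP| = {mean:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

To apply the same loop to a nonlinear ensemble, `fit_linear` and `mean_abs_shap_linear` would be replaced by the actual training routine and a SHAP explainer. A ranking can be regarded as stable when the interval of the top feature does not overlap those of the lower-ranked features.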
Figure 12 shows the Bootstrap mean absolute SHAP values with 95% confidence intervals.
The results show that the natural fracture angle consistently has the highest mean absolute SHAP value under bootstrap resampling, and its 95% confidence interval is clearly separated from those of the other features. This indicates that its dominant contribution is not driven by a single training run or a specific resampled dataset. The interaction term between well spacing and natural fracture angle ranks second in importance, and its confidence interval is also separated from those of the remaining lower-ranked features, suggesting that this interaction term provides a stable explanatory contribution under different sample perturbations. In contrast, the mean absolute SHAP values of the well spacing–injection rate interaction term, injection rate alone, well spacing alone, and the injection rate–natural fracture angle interaction term are relatively small and show overlapping confidence intervals, indicating that these features mainly act as secondary moderating factors.
Overall, the bootstrap SHAP analysis provides additional evidence, at the level of the constructed features, for the stability of the above interpretation. The prediction of fracture asymmetry is primarily controlled by the natural fracture angle and is further influenced by its interaction with well spacing. However, SHAP analysis evaluates the contribution of individual constructed features rather than the grouped effect of the original physical variables. Grouped permutation importance is therefore used in the next section as a supplementary analysis to examine the robustness of this interpretation at the original-variable level.
3.4.3. Grouped Permutation Importance
To examine the feature interpretation results from the perspective of the original physical variables, a bootstrap grouped permutation importance analysis is adopted. Unlike SHAP analysis, which evaluates the contribution of each constructed feature individually, the grouped permutation method perturbs all features related to each original input variable (i.e., the variable itself and its associated interaction terms) as a whole. If perturbing a variable group leads to a large increase in model prediction error, the original variable and its associated coupling information contribute strongly to the model predictions.
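A minimal sketch of the grouped perturbation is given below. It is illustrative only: the toy model, the synthetic inputs, and the column layout are assumptions, and the score is averaged over repeated shuffles rather than over bootstrap resamples. Each group's columns are shuffled jointly with the same row permutation, which is one possible reading of perturbing a group "as a whole"; note that every interaction column belongs to two groups.

```python
import math
import random

def rmse(model, X, y):
    """Root-mean-square error of a callable model on rows X with targets y."""
    return math.sqrt(sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y))

def grouped_permutation_importance(model, X, y, groups, n_repeats=30, seed=0):
    """Mean RMSE increase when all columns of a group are shuffled together."""
    rng = random.Random(seed)
    baseline = rmse(model, X, y)
    importance = {}
    for name, cols in groups.items():
        increases = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            X_perm = [list(row) for row in X]
            # Copy the group's columns from a shuffled source row, keeping the
            # columns of one group mutually consistent.
            for i, src in enumerate(order):
                for c in cols:
                    X_perm[i][c] = X[src][c]
            increases.append(rmse(model, X_perm, y) - baseline)
        importance[name] = sum(increases) / n_repeats
    return importance

# Hypothetical column layout: [s, q, a, s*q, s*a, q*a] for well spacing s,
# injection rate q, and natural fracture angle a (all scaled to [0, 1]).
rng = random.Random(42)
X = []
for _ in range(200):
    s, q, a = rng.random(), rng.random(), rng.random()
    X.append([s, q, a, s * q, s * a, q * a])

def toy_model(row):
    """Stand-in predictor that uses only the angle and the spacing-angle term."""
    return row[2] + 0.5 * row[4]

y = [toy_model(row) for row in X]  # noise-free targets, so baseline RMSE is 0

groups = {
    "well_spacing": [0, 3, 4],
    "injection_rate": [1, 3, 5],
    "natural_fracture_angle": [2, 4, 5],
}
importance = grouped_permutation_importance(toy_model, X, y, groups)
for name, value in importance.items():
    print(f"{name}: RMSE increase = {value:.4f}")
```

In this toy setup the angle group produces the largest RMSE increase, the spacing group a smaller one through the shared interaction column, and the injection rate group none, because the stand-in model never reads those columns.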
Figure 13 shows that permuting the natural fracture angle group causes the largest increase in RMSE, approximately 0.021, indicating that destroying information related to the natural fracture angle markedly reduces model prediction accuracy. The natural fracture angle is therefore the most important original controlling variable for fracture asymmetry prediction. Permuting the well spacing group leads to an RMSE increase of about 0.011, which is lower than that of the natural fracture angle but clearly higher than that of the injection rate. Thus, although well spacing is not the dominant independent controlling factor, it still makes an important contribution to fracture asymmetry prediction through its influence on inter-well stress shadow intensity and its coupling with the natural fracture angle. This result is consistent with the SHAP analysis: fracture asymmetry prediction is primarily controlled by the natural fracture angle and modulated by the coupling effects related to well spacing, while the contribution of injection rate is relatively weak.
Taken together, the correlation analysis, the SHAP interpretation, and the grouped permutation importance results show that the fracture asymmetry prediction does not simply arise from statistical correlations among input variables but is closely related to natural fracture orientation and inter-well stress interference conditions. The natural fracture angle exhibits the strongest influence in both the SHAP analysis and the grouped permutation analysis, indicating that it plays a dominant role in hydraulic fracture crossing, deflection, and propagation along natural fractures. The influence of well spacing is mainly reflected in altering the stress shadow range and interference intensity between adjacent well fractures, and it further affects the asymmetric fracture response through its coupling with the natural fracture angle. In contrast, within the current parameter range, the injection rate and its associated interaction terms have a relatively weak impact on the prediction results, suggesting that injection rate is not a primary factor controlling fracture asymmetry. These results indicate that the proposed interpretable learning framework not only achieves good predictive accuracy but also identifies key influencing factors and interactions that are consistent with the physical mechanisms of multi-well fracturing.