Article

Research on Post-Fracturing Productivity Prediction and Interpretability for Tight-Gas Reservoirs Based on Ensemble Learning

1
CNPC Greatwall Drilling Co., Beijing 100012, China
2
CNPC Greatwall Drilling Co., Fracturing Co., Panjin 124000, China
3
Key Laboratory of Enhanced Oil Recovery of Ministry of Education, Northeast Petroleum University, Daqing 163318, China
*
Author to whom correspondence should be addressed.
Processes 2025, 13(10), 3132; https://doi.org/10.3390/pr13103132
Submission received: 18 August 2025 / Revised: 20 September 2025 / Accepted: 24 September 2025 / Published: 29 September 2025
(This article belongs to the Topic Enhanced Oil Recovery Technologies, 4th Edition)

Abstract

Data-driven modeling methods have been preliminarily applied in the development of tight-gas reservoirs, demonstrating unique advantages in post-fracturing productivity prediction. However, most of the established predictive models are “black-box” models, which provide productivity predictions based on a set of input parameters without revealing the internal prediction mechanisms. This lack of transparency reduces the credibility and practical utility of such models. To address the challenges of poor performance and low trustworthiness of “black-box” machine learning models, this study explores a data-driven approach to “black-box” predictive modeling by integrating ensemble learning with interpretability methods. The results indicate the following: The post-fracturing productivity prediction model for tight-gas reservoirs developed in this study, based on ensemble learning, achieves a goodness of fit of 0.923, representing a 26.09% improvement compared to the best-performing individual machine learning model. The stacking ensemble model predicts post-fracturing productivity for horizontal wells more accurately and effectively mitigates the prediction biases of individual machine learning models. An interpretability method for the “black-box” ensemble learning-based productivity prediction model was established, revealing the ranked importance of factors influencing post-fracturing productivity: reservoir properties, controllable operational parameters, and rock mechanics. This ranking aligns with the results of orthogonal experiments from mechanism-driven numerical models, providing mutual validation and enhancing the credibility of the ensemble learning-based productivity prediction model. In conclusion, this study integrates mechanistic numerical models and data-driven models to explore the influence of various factors on post-fracturing productivity. 
The cross-validation of results from both approaches underscores the reliability of the findings, offering theoretical and methodological support for the design of fracturing schemes and the iterative advancement of fracturing technologies in tight-gas reservoirs.

1. Introduction

The low-permeability reservoirs in the Ordos Basin are composed mainly of fluvial sand bodies, with complex connectivity between multiple sets of sand bodies and strong reservoir heterogeneity. Most reservoirs exhibit low or even ultra-low porosity and permeability, so a mechanistic productivity prediction model for multicluster fracturing of horizontal wells is needed to guide the fracturing process. Post-fracturing productivity of horizontal wells is influenced by many interacting factors, including fracturing technology, geological structure, and reservoir properties, and overly idealized assumptions can cause predictions to deviate from reality. In addition, the high cost and highly variable effectiveness of multicluster fracturing in horizontal wells hinder the efficient development of tight-gas reservoirs.
Meanwhile, artificial intelligence, as a strategic universal technology leading the new round of technological revolution and industrial transformation, has become one of the main directions for technological innovation in China’s oil and gas energy sector. The deep integration of artificial intelligence technology with the oil and gas business, the use of intelligent algorithms to mine patterns from massive data of oil exploration and development, and the realization of intelligent regulation and optimization throughout the entire process have gradually become the paradigm of innovative development of oil and gas artificial intelligence [1,2,3,4].
As the core of artificial intelligence, machine learning is the fundamental way to make computers intelligent. Machine learning algorithms are becoming important tools for analyzing and exploiting oil and gas big data because of their strong adaptability to data. Data-driven modeling methods have been preliminarily applied in the development of tight-gas reservoirs [5] and have demonstrated unique advantages in predicting post-fracturing productivity. However, most current data-driven prediction models are “black-box” models: their results are limited by the number of valid samples, and because they cannot explain the internal prediction mechanism, their credibility and practicality are low.
Therefore, this article takes the tight sandstone of a block in the Ordos Basin as its object and constructs an interpretable ensemble learning model to explore the relationship between the post-fracturing productivity of tight-gas reservoirs and reservoir properties, treatment parameters, and other factors. The model's results are compared with those of a mechanism-driven coupled fracture-propagation and productivity-prediction model under similar factors, in order to explore the physical significance of each parameter and to verify the rationality and interpretability of the model. This work provides theoretical and technical support for fracturing stimulation of tight-gas reservoirs and a scientific basis for hydraulic fracturing decision-making and evaluation in the study block.

2. Construction of the Ensemble Learning Model Framework

2.1. Algorithm Principle

Ensemble learning integrates multiple individual learners, using different learning strategies to construct an ensemble learner that outperforms any individual learner and thereby reduces the overall error rate. Assuming that the errors of the individual learners are independent, the error rate P of the ensemble learner is given by Equation (1). As the number of individual learners increases, the error rate of the ensemble learner gradually decreases.
P = \sum_{k=0}^{\lfloor T/2 \rfloor} \binom{T}{k} (1-\varepsilon)^{k} \varepsilon^{T-k} \le \exp\left( -\frac{1}{2} T (1-2\varepsilon)^{2} \right) \quad (1)
where T is the number of individual learners and ε is the error rate of an individual learner.
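The behavior of Equation (1) can be checked numerically. The sketch below (standard library only; function names are our own) computes the exact majority-vote error of T independent learners and compares it with the Hoeffding-style bound:

```python
from math import comb, exp, floor

def ensemble_error(T: int, eps: float) -> float:
    """Exact majority-vote error for T independent learners with error rate eps:
    the ensemble errs when no more than floor(T/2) learners are correct."""
    return sum(comb(T, k) * (1 - eps) ** k * eps ** (T - k)
               for k in range(floor(T / 2) + 1))

def hoeffding_bound(T: int, eps: float) -> float:
    """Upper bound exp(-T (1 - 2 eps)^2 / 2) from Equation (1)."""
    return exp(-0.5 * T * (1 - 2 * eps) ** 2)

# With eps = 0.3, the ensemble error falls as T grows and stays below the bound.
for T in (1, 5, 11, 25):
    assert ensemble_error(T, 0.3) <= hoeffding_bound(T, 0.3)
```

For a single learner the expression reduces to the individual error rate, and the decay with T illustrates why stacking many diverse learners pays off only while their errors remain roughly independent.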
Figure 1 shows the stacking ensemble prediction model, which consists of two layers: the first layer comprises the base learners and the second the meta-learner. Each base learner is trained on the original dataset by a single machine learning algorithm; the meta-learner is then trained on the new dataset “generated” from the base learners' predictions. The choice of algorithms for the base learners and the meta-learner directly affects the generalization performance of the stacking ensemble model.

2.2. Learner Selection

2.2.1. Selection of Base Learners

To achieve good generalization performance of the stacking ensemble model, the base learners must be diverse and individually have strong generalization ability [6]. Therefore, this article selects and evaluates nine machine learning algorithms spanning four different algorithmic principles. The performance comparison and correlation analysis of the base learners are presented below.
(1)
Comparative analysis of learner performance
This article selects four types of individual machine learning model, namely parallel ensemble learning, serial ensemble learning, neural networks, and SVR, trains and tests them on the database described later, and compares their performance. The evaluation criteria are root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and goodness of fit (R2), calculated as follows:
\mathrm{RMSE} = \sqrt{ \frac{1}{N_{te}} \sum_{i=1}^{N_{te}} \left( y_i - \hat{y}_i \right)^2 } \quad (2)
\mathrm{MAE} = \frac{1}{N_{te}} \sum_{i=1}^{N_{te}} \left| y_i - \hat{y}_i \right| \quad (3)
\mathrm{MAPE} = \frac{1}{N_{te}} \sum_{i=1}^{N_{te}} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \quad (4)
R^2 = 1 - \frac{ \sum_{i=1}^{N_{te}} \left( y_i - \hat{y}_i \right)^2 }{ \sum_{i=1}^{N_{te}} \left( y_i - \bar{y} \right)^2 } \quad (5)
where N_te is the number of test samples, y_i is the actual output of the i-th test sample, ŷ_i is its predicted output, and ȳ is the mean actual output over the test samples.
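The four criteria in Equations (2)-(5) can be computed directly; a minimal NumPy sketch (function and variable names are our own, not from the paper):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%), and R^2 as defined in Equations (2)-(5)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)  # requires y_true != 0
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# A perfect prediction gives RMSE = MAE = MAPE = 0 and R2 = 1.
perfect = regression_metrics([1, 2, 3, 4], [1, 2, 3, 4])
```

Note that MAPE is undefined when an actual output is zero, which is not an issue here since well productivity is strictly positive.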
Table 1 presents a performance comparison of the nine machine learning algorithms, and Figure 2, Figure 3 and Figure 4 compare their metrics under the different evaluation criteria. The XGBoost algorithm performs best, with an RMSE of 68.548, an MAE of 47.339, an MAPE of 17.0%, and an R2 of 0.732, indicating small errors and a high goodness of fit. Compared with the AdaBoost, GBRT, and LightGBM algorithms, which also belong to serial ensemble learning, XGBoost improves the goodness of fit R2 by 30%, 11.7%, and 7.8%, respectively.
Among the four types of individual machine learning model (parallel ensemble learning, serial ensemble learning, neural networks, and SVR), the serial ensemble learning models have the lowest MAE and RMSE, indicating that their average prediction error is small and that they can effectively predict post-fracturing productivity.
(2)
Analysis of learner relevance
The advantage of the stacking framework is that it integrates multiple prediction algorithms, allowing each algorithm to observe the data from a different space and structure and thereby analyze it more effectively. Therefore, when choosing base learners, in addition to selecting algorithms with good performance, different types of prediction algorithm should be included as far as possible. This article uses the Pearson correlation coefficient to analyze the correlation between the prediction-error distributions generated by the base learners in post-fracturing productivity prediction for horizontal wells.
Figure 5 shows that the error correlations among the parallel ensemble, serial ensemble, and neural network models are generally high, because each algorithm has strong learning ability and may share systematic errors introduced during training. Based on these results, this article selects XGBoost, LightGBM, Random Forest, Bagging, BP, and SVR as base models.
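As an illustration of this correlation screening, the sketch below computes Pearson correlations between base-learner error vectors; the learner names and error values are invented for illustration only:

```python
import numpy as np

def error_correlation_matrix(errors):
    """Pearson correlation between the prediction-error vectors of each pair of
    base learners; low correlation indicates complementary models worth stacking."""
    names = list(errors)
    E = np.vstack([errors[n] for n in names])
    C = np.corrcoef(E)  # rows = learners; entry (i, j) is Pearson r of their errors
    return {(a, b): float(C[i, j])
            for i, a in enumerate(names) for j, b in enumerate(names)}

# Hypothetical error vectors for three base learners on five test wells:
rng = np.random.default_rng(0)
e = rng.normal(size=5)
errs = {
    "XGBoost": e,
    "LightGBM": e + 0.01 * rng.normal(size=5),  # nearly collinear with XGBoost
    "SVR": rng.normal(size=5),                  # independent, hence complementary
}
corr = error_correlation_matrix(errs)
```

Pairs with near-unit error correlation add little diversity to the stack, matching the selection logic described above.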

2.2.2. Selection of Meta-Learners

Meta-learners can largely eliminate the bias between the expected and actual outputs of the different submodels during correction. Commonly used meta-learners fall roughly into three types: probability-based voting or weighted averaging from statistics; easily interpretable machine learning algorithms such as logistic regression [7] and decision trees [8]; and nonlinear machine learning algorithms such as XGBoost [9] and GBRT [10].
There is currently no unified selection criterion for meta-learners; linear or logistic regression methods are commonly used. However, the meta-learner has a significant impact on the generalization ability of the ensemble model. Therefore, this paper compares the performance of two regression algorithms, SVR and Random Forest, and selects the better meta-regressor based on their performance [11].

2.3. Stacking Prediction Model Based on PSO

The stacking ensemble model selected in this article contains multiple machine learning algorithms, so training is time-consuming and prone to becoming stuck in local optima. Therefore, the particle swarm optimization (PSO) algorithm is integrated into the stacking ensemble model to optimize its hyperparameter search. In a continuous coordinate space, PSO can be described mathematically as follows [12]: a group of particles flies at certain velocities through a D-dimensional search space; when searching, each particle accounts for the best point it has found historically and the historical best point of the swarm (or neighborhood), and updates its position accordingly. The i-th particle in the swarm is described by three D-dimensional vectors:
Current position: x_i = (x_{i1}, x_{i2}, \ldots, x_{iD});
Historical best position: p_i = (p_{i1}, p_{i2}, \ldots, p_{iD});
Velocity: v_i = (v_{i1}, v_{i2}, \ldots, v_{iD}).
Regarding the current position as a set of coordinates describing a point in the search space, evaluate it at each iteration. If the current position x_i is better than the historical best position p_i, then x_i replaces p_i. The best position found by the whole swarm is
p_g = (p_{g1}, p_{g2}, \ldots, p_{gD})
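A minimal PSO sketch following the position/personal-best/velocity description above. The parameter values (inertia w = 0.7, acceleration coefficients c1 = c2 = 1.5, swarm size, iteration count) are illustrative assumptions, not the settings used in the paper:

```python
import random

def pso_minimize(f, dim, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p = [xi[:] for xi in x]                 # personal best positions p_i
    p_val = [f(xi) for xi in x]
    g = p[min(range(n_particles), key=p_val.__getitem__)][:]  # swarm best p_g
    g_val = min(p_val)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Velocity update: inertia + pull toward p_i and p_g.
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (p[i][d] - x[i][d])
                           + c2 * rng.random() * (g[d] - x[i][d]))
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            val = f(x[i])
            if val < p_val[i]:
                p[i], p_val[i] = x[i][:], val
                if val < g_val:
                    g, g_val = x[i][:], val
    return g, g_val

# Sphere function: minimum at the origin.
best, best_val = pso_minimize(lambda z: sum(t * t for t in z), dim=3, bounds=(-5, 5))
```

In the paper's workflow the objective f would be the cross-validated RMSE of a learner as a function of its hyperparameters, rather than this toy sphere function.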
In this paper, we adopt a two-level stacking framework. Base learners are trained under 5-fold cross-validation. For each fold, we generate level-one predictions from every base learner; these predictions form the design matrix for the meta-learner. The meta-learner is then fitted on the level-two predictions from the training folds and evaluated on the corresponding validation fold.
PSO is used exclusively for joint hyperparameter tuning of (i) the selected base learners and (ii) the meta-learner. PSO does not perform feature selection; level-two model weighting is learnt by the meta-learner itself. The PSO objective is the level-one RMSE computed on the held-out fold.
For each fold k: (1) fit each base learner on D \ D_k with PSO-tuned hyperparameters using inner CV; (2) produce out-of-fold predictions z_j for all training samples; (3) train the meta-learner on Z = (z_1, ..., z_m) with PSO-tuned (C, ε, γ) hyperparameters; and (4) evaluate on D_k. Finally, refit the base and meta-learners on the entire training set with the selected hyperparameters. This yields the framework for predicting post-fracturing productivity of tight gas; the calculation flowchart is shown in Figure 6.
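The out-of-fold stacking workflow described above can be sketched with scikit-learn. This is a simplified illustration on synthetic data: only two base learners are shown, and the PSO hyperparameter tuning is omitted:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVR

# Synthetic stand-in for the 200-well dataset.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

base_learners = {
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
    "bag": BaggingRegressor(n_estimators=50, random_state=0),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Level-one design matrix Z: out-of-fold predictions from each base learner,
# so the meta-learner never sees predictions made on a learner's own training data.
Z = np.column_stack([cross_val_predict(m, X, y, cv=cv)
                     for m in base_learners.values()])

# Meta-learner (SVR) fitted on the out-of-fold predictions.
meta = SVR(C=10.0).fit(Z, y)

# At inference time, refit the base learners on all data and chain the predictions.
Z_full = np.column_stack([m.fit(X, y).predict(X) for m in base_learners.values()])
y_hat = meta.predict(Z_full)
```

In the paper's pipeline, the hyperparameters of both layers would additionally be tuned by PSO on the held-out-fold RMSE.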

3. Post-Fracturing Productivity Prediction for Tight Gas

3.1. Data and Sources

This article takes 200 horizontal wells that have undergone fracturing as samples, and their characteristics include reservoir properties, rock mechanics, geomechanics, fracturing technology, gas testing data, etc., as listed in Table 2.

3.2. Data Analysis

The collection includes 200 hydraulically fractured horizontal wells from a tight-sandstone formation in the Ordos Basin. Field records (petrophysics, rock mechanics, in situ stresses, and treatment schedules) were standardized and subjected to quality assurance to constitute the analysis sample.
Numerous engineering variables (e.g., the permeability upper tail, total fluid, and total proppant) display outliers and heavy tails. These are expected in tight-gas operations owing to operational disturbances (screen-out management, rate escalation), measurement inaccuracies, and design variability between pads.
Due to the skewed, heavy-tailed characteristics of field data and the existence of outliers, all continuous predictors are standardized using RobustScaler (median–IQR) throughout the modeling process. This decision mitigates leverage from extremes and enhances numerical conditioning for scale-sensitive learners (SVR), while maintaining a consistent pipeline for tree-based models that are predominantly scale-insensitive.

3.3. Data Preprocessing

We assessed five normalizing methods frequently employed in petroleum production datasets—MinMaxScaler, MaxAbsScaler, StandardScaler (z-score), RobustScaler (median–IQR), and median normalization—and measured their impacts utilizing divergence measures.
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
x' = \frac{x}{|x|_{\max}}
x' = \frac{x - \mu}{\sigma}
x' = \frac{x - x_{\mathrm{median}}}{x_{\max} - x_{\min}}
x' = \frac{x - x_{\mathrm{median}}}{Q_3(x) - Q_1(x)}
where x_median is the median, x_min the minimum value, x_max the maximum value, Q3(x) the upper quartile, Q1(x) the lower quartile, |x|_max the maximum absolute value, μ the arithmetic mean, and σ the standard deviation.
We calculated Kullback–Leibler (KL) and Jensen–Shannon (JS) divergences to evaluate distributional dissimilarity prior to and subsequent to scaling.
\mathrm{KL}(m \,\|\, n) = \sum_{i=1}^{k} m(x_i) \ln \frac{m(x_i)}{n(x_i)}
\mathrm{JS}(m \,\|\, n) = \frac{1}{2} \mathrm{KL}\left( m \,\middle\|\, \frac{m+n}{2} \right) + \frac{1}{2} \mathrm{KL}\left( n \,\middle\|\, \frac{m+n}{2} \right)
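A minimal NumPy implementation of the KL and JS diagnostics over matched histogram bins (the small smoothing constant is our own addition to avoid division by zero on empty bins):

```python
import numpy as np

def kl(m, n, eps=1e-12):
    """Discrete KL divergence KL(m || n) over matched histogram bins."""
    m = np.asarray(m, dtype=float) / np.sum(m)
    n = np.asarray(n, dtype=float) / np.sum(n)
    return float(np.sum(m * np.log((m + eps) / (n + eps))))

def js(m, n):
    """Jensen-Shannon divergence: symmetrized KL against the 50/50 mixture."""
    m = np.asarray(m, dtype=float) / np.sum(m)
    n = np.asarray(n, dtype=float) / np.sum(n)
    mix = 0.5 * (m + n)
    return 0.5 * kl(m, mix) + 0.5 * kl(n, mix)

# Two illustrative binned distributions (hypothetical values):
p = [0.1, 0.4, 0.5]
q = [0.2, 0.3, 0.5]
```

Unlike KL, JS is symmetric and bounded by ln 2, which makes it convenient for comparing feature distributions before and after scaling.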
Tight-gas characteristics (rates, volumes, pressures) are generally skewed and heavy-tailed, with outliers resulting from operational disturbances and measurement noise; the KL/JS diagnostics (Figure 7) indicate significant deviations from normality. In these circumstances, z-score and min-max scaling can amplify the influence of outliers, while RobustScaler maintains rank order and diminishes the impact of extremes through median centering and interquartile-range scaling. Statistical analysis of the dataset's distribution characteristics is shown in Table 3.
Following prior petroleum ML studies that adopt median–IQR robust scaling to mitigate outliers in field measurements [13,14], we employ RobustScaler for all continuous features before model training. Our model suite comprises scale-sensitive learners, such as support vector regression and back-propagation neural networks, in conjunction with predominantly scale-insensitive tree-based models. Consequently, we utilize RobustScaler as the standard preprocessing method to mitigate field outliers and heavy tails (Figure 8) and to enhance numerical conditioning for the scale-sensitive components.
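A small sketch of the adopted median-IQR scaling with scikit-learn's RobustScaler, using hypothetical feature values with one operational outlier:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVR

# Heavy-tailed feature with one operational outlier (invented values):
X = np.array([[0.1], [0.2], [0.25], [0.3], [50.0]])

# RobustScaler applies (x - median) / IQR, so the median and quartiles,
# not the extreme value, set the center and scale of the bulk of the data.
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)

# Typical usage: robust scaling feeding a scale-sensitive learner such as SVR.
pipe = Pipeline([("scale", RobustScaler()), ("svr", SVR())])
```

Note that RobustScaler does not clip the outlier itself; it only prevents the outlier from dictating the scale applied to the remaining samples, which is exactly the leverage-mitigation property motivating its adoption here.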

3.4. Accuracy Evaluation Results of the Stacking Ensemble Model

To further assess how base learner choice affects the generalization of the stacking ensemble, we added a configuration that uses all candidate algorithms as base learners and constructed ten stacking models (Table 4).
In the stacking ensembles that used SVR as the meta-learner, Stacking #3 achieved the best performance.
In the ensembles that used Random Forest as the meta-learner, Stacking #6 was the strongest, but it still underperformed the best SVR-meta configuration.
We adopt grouped nine-fold cross-validation at the pad/well level to avoid leakage across spatially correlated wells. In each outer fold, base- and meta-learner hyperparameters are tuned via PSO, the fold is evaluated, and the metrics are logged. After completing nine folds, the models are refit on the full training set and evaluated on a held-out test set (N = 20 wells). For CV, we report the fold-wise mean ± 95% CI using a t-interval (t0.975, 8). On the held-out test set, Stacking #3 achieved MAE = 38.06 [34.6, 41.9], RMSE = 64.61 [58.5, 71.5], MAPE = 15.13% [13.4, 16.9], and R2 = 0.923 [0.901, 0.942] [15]. The nine-fold cross-validation results for Stacking #3 are shown in Table 5.
We substantiate these points by conducting statistical comparisons on the held-out test set using the Friedman test followed by the Nemenyi post hoc procedure. Specifically, we applied the Friedman test to the per-well ranks of the 10 ensembles on the 20-well test set (k = 10, N = 20) and then used the Nemenyi test to identify pairwise performance differences [16]. The prediction-error ranking for the stacking models is shown in Table 6.
\chi_F^2 = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right] \approx 42.2018
F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2} \approx 5.819
W = \frac{\chi_F^2}{N(k-1)} \approx 0.234
where N is the number of wells, k the number of models, and R_j the average rank of the j-th model over the N wells.
Both the Friedman statistic ( χ F 2 = 42.2018, d f = 9) and the Iman–Davenport adjustment ( F F = 5.819, d f 1 = 9, d f 2 = 171) exceed their 0.05 critical values; accordingly, we reject the null hypothesis of equal model performance (p ≪ 0.001). Kendall’s W ≈ 0.234 indicates modest rank concordance across the 20 wells—systematic but not strong—therefore, we applied the Nemenyi post hoc test to identify statistically significant pairwise differences.
CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}}
For α = 0.05, CD_0.05 = 3.173. Post hoc Nemenyi analysis indicates that the top performer (Stacking #3; average rank 2.95) is statistically indistinguishable from Stacking #9, #2, #6, and #10, whereas the others are significantly worse.
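The Friedman statistic, Kendall's W, and Nemenyi critical difference above can be reproduced from a rank matrix. The sketch below uses invented per-well errors for three hypothetical models (not the paper's ten ensembles) and cross-checks against SciPy:

```python
import numpy as np
from scipy import stats

def friedman_from_ranks(R):
    """Friedman chi-square and Kendall's W from an (N wells x k models) rank matrix."""
    N, k = R.shape
    Rj = R.mean(axis=0)  # average rank per model
    chi2 = 12 * N / (k * (k + 1)) * (np.sum(Rj ** 2) - k * (k + 1) ** 2 / 4)
    W = chi2 / (N * (k - 1))  # Kendall's coefficient of concordance
    return float(chi2), float(W)

def nemenyi_cd(k, N, q_alpha):
    """Nemenyi critical difference CD = q_alpha * sqrt(k(k+1) / (6N))."""
    return float(q_alpha * np.sqrt(k * (k + 1) / (6 * N)))

# Hypothetical per-well errors for 3 models on 4 wells (rows = wells):
errors = np.array([[1.0, 2.0, 3.0],
                   [1.1, 2.2, 2.9],
                   [0.9, 2.1, 3.1],
                   [1.2, 1.9, 3.2]])
ranks = stats.rankdata(errors, axis=1)  # rank models within each well
chi2, W = friedman_from_ranks(ranks)
```

With perfectly concordant rankings, W = 1; the paper's W ≈ 0.234 correspondingly reflects only modest concordance across wells.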
Comparing the two representative ensembles, Stacking #3 and Stacking #9, we observe that the smaller, more complementary set in Stacking #3 yields lower test errors than Stacking #9. On per-well ranks, the Nemenyi test indicates that #3 vs. #9 is not significant at α = 0.05, yet the direction consistently favors #3.
This pattern reflects the bias-variance trade-off in stacking: as the number of base learners grows, the level-two feature vector (OOF predictions) expands. When added learners are highly correlated or weak, they contribute little new signal but increase collinearity and variance at the meta level, raising overfitting risk. Conversely, a compact, diverse subset (as in #3) supplies complementary errors and improves generalization.
SVR establishes linear or nonlinear regression models by maximizing intervals, performs well in handling nonlinear problems and datasets with minimal noise, and has good generalization ability. Random Forest, on the other hand, constructs decision trees through random sampling and feature selection, demonstrating excellent performance in handling high-dimensional and large-scale datasets, and exhibiting robustness against missing data and noise. However, when used as a meta-learner, the sample set features it faces are the predicted values of the base learner, not high-dimensional data, so the performance of Random Forest as a meta-learner is not as good as SVR.
In addition, the results indicate that selecting all base learners to form a stacking ensemble model yields worse performance than selectively choosing base learners. Taking Stacking #9 as an example, its error and goodness of fit are MAE = 44.425, RMSE = 64.703, MAPE = 15.538%, and R2 = 0.889, all poorer than those of Stacking #3. The reason is that, among base learners with similar algorithmic principles, those with relatively poor performance contribute predicted values that significantly increase the error of the final prediction.

3.5. Comparing the Accuracy of Ensemble Models and Individual Models

To further analyze the performance of stacking ensemble models relative to individual machine learning models, this paper compares the accuracy/error of the Stacking #3 and Stacking #9 ensembles against the individual models, with the results shown in Figure 9. Both ensembles use SVR as the meta-learner; Stacking #3 uses a selected subset of base learners, while Stacking #9 includes all base learners considered in this paper. As shown in the figure, the RMSE, MAE, and MAPE of Stacking #3 and Stacking #9 are significantly lower than those of the individual machine learning models. In addition, the R2 values of the two ensembles are 0.923 and 0.889, respectively, 26.09% and 21.44% higher than the best individual base learner, XGBoost. Therefore, for predicting post-fracturing productivity of tight gas, both ensemble models substantially outperform the individual machine learning models.
The aforementioned results indicate that the stacking ensemble diminishes bias and enhances fit compared to individual learners.
Figure 10 compares the predicted and actual values from the different models. The predicted values of Stacking #3 and Stacking #9 on the test set align closely with the fitted line y = x, whereas the fitted lines of the individual machine learning models deviate considerably from y = x. This suggests that stacking ensemble models can effectively rectify the prediction biases introduced by individual machine learning models.
We conceptually assess our technique against two predominant families of industry-standard predictors.
History-oriented benchmarks (DCA/RTA): Empirical decline-curve analysis (including Arps-type and its contemporary extensions) and rate-transient analysis are extensively utilized once post-fracturing production data is accessible. These strategies are effective for surveillance and reserves; however, they are inapplicable during the pre-job design phase and are susceptible to the duration and quality of early-time data.
Physics-based simulations (analytical/numerical): Pseudo-3D and 3D fracture-reservoir simulators, along with analytical post-fracture proxies, are the standard design tools. Their forecasts in our block deviate from the data-driven results mainly because they assume a homogeneous formation, uniform cluster efficiency, and idealized leakoff, whereas the analyzed interbedded sandstone-shale displays significant heterogeneity and varied cluster performance.
Our model is positioned as a design-stage, data-driven complement: given pre-job geological/geomechanical descriptors and treatment parameters, it produces rapid forecasts of early-time productivity for screening and design optimization. A dedicated interpretability study presented later further examines the learned importance pattern (water saturation, number of clusters, total fracturing liquid volume) and shows its consistency with physics-based expectations while reflecting operational variability. A comparison of the common approaches is shown in Table 7.

4. Interpretability Study

4.1. Physics-Constrained Post-Fracturing Productivity Prediction Model

To further validate the rationality of the data-driven model for predicting post-fracturing productivity of tight gas, this paper constructs a mechanism-driven prediction model to analyze the influence weights of different types of feature on productivity. Using fracture morphology as the connecting node, the non-idealized fracture morphology during multicluster fracturing of horizontal wells is first simulated using the Extended Finite Element Method (XFEM). The fracture morphology is then incorporated into the mechanistic post-fracturing productivity model, forming a coupled model of fracture propagation and productivity prediction for tight-gas reservoirs. The orthogonal experimental method is used to study the degree of influence of fracturing treatment parameters, reservoir properties, geomechanics, and rock mechanics on the post-fracturing productivity of tight-gas reservoirs.
Assuming that the outer boundary of the reservoir is a closed boundary, multiple clusters of fractures in the horizontal well section are located at the center of the rectangular closed boundary and distributed along the horizontal direction. The bottomhole flowing pressure is 5 MPa. Other relevant parameters, such as horizontal well fracturing parameters and tight sandstone gas reservoir parameters, are shown in Table 8.
As shown in Figure 11 and Figure 12, the matrix system was meshed using the free mesh partitioning method, with local mesh refinement in the fracture area. The reservoir was meshed in COMSOL Multiphysics v6.1, yielding a mesh map of the reservoir with multiple fracture clusters along the horizontal well.
The range comparison in Figure 13 shows that the importance ranking of factors influencing post-fracturing productivity is as follows: permeability > water saturation > porosity > horizontal section length > number of clusters > total liquid volume > elastic modulus > minimum horizontal geostress.
According to the effect curves in Figure 14, reservoir parameters have an approximately linear effect on post-fracturing productivity. Among them, water saturation constrains the maximum productivity of horizontal wells; that is, reservoir quality determines the upper limit of productivity. The minimum horizontal stress and elastic modulus of the reservoir are negatively correlated with post-fracturing productivity, indicating that under the given conditions they may inhibit fracture propagation [17], thereby reducing the stimulated area and lowering productivity.

4.2. Global Interpretation Analysis Based on SHAP Model

Complex machine learning models are often difficult to understand and interpret and are therefore considered “black-box” models, which limits their application in productivity prediction [18]. A technique developed by Lundberg and Lee [19] to explain black-box models, Shapley Additive exPlanations (SHAP), stands out among machine-learning-based interpretability methods. The Shapley value has the same unit as the predicted quantity, and the contribution of each feature to the model's prediction can be expressed as
f(x) = g(z') = \varphi_0 + \sum_{v=1}^{K} \varphi_v z'_v
where g is the interpretation model, z' is a K-dimensional binary coalition vector, φ_0 is the average predicted value (the baseline value), and φ_v is the contribution of the v-th feature.
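The additive decomposition above can be verified exactly on a tiny model by brute-force Shapley computation; the two-feature model below is purely illustrative and is not the paper's predictor:

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values phi_v for a model f over K features: features outside
    the coalition S are held at their baseline values."""
    K = len(x)

    def value(S):
        z = [x[i] if i in S else baseline[i] for i in range(K)]
        return f(z)

    phi = []
    for v in range(K):
        others = [i for i in range(K) if i != v]
        total = 0.0
        for r in range(K):
            for S in combinations(others, r):
                # Shapley weight |S|! (K - |S| - 1)! / K!
                wgt = factorial(len(S)) * factorial(K - len(S) - 1) / factorial(K)
                total += wgt * (value(set(S) | {v}) - value(set(S)))
        phi.append(total)
    return phi

# Toy nonlinear model of two features (illustrative only):
f = lambda z: 3 * z[0] + 2 * z[0] * z[1]
x, base = [1.0, 2.0], [0.0, 0.0]
phi = exact_shapley(f, x, base)
phi0 = f(base)
# Additivity: phi0 + sum(phi) reproduces f(x), matching the equation above.
```

Practical SHAP implementations approximate this exponential-cost computation (e.g., via tree-structure or sampling tricks), but the additivity property demonstrated here is what makes the per-feature contributions sum exactly to the model output.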
At the global explanatory level, Figure 15 shows that porosity is the most important feature for predicting productivity, followed by sandstone length, water saturation, permeability, fluid intensity, and horizontal section length. The mean SHAP values of these features lie far from the baseline of 0 and exhibit long horizontal spreads with tails, indicating their high importance in the predictive model.
Antwarg et al. (2021) [20] note that LIME is a model-agnostic method for locally explaining black-box models. LIME is a commonly used local surrogate approach for identifying and interpreting a model's decision rules; it trains local surrogate models for individual instances rather than a global surrogate. Explanatory models tend to seek stable and locally faithful explanations [21].
Because a local interpretation can explain only one sample at a time and cannot explain the model from a global perspective, this paper adopts a LIME global interpretation procedure: all samples are drawn sequentially from the test set; each sample is fitted and interpreted locally with LIME, yielding the weights of 20 features in the local linear model; and the 20 features are then ranked by cumulatively aggregating the absolute values of the weights.
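The cumulative-|weight| aggregation described above can be sketched as follows; the local weights and feature names are invented for illustration:

```python
import numpy as np

def aggregate_lime_weights(local_weights, feature_names):
    """Global LIME importance: accumulate the absolute weight of each feature
    over all locally fitted linear surrogates, then rank in descending order."""
    local_weights = np.asarray(local_weights, dtype=float)
    global_importance = np.abs(local_weights).sum(axis=0)
    order = np.argsort(global_importance)[::-1]
    return [(feature_names[i], float(global_importance[i])) for i in order]

# Hypothetical local surrogate weights: 3 test samples x 3 features.
w = np.array([[0.5, -0.1, 0.2],
              [0.6, 0.0, -0.3],
              [-0.4, 0.2, 0.1]])
ranking = aggregate_lime_weights(w, ["porosity", "cluster_count", "water_saturation"])
```

Because the aggregation uses absolute values, it ranks magnitude of influence only; the sign (positive or negative effect on productivity) is lost, which is exactly the limitation noted in the text above.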
The global feature importance obtained by LIME is shown in Figure 16 and Figure 17, which explains the prediction results of the stacking ensemble model for production capacity. In the global feature importance obtained by LIME, porosity has the main impact on productivity, followed by sandstone length, water saturation, permeability, and other features. However, LIME global interpretation uses absolute weight values for sorting, which cannot analyze the trend of the impact of features on production capacity. For example, porosity has a positive impact on productivity, while the number of clusters may reduce productivity.
Considerations and strategies for enhancing interpretability. SHAP and LIME provide valuable insights but have limitations: (i) computational cost: both rely on many model evaluations of perturbed samples; (ii) stability: results depend on SHAP's background set (Figure 18), LIME's neighborhood/kernel (Figure 19), and fold splits; (iii) correlated features (e.g., porosity and permeability) cause credit splitting that is not causal and can shift rankings; and (iv) sign/trend ambiguity under strong interactions or extrapolation, so high global importance does not imply a monotonic effect.
We address this by (1) standardizing preprocessing (median–IQR scaling) and computing explanations on the same held-out test partitions used for accuracy evaluation, (2) reporting global importance (mean |SHAP|, LIME global weights) instead of single-instance attributions, and (3) interpreting correlated features collectively with physics-consistent reasoning. Interpretability is confined to the training domain and serves as verification rather than a substitute for mechanistic understanding. Comprehensive robustness assessments (background-set and neighborhood sensitivity, grouped importance, and rank correlation with uncertainty) are deferred to future work.

4.3. Verification of Interpretability of Prediction Models

To assess the physical validity of the stacked ensemble's explanations, we compare factor importances derived from (i) a mechanistic fracture-growth/productivity model and (ii) data-driven global explanations of the stacking predictor (LIME and SHAP). For the mechanistic model, an orthogonal design was performed and the main-effect range was used as the importance metric. For the learning model, mean absolute SHAP values and LIME global weights were computed and likewise normalized to [0, 1]. The eight evaluated factors are permeability, porosity, water saturation, total treatment fluid volume, cluster count per stage, horizontal length, Young's modulus, and the minimum horizontal stress σmin. The ranking of influencing factors for the mechanistic model, LIME, and SHAP is shown in Figure 20.
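The range-analysis importance metric and the [0, 1] normalization used for cross-method comparison can be sketched as follows. The orthogonal table and response values below are illustrative assumptions, not the paper's experiment.

```python
import numpy as np

def main_effect_ranges(design, response):
    """Range analysis for an orthogonal experiment.

    design: (runs, factors) array of level codes; response: (runs,) results.
    Returns, per factor, R = max(level mean) - min(level mean).
    """
    ranges = []
    for j in range(design.shape[1]):
        col = design[:, j]
        level_means = [response[col == lv].mean() for lv in np.unique(col)]
        ranges.append(max(level_means) - min(level_means))
    return np.array(ranges)

def normalize01(v):
    """Min-max scale importances to [0, 1] for cross-method comparison."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

# Illustrative L4(2^3) design with a response dominated by factor 0.
design = np.array([[0, 0, 0],
                   [0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 0]])
response = np.array([10.0, 12.0, 30.0, 33.0])
R = main_effect_ranges(design, response)
R01 = normalize01(R)  # dominant factor maps to 1.0
```

Normalizing both the mechanistic ranges and the mean-|SHAP|/LIME weights this way puts all three importance vectors on a common scale before ranking.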
Across all three sources, water saturation ranks high, confirming its adverse impact on tight-gas productivity. The mechanistic model emphasizes permeability, whereas LIME/SHAP place greater weight on porosity, relegating permeability to a secondary role. Two technical reasons plausibly explain this: (i) collinearity and attribution sharing—porosity and permeability are strongly correlated in these rocks, and additive explanation methods distribute credit across correlated predictors; and (ii) near-wellbore impairment—the mechanistic baseline neglects drilling/completion damage and stress sensitivity, while the data-driven model learns an effective permeability from field data that implicitly captures such degradations.
LIME/SHAP also assign greater importance to horizontal length than the mechanistic setting. This is consistent with the variable definition: in the mechanistic workflow, “horizontal length” was treated as sandstone (net-pay) length, whereas in the field data, it implicitly includes non-reservoir intervals, making it a stronger proxy for stimulated reservoir volume (SRV) and net-pay encounter.
For cluster count and total treatment volume, the mechanistic model yields modest normalized importance, while LIME/SHAP indicate stronger sensitivity. This difference reflects assumptions versus operations: the simulator assumes homogeneous rock, idealized cluster efficiency, and uniform leakoff, under which additional clusters or fluid provide diminishing marginal gains. In interbedded tight sandstone–shale, however, net-pay encounter, sand-body continuity, stress shadowing, and leakoff variability govern cluster take/participation, SRV growth, and proppant placement; thus, cluster density and treatment volume materially influence 90-day cumulative gas.
Limitations and future work. We report normalized global importances so that the mechanistic, LIME, and SHAP rankings in Figure 20 are directly comparable. More rigorous quantification—e.g., grouped importance for {porosity, permeability}, stratification by encounter ratio/connectivity, and rank correlation with uncertainty—requires additional computations and is deferred to future work to better disentangle definition effects from genuine physical differences.

5. Conclusions

(1)
Performance and statistical validity. We developed a PSO-tuned stacking framework for tight-gas post-fracturing productivity prediction. Under a grouped 9-fold CV and a held-out 20-well test set, the best ensemble (Stacking #3) achieved MAE = 38.06, RMSE = 64.61, MAPE = 15.13%, and R² = 0.923, exceeding the best single learner by 26%. On the test set, the Friedman test (χ²F = 42.2018, df = 9, p ≪ 0.001) confirms overall differences; Kendall's W ≈ 0.234 indicates modest rank concordance, and the Nemenyi test shows Stacking #3 is statistically indistinguishable from #9/#2/#6/#10, while #1/#4/#5/#7/#8 are significantly worse.
(2)
Complexity–generalization trade-off. A compact, complementary base set (e.g., Stacking #3) generalizes better than a larger, redundant set (Stacking #9, nine bases), reflecting the bias–variance trade-off: adding highly correlated/weak bases inflates meta-level collinearity and variance without adding signal. Overfitting was managed via grouped CV with OOF stacking, PSO hyperparameter search, and a regularized SVR meta-learner.
(3)
Meta-learner choice. Across configurations, SVR as the level-two learner delivered the most reliable accuracy/robustness for the OOF feature space; random-forest meta-learning underperformed the best SVR setting, consistent with SVR’s margin-based regularization on low-dimensional, correlated meta-features.
(4)
Interpretability and physics consistency. Global explanations (LIME/SHAP) align with the mechanistic fracture-growth-productivity model on key drivers: water saturation (strong negative), lateral contact/length, cluster density, and treatment volume materially impact 90-day gas. Differences in the porosity–permeability ranking are attributable to collinearity and near-wellbore impairment captured by field data but idealized in the mechanistic baseline.
(5)
Field deployment. Use the model as a screening tool during the design phase: rank candidate cluster counts and treatment volumes against the local stress contrast and encounter ratio, and select the minimal design whose predicted P50 meets the target within its 95% confidence interval. During execution, compare the treating pressure with the predicted interval to flag interface activation or poor cluster uptake, and adjust in real time as needed. For new reservoirs, the framework is transferable but requires brief recalibration on a local well set (feature harmonization and a grouped-CV/PSO refit) and benchmarking against the operator's mechanistic simulator before deployment.
The current dataset does not cover external regimes; future validation will therefore extend the feature set and benchmark the model against such scenarios.
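The statistical protocol cited in conclusion (1)—a Friedman test over per-well error ranks followed by Kendall's W for concordance—can be sketched as follows. The error matrix here is synthetic and illustrative, not the study's data; only the test procedure mirrors the text.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(0)
n_wells, n_models = 20, 10
# Illustrative per-well absolute errors; model 2 is made systematically best.
errors = rng.gamma(shape=2.0, scale=20.0, size=(n_wells, n_models))
errors[:, 2] *= 0.4

# Friedman test: do the models' error distributions differ overall?
stat, p = friedmanchisquare(*errors.T)

# Average rank per model (rank 1 = lowest error on a well).
ranks = np.apply_along_axis(rankdata, 1, errors)
avg_rank = ranks.mean(axis=0)

# Kendall's W derived from the Friedman statistic: W = chi2 / (n * (k - 1)).
W = stat / (n_wells * (n_models - 1))
```

A significant Friedman statistic justifies the post hoc Nemenyi comparison reported in the conclusions, and W quantifies how consistently the wells agree on the model ordering.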

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pr13103132/s1. Algorithm S1. Grouped OOF Stacking with PSO (leak-free, nested CV).

Author Contributions

Conceptualization, Z.Y.; Methodology, X.L. and Y.W.; Software, Y.W.; Validation, C.Z.; Formal Analysis, C.Z.; Resources, Z.Y.; Data Curation, Y.W.; Writing—Original Draft, Y.W.; Writing—Review and Editing, X.L. and Y.B.; Project Administration, X.L.; Funding Acquisition, Y.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work is sponsored by the 2024 Heilongjiang Province “Basic Research Support Program for Outstanding Young Teachers” (YQJH2024045), the National Natural Science Foundation of China (Grant No. 52474035), and the CNPC Innovation Fund (Grant No. 2024DQ02-0114). Furthermore, the authors would like to thank all members of the research team.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Xuanyu Liu, Zhiwei Yu, and Chao Zhou were employed by the company CNPC Greatwall Drilling Co. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Bergen, K.J.; Johnson, P.A.; de Hoop, M.V.; Beroza, G.C. Machine learning for data-driven discovery in solid Earth geoscience. Science 2019, 363, 323. [Google Scholar] [CrossRef] [PubMed]
  2. Yu, S.; Ma, J. Deep learning for geophysics: Current and future trends. Rev. Geophys. 2021, 59, e2021RG000742. [Google Scholar] [CrossRef]
  3. Jia, Y.; Ma, J. What can machine learning do for seismic data processing? An interpolation application. Geophysics 2017, 82, V163–V177. [Google Scholar] [CrossRef]
  4. Baozhi, P.; Yanan, D.; Haitao, Z.; Xiaoming, Y.; Xue, H. A bfa-cm optimization log interpretation method. Chin. J. Geophys. 2016, 59, 364–372. [Google Scholar] [CrossRef]
  5. Nejad, A.M.; Sheludko, S.; Shelley, R.F.; Hodgson, T.; McFall, R. A case history: Evaluating well completions in the eagle ford shale using a data-driven approach. In Proceedings of the SPE Hydraulic Fracturing Technology Conference, Woodlands, TX, USA, 3–5 February 2015. [Google Scholar] [CrossRef]
  6. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
  7. Yang, J.; He, S.; Zhang, Z.; Bo, X. NegStacking: Drug−target interaction prediction based on ensemble learning and logistic regression. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 18, 2624–2634. [Google Scholar] [CrossRef]
  8. Chen, C.-H.; Zhao, W.-D.; Pang, T.; Lin, Y.-Z. Virtual metrology of semiconductor PVD process based on combination of tree-based ensemble model. ISA Trans. 2020, 103, 192–202. [Google Scholar] [CrossRef] [PubMed]
  9. Zhong, J.; Sun, Y.; Peng, W.; Xie, M.; Yang, J.; Tang, X. XGBFEMF: An XGBoost-based framework for essential protein prediction. IEEE Trans. Nanobiosci. 2018, 17, 243–250. [Google Scholar] [CrossRef] [PubMed]
  10. Wang, J.; Li, P.; Ran, R.; Che, Y.; Zhou, Y. A short-term photovoltaic power prediction model based on the gradient boost decision tree. Appl. Sci. 2018, 8, 689. [Google Scholar] [CrossRef]
  11. Guo, Z.; Zhang, R.; Wang, L.; Zeng, S.; Li, Y. Optimal operation of regional integrated energy system considering demand response. Appl. Therm. Eng. 2021, 191, 116860. [Google Scholar] [CrossRef]
  12. Kennedy, J. The particle swarm: Social adaptation of knowledge. In Proceedings of the 1997 IEEE International Conference on Evolutionary Computation, Indianapolis, IN, USA, 13–16 April 1997. [Google Scholar] [CrossRef]
  13. Al-Ali, Z.H. Well Logs Interpretation Using Machine Learning Workflow. In Proceedings of the SPE Gas & Oil Technology Showcase and Conference, Dubai, United Arab Emirates, 13–15 March 2023; SPE: Dallas, TX, USA, 2023. [Google Scholar] [CrossRef]
  14. Khodabakhshi, M.J.; Bijani, M. Predicting scale deposition in oil reservoirs using machine learning optimization algorithms. Results Eng. 2024, 22, 102263. [Google Scholar] [CrossRef]
  15. Vaferi, B.; Torabi, F.; Gandomkar, A.; Ahmadi, Y.; Mansouri, M. Cloud point pressure estimation of gas-soluble chemicals in carbon dioxide by machine learning paradigms. Phys. Fluids 2025, 37, 076109. [Google Scholar] [CrossRef]
  16. Ahmadi, Y. Improving fluid flow through low permeability reservoir in the presence of nanoparticles: An experimental core flooding WAG tests. Iran. J. Oil Gas Sci. Technol. 2023, 12, 1–14. [Google Scholar]
  17. Guo, T.; Hao, T.; Yang, X.; Li, Q.; Liu, Y.; Chen, M.; Qu, Z. Numerical simulation study of fracture propagation by internal plugging hydraulic fracturing. Eng. Fract. Mech. 2024, 310, 110480. [Google Scholar] [CrossRef]
  18. Amiri, S.S.; Mottahedi, S.; Lee, E.R.; Hoque, S. Peeking inside the black-box: Explainable machine learning applied to household transportation energy consumption. Comput. Environ. Urban Syst. 2021, 88, 101647. [Google Scholar] [CrossRef]
  19. Lundberg, S.M.; Lee, S. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  20. Antwarg, L.; Miller, R.M.; Shapira, B.; Rokach, L. Explaining anomalies detected by autoencoders using shapley additive explanations. Expert Syst. Appl. 2021, 186, 115736. [Google Scholar] [CrossRef]
  21. Zhang, Y.; Song, K.; Sun, Y.; Tan, S.; Udell, M. Why should you trust my explanation? understanding uncertainty in lime explanations. arXiv 2019, arXiv:1904.12991. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of stacking algorithm.
Figure 2. Comparison of MAE for various algorithms.
Figure 3. Comparison of RMSE for various algorithms.
Figure 4. Comparison of MAPE for various algorithms.
Figure 5. Correlation degree of base learner algorithm.
Figure 6. Calculation flowchart of stacking model.
Figure 7. Box plot of partial features in the dataset.
Figure 8. Distribution plot of partial features in the dataset.
Figure 9. Comparison of JS and KL divergence under different data processing methods.
Figure 10. Performance comparison heatmap of models.
Figure 11. Performance of integrated models with different combinations.
Figure 12. Comparison of integrated models and individual models.
Figure 13. Comparison of actual and predicted values for integrated models and individual models.
Figure 14. Grid of coupled model for dense gas crack expansion and productivity prediction.
Figure 15. Pressure contour after 90 days of production.
Figure 16. Comparison of range of influencing factors.
Figure 17. Comparison of effects of influencing factors.
Figure 18. Global interpretation for SHAP.
Figure 19. Ranking of features in LIME Global Algorithm.
Figure 20. Ranking of influencing factors for mechanism model, LIME, and SHAP.
Table 1. Comparison of model errors.
Algorithm | RMSE | MAE | MAPE (%) | R²
Bagging | 84.404 | 65.167 | 23.2 | 0.576
Random Forest | 85.343 | 66.013 | 23.2 | 0.558
AdaBoost | 86.703 | 72.756 | 24.9 | 0.542
GBRT | 77.191 | 56.602 | 21.5 | 0.646
XGBoost | 68.548 | 47.339 | 17.0 | 0.732
LightGBM | 76.369 | 55.806 | 19.7 | 0.675
BP | 149.407 | 115.495 | 34.0 | 0.434
GRNN | 182.473 | 151.546 | 52.8 | 0.258
SVR | 182.521 | 150.227 | 53.8 | 0.239
Table 2. Impact characteristics.
Category | Features
Reservoir properties | Porosity; Water saturation; Clay content; Permeability
Rock mechanics | Elastic modulus; Poisson's ratio; Tensile strength
Geomechanics | Minimum horizontal principal stress; Maximum horizontal principal stress; Vertical stress
Fracturing technology | Displacement; Total fracturing liquid volume; Total proppant mass; Liquid strength; Sand strength; Horizontal length; Sand body drilling encounter rate; Number of clusters; Flowback rate; Pre-liquid ratio; Average sand ratio
Table 3. Statistical analysis of dataset distribution characteristics.
Feature | Unit | Min | Max | Median | IQR
Horizontal length | m | 700 | 1250 | 1111.429 | 181.242
Sandstone length | m | 600 | 1200 | 913.5834 | 223.9088
Sand body drilling encounter rate | % | 15 | 95 | 52.90506 | 27.40792
Porosity | % | 4.8 | 10.1 | 7.016027 | 2.084972
Water saturation | % | 22 | 80 | 59.9397 | 24.22778
Total fracturing liquid volume | m³ | 4200 | 12000 | 6851.842 | 2163.333
Total proppant mass | t | 260 | 980 | 511.2991 | 207.9189
Fluid intensity/sandstone length | m³/m | 5.5 | 12 | 8.264225 | 1.954716
Proppant intensity/sandstone length | t/m | 0.35 | 1.02 | 0.563697 | 0.188821
90-day cumulative gas | 10⁴ m³ | 120 | 820 | 400 | 220
Table 4. Error evaluation results of stacking models composed of different meta-regressors and base regressors.
Combination of Base Learners | Meta-Learner | Ensemble Model | MAE | RMSE | MAPE (%) | R²
Bagging + LightGBM + Back Propagation + SVR | SVR | Stacking #1 | 47.339 | 76.781 | 16.987 | 0.736
Random Forest + XGBoost + BP Neural Network + SVR | SVR | Stacking #2 | 44.809 | 67.388 | 15.694 | 0.83
Bagging + XGBoost + Back Propagation + SVR | SVR | Stacking #3 | 38.06 | 64.613 | 15.132 | 0.923
Random Forest + LightGBM + Back Propagation + SVR | SVR | Stacking #4 | 53.117 | 76.571 | 19.701 | 0.701
Bagging + LightGBM + Back Propagation + SVR | Random Forest | Stacking #5 | 55.256 | 76.369 | 19.614 | 0.695
Random Forest + XGBoost + Back Propagation + SVR | Random Forest | Stacking #6 | 46.442 | 68.548 | 16.341 | 0.759
Bagging + XGBoost + Back Propagation + SVR | Random Forest | Stacking #7 | 52.793 | 76.781 | 19.748 | 0.702
Random Forest + LightGBM + Back Propagation + SVR | Random Forest | Stacking #8 | 56.602 | 77.191 | 21.435 | 0.654
Bagging + Random Forest + AdaBoost + GBRT + XGBoost + LightGBM + Back Propagation + GRNN + SVR | SVR | Stacking #9 | 44.425 | 64.703 | 15.538 | 0.889
Bagging + Random Forest + AdaBoost + GBRT + XGBoost + LightGBM + Back Propagation + GRNN + SVR | Random Forest | Stacking #10 | 45.521 | 68.091 | 15.914 | 0.794
Table 5. Stacking #3, 9-fold cross-validation.
Metric | Fold1 | Fold2 | Fold3 | Fold4 | Fold5 | Fold6 | Fold7 | Fold8 | Fold9 | Mean | 95% CI
MAE | 38.18 | 36.65 | 38.54 | 40.64 | 36.42 | 36.42 | 40.77 | 38.83 | 35.86 | 38.03 | [36.62, 39.45]
RMSE | 65.86 | 65.85 | 68.54 | 62.58 | 62.33 | 60.98 | 65.47 | 66.07 | 62.21 | 64.43 | [62.52, 66.34]
MAPE (%) | 15.25 | 14.62 | 16.54 | 14.26 | 14.9 | 15.58 | 14.83 | 15.67 | 14.79 | 15.16 | [14.63, 15.69]
R² | 0.921 | 0.94 | 0.925 | 0.917 | 0.932 | 0.916 | 0.927 | 0.91 | 0.915 | 0.923 | [0.915, 0.930]
Table 6. Prediction error ranking for the stacking models.
Model | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10
Avg. rank | 6.4 | 3.95 | 2.95 | 6.95 | 7.2 | 5.1 | 6.25 | 6.75 | 3.8 | 5.6
Table 7. Common approach contrast.
Benchmark | Typical Inputs | When Usable | Output | Strengths | Limitations in Our Setting
DCA/RTA (Arps-type, rate-transient) | Post-frac production rate/pressure history | Post-job (weeks–months of data) | Short-/long-term forecast, reserves | Simple, standard, well-understood | Not available pre-job; early-time bias; sensitive to shut-ins/operational noise
Analytical post-frac proxies (e.g., SRV-based, half-length models) | Net pay, frac geometry, kh, μ, Δp | Pre-/post-job | AOF/PI, early-time rates | Fast, physics-transparent | Require idealized geometry; limited heterogeneity/stress-shadow handling
Numerical simulators (P3D/3D frac-reservoir) | Detailed rock/DFN, cluster schedule, fluid | Pre-job (with calibrated inputs) | Rate/pressure transients | Rich physics, what-if studies | Heavy calibration; assume cluster efficiency and uniform leakoff; runtime/cost
This work: Stacking (with LIME/SHAP) | Geology/mechanics + design features (pre-job) | Pre-job and post-job screening | 90-day cumulative gas + explanations | Data-efficient; fast; explanations consistent with physics; robust to operational variability | Predictive scope tied to training domain; does not replace full-physics what-if analysis
Table 8. Modeling parameters for segmented fracturing of horizontal wells.
Parameter | Value
Grid size/(m × m) | 2000 × 500
Matrix permeability/μm² | 10⁻⁴
Inner diameter of wellbore/mm | 62
Rock compressibility coefficient/MPa⁻¹ | 7.39 × 10⁻⁴
Fracture porosity | 0.50
Formation temperature/K | 373.15
Fracture permeability/D | 100
Formation pressure/MPa | 33.13
Rock porosity | 0.09
Horizontal segment length/m | 1500
Gas viscosity/(mPa·s) | 11.07 × 10⁻³
Natural gas density of reservoir/(g·cm⁻³) | 0.72 × 10⁻³