3.1. Construction of a Production Prediction Model Driven by Hybrid Physical Mechanisms
Production optimization aims at dynamic adaptation and precise quantification within an integrated geology–engineering framework. This requires a deep coupling among geological reservoir potential, engineering stimulation effectiveness, and the dynamic evolution of production performance. Therefore, it is necessary to characterize the complex pore structure and seepage behavior of shale reservoirs based on fractal theory, and to integrate hydraulic fracturing engineering mechanisms for production prediction. The flowchart of the hybrid physics-driven production prediction model is shown in Figure 2.
First, core experimental data from the L Formation shale gas reservoir (porosity, permeability, total organic carbon (TOC) content, and brittleness index) were collected, together with logging data (GR, RT, AC, DEN), seismic interpretation results (reservoir thickness, burial depth, fault development characteristics), hydraulic fracturing operational parameters (injection rate, sand ratio, fracturing fluid volume, perforation parameters), and production dynamic data (daily gas production, bottom-hole flowing pressure, cumulative gas production). After data cleaning (outlier removal and missing value imputation), normalization (to eliminate dimensional inconsistencies), and spatiotemporal matching of geological and engineering parameters, a total of 120 coupled geology–engineering–production samples were obtained, covering different burial depths, fracturing scales, and production stages. These samples were constructed entirely from field-acquired and field-interpreted multi-source data and therefore represent matched real-well observations rather than numerical-simulation-generated samples. For clarity, the synthetic fracture data and SMOTE-generated samples introduced later in Section 3.2 were used only for auxiliary training enhancement and domain adaptation, and were not included in the original 120-sample dataset.
Feature parameter selection was designed to account for both the static endowment of the geological reservoir and the dynamic response of engineering stimulation and production processes. Static parameters mainly represent the fundamental geological conditions and baseline engineering attributes, including reservoir static parameters (porosity, permeability, TOC content, brittleness index, clay mineral content, reservoir thickness, burial depth, and formation pressure coefficient) and engineering static parameters (fracturing stage length, perforation density, total proppant volume, total fluid volume, and peak injection rate). Dynamic parameters characterize the dynamic interactions among the reservoir, fluid flow, and stimulated reservoir volume during production, including production dynamic parameters (daily gas production, bottom-hole flowing pressure, casing pressure, and production pressure drawdown) and post-fracturing dynamic response parameters (instantaneous flowback rate, total dissolved solids of flowback fluid, and proppant flowback concentration).
Considering the characteristics of the dataset—namely high dimensionality (32 static and dynamic parameters), strong coupling (e.g., significant correlations between porosity and permeability, and between proppant volume and fracture conductivity), temporal dependency (dynamic parameters evolving continuously with production time), and localized nonlinearity caused by reservoir heterogeneity—a hybrid physics-driven CNN–LSTM modeling framework was constructed. Specifically, the convolutional neural network (CNN) was employed to extract local features and uncover latent coupling relationships between high-dimensional geological and engineering static parameters, such as the influence of the matching between brittleness index and proppant volume on fracture propagation. Meanwhile, the long short-term memory (LSTM) network was utilized to capture temporal dependencies in dynamic parameters and to model their time-dependent driving effects on production performance. Furthermore, fractal seepage governing equations—such as fractal porous-media permeability models and fractal fracture conductivity formulations—were embedded as physical constraints, enabling a deep integration of data-driven learning and physical mechanisms, thereby improving the accuracy and generalization capability of production prediction.
A random forest of 200 decision trees was constructed as the feature association model, and permutation importance was employed to evaluate the influence of fractal parameters on production performance.
The importance scores in Table 1 were calculated using permutation importance based on the trained random forest model with 200 decision trees. For each feature, its values were randomly shuffled while the remaining features were kept unchanged, and the corresponding decrease in model predictive performance was recorded; the final score was defined as the normalized average performance degradation over repeated permutations. It should be noted that the selection of the top three parameters was not based on a claim of statistical significance between the third- and fourth-ranked variables. Rather, these three variables were retained as a compact core subset because they ranked highest overall and were most directly related to the fracture-dynamic characterization framework of this study, whereas the remaining variables were treated as supporting explanatory factors.
The three parameters with the highest importance scores were selected as the focus of this study, namely the dynamic fractal dimension, fractal conductivity ratio, and adsorption hysteresis coefficient.
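The shuffle-and-score procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: `toy_model` is a hypothetical stand-in for the trained 200-tree random forest, mean squared error is assumed as the performance metric, and the data are synthetic.

```python
import random

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Normalized mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    raw = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace only column j; all other features stay unchanged.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(mse(shuffled) - baseline)
        # Average degradation over the repeats, clipped at zero.
        raw.append(max(sum(drops) / n_repeats, 0.0))
    total = sum(raw)
    return [r / total if total > 0 else 0.0 for r in raw]


# Hypothetical stand-in model: strong dependence on feature 0,
# weak dependence on feature 1, none on feature 2.
def toy_model(row):
    return 3.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
```

Because the scores are normalized to sum to one, they support a ranking of features but not a significance test between neighboring ranks, which is consistent with the caveat stated for Table 1.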
By extracting the decision paths of high-production samples (above the 80th percentile), key governing rules were identified:
During data cleaning and outlier detection, an improved wavelet-threshold denoising method was applied to suppress data noise, and the isolation-forest algorithm was used to identify anomalous samples. Subsequently, dynamic time warping was employed for temporal alignment of time-series data, while inverse-distance-weighting interpolation was adopted to transform discrete microseismic event clouds into continuous fracture-density fields. Fractal descriptors such as dominant orientation and fractal dimension were then extracted and mapped to the equivalent fractal fracture model to update the permeability tensor and related dynamic parameters, thereby completing the spatiotemporal alignment of multi-source data and providing the basis for subsequent history matching and dynamic parameter inversion.
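Two of the alignment steps named above, dynamic time warping and inverse-distance weighting, can be illustrated with textbook versions. The sketch below is hedged: the function names, the absolute-difference local cost, and the power-2 weighting are assumptions chosen for illustration, and the improved wavelet denoising and isolation-forest steps are not reproduced.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic-time-warping cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def idw_interpolate(points, values, query, power=2.0):
    """Inverse-distance-weighted estimate at `query` from scattered samples."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, query)
        if d == 0.0:
            return v  # query coincides with a sample location
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

# Two series that differ only by a repeated sample align at zero cost.
aligned_cost = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
# Hypothetical per-event attribute values interpolated at a grid node.
node_value = idw_interpolate([(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0], (1.0, 0.0))
```

Evaluating `idw_interpolate` at every node of a regular grid yields the kind of continuous field that the text refers to as a fracture-density field.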
3.2. Development of a Machine Learning Prediction Model Driven by Fractal Theory and Physical Mechanisms
This study proposes an intelligent analysis framework for shale gas development that integrates fractal theory with physical mechanisms. By employing dynamic fractal fracture characterization, a time-varying mathematical model of complex fracture networks is established. This model is combined with a physics-constrained machine learning architecture [24, 25] comprising three core modules: (i) an LSTM embedded with seepage governing equations for dynamic production prediction, (ii) a fractal-preference random forest for feature association analysis, and (iii) a three-dimensional fractal convolutional network for fracture identification. On this basis, a multi-objective optimization strategy is constructed to enable coordinated decision-making for dynamic production regulation and decline management under fracturing-constrained reservoir conditions. In this study, hydraulic fracturing design parameters were incorporated as static engineering inputs for feature association and fracture characterization, whereas the final optimization variables were restricted to post-fracturing operational controls, namely bottom-hole flowing pressure, flowback rate, and production regime switching time. The proposed framework employs fractal dimension as a unifying linkage throughout the entire workflow, encompassing data preprocessing, feature engineering, and model training. Based on numerical simulation models, an iterative optimization chain of “dynamic prediction–feature association–fracture identification” is formed. Through dynamic updating of fractal parameters and coupled multi-physics-field solutions, the framework drives the closed-loop evolution of development strategies, supporting a shift in shale gas development from experience-driven practices toward an intelligent mechanism–data fusion approach. This work provides theoretical tools and methodological support for the efficient development of unconventional oil and gas resources.
(1) Design of Core Modules and Collaborative Mechanisms
a. Dynamic Prediction Module (Seepage-Equation-Embedded LSTM)
To address the temporal evolution characteristics of production dynamic parameters, fractal-corrected seepage governing equations—including the equivalent permeability tensor formulation and the fractal fracture conductivity attenuation model—were embedded into the training process of a long short-term memory (LSTM) network as physical constraints. Through the forget gate, input gate, and output gate mechanisms of the LSTM architecture, the temporal dependencies of dynamic variables such as daily gas production and bottom-hole flowing pressure were effectively captured. Meanwhile, a physics-based loss function was introduced to constrain the network outputs to comply with fractal seepage laws, thereby preventing time-series predictions from deviating from engineering realities.
The total training loss is formulated as
$$L_{\mathrm{total}} = \lambda_{d} L_{\mathrm{data}} + \lambda_{p} L_{\mathrm{phys}} + \lambda_{ic} L_{\mathrm{ic}} + \lambda_{bc} L_{\mathrm{bc}}$$
where $L_{\mathrm{data}}$ denotes the data-misfit term, $L_{\mathrm{phys}}$ denotes the physics-residual term, $L_{\mathrm{ic}}$ denotes the initial-condition consistency term, and $L_{\mathrm{bc}}$ denotes the boundary/control-condition consistency term. In this study, $L_{\mathrm{data}}$ is used to fit the observed production variables, including daily gas production and bottom-hole flowing pressure. $L_{\mathrm{phys}}$ is constructed from the fractal seepage governing equation together with the dynamic fracture-conductivity relationship, so as to penalize predictions that violate the physical evolution law of the reservoir-fracture system. $L_{\mathrm{ic}}$ is used to constrain the prediction to match the prescribed initial production state, and $L_{\mathrm{bc}}$ is used to enforce consistency with the corresponding operational control conditions, including bottom-hole flowing pressure, flowback rate, and regime-switching settings. After normalization of the different loss components, the weighting coefficients $\lambda_{d}$, $\lambda_{p}$, $\lambda_{ic}$, and $\lambda_{bc}$ are introduced as balancing hyperparameters during training, where $\lambda_{d}$ is taken as the reference coefficient for data fitting and the remaining coefficients are tuned on the validation set to balance predictive accuracy and physical consistency.
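How the four weighted terms combine can be sketched as follows. This is an illustration only: the physics residual uses a placeholder exponential-decline relation, dq/dt + k q = 0, rather than the fractal seepage governing equation, and the default weights, function names, and sample values are all assumptions.

```python
def mse(pred, obs):
    """Mean squared error between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)

def physics_residual(q, dt, k):
    """Mean squared finite-difference residual of the PLACEHOLDER law dq/dt + k*q = 0."""
    res = [(q[i + 1] - q[i]) / dt + k * q[i] for i in range(len(q) - 1)]
    return sum(r * r for r in res) / len(res)

def total_loss(q_pred, q_obs, q_init, p_pred, p_ctrl, dt, k,
               lam_d=1.0, lam_p=0.1, lam_ic=0.1, lam_bc=0.1):
    """Weighted sum of data, physics, initial-condition and control terms."""
    l_data = mse(q_pred, q_obs)             # fit to observed production
    l_phys = physics_residual(q_pred, dt, k)  # violation of the assumed law
    l_ic = (q_pred[0] - q_init) ** 2        # mismatch with the initial state
    l_bc = mse(p_pred, p_ctrl)              # mismatch with prescribed controls
    return lam_d * l_data + lam_p * l_phys + lam_ic * l_ic + lam_bc * l_bc

# A rate series that follows the placeholder law exactly (k = 0.1, dt = 1).
q_exact = [10.0, 9.0, 8.1]
pressures = [20.0, 19.0, 18.0]
loss_consistent = total_loss(q_exact, q_exact, 10.0, pressures, pressures, 1.0, 0.1)
# Same data fit target, but the prediction now breaks the assumed law.
loss_violating = total_loss([10.0, 9.5, 8.1], q_exact, 10.0, pressures, pressures, 1.0, 0.1)
```

Fixing `lam_d` at 1.0 mirrors the text's choice of the data-fitting weight as the reference coefficient, with the other three weights left as tunable arguments.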
The core input features of this module include key fractal parameters such as the dynamic fractal dimension, fractal conductivity ratio, and adsorption hysteresis coefficient, together with dynamic response parameters including production pressure drawdown and flowback rate.
b. Feature Association Module (Fractal-Preference Random Forest)
A random forest model consisting of 200 decision trees was constructed to identify feature combinations with significant impacts on production performance using permutation importance analysis. The results indicate that the dynamic fractal dimension, fractal conductivity ratio, and adsorption hysteresis coefficient constitute the top three most influential features. This module focuses on uncovering nonlinear associations between static parameters (e.g., porosity, total organic carbon (TOC) content, and fracturing stage length) and fractal characteristics. Decision rules extracted from high-production samples—for instance, optimal production achieved when the fractal dimension falls within the range of 1.52–1.68 and matches a high proppant volume—provide targeted feature-weight allocation for the CNN module, thereby enhancing the representation of critical geology–engineering coupling information.
c. Fracture Identification Module (3D Fractal Convolutional Network)
To address the discreteness of microseismic monitoring data, a three-dimensional convolutional neural network was employed to extract fractal features of fracture spatial distribution through 3D convolution operations. Combined with the global fractal dimension calculated using the box-counting method, discrete microseismic event point clouds were transformed into a continuous fractal stimulated reservoir volume (F-SRV) representation. In this study, the fractal stimulated reservoir volume (F-SRV) represents the effective stimulated reservoir space associated with the fracture network under the fractal representation. The reference F-SRV used for validation is obtained from microseismic-interpreted stimulated fracture volume, while the model-predicted F-SRV is derived from the corresponding fracture-space representation generated by the EFF or DFN model under the same reservoir conditions. The outputs of this module—including fracture density fields and spatial distributions of fractal dimensions—are used to dynamically update the equivalent permeability tensor field, providing real-time reservoir seepage condition parameters for the LSTM module. This enables dynamic coupling between fracture evolution and production prediction.
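The box-counting estimate of a global fractal dimension mentioned above can be illustrated on a point cloud. The sketch assumes cubic boxes anchored at the origin and an ordinary least-squares fit of log N(s) against log(1/s); the box sizes and the two synthetic clouds are hypothetical and stand in for microseismic event locations.

```python
import math

def box_count(points, size):
    """Number of cubic boxes of edge `size` containing at least one point."""
    return len({tuple(math.floor(c / size) for c in p) for p in points})

def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(s) versus log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

sizes = [0.5, 0.25, 0.125, 0.0625]
# Sanity checks on synthetic clouds: a densely sampled line and a planar grid.
line_cloud = [(i / 1000.0, 0.0, 0.0) for i in range(1000)]
plane_cloud = [(i / 50.0, j / 50.0, 0.0) for i in range(50) for j in range(50)]
d_line = box_counting_dimension(line_cloud, sizes)
d_plane = box_counting_dimension(plane_cloud, sizes)
```

A line-like cloud yields a slope near 1 and a sheet-like cloud a slope near 2, so intermediate values indicate partially space-filling fracture geometry. With real event clouds the estimate depends on the chosen range of box sizes, which should stay above the event spacing and below the cloud extent.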
In the present framework, the mapping from microseismic observations to permeability is implemented as a constrained sequential process. Discrete microseismic event clouds are first denoised, temporally aligned, and spatially interpolated into a continuous fracture-density field. Fractal descriptors extracted from this field are then used to characterize the corresponding F-SRV, and the resulting fracture-density and fractal-geometry information is subsequently mapped to the equivalent permeability tensor through the fractal permeability formulation introduced in Section 2.2. It should be noted that the current inversion is regularized by feasible fractal-dimension ranges, fracture-spacing constraints, fracture-conductivity retention constraints, and production-history matching.
For equivalent fractal fractures, the following constraints are imposed on the optimization model to ensure fracture effectiveness:
(1) Fracture Complexity Constraint
The fractal dimension must satisfy the feasibility requirements of hydraulic fracturing operations:
(2) Fracture Spacing Non-Interference Constraint
Based on fractal percolation theory, inter-stage stress interference is avoided:
where $L_f$ denotes the fracture half-length, and $D_c$ denotes a critical connectivity threshold in the fractal-percolation-based description of the fracture network. It is used to characterize the minimum fractal-connectivity condition required for effective fracture communication under the equivalent fractal representation. In this study, $D_c$ is introduced as a physically constrained threshold parameter in the fracture-spacing non-interference condition, rather than as an independent optimization variable.
(3) Fracture Conductivity Degradation Constraint
A constraint is imposed on the retention of fracture conductivity after five years of production:
To address the optimization challenges in shale gas horizontal well development, which are characterized by multiple objectives, multiple constraints, and high-dimensional decision spaces, this section proposes an improved multi-objective intelligent optimization algorithm guided by fractal theory. Through dynamic parameter adjustment, hybrid strategy integration, and enhanced computational efficiency, production scheme optimization is achieved [28, 29].
(2) Model Training Optimization Strategies
a. Domain-Adaptive Fine-Tuning
To mitigate the distribution discrepancy between synthetic fracture data and real microseismic monitoring data, a domain adaptation approach was adopted. Specifically, the maximum mean discrepancy (MMD) distance was minimized using data from 50 real wells. By employing a Gaussian kernel function, the feature distribution divergence between the source domain (synthetic data) and the target domain (real data) was reduced, thereby improving the model’s adaptability to actual shale gas reservoirs. The core formulation is expressed as:
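For reference, the textbook (biased) estimator of squared MMD with a Gaussian kernel averages the kernel within the source sample, within the target sample, and across the two, then combines them as source-source plus target-target minus twice source-target. The sketch below implements that standard estimator; the bandwidth `sigma` and the toy feature vectors are assumptions, and it is not claimed to match the exact formulation used in this study.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of squared maximum mean discrepancy."""
    def mean_kernel(A, B):
        return sum(gaussian_kernel(a, b, sigma) for a in A for b in B) / (len(A) * len(B))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Hypothetical feature vectors: synthetic-domain features and a shifted copy
# standing in for real-well features.
synthetic = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2]]
real_like = [[x + 1.0, y + 1.0] for x, y in synthetic]
mmd_same = mmd_squared(synthetic, synthetic)
mmd_shifted = mmd_squared(synthetic, real_like)
```

In a fine-tuning loop this quantity would be evaluated on network feature embeddings and added to the training loss, so that minimizing it pulls the two feature distributions together.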
b. Positive–Negative Sample Balancing
To address model bias caused by the scarcity of low-production well samples, the Synthetic Minority Over-sampling Technique (SMOTE) was employed to generate synthetic samples for the minority class (low-production wells). New feature vectors were constructed using a random interpolation strategy, thereby improving class balance and enhancing the robustness of the model:
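The random interpolation step of SMOTE places each synthetic sample on the segment between a minority sample and one of its nearest minority neighbors, at a uniformly drawn fraction of the distance. A minimal sketch follows, assuming Euclidean distance, `k = 2` neighbors, and toy two-feature vectors; the function name `smote_like` is hypothetical, and a production workflow would more likely rely on an established library implementation.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples by neighbor interpolation."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen base sample.
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: dist2(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # uniform interpolation fraction in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical normalized feature vectors of low-production wells.
low_producers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_like(low_producers, n_new=5, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, the generated features stay inside the range spanned by the observed low-production wells.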
c. Bayesian Hyperparameter Optimization
Considering the high computational cost of model training and the complex relationship between hyperparameters and model performance, Bayesian optimization was selected as the core optimization strategy. A Gaussian process surrogate model was employed to efficiently search for the optimal combination of hyperparameters. Meanwhile, constraints on fractal dimension (1.2–1.8) and geological–engineering rules (such as the fracture spacing non-interference condition) were embedded into the optimization process to ensure the physical compatibility of the optimized hyperparameters. This strategy significantly reduces trial-and-error costs during model training and improves overall training efficiency.
d. Design of Multi-Dimensional Optimization Objectives
To achieve multi-objective optimization in shale gas horizontal well development, cumulative gas production, stable production period, and fractal stimulated reservoir volume were taken as the objectives of the optimization model:
(1) Maximization of Cumulative Production
(2) Extension of the Stable Production Period
The stable production period is defined as the duration during which the annual production decline rate is less than 20%:
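The 20% criterion can be made concrete with a short calculation. The sketch below rests on two assumptions not stated in the text: the decline rate is taken as the year-on-year relative drop in annual production, and the stable period is counted as the consecutive years from the start of production until the threshold is first reached. The production figures are hypothetical.

```python
def stable_period_years(annual_production, max_decline=0.20):
    """Consecutive years, from the start, with year-on-year decline below the threshold."""
    years = 0
    for prev, curr in zip(annual_production, annual_production[1:]):
        decline = (prev - curr) / prev
        if decline >= max_decline:
            break  # first year that fails the stability criterion
        years += 1
    return years

# Hypothetical annual gas volumes; declines are 10%, about 11%, then 25%.
stable_years = stable_period_years([100.0, 90.0, 80.0, 60.0, 55.0])
```

Here the third transition (80 to 60) is a 25% decline, so the stable period ends after two year-on-year intervals.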
(3) Optimization of Fractal Stimulated Reservoir Volume (F-SRV)
The effective stimulated region is characterized by the dynamic evolution of the fractal dimension:
The three engineering indicators in Equations (19)–(21), namely cumulative production, stable production period, and fractal stimulated reservoir volume (F-SRV), are used to characterize the production, decline, and stimulated-volume responses of candidate operating strategies. In the subsequent case study, these indicators are not treated as separate final economic objectives; instead, they are translated into time-phased production and control profiles, and the ultimate optimization criterion is the net present value (NPV).
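The step from a time-phased production profile to NPV can be illustrated with the standard discounted cash-flow sum. All economic inputs below (gas price, annual operating cost, capital cost, discount rate) are hypothetical placeholders, and end-of-year discounting is assumed; the case study's actual economic parameters are not reproduced here.

```python
def net_present_value(gas_by_year, gas_price, opex_per_year, capex, discount_rate):
    """Discounted sum of annual cash flows minus the upfront capital cost."""
    npv = -capex
    for year, volume in enumerate(gas_by_year, start=1):
        cash_flow = volume * gas_price - opex_per_year
        # Discount each year's cash flow back to time zero.
        npv += cash_flow / (1.0 + discount_rate) ** year
    return npv

# Hypothetical two-year profile with unit price and a 10% discount rate.
npv_example = net_present_value([100.0, 100.0], gas_price=1.0,
                                opex_per_year=0.0, capex=50.0, discount_rate=0.10)
```

Two candidate operating strategies with the same cumulative production can therefore rank differently on NPV when one of them delivers its gas earlier.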