2.1. Simulation of Fracturing Process
The complex oilfield environment poses many challenges to the collection of fracturing operation data. On the one hand, equipment deployment is difficult and maintenance costs are high; on the other hand, strict safety and environmental protection regulations prevent data collection from being carried out frequently. In addition, problems such as fragmentation and inconsistent formats of historical data further exacerbate the shortage of available data. To obtain sufficient usable data, fracturing simulation software can be used to simulate the fracturing operation process and generate data such as fracture parameters and SRV. Conventional fracturing software, such as StimPlan and MFrac Suite, often adopts a linear elastic deformation mechanism, which frequently causes the fracture parameters derived from software simulations to deviate from actual measured values. In contrast, the GOHFER software simulates the fracturing process based on physical assumptions that better align with real-world rock failure and fracture propagation, such as the shear-slip decoupled fracturing mechanism and the process zone stress (PZS) fracture criterion. Compared with the linear elastic deformation mechanism used in traditional fracturing simulation software, the shear-slip decoupled fracturing mechanism more realistically depicts the dynamic coupling between rock shear deformation and sliding separation during fracture propagation; especially in heterogeneous reservoirs or complex stress fields, it can effectively capture irregular fracture paths and morphological changes that the linear elastic mechanism struggles to reflect. The PZS fracture criterion breaks through the simplified assumption that relies solely on stress concentration at the fracture tip: it accounts for the stress distribution of the process zone around the fracture tip and judges fracture propagation conditions based on a nearly constant stress threshold within this zone. This better conforms to the mechanical response of the process zone during actual rock fracture and can more accurately quantify the critical states of fracture initiation and propagation, particularly in ductile rocks or dynamic fracturing scenarios. As a result, GOHFER can more precisely simulate fracture propagation, pressure response, and fluid flow behavior. Additionally, GOHFER has an advantage in fracture height control accuracy, mainly because it fully accounts for multiple fracture-height constraint mechanisms, including interlayer slip, natural fractures, and plastic deformation. In conclusion, this study selects the GOHFER software to simulate the fracturing operation process.
This study conducts fracturing simulation based on on-site geological, engineering, and other actual data from Block Ma 2 of the Mabei Oilfield, located at the northwestern margin of the Junggar Basin. GOHFER simulates the fracturing operation process through the following steps. First, based on the basic parameters of the T1b2 interval of the Baikouquan Formation in Block Ma 2, such as an average porosity of approximately 8%, an average permeability of about 0.5 mD, and a reservoir pressure of around 52 MPa, combined with actual logging data from a fractured well (Well X) in this interval, the formation rock mechanical parameters, including Young's modulus, Poisson's ratio, and fracture toughness, are obtained through inversion using elastic wave theory formulas, thereby constructing a 3D geological model of the target interval. Second, we integrate information such as wellbore trajectory, perforation cluster positions, and completion structure to establish a production well model. Subsequently, we input on-site pumping data such as fracturing fluid type, pumping rate, and pressure to drive the software's fluid–structure interaction numerical simulation engine [22,23]. In addition, GOHFER fully considers the heterogeneity of the geological model and the dynamic interference caused by the stress shadow effect between perforation clusters [24,25]. After the simulation of the fracturing process is completed, the software outputs parameters such as SRV, fracture geometric parameters, and production performance for each fractured interval. By simulating multiple combinations of geological conditions, well conditions, and pumping schedules, the software generates 12,000 sets of data including fracture parameters and SRV, effectively addressing the issue of insufficient on-site data.
2.2. Parameter Sensitivity Analysis
Since hydraulic fracture propagation is a complex multi-factor coupled process involving reservoir geological conditions and fracturing operation parameters, it is comprehensively influenced by geological parameters such as Young's modulus, formation water saturation, and reservoir temperature, as well as engineering parameters such as pumping pressure, pumping rate, and sand concentration. If all potential parameters are directly incorporated into the model inputs, the curse of dimensionality is likely to arise. This not only increases the computational complexity of model training and extends the convergence time but may also lead to overfitting due to interference from redundant features, thereby reducing the prediction accuracy and generalization ability for fracture propagation. Therefore, it is necessary to conduct parameter sensitivity analysis to identify the key input features of the model. In addition, since SRV is largely determined by fracture geometric parameters such as fracture length and fracture width, the sensitivity analysis is carried out only on fracture geometric parameters, so as to screen out the key parameters that significantly influence them.
The parameter sensitivity analysis in this study is conducted based on the single-factor control variable method combined with the GOHFER software. In each analysis, only the target input parameter is set as a variable, while the remaining parameters are fixed to the on-site measured values of Block Ma 2 in the Mabei Oilfield. By observing the variation amplitude of the fracture geometric parameters output by the GOHFER software, the sensitivity of fracture propagation to this parameter is evaluated, and the key input parameters are finally screened out. Based on the on-site hydraulic fracturing operation experience in Block Ma 2 of the Mabei Oilfield, 10 representative candidate parameters were initially screened, covering two major categories (geological and engineering). The details are shown in Table 1.
Sensitivity analysis was conducted on these candidate parameters, with the specific steps as follows. For each candidate parameter, 10 gradient levels were set, uniformly distributed over a range consistent with on-site practical conditions; the gradients of all parameters were set according to this principle. The 10 gradient values of a given target parameter were input sequentially into the GOHFER software while the remaining parameters were held constant. Each gradient level corresponded to one independent hydraulic fracturing simulation experiment; that is, 10 simulation experiments were required for the sensitivity analysis of each parameter, so a total of 100 simulation experiments were needed to cover all parameters. The fracture length, fracture width, and fracture height output from each experiment were recorded, and their variation patterns were analyzed to quantify the sensitivity of fracture propagation to each candidate parameter.
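As a sketch of this one-at-a-time sweep, the following Python snippet illustrates the logic; `run_gohfer_simulation` is a hypothetical stand-in for a configured GOHFER run (which in practice is driven through the software's own interface), and the parameter names, baseline values, and ranges are illustrative assumptions rather than field data.

```python
import numpy as np

# Hypothetical stand-in for one GOHFER simulation run; the dummy body only
# keeps the sketch executable. A real run returns fracture geometry outputs.
def run_gohfer_simulation(params):
    return {"length_m": 0.0, "width_mm": 0.0, "height_m": 0.0}

# Baseline fixed at the on-site measured values of Block Ma 2
# (numbers here are placeholders, not field data).
baseline = {"youngs_modulus_GPa": 25.0, "pumping_rate_m3_min": 10.0,
            "sand_concentration_kg_m3": 300.0}

# 10 uniformly spaced gradient levels per candidate parameter (assumed ranges).
candidate_grids = {
    "youngs_modulus_GPa": np.linspace(15.0, 40.0, 10),
    "pumping_rate_m3_min": np.linspace(6.0, 14.0, 10),
}

sensitivity = {}
for name, grid in candidate_grids.items():
    # Vary one parameter at a time; all other parameters stay at baseline.
    outputs = [run_gohfer_simulation({**baseline, name: float(v)}) for v in grid]
    lengths = np.array([o["length_m"] for o in outputs])
    # The variation amplitude of each geometry output measures sensitivity.
    sensitivity[name] = lengths.max() - lengths.min()
```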
2.3. Data Processing
Pumping time-series data such as pumping pressure, pumping rate, and sand concentration generated during fracturing operations serve as the key link connecting fracturing operation parameters and the physical process of fracture propagation. Therefore, it is necessary to collect pumping data, which can be done by deploying high-precision sensors such as pressure transducers, flow meters, and densitometers at key locations, including pump outlets, wellhead manifolds, and fracturing pipelines. During on-site fracturing operations, accidents such as pump group vibration, equipment failure, signal transmission interruption, and construction operation errors may occur, leading to discontinuous data collection with missing values or outliers. If the field-collected data are used directly to train the neural network model, the model will learn pseudo-features, causing prediction results to deviate from reality. Therefore, preprocessing of the original data is required to ensure data integrity and accuracy. This paper employs the workflow shown in Figure 1 to preprocess the original data: first, the k-nearest neighbor (KNN) algorithm is used to fill missing values to ensure data integrity; second, the 3-sigma method is adopted to identify outliers to guarantee data accuracy; finally, the binomial feature transformation method is used to generate feature combinations to enhance data diversity. This study conducts data preprocessing based on actual fracturing data collected from multiple fractured wells in Block Ma 2 of the Mabei Oilfield. The pumping data processed by the KNN algorithm and the 3-sigma method in Figure 1 are all drawn from on-site pumping data in this block. Furthermore, in the process of generating feature combinations via the binomial feature transformation method shown in Figure 1, the features involved, such as x and y, each correspond to specific types of on-site pumping data, e.g., pumping pressure and pumping rate.
The KNN algorithm is primarily used for classification and is a supervised learning algorithm. Its basic principle is to classify new samples by calculating the distance between samples, essentially making category judgments based on similarity in the sample space [26]. When using the KNN algorithm to fill missing values in pumping data, it is first necessary to screen samples with complete features (such as pumping pressure, pumping rate, and sand concentration) from the original data as the known reference dataset, while the remaining samples with missing values are marked as the target dataset to be filled. Then, the distance between the target sample and the reference samples is calculated; the Euclidean distance is typically used, evaluated only over the non-missing feature dimensions of the target sample. A smaller distance indicates greater similarity between samples, which quantifies the degree of difference between the target sample and the reference samples. The specific calculation process is as follows:

$$d(x, y) = \sqrt{\sum_{i}\left(x_i - y_i\right)^{2}}$$

where $x_i$ and $y_i$ represent the i-th feature value of the target sample and the reference sample, respectively, and the sum runs only over the non-missing feature dimensions. For instance, when the pumping pressure value of the target sample is missing, the distance to the reference sample is calculated based solely on pumping rate and sand concentration, ignoring the pumping pressure feature.
Finally, a reasonable K value needs to be determined; the basic process is shown in Figure 2. The optimal K value can be found effectively through cross-validation and grid search, as follows. First, approximately 10% of missing values are randomly introduced into the known reference data to simulate actual missing scenarios. Next, multiple candidate K values are selected. The selection of K values should give priority to odd numbers to avoid voting ties when samples have the same distance, and it needs to balance algorithm stability and feature-capture ability: if K is too small, the algorithm is susceptible to local noise; if K is too large, the data will be over-smoothed and local features neglected [27]. Then, 5-fold cross-validation is performed for each candidate K value. The reference dataset containing simulated missing values is divided into 5 equal parts; each time, one part is selected as the validation set and the remaining 4 parts form the training set. For each candidate K value, the KNN model is trained using the training set, the trained model is used to fill the missing values in the validation set, and the mean squared error (MSE) between the filled values and the true values is calculated. After completing the 5-fold cross-validation, the average MSE of each K value across all folds is computed, and the K value with the smallest average MSE is selected as the optimal value for filling actual missing values. After completing the above steps, the missing values in the actual target dataset are filled. For each target sample, K nearest-neighbor samples are selected from the known reference dataset based on the Euclidean distance and the optimal K value. If the target sample has only one missing feature, for example only the pumping pressure value is missing, the pumping pressure values of these K nearest neighbors are extracted, weights are assigned based on the distances between samples (the closer the distance, the higher the weight), and the filling value is calculated through weighted averaging. If the target sample has multiple missing features, the above process is performed separately for each missing feature.
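A minimal sketch of this imputation workflow, using scikit-learn's KNNImputer (which computes Euclidean distances over non-missing dimensions and supports distance-weighted averaging) on synthetic placeholder data, might look as follows:

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
# Complete reference samples: columns = pumping pressure, rate, sand conc.
# (synthetic placeholder data; field data would be used in practice).
X = rng.normal(loc=[50.0, 10.0, 300.0], scale=[5.0, 1.0, 30.0], size=(500, 3))

def mask_entries(data, frac=0.10, seed=1):
    """Randomly hide ~frac of entries to simulate missing values."""
    out = data.copy()
    m = np.random.default_rng(seed).random(out.shape) < frac
    m[m.all(axis=1), 0] = False   # keep at least one observed feature per row
    out[m] = np.nan
    return out, m

best_k, best_mse = None, np.inf
for k in [3, 5, 7, 9, 11]:                          # odd candidates avoid ties
    fold_mse = []
    for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        X_va_masked, mask = mask_entries(X[va])
        # Distance-weighted KNN imputation; distances are computed only over
        # non-missing feature dimensions, matching the equation above.
        imp = KNNImputer(n_neighbors=k, weights="distance").fit(X[tr])
        filled = imp.transform(X_va_masked)
        fold_mse.append(np.mean((filled[mask] - X[va][mask]) ** 2))
    if np.mean(fold_mse) < best_mse:
        best_k, best_mse = k, float(np.mean(fold_mse))

# The real target dataset is then filled with the selected K, e.g.:
# filled = KNNImputer(n_neighbors=best_k, weights="distance").fit(X).transform(X_target)
```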
For individual values in the pumping data that deviate significantly from the others, the 3-sigma method is used for identification and processing. The 3-sigma method leverages the properties of the normal distribution in statistics (approximately 99.7% of data falls within the µ ± 3σ range) to screen for outliers quickly [28,29]. The first step is to calculate the mean of each parameter in the pumping data to reflect its central tendency, and then its standard deviation. Taking pumping pressure as an example (other parameters such as pumping rate and sand concentration are calculated similarly), the specific calculation process is as follows:

$$\mu_p = \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad \sigma_p = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - \mu_p\right)^{2}}$$

where n represents the number of collected pumping pressure data points, and $p_i$ and $\mu_p$ denote the i-th pumping pressure value and the mean of all collected pumping pressures, respectively.
As shown in Figure 3, taking the pumping pressure parameter as an example, the data used are obtained from the on-site measured pumping pressure sequence of the Mabei Oilfield. According to the 3-sigma principle, normal pumping pressure values should fall within the interval [µp − 3σp, µp + 3σp], and data points outside this interval are identified as outliers. Similarly, outliers in pumping rate and sand concentration are identified using the intervals [µq − 3σq, µq + 3σq] and [µc − 3σc, µc + 3σc], respectively. Each parameter's data sequence is traversed, and each data point is compared with its corresponding normal interval; if the point falls outside the interval, it is marked as an outlier. Each outlier is replaced by the average of the two nearest data points in the time series.
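A minimal sketch of this screening-and-replacement step, applied to a synthetic pumping pressure sequence with an injected spike, could be:

```python
import numpy as np

def replace_3sigma_outliers(series):
    """Flag points outside [mu - 3*sigma, mu + 3*sigma] and replace each one
    with the average of its two nearest neighbors in the time series."""
    x = np.asarray(series, dtype=float).copy()
    mu, sigma = x.mean(), x.std()
    outliers = np.flatnonzero(np.abs(x - mu) > 3.0 * sigma)
    for i in outliers:
        left = x[i - 1] if i > 0 else x[i + 1]      # handle sequence edges
        right = x[i + 1] if i < len(x) - 1 else x[i - 1]
        x[i] = 0.5 * (left + right)
    return x

# Example on a synthetic pumping-pressure sequence (placeholder data).
pressure = 50 + 2 * np.sin(np.linspace(0, 6, 200))
pressure[120] = 90.0                                # artificial outlier
cleaned = replace_3sigma_outliers(pressure)
```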
In fracturing operations, parameters such as pumping pressure, pumping rate, and sand concentration are interrelated. Their combined relationships, such as the product of pumping pressure and pumping rate or the square of pumping pressure, contain key information about fracture propagation. Fracture propagation is inherently a nonlinear, multi-factor coupled process. By applying binomial feature transformation to generate power and interaction terms from the original features, these complex coupling relationships can be quantified into new features, thereby more effectively revealing the underlying physical laws of fracture propagation. Binomial feature transformation mainly includes univariate binomial, bivariate binomial, and full combinatorial binomial feature transformations [30]. Among them, univariate binomial feature transformation highlights the nonlinear variation law of a single feature through power operations. Bivariate binomial feature transformation captures the synergistic effect between features through the product of two features. Full combinatorial binomial feature transformation integrates the advantages of the previous two methods, including both the self-combination of single features and the cross-combination of pairs of features, comprehensively considering the nonlinear relationships of features and the synergistic effects between them. However, univariate binomial feature transformation only focuses on the nonlinear variations in a single feature and fails to capture the interaction effects between different features, while full combinatorial binomial feature transformation tends to cause excessive expansion of feature dimensions and increased computational complexity. By contrast, bivariate binomial feature transformation can effectively capture the interaction effects between features at a lower computational cost. Therefore, this paper selects bivariate binomial feature transformation to perform feature combination on the original data. As shown in Figure 4, taking pumping pressure P, pumping rate Q, and sand concentration C as examples, 9 features can be generated through this method: {P, Q, C, P², Q², C², PQ, PC, QC}. Among them, P² highlights the impact of extreme pumping pressure on fracture propagation by amplifying the extreme-value effect of pumping pressure. Q² enhances the difference between high and low pumping-rate regions; for example, an excessive pumping rate may cause fracture diversion and generate complex fractures, while a low pumping rate may impair proppant-carrying capacity. C² enhances the nonlinear response in high sand-concentration regions, highlighting the abrupt increase in sand-plugging risk caused by instantaneous high sand concentration. PQ reflects the energy input intensity, embodying the role of high pumping rate and high pumping pressure in synergistically accelerating fracture propagation. PC represents the pumping pressure cost of the proppant-carrying process; an abnormal pressure increase under high sand concentration is a key warning signal of sand-plugging risk. QC reflects the matching degree between pumping rate and sand concentration, affecting proppant placement and support efficiency in fractures. In this way, binomial feature transformation can, without increasing the amount of original data, use mathematical combinations to excavate the interaction effects and nonlinear relationships between features, expanding the representation capability of the feature space.
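For illustration, scikit-learn's PolynomialFeatures with degree 2 and no bias term reproduces exactly this 9-feature set:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One row per time step: [P, Q, C] (placeholder values).
PQC = np.array([[55.0, 10.0, 300.0],
                [56.5, 10.2, 320.0]])

# Degree-2 expansion without a bias column yields the 9-feature set
# {P, Q, C, P^2, PQ, PC, Q^2, QC, C^2}.
poly = PolynomialFeatures(degree=2, include_bias=False)
features = poly.fit_transform(PQC)
print(poly.get_feature_names_out(["P", "Q", "C"]))
# ['P' 'Q' 'C' 'P^2' 'P Q' 'P C' 'Q^2' 'Q C' 'C^2']
```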
Due to the significant differences in the dimensions and value ranges of parameters such as pumping pressure, pumping rate, and sand concentration, the Z-score standardization method is adopted to process the feature data. This method transforms the original features into a standard normal distribution with a mean of 0 and a standard deviation of 1. The processing procedure is as follows:

$$z = \frac{x - \mu}{\sigma}$$

where x represents the original feature value, which can be the value of pumping pressure, pumping rate, or sand concentration; µ represents the mean of the original feature; and σ represents its standard deviation.
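Continuing the previous snippet, the standardization reduces to one column-wise operation (scikit-learn's StandardScaler is equivalent):

```python
# Column-wise Z-score: z = (x - mu) / sigma, applied here to the 9-column
# feature matrix `features` from the binomial-transformation sketch above.
features_std = (features - features.mean(axis=0)) / features.std(axis=0)
```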
2.4. Pumping Time-Series Feature Extraction Method Based on Wavelet Transform
We adopt the wavelet transform time-frequency analysis method to extract the time-series features of pumping data. The wavelet transform decomposes the original signal into different time-frequency domains by stretching and translating the wavelet basis function, so as to effectively extract signal features [31]. The basic formula is as follows:

$$W(a, b) = \frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty} x(t)\,\Psi^{*}\!\left(\frac{t - b}{a}\right)dt$$

where x(t) is a signal of parameters such as pumping pressure, pumping rate, and sand concentration changing with time; a is the scale factor, which controls the stretching of the wavelet and corresponds to frequency analysis; b is the translation factor, which controls the translation of the wavelet along the time axis and corresponds to time localization; Ψ(t) is the wavelet basis function, from which different wavelet functions are generated through scale stretching a and time translation b to analyze the signal; and W(a, b) is the wavelet transform coefficient, reflecting the similarity between the original signal and the wavelet basis function at different time-frequency positions.
Common wavelet basis functions include Daubechies, Symlets, and Coiflets wavelets, which should be selected adaptively based on the characteristics of the pumping data [32,33,34]. Among them, Daubechies wavelets tend to cause phase distortion due to their asymmetry, leading to misalignment of mutation points in pumping data. Symlets wavelets, with approximate symmetry, can not only effectively suppress phase distortion and accurately preserve the positional information of mutation points but also retain the feature correlations of parameters when decomposing multiparameter-coupled pumping curves. Coiflets wavelets are suitable for the sparse representation of smooth signals and the extraction of slowly varying features. Considering that pumping data contain abrupt changes, Symlets wavelets are selected for analysis. After the wavelet basis function is determined, the decomposition level J needs to be set. If the decomposition level is too small, it is difficult to fully extract the multi-scale features of the signal; if it is too large, the computational burden increases and redundant information may be introduced. Here, a decomposition level of J = 3 is selected to balance feature completeness and computational cost.
Using the selected Symlets wavelet basis function and decomposition level, discrete wavelet decomposition is performed on the pumping data. Taking the pumping pressure signal as an example, decomposition by the Mallat algorithm gives:

$$P(t) = A_3 + D_3 + D_2 + D_1$$

where $A_3$ is the third-layer approximation coefficient, reflecting the overall trend of the pumping pressure signal, and $D_1$, $D_2$, and $D_3$ are the first-, second-, and third-layer detail coefficients, corresponding to the variation details of the pumping pressure in different frequency ranges, respectively. The pumping rate and sand concentration signals are decomposed similarly to obtain their respective approximation and detail coefficients.
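This decomposition can be sketched with the PyWavelets library; the Symlets order (sym4 below) and the synthetic signal are assumptions, since the text specifies only the wavelet family and the level J = 3:

```python
import numpy as np
import pywt

# Synthetic pumping-pressure signal with an abrupt step (placeholder data).
t = np.linspace(0.0, 1.0, 1024)
pressure = 50 + 2 * np.sin(2 * np.pi * 5 * t)
pressure[512:] += 5.0                       # mutation point

# Level-3 discrete wavelet decomposition (Mallat algorithm), Symlets basis.
cA3, cD3, cD2, cD1 = pywt.wavedec(pressure, wavelet="sym4", level=3)
# cA3 ~ overall trend (A3); cD1..cD3 ~ detail coefficients per frequency band.
```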
Statistical features, including the mean, variance, energy, and entropy, are calculated for the wavelet coefficients of each layer. The mean reflects the central tendency of the coefficients, corresponding to the average intensity of the signal in the frequency band. The variance characterizes the dispersion degree of the coefficients, corresponding to the fluctuation amplitude of the signal in the frequency band. The energy represents the characteristic intensity of the frequency band. The entropy reflects the complexity and uncertainty of the coefficients; the larger the entropy, the more complex the signal variations in the corresponding frequency band. The specific calculation methods are shown in Equations (6)–(9):

$$\mu_{c_j} = \frac{1}{N_j}\sum_{i=1}^{N_j} c_{j,i} \quad (6)$$

where $N_j$ is the number of data points of the j-th layer coefficients, $c_{j,i}$ is the i-th coefficient value of the j-th layer, and $\mu_{c_j}$ is the mean value of the j-th layer coefficients.

$$\sigma_{c_j}^{2} = \frac{1}{N_j}\sum_{i=1}^{N_j}\left(c_{j,i} - \mu_{c_j}\right)^{2} \quad (7)$$

where $\sigma_{c_j}^{2}$ represents the variance of the j-th layer coefficients.

$$E_{c_j} = \sum_{i=1}^{N_j} c_{j,i}^{2} \quad (8)$$

where $E_{c_j}$ represents the energy of the j-th layer coefficients, corresponding to the energy proportion of the signal in the frequency band; the greater the energy, the more significant the features of that band.

$$H_{c_j} = -\sum_{i=1}^{N_j} E_{j,i}\ln E_{j,i}, \qquad E_{j,i} = \frac{c_{j,i}^{2}}{E_{c_j}} \quad (9)$$

where $E_{j,i}$ is the energy proportion of the i-th coefficient in the j-th layer and $H_{c_j}$ represents the entropy of the j-th layer coefficients.
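A compact sketch of these per-layer statistics, applied to the coefficients from the decomposition above (the natural logarithm in the entropy is an assumption, as the base is not stated):

```python
import numpy as np

def coeff_stats(c):
    """Mean, variance, energy, and energy-based entropy of one coefficient
    layer, following Eqs. (6)-(9)."""
    c = np.asarray(c, dtype=float)
    mean, var = c.mean(), c.var()
    energy = np.sum(c ** 2)
    p = (c ** 2) / energy                   # per-coefficient energy proportion
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return mean, var, energy, entropy

# Apply to every layer of the decomposition above to build the feature vector.
features = [s for layer in (cA3, cD3, cD2, cD1) for s in coeff_stats(layer)]
```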
Based on the time-series features of pumping pressure, pumping rate, and sand concentration extracted by the wavelet transform, the fracture evolution stages throughout the pumping process, such as fracture initiation, stable propagation, and fracture closure after pump shutdown, can be identified. After these features are input into the neural network model, the model identifies fracture propagation moments, captures the fluctuation characteristics of fracture propagation, and predicts the fracture propagation status in real time.
2.5. The LSTM Model Integrated with Attention Mechanism
Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN). By introducing gating mechanisms such as the forget gate, input gate, and output gate to control the flow of information, it solves the vanishing and exploding gradient problems of traditional RNNs and performs well in time-series data processing and prediction tasks [35]. However, when processing the complex time-series features of pumping data, LSTM struggles to automatically identify discriminative features that are highly relevant to critical fracture propagation stages, specifically initiation points and turning points [36,37]. To address this, this study proposes an LSTM-based method fused with an attention mechanism. The attention mechanism assigns weights to different features, strengthens the representation of critical features, and thereby improves the model's prediction accuracy for the fracture propagation status.
The structure of the preliminarily constructed LSTM model integrated with an attention mechanism is shown in Figure 5. The model adopts a five-layer architecture, comprising an input layer, an LSTM layer, an attention mechanism layer, a fusion layer, and an output layer [38].
The input layer receives the multi-dimensional time-series features of pumping pressure, pumping rate, sand concentration, etc., extracted by the wavelet transform, and employs a dynamic batch normalization mechanism to support real-time data streams input in batches by time step. The LSTM layer adopts a bidirectional structure, which simultaneously captures the historical dependencies and future information of the time-series features through forward and backward neurons; meanwhile, a Dropout layer is embedded between layers to suppress overfitting by randomly discarding the outputs of some neurons. The LSTM layer processes the features received from the input layer and outputs a hidden state at each time step, forming a sequence. Since the dimensionality and feature distribution of this hidden state sequence may not be directly adapted to the computational requirements of the attention mechanism, the attention mechanism layer first performs spatial dimension mapping on the sequence through a fully connected layer. This fully connected layer contains a learnable weight matrix and bias term and maps each hidden state from the LSTM feature space to the preset attention space through a linear transformation. This mapping not only adapts the feature dimensionality but also, through learning of the weight matrix, selectively strengthens the feature components in the hidden state that relate to the key stages of fracture propagation. For example, in the fracture initiation stage, the pumping pressure rises sharply and reaches the first peak within the construction cycle. In the stable propagation stage, the pumping pressure drops to a stable range with small fluctuations, the pumping rate maintains continuous and stable output to match the fracture propagation demand, and the sand concentration increases stepwise to the design value and remains stable to support the fracture. In the fracture closure stage, the pumping rate decreases gradually and finally drops to zero, while the pumping pressure attenuates gradually as the pumping rate decreases. The mapped feature vector is calculated as follows:

$$u_i = W_a h_i + b_a$$

where $u_i$ is the mapped feature vector, $W_a$ represents the learnable weight matrix of the fully connected layer, $h_i$ stands for the original hidden state output by the LSTM at the i-th time step, and $b_a$ indicates the bias term of the fully connected layer.
After the spatial mapping is completed, the attention scores are calculated. The core logic is to quantify the importance of the features at each time step in the fracture propagation process based on the mapped feature vectors. The initial scores are obtained by measuring the similarity between the mapped feature vectors and a learnable query vector, and the attention weights are then generated by normalizing the initial scores over all time steps. In the fracture propagation scenario, these weights intuitively reflect the contribution of the features at each time step to the overall evolution process. The attention scores and attention weights are calculated as follows:

$$e_i = v_q^{\top}\tanh\left(W_q u_i\right)$$

$$\alpha_i = \frac{\exp(e_i)}{\sum_{j=1}^{T}\exp(e_j)}$$

where $e_i$ represents the initial attention score at the i-th time step, $v_q$ represents the learnable query vector, $W_q$ represents the learnable weight matrix, and tanh serves as the activation function; $\alpha_i$ represents the attention weight at the i-th time step, exp denotes the exponential function, which amplifies differences in the scores while keeping the weights positive, and T represents the total number of time steps.
The attention mechanism layer then performs a weighted summation of the original hidden states output by the LSTM layer with the corresponding attention weights to generate a context vector. Through this weight assignment, the vector selectively aggregates the core features of the key stages of fracture propagation, enabling the context vector to characterize the laws of fracture propagation more accurately. The context vector is calculated as follows:

$$c = \sum_{i=1}^{T} \alpha_i h_i$$

where c represents the context vector, $\alpha_i$ represents the attention weight at the i-th time step, and $h_i$ stands for the original hidden state output by the LSTM.
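A minimal PyTorch sketch of this attention layer, implementing the three equations above, could read as follows; the attention dimension and module name are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Additive attention over LSTM hidden states:
    u_i = W_a h_i + b_a;  e_i = v_q^T tanh(W_q u_i);
    alpha = softmax(e);  c = sum_i alpha_i h_i."""
    def __init__(self, hidden_dim: int, attn_dim: int = 64):
        super().__init__()
        self.map = nn.Linear(hidden_dim, attn_dim)      # W_a, b_a
        self.Wq = nn.Linear(attn_dim, attn_dim, bias=False)
        self.vq = nn.Linear(attn_dim, 1, bias=False)    # query vector v_q

    def forward(self, h):                   # h: (batch, T, hidden_dim)
        u = self.map(h)                     # mapped feature vectors u_i
        e = self.vq(torch.tanh(self.Wq(u)))            # scores e_i: (B, T, 1)
        alpha = torch.softmax(e, dim=1)     # attention weights over time steps
        context = (alpha * h).sum(dim=1)    # context vector c: (B, hidden_dim)
        return context, alpha.squeeze(-1)
```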
The fusion layer introduces a gating mechanism to adaptively adjust the fusion weights of the LSTM hidden state sequence and the context vector through a dynamic learning strategy. The feature vector generated after weighted fusion contains both the time-series features extracted by the LSTM and the key information focused by the attention mechanism, thereby deeply analyzing the intrinsic correlation between pumping time-series features and dynamic fracture propagation [39,40]. Since the prediction of fracture propagation is a regression task, the output layer uses a zero-centered Tanh activation function. Compared with other activation functions such as Sigmoid, Tanh limits the output to the range of −1 to 1, mitigates vanishing gradients, accelerates training convergence, and improves model training efficiency.
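Putting the pieces together, a hedged sketch of the five-layer architecture might look as follows; it reuses the TemporalAttention module above, and the layer sizes, dropout ratio, and sigmoid gating form are assumptions for illustration. The Tanh output presumes targets scaled to [−1, 1].

```python
class AttentionLSTMSurrogate(nn.Module):
    """Sketch of input -> BiLSTM -> attention -> gated fusion -> Tanh output."""
    def __init__(self, n_features: int, hidden: int = 36, n_outputs: int = 4):
        super().__init__()
        self.norm = nn.BatchNorm1d(n_features)          # dynamic batch norm
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, bidirectional=True, dropout=0.2)
        self.attn = TemporalAttention(hidden_dim=2 * hidden)
        self.gate = nn.Linear(4 * hidden, 2 * hidden)   # fusion gate weights
        self.head = nn.Sequential(nn.Linear(2 * hidden, n_outputs), nn.Tanh())

    def forward(self, x):                   # x: (batch, T, n_features)
        x = self.norm(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.lstm(x)                 # hidden-state sequence (B, T, 2H)
        context, _ = self.attn(h)           # attention-focused context vector
        last = h[:, -1, :]                  # summary of the LSTM sequence
        g = torch.sigmoid(self.gate(torch.cat([last, context], dim=-1)))
        fused = g * last + (1 - g) * context            # gated fusion
        return self.head(fused)             # scaled fracture params / SRV
```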
2.6. Model Training
The dataset is divided into a training set and a test set at an 8:2 ratio. When training the neural network model, reservoir characteristics, fracturing operation data simulated by the GOHFER software, and the extracted pumping time-series features are used as inputs, and the model outputs fracture geometric parameters and SRV. To quantify the difference between model predictions and actual values and to optimize parameters, regression tasks often use loss functions such as mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). Among them, MAPE is the mean of the absolute errors between predicted and actual values expressed as a percentage of the actual values. It quantifies the difference between the model's predicted values and the actual values in an intuitive percentage form and has significant advantages in cross-scale parameter evaluation and model generalization analysis. However, MAPE takes the actual value as its denominator; when the actual value is 0, the denominator loses mathematical meaning and the indicator cannot be computed, making MAPE difficult to apply when actual values include 0. Accordingly, this study improves the calculation of MAPE: a fixed constant is introduced into the denominator to circumvent the division by zero, thereby reducing the calculation deviation caused by a meaningless denominator and ensuring the effectiveness of the indicator when actual values include 0. For the remainder of this study, MAPE refers to this modified version. The specific calculation formula is as follows:

$$\text{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|A_i - F_i\right|}{\left|A_i\right| + c}$$

where $A_i$ represents the actual value, $F_i$ represents the predicted value of the model, n is the number of samples, and c represents a fixed constant greater than 0.
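A direct implementation of the modified metric (the exact placement of the constant c in the denominator is an assumption consistent with the description above):

```python
import numpy as np

def modified_mape(actual, predicted, c=1e-6):
    """MAPE with a fixed constant c > 0 added to the denominator so that
    targets equal to zero remain well-defined."""
    a, f = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs(a - f) / (np.abs(a) + c))

print(modified_mape([0.0, 2.0, 4.0], [0.1, 1.8, 4.2]))  # finite even at A=0
```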
The hyperparameters to be optimized include the learning rate, batch size, number of hidden layers, number of neurons, and Dropout ratio. In this study, the single-factor optimization method is adopted to conduct model training on the training set. The MAPE between the fracture geometric parameters and SRV predicted by the model in each training session and those of the training set is calculated as the optimization index. The optimal values of the hyperparameters are determined in the order of learning rate, batch size, number of hidden layers, number of neurons, and Dropout ratio, thereby completing the determination of the model structure and training parameter tuning. The specific optimization process is as follows. First, set base values of all hyperparameters as the initial starting point and optimize the learning rate first: test different values within its preset range, calculate the MAPE between the corresponding model predictions and the training set, and select the learning rate that minimizes the MAPE. Next, with the optimal learning rate fixed and the initial values of the other parameters unchanged, optimize the batch size in the same way. Subsequently, using the optimal learning rate and batch size, sequentially optimize the number of hidden layers, number of neurons, and Dropout ratio by the same method until all hyperparameters are determined. After multiple rounds of tests and experiments, the training results of the model are shown in Table 2.
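The sequential single-factor search can be sketched as follows, with train_and_score standing in for a full training run and the candidate grids and base values being illustrative assumptions:

```python
# Placeholder for one full training run that returns the training-set MAPE
# of the surrogate under a given hyperparameter configuration.
def train_and_score(cfg):
    return 0.0

# Assumed candidate grids and base values (for illustration only).
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4, 1e-5],
    "batch_size": [16, 32, 64, 128],
    "n_lstm_layers": [1, 2, 3],
    "n_neurons": [24, 36, 48, 64],
    "dropout": [0.1, 0.2, 0.3, 0.4],
}
config = {"learning_rate": 1e-3, "batch_size": 32, "n_lstm_layers": 2,
          "n_neurons": 36, "dropout": 0.2}

# Single-factor optimization: tune one hyperparameter at a time in the stated
# order, holding all others at their current values.
for name in ["learning_rate", "batch_size", "n_lstm_layers",
             "n_neurons", "dropout"]:
    scores = {v: train_and_score({**config, name: v}) for v in search_space[name]}
    config[name] = min(scores, key=scores.get)
```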
Experiments indicate that when the learning rate is 0.0001, the batch size is 32, and the number of neurons is 36, the MAPE of the model on the training set reaches 14.98%, achieving the optimal prediction performance. To further explore the model performance, the number of hidden layers in the neural network was adjusted. After the number of LSTM layers was increased to 3, the model's prediction accuracy did not improve significantly; instead, the model complexity increased, leading to a decline in training efficiency and even overfitting. In addition, the deeper structure may trigger vanishing or exploding gradients, making it difficult for the model to converge. By contrast, the two-layer LSTM architecture, by streamlining the model structure, can not only significantly improve training efficiency and reduce computational resource consumption but also effectively lower the risk of overfitting. Therefore, considering both prediction accuracy and computational efficiency, a neural network architecture with two LSTM layers was finally selected, yielding the surrogate model for intelligent prediction of fracture propagation.