Research on the Multi-Objective Optimization of Drilling Parameters Based on an Improved Coupling Model of MSE and ROP

Wan, Lifu; Song, Hongchen; Wang, Ai-Guo; Li, Meng; Zhang, Zhili; Su, Kanhua; Xu, Jiangen; Kong, Lulin; Yan, Yan; Hu, Gui; Zhang, Guohui

doi:10.3390/pr14101570

by Lifu Wan ¹, Hongchen Song ¹,*, Ai-Guo Wang ²,*, Meng Li ¹,*, Zhili Zhang ¹, Kanhua Su ¹, Jiangen Xu ¹, Lulin Kong ², Yan Yan ³, Gui Hu ⁴ and Guohui Zhang ⁴

¹ College of Petroleum and Natural Gas Engineering, Chongqing University of Science and Technology, Chongqing 401331, China

² CNPC Engineering Technology R&D Company Limited, Beijing 102206, China

³ State Key Laboratory of Oil and Gas Equipment, CNPC Tubular Goods Research Institute, Xi’an 710077, China

⁴ PetroChina, Research Institute of Petroleum Exploration and Development RIPED, Beijing 100083, China

* Authors to whom correspondence should be addressed.

Processes 2026, 14(10), 1570; https://doi.org/10.3390/pr14101570

Submission received: 14 April 2026 / Revised: 8 May 2026 / Accepted: 12 May 2026 / Published: 13 May 2026

(This article belongs to the Special Issue Development of Advanced Drilling Engineering)

Abstract

To address low Rate of Penetration (ROP), high rock-breaking energy consumption, and the difficulty of jointly improving efficiency and reducing energy consumption through traditional drilling parameter regulation in the deep interbedded sandstone and mudstone formations of the Southwest Oil and Gas Field, a study on multi-objective optimization of drilling parameters is carried out. First, the traditional Teale Mechanical Specific Energy (MSE) model is modified by introducing the effective energy utilization coefficient of the drill bit and a downhole torque calculation method, yielding an improved MSE model that fits actual field drilling conditions. Second, a hybrid TCN-LSTM model fused with an additive attention mechanism is constructed for high-precision dynamic prediction of ROP. Finally, with the real-time controllable Weight on Bit (WOB) and rotary speed as decision variables, the synergy–conflict boundary between the dual objectives of ROP maximization and MSE minimization is clarified, a dual-objective coupled optimization model is established, and a genetic algorithm is used to complete the global optimization. Case verification is carried out on actual drilling data from 12 deep wells in the study block. The results show that the Coefficient of Determination (R²) of the ROP prediction model on the test set reaches 0.91, with prediction accuracy significantly better than that of the BP, CNN and single LSTM models. After optimization, the ROP of the target well interval is 13.1% higher than the actual field value and the improved MSE is 23.5% lower, so drilling efficiency improvement and rock-breaking energy saving are achieved simultaneously within the safe operation boundary. 
The stability of the model and the effectiveness of each module are verified through five-fold inter-well cross-validation and module ablation experiments. The research results can provide a theoretical basis and technical support for the precise regulation of drilling parameters in deep formations of the target block.

Keywords:

drilling parameter optimization; mechanical specific energy; multi-objective optimization; rate of penetration prediction

1. Introduction

As China’s oil and gas exploration and development continue to expand toward deep formations, deep and ultra-deep wells have become the core domain for boosting oil and gas reserves and production. As a critical component of oil and gas exploration and development, the operational efficiency of drilling engineering directly determines the development cycle and comprehensive cost of oil and gas fields. The deep formations of the Southwest Oil and Gas Field are dominated by interbedded sandstone and mudstone, with the characteristics of large vertical depth, complex lithology and poor rock drillability. In field drilling operations, problems including low Rate of Penetration (ROP), high rock-breaking energy consumption, and severe abnormal bit wear are widespread [1]. Traditional adjustment and control of drilling parameters mostly rely on the operational experience of field engineers, which makes it difficult to achieve collaborative optimization of drilling efficiency improvement and rock-breaking energy saving. Therefore, it is urgent to establish a complete multi-objective optimization method for drilling parameters, which is well compatible with field working conditions, and possesses both a rigorous theoretical mechanism and high prediction accuracy [2].

The theory of Mechanical Specific Energy (MSE) was first proposed by R. Teale in 1964 [3]. It is defined as the total mechanical energy consumed by the drill bit to break a unit volume of rock, and serves as the core indicator for the quantitative evaluation of bit rock-breaking efficiency and real-time optimization of drilling parameters. Numerous studies on the optimization and application of the MSE model have been carried out by domestic and foreign scholars, who have continuously improved the calculation accuracy of MSE by introducing correction coefficients for formation properties, drilling fluid, bit wear, and other influencing factors [4,5]. However, most of the existing modified models fail to fully quantify the energy transmission loss between the surface and bottom hole caused by wellbore friction, nor have they established a convenient calculation method for the real bottom-hole torque compatible with conventional field logging data. This leads to a large deviation between the calculation results of the traditional MSE model and the actual downhole rock-breaking working conditions, making it impossible to accurately guide the real-time adjustment and control of drilling parameters.

High-precision prediction of the Rate of Penetration (ROP) is the core prerequisite for realizing refined optimization of drilling parameters [6]. Traditional ROP prediction is mainly based on empirical models and mechanistic models. Affected by the coupling of multiple factors such as formation conditions and drilling working conditions, the generalization ability and prediction accuracy of the models are difficult to meet the real-time requirements of field while-drilling control. In recent years, deep learning methods represented by Long Short-Term Memory (LSTM) networks and Temporal Convolutional Networks (TCNs) have shown significant advantages in feature mining and dynamic prediction of drilling time series data [7,8]. However, a single LSTM model tends to ignore the local abrupt features of data, while a single TCN model has insufficient ability to learn the long-term dependencies of time series data. The existing hybrid models still have the defects of insufficient key feature capture and unsatisfactory prediction accuracy under complex formation conditions, which makes it difficult to support the demand for a high-precision surrogate model for parameter optimization.

In terms of drilling parameter optimization, most existing studies take the maximization of ROP as the single optimization objective, ignoring the collaborative control of rock-breaking energy consumption and the consideration of bit service life. Some multi-objective optimization studies fail to clarify the synergy–conflict boundary between the dual objectives of ROP and MSE. Some of them simplify the dual objectives into a subsidiary evaluation dimension of the single objective without carrying out targeted dual-objective trade-off optimization, while others fail to realize the deep coupling of MSE and ROP, and the selection of decision variables does not fully consider the real-time controllability for field drillers [9]. These defects lead to the optimization results being difficult to simultaneously meet the dual engineering requirements of field efficiency improvement and consumption reduction, with limited field practicability. To address the above problems, this paper takes the field drilling data of 12 deep wells from the Daye 1H Well Platform of the Southwest Oil and Gas Field as the research basis. It should be clarified that the modification of the traditional Teale Mechanical Specific Energy (MSE) model in this paper is not the core innovation, but a necessary foundational work to restore the inherent trade-off relationship between Rate of Penetration (ROP) and MSE in actual engineering practice. If the uncorrected traditional MSE model is directly used, the true interaction between the dual objectives will be seriously distorted due to the unquantified energy transmission loss from the surface to the bottom hole caused by wellbore friction, leading to optimization results that contradict the field construction laws. 
The core contributions of this paper concern the multi-objective optimization of drilling parameters for deep interbedded sandstone and mudstone formations in the Southwest Oil and Gas Field. (1) Through rock-breaking mechanism analysis and field drilling law verification, the phased characteristic of “dominated by collaborative optimization with local conflicts” between ROP maximization and MSE minimization over the full feasible domain of Weight on Bit (WOB) and rotary speed is clarified for this block for the first time, which addresses the ambiguous dual-objective relationship and unclear optimization direction in existing studies. (2) A parallel TCN–LSTM–Attention model adapted to the drilling time series data of deep formations in this block is constructed for ROP prediction. Through dual-branch feature extraction and dynamic weight calibration, it captures both local abrupt features and long-term dependencies. It achieves R² = 0.91 on well intervals not involved in training, which mitigates the insufficient feature capture and poor generalization of single models. (3) A deep-coupled global optimization framework for dual-objective drilling parameters applicable to deep formations of this block is established. Only WOB and rotary speed, which field drillers can control in real time, are selected as decision variables. The collaborative optimization of drilling efficiency and energy consumption is realized through equal-weight aggregation of the two objectives, which overcomes the difficulty of applying traditional optimization results on site when they include uncontrollable parameters. Finally, case verification is carried out using the full-interval data of Well Daye 1H3-4, which is not involved in training. 
The research results can provide a theoretical basis and technical support for the precise regulation of drilling parameters, efficiency improvement, and consumption reduction in deep formations of the target block.

2. Data Processing

2.1. Data Acquisition

The research data are derived from well logging, mud logging and well history data of 12 deep wells from the Daye 1H1 Platform to the Daye 1H3 Platform in the block of the Southwest Oil and Gas Field, with a vertical depth ranging from 4200 m to 4600 m, covering multiple lithologic formations, including sandstone and mudstone. The data were acquired at a depth interval of 1 m, with a total of 77,424 sample sets, covering various drilling operating conditions and formation conditions.

The data include two categories of core parameters. The first category is basic drilling parameters, including Weight on Bit (WOB), rotary speed (revolutions per minute, RPM), torque (T), Rate of Penetration (ROP), bit diameter (D), Hook Load, standpipe pressure, Pump Stroke, Inlet Flow Rate, etc. The second category is parameters related to correction coefficients, including DC Index, Bit Service Time (t), Rated Bit Life (t₀), Drilling Fluid Return Velocity (v_f), drilling fluid density, Equivalent Circulating Density (ECD), etc. The above parameters cover the core influencing factors of bit rock-breaking and drilling efficiency evaluation, which not only provide complete input parameters for the improved MSE mechanistic model, but also supply sufficient feature dimensions for the ROP machine learning prediction model.

To avoid data leakage during model training and ensure the favorable generalization ability of the model in undrilled new wells, the inter-well splitting method is adopted to divide the dataset. Specifically, the full-well interval data of eight wells are selected as the training set, the data of two wells as the validation set, and the data of two wells as the test set, with a sample size ratio of approximately 4:1:1 among the three sets. After division, the training set is used for model fitting, the validation set for hyperparameter tuning, and the test set for the final verification of the model’s generalization ability.
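The inter-well split described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the record layout (a `well` key on every sample), the helper name `split_by_well`, the random seed, and the toy well names are all assumptions; only the 8/2/2 well counts come from the text.

```python
import random

def split_by_well(samples, n_train=8, n_val=2, n_test=2, seed=0):
    """Assign whole wells to train/val/test so that no well is shared between sets."""
    wells = sorted({s["well"] for s in samples})
    if len(wells) != n_train + n_val + n_test:
        raise ValueError("well count does not match the requested split")
    rng = random.Random(seed)          # fixed seed (assumed) for a reproducible split
    rng.shuffle(wells)
    groups = {
        "train": set(wells[:n_train]),
        "val": set(wells[n_train:n_train + n_val]),
        "test": set(wells[n_train + n_val:]),
    }
    return {name: [s for s in samples if s["well"] in ids]
            for name, ids in groups.items()}

# Toy data: 12 hypothetical wells with 5 depth samples each.
samples = [{"well": f"W{w:02d}", "depth_m": 4200 + i}
           for w in range(12) for i in range(5)]
split = split_by_well(samples)
```

Because every sample of a well lands in exactly one set, depth-adjacent samples from the same well cannot leak between training and testing, which is the point of splitting by well rather than by row.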

2.2. Data Preprocessing

Data preprocessing is designed to eliminate the impacts of outliers, missing values, and dimensional differences on model performance. First, the Z-score method is adopted for outlier detection, where data deviating from the mean by three times the standard deviation are identified as outliers and eliminated. This method can effectively recognize abrupt change data caused by instrument failure during the drilling process. For a small number of missing values with a missing proportion of less than 0.02%, manual completion is performed in combination with wellbore trajectory and offset well data, so as to avoid introducing additional errors from interpolation processing.
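As a hedged illustration of the three-sigma rule, the sketch below filters one feature column by Z-score. The function name `zscore_filter`, the use of the population standard deviation, and the toy torque values are assumptions; the source does not state whether the statistics were computed per well or over the whole dataset.

```python
import statistics

def zscore_filter(values, threshold=3.0):
    """Keep values whose absolute Z-score does not exceed the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)    # population standard deviation (assumed choice)
    if std == 0:
        return list(values)            # constant column: nothing can be an outlier
    return [v for v in values if abs(v - mean) / std <= threshold]

# 30 plausible torque readings plus one spike from a hypothetical instrument fault.
torque = [10.0 + 0.1 * (i % 5) for i in range(30)] + [95.0]
cleaned = zscore_filter(torque)
```

In this toy column only the 95.0 spike exceeds three standard deviations, so the 30 regular readings are retained.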

The wavelet filtering method is used for noise reduction. The denoising effects of three wavelet bases (db4, sym4, and coif3) and three decomposition levels (3, 5, and 7) were compared, with Signal-to-Noise Ratio (SNR) and mean square error as evaluation indicators (in this paragraph the error metric is written out in full to avoid confusion with Mechanical Specific Energy, MSE). With the db4 wavelet basis and five-level decomposition, the SNR reaches 28.7 dB and the mean square error is 0.012, the best denoising effect among the tested combinations. This configuration removes instrument and vibration noise while retaining data trends and abrupt features. Therefore, the db4 wavelet basis, five-level decomposition, and SureShrink adaptive threshold are selected, and the denoising results are shown in Figure 1.

To eliminate the dimensional differences between different parameters, the min–max normalization method is adopted to perform normalization processing on all feature parameters, with the normalization formula as follows:

X_{s} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}

(1)

where X is the original sample value, X_min and X_max are the minimum and maximum values of the corresponding feature, respectively, and X_s is the normalized value. After processing, all data are scaled to the range of [0,1], which provides stable input for model training.
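Equation (1) can be applied to a single feature column as in the following minimal sketch. The function name, the handling of a constant column, and the sample WOB values are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Scale one feature column to [0, 1] following Equation (1)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Equation (1) is undefined for a constant column; returning zeros is an assumed convention.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]

wob_kn = [80.0, 100.0, 120.0, 160.0]   # hypothetical WOB readings, kN
scaled = min_max_normalize(wob_kn)      # [0.0, 0.25, 0.5, 1.0]
```

With an inter-well split, a common practice is to compute the minimum and maximum on the training wells only and reuse them for the validation and test wells; the source does not state which convention was used.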

The Pearson correlation coefficient method and mutual information method are simultaneously used for feature screening of the denoised data, forming a dual verification system of “linear correlation + nonlinear dependence”. The mutual information calculation formula is

I(X; Y) = \iint p(x, y) \log \frac{p(x, y)}{p(x) \, p(y)} \, dx \, dy

(2)

where I(X; Y) is the mutual information between variables X and Y, p(x, y) is the joint probability density, and p(x) and p(y) are the marginal probability densities. A larger mutual information value indicates a stronger statistical dependence between the variables, including nonlinear dependence that the Pearson coefficient may miss.

Drilling engineering is a multi-factor, strongly coupled system. The linear correlation coefficient between any single factor and the Rate of Penetration (ROP) is generally lower than 0.5, which is a common characteristic of industry data. Based on the 77,424 actual drilling samples, a threshold sensitivity analysis was carried out to compare model performance at three thresholds, |r| = 0.15, 0.2 and 0.25. A threshold of 0.15 introduces too many noise features, leading to overfitting; a threshold of 0.25 eliminates key coupling features such as torque and Total Pit Volume, reducing accuracy; a threshold of 0.2 gives the best balance among the three, allowing the test set R² to reach 0.91 while preserving training efficiency and generalization ability.

On this basis, the equal-frequency discretization method was used to calculate the mutual information values between each feature and ROP, and a mutual information value > 0.1 was set as the auxiliary screening standard. The final screening standard is the absolute value of the Pearson correlation coefficient |r| > 0.2 and the mutual information value > 0.1. The coincidence rate of the screening results of the two methods reaches 100%, verifying the robustness of feature selection. The results of correlation analysis and mutual information calculation are shown in Figure 2.
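The dual screening rule (|r| > 0.2 and mutual information > 0.1) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the number of bins (5), the use of natural logarithms, the tie handling in the binning, and all function names are choices made here and are not given in the source.

```python
import math
from collections import Counter

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def equal_frequency_bins(values, n_bins):
    """Map each value to a bin index so that bins hold (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins

def mutual_information(x, y, n_bins=5):
    """Discrete estimate of Equation (2), in nats, after equal-frequency binning."""
    bx, by = equal_frequency_bins(x, n_bins), equal_frequency_bins(y, n_bins)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in pxy.items())

def passes_screen(x, y, r_min=0.2, mi_min=0.1):
    """Both criteria from the text must hold for a feature to be kept."""
    return abs(pearson_r(x, y)) > r_min and mutual_information(x, y) > mi_min

x_lin = [float(i) for i in range(100)]
y_lin = [2.0 * v + 1.0 for v in x_lin]      # purely linear dependence
x_sym = [float(i) for i in range(-50, 51)]
y_sym = [v * v for v in x_sym]              # nonlinear dependence with zero linear correlation
```

The second toy pair shows why the two measures are complementary: its Pearson coefficient is essentially zero while its mutual information is clearly positive. Under the combined rule such a feature would still be rejected, because both thresholds must be exceeded.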

A total of six core features were screened out: Hook Load (HL), Weight on Bit (WOB), rotary speed (RPM), torque (T), Total Pit Volume (TPV), and DC Exponent (DC), covering three dimensions: drilling working conditions, formation properties, and drilling fluid performance. Among them, WOB and RPM are core controllable parameters, torque is a rock-breaking response parameter, HL and TPV reflect working condition stability, and DC Exponent is a formation drillability indicator.

Aiming at the heterogeneity problem of the dataset, layered processing has been carried out according to formation lithology and drilling working conditions in the preprocessing stage: according to logging interpretation, samples are divided into three categories: sandstone (41.5%), mudstone (50.0%), and interbedded sandstone–mudstone (8.5%); non-drilling working condition data are eliminated, retaining 77,424 valid samples; stratified random sampling is adopted for dataset division to ensure consistent lithology distribution in each set and avoid model biased learning.

According to engineering attributes, the above features can be divided into two categories: actively controllable decision variables and uncontrollable state boundary variables, which provide a theoretical basis for variable selection in the subsequent multi-objective optimization of drilling parameters.

2.3. Model Evaluation Metrics

To quantify the deviation between the measured field data and the model predictions, the prediction performance of the model must be evaluated. Commonly used evaluation metrics include Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²). Among these metrics, R² expresses the goodness of fit as the proportion of variance in the measured data that the model explains, so a higher value indicates better prediction performance. For a model that predicts at least as well as the mean of the measured values, R² lies in the interval [0,1]; on held-out data it can fall below 0 when the model performs worse than that baseline. The closer R² is to 1, the better the model fits the data; when R² approaches 0, the model has insufficient predictive capability.

In contrast, RMSE and MAE take any non-negative value and are expressed in the units of the predicted quantity. A larger value corresponds to a greater prediction error and hence inferior performance, but neither metric offers the scale-free, proportional reading that R² provides. Therefore, in practical engineering applications, R² is usually selected as the core evaluation metric and is combined with RMSE and MAE to comprehensively evaluate the prediction accuracy and generalization ability of the model.
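The three metrics can be computed with a few lines of Python. The sketch below uses the standard definitions; the function name and the sample ROP values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for paired measured and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

rop_true = [4.0, 6.0, 8.0, 10.0]   # hypothetical measured ROP, m/h
rop_pred = [5.0, 6.0, 7.0, 10.0]   # hypothetical predicted ROP, m/h
metrics = regression_metrics(rop_true, rop_pred)
```

For these toy values the residual sum of squares is 2 and the total sum of squares is 20, so R² is 0.9, MAE is 0.5 m/h and RMSE is about 0.71 m/h.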

3. Model Establishment

3.1. Mechanical Specific Energy Model

It should be specially clarified that the modification of the traditional Teale MSE model in this paper is not the core innovation, but a necessary foundational work to restore the real trade-off relationship between ROP and MSE and ensure the engineering rationality of subsequent optimization results. Existing MSE modification studies are mainly divided into three categories: first, introducing coefficients such as bit wear and formation drillability to correct rock-breaking energy demand; second, correcting surface torque and WOB through downhole measured data; third, considering the auxiliary effect of drilling fluid hydraulic energy on rock breaking. This paper only carries out targeted modification for the engineering pain points that conventional logging data lacks downhole torque measured values, and the energy transmission loss caused by wellbore friction is not quantified. The core advantage is that no additional downhole measuring instruments are required, and accurate MSE calculation can be realized by relying only on conventional logging parameters, which is more suitable for field-while-drilling application scenarios.

3.1.1. Principle of the Traditional Mechanical Specific Energy Model

The theory of Mechanical Specific Energy (MSE) was proposed by R. Teale in 1964. It is defined as the work done by the drill bit to break a unit volume of rock under the action of Weight on Bit (WOB) and torque, which is used to characterize bit performance and evaluate drilling efficiency in real time. The MSE model is expressed as

E = \frac{4 W}{π d_{B}^{2}} + \frac{480 n T}{d_{B}^{2} v}

(3)

where E is the Mechanical Specific Energy, MPa; W is the Weight on Bit (WOB), kN; T is the torque, kN·m; n is the rotary speed (RPM), r/min; v is the Rate of Penetration (ROP), m/h; and d_B is the bit diameter, mm.

The energy required to break a unit volume of rock is defined as the specific rock-breaking energy; the mechanical energy consumed to break a unit volume of rock per unit time is the Mechanical Specific Energy, which can characterize drilling efficiency. Theoretically, the minimum value of MSE is equal to the specific rock-breaking energy [10]. However, in actual drilling operations, energy loss occurs due to wellbore friction, drill string vibration, and other factors, resulting in the actual MSE value being higher than the specific rock-breaking energy. Therefore, under the condition of achieving an ideal ROP during drilling, a smaller MSE value indicates higher rock-breaking efficiency of the bit, more reasonable drilling parameters, and higher overall drilling efficiency [11].

3.1.2. Construction of Mechanical Specific Energy Model Considering Torque

In the actual drilling process, affected by adverse factors such as wellbore friction, the energy utilization efficiency is extremely low, usually ranging from 30% to 40% [12]. The low rock-breaking efficiency of the drill bit leads to the actual MSE value being about three times the rock strength. Therefore, to meet the actual field requirements and make the MSE closer to the real strength of the rock [13], the effective energy utilization coefficient of the drill bit is defined as E_f, and the modified Mechanical Specific Energy model is established as

E_{m} = E_{f} (\frac{4 W}{π d_{B}^{2}} + \frac{480 n T}{d_{B}^{2} v})

(4)

where E_m is the modified Mechanical Specific Energy, MPa; E_f is the effective energy utilization coefficient of the bit, reflecting the proportion of surface input energy transmitted to the bottom hole and used for rock breaking. To verify the rationality of this coefficient value, a sensitivity analysis is carried out on the actual drilling data of the 12 wells in the study block, evaluating the MSE calculation error and the subsequent optimization effect for E_f in the range of 0.3–0.4. The results show that when E_f changes from 0.3 to 0.4, the relative error of the calculated MSE is less than 5%, the optimized ROP increase stays between 12.5% and 13.8%, and the MSE decrease stays between 22.1% and 24.7%, indicating that this coefficient has little impact on the optimization results within a reasonable range. Based on the existing engineering statistical data of the target block [12], this paper takes E_f = 0.35 as the optimal value of the energy utilization coefficient.

Torque is a core variable in the Teale MSE model. During actual drilling operations, the main surface-recorded data include Weight on Bit (WOB), rotary speed, and Rate of Penetration (ROP), while the measured real downhole torque at the drill bit is usually unavailable. It is necessary to calculate the torque through the measured data, that is, to calculate the drill bit torque using the bit sliding friction coefficient and WOB.

The rock-breaking torque of a PDC bit mainly comes from the sliding friction between the cutting teeth and the rock. Assuming that the cutting teeth are evenly distributed on the bit end face and that the normal pressure is uniform over the face, the total bit torque can be obtained by integrating the friction torque over the bit end face. The end face is divided into infinitesimal elements in polar coordinates (l, θ), where l is the radial distance from the bit axis. The normal force on each element is

d W = \frac{4 W}{π d_{B}^{2}} \cdot l \, d l \, d θ

and the corresponding friction torque is

d T = μ \cdot d W \cdot l

Perform a double integral on the entire bit end face:

T = \frac{1}{1000} \int_{0}^{2 π} \int_{0}^{d_{B} / 2} μ \cdot \frac{4 W}{π d_{B}^{2}} \cdot l^{2} d l d θ

(5)

First, integrate the radial variable

l

:

\int_{0}^{d_{B} / 2} l^{2} d l = \frac{l^{3}}{3} |_{0}^{d_{B} / 2} = \frac{d_{B}^{3}}{24}

(6)

Then, integrate the circumferential variable

θ

:

\int_{0}^{2 π} d θ = 2 π

(7)

Substitute and simplify to get

T = \frac{1}{1000} \cdot \frac{4 μ W}{π d_{B}^{2}} \int_{0}^{2 π} \int_{0}^{d_{B} / 2} l^{2} d l d θ = \frac{1}{1000} \cdot \frac{4 μ W}{π d_{B}^{2}} \cdot \frac{d_{B}^{3}}{24} \cdot 2 π = \frac{μ W d_{B}}{3000}

(8)

This friction torque calculation model has been verified in a large number of drilling engineering practices [13] and is suitable for estimating the rock-breaking torque of PDC bits in deep sandstone–mudstone interbedded formations.

Substituting Equation (8) into Equation (4) yields

E_{m} = E_{f} W (\frac{4}{π d_{B}^{2}} + \frac{0.16 μ n}{d_{B} v})

(9)

where μ is the sliding friction coefficient of the drill bit, which is generally taken as 0.25 for roller cone bits and 0.5 for Polycrystalline Diamond Compact (PDC) bits.
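As a consistency check on the substitution, the sketch below evaluates Equation (4) with the friction torque of Equation (8) and compares it with the closed form of Equation (9). The numerical constants and units are taken exactly as printed in those equations; the function names and the input values (WOB, rotary speed, ROP, bit diameter) are hypothetical.

```python
import math

def bit_torque(wob_kn, bit_diameter_mm, mu):
    """Equation (8): friction-based bit torque, kN·m."""
    return mu * wob_kn * bit_diameter_mm / 3000.0

def modified_mse_eq4(wob_kn, rpm, torque_knm, rop_mh, bit_diameter_mm, e_f=0.35):
    """Equation (4): Teale MSE scaled by the energy utilization coefficient E_f."""
    d2 = bit_diameter_mm ** 2
    return e_f * (4.0 * wob_kn / (math.pi * d2)
                  + 480.0 * rpm * torque_knm / (d2 * rop_mh))

def improved_mse_eq9(wob_kn, rpm, rop_mh, bit_diameter_mm, mu=0.5, e_f=0.35):
    """Equation (9): torque eliminated, so only WOB, rotary speed and ROP are needed."""
    d = bit_diameter_mm
    return e_f * wob_kn * (4.0 / (math.pi * d ** 2) + 0.16 * mu * rpm / (d * rop_mh))

# Hypothetical operating point for a PDC bit (mu = 0.5).
wob, rpm, rop, d_b = 120.0, 80.0, 6.0, 215.9
torque = bit_torque(wob, d_b, mu=0.5)
via_eq4 = modified_mse_eq4(wob, rpm, torque, rop, d_b)
via_eq9 = improved_mse_eq9(wob, rpm, rop, d_b, mu=0.5)
```

The two routes agree to floating-point precision because 480/3000 = 0.16, which is the coefficient that appears in Equation (9).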

The above improved MSE model can realize an accurate quantitative evaluation of the bit rock-breaking efficiency. Its core input parameters include WOB, rotary speed and ROP, and the high-precision dynamic prediction of ROP is the core prerequisite for realizing real-time MSE evaluation and global optimization of drilling parameters.

Based on the derived calculation formula of the improved MSE above, the preprocessed field drilling data are substituted to calculate the improved MSE values of the 4200–4600 m well intervals of the 12 wells, and the calculation results are shown in Figure 3.

3.2. Rate of Penetration Prediction Model

To achieve high-precision dynamic prediction of the Rate of Penetration (ROP), a hybrid Temporal Convolutional Network (TCN)–Long Short-Term Memory (LSTM) model is constructed.

3.2.1. Temporal Convolutional Network

The Temporal Convolutional Network (TCN) is a temporal sequence modeling method based on a deep convolutional neural network. It can realize efficient modeling of time series data through a hierarchical temporal feature extraction mechanism and strict causal constraints [14]. The core structure of the TCN mainly consists of two components: dilated convolution and residual connection (see Figure 4). Causal convolution adopts a one-dimensional convolution kernel to perform sliding calculation of data along the time axis, and through left padding, the convolution kernel only covers historical time steps. This ensures that the output at the current moment only depends on the input from historical and current moments, which can effectively avoid future information leakage [15]. Dilated convolution controls the sampling interval through an exponentially increasing dilation factor, which can expand its receptive field without increasing the number of parameters. Meanwhile, it can effectively capture local abrupt features through small dilation factors in shallow layers. The residual connection module includes convolutional layers, an activation function, a normalization layer and a skip connection. It fuses shallow detailed features and deep abstract representations through skip connection, which can alleviate gradient vanishing, accelerate model convergence, and enhance the model’s representation capability for complex temporal features [16,17].

The calculation formulas of dilated causal convolution and residual connection are as follows [18]:

y_{t} = \sum_{i = 0}^{k - 1} w_{i} x_{t - d \cdot i}

(10)

\mathrm{Output} = \mathrm{Activation} [F (x) + \mathrm{Conv}_{1 \times 1} (x)]

(11)

where y_t is the output at time step t; w_i is the i-th weight parameter of the convolution kernel; x_{t - d·i} is the value of the input sequence at time step t - d·i; d is the dilation rate, which controls the sampling interval of the convolution kernel; k is the kernel size, which determines the number of covered time steps; F(x) is the feature processed by the convolutional layer and activation function; and Conv_{1×1}(x) is the 1 × 1 convolution for channel number adjustment, which matches the dimension of the residual branch with the main path.
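Equation (10) can be illustrated with a single-channel dilated causal convolution in plain Python. This is a didactic sketch, not the authors' network: it handles one channel only, treats the 1 × 1 convolution of Equation (11) as an identity mapping, uses ReLU as the activation, and the weights and input sequence are made up.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Equation (10): y_t = sum_i w_i * x_{t - d*i}, with zeros to the left of t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - dilation * i
            if j >= 0:                  # implicit left zero-padding keeps the filter causal
                acc += w * x[j]
        y.append(acc)
    return y

def residual_block(x, weights, dilation):
    """Single-channel form of Equation (11); the 1x1 convolution is assumed to be identity."""
    f = [max(0.0, v) for v in dilated_causal_conv1d(x, weights, dilation)]
    return [max(0.0, fv + xv) for fv, xv in zip(f, x)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = dilated_causal_conv1d(x, weights=[1.0, 1.0, 1.0], dilation=2)
# y[t] = x[t] + x[t-2] + x[t-4]  ->  [1.0, 2.0, 4.0, 6.0, 9.0, 12.0]
out = residual_block(x, weights=[1.0, 1.0, 1.0], dilation=2)
```

Changing a later input value leaves all earlier outputs untouched, which is the causal property that prevents future information leakage.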

3.2.2. Long Short-Term Memory Network

The Long Short-Term Memory (LSTM) network is an improved structure of the Recurrent Neural Network (RNN). Compared with the traditional RNN, the LSTM effectively alleviates the gradient vanishing and explosion problems by introducing a gating mechanism, and is capable of learning long-term dependencies [19].

The basic unit of the LSTM neural network consists of one cell state and three gating mechanisms: the forget gate, the input gate, and the output gate. Each gating mechanism controls the information flow through the sigmoid activation function, which maps the input to a value between 0 and 1, where 0 indicates completely closed and 1 indicates fully open. The structure diagram of its network unit is shown in Figure 5. In the figure, A on the left represents the cell state and output of the previous time step; A on the right represents the cell state and output of the current time step; and Xt represents the input of the current time step [20].

3.2.3. Additive Attention Mechanism

Additive attention scores each element of a sequence against a query and uses the normalized scores as weights. When analyzing temporal data, introducing the additive attention mechanism can enhance the capture of key information in the time series and improve information utilization efficiency [22]. The input data are first fed into a linear transformation layer and converted into the global query vector q ∈ R^d through a weight matrix, which extracts the feature representation required for the subsequent attention calculation. The query vector and each key vector k_j ∈ R^d are then linearly transformed by their respective transformation matrices to obtain intermediate vectors. Subsequently, the intermediate features are mapped to scalar similarity scores via a learnable parameter vector, and the scores are normalized by the Softmax function to generate the attention weight distribution. Finally, the output is obtained as the weighted sum of the value vectors v_j ∈ R^d according to these weights.

The calculation formulas of the attention weight are as follows:

e_{i} = v_{a}^{T} t a n h (W_{a} q + U_{a} k_{i})

(12)

a_{i} = \frac{\exp (e_{i})}{\sum_{j = 1}^{N} \exp (e_{j})}

(13)

where

e_{i}

is the similarity score between the

i

-th key vector

k_{i}

and the query vector

q

;

W_{a} \in R^{d \times d}

is the learnable linear transformation matrix, which performs linear transformation on the query vector

q

and maps it to the appropriate feature space;

U_{a} \in R^{d \times d}

is the learnable linear transformation matrix, which performs a linear transformation on the key vector

k_{i}

to map it to the same feature space as the transformed query vector

q

;

v_{a}^{T} \in R^{d}

is the learnable parameter vector, which compresses the high-dimensional feature vector into a scalar; N is the number of key vectors, i.e., the length of the input sequence; tanh is the hyperbolic tangent function; and

a_{i}

is the attention weight corresponding to the

i

-th key vector

k_{i}

.
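Equations (12) and (13) can be traced step by step with a small example. The following is a minimal pure-Python sketch, not the authors' implementation: the function names `matvec` and `additive_attention`, the identity matrices used for W_a and U_a, and the toy vectors are all illustrative assumptions.

```python
import math

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def additive_attention(q, keys, values, W_a, U_a, v_a):
    """Additive attention for one global query q over N key/value vectors."""
    Wq = matvec(W_a, q)
    scores = []
    for k in keys:
        Uk = matvec(U_a, k)
        hidden = [math.tanh(a + b) for a, b in zip(Wq, Uk)]
        scores.append(sum(v * h for v, h in zip(v_a, hidden)))  # Eq. (12)
    # Softmax over the N scores; subtracting the max is only for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]                         # Eq. (13)
    # Output: weighted sum of the value vectors
    d = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
    return weights, context

# Toy inputs (hypothetical, d = 2, N = 3)
identity = [[1.0, 0.0], [0.0, 1.0]]
q = [0.5, -0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = keys
weights, context = additive_attention(q, keys, values, identity, identity, [1.0, 1.0])
```

In a trained model W_a, U_a and v_a would be learned parameters; here they are fixed only so that the arithmetic is easy to follow.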

3.2.4. TCN-LSTM Model Fused with Additive Attention Mechanism

Based on the aforementioned research methods, this paper constructs a parallel prediction model (TCN–LSTM–Attention) integrating Temporal Convolutional Network (TCN), Long Short-Term Memory (LSTM) network, and an additive attention mechanism [23], with its detailed network structure shown in Figure 6. The key hyperparameters of each module of the model are all optimized and determined through the grid search method. The TCN module is stacked with four residual blocks, and each residual block contains two dilated causal convolution layers with dilation factors of [1,2,4,8] in turn, a kernel size of 3, and an output channel number of 64. Each convolution layer is followed by a weight normalization layer, ReLU activation function and Dropout layer with a dropout rate of 0.2 in turn, and finally, the channel number is adjusted through 1 × 1 convolution to realize a residual connection. The LSTM module adopts a two-layer stacked structure, with 64 hidden-layer neurons in each layer, tanh as the activation function, and sigmoid as the recurrent activation function. Each layer is followed by a Dropout layer with a dropout rate of 0.2, and outputs the hidden state of the last time step. The additive attention module takes the concatenated output features of the TCN and LSTM modules as input, with a feature dimension of 128. Query vectors and key vectors are generated through linear transformation, and the features are weighted and summed after calculating the attention weights. The fully connected layer contains two layers of networks; the first layer has a dimension of 32 with a ReLU activation function, and the second layer is the output layer with a dimension of 1 and no activation function, directly outputting the ROP prediction value. It should be clarified that the core innovation of this paper focuses on the dual-objective coupled optimization framework of improved MSE and ROP. 
The additive attention mechanism is introduced as supporting work to improve the prediction accuracy of ROP, and its algorithmic design is not the research focus of this paper. Therefore, comparative experiments across multiple attention mechanisms are not carried out, and this treatment does not affect the reliability of the core conclusions of this paper.
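The dilated causal convolution used in the TCN module can be illustrated without any deep learning framework. The sketch below is an assumption-laden illustration rather than the authors' code: `causal_dilated_conv` and `receptive_field` are hypothetical helper names, the convolution is single-channel with zero left-padding, and the receptive-field formula assumes both convolution layers in a residual block share that block's dilation factor.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i*dilation]; samples before t=0 count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:          # causal: only current and past samples are used
                acc += w * x[idx]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field of stacked causal convolutions: 1 + sum of (k - 1) * d per layer."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)

# Configuration stated in the text: kernel size 3, dilations [1, 2, 4, 8], two layers per block
rf = receptive_field(3, [1, 2, 4, 8])

# Causality check on toy data: changing the last sample leaves earlier outputs unchanged
y_a = causal_dilated_conv([1.0, 2.0, 3.0, 4.0, 5.0], [0.5, 0.3, 0.2], dilation=2)
y_b = causal_dilated_conv([1.0, 2.0, 3.0, 4.0, 99.0], [0.5, 0.3, 0.2], dilation=2)
```

Under these assumptions `rf` evaluates to 61 time steps, the theoretical span of past inputs that can influence one output of the stack.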

The model takes the six core features screened in Section 2.2 as inputs. Combined with the 1 m per step acquisition depth interval of the drilling and mud logging data, the length of the temporal input window is set to 5. That is, the temporal feature data of the 5 m interval before the current well depth are used to predict the Rate of Penetration (ROP) at the current well depth, which fully matches the engineering scenario of field while-drilling prediction. The above TCN-LSTM model fused with additive attention can realize high-precision ROP prediction under given drilling parameters, which lays a core predictive model foundation for the subsequent coupling with the improved MSE model and the multi-objective optimization of drilling parameters.
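The windowing described above can be sketched as follows. This is an illustrative sketch only: `make_windows` is a hypothetical helper, the toy feature rows are synthetic, and it assumes that the window holds the five depth steps strictly before the current depth (the text does not state whether the current step is included).

```python
def make_windows(features, rop, window=5):
    """Pair each depth index t with the `window` feature rows that precede it.

    features: list of per-depth rows (one row per 1 m step, six features each)
    rop:      list of measured ROP values, aligned with `features`
    """
    X, y = [], []
    for t in range(window, len(features)):
        X.append(features[t - window:t])  # rows t-5 .. t-1 (assumed to exclude depth t)
        y.append(rop[t])                  # target: ROP at the current depth
    return X, y

# Synthetic data: 8 depth steps, 6 features per step
features = [[float(i)] * 6 for i in range(8)]
rop = [10.0 + i for i in range(8)]
X, y = make_windows(features, rop)
```

Each element of `X` then has shape 5 × 6, matching the input window length and feature count given in the text.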

3.3. Multi-Objective Optimization Algorithm

3.3.1. Determination of Core Model Elements

In actual drilling operations, Weight on Bit (WOB) and rotary speed (Revolutions Per Minute, RPM) are the core operating parameters that field drillers can control directly and in real time. Meanwhile, parameters including Hook Load (HL), torque, and standpipe pressure are response parameters generated during the drilling process, which cannot be preset or directly regulated. Total Pit Volume (TPV) and DC Exponent (DC) are fixed or slow-varying parameters related to the formation and drilling fluid, with no room for real-time regulation within the same target well interval. Therefore, WOB and RPM are selected as the decision variables of the optimization model, as they are the most readily implemented in the field and the most relevant to the optimization objectives [11].

Parameters such as flow rate and drilling fluid density are not included in the decision variables, mainly for the following three reasons, which are grounded in engineering practice and field data:

First, there is a significant difference in real-time controllability. WOB and rotary speed are core operating parameters that can be adjusted in real time and continuously by field drillers on a second-scale time basis, and the adjustment can be immediately reflected in rock-breaking efficiency and ROP. In contrast, flow rate and drilling fluid density are slow-varying parameters, with an adjustment cycle usually ranging from tens of minutes to several hours. Moreover, their adjustment mainly affects wellbore cutting carrying capacity and wellbore stability, and there is a significant hysteresis in the direct impact on bit rock-breaking efficiency, which cannot meet the demand of real-time optimization while drilling.

Second, the influence of parameters on the objective function is different. Correlation analysis and mutual information calculation results show that the Pearson correlation coefficient between flow rate and ROP is only 0.11, with a mutual information value of 0.08; the Pearson correlation coefficient between drilling fluid density and ROP is −0.24, with a mutual information value of 0.12. Their influence on ROP is much lower than that of WOB and rotary speed. In addition, in the 4200–4600 m well interval of the target block, both flow rate and drilling fluid density have been optimized to the optimal interval meeting the cutting carrying requirements, and further adjustment has limited contribution to efficiency improvement and consumption reduction.

Third, it conforms to the actual field operation logic. In actual drilling operations, the adjustment of flow rate and drilling fluid density needs to comprehensively consider multiple constraints such as well control safety, wellbore stability, and equipment capacity, which are usually uniformly set by drilling engineers according to the construction plan of the whole well interval. Drill operators have no right to adjust them arbitrarily during the drilling process. Only selecting WOB and rotary speed as decision variables fully conforms to the actual field operation process and ensures that the optimization results can be directly applied on site.

For the remaining four state boundary variables, the optimization process is based on the in situ field drilling data of the corresponding well depth in the validation well interval. For the optimization calculation at each well depth, the measured values of HL, torque, TPV and DC Exponent at that depth are adopted as fixed boundary inputs, and only the two decision variables, WOB and RPM, are adjusted. This approach retains the real formation and working-condition characteristics of the target well interval and avoids the decoupling between the model and dynamic field conditions that fixed mean values would cause. It also conforms to the operation logic of field while-drilling optimization: field drillers can only achieve efficiency improvement and consumption reduction by adjusting WOB and RPM, and cannot change established boundary conditions such as formation drillability and wellbore friction.

The core optimization objective of drilling operations is to achieve high ROP and low MSE simultaneously within the boundary of safe on-site construction. Mathematically, both ROP and MSE are bivariate functions of Weight on Bit (WOB) and rotary speed (RPM), and they are deeply coupled through these common decision variables.

ROP can be expressed as

\mathrm{ROP} = f_{DL} (\mathrm{WOB}, \mathrm{RPM}, X_{bound})

through the TCN–LSTM–Attention model constructed in this paper, where

X_{bound}

are fixed boundary parameters such as Hook Load and DC Exponent.

MSE can be expressed as

\mathrm{MSE} = f_{MSE} (\mathrm{WOB}, \mathrm{RPM}, \mathrm{ROP}) = f_{MSE} (\mathrm{WOB}, \mathrm{RPM}, f_{DL} (\mathrm{WOB}, \mathrm{RPM}, X_{bound}))

through the improved mechanistic model.

The above coupling relationship indicates that adjusting WOB and RPM simultaneously changes the values of ROP and MSE, which is the mathematical basis for constructing the dual-objective collaborative optimization model. The two core models are coupled through the nested mapping of common variables, yielding a dual-objective collaborative optimization model for drilling parameters [24].

The coupled multi-objective optimization system takes decision variables, objective functions, and constraint conditions as the three core elements, and its optimization objective function is expressed as

F (x) = \min \{f_{1} (x), f_{2} (x)\}

(14)

f_{1} (x) = 1 / \mathrm{ROP} (\mathrm{WOB}, \mathrm{RPM})

(15)

f_{2} (x) = \mathrm{MSE} (\mathrm{WOB}, \mathrm{RPM})

(16)

where

F (x)

is the multi-objective optimization function;

f_{1} (x)

is the reciprocal of ROP, expressed as a function of WOB and RPM, so that minimizing it maximizes ROP;

f_{2} (x)

is the function of MSE with respect to WOB and RPM.
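The nested mapping behind Equations (14) to (16) can be sketched in a few lines. Everything below is a hedged illustration: `rop_surrogate` is a hypothetical power-law stand-in for the trained TCN–LSTM–Attention model, `mse_teale` uses the classical Teale form in SI-style units with a placeholder `efficiency` multiplier (the exact form of the improved MSE model is defined elsewhere in the paper and is not reproduced here), and all numerical values are invented for the example.

```python
import math

def rop_surrogate(wob_kn, rpm, x_bound):
    """Hypothetical stand-in for f_DL; returns ROP in m/h. NOT the trained network."""
    return x_bound["k"] * (wob_kn ** 0.6) * (rpm ** 0.5)

def mse_teale(wob_kn, rpm, rop_m_per_h, torque_knm, bit_area_m2, efficiency=1.0):
    """Teale-type MSE in MPa: thrust term plus rotary term.

    `efficiency` is a placeholder multiplier (assumption), not the paper's coefficient.
    """
    thrust_kpa = wob_kn / bit_area_m2
    # rpm * 60 converts rev/min to rev/h so that it matches ROP in m/h
    rotary_kpa = 2.0 * math.pi * (rpm * 60.0) * torque_knm / (bit_area_m2 * rop_m_per_h)
    return efficiency * (thrust_kpa + rotary_kpa) / 1000.0

def coupled_objectives(wob_kn, rpm, x_bound):
    """Return (f1, f2) = (1/ROP, MSE); MSE depends on ROP, which depends on WOB and RPM."""
    rop = rop_surrogate(wob_kn, rpm, x_bound)
    mse = mse_teale(wob_kn, rpm, rop, x_bound["torque_knm"],
                    x_bound["bit_area_m2"], x_bound["efficiency"])
    return 1.0 / rop, mse

# Invented boundary values for one depth step
x_bound = {"k": 0.06, "torque_knm": 5.0, "bit_area_m2": 0.0366, "efficiency": 1.0}
f1, f2 = coupled_objectives(125.0, 52.0, x_bound)
```

The point of the sketch is the call structure: changing WOB or RPM alters ROP first, and the new ROP then feeds into the MSE calculation, so both objectives respond to the same two decision variables.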

In the coupled multi-objective optimization model, decision variables must have the core attribute of acting on both objective functions simultaneously, so that the global optimum of the objective functions can be achieved by adjusting their values. In actual drilling operations, WOB and rotary speed, as the key real-time controllable operating parameters, have a direct impact on both ROP and MSE.

From the perspective of the rock-breaking mechanism and field drilling laws in the study block, the dual objectives of ROP maximization and MSE minimization exhibit stage-wise behavior within the full feasible domain of WOB and rotary speed: they are predominantly synergistic but conflict in local intervals. This makes multi-objective collaborative optimization necessary in both theory and engineering practice.

Within the reasonable operation range of WOB and rotary speed, the dual objectives show a significant collaborative optimization relationship. Increasing WOB and rotary speed can enhance the effective penetration depth of the bit cutting teeth and the rock-breaking impact frequency, improving the volume breaking efficiency of the rock. The ROP increases synchronously, and the invalid time consumption per unit footage decreases; that is, the objective function

f_{1} (x)

decreases. Meanwhile, the proportion of invalid energy loss caused by wellbore friction and drill string vibration in the rock-breaking process decreases synchronously, the total mechanical energy required to break a unit volume of rock is reduced, and the MSE decreases accordingly; that is, the objective function

f_{2} (x)

decreases.

When WOB and rotary speed exceed the formation-adapted critical threshold, a trade-off between the dual objectives emerges. Excessively high WOB leads to over-deep bit penetration, overloading, and chipping of cutting teeth, while excessively high rotary speed triggers drill string whirl and aggravated lateral vibration. At this point, the invalid energy consumption for rock breaking increases sharply, and the two objectives move in opposite directions: the ROP increase slows down or even reverses while MSE rises drastically. Specifically, pursuing the ultimate maximization of ROP will inevitably lead to a significant increase in MSE, while pursuing the ultimate minimization of MSE will require sacrificing part of the ROP potential.

Even within the collaborative optimization interval, the optimal solutions of the two single objectives are not completely coincident. The maximum single-objective value of ROP usually appears at the position where WOB and rotary speed are close to the critical threshold, while the minimum single-objective value of MSE appears at the parameter matching point with the highest rock-breaking efficiency, with an inherent parameter deviation between the two. Single-objective optimization cannot simultaneously meet the dual engineering requirements of efficiency improvement and consumption reduction. Therefore, it is necessary to construct a dual-objective coupled optimization model to achieve the collaborative optimum of the two objectives through global optimization.

To eliminate the dimensional differences and numerical interval differences between the two objectives, avoid the dominant effect of a single objective on the optimization results, and ensure equal weight of the two optimization objectives, the min–max normalization method is adopted to standardize the dual objective functions. After processing, the values of both objectives are within the range of [0,1], where 0 corresponds to the theoretical optimal value of the objective, and 1 corresponds to the theoretical worst value. The normalization formulas are as follows:

f_{1, norm} (x) = \frac{f_{1} (x) - f_{1, min}}{f_{1, max} - f_{1, min}}

(17)

f_{2, norm} (x) = \frac{f_{2} (x) - f_{2, min}}{f_{2, max} - f_{2, min}}

(18)

where

f_{1, norm} (x)

and

f_{2, norm} (x)

are the normalized results of the two objective functions, respectively;

f_{1, min}

and

f_{1, max}

are the minimum and maximum values of the objective function

f_{1} (x)

within the constraint interval of decision variables, respectively; and

f_{2, min}

and

f_{2, max}

are the minimum and maximum values of the objective function

f_{2} (x)

within the constraint interval of decision variables, respectively, which are determined by a full traversal calculation of the constraint interval.
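Equations (17) and (18), together with the full-traversal step, can be sketched as below. This is a minimal illustration under stated assumptions: the grid resolution `n` is arbitrary, `toy_objectives` is an invented closed-form stand-in for the coupled ROP/MSE models, and the helper names are hypothetical. The WOB and rotary speed ranges are the constraint intervals given in the text.

```python
def grid(lo, hi, n):
    """n evenly spaced points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def objective_bounds(objectives, wob_range=(20.0, 200.0), rpm_range=(30.0, 90.0), n=61):
    """Traverse the constraint interval on a grid; return (min, max) for f1 and f2."""
    f1_vals, f2_vals = [], []
    for wob in grid(wob_range[0], wob_range[1], n):
        for rpm in grid(rpm_range[0], rpm_range[1], n):
            f1, f2 = objectives(wob, rpm)
            f1_vals.append(f1)
            f2_vals.append(f2)
    return (min(f1_vals), max(f1_vals)), (min(f2_vals), max(f2_vals))

def min_max(value, lo, hi):
    """Eq. (17)/(18): map value into [0, 1] given its min and max."""
    return (value - lo) / (hi - lo)

def toy_objectives(wob, rpm):
    """Invented objectives for illustration only (not the paper's models)."""
    f1 = 1.0 / (0.06 * wob ** 0.6 * rpm ** 0.5)
    f2 = 400.0 + 0.02 * (wob - 110.0) ** 2 + 0.3 * (rpm - 60.0) ** 2
    return f1, f2

(f1_lo, f1_hi), (f2_lo, f2_hi) = objective_bounds(toy_objectives)
f1, f2 = toy_objectives(125.0, 52.0)
n1, n2 = min_max(f1, f1_lo, f1_hi), min_max(f2, f2_lo, f2_hi)
```

With a grid traversal, the extremes are only as accurate as the grid spacing, so a finer grid may be preferable when the objectives vary sharply.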

On this basis, the equal-weight weighted aggregation method is adopted to construct the comprehensive optimization objective function, ensuring that the two optimization objectives are equally considered, and finally realizing the synchronous optimization of the dual objectives. The expression of the comprehensive optimization objective function is

F_{com} (x) = \min (ω_{1} \cdot f_{1, norm} (x) + ω_{2} \cdot f_{2, norm} (x))

(19)

where

F_{com} (x)

is the comprehensive optimization objective function;

ω_{1}

and

ω_{2}

are the weight coefficients of the two objectives, respectively, satisfying

ω_{1}

+

ω_{2}

= 1. In this paper, an equal-weight setting is adopted, that is,

ω_{1}

= 0.5 and

ω_{2}

= 0.5, which fully takes into account the dual core requirements of drilling efficiency improvement and rock-breaking energy saving. The weight coefficients can be flexibly adjusted according to the differentiated requirements of actual on-site construction to adapt to the optimization objectives under different working conditions.
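Equation (19) reduces to a one-line weighted sum once both objectives are normalized. The sketch below is illustrative only: `f_com` is a hypothetical function name and the two candidate parameter sets carry invented normalized values.

```python
def f_com(f1_norm, f2_norm, w1=0.5, w2=0.5):
    """Eq. (19): weighted sum of the normalized objectives (smaller is better)."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * f1_norm + w2 * f2_norm

# Two invented candidates: A favors ROP (low f1), B favors MSE (low f2)
candidate_a = (0.20, 0.40)
candidate_b = (0.35, 0.15)

score_a = f_com(*candidate_a)                  # equal weights
score_b = f_com(*candidate_b)
score_a_rop_first = f_com(*candidate_a, w1=0.8, w2=0.2)
score_b_rop_first = f_com(*candidate_b, w1=0.8, w2=0.2)
```

In this toy case candidate B scores better under equal weights, while shifting the weight toward ROP makes candidate A preferable, which shows how the weight coefficients encode the field priority.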

The linear weighting method is a classical a priori preference-based solution method for multi-objective optimization. After dimensional differences are eliminated through normalization, the optimization priority of the two objectives can be adjusted through the weight coefficients to suit different field construction requirements. Compared with Pareto optimization methods such as NSGA-II, it outputs a single optimal solution that field engineers can apply directly, without an additional decision-making step among multiple solutions, which better matches the engineering scenario of real-time optimization while drilling. The equal-weight setting adopted in this paper (ω₁ = 0.5, ω₂ = 0.5) is consistent with the current construction orientation of “efficiency first, energy saving considered” in the target block. To verify the robustness of the weight setting, a weight sensitivity analysis was carried out. The results show that when the weight coefficient varies within the range of 0.4–0.6, the optimized ROP increase remains between 12.3% and 13.8%, and the MSE decrease remains between 21.7% and 24.2%, with minimal fluctuation in the optimization effect.

Combined with the statistical distribution characteristics of actual drilling data of 12 wells in the study block, the constraint interval of WOB is finally determined as 20–200 kN, and the constraint interval of rotary speed is 30–90 r/min. Figure 7 shows the WOB and RPM distribution of representative wells, and the actual drilling parameters of all 12 wells fall within the above intervals, verifying the effectiveness and safety of this constraint boundary.

3.3.2. Model Solution Based on Standard Genetic Algorithm

Aiming at the above continuous global optimization problem with boundary constraints, the standard genetic algorithm (GA) is selected to complete the model solution. The genetic algorithm [25,26] is an intelligent global optimization algorithm constructed based on the theory of biological evolution. It does not require the objective function to be continuous or differentiable, and it adapts well to nonlinear and strongly coupled mapping relationships. With its strong global search capability, good robustness, and stable convergence performance, it suits the engineering scenario of this drilling parameter optimization, reducing the risk of being trapped in local optima while converging efficiently toward the global optimal solution. The core operating parameters of the genetic algorithm are shown in Table 1.

The algorithm solution is based on the safety constraint interval of the decision variables. First, the random uniform sampling method is adopted to generate the initial population within the established operation boundary of WOB and rotary speed, so that each individual in the population corresponds to a set of drilling parameter combinations to be optimized, and an initial search space covering the complete feasible domain is constructed for global optimization. On this basis, all individuals in the population are sequentially input into the trained ROP prediction model and the improved MSE mechanistic model to calculate the dual objective function values corresponding to each individual. After min–max normalization, the values are substituted into the comprehensive optimization objective function to complete the calculation of the fitness value of each individual, and quantify the comprehensive optimization effect of different parameter combinations in the population. Subsequently, the tournament selection method is adopted to screen individuals with excellent fitness performance from the current population as parents, so as to complete the screening and retention of superior genes and guide the population to continuously evolve towards the global optimum. Meanwhile, the simulated binary crossover method is used to perform crossover operation on the screened parent individuals to achieve gene recombination, and the polynomial mutation method is adopted to perform mutation operation on the generated offspring individuals to expand the search space of the algorithm. While fully maintaining the population diversity, it can effectively avoid the local optimum trap that is prone to occur in the optimization process. After the complete genetic operation is finished, the parent population and the offspring population are merged, and the individuals with the optimal fitness value are screened to form a new generation population, completing a full iterative cycle. 
The algorithm will continuously repeat the above iterative process until the number of iterations reaches the preset maximum number of generations, and then terminate the operation. Finally, it outputs the global optimal drilling parameter combination that meets the safety constraints and can simultaneously achieve ROP maximization and MSE minimization.
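The loop described above (uniform initialization, tournament selection, simulated binary crossover, polynomial mutation, and merging parents with offspring) can be sketched in pure Python. This is a minimal sketch, not the authors' implementation: the population size, number of generations, distribution indices and mutation probability are assumptions (the actual settings are in Table 1), and `toy_fitness` is an invented quadratic standing in for the comprehensive objective function.

```python
import random

BOUNDS = [(20.0, 200.0), (30.0, 90.0)]  # WOB (kN) and rotary speed (r/min), from the text

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def tournament(pop, fit, rng, k=2):
    """Pick k random individuals and return the one with the lowest fitness."""
    best = min(rng.sample(range(len(pop)), k), key=lambda i: fit[i])
    return pop[best]

def sbx(p1, p2, rng, eta=15.0):
    """Simulated binary crossover, applied gene by gene."""
    c1, c2 = [], []
    for a, b, (lo, hi) in zip(p1, p2, BOUNDS):
        u = rng.random()
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        c1.append(clip(0.5 * ((1 + beta) * a + (1 - beta) * b), lo, hi))
        c2.append(clip(0.5 * ((1 - beta) * a + (1 + beta) * b), lo, hi))
    return c1, c2

def poly_mutation(x, rng, eta=20.0, pm=0.5):
    """Basic polynomial mutation with per-gene probability pm."""
    y = []
    for xi, (lo, hi) in zip(x, BOUNDS):
        if rng.random() < pm:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
            xi = clip(xi + delta * (hi - lo), lo, hi)
        y.append(xi)
    return y

def run_ga(fitness, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        fit = [fitness(ind) for ind in pop]
        children = []
        while len(children) < pop_size:
            c1, c2 = sbx(tournament(pop, fit, rng), tournament(pop, fit, rng), rng)
            children += [poly_mutation(c1, rng), poly_mutation(c2, rng)]
        # Merge parents and offspring, keep the best pop_size individuals
        merged = sorted(pop + children[:pop_size], key=fitness)
        pop = merged[:pop_size]
    return pop[0]

def toy_fitness(ind):
    """Invented smooth objective with its minimum at WOB = 110, RPM = 60."""
    return ((ind[0] - 110.0) / 180.0) ** 2 + ((ind[1] - 60.0) / 60.0) ** 2

best = run_ga(toy_fitness)
```

In the coupled setting, `toy_fitness` would be replaced by a function that evaluates the ROP model and the MSE model, normalizes both outputs, and returns the weighted sum.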

4. Case Study

Based on the field drilling data from the Daye 1H Well Platform in the Southwest Oil and Gas Field block, a systematic case verification is carried out on the TCN-LSTM ROP prediction model fused with an additive attention mechanism and the coupled multi-objective optimization model of improved MSE and ROP constructed above.

The well interval with a vertical depth of 4200–4600 m of Well Daye 1H3-4 in the Southwest Oil and Gas Field is selected as the verification object. This well interval is consistent with the overall formation characteristics of the study block and is dominated by interbedded sandstone and mudstone. Its vertical depth range matches the modeling data, and it is not involved in the construction of the training set or the validation set, so it can effectively verify the generalization ability of the model in unfamiliar well intervals.

4.1. Characteristic Analysis

For this verification well interval, Pearson correlation analysis is carried out to clarify the correlation characteristics between Rate of Penetration (ROP), improved Mechanical Specific Energy (MSE) and each drilling parameter under the target formation conditions, and quantify the main controlling factors. The analysis results are shown in Figure 8.

According to the results of the correlation analysis, the absolute values of the correlation coefficients between WOB, rotary speed, and ROP in this well interval are all greater than 0.3, and the absolute values of the correlation coefficients between the two parameters and the improved MSE are greater than 0.45. They are the core main controlling factors affecting the drilling efficiency and rock-breaking energy consumption of this well interval [27], which further verifies the rationality and pertinence of the aforementioned selection of WOB and rotary speed as the optimization decision variables. Meanwhile, the inter-well fluctuation amplitudes of Hook Load (HL), Total Pit Volume (TPV) and DC Exponent (DC) in this well interval are all less than 5%, and the torque exhibits a synchronous response to the variations in WOB and rotary speed. This is consistent with the aforementioned determination of the engineering attributes of the state boundary variables, and verifies the rationality of the meter-by-meter in situ assignment processing method adopted in the optimization process.

4.2. ROP Prediction Model Training and Accuracy Verification

4.2.1. Model Training Setup and Hyperparameter Tuning

Based on the preprocessed dataset and the constructed TCN-LSTM model fused with the additive attention mechanism, model training and verification are carried out [28]. Model fitting is performed using the training set of eight wells, hyperparameter tuning is carried out with the validation set of two wells, and the final generalization ability verification is completed with the test set of two wells (including Well Daye 1H3-4 for this verification). The model is built based on Python 3.13 and PyTorch 2.0 frameworks. The Adam optimizer is adopted, with the initial learning rate set to 0.001, the batch size set to 32, and the maximum number of iterations set to 1000. To prevent model overfitting, a dual regularization strategy is adopted. Dropout layers are introduced in the TCN and LSTM modules to randomly discard some neurons to reduce feature collinearity, and an early stopping strategy is set to terminate training when the validation set loss does not decrease for 20 consecutive epochs. During the training process, the training set loss and validation set loss decrease synchronously and finally converge to a stable value, with the final difference between them less than 0.03. No obvious overfitting phenomenon occurs, proving that the regularization strategy effectively suppresses the overfitting risk.
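The early stopping rule described above (stop when the validation loss has not decreased for 20 consecutive epochs) can be expressed independently of the training framework. The sketch below is illustrative: `early_stopping_epoch` is a hypothetical helper and the loss sequences are synthetic.

```python
def early_stopping_epoch(val_losses, patience=20):
    """Return the 1-based epoch at which training would stop, or None if it never triggers."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0      # improvement: reset the counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Synthetic curve: improves for 3 epochs, then plateaus above the best value
plateau = [1.0, 0.9, 0.8] + [0.85] * 25
stop_at = early_stopping_epoch(plateau)

# Synthetic curve that keeps improving, so the rule never fires
improving = [1.0 - 0.01 * i for i in range(50)]
never = early_stopping_epoch(improving)
```

A practical training loop would also keep a copy of the weights from the best epoch so that the model can be restored after stopping; that step is omitted here.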

The core hyperparameters of the model are determined through the grid search method. Finally, the kernel size of the TCN module is selected as 3, the dilation factors are set to [1,2,4,8], the number of hidden layer neurons in the LSTM module is 64, and the feature dimension of the additive attention mechanism is 128. Meanwhile, to determine the optimal temporal input window length, sensitivity analysis is carried out, combined with the continuous action mechanism of rock-breaking in deep formations, and the model performance of 3, 5, 7, and 9 time steps (corresponding to 3 m, 5 m, 7 m, and 9 m well intervals) is tested respectively. The results show that when the window length is 5, the test set R² reaches the maximum value of 0.91, and the MAE and RMSE are the smallest. Too short a window cannot completely capture the temporal dependence of the rock breaking process, while too long a window will introduce redundant historical information and reduce the prediction accuracy. Therefore, this paper finally selects five time steps as the input window length.

4.2.2. Model Prediction Accuracy Verification

The Coefficient of Determination (R²), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) are adopted as evaluation indicators. Meanwhile, the BP model, CNN model, single LSTM model, and TCN-LSTM model without attention are introduced as control groups. To ensure the fairness of performance comparison, all benchmark models adopt exactly the same grid search method as the proposed model to optimize hyperparameters, with the search range including key parameters such as learning rate (0.0001~0.01), batch size (16~64), and hidden layer dimension (32~128), to ensure that each benchmark model achieves its optimal performance state [29]. On this basis, the performance advantages of the proposed hybrid model and the effectiveness of each module are verified, and the comprehensive accuracy results on the test set are shown in Table 2.

It can be seen from Table 2 that the TCN-LSTM model fused with an additive attention mechanism constructed in this paper achieves an R² of 0.91 on the test set, which is improved compared with other classical models. Meanwhile, the MAE and RMSE are significantly reduced, proving that the model has a more excellent fitting capability and generalization performance. It can accurately realize the dynamic prediction of ROP in the target block [30] and provides a reliable surrogate model foundation for subsequent parameter optimization.

To intuitively present the prediction effect of the model, the comparison curve between the ROP predicted values and field-measured values in the verification well interval is plotted, as shown in Figure 9. It can be seen that the predicted values of the model are highly consistent with the variation trend of the field-measured values, and the model maintains high prediction accuracy in the well intervals with lithology changes and working condition adjustments, which further verifies the applicability of the model to field while-drilling scenarios.

(1) K-fold Inter-well Cross-validation

To avoid data leakage caused by random dataset splitting and more objectively evaluate the generalization ability of the model across different wells, this paper adopts the five-fold inter-well cross-validation method for supplementary verification. The 12 deep wells in the study block are randomly divided into five groups according to well numbers (three groups with two wells each and two groups with three wells each). Each time, four groups are selected as the training set, and the one remaining group is the independent test set; five independent experiments are repeated. All experiments adopt exactly the same model architecture, hyperparameter settings and training strategies. The cross-validation results are shown in Table 3.
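The well-level grouping described above can be sketched as follows. This is an illustration under assumptions: `inter_well_folds` is a hypothetical helper, the well labels are placeholders rather than the real well names, and the random seed is arbitrary.

```python
import random

def inter_well_folds(well_ids, group_sizes=(2, 2, 2, 3, 3), seed=0):
    """Split wells into groups and build (train_wells, test_wells) pairs, one per group."""
    if sum(group_sizes) != len(well_ids):
        raise ValueError("group sizes must cover every well exactly once")
    wells = list(well_ids)
    random.Random(seed).shuffle(wells)
    groups, start = [], 0
    for size in group_sizes:
        groups.append(wells[start:start + size])
        start += size
    folds = []
    for i, test in enumerate(groups):
        train = [w for j, g in enumerate(groups) if j != i for w in g]
        folds.append((train, test))
    return folds

wells = [f"W{i:02d}" for i in range(1, 13)]  # placeholder labels for the 12 wells
folds = inter_well_folds(wells)
```

Because the split is made on well identifiers before any samples are drawn, every depth window of a given well ends up on the same side of the train/test boundary, which is what prevents leakage between neighboring depth steps.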

The cross-validation results show that the mean R² of the model in five independent experiments is 0.9086, which is highly consistent with the original test set result (0.91), and the performance fluctuation of each fold is extremely small, proving that the model has excellent stability and cross-well generalization ability without overfitting. Compared with the traditional random splitting method, inter-well cross-validation is more in line with the actual drilling engineering and can truly simulate the prediction effect of the model in undrilled new wells.

(2) Statistical Significance Analysis

Based on the five-fold inter-well cross-validation results, the statistical characteristics and 95% confidence intervals of the model performance indicators are calculated (using t-distribution with 4 degrees of freedom), as shown in Table 4.

The statistical analysis results show that the 95% confidence interval of the model R² is [0.897, 0.921] with a width of only 0.024, indicating that the model prediction results have high statistical reliability. The confidence intervals of MAE and RMSE are also narrow, further verifying the stability of the model performance. The above results demonstrate that the TCN-LSTM-Attention model proposed in this paper can stably meet the engineering requirements of real-time while-drilling regulation.
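The confidence interval calculation for five folds can be sketched with the standard library. The fold scores below are invented for illustration and are not the Table 3 values; `t_confidence_interval` is a hypothetical helper, and 2.776 is the two-sided 95% critical value of Student's t distribution with 4 degrees of freedom.

```python
import math
import statistics

T_CRIT_95_DF4 = 2.776  # t(0.975, df = 4)

def t_confidence_interval(samples, t_crit=T_CRIT_95_DF4):
    """Mean +/- t_crit * s / sqrt(n), with s the sample standard deviation."""
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width

fold_r2 = [0.90, 0.91, 0.92, 0.90, 0.91]  # invented fold scores, NOT the paper's data
ci_low, ci_high = t_confidence_interval(fold_r2)
```

With only five folds the t critical value is noticeably larger than the normal value of 1.96, so the interval is wider than a normal approximation would suggest.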

(3) Module Ablation Experiment

Based on the comparison results in Table 2, module ablation experiments are carried out to verify the independent roles and effectiveness of the TCN module, LSTM module and additive attention mechanism.

Synergistic effect of TCN and LSTM modules: Compared with the pure LSTM model, the R² of the TCN-LSTM model without attention is increased by 7.3%, and the MAE and RMSE are decreased by 27.2% and 13.7% respectively, indicating that the TCN module can effectively capture local abrupt features in drilling time series data (such as sudden changes in ROP caused by lithology changes and working condition adjustments), and forms feature complementarity with the LSTM module which is good at mining long-term dependencies, significantly improving the overall prediction performance of the model.

Role of additive attention mechanism: After introducing the additive attention mechanism on the basis of TCN-LSTM, the model R² is further improved by 3.4%, and the MAE and RMSE are decreased by 16.5% and 4.0% respectively, indicating that the attention mechanism can dynamically assign weights to different time steps and different features, highlight the key information that has a greater impact on ROP prediction, and improve the feature utilization efficiency of the model.

4.3. Multi-Objective Optimization Solution of Drilling Parameters Based on Standard Genetic Algorithm

Based on the trained high-precision TCN-LSTM ROP prediction model fused with an additive attention mechanism and the improved MSE rock-breaking mechanistic model, combined with the dual-objective collaborative coupled optimization framework, the standard genetic algorithm is adopted to carry out the optimization solution of drilling parameters for the verification well interval with a vertical depth of 4200–4600 m of Well Daye 1H3-4 in the Southwest Oil and Gas Field. The core operating parameters of the algorithm adopt the established settings (Table 1). The constraint interval of decision variables is based on the safety boundary determined by the field drilling data of the study block, that is, WOB of 20–200 kN and rotary speed of 30–90 r/min. The optimization objective adopts an equal-weight setting, which equally takes into account the dual engineering requirements of ROP maximization and MSE minimization, to ensure that the optimization results have the comprehensive benefits of both efficiency improvement and energy saving.

During the iteration process of the algorithm, the fitness value of the comprehensive objective function gradually decreases and converges to a stable state with the increase in the number of generations, indicating that the population can quickly converge to the global optimal solution [31]. There is no local convergence problem in the optimization process, and the convergence performance is stable, which can effectively adapt to the nonlinear optimization requirements of drilling parameter optimization.

The global optimum obtained after convergence gives the optimal drilling parameter combination for the verification interval. Table 5 compares it with the field drilling mean and the constraint interval. Both the optimized WOB and the optimized rotary speed lie within the on-site safe operating range, so there is no risk of parameter overrun. Relative to the field mean for this interval, WOB is reduced from 135 kN to 125 kN and rotary speed from 69 r/min to 52 r/min. This moderate reduction helps alleviate drill string vibration and abnormal bit wear, allowing the bit to break rock under more stable conditions and thereby improving energy utilization efficiency.

The optimized parameter combination is substituted into the ROP prediction model and the improved MSE model to calculate the optimized core indicators. Table 6 compares them with the field drilling means for this interval, and Figure 10 compares the variation of ROP and improved MSE with well depth before and after optimization.

The optimization results show that the ROP of the verification interval increases from the field mean of 7.76 m/h to a predicted 8.78 m/h, a gain of 13.1%, while the improved MSE decreases from the field mean of 734.25 MPa to 561.81 MPa, a reduction of 23.5%. To test the statistical significance of the optimization effect, a paired t-test was carried out on the meter-by-meter data before and after optimization. The p-values are 0.002 for ROP and 0.0003 for MSE, both below the 0.05 significance level, indicating that the improvement is unlikely to result from random fluctuations. ROP increases even though both WOB and rotary speed are reduced, which is attributed to better parameter matching that makes fuller use of the bit's rock-breaking efficiency; the shorter time per unit footage can directly reduce the drilling cycle and overall cost. Meanwhile, the marked reduction in the improved MSE indicates that the optimized parameter combination substantially reduces ineffective energy loss during rock breaking and improves the utilization of bit work, which helps extend bit service life and reduce tripping frequency, thereby achieving the goal of rock-breaking energy saving [32].

As the curves in Figure 10 show, across the interbedded sandstone and mudstone formation at 4200–4600 m, the optimized ROP is generally higher than the field values and the optimized MSE is generally lower. The optimization effect remains stable in intervals where the lithology changes. This indicates that the coupled optimization model has good formation adaptability and can meet drilling parameter optimization requirements under the varying lithology of the deep formation in the target block.
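The two percentages follow directly from the interval means, and the paired t-test reduces to a one-sample test on the per-meter differences. The sketch below checks the arithmetic and shows how the t statistic is formed; the per-meter ROP values in it are made up purely for illustration, since the meter-by-meter data are not reproduced in the paper.

```python
import math
import statistics


def percent_change(before: float, after: float) -> float:
    """Signed change of `after` relative to `before`, in percent."""
    return (after - before) / before * 100.0


# Interval means reported in the text and in Table 6.
rop_gain = percent_change(7.76, 8.78)
mse_change = percent_change(734.25, 561.81)
print(f"ROP: {rop_gain:+.1f}%  MSE: {mse_change:+.1f}%")  # ROP: +13.1%  MSE: -23.5%


def paired_t_statistic(before, after):
    """Return the paired-sample t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# MADE-UP per-meter ROP values (m/h), for illustration only.
rop_before = [7.2, 7.9, 8.1, 7.5, 7.8, 7.6]
rop_after = [8.3, 8.8, 9.0, 8.6, 8.9, 8.7]
t_stat, dof = paired_t_statistic(rop_before, rop_after)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The p-value is then obtained by comparing the statistic with a Student t distribution with n − 1 degrees of freedom; a statistics package such as SciPy (`scipy.stats.ttest_rel`) performs both steps in one call.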

5. Conclusions

To address the demands for enhancing drilling efficiency and optimizing energy consumption in deep formations, this study established a multi-objective drilling parameter optimization model that integrates an improved Mechanical Specific Energy (MSE) model with the Rate of Penetration (ROP). Validated using actual drilling data from the Southwest Oil and Gas Field, the following conclusions are drawn:

(1) The conventional Teale MSE model was modified by introducing a coefficient for effective bit energy utilization and a downhole torque calculation method. The resulting improved MSE model aligns more closely with field conditions and enables accurate quantification of rock-breaking efficiency and energy consumption at the bit.

(2) A TCN–LSTM–Attention model incorporating an additive attention mechanism was developed to achieve high-precision dynamic prediction of ROP. The model attained a Coefficient of Determination (R²) of 0.91 on the test dataset, significantly outperforming classical models such as BP and CNN. The stability of the model and the synergistic effect of each module were verified through five-fold inter-well cross-validation and ablation experiments, thereby providing a reliable surrogate model for parameter optimization.

(3) The synergy–conflict boundary between the dual objectives of maximizing ROP and minimizing MSE was delineated. With Weight on Bit (WOB) and rotary speed, both adjustable in real time on site, as decision variables, a dual-objective coupled optimization model was formulated and solved globally with a genetic algorithm. Case validation showed that, within safe operational constraints, the optimized drilling parameters increased the ROP of the target interval by 13.1% and reduced the improved MSE by 23.5%, achieving drilling efficiency improvement and rock-breaking energy saving simultaneously.

Author Contributions

Conceptualization, L.W.; Methodology, L.W.; Software, H.S.; Validation, Z.Z.; Formal analysis, K.S. and G.H.; Investigation, Z.Z., L.K., Y.Y. and G.Z.; Resources, L.W., A.-G.W., L.K., Y.Y., G.H. and G.Z.; Data curation, H.S., J.X., L.K. and Y.Y.; Writing—original draft, H.S.; Writing—review & editing, M.L.; Visualization, Z.Z.; Supervision, A.-G.W., M.L., K.S. and J.X.; Project administration, A.-G.W., K.S., J.X., Y.Y., G.H. and G.Z.; Funding acquisition, A.-G.W., M.L. and L.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 52574004) and the National Science and Technology Major Project of the Ministry of Science and Technology of China (Grant No. 2025ZD1401203).

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

Authors Ai-Guo Wang and Lulin Kong were employed by the company CNPC Engineering Technology R&D Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Feng, J.; Gao, Z.; Cui, J.; Zhou, C. The exploration status and research advances of deep and ultra-deep clastic reservoirs. Adv. Earth Sci. 2016, 31, 718.
2. Zhang, Y.R.; Guo, X.L.; Zhang, J.; Zhao, N. Research progress on drilling parameter optimization in petroleum drilling. Inn. Mong. Petrochem. Ind. 2023, 49, 92–95.
3. Teale, R. The concept of specific energy in rock drilling. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 1965, 2, 245.
4. Zhou, Y.; Zhang, W.; Gamwo, I.; Lin, J.-S. Mechanical specific energy versus depth of cut in rock cutting and drilling. Int. J. Rock Mech. Min. Sci. 2017, 100, 287–297.
5. Zhou, F. Intelligent Optimization of Drilling Parameters Based on Mechanical Specific Energy and Rate of Penetration Data-Driven Model. Ph.D. Thesis, China University of Petroleum (Beijing), Beijing, China, 2024.
6. Barbosa, L.F.F.M.; Nascimento, A.; Mathias, M.H.; de Carvalho, J.A., Jr. Machine learning methods applied to drilling rate of penetration prediction and optimization—A review. J. Pet. Sci. Eng. 2019, 183, 106332.
7. Zhou, Y.; Chen, X.; Zhao, H.; Wu, M.; Cao, W.; Zhang, Y.; Liu, H. A novel rate of penetration prediction model with identified condition for the complex geological drilling process. J. Process Control 2021, 100, 30–40.
8. Yang, S.; Guo, Z.; Zhang, H.; Gao, M. Prediction of rate of penetration based on integrated transfer learning. Comput. Syst. Appl. 2022, 31, 270–278.
9. Zhou, D.; Li, J.; Xu, H. Research on PDC drilling parameter optimization based on dynamic drilling strategy. E3S Web Conf. 2023, 438, 01023.
10. Li, L.; Guan, B.; Ming, R.; Zhang, X.; Zhang, J.; Lv, X.; Wang, G.; Zhao, X. The method for prediction of formation pore pressure based on mechanical specific energy theory. IOP Conf. Ser. Earth Environ. Sci. 2021, 859, 012004.
11. Zhai, H.; Chen, H.; Shi, B.; Zhao, H.; Gao, F. Drilling Monitoring While Drilling and Comprehensive Characterization of Lithology Parameters. Appl. Sci. 2025, 15, 11134.
12. Chen, Y.M.; Yan, Z.L. Calculation method of actual bottom-hole weight on bit in directional wells. Drill. Prod. Technol. 1996, 4, 1–5.
13. Meng, Y.F.; Yang, M.; Li, G.; Li, Y.J.; Tang, S.H.; Zhang, J.; Lin, S.Y. New method of real-time evaluation and optimization of drilling efficiency based on mechanical specific energy theory. J. China Univ. Pet. (Ed. Nat. Sci.) 2012, 36, 110–114+119.
14. Srinivas, S.; Jayaram, A.; Bhavadharani, K.; Pant, K.S.; Thirumala, K.; Kumar, T.S. Classification of Power Quality Disturbances Using Convolutional Neural Network and Temporal Convolutional Network Models. In Proceedings of the 2024 23rd National Power Systems Conference, NPSC, Indore, India, 14–16 December 2024.
15. Chu, D.J.; Hu, Y.Z. Application research on pre-drill wave impedance prediction based on temporal convolutional network. In Proceedings of the 5th Oil and Gas Geophysics Academic Annual Conference, Qingdao, China, 19 April 2023; pp. 133–138.
16. Williams, A.E.D.; Robinson, A.W.; Wells, J.; Tsakalidis, K.; Shen, Y.-C.; Browning, N.D. Deep Convolutional Neural Network Based Image Denoising in STEM. In Proceedings of the 13th Asia Pacific Microscopy Congress 2025 (APMC13), Brisbane, Australia, 2–7 February 2025.
17. Arora, D.; Garg, M.; Gupta, M. Diving deep in Deep Convolutional Neural Network. In Proceedings of the 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India, 18–19 December 2020; pp. 749–751.
18. Wang, J.; Ma, Y.; Huang, Z.; Xue, R.; Zhao, R. Performance Analysis and Enhancement of Deep Convolutional Neural Network. Bus. Inf. Syst. Eng. 2019, 61, 311–326.
19. Majerus, S.; D’argembeau, A. Verbal short-term memory reflects the organization of long-term memory: Further evidence from short-term memory for emotional words. J. Mem. Lang. 2011, 64, 181–197.
20. Liu, C.; Jin, Z.; Gu, J.; Qiu, C. Short-term load forecasting using a long short-term memory network. In Proceedings of the IEEE PES Innovative Smart Grid Technologies Conference Europe, Turin, Italy, 26–29 September 2017.
21. Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306.
22. Chen, H.; Li, N.; Li, M.; Sun, Z.; Meng, F.; Su, J. Ensemble Learning with Additive Attention Mechanism for Short-Term Load Forecasting. In Proceedings of the 2024 43rd Chinese Control Conference (CCC), Kunming, China, 28–31 July 2024; pp. 7113–7118.
23. Wang, L.F.; Chen, J.Z. TCN-LSTM roadway deformation prediction model incorporating additive attention mechanism. Gold Sci. Technol. 2025, 33, 1020–1030.
24. Zouache, D.; Arby, Y.O.; Nouioua, F.; Ben Abdelaziz, F. Multi-objective chicken swarm optimization: A novel algorithm for solving multi-objective optimization problems. Comput. Ind. Eng. 2019, 129, 377–391.
25. Zakaria, L.; Salim, C. Comparison of Genetic Algorithm and Quantum Genetic Algorithm. Int. Arab. J. Inf. Technol. 2012, 9, 243–249.
26. Smith, M.G.; Bull, L. Genetic Programming with a Genetic Algorithm for Feature Construction and Selection. Genet. Program. Evolvable Mach. 2005, 6, 265–281.
27. Liao, X.; Khandelwal, M.; Yang, H.; Koopialipoor, M.; Murlidhar, B.R. Effects of a proper feature selection on prediction and optimization of drilling rate using intelligent techniques. Eng. Comput. 2020, 36, 499–510.
28. Messaoud, A.; Weihs, C. Monitoring a deep hole drilling process by nonlinear time series modeling. J. Sound Vib. 2009, 321, 620–630.
29. Zang, C.; Lu, Z.; Ye, S.; Xu, X.; Xi, C.; Song, X.; Guo, Y.; Pan, T. Drilling parameters optimization for horizontal wells based on a multiobjective genetic algorithm to improve the rate of penetration and reduce drill string drag. Appl. Sci. 2022, 12, 11704.
30. Liu, W.; Fu, J.; Tang, C.; Huang, X.; Sun, T. Real-time prediction of multivariate ROP (rate of penetration) based on machine learning regression algorithms: Algorithm comparison, model evaluation and parameter analysis. Energy Explor. Exploit. 2023, 41, 1779–1801.
31. Peng, C.; Zhang, H.-L.; Fu, J.-H.; Su, Y.; Li, Q.-F.; Yue, T.-Q. A novel drilling parameter optimization method based on big data of drilling. Pet. Sci. 2025, 22, 1596–1610.
32. Ramba, V.; Selvaraju, S.; Subbiah, S.; Palanisamy, M.; Srivastava, A. Optimization of drilling parameters using improved play-back methodology. J. Pet. Sci. Eng. 2021, 206, 108991.

Figure 1. Data-processing result curve.

Figure 2. Pearson correlation heatmap of ROP and drilling parameters.

Figure 3. Raw Mechanical Specific Energy.

Figure 4. Network structure diagram of TCN.

Figure 5. LSTM network unit structure diagram [21].

Figure 6. Network structure diagram of TCN-LSTM model integrated with additive attention mechanism.

Figure 7. WOB and RPM visualization.

Figure 8. Pearson correlation analysis of ROP and MSE with other drilling parameters.

Figure 9. Actual vs. predicted ROP.

Figure 10. Comparison curves of ROP and MSE with well depth before and after optimization.

Table 1. Core operating parameters of the genetic algorithm.

Parameter Name	Value
Population size	100
Maximum number of generations	200
Crossover probability	0.9
Mutation probability	0.1
Crossover distribution index	20
Mutation distribution index	20

Table 2. Comparison results of accuracy of different ROP prediction models.

Predictive Model	R²	MAE (m/h)	RMSE (m/h)
BP	0.79	1.86	2.37
CNN	0.725	1.95	3.1
LSTM	0.82	1.58	2.04
TCN-LSTM	0.88	1.15	1.76
TCN-LSTM-Attention	0.91	0.96	1.69

Table 3. Five-fold inter-well cross-validation results of TCN–LSTM–Attention model.

Fold	Number of Training Wells	Number of Test Wells	R²	MAE (m/h)	RMSE (m/h)
1	9	3	0.902	1.01	1.75
2	10	2	0.915	0.93	1.62
3	9	3	0.897	1.05	1.81
4	10	2	0.921	0.89	1.58
5	10	2	0.908	0.97	1.67
Mean	-	-	0.9086	0.97	1.686
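The train/test well counts in Table 3 (9/3 or 10/2) are what one obtains when 12 wells are divided into five folds by whole wells, so that no well contributes samples to both training and testing. A minimal sketch of such a split follows; the round-robin assignment and the placeholder well names are assumptions made for illustration, since the paper does not state which wells fall in which fold.

```python
def inter_well_folds(well_ids, n_folds=5):
    """Assign whole wells to folds so train and test never share a well."""
    folds = [[] for _ in range(n_folds)]
    for i, well in enumerate(well_ids):
        folds[i % n_folds].append(well)        # round-robin assignment (assumed)
    splits = []
    for test_wells in folds:
        train_wells = [w for w in well_ids if w not in test_wells]
        splits.append((train_wells, test_wells))
    return splits


# Placeholder identifiers for the 12 wells of the study block.
wells = [f"W{i:02d}" for i in range(1, 13)]
splits = inter_well_folds(wells)
for fold, (train, test) in enumerate(splits, start=1):
    print(fold, len(train), len(test), test)
```

Splitting by well rather than by sample avoids leakage between neighboring depth points of the same well, which would otherwise inflate the test-set scores.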

Table 4. Statistical characteristics and 95% confidence intervals of model performance indicators.

Indicator	Mean	Standard Deviation	Standard Error	Lower Limit of 95% Confidence Interval	Upper Limit of 95% Confidence Interval
R²	0.909	0.0092	0.0041	0.897	0.921
MAE (m/h)	0.97	0.062	0.028	0.89	1.05
RMSE (m/h)	1.69	0.087	0.039	1.58	1.80
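The statistics in Table 4 can be approximated from the fold-level values in Table 3. The sketch below assumes the sample standard deviation (n − 1 denominator) and a Student-t interval with 4 degrees of freedom; the paper does not state which convention was used. Because the Table 3 entries are rounded, the recomputed values may differ slightly from those in Table 4.

```python
import math
import statistics

# Fold-level results copied from Table 3.
folds = {
    "R2": [0.902, 0.915, 0.897, 0.921, 0.908],
    "MAE": [1.01, 0.93, 1.05, 0.89, 0.97],
    "RMSE": [1.75, 1.62, 1.81, 1.58, 1.67],
}
T_CRIT = 2.776  # two-sided 95% Student-t critical value for 4 degrees of freedom


def summarize(values):
    """Return mean, sample SD, standard error and 95% CI limits."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD, n - 1 denominator (assumed)
    se = sd / math.sqrt(n)
    return mean, sd, se, mean - T_CRIT * se, mean + T_CRIT * se


summary = {name: summarize(values) for name, values in folds.items()}
for name, (mean, sd, se, low, high) in summary.items():
    print(f"{name}: mean={mean:.4f} sd={sd:.4f} se={se:.4f} CI=[{low:.3f}, {high:.3f}]")
```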

Table 5. Optimal drilling parameter combination results of the verification well section.

Decision Variables	Constraint Interval	Field Drilling Mean	Optimal Solution
WOB/kN	20–200	135	125
RPM/(r·min⁻¹)	30–90	69	52

Table 6. Comparison of core indicators before and after drilling parameter optimization.

Indicator	Pre-Optimization	Post-Optimization	Relative Change
ROP/(m·h⁻¹)	7.76	8.78	+13.1%
MSE/MPa	734.25	561.81	−23.5%


