Article

Rate of Penetration Prediction in Steeply Dipping Coal Seams Using Data-Driven Modeling and Feature Engineering

1
CNOOC Research Institute Co., Ltd., Beijing 100028, China
2
State Key Laboratory of Reservoir Geology and Development Engineering, Southwest Petroleum University, Chengdu 610500, China
3
School of Petroleum and Natural Gas Engineering, Southwest Petroleum University, Chengdu 610500, China
4
School of Mechanical and Electrical Engineering, Southwest Petroleum University, Chengdu 610500, China
*
Author to whom correspondence should be addressed.
Processes 2026, 14(7), 1174; https://doi.org/10.3390/pr14071174
Submission received: 12 March 2026 / Revised: 31 March 2026 / Accepted: 3 April 2026 / Published: 5 April 2026
(This article belongs to the Special Issue Data-Driven Analysis and Simulation of Coal Mining)

Abstract

To accurately predict the rate of penetration (ROP) in steeply inclined coal seam blocks, this paper proposes a data-driven ROP prediction method incorporating feature processing. First, Savitzky–Golay (SG) filtering is applied to key continuously monitored parameters to mitigate the impact of noise on model training. Features are then screened across linear, monotonic, and nonlinear dependency dimensions using the Pearson correlation coefficient, the Spearman correlation coefficient, and mutual information, identifying the parameters that contribute most to ROP. On this basis, a Temporal Convolutional Network (TCN)-Bidirectional Long Short-Term Memory (BiLSTM)-Attention prediction model is constructed: the TCN extracts local temporal patterns, the BiLSTM captures forward and backward dependencies, and the attention mechanism adaptively weights information across different time steps. This architecture significantly enhances the model's ability to capture complex operational variations and improves prediction accuracy. Experimental results demonstrate that, compared with benchmark models such as BiLSTM, TCN-BiLSTM, and BiLSTM-Attention, the proposed method achieves superior performance across all evaluation metrics and exhibits strong generalization on diverse operational datasets.

1. Introduction

ROP refers to the distance advanced by the drill bit per unit time during drilling operations and serves as a key indicator for evaluating drilling efficiency [1]. Accurate and efficient ROP prediction provides the data foundation for the holistic, coordinated optimization of drilling parameters. Traditional ROP prediction methods rely on physical models or statistical regression models derived from field data [2], such as the Winters ROP equation [3], the Bingham ROP equation [4], and the Warren ROP equation [5]. However, when confronted with the complex downhole conditions of steeply inclined coal seams and massive volumes of drilling data, these mechanism-based methods suffer from high fitting difficulty and low accuracy. Soares et al. [6] highlighted the limitations of traditional ROP modeling based on analytical equations, underscoring the urgent need for new technologies to enhance ROP prediction accuracy.
With the advancement of computer technology, machine learning (ML) has also been widely applied in the oil and gas industry [7,8]. ML can effectively perform deep mining of drilling data, uncovering deep latent information overlooked by traditional models. It can fit complex nonlinear relationships without relying on theoretical analysis, thereby improving the accuracy of drilling rate prediction [9]. Various machine learning techniques, such as Support Vector Machines (SVM) and Extreme Gradient Boosting (XGBoost) [10,11], have also been applied to drilling rate prediction studies, achieving superior predictive performance compared to traditional models. In recent years, considering the time-series characteristics between drilling parameters and ROP, some researchers have applied neural networks such as Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), Transformers, and Backpropagation Neural Networks (BPNN) to ROP prediction studies [12,13,14,15].
Although ML-based ROP prediction methods offer significant advantages over traditional physics-based drilling rate prediction approaches, current research still exhibits the following limitations: (1) The drilling process is inherently a dynamic, continuous time-series activity where drilling parameters are closely intertwined with both current and historical engineering conditions. Traditional machine learning approaches struggle to capture and interpret the temporal evolution patterns of ROP, limiting their ability to accurately predict ROP trends and consequently affecting the precision and reliability of predictions. (2) ROP prediction tasks involve complex geological and engineering parameters. Traditional models often fail to effectively identify and utilize key parameters. Subjectivity in parameter selection and neglect of data nonlinearity prevent focusing on factors most influential to prediction outcomes. Consequently, critical information remains underutilized during prediction, impairing the model’s generalization capability.
In summary, this paper proposes an ROP prediction model based on TCN-BiLSTM-Attention, an architecture that has already proven effective in areas such as short-term power load forecasting, short-term PV forecasting, cutting-tool remaining-life prediction, and dam safety inspection [16,17,18,19]. With ROP as the prediction target, the model employs a TCN to build a borehole-depth-sequence feature extractor under downhole constraints. An attention mechanism prioritizes the key parameters influencing the output, enhancing prediction accuracy by capturing the correlations between the various influencing factors and the current ROP. The BiLSTM layer addresses long-term dependencies in the time-series data, significantly enhancing the model's ability to capture ROP trend changes. Finally, a fully connected layer synthesizes the key information from the preceding layers to generate the ROP prediction. This approach demonstrates clear superiority in ROP prediction accuracy and in generalization across varying conditions.

2. Data Processing and Feature Analysis

2.1. Data Composition

The drilling data were collected from Block L, Xinjiang, China, where coal seams are characterized by steep dip angles ranging from 42° to 78° (mean 61.3°). In this paper, a “steeply dipping coal seam” is defined as a coal interval with true dip ≥ 45°, single-layer thickness ≥ 0.5 m, and typical coal log responses (GR < 50 API, density < 1.5 g/cm3). Over 35% of the coal samples in the dataset satisfy this steep-dip criterion, ensuring that the proposed ROP model is specifically trained and tested for steeply inclined coal seam conditions. The dataset includes logging data, drill bit records, and other parameters, comprising 25 parameter types and a total of 42,368 data entries. Data types encompass numeric values and strings, indexed by well depth with a sampling interval of 1 m. The dataset parameters are shown in Table 1.
The dataset can be divided into continuous and discrete variables. Continuous variables include the drilling engineering parameters, while discrete variables encompass formation information, drilling contractors, and drill bit models. These discrete features lack inherent numerical significance and cannot be represented directly as numbers, so they are one-hot encoded: each category is mapped to a unique indicator code, converting the feature into numerical form. Drill bit models are classified into four categories by manufacturer, with codes 1000 (Drill Bit Model 1), 0100 (Drill Bit Model 2), ..., 0001 (Drill Bit Model 4). Similarly, drilling contractors are grouped into five categories by organization, and formation information is divided into six types. The one-hot-encoded drilling data are presented in Table 2.
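The one-hot step described above can be sketched in a few lines of pure Python; the category labels below are illustrative placeholders, not the actual field values:

```python
def one_hot_encode(values):
    """Map each distinct category to a one-hot indicator vector,
    ordered by first appearance so codes are reproducible."""
    categories = []
    for v in values:
        if v not in categories:
            categories.append(v)
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors

# Hypothetical bit-model column with three observed categories:
# "Model1" -> 100, "Model2" -> 010, "Model4" -> 001.
cats, vecs = one_hot_encode(["Model1", "Model2", "Model4", "Model1"])
```

In practice a library encoder (e.g., pandas `get_dummies`) would be used; the indicator pattern is the same as the 1000/0100/... codes in the text.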

2.2. Data Processing

Data quality is a key factor affecting the accuracy of predictive models [20]. However, actual drilling data may encounter issues such as sensor failures and storage errors, resulting in outliers that exceed normal parameter ranges. Therefore, outlier removal processing is required on raw data to eliminate interference from abnormal entries. Since drilling data approximately follows a normal distribution, the 3σ rule is applied for outlier removal, discarding data points exceeding three standard deviations from the dataset.
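A minimal sketch of the 3σ rule as applied here, assuming approximately normally distributed readings; the sample values are synthetic, with one storage-error spike:

```python
import statistics

def three_sigma_filter(data):
    """Keep only points within three standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    return [x for x in data if abs(x - mu) <= 3 * sigma]

# Synthetic ROP-like readings; 250.0 mimics a sensor/storage error.
readings = [10.0] * 20 + [250.0]
clean = three_sigma_filter(readings)
```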
Despite outlier removal, the processed data still contained noise interference that affected model training performance. To address this, this paper employed SG for noise reduction on the original data [21]. This method effectively reduces random fluctuations by performing polynomial fitting within local windows while preserving the original trend of the data. It is particularly well-suited for handling non-periodic and nonlinear noise samples. Figure 1 illustrates the working principle of the SG convolution smoothing method.
Window width ( w ) determines the number of smoothing points. An excessively small window width cannot effectively smooth noise, while an excessively large window width may result in the loss of signal features. Polynomial degree ( q ) determines the complexity of the fit. A higher degree can better capture details, but an excessively high degree may lead to overfitting, thereby retaining noise.
To systematically evaluate the impact of w , q on noise reduction effectiveness, four distinct window widths (11, 13, 19, 23) were selected. Simultaneously, to prevent excessive order leading to overfitting, two different polynomial orders (2, 3) were chosen for experimentation. The resulting curves were compared between ROP and well depth. The comparative curves for different parameter combinations are shown in Figure 2.
We evaluated the effect of SG smoothing on the logging data under different window widths and polynomial orders. With a window width of 11 and a polynomial order of 3, the mean squared error (MSE) is 137.5219, indicating good filtering performance and effective removal of noise points. Figure 3 shows the data after SG smoothing with this parameter combination: random fluctuations and unnecessary noise in the drilling measurements are effectively reduced, and the curve becomes smoother and clearer without significantly altering the original data characteristics, making data features easier to identify and analyze. This demonstrates that the combination (window width 11, polynomial order 3) achieves a good balance between signal smoothing and detail preservation.
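For illustration, a self-contained Savitzky–Golay smoother can be sketched in pure Python. In practice one would call `scipy.signal.savgol_filter`; this version makes the mechanism explicit by fitting a degree-q polynomial to each centered window by least squares and evaluating it at the window center:

```python
def savgol_smooth(y, window=11, degree=3):
    """Savitzky-Golay smoothing: fit a degree-`degree` polynomial to each
    centered window by least squares and take its value at the center."""
    half = window // 2
    n = len(y)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        xs = list(range(lo - i, hi - i))   # window positions relative to center
        ys = y[lo:hi]
        coeffs = _polyfit(xs, ys, min(degree, len(xs) - 1))
        smoothed.append(coeffs[0])         # polynomial value at x = 0
    return smoothed

def _polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations."""
    m = degree + 1
    # Normal equations A c = b with A[j][k] = sum x^(j+k), b[j] = sum y*x^j.
    A = [[float(sum(x ** (j + k) for x in xs)) for k in range(m)] for j in range(m)]
    b = [float(sum(yv * x ** j for x, yv in zip(xs, ys))) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    c = [0.0] * m
    for r in range(m - 1, -1, -1):
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, m))) / A[r][r]
    return c
```

A useful sanity check of any SG implementation: with window 11 and order 3 it reproduces a cubic signal exactly, since the local fit is then exact.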
To ensure that input features with different dimensions are converted to a common scale, the data underwent normalization to eliminate differences between dimensions [22]. This paper employs maximum–minimum normalization to scale the data to the range [0, 1], as expressed by the following formula:
X_i = (x_i − min(x_i)) / (max(x_i) − min(x_i))
where x_i is the raw data, max(x_i) and min(x_i) are the maximum and minimum values in the data, respectively, and X_i is the normalized data.
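The normalization formula maps directly to code; a minimal sketch:

```python
def min_max_normalize(xs):
    """Scale a feature linearly into [0, 1] (Eq. (1))."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

scaled = min_max_normalize([5.0, 20.0, 12.5])
```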

2.3. Feature Analysis

Numerous parameters influence ROP prediction, and selecting appropriate input features is crucial for ensuring the performance of data-driven models. Correlation analysis is one of the most widely used methods for feature selection, employing correlation coefficients to assess the relationship between two variables [23]. Considering that discrete data is statistically analyzed based on segmented data rather than point-by-point changes like continuous data, this study’s feature analysis primarily focuses on continuous variables. This study employs Pearson, Spearman, and mutual information methods to analyze both linear and nonlinear correlations between ROP and feature parameters. The calculation formulas for these three correlation analyses are shown in Equations (2)–(5). The correlation analysis results are presented in Figure 4.
r = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / √( Σ_{i=1}^{n} (x_i − x̄)² · Σ_{i=1}^{n} (y_i − ȳ)² )
where n denotes the sample size, x_i represents the i-th observation of the target feature, y_i the i-th observation of the input feature, and x̄ and ȳ the mean values of the two features, respectively. r is the Pearson correlation coefficient, which ranges from −1 to 1.
ρ_s = 1 − 6 Σ d_i² / ( n (n² − 1) )
d_i = rank(x_i) − rank(y_i)
where ρ_s denotes the Spearman coefficient, which ranges from −1 to 1, and d_i is the rank difference between the input feature and the target variable: rank(x_i) is the rank of x_i within X, and rank(y_i) is the rank of y_i within Y. Σ d_i² is the sum of squared rank differences over all samples; a smaller value indicates greater consistency between the orderings of the two variables and a stronger correlation. n(n² − 1) is the normalization factor.
I(X; Y) = Σ_{i,j} p(x_i, y_j) · log[ p(x_i, y_j) / ( p(x_i) p(y_j) ) ]
where I(X; Y) measures the (possibly nonlinear) dependence between the input feature X and the target Y; p(x_i, y_j) is the joint probability of X = x_i and Y = y_j occurring simultaneously; and the ratio p(x_i, y_j) / (p(x_i) p(y_j)) compares the probability of X and Y occurring together with the product of their independent occurrence probabilities.
As shown in Figure 4, ROP exhibits negative correlations with TVD, SPP, WOB, and BT, while displaying positive correlations with other input parameters. However, this merely reflects linear relationships among the data and does not fully capture the intrinsic connections between physical parameters during drilling operations. Therefore, this study introduces the mutual information method as a correlation analysis technique for nonlinear relationships.
When the absolute value of the correlation coefficient exceeds 0.8, it indicates a strong correlation between the two variables. When the absolute value falls within [0.5, 0.8], the correlation is considered moderate. A value below 0.5 indicates a weak correlation, while a value below 0.3 suggests essentially no correlation [24]. Table 3 is derived based on these criteria. In this study, parameter selection was based on a threshold of moderate correlation. However, since both SPP and FLWpmps approached this moderate correlation threshold, these two parameters were also included as inputs to the model.
In summary, by analyzing the correlation between various drilling parameters and ROP, parameters with significant influence on ROP prediction can be effectively selected. The seven parameters—TVD, WOB, RPM, SPP, FLWpmps, BT, and TORQUE—serve as input variables for the ROP prediction model. Detailed statistical results are presented in Table 4.
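The linear and rank coefficients used in this screening can be sketched as follows (a minimal, no-ties Spearman implementation; in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` would be used, and mutual information is available via `sklearn.feature_selection.mutual_info_regression`):

```python
import math

def pearson(x, y):
    """Pearson correlation (Eq. (2)): strength of linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    """Spearman rank correlation (Eqs. (3)-(4)), assuming no tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Spearman returns 1 for any monotonically increasing relationship, even a nonlinear one, which is exactly why it complements Pearson in the screening.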

3. Data-Driven ROP Prediction

3.1. Overall Framework

The framework of TCN-BiLSTM-Attention is shown in Figure 5. The model consists of three submodules: TCN, BiLSTM, and Attention, arranged as an input layer, TCN layer, Dropout layer, BiLSTM layer, Attention layer, and Flatten layer. The TCN layer extracts features inherent to the time series, while the BiLSTM layer captures long- and short-term dependencies by processing the forward and backward relationships between features. The Attention mechanism, together with the Flatten layer and a fully connected layer, then captures correlations between sequence positions, filtering out potentially distracting irrelevant information to generate improved prediction sequences. TCN-BiLSTM accelerates processing by extracting the features and dependencies of each sequence; after the TCN-BiLSTM layers, the feature dimensionality is significantly reduced, making it easier for the Attention mechanism to select the features useful for prediction.

3.2. TCN-BiLSTM-Attention Model Theory

TCN is an enhanced framework based on the Convolutional Neural Network (CNN) architecture. Its core structure combines causal dilated convolutions and residual blocks to capture the local temporal features and long-term dependencies of the input sequence [25]. The relevant TCN calculation is shown in Equation (8). The overall TCN framework is illustrated in Figure 6.
F(i) = Σ_{s=0}^{u−1} f_s · z_{i−d·s}
where u is the convolution kernel size; f_s is the s-th filter coefficient of the TCN; z_i is the input data; d is the dilation factor; s indexes the position within the kernel; z_{i−d·s} performs the convolution over historical input data; and F(i) is the convolution result at position i.
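Equation (8) can be illustrated with a minimal single-channel causal dilated convolution (a toy kernel and dilation for illustration, not the trained TCN filters):

```python
def causal_dilated_conv(z, f, d):
    """Causal dilated convolution (Eq. (8)): output i mixes the current
    sample with earlier samples spaced d steps apart, never future ones."""
    u = len(f)
    out = []
    for i in range(len(z)):
        acc = 0.0
        for s in range(u):
            j = i - d * s
            if j >= 0:          # implicit zero padding before the start
                acc += f[s] * z[j]
        out.append(acc)
    return out

# Kernel [1, 1] with dilation 2 adds each sample to the one two steps back.
out = causal_dilated_conv([1, 2, 3, 4, 5, 6], [1, 1], 2)
```

Stacking such layers with growing dilation (1, 2, 4, ...) is what gives the TCN its large receptive field over the depth sequence.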
BiLSTM is constructed based on LSTM and can effectively uncover latent connections between future and historical data points [26]. BiLSTM can process feature sequences output by TCN, enhancing the model’s ability to model temporal dynamics. The LSTM model results and corresponding gate calculation formulas are shown in Figure 7. The BiLSTM structure is depicted in Figure 8. The forward and backward calculation formulas for BiLSTM are shown in Equation (9).
h_t^f = LSTM(x_t, h_{t−1}^f)
h_t^b = LSTM(x_t, h_{t+1}^b)
y_t = W_1 h_t^f + W_2 h_t^b + b_t
where h_t^f and h_t^b represent the forward and backward hidden-state vectors at time step t, respectively. y_t denotes the output, W_1 and W_2 are the weights of the forward and backward passes, respectively, and b_t is the bias.
The Attention mechanism, inspired by human visual attention, assigns differential weights to information at different positions within the BiLSTM output sequence. This enables the model to autonomously focus on valuable information, thereby enhancing its performance [27]. Attention predicts ROP by aggregating information from all time steps within the drilling operation time series, weighted according to their respective attention values. The structure of the Attention mechanism is illustrated in Figure 9.
In Figure 9, y_j = [y_1, y_2, ..., y_n] represents the input data for the j-th data point (1 ≤ j ≤ n); s_j = [s_1, s_2, ..., s_n] denotes the corresponding scoring function; Softmax is the activation function; a_j = [a_1, a_2, ..., a_n] represents the weights for the corresponding input vectors; and a is the final output.
The implementation steps for Attention are as follows:
(1) Encode the input data to obtain the query vector q; then compute each alignment score s_j using the scoring function s_j(y_j, q) defined in Formula (10):
s_j(y_j, q) = Vᵀ tanh(W y_j + U q)
where Vᵀ is the weight vector of the scoring function; W is the learnable parameter matrix for the linear transformation of y_j; and U is the learnable parameter matrix for the linear transformation of q.
(2) Normalize the scores with the Softmax function to obtain the weights of the input vectors, as shown in Formula (11):
a_j = softmax(s_j) = exp(s(y_j, q)) / Σ_{k=1}^{n} exp(s(y_k, q))
(3) Compute the weighted sum of the input vectors using these weights to obtain the final output a, as shown in Formula (12):
a = Σ_{j=1}^{n} a_j y_j
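Steps (2) and (3) above can be sketched as a softmax-weighted pooling over hypothetical score/value pairs (a toy illustration, not the trained attention layer):

```python
import math

def attention_pool(scores, values):
    """Softmax the alignment scores (Eq. (11)) and return the weighted
    sum of the value vectors (Eq. (12))."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, values))
              for k in range(dim)]
    return weights, pooled

# Two time steps with equal scores split attention evenly between them.
weights, pooled = attention_pool([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```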

3.3. Model Evaluation Metrics

The experimental environment for this study is based on the Ubuntu 22.04 operating system, utilizing Python 3.10.0 as the development language. The deep learning framework employed is PyTorch 2.7.1 (CUDA 12.8), with experiments conducted on an NVIDIA GeForce RTX 5070 Ti GPU platform featuring 16 GB of video memory.
ROP forecasting is a classic regression problem. This paper selects Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R2) as evaluation metrics [28], with their calculation formulas shown in Equations (13)–(15).
MAE = (1/N) Σ_{i=1}^{N} | y_i − ŷ_i |
R² = 1 − Σ_{i=1}^{N} (y_i − ŷ_i)² / Σ_{i=1}^{N} (y_i − ȳ)²
RMSE = √( (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)² )
where y_i denotes the actual ROP value, ȳ the average ROP, ŷ_i the model's predicted ROP value, and N the dataset size. MAE and RMSE take values in [0, +∞), where smaller values indicate higher prediction accuracy. R² is at most 1 (and can be negative for very poor fits), with values closer to 1 signifying a higher proportion of variance explained by the model and thus better prediction performance.
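The three metrics can be computed directly from their definitions; a minimal sketch:

```python
import math

def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R2 as defined in Eqs. (13)-(15)."""
    n = len(y_true)
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errs)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2
```

A perfect prediction yields RMSE = MAE = 0 and R² = 1; predicting the mean everywhere yields R² = 0.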

4. ROP Prediction Results

4.1. Hyperparameter Optimization

The TCN-BiLSTM-Attention model has a large number of hyperparameters, making manual tuning inefficient and error-prone. This paper introduces Dung Beetle Optimization (DBO) and an improved variant to optimize the parameters of TCN-BiLSTM-Attention and thereby enhance model performance [29]. DBO updates positions based on characteristic dung-beetle behaviors (ball rolling, dancing, reproduction, foraging, and theft) and offers strong search capability and rapid convergence. To further enhance optimization effectiveness, this paper modifies DBO into the IDBO algorithm. First, the population is initialized with Sobol sequences, whose uniform coverage of the search space accelerates convergence. Second, a dynamic subtraction factor is introduced to escape local optima: two random vectors are generated within the optimization range, their difference is multiplied by a coefficient that decreases with the iteration count, and the result is added to the position-update function. The expressions are shown in Equations (16)–(18):
A = (b_max − b_min) · N + b_min
Q = 0.5 − 0.5 · (i / I_max)
V = (A_1 − A_2) · F · Q
where A denotes a random vector; b_max and b_min are the maximum and minimum parameter bounds, respectively; N is a randomly generated vector with the same number of elements as the parameters to be optimized, each element sampled uniformly from [0, 1]; F is the subtraction-factor coefficient; Q is the dynamic coefficient; i is the current iteration count; and I_max is the maximum iteration count.
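Equations (16)–(18) can be sketched as follows (a toy scalar-range version; F and the bounds are illustrative, and plain random initialization stands in for the Sobol sequence used in the full IDBO):

```python
import random

def dynamic_subtraction(b_min, b_max, dim, i, i_max, F=1.0, rng=random):
    """Perturbation of Eqs. (16)-(18): the difference of two random vectors
    in the search range, damped by Q = 0.5 - 0.5*i/i_max as iterations pass."""
    A1 = [b_min + (b_max - b_min) * rng.random() for _ in range(dim)]  # Eq. (16)
    A2 = [b_min + (b_max - b_min) * rng.random() for _ in range(dim)]
    Q = 0.5 - 0.5 * i / i_max                                          # Eq. (17)
    return [(a1 - a2) * F * Q for a1, a2 in zip(A1, A2)]               # Eq. (18)
```

Early on (i = 0, Q = 0.5) the term injects sizeable perturbations that help escape local optima; by the final iteration (Q = 0) it vanishes, letting the search settle.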
To select the most effective optimization algorithm, this paper employs DBO, IDBO, and the particle swarm optimization (PSO) algorithm for model parameter optimization. The model parameters to be optimized, as shown in Table 5, include the number of TCN layers, convolution kernel size, LSTM hidden layer count, LSTM layer count, initial learning rate, Dropout, sliding window size, and batch size.
During the specific optimization process, as shown in Table 5, by setting the initial ranges for each parameter, the three algorithms iteratively optimized within the same search space. The fitness curves are depicted in Figure 10.
Figure 10 illustrates the convergence speed and effectiveness of various algorithms during the optimization process. As shown in Figure 10, the IDBO algorithm demonstrates rapid convergence in the initial stages and maintains a stable, high-quality fitness value in the later stages, highlighting its robust global search capability and ability to avoid local optima. In contrast, PSO and DBO exhibit slightly inferior convergence rates and final fitness values, often becoming trapped in local optima during the search process, resulting in optimization performance that falls short of IDBO.
Table 6 presents the optimal parameter combinations obtained using the IDBO algorithm. Compared to the initial range, IDBO successfully identified a more ideal set of parameter configurations. These optimized parameter settings significantly improved the model’s prediction accuracy and generalization performance during experimental validation.
To prevent overfitting during model training, this study employs an early stopping strategy [30]. When the validation set loss ceases to decrease significantly over several consecutive training epochs, the training process is terminated prematurely. This approach effectively prevents the model from overfitting to the training data, thereby enhancing its generalization capability on unseen data.
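The early-stopping rule can be sketched as a small helper (the patience and min_delta values below are illustrative, not the paper's settings):

```python
class EarlyStopping:
    """Stop when validation loss has not improved by `min_delta`
    for `patience` consecutive epochs."""
    def __init__(self, patience=10, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss          # significant improvement: reset
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1          # stagnant epoch
        return self.bad_epochs >= self.patience  # True -> stop training
```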

4.2. Model Comparison

To validate the predictive performance of the TCN-BiLSTM-Attention model, this study selected BiLSTM, BiLSTM-Attention, and TCN-BiLSTM as comparison models. Data from the X1 and X2 well sections in Block L were used, split into training and test sets at a ratio of 7:3.
As shown in Figure 11, all four models effectively capture the fluctuating variations in drilling data and exhibit high consistency with actual ROP values. However, the BiLSTM, BiLSTM-Attention, and TCN-BiLSTM models demonstrate noticeable deviations from the true values. In contrast, the TCN-BiLSTM-Attention model demonstrates superior performance in ROP prediction, yielding results closer to the actual values and exhibiting a prediction trend more consistent with the actual ROP variations.
The performance of different models under various evaluation metrics is shown in Table 7. As indicated in Table 7, the TCN-BiLSTM-Attention model demonstrates the best performance. For the X1 and X2 well sections, its average RMSE, MAE, and R2 values are 8.7409, 6.3238, and 0.9438, respectively. Compared to the BiLSTM, BiLSTM-Attention, and TCN-BiLSTM models, the TCN-BiLSTM-Attention model achieved reductions in RMSE of 47.29%, 32.28%, and 31.94%, respectively. MAE decreased by 46.93%, 31.04%, and 31.12%, respectively, while R2 increased by 17.96%, 7.30%, and 7.04%, respectively.
SHAP (SHapley Additive exPlanations) is a tool for interpreting machine learning models [31]. This paper employs SHAP to conduct an interpretability analysis of input parameters. Figure 12 illustrates the contribution of seven feature variables to ROP, with the X-axis representing the computed SHAP values. Lower values of feature variables are depicted in blue, higher values in red, and intermediate values in transitional shades.
Figure 13 shows the mean absolute SHAP values for the seven input parameters, ranked from smallest to largest as follows: TVD, RPM, WOB, SPP, FLWpmps, BT, and TORQUE. FLWpmps, BT, and TORQUE exert a significant influence on ROP prediction, consistent with expectations from drilling-engineering physics and mechanics [32]. WOB and SPP have a moderate impact, while RPM and TVD exhibit weak influence. WOB is a key parameter governing the load on the bit during drilling. SPP and FLWpmps are closely related to the drilling fluid: SPP correlates with fluid flow pressure, directly affecting the fluid flow rate, while FLWpmps indicates pumping pressure during drilling and is closely linked to the fluid flow rate. BT serves as an indicator for evaluating drill bit performance and indirectly influences ROP. RPM denotes the rotational speed of the drill bit; Figure 13 indicates that it has a relatively minor effect on ROP, likely because other factors in the model already explain much of the drilling-rate variance, yet it remains an essential input parameter. TVD represents the true vertical depth, which relates to wellbore deviation in drilling operations and is one of the indispensable factors affecting ROP.
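The idea behind SHAP can be illustrated with an exact Shapley computation on a toy additive model (the real analysis uses the shap library on the trained network; this brute-force version is only feasible for a handful of features):

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value_fn):
    """Exact Shapley attribution: each feature's average marginal
    contribution over all coalitions of the remaining features."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += w * (value_fn(set(subset) | {i}) - value_fn(set(subset)))
        phi.append(total)
    return phi

# Toy additive "model": a coalition's value is the sum of its feature
# weights, so each Shapley value should recover that feature's own weight.
weights = [1.0, 2.0, 3.0]
phi = shapley_values(3, lambda S: sum(weights[j] for j in S))
```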

5. Conclusions and Future Work

In ROP research for geological resource extraction, ML has been widely applied. This paper analyzes the relevant models and proposes a data-driven, feature-processing-based ROP prediction model for steeply dipping coal seams, with hyperparameters optimized by the IDBO algorithm. The specific conclusions are as follows:
(1)
Feature processing of model input parameters was conducted through linear and nonlinear analysis using Pearson correlation analysis, Spearman correlation analysis, and mutual information methods. Considering the characteristics of the SG denoising method, window widths and polynomial orders were analyzed to determine that SG smoothing with a window width of 11 and a polynomial order of 3 effectively denoises the data.
(2)
Based on engineering parameter features and geological features, a TCN-BiLSTM-Attention model was constructed under well depth constraints, integrating temporal and geological parameters. This model comprehensively accounts for the combined effects of engineering parameters and geological features during drilling operations, providing an efficient technical solution for accurate ROP prediction.
(3)
The proposed model was compared with BiLSTM, BiLSTM-Attention, and TCN-BiLSTM models in terms of prediction results. For the X1 and X2 well sections in Block L, Xinjiang, the model predictions outperformed other models in evaluation metrics.
(4)
To comprehensively understand the influence of input variables on ROP, this study employs model interpretability methods. Through explanatory analysis provided by the SHAP method, the contribution of feature variables is ranked in ascending order as follows: TVD, RPM, WOB, SPP, FLWpmps, BT, and TORQUE.
Future work will focus on introducing core geological indicators of steeply dipping coal seams and their coupling with drilling parameters; expanding data sources to enhance generalization; refining the IDBO hyperparameter-optimization strategy; deepening interpretability via SHAP and other tools; and exploring integration with real-time drilling monitoring systems for engineering application.

Author Contributions

Conceptualization, J.X.; methodology, L.M.; software, X.X.; validation, Y.S.; data curation, R.M.; writing—original draft preparation, Y.S. and L.M.; writing—review and editing, Z.P. and L.M.; visualization, Y.S.; supervision, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

National Science and Technology Major Project of the Ministry of Science and Technology of China (2025ZD1403201), CNOOC Comprehensive Research Project: Research and Application of Key Technologies for Drilling and Completion of High-Steep Coal Seams in Xinjiang (KJZH-2025-2306).

Data Availability Statement

The data that support the findings of this study are available from the corresponding authors upon reasonable request. The data come from China National Offshore Oil Corporation (CNOOC), a Chinese state-owned enterprise, and due to confidentiality concerns cannot be discussed in detail in this paper.

Acknowledgments

The authors would like to thank the editor and the reviewers for their useful feedback that improved this paper.

Conflicts of Interest

Authors Jiawen Xue, Xuesong Xing, Rihe Mo and Zhaoyu Pang were employed by CNOOC Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The CNOOC Research Institute Co., Ltd had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
ROP: rate of penetration
SG: Savitzky–Golay
TCN: Temporal Convolutional Network
BiLSTM: Bidirectional Long Short-Term Memory (neural network)
ML: machine learning
SVM: Support Vector Machine
XGBoost: Extreme Gradient Boosting
LSTM: Long Short-Term Memory
GRU: Gated Recurrent Unit
BPNN: Backpropagation Neural Network
WOH: Weight on Hook
TVD: True Vertical Depth
WOB: Weight on Bit
RPM: Revolutions Per Minute
SPP: Standpipe Pressure
PT: Pump Time
HKH: Hook Height
TO: Flow Temperature
MWO: Mud Weight
FO: Fluid Out
BT: Bit Time
MSE: mean squared error
CNN: Convolutional Neural Network
RMSE: Root Mean Squared Error
MAE: Mean Absolute Error
R2: Coefficient of Determination
DBO: Dung Beetle Optimization
PSO: Particle Swarm Optimization
SHAP: SHapley Additive exPlanations

References

  1. Pei, Z.-J.; Song, X.-Z.; Wang, H.-T.; Shi, Y.-Q.; Tian, S.-C.; Li, G.-S. Interpretation and characterization of rate of penetration intelligent prediction model. Pet. Sci. 2024, 21, 582–596.
  2. Motahhari, H.R.; Hareland, G.; James, J. Improved drilling efficiency technique using integrated PDM and PDC bit parameters. J. Can. Pet. Technol. 2010, 49, 45–52.
  3. Winters, W.J.; Warren, T.; Onyia, E. Roller bit model with rock ductility and cone offset. In Proceedings of the SPE Annual Technical Conference and Exhibition; SPE: Calgary, AB, Canada, 1987; p. SPE-16696-MS.
  4. Bingham, M.G. A New Approach to Interpreting Rock Drillability; Petroleum Pub. Co.: Tulsa, OK, USA, 1965.
  5. Warren, T. Penetration-rate performance of roller-cone bits. SPE Drill. Eng. 1987, 2, 9–18.
  6. Soares, C.; Daigle, H.; Gray, K. Evaluation of PDC bit ROP models and the effect of rock strength on model coefficients. J. Nat. Gas Sci. Eng. 2016, 34, 1225–1236.
  7. Sun, Y.; Mao, L.; Liu, Q.; Zhao, P.; Zhu, H. Sand Screenout Early Warning Model Based on Combinatorial Neural Network. Rock Mech. Rock Eng. 2025, 1–24.
  8. Zhu, Z.; Wan, M.; Sun, Y.; Gong, X.; Lei, B.; Tang, Z.; Mao, L. A Multi-Source Data-Driven Fracturing Pressure Prediction Model. Processes 2025, 13, 3434.
  9. Li, Q.; Li, J.-P.; Xie, L.-L. A systematic review of machine learning modeling processes and applications in ROP prediction in the past decade. Pet. Sci. 2024, 21, 3496–3516.
  10. Nautiyal, A.; Mishra, A. Drill bit selection and drilling parameter optimization using machine learning. In Proceedings of the IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2023; p. 012027.
  11. Allawi, R.H.; Al-Mudhafar, W.J.; Abbas, M.A.; Wood, D.A. Leveraging boosting machine learning for drilling rate of penetration (ROP) prediction based on drilling and petrophysical parameters. Artif. Intell. Geosci. 2025, 6, 100121.
  12. Liu, Y.; Zhang, F.; Yang, S.; Cao, J. Self-attention mechanism for dynamic multi-step ROP prediction under continuous learning structure. Geoenergy Sci. Eng. 2023, 229, 212083.
  13. Zhang, C.; Song, X.; Su, Y.; Li, G. Real-time prediction of rate of penetration by combining attention-based gated recurrent unit network and fully connected neural networks. J. Pet. Sci. Eng. 2022, 213, 110396.
  14. Ji, H.; Lou, Y.; Cheng, S.; Xie, Z.; Zhu, L. An advanced long short-term memory (LSTM) neural network method for predicting rate of penetration (ROP). ACS Omega 2022, 8, 934–945.
  15. Gan, C.; Wang, X.; Wang, L.-Z.; Cao, W.-H.; Liu, K.-Z.; Gao, H.; Wu, M. Multi-source information fusion-based dynamic model for online prediction of rate of penetration (ROP) in drilling process. Geoenergy Sci. Eng. 2023, 230, 212187.
  16. Feng, Y.; Zhu, J.; Qiu, P.; Zhang, X.; Shuai, C. Short-term power load forecasting based on TCN-BiLSTM-attention and multi-feature fusion. Arab. J. Sci. Eng. 2025, 50, 5475–5486.
  17. Liang, J.; Yue, J.; Xin, Y.; Pan, S.; Tian, J.; Sun, J. Short-term photovoltaic power forecasting based on K-means++ clustering, secondary decomposition and TCN-BiLSTM-Attention model. Electr. Power Syst. Res. 2026, 255, 112749.
  18. Wang, S.; Jia, H.; Wang, A.; Wu, L.; Li, Q. Remaining Useful Life Prediction of Cutting Tools Based on a Depthwise Separable TCN-BiLSTM Model with Temporal Attention. Lubricants 2025, 13, 507.
  19. Xie, Z.; Chen, L.; Li, Y. Optimization of dam safety monitoring models based on residual compensation: A multidimensional temporal framework integrating TCN-BiLSTM-attention mechanism. Eng. Struct. 2026, 348, 121869.
  20. Sharma, A.; Burak, T.; Nygaard, R.; Hoel, E.; Kristiansen, T.; Welmer, M. Hybrid ROP modeling: Combining analytical and data-driven approaches for drilling. Geoenergy Sci. Eng. 2025, 251, 213877.
  21. Liu, S.; Xu, T.; Du, X.; Zhang, Y.; Wu, J. A hybrid deep learning model based on parallel architecture TCN-LSTM with Savitzky-Golay filter for wind power prediction. Energy Convers. Manag. 2024, 302, 118122.
  22. Huang, L.; Qin, J.; Zhou, Y.; Zhu, F.; Liu, L.; Shao, L. Normalization techniques in training DNNs: Methodology, analysis and application. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10173–10196.
  23. Huang, F.; Sun, Y.; Yang, J.; Sha, Z.; Lu, J.; Qi, R. Early Warning of Lost Circulation Based on Physical Models and a Hybrid Neural Network. Processes 2026, 14, 559.
  24. Gong, H.; Li, Y.; Zhang, J.; Zhang, B.; Wang, X. A new filter feature selection algorithm for classification task by ensembling Pearson correlation coefficient and mutual information. Eng. Appl. Artif. Intell. 2024, 131, 107865.
  25. Li, Y.; Song, L.; Zhang, S.; Kraus, L.; Adcox, T.; Willardson, R.; Komandur, A.; Lu, N. A TCN-based hybrid forecasting framework for hours-ahead utility-scale PV forecasting. IEEE Trans. Smart Grid 2023, 14, 4073–4085.
  26. Singla, P.; Duhan, M.; Saroha, S. An ensemble method to forecast 24-h ahead solar irradiance using wavelet decomposition and BiLSTM deep learning network. Earth Sci. Inform. 2022, 15, 291–306.
  27. Guo, M.-H.; Xu, T.-X.; Liu, J.-J.; Liu, Z.-N.; Jiang, P.-T.; Mu, T.-J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368.
  28. Gan, C.; Wang, Y.; Cao, W.-H.; Liu, K.-Z.; Wu, M. Real-time formation drillability sensing-based hybrid online prediction method for the rate of penetration (ROP) and its industrial application for drilling processes. Control Eng. Pract. 2025, 164, 106487.
  29. Lyu, L.; Jiang, H.; Yang, F. Improved dung beetle optimizer algorithm with multi-strategy for global optimization and UAV 3D path planning. IEEE Access 2024, 12, 69240–69257.
  30. Reguero, Á.D.; Martínez-Fernández, S.; Verdecchia, R. Energy-efficient neural network training through runtime layer freezing, model quantization, and early stopping. Comput. Stand. Interfaces 2025, 92, 103906.
  31. Takefuji, Y. Beyond XGBoost and SHAP: Unveiling true feature importance. J. Hazard. Mater. 2025, 488, 137382.
  32. Zhang, Y.; Yu, L.; Yang, L.; Hu, Z.; Liu, Y. Data-driven framework for predicting rate of penetration in deepwater granitic formations: A marine engineering geology perspective with comprehensive model interpretability. Eng. Geol. 2025, 351, 108039.
Figure 1. Sliding window implementation.
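The sliding-window slicing in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation; the window length of 8 matches the optimized value in Table 6, and the 7-channel input matches the features retained after screening.

```python
import numpy as np

def make_windows(features, target, window=8):
    """Slice a multivariate series into overlapping input windows.

    Each sample holds `window` consecutive rows of `features`; the label is
    the target value at the step immediately after the window.
    """
    X, y = [], []
    for i in range(len(features) - window):
        X.append(features[i:i + window])
        y.append(target[i + window])
    return np.asarray(X), np.asarray(y)

# Toy example: 100 time steps, 7 input channels (per the retained features).
feats = np.random.rand(100, 7)
rop = np.random.rand(100)
X, y = make_windows(feats, rop, window=8)
print(X.shape, y.shape)  # (92, 8, 7) (92,)
```

Each window of 8 consecutive samples is paired with the ROP value at the next step, which is what lets the TCN and BiLSTM layers see local temporal context.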
Figure 2. Noise reduction results for different SG parameter combinations.
Figure 3. Comparison of data before and after SG smoothing noise reduction.
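As a hedged illustration of the SG smoothing step compared in Figures 2 and 3, the sketch below applies SciPy's `savgol_filter` to a synthetic noisy channel. The window length and polynomial order here are assumptions for demonstration, not the parameter combination selected in the paper.

```python
import numpy as np
from scipy.signal import savgol_filter

# Hypothetical noisy channel: a smooth trend plus measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
trend = 6 + 2 * np.sin(4 * np.pi * t)
noisy = trend + rng.normal(0, 0.5, t.size)

# SG filtering fits a low-order polynomial within a sliding window; the
# window_length/polyorder pair below is illustrative only.
smooth = savgol_filter(noisy, window_length=31, polyorder=3)

# The filtered series tracks the underlying trend more closely than the raw one.
print(np.mean((noisy - trend) ** 2) > np.mean((smooth - trend) ** 2))  # True
```

Larger windows suppress more noise but flatten sharp operational changes, which is why Figure 2 compares several parameter combinations before one is chosen.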
Figure 4. Results of the correlation analysis.
Figure 5. TCN-BiLSTM-Attention framework.
Figure 6. TCN causal dilated convolution and residual block structure.
Figure 7. LSTM architecture.
Figure 8. BiLSTM architecture.
Figure 9. Attention structure.
Figure 10. Optimization results of three algorithms within the same search range.
Figure 11. Prediction results for the X1 and X2 well sections using four prediction models. (a) X1-stage BiLSTM prediction results; (b) X2-stage BiLSTM prediction results; (c) X1-stage TCN-BiLSTM prediction results; (d) X2-stage TCN-BiLSTM prediction results; (e) X1-stage BiLSTM-Attention prediction results; (f) X2-stage BiLSTM-Attention prediction results; (g) X1-stage TCN-BiLSTM-Attention prediction results; (h) X2-stage TCN-BiLSTM-Attention prediction results.
Figure 12. SHAP interpretation of prediction results.
Figure 13. Mean absolute SHAP value results.
Table 1. Dataset parameters.
Category | Input Parameters
Construction parameters | Weight On Hook (WOH), Total Depth, True Vertical Depth (TVD), TORQUE, Weight on Bit (WOB), Revolutions Per Minute (RPM), Standpipe Pressure (SPP), FLWpmps, Pump Time (PT), Hook Height (HKH)
Drilling fluid parameters | Flow Temperature (TO), Mud Weight (MWO), Fluid Out (FO)
Drill bit parameters | Drill model, drill diameter, drill type
Other parameters | Stratigraphic information, Bit Time (BT), Construction unit
Table 2. One-hot encoded dataset (first 4 rows).
Construction Contractor (_1–_5) | Drill Bit Type (_1–_4) | Stratigraphic Information (_1–_6)
0 0 1 0 0 | 0 0 0 1 | 0 0 0 0 1 0
0 0 0 0 1 | 0 0 1 0 | 0 1 0 0 0 0
0 1 0 0 0 | 0 1 0 0 | 0 0 0 1 0 0
1 0 0 0 0 | 1 0 0 0 | 1 0 0 0 0 0
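Table 2's encoding can be reproduced with a standard one-hot (dummy) transformation. The sketch below uses placeholder category labels, since the actual contractor, bit-type, and stratum values are confidential.

```python
import pandas as pd

# Illustrative categorical records; the labels are placeholders, not field data.
df = pd.DataFrame({
    "contractor": ["C3", "C5", "C2", "C1"],
    "bit_type": ["PDC", "hybrid", "roller", "diamond"],
    "stratum": ["S5", "S2", "S4", "S1"],
})

# One-hot encoding expands each category into binary indicator columns,
# mirroring the _1.._n column groups in Table 2.
encoded = pd.get_dummies(df, columns=["contractor", "bit_type", "stratum"], dtype=int)
print(encoded.shape)  # (4, 12)
```

Each row sums to exactly one indicator per original column, so a record with 3 categorical fields always activates 3 of the binary columns.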
Table 3. Evaluation results for model input parameters.
Serial Number | Characteristic Parameter | Pearson | Spearman | Mutual Information
1 | TVD | ✓ | ✓ | ✓
2 | WOB | ✓ | ✓ | ✓
3 | RPM | ✓ | ✓ | ✓
4 | SPP | × | × | ✓
5 | FLWpmps | × | × | ✓
6 | MWO | × | × | ×
7 | TO | × | × | ×
8 | WOH | × | × | ×
9 | PT | × | × | ×
10 | BT | ✓ | ✓ | ✓
11 | TORQUE | ✓ | ✓ | ✓
12 | HKH | × | × | ×
13 | FO | × | × | ×
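The three-way screening behind Table 3 can be sketched as follows: Pearson for linear dependence, Spearman for monotonic dependence, and a k-NN mutual-information estimate for nonlinear dependence. The synthetic drivers and the thresholds (0.3 and 0.1) are illustrative assumptions, not the paper's criteria.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(42)
n = 2000
wob = rng.normal(6.6, 2.2, n)        # linearly related driver
spp = rng.uniform(2, 25, n)          # nonlinearly related driver
noise_feat = rng.normal(0, 1, n)     # irrelevant channel
rop = 3.0 * wob + 10 * np.sin(spp) + rng.normal(0, 1, n)

def screen(x, y, r_thresh=0.3, mi_thresh=0.1):
    """Flag a feature by linear, monotonic, and nonlinear dependence on y."""
    r, _ = pearsonr(x, y)
    rho, _ = spearmanr(x, y)
    mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]
    return bool(abs(r) > r_thresh), bool(abs(rho) > r_thresh), bool(mi > mi_thresh)

for name, x in [("WOB", wob), ("SPP", spp), ("noise", noise_feat)]:
    print(name, screen(x, rop))
```

In this toy setup the sinusoidal driver fails the linear and monotonic tests but passes the mutual-information test, which is the kind of nonlinear dependence the combined screening is meant to catch.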
Table 4. Input parameter statistics.
Parameter | Unit | Mean | Standard Deviation | Min | Max
TVD | m | 1401.55 | 833.91 | 18.99 | 2850.69
WOB | t | 6.63 | 2.16 | 0 | 16.30
RPM | r/min | 51.24 | 19.38 | 0 | 132
SPP | MPa | 15.92 | 5.15 | 1.99 | 25.41
FLWpmps | L/h | 2742.81 | 3851.72 | 2988.01 | 4298.97
BT | h | 14.47 | 13.48 | 0 | 86.74
TORQUE | kN·m | 14.98 | 6.93 | 0 | 35.16
Table 5. Hyperparameter optimization range.
Parameter | Maximum Value | Minimum Value
TCN layers | 5 | 1
TCN convolution kernel size | 5 | 2
LSTM hidden layer | 512 | 32
Number of LSTM layers | 6 | 2
Initial learning rate | 0.1 | 0.0001
Dropout | 0.5 | 0.1
Sliding window size | 30 | 5
Batch size | 1024 | 256
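The search space in Table 5 can be explored with any derivative-free optimizer. The sketch below uses a plain random search as a simple stand-in for the swarm-type optimizers (e.g., DBO, PSO) compared in Figure 10, with a toy objective in place of the model's validation error.

```python
import random

# Search space from Table 5 as (min, max) pairs.
SPACE = {
    "tcn_layers": (1, 5),
    "kernel_size": (2, 5),
    "lstm_hidden": (32, 512),
    "lstm_layers": (2, 6),
    "learning_rate": (1e-4, 1e-1),
    "dropout": (0.1, 0.5),
    "window": (5, 30),
    "batch_size": (256, 1024),
}

def sample(rng):
    """Draw one configuration: integers for counts, floats for rates."""
    return {k: (rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi))
            for k, (lo, hi) in SPACE.items()}

def random_search(objective, n_trials=200, seed=0):
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = sample(rng)
        loss = objective(cfg)  # in practice: validation RMSE of a trained model
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Hypothetical stand-in objective whose minimum sits near Table 6's optimum.
toy = lambda c: abs(c["tcn_layers"] - 2) + abs(c["dropout"] - 0.3)
best, loss = random_search(toy, n_trials=200)
print(best["tcn_layers"], round(loss, 3))
```

Swapping `random_search` for DBO or PSO changes only how configurations are proposed; the objective evaluation (train, then score on validation data) stays the same.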
Table 6. Optimized parameter values.
Parameter | Optimum Value
TCN layers | 2
TCN convolution kernel size | 4
LSTM hidden layer | 512
Number of LSTM layers | 4
Initial learning rate | 0.001
Dropout | 0.3
Sliding window size | 8
Batch size | 256
Table 7. Performance of different models under various evaluation metrics.
Model | Well Section | RMSE | MAE | R2
BiLSTM | X1 | 16.9416 | 12.1842 | 0.8301
BiLSTM-Attention | X1 | 13.4011 | 9.5337 | 0.8937
TCN-BiLSTM | X1 | 13.6465 | 9.7094 | 0.8898
TCN-BiLSTM-Attention | X1 | 8.6151 | 6.1536 | 0.9561
BiLSTM | X2 | 16.2312 | 11.6502 | 0.7701
BiLSTM-Attention | X2 | 12.4140 | 8.8078 | 0.8655
TCN-BiLSTM | X2 | 12.0388 | 8.6520 | 0.8735
TCN-BiLSTM-Attention | X2 | 8.8668 | 6.4939 | 0.9314
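The RMSE, MAE, and R2 metrics reported in Table 7 follow their standard definitions, sketched below on made-up values (not the paper's data).

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean squared residual."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error of the residuals."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

# Toy check on made-up values.
y = np.array([10.0, 12.0, 14.0, 16.0])
p = np.array([11.0, 12.0, 13.0, 17.0])
print(rmse(y, p), mae(y, p), r2(y, p))  # ~0.866, 0.75, 0.85
```

Lower RMSE and MAE and higher R2 indicate better fit, which is the direction of improvement from BiLSTM to TCN-BiLSTM-Attention in Table 7.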
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

