Drilling Rate Prediction Based on Bayesian Optimization LSTM Algorithm with Fusion Feature Selection
Abstract
1. Introduction
2. BO-LSTM Neural Network Based on Fusion Feature Selection Algorithm
2.1. Fusion Feature Selection Algorithm
- (1) Pearson correlation analysis. Compute Pearson correlation coefficients on the cleaned data, classify all feature parameters into high-, medium-, and low-correlation groups, and retain the group that is highly correlated with the drilling rate.
- (2) Variance filtering. Apply variance filtering to the discrete-type feature parameters and retain those with high variance.
- (3) Mutual information method. Estimate the mutual information for the continuous-type feature parameters, rank them by the estimated value, and then refine the selection with a forward search strategy combined with model validation.
- (4) Feature fusion. Intersect the parameter group obtained through correlation filtering with the group obtained through variance filtering, and separately with the group obtained through mutual information filtering. The union of these 2 intersections is the final result of the fusion feature selection algorithm.
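The fusion rule in step (4) is set algebra: with P, V, and M denoting the groups kept by the Pearson, variance, and mutual information filters, the result is (P ∩ V) ∪ (P ∩ M). The following is a minimal sketch of that rule together with a Pearson-based grouping helper. The function names, the toy data, the 0.8 threshold, and the contents of the variance and mutual information groups are all assumptions made for illustration; none of them come from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def high_correlation_group(features, target, threshold):
    """Names of features whose |r| with the target reaches the threshold."""
    return {name for name, col in features.items()
            if abs(pearson_r(col, target)) >= threshold}

def fuse(pearson_group, variance_group, mi_group):
    """Step (4): (P ∩ V) ∪ (P ∩ M)."""
    p = set(pearson_group)
    return (p & set(variance_group)) | (p & set(mi_group))

# Hypothetical toy data: three candidate features and a target (drilling rate).
rop = [13.4, 13.0, 12.1, 11.2, 12.0, 10.4, 9.9]
features = {
    "A": [2 * v + 1 for v in rop],              # perfectly linear in the target
    "B": [5.0, 3.0, 6.0, 4.0, 5.5, 3.5, 6.5],   # only weakly related to the target
    "C": [-v for v in rop],                     # perfectly anti-correlated
}

p_group = high_correlation_group(features, rop, threshold=0.8)  # threshold is an assumption
v_group = {"A", "B"}   # assumed output of the variance filter
m_group = {"C"}        # assumed output of the mutual information filter

print(sorted(fuse(p_group, v_group, m_group)))  # -> ['A', 'C']
```

Because intersection distributes over union, the same result can be written as P ∩ (V ∪ M): a parameter survives only if it passes the correlation filter and at least one of the other two filters.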
2.2. LSTM Neural Network
2.3. Bayesian Optimization Algorithm
2.4. BO-LSTM-FS Neural Network
3. Drilling Rate Prediction Modeling Based on BO-LSTM-FS
3.1. Data Preprocessing
3.1.1. Data Filtering
3.1.2. Data Normalization
3.2. Fusion Feature Selection Analysis
3.2.1. Correlation Analysis
3.2.2. Variance Filtering
3.2.3. Mutual Information Analysis
3.3. Establishment of BO-LSTM-FS Neural Network
- (1) Preprocess the drilling dataset: split it into a training set and a test set, then normalize, flatten, and convert the data into the format required for modeling;
- (2) Establish the LSTM prediction model framework, define its hyperparameters, and set optimization intervals for the key ones (initial learning rate and number of hidden units);
- (3) Run Bayesian optimization with the LSTM model as the objective function and the model prediction error as the evaluation criterion. A probabilistic surrogate model and an acquisition function guide the search for the combination of initial learning rate and number of hidden units that minimizes this error;
- (4) Train the LSTM model with the optimized hyperparameter values, updating it iteratively until the preset maximum number of training epochs is reached;
- (5) Output the Bayesian-optimized hyperparameter configuration and the complete BO-LSTM-FS neural network model, then feed the test set into the trained model to generate the drilling rate prediction results.
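Step (3) can be illustrated with a small, dependency-free sketch of a Bayesian optimization loop: a Gaussian process acts as the probabilistic surrogate and expected improvement as the acquisition function. Several things here are assumptions, not the authors' implementation: the kernel and its length scale, the random candidate sampling, and above all `stand_in_error`, a synthetic bowl-shaped function used in place of actually training the LSTM and measuring its prediction error. The search intervals for learning rate and hidden units follow the hyperparameter table in this article.

```python
import math
import random

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two points in the unit square."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L z = b."""
    z = []
    for i, bi in enumerate(b):
        z.append((bi - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def backward_sub(L, z):
    """Solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(X, y, noise=1e-4):
    """Fit the surrogate: a zero-mean GP on mean-centred observations."""
    y_bar = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [v - y_bar for v in y]))
    return X, L, alpha, y_bar

def predict(gp, x):
    """Posterior mean and standard deviation of the surrogate at x."""
    X, L, alpha, y_bar = gp
    k = [rbf(a, x) for a in X]
    mean = y_bar + sum(ki * ai for ki, ai in zip(k, alpha))
    v = forward_sub(L, k)
    return mean, math.sqrt(max(1.0 - sum(vi * vi for vi in v), 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement acquisition function for minimisation."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def decode(u):
    """Map a unit-square point to (initial learning rate, hidden units)."""
    lr = 10 ** (-3.0 + 3.0 * u[0])        # log-uniform on [1e-3, 1]
    units = int(round(10 + 90 * u[1]))    # integer in [10, 100]
    return lr, units

def stand_in_error(lr, units):
    """Hypothetical objective; a real run would train the LSTM and return its error."""
    return (math.log10(lr) + 2.0) ** 2 + ((units - 60) / 50.0) ** 2

def bayes_opt(objective, n_init=5, n_iter=20, n_candidates=200, seed=0):
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n_init)]
    y = [objective(*decode(u)) for u in X]
    for _ in range(n_iter):
        gp, best = fit_gp(X, y), min(y)
        candidates = [(rng.random(), rng.random()) for _ in range(n_candidates)]
        u_next = max(candidates,
                     key=lambda u: expected_improvement(*predict(gp, u), best))
        X.append(u_next)
        y.append(objective(*decode(u_next)))
    i_best = min(range(len(y)), key=y.__getitem__)
    return decode(X[i_best]), y[i_best]

(best_lr, best_units), best_err = bayes_opt(stand_in_error)
print(f"lr={best_lr:.4g}, units={best_units}, error={best_err:.4g}")
```

The learning rate is searched on a logarithmic scale because its interval spans three orders of magnitude. In a real pipeline each call to the objective is a full training run, which is why a sample-efficient search of this kind is preferred over a dense grid.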
4. Case Study
4.1. Parameter Preprocessing
4.2. Fusion Feature Selection Method Processing
4.2.1. Drilling Parameters Processing
4.2.2. Variance Filtering Case Study
4.2.3. Mutual Information Case Study
4.3. Drilling Rate Prediction and Error Analysis
5. Conclusions
- (1) Using the Daye 1H1 to Daye 1H3 platforms in the Southwest Oil and Gas Field block as an example, the fusion feature selection algorithm selected 6 optimal feature parameters from the original 53 drilling feature parameters; these 6 parameters cover 92.3% of the original data features. This reduces the dimensionality of the dataset, shortens computation time, and improves dataset utilization and model training efficiency.
- (2) Compared with the traditional BP neural network, the LSTM neural network, and the CNN-LSTM model, the drilling rate prediction model based on the BO-LSTM-FS neural network reduces the mean absolute error by 48.0%, 29.3%, and 23.5%, the root mean square error by 45.5%, 38.5%, and 32.2%, and the mean absolute percentage error by 47.8%, 29.4%, and 22.6%, respectively, and improves the coefficient of determination by 8.6%, 4.4%, and 3.0%, respectively. These results indicate that the BO-LSTM-FS model has high prediction accuracy, fast convergence, and strong generalization ability.
- (3) The BO-LSTM-FS model supports sensitivity analysis of drilling parameters for similar geological blocks at the same well depth, which can guide the optimization of drilling parameters in those blocks. For instance, in the Dalong Formation (well depth 4000–4500 m) of a platform in the Eastern Sichuan Block, the model inversion results indicate that weight on bit (WOB) and rotary speed (RPM) have the greatest impact on the rate of penetration (ROP), with weights of 0.33 and 0.27, respectively, whereas mud weight (MW) and flow rate (Q) have less influence. Accordingly, targeted adjustments to these key parameters can be made during actual drilling operations, achieving an ROP improvement of approximately 7–15%. This provides reliable support for optimizing drilling parameters under similar geological conditions and demonstrates clear value for practical engineering applications.
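The four error metrics compared in conclusion (2) have standard definitions, and the percentages quoted there read most naturally as relative changes against each baseline model. The sketch below implements the metrics and the relative change. The measured values are the ROP column of the drilling data table in this article; both prediction series are hypothetical and serve only to exercise the functions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_change(baseline, value):
    """Percentage change of value relative to baseline (negative = reduction)."""
    return 100.0 * (value - baseline) / baseline

# Measured ROP values taken from the drilling data table in this article.
rop_measured = [13.44, 12.97, 12.05, 11.22, 11.98, 10.44, 9.85]
# Hypothetical predictions from a baseline model and an improved model.
baseline_pred = [12.6, 13.6, 11.2, 12.0, 11.2, 11.3, 10.6]
improved_pred = [13.1, 13.2, 11.8, 11.5, 11.7, 10.7, 10.1]

for name, metric in [("MAE", mae), ("RMSE", rmse), ("MAPE", mape), ("R2", r2)]:
    base = metric(rop_measured, baseline_pred)
    new = metric(rop_measured, improved_pred)
    print(f"{name}: {base:.3f} -> {new:.3f} ({relative_change(base, new):+.1f}%)")
```

For the three error metrics a negative relative change is an improvement, while for the coefficient of determination a positive change is.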
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Parameter | Range of Values | Before Optimization | After Optimization |
|---|---|---|---|
| InitialLearnRate | [10⁻³, 1] | 0.005 | 0.0062 |
| NumOfUnits | [10, 100] | 32 | 64 |
| BatchSize | [16, 256] | 32 | 64 |
| Epochs | [50, 500] | 200 | 200 |
| DropoutRate | [0.1, 0.5] | 0.2 | 0.3 |

| Depth | HL | ROP | WOB | Torque | RPM | SPP | PS | Q | MW |
|---|---|---|---|---|---|---|---|---|---|
| 4382 | 1237.65 | 13.44 | 130.72 | 11.89 | 79.07 | 34.24 | 148.04 | 32.11 | 1.8 |
| 4383 | 1238.72 | 12.97 | 130.08 | 11.76 | 78.72 | 34.19 | 148.03 | 32.11 | 1.8 |
| 4384 | 1240.62 | 12.05 | 128.88 | 11.64 | 78.33 | 34.12 | 148.03 | 32.11 | 1.8 |
| 4385 | 1242.27 | 11.22 | 128.1 | 11.52 | 77.94 | 34.06 | 148.03 | 32.1 | 1.8 |
| 4386 | 1240.7 | 11.98 | 130.49 | 11.42 | 77.61 | 34.1 | 148.03 | 32.1 | 1.8 |
| 4387 | 1243.7 | 10.44 | 128.53 | 11.34 | 77.31 | 34.01 | 148.03 | 32.1 | 1.8 |
| 4388 | 1244.84 | 9.85 | 128.29 | 11.28 | 77 | 33.98 | 148.03 | 32.09 | 1.8 |

| Parameter | Variance Value | Information Richness | Retention Status | Filtering Criterion |
|---|---|---|---|---|
| WOB | 0.789 | High | Retained | Variance > 0.3 |
| RPM | 0.721 | High | Retained | Variance > 0.3 |
| Torque | 0.654 | High | Retained | Variance > 0.3 |
| GR | 0.589 | Medium | Retained | Variance > 0.3 |
| HL | 0.556 | Medium | Retained | Variance > 0.3 |
| Q | 0.547 | Medium | Retained | Variance > 0.3 |
| CAL | 0.512 | Medium | Retained | Variance > 0.3 |
| SPP | 0.489 | Medium | Retained | Variance > 0.3 |
| MW | 0.312 | Low | Removed | Variance < 0.3 |
| FP | 0.296 | Low | Removed | Variance < 0.3 |

| Parameter | Mutual Information Value | Nonlinear Correlation | Correlation with Drilling Rate |
|---|---|---|---|
| WOB | 0.821 | Strong Correlation | Strong nonlinear correlation |
| RPM | 0.756 | Strong Correlation | Strong nonlinear correlation |
| Torque | 0.712 | Strong Correlation | Strong nonlinear correlation |
| GR | 0.654 | Strong Correlation | Strong nonlinear correlation |
| Q | 0.649 | Strong Correlation | Strong nonlinear correlation |
| DP | 0.612 | Strong Correlation | Strong nonlinear correlation |
| CAL | 0.589 | Medium Correlation | Nonlinear correlation |
| SPP | 0.543 | Medium Correlation | Nonlinear correlation |
| HL | 0.526 | Medium Correlation | Nonlinear correlation |
| MW | 0.512 | Medium Correlation | Nonlinear correlation |
| FP | 0.268 | Weak Correlation | Weak nonlinear correlation |
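
The mutual information step ranks parameters by their estimated value and then applies a forward search with model validation. A minimal sketch of that ranking and greedy search follows, using the mutual information values from the table above. The scoring function `placeholder_score` is purely hypothetical: it rewards mutual information and charges a fixed cost per feature so the example runs without a trained model. A real run would instead return a validation score from the prediction model, so the subset selected here is not a result of the study.

```python
# Mutual information values copied from the table above.
mi_values = {
    "WOB": 0.821, "RPM": 0.756, "Torque": 0.712, "GR": 0.654,
    "Q": 0.649, "DP": 0.612, "CAL": 0.589, "SPP": 0.543,
    "HL": 0.526, "MW": 0.512, "FP": 0.268,
}

def rank_by_mi(mi):
    """Parameter names sorted by mutual information, highest first."""
    return sorted(mi, key=mi.get, reverse=True)

def forward_search(ranked, score_fn, min_gain=0.0):
    """Greedy forward selection: keep a feature only if the score improves."""
    selected, best = [], float("-inf")
    for name in ranked:
        trial = selected + [name]
        score = score_fn(trial)
        if score > best + min_gain:
            selected, best = trial, score
    return selected, best

def placeholder_score(subset, cost_per_feature=0.7):
    """Hypothetical stand-in for a validation score (higher is better)."""
    return sum(mi_values[name] for name in subset) - cost_per_feature * len(subset)

ranked = rank_by_mi(mi_values)
selected, score = forward_search(ranked, placeholder_score)
print(ranked)
print(selected)
```

Visiting candidates in descending order of mutual information means the most informative parameters are tested first, and each later parameter has to justify itself against the subset already chosen.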
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Meng, Q.; Song, H.; Meng, D.; Liu, X.; Li, D.; Chen, X.; Wei, Y.; Zhang, C.; Wei, J.; Wu, Y.; et al. Drilling Rate Prediction Based on Bayesian Optimization LSTM Algorithm with Fusion Feature Selection. Processes 2026, 14, 274. https://doi.org/10.3390/pr14020274
