A Feature Engineering and XGBoost Framework for Prediction of TOC from Conventional Logs in the Dongying Depression, Bohai Bay Basin
Abstract
1. Introduction
2. Background
2.1. Passey Method
2.2. Extreme Gradient Boosting
3. Geological Settings


4. Methodology
4.1. Samples
4.2. Input Selection
- (1) Resistivity Logging (RT)
- (2) Density Logging (DEN)
- (3) Acoustic Logging (AC)
- (4) Neutron Logging (CNL)
- (5) Gamma Ray Logging (GR)
4.3. Model Development Workflow
4.3.1. Feature Engineering
4.3.2. Feature Selection
4.3.3. Hyperparameter Optimization
4.3.4. Model Evaluation Metrics
5. Results and Discussion
5.1. Hyperparameter Optimization Convergence
5.2. Feature Importance Analysis
5.3. Model Performance
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Jarvie, D.M.; Hill, R.J.; Ruble, T.E.; Pollastro, R.M. Unconventional shale-gas systems: The Mississippian Barnett Shale of north-central Texas as one model for thermogenic shale-gas generation. AAPG Bull. 2007, 91, 475–499.
- Liu, C.; Wang, Z.; Guo, Z.; Hong, W.; Dun, C.; Zhang, X.; Li, B.; Wu, L. Enrichment and distribution of shale oil in the Cretaceous Qingshankou Formation, Songliao Basin, Northeast China. Mar. Pet. Geol. 2017, 86, 751–770.
- Zou, C.; Dong, D.; Wang, S.; Li, J.; Li, X.; Wang, Y.; Li, D.; Cheng, K. Geological characteristics and resource potential of shale gas in China. Pet. Explor. Dev. 2010, 37, 641–653.
- Peters, K.E.; Cassa, M.R. Applied source rock geochemistry. In The Petroleum System—From Source to Trap; Magoon, L.B., Dow, W.G., Eds.; American Association of Petroleum Geologists: Tulsa, OK, USA, 1994; pp. 93–120.
- Gao, Z.; Bai, L.; Hu, Q.; Yang, Z.; Jiang, Z.; Wang, Z.; Xin, H.; Zhang, L.; Yang, A.; Jia, L.; et al. Shale oil migration across multiple scales: A review of characterization methods and different patterns. Earth Sci. Rev. 2024, 254, 104819.
- Rudra, A.; Wood, J.M.; Biersteker, V.; Sanei, H. Oil migration from internal and external source rocks in an unconventional hybrid petroleum system, Montney Formation, western Canada. Int. J. Coal Geol. 2024, 285, 104473.
- Passey, Q.R.; Creaney, S.; Kulla, J.B.; Moretti, F.J.; Strosity, J.D. A practical model for organic richness from porosity and resistivity logs. AAPG Bull. 1990, 74, 1777–1794.
- Alshakhs, M.; Rezaee, R. A new method to estimate total organic carbon (TOC) content, an example from Goldwyer Shale Formation, the Canning Basin. Open Pet. Eng. J. 2017, 10, 118–133.
- Elsaqqa, M.A.; El Din, M.Y.Z.; Afify, W. Unconventional shale gas sweet spot identification and characterization of the Middle Jurassic Upper Safa sediments, Amoun field, Shushan Basin, Western Desert, Egypt. J. Geol. Geophys. 2023, 12, 1103.
- Vergara, R.V. Well-log based TOC estimation using linear approximation methods. Geosci. Eng. 2020, 8, 116–130.
- Nyakilla, E.E.; Silingi, S.N.; Shen, C.; Jun, G.; Mulashani, A.K.; Chibura, P.E. Evaluation of source rock potentiality and prediction of total organic carbon using well log data and integrated methods of multivariate analysis, machine learning, and geochemical analysis. Nat. Resour. Res. 2022, 31, 619–641.
- Tissot, B.P.; Welte, D.H. Petroleum Formation and Occurrence, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 1984; 702p.
- Lai, J.; Zhao, F.; Xia, Z.; Su, Y.; Zhang, C.; Tian, Y.; Wang, G.; Qin, Z. Well log prediction of total organic carbon: A comprehensive review. Earth Sci. Rev. 2024, 258, 104913.
- Zhu, L.; Zhang, C.; Zhang, C.; Zhang, Z.; Zhou, X.; Liu, W.; Zhu, B. A new and reliable dual model- and data-driven TOC prediction concept: A TOC logging evaluation method using multiple overlapping methods integrated with semi-supervised deep learning. J. Pet. Sci. Eng. 2020, 188, 106944.
- McCarthy, K.; Rojas, K.; Niemann, M.; Palmowski, D.; Peters, K.; Stankiewicz, A. Basic petroleum geochemistry for source rock evaluation. Oilfield Rev. 2011, 23, 32–43.
- Schmoker, J.W. Determination of organic-matter content of Appalachian Devonian shales from gamma-ray logs. AAPG Bull. 1981, 65, 1285–1298.
- Cui, X.; Liu, J.; Sun, Z.; Wang, H. Rock mechanical properties of immature, organic-rich source rocks and their relationships to rock composition and lithofacies. Pet. Geosci. 2023, 29, petgeo2022-021.
- Hunt, J. Petroleum Geochemistry and Geology, 2nd ed.; W.H. Freeman: San Francisco, CA, USA, 1995; 743p.
- Du, J.; Zhang, X.; Zhong, G.; Feng, C.; Guo, L.; Zhang, X.; Luo, W. Analysis on the optimization and application of well logs indentification methods for organic carbon content in source rocks of the tight oil—Illustrated by the example of the source rocks of Chang 7 member of Yanchang Formation in Ordos Basin. Prog. Geophys. 2016, 31, 2526–2533.
- Zhao, P.; Ma, H.; Rasouli, V.; Liu, W.; Cai, J.; Huang, Z. An improved model for estimating the TOC in shale formations. Mar. Pet. Geol. 2017, 83, 174–183.
- Polat, C.; Eren, T. Modification of ΔlogR method and nonlinear regression application for total organic carbon content estimation from well logs. Hittite J. Sci. Eng. 2021, 8, 161–169.
- Feurer, M.; Hutter, F. Hyperparameter optimization. In Automated Machine Learning: Methods, Systems, Challenges; Hutter, F., Kotthoff, L., Vanschoren, J., Eds.; Springer: Cham, Switzerland, 2019; pp. 3–33.
- Syarif, I.; Prugel-Bennett, A.; Wills, G. SVM parameter optimization using grid search and genetic algorithm to improve classification performance. TELKOMNIKA 2016, 14, 1502–1509.
- Robnik-Šikonja, M. Improving random forests. In Proceedings of the 15th European Conference on Machine Learning on Machine Learning: ECML 2004, Pisa, Italy, 20–24 September 2004; pp. 359–370.
- Khan, S.; Liu, Z.; Lu, Z.; Hussain, W.; Ahmed, S.; Muhammad, M.; Umar, M.U. Comparative analysis of machine learning and empirical approaches for total organic carbon prediction in the J1d formation, Sichuan Basin, China. Phys. Fluids 2025, 37, 086644.
- Mahmoud, A.A.A.; Elkatatny, S.; Mahmoud, M.; Abouelresh, M.; Abdulraheem, A.; Ali, A. Determination of the total organic carbon (TOC) based on conventional well logs using artificial neural network. Int. J. Coal Geol. 2017, 179, 72–80.
- He, Y.; Zhang, Z.; Wang, X.; Zhao, Z.; Qiao, W. Estimating the total organic carbon in complex lithology from well logs based on convolutional neural networks. Front. Earth Sci. 2022, 10, 871561.
- Goliatt, L.; Saporetti, C.M.; Pereira, E. Super learner approach to predict total organic carbon using stacking machine learning models based on well logs. Fuel 2023, 353, 128682.
- Liu, Y.; Li, N.; Li, C.; Jiang, J.; Wu, X.; Liang, H.; Zhang, D.; Hu, X. Prediction of total organic carbon content in deep marine shale reservoirs based on a super hybrid machine learning model. Energy Fuels 2024, 38, 17483–17498.
- Barham, A.; Ismail, M.S.; Hermana, M.; Padmanabhan, E.; Baashar, Y.; Sabir, O. Predicting the maturity and organic richness using artificial neural networks (ANNs): A case study of Montney Formation, NE British Columbia, Canada. Alexandria Eng. J. 2021, 60, 3253–3264.
- Wu, Q.; Pang, H.; Zhang, B.; Jiang, F.; Wu, L.; Chen, J.; Ma, K.; Huo, X. Application of shale TOC prediction model using the XGBoost machine learning algorithm: A case study of the Qiongzhusi Formation in central Sichuan Basin. Carbonates Evaporites 2025, 40, 8.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4768–4777. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf (accessed on 27 December 2025).
- Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
- Cheng, L.; Yang, Z.; Costa, F. Insights on source lithology and pressure-temperature conditions of basalt generation using machine learning. Earth Space Sci. 2024, 11, e2024EA003732.
- Shuvo, M.A.I.; Hossain Joy, S.M. A data driven approach to assess the petrophysical parametric sensitivity for lithology identification based on ensemble learning. J. Appl. Geophys. 2024, 222, 105330.
- Abe, J.; Adekanye, D. Advancing reservoir characterization: A comparative analysis of XG boost and ANN for accurate porosity prediction. J. Data Anal. 2024, 3, 47–60.
- Putatunda, S.; Rama, K. A comparative analysis of hyperopt as against other approaches for hyper-parameter optimization of XGBoost. In Proceedings of the 2018 International Conference on Signal Processing and Machine Learning, Shanghai, China, 28–30 November 2018; pp. 6–10.
- Verma, V. Exploring key XGBoost hyperparameters: A study on optimal search spaces and practical recommendations for regression and classification. Int. J. All Res. Educ. Sci. Methods 2024, 12, 3259–3266.
- Zhang, P.; Jia, Y.; Shang, Y. Research and application of XGBoost in imbalanced data. Int. J. Distrib. Sens. Netw. 2022, 18, 15501329221106935.
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
- Watanabe, S. Tree-structured Parzen estimator: Understanding its algorithm components and their roles for better empirical performance. arXiv 2023.
- Liu, X.; Tian, Z.; Chen, C. Total organic carbon content prediction in lacustrine shale using extreme gradient boosting machine learning based on Bayesian optimization. Geofluids 2021, 2021, 6155663.
- Yang, K.; Liu, L.; Wen, Y. The impact of Bayesian optimization on feature selection. Sci. Rep. 2024, 14, 3948.
- Wang, X.; Ma, J.; Zhang, X.; Wang, Z.; Wang, F.; Wang, H.; Li, L. Prediction of total organic carbon content by a generalized ΔlogR method considering density factors: Illustrated by the example of deep continental source rocks in the southwestern part of the Bozhong sag. Prog. Geophys. 2020, 35, 1471–1480.
- Wu, S.; Yu, Z.; Zhang, R.; Han, W.; Zou, D. Mesozoic–Cenozoic tectonic evolution of the Zhuanghai area, Bohai-Bay Basin, east China: The application of balanced cross-sections. J. Geophys. Eng. 2005, 2, 158–168.
- Qi, J.; Yang, Q. Cenozoic structural deformation and dynamic processes of the Bohai Bay basin province, China. Mar. Pet. Geol. 2010, 27, 757–771.
- Zhang, L.; Liu, Q.; Zhu, R.; Li, Z.; Lu, X. Source rocks in Mesozoic–Cenozoic continental rift basins, east China: A case from Dongying Depression, Bohai Bay Basin. Org. Geochem. 2009, 40, 229–242.
- Allen, M.B.; Macdonald, D.I.M.; Zhao, X.; Vincent, S.J.; Brouet-Menzies, C. Early Cenozoic two-phase extension and late Cenozoic thermal subsidence and inversion of the Bohai Basin, northern China. Mar. Pet. Geol. 1997, 14, 951–972.
- Zou, Y.; Sun, J.; Li, Z.; Xu, X.; Li, M.; Peng, P. Evaluating shale oil in the Dongying Depression, Bohai Bay Basin, China, using the oversaturation zone method. J. Pet. Sci. Eng. 2018, 161, 291–301.
- Zhang, S.; Liu, H.; Wang, M.; Liu, X.; Liu, H.; Bao, Y.; Wang, W.; Li, R.; Luo, X.; Fang, Z. Shale pore characteristics of Shahejie Formation: Implication for pore evolution of shale oil reservoirs in Dongying sag, North China. Pet. Res. 2019, 4, 113–124.
- Zhang, L.; Bao, Y.; Li, J.; Li, Z.; Zhu, R.; Zhang, J. Movability of lacustrine shale oil: A case study of Dongying Sag, Jiyang Depression, Bohai Bay Basin. Pet. Explor. Dev. 2014, 41, 703–711.
- Guo, X.; Shi, Z.; Wang, Z.; Zhao, X.; Lu, J.; Zhang, Y. Geochemistry and mineralogy of quaternary sediments in the northern Bohai Bay Basin, North China: Implications for provenance and climate change. Can. J. Earth Sci. 2019, 57, 396–406.
- Hu, Q.; Zhang, Y.; Meng, X.; Zheng, L.; Xie, Z.; Li, M. Characterization of micro-nano pore networks in shale oil reservoirs of Paleogene Shahejie Formation in Dongying Sag of Bohai Bay Basin, East China. Pet. Explor. Dev. 2017, 44, 720–730.
- Zeng, X.; Cai, J.; Dong, Z.; Bian, L.; Li, Y. Relationship between mineral and organic matter in shales: The case of Shahejie Formation, Dongying Sag, China. Minerals 2018, 8, 222.
- Yang, Y.; Khan, D.; Qiu, L.; Du, Y.; Long, J.; Li, W.; Zafar, T.; Ali, F.; Shaikh, A. Microscopic reservoir characteristics of the lacustrine calcareous shale: An example from the Es4s shale of the Paleogene Shahejie Formation in Boxing Sag, Dongying Depression. ACS Omega 2022, 7, 36748–36761.
- Fang, X.; Ma, C.; Qin, F.; An, T.; Liu, R.; Song, H.; Zhang, C.; Wang, T.; Gao, B.; Hao, P. The control of astronomical cycles on lacustrine mixed sedimentation and hydrocarbon occurrence: A case study of the Paleogene Shahejie Formation in the Dongying Sag, Bohai Bay Basin. Pet. Sci. 2025; in press.
- Li, S.; Pang, X.; Li, M.; Jin, Z. Geochemistry of petroleum systems in the Niuzhuang South Slope of Bohai Bay Basin—Part 1: Source rock characterization. Org. Geochem. 2003, 34, 389–412.
- Liu, Q.; Zeng, X.; Wang, X.; Cai, J. Lithofacies of mudstone and shale deposits of the Es3z-Es4s formation in Dongying sag and their depositional environment. Mar. Geol. Quat. Geol. 2017, 37, 147–156.
- Yang, H.; Liu, C.; Wang, F.; Tang, G.; Li, G.; Zeng, X.; Wu, Y. Geochemical characteristics and environmental implications of source rocks of the Dongying Formation in southwest subsag of Bozhong Sag. Bull. Geol. Sci. Technol. 2023, 42, 339–349.
- Li, C.; Wu, Y.; Ding, X.; Xie, X.; Luo, T.; Zhang, J.; Sun, Y.; Xia, C. Formation of carbonate laminae in shale and their impact on organic matter in Dongying depression. Sci. Rep. 2025, 15, 22093.
- Chen, Z.; Jiang, W.; Zhang, L.; Zha, M. Organic matter, mineral composition, pore size, and gas sorption capacity of lacustrine mudstones: Implications for the shale oil and gas exploration in the Dongying depression, eastern China. AAPG Bull. 2018, 102, 1565–1600.
- Wang, H.; Wu, W.; Chen, T.; Dong, X.; Wang, G. An improved neural network for TOC, S1 and S2 estimation based on conventional well logs. J. Pet. Sci. Eng. 2019, 176, 664–678.
- Verma, S.; Zhao, T.; Marfurt, K.J.; Devegowda, D. Estimation of total organic carbon and brittleness volume. Interpretation 2016, 4, T373–T385.
- Zheng, W.; Tian, F.; Di, Q.; Zhang, J.; Zhou, H.; Zhang, W.; Wang, Z. A “data-feature-policy” solution for multiscale geological–geophysical intelligent reservoir characterization. Second Int. Meet. Appl. Geosci. Energy 2022, 41, 3272–3276.
- Peng, Z.; Cao, D.; Xu, H.; Zhu, D.; Wang, P. Multi-scale information fusion of well-logging data-based deep learning 3D modeling method. J. Geophys. Eng. 2025, 22, 1671–1686.
- Macêdo, B.S.; Wayo, D.D.K.; Campos, D.; Santis, R.B.; Martinho, A.D.; Yaseen, Z.M.; Saporetti, C.M.; Goliatt, L. Data-driven total organic carbon prediction using feature selection methods incorporated in an automated machine learning framework. Sci. Rep. 2025, 15, 10658.
- Saeys, Y.; Inza, I.; Larrañaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007, 23, 2507–2517.
- Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623.
- Botchkarev, A. Performance metrics (error measures) in machine learning regression, forecasting and prognostics: Properties and typology. Interdiscip. J. Inf. Knowl. Manag. 2019, 14, 45–76.
- Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82.












| Method | Test R² | NRMSE | |
|---|---|---|---|
| Passey Method | 0.4136 | 0.1881 | 0.634 |
| RF | 0.7075 | 0.1329 | 0.298 |
| CNN (Wang et al. [63]) | 0.8283 | 0.1010 | 0.176 |
| XGBoost (Baseline) | 0.7889 | 0.1129 | 0.240 |
| XGBoost (Optimized) | 0.9395 | 0.0604 | 0.109 |
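
The R² and NRMSE columns in the table above can be computed from paired measured and predicted TOC values. The following is a minimal sketch, not the authors' code: the function names and the sample values are hypothetical, and NRMSE is assumed here to be the RMSE divided by the range of the measured values, a common convention that this page does not confirm.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE normalized by the range of the measured values.

    ASSUMPTION: range normalization. Other conventions divide by the
    mean or the standard deviation of the measured values instead.
    """
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))


# Hypothetical TOC values (wt%), for illustration only.
measured = [0.0, 2.0, 4.0]
predicted = [1.0, 2.0, 3.0]

print(f"R2    = {r2_score(measured, predicted):.4f}")
print(f"NRMSE = {nrmse(measured, predicted):.4f}")
```

Because both metrics are dimensionless, they allow the five methods in the table to be compared on a common scale; the choice of normalization changes the NRMSE values but not the ranking of models evaluated on the same test set.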

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhao, Z.; Zhong, G.; Diao, F.; Ding, P.; He, J. A Feature Engineering and XGBoost Framework for Prediction of TOC from Conventional Logs in the Dongying Depression, Bohai Bay Basin. Geosciences 2026, 16, 44. https://doi.org/10.3390/geosciences16010044

