1. Introduction
In oil and gas exploration, lithology prediction is crucial for assessing investment potential [1, 2]. Forecasting the rock type in the target area assists in constructing the underground trap structure and predicting oil and gas production; larger reservoirs can produce a higher return on investment. Traditional methods used to predict reservoir lithology rely on geology and petrophysics theories, as well as field geological observations and well logging data analysis [3, 4]. While these methods have achieved some success, they often require manual interpretation and rely on subjective evaluations, and they are limited by factors such as data quality and geological conditions [5, 6, 7]. Advances in data processing capabilities have led to the emergence of machine learning and artificial intelligence methods, such as deep learning models, that aim to improve the accuracy and efficiency of lithology prediction. These methods can be integrated with traditional methods to create predictive models that combine data from multiple sources, which can better support decision making [8, 9, 10].
A “blind well” refers to a well where no prior drilling samples or geological information have been obtained. It is typically used to test the predictive capabilities of models of rock properties, stratigraphic features, or reservoir characteristics. Lithology prediction in blind wells uses deep learning to analyze and model well logging data, which enables the prediction of rock types even in the absence of core data and enhances the accuracy and efficiency of the prediction. While mud logging provides valuable lithology information through cuttings analysis, lithology must sometimes be inferred from logging records because core samples are unavailable or were not obtained from some sections of the well. In these cases, logging data become the primary source for lithologic interpretation. Moreover, logging data offer continuous measurements across the entire wellbore, enabling the thorough analysis of lithologic variations. Common types of logging data include gamma ray, resistivity, sonic, density, neutron, porosity, and permeability data, which are typically presented as curves on logging charts corresponding to well depth [11]. Integrating these logging curves allows for a more comprehensive lithologic characterization, capturing subtle changes that may elude detection through cuttings analysis alone. The analysis and interpretation of these curves can yield detailed insights into formation and reservoir characteristics, aiding in well suitability assessments, the determination of production capacity, the evaluation of hydrocarbon reservoir potential, and informed decisions regarding well completion or abandonment.
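The blind-well evaluation described above can be illustrated with a minimal sketch, assuming each log sample carries a well identifier. All samples from the held-out well are kept out of training, so the model is scored on a well it has never seen. The field names (`well`, `depth`, `gr`, `rhob`, `lith`), the well names, and the values are hypothetical and are not taken from the study's dataset.

```python
def blind_well_split(samples, blind_wells):
    """Hold out every sample that belongs to a blind well.

    samples: list of dicts, one per depth step of a logged well.
    blind_wells: set of well names reserved for testing.
    """
    train, test = [], []
    for sample in samples:
        if sample["well"] in blind_wells:
            test.append(sample)
        else:
            train.append(sample)
    return train, test


# Hypothetical samples: gamma ray (API units) and bulk density (g/cm3) per depth.
samples = [
    {"well": "W1", "depth": 1500.0, "gr": 45.0, "rhob": 2.65, "lith": "sandstone"},
    {"well": "W1", "depth": 1500.5, "gr": 110.0, "rhob": 2.50, "lith": "mudstone"},
    {"well": "W2", "depth": 1620.0, "gr": 50.0, "rhob": 2.63, "lith": "sandstone"},
    {"well": "W2", "depth": 1620.5, "gr": 105.0, "rhob": 2.48, "lith": "mudstone"},
    {"well": "W3", "depth": 1580.0, "gr": 48.0, "rhob": 2.64, "lith": None},
    {"well": "W3", "depth": 1580.5, "gr": 112.0, "rhob": 2.51, "lith": None},
]

train, test = blind_well_split(samples, blind_wells={"W3"})
```

Splitting by well rather than by random depth samples avoids leaking neighboring depth points from the same borehole into the training set, which would overstate generalization.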
Traditional logging interpretation methods rely on geology and physics for the interpretation and analysis of logging data. Many such methods exist, for example, manually viewing and analyzing the morphology, trends, and interrelationships of logging curves to infer subsurface geologic features. The resistivity, natural gamma radiation, and sonic velocity are used to understand the type of rock, hydrocarbon properties, and reservoir characteristics. Furthermore, the rock type, stratigraphic relationship, and reservoir characteristics can be deduced by observing how different logging curves change in the vertical direction, drawing the profiles of these logging curves, and combining the results with one’s knowledge of stratigraphy. Traditional logging interpretation methods play an important role in the exploration and development stages, and with the development of machine learning and automation technology, these methods are gradually being combined with computer-aided interpretation to improve efficiency and accuracy.
Lithology prediction methods based on logging data analysis have been an increasingly popular research topic in the oil and gas exploration field in recent years [12, 13]. Researchers have aimed to improve the accuracy of predicting the rock type or lithology in blind wells without core data by combining artificial intelligence techniques with information from logging data. Compared to labor-intensive manual logging interpretation, machine learning methods can automatically process large amounts of logging data, which significantly reduces human resource and time costs. They can also extract valuable features from logging data and identify hidden patterns and correlations, which speeds up the exploration process and improves the prediction accuracy. Based on the prediction results of machine learning models, decision makers can make more informed decisions that reduce exploration risks, increase the success rate, and avoid unnecessary trial and error and wasted resources [4, 14, 15, 16]. These methods focus on the identification of optimal features using unsupervised and supervised machine learning algorithms, as well as the application of automated logging data to achieve reliable lithology prediction and subsequent reservoir characterization. With the continuous updating of deep learning algorithms and the improvement of computing power, more deep learning methods are being used for reservoir lithology prediction, including convolutional neural networks (CNNs), recurrent neural networks, and long short-term memory (LSTM) networks. In one approach, a CNN extracts complex features from logging data, and an LSTM extracts vertical spatial relationships from the CNN's output features, so that the mapping relationship between logging data and lithology type can be established. This model helps in the recognition of the lithology of complex formations [17]. A semi-supervised deep learning framework has been used with a closed-loop CNN and virtual logging labels. Closed-loop CNNs have predictive and generative sub-networks, and this model can be trained directly using seismic attribute data [18]. In another approach, the spatial and temporal features of the logging data are extracted using a combination of a CNN and LSTM neural networks, with a particle swarm optimization algorithm used to determine the optimal hyperparameters for predicting log profiles [19].
Conventional deep learning models face several obstacles in this setting. First, log data are often sparse, and the sample distribution is imbalanced. Second, log data can be affected by noise, outliers, and other quality issues, so the data must be preprocessed and cleaned to enhance data quality and model robustness. Third, deep learning models usually function as black boxes, making it challenging to determine how they arrive at predictions and decisions, even though model interpretability is crucial in well logging. Finally, because reservoir composition is complex, deep learning models may perform well on training sets while their ability to generalize to new data remains limited.
To address the above problems, we propose a Transformer- and LSTM-based hybrid approach to identify and classify lithology in blind wells. By combining the multi-head attention mechanism of the Transformer model with the ability of the LSTM network to capture the sequential features of the lithology sequence, the approach can effectively learn the nonlinear relationships between logging curve data and their correlation in the depth dimension. In our experiments, the Transformer–LSTM (T-LS) model outperformed several commonly used models in lithology prediction. To verify the model’s generalization ability, we used the T-LS model and a random forest (RF) model to predict the lithology of blind wells without core data in the stratum; the T-LS model showed the better generalization ability. The main contributions of this study are as follows:
A T-LS model is proposed to identify and classify lithology in blind wells. Combining the advantages of the Transformer model and LSTM network, the model effectively learns the nonlinear relationships between logging curve data and their correlation in the depth dimension;
A nested ResTCN is deployed to address missing data, which can efficiently complete missing content, thereby ensuring the completeness and accuracy of training data;
Comparative experiments demonstrate the advantages of the T-LS model in terms of several evaluation metrics. The results of neighboring blind well experiments further validate the model’s generalization ability.
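To make the attention idea in the first contribution concrete, the following is a minimal sketch of single-head scaled dot-product self-attention over a short depth sequence of log readings. It is not the T-LS model: the learned query, key, and value projections of a real Transformer are replaced by identity mappings (an assumption made to keep the example short), there is only one head, and the input values are hypothetical normalized readings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention.

    seq: list of feature vectors, one per depth step.
    Assumption: queries, keys, and values are the inputs themselves
    (identity projections), unlike a trained Transformer layer.
    """
    dim = len(seq[0])
    outputs, weights = [], []
    for query in seq:
        # Similarity of this depth step to every depth step in the window.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in seq
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of all depth steps.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, seq)) for i in range(dim)]
        )
    return outputs, weights


# Hypothetical normalized readings (two curves) at four consecutive depths.
seq = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
out, weights = self_attention(seq)
```

Each output row mixes information from every depth step in the window, weighted by similarity; in a hybrid design such as the one proposed, a recurrent layer would then process the resulting sequence in depth order.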
In this study, the T-LS hybrid model is proposed for the identification and classification of lithologies in blind wells. To address the quality of the raw logging data, we employed a Savitzky–Golay filter to remove anomalous data during preprocessing and balanced the data samples using a genetic algorithm-based sample interpolation method. We also employed a nested residual convolutional network (ResTCN) to fill in the missing signals of some logging data, yielding more complete and accurate training data. These methods effectively addressed the data quality problem and improved the training of the model. In model comparison experiments, the T-LS model outperformed other commonly used models in lithology prediction. To verify the generalization ability of the model, we also predicted the lithology of blind wells with no core data in the formation and compared the predictions with those of the RF model. The results show that the T-LS model generalizes well and performs well on unseen data. Finally, a Shapley analysis of the T-LS model indicated that, for the multi-class lithology classification and prediction task, the more information the logging data contain and the more features they have, the better the accuracy of the classification and prediction. Fusing multiple sources of logging information can lead to a better understanding of the properties of subsurface rocks and fluids, an improved assessment of the production capacity and recoverable reserves of oil and gas reservoirs, better-optimized drilling and production decisions, and reduced exploration and development risks.
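As an illustration of the Savitzky–Golay step mentioned above, the sketch below smooths a gamma-ray curve that contains one spike. It uses the standard 5-point quadratic convolution coefficients (-3, 12, 17, 12, -3)/35; the window length, polynomial order, edge handling, and sample values are assumptions for illustration, not the settings used in this study.

```python
# Standard Savitzky-Golay coefficients for a 5-point window, quadratic fit.
SG5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def savgol5(values):
    """Smooth a curve with a 5-point quadratic Savitzky-Golay filter.

    The two points at each end are left unchanged, a simplification
    chosen here; library implementations offer other edge modes.
    """
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2 : i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5, window))
    return smoothed


# Hypothetical gamma-ray readings (API units) with an outlier at index 3.
gr = [60.0, 62.0, 61.0, 120.0, 63.0, 62.0, 64.0]
gr_smooth = savgol5(gr)

# A purely quadratic trend passes through the filter unchanged.
quadratic = [float(x * x) for x in range(7)]
quadratic_smooth = savgol5(quadratic)
```

Because the filter fits a local polynomial, it damps isolated spikes while preserving smooth trends in the curve. In practice, `scipy.signal.savgol_filter(x, window_length, polyorder)` provides the same filter with configurable window and order.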
The rest of this paper is organized as follows.
Section 2 focuses on related methods and techniques, including the ResTCN model, the proposed T-LS model, and Savitzky–Golay filtering.
Section 3 describes the case studies, including data description and processing, and parameter settings, and discusses the experimental results.
Section 4 presents our conclusions and discusses future work.
4. Conclusions
The prediction of lithology in oil and gas exploration target areas is of paramount importance for the development of oil and gas resources. To address the inherent subjectivity and limitations of traditional logging methods, we proposed a deep learning-based approach to lithology prediction in blind wells. Raw logging data often present practical challenges such as noise-induced outliers, variations in raw data from different logging instruments, an uneven distribution of logging data samples, and partially missing data. First, the Savitzky–Golay filtering method was employed to mitigate outliers in logging curves. A comparative analysis with the commonly used SVR and RFR methods revealed the superior performance of the ResTCN in terms of MAE and , demonstrating its efficacy in completing missing data. Eight commonly used models for lithology classification prediction, including LR, KNN, DT, and RF, were selected for comparative experiments. Comprehensive evaluation metrics such as the precision, recall, and F1-score were employed to assess the model performance. Among the nine models evaluated, T-LS and RF achieved the best results. Notably, in the prediction task encompassing nine rock types, T-LS outperformed RF for seven types and achieved slightly lower ratings for two. The overall rating of the T-LS model reached 0.88, surpassing that of the RF model, which reached 0.74.
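The per-class comparison above rests on precision, recall, and their harmonic mean. The following minimal sketch computes these three values for each class, assuming the score referred to in the text is the F1-score. The labels and predictions are hypothetical and do not reproduce the study's results.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label present."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for classes never predicted or seen.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical lithology labels for six depth samples.
y_true = ["sand", "sand", "shale", "shale", "lime", "sand"]
y_pred = ["sand", "shale", "shale", "shale", "lime", "sand"]
scores = per_class_scores(y_true, y_pred)
```

Reporting the scores per class, as in the comparison of T-LS and RF, shows which rock types a model confuses, which a single overall accuracy can hide when the class distribution is imbalanced.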
The comparative experimental results underscore the superior classification performance of the proposed model compared to traditional methods. The lithology of adjacent unlabeled blind wells was predicted using the T-LS and RF models, and the new model showed the better generalization ability, thereby partially mitigating the inaccurate prediction and weak generalization encountered by traditional machine learning and deep learning models in lithology recognition. Our future work will focus on improving the T-LS model by incorporating additional features and refining its architecture to further enhance the lithology prediction accuracy. Additionally, integrating advanced data augmentation techniques and domain-specific knowledge could enhance the model’s robustness and generalization ability across diverse geological formations.