1. Introduction
Gas hydrates are ice-like crystalline substances with a rigid cage structure that form when gas and water molecules combine under low-temperature, high-pressure conditions. They are found mainly in seabed sediments and in inland permafrost [1]. The exploration of gas hydrates in China commenced in 1999 [2]. In November 2008, gas hydrate samples were first recovered from depths of 133.5 to 135.5 m in the DK–1 borehole in the Muli permafrost zones of the Qilian Mountains (PZQM) on the Tibetan Plateau, making China the first nation to discover gas hydrates in a mid–low-latitude permafrost zone, a finding of great scientific and economic importance [3,4,5]. Subsurface stratigraphy and lithology are pivotal for reservoir characterization and resource assessment [6,7]. They control rock physical properties such as porosity and permeability, which in turn affect gas hydrate saturation [8,9,10]. Lithology classification is a crucial part of hydrate exploration and development because it informs the classification of hydrate reservoir types. Reservoir delineation is a key element of fine hydrate description [11] and is essential for the accurate quantitative evaluation of gas hydrate reserves. Reliable lithology identification therefore improves the accuracy of reservoir physical property prediction and reduces uncertainty during the exploration, development, and stabilization of gas hydrate deposits [12,13]. Conventional identification techniques, including cross-plots [14], statistical methods [15,16], and imaging logging [17], require the interpreter to classify lithologies manually, a process that is both laborious and time-consuming. Compared with direct methods such as coring, identifying formation lithology from logging data is fast and inexpensive. Prior research has shown that geophysical logging curves respond to different lithologies, enabling the effective identification and delineation of formation lithologies [18,19].
Over the past few decades, many researchers have studied lithological classification based on logging data and have proposed several conventional methods, including cross-plots, statistical methods, and imaging logging. However, these methods have several shortcomings: lithologies can be difficult to distinguish, accuracy and efficiency are low, and the results are highly susceptible to human factors. Furthermore, the high cost of imaging logging limits its use in many practical applications.
In recent years, machine learning (ML) techniques have made it possible to process geophysical datasets in novel ways [20]. Researchers have begun using ML to explore the relationship between logging data and rock types and to develop methods for predicting rock types. Supervised ML algorithms, including support vector machine (SVM) [21], decision tree (DT) [22], multi-layer perceptron (MLP) [23], and random forest (RF) [24], have been applied effectively. Unsupervised ML algorithms, such as clustering [25,26] and principal component analysis [27], have also been used. As data volumes and computational capabilities have grown, deep neural network algorithms have been developed and applied to lithology identification [28,29,30,31]. Together, these ML and deep neural network algorithms have brought lithology identification to the stage of automatic identification.
For example, Delavar [32] combined a multiple-kernel-function SVM with three nature-inspired heuristic optimizers, namely particle swarm optimization (PSO), the grasshopper optimization algorithm (GOA), and the grey wolf optimizer (GWO), to classify carbonate reservoir fractures in the Asmari reservoir in the Middle East. A comparison with alternative ML methods showed that the hybrid SVM(RBF)-GWO offered superior accuracy. Bressan et al. [33] classified lithology using four ML algorithms, namely MLP, DT, RF, and SVM, on multivariate logging data from offshore wells of the International Ocean Discovery Program (IODP), achieving good classification results. Zhao et al. [34] proposed a classification enhancement semi-supervised generative adversarial network (CE-SGAN), applied to lithology identification for the first time, which employs a classification separation architecture and a pseudo-label processing mechanism to reduce the impact of data imbalance. Their results show significant improvements in lithology identification for small, imbalanced datasets, along with good generalization performance and competitive advantages in data augmentation. Alzubaidi et al. [6] introduced a convolutional neural network (CNN) model based on the ResNeXt-50 architecture for automatic lithology prediction from core tray images, which outperformed CNN models based on the ResNet-18 and Inception-v3 architectures in prediction accuracy.
In summary, many researchers have applied lithological classification in the exploration of conventional oil, gas, and mineral resources. Nevertheless, few studies have examined lithological classification in gas hydrate boreholes, and the existing classification methods remain imperfect. In the PZQM, logging data contain abundant lithological information. However, faults are well developed in the permafrost zones, strata are missing or duplicated, and rock types are varied [35,36]. Moreover, gas hydrate-rich reservoirs comprise lithologies such as fine sandstone, oil shale, siltstone, and mudstone [5,37], which are difficult to distinguish from one another. Accordingly, accurate delineation of lithology in gas hydrate boreholes in this area would facilitate subsequent exploration and development of hydrate resources.
In this study, we integrated the logging data from hydrate boreholes in the PZQM with lithology data and selected the logging curves that respond most strongly to lithology. We then applied four ML algorithms, namely RF, DT, MLP, and logistic regression (LR), to the lithology classification of gas hydrate boreholes in the study area. This provides a new approach to lithology classification in permafrost zones and lays a foundation for the future identification and exploitation of gas hydrates there.
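As a minimal sketch of what integrating logging data with lithology could look like in practice, the Python snippet below attaches a lithology label to each depth-indexed log sample by looking up the described interval that contains it. The interval depths, the sample values, and the names `INTERVALS` and `label_samples` are hypothetical and are not taken from the study; the curve names are used only as example keys.

```python
from bisect import bisect_right

# Hypothetical described intervals: (top_m, bottom_m, lithology).
# Depths and labels are invented for illustration only.
INTERVALS = [
    (100.0, 102.0, "mudstone"),
    (102.0, 105.0, "siltstone"),
    (105.0, 106.0, "oil shale"),
]


def label_samples(samples, intervals):
    """Attach a lithology label to each log sample by depth lookup.

    samples:   list of dicts with a "depth" key plus log-curve values.
    intervals: list of non-overlapping (top, bottom, lithology) tuples;
               each interval covers top <= depth < bottom.
    Samples that fall outside every interval are dropped, because no
    label is available for them.
    """
    ordered = sorted(intervals)
    tops = [top for top, _, _ in ordered]
    labelled = []
    for sample in samples:
        # Index of the last interval whose top is <= the sample depth.
        i = bisect_right(tops, sample["depth"]) - 1
        if i < 0:
            continue  # shallower than the first interval
        top, bottom, lithology = ordered[i]
        if top <= sample["depth"] < bottom:
            labelled.append({**sample, "lithology": lithology})
    return labelled


if __name__ == "__main__":
    logs = [
        {"depth": 100.5, "GR": 80.0, "DEN": 2.45},
        {"depth": 103.0, "GR": 60.0, "DEN": 2.55},
    ]
    for row in label_samples(logs, INTERVALS):
        print(row)
```

The labelled rows can then serve as training samples, with the log-curve values as features and the lithology as the target class.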
The rest of the paper is organized as follows. Section 2 describes the geological background of the study. Section 3 presents an overview of the fundamental principles of the selected algorithms and the metrics used for model evaluation. Section 4 presents the results of the lithological classification and the evaluation of the model. Then, a discussion of the results is given in Section 5, followed by conclusions in Section 6.
2. Geological Background
In this study, the target zone is situated in the PZQM in Western China, within the Juhugeng mining area of the Muli Coalfield, Tianjun County, Qinghai Province (Figure 1). The PZQM lies in the northern part of the Tibetan Plateau. The topography shows a general elevation gradient from west to east and from south to north, with altitudes of 4100–4300 m. Permafrost is present year-round, with a thickness of 60 to 120 m. The area covers approximately 100,000 km², and the mean annual air temperature is −5.1 °C [38]. The Qilian Mountains comprise three tectonic units: the Northern Qilian Tectonic Belt, the Central Qilian Tectonic Belt, and the Southern Qilian Tectonic Belt [35,39]. These tectonic units are separated by four fractures, including the northern margin of the North Qilian–Central Qilian Fracture, the southern margin of the Central Qilian Fracture, and the Tuergen Daban Mountain–Zongwunong Mountain–Qinghai Lake Fracture.
Jurassic coal-bearing strata are developed throughout the permafrost region, with the Jiangcang Formation in the upper part and the Muli Formation in the lower part. From bottom to top, the sedimentary environment transitioned from braided river through floodplain swamp, delta, shallow lake, and semi-deep lake to deep lake. Shaped by tectonic evolution, the Juhugeng mining area features an anticline of Triassic strata in its central zone, flanked to the north and south by synclines of coal-bearing Jurassic strata. Overall, it consists of one major anticline and two minor synclines [40]. Within the mining area, northwest-trending reverse faults are well developed, while large northeast-trending shear fractures cut the area into discontinuous blocks of different sizes, giving it the tectonic characteristics of north–south zoning and east–west zoning. This structure divides the Juhugeng coal mining area into a planar pattern of three open pits and four well fields [41].
The gas hydrate drilling area lies within the Sanlutian field, where 14 gas hydrate boreholes have been completed. In 2008–2009, the China Geological Survey and the Qinghai Coal Geological Exploration Team 105 jointly drilled seven boreholes in the study area and recovered gas hydrate samples from DK–1, DK–2, and DK–3. In 2013, four boreholes were drilled in the northwestern part of the study area, yielding gas hydrate samples from DK11–14, DK13–11, and DK12–13. In 2014, an additional 10 boreholes were drilled in the central to eastern part of the study area, with gas hydrate samples obtained only from DK8–19 [41,42].
In the Muli region, all gas hydrates found so far lie beneath the permafrost layer, primarily within the Jiangcang Formation, at reservoir depths mostly between 100 and 400 m. Gas hydrate generally occurs in two forms, porous and fractured. Porous hydrate mostly occurs as dots, layers, and ripples filling argillaceous siltstone and siltstone [5]. Fractured hydrate mostly occurs as thin layers, flakes, and blocks filling dense rocks such as siltstone, oil shale, and mudstone [37].
6. Conclusions
In this study, four classical ML algorithms, namely DT, RF, MLP, and LR, were applied to classify seven lithologies in the Sanlutian field of the Muli PZQM in Qinghai. Six logging curves from gas hydrate boreholes in the study area, namely GR, VP, CNL, RT, CAL, and DEN, were used as input features. The major lithologies of the permafrost area were used as outputs: siltstone, mudstone, oil shale, coal, sandstone, silty mudstone, and argillaceous siltstone. The four ML models were trained and evaluated using training and test sets. To evaluate the performance of each classification model, precision, recall, F1-score, and Jaccard coefficient were calculated. The key findings are as follows:
(1) Compared with the other ML models, RF was the most effective at classifying lithology from logging data. It achieved the highest precision, recall, F1-score, and Jaccard coefficient, with values of 0.941, 0.941, 0.940, and 0.889, respectively. The remaining models ranked, in descending order of evaluation score, as MLP, DT, and LR.
(2) Among the seven major lithologies in the study area, mudstone is the most challenging to classify: mudstone samples are often misclassified as siltstone, silty mudstone, or oil shale. Improving the classification accuracy of mudstone will require more samples and additional characteristic logging curves from the study area.
(3) ML applied to logging data offers a more accurate way of classifying lithology in hydrate boreholes within the PZQM. It can provide new technical support for gas hydrate prospecting and a useful reference for the identification and exploration of gas hydrate reserves within the PZQM.
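The four evaluation metrics named above can be illustrated with a short, self-contained Python sketch that computes per-class precision, recall, F1-score, and Jaccard coefficient from true and predicted lithology labels. The function names and the support-weighted averaging are assumptions made for illustration; this section does not state which averaging scheme the study used, and the labels in the usage example are invented.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def classification_scores(y_true, y_pred):
    """Return {class: (precision, recall, f1, jaccard)}."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        union = tp[c] + fp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        jaccard = tp[c] / union if union else 0.0
        scores[c] = (precision, recall, f1, jaccard)
    return scores


def weighted_average(scores, y_true):
    """Average each metric over classes, weighted by class support.

    Assumption: support weighting is one common choice, not necessarily
    the scheme used in the study.
    """
    support = Counter(y_true)
    n = len(y_true)
    return tuple(
        sum(scores[c][k] * support[c] for c in support) / n for k in range(4)
    )


if __name__ == "__main__":
    truth = ["mudstone", "mudstone", "siltstone", "siltstone", "coal"]
    pred = ["mudstone", "siltstone", "siltstone", "siltstone", "coal"]
    per_class = classification_scores(truth, pred)
    for name, values in per_class.items():
        print(name, [round(v, 3) for v in values])
    print("weighted:", [round(v, 3) for v in weighted_average(per_class, truth)])
```

For a single class, the Jaccard coefficient and the F1-score are linked by J = F1 / (2 − F1), so an F1-score of 0.940 corresponds to J ≈ 0.887, close to the 0.889 reported for RF. The relation is exact only per class, not for scores averaged over classes.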