Clastic Rock Lithology Identification Based on Multivariate Feature Enhancement and Dynamic Confidence-Weighted Ensemble

Chen, Kang; Zhong, Guoyun; Diao, Fan

doi:10.3390/app16041808


by Kang Chen ¹,², Guoyun Zhong ¹,²,* and Fan Diao ¹,²,*

¹ School of Artificial Intelligence and Information Engineering, East China University of Technology, Nanchang 330013, China

² Jiangxi Engineering Technology Research Center of Nuclear Geoscience Data Science and System, East China University of Technology, Nanchang 330013, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2026, 16(4), 1808; https://doi.org/10.3390/app16041808

Submission received: 29 December 2025 / Revised: 31 January 2026 / Accepted: 5 February 2026 / Published: 12 February 2026


Abstract

The strong heterogeneity of clastic reservoirs and the phenomenon of similar log responses for different lithologies (i.e., “same spectrum, different rocks”) significantly weaken feature separability. Furthermore, distribution shifts between different wells cause traditional models to suffer from severe generalization bottlenecks in cross-well applications. To address this critical challenge, this paper proposes a dual-driven framework comprising “Multivariate Feature Enhancement + Dynamic Ensemble”. At the feature level, physics-informed enhancement and multi-scale statistics are introduced to construct a Multivariate high-dimensional feature system, thereby strengthening the representation of geological patterns. At the model level, a sample-aware Dynamic Confidence-Weighted Ensemble (DCWE) strategy is designed to achieve sample-wise adaptive decision-making based on prediction uncertainty, fundamentally breaking through the limitations of fixed weights in static ensembles. This method combines the complementary advantages of Gradient Boosting Decision Trees (GBDT) and deep sequence networks, enabling the simultaneous capture of local textural variations and continuous trends across depths. Based on rigorous Leave-One-Group-Out (LOGO) cross-validation, the proposed framework achieves a maximum accuracy of 84.58%. It significantly reduces the misclassification rate in lithology transition zones and for minority class samples, while maintaining the geological continuity of prediction results. These results verify the significant advantages of the proposed method in cross-well generalization scenarios.

Keywords: lithology identification; well logs; dynamic ensemble; deep learning; confidence weighting; clastic rocks

1. Introduction

In the modern architecture of oil and gas exploration and development, Lithology Identification serves not only as the starting point for stratigraphic layering but also as a fundamental prerequisite constraint for reservoir parameter calculation and Favorable Reservoir Prediction [1,2]. As global energy exploration advances into complex domains such as deep clastic rocks, tight oil and gas, and shale gas, geological targets exhibit strong heterogeneity and diagenetic complexity [3,4]. In these high-risk exploration areas, even minor lithology identification errors can propagate through subsequent reserve estimation and fracturing design, adversely affecting reservoir evaluation outcomes [1,5].

However, facing the “lithological continuum” composed of sandstone, siltstone, and mudstone, well log interpretation is often constrained by the dual challenges of physical non-uniqueness and mineralogical complexity [6,7]. On one hand, complex mineral compositions trigger significant “overlapping log responses” (i.e., different lithologies exhibiting similar log characteristics); for instance, high-gamma sandstones are frequently confused with mudstones [7]. On the other hand, limited by the vertical resolution of logging instruments, thin interbeds often suffer from response smoothing due to volume effects, which further increases interpretation uncertainty [6]. Confronted with this high-dimensional, non-linear geophysical inversion problem contaminated by noise, the traditional interpretation paradigm relying on manual experience struggles to achieve reliable geological interpretations, challenging both inversion stability and interpretability [8,9].

To overcome the limitations of physical models, lithology identification technology has undergone a comprehensive evolution from statistical models to deep learning [1,10]. Early methods primarily relied on shallow machine learning models such as Support Vector Machines (SVMs), Random Forests (RF), and Gradient Boosting Decision Trees (e.g., XGBoost) to mitigate non-linear mapping difficulties [11,12]. In recent years, deep learning technologies, represented by Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, have significantly improved the accuracy and automation of lithology identification by leveraging end-to-end feature extraction capabilities [13,14,15]. Furthermore, to reduce the prediction variance of single models and enhance robustness, Ensemble Learning strategies such as Voting, Stacking, and Bagging have been widely adopted to fuse multi-model advantages, thereby improving the consistency and reliability of identification results [1,2].

Despite the significant progress of existing data-driven methods on standard datasets, they are still constrained by two intrinsic limitations when dealing with complex cross-well geological conditions [11,13]. The first is the deficiency of physical representation. Mainstream deep models tend to learn features automatically from raw data. However, given the “small sample, strong domain” characteristics of well log data, purely data-driven models lacking physical priors (e.g., texture, trends) struggle to capture robust decision boundaries that generalize across wells [16,17]. The second is the rigidity of decision-making mechanisms. Sedimentary environments change over time and space, leading to vast differences in prediction difficulty across varying depths (i.e., non-stationarity). Yet, existing ensemble methods mostly adopt fixed-weight strategies, lacking “Sample-Aware” adaptive capabilities [18]. This inability to dynamically schedule model capabilities prevents effective identification in critical “transition zones” and of “ambiguous samples”, thereby limiting overall accuracy improvements [11].

Addressing these challenges, this paper proposes an intelligent lithology identification framework featuring Multivariate feature enhancement and sample-aware ensemble. To tackle the strong heterogeneity of geological environments, targeted improvements were made in two dimensions: data representation and decision mechanisms. At the input end, physics-informed enhancement and multi-scale statistics are introduced to construct a Multivariate high-dimensional feature system. At the output end, a sample-aware dynamic ensemble strategy is implemented to adjust model weights in real-time by evaluating sample uncertainty, overcoming the limitations of traditional static ensembles in adapting to non-stationary geological conditions. This synergistic mechanism of “Multivariate high-dimensional feature system” and “dynamic decision adaptation” significantly enhances the model’s generalization ability and robustness under complex cross-well conditions.

The main contributions of this paper are summarized as follows:

A Multivariate enhanced feature system is proposed by integrating micro-texture structures, multi-scale statistical measures, and petrophysical indicators. This constructs a Multivariate high-dimensional feature space that possesses both stratigraphic continuity and sensitivity to physical properties. This system significantly improves the separability among sandstone, siltstone, and mudstone, effectively mitigating the boundary ambiguity and cross-well representation instability caused by “overlapping log responses” in well log data.

A Sample-Aware Dynamic Confidence-Weighted Ensemble (DCWE) strategy based on an adaptive sharpening mechanism is developed, achieving a critical breakthrough from traditional static weighting to sample-wise adaptive decision-making. Leveraging global model performance as a prior and sample confidence as a posterior, this strategy dynamically adjusts model weights based on local uncertainty during the prediction process, thereby enhancing robustness and adaptability in non-stationary formations, lithology transition zones, and hard-to-classify samples.

The remainder of this paper is organized as follows: Section 2 introduces the research data; Section 3 describes the proposed method in detail; Section 4 evaluates the model performance, validating its advantages in terms of computational efficiency and classification accuracy; and Section 5 presents the conclusions.

2. Research Data

The dataset used in this study comes from real logging records provided by the 1st Sinopec Artificial Intelligence Innovation Competition. It contains 38,225 logging samples collected from four wells, covering a depth range of 301.04–3316.88 m. The dataset includes three logging features—spontaneous potential (SP), gamma ray (GR), and acoustic transit time (AC)—along with one three-class lithology label. The dataset is complete and contains no missing values.

For clarity and consistency throughout the manuscript, a unified color scheme is adopted to represent different lithofacies in all figures: siltstone (Label 0) is shown in red, sandstone (Label 1) in blue, and mudstone (Label 2) in green. This color convention is consistently applied in all statistical plots and well log profiles presented in this study.

Figure 1 shows a highly imbalanced label distribution: Label 0 accounts for 26.13% (9989 samples), Label 1 for 12.46% (4764 samples), and Label 2 for 61.40% (23,472 samples).

Figure 2, Figure 3 and Figure 4 summarize the overall statistical properties of each logging feature: the mean value of SP is 54.98 (standard deviation 21.59), GR has a mean of 134.72 (standard deviation 34.21), and AC has a mean of 293.89 (standard deviation 57.23). The three lithologies exhibit clear differences in the mean values of SP and GR: Label 2 (mudstone) shows the highest values, Label 1 (sandstone) the lowest, and Label 0 (siltstone) lies in between. This indicates that SP and GR are more sensitive to lithology variations. In addition, the correlations among the three features are relatively weak (SP–GR: 0.14; SP–AC: −0.10; GR–AC: −0.25), making them suitable as base inputs for subsequent feature engineering.

To facilitate subsequent comparative analysis and figure visualization, this color scheme is summarized in Table 1 (siltstone, Facies 0, in red; sandstone, Facies 1, in blue; mudstone, Facies 2, in green) and is applied consistently across all cross-sections and visualization results.

Figure 5, Figure 6, Figure 7 and Figure 8 illustrate the GR, AC, and SP logging curves and the corresponding lithofacies for the four wells (Well 2, Well 146, Well 298, and Well 2010). Overall, all wells exhibit clear vertical stratigraphic variations, and the logging responses of different lithofacies display stable geological signatures. Mudstone (Facies 2) is typically associated with higher GR, slower AC, and a relatively smooth SP curve. Sandstone (Facies 1) is characterized by low GR, low AC, and larger-amplitude SP fluctuations. Siltstone (Facies 0) generally falls between the two end-members and often shows transitional logging characteristics.

Frequent interbedding of sandstone, siltstone, and mudstone occurs in all four wells, with lithofacies transitions being particularly dense in Well 298 and Well 2, reflecting the strong heterogeneity of the depositional system and variations in clastic supply within the study area. The overlapping logging responses and blurred lithofacies boundaries visible in the figures further indicate the weak separability of lithologies.

3. Methodology

Figure 9 illustrates the overall workflow of the intelligent identification method for well log data proposed in this paper, encompassing feature engineering and dynamic weighted ensemble steps. This workflow realizes end-to-end optimization, ranging from raw well log data processing to intelligent identification prediction. Specifically, the proposed intelligent lithology identification framework comprises two core stages: (1) Multivariate feature engineering and selection; and (2) Dynamic ensemble. The detailed workflow is described as follows:

3.1. Feature Engineering and Selection

Figure 10 shows the comprehensive feature-engineering scheme adopted in this study. Two core techniques—sliding-window construction and gradient computation—are used to enhance feature representation. The sliding-window method concatenates adjacent data points to form local sequence features, capturing stratigraphic continuity patterns, while gradient computation extracts the rate of change in logging parameters to characterize vertical variations in reservoir properties [4,19,20].

Based on this foundation, multi-scale statistical features, geophysical parameters, and mineral-composition indicators specifically designed for the three lithologies are incorporated, together with deep features, cross features, and probability-enhanced features. This results in a high-dimensional feature space that integrates both physical significance and statistical characteristics, providing rich discriminative information for machine-learning models [14,16,21,22].

3.1.1. Data Loading and Synchronization Correction

The logging data used in this study include both the training and validation sets. A unified data interface is employed to perform standardized reading and depth alignment, ensuring consistency between feature construction and model training. To eliminate local anomalies and non-physical noise caused by logging conditions, the preprocessing workflow applies unified depth resampling within each well and uses a sliding-mean constraint to smooth both high- and low-frequency noise in the logging curves.
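The preprocessing described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function names `resample_uniform` and `sliding_mean`, the linear interpolation, the edge padding, and the window length are all assumptions, since the text does not specify them.

```python
import numpy as np

def resample_uniform(depth, values, step):
    """Linearly resample one curve of one well onto a uniform depth grid."""
    depth = np.asarray(depth, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(depth)  # np.interp requires increasing depths
    n_steps = int(np.floor((depth.max() - depth.min()) / step + 1e-9))
    grid = depth.min() + step * np.arange(n_steps + 1)
    return grid, np.interp(grid, depth[order], values[order])

def sliding_mean(values, window=5):
    """Centered moving average; edges are padded by repeating the end values."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it is centered")
    values = np.asarray(values, dtype=float)
    half = window // 2
    padded = np.pad(values, half, mode="edge")  # assumed edge handling
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Toy curve with one spike (hypothetical values, not from the dataset).
depth = np.array([300.0, 300.1, 300.25, 300.4, 300.5])
gr = np.array([100.0, 102.0, 150.0, 104.0, 106.0])
grid, gr_uniform = resample_uniform(depth, gr, step=0.125)
gr_smooth = sliding_mean(gr_uniform, window=3)
```

Both steps would be applied separately within each well so that samples from different wells are never mixed in one window.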

3.1.2. Design of Feature Generation Engine

This study constructs a feature-generation engine composed of three core modules—spatial continuity, multi-scale statistics, and physical mechanisms—to map the raw logging data into a high-dimensional feature space, resulting in a total of 1371 initial features.

To capture spatial continuity within the strata, we follow the strategy of [23] and use sliding windows to construct feature vectors that include neighboring data points, explicitly preserving local variations in the logging curves. The theoretical basis is Tobler’s First Law of Geography [24], which states that “things near each other are more closely related”. In addition, considering that sedimentary formations exhibit the Milankovitch cyclicity described by Hinnov [25], we further introduce the trigonometric (sine and cosine) positional encoding concept from deep learning [26]. Using sine and cosine transformations, linear depth is mapped into nonlinear features with periodic properties, thereby enhancing the model’s ability to perceive absolute position and stratigraphic cyclic structures.

In-well interpolation:

$$x_{w}(d) = \begin{cases} x_{w}(d^{-}), \\ x_{w}(d^{+}), \\ 0, \end{cases}$$

Sliding-window concatenation:

$$x_{\mathrm{win},k}(d) = x(d + k\,\Delta d), \quad k = -2, \dots, 2$$

Gradient:

$$x_{\mathrm{grad}}(d) = \frac{x(d + \Delta d) - x(d)}{\Delta d}$$

Normalized depth:

$$d_{\mathrm{norm}} = \frac{d - d_{\min}}{d_{\max} - d_{\min}}, \quad \mathrm{DEPTH}_{\mathrm{gradient}} = \Delta d$$

Trigonometric (sine and cosine) encoding:

$$p_{\sin} = \sin(\pi \cdot d_{\mathrm{norm}}), \quad p_{\cos} = \cos(\pi \cdot d_{\mathrm{norm}})$$
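As a rough illustration of the spatial-continuity features defined above, the following sketch builds the window, gradient, and depth-encoding features for a single curve of a single well. The function name `spatial_features`, the clamping of window indices at the well boundaries, and the reuse of the last slope for the final sample are assumptions made here for the example.

```python
import numpy as np

def spatial_features(depth, x, half_width=2):
    """Window, gradient and depth-encoding features for one curve of one well.

    Assumes depths are strictly increasing.
    """
    depth = np.asarray(depth, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(x)
    idx = np.arange(n)
    feats = {}
    # Sliding-window concatenation: x(d + k*dd) for k = -2..2.
    # Indices are clamped at the well boundaries (assumed edge handling).
    for k in range(-half_width, half_width + 1):
        feats[f"win_{k:+d}"] = x[np.clip(idx + k, 0, n - 1)]
    # Forward-difference gradient; the last sample reuses the previous slope.
    grad = np.diff(x) / np.diff(depth)
    feats["grad"] = np.append(grad, grad[-1])
    # Normalized depth in [0, 1] and its sine/cosine encoding.
    d_norm = (depth - depth.min()) / (depth.max() - depth.min())
    feats["d_norm"] = d_norm
    feats["p_sin"] = np.sin(np.pi * d_norm)
    feats["p_cos"] = np.cos(np.pi * d_norm)
    return feats

# Toy example with a 0.5 m sampling interval (hypothetical values).
depth = np.arange(5) * 0.5
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
feats = spatial_features(depth, x)
```

With three input curves, calling the function once per curve yields five window columns and one gradient column per curve, plus the shared depth encodings.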

The multi-scale statistical features build on an observation by Davis: single-point logging responses are often affected by borehole environmental noise [27], whereas statistical aggregation over windows of different sizes (e.g., moving averages and variances) can effectively suppress random noise and enhance trend components. Based on this principle, a multi-scale statistical interaction module is developed in this study to extract statistical measures (such as mean, variance, and kurtosis) over multiple window sizes (5, 10, and 25 points), thereby capturing multi-level variation patterns from micro-scale textures to macro-scale structural trends. Subsequently, higher-order differences and Z-score normalization are applied to strengthen the response to stratigraphic discontinuities and gradient changes, ultimately generating 1299 features.

Rolling mean:

$$\mu_{w}(d) = \frac{1}{N_{w}} \sum_{i=-w}^{w} x(d + i\,\Delta d)$$

Rolling standard deviation:

$$\sigma_{w}(d) = \sqrt{\frac{1}{N_{w}} \sum_{i=-w}^{w} \left(x(d + i\,\Delta d) - \mu_{w}(d)\right)^{2}}$$

Quantiles:

$$q_{p,w}(d) = \mathrm{quantile}\left(x(d - w\,\Delta d), \dots, x(d + w\,\Delta d);\ p\right)$$

$$\mathrm{IQR}_{w} = q_{0.75,w} - q_{0.25,w}, \quad \mathrm{qrange}_{w} = q_{0.9,w} - q_{0.1,w}$$

Skewness and kurtosis:

$$\mathrm{skew}_{w} = \frac{1}{N_{w}} \sum \frac{(x - \mu_{w})^{3}}{\sigma_{w}^{3}}, \quad \mathrm{kurt}_{w} = \frac{1}{N_{w}} \sum \frac{(x - \mu_{w})^{4}}{\sigma_{w}^{4}} - 3$$

$$z_{w} = \frac{x - \mu_{w}}{\sigma_{w} + \varepsilon}, \quad r_{w} = \frac{x}{\mu_{w} + \varepsilon}$$

$$\Delta_{k} x(d) = x(d) - x(d - k\,\Delta d), \quad \mathrm{pct}_{k}(d) = \frac{x(d) - x(d - k\,\Delta d)}{x(d - k\,\Delta d) + \varepsilon}$$

$$\sigma_{w}(x), \quad \mathrm{CV}_{w} = \frac{\sigma_{w}}{\mu_{w} + \varepsilon}$$

$$\Delta x(d) = x(d) - x(d - \Delta d), \quad \Delta^{2} x(d) = \Delta(\Delta x)$$

$$\Delta\mu_{w_{1}-w_{2}} = \mu_{w_{1}} - \mu_{w_{2}}, \quad \Delta\sigma_{w_{1}-w_{2}} = \sigma_{w_{1}} - \sigma_{w_{2}}$$
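A compact way to compute several of these windowed statistics is sketched below. It is only an illustration under stated assumptions: the names `rolling_stats` and `multi_scale_features`, the edge padding, the half-widths `(2, 5, 12)`, and the small `EPS` added to the skewness and kurtosis denominators (to avoid division by zero on flat windows) are choices made here, not details given in the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

EPS = 1e-8  # assumed value of the stabilizing constant

def rolling_stats(x, w):
    """Centered statistics over windows of N_w = 2*w + 1 samples."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, w, mode="edge")             # assumed edge handling
    win = sliding_window_view(padded, 2 * w + 1)   # shape (n, 2w+1)
    mu = win.mean(axis=1)
    sigma = win.std(axis=1)                        # population form, 1/N_w
    centered = win - mu[:, None]
    q25, q75 = np.quantile(win, [0.25, 0.75], axis=1)
    return {
        "mean": mu,
        "std": sigma,
        "skew": (centered ** 3).mean(axis=1) / (sigma ** 3 + EPS),
        "kurt": (centered ** 4).mean(axis=1) / (sigma ** 4 + EPS) - 3.0,
        "iqr": q75 - q25,
        "z": (x - mu) / (sigma + EPS),
        "ratio": x / (mu + EPS),
        "cv": sigma / (mu + EPS),
    }

def multi_scale_features(x, half_widths=(2, 5, 12)):
    """Stack statistics from several window sizes plus one cross-scale difference."""
    out = {}
    for w in half_widths:
        for name, values in rolling_stats(x, w).items():
            out[f"{name}_w{w}"] = values
    first, last = half_widths[0], half_widths[-1]
    out[f"dmean_w{first}_w{last}"] = out[f"mean_w{first}"] - out[f"mean_w{last}"]
    return out

x = np.arange(10.0)          # toy ramp, hypothetical values
stats = rolling_stats(x, 2)
features = multi_scale_features(x)
```

Repeating this over every curve, window size, and derived quantity is what makes the feature count grow into the hundreds.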

To compensate for the limited physical interpretability of purely data-driven models, this study further introduces a set of rock-physics-derived indicators. Following the classical rock-physics volumetric model of Ellis and Singer [28] and the logging-interpretation standards of Asquith and Krygowski [29], a series of geologically meaningful derived features is constructed from the three conventional logging curves—GR, AC, and SP. As a first step, the three curves are min–max normalized on a per-well basis to ensure consistent scaling across different wells, mapping all values to the range [0, 1]:

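$$\widetilde{\mathrm{GR}} = \frac{\mathrm{GR} - \mathrm{GR}_{\min}}{\mathrm{GR}_{\max} - \mathrm{GR}_{\min}}$$

$$\widetilde{\mathrm{AC}} = \frac{\mathrm{AC} - \mathrm{AC}_{\min}}{\mathrm{AC}_{\max} - \mathrm{AC}_{\min}}$$

$$\widetilde{\mathrm{SP}} = \frac{\mathrm{SP} - \mathrm{SP}_{\min}}{\mathrm{SP}_{\max} - \mathrm{SP}_{\min}}$$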

The normalized curves retain their original geological implications: $\widetilde{\mathrm{GR}}$ reflects clay content, $\widetilde{\mathrm{AC}}$ describes porosity structure and formation compaction, and $\widetilde{\mathrm{SP}}$ represents interlayer electrochemical potential responses and permeability characteristics. Based on these normalized curves, various lithology-indicator features are constructed to enhance the model’s sensitivity to the three lithologies (sandstone, siltstone, and mudstone). For example, mudstone intervals typically exhibit high GR values and weak SP contrasts; therefore, a mudstone indicator is constructed as:

$$I_{\mathrm{shale}} = \widetilde{\mathrm{GR}} \cdot (1 - \widetilde{\mathrm{SP}})$$

Sandstone typically exhibits low GR and low AC (high wave velocity); a sandstone indicator is therefore constructed as:

$$I_{\mathrm{sand}} = (1 - \widetilde{\mathrm{GR}}) \cdot (1 - \widetilde{\mathrm{AC}})$$

Siltstone, as the intermediate phase between sandstone and mudstone, has both mud components and partial permeability. A siltstone indicator is therefore defined as:

$$I_{\mathrm{silt}} = \widetilde{\mathrm{GR}} \cdot (1 - \widetilde{\mathrm{AC}}) \cdot (1 - \widetilde{\mathrm{SP}})$$

These indicators are not intended to describe mineral content strictly. Rather, they are combination features with geological significance, built from logging response patterns, that improve the separability of the feature space.

Given the absence of density and neutron curves in this dataset, we developed a pseudo-porosity metric based on acoustic transit time to characterize material property variations. The metric is expressed as

$$\phi_{\mathrm{pseudo}} = \frac{\mathrm{AC} - \mathrm{AC}_{\min}}{\mathrm{AC}_{\max} - \mathrm{AC}_{\min}}$$

This variable reflects the relative pore structure of the formation and is used to supplement the geological information of AC in reservoir property identification.

To better demonstrate the synergistic effect of the three logging curves in sandstone-mudstone identification, this study also develops a combined feature of GR–AC–SP to characterize the overall response trend.

$$I_{\mathrm{tri}} = 0.5 \cdot \widetilde{\mathrm{GR}} + 0.3 \cdot (1 - \widetilde{\mathrm{SP}}) + 0.2 \cdot (1 - \widetilde{\mathrm{AC}})$$

The weights reflect the relative contribution of the three curves in distinguishing sandstone and mudstone, with GR showing the highest sensitivity to clay content, followed by SP and AC. This feature is not a physical volumetric model but rather a statistically defined linear combination designed to enhance feature discriminability and improve the overall lithology classification performance of machine-learning models.

Through the construction of these rock-physics-derived features, the original logging curves are mapped into an enhanced representation carrying geological implications such as clay content, permeability, electrical response, and pore-structure characteristics. A total of 43 features are generated, effectively improving the model’s robustness under complex stratigraphic conditions.
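The indicator formulas above translate almost directly into array code. The sketch below assumes per-well min–max normalization as described in the text; the helper names `minmax_per_well` and `lithology_indicators` are hypothetical, and the handling of a constant curve (mapped to 0) is an assumption added to avoid division by zero.

```python
import numpy as np

def minmax_per_well(values, well_ids):
    """Scale a curve to [0, 1] separately within each well."""
    values = np.asarray(values, dtype=float)
    well_ids = np.asarray(well_ids)
    out = np.empty_like(values)
    for w in np.unique(well_ids):
        mask = well_ids == w
        lo, hi = values[mask].min(), values[mask].max()
        # A constant curve is mapped to 0 (assumed convention).
        out[mask] = (values[mask] - lo) / (hi - lo) if hi > lo else 0.0
    return out

def lithology_indicators(gr, ac, sp, well_ids):
    """Rock-physics-derived indicators built from normalized GR, AC and SP."""
    g = minmax_per_well(gr, well_ids)
    a = minmax_per_well(ac, well_ids)
    s = minmax_per_well(sp, well_ids)
    return {
        "I_shale": g * (1 - s),
        "I_sand": (1 - g) * (1 - a),
        "I_silt": g * (1 - a) * (1 - s),
        "phi_pseudo": a,  # same expression as normalized AC
        "I_tri": 0.5 * g + 0.3 * (1 - s) + 0.2 * (1 - a),
    }

# Two toy wells (hypothetical values, not from the dataset).
wells = np.array([1, 1, 1, 2, 2])
gr = np.array([50.0, 100.0, 150.0, 60.0, 160.0])
ac = np.array([200.0, 300.0, 400.0, 250.0, 350.0])
sp = np.array([20.0, 40.0, 60.0, 30.0, 70.0])
ind = lithology_indicators(gr, ac, sp, wells)
```

Because every indicator is a product or weighted sum of values in [0, 1], the outputs stay in [0, 1] and remain comparable across wells.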

3.1.3. Feature Combination and Cleaning

Directly feeding all 1371 constructed features into the model may lead to the “curse of dimensionality”. As noted by Guyon and Elisseeff, a single feature-selection method may introduce structural bias: for example, mutual information tends to favor high-cardinality features [30], whereas linear methods ignore nonlinear interactions. To ensure the robustness of the selection results, this study develops a parallel scoring framework that integrates information-theoretic, statistical, and machine-learning perspectives.

According to the information-theoretic framework of Cover and Thomas [31], mutual information characterizes arbitrary dependency between features and labels through joint probability distributions, making it well suited for retaining features that are not linearly separable yet carry important geological significance. On this basis, to strengthen the discriminative ability of features across different lithologies, ANOVA-based variance analysis is introduced to perform linear discrimination and identify significant features with high F-statistics.

$$I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log\left(\frac{p(x, y)}{p(x)\,p(y)}\right)$$

The joint probability distribution is denoted by p(x,y), and the marginal probability distributions are p(x) and p(y). The higher the MI value, the richer the information contained in the feature.
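For a continuous logging feature, the probabilities in this formula have to be estimated. One simple option, shown below as a sketch, is to discretize the feature into equal-width bins and use relative frequencies; the binning scheme, the number of bins, and the function name `mutual_information` are assumptions, and the paper does not state which estimator was used.

```python
import numpy as np

def mutual_information(feature, labels, bins=10):
    """Histogram estimate of I(X; Y) in nats for one feature and class labels."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    # Discretize the feature into equal-width bins (assumed estimator).
    edges = np.histogram_bin_edges(feature, bins=bins)
    x_bin = np.digitize(feature, edges[1:-1])      # bin index in 0..bins-1
    classes, y_idx = np.unique(labels, return_inverse=True)
    # Joint relative frequencies p(x, y).
    joint = np.zeros((bins, len(classes)))
    np.add.at(joint, (x_bin, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)          # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)          # marginal p(y)
    nonzero = joint > 0                            # skip empty cells
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / (px @ py)[nonzero])))

# A feature that determines the label exactly, and one that is independent of it.
mi_informative = mutual_information([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1])
mi_independent = mutual_information([0, 1, 0, 1], [0, 0, 1, 1])
```

In the first case the estimate equals log 2, the entropy of a balanced binary label; in the second it is zero.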

On this basis, one-way analysis of variance (ANOVA) is further applied to perform linear discriminative testing on the candidate features. ANOVA evaluates the significance of between-class differences by comparing the ratio of the between-class mean square to the within-class mean square (the F-statistic) [32]. A larger F value indicates a more pronounced distributional difference in the feature across lithology categories.

$$F = \frac{\mathrm{MS}_{\mathrm{between}}}{\mathrm{MS}_{\mathrm{within}}} = \frac{\sum_{k=1}^{K} n_{k} (\bar{x}_{k} - \bar{x})^{2} / (K - 1)}{\sum_{k=1}^{K} \sum_{i=1}^{n_{k}} (x_{ki} - \bar{x}_{k})^{2} / (N - K)}$$

Here, K denotes the number of classes, and N represents the total number of samples. A larger F value indicates that the feature exhibits a more significant distributional difference across geological categories, implying stronger linear separability.
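The F-statistic can be computed for a single feature in a few lines. The sketch below follows the formula term by term; the function name `anova_f` and the toy data are illustrative only.

```python
import numpy as np

def anova_f(feature, labels):
    """One-way ANOVA F-statistic of a feature across the label classes."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_total, k = len(feature), len(classes)
    grand_mean = feature.mean()
    ss_between = 0.0
    ss_within = 0.0
    for c in classes:
        xc = feature[labels == c]
        ss_between += len(xc) * (xc.mean() - grand_mean) ** 2
        ss_within += np.sum((xc - xc.mean()) ** 2)
    ms_between = ss_between / (k - 1)          # between-class mean square
    ms_within = ss_within / (n_total - k)      # within-class mean square
    return ms_between / ms_within

# Three well-separated classes of three samples each (hypothetical values).
f_value = anova_f([1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 1, 1, 1, 2, 2, 2])
```

Here the class means are 2, 5, and 8 around a grand mean of 5, which gives a between-class mean square of 27, a within-class mean square of 1, and therefore F = 27.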

Considering that many geological features demonstrate discriminative power only when used in combination, this study also adopts Light Gradient Boosting Machine (LightGBM) [33], which leverages the Split Gain mechanism of gradient-boosting trees to identify high-order interaction features.

$$V_{j} = \sum_{t=1}^{M} \sum_{\eta \in N_{t}} I\left(v(\eta) = j\right) \cdot \Delta G(\eta)$$

Here, $M$ denotes the number of trees in the ensemble, $N_{t}$ represents the set of nodes in the $t$-th tree, $v(\eta)$ denotes the splitting feature selected at node $\eta$, and $\Delta G(\eta)$ is the gain in the loss function introduced by that split.

Through the complementary screening across the aforementioned three dimensions, this mechanism eliminated approximately 276 redundant features. Consequently, an optimal feature subset balancing information content, discriminability, and interactivity was constructed, providing a low-redundancy and high signal-to-noise ratio input foundation for subsequent model training.

From the finally retained features, 10 representative features were selected for demonstration, as summarized in Table 2. These correspond to seven feature generation strategies: spatial continuity modeling, depth encoding, multi-scale statistics, standardization enhancement, rate of change features, physics-derived indicators, and curve interaction.

3.2. Model Training and Dynamic Integration

This phase aims to construct a robust lithology classification model utilizing the selected feature set. To ensure the independence of the training and evaluation processes, the dataset is partitioned by well ID, thereby simulating realistic cross-well prediction scenarios.
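Partitioning by well ID corresponds to leave-one-group-out splitting, where each well is held out in turn. A minimal sketch is given below; the generator name `leave_one_well_out` is hypothetical, and the toy array only reuses the four well IDs named in Section 2.

```python
import numpy as np

def leave_one_well_out(well_ids):
    """Yield (held_out_well, train_indices, test_indices), one fold per well."""
    well_ids = np.asarray(well_ids)
    for held_out in np.unique(well_ids):
        test_idx = np.flatnonzero(well_ids == held_out)
        train_idx = np.flatnonzero(well_ids != held_out)
        yield held_out, train_idx, test_idx

# One entry per logging sample; toy sizes, not the real sample counts.
well_ids = np.array([2, 2, 146, 298, 298, 2010])
folds = list(leave_one_well_out(well_ids))
```

Since no well contributes samples to both sides of a fold, depth-adjacent samples from the same well cannot leak from training into evaluation.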

During model training, a model library composed of structurally diverse base models was constructed. This library includes Gradient Boosting Decision Trees (e.g., LightGBM, XGBoost) and deep sequence models (e.g., CNN-LSTM, Tabular Transformer), designed to provide complementary predictive information derived from distinct learning paradigms.

To overcome the limitations of traditional static weighting in adapting to sample variability within complex formations, this paper proposes the Dynamic Confidence-Weighted Ensemble (DCWE) mechanism as the core methodology. Specifically, this mechanism first constructs global performance weights based on the Macro-F1 scores of base models on validation wells. Subsequently, during the inference phase, it generates dynamic weights by incorporating the model’s confidence in the current sample. Through the joint modulation of “global reliability” and “sample-level uncertainty”, DCWE achieves sample-wise adaptive fusion, thereby attaining superior robustness and stability under cross-well conditions, within lithology transition zones, and across non-stationary depth sections.

3.2.1. Workflow of the Proposed Method

After completing the feature engineering, multiple tree-based models are constructed and trained, including LightGBM, XGBoost, CatBoost, ExtraTrees, and RandomForest. These models exhibit complementary strengths in terms of loss-optimization paths, feature-splitting strategies, and bias–variance characteristics. Therefore, they are trained in parallel to enhance the diversity and expressive capacity of the model library.

To further capture the sequential dependencies and higher-order nonlinear patterns in the logging data, two types of deep-learning models are trained: a CNN–LSTM model, which jointly extracts local textural patterns and long- and short-range dependencies across depth, and an enhanced Tabular Transformer, which models structured interactions among high-dimensional inputs. All deep-learning models are monitored on the validation wells, and early stopping is applied to some models to suppress overfitting during training.

To achieve optimal fusion of heterogeneous models, this study proposes a confidence-regulated Dynamic Weighted Ensemble framework. In this framework, the weight assigned to each model is determined jointly by its global performance and local confidence, thus reflecting both the model’s overall reliability and its sample-specific responsiveness.

At the global level, the F1 scores obtained by each base model on the validation wells are used to derive its base weight, representing its stability and credibility in overall performance. At the same time, during the testing stage, the algorithm evaluates the predicted probabilities for each sample to characterize the model’s local confidence: models that produce more confident predictions receive higher weights for that sample, whereas those with lower confidence are down-weighted. This results in a weighting mechanism that dynamically adapts to variations in sample distribution.

By combining the global base weight with the dynamic, confidence-based adjustment, the final per-sample weight integrates both model-level reliability and sample-level certainty. This ensemble strategy not only improves overall predictive performance but also significantly enhances the generalization capability under mixed-well conditions.

3.2.2. Architecture of the Proposed Model

Figure 11 presents the two-stage dynamic hybrid ensemble architecture proposed in this study. The overall workflow comprises a “Multi-Model Prediction Layer”, which acts as a parallel heterogeneous feature encoder, and a “Sample-Level Dynamic Weighted Ensemble Layer”, realizing end-to-end optimization from Multivariate feature extraction to model fusion.

Multi-Model Prediction Layer

Following generalized feature engineering and feature space expansion, a multi-model prediction layer composed of tree-based and deep models was constructed to characterize the feature structure of well log data from different perspectives. The tree-based branch includes common ensemble algorithms such as LightGBM, XGBoost, CatBoost, ExtraTrees, and RandomForest. By employing feature partitioning, these models effectively capture non-linear relationships in high-dimensional data and remain robust to feature scaling and noise, thereby providing reliable baseline predictions within the overall system. Complementarily, the deep learning branch includes a CNN–LSTM network and an enhanced Tabular Transformer, utilized to model the local morphology and cross-depth correlations of well log curves. Specifically, the CNN–LSTM network reconstructs processed features into sequences, extracts local features via convolution, models depth-wise dependencies using LSTM, and outputs classification results combined with an attention mechanism; meanwhile, the Tabular Transformer maps features into an embedding space, establishes inter-feature relationships through multiple Transformer Blocks, and obtains final predictions via pooling and a Multi-Layer Perceptron (MLP). The model outputs generated by both branches are fed into the dynamic weighted ensemble module in the second stage to obtain the final classification results.

Dynamic Confidence-Weighted Ensemble Layer

Traditional static ensemble strategies rely solely on fixed weights for the linear combination of model outputs, inherently ignoring the non-stationary characteristics of prediction uncertainty across different geological samples. To overcome this limitation, this paper proposes a Sample-Aware Dynamic Ensemble Strategy. This strategy achieves optimal fusion at the sample level by constructing a dual constraint mechanism of “global performance priors” and “local confidence posteriors”. Its core lies in the algorithm’s ability to perceive the prediction difficulty of each sample in real time and adaptively adjust the shape of the weight distribution, which is achieved by coupling the following two components.

The global reliability anchor evaluates a model’s inherent performance using the macro F1 score on an independent validation set. To further reinforce the dominant role of superior models, we apply a square scaling operator and normalize the results to obtain the static benchmark weights $W_{\mathrm{base},i}$. This metric reflects the model’s global generalization capability, providing stable prior constraints for ensemble learning.

$$W_{\mathrm{base},i} = \frac{(F_{1,i})^{2}}{\sum_{j} (F_{1,j})^{2}}$$

Sample-aware sharpening mechanism: For a test sample s, the instantaneous confidence level C_{i, s} of model i is first quantified by the maximum value of its output probability vector P_{i, s}. The innovation of this paper lies in introducing a sample-aware adaptive sharpening factor τ_{s}. This factor is negatively correlated with the current average confidence level {\bar{C}}_{s} of the sample, thereby enabling real-time control of the weight distribution pattern.

C_{i, s} = \max (P_{i, s})

τ_{s} = 1.5 + 4.0 \cdot (1 - {\bar{C}}_{s}), {\bar{C}}_{s} = \frac{1}{M} \sum_{j} C_{j, s}

where M denotes the number of base models.

W_{dyn, i, s} = \frac{\exp (C_{i, s} \cdot τ_{s})}{\sum_{j} \exp (C_{j, s} \cdot τ_{s})}

The final fusion weight is the normalized product of the global prior and the local sharpening weight.

W_{final, i, s} = \frac{W_{base, i} \times W_{dyn, i, s}}{\sum_{j} (W_{base, j} \times W_{dyn, j, s})}

The integrated prediction is the weighted sum of all model probabilities.

{\hat{P}}_{s} = \sum_{i} W_{final, i, s} \times P_{i, s}

This sample perception mechanism essentially functions as an “uncertainty gate”:

On “fuzzy samples”, where the base models diverge strongly and the average confidence is low, the algorithm perceives high uncertainty and automatically increases the sharpening factor τ_{s}, producing a “Matthew effect” that locks onto the most credible local model and suppresses noise.

On high-confidence “typical samples”, the algorithm perceives low uncertainty and automatically reduces τ_{s}, synthesizing the opinions of multiple models through “democratic voting” to reduce variance.

This real-time switching of decision logic based on sample characteristics significantly improves the model’s generalization robustness under complex non-stationary geological conditions.

{\hat{P}}_{s} = \sum_{i} \left( \frac{W_{base, i} \cdot \exp (C_{i, s} \cdot τ_{s})}{\sum_{j} (W_{base, j} \cdot \exp (C_{j, s} \cdot τ_{s}))} \right) \times P_{i, s},

W_{base, i} = \frac{(F_{1, i})^{2}}{\sum_{j} (F_{1, j})^{2}}, C_{i, s} = \max (P_{i, s}), τ_{s} = 1.5 + 4.0 \cdot (1 - {\bar{C}}_{s})
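To make the fusion rule concrete, the following is a minimal Python sketch of the consolidated expression above for a single sample. It is an illustration under stated assumptions rather than the authors' implementation: the function names (`base_weights`, `dcwe_weights`, `dcwe_fuse`) and the example numbers are hypothetical, and the exponent uses the product form C_{i, s} \cdot τ_{s} of the consolidated formula. Subtracting the largest confidence before exponentiation is only a numerical-stability step and cancels in the normalization.

```python
import math
from typing import List, Sequence, Tuple


def base_weights(macro_f1: Sequence[float]) -> List[float]:
    """Global prior W_base: squared validation macro-F1, normalised to sum to 1."""
    squared = [f * f for f in macro_f1]
    total = sum(squared)
    return [v / total for v in squared]


def dcwe_weights(probs: Sequence[Sequence[float]],
                 w_base: Sequence[float]) -> Tuple[List[float], float]:
    """Return (W_final, tau_s) for one sample; probs[i] holds model i's class probabilities."""
    conf = [max(p) for p in probs]            # C_{i,s}: per-model confidence
    mean_conf = sum(conf) / len(conf)         # average confidence over the M models
    tau = 1.5 + 4.0 * (1.0 - mean_conf)       # sample-aware sharpening factor tau_s
    c_max = max(conf)                         # shift for numerical stability only
    raw = [w * math.exp(tau * (c - c_max)) for w, c in zip(w_base, conf)]
    total = sum(raw)
    return [r / total for r in raw], tau


def dcwe_fuse(probs: Sequence[Sequence[float]],
              w_base: Sequence[float]) -> List[float]:
    """Weighted sum of the model probability vectors using W_final."""
    w_final, _ = dcwe_weights(probs, w_base)
    n_classes = len(probs[0])
    return [sum(w * p[k] for w, p in zip(w_final, probs)) for k in range(n_classes)]


# Hypothetical example: two base models, three facies.
w_base = base_weights([0.60, 0.70])
sample = [[0.90, 0.05, 0.05],   # confident model
          [0.40, 0.35, 0.25]]   # hesitant model
w_final, tau = dcwe_weights(sample, w_base)
print("tau_s:", tau)
print("final weights:", w_final)
print("fused probabilities:", dcwe_fuse(sample, w_base))
```

When all models report the same confidence, the exponential terms are identical and the final weights reduce to the static prior W_{base, i}; the dynamic term only matters when confidences differ.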

4. Experimental Results and Discussion

4.1. Experimental Setup

The experimental dataset is derived from real-world well logging records provided by the 1st SINOPEC AI Innovation Competition, reflecting practical oil and gas exploration scenarios. To rigorously simulate unseen-well prediction and prevent data leakage, a strict Leave-One-Group-Out (LOGO) cross-validation strategy was adopted. Following the methodology described in [34], well IDs were used as grouping labels, where three wells were used for model training and the remaining well was held out exclusively for testing. To further ensure scientific rigor, early stopping and Dynamic Class Weight Estimation were conducted using an internal validation well selected only from the training wells, rather than from the test well. This internal validation process was strictly isolated from the held-out test well, thereby preserving the integrity of the LOGO evaluation protocol. As a result, the test well remained completely unseen during all stages of model selection and optimization, ensuring a fully blind evaluation. Model performance was assessed using four complementary metrics: Accuracy, Macro F1-score, Matthews Correlation Coefficient (MCC), and Cohen’s Kappa. Accuracy provides an overall measure of correctness, while Macro F1-score mitigates the impact of class imbalance commonly observed in geological datasets by assigning equal importance to each class [35]. MCC offers a robust statistical evaluation by incorporating all elements of the confusion matrix, yielding reliable performance estimates even under severe class imbalance [36]. Finally, Cohen’s Kappa measures the agreement between predictions and ground truth beyond chance, further enhancing the interpretability and reliability of the experimental results [37].
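The splitting protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `logo_splits` and the well IDs are hypothetical, and the rule for picking the internal validation well (here simply the last training well in sorted order) is an assumption, since the text only states that it is chosen from the training wells.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Fold = Tuple[str, str, List[int], List[int], List[int]]


def logo_splits(well_ids: Sequence[str]) -> Iterator[Fold]:
    """Yield (test_well, val_well, train_idx, val_idx, test_idx), one fold per well."""
    wells = sorted(set(well_ids))
    by_well: Dict[str, List[int]] = {w: [] for w in wells}
    for idx, w in enumerate(well_ids):
        by_well[w].append(idx)
    for test_well in wells:
        train_wells = [w for w in wells if w != test_well]
        # Assumption: the last remaining well serves as the internal validation
        # well (early stopping, weight estimation); it is never the test well.
        val_well = train_wells[-1]
        fit_wells = train_wells[:-1]
        train_idx = [i for w in fit_wells for i in by_well[w]]
        yield test_well, val_well, train_idx, list(by_well[val_well]), list(by_well[test_well])


# Hypothetical well labels, one per depth sample.
sample_wells = ["W1", "W1", "W2", "W2", "W3", "W4", "W4"]
for test_well, val_well, tr, va, te in logo_splits(sample_wells):
    print(f"test={test_well} val={val_well} train={tr} val_idx={va} test_idx={te}")
```

Because every index of the held-out well appears only in `test_idx`, no sample from the test well can influence training, early stopping, or ensemble weight estimation.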

4.2. Verification of the Effectiveness of the Feature Engineering

Prior to model training, it is necessary to verify the effectiveness of the multivariate feature engineering proposed in this paper, which includes sliding window features, multi-scale statistical features, pseudo-porosity, depth gradients, and cross features.

4.2.1. Analysis of Separability in Feature Space

Figure 12 displays the distribution of raw well log features and engineered features in the UMAP (Uniform Manifold Approximation and Projection) two-dimensional embedding space. In the figure, each point represents a depth sample point from the training set, with colors indicating the corresponding lithology labels (facies 0/1/2). UMAP is a nonlinear dimensionality reduction method based on manifold learning that maps high-dimensional data to a low-dimensional space while preserving local neighborhood structures as much as possible to reveal latent cluster structures; it has been widely used for the visualization and pattern recognition of high-dimensional data in the geoscience and energy fields [38,39,40]. Therefore, this figure can be used to examine whether an “intra-class compactness—inter-class separation” structure is formed in different feature spaces, verifying the improvement effect of feature engineering on lithology separability from a visualization perspective.

Raw Log Curves UMAP: The raw features consist of three conventional well log curves: SP, GR, and AC. Due to characteristics such as high noise, overlapping physical property responses, and blurred class boundaries in well log curves, the three lithology classes are mixed in the two-dimensional space after dimensionality reduction. The large area of color overlap indicates that the discriminative ability of raw well log features in the feature space is limited.

Engineered Feature UMAP: Engineered features consist of high-dimensional features such as multi-scale statistics, gradients, and morphology, which can more completely describe the local patterns and variation trends of well log curves. After UMAP maps the high-dimensional engineering features to two dimensions, each lithology class forms relatively compact clusters, and distinct sparse zones appear between different clusters, indicating that class differences are captured more effectively. Overall, engineered features have formed a structure of “intra-class compactness and inter-class separation” in the high-dimensional space, whereas raw well log features struggle to present this discriminative ability, which proves the effectiveness of the feature engineering scheme proposed in this paper for the lithology classification task from a visual standpoint.

4.2.2. Original Features vs. Engineered Features

Table 3 compares the performance of models trained on the initial feature set with those trained on the full feature set under the same validation setting, in order to isolate the contribution of feature engineering.

The comparison in Table 3 shows that feature engineering significantly improves the performance of all models. With original features, Dynamic Weighted Ensemble performed best, achieving an accuracy of 0.6927 and a Macro-F1 score of 0.4970. After feature engineering, the model still outperforms others, with an accuracy of 0.8439 and a Macro-F1 score of 0.7033, demonstrating the clear advantage of the ensemble model.

4.3. Comparison of Integration Strategies

To verify the effectiveness of the method proposed in this paper, we compared the DCWE model with existing mainstream models. The specific performance metrics of each model on the test set are shown in Table 4.

Table 4 reports the average metrics of each model across the four wells and is used to verify the effectiveness of the Dynamic Confidence-Weighted Ensemble (DCWE) strategy. Four metrics were selected (Accuracy, Macro F1-score, Matthews Correlation Coefficient (MCC), and Cohen’s Kappa) to characterize model performance from multiple perspectives, including overall precision, class balance, and prediction consistency. The results indicate that ensemble learning strategies generally outperform single baseline models in terms of overall accuracy. The DCWE model ranks first with an average accuracy of 0.8186, on par with the static ensemble model (0.8175), and both surpass the best-performing single model, Transformer (0.8099). This is consistent with the general advantage of multi-model fusion in reducing prediction variance and enhancing overall discriminative capability. Although the difference in accuracy between the static ensemble strategy and DCWE is marginal, the two diverge clearly in the Macro F1 metric, which measures class balance. The Macro F1 of the static ensemble model is only 0.5932, well below DCWE’s 0.6661; this gap of about 0.07 reveals the limitation of traditional static weighting strategies on class-imbalanced data, namely their tendency to be dominated by majority classes at the expense of minority-class recognition. Conversely, by introducing a dynamic confidence mechanism, DCWE adapts the weights for hard-to-classify samples. While maintaining high accuracy, it not only improves the Macro F1 score but also surpasses the CNN-BiLSTM model (0.6302), which possesses advantages in sequence feature extraction. 
Furthermore, the MCC and Kappa coefficients, which evaluate classification consistency while correcting for chance agreement, corroborate the above conclusions. DCWE achieved 0.5774 and 0.5733 on these two metrics, respectively, both higher than the static ensemble model (MCC: 0.5649, Kappa: 0.5601) and the other comparison methods. This advantage across all four metrics supports the claim that the proposed method is more robust and generalizes better on complex geological data distributions, and that it provides more reliable and balanced predictions than traditional static ensembles. In addition, the ablation results reported in Table A4 (Appendix A), where all depth-related features are removed, provide further evidence for the robustness of the proposed DCWE framework. Even in the absence of explicit depth or elevation information, the DCWE model maintains a stable Macro F1-score of approximately 0.65, outperforming all individual base models and the static ensemble. This result indicates that the proposed method does not overly rely on positional cues, but instead learns transferable rock-physics-related representations from multivariate logging responses. Such behavior is particularly important for cross-well applications, where depth distributions and stratigraphic thicknesses may vary significantly between wells. The consistently high MCC and Kappa values further confirm that DCWE preserves reliable classification behavior under degraded feature conditions, highlighting its robustness compared to conventional ensemble strategies.
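The four metrics compared above can all be derived from a single confusion matrix. The sketch below is a self-contained illustration (the function name `evaluate` and the toy labels are hypothetical, and the multiclass MCC uses the standard confusion-matrix generalization). The toy case shows why accuracy and Macro F1 can diverge under class imbalance: a predictor that always outputs the majority class scores 0.8 in accuracy but zero in MCC and Kappa.

```python
import math
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> Dict[str, float]:
    """Accuracy, macro-F1, multiclass MCC and Cohen's kappa from integer labels."""
    # cm[t][p] counts samples of true class t predicted as class p.
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    s = len(y_true)                                   # total samples
    c = sum(cm[k][k] for k in range(n_classes))       # correctly classified
    t_k = [sum(cm[k]) for k in range(n_classes)]      # true count per class
    p_k = [sum(cm[r][k] for r in range(n_classes)) for k in range(n_classes)]  # predicted count

    # Per-class F1 = 2TP / (2TP + FP + FN) = 2TP / (true count + predicted count).
    f1 = []
    for k in range(n_classes):
        denom = t_k[k] + p_k[k]
        f1.append(2 * cm[k][k] / denom if denom else 0.0)

    cross = sum(p * t for p, t in zip(p_k, t_k))
    mcc_den = math.sqrt((s * s - sum(p * p for p in p_k)) * (s * s - sum(t * t for t in t_k)))
    mcc = (c * s - cross) / mcc_den if mcc_den else 0.0

    p_o = c / s                  # observed agreement
    p_e = cross / (s * s)        # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {"accuracy": p_o, "macro_f1": sum(f1) / n_classes, "mcc": mcc, "kappa": kappa}


# Toy imbalanced case: 8 majority samples, 2 minority samples, majority-only predictor.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(evaluate(y_true, y_pred, n_classes=2))
```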

4.4. Analysis of Model Robustness and Generalization Ability

To quantitatively evaluate the adaptability of the models in the face of changing geological conditions and data distribution shifts, we calculated the performance fluctuation range of each model across four wells, the detailed results of which are presented in Table 5.

The experimental results reveal significant limitations in the generalization capabilities of single benchmark models. Although models such as Extra Trees and CNN-BiLSTM can achieve high accuracy upper bounds (0.8877 and 0.8711, respectively), their performance is unstable, with lower accuracy bounds dropping to 0.6958 and 0.7123, respectively, i.e., ranges of about 0.19 and 0.16. This oscillation indicates that these models are prone to overfitting the specific distributional characteristics of the training data; consequently, even slight shifts in lithology combinations or log responses in the test well section can trigger “performance collapse”, failing to meet the “lower limit guarantee” required of interpretation results in practical engineering contexts. In contrast, the DCWE model proposed in this paper demonstrates better robustness and generalization. Regarding accuracy, DCWE stays within a high-performance interval of 0.7905 to 0.8458 (a range of about 0.06); its lower bound of 0.7905 exceeds that of all comparison models, and it exhibits the smallest fluctuation amplitude, demonstrating reliable discriminative capability even under the most challenging test conditions. Furthermore, regarding the MCC and Kappa metrics, which measure comprehensive consistency, DCWE achieves average values of 0.5774 and 0.5733, respectively, ranking first among all models. These results support the view that the dynamic confidence weighting mechanism employed by DCWE adaptively smooths the bias and variance of single base classifiers based on sample prediction difficulty, thereby mitigating interference from local geological features and yielding consistent and credible lithology identification results across variable geological environments. 
It should be noted that the dataset used in this study is highly imbalanced, with mudstone accounting for 61.40% of all samples, whereas sandstone, the primary target lithology in many exploration scenarios, constitutes only a minority portion of the data. Under such conditions, global accuracy (84.58%) becomes a secondary indicator of model performance, as it can be dominated by majority-class predictions. To address this issue, per-class precision, recall, and F1-scores are reported in Appendix A and warrant explicit discussion here. In particular, sandstone recall is a critical metric from a geological and engineering perspective. As shown in Table A2, the proposed DCWE model achieves the highest sandstone F1-score (49.64%) and a higher recall than the static ensemble (46.58% vs. 41.78%) and all tree-based models; Transformer and CNN–BiLSTM reach a higher recall (64.38%), but only at much lower precision (31.13% and 24.23%, respectively). This indicates that DCWE mitigates majority-class bias and enhances sensitivity to minority lithologies more effectively than static weighting, without sacrificing overall stability. Such balanced behavior is essential for practical reservoir characterization, where missing sandstone intervals may lead to significant exploration and development risks.

4.5. Visual Analysis

To qualitatively validate the advantages of the proposed method, and given the performance variance observed across wells in the four-fold LOGO statistics, this section selects Well 298 for detailed visualization and analysis. This selection ensures representativeness while maintaining conciseness, as Well 298 is characterized by a complete geological structure, typical lithological transitions, and a moderate difficulty level within the LOGO framework; quantitative results for the remaining wells have already been presented in the preceding sections on average performance and range values.

Using Well 298’s validation depth window (2223.12–2252.12 m) as an example, Figure 13 presents a comparative analysis of stratigraphic distribution between true lithological data and model predictions, while simultaneously displaying smoothed confidence curves and segmental means. By cross-referencing lithological boundaries with confidence level variations, the study systematically evaluates the model’s classification consistency, boundary delineation capability, and uncertainty characteristics within this validation window, providing a basis for subsequent precision analysis and result interpretation.

Figure 14 illustrates the lithology prediction results of various models on this validation well. In terms of overall distribution, all models exhibit high consistency in identifying major lithological intervals; however, significant discrepancies arise in complex stratigraphy, such as sandstone-siltstone alternating zones and argillaceous interbeds. Traditional tree-based models (XGB, RF, LGB, CAT, ET) demonstrate stable overall performance but suffer from class confusion and inter-layer oscillation at lithological boundaries. Conversely, deep learning models (Transformer, CNN-LSTM) show greater sensitivity in capturing non-linear well log responses and local continuous variations. While the static ensemble model (ensemble_static) smooths out single-model errors to some extent, its fixed weights limit its adaptability to dynamic stratigraphic changes; in contrast, the dynamic ensemble model (ensemble_dynamic) displays clear adaptive advantages across multiple complex intervals.

Specific depth intervals reveal the distinct advantages of the dynamic approach. In the upper section (approximately 1470–1600 m), where lithology is predominantly sandstone, models such as CAT, XGB, and the static ensemble exhibit misclassifications of sandstone as siltstone. However, the dynamic ensemble model successfully provides predictions consistent with the true lithology, indicating that when model disagreement occurs, it identifies the advantages of Transformer and CNN-LSTM in handling spatial continuous features, thereby avoiding the over-smoothed predictions typical of tree models in sandstone layers. In the middle transition section (approximately 1600–1750 m), where the formation gradually transitions from siltstone to sandstone with frequent fluctuations in curve characteristics, the divergence among models is most significant. XGB, RF, and the static ensemble misclassify siltstone as sandstone in multiple locations, demonstrating a delayed response to interface feature changes. The dynamic ensemble, however, adjusts weights dynamically based on local well log feature variations, more accurately distinguishing between the two lithologies. This adaptive capability demonstrates that dynamic integration is not merely a simple averaging or weighted fusion but possesses a “local judgment” mechanism capable of flexibly selecting the most credible model output based on feature distribution. In the lower reservoir section (below approximately 1800 m), where the formation is primarily sandstone interspersed with thin siltstone layers, CAT, LGB, and the static ensemble still show misclassifications at several interface positions. 
In contrast, the dynamic ensemble responds more accurately to these thin layers, showing a tighter correspondence with the true lithology distribution, which suggests it not only corrects errors at complex lithological interfaces but also maintains overall prediction consistency in stable sections, avoiding the “over-averaging” problem common in traditional ensemble methods.

Overall, the dynamic ensemble model’s predictions across the entire well are closest to the ground truth labels, with significantly fewer misclassification zones, particularly demonstrating higher discriminative power in the middle transition zone. Compared to static integration, the dynamic ensemble achieves a dynamic balance among multiple models by introducing a weight adjustment mechanism based on feature response. When predictions diverge significantly, it automatically enhances the contribution of deep learning models, yielding classification results in complex formations that better align with geological laws. This method enhances the flexibility of deep models while maintaining the stability of tree models, effectively improving the precision and continuity of lithology identification. In summary, the validation results on Well 298 fully embody the “dynamic arbitration” characteristic of the dynamic ensemble model. It adaptively integrates the strengths of various models across different intervals, significantly reducing misclassification and overfitting issues, and exhibiting overall performance superior to single models and static ensembles. This mechanism provides a more robust and geologically rational technical path for automatic lithology identification and fine stratigraphic subdivision under complex reservoir conditions.

4.5.1. Pairwise Comparison Analysis of Model Classification Performance

To further verify the superiority of the dynamic ensemble model, a pairwise comparative analysis of the confusion matrices for each model across 3 classes (Facies 0, 1, 2) was conducted, with results shown in Figure 15.

Figure 15 displays the confusion matrices for each model in the lithology prediction task for Well 298, characterizing the classification accuracy and confusion patterns of each model across the three lithology classes. Overall, deep learning models (such as CNN-LSTM and Transformer) exhibit a more concentrated main diagonal structure in the major lithology classes, indicating their superior advantages in extracting longitudinal continuous information and characterizing stratigraphic context features, thus performing more stably in thick layers or lithological sections with distinct features. In contrast, tree-based models (such as RF, ET, XGB) generally exhibit more off-diagonal elements in lithological transition zones or depth intervals with drastic well log curve changes, suggesting a higher sensitivity to noise and thin interbeds, which leads to the misclassification of adjacent lithologies. Furthermore, although the static ensemble model outperforms some single models overall, it inevitably inherits the biases of base models regarding minority lithologies (especially Facies 1), resulting in observable inter-class confusion within the confusion matrix. Conversely, the dynamic ensemble model, by adaptively adjusting the trust weights for base models at different depths, effectively suppresses misclassifications at various lithological boundaries, presenting a clearer and more compact main diagonal structure, which reflects the full utilization of model complementarity by the dynamic arbitration mechanism. In summary, this figure clearly reveals the prediction bias patterns resulting from the combined effects of data imbalance, stratigraphic structural complexity, and model structural differences, providing an important basis for understanding the applicability and limitations of each model and for further designing more robust ensemble strategies.

4.5.2. Per-Class Macro-F1 Distribution Box Plots for Different Models

Figure 16 presents macro-F1 bar charts for nine models, with per-class points and min/max error bars to show overall performance and class-wise dispersion. The Dynamic and Static Weighted Ensembles rank highest (around 68–70% macro-F1), and their three class points cluster relatively tightly, indicating strong overall accuracy with smaller inter-class gaps. LightGBM and XGBoost form the lowest tier (about 57–58% macro-F1) and exhibit the largest inter-class gap: Facies 2 is near 88–90%, while Facies 1 drops to roughly 5–8%, showing high sensitivity to class imbalance. Transformer, Random Forest, CatBoost, and CNN-BiLSTM sit in the mid range (about 62–67% macro-F1), where Facies 1 remains notably lower and drives much of the dispersion. Extra Trees is also mid-range, but its class spread is comparatively smaller (Facies 1 in the mid-40% range), suggesting better balance than XGBoost/LightGBM. Overall, ensemble strategies deliver the best macro-level accuracy and class balance, while boosting and some deep models show strong performance on certain classes but struggle on the minority class.

5. Conclusions

Addressing critical challenges in clastic formations—specifically the significant overlapping log responses, evident inter-well distribution shifts, and difficulties in identifying transition zones—this study proposes a dual-driven intelligent lithology identification framework composed of feature enhancement and dynamic ensemble learning. At the feature level, a high-dimensional feature system with improved geological consistency was constructed by integrating multi-scale statistical descriptors, micro-texture structures, and petrophysical indicators. This design effectively enhances the separability of sandstone, siltstone, and mudstone in the feature space, particularly under complex depositional conditions.

At the model level, a Dynamic Confidence-Weighted Ensemble (DCWE) mechanism was introduced, combining global performance priors with sample-level confidence posteriors. Unlike traditional static ensemble strategies, DCWE enables adaptive, sample-wise weight adjustment based on prediction uncertainty, thereby improving robustness to heterogeneous formations and local distribution shifts. Under strict LOGO testing conditions, the proposed framework demonstrated superior cross-well generalization and stability across multiple evaluation metrics. The maximum classification accuracy reached 84.58%, while Macro-F1, MCC, and Kappa scores consistently outperformed both individual models and static ensemble baselines. Moreover, the method maintained strong geological consistency in lithological transition zones and intervals exhibiting pronounced inter-well variability, highlighting its potential for practical well log interpretation tasks.

In addition to predictive performance, the sensitivity of depth-wise sequence models to training sample scale was carefully considered. Experimental results indicate that the proposed framework preserves stable performance under limited sample conditions due to its ensemble-driven uncertainty adaptation, while performance degradation under extreme data scarcity remains gradual rather than abrupt. This behavior suggests favorable robustness characteristics for real-world scenarios where labeled well data are often sparse or unevenly distributed.

From an engineering perspective, the proposed framework exhibits practical scalability and feasibility for field applications. Although the dynamic confidence evaluation introduces additional computation during inference, the overall complexity remains manageable due to the parallelizable structure of base learners and the absence of iterative post-processing. With appropriate model lightweighting and inference optimization, the framework can be adapted for near-real-time deployment. Nevertheless, practical constraints such as on-site computational resources, data transmission latency, and environmental limitations in field operations must be considered when applying the method in logging-while-drilling or real-time interpretation scenarios.

Despite its favorable performance, the effectiveness of DCWE remains influenced by the overall quality and diversity of the base learner pool, and local inconsistencies may still arise in intervals where all base models exhibit weak discriminative capability. Future research will focus on enhancing robustness and efficiency through several directions: (1) incorporating advanced uncertainty estimation techniques, Bayesian ensemble formulations, or Out-of-Distribution (OOD) detection to improve reliability in anomalous or highly uncertain sections; (2) extending the framework to multi-modal data fusion involving conventional logs, core measurements, and LWD data; and (3) integrating model compression, inference acceleration, and online updating mechanisms to further improve real-time usability. In addition, systematic evaluation of cross-basin and cross-block transferability will be essential to promote broader application across diverse geological settings.

Author Contributions

Methodology, Conceptualization, Software, Writing—original draft, K.C.; Writing—review & editing, Validation, G.Z.; Supervision, Software, F.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Jiangxi Engineering Technology Research Center of Nuclear Geoscience Data Science and System, East China University of Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://aicup.sinopec.com/competition/SINOPEC-01/data/ (accessed on 15 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Related Work—Physics-Informed & Sample-Aware Learning

Lithology identification from well logs is inherently challenged by the ambiguity of logging responses, where distinct lithologies may exhibit highly similar measurement signatures under complex geological conditions (the “same-log-response–different-lithology” phenomenon) [1,6,28,29]. This limitation motivates the incorporation of physical and geological prior knowledge into data-driven models to improve robustness and interpretability [6,10]. Recent studies have explored physics- and geology-informed learning for well-log interpretation through different strategies. One line of research integrates stratigraphic or geological constraints into neural network architectures or loss functions to enforce depth continuity and geological consistency across samples [13,14,21]. Another direction incorporates petrophysical relationships or physical bounds into learning frameworks, aiming to guide predictions toward physically plausible regimes [16,22]. More broadly, physics-guided and physics-informed learning have been proposed as general paradigms for embedding physical knowledge into machine learning models [41,42]. While these approaches can enhance interpretability and stabilize predictions, they often rely on simplified assumptions or empirical relationships and may suffer from reduced flexibility or over-smoothing in strongly heterogeneous reservoirs [6,10,16]. In parallel, ensemble learning has been widely adopted to improve generalization performance in lithology classification, particularly under class imbalance and inter-well distribution shifts [11,12,15,17,19]. The concept of sample-aware learning, in which model contributions are dynamically adjusted based on sample-specific characteristics such as prediction confidence or local difficulty, has been extensively studied in the ensemble learning literature. 
Representative works on dynamic classifier/ensemble selection demonstrate that confidence-driven weighting or selection can outperform static ensembles by adapting model contributions on a per-sample basis [43,44,45]. However, such methods are predominantly evaluated on generic machine-learning benchmarks and are rarely investigated in geophysical well-log interpretation scenarios characterized by strong depth dependence and cross-well heterogeneity, especially under LOGO evaluation settings. In this context, the present work does not claim conceptual novelty in physics-informed learning or sample-aware ensemble weighting. Instead, it focuses on adapting and integrating these established ideas into a unified framework tailored for lithology identification under a LOGO evaluation setting, emphasizing robustness and generalization across wells. To facilitate reproducibility and further research, the implementation of the proposed framework has been made publicly available at https://github.com/trongvanhoang378-debug/-Lithology-Identification/tree/main (accessed on 20 January 2026).

Table A1 reports the classification performance on Siltstone, which represents a majority class in the dataset. Most models achieve relatively stable performance on this facies, with overall accuracy values above 70%. Among all evaluated methods, the Dynamic Weighted Ensemble achieves a balanced trade-off between precision and recall, resulting in a competitive F1-score of 72.17%. Tree-based ensemble models generally perform well on this facies, while deep sequential models exhibit slightly lower but comparable performance, indicating that Siltstone can be reliably identified by multiple modeling paradigms.

Table A1. Classification performance on Siltstone.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)
Dynamic Weighted Ensemble	75.08	69.48	75.08	72.17
Static Weighted Ensemble	75.08	69.27	75.08	72.06
CatBoost	72.54	67.66	72.54	70.02
Random Forest	73.19	58.99	73.19	65.33
Transformer	64.22	68.64	64.22	66.35
CNN–BiLSTM	68.12	67.99	68.12	68.05
Extra Trees	72.61	58.55	72.61	64.83
LightGBM	75.99	66.21	75.99	70.77
XGBoost	76.71	64.15	76.71	69.87
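
The F1 values in Table A1 can be cross-checked against the precision and recall columns, since F1 is their harmonic mean. The sketch below recomputes F1 for three rows copied from the table; the helper name is hypothetical. In Tables A1 to A3 the Accuracy column coincides with Recall in every row, which would be consistent with per-class accuracy being the fraction of that class's samples that are correctly identified, although this definition is an inference rather than something stated here.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from Table A1.
table_a1 = {
    "Dynamic Weighted Ensemble": (69.48, 75.08, 72.17),
    "Static Weighted Ensemble": (69.27, 75.08, 72.06),
    "CatBoost": (67.66, 72.54, 70.02),
}

# Recompute F1 from the rounded precision and recall values.
recomputed = {name: f1_from_pr(p, r) for name, (p, r, _) in table_a1.items()}

# Largest absolute gap between recomputed and reported F1 (percentage points).
max_gap = max(abs(recomputed[name] - row[2]) for name, row in table_a1.items())
```

Because the table entries are rounded to two decimals, small gaps of about 0.01 percentage points between recomputed and reported values are expected.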

Table A2 presents the classification results for Sandstone, which is a clear minority class and exhibits the most challenging classification behavior. Overall performance on this facies is notably lower across all models, particularly in terms of recall. The Dynamic Weighted Ensemble achieves the highest F1-score (49.64%) among the compared methods, reflecting a more balanced handling of precision–recall trade-offs under severe class imbalance. These results highlight the intrinsic difficulty of accurately identifying Sandstone and motivate the use of ensemble strategies to mitigate majority-class bias.

Table A2. Classification performance on Sandstone.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)
Dynamic Weighted Ensemble	46.58	53.12	46.58	49.64
Static Weighted Ensemble	41.78	55.96	41.78	47.84
CatBoost	28.77	72.41	28.77	41.18
Random Forest	40.41	58.42	40.41	47.77
Transformer	64.38	31.13	64.38	41.96
CNN–BiLSTM	64.38	24.23	64.38	35.21
Extra Trees	45.89	34.72	45.89	39.53
LightGBM	4.79	77.78	4.79	9.03
XGBoost	4.11	100.00	4.11	7.89

Table A3 summarizes the classification performance on Mudstone, for which most models demonstrate consistently strong results. Accuracy and F1-scores are generally above 85%, indicating that this facies exhibits more distinguishable feature patterns. The Dynamic Weighted Ensemble remains competitive with an F1-score of 88.89%, comparable to both tree-based and deep learning models. This consistency suggests that the ensemble does not degrade performance on well-separated facies while improving robustness on more challenging classes.

Table A3. Classification performance on Mudstone.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)
Dynamic Weighted Ensemble	87.73	90.08	87.73	88.89
Static Weighted Ensemble	87.83	89.86	87.83	88.84
CatBoost	87.93	88.43	87.93	88.18
Random Forest	81.39	88.46	81.39	84.78
Transformer	87.95	89.20	87.95	88.58
CNN–BiLSTM	85.57	91.05	85.57	88.23
Extra Trees	80.16	89.29	80.16	84.48
LightGBM	87.46	89.44	87.46	88.44
XGBoost	86.04	89.58	86.04	87.77

Table A4 reports the validation performance of all models when depth-related features are removed, serving as an ablation study to evaluate the contribution of depth encoding. Performance degradation is observed across all models, confirming that depth information provides useful contextual cues. Nevertheless, the Dynamic Weighted Ensemble maintains the highest F1-score (0.6492) and consistently outperforms individual base models in terms of Kappa and MCC, indicating that the proposed DCWE mechanism retains strong robustness even in the absence of explicit depth features.

Table A4. Validation performance of each model without depth features.

Model	F1	Accuracy	Kappa	MCC
Dynamic Weighted Ensemble	0.6492	0.8402	0.6185	0.6190
Static Weighted Ensemble	0.6444	0.8374	0.6139	0.6141
Transformer	0.6347	0.8350	0.6071	0.6115
Random Forest	0.6281	0.7901	0.5301	0.5330
Extra Trees	0.5997	0.7618	0.4875	0.4934
CNN-BiLSTM	0.5885	0.8084	0.5485	0.5514
CatBoost	0.5513	0.8178	0.5814	0.5841
LightGBM	0.5509	0.8286	0.5989	0.6002
XGBoost	0.5361	0.8126	0.5687	0.5713
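
For readers who wish to reproduce the agreement-based metrics reported in Tables A4 to A7, the following sketch computes Cohen's Kappa and the multi-class Matthews correlation coefficient (MCC) from a confusion matrix using their standard definitions. The confusion matrix is hypothetical and is not taken from the experiments; rows are assumed to be true classes and columns predicted classes.

```python
from math import sqrt
from typing import List, Tuple


def kappa_and_mcc(cm: List[List[int]]) -> Tuple[float, float]:
    """Cohen's Kappa and multi-class MCC from a square confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    s = sum(sum(row) for row in cm)                           # total samples
    c = sum(cm[k][k] for k in range(n))                       # correctly classified
    t = [sum(cm[k]) for k in range(n)]                        # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # predicted count per class
    tp = sum(tk * pk for tk, pk in zip(t, p))

    # Kappa = (p_o - p_e) / (1 - p_e) with p_o = c / s and p_e = tp / s^2.
    kappa_den = s * s - tp
    kappa = (c * s - tp) / kappa_den if kappa_den else 0.0

    # Multi-class MCC (Gorodkin's generalisation).
    mcc_den = sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    mcc = (c * s - tp) / mcc_den if mcc_den else 0.0
    return kappa, mcc


# Hypothetical 3-class confusion matrix (Siltstone, Sandstone, Mudstone).
example_cm = [
    [50, 5, 20],
    [10, 15, 5],
    [15, 2, 180],
]
kappa, mcc = kappa_and_mcc(example_cm)
accuracy = sum(example_cm[k][k] for k in range(3)) / sum(map(sum, example_cm))
```

Both metrics correct raw accuracy for agreement expected by chance, which is why they sit well below accuracy when one class dominates, as in the tables above.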

Table A5 reports the validation performance of all models when geological features are excluded, serving as an ablation to assess the contribution of geology-informed feature representations. A consistent performance degradation is observed across all models compared to the full feature configuration, indicating that geological features provide meaningful discriminative information. Despite the removal of geological features, the Dynamic Weighted Ensemble remains the top-performing method in terms of F1-score (0.6382) and maintains stable Kappa and MCC values, suggesting that the DCWE mechanism preserves robustness even under reduced feature expressiveness.

Table A5. Validation performance of each model without geological features.

Model	F1	Accuracy	Kappa	MCC
Dynamic Weighted Ensemble	0.6382	0.8058	0.5591	0.5603
Static Weighted Ensemble	0.6355	0.8040	0.5530	0.5538
CatBoost	0.6176	0.7917	0.5257	0.5263
CNN-BiLSTM	0.6169	0.8054	0.5484	0.5532
Transformer	0.5944	0.7753	0.5126	0.5176
Random Forest	0.5854	0.7461	0.4475	0.4509
LightGBM	0.5806	0.8049	0.5484	0.5487
XGBoost	0.5704	0.8059	0.5586	0.5603
Extra Trees	0.5667	0.7458	0.4529	0.4571

Table A6 isolates the contribution of multiscale statistical features by removing them while retaining geological information. Compared to the full feature setting, a noticeable reduction in performance is observed, though the degradation is less severe than in the joint removal scenario shown in Table A7. This comparison suggests that multiscale statistical features play an important role in enhancing classification performance, particularly when combined with geological features. The Dynamic Weighted Ensemble again achieves the highest F1-score (0.6706) among all evaluated models, indicating that the benefits of multiscale representations are consistently leveraged by the proposed ensemble framework.

Table A6. Validation performance of each model without multiscale statistical features.

Model	F1	Accuracy	Kappa	MCC
Dynamic Weighted Ensemble	0.6706	0.7879	0.5272	0.5320
Static Weighted Ensemble	0.6609	0.7833	0.5189	0.5246
CNN-BiLSTM	0.6530	0.7967	0.5490	0.5552
Transformer	0.6427	0.7898	0.5327	0.5345
CatBoost	0.6403	0.7680	0.4913	0.4990
Random Forest	0.6372	0.7487	0.4677	0.4824
Extra Trees	0.6106	0.7360	0.4489	0.4622
XGBoost	0.5159	0.7550	0.4688	0.4799
LightGBM	0.5109	0.7510	0.4588	0.4696

Table A7 presents a more aggressive ablation setting, in which both geological features and multiscale statistical features are removed. This configuration leads to a further and systematic decline in performance across all evaluation metrics and models. The monotonic decrease in F1-score and agreement-based metrics (Kappa and MCC) compared to Table A5 demonstrates that multiscale statistical features contribute additional complementary information beyond geological descriptors alone. Nevertheless, the Dynamic Weighted Ensemble retains the highest F1-score (0.6297), although CNN-BiLSTM is marginally higher in accuracy, Kappa, and MCC, which suggests that the ensemble strategy remains resilient under increasingly constrained feature conditions.

Table A7. Validation performance of each model without multiscale statistical and geological features.

Model	F1	Accuracy	Kappa	MCC
Dynamic Weighted Ensemble	0.6297	0.7639	0.4892	0.4943
Static Weighted Ensemble	0.6252	0.7642	0.4892	0.4939
CNN-BiLSTM	0.5978	0.7646	0.4944	0.4993
CatBoost	0.5877	0.7527	0.4629	0.4666
Random Forest	0.5629	0.6964	0.3898	0.4036
Transformer	0.5512	0.7247	0.4441	0.4561
XGBoost	0.5441	0.7553	0.4638	0.4700
Extra Trees	0.5422	0.6927	0.3800	0.3916
LightGBM	0.5384	0.7606	0.4700	0.4745

References

1. Shi, H.; Xu, Z.; Lin, P.; Ma, W. Refined lithology identification: Methodology, challenges and prospects. Geoenergy Sci. Eng. 2023, 231, 212382.
2. Lin, X.; Yin, S. Lithology identification based on interpretability integration learning. Earth Sci. Inform. 2023, 16, 2211–2222.
3. Hao, R.; Huang, W.; Bo, J.; Yuan, L. Fractal characteristics and main controlling factors of high-quality tight sandstone reservoirs in the southeastern Ordos Basin. J. Earth Sci. 2024, 35, 631–641.
4. Wang, Z.; Zhang, Y.; Li, Y.; Sun, L.; He, X. Microscopic pore–throat architecture and fractal heterogeneity in tight sandstone reservoirs. Appl. Sci. 2025, 15, 5730.
5. Zhao, H.; Zhang, M.; Li, Q.; Wang, R.; Xu, J. Study on rock fracture mechanism and hydraulic fracturing propagation law of heterogeneous tight sandstone reservoir. PLoS ONE 2024, 19, e0303251.
6. Lai, J.; Wang, G.; Fan, Q.; Zhao, F.; Zhao, X.; Li, Y.; Zhao, Y.; Pang, X. Toward the scientific interpretation of geophysical well logs: Typical misunderstandings and countermeasures. Surv. Geophys. 2023, 44, 463–494.
7. You, F.; Liu, G.; Zhang, Y.; Ren, Z. Logging identification and distribution characteristics of high-gamma sandstone interlayers in Chang 7 Member of the Ordos Basin. Pet. Geol. Exp. 2023, 45, 99–108.
8. Wang, Y.; Yagola, A.G.; Gelius, L.J. Advances in geophysical inverse problems. Front. Earth Sci. 2023, 11, 1258335.
9. Rodriguez, O.; Taylor, J.M.; Pardo, D. Multimodal variational autoencoder for inverse problems in geophysics. Geophys. J. Int. 2023, 235, 2598–2613.
10. Mukherjee, B.; Kar, S.; Sain, K. Machine learning assisted state-of-the-art of petrographic classification from geophysical logs. Pure Appl. Geophys. 2024, 181, 2839–2871.
11. Zhang, J.; He, Y.; Zhang, Y.; Liu, T.; Li, D. Well-logging-based lithology classification using machine learning algorithms. Energies 2022, 15, 3675.
12. Han, R.; Wang, Z.; Wang, W.; Xu, F.; Qi, X.; Cui, Y. Lithology identification of igneous rocks based on XGBoost and conventional logging curves. J. Appl. Geophys. 2021, 195, 104480.
13. Su, H.; Zhang, Y.; Li, X.; Wang, G.; Liang, W. Lithology identification from well-log curves via neural networks with stratigraphic constraints. Geophysics 2021, 86, IM85–IM97.
14. Chen, L.; Wang, X.; Liu, Z. Geological information-driven deep learning for lithology identification from well logs. Front. Earth Sci. 2025, 13, 1662760.
15. Shi, Y.; Liao, J.; Gan, L.; Tang, R. Lithofacies prediction from well log data based on deep learning. Appl. Sci. 2024, 14, 8195.
16. Pothana, P.; Ling, K. Physics-integrated neural networks for improved mineral volumes and porosity estimation from geophysical well logs. Energy Geosci. 2025, 6, 100410.
17. Sun, Y.; Pang, S.; Qiu, Z.; Zhang, Y. Efficient lithology classification from small-sample well logging data processed by wavelet thresholding algorithm: Integrating meta-learning with self-attention mechanism model. Geoenergy Sci. Eng. 2024, 246, 213629.
18. Yan, Z.; Du, H.; Gang, K.; Zhang, L.; Chen, Y.-C. Dynamic weighted selective ensemble learning algorithm for imbalanced data streams. J. Supercomput. 2022, 78, 5394–5419.
19. Wood, D.A. Optimized feature selection assists lithofacies machine learning with sparse well log data combined with calculated attributes in a gradational fluvial sequence. Artif. Intell. Geosci. 2022, 3, 132–147.
20. Dong, S.; Wang, L.; Zeng, L.; Du, X.; Ji, C.; Hao, J.; Xu, Y.; Li, H. Fracture identification in reservoirs using well log data by window-sliding recurrent neural network. Geoenergy Sci. Eng. 2023, 230, 212165.
21. Jiang, C.; Zhang, D. Lithofacies identification from well-logging curves via integrating prior knowledge into deep learning. Geophysics 2024, 89, D31–D41.
22. Zhang, J.; Liu, G.; Wei, Z.; Li, S.; Zayier, Y.; Cheng, Y. Machine-learning-based prediction of well logs guided by rock physics and its interpretation. Sensors 2025, 25, 836.
23. Bestagini, P.; Lipari, V.; Tubaro, S. A machine learning approach for facies classification using well logs. In Proceedings of the SEG Technical Program Expanded Abstracts, Houston, TX, USA, 24–29 September 2017; pp. 2137–2142.
24. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46, 234–240.
25. Hinnov, L.A. Cyclostratigraphy and its revolutionizing applications in the earth and planetary sciences. Geol. Soc. Am. Bull. 2013, 125, 1703–1734.
26. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Long Beach, CA, USA, 2017; Volume 30, pp. 5998–6008.
27. Davis, J.C. Statistics and Data Analysis in Geology, 3rd ed.; John Wiley & Sons: New York, NY, USA, 2002.
28. Ellis, D.V.; Singer, J.M. Well Logging for Earth Scientists, 2nd ed.; Springer: Dordrecht, The Netherlands, 2007.
29. Asquith, G.B.; Krygowski, D. Basic Well Log Analysis; AAPG: Tulsa, OK, USA, 2004.
30. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
31. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
32. Zar, J.H. Biostatistical Analysis, 5th ed.; Pearson: Upper Saddle River, NJ, USA, 2010.
33. Ke, G.; Meng, Q.; Finley, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 3146–3154.
34. Varoquaux, G.; Buitinck, L.; Louppe, G.; Grisel, O.; Pedregosa, F.; Mueller, A. Scikit-learn: Machine learning without learning the machinery. GetMobile 2015, 21, 29–33.
35. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
36. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
37. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46.
38. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. J. Open Source Softw. 2018, 3, 861.
39. Liu, N.; Zhang, Z.; Zhang, H.; Wang, Z.; Gao, J.; Liu, R.; Zhang, N. Multiple-frequency attribute blending via adaptive uniform manifold approximation and projection and its application on hydrocarbon reservoir delineation. Geophysics 2024, 89, WA195–WA206.
40. Hua, Y.; Gao, G.; He, D.; Wang, G.; Liu, W. Reservoir fluid identification based on multi-head attention with UMAP. Geoenergy Sci. Eng. 2024, 238, 212888.
41. Daw, A.; Karpatne, A.; Watkins, W.; Read, J.; Kumar, V. Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. arXiv 2017, arXiv:1710.11431.
42. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
43. Ko, A.H.R.; Sabourin, R.; Britto, A.S. From dynamic classifier selection to dynamic ensemble selection. Pattern Recognit. 2008, 41, 1718–1731.
44. Nguyen, T.T.; Luong, A.V.; Dang, M.T.; Liew, A.W.-C.; McCall, J. Ensemble selection based on classifier’s confidence in prediction. Pattern Recognit. 2020, 100, 107104.
45. Cruz, R.M.O.; Hafemann, L.G.; Sabourin, R.; Cavalcanti, G.D.C. DESlib: A Dynamic ensemble selection library in Python. J. Mach. Learn. Res. 2020, 21, 1–5.

Figure 1. Label Distribution Barchart.

Figure 2. Mean Comparison by Label.

Figure 3. Median Comparison by Label.

Figure 4. Statistics Comparison by Label.

Figure 5. Lithofacies and logging curves of Well 298.

Figure 6. Lithofacies and logging curves of Well 146.

Figure 7. Lithofacies and logging curves of Well 2.

Figure 8. Lithofacies and logging curves of Well 2010.

Figure 9. Workflow of the proposed intelligent lithology identification method.

Figure 10. Feature Engineering Diagram.

Figure 11. Architecture Diagram.

Figure 12. UMAP plot.

Figure 13. Prediction of petrofacies and confidence distribution.

Figure 14. Comparison of multi-model lithofacies verification for Well 298.

Figure 15. Confusion Matrix.

Figure 16. Per-Class macro-F1 Distribution Across Models.

Table 1. Color Scheme for the Rock Types.

Lithofacies Type	Color	Lithofacies Type	Color	Lithofacies Type	Color
Siltstone (0)	Red	Sandstone (1)	Blue	Mudstone (2)	Green

Table 2. Ten representative features.

No.	Feature Name	Category (Section 3.1.2)	Feature Description
1	GR_roll_min_15	Sliding Window Extremum Feature	Extracts the local minimum of GR within a 15-point window to enhance the continuity identification of low-gamma sand bodies.
2	GR_grad	First-order Gradient Feature	Describes the abrupt gamma changes between adjacent measurement points to identify sand-mud boundaries.
3	DEPTH_normalized	Depth Normalization Feature	Imparts dimensionless attributes to depth, enabling the model to learn the relative positional relationships of sequences.
4	DEPTH_sin	Depth Positional Encoding (sin)	Maps linear depth to periodic signals to characterize the rhythmic formation changes.
5	GR_roll_q10_11	Multi-scale Quantile Feature	Extracts the 10% quantile of GR within an 11-point window to robustly characterize low-gamma anomalies.
6	GR_roll_iqr_9	Multi-scale Dispersion Feature (IQR)	Represents the dispersion of the GR value distribution within a 9-point window to capture local curve variation patterns.
7	GR_zscore_19	Local Standardization Feature	Reflects the deviation relative to the local window mean and variance to detect anomalous intervals.
8	AC_pct_12	Relative Rate of Change Feature	Captures relative changes in Acoustic Transit Time (AC) to identify porosity enhancement or tightening zones.
9	sandstone_indicator	Petrophysics-derived Feature	A sandstone probability indicator constructed based on the joint GR–SP–AC response to improve physical interpretability.
10	GR_SP_product	Curve Combination Feature (GR–SP Joint)	Represents the complementary relationship between Gamma Ray and Spontaneous Potential to strengthen the separability of muddy and sandy layers.
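
Several of the feature types listed in Table 2 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the feature generation engine of the paper: windows are assumed to be centered and truncated at the ends of the log, the gradient is a backward difference with respect to depth, the sinusoidal period is a hypothetical value, and GR_SP_product is assumed to be a plain elementwise product of the raw curves. The toy log values and the 3-point window are also hypothetical.

```python
import math
from typing import List, Sequence


def rolling_min(values: Sequence[float], window: int) -> List[float]:
    """Centered rolling minimum; the window shrinks at both ends of the log."""
    half = window // 2
    return [min(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def first_gradient(values: Sequence[float], depth: Sequence[float]) -> List[float]:
    """Backward difference quotient d(value)/d(depth); the first sample is set to 0.0."""
    grads = [0.0]
    for i in range(1, len(values)):
        grads.append((values[i] - values[i - 1]) / (depth[i] - depth[i - 1]))
    return grads


def normalize_depth(depth: Sequence[float]) -> List[float]:
    """Min-max scale depth to [0, 1] within one well."""
    lo, hi = min(depth), max(depth)
    return [(d - lo) / (hi - lo) for d in depth]


def depth_sin(depth: Sequence[float], period: float) -> List[float]:
    """Sinusoidal encoding of depth; `period` is a hypothetical choice (same unit as depth)."""
    return [math.sin(2.0 * math.pi * d / period) for d in depth]


def elementwise_product(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Pointwise product of two curves sampled at the same depths."""
    return [x * y for x, y in zip(a, b)]


# Hypothetical toy log: eight samples at a 0.125 depth-unit spacing.
depth = [1000.0 + 0.125 * i for i in range(8)]
gr = [95.0, 90.0, 60.0, 42.0, 40.0, 45.0, 80.0, 98.0]
sp = [-20.0, -22.0, -45.0, -60.0, -62.0, -58.0, -30.0, -18.0]

features = {
    "GR_roll_min_3": rolling_min(gr, 3),  # Table 2 lists a 15-point window
    "GR_grad": first_gradient(gr, depth),
    "DEPTH_normalized": normalize_depth(depth),
    "DEPTH_sin": depth_sin(depth, period=10.0),
    "GR_SP_product": elementwise_product(gr, sp),
}
```

The quantile, interquartile-range, and local z-score features in Table 2 follow the same windowing pattern with a different statistic applied to each window.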

Table 3. Validation performance of each model on Well 298 using original versus engineered features.

Model	Acc (Original)	Acc (Feature)	Macro F1 (Original)	Macro F1 (Feature)	MCC (Original)	MCC (Feature)	Kappa (Original)	Kappa (Feature)
DCWE (Ours)	0.6927	0.8439	0.4970	0.7033	0.3175	0.6373	0.3164	0.6369
Static Ensemble	0.5771	0.8393	0.4444	0.6787	0.2922	0.6262	0.2641	0.6258
Transformer	0.5976	0.8291	0.4431	0.6741	0.3006	0.6133	0.2765	0.6123
LightGBM	0.4764	0.8277	0.3814	0.5684	0.2125	0.5979	0.1769	0.5965
XGBoost	0.4321	0.8256	0.3601	0.5732	0.1736	0.5967	0.1363	0.5950
CatBoost	0.4244	0.8178	0.3552	0.6455	0.1787	0.5778	0.1390	0.5768
CNN-BiLSTM	0.5314	0.8132	0.4208	0.6380	0.2507	0.5782	0.2188	0.5772
Random Forest	0.5116	0.7908	0.3996	0.6601	0.2049	0.5374	0.1785	0.5336
Extra Trees	0.4672	0.7689	0.3811	0.6191	0.2090	0.5042	0.1710	0.4986

Table 4. Performance Comparison.

Model	Accuracy (Mean)	Macro F1 (Mean)	MCC (Mean)	Kappa (Mean)
DCWE (Ours)	0.8186	0.6661	0.5774	0.5733
Static Ensemble	0.8175	0.5932	0.5649	0.5601
Transformer	0.8099	0.5893	0.5644	0.5618
LightGBM	0.8026	0.5163	0.5321	0.5233
CatBoost	0.7982	0.5313	0.5222	0.5147
CNN-BiLSTM	0.7940	0.6302	0.5263	0.5185
Extra Trees	0.7869	0.5716	0.5067	0.4955
Random Forest	0.7813	0.5645	0.4935	0.4845
XGBoost	0.7775	0.5254	0.5292	0.5213

Table 5. Comparison of Generalization Abilities.

Model	Accuracy (Range)	Macro F1 (Range)	MCC (Mean)	Kappa (Mean)
DCWE (Ours)	0.7905–0.8458	0.5119–0.7832	0.5774	0.5733
Static Ensemble	0.7905–0.8452	0.5106–0.6787	0.5649	0.5601
Transformer	0.7550–0.8638	0.5290–0.6741	0.5644	0.5618
LightGBM	0.7617–0.8277	0.4967–0.5684	0.5321	0.5233
CatBoost	0.7673–0.8272	0.4847–0.6455	0.5222	0.5147
CNN-BiLSTM	0.7123–0.8711	0.5391–0.7736	0.5263	0.5185
Extra Trees	0.6958–0.8877	0.4587–0.6633	0.5067	0.4955
Random Forest	0.7313–0.8638	0.4779–0.6601	0.4935	0.4845
XGBoost	0.7015–0.8256	0.4911–0.5732	0.5292	0.5213
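
The means and ranges in Tables 4 and 5 summarise per-well results obtained under Leave-One-Group-Out (LOGO) validation. The sketch below shows one way such folds and summaries could be produced, with each well treated as a group. The well labels are taken from the captions of Figures 5 to 8, but the sample assignment and the fold accuracies are hypothetical and are not values from the experiments.

```python
from statistics import mean
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_group_out(groups: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices), one fold per distinct group."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


def summarize(fold_scores: Sequence[float]) -> Dict[str, float]:
    """Mean and range of a metric across folds."""
    return {"mean": mean(fold_scores), "min": min(fold_scores), "max": max(fold_scores)}


# Hypothetical well label for each depth sample.
wells = ["298", "298", "146", "146", "146", "2", "2010", "2010"]
folds = list(leave_one_group_out(wells))

# Hypothetical per-fold accuracies, one per held-out well (not values from the paper).
fold_accuracy = [0.80, 0.82, 0.79, 0.84]
summary = summarize(fold_accuracy)
```

Holding out an entire well per fold ensures that no samples from the test well leak into training, which is what makes the reported ranges a measure of cross-well generalization.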

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Chen, K.; Zhong, G.; Diao, F. Clastic Rock Lithology Identification Based on Multivariate Feature Enhancement and Dynamic Confidence-Weighted Ensemble. Appl. Sci. 2026, 16, 1808. https://doi.org/10.3390/app16041808

