Optimization of Gas Production Using Machine Learning Modeling of Geological Core Facies and Monte Carlo Simulation: Application in the Permian, Southwest Kansas

Amosu, Adewale; Reyes, Martin; Sibaweihi, Najmudeen; Koray, Abdul-Muaizz; Appiah Kubi, Emmanuel; Gyimah, Emmanuel; Agyei, Emmanuel; Ampomah, William

doi:10.3390/app16052436


by Adewale Amosu ¹,*, Martin Reyes ², Najmudeen Sibaweihi ¹, Abdul-Muaizz Koray ³, Emmanuel Appiah Kubi ³, Emmanuel Gyimah ², Emmanuel Agyei ¹,³ and William Ampomah ¹,³

¹ Petroleum Recovery Research Center, New Mexico Tech, Socorro, NM 87801, USA

² New Mexico Bureau of Geology, New Mexico Tech, Socorro, NM 87801, USA

³ Department of Petroleum and Natural Gas Engineering, New Mexico Tech, Socorro, NM 87801, USA

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(5), 2436; https://doi.org/10.3390/app16052436

Submission received: 9 January 2026 / Revised: 12 February 2026 / Accepted: 27 February 2026 / Published: 3 March 2026

(This article belongs to the Special Issue Applications of Artificial Intelligence and Big Data Analytics in Petroleum Engineering)


Abstract

The Panoma Field in the Hugoton Embayment, Kansas, has produced significant gas resources from thousands of wells perforating the Permian Chase and Council Grove Groups. Variability in gas production from these formations is controlled by facies-influenced petrophysical properties. The use of geological facies data in numerical modeling is often limited to delineating regions of interest, without intrinsic use in estimating petrophysical properties. Machine learning provides opportunities to integrate facies data into the numerical model-building process. In this study, we use facies data to optimize a numerical-model permeability matrix scaling parameter through Monte Carlo simulation of Markov Switching Dynamic Regression and machine learning. Realizations of the scaling parameter are included in a machine learning facies prediction workflow to identify the parameter that maximizes facies prediction accuracy, with test accuracy as high as 83%. A 3D numerical model was constructed to represent the interlayered carbonate, shale, and non-marine sandstone facies typical of the Council Grove intervals. Multiple field development and completion scenarios were evaluated to maximize cumulative gas recovery and assess the role of facies distribution on reservoir performance. History matching of historical gas production demonstrates strong coupling between facies distribution and the optimized permeability, emphasizing the importance of facies data integration in reservoir property modeling and gas production estimation in Permian reservoirs. This implies that probabilistically constrained permeability scaling using the Monte Carlo and machine learning workflow produces more realistic models than traditional approaches.

Keywords:

machine learning; facies; Artificial Intelligence; Monte Carlo; Markov Switching Dynamic Regression; Permian

1. Introduction

Gas was first discovered in the Hugoton Area, Kansas, in 1922; the area has since become one of the largest sources of natural gas in the United States [1]. Because of its prolific gas production, the area has been the subject of many projects and studies [2,3,4,5,6,7,8,9,10]. Yale and Jamieson [11] measured static and dynamic mechanical properties on core from the Chase and Council Grove carbonate sequences in the Hugoton and Panoma fields to quantify how mechanical behavior varies with lithofacies and porosity and to calibrate dynamic estimates to the more representative static values. Dubois et al. [5] produced practical facies classification datasets and demonstrated neural network methods to predict lithofacies in uncored wells for the Council Grove interval. Dubois et al. [12] established a multiscale workflow linking pore-scale petrophysics to field-scale models for the Permian Hugoton Field. Sorenson [13] developed a basin-scale dynamic model for the Permian Panhandle Hugoton region that integrates stratigraphy, structure, burial history, fluid flow, and diagenesis to explain the origin and distribution of hydrocarbons in the giant Hugoton Panhandle gas system. The study demonstrated that reservoir quality and gas accumulation are fundamentally controlled by the interaction of depositional facies with post-depositional processes, particularly differential burial, thermal maturation, and long-term hydrodynamic flow. Halotel et al. [10] demonstrated that incorporating geologically derived features, such as stratigraphic context, spatial relationships, and expert-defined attributes, substantially improves the performance, robustness, and interpretability of machine learning facies classification compared to using well log data alone.

Geological facies record lithology, texture, and the history of depositional processes. The spatial arrangement of facies is a fundamental control on petrophysical properties and fluid distribution in sedimentary reservoirs. Accurate representation of facies architecture in reservoir models improves the upscaled property distributions used in simulation, history matching, and uncertainty quantification, and thereby informs decisions about production strategies.

Several studies have examined the influence of geological facies on petrophysical models and numerical modeling. The inclusion of facies in model building is augmented by the application of machine learning and geostatistical methods. Machine learning provides tools to discover patterns, structures, and relationships in subsurface measurements that conventional techniques might not easily uncover [14,15,16,17]. Sharma et al. [18] present a core-driven hierarchical facies modeling approach for shoreface environments that explicitly links sedimentological observations to reservoir-scale facies architecture. Their methodology organizes facies information across multiple hierarchical levels, beginning with high-resolution core-derived facies and stacking patterns, which are then upscaled and conditioned to well- and seismic-scale interpretations. Amosu et al. [19] address the challenge of optimizing well placement by accurately characterizing reservoir electrofacies and developing a robust machine learning workflow that integrates multiple unsupervised clustering techniques into a committee machine framework. Bui et al. [20] investigate the integration of a machine-learning-based workflow with conventional numerical reservoir simulation to improve oil recovery in sand–shale sequences and highly heterogeneous reservoirs. Ao et al. [21] presented an automated workflow for facies classification and its direct integration into reservoir simulation using Random Forest algorithms. The study focused on reducing interpreter subjectivity and improving the consistency of facies models by applying supervised machine learning to well log and petrophysical data. Random Forests were shown to effectively capture nonlinear relationships between input attributes and facies types, providing robust classification performance even with limited training data. Milad et al. [22] demonstrated the application of machine learning techniques to predict geological facies in complex carbonate reservoirs, where strong heterogeneity and multiscale depositional and diagenetic processes challenge conventional interpretation methods. Using well log-derived inputs, the authors trained supervised learning models to classify carbonate facies and evaluated their ability to capture nonlinear relationships between petrophysical responses and geological facies types. Multipoint statistical methods are well suited to facies modeling. Zhang et al. [23] presented an optimized multipoint geostatistical facies modeling approach that replaces the explicit storage and sequential scanning of multipoint statistics with a deep feedforward neural network trained on patterns extracted from a training image. Strebelle [24] introduces a geostatistical framework for the conditional simulation of complex geological structures using multiple-point statistics, addressing fundamental limitations of traditional two-point variogram-based methods.

Machine learning has been routinely integrated into history matching and reservoir simulation. Koray et al. [25] present a machine-learning-driven framework for reservoir characterization and numerical modeling using integrated well-log and core data, addressing challenges associated with data heterogeneity. The study uses machine learning techniques to identify lithofacies and predict key petrophysical properties by leveraging core measurements and continuous well log responses to improve the representation of reservoir heterogeneity. Demyanov et al. [26] investigate how geological feature selection can be systematically integrated into reservoir modeling and history matching using multiple kernel learning. The study found that multiple kernel learning provides an effective, data-driven way to identify and weight the most influential geological features that control dynamic reservoir response. Zhang et al. [27] introduced a fused generative adversarial network to address key challenges in reservoir history matching, particularly the efficient integration of heterogeneous data and the reduction in computational cost associated with traditional iterative workflows. Canchumuni et al. [28] explored the use of deep learning as a surrogate modeling framework for geological facies modeling and reservoir history matching, aiming to overcome the high computational cost of conventional simulation-based approaches. Park et al. [29] address how to quantify uncertainty in facies-based reservoir models when there are multiple discrete geological interpretations available and only limited production data. Their workflow uses the production data to estimate the posterior probability of each training image, reject those that are inconsistent with the observed production responses, and perform history matching within each retained image using a geologically realistic perturbation method to obtain posterior facies realizations and flow forecast uncertainty.

In this study, a novel machine learning workflow is developed for optimizing petrophysical properties in numerical model building. A Monte Carlo Simulation of Markov Switching Dynamic Regression is combined with machine learning to optimize a permeability scaling parameter with the goal of optimizing gas production. Several supervised machine learning classifiers were evaluated for facies prediction using input features such as gamma-ray response, resistivity, porosity, depositional history, and a permeability matrix scaling parameter. To account for uncertainty in permeability, multiple realizations of the matrix scaling parameter were generated via Monte Carlo simulation and incorporated as machine learning features. For each depth sample in each well, the matrix scaling parameter yielding the highest facies prediction accuracy was selected, resulting in an optimized permeability scaling parameter. This optimized permeability matrix scaling parameter was then propagated into a three-dimensional petrophysical model, and model performance was quantitatively assessed through history matching against observed production data.

2. Geological Setting

The Panoma Gas Field (Figure 1) is part of the Hugoton Area, a giant gas-producing region in southwest Kansas that forms the northern shelf, or embayment, of the much larger and deeper Anadarko Basin. The Anadarko Basin Province encompasses a small portion of southeastern Colorado and the northernmost part of Texas, with most of the province situated in northwestern Oklahoma and southwestern Kansas [30,31]. The Anadarko Basin is bounded by the Nemaha uplift to the east, the Arbuckle Mountains and Ardmore Basin to the southeast, and the Amarillo–Wichita uplift to the southwest [32]. The province contains a thick Paleozoic record of ~40,000 ft with a south-dipping asymmetric shape, reaching its maximum depocenter proximal to the Amarillo–Wichita uplift, and a thinner sequence prograding to the southeast at the Hugoton Embayment, where the basement is reached at less than ~5000 ft depth [31]. The Hugoton Embayment is a cratonward expression of the northern shelf of the Anadarko Basin. It is delimited by the Central Kansas uplift and the Cambridge arch to the northeast and by the Las Animas arch to the northwest [32,33].

The tectonic configuration of the Anadarko Basin significantly influenced sedimentation, basin development, geometry, depositional environments, and other features that controlled economic prospectivity [30]. The present configuration of the Anadarko Basin appeared during the early Pennsylvanian [32]. Initially, the basin province underwent rifting, magmatism, and igneous emplacement during the Precambrian and Early–Middle Cambrian periods [34,35]. From the Late Cambrian to the Mississippian, during the development of the initial Oklahoma aulacogen, an epeirogenic phase was accompanied by an epicontinental sea and marine sedimentation [32]. Early Cambrian subsidence in the intracratonic rift increased depositional rates near the present-day basin axis; subsidence slowed by the beginning of the Silurian and Devonian, as indicated by the thin coeval deposits [32].

The Oklahoma aulacogen was fragmented, allowing the formation of uplifts and major basins, such as the Amarillo–Wichita uplift and the Anadarko Basin, during the Pennsylvanian period due to compressional events associated with the Ouachita–Marathon orogeny [32]. The structural development of the Anadarko basin began in the latest Mississippian time and intensified during the Pennsylvanian in response to continental collision along the southern margin of Paleozoic North America [35].

A cratonic uplift episode began during the Permian and persists to the present. During this episode, the Anadarko Basin Province was filled with Permian red beds, evaporites, and carbonates, as well as post-Permian deposits; however, most of the post-Permian strata have since been eroded from the province [32].

Stratigraphy of the Study Area

The Anadarko Basin exhibits a pronounced south–north gradient in accommodation, reflecting its asymmetric flexural geometry and proximity to basement-involved uplifts. Maximum Paleozoic thicknesses accumulate along the southern basin axis near the Amarillo–Wichita uplift [30,32], whereas strata thin markedly toward the craton as subsidence decreases across the Anadarko Shelf [31]. The Hugoton Embayment represents the shallow, northern shelfal extension of this system, characterized by gently dipping Paleozoic successions, widespread carbonate platform development, and a systematic transition from deeper basinal facies to mixed carbonate–siliciclastic shelf environments [30]. These south-to-north lithostratigraphic variations reflect the tectono-stratigraphic partitioning of the Anadarko Basin during Paleozoic basin evolution.

In southwestern Kansas, where the Hugoton Embayment is located, the basement beneath the embayment is composed principally of Proterozoic crystalline rocks, such as granite and granitic gneiss, which belong to the Western Granite–Rhyolite province [30]. These terranes were later affected by Mesoproterozoic intrusions and by localized Cambrian rift-related magmatism associated with the Southern Oklahoma aulacogen [31]. This basement setting controlled basin geometry and subsidence toward the northern Anadarko Basin, giving rise to the Anadarko Shelf and the Hugoton Embayment [30].

During the Late Cambrian, the Anadarko Basin province was characterized by widespread shallow-marine deposition during a prolonged epeirogenic phase, resulting in a thick, carbonate-dominated stratigraphic succession that extends well beyond the basin margins [30,32]. Sedimentation began with a regional transgression over a low-relief erosion surface, followed by continuous shallow limestone deposition, which reached 6000 ft in thickness in the depocenter and thins northward, with truncation along the northern flank within the Hugoton Embayment [32]. From the Middle Ordovician to the early Mississippian, shallow-marine carbonates with minor siliciclastics from eastern sources accumulated across the basin, reaching a thickness of ~500–4000 ft. Deposition was interrupted by two broad epeirogenic uplifts that generated major unconformities [30,32].

Mississippian strata are composed of limestones and dolomitized carbonates, representing widespread carbonate platform development and regional marine connectivity across the Midcontinent. This sedimentation reflects a transition from Late Devonian–Early Mississippian euxinic conditions to well-oxygenated shallow-marine environments, with shale content increasing toward the southern Anadarko Basin. Total Mississippian thickness decreases from approximately 3000 ft in the basin depocenter to 1000 ft along the northern shelf [32,34].

Pennsylvanian deposits along the western side of the Hugoton Embayment consist of clastic sequences of red siltstones, shales, and sandstones. Their cyclicity reflects sea-level fluctuations, producing shallow-marine carbonates interbedded with siliciclastic deposits that thin gradually toward the northern shelf. Shale deposits are thinner and sparser because of the limited accommodation space in the Hugoton Embayment [32]. The late epeirogenic event of the Anadarko Basin province started in the Permian and remained active through the Holocene. The Permian strata within the Hugoton Embayment are characterized by carbonates, red beds, and evaporites, reflecting restricted marine environments and arid climatic conditions [30,31].

During the Early Permian, the Anadarko Basin was structurally inactive [35], as indicated by the lack of thickness variation in Early Permian units such as the Council Grove and Chase Groups (Figure 2). In contrast, Late Permian units show less uniform, more irregular thicknesses owing to non-deposition, uplift, tilting, and erosion attributed to the Laramide Orogeny [30]. According to Johnson [32], the Laramide Orogeny produced slight eastward tilting of the Anadarko Basin. A modest reactivation of older basin faults may have occurred during the Jurassic and Cretaceous periods, and minor Holocene movement has been documented along the Meers fault of the Wichita Mountain uplift, as well as possibly along other faults within the basin [35].

3. Data and Methodology

3.1. Data Description

The open-source data [5,6] used in this study consist of well data from nine wells penetrating the Council Grove Group (Grenola Limestone to the Speiser Shale) in the Panoma Gas Field and Hugoton Area. The rocks of the Panoma field are composed of varying quantities of four mineral constituents (calcite, dolomite, quartz, and clay), and the relative ratios of these minerals define the fundamental rock type. The dataset includes logging measurements labeled with nine facies types derived from core analysis (Table 1), identified from rock type and texture. The logs consist of gamma ray (GR), deep induction resistivity (ILD), photoelectric factor (PE), the difference between neutron and density porosity (DeltaPHI), and the average of neutron and density porosity (PHIND), together with a manual lithofacies interpretation. The data also include two geologically derived features: a marine/non-marine indicator (NM_M) and the relative position (RelPos) in the stratigraphic cycle. Approximately 80% of the facies have marine depositional environments, and 20% have non-marine depositional environments. Figure 3 shows a correlation section from three wells constructed using the described data. The data and well location information are used to build a 3D numerical model.

Exploratory data analysis was implemented to understand the structure, patterns, and characteristics of the data. Data preprocessing is crucial at this stage because it improves data quality and removes redundant and inconsistent data. The histogram in Figure 4 shows the univariate distribution of each variable, revealing its range (minimum to maximum), skewness, and missing values; this guides data normalization and helps prevent biased model training. The heatmap in Figure 5 and the cross plot in Figure 6 both reveal the rock physics relationships between variables: the heatmap quantifies each pairwise relationship with a numeric value, whereas the cross plot shows visually whether a relationship is positive, negative, or absent. The main goal is to ensure that the data fed into a machine learning model are robust, representative, and primed for optimal performance.
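The normalization and heatmap steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the study's preprocessing code: the log values are hypothetical, standardization is assumed to be a z-score using the population standard deviation, and the heatmap values are assumed to be Pearson correlation coefficients.

```python
import math

def zscore(values):
    """Standardize a list to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def pearson(a, b):
    """Pearson correlation: the mean product of the two standardized series."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients keyed by (feature, feature), i.e. the heatmap cells."""
    names = list(columns)
    return {(p, q): pearson(columns[p], columns[q]) for p in names for q in names}

# Hypothetical log samples (illustrative values only, not from the study dataset)
logs = {
    "GR": [77.5, 78.3, 79.1, 86.1, 74.6, 73.0],
    "PE": [4.6, 4.1, 3.6, 3.5, 3.4, 3.6],
    "PHIND": [11.9, 12.6, 13.1, 13.0, 13.3, 13.4],
}
corr = correlation_matrix(logs)
for (p, q), r in sorted(corr.items()):
    if p < q:  # print each off-diagonal pair once
        print(f"{p:>5} vs {q:<5} r = {r:+.2f}")
```

Standardizing each log before distance-based methods such as k-means and KNN keeps high-magnitude curves (for example GR) from dominating the distance calculation.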

3.2. Workflow Description

This work involves integrating machine learning techniques and Monte Carlo simulation to model petrophysical properties with the aim of optimizing gas production (Figure 7). Clusters were generated using the k-means unsupervised machine learning (ML) algorithm. The clusters generated are distinct and can be visualized using the t-SNE (t-distributed Stochastic Neighbor Embedding) algorithm. Supervised classification algorithms were evaluated for facies prediction using a set of input features that included gamma-ray response, resistivity, porosity, depositional history, and a permeability matrix scaling parameter. To explicitly account for uncertainty in permeability scaling, a Monte Carlo simulation was used to generate multiple realizations of the matrix scaling parameter, which were incorporated as additional features in the machine learning models.

3.3. Unsupervised Facies Classification

K-Means Clustering

The k-means clustering technique is a widely used unsupervised machine learning method that partitions data into k distinct clusters by minimizing the within-cluster variance. It iteratively assigns each data point to the nearest cluster centroid under a distance metric and then updates each centroid as the mean position of the points assigned to it. This assignment–update process continues until convergence, defined by negligible changes in centroid positions or cluster memberships. Although the classical k-means algorithm is sensitive to the initial placement of centroids, modern implementations frequently employ improved initialization strategies such as k-means++ to enhance convergence stability and clustering performance [36]. Recent studies [37,38] show that the technique continues to be improved in scalability, efficiency, and computational speed, which matters most when classifying relatively large, high-dimensional datasets.
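A minimal standard-library sketch of the iteration described above (Lloyd's algorithm with k-means++ seeding) follows. It is an illustration under stated assumptions, not the study's implementation: the two-dimensional points are hypothetical, and the study's seven features and seven clusters are not reproduced.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability proportional to D(x)^2."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centroid
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2)[0])
    return centroids

def kmeans(points, k, max_iter=100, tol=1e-12, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:  # converged: centroids no longer move
            break
    return labels, centroids

# Two hypothetical, well-separated groups of standardized log responses
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(pts, k=2)
print(labels)
```

Because the result depends on the random seeding, implementations commonly run several restarts and keep the solution with the lowest within-cluster variance.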

In this study, the k-means clustering method was applied to identify seven distinct clusters in the dataset (Figure 8) using the features GR, ILD, PE, DeltaPHI, PHIND, NM_M, and RelPos. Clustering was based on similarities between these features, without predefined data labels. Increasing the number of clusters generates overlapping clusters, suggesting similarities between some facies. Although simple in operation, the k-means technique is effective and widely adopted because of its robustness and computational efficiency [39,40].

3.4. Supervised Facies Classification

The study used various supervised machine learning methods to classify facies from the well log data. This approach is effective in heterogeneous reservoirs, where lithological changes produce complex, nonlinear petrophysical responses [41]. Several supervised machine learning algorithms were tested, and based on their properties and prediction accuracy, the following classification methods were implemented: Fine KNN, Fine Gaussian SVM, Bagged Tree Ensemble, and Wide Neural Network. KNN-based algorithms are generally more flexible than linear models and often outperform Naïve Bayes when facies overlap. Fine Gaussian SVM and Bagged Tree Ensemble are often more robust than standalone decision trees with respect to noise and outliers. Wide Neural Networks are generally more stable than deep neural networks, especially for large datasets. Section 3.4.1, Section 3.4.2, Section 3.4.3 and Section 3.4.4 below discuss the details of the methods. Table 1 shows a comparison of the methods.

3.4.1. Fine K-Nearest Neighbors (KNN)

Fine K-Nearest Neighbors is a non-parametric, instance-based learning algorithm that assigns a facies label to an unseen sample based on the dominant class among its nearest neighbors in feature space. For a given input vector, the predicted facies class is given by

\hat{y} = \arg\max_{c} \sum_{i \in N_{k}(x)} I(y_{i} = c)

(1)

where N_k(x) denotes the set of the k nearest training samples to x and I(·) is the indicator function.

Euclidean distance was used as the similarity metric, as shown in the equation below:

d (x_{i}, x_{j}) = \sqrt{\sum_{m = 1}^{p} {(x_{i m} - x_{j m})}^{2}}

(2)

In this study, the fine KNN algorithm was adopted to classify facies from the input features and to capture variations in lithology. Because the fine KNN classifier is sensitive to noise and outliers, the input data must be preprocessed to boost performance [42].
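Equations (1) and (2) can be sketched directly in standard-library Python. This is a minimal illustration, not the study's classifier: the training samples and facies codes are hypothetical, and k = 1 is assumed as the "fine" setting (to our understanding this matches the Fine KNN preset in MATLAB's Classification Learner, but the study's exact settings are not stated here).

```python
import math
from collections import Counter

def euclidean(a, b):
    """Eq. (2): Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_x, train_y, x, k=1):
    """Eq. (1): the most frequent facies label among the k nearest training samples.

    Vote ties go to the tied label whose member lies closest to x.
    """
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], x))
    nearest = order[:k]
    votes = Counter(train_y[i] for i in nearest)
    top = max(votes.values())
    for i in nearest:  # nearest-first, so the closest tied label wins
        if votes[train_y[i]] == top:
            return train_y[i]

# Hypothetical standardized (GR, PE) samples with hypothetical facies codes
train_x = [(-1.0, 0.5), (-0.9, 0.4), (1.2, -0.8), (1.0, -1.1)]
train_y = ["SS", "SS", "WS", "WS"]
print(knn_predict(train_x, train_y, (-0.8, 0.6)))       # -> SS (k = 1)
print(knn_predict(train_x, train_y, (0.9, -0.9), k=3))  # -> WS (two of three neighbours)
```

With k = 1 the decision boundary follows individual training samples closely, which is why noisy or unscaled inputs affect this classifier strongly.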

3.4.2. Fine Gaussian SVM (FGS)

Fine Gaussian SVM is a supervised classification method that applies a support vector machine with a Gaussian radial basis function (RBF) kernel to separate classes that are not linearly separable in the original feature space [43]. It operates on the maximum-margin principle: the model learns a separating hyperplane that maximizes the margin between classes while allowing controlled misclassifications through a soft-margin penalty [44]. The kernel replaces dot products with Gaussian similarities, which implicitly maps the data into a higher-dimensional space where separation becomes feasible; the resulting convex optimization problem is then solved to identify the support vectors and their coefficients, and a new sample is classified by the sign of the decision function. In a fine Gaussian configuration, the kernel width (kernel scale) is set relatively small, or equivalently the RBF coefficient γ is relatively large, so similarity decays quickly with distance and the decision boundary can change locally across the feature space. The designation "Fine" refers to this small kernel scale, which sharply limits the influence of each support vector to its immediate vicinity. This localization generates an intricate, highly flexible decision boundary, giving the model significant capacity to classify highly nonlinear datasets [43,45].
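The effect of the kernel scale can be shown with a small standard-library sketch of the RBF kernel and a binary SVM decision function. It assumes the parameterization exp(-||x - z||^2 / s^2) for kernel scale s (so γ = 1/s^2); the support vectors and coefficients are hypothetical rather than fitted, the training step (the convex optimization) is omitted, and multi-class facies prediction would combine several such binary classifiers (for example one-vs-one).

```python
import math

def rbf_kernel(x, z, scale):
    """Gaussian kernel with kernel scale s: exp(-||x - z||^2 / s^2).

    This equals exp(-gamma * ||x - z||^2) with gamma = 1 / s^2, so a smaller
    scale (larger gamma) makes similarity decay faster with distance.
    """
    d2 = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-d2 / scale ** 2)

def svm_decision(x, support_vectors, dual_coefs, bias, scale):
    """Binary SVM decision value f(x) = sum_i c_i * K(sv_i, x) + b.

    Each c_i stands for the product alpha_i * y_i that a trained solver would
    return; the predicted class is the sign of f(x).
    """
    return bias + sum(c * rbf_kernel(sv, x, scale)
                      for sv, c in zip(support_vectors, dual_coefs))

# Hypothetical support vectors and coefficients (not fitted to any data)
svs = [(0.0, 0.0), (2.0, 0.0)]
coefs = [1.0, -1.0]
for scale in (0.5, 4.0):  # "fine" (small scale) versus coarse (large scale)
    near = rbf_kernel((0.0, 0.0), (1.0, 0.0), scale)
    far = svm_decision((10.0, 0.0), svs, coefs, 0.0, scale)
    print(f"scale={scale}: K at distance 1 = {near:.3f}, f at a distant point = {far:.2e}")
```

With the small scale, a point far from every support vector receives a decision value of essentially zero, which is the localized behavior the paragraph attributes to the "fine" configuration.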

3.4.3. Bagged Tree Ensemble (BTE)

The Bagged Tree Ensemble classification algorithm uses the bootstrap aggregation (bagging) technique, in which multiple decision trees are trained on randomly resampled subsets of the training dataset. For each tree T_b, a bootstrap sample D_b is generated with replacement. The final prediction is obtained through majority voting, as shown in the equation below.

\hat{y} = \operatorname{mode}\{T_{1}(x), T_{2}(x), \dots, T_{B}(x)\}

(3)

Bagging reduces model variance and helps mitigate overfitting, which is particularly important for noisy and heterogeneous well log datasets [46]. Because each tree is trained on a different bootstrap replica of the dataset, the ensemble captures different aspects of the relationship between facies and well log data, which improves generalization. In practice, the bagged tree implementation follows standard ensemble formulations: deep decision trees are combined to model the nonlinear interactions and threshold behavior commonly observed at lithological boundaries, while the out-of-bag samples provide an efficient estimate of predictive performance and predictor importance.
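The bootstrap, majority-vote (Eq. (3)), and out-of-bag steps can be sketched in standard-library Python. To keep the example short, each deep tree is replaced by a one-split decision stump, so this is a simplified illustration of bagging rather than the study's ensemble; the data and facies labels are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split decision stump, a deliberately shallow stand-in for a deep tree.

    Every (feature, threshold) pair is tried; the split with the fewest training
    errors is kept, predicting the majority label on each side.
    """
    best = None
    for f in range(len(xs[0])):
        for thr in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= thr]
            right = [y for x, y in zip(xs, ys) if x[f] > thr]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = sum(y != left_lab for y in left) + sum(y != right_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_lab, right_lab)
    _, f, thr, left_lab, right_lab = best
    return lambda x: left_lab if x[f] <= thr else right_lab

class BaggedStumps:
    """Bootstrap aggregation of stumps with majority voting, as in Eq. (3)."""

    def __init__(self, n_trees=25, seed=0):
        self.n_trees = n_trees
        self.rng = random.Random(seed)

    def fit(self, xs, ys):
        n = len(xs)
        self.trees, self.in_bag = [], []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample D_b, with replacement
            self.trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
            self.in_bag.append(set(idx))
        return self

    def predict(self, x):
        """Eq. (3): the mode of T_1(x), ..., T_B(x)."""
        return Counter(tree(x) for tree in self.trees).most_common(1)[0][0]

    def oob_accuracy(self, xs, ys):
        """Score each sample only with the trees whose bootstrap sample left it out."""
        correct = scored = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            votes = Counter(t(x) for t, bag in zip(self.trees, self.in_bag) if i not in bag)
            if votes:
                scored += 1
                correct += votes.most_common(1)[0][0] == y
        return correct / scored if scored else float("nan")

# Hypothetical one-feature data: two facies separated along a single log response
xs = [(0.1,), (0.2,), (0.3,), (0.4,), (1.1,), (1.2,), (1.3,), (1.4,)]
ys = ["A"] * 4 + ["B"] * 4
model = BaggedStumps(n_trees=25, seed=1).fit(xs, ys)
print(model.predict((0.05,)), model.predict((1.5,)), round(model.oob_accuracy(xs, ys), 2))
```

Each bootstrap sample leaves out roughly a third of the training samples on average (about e^-1 ≈ 0.37 of them), which is what makes the out-of-bag estimate available without a separate validation set.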

3.4.4. Wide Neural Network (WNN)

The Wide Neural Network comprises a single hidden layer with a large number of neurons. The network output is expressed as

\hat{y} = g\left(\sum_{j=1}^{H} v_{j}\, \sigma(\omega_{j} x + b_{j})\right)

(4)

where σ(·) is the activation function, H is the number of hidden neurons, v_j and ω_j are learnable weight parameters, b_j is the bias term, and g(·) is the output activation function. The model was trained with backpropagation to minimize a regularized categorical cross-entropy loss, which improves generalization and reduces overfitting [47]. The wide architecture enables the network to learn a rich set of functions from well log inputs, helping it capture interactions among multiple petrophysical attributes. The wide neural network is particularly effective for facies classification in heterogeneous reservoirs, where log responses overlap and lithological transitions produce only small changes in the logs.
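The forward pass of Eq. (4) and the cross-entropy loss can be sketched as follows. This is a minimal standard-library illustration under assumptions not stated in the text: σ is taken as ReLU, g as softmax, and H = 100 hidden neurons (which we understand to be the size of MATLAB's Wide Neural Network preset); the weights and input are random and hypothetical, and backpropagation and regularization are omitted.

```python
import math
import random

def relu(z):
    """Rectified linear activation, assumed here as sigma."""
    return max(0.0, z)

def softmax(scores):
    """Output activation g: converts class scores into probabilities."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def wide_forward(x, W, b, V):
    """Eq. (4): y_hat = g(sum_j v_j * sigma(w_j . x + b_j)).

    W holds H input-weight vectors w_j, b the H biases, and V the H output-weight
    vectors v_j (one entry per facies class).
    """
    hidden = [relu(sum(wi * xi for wi, xi in zip(w_j, x)) + b_j) for w_j, b_j in zip(W, b)]
    n_classes = len(V[0])
    scores = [sum(h * v_j[c] for h, v_j in zip(hidden, V)) for c in range(n_classes)]
    return softmax(scores)

def cross_entropy(probs, true_class):
    """Categorical cross-entropy for one sample whose true class index is true_class."""
    return -math.log(probs[true_class])

# Hypothetical sizes: 7 input logs, H = 100 hidden neurons, 9 facies classes
rng = random.Random(0)
n_in, H, n_classes = 7, 100, 9
W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(H)]
b = [0.0] * H
V = [[rng.gauss(0.0, 0.1) for _ in range(n_classes)] for _ in range(H)]
x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]  # one standardized, hypothetical sample
probs = wide_forward(x, W, b, V)
print(f"predicted class {probs.index(max(probs))}, loss if true class is 0: {cross_entropy(probs, 0):.3f}")
```

Training would adjust W, b, and V by gradient descent on this loss averaged over the training samples, typically with a regularization penalty on the weights.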

4. Results and Discussion

4.1. Supervised Facies Classification Results

The methods described above were applied for facies classification based on the input features. During training, 5-fold cross-validation was implemented, and 10% of the data was set aside for testing. Table 2 shows the test accuracy of the methods. The performance of the methods was also examined using confusion matrices and the following metrics. The True Positive Rate (TPR), also called sensitivity or recall, quantifies the proportion of instances belonging to the positive class that are correctly identified, reflecting each algorithm's ability to capture relevant signals in the data. The False Negative Rate (FNR), equal to 1 − TPR, represents the fraction of positive instances that are incorrectly labeled as negative. The Positive Predictive Value (PPV), or precision, measures the proportion of instances predicted as positive that are truly positive, and the False Discovery Rate (FDR), equal to 1 − PPV, captures the proportion of predicted positive instances that are false positives. Figure 9, Figure 10, Figure 11 and Figure 12 show the confusion matrices, TPR, and FNR for the KNN, FGS, BTE, and WNN methods, respectively. Figure 13, Figure 14, Figure 15 and Figure 16 show the confusion matrices, PPV, and FDR for the KNN, FGS, BTE, and WNN methods, respectively. The KNN classifier performs best. Figure 17 shows the Shapley importance [48] for the best-performing model (KNN); the marine/non-marine indicator has the highest impact on facies prediction. In general, a high TPR accompanied by a low FNR indicates that a model correctly identifies a clear majority of facies. Likewise, a high PPV coupled with a low FDR indicates that the model's predictions are reliable and contain few misclassifications; a high PPV implies strong precision and a low prevalence of false positives. Together, these metrics demonstrate that the classifier exhibits strong discriminative capability in identifying the facies with high confidence.
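The per-class metrics defined above follow directly from a confusion matrix. The sketch below assumes rows index the true class and columns the predicted class; the three-facies counts are hypothetical and are not the study's results.

```python
def per_class_metrics(conf, classes):
    """TPR, FNR, PPV, and FDR for each class from a confusion matrix.

    conf[i][j] counts samples whose true class is i and predicted class is j
    (rows = true, columns = predicted).
    """
    metrics = {}
    for k, name in enumerate(classes):
        tp = conf[k][k]
        actual = sum(conf[k])                    # row total: samples truly in class k
        predicted = sum(row[k] for row in conf)  # column total: samples predicted as k
        tpr = tp / actual if actual else float("nan")
        ppv = tp / predicted if predicted else float("nan")
        metrics[name] = {"TPR": tpr, "FNR": 1.0 - tpr, "PPV": ppv, "FDR": 1.0 - ppv}
    return metrics

def accuracy(conf):
    """Fraction of all samples on the diagonal (correctly classified)."""
    return sum(conf[i][i] for i in range(len(conf))) / sum(map(sum, conf))

# Hypothetical three-facies confusion matrix (counts are illustrative only)
classes = ["F1", "F2", "F3"]
conf = [[8, 2, 0],
        [1, 7, 2],
        [0, 1, 9]]
for name, m in per_class_metrics(conf, classes).items():
    print(name, {key: round(val, 3) for key, val in m.items()})
print("accuracy", round(accuracy(conf), 3))
```

TPR and FNR are row-wise quantities (normalized by the true class), whereas PPV and FDR are column-wise (normalized by the predicted class), which is why the two sets of figures summarize the same confusion matrix differently.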

As indicated by the Shapley values, the marine/non-marine indicator (NM_M) has the greatest influence on the prediction. NM_M is a depositional indicator that encodes transgressive and regressive information directly related to the facies. It allows differentiation between lithofacies with similar petrophysical characteristics but deposited in different environments [10], which explains its strong influence on facies prediction. The spatial distribution of facies represented by the Euclidean distance also plays an important role in the prediction, followed by the petrophysical properties.
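For readers unfamiliar with Shapley importance, the sketch below computes exact Shapley values for one sample using one common definition, in which a feature "absent" from a coalition is replaced by a baseline value. The study's Shapley estimator [48] may differ (for example by averaging over a background dataset or using an approximation), and the scoring function here is hypothetical; for a classifier it would be, for instance, the predicted score of the facies of interest.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of model output f at sample x.

    A feature that is 'absent' from a coalition is replaced by its baseline value.
    Cost grows as 2^n, which is manageable for a handful of log features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical scoring function with an interaction between features 0 and 2
def score(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]

x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(score, x, base)
print([round(p, 3) for p in phi])
```

The values sum to f(x) − f(baseline), and the interaction term is split equally between the two features involved, which is how a feature such as NM_M can receive credit both for its own effect and for its interactions with the logs.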

4.2. Monte Carlo Markov Switching Dynamic Regression

Because the KNN model gives the highest accuracy, it is used to obtain an optimal matrix scaling parameter. The initial scaling parameter for this dataset, described in [49], is given by

y = 0.9401\,x^{-0.7759}

(5)

where y is the matrix scaling parameter and x is the core plug permeability.
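Equation (5) can be evaluated directly. The sketch below reads it as the power law y = 0.9401 x^(−0.7759), assumes x is in mD, and assumes the scaled matrix permeability is the product y·x; the units and the way the parameter is applied in the simulator are our assumptions for illustration, not statements from the text.

```python
import math


def matrix_scaling_parameter(k_plug):
    """Initial matrix scaling parameter of Eq. (5): y = 0.9401 * x**(-0.7759).

    k_plug is the core plug permeability x (assumed here to be in mD).
    """
    if k_plug <= 0:
        raise ValueError("plug permeability must be positive")
    return 0.9401 * k_plug ** -0.7759


def scaled_permeability(k_plug):
    """Hypothetical application: multiply the plug permeability by y."""
    return matrix_scaling_parameter(k_plug) * k_plug


# Plug permeability at which y = 1 (no change): solve 0.9401 * x**(-0.7759) = 1.
crossover = math.exp(math.log(0.9401) / 0.7759)

for k in (0.01, 0.1, 1.0, 10.0):
    print(f"x = {k:6.2f}  y = {matrix_scaling_parameter(k):7.3f}  y*x = {scaled_permeability(k):7.4f}")
print(f"y > 1 for x below about {crossover:.3f}")
```

Under these assumptions the scaling raises permeability only for plug values below roughly 0.92 and lowers it above that value.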

A Monte Carlo simulation of a three-state Markov Switching Dynamic Regression (MSDR) model is used to generate perturbations of the scaling parameter by repeatedly sampling the stochastic components that govern regime evolution, parameter uncertainty, and dynamic innovations. The regression process has three latent regimes, each with its own set of regression coefficients, and transitions among regimes are governed by a first-order Markov chain with estimated transition probabilities. Simulations are drawn from a fully specified model, that is, one whose parameters have all been estimated from the data.

A Markov switching dynamic regression model describes the time-varying behavior of a quantity in the presence of structural breaks or shifts among multiple states, where the statistical characteristics of the quantity can differ between states. The regime can switch at any point in the sample. The switching mechanism is a discrete-time Markov chain over n designated states, governed by a probabilistic transition matrix.

P = \begin{bmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{bmatrix}, \qquad p_{ij} = \Pr(s_{t} = j \mid s_{t-1} = i), \qquad \sum_{j=1}^{n} p_{ij} = 1

(6)
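Two quantities that follow directly from the transition matrix in Eq. (6) help interpret a fitted regime model: the expected number of consecutive steps spent in regime i, 1/(1 − p_ii), and the long-run fraction of time in each regime, the stationary distribution π satisfying π = πP. The sketch below computes both for a hypothetical three-state matrix; the probabilities are illustrative, not the fitted values from this study.

```python
def expected_durations(P):
    """Expected consecutive steps spent in each regime before leaving: 1 / (1 - p_ii)."""
    return [float("inf") if P[i][i] >= 1.0 else 1.0 / (1.0 - P[i][i]) for i in range(len(P))]


def stationary_distribution(P, max_iter=10_000, tol=1e-12):
    """Long-run regime probabilities pi with pi = pi P, found by power iteration.

    Assumes the chain is irreducible and aperiodic so that the iteration converges.
    """
    n = len(P)
    if any(abs(sum(row) - 1.0) > 1e-9 for row in P):
        raise ValueError("each row of P must sum to 1")
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical three-regime transition matrix (rows: from-state, columns: to-state).
P = [[0.90, 0.05, 0.05],
     [0.10, 0.80, 0.10],
     [0.05, 0.15, 0.80]]
print("expected durations:", expected_durations(P))
print("stationary distribution:", stationary_distribution(P))
```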

Perturbations to the scaling parameter are modeled as

y_{t} \sim \begin{cases} f_{1}(y_{t}; x_{t}, \theta_{1}), & s_{t} = 1 \\ f_{2}(y_{t}; x_{t}, \theta_{2}), & s_{t} = 2 \\ \vdots & \vdots \\ f_{n}(y_{t}; x_{t}, \theta_{n}), & s_{t} = n \end{cases}

(7)

where f_i(y_t;x_t,θ_i) is the dynamic regression model of y_t in state i; s_t is the regime at time t; x_t is a vector of observed exogenous variables at time t; and θ_i is the collection of parameters of the submodel in state i.

The estimable parameters, such as the model coefficients, the state transition probability matrix, and the innovation covariance matrices, are fitted to the response data. The estimation procedure requires initial values (set arbitrarily) for all estimable parameters. Because the model initially contains unknown parameters, it is first set up as a partially specified model, which completely specifies the model structure and identifies which parameters are to be estimated. After estimation, the resulting fully specified model is used for simulation.

For each Monte Carlo iteration, a sequence of latent states is simulated according to the Markov transition matrix, after which regime-specific parameters and random disturbances are drawn from their respective distributions. The dynamic regression equation is then propagated forward on the simulated state path, producing a single synthetic realization of the process. This process is repeated multiple times to generate multiple realizations that represent stochastic variability within regimes. This ensemble provides a probabilistic representation of the system’s behavior, enabling uncertainty quantification, predictive inference, and assessment of regime-dependent dynamics under multiple plausible scenarios.
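A minimal pure-Python sketch of this Monte Carlo loop is shown below. It assumes each regime's submodel f_i in Eq. (7) is a first-order autoregression with one exogenous regressor, y_t = c_i + φ_i y_(t−1) + β_i x_t + σ_i ε_t with ε_t ~ N(0, 1). The regime parameters, transition matrix, and exogenous series are hypothetical placeholders, and drawing the regime parameters themselves from their estimation uncertainty is omitted for brevity.

```python
import random


def simulate_states(P, n_steps, s0, rng):
    """Simulate a regime path of length n_steps from transition matrix P, starting in s0."""
    states = [s0]
    for _ in range(n_steps - 1):
        row = P[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def simulate_msdr(P, params, x, y0, s0, rng):
    """One realization of a regime-switching AR(1) regression.

    params[i] = (c, phi, beta, sigma) for regime i;
    y_t = c + phi * y_{t-1} + beta * x_t + sigma * e_t, e_t ~ N(0, 1).
    """
    states = simulate_states(P, len(x), s0, rng)
    y, prev = [], y0
    for s, xt in zip(states, x):
        c, phi, beta, sigma = params[s]
        prev = c + phi * prev + beta * xt + sigma * rng.gauss(0.0, 1.0)
        y.append(prev)
    return states, y


def monte_carlo(P, params, x, y0, s0, n_realizations, seed=0):
    """Ensemble of independent realizations (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [simulate_msdr(P, params, x, y0, s0, rng)[1] for _ in range(n_realizations)]


# Hypothetical inputs: three regimes with different level, persistence and noise.
P = [[0.90, 0.05, 0.05],
     [0.10, 0.80, 0.10],
     [0.05, 0.15, 0.80]]
params = [(0.2, 0.8, 0.10, 0.05),
          (0.6, 0.5, 0.10, 0.10),
          (1.2, 0.3, 0.10, 0.20)]
x = [t / 100 for t in range(100)]  # placeholder exogenous series
ensemble = monte_carlo(P, params, x, y0=1.0, s0=0, n_realizations=20, seed=42)
print(len(ensemble), "realizations of length", len(ensemble[0]))
```

Seeding a dedicated `random.Random` instance keeps the ensemble reproducible, which matters when a particular realization is later selected and carried into the static model.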

To derive the final scaling parameter, each realization is used, together with the other input features, to predict facies for each facies group. Five-fold cross-validation is applied in the prediction to guard against overfitting, and the realization that maximizes facies prediction accuracy is selected for that facies group. Figure 18 shows a sample of the realizations together with the initial and final scaling parameters after the MSDR simulation process.
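The selection step can be sketched as follows: each candidate realization is appended as a feature column, a k-nearest-neighbour classifier is scored by 5-fold cross-validation, and the realization with the highest accuracy is kept. This is a minimal pure-Python stand-in, not the classifier configuration used in the study; k = 1, unstandardized features, and the synthetic labels and features are assumptions for illustration, and the loop would be repeated for each facies group.

```python
import random


def knn_predict(train_X, train_y, query, k=1):
    """Majority vote among the k training samples nearest to query (Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def cv_accuracy(X, y, n_folds=5, k=1, seed=0):
    """n-fold cross-validated accuracy of the KNN classifier."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
    return correct / len(X)


def select_realization(base_X, y, realizations, **cv_kwargs):
    """Append each candidate realization as an extra feature column and return
    (index, accuracy) of the realization with the highest cross-validated accuracy."""
    scores = []
    for r in realizations:
        X = [row + [value] for row, value in zip(base_X, r)]
        scores.append(cv_accuracy(X, y, **cv_kwargs))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Synthetic demo: realization 0 tracks the labels, realization 1 is pure noise.
rng = random.Random(7)
labels = [rng.randint(0, 1) for _ in range(100)]
base_X = [[rng.random()] for _ in labels]
realizations = [[float(lab) for lab in labels], [rng.random() for _ in labels]]
best, acc = select_realization(base_X, labels, realizations, n_folds=5, k=1)
print("selected realization:", best, "cv accuracy:", acc)
```

Because KNN relies on raw Euclidean distances, a real application would normally standardize the features first so that the scaling-parameter column does not dominate or vanish purely because of its units.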

4.3. Numerical Modeling

4.3.1. Static Model

Two 3D static numerical models were constructed in this study to characterize the Council Grove reservoir interval and to provide a consistent foundation for subsequent dynamic simulation. Both models share an identical reservoir geometry, grid architecture, stratigraphic framework, and fracture representation. Differences between the models arise solely from the scaling of permeability data. Previous studies in this area have commonly employed Cartesian grid-based static models for single-well or limited multi-well characterization. In contrast, this study extends prior work by implementing a corner-point gridding approach, which allows for flexible grid-node positioning and improved representation of structural and stratigraphic complexity in the Council Grove area. This approach enables direct incorporation of all available subsurface information, including well tops, well trajectories, spatial well distribution, and hydraulic fracture geometry. The use of a corner-point grid is particularly advantageous for layered carbonate–siliciclastic systems such as the Council Grove, where lateral facies transitions, vertical heterogeneity, and non-orthogonal geometries may not be adequately captured using uniform Cartesian grids.

This 3D geological framework was developed to represent the interlayered carbonate, shale, and non-marine sandstone facies characteristic of the Council Grove intervals. The grid resolution was chosen to capture the significant variations observed in the geological data. Facies-constrained petrophysical property models were generated for porosity, permeability, and water saturation, ensuring consistency between depositional facies architecture and rock property distributions. All gas-producing wells associated with a producing gas lease in this field were explicitly included in the static models. Incorporating the full well population allows well interference, pressure communication, and drainage patterns to be represented, thereby improving the reliability of both characterization and simulation results.

A planar hydraulic fracture was explicitly represented only in the region of the Rose Alexander ‘A’ 1 well using local grid refinement, with the fracture oriented in the J-direction. This refinement was critical for resolving steep pressure gradients and accurately modeling fracture–matrix flow exchange, which strongly influences production behavior in low-permeability reservoirs. The innermost fracture-grid planes were assigned a width of 2 ft to preserve equivalent fracture conductivity, while grid-block dimensions normal to the fracture plane increased logarithmically away from the fracture zone. This logarithmic expansion provided a smooth numerical transition from fracture-dominated to matrix-dominated flow while minimizing numerical dispersion. The primary fracture width was set to 1 ft, with an intrinsic fracture permeability of 200 mD. Vertically, the fracture extended 83 ft above and below the fracture origin (166 ft in total), enabling hydraulic communication across multiple reservoir layers and reflecting observed field-scale fracture behavior.
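The grid geometry described above can be sketched numerically. The helper below assumes a geometric (logarithmic) growth of cell widths from the 2 ft innermost plane out to a hypothetical total extent on each side of the fracture, and it preserves fracture conductivity by keeping the product of permeability and width constant, k_eq·w_cell = k_f·w_f. That conductivity rule is a common convention for representing a thin fracture in a wider cell and is assumed here rather than confirmed by the text; the 500 ft extent and 10 cells are placeholders.

```python
def geometric_widths(first, total, n, tol=1e-12):
    """n cell widths growing by a constant ratio r > 1, starting at first and summing to total."""
    if total <= first * n:
        raise ValueError("total must exceed n * first for widths that grow away from the fracture")

    def span(r):  # sum of the geometric series first * (1 + r + ... + r**(n-1))
        return first * (r ** n - 1.0) / (r - 1.0)

    lo, hi = 1.0 + 1e-9, 2.0
    while span(hi) < total:
        hi *= 2.0
    while hi - lo > tol:  # bisection on the growth ratio
        mid = 0.5 * (lo + hi)
        if span(mid) < total:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    return [first * r ** i for i in range(n)]


def equivalent_permeability(k_frac, w_frac, w_cell):
    """Cell permeability that keeps fracture conductivity k * w unchanged."""
    return k_frac * w_frac / w_cell


widths = geometric_widths(first=2.0, total=500.0, n=10)  # 500 ft and 10 cells are hypothetical
k_eq = equivalent_permeability(k_frac=200.0, w_frac=1.0, w_cell=2.0)  # mD, values from the text
fracture_height = 2 * 83.0  # ft: 83 ft above and below the fracture origin
print([round(w, 1) for w in widths], k_eq, fracture_height)
```

With the values from the text, a 1 ft, 200 mD fracture placed in a 2 ft cell corresponds to an equivalent cell permeability of 100 mD under this convention.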

The first static model (Figure 19b) was populated with permeability distributions derived directly from core plug measurements. The second static model (Figure 19a) is a modified version of the core-plug-based model in which the facies-derived permeability matrix scaling parameter was applied to account for scale discrepancies between core plug and whole-core measurements. This scaling parameter was generated through comprehensive uncertainty and sensitivity analysis using a Monte Carlo simulation framework coupled with a Markov Switching Dynamic Regression process. The modified model generally shows higher permeability values than the core-plug-based model, with an average RMSE of 21 mD between the two. This approach captures nonstationary permeability behavior and transition regimes that are not resolved at the core plug scale. Reservoir rock compressibility was included in both models using a reference pressure of 250 psi and a rock compressibility of 3 × 10⁻⁶ psi⁻¹ to ensure a realistic pressure–volume response during depletion. Horizontal permeability was assumed to be isotropic, while vertical permeability was set to 10% of horizontal permeability, consistent with the laminated nature of the carbonate–siliciclastic depositional systems in the study area.
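The remaining rock-property settings reduce to simple arithmetic, sketched below with the values from the text. The linear pore-volume multiplier 1 + c_r(p − p_ref) is one common convention and is assumed here; some simulators use higher-order or exponential forms. The RMSE helper illustrates how a cell-by-cell difference between the two permeability models could be summarized; the short arrays are hypothetical, not the model grids.

```python
import math


def vertical_permeability(k_h, kv_kh_ratio=0.10):
    """Vertical permeability as a fixed fraction (10% in the text) of horizontal permeability."""
    return kv_kh_ratio * k_h


def pore_volume_multiplier(p, p_ref=250.0, c_r=3e-6):
    """Linearized pore-volume multiplier; p and p_ref in psi, c_r in 1/psi."""
    return 1.0 + c_r * (p - p_ref)


def rmse(a, b):
    """Root-mean-square difference between two equally sized property arrays."""
    if len(a) != len(b) or not a:
        raise ValueError("arrays must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical cell permeabilities (mD) from the plug-based and scaled models.
k_plug_model = [0.5, 2.0, 10.0, 40.0]
k_scaled_model = [3.1, 8.5, 27.0, 61.0]
print(vertical_permeability(10.0), pore_volume_multiplier(50.0), rmse(k_plug_model, k_scaled_model))
```

With c_r = 3 × 10⁻⁶ psi⁻¹, depleting from 250 psi to 50 psi shrinks pore volume by only about 0.06%, so compressibility mainly affects the pressure response rather than the stored gas volume.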

4.3.2. Simulation and History Matching

Previous history-matching efforts in the Council Grove interval have largely relied on manual calibration strategies, most commonly through ad hoc adjustment of a global matrix or fracture permeability scaling parameter. Both single-well and multi-well simulation studies reported a persistent mismatch between simulator-calculated and historically recorded gas production, particularly for the Alexander wells. In a single-well study focused on Alexander D2, simulator-predicted production consistently fell short of historical data even when fracture permeability was increased by as much as eight times its base value. These studies concluded that the modeled system appeared to be strongly matrix-limited, with low plug-scale matrix permeability restricting deliverability. In addition, vertically stacked, saturated non-marine siltstone layers were interpreted to inhibit vertical gas migration toward the wellbore, further limiting effective drainage and contributing to the observed production mismatch.

Similarly, a multi-well characterization and simulation study demonstrated that plug-derived permeability distributions were sufficient to history-match Chase Group production but were inadequate for the Council Grove reservoir, particularly at the Alexander D2 well [49]. Even after applying selective permeability scaling parameters, increasing limestone permeability by factors of up to four, and extending hydraulic fractures into additional stratigraphic intervals, simulator-calculated production remained substantially lower than observed field performance. These outcomes highlighted the limitations of this permeability scaling approach and emphasized the strong scale dependence of permeability, with core plug measurements systematically underestimating effective flow capacity relative to whole-core and flow-based estimates.

In contrast, history matching in this study was performed using a machine-learning-assisted workflow designed to accelerate model calibration while reducing non-uniqueness in the resulting parameter sets. Rather than relying on manual tuning, a permeability matrix scaling parameter was first derived from a systematic comparison of whole-core and core-plug permeability measurements. This scaling parameter was then subjected to comprehensive uncertainty and sensitivity analysis using a Monte Carlo simulation framework coupled with a Markov Switching Dynamic Regression process. The approach explicitly captures regime-dependent permeability behavior and transition states that reflect changes in flow mechanisms and effective connectivity across scales, and it provides a statistically robust alternative to deterministic adjustment of the scaling parameter.

History matching results from the unmodified static model populated solely with core-plug-derived permeability demonstrate that simulator-predicted production generally underestimates historical gas rates and cumulative production, as reported in previous studies (Figure 20). The only exception is the Shankle 2-9 well, for which an excellent production match was achieved without any permeability modification. In addition, the base model successfully reproduced the late-time production behavior for the Nolan 1, Shrimplin Gas Unit, and Newby 2-28R wells, indicating that plug-scale permeability may be adequate for certain wells and flow regimes, particularly during boundary-dominated or depletion-dominated phases. However, when the machine-learning- and Monte-Carlo-assisted permeability modification workflow was applied, a very good match between simulated and historical gas production was achieved for all wells in the study area (Figure 21). The calibrated models simultaneously reproduce early-time transient behavior, peak production, and long-term decline trends without requiring unrealistic fracture property inflation or manual parameter manipulation. These results demonstrate that permeability scaling informed by whole-core data and constrained through probabilistic, data-driven methods provides a more physically consistent and transferable history-matching solution than traditional manual approaches. These history-matching results confirm earlier observations that plug-scale permeability alone may be insufficient to represent effective reservoir flow capacity in the Council Grove. However, unlike prior studies, the methodology presented here resolves this limitation through a systematic, reproducible, and uncertainty-aware workflow, yielding robust production matches across both single-well and multi-well settings while preserving the geological and petrophysical nature of the area.

5. Conclusions

Using a comprehensive dataset of well logs, core measurements, facies, and depositional environment information, this study developed an integrated three-dimensional modeling framework in which facies-constrained petrophysical property models were constructed and evaluated. A machine learning workflow was applied to define clusters and facies groupings. Uncertainty in permeability scaling was assessed through a Monte Carlo simulation of a Markov switching dynamic regression process applied to a permeability matrix scaling parameter. Multiple realizations of this scaling parameter were generated and incorporated as machine learning features to propagate uncertainty into the modeling workflow. Field development and completion scenarios were subsequently assessed, and historical gas production was history-matched to mitigate non-uniqueness in model responses. With the Monte-Carlo-assisted permeability modification workflow, simulated gas production matches historical production very well, indicating that permeability scaling constrained through a probabilistic, data-driven method provides a more physically consistent and transferable solution than traditional approaches. This study defines a framework and workflow for using facies data in numerical model building and reservoir optimization in future work. One limitation in extending the workflow to other areas is that it requires comprehensive core-derived facies data, which are often sparse.

Author Contributions

Conceptualization, A.A., M.R. and N.S.; methodology, A.A., M.R., N.S., A.-M.K., E.A.K., E.G., E.A. and W.A.; software, A.A., M.R., N.S., A.-M.K., E.A.K., E.G., E.A. and W.A.; validation, N.S., E.A. and W.A.; formal analysis, A.-M.K., E.A.K., E.G., E.A. and W.A.; investigation, A.A., M.R., N.S., A.-M.K., E.A.K., E.G., E.A. and W.A.; resources, A.A., M.R., N.S., A.-M.K., E.A.K., E.G., E.A. and W.A.; data curation, A.A., M.R., N.S., A.-M.K., E.A.K., E.G., E.A. and W.A.; writing—original draft preparation, A.A., M.R., N.S., A.-M.K., E.A.K., E.G., E.A. and W.A.; writing—review and editing, A.A., M.R., N.S., A.-M.K., E.A.K., E.G., E.A. and W.A.; visualization, A.A., M.R., N.S., A.-M.K., E.A.K., E.G., E.A. and W.A.; supervision, A.A., M.R., N.S., E.G. and W.A.; project administration, A.A., M.R., N.S., E.G. and W.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is publicly available from Dubois et al. (2006) [5] and Hall (2016) [6].

Conflicts of Interest

The authors declare no conflicts of interest.

References

Skelton, L.H. A Brief History of the Kansas Oil and Gas Industry. Oil-Ind. Hist. 2006, 7. Available online: https://archives.datapages.com/data/phi/v7_2006/skelton.htm (accessed on 9 January 2026).
Byrnes, A.P.; Dubois, M.K.; Magnuson, M. Western Tight Gas Carbonates: Comparison of Council Grove Group, Panoma Field, Southwest Kansas, and Western Low Permeability Sandstones, American Association of Petroleum Geologists Annual Convention 10:A31. 2001. Available online: https://www.searchanddiscovery.com/abstracts/html/2001/annual/abstracts/0115.htm (accessed on 9 January 2026).
Bohling, G.C.; Dubois, M.K. An Integrated Application of Neural Network and Markov Chain Techniques to the Prediction of Lithofacies from Well Logs: Kansas Geological Survey Open-File Report 2003-50. 2003. Available online: https://www.researchgate.net/profile/Martin-Dubois/publication/242269985_An_Integrated_Application_of_Neural_Network_and_Markov_Chain_Techniques_to_Prediction_of_Lithofacies_from_Well_Logs_Kansas_Geological_Survey_Open_File_Report_2003-50/links/54ff1fb30cf2672e22427ac3/An-Integrated-Application-of-Neural-Network-and-Markov-Chain-Techniques-to-Prediction-of-Lithofacies-from-Well-Logs-Kansas-Geological-Survey-Open-File-Report-2003-50.pdf (accessed on 9 January 2026).
Dubois, M.; Bohling, G.; Chakrabarti, S. Comparison of rock facies classification using three statistically based classifiers. Tech. Rep. 2004, 64, 2004. [Google Scholar]
Dubois, M.K.; Byrnes, A.P.; Bhattacharya, S.; Bohling, G.C.; Doveton, J.H.; Barba, R.E. Hugoton Asset Management Project (HAMP). Hugoton Geomodel Final Report. KGS Open File Report. 2006. Available online: https://www.researchgate.net/profile/Martin-Dubois/publication/241754406_Hugoton_Asset_Management_Project_HAMP/links/54ff1fb60cf2672e22427ae1/Hugoton-Asset-Management-Project-HAMP.pdf (accessed on 9 January 2026).
Hall, B. Facies classification using machine learning. Lead. Edge 2016, 35, 906–909. [Google Scholar] [CrossRef]
Xu, C.; Torres-Verdín, C. Pore system characterization and petrophysical rock classification using a bimodal Gaussian density function. Math. Geosci. 2013, 45, 753–771. [Google Scholar] [CrossRef]
Wei, Z.; Hu, H.; Zhou, H.W.; Lau, A. Characterizing rock facies using machine learning algorithm based on a convolutional neural network and data padding strategy. Pure Appl. Geophys. 2019, 176, 3593–3605. [Google Scholar] [CrossRef]
Mohamed, I.M.; Mohamed, S.; Mazher, I.; Chester, P. Formation lithology classification: Insights into machine learning methods. In Proceedings of the SPE Annual Technical Conference and Exhibition, Las Vegas, NV, USA, 27–30 October 2019; SPE: Richardson, TX, USA, 2019; p. D021S033R005. [Google Scholar]
Halotel, J.; Demyanov, V.; Gardiner, A. Value of geologically derived features in machine learning facies classification. Math. Geosci. 2020, 52, 5–29. [Google Scholar] [CrossRef]
Yale, D.P.; Jamieson, W.H. Static and dynamic rock mechanical properties in the Hugoton and Panoma fields, Kansas. In Proceedings of the SPE Oklahoma City Oil and Gas Symposium/Production and Operations Symposium, Amarillo, TX, USA, 22 May 1994; SPE: Richardson, TX, USA, 1994; p. 27939. [Google Scholar]
Dubois, M.K.; Byrnes, A.P.; Bohling, G.C.; Doveton, J.H. Multiscale Geologic and Petrophysical Modeling of the Giant Hugoton Gas Field (Permian), Kansas and Oklahoma. 2006. Available online: https://www.kgs.ku.edu/PRS/publication/2006/2006-12/index.html (accessed on 9 January 2026).
Sorenson, R.P. A dynamic model for the Permian Panhandle and Hugoton fields, western Anadarko basin. AAPG Bull. 2005, 89, 921–938. [Google Scholar] [CrossRef]
Bergen, K.J.; Johnson, P.A.; de Hoop, M.V.; Beroza, G.C. Machine learning for data-driven discovery in solid Earth geoscience. Science 2019, 363, eaau0323. [Google Scholar] [CrossRef]
Konoshonkin, D.; Shishaev, G.; Matveev, I.; Volkova, A.; Rukavishnikov, V.; Demyanov, V.; Belozerov, B. Machine learning clustering of reservoir heterogeneity with petrophysical and production data. In Proceedings of the SPE Europec Featured at EAGE Conference and Exhibition, Houston, TX, USA, 8–11 June 2020; SPE: Richardson, TX, USA, 2020; p. D011S007R003. [Google Scholar]
Amosu, A.; Imsalem, M.; Sun, Y. Effective machine learning identification of TOC-rich zones in the Eagle Ford Shale. J. Appl. Geophys. 2021, 188, 104311. [Google Scholar] [CrossRef]
Amosu, A.; Sun, Y. Identification of thermally mature total organic carbon-rich layers in shale formations using an effective machine-learning approach. Interpretation 2021, 9, T735–T745. [Google Scholar] [CrossRef]
Sharma, S.K.; Chin, M.; Basu, T.; Bhargava, R.O.; Henson, R.; Jiang, L.; Shuhaimi, M.A.; Vizzini, L. Core Driven Hierarchical Facies Modeling of Shoreface Environments: A Case Study from Offshore Sabah, Malaysia. 2013. Available online: https://www.searchanddiscovery.com/documents/2013/20204sharma/ndx_sharma.pdf (accessed on 9 January 2026).
Amosu, A.; Bui, D.; Oke, O.; Koray, A.M.; Appiah Kubi, E.; Sibaweihi, N.; Ampomah, W. Committee Machine Learning for Electrofacies-Guided Well Placement and Oil Recovery Optimization. Appl. Sci. 2025, 15, 3020. [Google Scholar] [CrossRef]
Bui, D.; Koray, A.M.; Appiah Kubi, E.; Amosu, A.; Ampomah, W. Integrating machine learning workflow into numerical simulation for optimizing oil recovery in sand-shale sequences and highly heterogeneous reservoir. Geotechnics 2024, 4, 1081–1105. [Google Scholar] [CrossRef]
Ao, Y.; Li, H.; Zhu, L.; Ali, S.; Yang, Z. Identifying channel sand body from multiple seismic attributes with an improved random forest algorithm. J. Pet. Sci. Eng. 2019, 173, 781–792. [Google Scholar] [CrossRef]
Milad, I.; Farmer, R.; Vaitekaitis, T.; Majed, S.A. Machine Learning to Predict Geological Facies in Complex Carbonate Reservoirs. In Proceedings of the First EAGE Workshop on Advances in Carbonate Reservoirs from Prospects to Development, Kuwait City, Kuwait, 23–25 April 2024; European Association of Geoscientists & Engineers: Bunnik, The Netherlands, 2024; Volume 2024, pp. 1–3. [Google Scholar]
Zhang, D.; Zhang, H.; Ren, Q.; Zhao, X. Multiple-point geostatistical simulation of nonstationary sedimentary facies models based on fuzzy rough sets and spatial-feature method. SPE J. 2023, 28, 2240–2255. [Google Scholar] [CrossRef]
Strebelle, S. Conditional simulation of complex geological structures using multiple point statistics. Math. Geol. 2002, 34, 1–21. [Google Scholar] [CrossRef]
Koray, A.M.; Bui, D.; Kubi, E.A.; Ampomah, W.; Amosu, A. Machine learning based reservoir characterization and numerical modeling from integrated well log and core data. Geoenergy Sci. Eng. 2024, 243, 213296. [Google Scholar] [CrossRef]
Demyanov, V.; Backhouse, L.; Christie, M. Geological feature selection in reservoir modelling and history matching with Multiple Kernel Learning. Comput. Geosci. 2015, 85, 16–25. [Google Scholar] [CrossRef]
Zhang, J.D.; Wang, J.; Yao, C.J.; Yang, Y.F.; Sun, H.; Yao, J. Multi source information fused generative adversarial network model and data assimilation based history matching for reservoir with complex geologies. Pet. Sci. 2022, 19, 707–719. [Google Scholar] [CrossRef]
Canchumuni, S.W.; Emerick, A.A.; Pacheco, M.A. History matching geological facies models based on ensemble smoother and deep generative models. J. Pet. Sci. Eng. 2019, 177, 941–958. [Google Scholar] [CrossRef]
Park, H.; Scheidt, C.; Fenwick, D.; Boucher, A.; Caers, J. History matching and uncertainty quantification of facies models with multiple geological interpretations. Comput. Geosci. 2013, 17, 609–621. [Google Scholar] [CrossRef]
Higley, D.K. Thermal Maturation of Petroleum Source Rocks in the Anadarko Basin Province, Colorado, Kansas, Oklahoma, and Texas, chap. 3. In Petroleum Systems and Assessment of Undiscovered Oil and Gas in the Anadarko Basin Province, Colorado, Kansas, Oklahoma, and Texas USGS Province 58: U.S. Geological Survey Digital Data Series DDS 69 EE; USGS: Reston, VA, USA, 2014; 53p. [Google Scholar] [CrossRef]
Ball, M.M.; Henry, M.E.; Frezon, S.E. Petroleum Geology of the Anadarko Basin Region, Province (115), Kansas, Oklahoma, and Texas: U.S. Geological Survey Open File Report 88-450W. 1991; 36p. Available online: https://pubs.usgs.gov/of/1988/0450w/report.pdf (accessed on 9 January 2026).
Johnson, K.S. Geologic evolution of the Anadarko Basin. In Anadarko Basin Symposium, 1988: Oklahoma Geological Survey Circular 90; Oklahoma Geological Survey: Norman, OK, USA, 1989; pp. 3–12. Available online: http://www.ogs.ou.edu/pubsscanned/Circulars/Circular90.pdf (accessed on 9 January 2026).
Dubois, M.K.; Goldstein, R.H.; Hasiotis, S.T. Climate controlled aggradation and cyclicity of continental loessic siliciclastic sediments in Asselian Sakmarian cyclothems, Permian, Hugoton embayment, USA. Sedimentology 2012, 59, 1782–1816. [Google Scholar] [CrossRef]
Denison, R.E. Evolution of the Anadarko Basin: American Association of Petroleum Geologists Bulletin. GeoScienceWorld 1976, 60, 325. [Google Scholar]
Perry, W.J. Tectonic evolution of the Anadarko basin region, Oklahoma. U.S. Geol. Surv. Bull. 1989, 1866, A1–A16. [Google Scholar]
Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; Association for Computing Machinery, Inc.: New York, NY, USA, 2007; pp. 1027–1035.
Ji, Y.; Liu, Z.; Wang, S.; Sun, Y.; Peng, Z. On simplifying large-scale spatial vectors: Fast, memory-efficient, and cost-predictable k-means. arXiv 2024, arXiv:2412.02244. [Google Scholar]
Pant, Y.R.; Leigh, L.; Fajardo Rueda, J. Improving k-means clustering: A comparative study of parallelized versions. Algorithms 2025, 18, 532. [Google Scholar] [CrossRef]
Semoglou, A.; Likas, A.; Pavlopoulos, J. Silhouette-guided instance-weighted k-means. arXiv 2025, arXiv:2506.12878. [Google Scholar]
Baligodugula, V.V.; Amsaad, F. Unsupervised learning: Comparative analysis of clustering techniques on high-dimensional data. arXiv 2025, arXiv:2503.23215. [Google Scholar] [CrossRef]
Zhang, J.; He, Y.; Zhang, Y.; Li, W.; Zhang, J. Well-Logging-Based Lithology Classification Using Machine Learning Methods for High-Quality Reservoir Identification: A Case Study of Baikouquan Formation in Mahu Area of Junggar Basin, NW China. Energies 2022, 15, 3675. [Google Scholar] [CrossRef]
Dudani, S. The distance-weighted k-nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. 1976, 6, 325–327. [Google Scholar] [CrossRef]
Guido, R.; Groccia, M.C.; Conforti, D. A hyper-parameter tuning approach for cost-sensitive support vector machine classifiers. Soft Comput. 2022, 27, 12863–12881. [Google Scholar] [CrossRef]
Chang, Y.J.; Lin, Y.L.; Pai, P.F. Support Vector Machines with Hyperparameter Optimization Frameworks for Classifying Mobile Phone Prices in Multi-Class. Electronics 2025, 14, 2173. [Google Scholar] [CrossRef]
Mastropietro, A.; Feldmann, C.; Bajorath, J. Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel. Sci. Rep. 2023, 13, 19561. [Google Scholar] [CrossRef]
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
Shapley, L.S. Stochastic games. Proc. Natl. Acad. Sci. USA 1953, 39, 1095–1100. [Google Scholar] [CrossRef]
Bhattacharya, S.; Dubois, M.; Byrnes, A. Multi Well Characterization and Simulation—Council Grove Reservoir System, Kansas Geological Survey, Open File Report 2004-67. 2004. Available online: https://www.kgs.ku.edu/PRS/publication/2004/OFR04_67/index.html (accessed on 9 January 2026).

Figure 1. Map of study area, wells and approximate outline of the Panoma Gas Field.

Figure 2. Generalized stratigraphic column for the Hugoton Embayment with oil and gas shows, modified after [30,32].

Figure 3. Well correlation section showing gamma-ray, resistivity, facies information, and depositional environment.

Figure 4. Histograms of selected input features: gamma ray (GR), log10 of deep induction resistivity (ILD), neutron-density porosity difference (DeltaPHI), average of neutron and density porosity (PHIND), photoelectric factor (PE), and relative stratigraphic position (RELPOS).

Figure 5. Heatmap of selected input features: gamma ray (GR), log10 of deep induction resistivity (ILD), neutron-density porosity difference (DeltaPHI), average of neutron and density porosity (PHIND), photoelectric factor (PE), and relative stratigraphic position (RELPOS).

Figure 6. Cross plot of selected input features: gamma ray (GR), log10 of deep induction resistivity (ILD), neutron-density porosity difference (DeltaPHI), average of neutron and density porosity (PHIND), photoelectric factor (PE), and relative stratigraphic position (RELPOS).

Figure 7. Workflow diagram.

Figure 8. K-means clusters visualized using t-distributed stochastic neighbor embedding (t-SNE).

Figure 9. Fine KNN True Positive Rate and False Negative Rate.

Figure 10. Fine Gaussian SVM True Positive Rate and False Negative Rate.

Figure 11. Bagged Tree Ensemble True Positive Rate and False Negative Rate.

Figure 12. Wide Neural Network True Positive Rate and False Negative Rate.

Figure 13. KNN Positive Predictive Value and False Discovery Rate.

Figure 14. Fine Gaussian SVM Positive Predictive Value and False Discovery Rate.

Figure 15. Bagged Tree Ensemble Positive Predictive Value and False Discovery Rate.

Figure 16. Wide Neural Network Positive Predictive Value and False Discovery Rate.

Figure 17. Shapley Importance for the Best Performing Model (KNN).

Figure 18. Monte Carlo simulation of the matrix scaling parameter, with the initial (red) and final (black) scaling parameters.

Figure 19. Static geological model of the Panoma Gas Field with permeability from (a) the facies-based machine learning model and (b) core plug measurements.

Figure 20. History match of monthly gas production rates of all gas-producing wells as obtained from the core plug permeability model.

Figure 21. History match of monthly gas production rates of all gas-producing wells as obtained from the facies-based machine learning permeability model.

Table 1. Facies types and description obtained from core analysis.

Facies Class Label	Facies Symbol	Facies Type	Adjacent Facies
1	SS	Non-marine sandstone	CSiS
2	CSiS	Non-marine coarse siltstone	SS, FSiS
3	FSiS	Non-marine fine siltstone	CSiS
4	SiSh	Marine siltstone and shale	MS
5	MS	Mudstone	SiSh, WS
6	WS	Wackestone	MS, D, PS
7	D	Dolomite	WS, PS
8	PS	Packstone-grainstone	WS, D, BS
9	BS	Phylloid-algal bafflestone	D, PS

Table 2. Accuracy prediction table of classification models.

Model Type	Model Description	Advantages	Limitations	Accuracy % (Validation)	Accuracy % (Test)
Fine KNN	Non-parametric and uses distance-based decision rule	Simple, intuitive and often captures fine local patterns.	Highly sensitive to noise and outliers and computationally expensive	80.31	83.86
Fine Gaussian SVM	Kernel based Gaussian SVM with margin-maximizing classifier	Good nonlinear classification performance	Hyperparameter tuning is critical and is computationally expensive for large datasets	77.43	80.06
Bagged Tree Ensemble	Ensemble of decision trees with variance-reduction technique	Robust to overfitting and handles nonlinearities and interactions well	Higher memory usage and bias reduction is limited	75.42	81.01
Wide Neural Network	Shallow neural network with many neurons	Flexible architecture that learns complex feature interactions; faster to train than deep networks	Sensitive to architecture and hyperparameters	75.21	74.68

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Amosu, A.; Reyes, M.; Sibaweihi, N.; Koray, A.-M.; Appiah Kubi, E.; Gyimah, E.; Agyei, E.; Ampomah, W. Optimization of Gas Production Using Machine Learning Modeling of Geological Core Facies and Monte Carlo Simulation: Application in the Permian, Southwest Kansas. Appl. Sci. 2026, 16, 2436. https://doi.org/10.3390/app16052436


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
