2.1. Data Description
The South Yellow Sea basin is a southwest-west oriented rift depression basin located between the Subei basin and the Korean Peninsula, as shown in Figure 1b. It was formed by multi-cycle tectonic, erosion, and uplift events from the Neoproterozoic through the Cenozoic [21,22,23]. It is worth noting that significant oil has been discovered in the Funing, Dainan, and Sanduo Formations of the southern basin. These formations are composed mainly of interbedded sandstone and mudstone, as indicated in Figure 1c. Muddy limestone occurs in the middle and upper parts of the Funing Formation. Figure 1c also shows a coal layer in the lower portion of the Dainan Formation and muddy limestone in its upper portion.
Two wells (Well A and Well B) that target the oil-bearing formations in the southern basin of the South Yellow Sea were considered in the present study (Figure 1b). Well logs and core porosity and permeability data from Well A were used to train the computational intelligence models, while the models' predictive capabilities were examined on the data of Well B. The input well log parameters were gamma ray (GR), sonic travel time (DT), resistivity (RT), and spontaneous potential (SP); the output parameters were core porosity and permeability. A total of 727 data samples from Well A were used for training, and 311 data samples from Well B were used for testing. The well logs adopted in the study are shown in Figure 2, and the statistical descriptions of the well log parameters, porosity, and permeability are listed in Table 1 and Table 2.
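The train-on-one-well, test-on-the-other setup described above can be sketched as follows. This is a minimal illustration with synthetic stand-ins for the real log data (the actual values come from Wells A and B); the key point is that the test well is scaled with the training well's statistics so no information leaks from Well B:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the Well A (training) and Well B (testing) logs;
# columns: GR, DT, RT, SP. The study uses 727 and 311 core-calibrated samples.
well_a_logs = rng.normal(size=(727, 4))
well_b_logs = rng.normal(size=(311, 4))
well_a_porosity = rng.uniform(0.05, 0.25, size=727)

# Standardize with the training-well statistics only, then apply the same
# transform to the test well.
mu = well_a_logs.mean(axis=0)
sigma = well_a_logs.std(axis=0)
X_train = (well_a_logs - mu) / sigma
X_test = (well_b_logs - mu) / sigma
y_train = well_a_porosity
```

The same scaling would be applied before fitting any of the computational intelligence models compared in the study.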
2.2. Gaussian Process Regression
A Gaussian process (GP) is an infinite collection of random variables, any finite subset of which has a consistent joint Gaussian distribution [24,25,26]. A GP is fully specified by a mean function and a covariance function. Since the GP is a linear combination of normally distributed random variables, the mean function is usually assumed to be zero for simplicity. Consider a training set of $n$ samples with input matrix $x$ and output variable $y$, where $y$ represents porosity or permeability. The Gaussian process is represented in Equation (1) as:

$$f(x) \sim GP\left(m(x), k(x, x')\right) \quad (1)$$
where $GP$ denotes the Gaussian process, $m(x)$ is the mean function, and $k(x, x')$ is the covariance function. The mean function $m(x)$ represents the expected value of the function $f(x)$ at the input point $x$, as expressed in Equation (2):

$$m(x) = E\left[f(x)\right] \quad (2)$$

The covariance function $k(x, x')$ is a measure of the confidence level for $m(x)$, as represented in Equation (3):

$$k(x, x') = E\left[\left(f(x) - m(x)\right)\left(f(x') - m(x')\right)\right] \quad (3)$$

The covariance function takes any two input arguments and generates a non-negative definite covariance matrix $K$.
The covariance function implicitly specifies certain aspects of the model such as smoothness, periodicity, and stationarity. The basic and widely used Gaussian process regression (GPR) combines a zero mean with the squared exponential covariance function expressed in Equation (4):

$$k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{(x_i - x_j)^2}{2l^2}\right) \quad (4)$$

where $\sigma_f^2$ and $l$ are the hyperparameters that influence the performance of the GP: $\sigma_f^2$ is the signal variance and $l$ is the length scale. Accounting for model noise $\sigma_n^2$ in the observations gives Equation (5):

$$k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{(x_i - x_j)^2}{2l^2}\right) + \sigma_n^2 \delta_{ij} \quad (5)$$

where $\delta_{ij}$ is the Kronecker delta. The covariance is high for any set of inputs that are in close proximity, and it decreases exponentially as the distance between the input parameters increases.
There are various covariance functions (kernel functions) that can be employed in a GPR, as denoted in Equations (6)–(9) [16,20].
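As a generic illustration of such alternative kernels (the exponential and rational quadratic forms below are common textbook choices, not necessarily the specific ones listed in Equations (6)–(9)):

```python
import numpy as np

def exponential_kernel(xa, xb, sigma_f=1.0, length=1.0):
    # Exponential (Ornstein-Uhlenbeck) covariance: rougher sample paths.
    d = np.abs(xa[:, None] - xb[None, :])
    return sigma_f**2 * np.exp(-d / length)

def rational_quadratic_kernel(xa, xb, sigma_f=1.0, length=1.0, alpha=1.0):
    # Rational quadratic covariance: a scale mixture of squared exponentials.
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return sigma_f**2 * (1.0 + d2 / (2.0 * alpha * length**2)) ** (-alpha)

x = np.linspace(0.0, 2.0, 5)
K_exp = exponential_kernel(x, x)
K_rq = rational_quadratic_kernel(x, x)
```

Each choice encodes a different smoothness assumption about the underlying porosity or permeability trend, which is why kernel selection matters for GPR performance.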
To estimate the expected function value $f_*$ given the test inputs $x_*$, the joint Gaussian prior distribution in Equation (10) can be used:

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim N\left(0, \begin{bmatrix} K + \sigma_n^2 I & K_* \\ K_*^T & K_{**} \end{bmatrix}\right) \quad (10)$$

The predictive mean $\bar{f}_*$ gives the best estimate for $f_*$, and the predictive variance $V[f_*]$ indicates the uncertainty of the prediction. The mean prediction $\bar{f}_*$ in Equation (11) is a linear combination of the targets $y$, while the variance does not depend on the targets but only on the inputs:

$$\bar{f}_* = K_*^T \left(K + \sigma_n^2 I\right)^{-1} y, \qquad V[f_*] = K_{**} - K_*^T \left(K + \sigma_n^2 I\right)^{-1} K_* \quad (11)$$

where $K$ is the covariance matrix of the training dataset, $K_{**}$ is the covariance matrix of the testing data, and $K_*$ is the cross-covariance matrix obtained from the training and testing data.
The marginal likelihood over the latent function values $f$ is expressed in Equation (12) as:

$$p(y \mid X) = \int p(y \mid f, X)\, p(f \mid X)\, df \quad (12)$$

where the Gaussian prior $p(f \mid X) = N(0, K)$ and likelihood $p(y \mid f, X) = N(f, \sigma_n^2 I)$ are given in Equations (13) and (14). Taking the logarithm to simplify the integral expression of Equation (12) generates the log marginal likelihood given in Equation (15):

$$\log p(y \mid X, \theta) = -\frac{1}{2} y^T \left(K + \sigma_n^2 I\right)^{-1} y - \frac{1}{2} \log \left| K + \sigma_n^2 I \right| - \frac{n}{2} \log 2\pi \quad (15)$$
where $\theta$ is the set of hyperparameters of the given covariance function. The optimal hyperparameters of the covariance function are obtained by maximizing the marginal likelihood or, equivalently, minimizing the negative log marginal likelihood. The output of the GPR model is presented in terms of its mean and variance.
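The predictive equations and the marginal-likelihood-based hyperparameter selection described above can be sketched in NumPy. This is a minimal 1-D illustration with a squared exponential kernel and toy data, not the study's actual implementation; a crude grid search over the length scale stands in for gradient-based optimization:

```python
import numpy as np

def se_kernel(xa, xb, sigma_f=1.0, length=1.0):
    # Squared exponential covariance for 1-D inputs.
    return sigma_f**2 * np.exp(-(xa[:, None] - xb[None, :])**2 / (2 * length**2))

def gpr_predict(x_train, y_train, x_test, sigma_n=0.1):
    """Predictive mean and variance from the closed-form GPR equations."""
    Ky = se_kernel(x_train, x_train) + sigma_n**2 * np.eye(len(x_train))
    K_s = se_kernel(x_train, x_test)        # cross-covariance K_*
    K_ss = se_kernel(x_test, x_test)        # test covariance K_**
    alpha = np.linalg.solve(Ky, y_train)
    mean = K_s.T @ alpha
    var = np.diag(K_ss - K_s.T @ np.linalg.solve(Ky, K_s))
    return mean, var

def neg_log_marginal_likelihood(x, y, sigma_f, length, sigma_n):
    """Negative log marginal likelihood of the training data."""
    n = len(x)
    Ky = se_kernel(x, x, sigma_f, length) + sigma_n**2 * np.eye(n)
    L = np.linalg.cholesky(Ky)              # stable solve and determinant
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ a + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

# Toy 1-D data standing in for a porosity-versus-depth trend.
x_tr = np.linspace(0.0, 5.0, 30)
y_tr = np.sin(x_tr)
best_l = min([0.1, 0.5, 1.0, 2.0],
             key=lambda l: neg_log_marginal_likelihood(x_tr, y_tr, 1.0, l, 0.1))
mean, var = gpr_predict(x_tr, y_tr, np.array([1.0, 2.5]), sigma_n=0.1)
```

The returned variance is what allows the GPR model to report not just a porosity or permeability estimate but also a pointwise uncertainty, which distinguishes it from the other computational intelligence models compared in the study.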