Median Polish Kriging and Sequential Gaussian Simulation for the Spatial Analysis of Source Rock Data

: In this technical note, a geostatistical model was applied to explore the spatial distribution of source rock data in terms of total organic carbon weight concentration. The median polish kriging method was used to approximate the “row and column effect” in the generated array data, in order for the ordinary kriging methodology to be applied by means of the residuals. Moreover, the sequential Gaussian simulation was employed to quantify the uncertainty of the estimates. The modiﬁed Box–Cox technique was applied to normalize the residuals and a cross-validation analysis was performed to evaluate the efﬁciency of the method. A map of the spatial distribution of total organic carbon weight concentration was constructed along with the 5% and 95% conﬁdence intervals. This work encourages the use of the median polish kriging method for similar applications.


Introduction
Onshore and coastal petroleum exploration commonly involves rock samples from different locations, geological formations and depths, which are analyzed for their mineral composition and organic properties. Geostatistics can be used to spatially analyze these properties and provide their spatial distribution in terms of mathematical principles of interdependence. In petroleum resources evaluation studies, geostatistics in the form of kriging have traditionally been used successfully [1][2][3][4][5]. Over the last decades, multipoint techniques, simulations and seismic inversion methods have been developed and successfully applied in reservoir modeling [6,7].
In this work, we present a combined methodology for spatial analysis of the distribution of total organic carbon weight concentration (TOC wt%) from rock samples. The weight percentage of organic carbon in source rocks represents the concentration of organic material. A 0.5% value by weight of TOC is the minimum allowed for an effective source rock sample. Besides, a value of 2% is considered the minimum for shale gas reservoirs. Values greater than 10% are also probable. However, such values indicate the likelihood that kerogen, rather than other types of hydrocarbons, fills the pore space [8].
The median polish method (MEP) combined with the sequentially Gaussian simulation (SGS) is applied to estimate the spatial distribution of the available dataset in an attempt to propose an alternative method to previous approaches for mapping TOC wt% [9]. In addition, it can designate areas that would be potentially under environmental risk during prospective explorations [10]. MEP approximates large scale variations by iteratively removing the median from the rows and columns of the gridded data. The determined residuals can be analyzed with the kriging method by means of the variogram to study the small-scale spatial structure variation of the study variable and to be combined with the estimated median polish trend [11,12]. MEP is widely used in geostatistical applications to overcome bias as well as the influence of extreme values [13,14], while SGS is used in geostatistical simulation applications in various disciplines [15][16][17]. However, the two methods have never been combined for a specific application, especially in the discipline of hydrocarbons. A similar work involving spatial polycyclic aromatic hydrocarbons (PAHs) analysis using indicator kriging and SGS has been successfully conducted [18], suggesting that such approaches can be useful for the spatial analysis of data related to hydrocarbons.

Materials and Methods
The data evaluated were from subsurface Mesozoic rock samples from the eastern onshore Gulf Coast Basin (mainly Mississippi and Louisiana), USA [19]. Specifically, 561 samples were collected by the USGS from 2011-2017 to investigate potential undiscovered petroleum resources. Part of the data from the aforementioned dataset was used in a recent study and showed that most samples were derived from a common mixed marine terrigenous source rock [20]. In this work, the determined TOC wt% was geostatistically evaluated in terms of its mean value at each sampling location. Thus, 132 unevenly distributed samples were used to perform the spatial analysis of TOC wt% in the study area ( Figure 1). The sample was decomposed into trend and residuals according to the principles of MEP method. The residuals were examined to see if they followed a normal distribution, and a modification of the classic Box-Cox technique [21][22][23] was implemented to transform them.

Median Polish Kriging (MEPK)
Let Z(s) be a random field of sample size N. A mean structure can be obtained by additive decomposition of the row and column effect to define irregular gridded spatial data in which grid spacings do not have to be equal in either the horizonal or vertical directions: where r k and c l are the row and column effects, respectively, and a is a constant. The nodes of an overlaid rectangular grid are allocated to the data positions if they are irregularly spaced. A method called median polish has been proposed to estimate the additive effects given above using median theory in order to avoid bias and the impact of extreme values. [11]. Median polish is applied by recurrent mining until convergence of the row and column medians by means of a convergence criterion. It gives new estimators of a, r k , c l which we write as a, r k , c l . Thus, the initial spatial data are defined as: where R(s i ) corresponds to the residual term which is trend free to consent the application of ordinary kriging (OK), For s = (x, y) in the area bounded by the lines that connect the four nodes: (x l , y k ) ; (x l+1 , y k ) ; (x l , y k+1 ) ; (x l+1 , y k+1 ) , subject to x l < x l+1 and y k < y k+1 , defines the planar interpolant, Details for the corresponding extrapolation equation can be found in Martínez et al. [13]. Thus, the median polish kriging predictor is provided from Equation (5), Furthermore, the median polish kriging variance is defined by the following equation: where 1 is an n × 1 vector of ones, γ the variogram vector of the residuals between s j (observation point) and s 0 (estimation point). Γ is the variogram matrix of the residuals, (N 0 + 1) × (N 0 + 1), at the observation locations [11,13]. The decomposed median predictors are not considered in the MEP variance error estimation. Therefore, to calculate the uncertainty of estimations in a reliable way, SGS is applied.

Sequential Gaussian Simulation
The sequential Gaussian simulation (SGS) is a stochastic approach for producing equiprobable realizations (maps) of spatial distribution of a variable on a grid by means of kriging methodology. SGS fundamentals are based on drawing from a collection of conditional distributions of univariate realizations. The conditional cumulative distribution function (CDF) is defined as, SGS is obtained using kriging estimates, where each simulated map is a realization of a multivariate normal process. For conditional SGS, the kriging variance at sampling locations is zero to ensure that the only possible drawings are those of the sampled values. SGS is applied in terms of kriging estimator by means of a linear combination of random variables at location s i to estimate the value at location s 0 [4,18]. OK method applies that z(s) and z ∈ Z is a random function with a constant but unknown mean. The OK estimatê z(s 0 ) at s 0 is calculated based on a weighted sum of the data, The weights λ i depend on the variogram model γ z (r) [24] and are calculated by minimizing the mean square estimation error conditionally on the zero-bias constraint [11]. In detail, the following (N 0 + 1) × (N 0 + 1) linear system of equations provides the weights λ i , where N 0 is the set of points included in the search neighborhood of s 0 ,γ z ( s i , s j ) and γ z ( s j , s 0 ) are the variograms between two observation points s i ,s j , and between s j and the estimation point s 0 , respectively, by means of a theoretical variogram model. The term µ corresponds to the Lagrange multiplier imposing the no-bias constraint. Equation (10) implements the zero-bias condition.
Kriging methods provide the estimation of a variableẑ(s 0 ) accompanied by the associated uncertainty, i.e., corresponding estimation's error variance. The error variance of OK is independent of data values, is zero at monitoring locations and increases away from them, depends on the data configuration, i.e., complexity of the random field spatial variability as modeled by the variogram, while two identical spatially distant pairs of sampled points have the same variance independently of their values [25].
The SGS algorithm steps are outlined below as follows [18]: (1) Examine if the original data follow normal distribution and apply transformation.
(2) A node at location s is randomly selected that has not been yet simulated.
(3) Apply kriging estimation atẑ(s 0 ) and calculate the corresponding kriging variance σ 2 E (s 0 ). (4) Draw a random value from the normal distribution N ẑ(s 0 ), σ 2 E (s 0 ) , which corresponds to the simulated value. The main advantage of using geostatistical simulation is that any realization has the same variogram as the data. This is due to the sampling that applies from the conditional distribution during the simulation process [24].

Variogram
The experimental variogram of the transformed data was first calculated using the method of moments,γ (11) where N(r k ) denotes the number of point pairs within class r k . The theoretical variogram fit was held using the Spartan variogram function considering the model parameters θ , The Spartan function is defined as follows [22,26,27]: where η 0 is the scale factor, η 1 is the rigidity coefficient, β 1 = |2 − η 1 | 1/2 /2 is a dimensionless wavenumber, β 2 = |2 + η 1 | 1/2 /2 and ω 1,2 = (|η 1 ∓ ∆|/2) 1/2 , ∆ = η 2 1 − 4 1/2 , h = r/ξ is the normalized lag vector, ξ is the correlation length, h =|h| is the distance norm and σ 2 z is the variance. The Spartan family models depending on η 1 coefficient values have a characteristic wave behavior near the sill that can capture important increments or decrements of the experimental variogram values. Therefore, this characteristic can provide optimum fitting.
The geostatistical analysis using the proposed methods and tools was performed in Matlab ® environment developing original codes.

Cross-Validation
The proposed method performance is assessed using a leave one out cross-validation procedure. The estimated values are compared with the corresponding observations using a series of performance metrics. The bias error, the mean absolute error and the linear correlation coefficient were examined: Bias (optimum value close to 0, positive or negative sign denotes overestimation or underestimation): Mean absolute error (MAE) (optimum value close to 0): Linear correlation coefficient (optimum value close to 1): The termẑ(s i ) is the estimation at point s i , z(s i ) the observed value, z(s i ) the spatial average of the observation data andẑ(s i ) the spatial average of the estimations [28].

Methodology Flowchart
The proposed methodological steps described previously in detail are summarized in a flowchart (Figure 2) presenting the basic steps of the combined geostatistical tools applied to provide a reliable map with the spatial distribution of TOC wt% and the relative uncertainty of estimations.

Results
In this technical note, we apply a robust method for generating maps of TOC wt% using non-uniformly distributed monitoring stations and consider the joint application of MEPK with SGS. The sample data (Figure 3) are aligned with the nearest grid nodes according to the methodology, receiving new coordinates that correspond to the grid lines. In the case of several data aligned to a grid node, the median substitutes the data values [14].
The advantage of MEP is the robust fitting of a smooth and flexible trend surface. The MEP procedure first decomposes the data into trend and residuals, as explained in the methodology section. It then tests whether the residuals follow a normal distribution. The characteristics of the sample are shown in Table 1. The residuals' skewness is equal to 1.68 and kurtosis to 4.86. Thus, TOC wt% residuals do not follow the normal distribution in order for OK and SGS to be suitable for application. Therefore, the modified Box-Cox technique [22,23] was applied to transform the residuals. The metrics are now improved to s = 0.02 and kurtosis, k = 3.26, very close to the desired Gaussian distribution metrics (s = 0 and k = 3.00). The transformed residuals' histogram is presented in Figure 4.  Table 1. Statistical measures of TOC wt% data. z min : minimum value; z max : maximum value; z 0.50 : median; m z : mean;σ z : standard deviation;ŝ z : skewness factor;k z : kurtosis factor. The residuals' spatial dependence structure was fitted using the Spartan model ( Figure 5). The calculated variogram parameters are σ 2 z = 1.38, η 1 = −1.96, ξ = 0.17 and nugget c = 0.23 wt% 2 . The MEPK method prediction error by means of cross-validation error (after back transformation of residuals) is presented in Table 2. The cross-validation shows that errors between estimated and measured values are close to zero and significantly correlated. Thus, the method provides satisfactory estimation capability. Furthermore, in this work, the mean of the TOC wt% simulated estimations ( Figure 6) is presented from MEPK using SGS, while error maps (uncertainty) were provided using the 95% (Figure 7) and 5% (Figure 8) confidence intervals of the CDF of the simulations. The provided maps were produced after back-transforming estimations to the original scale.
The spatial analysis of the TOC wt% shows the areas of the case study that have concentrations close to 2%, taking into account the associated uncertainty, which can be further investigated employing information from petroleum geology properties. The results of this work, in comparison to previous studies that employ partial information from the same dataset [20,29], are in direct agreement in terms of the designated locations that have the highest TOC wt% potential. The data used in those works were located in the core of the measurements' location, using geochemical and spatial analysis for hydrocarbon resources evaluation. Overall, similarly to previous works, the produced map highlights localized areas of high-quality source rock properties (TOC wt%) but of poor source rock properties on average. However, according to [29], additional work to map areas and better zones that may provide an improved potential for recoverable hydrocarbons should apply. Therefore, this work exploits the entire dataset [19], in terms of the mean TOC wt% in variable geological formations, to spatially map, by means of an efficient geostatistical methodology, the average TOC wt% potential in the study area. New locations forming zones of significant TOC wt% are presented at the south-west of the study area. Additionally, this research, compared to previous works [29], contributes, apart from the more detailed and high accuracy spatial analysis, to the uncertainty estimation of the spatial analysis by applying a simulation method that presents possible realizations of the TOC wt% providing the 5% and 95% bounds of the geostatistical method predictions uncertainty. The estimation of uncertainty can help to an integrated analysis of TOC wt% spatial distribution.   SGS is a robust stochastic simulation method that provides realizations that represent spatial patterns without smoothing effect. It exploits the capability to calculate the conditional probability of a univariate random variable given only a number of conditioning values. SGS realizations can be applied to represent all the possible spatial distribution patterns of the study variable and model the estimations uncertainty in TOC wt% evaluation.

Conclusions
In summary, MEPK can be characterized as a hybrid method that applies in a twodimensional surface to estimate spatial data distribution. It combines a geostatistical approach, by means of kriging methodology, and a row and column effect analysis. This work suggests that MEPK combined with SGS can model the spatial distribution of TOC wt% and provide satisfactory estimation metrics and uncertainty quantification. The main advantage of this approach is the application of a robust trend surface based on the statistical properties of the sample. Moreover, depending on the coefficient values, the Spartan variogram model can capture important increments, or decrements, of the experimental variogram shape and provide optimum fitting. Finally, this study presents an alternative methodology for the geostatistical analysis of hydrocarbon-related spatial data.

Conflicts of Interest:
The author declares no conflict of interest.