Evaluation of the Total Organic Carbon (TOC) Using Different Artificial Intelligence Techniques

Total organic carbon (TOC) is an essential parameter in the evaluation of unconventional shale resources. Current methods for TOC estimation rely either on time-consuming laboratory experiments or on empirical correlations developed for specific formations. In this study, four artificial intelligence (AI) models were developed to estimate the TOC using conventional well logs of deep resistivity, gamma-ray, sonic transit time, and bulk density. These models were based on the Takagi-Sugeno-Kang fuzzy inference system (TSK-FIS), Mamdani fuzzy inference system (M-FIS), functional neural network (FNN), and support vector machine (SVM). Over 800 data points of conventional well logs and core data collected from the Barnett shale were used to train and test the AI models. The optimized AI models were validated using unseen data from the Devonian shale. The developed AI models predicted TOC accurately in both the Barnett and Devonian shale. The FNN model outperformed the others in estimating TOC for the validation data, with an average absolute percentage error (AAPE) of 12.02% and a correlation coefficient (R) of 0.879, followed by M-FIS and SVM, while the TSK-FIS model showed the lowest predictability of TOC, with an AAPE of 15.62% and R of 0.832. All AI models outperformed the Wang models, which were recently developed to evaluate the TOC for the Devonian formation.


Introduction
Recently, owing to advances in horizontal drilling and multi-stage fracturing, the possibility of producing hydrocarbons from unconventional resources, such as shale oil and shale gas, has increased significantly. The total organic carbon (TOC) is an essential parameter for unconventional shale resource characterization and evaluation. It expresses the amount of organic carbon present in the formation and thus indicates the hydrocarbon reserve in these unconventional resources [1,2].
TOC is dependent on many factors, such as gas adsorption, maturity, and carbon content, because these factors affect the reservoir organic porosity [2–4]. TOC is also significantly affected by the pore structure and wettability of the shale [2,5,6]. Thus, reserve prediction for unconventional reservoirs needs an accurate method to predict the TOC [5,6].
Currently, several empirical correlations, developed under different assumptions, are used to evaluate the TOC for specific formation types based on the available well logs. Schmoker [7] developed the first correlation for TOC prediction based on the formation bulk density (RHOB). His correlation in Equation (1) was initially developed for Devonian shale; it estimates the TOC as a volume percentage, which can then be converted to a weight percentage as explained in Schmoker [7],
$$\mathrm{TOC\,(vol.\%)} = \frac{\rho_B - \rho}{1.378} \tag{1}$$
where $\rho_B$ and $\rho$ denote the organic-matter-free rock density and the rock bulk density, both in g/cm³. Schmoker [8] revised his first model to make it applicable to the Bakken shale formation, giving the revised model in Equation (2), where $\rho_o$ denotes the density of the organic matter in g/cm³, $R$ is the ratio of the organic matter to organic carbon as the weight percentage, and $\rho_{mi}$ denotes the average density of the grains and pore fluid in g/cm³.
Passey et al. [9] developed a simple model for TOC prediction based on the deep resistivity (DR) and sonic transit time (DT) logs. This model, named the ∆logR model, is summarized in Equations (3) and (4) and is currently widely used for evaluating unconventional resource reserves. In these equations, ∆logR is the log separation, $R$ and $R_{baseline}$ denote the resistivity of the evaluated formation and of the base formation in ohm.m, $\Delta t$ and $\Delta t_{baseline}$ represent the sonic transit times of the evaluated formation and of the base formation, both in µs/ft, and LOM is the level of maturity.
The Schmoker and ∆logR models were evaluated by Charsky and Herron [10] on various formations in four different wells. The authors found that these models are not accurate: TOC was predicted with an average absolute difference (AAD) from the core-derived TOC of 1.6 wt% for the Schmoker model and 1.7 wt% for the ∆logR method.
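As a rough illustration of how such log-based correlations are evaluated, the sketch below codes Equation (1) and a ∆logR calculation. The ∆logR constants (0.02, 2.297, 0.1688) are the values commonly quoted from Passey et al. [9]; Equations (3) and (4) are not reproduced in this text, so they should be treated as an assumption. The function names and the sample log readings are hypothetical.

```python
import math


def schmoker_toc_vol(rho_b: float, rho: float) -> float:
    """Equation (1): TOC from the organic-matter-free density rho_b
    and the bulk density rho, both in g/cm^3."""
    return (rho_b - rho) / 1.378


def delta_log_r(r: float, r_baseline: float, dt: float, dt_baseline: float) -> float:
    """Log separation from resistivity (ohm.m) and sonic transit time (us/ft).
    Assumed form: constants as commonly quoted from Passey et al. [9]."""
    return math.log10(r / r_baseline) + 0.02 * (dt - dt_baseline)


def passey_toc(dlogr: float, lom: float) -> float:
    """TOC (wt%) from the log separation and the level of maturity (LOM).
    Assumed form: constants as commonly quoted from Passey et al. [9]."""
    return dlogr * 10 ** (2.297 - 0.1688 * lom)


if __name__ == "__main__":
    # Hypothetical readings, chosen only to exercise the functions.
    print(schmoker_toc_vol(rho_b=2.69, rho=2.50))
    separation = delta_log_r(r=100.0, r_baseline=10.0, dt=90.0, dt_baseline=70.0)
    print(passey_toc(separation, lom=10.0))
```

Both correlations need formation-specific inputs (the organic-free density in the first, the baseline values and LOM in the second), which is one reason they transfer poorly between formations.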
Recent studies focus on estimating the TOC either by improving the accuracy of the ∆logR model [11–13] or by applying machine learning techniques [14–16].
Wang et al. [12] revised the ∆logR model and developed new empirical correlations for TOC estimation in the Devonian shale formation as a function of DR, DT, RHOB, and gamma-ray (GR). In their models, Wang et al. [12] suggested including the GR log to enhance TOC estimation, and they used more common thermal maturity indicators, such as vitrinite reflectance ($R_o$) or $T_{max}$, instead of LOM. This simplifies the use of the Wang et al. [12] models, since the conversion between $T_{max}$ or $R_o$ and LOM is not required, and therefore reduces the practical problems [17]. Equations (5) and (6) are the revised ∆logR models based on the sonic and density logs, respectively, and Equation (7) can be used to estimate the TOC from ∆logR and the gamma-ray log. In these equations, $\Delta t_m$ denotes the matrix sonic transit time (µs/ft), $m$ represents the cementation exponent, $\rho_m$ and $\rho_{baseline}$ are the matrix and baseline densities (g/cm³), where the baseline density corresponds to the $R_{baseline}$ value, α, β, δ, and η are the matrix constants, which differ between formations and must be determined, $T_{max}$ is the maturity indicator (°C), and $GR_{baseline}$ is the baseline gamma-ray value of shale (API). Applying the revised ∆logR models to the Devonian shale formation improved the TOC evaluation, with a coefficient of determination (R²) of more than 0.92 compared with an R² of 0.82 when the original ∆logR model is used.
Applying any of the previously discussed correlations to evaluate TOC in formations other than the one for which they were developed leads to inaccurate predictions. Recently, Mahmoud et al. [18,19] suggested an artificial neural network (ANN)-based correlation for TOC estimation in the Barnett formation using conventional well logs. Later, Elkatatny [20] applied the self-adaptive differential evolution algorithm to optimize the ANN model of Mahmoud et al. [18,19] and was able to improve the model predictability.
In this study, four artificial intelligence (AI) models were developed to estimate TOC based on the Takagi-Sugeno-Kang fuzzy inference system (TSK-FIS), Mamdani fuzzy inference system (M-FIS), functional neural network (FNN), and support vector machine (SVM). These models use conventional well logs of DR, GR, DT, and RHOB collected from the Barnett shale formation.

Different Applications of Artificial Intelligence Techniques
Since the early 1990s, AI techniques have been extensively applied in many scientific and engineering fields, including the petroleum industry. AI is now used by petroleum engineers and geologists to solve problems related to unconventional hydrocarbon resources evaluation [18–20], reservoir characterization [21,22], bubble point pressure evaluation [23], prediction of real-time changes in the rheological parameters of drilling fluids [24,25], optimization of the rate of penetration [26], estimation of rock mechanical parameters [27,28], prediction of pore pressure and fracture pressure [29,30], evaluation of wellbore casing integrity [31,32], hydrocarbon recovery factor estimation [33,34], optimization of drilling hydraulics [35], and others. AI techniques have also been applied successfully in other fields, such as social media [36,37].

Experimental Testing Using Rock-Eval 6
The core samples collected from the Barnett shale (Fort Worth Basin (FWB), North Texas, USA) and the Devonian Duvernay shale (Western Canada Sedimentary Basin (WCSB)) were analyzed for TOC estimation. The collected samples were crushed to less than 63 µm. The weight percentages of the pyrolyzable carbon and pyrolyzable mineral carbon in every sample were first determined by thermally decomposing the sample in the pyrolysis oven. During pyrolysis, the temperature was kept constant at 300 °C for three minutes and then increased at 25 °C/min to reach 650 °C, while the flame ionization detector and infrared cells were used to simultaneously detect the hydrocarbons, CO2, and CO. After that, the weight percentages of the residual carbon and oxidized mineral carbon in every sample were determined by burning the sample in the oxidation oven at 300 °C for 30 seconds, then increasing the temperature to 850 °C at a rate of 25 °C/min, and finally holding the temperature at 850 °C for five minutes. More details about sample preparation procedures and considerations for TOC measurement by Rock-Eval 6 were reported by different authors [38–40].
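As a quick arithmetic check on the temperature programs described above, the following sketch adds up the stated holds and ramps. It covers only the steps given in the text; any further hold or cool-down time in the actual Rock-Eval 6 cycle is not included, and the helper name is hypothetical.

```python
def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Time needed to heat from start_c to end_c at a constant rate."""
    return (end_c - start_c) / rate_c_per_min


# Pyrolysis: 3 min hold at 300 C, then 25 C/min up to 650 C.
pyrolysis_min = 3.0 + ramp_minutes(300.0, 650.0, 25.0)

# Oxidation: 30 s hold at 300 C, 25 C/min up to 850 C, then 5 min hold at 850 C.
oxidation_min = 0.5 + ramp_minutes(300.0, 850.0, 25.0) + 5.0

print(pyrolysis_min, oxidation_min)
```

Under these assumptions the stated pyrolysis steps take 17 minutes and the stated oxidation steps take 27.5 minutes per sample.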

Proposed Methodology
In this study, conventional well logs of DR, GR, DT, and RHOB collected from the Barnett shale are used to train TSK-FIS, M-FIS, FNN, and SVM models to predict the corresponding laboratory-measured TOC. These AI models were chosen because of their proven accuracy in evaluating petroleum- and geology-related parameters. A total of 838 data points of core and log data were collected from the Barnett shale. Figure 1 shows the log data collected from the Barnett shale that were used to develop the models. Different combinations of the design parameters of the AI models were evaluated using nested for loops written in MATLAB. The optimization of the AI models was continued until the minimum average absolute percentage error (AAPE) and the highest coefficient of determination (R²) and correlation coefficient (R) between the predicted and the core-measured TOC were obtained. The trained and optimized AI models were then tested using another set of data from the same well and validated using data points collected from the Devonian shale formation. The TOC predictability of the developed AI models for the validation data collected from the Devonian formation was then compared with that of the Wang et al. [12] sonic- and density-based models summarized in Equations (5)-(7).
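The nested-loop search described above can be sketched as follows. This is an illustrative Python outline, not the authors' MATLAB code: a simple k-nearest-neighbour regressor is swapped in as a stand-in model, the two searched parameters (training fraction and k) are hypothetical, and the data are synthetic.

```python
import random


def aape(actual, predicted):
    """Average absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def knn_predict(train_x, train_y, x, k):
    """Stand-in model: mean target of the k training points nearest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k


def grid_search(xs, ys, train_fractions, k_values):
    """Return (best AAPE, training fraction, k) over all combinations."""
    best = None
    for frac in train_fractions:  # outer loop: training-to-testing ratio
        n_train = int(frac * len(xs))
        tr_x, tr_y = xs[:n_train], ys[:n_train]
        te_x, te_y = xs[n_train:], ys[n_train:]
        for k in k_values:  # inner loop: model design parameter
            preds = [knn_predict(tr_x, tr_y, x, k) for x in te_x]
            score = aape(te_y, preds)
            if best is None or score < best[0]:
                best = (score, frac, k)
    return best


# Synthetic data: a density-like input and a noisy, strictly positive target.
rng = random.Random(0)
xs = [rng.uniform(2.3, 2.7) for _ in range(200)]
ys = [20.0 * (2.75 - x) + rng.gauss(0.0, 0.1) for x in xs]

print(grid_search(xs, ys, train_fractions=[0.6, 0.7, 0.8], k_values=[1, 3, 5]))
```

Scoring on the held-out part of each split, rather than on the training part, is a design choice made here to keep the search from simply rewarding over-fitting; the text does not state which data set the authors scored during optimization.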

Data Description and Preprocessing
Conventional log data of DR, GR, DT, and RHOB and the corresponding actual (laboratory-measured) TOC values collected from the Barnett shale formation were used to train the four AI models considered in this study. Before training, all data were pre-processed to remove unrealistic values and outliers. After pre-processing, 838 data points of the different well logs and their corresponding actual TOC values were found to be valid for model building. Using 545, 545, 587, and 671 of the data points to train the TSK-FIS, M-FIS, FNN, and SVM models, respectively, was found to optimize the performance of the AI models in predicting the TOC. The number of training data points was selected based on the optimization process, as discussed later in this paper. Table 1 compares the statistical features of the training data used to train the four AI models developed in this study. These statistical parameters must be considered when the AI models are applied to estimate the TOC from new data. In this study, before testing and validating the developed AI models, the statistical parameters of the testing and validation data were determined to ensure that these data are within the range of the training data summarized in Table 1.
Table 1. Statistical features of the data used to train the Takagi-Sugeno-Kang fuzzy inference system (TSK-FIS), Mamdani fuzzy inference system (M-FIS), functional neural network (FNN), and support vector machine (SVM) models.
The relative importance of the selected training well log data for the predictability of the TOC values was then studied. Figure 2 shows the relative importance of the different conventional well logs used to train the four AI models with respect to the laboratory-measured TOC values.
As indicated in Figure 2, for the data used to train all AI models, TOC is strongly dependent on RHOB, while it is moderately related to DR, DT, and GR.
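The range check described above, which ensures that new data fall inside the span of the training data, can be sketched as below. The helper names and the sample log values are hypothetical; the actual training ranges are those of Table 1.

```python
def training_ranges(rows):
    """Minimum and maximum of every log across the training rows."""
    keys = rows[0].keys()
    return {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in keys}


def within_ranges(row, ranges):
    """True if every log value of the row lies inside its training range."""
    return all(lo <= row[k] <= hi for k, (lo, hi) in ranges.items())


def screen(rows, ranges):
    """Keep only the rows that are fully inside the training ranges."""
    return [r for r in rows if within_ranges(r, ranges)]


# Hypothetical training and validation rows (DR in ohm.m, GR in API,
# DT in us/ft, RHOB in g/cm^3).
train = [
    {"DR": 10.0, "GR": 80.0, "DT": 70.0, "RHOB": 2.40},
    {"DR": 200.0, "GR": 150.0, "DT": 95.0, "RHOB": 2.65},
]
validation = [
    {"DR": 50.0, "GR": 100.0, "DT": 80.0, "RHOB": 2.50},  # inside all ranges
    {"DR": 50.0, "GR": 100.0, "DT": 80.0, "RHOB": 2.80},  # RHOB above range
]

ranges = training_ranges(train)
print(len(screen(validation, ranges)))
```

Because each model was trained on a different number of points, each would have its own set of ranges, so the same new data set can yield a different number of usable points per model.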

AI Model Development
Four AI models, namely TSK-FIS, M-FIS, FNN, and SVM, were developed in this study to estimate the TOC using conventional well logs of DR, DT, GR, and RHOB. The four conventional well logs used to train the AI models were selected based on their relative importance to the core-measured TOC, as discussed earlier and shown in Figure 2. The selection is also consistent with their reported relationships with TOC. For example, DR is believed to be affected by the presence of kerogen in the source rock [41]; DT decreases with the increase in the TOC [42]; several studies have confirmed that GR could significantly enhance TOC prediction [41,43], although others dispute this relationship [44,45]; and RHOB decreases as the kerogen content, and hence the organic matter in the formation, increases [7]. For these reasons, the four conventional well logs of DR, DT, GR, and RHOB were used to develop the TOC models in this study.
All AI models were optimized for their design parameters and the training-to-testing data ratio. Table 2 summarizes the optimized design parameters of the AI models.
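To make the idea of a TSK-type fuzzy model concrete, the following is a minimal sketch of first-order Takagi-Sugeno-Kang inference, assuming Gaussian membership functions and product firing strengths. The rule parameters are hypothetical placeholders, not the optimized design parameters of Table 2, and the sketch is not the authors' MATLAB implementation.

```python
import math


def gaussian(x, center, sigma):
    """Gaussian membership grade of x for a fuzzy set (center, sigma)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def tsk_predict(inputs, rules):
    """First-order TSK output: firing-strength-weighted mean of linear consequents.

    Each rule is a dict with 'centers' and 'sigmas' (one per input) for the
    antecedent, and 'coeffs' (one per input) plus 'bias' for the consequent.
    """
    weights, outputs = [], []
    for rule in rules:
        strength = 1.0
        for x, c, s in zip(inputs, rule["centers"], rule["sigmas"]):
            strength *= gaussian(x, c, s)  # product of membership grades
        consequent = rule["bias"] + sum(a * x for a, x in zip(rule["coeffs"], inputs))
        weights.append(strength)
        outputs.append(consequent)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for these inputs")
    return sum(w * y for w, y in zip(weights, outputs)) / total


# Two hypothetical rules on a single input (for example, a scaled RHOB log).
rules = [
    {"centers": [0.0], "sigmas": [1.0], "coeffs": [0.0], "bias": 1.0},
    {"centers": [2.0], "sigmas": [1.0], "coeffs": [0.0], "bias": 3.0},
]
print(tsk_predict([1.0], rules))
```

A Mamdani system differs mainly in the consequent: each rule outputs a fuzzy set that is aggregated and defuzzified, rather than a linear function of the inputs.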

Evaluation Criterion
The predictability of the developed AI models for the training, testing, and validation data sets was evaluated based on the average absolute percentage error (AAPE, Equation (8)), the correlation coefficient (R, Equation (9)), the coefficient of determination (R², Equation (10)), and a visual check of the actual and predicted TOC.
In all of these equations, a and m denote the actual and estimated TOC, respectively.
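Since Equations (8)-(10) are not reproduced in this text, the sketch below uses the standard definitions of the three measures as an assumption. R² is taken as the square of the Pearson correlation coefficient, which is consistent with the R and R² pairs reported for the models (for example, R = 0.968 with R² = 0.937). The function names are illustrative.

```python
import math


def aape(actual, predicted):
    """Average absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - m) / a) for a, m in zip(actual, predicted)) / n


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_m = sum(predicted) / n
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_m = sum((m - mean_m) ** 2 for m in predicted)
    if var_a == 0.0 or var_m == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_a * var_m)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be the square of R."""
    return pearson_r(actual, predicted) ** 2


# Hypothetical core-measured and predicted TOC values (wt%).
toc_actual = [1.0, 2.0, 4.0]
toc_predicted = [1.1, 1.8, 4.4]
print(aape(toc_actual, toc_predicted), pearson_r(toc_actual, toc_predicted))
```

AAPE is sensitive to small actual values, so a few low-TOC samples can dominate it, whereas R only measures how well the predictions track the trend of the measurements.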

Application Examples to Barnett and Devonian Shale
The predictability of the four AI models considered in this study was evaluated using data from two different depositional environments. The first formation is the Mississippian Barnett shale, which was considered by the United States Energy Information Administration to be the main hydrocarbon source rock in the FWB [3,46]. In 2011, the proven reserve of this formation was more than 31 trillion cubic feet (TCF), with a cumulative gas production of 8.0 TCF. Several studies, such as Pollastro et al. [46], Romero-Sarmiento et al. [47], and Thomas [48], reported the general geologic information about the Barnett shale. The second formation is the Devonian shale in the WCSB, which is an organic-rich source rock in the Devonian conventional hydrocarbon system [49]. The oil and gas in place in this formation are 61.7 billion barrels and 443 TCF, respectively. According to recent production data, this shale is rich in liquids [50].

Training the AI Models
The AI models considered in this work (TSK-FIS, M-FIS, FNN, and SVM) were trained to optimize their design parameters; the optimum design parameters are summarized in Table 2. Figure 3 compares the predictability of the four optimized AI models for the training data sets. The number of data points used to train each AI model is different. As explained earlier, the training-to-testing data ratio was considered during the model optimization process, and the number of training data points that maximizes the predictability of each model was selected.
Figure 3 shows that the TSK-FIS model predicted the TOC for the training data set with the highest accuracy, with an AAPE of 7.12% and R of 0.968. M-FIS comes second with an AAPE of 7.48% and R of 0.962, followed by the FNN model with an AAPE of 8.05% and R of 0.936, and finally the SVM model with an AAPE of 9.75% and R of 0.933. The visual check of the plots confirms the high accuracy of the four AI models in estimating the TOC for the training data set.
The cross-plots in Figure 4 compare the measured and estimated TOC for the training data set. The narrow scatter of the points reflects the predictability of the models: the TSK-FIS model has the highest R² of 0.937, followed by the M-FIS model with R² = 0.926, the FNN model with R² = 0.876, and finally the SVM model with the lowest R² of 0.871.

Testing the AI Models
The predictability of the four AI models developed in this study was then tested using data collected from the Barnett shale formation. The number of testing data points was selected based on the optimization process, as mentioned earlier. Figure 5 compares the predictability of the AI models for the testing data sets. Visually, the four plots indicate similar predictability for the four models, with minor differences. In terms of AAPE and R, the M-FIS model performed best with 11.10% and 0.933, followed by the TSK-FIS model with 11.20% and 0.918, the FNN model with 11.29% and 0.905, and the SVM model with 11.45% and 0.931, respectively.
The cross-plots in Figure 6 present the correlation between the measured and estimated TOC for the testing data set. The plots indicate a high correlation, with R² equal to 0.870, 0.867, 0.842, and 0.818 for the M-FIS, SVM, TSK-FIS, and FNN models, respectively.

Validating the AI Models
The AI models were validated using unseen data collected from the Devonian shale formation. A total of 22 core-derived TOC data points were collected from the Devonian shale; of these, only 20, 19, 19, and 15 fell within the range of the training data used to develop the TSK-FIS, M-FIS, FNN, and SVM models, respectively. The ranges of the training data are summarized in Table 1. Based on the AAPE and R results shown in Figure 7, the FNN model was the best model, with an AAPE of 12.02% and R of 0.879, followed by the M-FIS model with an AAPE of 13.18% and R of 0.875, then the SVM model with an AAPE of 14.52% and R of 0.860, and finally the TSK-FIS model with an AAPE of 15.62% and R of 0.832. As shown in Figure 7, all AI models are considerably more accurate than the Wang et al. [12] sonic- and density-based models: the Wang sonic-based model (WSBM) predicted the TOC with an AAPE of 34.58% and R of 0.806, while the Wang density-based model (WDBM) predicted the TOC with an AAPE of 49.04% and R of 0.469.
From the results of the training, testing, and validation data, considering the similarity of the evaluation parameters (AAPE and R), and taking into consideration that adding or omitting a few points may change the highest-to-lowest order of the models, we conclude that the four models are equally adequate to estimate the TOC using only the conventional well logs used in this study. Nevertheless, we recommend using the FNN model, as it performed best on the validation data.

Conclusions
In this study, four artificial intelligence (AI) models based on the Takagi-Sugeno-Kang fuzzy inference system, Mamdani fuzzy inference system, functional neural network, and support vector machine were developed to estimate the total organic carbon (TOC) using conventional well logs of deep resistivity, gamma-ray, sonic transit time, and bulk density. The models were developed and tested using data collected from the Barnett shale and then validated using unseen data from the Devonian shale. The optimized AI models showed high predictability of TOC for both formations evaluated in this study. The four models are equally adequate to estimate the TOC using the well logs used in this study. Nevertheless, for the validation (unseen) data considered in this study, the FNN model outperformed the other models in predicting the TOC, with the lowest AAPE and the highest R. All AI models outperformed the Wang models, which were recently developed to evaluate the TOC for the Devonian formation.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.