A Comparison of Machine Learning Algorithms in Predicting Lithofacies: Case Studies from Norway and Kazakhstan

Abstract: Defining distinctive areas of the physical properties of rocks plays an important role in reservoir evaluation and hydrocarbon production, as core data are challenging to obtain from all wells. In this work, we study the evaluation of lithofacies values using machine learning algorithms for classification from various well log data of Kazakhstan and Norway. We also use wavelet-transformed data in the machine learning algorithms to identify geological properties from the well log data. Numerical results are presented for multiple oil and gas reservoir datasets, which contain more than 90 released wells from Norway and 10 wells from the Kazakhstan field. We have compared machine learning algorithms including KNN, Decision Tree, Random Forest, XGBoost, and LightGBM. The model score is evaluated using metrics such as accuracy, Hamming loss, and penalty matrix. In addition, the influence of the dataset features on the prediction is investigated using the machine learning algorithms. The results show that the Random Forest model has the best score among the considered algorithms. In addition, the results are consistent with the outcome of the SHapley Additive exPlanations (SHAP) framework.


Introduction
In many applications, it is important to understand the geological structure of formations from the available data. Geophysicists can define the key features of the complex subsurface, and their experience in identifying lithotypes helps to improve the accuracy of the labels in the well logs. Acquiring such experience requires many hours of work and additional data from different sources such as seismic surveys and cores. One possible solution is to use machine learning algorithms to accelerate accurate prediction in a systematic way. To achieve appropriate accuracy, data-driven algorithms require a large amount of data, which should be used in a balanced way in the training procedure. Traditionally, geophysicists first identify the most common features of a region and then estimate uncommon features from additional well log data, using knowledge of the relationships between lithotypes and logs such as PS, RHOB, and NPHI.
To determine lithotypes, geophysicists work in stages. First, shale and sandstone are determined, often using gamma logs and sometimes PS, RHOB, and NPHI as a control. After these rocks are identified, uncommon lithotypes are isolated only by their characteristic features. The more features (curves) are available, the better the accuracy of determining the lithotypes. An inexperienced geologist without knowledge of the geology of the field may not accurately distinguish similar lithotypes; therefore, trained models can help compensate for a lack of field-specific knowledge.
Machine learning can be an effective tool to enrich geoscience workflows. Geostatistical approaches were proposed in many studies [1][2][3][4] to reduce the uncertainty of subsurface properties using large datasets.
There are several works regarding application of data analysis methods for mining areas [5,6]. The importance of lithofacies detection for uranium mining is discussed and investigated in [7,8] using machine learning algorithms to solve multilabel lithofacies classification. The in situ leaching of uranium requires a better understanding of the permeable and impermeable rock types.
The authors of [9] compared machine learning algorithms from the scikit-learn framework (MLPClassifier, DecisionTreeClassifier, RandomForestClassifier, and SVC) on data from offshore wells. The algorithms were applied to three standard data templates and a practical data template in a lithology classification problem for wells from International Ocean Discovery Program (IODP) Expeditions. The dataset used the lithology subdivision into GP (group GP), G1 (group 1), G2 (group 2), and G3 (group 3). The comparison showed that the multilayer perceptron (MLP) method gave better results in lithology classification for the practical template: lithology of the G2 group.
In [10], the authors proposed using embedded feature selection (EFS) and LightGBM to predict the permeability of a reservoir. EFS selected five features out of 22 (DEPTH, AC, DEN, FMIT, and GR) and achieved an R2 of 0.9457. Commonly used feature selection methods also include filter feature selection (FFS) and wrapper feature selection (WFS). The authors compared EFS with mutual information regression (MIR) as an FFS method and recursive feature elimination (RFE) as a WFS method. LightGBM was likewise compared with Random Forest and XGBoost. The best result was obtained with EFS+LightGBM: R2 of 0.9712 and RMSE of 0.5959.
The authors of [11,12] presented the application of oil production exploration and development data to generate high-performance predictive models and optimal classifications of geology, reservoirs, and fluid characteristics. Deep learning algorithms also have the potential to solve problems in geoscience, in particular lithology classification [13][14][15].
In [16], the authors investigated data preprocessing methods for well logs, such as dimensionality reduction and wavelet analysis, in order to improve the accuracy of the group method of data handling (GMDH) for lithological classification. Wavelet analysis was used to decompose the log signals for the GMDH algorithm. The authors of [17] proposed using the continuous wavelet transform of the well log data to detect geological boundaries. One application of the wavelet coefficients is to measure the boundary strength at an edge. The boundary strength is a measure of the geological thickness of units. In that method, instead of solving a multivariate classification problem, additional features were generated to detect the boundaries of the formations. Multi-element geochemical data taken from 259 drill holes were studied, and the efficiency of the method was shown for data with a maximum depth of 600 m.
In this paper, we investigate the prediction of lithofacies using machine learning algorithms for the geological data of Kazakhstan and Norway. We consider machine learning methods such as KNN, Decision Tree, Random Forest, XGBoost, and LightGBM with and without wavelet-transformed data. Gamma-ray (GR), medium deep reading resistivity measurement (RMED), compressional waves sonic log (DTC), neutron porosity log (NPHI), bulk density log (RHOB), and other logs are considered as the input data of the machine learning models. In addition, the results of the supervised learning are examined with the SHapley Additive exPlanations (SHAP) visualization framework, which indicates the significant well logs. Our research question is the following: how accurately can supervised machine learning algorithms predict lithofacies based on the geophysical well log data from the Norway and Kazakhstan fields?
The rest of the paper is organized as follows. In the next section, we describe the wavelet transformation, data analysis, and machine learning algorithms. Numerical results of algorithms are presented in Section 3. Section 4 concludes the paper.

Methodology
We first describe the wavelet transformation and then the workflow for the machine learning algorithms. Next, the data analysis and data preparation are presented. Finally, we briefly describe the considered machine learning algorithms for supervised multilabel classification.

Wavelet Transformation
We use the Gaussian wavelet transformation for edge detection in geological formations. The negative normalized second derivative of the Gaussian function is also known as the Mexican hat wavelet. Zero crossings of the Mexican hat wavelet transform correspond to inflection points of the smoothed signal and thus mark edges of objects in the signal. Applying the wavelet transformation to a given signal generates new artificial data which can be useful for further analysis.
The physical meaning of the wavelet transform is to calculate the joint energy spectrum of signals in the frequency-time domain and identify both the frequency and time information of the distinct modes [18].
Wavelet transformation decomposes a geophysical log into a combination of signals at different frequencies. It allows determining which frequency bands of the log are noise and which are actual data. It provides a one-to-one mapping of the original log, so we can go back and forth between the original and transformed data.
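The integral wavelet transform of a function $f(x)$ with respect to a mother wavelet $\psi$ is given by

$$W_\psi f(s, \tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} f(x)\, \overline{\psi\left(\frac{x - \tau}{s}\right)}\, dx,$$

where $s > 0$ and $\tau$ are the scale factor and shift, respectively. To create the wavelet transformation, we used the Ricker wavelet, also known as the "Mexican hat wavelet":

$$\psi(t) = \frac{2}{\sqrt{3}\, \pi^{1/4}} \left(1 - t^2\right) e^{-t^2/2}.$$

To illustrate the above explanation, we conduct wavelet transformation of the geophysical logs from Kazakhstan, see Figure 1.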
Figure 1a shows the wavelet transform applied to two logs. To better display the result of the wavelet transformation of logs, we use a log scale in Figure 1b.
Figure 1. Result of applying continuous wavelet transformation: (a) two logs in 1D; (b) one log in 2D.
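As a rough illustration of the transformation described above, the following Python sketch computes a Ricker-wavelet transform of a synthetic step-shaped log with NumPy. The function names, the synthetic signal, the choice of scales, and the truncation of the kernel at five scale widths are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def ricker(t):
    """Ricker (Mexican hat) wavelet with unit width, evaluated at points t."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t**2) * np.exp(-0.5 * t**2)

def cwt_ricker(signal, scales, dx=1.0):
    """Continuous wavelet transform of a 1D log.

    Returns an array of shape (len(scales), len(signal)).
    Scales are given in the same units as the sampling step dx.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    out = np.empty((len(scales), n))
    for k, s in enumerate(scales):
        # Sample psi((x - tau) / s) / sqrt(s) on a window of about +-5 scale widths
        # (assumed truncation), never longer than the signal itself.
        half = int(min(np.ceil(5 * s / dx), (n - 1) // 2))
        t = np.arange(-half, half + 1) * dx
        kernel = ricker(t / s) / np.sqrt(s)
        # The Ricker wavelet is real and symmetric, so correlation equals convolution.
        out[k] = np.convolve(signal, kernel, mode="same") * dx
    return out

# Synthetic "log": a sharp boundary between two units at sample 100.
log = np.concatenate([np.zeros(100), np.ones(100)])
coeffs = cwt_ricker(log, scales=[2.0, 4.0, 8.0])
```

For this step, the coefficients change sign at the boundary and reach their largest magnitude about one scale width to either side of it, which is the behaviour that makes the transform useful for edge detection. Each row of `coeffs` could serve as one additional feature per depth sample.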
We have followed the general workflow of a machine learning classifier which is illustrated in Figure 2. Our process of the classifier model consists of the following steps:
1.
2. Application of the wavelet transformation to generate new features.
3. Finding hyperparameters and construction of machine learning algorithm as a classifier of lithofacies.
4. Training of the model on the well log data with the labeled lithology by geophysicist or geologist.
5. Evaluation of the trained model of classifier according to specified score based on the test dataset.
The initial stage generates new features from the current well logs. Next, the model is trained on the new dataset, which includes the wavelet-transformed well logs. The trained model is evaluated by estimating its accuracy on the test dataset.

Data Analysis
We consider the well log data from an offshore field in the North Sea, near Norway. The study area contains 98 wells with a maximum depth of 5000 m. The dataset consists of interpreted lithofacies and well logs: 22 wireline log curves including gamma ray (GR), medium deep reading resistivity measurement (RMED), compressional waves sonic log (DTC), neutron porosity log (NPHI), bulk density log (RHOB), and others. Digital measurements were recorded at 0.1 m intervals; see Tables 1 and 2 for abbreviations and descriptions of the dataset. For data exploration we use the Cegal library https://github.com/cegaldev/cegaltools, accessed on 22 March 2021, which is a geoscience tool for loading, plotting, and evaluating well log data using Python scripts. It is also an interactive tool to visualize data details and dependencies. Figure 3 shows one well with its logs.

Data Preparation
The dataset contains some missing data. Key reasons for missing data are technical problems during data acquisition, cost optimization during geophysical logging, the human factor, and others. We utilize the missingno library [19] to detect data gaps in the provided dataset. It helps to identify the affected logs and the location of the gaps. In Figure 5, one well is presented whose logs contain missing data, either over the full depth of the well or in some intervals. After careful study and statistical analysis of the logs for missing data, we decided to concentrate on the following logs, which have a smaller percentage of missing data: DEPTH_MD, CALI, RSHA, RMED, RDEP, RHOB, GR, NPHI, PEF, DTC, SP, and BS.
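The selection by percentage of missing data can be sketched in a few lines of Python. This is only a minimal illustration with NumPy, not the missingno-based analysis used in the study: the helper names, the toy values, and the 30% threshold are all assumptions introduced here, since the text does not state a cut-off.

```python
import numpy as np

def missing_fraction(logs):
    """Return {log name: fraction of NaN samples} for a dict of 1D sequences."""
    return {
        name: float(np.mean(np.isnan(np.asarray(values, dtype=float))))
        for name, values in logs.items()
    }

def select_logs(logs, max_missing=0.3):
    """Names of logs whose missing fraction does not exceed max_missing.

    max_missing is a hypothetical threshold chosen for illustration.
    """
    fractions = missing_fraction(logs)
    return [name for name, frac in fractions.items() if frac <= max_missing]

# Toy well with three curves; the values and the gap pattern are made up.
toy_well = {
    "GR": [55.0, 60.2, np.nan, 71.3, 64.8],
    "RHOB": [2.31, 2.35, 2.40, 2.38, 2.36],
    "SGR": [np.nan, np.nan, np.nan, np.nan, 48.0],
}
fractions = missing_fraction(toy_well)
kept = select_logs(toy_well, max_missing=0.3)
```

In this toy well, the curve that is missing over most of the depth range is dropped, while the curves with no gaps or a single gap are kept.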

Algorithms
There are various machine learning algorithms, and each algorithm has its own advantages and disadvantages for solving geoscience problems. In this paper, we made a comparative analysis of five algorithms: K-nearest neighbors (kNN) [20], Decision Tree [21], Random Forest Classifier (RFC) [22], extreme gradient boosting (XGBoost) [23], and LightGBM [24]. They are explored with and without the additional features generated by the wavelet transformation. In this research, we used scikit-learn [25], a Python framework, for the kNN, Decision Tree, and Random Forest classifiers. XGBoost and LightGBM have their own Python frameworks.
K-Nearest Neighbors (kNN) is a machine learning method that has been used for data mining [20]. Each data point has a location in a multidimensional space whose axes are the features of the dataset. The number of neighbors K is tuned on the training dataset; for a new (test) data point, the model finds its K nearest neighbors in the training dataset and assigns the class from them. kNN has the advantage of being nonparametric. The method is sensitive to scale, so standardizing the data is mandatory to eliminate differences in scale. Very large datasets can be an issue, which special methods that reduce the space can mitigate.
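The role of standardization in kNN can be illustrated with a small NumPy sketch. It is not the scikit-learn classifier used in the study; the function names, the toy GR and RHOB values, the class codes, and K = 3 are assumptions made for this example.

```python
import numpy as np
from collections import Counter

def standardize(train, other):
    """Scale both arrays with the mean and standard deviation of the training set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (train - mean) / std, (other - mean) / std

def knn_predict(x_train, y_train, x_new, k=3):
    """Predict the majority label among the k nearest training points (Euclidean)."""
    predictions = []
    for point in x_new:
        distances = np.linalg.norm(x_train - point, axis=1)
        nearest = np.argsort(distances)[:k]
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Toy data: columns are GR (API units) and RHOB (g/cm3); labels are made-up class codes.
x_train = np.array([[30.0, 2.65], [35.0, 2.60], [40.0, 2.62],
                    [120.0, 2.40], [110.0, 2.45], [130.0, 2.42]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x_new = np.array([[38.0, 2.61], [118.0, 2.43]])

x_train_std, x_new_std = standardize(x_train, x_new)
predictions = knn_predict(x_train_std, y_train, x_new_std, k=3)
```

Without the standardization step, the GR column, whose values are roughly two orders of magnitude larger than RHOB, would dominate the Euclidean distance.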
Decision tree methods are data mining methods, and they have been successfully used for classification problems. Decision trees were developed by Morgan and Sonquist in 1963, who applied the algorithm to the determinants of social conditions [21]. One advantage of decision trees is that they are computationally fast and can handle high-dimensional data. On the other hand, the algorithm is greedy and keeps growing the tree deeper, so a single decision tree can overfit the data.
The random forest was introduced by Breiman as an ensemble of tree classifiers [22]. The key idea of the algorithm is to train many decision trees, each on a bootstrap sample of the training dataset with randomly selected features, and to aggregate their predictions. However, a trained forest can contain many trees and thus requires more computational resources.
The main advantage of XGBoost is parallelization. XGBoost is a scalable version of the gradient boosting machine algorithm and has shown efficiency in several machine learning applications [23]. It is an ensemble of classification and regression trees and works for data with nonlinear features. The key idea is to combine weak trees and improve the accuracy at each iteration: the next tree is trained to correct the prediction error of the already trained ensemble.
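The idea of training each new tree on the error of the current ensemble can be shown with a deliberately simple sketch: plain gradient boosting for squared error with one-split trees (stumps) on a single feature. This is not XGBoost itself, which adds regularization, second-order gradients, and parallel split finding; the function names, the toy data, the number of rounds, and the learning rate are assumptions made for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-threshold split of a 1D feature under squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # excluding the maximum keeps both sides non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    return best[1:]

def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Fit stumps sequentially, each one to the residual of the current ensemble."""
    prediction = np.full_like(y, y.mean(), dtype=float)
    stumps = []
    for _ in range(n_rounds):
        threshold, left_value, right_value = fit_stump(x, y - prediction)
        prediction += learning_rate * np.where(x <= threshold, left_value, right_value)
        stumps.append((threshold, left_value, right_value))
    return prediction, stumps

# Toy target: a step function of one feature (made-up data).
x = np.linspace(0.0, 1.0, 20)
y = np.where(x < 0.5, 1.0, 3.0)

prediction, stumps = boost(x, y)
base_mse = float(np.mean((y - y.mean()) ** 2))     # error of the constant model
final_mse = float(np.mean((y - prediction) ** 2))  # error after boosting
```

Each round shrinks the remaining residual, so `final_mse` ends up far below `base_mse`; the learning rate controls how much of each stump's correction is applied.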
LightGBM is a relatively new framework with wide application in machine learning and data science. The main issue of gradient boosting algorithms is that they process all data to evaluate possible split points, which impacts performance. LightGBM modifies the method to improve the search for optimal splits [24].
Based on the training dataset, we calculated the main hyperparameters for Random Forest, see Table 3. The main hyperparameters for XGBoost and LightGBM are presented in Tables 4 and 5, respectively. The prediction performance of the algorithms is evaluated by three statistical quality indicators: Jaccard metrics (accuracy), Hamming loss, and Penalty metrics. The reader is referred to Table 5.
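The Jaccard metric between the true and predicted label sets is computed as

$$J(y, \hat{y}) = \frac{|y \cap \hat{y}|}{|y \cup \hat{y}|}.$$

The Hamming loss is defined as

$$L_{Hamming}(y, \hat{y}) = \frac{1}{n_{labels}} \sum_{j=0}^{n_{labels}-1} 1(\hat{y}_j \neq y_j),$$

where $n_{labels}$ is the number of labels and $1(\cdot)$ is the indicator function. To estimate the accuracy of models, a penalty matrix is used and is derived from the averaged input of a representative sample. This allows petrophysically unreasonable predictions to be scored by a degree of "wrongness".
The score based on the penalty matrix $A$ is defined as follows:

$$S = -\frac{1}{N} \sum_{i=1}^{N} A_{\hat{y}_i y_i}, \quad (6)$$

where $N$ is the number of samples, $\hat{y}_i$ is the true lithology label, and $y_i$ is the predicted lithology label.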
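To make the Hamming loss and the penalty-based score concrete, the sketch below evaluates both on a toy example in which each sample carries a single class label. The 3 x 3 penalty matrix, the labels, and the function names are invented for illustration; the actual penalty matrix used in the study is not reproduced here. The matrix is indexed as [true label, predicted label], following the definition given with Equation (6).

```python
import numpy as np

def hamming_loss(true_labels, predicted_labels):
    """Fraction of samples whose predicted label differs from the true label."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels != predicted_labels))

def penalty_score(true_labels, predicted_labels, penalty):
    """Negative mean penalty: 0 is a perfect score, more negative is worse."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(-np.mean(penalty[true_labels, predicted_labels]))

# Hypothetical penalty matrix for three classes: zero on the diagonal,
# larger values for confusions considered less reasonable.
penalty = np.array([[0.0, 2.0, 3.0],
                    [2.0, 0.0, 1.0],
                    [3.0, 1.0, 0.0]])

true_labels = [0, 1, 2, 2, 1]
predicted_labels = [0, 1, 1, 2, 0]

loss = hamming_loss(true_labels, predicted_labels)             # 2 of 5 labels differ
score = penalty_score(true_labels, predicted_labels, penalty)  # penalties 1 and 2 over 5 samples
```

Two of the five predictions are wrong, so the Hamming loss is 0.4; their penalties (1 and 2) average to 0.6 over all samples, giving a score of −0.6. Unlike the Hamming loss, the score distinguishes a mild confusion from a severe one.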

Results
Computations are performed on a desktop machine (3.2 GHz Intel Core i7 8700 processor) with 32 GB RAM. Hyperparameter tuning and cross-validation are time-consuming; therefore, they are computed in parallel using eight cores.

Lithofacies Prediction for the Norway Data
The comparison of the selected algorithms has been performed on 12 features and on seven additional features generated by wavelet transformation, for a total of 19 features. Table 6 shows the scores of the models on the test dataset by the Jaccard metrics (accuracy), Hamming loss, and Penalty metrics. We observe that the RFC has the highest score on the test set, with Accuracy, Penalty matrix, and Hamming loss of 0.948, −0.1289, and 0.0473, respectively. Thus, RFC was selected for a detailed analysis of lithofacies classification. The classification report for the RFC model (12 features) can be found in Table 7. Evaluating the precision values in Table 7, we noticed that the lowest values were computed for Dolomite (4) and Coal (10). A reason for such values could be the lack of representation of these lithofacies classes in the dataset.
To understand the good accuracy of the RFC model for lithology classification, we use the SHAP package to verify the results, which are consistent with another study [26]. SHAP is a good tool for explaining different models, and it provides an importance value for each feature. SHAP builds an explanatory model for a single row-prediction pair to explain the result of a prediction. The SHAP value of a feature is calculated by averaging its marginal contribution over all possible subsets of the other features.
SHAP does not enable us to determine the probabilities of predicted classes in the multi-label classification. The explanation models (tree and kernel) cannot output probabilities due to the constraint associated with nonlinear transformations, but they provide the raw margin values of the objective function which fit the model. Figure 6 shows the global importance for 12 classes, which was calculated as the average of absolute SHAP values. SHAP ranks the input features by the mean absolute SHAP value; its magnitude indicates the importance of the feature in the prediction of a certain class (higher means more influential). The GR feature influences the model prediction in all lithology classes; other features have less influence than GR.
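The global importance described above (the average of absolute SHAP values) can be sketched as follows. The example does not call the SHAP package; it starts from a small made-up array of SHAP values with an assumed layout of (classes, samples, features), and the function names and numbers are illustrative only.

```python
import numpy as np

def global_importance(shap_values):
    """Mean absolute SHAP value per class and feature.

    shap_values: array of shape (n_classes, n_samples, n_features).
    Returns an array of shape (n_classes, n_features).
    """
    return np.mean(np.abs(np.asarray(shap_values, dtype=float)), axis=1)

def rank_features(importance, feature_names):
    """Feature names sorted by importance summed over classes, highest first."""
    total = importance.sum(axis=0)
    order = np.argsort(total)[::-1]
    return [feature_names[i] for i in order]

feature_names = ["GR", "RHOB", "NPHI"]

# Made-up SHAP values for two classes, two samples, and three features.
toy_shap = np.array([
    [[0.40, -0.10, 0.05], [-0.30, 0.20, -0.05]],  # class 0
    [[-0.50, 0.05, 0.10], [0.60, -0.15, 0.00]],   # class 1
])

importance = global_importance(toy_shap)
ranking = rank_features(importance, feature_names)
```

Taking absolute values before averaging matters: a feature that pushes some samples toward a class and others away from it would otherwise cancel out and appear unimportant.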

Lithofacies Prediction for the Kazakhstan Data
We carried out numerical experiments in the above-mentioned way for wells in the Kazakhstan oil and gas field; the study area contains 10 wells with a maximum depth of 1700 m. The lithology of the field primarily consists of clay, coal, limestone, dolomite, and sand. The data contain well logs such as thermal neutron porosity, caliper, gamma-ray, temperature, resistivity, sonic, and others. Well log measurements were recorded at every foot of the logged formation.
The data were split into train and test datasets of 75% and 25%, respectively. In Figure 8, the distribution of lithologic types for the train and test datasets is presented in log scale; the distributions have a similar shape. The total dataset has 59,423 rows and 23 features, the train dataset contains 47,538 rows and 23 features, and the test dataset contains 11,885 rows and 23 features.
Based on the result for the Norway dataset, where it showed a better result on three metrics, we used the Random Forest Classifier for the data from the Kazakhstan field. Table 8 gives three scores that summarize the performance of the Random Forest Classifier on the test dataset for the different lithofacies types. The Random Forest Classifier shows a precise result here as well. Class 2 (dolomite) was not precisely predicted, see Table 9. The reason for such values can be an imbalanced dataset.
Figure 9 shows the global importance for five classes. The PHIE (prediction of effective porosity) and PHIT (prediction of total porosity) features influence the model prediction in the Clay (3) and Sand (0) classes.

Conclusions
This paper analyzes supervised learning algorithms for the well log data from Norway and Kazakhstan with and without the additional wavelet-transformed features. Our focus was on the data of offshore and onshore reservoirs. The findings suggest that our fitted Random Forest model shows the best results among the considered algorithms. The cross-validation methodology was applied in the machine learning models. Machine learning algorithms, in particular the Random Forest method, can be integrated into specific geophysical software to perform lithology classification automatically based on well logs, without using additional information such as sludge or core samples. This process can improve the efficiency of solving some geophysical interpretation problems.
The considered methods (kNN, Decision Tree, Random Forest, etc.) proved to be a good set of methods for the well log data, as they enable solving the nonlinear problem of lithological classification. The Random Forest model has an accuracy of 0.948, a penalty matrix score of −0.1289, and a Hamming loss of 0.0473 for 12 features, and an accuracy of 0.938, a penalty matrix score of −0.1697, and a Hamming loss of 0.0624 for 19 features, including the features generated from the wavelet transformation of the data. The scores of the algorithms trained on the data together with the wavelet-transformed data are similar to the scores of the algorithms trained only on the data without wavelet transformation. However, we believe that such additional features could help with different problems (regression) in geoscience, such as the identification of permeability or porosity.
We used the SHAP framework to explore the impact of features on the targeted classification and to detect the complex relationships between features. The SHAP results for our dataset showed that the significant features for the prediction of some lithology classes were GR, DTC, and RHOB. However, some classes such as Tuff and Coal can be detected by other features (NPHI and RDEP).
In our future research, we intend to concentrate on deep learning algorithms such as 1D-CNN, LSTM, and RNN for multi-label lithofacies classification and for the prediction of porosity and permeability using the well log data.