A Data-Driven Approach for Lithology Identification Based on Parameter-Optimized Ensemble Learning

Abstract: The identification of underground formation lithology can serve as a basis for petroleum exploration and development. This study integrates Extreme Gradient Boosting (XGBoost) with Bayesian Optimization (BO) for formation lithology identification and comprehensively evaluates the performance of the proposed classifier using the confusion matrix, precision, recall, F1-score and the area under the receiver operating characteristic curve (AUC). The data of this study are derived from the Daniudi gas field and the Hangjinqi gas field and include 2153 samples with known lithology facies class, each sample having seven measured properties (well log curves) and the corresponding depth. The results show that BO significantly improves the efficiency of parameter optimization. The AUC values of the test sets of the two gas fields are 0.968 and 0.987, respectively, indicating that the proposed method has very high generalization performance. Additionally, we compare the proposed algorithm with Gradient Tree Boosting-Differential Evolution (GTB-DE) using the same dataset. The results demonstrate that the averages of precision, recall and F1 score of the proposed method are 4.85%, 5.7% and 3.25% greater than those of GTB-DE, respectively. The proposed XGBoost-BO ensemble model can automate the procedure of lithology identification, and it may also be used in the prediction of other reservoir properties.


Introduction
Lithology identification plays a crucial role in the exploration of oil and gas reservoirs, reservoir modelling, drilling planning and well completion management. Lithology classification is the basis of reservoir characterization and geological analysis. Once the lithology is known, lithological patterns can be generated; such patterns can be applied in simulators to understand the potential of an oil field. Apart from its significance in geological studies, lithology identification also has practical value in enhanced oil recovery processes and well planning [1][2][3][4].
Much research has been done on lithology identification, and the methods are mainly divided into two categories: direct and indirect. The direct methods determine the lithology by making inferences from core analysis and drilling cuttings [5,6]. However, they are expensive, time-consuming and not always reliable, since different geologists may provide different interpretations [7]. Compared with the direct methods, using well logs to identify lithology is a universal indirect method that is more accurate, effective and economical. Until now, there have been many lithology identification methods associated with logs, including the cross plotting method, traditional statistical analysis and several machine learning methods [8][9][10]. Cross plotting takes a considerable amount of an experienced analyst's time, especially in heterogeneous reservoirs. Due to the high-dimensional and non-linear relationships between well logs and lithology, machine learning methods are particularly well suited to this task.

Data Description and Preprocessing

The relevant research data are collected from two gas fields in the Ordos Basin, namely, the Daniudi gas field (DGF) and the Hangjinqi gas field (HGF) [19]. The dataset we used is log data from twelve wells (with 2153 examples) that have been labelled with a lithology type based on observation of core. This dataset consists of a set of eight predictor variables and a lithology class for each example vector. The predictor variables include seven well log measurements and the corresponding depth. The seven log properties are: acoustic log (AC), caliper log (CAL), gamma ray log (GR), deep laterolog (LLD), shallow laterolog (LLS), density log (DEN) and compensated neutron log (CNL). The sampling interval of the well logs is 0.125 m.
Due to the complex geological structure and tectonic development of the target layer, the real-time logging data captured by sensors are noisy, which causes deviations in the measurement results. Therefore, data preprocessing is required before applying a machine learning algorithm.
Using Tukey's method to detect outliers is a standard approach to noise reduction. In this method, the sequence is divided into four parts by finding the first quartile $Q_1$ and the third quartile $Q_3$ [22]. The interquartile range (IQR) is calculated as $IQR = Q_3 - Q_1$, and the Lower Inner Fence (LIF) and Upper Inner Fence (UIF) are given by $LIF = Q_1 - 1.5 \times IQR$ and $UIF = Q_3 + 1.5 \times IQR$.
All the values outside of the LIF and UIF are considered outliers and are eliminated. The resulting data are summarized in Table 1.
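For illustration, a minimal sketch of this outlier screening is given below (assuming the pandas library and a DataFrame holding the log curves; this is not the authors' original code):

```python
# Minimal sketch of Tukey's fences applied to log curves (illustrative, not the authors' code).
import pandas as pd

def remove_tukey_outliers(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Drop rows falling outside the inner fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        lif, uif = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        mask &= df[col].between(lif, uif)
    return df[mask]

# Synthetic values standing in for a real GR curve; the 310.0 spike is removed.
logs = pd.DataFrame({"GR": [75.0, 80.2, 78.9, 310.0, 82.4, 77.1]})
clean = remove_tukey_outliers(logs, ["GR"])
print(clean)
```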
The core analyses report that the lithology classes are composed of clastic rocks, carbonate rock (CR) and coal (C). According to the detrital grain size classification of the Chinese oil industry [23], the clastic rocks in this research are divided into six types, namely, pebbly sandstone (PS) (>1 mm grain size), coarse sandstone (CS) (0.5-1 mm grain size), medium sandstone (MS) (0.25-0.5 mm grain size), fine sandstone (FS) (0.01-0.25 mm grain size), siltstone (S) (0.005-0.05 mm grain size), and mudstone (M) (<0.005 mm grain size). The number of samples in each class for the research areas is shown in Figure 2.


Extreme Gradient Boosting
The XGBoost algorithm is a boosting algorithm that generates multiple weak learners by continuously fitting residuals, and it belongs to the family of ensemble learning. The accumulation of weak learners ultimately produces a strong learner. In the optimization process, XGBoost uses a Taylor expansion to introduce second-derivative information into the loss function, so the model converges faster. Besides, to prevent overfitting, XGBoost adds regularization terms to the loss function to restrain the complexity of the model [24].
Assume $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$) is a dataset containing $n$ samples, each with $d$ features, where $x_i$ represents the feature vector of sample $i$. The classification and regression tree (CART) is selected as the base model; the integrated XGBoost model then combines $K$ base models (trees) into an additive expression to predict the final result [25]:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}$$

where $\mathcal{F}$ is the space of CART trees.
The prediction precision of the model is determined jointly by bias and variance, and the loss function reflects the bias of the model. To control the variance and simplify the model, regularization terms are added to restrain the model complexity. Together with the loss function, the objective function of XGBoost is constituted as follows:
$$Obj^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where $Obj^{(t)}$ is the objective function of tree $t$, $\hat{y}_i^{(t-1)}$ is the summation of the output values of the previous $t-1$ trees, $f_t(x_i)$ is the output value of tree $t$, and $l$ is the differentiable convex loss function that balances the predicted value $\hat{y}_i$ and the true value $y_i$. $\Omega$ is the penalty term representing model complexity, $\gamma$ is the regularization parameter on the number of leaves $T$, $\lambda$ is the regularization parameter on the leaf weights, and $w$ is the value of a leaf node. Define $I_j = \{ i \mid q(x_i) = j \}$ as the sample set of leaf $j$, expand the loss function with a Taylor series at $\hat{y}_i^{(t-1)}$, let $g_i$ and $h_i$ be the first and second derivatives in the Taylor expansion, respectively, and remove the constant term; the objective function after $t$ iterations then becomes

$$Obj^{(t)} \approx \sum_{j=1}^{T} \Big[ \Big( \sum_{i \in I_j} g_i \Big) w_j + \frac{1}{2} \Big( \sum_{i \in I_j} h_i + \lambda \Big) w_j^2 \Big] + \gamma T$$

where $w_j$ is the weight of leaf node $j$. Define $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$ and substitute them into the expression above, so the objective function can be simplified as

$$Obj^{(t)} = \sum_{j=1}^{T} \Big[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \Big] + \gamma T$$

In the above equation the leaf weight $w_j$ is still undetermined, so taking the first derivative of $Obj^{(t)}$ with respect to $w_j$ and setting it to zero yields the optimal value of leaf node $j$:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}$$
Substituting $w_j^{*}$ back into the objective function, the minimum value of $Obj^{(t)}$ can be calculated as

$$Obj^{(t)}_{\min} = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T$$
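As a quick numerical illustration of the expressions above (a toy example, not taken from the study's dataset), the optimal leaf weights and the minimum objective for a two-leaf tree can be computed as follows:

```python
# Toy check of the optimal leaf weight w_j* = -G_j / (H_j + lambda) and the
# minimum objective -0.5 * sum(G_j^2 / (H_j + lambda)) + gamma * T (illustrative values).
import numpy as np

G = np.array([4.0, -2.5])      # per-leaf sums of first-order gradients G_j
H = np.array([3.0, 1.5])       # per-leaf sums of second-order gradients H_j
lam, gamma = 1.0, 0.1          # regularization parameters lambda and gamma

w_opt = -G / (H + lam)
obj_min = -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * len(G)
print(w_opt, obj_min)          # [-1.0, 1.0] and -3.05
```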

Bayesian Optimization
Unlike other optimization algorithms, BO constructs a probability model of the function to be optimized and utilizes the model to determine the next point that requires evaluation. The algorithm assumes a prior distribution over the objective function, and the newly sampled points are used to update this prior with information about the objective; the most promising locations for the global optimum (given by the posterior distribution) are then tested. Although the BO algorithm needs more computation to determine the next sampling point, far fewer evaluations are required to find the optimum of a complex non-convex function [26].
There are two important procedures when operating the BO algorithm. First, a prior needs to be selected to express the distributional assumption about the optimized function; this step usually utilizes the Gaussian Process, which is highly flexible and tractable. Then, an acquisition function is required to construct a utility function from the model posterior distribution, so that the next point to evaluate can be determined.
A Gaussian Process refers to a collection of random variables, which in this paper represent different parameter combinations of the XGBoost algorithm; any finite subset of these variables follows a joint Gaussian distribution [27].
A Gaussian Process is fully specified by a mean function $m(x)$ and a covariance function $k(x, x')$:

$$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big)$$

where $m(x)$ is the mean function of $x$ and $k(x, x')$ is the covariance function. The quantity modelled here is the mean absolute error (MAE) of $f(x)$, the formula of which is

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \big| h(x_i) - y_i \big|$$

In the formula, $h(x_i)$ refers to the predicted output of XGBoost and $y_i$ represents the true value under the current influence factors. One characteristic of a Gaussian Process is that every $x$ is associated with $f(x)$, and for a set $X = \{x_1, x_2, \ldots, x_t\}$ the function values follow a joint Gaussian distribution:

$$f(x_{1:t}) \sim \mathcal{N}\big(m(x_{1:t}), \mathbf{K}\big), \qquad \mathbf{K} = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_t) \\ \vdots & \ddots & \vdots \\ k(x_t, x_1) & \cdots & k(x_t, x_t) \end{bmatrix}$$

If a new sample $x_{t+1}$ is added to the known samples $\{x_{1:t}, f_{1:t}\}$, the joint distribution becomes

$$\begin{bmatrix} f_{1:t} \\ f_{t+1} \end{bmatrix} \sim \mathcal{N}\left( \mathbf{0}, \begin{bmatrix} \mathbf{K} & \mathbf{k} \\ \mathbf{k}^{T} & k(x_{t+1}, x_{t+1}) \end{bmatrix} \right), \qquad \mathbf{k} = \big[ k(x_{t+1}, x_1), \ldots, k(x_{t+1}, x_t) \big]^{T}$$

and the posterior of $f_{t+1}$ is

$$P\big(f_{t+1} \mid f_{1:t}, x_{t+1}\big) = \mathcal{N}\big(\mu_{t+1}(x_{t+1}), \sigma^{2}_{t+1}(x_{t+1})\big)$$

with $\mu_{t+1}(x_{t+1}) = \mathbf{k}^{T}\mathbf{K}^{-1} f_{1:t}$ and $\sigma^{2}_{t+1}(x_{t+1}) = k(x_{t+1}, x_{t+1}) - \mathbf{k}^{T}\mathbf{K}^{-1}\mathbf{k}$. After the sampled parameter combinations are fed into the Gaussian model, the mean value and variance of $f(x)$ can be obtained. As the number of sample points increases, the gap between the posterior and the true value of $f(x)$ narrows considerably. The purpose of the acquisition function is to select sampling points in the parameter space purposefully, and there are two directions for selecting those points [28].
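A small numerical sketch of this posterior update is given below (assuming a squared-exponential covariance function and NumPy; the values are illustrative and not from the study):

```python
# Gaussian-Process posterior mean/variance at a new point (illustrative sketch).
import numpy as np

def kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance k(a, b) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

x_obs = np.array([0.1, 0.4, 0.7])      # already-evaluated hyperparameter settings (toy 1-D case)
f_obs = np.array([0.32, 0.21, 0.27])   # observed objective values, e.g. cross-validation error
x_new = np.array([0.55])               # candidate point x_{t+1}

K = kernel(x_obs, x_obs) + 1e-8 * np.eye(len(x_obs))   # K with a small jitter for stability
k_vec = kernel(x_obs, x_new)                           # k = [k(x_i, x_{t+1})]
K_inv = np.linalg.inv(K)

mu = (k_vec.T @ K_inv @ f_obs).item()                          # posterior mean  k^T K^-1 f
var = (kernel(x_new, x_new) - k_vec.T @ K_inv @ k_vec).item()  # posterior variance
print(mu, var)
```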
(1) Explore: select parameter combinations that have not yet been evaluated as much as possible, to avoid local optima, so that the posterior of $f(x)$ approaches the true $f(x)$. (2) Exploit: search around the best values found so far, so that the global optimum can be located faster.
The Probability of Improvement (POI) function is chosen as the acquisition function. The basic idea of this method is to maximize the probability that the next selected point improves on the current maximum. If the currently found maximum is $f(x^{+})$, the acquisition function is

$$\mathrm{POI}(x) = P\big(f(x) \geq f(x^{+})\big) = \Phi\!\left(\frac{\mu(x) - f(x^{+})}{\sigma(x)}\right)$$

in which $\Phi$ refers to the normal cumulative distribution function and $\mu(x)$ and $\sigma(x)$ are the posterior mean and standard deviation of the Gaussian Process. This criterion is also called maximum probability of improvement (MPI).
From the formula above, it can be seen that the acquisition function tends to select parameter combinations around the known optimal values. To make the algorithm explore as much of the unknown space as possible, a trade-off coefficient $\varepsilon$ is added:

$$\mathrm{POI}(x) = \Phi\!\left(\frac{\mu(x) - f(x^{+}) - \varepsilon}{\sigma(x)}\right)$$

In this way, searching only in the neighbourhood of $f(x^{+})$ is avoided, and $\varepsilon$ can be altered dynamically to control the search direction: either towards exploration or towards exploitation.
The BO algorithm aims to find the $x$ in the set $S$ that makes the unknown function $f(x)$ reach its global maximum (or minimum). The selection of $x$ follows

$$x^{*} = \underset{x \in S}{\arg\max}\, f(x)$$
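A minimal sketch of the acquisition rule described above is shown below (assuming SciPy; $\mu$ and $\sigma$ would come from the Gaussian Process posterior, and the numbers here are purely illustrative):

```python
# Probability of Improvement with a trade-off coefficient epsilon (illustrative sketch).
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best, eps=0.01):
    """POI(x) = Phi((mu(x) - f(x+) - eps) / sigma(x)); larger means more promising."""
    sigma = np.maximum(sigma, 1e-12)      # guard against zero predictive deviation
    return norm.cdf((mu - f_best - eps) / sigma)

# Posterior mean/std at three candidate hyperparameter settings; current best accuracy 0.80.
mu = np.array([0.79, 0.82, 0.81])
sigma = np.array([0.05, 0.02, 0.10])
print(probability_of_improvement(mu, sigma, f_best=0.80))
```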


Details of Implementation
The dataset was randomly split into a training set and a test set through cross-validation. The iteration process of our proposed ensemble algorithm is shown in Figure 3. The first step is to randomly generate initialization points based on the number and ranges of the XGBoost hyperparameters. The next step is to input the initialization parameters into the Gaussian Process and update it. Then the acquisition function is used to select the parameter combination to be evaluated from the updated Gaussian Process. If the error of the newly chosen parameter combination meets the target requirement, the algorithm terminates and the corresponding parameter combination is output. We use this set of parameters to obtain the trained classifier and evaluate it on the test set. If the target requirement is not met, the Gaussian Process is updated again until the stopping condition is satisfied.
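A minimal sketch of this loop is given below. It assumes the xgboost, scikit-learn and scikit-optimize packages (the study does not state which BO implementation was used), and synthetic data stand in for the preprocessed well-log features; the parameter ranges follow Table 2.

```python
# Illustrative BO-XGBoost tuning loop (not the authors' code).
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Stand-in for the eight predictor variables (seven logs + depth) and lithology labels.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)

search_space = [
    Integer(10, 100, name="n_estimators"),
    Integer(1, 20, name="max_depth"),
    Real(1e-3, 5e-1, name="learning_rate"),
    Real(0.1, 1.0, name="subsample"),
    Integer(1, 10, name="min_child_weight"),
    Real(0.1, 0.6, name="gamma"),
    Real(0.5, 1.0, name="colsample_bytree"),
    Real(1e-5, 1e-2, name="reg_alpha"),
]

def objective(params):
    """Cross-validated error for one hyperparameter combination (to be minimized)."""
    (n_estimators, max_depth, learning_rate, subsample,
     min_child_weight, gamma, colsample_bytree, reg_alpha) = params
    clf = XGBClassifier(n_estimators=int(n_estimators), max_depth=int(max_depth),
                        learning_rate=float(learning_rate), subsample=float(subsample),
                        min_child_weight=float(min_child_weight), gamma=float(gamma),
                        colsample_bytree=float(colsample_bytree),
                        reg_alpha=float(reg_alpha))
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    return 1.0 - acc                       # gp_minimize minimizes, so negate accuracy

# 300 evaluations, matching the "max eval" reported in the paper (reduce for a quick test).
result = gp_minimize(objective, search_space, n_calls=300, random_state=0)
print("best CV accuracy:", 1.0 - result.fun)
print("best parameters:", result.x)
```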


Performance Measure
Evaluating the generalization performance of this lithology classifier requires not only an effective and feasible experimental estimation method, but also a standard for measuring the generalization ability of the model, i.e., the performance measure. Each task is measured with different metrics, depending on the task, and when comparing the capabilities of different lithology classifiers, different performance metrics often lead to different judgments.
In order to evaluate the performance of our model more comprehensively, we used the following metrics: accuracy, confusion matrix, precision, recall, F1-score, and the area under the receiver operating characteristic curve.
The accuracy measures the percentage of correct classifications by comparing the predicted classes with those obtained by the manual method:

$$\mathrm{Accuracy} = \frac{1}{n}\sum_{i=1}^{n} I\big(f(x_i) = y_i\big)$$

where $f(x_i)$ is the predicted lithology class of test sample $i$ and $y_i$ is the correct class of this sample; $I = 1$ if $f(x_i) = y_i$ and $I = 0$ otherwise. According to the combination of the true lithology class and the predicted class, each sample falls into one of four cases: true positive (TP), false positive (FP), true negative (TN) or false negative (FN).
The recall is defined as

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

The recall indicates the percentage of actual positive samples that are correctly classified as positive. The precision is defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

The precision measures the proportion of actual positive samples among the samples that are predicted to be positive. The F1 score is the harmonic mean of recall and precision and can be calculated as

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

The Receiver Operating Characteristic (ROC) curve is used to evaluate generalization performance, and the area under the curve is the AUC. The ROC curve is acquired from the true positive rate (TPR) and the false positive rate (FPR) at assorted discrimination levels [29]. The TPR and FPR can be written as

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
The AUC value varies from 0.5 to 1, and different standards apply in different situations. In medical diagnosis, a very high AUC (0.95 or higher) is usually required, while other domains consider an AUC of 0.7 to indicate a strong effect [30].
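For reference, the metrics above can be computed with scikit-learn as in the following sketch (illustrative labels and class probabilities; macro averaging over the lithology classes is an assumption about how the per-class scores were aggregated):

```python
# Illustrative computation of the evaluation metrics (not the authors' code).
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, precision_score,
                             recall_score, f1_score, roc_auc_score)

y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])       # lithology class labels (toy values)
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2])       # predicted classes
y_prob = np.array([[0.8, 0.1, 0.1],                # predicted class probabilities
                   [0.1, 0.7, 0.2],
                   [0.1, 0.2, 0.7],
                   [0.2, 0.5, 0.3],
                   [0.2, 0.6, 0.2],
                   [0.7, 0.2, 0.1],
                   [0.1, 0.1, 0.8],
                   [0.2, 0.3, 0.5]])

print(accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
# Macro averaging treats each lithology class equally (assumed aggregation scheme).
print(precision_score(y_true, y_pred, average="macro"))
print(recall_score(y_true, y_pred, average="macro"))
print(f1_score(y_true, y_pred, average="macro"))
# One-vs-rest ROC curve per class, averaged to a single AUC per field.
print(roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"))
```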

Results and Discussion
This section comprehensively evaluates the lithology identification results of XGBoost combined with BO. First, the search process and the best parameter values found by BO for XGBoost are presented. Then, the evaluation matrix containing accuracy, confusion matrix, precision, recall, F1-score and AUC is assessed for the model trained with the best hyperparameters of the XGBoost classifier. Finally, we compare our results with previous studies on the two areas using the same data.

Tuning Process
The optimum parameter settings used in BO for XGBoost model selection are shown in Table 2. The number of estimators was randomly chosen in the interval [10, 100]. The max depth was randomly chosen in the interval [1, 20]. The learning rate was chosen from a uniform distribution ranging from 1 × 10−3 to 5 × 10−1. The subsample was randomly chosen in the interval [0.1, 1]. The min child weight was randomly chosen in the interval [1, 10]. The gamma was randomly chosen in the interval [0.1, 0.6]. The colsample bytree was randomly chosen in the interval [0.5, 1]. The reg alpha was chosen from a uniform distribution ranging from 1 × 10−5 to 1 × 10−2. The objective function is accuracy, and the maximum number of evaluations (max eval) of the optimization function is set to 300. We use the data from the DGF and the HGF to determine the parameters for each gas field. The tuned optimum parameters are shown in Table 2.

To appraise the efficiency of the proposed approach, consider four levels for each of the eight parameters. The grid search technique then generates 4^8 = 65,536 candidate settings, while BO requires 300 model evaluations, representing roughly 0.45% of the budget of the grid search technique. In order to reflect the Bayesian Optimization process more intuitively, taking the HGF data as an example, the tuning process of the XGBoost algorithm parameters by BO is shown in Figure 4.
BO makes full use of the information from previous sample points during the parameter tuning process, and the eight parameters reach their optimum values within relatively few iterations. The search range of max depth is relatively uniform, due to the narrow range of this parameter. The first half of the search for the number of estimators is relatively dispersed, while the second half is more concentrated. The search range of the learning rate is mainly from 0 to 0.3. Moreover, the search range of subsample is mainly from 0.8 to 1, while that of min child weight is from 0 to 0.5. The search range of gamma is relatively uniform, while that of colsample bytree mainly lies between 0.5 and 0.7. For reg alpha, the search near the boundary is sparse. Generally, BO initializes each parameter over its whole range; after acquiring more feedback from the objective function over time, the search concentrates on the most promising range. Although BO still searches the whole space, it does so less frequently than before.

Evaluation Matrix
To acquire both steady and reliable results, the learning dataset was divided into a training set and a test set using cross-validation, and the average of 10 independent runs was taken as the final result. We then averaged the scores, including precision, recall and F1-score, to evaluate the performance of the model. Tables 3 and 4 show the scores of each lithology class for the model in the DGF and HGF, respectively.

Table 3. Performance matrix of each lithology class for 5-fold cross-validation averaged over ten runs in the DGF.
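A minimal sketch of this repeated cross-validation protocol is given below (assuming scikit-learn and xgboost, with synthetic data and illustrative tuned parameters standing in for the real ones):

```python
# Illustrative 5-fold cross-validation averaged over 10 independent runs (not the authors' code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)   # stand-in for the log data
clf = XGBClassifier(n_estimators=50, max_depth=5)          # illustrative tuned parameters

run_means = []
for run in range(10):                                       # 10 independent runs
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=run)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="f1_macro")
    run_means.append(scores.mean())

print(np.mean(run_means))                                   # final averaged score
```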

The number of samples in several lithology classes of the DGF is relatively small, which may increase the risk of misclassification of lithology in the DGF. In addition, the classification of lithology C is the best in both gas fields; the main reason is that the lithology of C differs markedly from the other lithologies. At the same time, the classification of the CS lithology is the worst in the DGF but performs better in the HGF. This may be because the number of CS samples in the HGF is about twice that in the DGF, so the XGBoost algorithm can learn more features from the samples, thus enhancing the accuracy of the results.

Table 4. Performance matrix of each lithology class for 5-fold cross-validation averaged over ten runs in the HGF.

In the case of the DGF test set, the accuracy of M, C, and CR is the highest, probably because their lithology differences are significant compared with the others. Four percent of M samples were misclassified as CR due to the depositional environment of lake sediments: this type of M is carbonate-like mud, which contains a portion of CaCO3, resulting in misclassification as CR. Besides, there is a higher probability of misclassification between PS, CS, FS, MS and S. It is likely that the grain sizes of sandstone are challenging to determine, and human error is involved in the interpretation of lithology.

Overall, for the HGF test set, the accuracy of the M and C classes is higher than 90%; in particular, the accuracy of class C reaches 100%. This may be because the lithology properties of C and M are quite different from those of the other classes. Mistakes are again mainly concentrated in the sandstone classes. Although there is significant variance between M and the other lithologies, 6% of M samples are still misclassified as PS. This is likely because the learning sample is imbalanced: the PS class has the highest proportion in the HGF. There is no such misclassification in the DGF, since the numbers of samples in M and PS are very close.
In order to evaluate the expected generalization performance of the proposed algorithm, the ROC curve is used, as shown in Figure 6. XGBoost can be used to calculate the TPR and FPR of the lithology to be tested at various thresholds, thereby drawing a ROC curve. In this way, a total of eight ROC curves can be drawn in the DGF, and seven ROC curves in the HGF. The ROC curves of each gas field are averaged to obtain the final ROC curve and AUC value corresponding to that gas field. The diagonal line in the figure corresponds to the random guess model, that is, an AUC value of 0.5, while an AUC value of 1 represents the ideal model. The AUC of the HGF is 0.987, and the AUC of the DGF is 0.968. The performance on the HGF is better than on the DGF, and both are greater than 0.95, so the proposed algorithm has a strong generalization ability in these two gas fields.

Figure 7 shows point plots comparing the results found for lithology identification with those available in the references. The dataset of this article is the same as in the references. Xie et al. used five machine learning methods to identify lithology and concluded that Gradient Tree Boosting, which belongs to the ensemble learning methods, had the best performance for formation lithology identification [19]. Saporetti et al. combined Gradient Tree Boosting with differential evolution to identify lithology [31].
The scatter plot represents the average value of the calculated results. It can be seen from Figure 7 that the three evaluation indexes of precision, recall and F1 score of the proposed method are significantly greater than those of the references. In the DGF, there is little difference between the three indicators in the references. The proposed method improves precision the most, with an improvement significantly higher than the others, while the recall improvement is relatively minimal. In the HGF, the increases in precision and recall are similar, and both are higher than that of the F1 score. All three indexes show significant improvements in the HGF; precision and recall have the biggest improvements and are much closer to each other. Compared with the DGF, the improvement of our proposed model is much more significant, probably because the sample dataset in the HGF is larger and our model can extract more features. The improvement in lithology identification accuracy can effectively promote the accuracy of the geological model, and a better geological model allows the geological reserves to be calculated more precisely. With more accurate lithology identification, it is also easier to optimize well placement and perforating locations, and to obtain a higher drilling encounter rate and higher production.

Conclusions
In this study, in order to simplify the data-driven pipeline for formation lithology classification, we investigated the use of BO to search for the optimal hyperparameters of the XGBoost classifier based on well log data. To acquire both steady and reliable results, the learning dataset was divided into a training set and a test set using cross-validation, and the average of 10 independent runs was taken as the final result.
In addition, we comprehensively evaluated the performance of the proposed classifier. The AUC values of the test sets of the DGF and HGF are 0.968 and 0.987, respectively, indicating that the proposed method has very high generalization performance. The averaged precision, recall and F1 score of our method are 5.9%, 5.7% and 6.15% greater than those of GTB, respectively, and 4.85%, 5.7% and 3.25% higher than those of GTB-DE.
Furthermore, the proposed classifier can assist geologists in identifying formation lithology accurately and efficiently. Specialists can make use of the XGBoost model to analyze large amounts of well log data during geological exploration for tasks such as the estimation of fracture density, the prediction of reservoir permeability and the calculation of porosity, which can improve data analysis efficiency in petroleum geology.