Determination of Reservoir Oxidation Zone Formation in Uranium Wells Using Ensemble Machine Learning Methods

Abstract: Approximately 50% of the world's uranium is mined in a closed way using underground well leaching. In the process of uranium mining at formation-infiltration deposits, an important role is played by the correct identification of the formation of reservoir oxidation zones (ROZs), within which the uranium content is extremely low and which affect the determination of ore reserves and subsequent mining processes. The currently used methodology for identifying ROZs requires highly skilled labor and resource-intensive studies using neutron fission logging; therefore, it is not always performed. At the same time, the available electrical logging measurements collected in the process of geophysical well surveys, together with exploration well data, can be effectively used to identify ROZs with machine learning models. This study presents a solution to the problem of detecting ROZs in uranium deposits using ensemble machine learning methods. The method provides a weighted harmonic measure (f1_weighted) in the range from 0.72 to 0.93 (XGB classifier) and sufficient stability at different ratios of objects in the input dataset. The obtained results demonstrate the potential for practical use of this method for detecting ROZs in formation-infiltration uranium deposits using ensemble machine learning methods.


Introduction
Uranium deposits in Kazakhstan are mined using the environmentally efficient in situ leaching (ISL) method, which, however, requires a fairly accurate determination of the lithologic structure of the host rocks. Since about 41% of the world's uranium production and almost the entire production volume in Kazakhstan is carried out using ISL [1], the relevance of the task of determining the characteristics of the host rocks is extremely high. To solve this task, machine learning methods are employed in some cases, increasing the degree of automation of this process and reducing the influence of human error. The characteristics of rocks are determined through the process of geophysical research of boreholes (GRB). The GRB process differs for exploration and production wells. However, in both, the standard set of GRB methods in Kazakhstan fields uses apparent resistance logging (AR) and spontaneous polarization (SP) potential for lithologic classification and the determination of the filtration properties of rocks. Moreover, gamma ray logging (GR) is used to calculate the uranium content on the basis of the gamma radiation of radium and its decay products, with conversion through the radioactive equilibrium coefficient. The log data are the physical parameters recorded inside the well in 10 cm increments in depth, and are visualized for evaluation by experts as graphs (curves). At the same time, the automatic interpretation of electric logs (AR and SP) is implemented only on the basis of AR logs, without taking into account data from other logs, information on neighboring wells, etc., which leads to significant manual adjustments by the interpreting engineer. In addition, this makes correct lithologic interpretation impossible in the case of AR curve distortion. In this connection, the main directions of the application of machine learning methods in the processing of logging data at uranium deposits are as follows:

• Lithologic classification;
• Determination of the permeability of host rocks;
• Determination of reservoir oxidation zones (ROZs) (zones with disturbed radioactive equilibrium);
• Determination of technological acidification zones (zones with a distorted AR curve).
The methods of applying machine learning to lithologic classification have been discussed in a number of papers (see Section 2). The problem of determining the filtration coefficient is considered in [2,3], in which machine learning methods demonstrate an almost twofold increase in accuracy compared to the currently used methodology.
This study deals with the problem of ROZ determination. A machine learning-based method for determining ROZs from exploratory well data with accuracy acceptable for practice is proposed.
In the research process, we answer two questions:
1. Is it possible to identify ROZs by machine learning using a standard log dataset?
2. Which machine learning methods give the best classification result?
The novelty of this study includes two important aspects:
1. The problem of ROZ detection using machine learning techniques is considered for the first time, according to our review of investigations in this domain.
2. A ROZ detection quality acceptable for practical applications is obtained, and the limitations of the proposed method are identified.
The paper consists of the following sections (Figure 1). Section 2 provides a literature review concerning the interpretation of logging data using machine learning. Section 3 describes the physical principles of ROZ formation and the limitations in determining ROZs from log data. Section 4 describes the data used and the data processing methods. Section 5 describes the computational experiments and demonstrates the results. The obtained results are discussed in Section 6. Finally, in Section 7, the limitations of the method and directions for future research are discussed.

Related Works
The use of machine learning in log data interpretation has aroused the interest of researchers since the 1970s. In the 1990s, active research on automatically interpreting log data using feedforward artificial neural networks began [4][5][6][7]. A wide range of classical and modern machine learning methods has been applied. In particular, in article [8], a number of classical algorithms, such as support vector machine (SVM), decision tree (DT), random forest (RF), multi-layer perceptron (MLP), and an ensemble machine learning method (XGBoost), are used for the classification of lithofacies in the Talcher coalfield, Eastern India, with an accuracy rate of more than 80% for binary classification (carbonaceous and non-coal lithofacies). In paper [9], the application of a convolutional neural network for lithofacies classification in the Eagle Ford and Austin Chalk shale oil fields is considered. The problem of lithologic classification of four geothermal wells in the Snake River Plain (SRP) in Idaho is considered in paper [10]. The authors compared k-nearest neighbor (kNN), SVM, and eXtreme Gradient Boosting (XGBoost). The latter algorithm showed the highest classification accuracy (90.67%). The combined method of kernel principal component analysis-Bayesian optimization-categorical boost (KPCBA-CatBoost) demonstrated approximately the same accuracy in the task of lithologic classification in an oil and gas field (accuracy 90%) [11]. In article [12], the task of classifying and predicting geological facies using well log data in the Anadarko Basin oil field, Kansas, is considered. The feedforward neural network (FFNN) showed an accuracy of 88%. A similar task of classifying eight rock classes in the Vikulov Formation (western Siberia) is discussed in paper [13]. The authors compared the CatBoost, RF, and MLP algorithms. Although the achieved classification accuracy is on the order of 64%, the authors conclude that machine learning algorithms can predict lithology from a standard set of log diagrams without normalization to reference formations, which can significantly reduce the time required to prepare log curves in advance.
The issue of determining rock permeability is considered in a number of works [14][15][16][17]. For example, the problem of determining the rock permeability of the carbonate oil reservoirs (Ilam and Sarvak) in the southwest of Iran is discussed in article [18]. The authors considered the following methods: multi-layer perceptron neural network (MLP), radial basis function neural network (RBF), SVM, DT, and RF, and achieved a coefficient of determination in the rock permeability prediction problem of 0.97 (SVM), which is higher than that obtained using traditional methods.
ML methods have been used at uranium deposits for lithologic classification [27]; stratigraphy [28]; the determination of the filtration properties of host rocks [2]; and the assessment of the influence of expert marking of logging data [29]. It turned out that the quality of interpretation of logging data in uranium fields, as well as in some tasks solved in oil fields (classifying the connectivity quality (among six ordinal classes) and hydraulic isolation (two classes)) [30], depends crucially on expert data labeling [31]. The results of the application of machine learning methods in the processing of logging data obtained to date are briefly summarized in Table 1. Methods that showed the best results are highlighted in bold. There are significant differences between oil well logging and uranium well logging. At oil fields, data interpretation is required to identify formations with a thickness of one meter or more, for one or several wells thousands of meters deep. Meanwhile, due to the technological peculiarities of mining processes, the interpretation of logging data at formation-infiltration uranium deposits requires the identification of layers with a minimum thickness of 20 cm for dozens and hundreds of wells several hundred meters deep. The wells at uranium deposits of Kazakhstan are usually drilled with a smaller diameter (118-132 mm) than oil wells, which imposes restrictions on the length and diameter of the borehole devices used and, consequently, on the geophysical research methods used. Therefore, process wells in uranium deposits are investigated by relatively inaccurate electrical logging methods with a small number of measuring electrodes. The exception is exploration wells, where core sampling is performed at different depths; but even in this case, because the deposits are located in sedimentary rocks (sands and clays), the cores are often eroded, and it is often impossible to extract rock samples of the required quality.
In general, the analysis of publications shows strong interest and very significant success in the interpretation of log data using ML, especially in oil and gas fields. However, we have not identified any studies devoted to ROZ identification using ML methods for uranium deposits. The authors hope that the present article will fill this gap, since the timely identification of ROZs affects the correct estimation of extractable uranium reserves and can significantly reduce costs in the mining process.
For a better understanding of the necessity of such research, we consider the causes of ROZ, the currently used methods for its identification, and their limitations.

Physical Principles of ROZ Formation and Applied Methods for Its Identification
The peculiarities of deposit formation predetermine their radiological situation and element composition. Typically, in the ore deposit there is a shortage of radium in comparison with the equilibrium state, while in its frame the radioactive equilibrium is disturbed in the direction of excess radium, which is a consequence of the formation of so-called "residual" and "diffusion" halos of radium. This is because deposits of the layer-infiltration type are formed in sedimentary permeable rock strata at the boundary of the redox barrier. Since the behavior of mobile forms of uranium and radium differs significantly under oxidizing and reducing conditions, in different morphological elements of ore bodies, as a result of the "export intake" processes of "mother" uranium and "daughter" radium, geochemical zones appear where the ratio of mass fractions of radium and uranium differs from the values corresponding to the state of radioactive equilibrium between them.
The state of radioactive equilibrium between radium and uranium is traditionally characterized by the radioactive equilibrium violation coefficient (or simply the radioactive equilibrium coefficient), often referred to as Kpp, which is equal to the ratio of the mass fractions of radium and uranium.
Thus, the value Kpp = 1 corresponds to the presence of radioactive equilibrium, and a deviation of Kpp from 1 indicates the presence of systems that have not reached equilibrium or have undergone violations of their closure.
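As a minimal illustration of the definition above, the coefficient can be computed directly as the ratio of mass fractions; the function names and the tolerance band for "equilibrium" are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: Kpp as the ratio of radium to uranium mass fractions.
# The tolerance used to call a system "in equilibrium" is an assumption.

def kpp(ra_mass_fraction: float, u_mass_fraction: float) -> float:
    """Radioactive equilibrium coefficient Kpp = Ra fraction / U fraction."""
    if u_mass_fraction == 0.0:
        return float("inf")  # uranium completely absent, e.g. inside a ROZ
    return ra_mass_fraction / u_mass_fraction

def equilibrium_state(k: float, tol: float = 0.05) -> str:
    """Classify the state of a system by its Kpp value."""
    if abs(k - 1.0) <= tol:
        return "equilibrium"
    return "excess radium" if k > 1.0 else "excess uranium"

print(equilibrium_state(kpp(2.0e-9, 1.0e-9)))  # Kpp = 2.0 -> "excess radium"
```

A Kpp of infinity (uranium absent) is exactly the ROZ case discussed below, where gamma anomalies come from radium alone.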
In cross-section, the ore body at hydrogenic uranium deposits has the form of a roll moving in the direction of formation water movement (see Figure 2), and the change in Kpp obeys the following basic laws [32]:
• The average values of Kpp for the different morphological elements of ore bodies, sites, and geochemical zones of deposits (with a uranium content of more than 0.01%) vary within a fairly wide range, from 0.60 to 1.0;
• Directly behind the front of reservoir oxidation, the radioactive equilibrium is shifted towards an excess of radium (Kpp = 1.5-2.5 and more), up to an almost complete absence of uranium. These are the so-called "residual" radium halos;
• As the radioactive equilibrium gradually shifts from equilibrium ores (near the reservoir oxidation zone (ROZ)) to an excess of radium, it forms small (0.2-0.4 m) areas of radium rims (the so-called "diffusion" radium halos) at the boundary of ore bodies with a uranium content of 0.01% and higher.
At present, the uranium content is determined by dividing the radium content obtained as a result of interpretation by Kpp [2]. The natural gamma radiation of radium and other uranium decay products is registered during GR. Since uranium is completely absent in reservoir oxidation zones, Kpp = ∞ there; in other words, pronounced gamma anomalies must be taken into account in the interpretation of gamma-ray logging. ROZs can be
extracted by core analysis at the exploration stage, after which geologic sections can be built and the ROZs extended along them, as shown in Figure 3. High-resolution pictures are given in the Supplementary Materials. However, core sampling and laboratory testing are a long and expensive process. Moreover, the construction of sections and the extrapolation of selected ROZs are not always carried out in a timely and correct manner, as they require high qualification and considerable manual labor. As a result, the initial interpretation is often carried out without taking ROZs into account, and it then becomes necessary to recalculate it considering ROZs.
Another method of ROZ extraction is fission neutron logging (FN), as it allows the direct determination of uranium content, bypassing the conversion of radium content to uranium content via Kpp. In such cases, ROZs are identified where gamma-log ore intervals do not correspond to FN ore intervals. An example of GR reinterpretation after ROZ delineation based on FN results is shown in Figure 4.
Figure 4 shows that the actual ore intervals for FN were much smaller than those calculated for radium. However, the FN procedure is expensive and the logging speed is no more than 50 m/h; the main limitation, though, is that the resource of the neutron generator tube used is extremely limited. Therefore, FN is not ordered in all fields, and it covers only small volumes (5-10%) of the total number of wells. Failure to account for ROZs is one of the main reasons for the overestimation of available ore reserves, and often leads to significant material losses when entire geological blocks turn out to be empty. At the moment, there is no fast and reliable way of identifying ROZs in the process of GR interpretation, and the development of a formal way of such identification is also very problematic. Therefore, ML is one of the likely ways to solve this problem. In the next section, the data and methods that were used to solve the ROZ identification problem using machine learning are discussed.

Methodological Research Design
The method of solving the problem of ROZ identification using machine learning seems quite obvious. The input of the machine learning model is a set of n available log values at a certain depth, which form a vector of input values, or features, x^(i) = (x_0, x_1, ..., x_n). A total of m training examples x^(1), x^(2), ..., x^(i), ..., x^(m) ∈ X are used to train the model. The target value y is the rock code, which in this case takes three values: 1 for permeable rocks, 2 for impermeable rocks, and 8 for ROZs. The ML model can then be trained on the pairs (x^(i), y^(i)), and the resulting output is evaluated using the chosen metric. However, this straightforward approach yields rather poor classification performance. Significant improvement can be achieved by preprocessing the data and applying feature engineering techniques, which include generating a balanced dataset, forming data windows, and using geographically close wells as a source of lithologic information. The structure of the proposed method is illustrated in Figure 5.
A set of 1000 exploration wells with core sampling from the Inkai field was used to solve the log data processing tasks. From this set of wells, a special set was manually generated for the ROZ studies, input features were identified, and so-called floating data windows were generated for each well. The set of wells was divided into training, test, and validation wells, and was then used in a cyclic process of machine learning model selection and evaluation. The best-performing model was used to interpret the validation dataset and visualize the results.

Data Collection and Preprocessing
For the task of ROZ identification from the Inkai deposit wells, a special data set was created with the division of wells into three classes: 42 wells without ROZs (LOW_ZPO), 84 wells with ROZs occupying 5-50% of the ore-bearing horizon (MEDIUM_ZPO), and 42 wells with ROZs occupying more than 50% of the ore-bearing horizon (HI_ZPO).
For each well, the data of the AR (Ohm × m), SP (mV), and GR (µR/h) logs recorded in 10 cm increments, the lithologic intervals (upper boundary, rock code, permeability code, filtration coefficient, and lower boundary), and the wellhead coordinates (X, Y, and Z) were used for the analysis. Since the physical properties of the rocks (in particular, the AR recording level) vary considerably between geologic horizons, logging data were used only within one horizon (the Inkuduk horizon). Since we used exploratory wells, the ZPO (ROZ) zones were identified in the course of laboratory studies and marked with a geochemical code of 8. Well logging data are recorded in several tables, of which the following indicators are used for further processing (Figure 6). The occurrence of ROZs is attributed to extended groundwater movement, which justifies the use of well coordinates and the LIT1 and LIT2 codes.
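A per-depth feature table of the kind described above might be assembled as in the following sketch. The column names AR, SP, GR, and LIT1 come from the text; the table layout, the sample values, and the toy AR-threshold rule standing in for the expert lithologic code are all assumptions for illustration.

```python
# Hedged sketch: per-depth (10 cm step) rows of the logs named in the
# text (AR, SP, GR). Values and the toy LIT1 rule are illustrative only.
import pandas as pd

depths = [d / 10.0 for d in range(3000, 3005)]  # 10 cm increments, metres
logs = pd.DataFrame({
    "depth": depths,
    "AR": [12.5, 13.1, 40.2, 38.7, 11.9],       # apparent resistance, Ohm*m
    "SP": [-35.0, -34.2, -10.1, -9.8, -33.5],   # spontaneous polarization, mV
    "GR": [8.0, 8.4, 25.6, 24.9, 7.7],          # gamma ray, uR/h
})
# Toy stand-in for the expert lithologic code: 1 = permeable, 2 = impermeable
logs["LIT1"] = (logs["AR"] > 20).map({True: 1, False: 2})
print(logs.shape)  # (5, 5)
```

In the actual study, LIT1 comes from expert interpretation and core analysis, not from a threshold; the sketch only shows the shape of the data the models consume.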
To increase the accuracy of calculations, the following transformations were performed during initial data generation:

• Formation of a so-called floating data window [27,29] (Figure 6);
• Searching for the lithologic data of the nearest wells and using them as additional input parameters.
The use of floating data windows increases the size of the input data vector by a multiple of the window height. For example, if the input vector size is n = 11, then a window of height h = 3 produces a vector of size n' = h × n = 33 at the model input. In practice, the window height is chosen to be equal to or greater than the length of the logging probe (110 cm). Since each line of the source data describes logging values for a 10 cm layer of rock, a window height equal to the probe length gives h = 11. If each window corresponds to a target value at a depth in the middle of the window, the window is symmetric; for example, when h = 11, the top (tw) and bottom (bw) parts of the window may both be equal to 5. In general, as the computational experiments showed, the use of symmetric windows is not necessary. About 800 data windows were generated from each well in this way.
We suppose that using the data from the two nearest wells can improve the quality of the classification. A special program was developed to test this assumption. It calculates the distances from the current well to its two closest wells in a given data set and adds to the input vector the lithologic code of, and the squared distance to, the first (1) and second (2) closest wells: nearest_LIT1_1, nearest_LIT1_2, min_dist_sq1, min_dist_sq.
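The floating-window construction described above can be sketched in a few lines of NumPy: each depth sample is replaced by the concatenation of the h rows centred on it, so an input vector of size n becomes h × n. The function name and the toy data are illustrative, not from the paper's codebase.

```python
# Hedged sketch of the floating data window: an (num_samples, n) array of
# per-depth features becomes (num_valid, h * n), with tw rows above and
# bw rows below each centre sample (symmetric window, tw = bw = h // 2).
import numpy as np

def floating_windows(rows: np.ndarray, h: int) -> np.ndarray:
    """rows: (num_samples, n) per-depth features; returns (num_valid, h*n)."""
    tw = bw = h // 2  # symmetric window
    out = [rows[i - tw:i + bw + 1].ravel()
           for i in range(tw, len(rows) - bw)]
    return np.array(out)

X = np.arange(20).reshape(10, 2)   # 10 depth samples, n = 2 toy features
W = floating_windows(X, h=3)       # h = 3 -> input vectors of size h * n = 6
print(W.shape)  # (8, 6)
```

Note that tw samples at the top and bw at the bottom of each well have no complete window and are dropped, which is consistent with roughly 800 (rather than all) windows being produced per well.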
Figure 7 shows a fragment of the data table for one of the wells in the training dataset.
The formation of the training, test, and validation datasets deserves a special discussion. In order to avoid data leakage and, as a consequence, an overestimated classification quality, data from different wells should be used in the training and test sets. The division of the dataset into test and training sets was performed as follows. From the N wells available in a particular experiment, 0.1N comprised the test set and 0.9N the training set. However, since in some experiments N was not large (less than 40), we applied the k-fold validation approach: the division was performed 9 times (k = 9), and the obtained estimates were averaged. The final model evaluation was performed on the validation set, which was specially formed depending on the experiment objectives (see below) and whose wells were not included in either the test or training sets.
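A leakage-free split of this kind can be expressed with scikit-learn's group-aware cross-validation, using the well ID as the group so that no well contributes windows to both sides of a split. This is a sketch of the principle, not the paper's actual program; the synthetic data and variable names are assumptions.

```python
# Hedged sketch: well-level k-fold split (k = 9, as in the text) that
# guarantees a well's samples never appear in both train and test folds.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_wells, samples_per_well = 18, 5
wells = np.repeat(np.arange(n_wells), samples_per_well)  # well ID per sample
X = rng.normal(size=(len(wells), 4))                     # toy feature vectors
y = rng.integers(0, 2, size=len(wells))                  # toy target codes

gkf = GroupKFold(n_splits=9)
for train_idx, test_idx in gkf.split(X, y, groups=wells):
    # the defining property: no well on both sides of the split
    assert set(wells[train_idx]).isdisjoint(wells[test_idx])
print("no well appears in both train and test")
```

A naive row-wise split would put neighbouring 10 cm windows of the same well in both sets, which is exactly the leakage the text warns against.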

Machine Learning Model Selection and Evaluation
In the preliminary experiments, the following machine learning algorithms were used: support vector machines [33], artificial neural networks [34,35], random forest [36], and eXtreme Gradient Boosting (XGBoost) [37], which are traditionally used to solve lithologic and stratigraphic classification problems [2,31,38,39], as well as LightGBM [40]. It turned out, however, that the most stable results were demonstrated by the ensemble learning methods. Therefore, in the final experiments, the ensemble learning methods based on the gradient-boosted trees algorithm (XGBClassifier, LightGBM) and on the bagging technique (random forest classifier) were used as the machine learning models.
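One of the ensemble models named above can be trained and scored with the weighted F1 measure (f1_weighted), the metric the study reports, in a few lines. The sketch uses scikit-learn's random forest on synthetic data standing in for log windows; XGBClassifier or LightGBM would drop in the same way but are not assumed to be installed here.

```python
# Hedged sketch: training a bagging ensemble (random forest) and scoring
# with the weighted F1 measure. Synthetic data stands in for log windows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 33))        # e.g. h = 3 windows of n = 11 features
y = rng.integers(0, 3, size=300)      # three codes standing in for 1 / 2 / 8
X[y == 2] += 2.0                      # make the "ROZ-like" class separable

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[:240], y[:240])             # 0.9 / 0.1 train-test split by rows
score = f1_score(y[240:], clf.predict(X[240:]), average="weighted")
print(round(score, 2))
```

Note this row-wise split is acceptable only for synthetic data; on the real well data the split must be done at the well level, as described above.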
XGBClassifier uses a boosting technique, where the next algorithm in the ensemble (t) is trained taking into account the error gradient of the previous algorithm (t − 1). In other words, the subsequent algorithm is fitted not to the target values $(y^{(i)})_{i=1}^{m}$ (where $y^{(i)}$ is the target value for the i-th of the m training examples), but to the antigradient of the error function of the previous algorithm, $-L'_{t-1}(y^{(i)}, \hat{y}_{t-1}(x^{(i)}))$, where $\hat{y}_{t-1}(x^{(i)}) = h_{t-1}(x^{(i)}; \theta)$ is the hypothesis function of the previous algorithm and $\theta$ are the parameters of the hypothesis function (the weights of the leaves of the decision trees). This means that the next algorithm is trained on the pairs $(x^{(i)}, -L'_{t-1}(y^{(i)}, \hat{y}_{t-1}(x^{(i)})))$ instead of the traditional pairs $(x^{(i)}, y^{(i)})$. The optimal parameters at step t are found by minimizing a cost function of the following type:
$$J_t(\theta) = \sum_{i=1}^{m} L\big(y^{(i)}, \hat{y}_{t-1}(x^{(i)}) + h_t(x^{(i)}; \theta)\big) + \sum_{i=1}^{t} \Omega(h_i),$$
where $\sum_{i} \Omega(h_i)$ is the sum of regularizers of the type $\Omega(h_i) = \gamma L_i + \frac{\lambda}{2} \sum_{j=1}^{L_i} w_j^2$, where $L_i$ is the number of leaves of tree i; $\gamma$ is the parameter regulating the division of a leaf into subtrees; and $\lambda$ is the regularization parameter for the sum of squared leaf weights $w_j$.
The division of a leaf into subtrees is performed using the function of prediction of the ensemble of T algorithms (trees), which is found as The RF model uses the bootstrap aggregation technique, where a separate decision tree is constructed for each random subsample of the training dataset based on only a portion of the features.The final result of the classification task is generated by voting between the constructed trees.
where T i is the number of trees that "voted" for class i, and n c is the number of classes.
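As a toy illustration of this voting rule (not the library internals), the class counts T_i and the winning class can be computed in a few lines of NumPy:

```python
import numpy as np

# Votes of T = 7 trees for one sample, with n_c = 3 classes (codes 0..2).
tree_votes = np.array([2, 0, 2, 2, 1, 2, 0])

# T_i: number of trees that voted for each class i.
T_i = np.bincount(tree_votes, minlength=3)   # -> [2, 1, 4]
predicted_class = int(np.argmax(T_i))        # class with the most votes -> 2
```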
To perform the computational experiments, a Python program was developed using the numpy, sklearn, matplotlib, cv2, alive_progress, pickle, and tensorflow libraries. It solves the tasks of reading and preparing the initial data and includes 21 functions for data-frame formation, the selection and use of nearby well data, the resizing of the floating data window, the formation of the training and test sets of wells, the use of underlying horizon data, the training and evaluation of machine learning models, and others.
Computational experiments were performed on a computer equipped with 32 GB of RAM, an Intel(R) Core(TM) i7-10750H processor, and a discrete Nvidia GeForce GTX 1650 Ti video card. A program example and the datasets are given in the Supplementary Materials.
It should be noted that, when using SVC from the sklearn library, it was not possible to obtain results in the experiments with a floating data window.
The performance of the machine learning models in this case was evaluated using a confusion matrix and the measures of accuracy (Ac), precision (Precision), completeness (Recall), and the harmonic measure (f1-score):

\[ \mathrm{Ac} = \frac{N_t}{N}, \]

where $N_t$ is the number of correct answers, and $N$ is the total number of possible model answers;

\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad f1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \]

where true positive (TP) and true negative (TN) are cases of correct classifier performance, meaning cases where the predicted class matched the expected class. Correspondingly, false negative (FN) and false positive (FP) are cases of misclassification.
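These measures can be reproduced with scikit-learn on a toy set of expected and predicted rock codes (the labels below are invented for illustration):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy expected vs. predicted rock codes (1: permeable, 2: impermeable, 8: ROZ).
y_true = [1, 1, 2, 2, 8, 8, 8, 1]
y_pred = [1, 2, 2, 2, 8, 8, 1, 1]

cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 8])
ac = accuracy_score(y_true, y_pred)                      # N_t / N
prec = precision_score(y_true, y_pred, average='macro')  # TP / (TP + FP), averaged
rec = recall_score(y_true, y_pred, average='macro')      # TP / (TP + FN), averaged
f1 = f1_score(y_true, y_pred, average='macro')           # harmonic mean of the two
```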

Results
The preliminary experiments showed that the application of an ensemble learning method makes it possible to detect ROZs with an accuracy of up to 90% in individual experiments. Nevertheless, for a productive evaluation of machine learning methods, it is necessary to take into account the fact that, depending on the deposit, the amount of data with and without acidification may differ significantly. Therefore, a group of experiments was designed to evaluate the robustness of the methods depending on the content of the training dataset.
First, we evaluated the effect of the size of the floating data window on the classification quality (Table 2).
It can be seen that the f1-score increases by 3-4% when the floating window size increases. The most stable results are demonstrated by XGB (f1_macro > 0.7). The AdaBoost and Naive Bayes classifiers demonstrated unsatisfactory results. It is worth noting that the results shown in Tables 2-4 and Appendix A are the averages of a k-fold validation at k = 9. In other words, during the computational experiments, the wells were divided into training (90%) and test (10%) sets nine times. Each time, the machine learning models were trained and the results were evaluated. The resulting model score is the average of these evaluations. The detailed results of the computational experiments are provided in Appendix A (Table A1). Second, we assessed the impact of the nearest well lithology data (Table 3). To save time, the AdaBoost and Naive Bayes classifiers were excluded from further experiments.
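For reference, the floating data window can be sketched as follows: for each 10 cm depth step, the input vector collects the log readings in a symmetric window around that step. The window construction below is an illustrative assumption, not the authors' exact code:

```python
import numpy as np

def window_features(curve, half_width):
    """For each depth step, collect readings in a window of 2*half_width + 1 steps."""
    padded = np.pad(curve, half_width, mode='edge')  # repeat edge values at well top/bottom
    return np.stack([padded[i:i + 2 * half_width + 1]
                     for i in range(len(curve))])

ar_curve = np.array([10.0, 12.0, 30.0, 28.0, 11.0])   # toy AR log, one value per 10 cm
X = window_features(ar_curve, half_width=1)           # window of 3 readings per sample
```

Increasing `half_width` widens the window, which is the parameter varied in the experiments above.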
The results of this experiment demonstrate that the use of information about the lithologic composition of the nearest wells has practically no effect on the results. This is probably due to the principle of dataset formation, in which wells were selected based on the percentage of ROZ content rather than proximity. A slight increase in the results was observed when using non-normalized input data (Appendix A, Table A2). The LGBM classifier was used with default settings. The performance of decision-tree-based classifiers depends on the depth of the tree (max_depth) and the number of trees (n_estimators) [41]. For RFC, the best results were obtained with max_depth = 16 and n_estimators = 150; for XGB, with max_depth = 12. The RFC results are significantly affected by the number of features considered when selecting a split (max_features), which was chosen as n_x/3, where n_x is the number of input model parameters. The results of the hyperparameter tuning experiments are given in the Supplementary Materials.
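Under the settings reported above, the random forest classifier can be instantiated as in the following scikit-learn sketch (the value of n_x is illustrative; the analogous XGBClassifier setting would be max_depth = 12):

```python
from sklearn.ensemble import RandomForestClassifier

n_x = 30  # illustrative number of input model parameters

# Best-performing RFC settings reported in the text: depth 16, 150 trees,
# and n_x / 3 features considered at each split.
rfc = RandomForestClassifier(max_depth=16, n_estimators=150,
                             max_features=max(1, n_x // 3))
```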
To evaluate the influence of the training dataset, experiments with different ratios of training and test data were performed (see Table 4). In Table 4, HI stands for a high ROZ ratio in the dataset, LOW for a low one, and MEDIUM for a medium one. A total of four training datasets with different well ratios were generated. When evaluating the results of the ML models, it should be taken into consideration that, when classifying unbalanced datasets, three variants of estimating the main indicators are possible. In the first case, the quality metric is calculated within the objects of each class and then averaged across all classes (macro average). In the second case, objects of all classes contribute equally to the quality metric (micro average). In the third case, the contribution of a class to the overall score depends on its size (weighted). In our case, since it is important for us to consider the influence of all classes, the macro average and weighted average give a more objective evaluation. The results of the machine learning models are shown in Table 4. The best f1_macro and f1_weighted values are highlighted in bold.
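The three averaging variants correspond directly to the average argument of scikit-learn's f1_score (toy labels for illustration):

```python
from sklearn.metrics import f1_score

# Toy unbalanced multiclass labels (rock codes 1, 2, 8).
y_true = [1, 1, 1, 1, 2, 2, 8, 8, 8, 8]
y_pred = [1, 1, 2, 1, 2, 2, 8, 8, 8, 1]

f1_macro = f1_score(y_true, y_pred, average='macro')       # each class weighted equally
f1_micro = f1_score(y_true, y_pred, average='micro')       # each object weighted equally
f1_weighted = f1_score(y_true, y_pred, average='weighted') # classes weighted by support
```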

Discussion
The obtained results allow us to draw important conclusions about the influence of the size and balance of the training dataset on the results. It can be seen that, when using a mixed balanced dataset that includes data from wells with high, medium, and low ROZ, the results are, on average, no worse than when the type of training dataset coincides with the type of test dataset. If the training set type matches the test set type in terms of ROZ content, the XGBClassifier provides f1_macro estimates of 0.7096, 0.7020, and 0.7089 for HI_ZPO_val, LOW_ZPO_val, and MEDIUM_ZPO_val, respectively. Moreover, the estimates on the validation datasets in which the ROZ ratio differs significantly from the training set range from 0.2613 to 0.6285. If the balanced set HI_LOW_MED_ZPO_train is used for training, the f1_macro estimates for the sets HI_ZPO_val, LOW_ZPO_val, and MEDIUM_ZPO_val are 0.7024, 0.7146, and 0.6701, respectively. In other words, this training set provides high stability of the model. The XGBClassifier and LGBM show slightly better results compared to the RandomForestClassifier. Certainly, the classification result depends significantly on the classifier settings. The meta-parameters of the classifiers were fine-tuned to maximize the model score on one of the test sets. The meta-parameters were then fixed, and the series of experiments mentioned above was conducted. Expanding the dataset may provide an opportunity to apply deep learning models and improve the classification accuracy. However, such an expansion is not always possible, because the number of exploration wells is limited.
Figure 8 shows examples of classification for six test wells, where the expected classes are indicated by numbers: 1-permeable rocks, 2-impermeable rocks, and 8-oxidized rocks (ROZ). Blue shows the actual data, and red shows the predicted values at depths between 300 and 400 m within the Inkuduk stratigraphic horizon. Each well is divided into approximately 800 sections (interbeds) by depth, each with a thickness of 10 cm. The minimum depth of the analyzed section of the wells is about 300 m, while the maximum depth is about 400 m. It can be seen that, in some cases, the prediction accuracy is very high (wells 2 and 3). However, there are also errors (wells 6 and 4), and overcoming them may be the subject of future research. It can be noted preliminarily that, since the field is located in sedimentary rocks (sands and clays), core losses during extraction amount to up to 20%. In addition, the error in tying core samples to depth can be 1-2 m. This leads to errors in the expert assessments on which the models are trained.

The developed model can be used for ROZ identification when interpreting the data of technological wells in real time. First, the algorithm is trained on exploration wells; then, when logging data from a new well are obtained, the trained model interprets them in order to obtain rock codes, for example, as in one of the authors' works [27,29]. In practice, when data from nearby wells become available, the model can be adjusted to use them to improve the classification quality.
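The workflow described here amounts to a fit-then-predict loop; a minimal sketch with a random forest and synthetic features standing in for the real log data might look as follows:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Train on exploration wells (features from AR/SP logs, labels = rock codes 1, 2, 8).
X_explore = rng.normal(size=(500, 6))
y_explore = rng.choice([1, 2, 8], size=500)
model = RandomForestClassifier(n_estimators=150, max_depth=16).fit(X_explore, y_explore)

# When logging data from a new technological well arrive, interpret them.
X_new_well = rng.normal(size=(80, 6))       # 80 ten-centimetre layers
rock_codes = model.predict(X_new_well)      # predicted rock code per layer
```

Retraining or fine-tuning the model as nearby-well data accumulate would follow the same pattern with an extended training set.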

Conclusions
This article describes a method developed by the authors to identify the formation of the oxidation zones from well logs.To the authors' knowledge, this problem of log data interpretation has not previously been considered in the literature.In contrast to the currently used methods of manual ROZ estimation, the proposed method based on the use of machine learning algorithms is low-cost and fast, and does not require additional logging operations.
To identify the most accurate algorithms, computational experiments were performed. During the experiments, the influence of the size of the floating data window, the use of lithologic data from nearby wells, and the normalization of the input parameters was assessed. The results were evaluated using k-fold validation. The best results for all sets of input parameters and data preprocessing methods were demonstrated by ensemble machine learning algorithms.
The method has a fairly high accuracy (the value of the harmonic measure f1-score is about 0.7), which allows for a significant reduction in errors in the calculation of ore reserves and, consequently, improves the economic performance of mining processes.
At the same time, some limitations of the proposed approach can be noted.

1. In some cases, as illustrated above, machine learning algorithms produce incorrect ROZ results.
2. A specific set of wells already marked in terms of ROZ is required to train the machine learning models. In addition, it remains unclear whether it is possible to apply machine learning models trained on data from one field to ROZ detection in another field.

Therefore, future research can address the following challenges.

1. Improving the accuracy of the method, for example, by applying deep learning and stacking techniques.
2. Evaluating the accuracy of uranium reserve determination with the application of the developed method.
3. Assessing the possibilities and limitations of applying algorithms trained on data from one deposit to identify ROZs in another deposit.

Figure 1. Main sections of the article.

Figure 3. Geologic cut, with reservoir oxidation zones (highlighted in yellow) on the right and left. High-resolution pictures are given in the Supplementary Materials.

Figure 4. Example of GC reinterpretation after ROZ allocation based on FN results: 1-initial interpretation without ROZ, 2-actual reserves based on FN data, 3-ROZ allocated based on FN results, 4-final interpretation with ROZ taken into account.

Mathematics 2023, 21.

Figure 5. The structure of the proposed method.

Figure 8. ROZ detection based on log data for six test wells marked in the picture with numbers ①-⑥. Blue are the actual values, while red are the predicted values. The vertical axis represents the number of the 10 cm layer, while the horizontal axis represents the rock codes, of which only three are used: 1-permeable rocks, 2-impermeable rocks, and 8-oxidized rocks (ROZ).

Table 1. Machine learning in logging data processing tasks.

Table 2. Classification quality (f1_macro) when changing the floating window size.

Table 3. Classification quality (f1_macro) when varying the floating window size. Two additional input parameters are used: the lithologic codes of the two nearest wells.

Table 4. Performance of machine learning models with different combinations of training and test data.

Table A2. Results of computational experiments for non-normalized input data.