Comparison of Machine Learning Algorithms for Sand Production Prediction: An Example for a Gas-Hydrate-Bearing Sand Case

Abstract: This paper demonstrates the applicability of machine learning algorithms to sand production problems in natural gas hydrate (NGH)-bearing sands, which have been regarded as a grave concern for commercialization. The sanding problem hinders the commercial exploitation of NGH reservoirs. Common sand production prediction methods require assumptions for complicated mathematical derivations. The main contribution of this paper is to introduce machine learning into the prediction of sand production by using data from laboratory experiments. Four main machine learning algorithms were selected, namely, K-Nearest Neighbor, Support Vector Regression, Boosting Tree, and Multi-Layer Perceptron. Training datasets for machine learning were collected from a sand production experiment. The experiment considered both the geological parameters and the sand control effect. The machine learning algorithms were mainly evaluated according to their mean absolute error and coefficient of determination. The evaluation results showed that the most accurate results under the given conditions were from the Boosting Tree algorithm, while K-Nearest Neighbor had the worst prediction performance. Considering an ensemble prediction model, Support Vector Regression and Multi-Layer Perceptron could also be applied for the prediction of sand production. The tuning process revealed that the Gaussian kernel was the proper kernel function for improving the prediction performance of SVR. In addition, the best parameters for both the Boosting Tree and Multi-Layer Perceptron are recommended for the accurate prediction of sand production. This paper also includes a case study comparing the prediction results of the machine learning models and a classic numerical simulation, which showed the capability of machine learning to accurately predict sand production, especially under stable pressure conditions.


Introduction
Energies 2022, 15, 6509

Global energy demand has been rapidly increasing in recent years in both developed and developing countries. Natural gas hydrate (NGH) has been widely treated as a clean source of energy for the 21st century, producing fewer environmental pollutants than traditional fossil fuels. The estimated natural gas hydrate reserves amount to 2.6 × 10^16 to 1.2 × 10^17 m³ [1]. Hydrate reserves are classified into three major categories: pore-filling, fractured, and massive/nodule [2]. NGH is mainly found in terrestrial permafrost and continental-margin marine sediments. More than 90% of the estimated NGH reserves are distributed in the ocean [3]. As a clean and highly efficient energy source, NGH shows a bright future in terms of development and utilization. Since NGH was discovered in 1967, leading countries around the world, including the United States, Russia, Japan, Canada, and China, have found large NGH sediments and have proposed in situ exploration schemes [4-7]. According to most commercial trials of the production of NGH, three main problems are hindering its development: low productivity, sand problems, and poor economic efficiency [8]. Based on previous research and field production trials, the low productivity has two causes. Firstly, the development of NGH relies on a phase change in the NGH, which continuously absorbs heat; the heat transfer causes a low-temperature zone near the wellbore that keeps expanding as development continues, and the low temperature slows down the natural gas hydrate's gasification process [9]. Secondly, a huge amount of formation sand flows into the wellbore and causes blockage [10]. In addition, the formation of secondary NGH increases the risk of blockage during the development of NGH [11]. Wellbore blockage slows down, and sometimes even interrupts, the NGH recovery process. The poor economic efficiency of NGH development also comes from sand damage to the production facilities. The sand flowing from the formation into the wellbore can cause severe damage to sand control devices, such as filtering screens [12]. The produced sand can also directly damage the submersible pump, production tubing, and wellhead. Deng J. and Deng F. reported that the failure of NGH production in China was due to the huge amount of sand production [13,14]. Trials of the production of offshore NGH in other countries, such as Japan and Canada, have also shown the negative effects of sand production [15,16].
Sand management and theoretical analysis rely heavily on sand prediction models. Sand prediction models follow two main approaches: macro-level and micro-level methods. A macro-level method focuses on the mechanical behavior of an NGH sediment by considering the strength of the formation [7,17]. A macro-level method needs to take a yield criterion into consideration, such as the Mohr-Coulomb yield criterion [18], Tresca yield criterion, Mises yield criterion [19], Drucker-Prager yield criterion [20], Hoek-Brown yield criterion [21], or Lade-Duncan yield criterion [22]. The Mohr-Coulomb criterion is the most commonly used, but its comparison with the other criteria still needs further study. Micro-level sand prediction concerns the free-moving sand in NGH sediments. The classic sand movement model, presented by Uchida et al., divided the sand migration process into three main states: grain detachment, grain settling, and lifting [23]. The simulation of sand migration requires complicated mathematical derivations and some necessary assumptions. For example, the rock particles were assumed to be incompressible in the simulation models of Ataie-Ashtiani et al., Chang et al., and Yan et al. [12,24,25]. These assumptions can simplify the derivation process; however, they may reduce the accuracy of the simulation. To overcome the main drawbacks of current sand production simulation methods, machine learning offers a novel approach to sand prediction for unconsolidated NGH sediments. Based on a survey of the literature, it is difficult to find current research that attempts to apply machine learning to the prediction of sand production in the development of NGH reservoirs.
Machine learning has great advantages in terms of clustering, classification, and regression. In comparison with traditional mathematical modeling, machine learning has the capability of dealing with growing data complexity while making few assumptions [26]. Among the various machine learning algorithms, several powerful and commonly used ones were selected to predict sand production risks, namely, K-Nearest Neighbor (KNN), Support Vector Regression (SVR), Boosting Tree (BT), and Multi-Layer Perceptron (MLP). KNN and SVR were selected because of their robustness, which provides the capacity to handle complex problems [27]. The Boosting Tree, which belongs to the family of tree-based algorithms, has obvious advantages in dealing with distinct features and combinations of features [28]. A tree-based algorithm can generate acceptable results without relying on an assumption of normality [29]. MLP, as a classic artificial neural network (ANN) algorithm, has been proven to solve problems efficiently and accurately, including problems that have no simple algorithmic solutions [30].

Machine Learning Algorithms

K-Nearest Neighbor Learning Algorithm
The K-Nearest Neighbor (KNN) algorithm is a simple and commonly used supervised learning method, and it has been recognized as one of the top 10 algorithms [31]. KNN is mainly used for classification. Figure 1 shows a schematic diagram of KNN. The working principle of KNN is to find the K training samples closest to a new test data point in the training set by using some distance measurement, and then to use the labels of these K similar points to predict the test sample. For regression, the average of the label values of the K samples is used as the prediction result. A weighted average based on distance can also be used, so that closer samples carry larger weights, which improves the prediction when the sample distribution is uneven. The advantage of the KNN algorithm is the avoidance of a training process before classification. Instead of a training process, it simply saves the samples and calculates the distances after receiving the samples to be predicted [32]. On the other side of the coin, the time complexity of the prediction is large. In addition, the other main challenges in KNN include the computation of K, nearest neighbor selection, nearest neighbor search, and the classification rule [33]. Despite these shortcomings, KNN is still an efficient artificial intelligence (AI) algorithm according to the comparison of 16 different algorithms by Li et al. [34].

As mentioned above, K is an important super-parameter in KNN, and it determines the precision of the prediction. The prediction results will be different when K takes different values. The computation of K relies on the sample's distribution [35]. The selection of K can be based on either different sample subspaces [36] or different test samples [37]. Small ranges of training samples are used for prediction if a small K value is selected. In this way, the prediction error of KNN will be reduced if the sample size is large enough. This is because only the training samples close to the samples to be predicted play a role in the prediction results. Meanwhile, it is easier to overfit because the prediction results are very sensitive to adjacent samples. The prediction makes mistakes quite easily if an adjacent sample contains incorrect data. If a larger K value is selected, it is equivalent to using a wide range of samples for prediction. The advantage is that it can reduce the possibility of overfitting by the learner, but in this way, the training samples that are far away from the samples to be predicted will also play a role in the prediction, resulting in a decline in the prediction accuracy. Generally, in practice, a relatively small K value is selected. On the other hand, if different distance calculation methods are used, the "neighbors" found will be different, and the prediction results will, of course, be significantly different. The most commonly used distance measure is the $L_p$ (Minkowski) distance, $L_p(x_i, x_j) = \left( \sum_{l} |x_i^{(l)} - x_j^{(l)}|^p \right)^{1/p}$, which reduces to the Euclidean distance for p = 2 and the Manhattan distance for p = 1.
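The distance-weighted prediction described above can be sketched in a few lines. The snippet below assumes scikit-learn and uses synthetic stand-in data; the feature and label values are illustrative, not the paper's experimental dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 3))                        # 3 hypothetical features
y = 10 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(0, 0.2, 40)   # hypothetical label, e.g. sand mass

# p=2 selects the Euclidean member of the Minkowski (L_p) family;
# weights="distance" gives closer neighbors a larger say, as described above.
knn = KNeighborsRegressor(n_neighbors=3, p=2, weights="distance")
knn.fit(X, y)

x_new = np.array([[0.5, 0.5, 0.5]])
y_hat = knn.predict(x_new)[0]   # distance-weighted average of the 3 nearest labels
print(y_hat)
```

Raising `n_neighbors` smooths the prediction (less overfitting, more bias); lowering it makes the model sensitive to its immediate neighbors, exactly the trade-off discussed above.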

Support Vector Regression Algorithm
Support Vector Regression (SVR) is also commonly accepted as one of the standard machine learning algorithms, and it falls under the category of supervised learning methods [38]. Its algorithmic principle can be summarized in two main points: (1) Firstly, it is a linear fitting algorithm that is directly suitable only for linearly distributed data. For data with a nonlinear distribution, a nonlinear mapping is used to make their high-dimensional distribution linear, and the linear algorithm is then applied in the high-dimensional feature space to fit the training data. (2) Secondly, the algorithm constructs an optimal hyperplane in the whole feature space based on the theory of structural risk minimization (including regularization). By this basic principle, SVR is not a traditional regression fitting algorithm. The main advantage of the SVR algorithm is its capability of achieving a global optimum and avoiding overfitting. The main flexibility of SVR comes from the different kernel functions that can be used to fit different types of data. The computational complexity depends only on the number of support vectors; therefore, a small number of support vectors, not the entire dataset, determines the final result.
In the given training sample set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, y_i ∈ ℝ, the objective is to obtain a regression model f(x) = ω^T x + b that makes f(x) as close as possible to y, where ω and b are the model parameters that need to be determined. For a certain sample (x, y), traditional regression models usually calculate the loss based on the difference between the model output f(x) and the real value y; the loss is zero only when f(x) and y are exactly the same. In contrast, SVR can tolerate a maximum deviation of ε between f(x) and y: the loss is calculated only when the absolute value of the difference between f(x) and y is greater than ε. No loss is counted for a sample whose prediction error falls within ε, while the samples on and outside the ε-band are called support vectors. Thus, the SVR problem can be written as:

$$\min_{\omega, b} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} \ell_\varepsilon\big(f(x_i) - y_i\big) \quad (1)$$

where C is the regularization constant, which is a super-parameter, and $\ell_\varepsilon$ is the ε-insensitive loss function, which can be determined with the following equation:

$$\ell_\varepsilon(z) = \begin{cases} 0, & |z| \le \varepsilon \\ |z| - \varepsilon, & \text{otherwise} \end{cases} \quad (2)$$

Introducing the slack variables $\xi_i$ and $\hat{\xi}_i$ and rewriting Equation (1) gives:

$$\min_{\omega, b, \xi, \hat{\xi}} \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m} (\xi_i + \hat{\xi}_i) \quad \text{s.t.} \ \ f(x_i) - y_i \le \varepsilon + \xi_i, \ \ y_i - f(x_i) \le \varepsilon + \hat{\xi}_i, \ \ \xi_i \ge 0, \ \hat{\xi}_i \ge 0 \quad (3)$$

After introducing the Lagrange multipliers $\mu_i \ge 0$, $\hat{\mu}_i \ge 0$, $\alpha_i \ge 0$, and $\hat{\alpha}_i \ge 0$, the Lagrange function of Equation (3) can be obtained with the Lagrange multiplier method:

$$L(\omega, b, \alpha, \hat{\alpha}, \xi, \hat{\xi}, \mu, \hat{\mu}) = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{m}(\xi_i + \hat{\xi}_i) - \sum_{i=1}^{m} \mu_i \xi_i - \sum_{i=1}^{m} \hat{\mu}_i \hat{\xi}_i + \sum_{i=1}^{m} \alpha_i\big(f(x_i) - y_i - \varepsilon - \xi_i\big) + \sum_{i=1}^{m} \hat{\alpha}_i\big(y_i - f(x_i) - \varepsilon - \hat{\xi}_i\big) \quad (4)$$

Setting the partial derivatives of $L(\omega, b, \alpha, \hat{\alpha}, \xi, \hat{\xi}, \mu, \hat{\mu})$ with respect to $\omega$, $b$, $\xi_i$, and $\hat{\xi}_i$ to zero yields:

$$\omega = \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i)\, x_i \quad (5)$$

$$0 = \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i) \quad (6)$$

$$C = \alpha_i + \mu_i \quad (7)$$

$$C = \hat{\alpha}_i + \hat{\mu}_i \quad (8)$$

Substituting Equations (5)-(8) into Equation (4), the dual form of SVR can be obtained:

$$\max_{\alpha, \hat{\alpha}} \ \sum_{i=1}^{m} \big[\, y_i(\hat{\alpha}_i - \alpha_i) - \varepsilon(\hat{\alpha}_i + \alpha_i) \,\big] - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} (\hat{\alpha}_i - \alpha_i)(\hat{\alpha}_j - \alpha_j)\, x_i^T x_j \quad \text{s.t.} \ \ \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i) = 0, \ \ 0 \le \alpha_i, \hat{\alpha}_i \le C \quad (9)$$

Applying the Karush-Kuhn-Tucker (KKT) conditions when searching for the optimized value yields [39]:

$$\alpha_i\big(f(x_i) - y_i - \varepsilon - \xi_i\big) = 0, \quad \hat{\alpha}_i\big(y_i - f(x_i) - \varepsilon - \hat{\xi}_i\big) = 0, \quad \alpha_i \hat{\alpha}_i = 0, \quad \xi_i \hat{\xi}_i = 0, \quad (C - \alpha_i)\,\xi_i = 0, \quad (C - \hat{\alpha}_i)\,\hat{\xi}_i = 0 \quad (10)$$

Andreani et al. applied a sequential minimal optimization (SMO) algorithm to solve the above optimization problem [40]. Substituting Equation (5) into the regression model f(x) = ω^T x + b, the solution of SVR is:

$$f(x) = \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i)\, x_i^T x + b \quad (11)$$

According to the KKT conditions in Equation (10), the samples falling inside the ε-band satisfy $\alpha_i = 0$ and $\hat{\alpha}_i = 0$. Therefore, only the samples with $(\hat{\alpha}_i - \alpha_i) \neq 0$ in Equation (11) can be support vectors of SVR, and these fall on or outside the ε-band. Obviously, the support vectors of SVR are only a part of the training samples, and its solution is still sparse.

In addition, it can also be seen from the KKT conditions (Equation (10)) that $(C - \alpha_i)\xi_i = 0$ and $\alpha_i(f(x_i) - y_i - \varepsilon - \xi_i) = 0$ for each sample $(x_i, y_i)$. Therefore, after obtaining $\alpha_i$, it must be that $\xi_i = 0$ for $0 < \alpha_i < C$, which yields:

$$b = y_i + \varepsilon - \sum_{j=1}^{m} (\hat{\alpha}_j - \alpha_j)\, x_j^T x_i \quad (12)$$

In practical problems, multiple samples satisfying the condition $0 < \alpha_i < C$ are often selected to solve for b, and then the average is taken.
The samples are assumed to have a linear distribution in the above derivation; however, data are often nonlinear in real applications [41]. For such problems, the samples can be mapped from the original space to a higher-dimensional feature space so that they are linearly distributed in the new space.
Let φ(x) denote the vector after mapping x into a high-dimensional feature space, so the corresponding linear regression model in the new high-dimensional space is:

$$f(x) = \omega^T \varphi(x) + b \quad (13)$$

So, the solution of SVR becomes:

$$f(x) = \sum_{i=1}^{m} (\hat{\alpha}_i - \alpha_i)\, \varphi(x_i)^T \varphi(x) + b \quad (14)$$

However, there are still two problems: (1) Different data need different mapping functions φ(x) for different scenarios, which makes it hard to predict how high the dimension to which the original samples must be mapped should be to create a linear distribution. Therefore, the first problem comes from the selection of the mapping function φ(x). (2) The solution involves the calculation of $\varphi(x_i)^T \varphi(x_j)$, which is the inner product of the samples $x_i$ and $x_j$ mapped to a high-dimensional space. Since the dimension may be very high or even infinite, a direct calculation is particularly difficult. A kernel function can help to solve these problems. Kernel functions show good performance in solving optimization problems of increasing complexity [42,43]. In Equation (14), $\varphi(x_i)^T$ and $\varphi(x_j)$ always appear in pairs. Then, the following relationship can be derived:

$$\kappa(x_i, x_j) = \varphi(x_i)^T \varphi(x_j) \quad (15)$$

The inner product of $x_i$ and $x_j$ in the high-dimensional space is equal to the result calculated with the function $\kappa(x_i, x_j)$ in the original sample space. With such a function, there is no need to pay attention to the selection of the mapping function. Therefore, the inner product in a high-dimensional or even infinite-dimensional feature space can be calculated directly, which implicitly maps the input data into a higher-dimensional space [44]. The most commonly used kernel functions are shown in Table 1.

Table 1. The most commonly used kernel functions.

Linear kernel: κ(x_i, x_j) = x_i^T x_j
Polynomial kernel: κ(x_i, x_j) = (x_i^T x_j)^d, with polynomial degree d ≥ 1
Gaussian (RBF) kernel: κ(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)), with bandwidth σ > 0
Sigmoid kernel: κ(x_i, x_j) = tanh(β x_i^T x_j + θ), with β > 0, θ < 0
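With the kernel trick above, a working SVR fit reduces to a few lines. The sketch below assumes scikit-learn and synthetic nonlinear data (all values illustrative, not the paper's dataset); the RBF option plays the role of the Gaussian kernel κ(x_i, x_j), so φ(x) never has to be computed explicitly.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 60)   # nonlinear target

# kernel="rbf" is the Gaussian kernel; C is the regularization constant
# and epsilon the half-width of the insensitive band from the derivation above.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

# Only samples on or outside the epsilon-band become support vectors, so the
# solution is sparse: part of the training set fully determines f(x).
n_sv = len(model.support_)
print(n_sv, "of", len(X), "training samples are support vectors")
```

Widening `epsilon` admits more samples into the insensitive band and typically shrinks the support-vector set, trading accuracy for sparsity.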

Boosting Tree Algorithm
Boosting is an ensemble method, and within this category, the Boosting Tree is one of the most commonly used algorithms. The principle of ensemble learning is to learn repeatedly with a series of weak learners, which are later integrated into a strong learner to obtain better generalization performance [46]. As the name implies, the Boosting Tree is an algorithm that selects a decision tree as the weak learner and then integrates the trees. A weak learner is a machine learning model whose performance is only a little better than chance. Some researchers have used field datasets to show that the Boosting Tree algorithm outperformed a neural network algorithm [47]. The Boosting Tree family mainly comprises two powerful algorithms, the Gradient Boosting Tree [48] and Extreme Gradient Boosting [49]. According to a case study by Tixier et al., the Gradient Boosting Tree (GBT) showed better performance than other machine learning methods, since it can capture nonlinear and local relationships [50]. Extreme Gradient Boosting (XGB) is treated as an implementation of a Gradient Boosting Machine (GBM), but it is more accurate and efficient [51]. Some researchers have also found that XGB algorithms can significantly reduce the risk of overfitting [52].
Given the training sample set D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, y_i ∈ ℝ, a regression tree corresponds to a partition of the input space and the output value on each partition unit. Assuming that the space has been divided into M units R_1, R_2, ..., R_M, and that there is a fixed output value c_m on each unit, the regression tree model can be expressed as:

$$f(x) = \sum_{m=1}^{M} c_m\, I(x \in R_m) \quad (16)$$

where I(·) is the indicator function. In machine learning, the square error is applied to represent the prediction error of the regression tree for the training data, and the minimum square error criterion is used to solve for the optimal output value on each unit, which is transformed into an optimization problem [53]. The optimal value $\hat{c}_m$ of $c_m$ on unit $R_m$ is the mean of the outputs $y_i$ corresponding to all instances $x_i$ on $R_m$:

$$\hat{c}_m = \text{ave}(y_i \mid x_i \in R_m) \quad (17)$$

The jth feature of the data, $x^{(j)}$, and a certain value s are assumed to be selected as the segmentation variable and segmentation point. Two regions are defined as follows:

$$R_1(j, s) = \{x \mid x^{(j)} \le s\}, \quad R_2(j, s) = \{x \mid x^{(j)} > s\} \quad (18)$$

Then, the optimal cut feature j and the optimal cut point s are found by solving:

$$\min_{j, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j, s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j, s)} (y_i - c_2)^2 \right] \quad (19)$$

Traversing all features, the optimal segmentation features and optimal segmentation points are found. According to this rule, the input space is divided into two regions; then, the above process is repeated for each region until the stop condition is satisfied, at which point a regression decision tree has been generated. Combining Equation (16) with Equation (17), the final regression tree model is:

$$T(x; \theta) = \sum_{m=1}^{M} \hat{c}_m\, I(x \in R_m) \quad (20)$$

where the parameter $\theta = \{(R_1, \hat{c}_1), (R_2, \hat{c}_2), \ldots, (R_M, \hat{c}_M)\}$ represents the regional division of the tree and the optimal value of each region; M is the number of leaf nodes of the regression tree.
In the Boosting Tree, we use an additive model and a forward stagewise algorithm. First, the initial Boosting Tree $f_0(x) = 0$ is set, and the model in step t is:

$$f_t(x) = f_{t-1}(x) + T(x; \theta_t) \quad (21)$$

where $f_{t-1}(x)$ is the current model. The required solution is:

$$\hat{\theta}_t = \arg\min_{\theta_t} \sum_{i=1}^{N} L\big(y_i,\, f_{t-1}(x_i) + T(x_i; \theta_t)\big) \quad (22)$$

The parameter $\hat{\theta}_t$ of the tth tree is determined through empirical risk minimization. When the square error loss function is used,

$$L(y, f(x)) = (y - f(x))^2 \quad (23)$$

the loss yields:

$$L\big(y,\, f_{t-1}(x) + T(x; \theta_t)\big) = \big(y - f_{t-1}(x) - T(x; \theta_t)\big)^2 = \big(\gamma - T(x; \theta_t)\big)^2 \quad (24)$$

Here, γ is the residual of the predicted value of the (t − 1)th tree:

$$\gamma = y - f_{t-1}(x) \quad (25)$$

So, the optimization goal of the tth tree becomes:

$$\hat{\theta}_t = \arg\min_{\theta_t} \sum_{i=1}^{N} \big(\gamma_i - T(x_i; \theta_t)\big)^2 \quad (26)$$

When the Boosting Tree solves the regression problem, it only needs to fit the residual of the predicted value of the previous tree. This means that the algorithm becomes quite simple, and the final model of the Boosting Tree can be expressed as an additive model:

$$f_T(x) = \sum_{t=1}^{T} T(x; \theta_t) \quad (27)$$
where $T(x; \theta_t)$ represents the decision tree, $\theta_t$ is the parameter of the decision tree, and T is the number of decision trees.
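The residual-fitting recursion above can be sketched directly, assuming scikit-learn decision trees as the weak learners and synthetic data; this illustrates the principle only, not the paper's tuned Boosting Tree model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 4.0, size=(80, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.3, 80)   # hypothetical nonlinear target

f_pred = np.zeros_like(y)          # f_0(x) = 0
trees = []
for t in range(20):
    residual = y - f_pred          # gamma, the residual of the current model
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    f_pred += tree.predict(X)      # f_t(x) = f_{t-1}(x) + T(x; theta_t)

mse = np.mean((y - f_pred) ** 2)
print(f"training MSE after boosting: {mse:.3f}")
```

Each pass fits a shallow tree to what the current ensemble still gets wrong, so the training error shrinks step by step; libraries such as XGBoost add shrinkage and regularization on top of this same loop.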

Multi-Layer Perceptron
The neural network is a representative algorithm in machine learning, and it is among the most popular and widely used machine learning models [54]. A Multi-Layer Perceptron (MLP) is a class of feed-forward neural network [55]. Similarly to a biological neural network, the most basic constituent unit of an artificial neural network is the 'neuron'. Each neuron is connected to other neurons. When it receives inputs, it multiplies each input by the weight on the corresponding edge and adds the bias of the neuron itself. The output of a neuron is finally generated through an activation function, which is a nonlinear function, such as a sigmoid function, arc-tangent function, or hyperbolic-tangent function [56]. The sigmoid function, which has many excellent properties, is often selected as the activation function [57]:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (28)$$

The perceptron model consists of two layers, the input layer and the output layer; however, only the output layer contains functional neurons. A single layer of functional neurons restricts the model to fitting only linearly separable data. To solve complex problems, multiple layers of functional neurons are needed. The neuron layers between the input layer and the output layer are called hidden layers. In other words, the Multi-Layer Perceptron (MLP) is a neural network model with one or more hidden layers. Each layer of neurons is fully connected to the next layer, and there are no connections within the same layer or across non-adjacent layers. Some researchers have proved that a neural network with more than three layers can approximate any continuous function with arbitrary accuracy [58,59]. The learning of an MLP takes place by adjusting the weights and biases between neurons according to the training data. An error back-propagation (BP) algorithm is commonly used to train multi-layer neural networks. It is based on a gradient descent strategy, an iterative optimization approach that adjusts the parameters in the negative gradient direction of the objective [60]. According to the
gradient descent strategy, let the loss function be f and a pair of parameters be (ω, b). The initial values $(\omega_0, b_0)$ are randomly selected; in the (n + 1)th iteration, the parameters are updated as:

$$\omega_{n+1} = \omega_n - \alpha \left.\frac{\partial f}{\partial \omega}\right|_{(\omega_n,\, b_n)} \quad (29)$$

$$b_{n+1} = b_n - \alpha \left.\frac{\partial f}{\partial b}\right|_{(\omega_n,\, b_n)} \quad (30)$$

where the learning rate α ∈ (0, 1) is used. The learning rate is usually set to 0.8 [60]. Newton's method can be applied for rapid convergence [61]. Iterating Equations (29) and (30) moves the parameters along the negative gradient of the loss until convergence. Loss functions (or objective functions) may have multiple local extremums, but there is only one global minimum [62]. The global minimum value is the final objective of the calculations. However, for the gradient descent algorithm, in each iteration, the gradient of the loss function is calculated at a certain point, and then the solution moves along the negative gradient direction [63]. The gradient at the current point is zero if the loss function has reached a local minimum, at which point the updating of the parameters terminates. This leads to a local extremum in the parameter optimization. The local minimum can be avoided through the following strategies in order to further approach the global minimum [64-66]: (1) The neural network is initialized with multiple sets of different parameters, and the set of parameters yielding the smallest loss after training is taken as the final solution. This process is equivalent to starting the optimization from multiple different initial points; the runs may fall into different local extremums, from which we can select the result closest to the global minimum. (2) The stochastic gradient descent method is used. This method adds a random factor, and only one sample is used in each update. There is then a certain possibility that the direction deviates from the locally optimal one, so that the search can jump out of a local extremum.
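A bare-bones instance of the update rules in Equations (29) and (30), applied to a single linear neuron with squared-error loss; all values are illustrative, and α = 0.1 is used here rather than the 0.8 cited above.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.05, 50)   # hypothetical target: omega ~ 2, b ~ 1

w, b, alpha = 0.0, 0.0, 0.1                   # random-ish start, learning rate
for _ in range(2000):
    err = w * x + b - y
    grad_w = np.mean(2 * err * x)   # df/d(omega) of the mean squared error
    grad_b = np.mean(2 * err)       # df/db
    w -= alpha * grad_w             # Equation (29)
    b -= alpha * grad_b             # Equation (30)

print(round(w, 2), round(b, 2))     # should recover roughly (2, 1)
```

The loss here is convex, so gradient descent reaches the global minimum; for a real MLP the same update is applied layer by layer via back-propagation, where the multiple-restart and stochastic-gradient strategies above guard against local extremums.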

Model Performance Evaluation
To evaluate the machine learning models, several performance metrics were adopted and calculated. Based on case studies from other researchers, the mean absolute error (MAE) and the coefficient of determination (R² Score) are often selected to evaluate the generalization ability of machine learning models [67]. The MAE is the average absolute difference between the true values and the predicted values, and it can be easily calculated and compared. Some researchers have even used revised forms of the MAE, such as the dynamic mean absolute error and the mean absolute percentage error, to evaluate the accuracy of a prediction model [68,69]. The more accurate the machine learning model is, the smaller the MAE will be; a perfect model has an MAE close to 0. Let f(x) and y denote the predicted value and the true value, respectively. Then, the MAE can be expressed as:

$$\text{MAE}(f(x), y) = \frac{1}{m} \sum_{i=1}^{m} |f(x_i) - y_i| \quad (31)$$

The MAE can intuitively measure the 'gap' between the predicted value and the real value. However, it is difficult to compare models' performance when the dimensions (units) of the data differ.
Since the dimensions of different datasets are different, it is difficult to compare them by using a simple difference calculation. The coefficient of determination (R² Score) can be adopted to evaluate the degree of coincidence between the predicted values and the true values [70]. Recent research has proved the feasibility of using the R² Score to evaluate mixed-effect models, rather than just linear models [71]. The calculation of the R² Score involves three important terms: the sum of squares for regression (SSR), the sum of squares for error (SSE), and the sum of squares for total (SST) [72]:

$$\text{SSR} = \sum_{i=1}^{m} \big(f(x_i) - \bar{y}\big)^2, \quad \text{SSE} = \sum_{i=1}^{m} \big(y_i - f(x_i)\big)^2, \quad \text{SST} = \sum_{i=1}^{m} (y_i - \bar{y})^2 = \text{SSR} + \text{SSE} \quad (32)$$

Based on the above relationship, the calculation method for the R² Score can be defined as [73]:

$$R^2 = 1 - \frac{\text{SSE}}{\text{SST}} = 1 - \frac{\text{MSE}(f(x), y)}{\text{Var}(y)} \quad (33)$$

where MSE(f(x), y) is the mean square error between the predicted value f(x) and the real value y, and Var(y) is the variance of the real value y. Specifically, the following situations arise in the analysis of the R² Score: an R² close to 1 indicates that the predictions closely match the true values, while R² = 0 means the model predicts no better than the mean of the data. For a bad scenario in which R² < 0, the numerator is greater than the denominator; that is, the error of the prediction data is greater than the variance of the real data. This indicates that the error of the predicted values exceeds the dispersion of the data, which suggests the misuse of a linear model for nonlinear data.
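Both metrics follow directly from Equations (31)-(33). The sketch below computes them from their definitions and cross-checks against scikit-learn; the numbers are toy values, not experimental results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # hypothetical measured sand masses
y_pred = np.array([2.8, 5.4, 2.9, 6.6])   # hypothetical model predictions

mae = np.mean(np.abs(y_pred - y_true))                        # Equation (31)
r2 = 1.0 - np.mean((y_true - y_pred) ** 2) / np.var(y_true)   # Equation (33)

# The hand-rolled values agree with the library implementations.
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(round(mae, 3), round(r2, 3))   # -> 0.35 0.959
```

Note that the MSE/Var form of Equation (33) matches 1 − SSE/SST because both numerator and denominator are divided by the same sample count m.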

Gravel-Filling Simulation Experiment
A sand control experiment was designed, as shown in Figure 2. The whole experimental process was carried out in a lab incubator to ensure safety. The key step in preparing the experiment was to make NGH reservoir samples under lab conditions. According to a survey of the literature, NGH samples for lab tests come mainly from field coring and lab preparation [74,75]. The field coring method is very costly because of the high temperature and pressure requirements during transportation. In addition, some researchers have found that the physical properties of NGH samples obtained with the coring method may be changed during the sampling and transportation process [76]. Therefore, the NGH samples were prepared under lab conditions. The NGH lab preparation method for the experiment was the in situ rapid preparation method proposed by Li et al. [77]. In the preparation, different sizes of produced sand (0.05 m³) were added into a reactor to simulate NGH reservoirs. The produced sand was mixed for a sand sieve analysis before being added into the reactor. The purpose was to characterize the effect of the uniformity coefficient (UC).
To simulate a real production process, several sand control screens were adopted in the experiment. The sand-retaining precision of the sand control screens was selected according to the preferred gravel mesh. The experiment measured the actual sand production and sand content for the simulation. Yu et al. [78] designed experiments showing that the proper selection of the median diameter ratio of the gravel can significantly reduce the risk of sand blockage in the exploitation and production of NGH. Therefore, the selection of the gravel-packing precision was vital both in the experiment and in field sand control. The criterion for the selection of the median gravel size was Saucier's method, with a gravel size of 5-6 × d50.
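Saucier's criterion as used above amounts to a one-line calculation. The helper below is a hypothetical convenience function with an illustrative d50 value, not part of the experimental procedure.

```python
def saucier_gravel_range(d50_mm: float) -> tuple[float, float]:
    """Recommended median gravel diameter range (mm) per Saucier's 5-6 x d50 rule."""
    return 5.0 * d50_mm, 6.0 * d50_mm

low, high = saucier_gravel_range(0.12)   # e.g., a sand with d50 = 0.12 mm
print(f"gravel d50 between {low:.2f} and {high:.2f} mm")
```

The returned range then guides both the gravel-pack mesh and the sand-retaining precision of the screens described above.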
The purpose of the experiment was to use the changing variables as the independent variables x and the measured sand production as the label y to establish a machine learning model. The experiment contained eight features: the well type, permeability (md), shale content (%), sand diameter d50 (mm), effective porosity (%), hydrate saturation (%), sand-retaining precision (the diameter of the mesh screen, mm), and uniformity coefficient (d40/d90); the sand production (g) was used as the label value. Feature selection is the first and vital step in the process of modeling with machine learning. Proper feature selection can improve a machine learning model's performance by reducing overfitting and the curse of dimensionality [79]. For processes with large amounts of data, feature selection has been shown to have a positive role in increasing the calculation efficiency [80]. The main step in feature selection is the reduction of the number of features by checking whether a parameter is irrelevant or redundant [81]. Scatter diagrams of each feature d_i and the label y were drawn (Figures 3-10), with the aim of analyzing the impact of each feature on the label value. The scatter plot in Figure 3 shows the effect of the well type on sand production: more significant sand production was observed in the horizontal well than in the vertical well. This relationship was verified by other researchers, such as Sparlin and Shang et al., who found that sand
production was severe with long horizontal sections [82,83]. Figure 4 shows the relationship between permeability and sand production. In general, different permeability values caused different sand production levels, as was also observed in previous research [84,85]. It was concluded that permeability was an effective but not very significant feature for sand production: effective because the scatter points were concentrated in the middle and on the left side of the graph, and not very significant because two points still lay on the right side. Figure 5 illustrates that the effect of the shale content followed the same trend as the effect of permeability on sand production; the scatter points in Figure 5 are slightly more concentrated than those for permeability in Figure 4. The preliminary assessment showed that the median size of the produced sand was more relevant than the shale content (Figure 6). As shown in Figure 7, the effective porosity was more obviously relevant to sand production than the other parameters. Previous studies explained that porosity could significantly affect sand production, since it provides the main transportation path for flowing sand [86-88]. Based on the scatter points in Figure 8, the hydrate saturation was less relevant to sand production than the porosity. Fang et al.
designed experiments to investigate the sand production behavior in an NGH sediment. In those experiments, the free sand mainly came from the dissociation of the hydrate, so the saturation affected sand production through the phase change [89]. Saturation could thus be treated as an indirect factor acting through changes in pressure and temperature. The pressure and temperature were fixed in this experiment, which explains the lower relevance of saturation as a feature for the output label value. The concentrated distributions of points in Figures 9 and 10 demonstrate the high relevance of the sand-retaining precision and the uniformity coefficient, respectively, to sand production. To summarize Figures 3-10: it seemed easier to produce more sand with horizontal wells than with vertical wells; wells with low permeability produced sand easily; a large effective porosity increased the amount of sand produced; and a small uniformity coefficient increased the sanding risk. Not all features had a simple linear relationship with the label value; based on this observation, a nonlinear algorithm was adopted to fit the features. It must also be noted that, for all features, there were two data points outside the main cluster, which may be explained in two ways: (1) they were noise points arising from experimental error and should be abandoned; or (2) the coverage of the collected data was not wide enough, i.e., there were samples between these two outliers and the rest of the data that were simply not collected. Since the amount of data collected in this paper was small, the second explanation accounts for why the two points seemed to be outliers.
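The scatter analysis above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the column names and the one-panel-per-feature layout are assumptions chosen to mirror Figures 3-10.

```python
# Sketch: scatter plots of each candidate feature against the measured
# sand production label, mirroring Figures 3-10. Column names are
# illustrative assumptions, not the authors' actual identifiers.
import pandas as pd
import matplotlib.pyplot as plt

features = ["well_type", "permeability_md", "shale_content", "d50_mm",
            "effective_porosity", "hydrate_saturation",
            "retaining_precision_mm", "uniformity_coefficient"]

def plot_feature_scatter(df, label="sand_production_g"):
    """One scatter panel per feature versus the sand production label."""
    fig, axes = plt.subplots(2, 4, figsize=(16, 8))
    for ax, feat in zip(axes.ravel(), features):
        ax.scatter(df[feat], df[label], alpha=0.7)
        ax.set_xlabel(feat)
        ax.set_ylabel(label)
    fig.tight_layout()
    return fig

# Usage (with the experimental table loaded into a DataFrame):
# df = pd.read_csv("sand_data.csv")   # hypothetical file name
# plot_feature_scatter(df)
```

Plots like these only give a qualitative impression of relevance; the correlation coefficients and random forest importances discussed next quantify it.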
The above scatter plots intuitively and briefly showed the relevance of each feature to the output label. For a quantifiable verification, the correlations between features were analyzed by using the correlation coefficients between them (as shown in Table 3). The correlation coefficient is a special and important form of covariance that eliminates the influence of dimension [90]. The closer the value is to 1, the more obvious the trend of positive correlation between features; conversely, the closer the value is to −1, the more obvious the trend of negative correlation. A weaker correlation is found as the correlation coefficient gets closer to 0. There were obvious strong positive correlations between the sand-retaining precision and the uniformity coefficient, as well as between the median sand diameter and the uniformity coefficient. This can be explained by the fact that the definition of the uniformity coefficient is highly related to the median particle size. A greater value for these features meant that a higher sand-retaining precision was required, which validated the accuracy of the experimental results. According to Table 3, there were no strong correlations between the features on the whole. Since the number of features was not very large, feature extraction methods such as principal component analysis were not suitable for the data [91]. Random forest has the advantages of high accuracy and good robustness [92]; it also fits nonlinear relationships well and requires little debugging. The random forest method was therefore used for feature selection in this study, with 100 decision trees as base learners. The importance of each feature in the random forest is shown in Figure 11.
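The two quantitative checks described above can be sketched as follows. This is a hedged illustration on synthetic data: the column names are assumptions, and the synthetic label is deliberately driven by two features so that the importance ranking behaves like Figure 11.

```python
# Sketch of the correlation matrix behind Table 3 and the 100-tree
# random forest importance ranking behind Figure 11. Data and column
# names are synthetic placeholders, not the experimental table.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(60, 8)),
                  columns=["well_type", "permeability", "shale_content",
                           "d50", "porosity", "saturation",
                           "retaining_precision", "uniformity_coeff"])
# Synthetic label dominated by two features (assumption for illustration).
y = 3.0 * df["uniformity_coeff"] + 2.0 * df["retaining_precision"] \
    + rng.normal(scale=0.1, size=60)

corr = df.corr(method="pearson")          # Table 3 analogue

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(df.values, y)
importance = pd.Series(forest.feature_importances_, index=df.columns)
top4 = importance.sort_values(ascending=False).index[:4].tolist()
```

On the real data, `top4` would correspond to the uniformity coefficient, sand-retaining precision, effective porosity, and permeability selected below.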
As shown in Figure 11, the uniformity coefficient and sand-retaining precision were the two most important characteristics, with a total importance exceeding 80%. According to the characteristics of the random forest, the contribution values of the other features decrease rapidly once one feature is selected. Therefore, the high importance values of the uniformity coefficient and sand-retaining precision did not mean that the other features were dispensable; however, they could provide a reference for the selection of features. The top four features were chosen to train the model: the uniformity coefficient, sand-retaining precision, effective porosity, and permeability. These four features are also consistent with professional knowledge of sand control. The four features were selected and standardized with the standardization formula (Equation (36)). The standardized data eliminated the influence of each feature's dimension and helped to avoid interference in the model prediction without changing the distribution of the data themselves [93].
where x_i represents the original data, mean(x) is the mean of the feature, and std(x) is the standard deviation of the feature. The standardized data are shown in Table 4. The pseudo-code for the SVR algorithm is shown below (Algorithm 2), as is the pseudo-code for the Boosting Tree (XGBoost) algorithm (Algorithm 3). To avoid contingency, we randomly divided all of the data six times, which meant repeating the above process six times and observing the results. The results are shown in Figures 12-23. It could be seen that the R2 Scores of the second and fifth training were negative, while the other R2 Scores were positive. In addition, the values of MAE from the second and fifth training were very close to those of the other training runs, which indicated that the proposed model had no problem in itself. Focusing on the training set and test set after each division of the data, it was found that the second and fifth divisions placed the two outliers into the training set, leading to two problems: (1) The outlier data were used in training, but the test set did not have similar samples to predict, which led to large errors in the model predictions. (2) There were no outliers in the test set, but the number of samples was small, resulting in a concentration of the label value y and a decrease in the variance; this made the R2 Score smaller and even negative. In the other partitions, the two outliers were divided between the training set and the test set, so the values were normal. Based on the above analysis, the second and fifth R2 Scores were excluded when calculating the average performance of the model, as shown in Table 6 and Figures 24 and 25. As shown in Figures 12-23, each algorithm could fit the training set well. Even though the fitting effect of SVR was slightly weaker than those of the other algorithms, it still captured the overall trend of the data. Based on the training performance, the second and fifth modeling runs performed poorly. The reason, as discussed above, was mainly that the amount of data was not large enough to cover a large sample space, and some samples, such as the outliers, were not fully learned.
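The protocol above — standardize the selected features with Equation (36), then repeat the random train/test division six times and record MAE and R2 Score per division — can be sketched as follows. Synthetic data stand in for the experimental table, and a single KNN model stands in for the four algorithms.

```python
# Sketch of the evaluation protocol: Equation (36) standardization,
# six random train/test divisions, MAE and R2 Score per division.
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))                 # four selected features
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.2, size=40)

# Equation (36): z_i = (x_i - mean(x)) / std(x), applied column-wise.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

results = []
for division in range(6):                    # six random divisions
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_std, y, test_size=0.25, random_state=division)
    model = KNeighborsRegressor(n_neighbors=3).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results.append((mean_absolute_error(y_te, pred),
                    r2_score(y_te, pred)))
```

A division whose test set happens to contain no outliers and few samples can yield a negative R2 Score even when the MAE looks normal, which is exactly the effect described for the second and fifth divisions.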
Figures 24 and 25 show that the best generalization model was the Boosting Tree (XGBoost), while the worst was KNN; SVR and MLP had similar performances. KNN predicts new samples according to the adjacent samples in feature space, which requires a large sample size with samples evenly distributed in that space. The sample size in this paper was small and the feature space was 'sparse', which decreased the algorithm's performance. The Boosting Tree is an ensemble learning method that can be regarded as an additive model based on a residual learning mechanism; it did not require many calculations and did not suffer from overfitting, so it performed excellently in this paper. SVR and MLP have many parameters and were also able to show good performance when the parameters were adjusted properly; both algorithms have a strong fitting ability for nonlinear data and performed very well on the data used in this paper.


Tuning and Discussion
Based on the above training and testing, the Boosting Tree had the best performance in the prediction of sand production, while SVR and MLP had the second-best performance based on the MSE and R2 Score. One popular current application of machine learning is to build an ensemble machine learning model to increase the performance and accuracy of prediction, classification, and clustering [94-96]. The main feature of ensemble machine learning models is the coupling of different machine learning algorithms into one model. The final result of an ensemble model is determined by voting among the built-in algorithms, selecting the most accurate model in a specific scenario [97]. An ensemble model builds on the important insight that different algorithms have their own suitable scenarios, which allows the ensemble to significantly reduce the variance [98-100]. Therefore, the Boosting Tree (XGBoost), SVR, and MLP were all discussed in this study, rather than only focusing on the best algorithm.
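An ensemble of the three retained algorithms can be sketched with scikit-learn. This is an illustration of the idea, not the authors' implementation: `GradientBoostingRegressor` stands in for XGBoost so the sketch needs only scikit-learn, and `VotingRegressor` combines members by averaging their predictions (the regression analogue of the voting described above).

```python
# Sketch: SVR, a boosting tree, and an MLP combined in one averaging
# ensemble. Hyperparameters and synthetic data are assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(size=(50, 4))
y = 10.0 * X[:, 0] + 5.0 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=50)

ensemble = VotingRegressor([
    ("svr", SVR(kernel="rbf", C=10.0)),
    ("boost", GradientBoostingRegressor(random_state=0)),
    ("mlp", MLPRegressor(hidden_layer_sizes=(14,), max_iter=2000,
                         random_state=0)),
])
ensemble.fit(X, y)
pred = ensemble.predict(X[:5])   # averaged prediction of the members
```

In production use, the `xgboost` package's `XGBRegressor` could replace the stand-in boosting member without changing the ensemble structure.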
The main parameter in the SVR algorithm is the type of kernel function. Table 1 shows the common kernel functions, including the linear kernel, polynomial kernel, Gaussian kernel, and sigmoid kernel, all of which were tested in this paper. The linear kernel is the simplest of all the kernel functions; it is determined by the inner product of the training and true values (x, y) plus an optional constant c [101]. The polynomial kernel is a non-stationary kernel that works well for normalized training data [102]. The Gaussian kernel is a radial basis function [103]. Some researchers, such as Lin et al., pointed out that, as a radial basis function, the Gaussian kernel was better than the sigmoid kernel [104]. The tuning process in this paper also compared the prediction performance of the different kernel functions (Figure 26). The Gaussian kernel had a low MSE and a high R2 Score; therefore, the Gaussian kernel was recommended in the model for the prediction of sand production based on the SVR algorithm.
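The kernel comparison behind Figure 26 can be sketched as a cross-validated grid search over the four candidate kernels. The data here are synthetic placeholders; in scikit-learn, `kernel="rbf"` is the Gaussian kernel.

```python
# Sketch: cross-validate SVR with each candidate kernel and keep the
# best by mean squared error, mirroring the comparison in Figure 26.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(80, 4))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=80)

search = GridSearchCV(
    SVR(),
    param_grid={"kernel": ["linear", "poly", "rbf", "sigmoid"]},
    scoring="neg_mean_squared_error", cv=5,
)
search.fit(X, y)
best_kernel = search.best_params_["kernel"]   # "rbf" = Gaussian kernel
```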
The main parameters of XGBoost were tested in proper ranges similar to those used by Pan et al. and Parsa; the ranges are shown in Table 7 [105,106]. The tuning results are illustrated in Figures 27-31. Figure 27 shows that the proper value for the maximum depth of tree was around 5, giving a low MAE and a high R2 Score. According to Figure 28, the recommended value of gamma was around 1 in order to achieve a proper MSE and R2 Score. Figure 29 shows that the sampling rate of the training sample should be 0.9 for a low MSE and a high R2 Score at the same time. The tuning process also found that both the regular term of weight L1 and the regular term of weight L2 had little effect on the prediction performance (Figures 30 and 31). The number of hidden layers in the MLP algorithm is usually recommended to be two in some case studies [107]. The main parameter for MLP tuning was the node number of the hidden layer. The potential range of the node number was [10, 38] with a step of 1. Figure 32 illustrates the tuning results for the node number; the proper node number for the hidden layer was 14 (Figure 32).
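The tuned configurations recommended above can be captured directly. The parameter names below match `xgboost.XGBRegressor` keyword arguments; the MLP is built with scikit-learn. This is a sketch of the recommended settings, not the authors' training script, and the data are synthetic placeholders.

```python
# Recommended XGBoost settings from the tuning figures (27-31); the
# keys match xgboost.XGBRegressor keyword arguments.
import numpy as np
from sklearn.neural_network import MLPRegressor

xgb_params = {
    "max_depth": 5,      # Figure 27: maximum depth of tree around 5
    "gamma": 1.0,        # Figure 28: gamma around 1
    "subsample": 0.9,    # Figure 29: sampling rate of training sample
    "reg_alpha": 0.0,    # Figure 30: L1 term had little effect
    "reg_lambda": 1.0,   # Figure 31: L2 term had little effect
}

# MLP with the recommended two hidden layers of 14 nodes each.
rng = np.random.default_rng(4)
X = rng.uniform(size=(60, 4))
y = 4.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.05, size=60)
mlp = MLPRegressor(hidden_layer_sizes=(14, 14), max_iter=3000,
                   random_state=0).fit(X, y)
```

With `xgboost` installed, `xgboost.XGBRegressor(**xgb_params)` would instantiate the boosting model with these tuned values.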

Case Study with Machine Learning Algorithms
This paper also involved one case study in order to validate the model and to show the application potential of the proposed model. The case study compared the results of a numerical simulation with those of the proposed machine learning models. Uchida et al. built a mathematical model to predict sand production by investigating sand migration [23]. The results of the comparison are shown in Figure 33. The training data were selected every 6 h in the interval from the 2nd day to the 7th day of Uchida's results. It was concluded that the three machine learning algorithms (SVR, XGBoost, MLP) performed well and could match most of the simulation results, especially in the stable-pressure stage. The main drawback of the machine learning algorithms appeared in the early stage of sand production, where some gaps remained between the results of the simulation and the machine learning predictions. However, it should be noted that the XGBoost algorithm provided better-matching results than the other two machine learning algorithms. The case study's results also supported the point that XGBoost could provide the most accurate results among the three algorithms, and they validated the feasibility of applying machine learning in the prediction of sand production.


Conclusions
The comparison of four machine learning algorithms for application to laboratoryscale sand production tests revealed the following results:


Figure 1 .
Figure 1. Schematic diagram of KNN (the circle represents an isometric line).


Figure 2 .
Figure 2. Experimental device for the simulation of sand production.


Figure 3 .
Figure 3. Well type and measured sand production.

Figure 4.
Figure 4. Permeability and measured sand production.

Figure 5.
Figure 5. Shale content and measured sand production.

Figure 6.
Figure 6. Median sand diameter and measured sand production.

Figure 7 .
Figure 7. Effective porosity and measured sand production.

Figure 8.
Figure 8. Hydrate saturation and measured sand production.

Figure 9.
Figure 9. Sand-retaining precision and measured sand production.

Figure 10.
Figure 10. Uniformity coefficient and measured sand production.

Figure 11 .
Figure 11. Importance of features calculated by the random forest.


Algorithm 2.
Algorithm 2. SVR pseudo-code.
# Import the necessary libraries
# Read files to input the pre-processed data and labels
Input original training data: train.csv
# Calculate the parameters w, b
For j = 1 to m in the training dataset:
    b = average[y_j - sum(Lagrangian multiplier * y_j * x_j)]
    w = sum(Lagrangian multiplier * y_j * x_j)
    Check the KKT conditions (continue if satisfied)
Define the kernel function (linear, sigmoid, polynomial, Gaussian)
Assume d = 1
For i = 1 to n in the test dataset:
    y_predicted = sum(Lagrangian multiplier * Kernel(x_test_i, x_j)) + b
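A minimal runnable counterpart to the SVR pseudo-code (a sketch, not the authors' code) uses scikit-learn's `SVR`, which solves the dual problem — Lagrangian multipliers, KKT conditions, and the kernel trick — internally. The tiny dataset and hyperparameters are illustrative assumptions.

```python
# Sketch: fitting and predicting with a Gaussian-kernel SVR, the
# runnable equivalent of Algorithm 2 above.
import numpy as np
from sklearn.svm import SVR

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.1, 1.1, 1.9, 3.2])

# kernel="rbf" is the Gaussian kernel recommended later in the paper.
model = SVR(kernel="rbf", C=100.0, epsilon=0.05).fit(X_train, y_train)
y_predicted = model.predict(np.array([[1.5]]))
```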

Figure 12.
Figure 12. Results of the first training with the training dataset.

Figure 13.
Figure 13. Assessment of the first training with the test dataset.

Figure 14.
Figure 14. Results of the second training with the training dataset.

Figure 15.
Figure 15. Assessment of the second training with the test dataset.

Figure 16.
Figure 16. Results of the third training with the training dataset.

Energies 2022, 15, 6509

Figure 17.
Figure 17. Assessment of the third training with the test dataset.

Figure 18.
Figure 18. Results of the fourth training with the training dataset.

Figure 19.
Figure 19. Assessment of the fourth training with the test dataset.

Figure 20.
Figure 20. Results of the fifth training with the training dataset.

Figure 21.
Figure 21. Assessment of the fifth training with the test dataset.

Figure 22.
Figure 22. Results of the sixth training with the training dataset.

Figure 23.
Figure 23. Assessment of the sixth training with the test dataset.

Figure 24 .
Figure 24. Comparison of the different algorithms based on the MAE.


Figure 25 .
Figure 25. Comparison of the different algorithms based on the R2 Score.


Figure 26.
Figure 26. Tuning results of different kernel functions for the SVR algorithm.

Figure 27.
Figure 27. Tuning results for the maximum depth of tree.

Figure 28.
Figure 28. Tuning results for gamma.

Figure 29.
Figure 29. Tuning results for the sampling rate of the training sample.

Figure 30.
Figure 30. Tuning results for the regular term of weight L1.

Figure 31.
Figure 31. Tuning results for the regular term of weight L2.

Figure 32.
Figure 32. Tuning results for the node number for the hidden layer.


Figure 33 .
Figure 33. Comparison between the numerical results from Uchida [23] and those of the machine learning algorithms.

(1) This paper built a machine learning model to predict the sand production in an unconsolidated NGH reservoir with four different algorithms: KNN, SVR, Boosting Tree (XGBoost), and MLP. The input data for the model were provided by a sand production experiment.
(2) As shown by the comparison of the four algorithms, KNN had the worst performance, while XGBoost provided prediction results with the lowest MSE value and a high R2 Score. The final algorithms selected for building further ensemble models were SVR, XGBoost, and MLP.
(3) The tuning process showed that the kernel function had a great impact on the performance of SVR; the kernel recommended for the sand prediction model was the Gaussian kernel. The best parameters for the XGBoost algorithm were tested and provided, including the maximum depth of tree, gamma, the sampling rate of the training sample, the regular term of weight L1, and the regular term of weight L2.
(4) The three selected machine learning algorithms were also applied to the results of a rigorous numerical simulation (Uchida et al., 2016), and all of them gave results that reasonably matched the numerical solution. XGBoost performed better and was recommended for the prediction of sand production in the early sand production stage.

1. If R2 = 1, the performance metric MSE(f(x), y) is 0, indicating that the predicted label values in the test sample are exactly the same as the true values; a perfect model has been built to predict all of the test samples [74].
2. If R2 ≈ 0, the numerator is equal to the denominator, indicating that the predicted values are all the mean of the real values. In this situation, the prediction model explains none of the variability of the response data around its mean. Sánchez et al. (2019) also explained that, in this scenario, the inclusion of variables can be neglected, and the built prediction model is not adequate [74].
3. If 0 < R2 < 1, the score is within the normal range: a value closer to 1 indicates a better fit, while a value closer to 0 indicates a worse fitting effect.
4.
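The cases above follow directly from the standard definition R2 = 1 − SS_res/SS_tot, which can be written out in a few lines (an illustrative sketch, not the paper's evaluation code):

```python
# Manual R2 Score matching the case analysis above.
import numpy as np

def r2_score_manual(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)              # residual sum
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)     # total sum
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2_score_manual(y, y)                        # case 1: R2 = 1
mean_only = r2_score_manual(y, np.full(4, y.mean()))   # case 2: R2 = 0
```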
Table 2 illustrates the measured values, where x_i represents the ith dataset, d_i represents the above features, and y represents the label value of sand production.
Energies 2022, 15, x FOR PEER REVIEW


Table 2 .
Original data from the experiment.


Table 3 .
Correlation coefficients between features.


Table 4 .
Standardized data for model training.

Table 5 .
Results of the modeling assessment.


Table 6 .
Average scores in the evaluation of model performance.



Table 7 .
Ranges of parameters in the tuning process of the XGBoost algorithm.