One-Dimensional Convolutional Neural Network with Adaptive Moment Estimation for Modelling of the Sand Retention Test

Abstract: Stand-alone screens (SASs) are active sand control methods in which compatible screens and slot sizes are selected through the sand retention test (SRT) to filter out an unacceptable amount of sand produced from oil and gas wells. SRTs have been modelled in the laboratory and through computer simulation to replicate experimental conditions and ensure that the selected screens are suitable for the selected reservoirs. However, SRT experimental setups and result analyses are not standardized. A few changes made to the experimental setup can cause a huge variation in results, leading to different plugging performance and sand retention analyses. Besides, conducting many laboratory experiments is expensive and time-consuming. Since the application of convolutional neural networks (CNNs) in the petroleum industry has attained promising results for both classification and regression problems, this method is proposed for the SRT to reduce the time, cost, and effort of running the laboratory test by predicting the plugging performance and sand production. Deep learning has yet to be applied to the SRT. Therefore, in this study, a deep learning model using a one-dimensional convolutional neural network (1D-CNN) with adaptive moment estimation is developed to model the SRT, with the aim of classifying the plugging sign (the screen plugs or does not plug) as well as predicting sand production and retained permeability using varying sand distributions, SASs, screen slot sizes, and sand concentrations as inputs. The performance of the proposed 1D-CNN model for the slurry test shows that the prediction of retained permeability and the classification of the plugging sign achieved robust accuracy, with R² values of more than 90%, while the prediction of sand production achieved 77% accuracy. In addition, the model for the sand pack test achieved 84% accuracy in predicting sand production.
For comparative model performance, gradient boosting (GB), K-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM) were also modelled on the same datasets. The results showed that the proposed 1D-CNN model outperforms the other four machine learning models for both SRT tests in terms of prediction accuracy.


Introduction
The sand retention test (SRT) is a procedure often used to select the optimal sand screen for reservoir sand control [1]. Optimal sand screen selection refers to choosing the most appropriate screen for a sand control completion, minimizing sand production while maximizing hydrocarbon production. The SRT, a standard test in the upstream oil and gas industry, simulates sand production into a wellbore after the sand passes through a filter. It helps engineers choose the right screen and aperture size for a given field and environment.
Implementing the SRT provides a better understanding of sand retention efficiency and plugging performance. It can be done in the laboratory or by simulating mathematical models, and the results from both approaches are compared to examine each method's accuracy [2]. The SRT laboratory methods that focus on stand-alone screen (SAS) applications are classified into two groups: the slurry test and the sand pack test [3]. Generally, the slurry test refers to an experiment where sand is suspended in a fluid to form a slurry, which is then pumped through the screen. In the sand pack test, sand is placed directly onto the screen under confining stress to ensure that the sand is compressed against the screen before the fluid is pumped through the sand pack and the screen [4]. One example of the experimental setups for the slurry and sand pack tests is shown in Figure 1. However, previous studies [1][2][3][4][5][6][7][8][9][10] show that there is no standard guideline for carrying out the SRT experiment. There are various experimental setups and different ways of interpreting the results, and the plugging performance and sand retention efficiency are analyzed differently when selecting the most compatible SAS and screen slot size. A few changes made to the experimental setup can cause a huge variation in results. Besides, conducting many laboratory experiments is expensive and time-consuming [11]. Testing one particular screen and slot size to analyze sand production can take hours to reach a final result, whether a laboratory or a simulation test is used.
Deep learning is a sub-branch of machine learning inspired by the biological neurons of the brain, built on artificial neural networks (ANNs) with representation learning. It relies on multiple connected layers with a strong learning ability that take inputs as features, which are modelled to predict outputs [12]. Deep learning has become a vital research hotspot because it can tackle intricate patterns in massive datasets of any data type [13]. It is not only implemented to model prediction or classification problems but can also be applied to forecasting time-series data [14][15][16][17][18][19][20]. It can perform feature extraction automatically, whereas shallow machine learning approaches require a separate feature extraction step.
Convolutional neural networks (CNNs) are a specific type of ANN with many deep layers that use convolutional and pooling strategies [21]. A CNN is a deep learning model used primarily in computer vision problems, such as image classification [22], image segmentation [23], and video object segmentation [24], where it shows promising results. It can also be used for regression problems to generate complex models with higher accuracy on complex datasets, which cannot be achieved with a simple regression function. A CNN typically uses two- or three-dimensional neural layers for classification problems, taking input of the same dimension through a feature learning process. It works the same way regardless of whether it has one, two, or three dimensions; the differences between the dimensions are the input structure, pool size, filter size, and how the filter shifts from feature to feature. Developing a CNN starts with the configuration of external parameters known as hyperparameters [25]. Adjusting the hyperparameters is important for obtaining a model with high predictive power.
The application of CNNs in the petroleum industry has been undertaken to detect hydrocarbon zones in seismic images, where a 2D-CNN was used in the image segmentation process and the model achieved more than 80% accuracy [26]. The developed model used a 2D-CNN because the input seismic images are two-dimensional (2D). Other than image segmentation, CNNs have also been applied to image recognition in the petroleum industry. Zhou et al. [27] developed a well pump troubleshooting model in which 2D power card images are taken as input to classify different types and severities of pump troubles. The analysis of power card images helps to identify issues that could impact oil well production and supports the configuration of pumping parameters. The classification accuracy for seven of the pump trouble types exceeded 96%, while the undersupplying trouble type obtained 87% accuracy.
Besides, the development of a CNN model for a regression problem was carried out by Daolun et al. [28], where a radial composite reservoir model was developed using a 2D-CNN and verified against oilfield measurement data. The mean absolute error (MAE) was used as the performance metric to verify the developed model, and all outputs showed small errors, with an MAE of less than 0.7 for both the validation and testing sets. Kwon et al. [29] applied a 2D-CNN to determine the location of an oil well under geological uncertainty by predicting the cumulative oil production of a reservoir simulation. The performance of the 2D-CNN model was compared with a shallow machine learning model, an ANN: the 2D-CNN model achieved 88% accuracy with a relative error of 0.035, whereas the ANN model achieved 84% accuracy with a relative error of 0.24. The CNN model thus outperformed the ANN model in predicting cumulative oil production. Li et al. [30] applied the CNN method to simultaneously predict the volume flow rates of the oil, gas, and water phases of a multiphase flow. The results are convincing, with more than 80% accuracy for all phases, and were verified against measurements of an individual well in the petroleum industry.
Since the application of CNNs in the petroleum industry attained promising results for both classification and regression problems, this method was applied to the SRT to reduce the time and effort of running the laboratory test by predicting the plugging performance and sand production. Deep learning has yet to be applied to the SRT. Therefore, a one-dimensional convolutional neural network (1D-CNN) was developed for SRT modelling to classify the plugging sign and to predict sand production and retained permeability using varying sand distributions, stand-alone screens, screen slot sizes, and sand concentrations as inputs. The classification and prediction of the SRT results were used to filter the most compatible SAS and screen slot size accordingly. Then, the 1D-CNN hyperparameters were tuned manually to determine the optimal hyperparameter combination that provides higher predictive power for the developed model.

Variable Identification in the Sand Retention Test
Since there is no implementation of any deep learning algorithm in SRT modelling, the SRT dataset requires feature selection using statistical analyses to determine appropriate features to fit into the model. Therefore, all the experimental setup and correlation results from various studies were compared to identify the factors that affect the plugging performance and sand retention efficiency to create the SRT dataset.
Some of the factors considered to interpret screen plugging are particle size distribution (PSD), flow rate, weave open volume, and pressure gradient. In contrast, the variables that affect screen retention and sand production are PSD, fines content, sorting and uniformity coefficients, SAS, screen slot width, fluid viscosity, fluid density, flow rate, pressure gradient, and sand concentration in the test cell. These factors were classified into four groups: sand characteristics, screen characteristics, fluid characteristics, and the condition in test cell.

Sand Characteristics
The PSD values d1, d5, d10, d30, and d50 are used to analyze plugging performance and sand production [3][4][5][6][7][8]. The value dx refers to the grain size for which x% of the sample is coarser. For example, if d1 is equal to 300 microns, then 1% of the sample is coarser than 300 microns. The d5, d10, d30, and d50 values of the same sample will be smaller than d1 because the higher the value of x, the smaller the corresponding grain size. The d1, d5, and d10 values, which represent the large sand grains, show a good correlation with sand retention, while d50, representing the median sand size, gives a weaker relationship [3]. Markestad et al. [5] used only one point on the particle distribution curve, d10, but it cannot accurately predict plugging or sand production. In other words, parameters other than d10 must be considered when choosing the slot width of the sand control screen. In contrast, Ballard and Beare [4,6] used d10 and d30 as indicators to select the screen slot size, which can control the amount of sand produced for a particular screen. For good sand retention, the screen slot size should be smaller than d10 because the bigger the screen slot size, the higher the amount of sand produced [7,8].
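These dx points can be read off a cumulative "percent coarser" PSD curve by interpolation. The sketch below is purely illustrative, using NumPy on hypothetical sieve data; the function name and the size values are assumptions, not taken from the study:

```python
import numpy as np

def d_value(sizes_um, pct_coarser, x):
    """Interpolate the d_x grain size (microns) from a percent-coarser PSD curve.

    sizes_um: grain sizes ordered from coarse to fine.
    pct_coarser: cumulative % of the sample coarser than each size (increasing).
    x: the d_x point requested, e.g. 10 for d10.
    """
    # np.interp requires increasing x-coordinates, which the cumulative curve provides
    return float(np.interp(x, pct_coarser, sizes_um))

# Hypothetical sieve data: 1% of the sample is coarser than 300 um, and so on
sizes = [300.0, 250.0, 180.0, 120.0, 90.0]
coarser_pct = [1.0, 5.0, 10.0, 30.0, 50.0]

d10 = d_value(sizes, coarser_pct, 10)  # larger grains
d50 = d_value(sizes, coarser_pct, 50)  # median grain size, smaller than d10
```

Consistent with the text, the interpolated sizes shrink as x grows: d1 > d10 > d50 for the same sample.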
Next, the other sand characteristics used in the SRT are the fines content and the sorting and uniformity coefficients. Ballard and Beare [4] found that a high fines content has a limited impact on sand production compared to the largest grains in the sand distribution. If most of the sand in the sample is smaller than 45 microns, the amount of sand that passes through the screen will be high. On their own, the sorting and uniformity coefficients show a weak relationship with the amount of sand passing through the weave [4,6]. However, the combination of the fines content with the sorting coefficient gives a good interaction: poorly sorted sand with many fine particles results in a high risk of sand production, whereas well-sorted sand with a high fines content results in a smaller amount of sand passing through the weave [3,5].

Screen Characteristics
The factors involved in screen characteristics are the weave open volume, type of screen, and screen slot size. Ballard and Beare [6] found that plugging tends to occur for a small weave open volume: the lower the weave open volume, the higher the tendency of sand to lodge in the weave, reducing the overall area open to flow. Besides, the type of screen used also has an impact on retention. Ballard and Beare [4] used two kinds of screen: a premium and a wire-wrapped screen. The performance of each screen can be identified by investigating the effect of d10 on the amount of sand produced. The premium screen shows excellent performance, but the wire-wrapped screen indicates otherwise. Mathisen et al. [9] recommended a single wire-wrapped screen if the sand distribution is good and has a low tendency for sand production; otherwise, a premium screen should be used. Ballard and Beare [6] compared the performance of a wire-wrapped screen with that of a metal mesh screen across screen slot sizes by observing the pressure gradient. It turns out that the pressure gradient across the wire-wrapped screen is more sensitive to the screen slot size than that across the metal mesh screen due to the lower flow area and different flow regime.

Fluid Characteristics
The density and viscosity of the fluid used in the sand slurry also affect sand production in the SRT. A high-density, more viscous carrier fluid used along with low flow rates results in a longer delay between the coarse and fine particles reaching the screen [6]. As a result, the sand is carried with the flow into the slots, leading to high sand production.

The Condition in Test Cell
The test-cell conditions that affect plugging performance and sand retention efficiency are the flow rate, sand concentration, and pressure. Different flow rates used in the experiment lead to different results. A high flow rate initiated at the beginning of the experiment leads to plugging, but gradually increasing the flow rate throughout the experiment reduces the risk of plugging [5]. However, initiating the experiment with a low flow rate without increasing it leads to a higher amount of sand passing through the screen [6,10].
The sand concentration in the test cell also affects sand production. Ballard and Beare [6] found that the sand slurry's low sand concentration leads to an increasing amount of sand produced before bridging occurs. Large grains are essential for the bridging process to start on the screen and must be large enough to fit through the slot. This finding is supported by Fisher and Hamby [10], where the lower volume fraction of formation sand in the flow stream causes a higher sand production.
Other than that, the pressure drop and pressure gradient are used to interpret sand production and plugging performance. According to the SAS selection practice recommended by Mathisen et al. [9], the screen with the lowest pressure drop and the highest permeability is associated with high sand retention. Ballard et al. [7] investigated the correlation between the sand reaching the screen and the pressure drop. It showed a steep gradient for simulated laser particle size analysis (LPSA) sand compared to simulated sieve analysis and reservoir sand. An increase in the pressure drop represents an increase in the flow resistance through the screen due to the build-up of particles on top of and inside the screen [9]. Therefore, a higher pressure drop corresponds to a higher amount of sand production [10].
Since there is no standard way to interpret the trend of the pressure gradient towards plugging, Mathisen et al. [9] mentioned that the pattern that shows a linear pressure build-up represents the formation of the permeable sand pack on top of the screen while the exponential behavior is a sign of plugging occurring on the screen. However, the observation of pressure during the laboratory test shows that plugging does not contribute to the decrease or increase of pressure [6]. The pressure slowly falls when there is some plugging of the screen and decreases initially when some new sand is washed through the screen. In other words, the pressure gradient results come from variations in sand characteristics rather than plugging.
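The heuristic attributed to Mathisen et al., linear pressure build-up for a permeable sand pack versus exponential build-up for plugging, can be illustrated by fitting both forms to a pressure record and comparing residuals. This is a toy sketch of that idea only, not a validated diagnostic; the function and data are hypothetical:

```python
import numpy as np

def plugging_sign(time, dp):
    """Toy classifier: fit linear and exponential models to a pressure
    build-up record and pick the form with the smaller squared residual."""
    t = np.asarray(time, float)
    p = np.asarray(dp, float)
    # Linear fit: dp = a*t + b
    lin = np.polyfit(t, p, 1)
    lin_res = np.sum((np.polyval(lin, t) - p) ** 2)
    # Exponential fit via log-space: log(dp) = c*t + d  (requires dp > 0)
    exp_coef = np.polyfit(t, np.log(p), 1)
    exp_res = np.sum((np.exp(np.polyval(exp_coef, t)) - p) ** 2)
    return "plugging" if exp_res < lin_res else "no plugging"

t = np.linspace(1, 10, 20)
print(plugging_sign(t, 2.0 * t + 1.0))          # synthetic linear build-up
print(plugging_sign(t, 0.5 * np.exp(0.4 * t)))  # synthetic exponential build-up
```

As the following paragraph notes, real pressure records are also affected by sand-characteristic variation, so a curve-shape test like this would not be conclusive on its own.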

Variable Summary
According to various studies [3][4][5][6][7][8][9][10], the variables that were considered for data collection are the PSD values d1, d5, d10, d30, and d50; fines content; sorting and uniformity coefficients; weave open volume; type of screen; screen slot size; fluid viscosity; fluid density; flow rate; sand concentration; pressure drop; pressure gradient; screen plugging; amount of sand produced; and retained permeability. The detailed procedure for developing a 1D-CNN model for the SRT is described in Section 2, and the results of the model and the comparative model performances are discussed in Section 3.

Materials and Methods
The 1D-CNN model development workflow is presented in Figure 2, and the preparation steps are explained in Sections 2.1-2.5. The workflow starts with data collection, where all variables related to the slurry and sand pack tests are collected and the inputs and outputs are identified thoroughly. Next, the collected data are analyzed to explore the data and gain useful information. After that, the data undergo pre-processing and normalization to be fitted for modelling. The modelling phase begins with the initialization of the 1D-CNN hyperparameters. Once the hyperparameters are initialized, the 1D-CNN model is trained with the adaptive moment estimation (Adam) optimizer. The hyperparameters are tuned and iterated using a trial-and-error method until the model shows good performance metrics with a minimal loss function. In other words, the iteration stops when the loss function has converged for both the training and testing data; the hyperparameters are tuned further if the loss function does not converge. The stopping criterion for each iteration depends on the number of epochs. All models with different sets of hyperparameters are evaluated and validated. Lastly, the final 1D-CNN model with the Adam optimizer is developed for the SRT.
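For reference, the Adam optimizer used in this workflow combines a momentum-like first-moment estimate and an RMSProp-like second-moment estimate with bias correction, following Kingma and Ba's standard formulation. Below is a minimal NumPy sketch of one update step, exercised on a toy quadratic; the learning rate and step count are illustrative, not the study's settings:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, parameter step."""
    m = b1 * m + (1 - b1) * grad          # first moment (running mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (running mean of squares)
    m_hat = m / (1 - b1 ** t)             # bias correction for the warm-up phase
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy check: minimise f(theta) = theta^2, whose gradient is 2*theta
theta, m, v = np.array(5.0), 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
```

After the loop, `theta` sits close to the minimiser at zero, which is the convergence behaviour the workflow checks for via the loss function.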

Data Collection
The SRT dataset was extracted from various works of literature related to the slurry and sand pack tests. The sand retention experimental report from PETRONAS Research Sdn. Bhd. (PRSB) was also added to the dataset to identify the standard variables used for the whole SRT process. The standard set of variables is needed to ensure that no essential variable is left out. As the SRT modelling problem covers both classification and regression, the variable identification was done thoroughly. Each variable was classified as an input or output, specified as qualitative or quantitative, and labelled as a discrete or continuous variable.

The Availability of Sand Retention Test Variables
Once the factors that affect screen plugging and sand retention efficiency were identified, data quality assessment was done to identify and remove missing values in the dataset. In total, 38 inputs and 21 outputs with 516 observations were identified for the slurry test, while there were 42 inputs and 19 outputs with 683 observations for the sand pack test. However, due to the unmatched variable used between the literature and PETRONAS's report, a lot of missing values were detected in the dataset, especially for the sand pack test. The previous works from the literature did not reveal all the exact parameters used for SRT, leading to missing values. In this study, the variables were reduced to 8 inputs and 4 outputs for the slurry test with a different number of observations for each output. As for the sand pack test, the variables were reduced to 5 inputs and only 1 output with 263 observations. The reduced set of variables is shown in Table 1, and the set of data used to train 1D-CNN is shown in Table 2.

Data Analysis
Data analysis for the SRT dataset was done by focusing on descriptive and inferential statistics. Descriptive statistics concentrate on the univariate analysis, where the distribution of each variable in Table 1 is visualized to summarize the data. In contrast, inferential statistics highlight the bivariate analysis where a statistical test is used to measure the correlation between input and output variables.

Univariate Analysis
The distribution of each continuous variable was visualized using a kernel density plot to display the dispersion of the observed values. In contrast, bar charts were used for the categorical variables to represent the frequency of the groups in each variable, as shown in Figure 3. The kernel density plot was used instead of a typical histogram because it produces a smooth curve without requiring a normality assumption, with each observation of the variable contributing to the classes representing the distribution [37]. It also gives clear information on whether the distribution is normal, bimodal, or skewed. A bar chart was used to visualize the PLUG_SIGN and SCREEN variables because the number of groups in both variables is small, so the groups are easily identified and interpreted from the chart. In the slurry data, PLUG_SIGN and SCREEN are categorical, while SCREEN is the only categorical variable in the sand pack data. The bar charts show that the screens plug less often than they do not plug, and that the premium screen has the highest frequency in both the slurry and sand pack data, followed by the WWS screen.
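The kernel density idea described above, where every observation contributes its own smooth Gaussian bump, can be sketched directly in NumPy. The data below are synthetic (a bimodal variable loosely resembling a two-cluster column) and the bandwidth is an arbitrary choice, not values from the study:

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth):
    """Kernel density estimate: each observation contributes a Gaussian bump,
    so the resulting curve is smooth and needs no normality assumption."""
    s = np.asarray(samples, float)[:, None]
    g = np.asarray(grid, float)[None, :]
    bumps = np.exp(-0.5 * ((g - s) / bandwidth) ** 2)
    return bumps.sum(axis=0) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))

# Synthetic bimodal variable with clusters near 100 and 200
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(100, 5, 200), rng.normal(200, 5, 200)])
grid = np.linspace(50, 250, 400)
density = gaussian_kde(data, grid, bandwidth=5.0)
```

The two peaks of `density` make the bimodality obvious at a glance, which is exactly the diagnostic the univariate analysis relies on when choosing the mode as the central value.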
The continuous slurry variables with bimodal distributions were D10_B, D50_B, D90_B, UC, FINE_CO_B, and RETAINED_PRM; thus, the mode was used as the central value for these variables. On the other hand, the SLOT_SIZE distribution looked symmetrical, while SAND_CONC_IN_TC, SAND_PRODUCED, and SAND_PROD_PER_AREA gave positively skewed distributions. Hence, the mean was used to represent the central value of SLOT_SIZE, whereas the median was used for the positively skewed distributions.
Furthermore, all continuous variables of the sand pack data showed positively skewed distributions, where the average value of each variable is greater than the median, and all variables have outliers. However, the SIZE_CR1, SIZE_CR2, SIZE_CR3, and SIZE_CR4 variables also had two peaks, meaning the data have more than one center, so only SAND_PROD_PER_AREA showed a positively skewed distribution with a single peak. Therefore, the measure that accurately captures the central tendency of SAND_PROD_PER_AREA is the median, while the mode is used for SIZE_CR1, SIZE_CR2, SIZE_CR3, and SIZE_CR4 because neither the mean nor the median has a meaningful interpretation for a bimodal distribution.

Bivariate Analysis
The bivariate analysis focused on correlation analysis, where Pearson's product-moment and Spearman's rank correlations were used to evaluate the degree of association between two continuous variables and determine the direction and strength of the relationship [38]. The significance of both correlations was tested using the p-value: the null hypothesis is rejected when the p-value is less than the significance level of 5% (α = 0.05) [39]. The null hypothesis here is that the correlation coefficient of the two continuous variables is not significantly different from zero and that no statistically significant association exists in the population. By rejecting the null hypothesis, the correlation between the two continuous variables is deemed statistically significant. The significance of the p-value and the highest correlation coefficient were used as the final output. Only pairs of continuous inputs and outputs underwent correlation analysis; no association was computed within the inputs or within the outputs, because the most important aspect to explore is the dependency of the output variables on the input variables.
Pearson's product-moment correlation, denoted r, and Spearman's rank correlation, R, were computed according to Equations (1) and (2) [40]:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{1}$$

$$R = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{2}$$

In Equations (1) and (2), $x_i$ is the value of the input for the i-th observation; $\bar{x}$ is the average value of the input; $y_i$ is the value of the output for the i-th observation; $\bar{y}$ is the average value of the output; $n$ is the number of observations in the dataset; and $d_i$ is the difference between the ranks of the corresponding variables.
A Pearson correlation coefficient close to zero indicates no linear relationship between the variables (or that the variables are independent), while a value of −1 or +1 indicates a perfect negative or positive linear relationship, respectively [41]. If the relationship between the input and output variables is nonlinear, the degree of linear relationship might be low, so r will be close to zero [37]. When the p-value is less than α (here, 0.05), there is a statistically significant correlation between the input and output. If r is close to zero despite a p-value < 0.05, the significant correlation might be nonlinear; therefore, Spearman's rank coefficient was used instead. The correlation coefficients for both the slurry and sand pack tests were visualized using a heatmap, as shown in Figure 4. A white box (no value) in Figure 4 represents a hypothesis test that failed to reject the null hypothesis for both the Pearson and Spearman correlation tests because the p-value was greater than the significance level; thus, the correlation coefficient is not significant for use. The slurry test's highest correlation coefficient is between D50_B (input) and RETAINED_PRM (output), with a value of −0.66, meaning there is a moderate negative correlation between the d50 PSD value and the retained permeability. Meanwhile, the highest correlation coefficient of the sand pack test was observed between SIZE_CR2 (input) and SAND_PROD_PER_AREA (output), with a value of 0.36, indicating a low positive correlation between sizing criterion 2 and the amount of sand produced per unit area.
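A test workflow like the one above can be reproduced with `scipy.stats`, whose `pearsonr` and `spearmanr` each return the coefficient together with its p-value. The data below are synthetic and only illustrate why a monotonic-but-nonlinear input-output relationship favors Spearman's coefficient over Pearson's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(100, 300, 80)             # hypothetical continuous input
y = 500.0 / x + rng.normal(0, 0.05, 80)   # monotonic but nonlinear response

r, p_r = stats.pearsonr(x, y)       # linear association and its p-value
rho, p_rho = stats.spearmanr(x, y)  # rank (monotonic) association and its p-value

# Both p-values fall below 0.05 here, so the null of zero correlation is
# rejected; the rank coefficient captures the monotonic link more strongly.
```

Pairs whose p-values exceed 0.05 would be left blank, mirroring the white boxes in the heatmap of Figure 4.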

Data Preparation
The SRT data underwent four processes before they could be fitted into the 1D-CNN model: handling the categorical data, pre-processing the data using the min-max scaler (commonly called normalization), reshaping the scaled data, and splitting the data into training and testing sets.
The categorical variables were identified in the data collection phase, which are the screen and plugging sign. As the group in screen variable is nominal, the one-hot encoding method was used to change the data from the character form to numeric form. For example, if the screen column has three categorical groups: premium, wire wrap, and metal mesh, three new columns were created with the value of 0 and 1. A premium column with a value of 1 means the observation used a premium screen while 0 refers to the other screens, as in wire wrap or metal mesh. As for the plugging sign, the group can be converted into a number using a label encoder without creating a new column. A value of 0 represents screens that do not plug, while 1 is assigned when the screen plugging occurs for respective observations.
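The two encodings can be sketched in plain Python. The category names below mirror the example in the text, while the helper functions themselves are illustrative rather than the study's implementation:

```python
def one_hot(values, categories):
    """One-hot encode a nominal variable: one new 0/1 column per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def label_encode(values, mapping):
    """Label-encode a variable such as the binary plugging sign."""
    return [mapping[v] for v in values]

screens = ["premium", "wire wrap", "premium", "metal mesh"]
screen_cols = one_hot(screens, ["premium", "wire wrap", "metal mesh"])
# screen_cols[0] is [1, 0, 0]: the first observation used a premium screen

plug = label_encode(["no plug", "plug", "no plug"], {"no plug": 0, "plug": 1})
```

One-hot encoding avoids imposing a false ordering on the nominal screen types, whereas the binary plugging sign maps cleanly onto a single 0/1 column.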
Next, normalization was performed to transform the variables into the range of 0 to 1, since using unnormalized data to train a neural network can lead to an unstable or slow learning process [42]. The min-max scaler for normalization was computed as follows:

$$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{3}$$

In Equation (3), $x_i$ is the value of the variable for the i-th observation, $x_i'$ is the normalized value, $\min(x)$ is the minimum value of the variable, and $\max(x)$ is the maximum value of the variable.
Subsequently, the scaled data needed to be reshaped into three dimensions, representing the row, column, and channel. Lastly, the reshaped data were divided into two sets using a 70:30 ratio for the training and testing sets.
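A minimal NumPy sketch of these preparation steps, column-wise scaling as in Equation (3), reshaping to (row, column, channel), and a shuffled 70:30 split, on a hypothetical array of 100 observations with 8 inputs (the sizes and value ranges are placeholders, not the study's data):

```python
import numpy as np

def min_max_scale(X):
    """Equation (3): (x - min) / (max - min), column-wise, into [0, 1]."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

rng = np.random.default_rng(2)
X = rng.uniform(0, 500, size=(100, 8))  # hypothetical 8-input dataset

X_scaled = min_max_scale(X)
X_3d = X_scaled.reshape(100, 8, 1)      # (row, column, channel) for a 1D-CNN

idx = rng.permutation(100)              # shuffle, then 70:30 split
train, test = X_3d[idx[:70]], X_3d[idx[70:]]
```

The trailing channel dimension of size one is what lets a 1D convolutional layer treat each observation as a single-channel sequence of features.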

Convolutional Neural Network Experimental Setup
A 1D-CNN architecture for the SRT dataset is composed of an input layer, a convolutional (CNN) layer, a rectified linear unit (ReLU) activation function, a pooling layer, fully connected (FC) layers (often called dense layers), a dropout layer, a flatten layer, and a fully connected output layer. The process from the input layer to the output layer is called forward propagation, where the input is transferred and processed for feature extraction and a prediction is generated. Meanwhile, updating the model parameters, such as kernels, weights, and biases, from the output layer back to the input layer is called backward propagation, or backpropagation. Forward and backward propagation are applied in the training phase to obtain the model parameter values that reduce the loss function [43]. The pseudocode of the 1D-CNN modelling process is shown in Algorithm 1, and the detailed sequence of the 1D-CNN architecture after configuring the hyperparameters is presented in Figure 5. The input layer holds the SRT dataset before it is fed into the hidden layers. The hidden layers are the intermediate layers between the input and output layers, where the processing occurs [44]; their number is empirically optimized during the training and validation process. The hidden layers shown in Figure 5 consist of convolution, ReLU, pooling, dense, dropout, and flatten layers. The CNN layer generates a feature map in which each neuron's output is computed as the dot product of the input values with the weights of filters called kernels [44]. The filter is convolved with the features in the input layer and adopts a one-dimensional structure, extracting features in sequence according to its size and stride. The stride refers to the number of columns shifted at each step, while the kernel size specifies the number of columns that the filter extracts at a single time [45].
The output feature map then undergoes a nonlinear transformation using the rectified linear unit (ReLU) activation function, which sets negative activations to zero while passing positive values through unchanged. The ReLU activation function allows the network to learn the complex relationships in the SRT dataset and is commonly used in deep learning models to achieve better performance [46] without changing the output shape of the layer.
The outputs from the activation function are scaled down in the pooling layer where the downsampling operation, such as the average or maximum function, takes place [44]. The dimensionality of the features is reduced depending on the value of the pool size and stride to give the optimal network structure and enhance feature robustness [47]. In this architecture, the downsampling method used is max pooling, where the pool window slides across the input feature maps from the CNN layer by the stride value and takes the maximum amount as the output. Next, the pooling layer's output is fed into three dense layers where each layer passes through the ReLU activation function to provide stable convergence for the model. The number of neurons in the dense layer refers to the number of units that perform a linear transformation of the inputs with weights and biases [48]. The dense layer helps to interpret the learned features that have gone through feature extraction in the CNN and pooling layer before predicting the output. The forward propagation from the input layer to the input of the neuron where the convolution, activation function, and downsampling operation take place are demonstrated in (4)-(6).
In (4)-(6), $x_k^l$ is the input to the CNN layer; $b_k^l$ is the bias of the $k$th neuron in the CNN layer; $s_i^{l-1}$ is the output of the $i$th neuron in the input layer; $w_{ik}^{l-1}$ is the kernel from the $i$th neuron in the input layer to the $k$th neuron in the CNN layer; $f(x_k^l)$ represents the ReLU activation function applied to the input of the CNN layer; $y_k^l$ is the output of the convolution operation; $s_k^l$ is the output of the $k$th neuron at the pooling layer; and $\downarrow SS$ refers to the downsampling operation with factor $SS$. Next, the forward propagation from the pooling layer to the input of the neuron in the dense layer is formulated in (7) and (8):

$x_i^{l+1} = b_i^{l+1} + \sum_k w_{ki}^{l} s_k^{l}$ (7)

$y_i^{l+1} = f\left(x_i^{l+1}\right)$ (8)
In (7) and (8), $x_i^{l+1}$ is the input to the dense layer; $b_i^{l+1}$ is the bias of the $i$th neuron in the dense layer; $w_{ki}^{l}$ is the weight from the $k$th neuron in the pooling layer to the $i$th neuron in the dense layer; and $y_i^{l+1}$ is the output of the dense layer after applying the ReLU activation function.
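As an illustration, the forward pass in (4)-(8) can be sketched in NumPy. The layer sizes, kernel shapes, and function names below are illustrative assumptions, not the configuration used in the study.

```python
import numpy as np

def relu(x):
    # ReLU activation f(x) = max(0, x), used after every hidden layer
    return np.maximum(0.0, x)

def conv1d_layer(s_prev, kernels, biases, stride=1):
    """Eqs. (4)-(5): x_k = b_k + sum_i conv1D(w_ik, s_i); y_k = ReLU(x_k).
    s_prev: (n_inputs, length); kernels: (n_inputs, n_neurons, k_size)."""
    n_in, length = s_prev.shape
    _, n_out, k_size = kernels.shape
    out_len = (length - k_size) // stride + 1
    y = np.zeros((n_out, out_len))
    for k in range(n_out):
        for t in range(out_len):
            window = s_prev[:, t * stride:t * stride + k_size]
            y[k, t] = biases[k] + np.sum(window * kernels[:, k, :])
    return relu(y)

def max_pool(y, pool_size=2):
    """Eq. (6): downsample by factor SS, taking the max of each window."""
    n_k, length = y.shape
    out_len = length // pool_size
    return y[:, :out_len * pool_size].reshape(n_k, out_len, pool_size).max(axis=2)

def dense_layer(s_pool, W, b):
    """Eqs. (7)-(8): x_i = b_i + sum_k w_ki * s_k; y_i = ReLU(x_i)."""
    return relu(W @ s_pool.ravel() + b)
```

A two-channel input of length 8 convolved with three kernels of size 3 (stride 1) yields a 3x6 feature map, which max pooling with pool size 2 reduces to 3x3 before the dense layer.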
The dropout layer is added after the third dense layer to reduce overfitting due to the large number of model parameters that the network has. Dropout is a technique that randomly drops specific nodes according to the dropout rate, creating an ensemble of thinned networks [48]. All the connections from and to the dropped nodes are removed as well. Dropout works well to reduce generalization error and boost testing set performance [44]. Lastly, the output from the dropout layer is turned into a single vector in the flatten layer and passes through a dense layer before the output layer, in a format that can be used to generate the final prediction. The flatten layer is needed to convert the output of the dropout layer, which has a three-dimensional shape, into a single long one-dimensional vector without changing the output values, because the output layer can only take a single-vector input from the previous layer.
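The dropout mechanism described above can be sketched as follows; this is a minimal inverted-dropout version with an illustrative rate, not the study's implementation.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of units during training and
    rescale the survivors by 1/(1 - rate) (inverted dropout), so the
    expected activation is unchanged. At inference, pass through."""
    if not training or rate == 0.0:
        return activations
    mask = rng.random(activations.shape) >= rate  # True = node is kept
    return activations * mask / (1.0 - rate)
```

With a rate of 0.5, roughly half of the nodes (and all their connections) are removed on each training pass, while at test time the layer is an identity.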

Hyperparameters in the Convolutional Neural Network
Before implementing the 1D-CNN model on the SRT dataset, specific parameter values needed to be configured. Two types of parameters are used to train the 1D-CNN model to make predictions. The internal parameters, which are learned automatically during the backpropagation process, are called model parameters. The model parameters are present only in the CNN, dense, and output layers, where the weights of the filters or kernels, the weights of the neurons, and the biases are learned during the training of the 1D-CNN.
The external parameters that determine the structure of the 1D-CNN and how it is trained are called hyperparameters. Hyperparameter tuning was based on a manual trial-and-error process. The list of hyperparameters and the range of values are shown in Table 3. Tuning of the hyperparameters led to the development of six different models for each dataset, as shown in Table 4. All the listed hyperparameters were briefly explained previously except for the epoch, batch size, and optimizer. An epoch refers to one complete pass of the entire training dataset forward and backward through the network, while the batch size refers to the number of samples (rows of data) processed in a single iteration before the model parameters are updated [25]. The total number of iterations needed to complete one epoch is calculated by dividing the number of samples in the training dataset by the batch size, and the number of times that the model parameters are updated is equal to the total number of iterations. For example, if the training dataset has 160 rows and the batch size is set to 32, then five iterations are needed to complete one epoch, and the model parameters are updated five times per epoch. Likewise, if the epoch is set to 60, then the 1D-CNN is trained for 60 epochs, and 300 iterations are required for the entire training process.
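The iteration bookkeeping in this example can be checked directly:

```python
# Worked example of the epoch/batch arithmetic described above:
# 160 training rows with a batch size of 32 give 5 parameter updates
# per epoch, and 60 epochs therefore require 300 updates in total.
n_samples, batch_size, epochs = 160, 32, 60

iters_per_epoch = n_samples // batch_size  # 160 / 32 = 5
total_iters = iters_per_epoch * epochs     # 5 * 60 = 300
```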
The optimizer is an optimization algorithm used to update the model parameters iteratively based on the training dataset by calculating the error to minimize the loss function [45]. Adaptive moment estimation (Adam) is an optimizer that is fast to converge, efficient in learning model parameters, and adequate for practical deep learning problems [49,50]. Equations (9)-(13) demonstrate the model parameter update using the Adam optimizer, and the values of the hyperparameters used in the Adam optimizer are presented in Table 5:

$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$ (9)

$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$ (10)

$\hat{m}_t = m_t / (1 - \beta_1^t)$ (11)

$\hat{v}_t = v_t / (1 - \beta_2^t)$ (12)

$\theta_t = \theta_{t-1} - \alpha \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (13)

In Equations (9)-(13), $m_t$ and $m_{t-1}$ are the first moment estimates of the gradient at timesteps $t$ and $t-1$; $v_t$ and $v_{t-1}$ are the second moment estimates of the squared gradient at timesteps $t$ and $t-1$; $g_t$ is the gradient with respect to the stochastic objective at timestep $t$; $g_t^2$ is the elementwise square of $g_t$; $\beta_1$ and $\beta_2$ are the hyperparameters that control the exponential decay rates of the first and second moment estimates; $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected estimators of the first and second moments; $\theta$ is the updated model parameter; $\alpha$ is the step size or learning rate hyperparameter; and $\epsilon$ is a parameter configured as a very small number to prevent division by zero.
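A minimal NumPy sketch of one Adam update following Equations (9)-(13); the function name and default hyperparameter values below are the standard Adam defaults, assumed for illustration and not necessarily the values in Table 5.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9,
              beta2=0.999, eps=1e-8):
    """One Adam parameter update, Equations (9)-(13)."""
    m = beta1 * m + (1 - beta1) * grad          # Eq. (9): first moment
    v = beta2 * v + (1 - beta2) * grad ** 2     # Eq. (10): second moment
    m_hat = m / (1 - beta1 ** t)                # Eq. (11): bias correction
    v_hat = v / (1 - beta2 ** t)                # Eq. (12): bias correction
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)  # Eq. (13)
    return theta, m, v
```

Iterating this step on a toy loss such as $f(\theta) = \theta^2$ (gradient $2\theta$) drives $\theta$ toward the minimum at zero, which is a quick sanity check of the update rule.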

Model Evaluation and Validation
One way to justify how well the model works for the SRT dataset is to evaluate the model performance using standard statistical metrics. The model is estimated using a testing dataset, which returns the model validation metrics for both regression and classification problems. As the SRT problem involves both classification and regression, validation metrics such as the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), coefficient of determination (R²), and confusion matrix (CM) were used.
Among all the models created using different sets of hyperparameters, the one having the smallest MSE, RMSE, and MAE but the highest R² was selected as the best fit for a regression problem. MSE and RMSE are the most popular metrics for regression tasks because of their theoretical relevance in statistical modelling, despite being sensitive to outliers [51]. RMSE poses a high penalty on large errors through its least-squares terms, which implies that it is useful for improving model performance, especially when the model errors follow a normal distribution [52].
MAE is the average of the absolute differences between the predicted and actual values. It is suitable for portraying errors that follow a uniform distribution [51]. Besides, MAE is the most natural and precise measure of the average error magnitude [53]. R² is a scale-free score that does not provide information about the model residuals because it only captures the dispersion of the data, not the bias [54]. The computation of the regression validation metrics is shown in Equations (14)-(17).
$\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2$ (14)

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}$ (15)

$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|y_j - \hat{y}_j\right|$ (16)

$R^2 = 1 - \frac{\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}{\sum_{j=1}^{n}\left(y_j - \bar{y}\right)^2}$ (17)

In Equations (14)-(17), $n$ is the total number of observations in the dataset; $y_j$ is the actual value for the $j$th observation; $\hat{y}_j$ is the predicted value for the $j$th observation; and $\bar{y}$ is the mean of the actual values.
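The regression metrics in Equations (14)-(17) can be computed with a short NumPy sketch; the helper name is illustrative.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 as defined in Equations (14)-(17)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                    # Eq. (14)
    rmse = np.sqrt(mse)                        # Eq. (15)
    mae = np.mean(np.abs(err))                 # Eq. (16)
    # Eq. (17): 1 - residual sum of squares / total sum of squares
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mse, rmse, mae, r2
```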
In contrast, for the classification problem, the model with the highest classification accuracy (ACC) was selected as the best fit. The confusion matrix consists of four components: the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The ACC can be calculated from the CM's components, as shown in Equation (18):

$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$ (18)

In Equation (18), TP is when the actual and predicted outputs are both 1; TN is when the actual and predicted outputs are both 0; FP is when the actual output is 0 but the predicted output is 1; and FN is when the actual output is 1 but the predicted output is 0. The results of the 1D-CNN models for each set in Table 2 are shown in Section 3.
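Equation (18) can be sketched in plain Python by tallying the confusion matrix components directly; the helper name is illustrative.

```python
def accuracy(y_true, y_pred):
    """Equation (18): ACC = (TP + TN) / (TP + TN + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)
```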

Slurry Test Model Validation Result
The SRT dataset for the slurry test was divided into four sets, as shown in Table 2, where each set was fitted separately by the output variables. Therefore, the prediction of the amount of sand produced in grams, retained permeability, and the amount of sand produced per unit area, and the classification of the plugging sign were evaluated. The best model among all six hyperparameter configurations was selected according to the performance metrics. The validation results for the classification problem are shown in Table 6, while the regression problems are presented in Tables 7-9. The model validation results of the slurry test showed that the hyperparameter configuration in model 5 gave the lowest average errors and the highest R² in predicting sand production and retained permeability. However, the classification model for the plugging sign gave the best result using the hyperparameter combinations in models 2, 5, and 6. Therefore, model 5 was selected as the best model for predicting the sand production and retained permeability of the slurry test, while any of models 2, 5, or 6 can be chosen as the final model for the plugging sign; the line plot of model 5 was used for the plugging sign to visualize the trend between actual and predicted data. The actual versus predicted plots of model 5 for each slurry test set are shown in Figure 6.

In Figure 6, the trend of the actual data is shown by the red line, while the blue line shows the predicted data for all slurry test sets. The line plot of the plugging sign in Figure 6a represents the classification of 49 observations of the actual and predicted data. The observations that fall under class 0 show no sign of screen plugging, while the observations categorized as class 1 show a sign of screen plugging.
Two peaks in the red line represent actual data with a sign of screen plugging, but the blue line does not follow these peaks, portraying the false classification of two observations, as shown in Table 6. The line plot of sand production in Figure 6b represents 111 predicted values along with the actual data. The predicted amount of sand produced in grams approximates the actual trend with 77% accuracy.
Furthermore, the line plot of retained permeability in Figure 6c represents 20 observations of screen-retained permeability. The prediction line closely follows the actual percentage of retained permeability with 99% accuracy. Lastly, Figure 6d shows a line plot of sand produced per unit area with 103 observations and 77% accuracy. Most of the prediction points are almost identical to the actual points.

Sand Pack Test Model Validation Result
The SRT dataset for the sand pack test has only one set, as shown in Table 2, where the dataset was fitted only to predict the amount of sand produced per unit area. The best model among all six hyperparameter configurations was selected according to the performance metrics. The validation result for the regression problem of the sand pack test is shown in Table 10. The model validation result for the sand pack test showed that the hyperparameter configuration in model 1 gave the lowest MSE and RMSE but the second lowest MAE. Furthermore, it provided the highest R² in predicting sand production. Hence, model 1 was selected as the best model to predict sand production for the sand pack test. The actual versus predicted plot of model 1 for the sand pack test is shown in Figure 7.
In Figure 7, the actual amount of sand produced per unit area is portrayed by the red line, while the predicted values are shown by the blue line, with 79 observations and 84% accuracy. The prediction points for sand production below 0.2 lb/ft² do not closely follow the actual points, but the points above 0.2 lb/ft² are almost identical.

Comparative Model Performance
The performance of different machine learning or deep learning algorithms may vary depending on the datasets. Generally, deep learning algorithms outperform shallow machine learning techniques. To validate this statement, four machine learning models, which are gradient boosting (GB), K-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), were developed for the SRT problem using the same dataset as in Table 2.
All four machine learning models are supervised learning methods that are commonly used to solve classification problems but can also be applied to regression. The details of the four comparative models, including their prediction mechanisms and hyperparameter settings, are shown in Table 11. The model validation results for the slurry test can be seen in Tables 12-15, while Table 16 presents the result of the sand pack test.

KNN casts the prediction by the weighted average of the targets according to the closest distances of its neighbors [56], where $x_j$ is one of the k-nearest neighbors in the training set; $y(x_j, c_k)$ is an indicator that the training sample $x_j$ belongs to class $c_k$; and $\mathrm{argmax}_k$ gives the class with the maximum predicted probability. Its hyperparameters were the weight function used for prediction (uniform), the algorithm used to compute the nearest neighbors (auto), the leaf size (30), the power parameter (2), and the distance metric used for the tree (Minkowski).

RF is an ensemble (bagging) method consisting of a collection of decision trees, where a majority vote (for classification) or an average (for regression) over each tree's predictions yields the final prediction [57]; $B$ is the total number of trees and $f_i(x)$ is the prediction of the $i$th individual tree. Its hyperparameters were the number of trees (1000), the function to measure the quality of a split (classification: Gini, regression: MSE), the minimum number of samples required to split (2), the minimum number of samples required at a leaf node (1), the minimum weighted fraction of total weights required at a leaf node (0), the number of features considered when looking for the best split (auto), and the random state (42).

The prediction of SVM depends on the hyperplane and the decision boundary in multidimensional space, where the algorithm finds the best-fit hyperplane with the maximum margin [58]. Its hyperparameters were the regularization parameter (1) and the kernel (classification: linear, regression: RBF).

The results showed that the 1D-CNN model outperformed all four machine learning models for both the slurry and sand pack tests, supporting the claim that deep learning algorithms can perform better than shallow machine learning methods.
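As an illustration of the distance-weighted KNN prediction described above, the following is a minimal NumPy sketch (a hypothetical helper, not the study's implementation, which uses a uniform weight function):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Distance-weighted KNN regression: the prediction is the average
    of the k nearest training targets, weighted by inverse distance so
    that closer neighbours contribute more."""
    d = np.linalg.norm(X_train - x_query, axis=1)  # distances to query
    idx = np.argsort(d)[:k]                        # k nearest neighbours
    w = 1.0 / (d[idx] + 1e-12)                     # inverse-distance weights
    return np.sum(w * y_train[idx]) / np.sum(w)
```

A query point lying almost exactly on a training sample therefore recovers that sample's target, while points between samples receive a smooth interpolation.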

Significance Analysis of Comparative Models
The significant difference between the means of all comparative models was tested using one-way analysis of variance (ANOVA), a parametric test for more than two samples. In addition, the Kruskal-Wallis test, a non-parametric test for more than two samples, was used to identify the significant difference between the medians of the five comparative models. The models differ significantly when the null hypothesis is rejected; the null hypothesis states that the mean or median difference between all five models is not statistically significant. For the ANOVA test, the null hypothesis is rejected if the p-value is less than the significance level of 0.05 and the F statistic is greater than the F critical value; for the Kruskal-Wallis test, it is rejected if the H statistic is greater than the Chi-square critical value. The significance analysis using the ANOVA and Kruskal-Wallis tests is presented in Table 17, which shows that the p-values are less than 0.05 and the test statistics are greater than the critical values. Thus, the null hypothesis is rejected, indicating that the differences between the means or medians of all five models in predicting the plugging sign, retained permeability, and sand production are statistically significant.
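The one-way ANOVA F statistic used in the significance analysis can be sketched as follows (a minimal NumPy version with toy data; the actual test values come from Table 17):

```python
import numpy as np

def one_way_anova_F(groups):
    """F statistic for a one-way ANOVA: the ratio of the between-group
    mean square to the within-group mean square."""
    all_vals = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand_mean = all_vals.mean()
    k, n = len(groups), len(all_vals)
    # Between-group sum of squares (df = k - 1)
    ssb = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (df = n - k)
    ssw = sum(((np.asarray(g, dtype=float) - np.mean(g)) ** 2).sum()
              for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))
```

Groups with identical means give F = 0, while well-separated groups with little internal scatter give a large F, which is then compared against the F critical value at the chosen significance level.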

Conclusions
This paper proposed a 1D-CNN with adaptive moment estimation for modelling of the SRT dataset focusing on the slurry and sand pack tests that use stand-alone screens as a sand control method to reduce sand production. The proposed method was developed to examine the sand retention efficiency and plugging performance, which can help select the optimal screen and slot size.
The hyperparameter tuning of the 1D-CNN was performed empirically using a trial-and-error approach, leading to the development of six models tested for each output. The 1D-CNN model performance showed that the hyperparameter configuration in model 5 of the slurry test fits best in predicting the amount of sand produced in grams, retained permeability, and the amount of sand produced per unit area because model 5 gave the lowest average errors and the highest R². The best models for both sand production outputs gave an accuracy of 77%, while the best model for retained permeability gave 99% accuracy. Besides, the sets of hyperparameters in models 2, 5, and 6 fit the classification of the plugging sign very well because all three models gave the same accuracy of 96%; thus, any of models 2, 5, or 6 can be used as the best fit model. In contrast, model 1 of the sand pack test outperformed the other five models in predicting the amount of sand produced per unit area with an accuracy of 84%.
For comparative model performance, the accuracy of the 1D-CNN model was higher than the other four machine learning models in predicting all the outputs of slurry and sand pack tests. Therefore, the proposed deep learning model outperformed the other four machine learning methods based on validation metrics.
Since the proposed deep learning model is the first model developed for the SRT problem, further optimization can be proposed for future research by focusing more on the feature engineering process and including more observations for the modelling phase.

Informed Consent Statement:
Not applicable; this study did not involve humans.

Data Availability Statement:
The data that support the findings of this study are available from PETRONAS GR&T. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the authors with the permission of PETRONAS Group Research & Technology (GR&T).

Conflicts of Interest:
The authors declare no conflict of interest.