Predicting Gas Separation Efficiency of a Downhole Separator Using Machine Learning

Abstract: Artificial lift systems, such as electrical submersible pumps and sucker rod pumps, frequently encounter operational challenges due to high gas–oil ratios, leading to premature tool failure and increased downtime. Effective upstream gas separation is critical to maintain continuous operation. This study aims to predict the efficiency of a downhole gas separator using machine learning models trained on data from a centrifugal separator and tested on data from a gravity separator (blind test). A comprehensive experimental setup included a multiphase flow system with horizontal (31 ft (9.4 m)) and vertical (27 ft (8.2 m)) sections to facilitate the tests. Seven regression models (multilinear regression, random forest, support vector machine, ridge, lasso, k-nearest neighbor, and XGBoost) were evaluated using performance metrics such as RMSE, MAPE, and R-squared. In-depth exploratory data analysis and data preprocessing identified the inlet liquid and gas volume flows as the key predictors of the gas volume flow per minute at the outlet (GVFO). Among the models, random forest was the most effective, with an R-squared of 96% and an RMSE of 112. This model, followed by KNN, showed great promise in accurately predicting gas separation efficiency, aided by rigorous hyperparameter tuning and cross-validation to reduce the risk of overfitting. This research offers a robust machine learning workflow for predicting gas separation efficiency across different types of downhole gas separators, providing valuable insights for optimizing the performance of artificial lift systems.


Introduction
At a certain stage in the production life of oil and gas wells worldwide, artificial lift or tertiary recovery systems become inevitable. From the exploration to the development phase, the main goal of every operating company is to optimize the drilling and production of its wells. When natural reservoir pressure declines, artificial lift equipment is installed during the primary recovery stage to continue extracting hydrocarbons from the reservoir. Artificial lift systems therefore play a pivotal role in ensuring the continued extraction of hydrocarbons from pressure-depleted reservoirs. Various forms of artificial lift equipment, such as sucker rod pumps (SRPs) or beam pumps, electrical submersible pumps (ESPs), plunger lifts, and progressive cavity pumps (PCPs), are selected based on specific operating conditions. The performance of these technologies is significantly impacted by the presence of gas in the production fluid, which can reduce equipment efficiency and increase non-productive time (NPT). A downhole gas separator is a device designed to separate gas from the liquid-gas mixture stream in the subsurface environment. It channels the gas into the casing-tubing annulus to be brought to the surface, while directing the liquid into the pump and up through the tubing string. By segregating gas from liquids, the separator enhances the effectiveness of artificial lift systems, boosts wellbore productivity, and minimizes non-productive time [1,2].
Gas separator technology started as a water management tool and has advanced significantly over time. Ref. [3] mentioned that gas separators, also known as gas anchors, were used to separate oil and gas from the produced water downhole, discharging the oil and gas mixture into the fluid stream and injecting the produced water into disposal zones. Hydrocyclone and gravity processes were used to segregate the fluids in these separators. Also, if separating gas downhole is challenging, less efficient methods such as hydraulic pumping and gas lift may be used instead of a high-flow-rate electrical submersible pump (ESP). Ref. [4] mentioned that gas separators expand the operating window of ESPs, especially in high gas-oil ratio (GOR) wells. Natural gas separators, poor boy separators, modified poor boy separators, packer-type separators, and special separators are some of the most common downhole separators used in oil and gas wells. Natural gas separators are the oldest and most efficient type, and further research was conducted by modifying their shape and evaluating their separation efficiency [5][6][7][8]. Almost all types of separators work on the principle of gravity. A major disadvantage of gravitational separators is the time required for separation. Modifying the separator design, for instance by adding a centrifugal section, enhances and accelerates the process. Stokes' law governs the physical mechanism of gas-liquid separation [9]. This law describes the velocity at which a small particle (or bubble) travels through a fluid as a function of the particle size, the fluid viscosity, and the density difference between the two phases. Centrifugal separators force an inflowing stream into a swirling motion to create a vortex-type phenomenon. This motion can be obtained by abruptly changing the direction of the fluid tangentially or by using helical elements or designs such as augers [9].
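As a minimal illustration of the Stokes' law relationship described above, the Python sketch below computes the terminal slip velocity of a small spherical gas bubble, v = (ρ_l − ρ_g) a d² / (18 μ_l). The fluid properties and bubble sizes are assumed illustrative values (air bubbles in water), not data from this study, and the result is only meaningful in the creeping-flow regime, which the sketch checks with a bubble Reynolds number. Passing a centripetal acceleration in place of g gives a rough sense of why inducing swirl speeds up separation; the swirl velocity and radius used are hypothetical.

```python
G = 9.81  # gravitational acceleration, m/s^2


def stokes_velocity(d, rho_liquid, rho_gas, mu_liquid, accel=G):
    """Terminal slip velocity (m/s) of a spherical bubble of diameter d (m).

    Stokes' law: v = (rho_liquid - rho_gas) * accel * d**2 / (18 * mu_liquid).
    Valid only for creeping flow (bubble Reynolds number well below 1).
    """
    return (rho_liquid - rho_gas) * accel * d ** 2 / (18.0 * mu_liquid)


def bubble_reynolds(v, d, rho_liquid, mu_liquid):
    """Bubble Reynolds number, used to check the Stokes-regime assumption."""
    return rho_liquid * v * d / mu_liquid


# Assumed illustrative properties: air bubbles in water at room conditions.
RHO_WATER, RHO_AIR, MU_WATER = 1000.0, 1.2, 1.0e-3

v_small = stokes_velocity(50e-6, RHO_WATER, RHO_AIR, MU_WATER)   # 50 um bubble
v_large = stokes_velocity(100e-6, RHO_WATER, RHO_AIR, MU_WATER)  # 100 um bubble
re_small = bubble_reynolds(v_small, 50e-6, RHO_WATER, MU_WATER)

# Hypothetical swirl: 2 m/s tangential velocity at a 0.05 m radius gives a
# centripetal acceleration u_t**2 / r = 80 m/s^2 acting in place of g.
v_swirl = stokes_velocity(50e-6, RHO_WATER, RHO_AIR, MU_WATER, accel=2.0 ** 2 / 0.05)

print(f"50 um bubble:  {v_small:.2e} m/s (Re = {re_small:.3f})")
print(f"100 um bubble: {v_large:.2e} m/s")
print(f"50 um bubble in swirl: {v_swirl:.2e} m/s")
```

Because velocity scales with d², doubling the bubble diameter quadruples the slip velocity, and in this hypothetical swirl the bubble separates about 80/9.81 ≈ 8 times faster than under gravity alone.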
In the past, different types of downhole separators have been investigated experimentally. Ref. [10] presented a detailed laboratory investigation into the efficiency of different downhole gas separator designs, challenging common industry assumptions with experimental data. The study highlighted the importance of separator entry port placement, dip tube length, and the innovative use of centrifugal forces in enhancing gas-liquid separation efficiency [10]. Similarly, Refs. [1,2] studied the effectiveness of a prototype centrifugal packer-type downhole separator under different liquid and gas flow rates. Laboratory experiments on a multiphase flow setup showed that the separator's average gas separation efficiency was notably high, particularly at medium liquid flow rates [1,2]. Ref. [11] conducted multiple tests to evaluate a newly designed centrifugal packer-type downhole separator for artificial lift systems in oil and gas wells. The separator consistently achieved over 90% liquid separation efficiency across various conditions, although efficiency dropped slightly at higher liquid rates. This highlights the separator's potential to significantly reduce gas-related issues in pumping systems, marking a notable advancement in artificial lift technologies [11]. Ref. [12] assessed the impact of flow rates on the performance and stability of a newly developed packer-type centrifugal separator and contrasted it with a gravitational separator, running over 150 tests on the centrifugal separator and 55 tests on the gravitational separator. Their findings indicated that the centrifugal separator outperformed the gravitational separator in separation efficiency [12]. Ref. [13] developed and validated a numerical model, based on two-phase turbulent CFD analysis, to evaluate a hybrid separator combined with a piston pump. The study examined the impact of piston speed and gas/liquid ratio on separator performance, showing that higher piston speeds decrease efficiency because of greater pressure drops. It also introduced a correlation for predicting separator efficiency, providing key insights for optimizing downhole separator design and operation [13]. Ref. [14] reviewed the advancements, limitations, and applications of downhole liquid-gas separators in oil and gas operations, emphasizing their importance in enhancing artificial lift systems. The study provides insights into optimizing separator designs for improved performance in unconventional wells and underlines the need for further research to address current challenges and improve separator efficiency [14].
Machine learning (ML) has previously been applied to drilling, reservoir, and production data to improve the drilling and production efficiency of oil and gas wells. However, only two past works have studied the gas separation efficiency of downhole separators using ML. Ref. [15] described how centrifugal separators exploit Stokes' law, which governs the motion of a particle (or bubble) through a fluid as a function of its size and of the fluid's viscosity and density, by inducing a swirling motion in the incoming flow to create a vortex effect. This swirl is achieved either by abruptly altering the fluid's direction tangentially or by employing helical structures [15]. Operating on a principle similar to that of gravitational separators, centrifugal separators leverage the density difference between gas and liquid: the centrifugal force propels the denser liquid toward the outer edge of the vortex, while the gas remains closer to the center [15]. The study in Ref. [16] advanced the understanding of downhole centrifugal separators by combining ML with dimensional analysis to pinpoint the critical efficiency factors. Drawing on a comprehensive database from prior studies, it showed that separators can exceed 79% efficiency under certain conditions, although efficiency declines at higher fluid rates. It also introduced a predictive model that emphasizes the importance of the Weber and gas Reynolds numbers, offering valuable insights for improving artificial lift system design [15,17].
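The dimensionless groups mentioned above can be illustrated with their standard textbook forms, Re = ρvL/μ and We = ρv²L/σ. The exact definitions used in Ref. [16] (choice of characteristic length, velocity, and phase properties) are not given here, so this Python sketch is an assumption-based illustration: it derives a superficial velocity from a volumetric flow rate and pipe diameter, and all input values are hypothetical.

```python
import math


def superficial_velocity(q, diameter):
    """Superficial velocity (m/s) of a phase flowing at q (m^3/s) in a pipe of the given diameter (m)."""
    return q / (math.pi * diameter ** 2 / 4.0)


def reynolds_number(rho, velocity, length, mu):
    """Re = rho * v * L / mu: ratio of inertial to viscous forces."""
    return rho * velocity * length / mu


def weber_number(rho, velocity, length, sigma):
    """We = rho * v**2 * L / sigma: ratio of inertial to surface-tension forces."""
    return rho * velocity ** 2 * length / sigma


# Assumed illustrative conditions (not taken from Ref. [16]).
D = 0.1                                           # pipe inner diameter, m
v_gas = superficial_velocity(0.005, D)            # 0.005 m^3/s of gas
re_gas = reynolds_number(1.2, v_gas, D, 1.8e-5)   # air-like gas properties
we_liq = weber_number(1000.0, 0.5, D, 0.072)      # water at 0.5 m/s

print(f"v_sg = {v_gas:.3f} m/s, Re_g = {re_gas:.0f}, We = {we_liq:.1f}")
```

Higher gas rates raise the gas Reynolds number linearly with velocity, while the Weber number grows with the square of velocity, which is one way to read why efficiency-rate trends can be captured compactly by these groups.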
In the past, machine learning has been applied in a limited number of studies to predict the separation efficiency of downhole gas separators using multiple independent variables. However, no previous study has used supervised machine learning with only two independent variables to predict downhole gas separator efficiency in a blind test. In this study, a model was trained using experimental data from a centrifugal downhole separator and then tested on experimental data from a gravity-based downhole gas separator, in what is referred to as a blind test. To quantify separator efficiency, the gas volume flow per minute at the outlet was predicted from the liquid and gas volume flows at the inlet, and the gas separation efficiency was then calculated as follows:

$$\text{Measured gas separation efficiency (fraction)} = \frac{\text{Gas volume flow per minute at outlet}}{\text{Gas volume flow per minute at inlet}}$$
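The efficiency formula is applied after prediction: each predicted outlet gas flow is divided by the measured inlet gas flow of the same test. The Python sketch below assumes both flows are in the same units (for example, per minute), so the ratio is dimensionless. Because a regression model can predict an outlet flow slightly larger than the inlet gas flow, the optional clipping to the range [0, 1] is an illustrative assumption, not a step stated in this study; the function name and flow values are hypothetical.

```python
def gas_separation_efficiency(gvfo, gvfi, clip=False):
    """Gas separation efficiency (fraction) = GVFO / GVFI.

    gvfo: gas volume flow per minute at the outlet (casing return line)
    gvfi: gas volume flow per minute at the inlet, in the same units
    clip: if True, bound the result to [0, 1] (illustrative assumption)
    """
    if gvfi <= 0:
        raise ValueError("GVFI must be positive to compute an efficiency")
    eff = gvfo / gvfi
    return min(max(eff, 0.0), 1.0) if clip else eff


# Hypothetical inlet flows and model-predicted outlet flows.
gvfi_measured = [2000.0, 3000.0, 1500.0]
gvfo_predicted = [1800.0, 2400.0, 1560.0]

efficiencies = [
    gas_separation_efficiency(o, i, clip=True)
    for o, i in zip(gvfo_predicted, gvfi_measured)
]
print(efficiencies)
```

The third hypothetical case (1560 predicted vs. 1500 at the inlet) shows why a bound can be useful: unclipped, it would report an efficiency above 100%.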

Data Collection
This study uses data from experiments conducted to test the separation efficiency of both centrifugal and gravity-based downhole gas separators [1,2,11,12,18]. Refs. [1,2,12] conducted experiments on the centrifugal gas separator, while Ref. [11] conducted experiments on the gravity-based downhole gas separator. In total, 260 tests were performed to assess the separation efficiency of the centrifugal separator and 55 tests for the gravity-based separator, using the multiphase flow facility described in the next section. Consequently, the 260 centrifugal-separator tests were used for model training, while the 55 gravity-based-separator tests were used for model testing and analysis. The main distinction between the two separators is the spiral section present in the centrifugal separator. Additionally, the gravity-based separator is 10 inches long, compared with 15 inches for the centrifugal separator.

Facility Design to Test Downhole Separator
A comprehensive multiphase flow system was constructed for this purpose, extending 31 feet horizontally and rising 27 feet vertically. The centrifugal downhole gas separator was provided by Echometer Company, located in Texas, United States. As depicted in Figure 1, the full multiphase flow setup comprises the gas and water inlet lines, the horizontal and vertical sections, and the tubing and casing return lines. The instrumentation includes five Coriolis flowmeters, five control valves, and eight pressure transducers, supplemented by various manual pressure gauges and two differential pressure gauges. All of these instruments are connected to a central data acquisition system (DAQ) for real-time monitoring and control of the flow parameters, with the DAQ cards operated via Visual Basic software. In each experiment, the setup, which features over 20 pieces of equipment, captures more than 30 different variables, such as pressure, temperature, density, and flow rates. As detailed in Figure 1, the tubing return line (TRL) connects to the tubing flow path, while the casing return line (CRL) connects to the casing-tubing annulus flow path. Through gravity and centrifugal forces, the downhole gas separator segregates the liquid and gas in the vertical section, directing the liquid (water) through the tubing to the TRL and the gas through the annulus to the CRL. Consequently, the flowmeter at the top of the CRL measures the gas flow, and the flowmeter at the bottom of the TRL measures the liquid (water) flow. The flowmeter installed at the top of the TRL measures any gas that flows up the tubing. The gas volume flow per minute at the outlet (GVFO), measured at the CRL, is predicted from the liquid and gas volume flows per minute at the inlet (LVFI and GVFI), measured at the T-section, and is then used to calculate the efficiency of the downhole separator.
The gas separator provided by Echometer Company is shown in Figure 2, which illustrates the design concept of a static centrifugal packer-type downhole separator. This separator is particularly beneficial in wells with high gas-to-oil ratios, where it helps address issues such as diminished artificial lift performance and gas lock. A key advantage of this gas separator is that it has no moving parts (static), which significantly reduces maintenance requirements and can help operating companies minimize non-productive time. The gas-liquid mixture enters the separator through two ports, shown in green and red in Figure 2. The rotational motion generated by the spiral section, coupled with the centrifugal force at the outlet, forces liquid droplets to impinge against the casing's inner wall, thereby enhancing liquid-gas separation at the outlet. The separated liquid then descends under gravity through the casing-tubing annulus and is channeled into the separator's shroud, subsequently rising through the tubing-shroud annulus into the two intake ports.

Methodology
This section outlines the development and application of various ML algorithms, including multiple linear regression (MLR), k-nearest neighbors (KNN), random forest regression (RFR), support vector machine (SVM), ridge, lasso, and XGBoost, to forecast the gas separation efficiency of downhole separators. The workflow is shown in Figure 3, which illustrates the main steps: data collection, data processing, model training and testing, and model selection. All regression models are compared based on R-squared and error metrics, as in [19][20][21], to select the most effective model.
The experimental setup, comprising over 20 different pieces of apparatus, records more than 30 distinct variables, such as pressure, temperature, and flow rates, in each test run. Given the size of this dataset, it is crucial to judiciously select the independent variables used to predict the separator's efficiency. This selection relies on scatter and correlation plots, alongside multicollinearity analysis; further details on input parameter selection and the multicollinearity assessment are provided in the following sections. Before being used in model training and testing, the selected independent variables undergo scaling and outlier removal. The dataset used for training the ML models comprises 260 rows of data from experimental testing of the centrifugal separator, and the dataset used for testing comprises 55 rows of data from experimental testing of the gravity-based downhole separator.
In this research, the models were compared and assessed using the coefficient of determination (R-squared), root mean square error (RMSE), and mean absolute percentage error (MAPE). Models with low R-squared values and high error rates were eliminated, and the model with the highest R-squared and lowest error rates was selected as the most reliable for predicting the efficiency of the downhole separator. This approach favors accurate predictions and, in turn, supports optimizing the separator's performance in practical scenarios.
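The three metrics can be written out directly. The minimal Python sketch below uses their standard definitions, with MAPE expressed in percent; it is not the authors' code, and the GVFO values are hypothetical. Note that R-squared computed on unseen test data is measured against the test-set mean, so it can become negative for a model that does worse than predicting that mean.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the units of the target (e.g., GVFO)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    On an unseen test set this can be negative if the model does worse
    than always predicting the mean of the actual values.
    """
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical GVFO values (not data from this study).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"MAPE = {mape(actual, predicted):.2f} %")
print(f"R^2  = {r_squared(actual, predicted):.3f}")
```

Here every prediction is off by 10, so RMSE is exactly 10, while MAPE weights the same absolute error more heavily for the smaller actual values.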

Data Preprocessing and Feature Engineering
In this study, two critical independent variables were identified for predicting the gas volume flow per minute at the outlet (GVFO): the gas volume flow per minute (GVFI) and the liquid volume flow per minute (LVFI), both measured at the inlet. A thorough analysis of the experimental data, including the examination of multicollinearity discussed later in this section, determined that GVFI and LVFI were the most impactful predictors and that the final model would not be adversely affected by collinearity between variables.
To construct the most accurate model for predicting GVFO, seven regression algorithms were evaluated: KNN, lasso, MLR, RFR, ridge, SVM, and XGBoost regression. The hyperparameters of each model were tuned using a grid search combined with cross-validation. For the random forest model, for example, the tuned parameters included the number of trees (ntree = 500), the number of variables randomly sampled as candidates at each split ("mtry"), the maximum depth, the minimum number of samples required to split a node, and the minimum number of samples per leaf. Tuning used five-fold cross-validation on the training dataset to identify the hyperparameters that minimize prediction error. Each model's performance was then evaluated based on its R-squared value and error metrics, with a clear focus on minimizing prediction error. MLR served as the base case in this comparative analysis, and the optimal model, shown in Figure 4, was chosen because it had the lowest error values among the contenders. This selection methodology is intended to ensure that the model implemented for downhole separator efficiency analysis is robust and reliable.
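The tuning loop described above can be sketched in plain Python. A random forest implementation would not fit in a short example, so this sketch applies the same idea (a grid of candidate hyperparameter values, each scored by five-fold cross-validation on the training data only) to ridge regression, one of the seven models, tuning its penalty alpha. The study's own tuning appears to use R-style parameters (ntree, mtry), so this is an illustrative substitute rather than the authors' code; the function names are hypothetical and the data are synthetic.

```python
import math
import random


def fit_ridge_2d(X, y, alpha):
    """Closed-form ridge regression for two features with an unpenalized intercept.

    Solves (Xc^T Xc + alpha*I) b = Xc^T yc on mean-centred data.
    Returns (intercept, b1, b2).
    """
    n = len(X)
    m1 = sum(r[0] for r in X) / n
    m2 = sum(r[1] for r in X) / n
    my = sum(y) / n
    s11 = sum((r[0] - m1) ** 2 for r in X) + alpha
    s22 = sum((r[1] - m2) ** 2 for r in X) + alpha
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in X)
    t1 = sum((r[0] - m1) * (t - my) for r, t in zip(X, y))
    t2 = sum((r[1] - m2) * (t - my) for r, t in zip(X, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return my - b1 * m1 - b2 * m2, b1, b2


def predict(coef, X):
    b0, b1, b2 = coef
    return [b0 + b1 * r[0] + b2 * r[1] for r in X]


def cv_rmse(X, y, alpha, k=5, seed=0):
    """Mean validation RMSE over k shuffled folds for one hyperparameter value."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        coef = fit_ridge_2d([X[i] for i in train], [y[i] for i in train], alpha)
        pred = predict(coef, [X[i] for i in fold])
        err = [(p - y[i]) ** 2 for p, i in zip(pred, fold)]
        scores.append(math.sqrt(sum(err) / len(err)))
    return sum(scores) / k


def grid_search(X, y, grid, k=5):
    """Return (best_alpha, scores), where best_alpha minimizes the k-fold CV RMSE."""
    scores = {alpha: cv_rmse(X, y, alpha, k) for alpha in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic, already-scaled stand-ins for (GVFI, LVFI) and GVFO (hypothetical).
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
y = [5.0 + 3.0 * a - 1.0 * b + rng.gauss(0, 0.5) for a, b in X]

best_alpha, cv_scores = grid_search(X, y, grid=[0.0, 0.1, 1.0, 10.0, 100.0])
final_model = fit_ridge_2d(X, y, best_alpha)  # refit on all training data
print(best_alpha, {a: round(s, 3) for a, s in cv_scores.items()})
```

The blind-test data never enter this loop: cross-validation only reshuffles the training set, and the refitted model is evaluated on the held-out separator data afterwards.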
Energies 2024, 17, x FOR PEER REVIEW

The data preprocessing, including the selection of relevant parameters, involved both univariate and bivariate analyses. Box plots were drawn for each parameter to detect outliers, and the analysis concluded that the experimental data did not contain any outliers. Bivariate analysis was then conducted to understand the relationship between the dependent variable, GVFO, and the available independent parameters; the findings are shown in Figure 5. According to Figure 5, GVFO tends to increase as the inlet gas-liquid ratio (IGLR), the opening percentage of the control valve installed on the gas inlet line (CVGIL), and the casing control valve (CCV) increase, whereas it changes little with increases in the average inlet gas flow rate (AIGF) and the average inlet liquid flow rate (AILF). To refine the model, the correlations between the response variable and the independent parameters must be examined further to select the optimal set of independent variables for accurate GVFO prediction.

Figure 6 presents a correlation matrix that maps the relationships between the dependent variable, gas volume flow at the outlet (GVFO), and various independent variables, as well as the intercorrelations among the independent variables themselves. It reveals a high correlation of GVFO with CVGIL, the gas volume flow per minute at the inlet (GVFI), and the gas volume flow at the tubing return line (GVFTRL), indicating strong predictive relationships. GVFO is moderately correlated with CCV, IGLR, and the gas-liquid ratio at the outlet (OGLR), and weakly correlated with AILF, AIGF, the liquid volume flow per minute at the inlet (LVFI), and the liquid volume flow at the tubing return line (LVFTRL). The matrix also shows significant correlations among several independent variables, an observation that warrants more detailed analysis. To preserve the integrity of the regression model and mitigate the risk of multicollinearity, the variance inflation factor (VIF) is subsequently applied to select the independent variables. This step eliminates redundant variables so that the model's predictive accuracy is not compromised by highly interdependent predictors.
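Both screening steps described above are straightforward to reproduce. The Python sketch below flags outliers with Tukey's 1.5 × IQR rule, the convention behind common box-plot whiskers, and builds a pairwise Pearson correlation matrix of the kind shown in Figure 6. The study does not state which whisker rule or quartile method its box plots used, so both are assumptions here, and all numbers are hypothetical.

```python
import math
import statistics


def iqr_outliers(values, whisker=1.5):
    """Flag values outside the box-plot fences [Q1 - w*IQR, Q3 + w*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of {name: values}."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Hypothetical measurements (not data from this study).
print(iqr_outliers([10, 12, 11, 13, 12, 11, 50]))

data = {
    "GVFI": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GVFO": [0.9, 1.8, 2.8, 3.5, 4.6],
    "LVFI": [5.0, 3.0, 4.0, 1.0, 2.0],
}
corr = correlation_matrix(data)
print(round(corr[("GVFI", "GVFO")], 3), round(corr[("GVFI", "LVFI")], 3))
```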
Examining Figure 6 underscores the importance of assessing multicollinearity to understand the interdependencies among the chosen predictor variables. The variance inflation factor (VIF) was used to evaluate the degree of correlation between the independent variables in the regression analysis; it quantifies how much multicollinearity inflates the variance of an estimated regression coefficient. As Table 1 shows, the VIF scores are consistent across the regression models, reflecting that the same input variables were used in every GVFO estimation. A VIF of 1 indicates no correlation with the other predictors, values between 1 and 5 indicate moderate correlation, and values exceeding 5 indicate strong correlation among the predictors. The two selected independent variables have moderate VIF values, indicating a moderate level of correlation. Following a detailed examination of scatter plots and correlation matrices, along with the application of the stepAIC function, GVFI and LVFI were chosen as the input parameters. To speed up convergence and improve the efficiency of the machine learning algorithms, these variables were scaled to a uniform range. The dataset from experimental testing of the centrifugal separator was then used for model training, and the dataset from experimental testing of the gravity-based separator (not used in model training) was used for model testing. This blind test was essential to assess how well the model predicts and performs on unseen data.
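For the two-predictor case used here, the VIF has a simple closed form: regressing one predictor on the other gives R² = r², so VIF = 1/(1 − r²) for both GVFI and LVFI. The Python sketch below computes it and also shows one way to perform the scaling step for a blind test, assuming z-score standardization (the text does not specify the scaling method). The scaler's statistics are fitted on the training (centrifugal) data only and reused unchanged on the test (gravity-based) data. The class name `ZScoreScaler` and all values are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2). With only two predictors, R^2 from regressing one
    on the other equals r^2, so both predictors share the same VIF."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


class ZScoreScaler:
    """Hypothetical z-score scaler: fit on training data only, reuse on test data."""

    def fit(self, values):
        self.mean = statistics.mean(values)
        self.std = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


# Hypothetical GVFI/LVFI columns for the training (centrifugal) set.
gvfi_train = [1.0, 2.0, 3.0, 4.0, 5.0]
lvfi_train = [2.0, 1.0, 4.0, 3.0, 5.0]
print(f"VIF = {vif_two_predictors(gvfi_train, lvfi_train):.2f}")

# Fit on training data, then apply unchanged to the blind-test (gravity-based)
# data, so no information from the test set leaks into preprocessing.
scaler = ZScoreScaler().fit(gvfi_train)
gvfi_train_scaled = scaler.transform(gvfi_train)
gvfi_test_scaled = scaler.transform([1.5, 6.0])
print(gvfi_train_scaled, gvfi_test_scaled)
```

In this hypothetical example r = 0.8, giving a VIF of about 2.78, which falls in the moderate 1 to 5 band described above.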

Results
The study conducts seven separate regression analyses: k-nearest neighbors (KNN), lasso, multiple linear regression (MLR), random forest regression (RFR), ridge, support vector machine (SVM), and XGBoost. Each technique forecasts the dependent variable, GVFO. The models are trained on centrifugal separator data and tested on gravity-based separator experimental data. The best model was selected for its low error rates and high R-squared value.

Table 2 presents the R-squared, root mean square error (RMSE), and mean absolute percentage error (MAPE) of each model on the testing (unseen) dataset:

- R-squared is reported as a percentage. It measures the proportion of the variance in GVFO that a model explains.
- RMSE is expressed in the units of GVFO and measures predictive precision. Lower values indicate predictions that more closely match the actual outcomes.
- MAPE gauges the average relative error of the predictions. Smaller values point to more accurate models.

The analysis also includes a qualitative review of scatterplots of predicted versus actual values, described in the following subsections. Tighter clustering of points around the regression line indicates lower variance in the prediction errors. MLR serves as the benchmark for comparison but is not plotted in the following subsections.
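The three metrics can be written as short functions. This sketch assumes the common definitions: R² = 1 - SS_res/SS_tot, and MAPE in percent. The paper does not state its exact formulas, and the sample values below are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, in the units of the target (here GVFO)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if any actual value is 0."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative on unseen data)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted GVFO values (illustration only).
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 380.0]
print(rmse(actual, predicted), mape(actual, predicted), r_squared(actual, predicted))
```

RMSE and MAPE can rank models differently. RMSE weights large absolute errors heavily, whereas MAPE weights errors relative to the size of each measurement.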

KNN Regression Model
Figure 7 plots the KNN model's predicted values against actual measurements of the gas volume flow at the annulus outlet (GVFO). The x-axis shows the actual measurements and the y-axis the predicted values. The KNN model has an R-squared value of 96.6%, so it captures a substantial portion of the variability in GVFO. Its RMSE of 130 and MAPE of 8 show notably reliable predictions, particularly compared with the MLR, SVM, and XGBoost models. Figure 7 also shows that the predicted points lie close to the regression line. This indicates a high degree of accuracy, surpassing that of the MLR, lasso, ridge, and SVM methods. Despite this strong performance, KNN ranks second among the evaluated models, behind the random forest model, which achieves a lower RMSE for GVFO.
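KNN regression predicts a test point's GVFO by averaging the GVFO of its k nearest training points, so the input scaling described earlier directly affects the result. The study does not report its k, scaler, or distance metric. This minimal stdlib sketch therefore assumes three things:

- z-score standardization, fitted on the training (centrifugal) set only and reused on the blind-test set;
- Euclidean distance;
- k = 3.

`fit_standardizer`, `transform`, and `knn_predict` are hypothetical helpers, and the data are illustrative.

```python
import math

def fit_standardizer(rows):
    """Per-column mean and population standard deviation from the training rows only."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column would give std = 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def transform(rows, means, stds):
    """Apply the training-set scaling to any rows (training or blind-test)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def knn_predict(train_x, train_y, query, k=3):
    """Average the GVFO of the k training points nearest (Euclidean) to the query."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical rows of [GVFI, LVFI] with GVFO targets (not the study's data).
train_x = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [10.0, 1.0]]
train_y = [10.0, 20.0, 30.0, 100.0]
means, stds = fit_standardizer(train_x)
scaled_train = transform(train_x, means, stds)
(query,) = transform([[2.0, 1.0]], means, stds)
print(knn_predict(scaled_train, train_y, query, k=3))  # 20.0
```

Fitting the scaler on the training set alone keeps the gravity-separator data fully unseen, which preserves the intent of the blind test.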

Lasso Regression Model
Figure 8 plots the lasso model's predicted GVFO values against actual measurements. The x-axis shows measured values and the y-axis predicted ones. The model's R-squared value of 92.1% shows that it captures a reasonable amount of the variability in GVFO, though less than most of the other models. Its precision is also weaker. Its RMSE of 199 is higher than that of the KNN, random forest, ridge, SVM, and XGBoost models, although its MAPE of 7 is comparatively low. Based on these metrics, the lasso model appears less effective than most of the other models for predicting GVFO.
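Lasso and ridge differ from MLR only in the penalty added to the least-squares loss. The lasso objective for this two-predictor setting is sketched below. It assumes the 1/(2n) scaling used by scikit-learn's `Lasso`. The paper does not state its implementation or its value of the regularization strength α.

```latex
\hat{\boldsymbol{\beta}}^{\text{lasso}} = \arg\min_{\beta_0,\,\beta_1,\,\beta_2}\;
\frac{1}{2n}\sum_{i=1}^{n}\bigl(\mathrm{GVFO}_i - \beta_0 - \beta_1\,\mathrm{GVFI}_i - \beta_2\,\mathrm{LVFI}_i\bigr)^{2}
+ \alpha\bigl(\lvert\beta_1\rvert + \lvert\beta_2\rvert\bigr)
```

The absolute-value (L1) penalty can drive coefficients exactly to zero. Ridge instead uses a squared (L2) penalty, which shrinks coefficients without eliminating them.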

Random Forest Model
Figure 9 compares the RFR model's GVFO predictions with actual data. Actual values are on the x-axis and predictions on the y-axis. The RFR model predicts GVFO well. Its R-squared value of 95.9% shows that it explains a large portion of the variability in GVFO. The model is also reliable, with a MAPE of 8 and an RMSE of 112, the lowest RMSE among all the models. Figure 9 shows that the RFR predictions closely follow the line of best fit, reflecting higher prediction accuracy than the other models. Overall, the RFR model ranks highest of the seven regression models in predicting GVFO.
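The bagging mechanism credited in the Discussion can be illustrated with a small from-scratch random forest. Each regression tree is grown on a bootstrap resample of the training rows, and the trees' predictions are averaged. This is a teaching sketch, not the study's implementation. The number of trees, the depth limit, and the feature subsampling are assumptions. By default every split considers all features, which matches scikit-learn's `RandomForestRegressor` default.

```python
import random

def _mean(v):
    return sum(v) / len(v)

def _sse(v):
    m = _mean(v)
    return sum((t - m) ** 2 for t in v)

def build_tree(xs, ys, depth, max_depth, min_leaf, rng, n_try):
    """Grow a regression tree; leaves are floats, internal nodes are (feature, thr, left, right)."""
    if depth >= max_depth or len(ys) < 2 * min_leaf:
        return _mean(ys)
    best = None
    for f in rng.sample(range(len(xs[0])), n_try):  # random feature subset per split
        for thr in sorted({x[f] for x in xs}):
            left = [i for i, x in enumerate(xs) if x[f] <= thr]
            right = [i for i, x in enumerate(xs) if x[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([ys[i] for i in left]) + _sse([ys[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return _mean(ys)
    _, f, thr, left, right = best
    grow = lambda idx: build_tree([xs[i] for i in idx], [ys[i] for i in idx],
                                  depth + 1, max_depth, min_leaf, rng, n_try)
    return (f, thr, grow(left), grow(right))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(xs, ys, n_trees=50, max_depth=4, min_leaf=1, n_try=None, seed=0):
    """Bagging: every tree is grown on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n, k = len(ys), n_try or len(xs[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([xs[i] for i in idx], [ys[i] for i in idx],
                                 0, max_depth, min_leaf, rng, k))
    return forest

def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)

# Hypothetical [GVFI, LVFI] inputs with a GVFO target that rises with GVFI.
xs = [[float(i), 0.0] for i in range(10)]
ys = [10.0 * i for i in range(10)]
forest = fit_forest(xs, ys)
print(predict_forest(forest, [0.0, 0.0]), predict_forest(forest, [9.0, 0.0]))
```

Averaging trees grown on different resamples reduces the variance of any single deep tree. This is the overfitting-mitigation property the Discussion attributes to RFR.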

Ridge Regression Model
Figure 10 compares the ridge regression model's predicted GVFO values against the observed measurements. The model has an R-squared value of 94.1%, slightly higher than the MLR, lasso, and XGBoost models, so it captures a little more of the variance in GVFO. Its reliability is weaker, however: its RMSE of 175 and MAPE of 10 are less favorable than those of other models. Based on these error metrics and the quality of its predictions, the ridge regression model does not rank among the top predictive models in this study.
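Ridge regression adds a squared (L2) penalty λ(β₁² + β₂²) to the least-squares loss, which gives a closed-form solution. With two centred predictors, the solution reduces to a 2 × 2 linear system. The hypothetical helper below solves that system. λ and the data are illustrative, because the study's regularization strength is not reported.

```python
def ridge_two_features(x_rows, y, lam):
    """Closed-form ridge fit for two centred predictors plus an unpenalised intercept."""
    n = len(y)
    m1 = sum(r[0] for r in x_rows) / n
    m2 = sum(r[1] for r in x_rows) / n
    my = sum(y) / n
    c = [(r[0] - m1, r[1] - m2) for r in x_rows]  # centred predictors
    yc = [v - my for v in y]                      # centred target
    # Entries of (X^T X + lam * I) and X^T y on the centred data.
    a11 = sum(u * u for u, _ in c) + lam
    a22 = sum(v * v for _, v in c) + lam
    a12 = sum(u * v for u, v in c)
    b1 = sum(u * t for (u, _), t in zip(c, yc))
    b2 = sum(v * t for (_, v), t in zip(c, yc))
    det = a11 * a22 - a12 * a12
    beta1 = (a22 * b1 - a12 * b2) / det  # Cramer's rule for the 2 x 2 system
    beta2 = (a11 * b2 - a12 * b1) / det
    intercept = my - beta1 * m1 - beta2 * m2
    return intercept, beta1, beta2

# Hypothetical data lying exactly on y = 5 + 2*x1 + 3*x2.
x_rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [7.0, 8.0, 10.0, 12.0]
print(ridge_two_features(x_rows, y, lam=0.0))   # (5.0, 2.0, 3.0): ordinary least squares
print(ridge_two_features(x_rows, y, lam=10.0))  # slopes shrunk toward zero
```

With λ = 0 the fit reduces to MLR. Increasing λ trades a little bias for lower coefficient variance, which helps most when predictors are correlated.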

XGBoost Regression Model
Figure 12 plots the XGBoost model's predicted GVFO values against the actual data. The model's R-squared value of 93.9% reflects its ability to explain variation in GVFO. Figure 12 suggests that XGBoost models GVFO variation more effectively than the MLR and SVM models. Its dependability is moderate, with an RMSE of 175 and a MAPE of 13. Although XGBoost improves on the prediction accuracy and error rates of MLR and lasso, it does not surpass the other models evaluated in this research.
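XGBoost builds an additive ensemble of regression trees. Each new tree f_t is fitted to reduce a regularized objective and is then added with learning rate η. The formulation below follows the XGBoost documentation. The loss l and the hyperparameters η, γ, and λ used in this study are not stated, so treat them as placeholders.

```latex
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta\, f_t(\mathbf{x}_i), \qquad
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\bigr) + \Omega(f_t), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}

w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i
```

Here T is the number of leaves, w_j is the weight of leaf j, and I_j is the set of samples falling into leaf j. g_i and h_i are the first and second derivatives of the loss with respect to the previous prediction.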

Discussion
The results above show that machine learning techniques can forecast the gas volume flow per minute at the outlet (GVFO), a key performance indicator of the liquid-gas separation efficiency of downhole separators. As Table 2 shows, the random forest regression approach combines a high R-squared value with the lowest RMSE, signifying superior predictive capacity for GVFO. The multilinear regression method registers the lowest R-squared.

Figure 13 illustrates the criterion for selecting the optimal model: the lowest error alongside the highest R-squared value for GVFO prediction. The R-squared and RMSE values were normalized by their maximum values and plotted in Figure 13. The plot identifies the random forest regression model as the most precise, combining a high R-squared with the lowest RMSE. This superior performance can be attributed to the model's ability to handle complex interactions between variables and to mitigate overfitting through bagging (bootstrap aggregation).

Table 2 compares all seven models on their error metrics. In terms of R-squared, both the RFR and KNN regression models show good predictive proficiency, at 95.9% and 96.6%, respectively. The RFR model, however, achieves the lowest RMSE on the testing dataset (blind test), whereas the multilinear regression model lags behind with the highest RMSE. Notably, the support vector machine algorithm, more commonly associated with classification tasks, predicts GVFO better than the multilinear and lasso regression models.

Considering the collective metrics, the random forest regression model emerges as the preferred choice for GVFO prediction. Its high R-squared and minimal RMSE confirm that it can predict GVFO from experimental data with low error, supporting its adoption in the analysis of downhole separator efficiency.
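The normalization behind Figure 13 can be reproduced from the values reported in the Results section. The text does not state how the two normalized metrics were combined. The single score used below, normalized R² minus normalized RMSE, is therefore a hypothetical choice. Some combination is needed because KNN has the highest R² while RFR has the lowest RMSE. MLR is omitted because its values are not given in this section.

```python
# R-squared (%) and RMSE on the gravity-separator test set, as reported in the Results.
reported = {
    "KNN": (96.6, 130),
    "Lasso": (92.1, 199),
    "RFR": (95.9, 112),
    "Ridge": (94.1, 175),
    "SVM": (92.4, 140),
    "XGBoost": (93.9, 175),
}

def normalize_by_max(metrics):
    """Divide each model's R-squared and RMSE by the maximum over all models."""
    max_r2 = max(r2 for r2, _ in metrics.values())
    max_rmse = max(rmse for _, rmse in metrics.values())
    return {m: (r2 / max_r2, rmse / max_rmse) for m, (r2, rmse) in metrics.items()}

def select_model(metrics):
    """Hypothetical single score: normalized R-squared minus normalized RMSE (higher is better)."""
    norm = normalize_by_max(metrics)
    return max(norm, key=lambda m: norm[m][0] - norm[m][1])

print(select_model(reported))  # RFR
```

Under this score, RFR is selected (about 0.43, versus about 0.35 for KNN), consistent with the conclusion drawn from Figure 13.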

Conclusions
This investigation aimed to forecast the gas volume flow per minute at the outlet (GVFO) for gravity-based separators, using experimental test data from efficiency measurements of centrifugal downhole separators. Seven distinct regression techniques were developed and evaluated: k-nearest neighbors, lasso, multilinear, random forest, ridge, support vector machine, and XGBoost. These models used gas volume flow per minute at inlet (GVFI) and liquid volume flow per minute at inlet (LVFI) as input variables.

Figure 1 .
Figure 1. Multiphase flow setup constructed to measure the efficiency of the downhole gas separator installed in the vertical section.

Figure 3 .
Figure 3. Workflow for supervised learning model selection for prediction of downhole separator's efficiency.

Figure 4 .
Figure 4. Selection of the efficient regression model for prediction of gas volume flow per minute at outlet (GVFO) (response variable) using gas volume flow per minute at inlet (GVFI) and liquid volume flow per minute at inlet (LVFI) as independent parameters.

Figure 5 .
Figure 5. Scatterplot of gas volume flow per minute at the outlet (GVFO, ft³) against the independent variables IGLR, CVGIL, CCV (top, left to right), and AILF and AIGF (bottom, left to right). The figure shows how GVFO varies with each of these independent variables.

Figure 6
Figure6presents a correlation matrix that visually maps out the relationships between the dependent variable, gas volume flow at the outlet (GVFO), and various

Figure 7 .
Figure 7. Scatterplot of measured vs. predicted GVFO using testing dataset for KNN regression model.

Figure 8 .
Figure 8. Scatterplot of measured vs. predicted GVFO using testing dataset for lasso regression model.

Figure 9 .
Figure 9. Scatterplot of measured vs. predicted GVFO using testing dataset for random forest regression model.

Figure 10 .
Figure 10. Scatterplot of measured vs. predicted GVFO using testing dataset for ridge regression model.

SVM Regression Model
Figure 11 compares predicted and actual GVFO measurements for the SVM regression model. The model's R-squared value of 92.4% shows that it captures the changes in GVFO more effectively than the MLR model. With an RMSE of 140 and a MAPE of 10, its reliability is moderate compared with the other models evaluated. Although the SVM model improves on the MLR model, it does not outperform the random forest and KNN regression models.
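The SVM variant and kernel used here are not specified. Assuming standard ε-support vector regression (ε-SVR), which adapts SVMs from classification to regression by ignoring errors smaller than ε, the training problem is:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi},\,\boldsymbol{\xi}^{*}}\;
\frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)
\quad\text{s.t.}\quad
\begin{cases}
y_i - \mathbf{w}^{\top}\phi(\mathbf{x}_i) - b \le \varepsilon + \xi_i,\\
\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b - y_i \le \varepsilon + \xi_i^{*},\\
\xi_i,\ \xi_i^{*} \ge 0,
\end{cases}
```

Here y_i is the measured GVFO, x_i = (GVFI_i, LVFI_i), φ is the feature map induced by the chosen kernel, and C controls the trade-off between model flatness and errors larger than ε.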

Energies 2024, 17, x FOR PEER REVIEW

Figure 11 .
Figure 11. Scatterplot of measured vs. predicted GVFO using testing dataset for SVM regression model.

Figure 12 .
Figure 12. Scatterplot of measured vs. predicted GVFO using testing dataset for XGBoost regression model.

Figure 13 .
Figure 13. Selection of the best regression model for GVFO prediction, based on the lowest RMSE and highest R-squared. The RFR model combines a high R-squared (96%) with the lowest RMSE (112), so it is selected for prediction of GVFO.

Table 1 .
VIF for each variable, indicating multicollinearity between independent variables. A VIF value ranging from 1 to 5 suggests a moderate level of correlation between the two independent variables considered in the analysis.

Table 2 .
Regression methods used to predict GVFO, with R-squared, RMSE, and MAPE on the test data. The most efficient model is the random forest model, which combines a high R-squared value (about 96%) with the lowest RMSE on the test data.