Machine Learning Augmented Two-Fluid Model for Segregated Flow

Abstract: Segregated flow, including stratified and annular flows, is commonly encountered in several practical applications such as chemical, nuclear, refrigeration, and oil and gas industries. Accurate prediction of liquid holdup and the pressure gradient is of great importance in terms of system design and optimization. The current most widely accepted model for segregated flow is a physics-based two-fluid model that treats gas and liquid phases separately by incorporating mass and momentum conservation equations. It requires empirically derived closure relationships, which are applicable only within the narrow range of input parameters under which they were developed. In this paper, we propose a more generalized machine learning augmented two-fluid model, using a database that spans a wide range of flowing conditions and fluid properties. Machine learning algorithms such as random forest, neural networks, and gradient boosting were tested to find the best-performing data-driven predictive model. The new model proposed in this work successfully captures the complex, dynamic, and non-linear relationships between the friction factor and flowing conditions. A comprehensive model evaluation against nineteen existing correlations shows the best results from the proposed model.


Introduction
Segregated flow, which includes stratified smooth, stratified wavy, and annular flows, is commonly encountered in several practical applications such as chemical, nuclear, refrigeration, and oil and gas industries. Accurate prediction of liquid holdup and the pressure gradient is of great importance in terms of system design and optimization. This section first reviews the two-fluid hydraulic model that is widely used in the industry for segregated flow pressure gradient and liquid holdup predictions, followed by a review of the common application of machine learning (ML) in multiphase flow modeling.

Review of Two-Fluid Hydraulic Models for Segregated Flow
The most widely accepted two-fluid hydraulic models for steady-state gas-liquid two-phase segregated flow are based primarily on separate mass and momentum conservation equations for the gas and liquid phases [1,2]; this approach is referred to as the two-fluid model in the current study. Separate governing equations are derived by treating each phase or component as a separate entity. Figure 1 depicts a two-phase flow system, as well as the forces acting on the gas and liquid phases. Equations (1) and (2) provide the integral form of the conservation of momentum equations for the liquid and the gas core, respectively, under steady-state conditions.

$$-A_L \left(\frac{dp}{dL}\right)_L - \tau_{WL} S_L + \tau_I S_I - A_L \rho_L g \sin\theta = 0 \qquad (1)$$

$$-A_C \left(\frac{dp}{dL}\right)_C - \tau_{WC} S_C - \tau_I S_I - A_C \rho_C g \sin\theta = 0 \qquad (2)$$

where $A_L$ is the area occupied by the liquid phase, $A_C$ is the area occupied by the gas core, $\tau_{WL}$ is the liquid wall shear stress, $\tau_{WC}$ is the wall shear stress for the gas core, $\tau_I$ is the interfacial shear stress, $S_L$ is the liquid wetted perimeter, $S_C$ is the gas core wetted perimeter, and $S_I$ is the interfacial perimeter. $dp/dL$ is the pressure gradient, $\rho_L$ and $\rho_C$ are the liquid and gas core densities, respectively, and $g$ is the gravitational acceleration. The subscript L represents the liquid phase, and C refers to the gas core with the entrained liquid droplets. $\theta$ is the inclination angle from the horizontal. $v_L$ and $v_C$ are the actual in-situ velocities of the liquid and gas core, respectively. Equation (3) is the combined momentum equation obtained from Equations (1) and (2) by canceling out the pressure gradient term; it can be used to solve for the liquid holdup. Equations (4)-(6) provide the formulae to calculate the shear stresses [1,3].
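The elimination of the pressure gradient can be written out explicitly. The sketch below assumes that both phases share the same pressure gradient and that the shear stresses follow the usual Fanning-type definitions; the exact form and numbering of the source's Equations (3)-(6) may differ.

```latex
% Momentum balances restated (liquid, then gas core), solved for dp/dL:
\frac{dp}{dL} = \frac{-\tau_{WL} S_L + \tau_I S_I}{A_L} - \rho_L g \sin\theta
\qquad
\frac{dp}{dL} = \frac{-\tau_{WC} S_C - \tau_I S_I}{A_C} - \rho_C g \sin\theta

% Equating the two right-hand sides gives the combined momentum equation:
\tau_{WC}\frac{S_C}{A_C} - \tau_{WL}\frac{S_L}{A_L}
  + \tau_I S_I \left(\frac{1}{A_L} + \frac{1}{A_C}\right)
  - (\rho_L - \rho_C)\, g \sin\theta = 0

% Assumed shear-stress closures (Fanning friction factors):
\tau_{WL} = f_L \frac{\rho_L v_L^2}{2}, \qquad
\tau_{WC} = f_C \frac{\rho_C v_C^2}{2}, \qquad
\tau_I = f_I \frac{\rho_C (v_C - v_L)\,\lvert v_C - v_L \rvert}{2}
```

Because every geometrical term depends on the liquid level, the combined equation is an implicit equation for the liquid holdup once the three friction factors are specified.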


To solve Equation (3), several closure relationships are needed, including the liquid and gas wall friction factors ($f_L$ and $f_C$), the interfacial friction factor ($f_I$), geometrical parameters such as $S_L$, $S_C$, $S_I$, $A_C$, and $A_L$, and the liquid entrainment fraction ($F_E$), defined as the ratio of the entrained liquid mass flow rate in the gas core to the total liquid mass flow rate.
There are several different models for the geometrical parameters, such as the flat interface assumption that has been widely used by various studies [3-7], the apparent rough surface model that assumes a liquid film with a constant thickness [8,9], and the unified "double-circle" model and its simplified version applied to all inclination angles [10,11]. The geometrical parameters can be easily extended to annular flow by assuming a uniform film thickness [12,13].
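As an illustration of the simplest of these options, the flat interface geometry can be computed in closed form from the dimensionless liquid height. The following is a minimal sketch, not the unified model used later in the paper; the function name and dictionary keys are hypothetical, and zero liquid entrainment is assumed so that the gas core is pure gas.

```python
import math


def flat_interface_geometry(h_over_d: float, d: float) -> dict:
    """Flat-interface stratified geometry for a circular pipe.

    h_over_d: liquid height divided by pipe diameter (0 < h_over_d < 1).
    d: pipe inner diameter in metres.
    Names and return keys are illustrative assumptions.
    """
    if not 0.0 < h_over_d < 1.0:
        raise ValueError("h_over_d must lie strictly between 0 and 1")
    x = 2.0 * h_over_d - 1.0
    gamma = math.acos(x)             # half of the central angle of the gas-wetted arc
    chord = math.sqrt(1.0 - x * x)   # interface width divided by d
    area = math.pi * d ** 2 / 4.0
    a_c = 0.25 * d ** 2 * (gamma - x * chord)   # gas core area
    a_l = area - a_c                            # liquid area
    return {
        "A_L": a_l,
        "A_C": a_c,
        "S_L": d * (math.pi - gamma),  # liquid wetted perimeter
        "S_C": d * gamma,              # gas wetted perimeter
        "S_I": d * chord,              # interfacial perimeter
        "holdup": a_l / area,
    }


if __name__ == "__main__":
    print(flat_interface_geometry(0.5, 0.1))
```

For a half-full pipe the interface spans the full diameter and the holdup is 0.5, which gives a quick sanity check on the formulas.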
Previous studies have shown that the interfacial friction factor plays the most important role in model performance. There are dozens of different interfacial friction factor correlations available in the literature, each developed for specific experimental or field conditions. A previous study discussed the importance of the interfacial friction factor in gas-liquid stratified flow modeling and also emphasized the basic pitfalls in the commonly used single-phase closure relationships [1]. The inaccuracy was attributed primarily to the empirical nature of the correlations and the narrow range of conditions considered in their development. Using the correlations outside the conditions originally used to develop them can result in dramatic errors. Another problem is that the model prediction is discontinuous when switching from one correlation to another, which may cause problems in production design. Because no correlation has been built for a wide range of flow conditions, the industry is looking for generalized correlations or approaches that can capture all the parameters that affect the pressure gradient and liquid holdup prediction, such as gas and liquid flow rates, inclination angle, fluid properties, and pipe diameter.
The wall friction factors in Equations (4) and (5) are commonly calculated using correlations for single-phase flow as functions of the Reynolds number, given in Equations (7) and (8), using hydraulic diameters. The formulae for the liquid and gas hydraulic diameters are given in Equations (9) and (10) by assuming open-channel flow for the liquid phase and closed-duct flow for the gas phase, respectively [3,14].
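The hydraulic-diameter convention described above can be sketched as follows. This is an illustrative example only: the source's Equations (7)-(10) are not reproduced here, so the sketch assumes the common definitions in which the interface is excluded from the liquid perimeter (open channel) and included in the gas perimeter (closed duct). The smooth-pipe friction factor (16/Re laminar, a Blasius-type power law for turbulent flow) and the transition Reynolds number of 2000 are stand-in assumptions, not the correlations used in the paper.

```python
def hydraulic_diameters(a_l, a_c, s_l, s_c, s_i):
    """Return (d_L, d_C) under the open-channel / closed-duct assumption."""
    d_l = 4.0 * a_l / s_l            # liquid: interface treated as a free surface
    d_c = 4.0 * a_c / (s_c + s_i)    # gas core: interface treated as a wall
    return d_l, d_c


def reynolds_number(rho, velocity, d_h, mu):
    """Reynolds number based on the in-situ velocity and hydraulic diameter."""
    return rho * abs(velocity) * d_h / mu


def fanning_friction_smooth(re, re_transition=2000.0):
    """Illustrative smooth-pipe Fanning friction factor (assumed stand-in)."""
    if re <= 0.0:
        raise ValueError("Reynolds number must be positive")
    if re < re_transition:
        return 16.0 / re             # laminar
    return 0.046 * re ** -0.2        # turbulent, Blasius-type


if __name__ == "__main__":
    import math
    d = 0.1
    # Half-full pipe with a flat interface (values derived from geometry).
    a = math.pi * d ** 2 / 8.0
    d_l, d_c = hydraulic_diameters(a, a, math.pi * d / 2, math.pi * d / 2, d)
    re_l = reynolds_number(1000.0, 0.5, d_l, 1e-3)
    print(d_l, d_c, fanning_friction_smooth(re_l))
```

Any other single-phase correlation can be swapped in for `fanning_friction_smooth` without changing the rest of the calculation.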
Experimental data show that the use of the hydraulic diameter and a single-phase correlation for the wall friction factor may also result in inaccuracy [15]. This conclusion can be obtained by comparing the experimentally measured frictional pressure gradient with the one calculated from correlations, while assuming negligible errors from the geometrical and entrainment correlations. The experimentally measured frictional pressure gradient can be calculated from Equation (12), which is derived from Equations (1) and (2) by canceling out the interfacial shear stress terms, as shown in Equation (11). The frictional pressure gradient from correlations can be obtained from Equation (13), in which the liquid and gas core wall shear stresses are calculated from Equations (4), (5) and (7)-(10). In this study, we used the liquid holdup obtained from experiments to calculate the geometrical parameters using a unified geometrical prediction model developed for all inclination angles [11], with a zero liquid entrainment assumption. The comparison of the two calculated frictional pressure gradients is plotted in Figure 2a for various datasets from [16-19] that cover all inclination angles from 0° to 90° and wide pressure and liquid flow rate ranges; the agreement is unsatisfactory. The sensitivity study on the single-phase wall friction factor correlations from [20-24] also shows that the choice of correlation does not have a significant impact on this conclusion (Figure 2b).

$$\left(\frac{dp}{dL}\right)_{total} = \left(\frac{dp}{dL}\right)_{f} + \dots$$

Fluids 2022, 7, 12
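The cancellation of the interfacial shear stress can be made explicit by adding the liquid and gas-core momentum balances. The derivation below is a sketch that assumes a common pressure gradient in both phases, a total cross-sectional area $A = A_L + A_C$, and a split of the total gradient into frictional and gravitational parts; the sign convention and numbering may differ from the source's Equations (11)-(13).

```latex
% Sum of the liquid and gas-core momentum balances (\tau_I S_I cancels):
-A \frac{dp}{dL} - \tau_{WL} S_L - \tau_{WC} S_C
  - (A_L \rho_L + A_C \rho_C)\, g \sin\theta = 0

% Frictional and gravitational components of the total gradient:
\left(\frac{dp}{dL}\right)_{f} = -\frac{\tau_{WL} S_L + \tau_{WC} S_C}{A},
\qquad
\left(\frac{dp}{dL}\right)_{g} = -\frac{A_L \rho_L + A_C \rho_C}{A}\, g \sin\theta

% Frictional gradient recovered from a measured total gradient and holdup:
\left(\frac{dp}{dL}\right)_{f,\,exp}
  = \left(\frac{dp}{dL}\right)_{total,\,exp}
  + \frac{A_L \rho_L + A_C \rho_C}{A}\, g \sin\theta
```

With zero entrainment, $A_L / A$ is the measured liquid holdup, so the gravitational term follows directly from the experimental data.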

Application of Machine Learning in Multiphase Flow Modeling
As discussed in the previous section, closure relationships are used to estimate wall shear stresses and interfacial shear stress in a two-fluid model by empirically derived friction factor correlations [25]. Modeling of interfacial shear stress is one of the most critical issues in gas-liquid stratified flow models [1]. This is primarily due to the drag forces acting between the gas and liquid phases, which cause a considerable increase in the pressure gradient. Although mechanistic models are based on the fundamental laws of natural sciences, the closure relationships used in the process have a limitation of being applicable only under the conditions used for model development for acceptable accuracy [26,27].
In the past few years, multiphase flow solutions based on artificial intelligence have also been developed and have proven to be a promising way to model flow behavior. The advantage of data-driven models lies in their reduced computational cost and simpler approach, free of user-induced bias in the form of model assumptions. A primary focus in the application of machine learning in the past has been the prediction of flow patterns using pressure drop signal data, as discussed by Mi et al. (2001) and Shaban and Tavoularis (2014) [28,29]. Multiple other studies are based on fluid property experimental data and use support vector machines, neural networks, or deep learning as the machine learning tool. Some of the related work was conducted by Osman [30-36]. These studies show that machine learning is an effective tool for predicting flow patterns based on pipe characteristics, superficial velocities, and other fluid properties. Mohammadi (2020) proposed an unsupervised learning model to determine clusters for closure relationships and used a genetic algorithm to find a near-optimal set of closure relationships, validating the results statistically [37]. Lin (2020) incorporated physics by introducing a partial differential equation (PDE) aware deep learning model, but focused on solving the physical phenomenon of the saturation front [38].
Most of the previous studies are predominantly data-driven and did not capture the physics of the multiphase flow behavior, which limits their application range to the condition where the models were trained. In order to address the limitations of previous studies, this paper focuses on developing a machine learning model for friction factors and combining them with a physics-based two-fluid model to predict pressure drop and liquid holdup for segregated flow for a wide range of flow conditions and fluid properties.

Methods
In the following section, a hybrid physics-data-driven algorithm is proposed to model segregated flow using machine learning. Specifically, the new approach adopts the framework of a two-fluid model, incorporating the physics of gas and liquid segregated flow, and develops new generalized data-driven correlations for the liquid wall friction factor and interfacial friction factor using machine learning.

Model Development
Considering the inaccuracy of wall friction factor correlations developed for single-phase pipe flow, as discussed previously (Figure 2), we propose a new data-driven model for the liquid wall friction factor ($f_L$) in two-phase segregated flow, expressed as a function of the liquid wall friction factor for single-phase pipe flow ($f_{L-SP}$) and given in Equation (14). The liquid wall friction factor was chosen instead of the gas wall friction factor because in annular flow the gas has no contact with the pipe wall. The new data-driven model for $f_L$ is applicable to both stratified and annular flows.
In addition to the coefficient for the liquid wall friction factor, φ, we also propose a new data-driven machine learning model for $f_I$ that is intended to capture the complex behaviors under various flowing conditions and fluid properties. Both φ and $f_I$ can be calculated directly from experimental data for training purposes if both the pressure gradient and the liquid holdup are known. The formulas to determine φ and $f_I$ from experimental data sets are given in Equations (17) and (18), which are derived from Equations (15) and (16).
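The back-calculation of the two training targets can be illustrated with a short sketch. It is not the paper's Equations (15)-(18), which are not reproduced here. It assumes that φ acts as a multiplier on the single-phase liquid friction factor ($f_L = φ f_{L-SP}$), that the gas wall shear comes from a single-phase correlation, that the momentum balances of Equations (1) and (2) hold with a shared pressure gradient, and that the geometry has already been obtained from the measured holdup. All function and argument names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def back_calculate_targets(dpdl, a_l, a_c, s_l, s_c, s_i,
                           rho_l, rho_c, v_l, v_c,
                           f_c, f_l_sp, theta):
    """Return (phi, f_i) from a measured pressure gradient and geometry.

    dpdl is the total pressure gradient (negative in the flow direction);
    theta is the inclination from horizontal in radians.
    Assumes v_l != 0 and v_c != v_l.
    """
    sin_t = math.sin(theta)
    tau_wc = 0.5 * f_c * rho_c * v_c ** 2
    # Gas-core balance, Equation (2), solved for the interfacial shear stress.
    tau_i = (-a_c * dpdl - tau_wc * s_c - a_c * rho_c * G * sin_t) / s_i
    # Liquid balance, Equation (1), solved for the liquid wall shear stress.
    tau_wl = (-a_l * dpdl + tau_i * s_i - a_l * rho_l * G * sin_t) / s_l
    slip = v_c - v_l
    f_i = 2.0 * tau_i / (rho_c * slip * abs(slip))
    f_l = 2.0 * tau_wl / (rho_l * v_l ** 2)
    phi = f_l / f_l_sp   # assumed definition of the coefficient
    return phi, f_i


if __name__ == "__main__":
    phi, f_i = back_calculate_targets(
        dpdl=-323.75, a_l=0.004, a_c=0.004, s_l=0.157, s_c=0.157, s_i=0.1,
        rho_l=1000.0, rho_c=10.0, v_l=0.5, v_c=10.0,
        f_c=0.005, f_l_sp=0.006, theta=0.0)
    print(phi, f_i)
```

Each experimental point with a measured pressure gradient and holdup then yields one (φ, $f_I$) pair that serves as a regression target.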
In the current study, a data-driven methodology is proposed to model φ and $f_I$ to capture their non-linear relationships with flowing conditions and fluid properties for determining pressure drops and liquid holdups. The data source used in the model development covers a wide range of flowing conditions and fluid properties, discussed in the next section. The interfacial friction factor, $f_I$, and the coefficient, φ, back-calculated from experimental measurements become the target (response) variables, and machine learning is used to generate a correlation that accurately predicts these response variables with fluid properties and flowing parameters as input variables. Figure 3 shows the stepwise workflow of the model development using machine learning (Steps 1-5) and the implementation of the new model to predict pressure gradient and liquid holdup for segregated flow (Step 6).
The biggest advantage of the proposed method is that it preserves the physics of two-phase segregated flow while generating more robust and generalized interfacial and liquid wall friction factor correlations.

Dataset Description
The dataset was extracted from the open literature, including multiple studies where pressure gradient and liquid holdup were both reported, such as [16,18,19,39-48]. The two response variables, the coefficient for the liquid wall friction factor, φ, and the interfacial friction factor, $f_I$, were first estimated from the experimental data. These two parameters were then correlated with the input variables using three machine learning algorithms: random forest, gradient boosting, and an artificial neural network. The input parameters include liquid and gas superficial velocities ($v_{SL}$ and $v_{Sg}$), liquid and gas densities and viscosities ($\rho_L$, $\rho_G$, $\mu_L$, and $\mu_G$), surface tension (σ), pipe diameter (d), and inclination angle (θ). The total number of filtered data points is approximately 1500.
Exploratory Data Analysis (EDA) is a critical step for understanding quantitative variables in an experimental dataset and is often used as a visual tool to understand the high-dimensional dataset. This step plays a critical role in learning from the data during the scientific process of model building and testing and discovering patterns in the dataset. In this section, the dataset is analyzed in a univariate and multivariate graphical sense by looking at statistical distributions in the form of histograms, scatter plots, and box plots. Data pre-processing includes the process of data wrangling by identifying the irregularities, cleaning the data, identifying, and removing anomalies and outliers as well as data transformation and feature selection.
There are four basic questions about the data that need to be asked before starting the data analysis. The first question is whether the data are discrete or continuous. The second concerns symmetry: the distributions are inspected to identify whether skewness is present in the dataset. The third relates to the upper and lower bounds of the data, and the final question concerns the likelihood of observing extreme values in the distribution. The data used in the study are primarily continuous, with values in a finite interval. Figure 4 shows the univariate distribution of all the variables that form the dataset. This figure also shows the wide range covered by each variable, which is essential for creating a model with high generalization capability. As an example, it can be observed from the inclination distribution plot that the majority of studies have been conducted on horizontal pipes.
The bivariate distributions reveal significant skewness in liquid density, gas density, liquid viscosity, and gas viscosity. The skewed values are not necessarily outliers; rather, they form another subset of data points with lower frequency.
Box plots are often used as a graphical tool to display information for continuous univariate data and to identify highly influential points or possible outliers. The classical method from Tukey (1977) [49] was used to remove outliers, with a threshold of 1.5 times the interquartile range (IQR), which is defined as the difference between the third and first quartiles [50]. Predictions for φ and $f_I$ using the two-fluid model were analyzed to identify and remove outliers. The results for φ and $f_I$ before and after the outlier removal are shown in Figure 5.
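The Tukey rule described above can be sketched in a few lines. The example below is illustrative only: the function name is hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the source does not state which convention was used.

```python
import statistics


def tukey_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 100]
    print(tukey_filter(sample))  # the value 100 lies above the upper fence
```

Different quartile conventions can shift the fences slightly on small samples, so the convention should be fixed before comparing filtered datasets.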

Machine Learning Algorithms
Three different machine learning algorithms are implemented for the dataset: Random Forest [51], eXtreme Gradient Boosting (XGBoost) [52], and Artificial Neural Networks [53]. The objective is to generate a predictive model for φ and $f_I$ with the highest accuracy. Although these three algorithms can be used for both regression and classification problems, the focus of this work is supervised regression, since the dataset has continuous variables with no categorical information. A primary reason for selecting these algorithms is their strong ability to account for the non-linearity associated with the predictor variables or features [54]. A detailed explanation of each algorithm is readily available in the literature and is omitted from this paper.

Model Evaluation-Accuracy Metrics
The model performance is comprehensively evaluated against other existing methods available in the literature to show the improvement in predicting pressure gradient and liquid holdup. To quantify the performance of different regression models, accuracy metrics such as the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R² are commonly used. The mathematical definitions of these error metrics are provided below. The error for each prediction is defined as the difference between the measured and predicted values.
Root Mean Squared Error (RMSE): This metric is defined as the square root of the average squared difference between the measured value and the prediction. It represents the sample standard deviation of the residuals, which is a measure of how concentrated the data are around the line of best fit. It can be calculated by Equation (19).
Mean Absolute Error (MAE): This metric is the average of the absolute differences between the predicted and observed values and provides a linear score in which all individual differences are weighted equally. It can be calculated by Equation (20).
R²: Defined as the coefficient of determination, R² expresses the goodness of fit of a model on an intuitive scale varying between 0 and 1. Equivalently, it is the proportion of the variance in the dependent variable that is explained by the independent variable(s). A low R² indicates a low level of correlation, while a score of 1 indicates a perfect correlation between measured and predicted values with no unexplained variance. Equation (21) can be used to determine the metric.
Average Absolute Percentage Relative Error (AAPRE): This metric is derived from the absolute error and indicates how good a prediction is relative to the observed value. It can be calculated from Equation (22).
In the equations above, $y_j$ represents the experimentally measured value for the $j$-th data point, $\hat{y}_j$ represents the predicted value from the machine learning model, $\bar{y}$ denotes the mean of the measured values, and $n$ is the total number of data points.
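The four metrics can be written compactly in code. The sketch below uses the standard textbook definitions, which are assumed to match Equations (19)-(22); the function name and the dictionary keys are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R^2 and AAPRE (in percent) for paired sequences.

    Assumes equal-length, non-empty inputs, non-constant y_true (for R^2)
    and no zero measured values (for AAPRE).
    """
    n = len(y_true)
    errors = [y - yh for y, yh in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)      # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "aapre": 100.0 / n * sum(abs(e / y) for e, y in zip(errors, y_true)),
    }


if __name__ == "__main__":
    print(regression_metrics([2.0, 4.0], [1.0, 5.0]))
```

Note that R² computed this way can become negative when a model fits worse than the mean of the measurements, which is why it is usually reported alongside an absolute error metric.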

Results
Predictive modeling results for different regression models along with different accuracy metrics are discussed in this section. Table 1 shows the results in a tabulated format, broken down into the base model and random search cross-validated results, for the three machine learning algorithms analyzed in this work.
Hyper-parameter tuning was carried out for each machine learning algorithm, and predictions for φ and $f_I$ were made. When the performance was averaged over the predictions for both φ and $f_I$, XGBoost performed best in terms of RMSE on the test dataset and was selected as the final model. The learning curve after hyper-parameter tuning for the interfacial friction factor is shown in Figure 6. An 'early stopping' approach is used to prevent overfitting: the user specifies a number of iterations (epochs) after which the algorithm stops if the validation score has not improved. Cross plots between the known values and their corresponding predictions are shown in Figure 7. Once the predictions are made for the dataset, the error can be calculated between the experimental and predicted values.
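The early stopping rule can be illustrated independently of any particular library. The following sketch is a generic patience-based rule applied to a recorded sequence of validation errors (lower is better); it is not XGBoost's own implementation, and the function name and `patience` argument are hypothetical.

```python
def early_stopping_round(val_errors, patience):
    """Return (best_round, stop_round) for a sequence of validation errors.

    Training stops at the first round that is `patience` rounds past the
    best round without any improvement; otherwise it runs to the end.
    """
    best_error = float("inf")
    best_round = -1
    for i, err in enumerate(val_errors):
        if err < best_error:
            best_error, best_round = err, i      # new best: reset the counter
        elif i - best_round >= patience:
            return best_round, i                 # no improvement for `patience` rounds
    return best_round, len(val_errors) - 1


if __name__ == "__main__":
    history = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
    print(early_stopping_round(history, patience=3))
```

In this example the error at round 6 would have been lower, which shows the trade-off: a small patience saves iterations but can stop before a late improvement.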
XGBoost also provides the capability to determine feature importance, which helps identify the predictors with the highest impact on the model. Figure 8 shows the feature importance results for the current dataset. The metric shown in the figure is the 'F-score' (weight), which is determined simply by how many times each feature is used to split the data across all trees.
The next step is to evaluate the model in terms of the pressure gradient and liquid holdup predictions, using the physics-based two-fluid model, as shown in Step 6 of the workflow in Figure 3. Figures 9 and 10 compare the experimentally determined values with the predictions from the proposed model for pressure drop and liquid holdup, respectively, for the entire dataset on the left and for each individual data source on the right. In general, the model gives good predictions for all data sets.
The results were also compared with other existing models in the literature. The primary equations of these models are based on the two-fluid model, combined with different correlations for the geometrical parameters and the interfacial friction factor ($f_I$). The two wetted wall fraction correlations tested were Zhang and Sarica (2011) [11] and Taitel and Dukler (1976) [3]. Churchill (1977) was used to determine the wall friction factors [21]. In total, 19 different interfacial friction factor correlations from the literature were tested, including [3,5,7,8,12,55-64], which led to a total of 38 different combinations of wetted wall fraction and interfacial friction factor correlations. The list was divided into different cases, referred to in Figures 11 and 12, and is included in Appendix A.
Model performance is gauged through metrics such as the average absolute relative error (AAPRE) and root mean squared error (RMSE). Figure 11 shows the performance in the form of bar charts for both metrics during the prediction of the pressure gradient. The comparison shows that the new model has the lowest average absolute relative error and RMSE.
Predictions are also made for liquid holdup, and comparisons of the results using the same metrics are shown in Figure 12. Based on the error metrics, the new model proposed in this work provides the lowest error in comparison to two-fluid models that use empirically derived interfacial friction factor correlations.
In addition to liquid holdup and pressure gradient predictions, the new model developed in this study was incorporated into a state-of-the-art mechanistic model for predicting the onset of liquid accumulation, proposed in [65]. That model is based on liquid film reversal in segregated flow and requires pressure gradient and liquid holdup predictions for segregated flow. The critical gas velocity determined from the coupled mechanistic model and the proposed ML algorithm was further evaluated against existing droplet and film reversal models from [25,66-74], as listed in Table 2. It is worth mentioning that the critical gas velocity refers to the minimum gas superficial velocity that maintains segregated flow in upward inclined pipes. The experimental data used in the model evaluation were extracted from [16,17,19,48,64,65,74-76]. Details of the mechanistic model and the experimental datasets are well described in [65] and are omitted from this paper.

[Table 2: table content not recovered; surviving entry: 6, Zhou and Yuan (2010).]

As expected, the liquid film models provide better predictions than the liquid droplet models, which is consistent with the findings of previous studies [65,74]. Overall, the new model gives better predictions of the critical gas velocity than the other models evaluated in this study. This is illustrated by its AARE score of 12.78 units, the lowest in Figure 13, which compares the existing models with the model proposed in this paper.

Summary
Segregated flow is one of the most commonly encountered flow patterns in practical applications such as the chemical, nuclear, refrigeration, and oil and gas industries. Accurate prediction of its pressure gradient and liquid holdup is an important consideration for facility design and operations. It is also crucial for predicting the onset of liquid accumulation in the petroleum industry, which needs to be prevented or carefully managed during oil and gas production. Previous studies have found that the interfacial friction factor plays a crucial role in two-fluid model performance. The conventional approach is to use empirically derived correlations, which are applicable only to the small subset of working conditions under which they were derived.
This work proposes a novel and more generalized approach that augments a physics-based two-fluid model with machine learning algorithms to accurately predict pressure drop and liquid holdup. The new model is comprehensively evaluated against existing models with various closure relationships and shows a significant improvement in prediction accuracy. The average absolute percentage error between experimentally determined critical gas velocities and model predictions was also compared for both droplet-based and film-reversal-based models. The hybrid model gives the best prediction of the critical gas velocity among the modeling approaches considered.
The proposed modeling workflow reduces the dependence on empirically derived correlations and removes the need to select an interfacial friction factor correlation. With the gradual transition to the digital oil field, more field data will become available in the future, which can be used to augment physics-based mechanistic multiphase flow modeling. By combining physics-based and data-driven models, the approach proposed in this study adds significant value to segregated flow modeling and, in turn, to the optimal design of production systems.

Discussion
The hybrid machine learning and mechanistic model presented in this work, which determines the pressure gradient, liquid holdup, and critical gas velocity, provides better accuracy than models from the literature, which are primarily mechanistic or empirical. This has been made possible by leveraging the predictive capabilities of a machine learning algorithm along with the causal understanding of fluid flow mechanisms provided by a mechanistic model. However, this conclusion comes with the caveat that a significant amount of good-quality data must be available before any machine learning model is implemented. It is important to understand that machine learning and mechanistic modeling are two different paradigms and should never be considered to compete with or replace one another.
According to Baker (2018) [77], if large-scale datasets are available, machine learning algorithms can be an efficient and scalable modeling approach and can avoid the need to understand complex mechanisms. Advanced machine learning algorithms have been shown to identify hidden correlations between parameters that are not easy to identify through conventional approaches. One of the biggest drawbacks of a purely data-driven model is its limited applicability and generalization outside the range of conditions on which it was trained. This is where a mechanistic model has an advantage, since its predictions, based on fundamental laws of nature, can be applied under conditions where experiments are either costly or difficult to perform. As discussed earlier, a mechanistic model relies on the generation of novel hypotheses and is usually built to mimic real-life events under certain assumptions. In some cases, oversimplified assumptions and the extremely specific nature of the closures, commonly found in empirical models, prevent the model predictions from being applicable on a wider scale. This paper shows how a symbiotic relationship can be created between machine learning and mechanistic modeling by harnessing the strengths of both approaches.

A_C: Gas core occupied cross-sectional area, m²
A_L: Liquid film occupied cross-sectional area, m²
A_p: Pipe cross-sectional area, m²
d: Pipe inner diameter, m
d_L: Liquid hydraulic diameter, m
d_C: Gas core hydraulic diameter, m
(dp/dL)_total: Total pressure gradient, Pa/m
(dp/dL)_L: Liquid phase pressure gradient, Pa/m
(dp/dL)_C: Gas core pressure gradient, Pa/m
(dp/dL)_f: Frictional component of pressure gradient, Pa/m
(dp/dL)_g: Gravitational component of pressure gradient, Pa/m
f_C: Gas wall friction factor, -
f_G-SP: Gas wall friction factor for single phase pipe flow, -
f_I: Interfacial friction factor, -
f_L: Liquid wall friction factor, -
f_L-SP: Liquid wall friction factor for single phase pipe flow, -
F_E: Entrainment fraction, -

Table indicating case number and the corresponding combination of geometrical parameters, wall friction factor, and interfacial friction factor used for model validation.