Prediction of Oil Recovery Factor in Stratified Reservoirs after Immiscible Water-Alternating Gas Injection Based on PSO-, GSA-, GWO-, and GA-LSSVM

In this study, we solve the challenge of predicting oil recovery factor (RF) in layered heterogeneous reservoirs after 1.5 pore volumes of water-, gasor water-alternating-gas (WAG) injection. A dataset of ~2500 reservoir simulations is analyzed based on a Black Oil 2D Model with different combinations of reservoir heterogeneity, WAG hysteresis, gravity influence, mobility ratios and WAG ratios. In the first model MOD1, RF is correlated with one input (an effective WAG mobility ratio M∗). Good correlation (Pearson coefficient −0.94), but with scatter, motivated a second model MOD2 using eight input parameters: water–oil and gas–oil mobility ratios, water–oil and gas–oil gravity numbers, a reservoir heterogeneity factor, two hysteresis parameters and water fraction. The two mobility ratios exhibited the strongest correlation with RF (Pearson coefficient −0.57 for gas-oil and −0.48 for water-oil). LSSVM was applied in MOD2 and trained using different optimizers: PSO, GA, GWO and GSA. A physics-based adaptation of the dataset was proposed to properly handle the single-phase injection. A total of 70% of the data was used for training, 15% for validation and 15% for testing. GWO and PSO optimized the model equally well (R2 = 0.9965 on the validation set), slightly better than GA and GSA (R2 = 0.9963). The performance metrics for MOD1 in the total dataset were: RMSE = 0.050 and R2 = 0.889; MOD2: RMSE = 0.0080 and R2 = 0.998. WAG outperformed single-phase injection, in some cases with 0.3 units higher RF. The benefits of WAG increased with stronger hysteresis. The LSSVM model could be trained to be less dependent on hysteresis and the non-injected phase during single-phase injection.


Introduction
Oil recovery through water or gas injection often lacks efficiency due to the unfavorable mobility ratio between the oil and the displacing phase. Viscous fingering, gravity segregation and heterogeneity can also lead to poor sweep. Gas features low viscosity and density and can achieve channeling and early breakthrough [1,2]. Water-alternating gas injection (WAG) is an enhanced oil recovery (EOR) technique in which water and gas are injected in cycles to displace the oil. This type of technique mitigates the exponential rate decline seen in most fields after peak production [3]. The mobility of each injected fluid is reduced by the presence of the other, producing a more favorable mobility ratio to oil. Gravity segregation becomes less detrimental, since gas sweeps the top of the reservoir, while water sweeps the bottom [2,4]. Gas usually results in lower residual oil saturation than water, but this can be further lowered by WAG. Thus, WAG utilizes the advantages of water and gas injection and minimizes their individual downsides. In several field -How well do the models predict WAG performance? -Which parameters affect RF the most? -Do the parameters have a positive or negative effect on RF? -Will the models properly account for WAG injection and single phase injection?
The paper is structured as follows. The model serving as the basis for the simulation results is outlined in Section 2.1. The dimensionless number M * is outlined in Section 2.2. This number is used in the single-input parameter model. The machine learning approach and dataset follow in Section 2.3. The eight input parameters of the second approach are also presented in those latter sections. The results from analyzing the data are shown in Section 3 and the paper is concluded in Section 4.

Mathematical Model
We consider the same modeling approach for immiscible WAG injection as [47]: A 2D reservoir layered in a horizontal direction, with one injector and one producer, both vertical and perforated along the full reservoir height. See Figure 1 for an illustration. A black oil model is assumed with an incompressible and immiscible three-phase flow of oil, water and gas and negligible capillary pressure. WAG is applied from the start, rather than as a tertiary method. Relevant equations are presented below: u Tx = −K x λ T ∂ x p, u Tz = −K z λ T ∂ z p + K z g λ o ρ o + λ w ρ w + λ g ρ g where φ denotes porosity, s i saturation of phase i, f i fractional flow, K permeability, λ i mobility, ρ i density, u T total Darcy flux and p pressure. Corey correlation was applied for relative permeabilities, while gas relative permeability hysteresis was incorporated using Land's trapping model [12] (which reduces the mobile gas saturation interval based on the parameter C) and Carlson's hysteresis model [15] with parameter α (which reduces gas relative permeability). Nygård and Andersen [47] ran simulations systematically to investigate the role of gravity segregation, the mobility ratios between the three phases, heterogeneity, hysteresis and WAG ratio and how they affected RF after 1.5 pore volumes of fluid were injected. The simulations were scaled using a combined dimensionless mobility ratio M * stating how effectively the injected fluids displaced oil under the given conditions, summarized as follows. In the design of this number, the mechanisms were incorporated one at a time. We refer to Tables A1-A3 in Appendix A for several important simulation input parameters or model configurations that were constant in the simulations. More details can be found in the original paper. The fact that these parameters remained constant was mainly due to prioritization. However, these input parameters were incorporated in the dimensionless numbers presented in the following sections.
Where denotes porosity, saturation of phase , fractional flow bility, mobility, density, total Darcy flux and pressure. Core was applied for relative permeabilities, while gas relative permeability hyste corporated using Land's trapping model [12] (which reduces the mobile g interval based on the parameter ) and Carlson's hysteresis model [15] with (which reduces gas relative permeability).
Nygård and Andersen [47] ran simulations systematically to investiga gravity segregation, the mobility ratios between the three phases, heterogen sis and WAG ratio and how they affected RF after 1.5 pore volumes of fluid w The simulations were scaled using a combined dimensionless mobility rati how effectively the injected fluids displaced oil under the given conditions, as follows. In the design of this number, the mechanisms were incorporated We refer to Tables A1-A3 in Appendix A for several important simulation in ters or model configurations that were constant in the simulations. More d found in the original paper. The fact that these parameters remained constan due to prioritization. However, these input parameters were incorporated i sionless numbers presented in the following sections.  [47]). is the the injector, while is the distance from the top of the reservoir.
Oil recovery factor (RF) is the output parameter of interest, defined as: Every grid block features same dimension Δ Δ . RF is reported after umes are injected.  [47]). is the distance from the injector, while is the distance from the top of the reservoir.
Oil recovery factor (RF) is the output parameter of interest, defined as: RF = volume oil produced volume oil initially in place Every grid block features same dimension ∆x∆z. RF is reported after 1.5 pore volumes are injected.

WAG Efficiency Characterization Using Dimensionless Number
The characteristic mobility ratio M * defined by Nygård and Andersen [47] features the following functional relation: r w is the volume fraction of water in each cycle. Larger M * is associated with lower recovery factor. Characteristic two-phase mobilities λ * i for each phase i were found by s i,max denotes the saturation of phase i where end-point relative permeability k max ri is obtained, n i is the Corey exponent and µ i is viscosity. A heterogeneity factor F H was derived from the horizontal permeability K xj and layer height h j distribution over layers j = 1 : N L : Two-phase gravity numbers were defined using the ratio of two-phase segregation time t seg and the residence time t res of the injected phase: It was found that the role of gravity depended on heterogeneity and two-phase gravity factors F w/o G , F w/o G accounting for this coupling were introduced: Note the unitless tuning parameters a 1 = 3 and a 2 = 0.5. Finally, hysteresis was incorporated into the gas characteristic relative permeability. Land's parameter C defines a hysteresis residual gas saturation s hyst gr : s hyst gr = s gr + s g,max − s gr 1 + C s g,max − s gr (13) A further modification according to r w was made: Additionally, the gas relative permeability end point k max rg in λ * g , (see Equation (8)), was reduced due to hysteresis. The reductions were performed individually for the gas-oil mobility ratio and the gas-oil gravity number first based on the parameter α and heterogeneity factor F H using unitless tuning parameters b 1 = 1, b 2 = 0.5 and b 3 = 10, b 4 = 2: Next, the fraction r w was incorporated according to: We then obtained the hysteresis-corrected characteristic gas mobilities λ * g,M and λ * g,N G by replacing s gr with s wag gr in (8), while the end-point relative permeability k max rg in (8) (11): Every input parameter is incorporated in the dimensionless number M * . Note that during single-phase injection (r w = 0 or 1), the two-phase parameters involving the phase not injected do not affect M * . Similarly, hysteresis does not affect M * during singlephase injection.

Model Input Parameters
In the first model, MOD1, we take as the only input parameter to predict RF. We also consider a machine learning model (MOD2) in which the following eight dimensionless numbers are used as input parameters: x 1 = r w , x 2 = log 10 F H , x 3 = α, x 4 = log 10 C, These numbers reflect injected fluid fractions x 1 , heterogeneity x 2 , hysteresis x 3 , x 4 , relative magnitude of fluid mobilities x 5 , x 6 and gravity vs. advective forces x 7 , x 8 . They incorporate all the input parameters used in the Eclipse model and the number M * . The overall workflow is demonstrated in Figure 2, where the two modeling approaches after the data collection step are indicated. In MOD1, a polynomial regression is performed, while machine learning is used for MOD2. The data and the detailed steps for developing MOD2 are explained below.

Reservoir Simulation Dataset and Model Approaches
In addition to the 1648 WAG and 96 single phase injection simulations generated by [47], 824 new WAG simulations were performed with new C and α values combined with existing combinations of heterogeneity, density and mobility. In the previous study, C and α were selected primarily to cover no or significant hysteresis. The values α = 0 and C = 1000 were assigned to points without hysteresis influence from the respective parameters. For MOD1, each simulation allows the calculation of M * , which is input to the corresponding output RF. The 1648 + 96 + 824 = 2568 data points were analyzed with MOD1 and correlated using a polynomial expression between RF and x 0 .

Reservoir Simulation Dataset and Model Approaches
In addition to the 1648 WAG and 96 single phase injection simulations generated by [47], 824 new WAG simulations were performed with new and values combined with existing combinations of heterogeneity, density and mobility. In the previous study, and were selected primarily to cover no or significant hysteresis. The values 0 and 1000 were assigned to points without hysteresis influence from the respective parameters. For MOD1, each simulation allows the calculation of * , which is input to the corresponding output RF. The 1648 96 824 2568 data points were analyzed with MOD1 and correlated using a polynomial expression between RF and . For the single-phase injection of water 1 , values for ⁄ and ⁄ * are not well defined since gas is not injected. The case is similar regarding ⁄ and ⁄ * at 0 (gas injection). Further, hysteresis parameters , should not matter. The insensitivity of * to the mentioned parameters under single-phase injection was ensured during its derivation. To properly define the input values under single-phase injection for MOD2 the following approach was taken: Each single-phase data point was duplicated to 16 data points, in which all combinations of high and low values of the four missing parameters were assigned. Specifically, for gas injection (points with 0, indexed 'g'), the following values were set for , , , : For the single-phase injection of water (r w = 1), values for N g/o G and M * g/o are not well defined since gas is not injected. The case is similar regarding N w/o G and M * w/o at r w = 0 (gas injection). Further, hysteresis parameters C, α should not matter. The insensitivity of M * to the mentioned parameters under single-phase injection was ensured during its derivation. To properly define the input values under single-phase injection for MOD2 the following approach was taken: Each single-phase data point was duplicated to 16 data points, in which all combinations of high and low values of the four missing parameters were assigned. Specifically, for gas injection (points with r w = 0, indexed 'g'), the following values were set for x 3 , x 4 , x 6 , x 7 : x 6,g = x 6,WAG ± Xσ 6,WAG , x 7,g = x 7,WAG ± Xσ 7,WAG , (20) where x i,WAG (i = 3, 4, 6, 7) indicate the average of the data point values applied in the WAG cases and σ i,WAG (i = 3, 4, 6, 7) the corresponding standard deviations. X is a multiplier. Similarly, for water injection (indexed 'w') the following values were set for x 3 , x 4 , x 5 , x 8 : x 3,w = x 3,WAG ± Xσ 3,WAG , x 4,w = x 4,WAG ± Xσ 4,WAG , The 96 single-phase simulations resulted in 16 · 96 = 1536 points to the ML model. In total, 1648 + 824 + 16 · 96 = 4008 data points were then applied in MOD2.

Machine Learning Dataset Preparation
The 2472 WAG cases and 96 single-phase injection cases were both divided randomly between three sets: 70% in the training set, 15% in the validation set and the remaining 15% in the testing set [48]. Single-phase data points within each set were further split into 16 points, as described previously. See Table 1 for an overview of points in the models and datasets. We apply LSSVM with radial basis kernel (RBK) function and either PSO, GWO, GA or GSA as optimizers. Each of the optimizers features its own strengths and disadvantages in finding global optima efficiently, as well as depending on its individual tuning parameters. They are swarm-based algorithms, making use of many potential solutions simultaneously and improving these solutions according to those performing best at a given iteration. The result is the existence of differently optimized LSSVM models, as indicated in Figure 2. Detailed explanations of the algorithms are provided in Appendices B and C, respectively. Each input parameter x i was normalized (denoted x N ) to a range between −1 and +1 based on the maximum and minimum values of the total dataset.
Assuming predefined values of the metaparameters (σ, γ), LSSVM provides the function y(x) and its coefficients α k , b that minimize the error between the model predictions and observations of a given dataset (usually the training dataset) for those parameter choices [27]: For given metaparameters, the LSSVM algorithm is calibrated on the training set to provide choices of α k , b. The optimizer algorithm is used to search for the metaparameters σ, γ that minimize the model prediction error on the validation dataset (i.e., many LSSVM models are calibrated on the training set and the one giving best prediction on the validation set is taken as the best). This systematically determines the best choice of metaparameters to avoid over-or under-fitting. The optimized LSSVM models were finally used to predict the data in the testing set. The model performing best overall was selected for further sensitivity analysis. For proper comparison, the optimizers were implemented with the same random initial solutions guesses (and velocities if applicable), search space and number of iterations. The optimizer parameters can be found in Table A4.
An advantage of LSSVM is its few (two) metaparameters and the automated optimization of its internal tuning parameters [27]. In comparison, artificial neural networks often need subjective selection of the number of nodes and layers and then comprehensive tuning of a vast number of weights and biases to train the network [49].
The correlation between input variables x and output y (RF) is quantified using Pearson correlation r P xy , Spearman rank r Sp xy and distance correlation r D xy coefficients. These indices, respectively, indicate linear correlation, nonlinear monotonic correlation and nonlinear nonmonotonic correlation. The former two range between −1 and 1, while the latter ranges from 0 to 1. For all of them, 0 indicates no correlation. The goodness-of-fit between the calculated RF from MOD1 or MOD2 and the data values of RF were quantified using the coefficient of determination R 2 and the root mean square error RMSE. The definitions of the mentioned quantities are in Appendix D.

Preliminary Dataset Analysis
The range and mean of the data points used in MOD1 and MOD2 (using X = 0.5) are listed in Table 2. The use of logarithms made the range of the different variables span a few units rather than orders of magnitude.  The Pearson, Spearman and distance correlation coefficients evaluated between RF and the input parameters were calculated for the two model datasets and are listed in Table 3. For MOD1, the two former coefficients were ≈ −0.94 and the latter 0.93. Their magnitude being close to 1 indicates a strong linear correlation between RF and x 0 = log 10 M * and the negative sign indicates that a larger M * reduces RF.
For MOD2, Pearson correlation coefficients were reported for WAG cases only and for all cases when X = 0.25, 0.5 and 1 in Table 3. Spearman rank and distance correlation were calculated only for X = 0.5. Considering the WAG cases, several variables correlate with RF, especially x 5 , x 6 which are the log of gas/oil and water/oil mobility ratios. They feature Pearson coefficients r P xy ∼ −0.5 to −0.6 indicating that when they increase (less favorable mobility ratio towards the oil), RF is reduced. Heterogeneity, represented by x 2 , also correlates with RF with a lower r P xy ∼ −0.27, indicating that RF generally reduces when the heterogeneity factor increases. The hysteresis parameters x 3 , x 4 correlate with RF in opposite ways to each other, with r P xy ∼ 0.15 for x 3 and −0.15 for x 4 . When x 3 (i.e., α) increases, gas relative permeability is reduced and should improve RF. Higher x 4 (i.e., log C) leads to less gas trapping and RF therefore decreases. The gravity numbers feature relatively poor Pearson correlation with RF, with r P xy ∼ 0.08 for water-oil and 0.0045 for gasoil. Similarly, the water volume fraction correlates little with RF, and r P xy ∼ −0.05 is slightly negative. These results could be related to their coupled nature, as is discussed below. Table 3. Pearson, Spearman and Distance correlation coefficients r xy evaluated for the total dataset between RF and the involved input parameters for MOD1 and MOD2.

MOD1
r P xy r Sp xy When considering the MOD2 datasets with single-phase data included for different X, we note that the magnitude and sign of the different Pearson coefficients are similar to when only the WAG cases were considered. The main difference is that the correlation is somewhat lower, especially the parameters with unspecified information during singlephase injection. This was expected, since we added points where RF does not vary with changes in these parameters.
When evaluating the dataset with Pearson rank correlation for X = 0.5, we observe similar, but slightly lower values as for the Pearson coefficient, except for x 8 , where the correlation doubles, but remains very low. When considering the distance correlation coefficient, however, several input parameters correlate more strongly with recovery, indicating that their relation is nonmonotonic. In particular, the water fraction, x 1 , features a higher distance correlation coefficient, of 0.19. WAG was expected to perform better than single-phase injection, with RF not changing linearly with x 1 , but peaking. Gravity can be a cause of both low and improved sweep and the gravity numbers x 7 , x 8 feature distance correlation coefficients around 0.08, where more impact is attributed the gas-oil gravity number in particular. Furthermore, the hysteresis parameters x 3 , x 4 now seem to correlate more strongly. The three correlation coefficients are similar for x 2 , indicating a relatively linear and monotic relation, so if all other parameters are constant, increased heterogeneity should reduce recovery.
Note that all the variables in MOD2 feature less correlation than the variable x 0 in MOD1, since they individually do not contain all the involved system parameters. The aim is for them to provide better predictions when combined.

Development of MOD1
RF is plotted against x 0 in Figure 3 and demonstrates a clear correlation where higher x 0 gives lower RF. There is also significant scatter, meaning a given x 0 can be associated with a range of RF values. The data were fitted to a third-order polynomial function, given by the blue curve in Figure 3 and Equation (24). A higher order polynomial did not further reduce the RMSE, which means the remaining error was associated primarily with the scatter in the data. The performance of the model is also shown by comparing the estimated RF and the data RF in Figure 4a together with a histogram, Figure 4b, of the residual errors (the difference between the estimated RF and the data point RF). The R 2 = 0.889 is relatively high. As seen in the histogram, the residuals are symmetrically distributed around zero and roughly 95% of the points estimate RF correctly within 0.1. The RMSE, which can be considered a more typical error, is 0.050.  The performance of the model is also shown by comparing the estimated RF and the data RF in Figure 4a together with a histogram, Figure 4b, of the residual errors (the difference between the estimated RF and the data point RF). The R 2 = 0.889 is relatively high. As seen in the histogram, the residuals are symmetrically distributed around zero and roughly 95% of the points estimate RF correctly within ±0.1. The RMSE, which can be considered a more typical error, is 0.050.

Development of LSSVM Model MOD2
The ML model MOD2 was developed using LSSVM and a dataset assuming 0.5.
The best LSSVM model was determined using different approaches. First, a random choice of metaparameters , 1,1 was used. Next, LSSVM was applied with the different optimization algorithms, PSO, GA, GWO and GSA, to systematically find the

Development of LSSVM Model MOD2
The ML model MOD2 was developed using LSSVM and a dataset assuming X = 0.5. The best LSSVM model was determined using different approaches. First, a random choice of metaparameters (σ, γ) = (1, 1) was used. Next, LSSVM was applied with the different optimization algorithms, PSO, GA, GWO and GSA, to systematically find the best metaparameters. As previously mentioned, for any combination of the metaparameters, LSSVM models were fitted to the training set and used to forecast the validation set. The metaparameters that resulted in the best performance in the validation set, after using a given optimizer algorithm, determined the best model. The test set was then forecasted.
In Figure 5, the performance of the different algorithms is illustrated as a function of the iterations performed. R 2 and RMSE for the validation set are plotted for the best solution at the given iteration, together with the corresponding values of log γ and log σ in plots a to d, respectively. The same initial solutions and number of iterations were applied in all the algorithms (different colors). Two different initializations were applied for robustness (dashed and full lines). The best metaparameters obtained during the 30 iterations are listed in Table 4, considering all four algorithms and both initializations. The corresponding metrics (R 2 and RMSE) were calculated on the training, validation and testing sets. All the optimized models performed better than the algorithm with the arbitrarily preset metaparameters, alt-  In all cases, a high R 2 ≈ 0.996 and low RMSE ≈ 0.009 were obtained after 30 iterations for all the algorithms and both starting points, although GSA deviated from initially good solutions and converged slowly or to inferior solutions. Furthermore, GA seemed to not produce as good results as PSO and GWO. The two algorithms, PSO and GWO, exhibited very similar values of log γ ≈ 5.5 and log σ ≈ 0.3 and the lowest error indicating that they were better able to find the global optima. Notably, the values of log γ and log σ varied significantly during the iterations, but mostly exhibited very good performance. This may have been due to the ability of LSSVM to tune its internal parameters α i and b for any given γ, σ.
The best metaparameters obtained during the 30 iterations are listed in Table 4, considering all four algorithms and both initializations. The corresponding metrics (R 2 and RMSE) were calculated on the training, validation and testing sets. All the optimized models performed better than the algorithm with the arbitrarily preset metaparameters, although this choice also performed well, with R 2 greater than 0.969 on all three sets. The optimized models exhibited very consistent performance in the three datasets, with R 2 ≈ 0.999 in the training set, ≈0.996 in the validation set and ≈0.992 on the testing set. The difference in R 2 was in the fourth digit for the first two sets and the third digit in the latter set. The RMSE was around 0.006 on the training set, 0.009 on the validation set and 0.015 on the test set, with GSA standing out with the highest RMSE. As final metaparameter values in the optimized model, we took an average of the four similar results from the PSO and GWO runs with two significant digits. Calculating the RMSE and R 2 metrics on the datasets confirmed that the performance with these values was still optimal (see Table 4). The LSSVM model with these parameters is referred to as MOD2 in what follows. Note especially that MOD2 is capable of predicting unseen single-phase data and thus accounts for the physics introduced during the modification of the dataset. The RMSE and R 2 were calculated with MOD2 for the total dataset as 0.0080 and 0.9976, respectively. These metrics are greatly improved compared to MOD1, which featured a corresponding RMSE of 0.0498 and an R 2 of 0.889. The calculated (with MOD2) and observed RF data are plotted against each other for the three datasets in Figure 6. For all three datasets, there is little scatter around the perfect match line. The residual errors were calculated for each datapoint in the full dataset and the results are plotted as a histogram The RMSE and R 2 were calculated with MOD2 for the total dataset as 0.0080 and 0.9976, respectively. These metrics are greatly improved compared to MOD1, which featured a corresponding RMSE of 0.0498 and an R 2 of 0.889. The calculated (with MOD2) and observed RF data are plotted against each other for the three datasets in Figure 6. For all three datasets, there is little scatter around the perfect match line. The residual errors were calculated for each datapoint in the full dataset and the results are plotted as a histogram in Figure 7. Approximately 90% of the data feature errors in the estimated RF of less than 0.01, and 95% of the data feature errors less than 0.02.  Partial derivatives with respect to each normalized variable, , were calculated for each data point using MOD2 and histograms were created for each variable, as shown in Figure 8. The derivatives were calculated numerically and two choices of Δ 5 ⋅ 10 and 10 were used. The two choices produced practically identical histograms, suggest-  , were calculated for each data point using MOD2 and histograms were created for each variable, as shown in Figure 8. The derivatives were calculated numerically and two choices of ∆x N = 5 · 10 −2 and 10 −3 were used. The two choices produced practically identical histograms, suggesting that the optimized LSSVM function did not suffer from oscillations (a sign of over-fitting). For each variable, a large fraction of the points featured positive and negative derivatives. Hence, changing the variable can affect RF positively or negatively, indicating coupling and room for finding optimal conditions.

Sensitivity Analyses with Optimized LSSVM Model MOD2
The calibrated model, MOD2, was much better at predicting RF than MOD1 and was therefore pursued in the sensitivity analysis. Below, we present contour plots showing RF as a function of different input variables, while keeping the others constant. The parameters are kept within the total dataset range (see Table 2) in order to ensure model validity.  are both dominated by negative values, since increasing the mobility ratio between gas and oil or between water and oil, respectively, should reduce RF. Higher water-oil gravity segregation is considered negative for RF, with ∂y ∂x N7 mainly negative. On the other hand, gas-oil gravity segregation is considered mainly positive for RF with a majority of points having ∂y ∂x N8 positive. This could be attributed to the better sweep of low-permeable layers in heterogeneous cases. Hysteresis appears to benefit recovery, as seen by a majority of positive ∂y ∂x N3 , although the effect of ∂y ∂x N4 seems to be equally negative and positive.

Sensitivity Analyses with Optimized LSSVM Model MOD2
The calibrated model, MOD2, was much better at predicting RF than MOD1 and was therefore pursued in the sensitivity analysis. Below, we present contour plots showing RF as a function of different input variables, while keeping the others constant. The parameters are kept within the total dataset range (see Table 2) in order to ensure model validity.

Variation of Oil Viscosity
Oil viscosity can vary greatly from one reservoir to another. It proportionally impacts mobility ratios M * w/o and M * g/o in Equations (7) and (8), represented by x 6 and x 5 . For low oil mobilities, the gravity numbers N w/o G and N g/o G (represented by x 7 and x 8 ) increase proportionally with oil viscosity but are less dependent if water or gas feature mobility that is similar to or lower than that of oil (see Equations (10) and (11)). For simplicity, we assume they are proportional. We vary the oil viscosity by 2.0 orders of magnitude, which is less than the smallest range of the four dimensionless numbers (2.3 for x 6 ), as seen in Table 2.
Four cases are defined in Table 5 with low or high heterogeneity (low or high x 2 ), and a low or high degree of hysteresis (low x 3 and high x 4 and opposite, respectively). For each of these cases, RF is plotted as a function of x 1 (the water fraction) and x 5 (representing the gas-oil mobility ratio) representing different viscosities (see Figure 9).From the figure, we observe that:   Table 5.
To better understand the relation between viscosity, heterogeneity and hysteresis, we plotted RF as a function of and * for 0.5 (WAG injection with equal volume fractions of gas and water) for the two hysteresis cases in Figure 10. Each value of represents fixed oil viscosity and the curves cover the same viscosity range as before. We observed that that: -For low hysteresis (Figure 10a), RF was very sensitive to heterogeneity for low oil viscosities and increased heterogeneity reduced RF. For high viscosity, RF changed little with heterogeneity. -With significant hysteresis (Figure 10b), low-viscosity cases produced reduced RF at higher heterogeneity, while high-viscosity cases produced increased RF. Figure 9. Contour plots of recovery factor RF plotted against x 1 = r w and x 5 = log M * go . The latter represents variation in oil viscosity, which affects all of x 5 , x 6 , x 7 , x 8 . The four cases are for low heterogeneity and hysteresis (a), high heterogeneity and low hysteresis (b), low heterogeneity and high hysteresis (c) and high heterogeneity and hysteresis (d). See all input values in Table 5.

-
Optimal RF values were mainly obtained at an intermediate water fraction 0 < x 1 (consider any line parallel with the x-axis), suggesting that WAG gives higher RF than single-phase injection. Cases with low hysteresis and favorable mobility ratios seem to give similar RF for water injection and WAG (although WAG with a low water fraction seems optimal) (see Figure 9a,b (low and high heterogeneity)). - The advantage of WAG over single-phase injection was most clear when hysteresis was significant (see Figure 9c,d). The best water fraction produced RF up to 0.3 units higher than the worst fraction. This strong impact was mainly at low oil viscosity (low x 5 ) with optimal water fraction around 0.5-0.6. For higher oil viscosity or lower heterogeneity cases, WAG was in many cases only marginally better (~0.05 units) than the best single-phase injection. -Increased oil viscosity reduced RF for a given water fraction (follow any line parallel with the y-axis). This was dominant over the WAG fraction at high viscosities, except for the highly heterogeneous cases with high hysteresis (Figure 9d). This demonstrates the benefit of WAG in heterogeneous formations and that hysteresis is an important contributor. -For a given heterogeneity (low or high), increased hysteresis improved RF (compare Figure 9c,d (high hyst) with Figure 9a,b (low hyst)). This was related to the improved gas-oil mobility ratio and reduced gravity segregation, which improves volumetric sweep. The optimal water fraction shifted to more central values, since both phases are needed for hysteresis. -For a given hysteresis state, increased heterogeneity reduced RF, especially for cases with less viscous oil (compare Figure 9b,d (high het) with Figure 9a,c (low het)). For high hysteresis cases, increased heterogeneity increased RF in cases with more viscous oil.
To better understand the relation between viscosity, heterogeneity and hysteresis, we plotted RF as a function of x 2 = logF H and x 5 = log M * go for x 1 = 0.5 (WAG injection with equal volume fractions of gas and water) for the two hysteresis cases in Figure 10. Each value of x 5 represents fixed oil viscosity and the curves cover the same viscosity range as before. We observed that that:  Table   5.

Variation of Well Distance, Injection Rate or Density Difference
The distance between wells can vary from a dense pattern of a few hundred meters onshore to ~1000 m offshore. For fixed injection rates, a longer well distance proportionally increases the residence time and, hence, the gravity numbers (see (10)), represented by and . Similarly, increasing the injection rates of water and gas equally reduces the residence times and the gravity numbers. Increased density differences reduce the segregation time and increase the gravity numbers. If the height is varied but the injection rate is the same, we note that both segregation time and residence time change equally and there is no net change in the gravity numbers. Varying the aforementioned parameters does not affect the variables to ; we can thus investigate cases in which they are constant and only the gravity numbers change.
We plotted RF as a function of injected water fraction and log gravity number (equal values of and ). We investigated the role of mobility ratio, heterogeneity and hysteresis one by one. The different cases are listed in Table 6. The gravity numbers varied equally by 2.5 orders of magnitude.  Table 5. -For low hysteresis (Figure 10a), RF was very sensitive to heterogeneity for low oil viscosities and increased heterogeneity reduced RF. For high viscosity, RF changed little with heterogeneity. -With significant hysteresis (Figure 10b), low-viscosity cases produced reduced RF at higher heterogeneity, while high-viscosity cases produced increased RF.

Variation of Well Distance, Injection Rate or Density Difference
The distance between wells can vary from a dense pattern of a few hundred meters onshore to~1000 m offshore. For fixed injection rates, a longer well distance L x proportionally increases the residence time and, hence, the gravity numbers (see (10)), represented by x 7 and x 8 . Similarly, increasing the injection rates of water Q w and gas Q g equally reduces the residence times and the gravity numbers. Increased density differences reduce the segregation time and increase the gravity numbers. If the height is varied but the injection rate is the same, we note that both segregation time and residence time change equally and there is no net change in the gravity numbers. Varying the aforementioned parameters does not affect the variables x 1 to x 6 ; we can thus investigate cases in which they are constant and only the gravity numbers change.
We plotted RF as a function of injected water fraction x 1 and log gravity number (equal values of x 7 and x 8 ). We investigated the role of mobility ratio, heterogeneity and hysteresis one by one. The different cases are listed in Table 6. The gravity numbers varied equally by 2.5 orders of magnitude. Table 6. Parameter selections for MOD2 with cases demonstrating influence of gravity numbers according to heterogeneity, mobility ratio and hysteresis.

Lo Het
Hi Het Fav Unfav Lo Hyst Hyst 3 0 x 5 2 0.5 2 1.5 x 6 2 0.5 2 1.5 x 7 −4:−1.5 When low heterogeneity x 2 = 0 is considered (Figure 11a), RF stays fairly constant at low N g (when the impact from gravity is negligible) and decreases when N g is large due to gravity segregation and reduced vertical sweep. At high heterogeneity (x 2 = 1) in Figure 11b, RF is generally lower, but increases significantly with increases in the gravity number. Gravity therefore exerts a positive effect as more of the low-permeable layers are swept by gravity drainage into the highly permeable layers [47].
A relatively heterogeneous case (x 2 = 0.8 ) is considered where either mobility ratio is favorable, Figure 12a, or unfavorable, Figure 12b. In both cases, increased gravity number improves RF, but the effect is more pronounced in the favorable mobility ratio case. In the unfavorable mobility case, gravity exerts little impact until the gravity number exceeds −3. RF is generally higher in favorable mobility ratio cases compared to corresponding unfavorable mobility ratio cases.
In a relatively uniform case (x 2 = 0.3) with intermediate mobility ratios x 5 = x 6 = 1.5), hysteresis is varied. At low hysteresis, Figure 13a, increased gravity numbers increase RF moderately towards an optimal gravity number. At high hysteresis, Figure 13b, the optimal gravity number occurs at a lower value (for a given injected fraction). The peak can be related to improved sweep in low mobility layers, which becomes dominated by gravity segregation at the highest gravity numbers.
ing unfavorable mobility ratio cases.
In a relatively uniform case ( 0.3) with intermediate mobility ratios ( 1.5), hysteresis is varied. At low hysteresis, Figure 13a, increased gravity numbers increase RF moderately towards an optimal gravity number. At high hysteresis, Figure 13b, the optimal gravity number occurs at a lower value (for a given injected fraction). The peak can be related to improved sweep in low mobility layers, which becomes dominated by gravity segregation at the highest gravity numbers.  Table 6).  Table 6).  Table 6).  Table 6).  Table 6).  Table 6).  Table 6).

Handling Single Phase Data
The model MOD2 was trained to provide the same RF during single-phase injection when varying input variables related to hysteresis and the phase not injected (for example, gas during water injection). This was performed by generating points with different input values for parameters that should not exert an influence, but with the same output. To check how effectively this was captured by the calibrated model, we ran cases in which the hysteresis parameters x 3 , x 4 and mobility ratio parameters x 5 , x 6 were varied individually. RF was plotted against the relevant variable and r w ranging from gas to water injection. The input parameters are listed in Table 7 and the results are shown in Figure 14.

Application to a 3D Model
The dataset used to train MOD1 and MOD2 was based on a 2D layered model. By obtaining effective parameters from 3D reservoir models, we can predict RF from MOD1 and MOD2. An artificial 3D heterogeneous model was considered with curvature, faults and three layers (see Figure 15). The average permeability, average porosity and layer thickness are listed in Table 8. The vertical permeability was half of the horizontal permeability. An injector and producer were placed at a distance of 1500 m and the pore volume was based on a width of 750 m. The RF was calculated after the injection of 1.5 PV, assuming five injected fractions ( equal 0, 0.33, 0.5, 0.67 and 1), two oil viscosities (30 and 110 cP) and low and high hysteresis (see values of and ).  The variation of the hysteresis parameters x 3 and x 4 , in Figure 14a,b, respectively, produces relatively constant RF with gas injection r w = 0 (RF~0.45) and water injection r w = 1 (RF~0.30), although the variation of x 3 during gas injection produces a wider range (RF = 0.30-0.45). The levels of RF differ, as is expected, since gas and water injection perform differently. Varying the gas-oil mobility ratio M * g/o (via x 5 in Figure 14c) produces much less change in RF with water injection (RF~0.3) than gas or WAG injection. Similarly, varying the water-oil mobility ratio M * w/o (via x 6 in Figure 14d) produces much less change in RF with gas injection (RF~0.45) than with water or WAG injection.

Application to a 3D Model
The dataset used to train MOD1 and MOD2 was based on a 2D layered model. By obtaining effective parameters from 3D reservoir models, we can predict RF from MOD1 and MOD2. An artificial 3D heterogeneous model was considered with curvature, faults and three layers (see Figure 15). The average permeability, average porosity and layer thickness are listed in Table 8. The vertical permeability was half of the horizontal permeability. An injector and producer were placed at a distance of 1500 m and the pore volume was based on a width of 750 m. The RF was calculated after the injection of 1.5 PV, assuming five In Figure 16, we plotted RF as a function of for the four cases, calculated with the 3D model, MOD1 and MOD2. In these examples, RF with water injection (0.13 and 0.35 for high and low oil viscosity) is higher than with gas injection (0.09 and 0.22) and RF is lower with more viscous oil. When hysteresis is high, WAG exhibits the best performance out of all the models (the RF peaks at 0 1). When hysteresis is low, the 3D model features similar RF for a high WAG fraction to water injection, while MOD1 indicates water injection as optimal and MOD2 still clearly supports WAG. MOD1 predicts a level of RF and change in RF that are more similar to those seen in the 3D model than MOD2. MOD2 predicts the level of RF in low hysteresis relatively well, but appears very sensitive to adding hysteresis. This is also seen through a difference in the single-phase injection RF for the same oil viscosity: about 0.2 units for water injection and for gas injection with low amounts of viscous oil, but, more reasonably, 0.03 units for gas injection with high amounts of viscous oil. This indicates that this region of the model could be better calibrated. The water injection points in this case could be insufficiently near the single-phase points in the dataset. Furthermore, we do not expect MOD1 and MOD2 to predict the 3D model behavior identically as the geometries are not the same.  In Figure 16, we plotted RF as a function of x 1 = r w for the four cases, calculated with the 3D model, MOD1 and MOD2. In these examples, RF with water injection (0.13 and 0.35 for high and low oil viscosity) is higher than with gas injection (0.09 and 0.22) and RF is lower with more viscous oil. When hysteresis is high, WAG exhibits the best performance out of all the models (the RF peaks at 0 < r w < 1). When hysteresis is low, the 3D model features similar RF for a high WAG fraction to water injection, while MOD1 indicates water injection as optimal and MOD2 still clearly supports WAG. MOD1 predicts a level of RF and change in RF that are more similar to those seen in the 3D model than MOD2. MOD2 predicts the level of RF in low hysteresis relatively well, but appears very sensitive to adding hysteresis. This is also seen through a difference in the single-phase injection RF for the same oil viscosity: about 0.2 units for water injection and for gas injection with low amounts of viscous oil, but, more reasonably, 0.03 units for gas injection with high amounts of viscous oil. This indicates that this region of the model could be better calibrated. The water injection points in this case could be insufficiently near the single-phase points in the dataset. Furthermore, we do not expect MOD1 and MOD2 to predict the 3D model behavior identically as the geometries are not the same.

Conclusions
In this study, we interpreted a dataset of ~2500 points generated from single-phase and WAG injection reservoir model simulations to predict the recovery factor, RF. Two modeling approaches were selected. In MOD1, a universal dimensionless number * derived from Nygård and Andersen [47] was selected as the single input variable and correlated by a polynomial expression. In MOD2, eight dimensionless numbers were used as input variables. Both choices included all the relevant input parameters to run the reservoir simulations. MOD2 was developed using LSSVM and optimized based on the best results from PSO, GA, GWO and GSA. The overall conclusions to this work can be summarized as follows: -We demonstrated that it is possible to predict the recovery factor during single-phase and WAG injection. - The LSSVM model optimized by GWO or PSO performed better than when optimized by GA or GSA. -MOD2 with eight input variables clearly performed better than MOD1 with one input. Based on the total dataset, the RMSE and R 2 were 0.0080 and 0.998 for MOD2 and 0.050 and 0.889 for MOD1, respectively.

-
The physics-based training of MOD2 was applied successfully. Single-phase injection data points were duplicated using different values in the input variables that should not affect RF, while keeping the same values for the relevant input variables and the output. The model correctly displayed little response to the irrelevant variables, but not for all conditions. Improvements could be made by adding more of these points or by training the model to include such constraints via an added penalty term in the objective function. MOD1 was analytically independent of these variables during sin-

Conclusions
In this study, we interpreted a dataset of~2500 points generated from single-phase and WAG injection reservoir model simulations to predict the recovery factor, RF. Two modeling approaches were selected. In MOD1, a universal dimensionless number M * derived from Nygård and Andersen [47] was selected as the single input variable and correlated by a polynomial expression. In MOD2, eight dimensionless numbers were used as input variables. Both choices included all the relevant input parameters to run the reservoir simulations. MOD2 was developed using LSSVM and optimized based on the best results from PSO, GA, GWO and GSA. The overall conclusions to this work can be summarized as follows: -We demonstrated that it is possible to predict the recovery factor during single-phase and WAG injection. - The LSSVM model optimized by GWO or PSO performed better than when optimized by GA or GSA. -MOD2 with eight input variables clearly performed better than MOD1 with one input. Based on the total dataset, the RMSE and R 2 were 0.0080 and 0.998 for MOD2 and 0.050 and 0.889 for MOD1, respectively. - The physics-based training of MOD2 was applied successfully. Single-phase injection data points were duplicated using different values in the input variables that should not affect RF, while keeping the same values for the relevant input variables and the output. The model correctly displayed little response to the irrelevant variables, but not for all conditions. Improvements could be made by adding more of these points or by training the model to include such constraints via an added penalty term in the objective function. MOD1 was analytically independent of these variables during single-phase injection. -Plotting histograms of partial derivatives of RF showed that for most input variables, increasing them would increase RF for some conditions, but reduce RF under others, demonstrating coupling in the data. - The best model (MOD2) predicted that under identical conditions, an optimal injected WAG fraction existed that outperformed single-phase injection (water or gas). The benefit of WAG was much clearer when gas relative permeability hysteresis was significant. - The mobility ratios were important input variables. Increased values tended to reduce RF. - The roles of gravity numbers, heterogeneity and hysteresis were coupled. Strong gravity effects reduced RF in low-heterogeneity cases, but improved RF in heterogeneous cases.
Finally, some limitations of the study and recommendations should be mentioned. Some parameters were not varied in the dataset and their role can therefore not be predicted by the model. This includes the reservoir dip angle, capillary forces, starting WAG in tertiary mode (with some period of gas or water injection first), gas miscibility and tapering (changing WAG ratio with time). Furthermore, the heterogeneity of the model was mainly described by one parameter although porosity and vertical permeability appear in other dimensionless numbers. It could also matter how the heterogeneity appears, i.e., permeability increasing up or down. We note that several parameters that were not varied are included in a physically meaningful manner into dimensionless numbers that were varied. Thus, considering new values of Corey parameters, the vertical-to-horizontal permeability ratio and reservoir layer configurations are accounted for. The proposed methodology can be applied to predict the performance of other EOR techniques as well, but requires a similar development of representative dimensionless numbers and parameters capturing the EOR effect.
It is recommended to explore the potential of physics-based machine learning [50,51] in combination with dimensionless numbers describing complex systems, as was considered in this study. The methodology of modifying the dataset as described offers the advantage of applying ML algorithms in their standard form. On the downside, the dataset is enlarged and the physics are added around the specific datapoints, not as inherent part of the model.

Appendix B. Least Squares Support Vector Machines (LSSVM)
The support vector machine (SVM) algorithm was developed by Vapnik [25] and used to solve classification problems by building hyperplanes in multidimensional spaces that separated data into classes. Its application was extended to regression. The least squares support vector machine, or LSSVM, is a modification of SVM introduced by Suykens and Vandewalle [26]. The LSSVM regression algorithm is outlined below.
Consider a finite dataset with n points D = {(x 1 , y 1 ), . . . . . . , (x n , y n )}, where the input x i ∈ R p , p being the number of input variables (in our case 1 or 8) and the output y i ∈ R. The regression function is expressed as [27]: where ϕ is a higher dimensional function and w is a weight vector that combines the contributions of each element of ϕ to a scalar. Each output measurement y i is by definition equal to the regression plus the error e i : The LSSVM algorithm aims to minimize the objective function J described as follows: min w,e J(w, e) = 1 2 γ is called the regularization coefficient and its magnitude determines which of the two terms is minimized more. The error equations are treated as equality constraints: with Lagrange multipliers α i . The conditions for optimality are found by setting partial derivatives equal to zero: We can eliminate e i and w from the above set of equations to obtain the remaining linear equations for α i and b: By applying Mercer's condition, the product ϕ(x i ) T ϕ x j is replaced by a kernel function K x i , x j : We can then solve for α i and b by solving the matrix form of (A10) and (A11): K(x 1 , x n ) . . . . . . . . . . . .

1
K(x n , x 1 ) · · · K(x n , x n ) + 1  Different choices of kernel function can be made. The radial basis kernel (RBK) function was selected: ||x i − x|| denotes the Euclidian distance between vectors x i and x, while σ is the width parameter.
The choice of the metaparameters σ and γ determines the LSSVM algorithm performance. σ controls how rapidly the function can vary around the training data points x i . For very small σ, the function equals the constant b between the points x i , while it matches y i at every x i of the training set. This results in the very poor prediction of new points. Large σ linearizes the function (a straight line for a scalar input variable). Intermediate σ are hence expected to capture non-linear trends. γ controls how much weight is placed on minimizing the mismatch compared to minimizing the magnitude of the nonlinear terms. A very low γ minimizes the coefficients of the nonlinear terms to zero and provides a constant function, equal to b. A very high γ minimizes the mismatch between the function and the training set (between y(x i ) and y i ) but allows it to be more nonlinear.