# Application of Gene Expression Programming (GEP) in Modeling Hydrocarbon Recovery in WAG Injection Process

## Abstract

R^{2} = 92.85% and MSE = 1.38 × 10^{−3} are attained for the training phase, and R^{2} = 91.93% and MSE = 4.30 × 10^{−3} are attained for the testing phase. The relative importance of the input dimensionless groups is also determined. According to the sensitivity analysis, decreasing the oil–water capillary number results in a significant reduction in RF in all cycles. Increasing the oil to gas viscosity ratio and the oil to water viscosity ratio lowers the RF of each cycle. It is found that the oil to gas viscosity ratio has a higher impact on the RF value than the oil to water viscosity ratio, owing to the larger viscosity gap between the gas and oil phases. It is expected that the GEP, as a fast and reliable tool, will be useful to find vital variables including relative permeability in complex transport phenomena such as three-phase flow in porous media.

## 1. Introduction

CO_{2} absorption in piperazine [35], CO_{2} conversion to urea [36], permeate flux during the filtration [37], CO_{2} storage efficiency [38], fouling occurrence in membrane bioreactors [39], and the recovery performance of CO_{2}-WAG injection processes [40,41]. In recent years, various optimization techniques such as genetic algorithm (GA) and particle swarm optimization (PSO) have been widely used as reliable approaches to optimize different upstream and downstream processes in the oil and gas industry [42]. The primary version of GA was then modified into a new algorithm, called the genetic programming (GP) approach. Gene expression programming (GEP) is a new and updated version of GA that addresses most of the drawbacks and concerns around previous versions [43]. Generally, GEP is able to obtain a solution for regression problems [44]. Unlike GP, where the individuals of a population are represented symbolically as expression trees (ETs), the GEP algorithm encodes the individuals as linear chromosomes [45,46]. The GEP method has been employed in various research subjects in petroleum engineering, including estimating mixture viscosity in solvent-assisted oil recovery process [47], CO_{2} solubility in crude oil [48], minimum miscibility pressure (MMP) of live oil systems [49], petroleum emulsions’ viscosity [50], surfactant retention in porous media [51], residual gas saturation in spontaneous and forced imbibition processes [52], and oil price [53]. However, the application of this smart technique has not been reported for predicting the oil recovery of the near-miscible WAG injection process in the open-source literature. Near-miscible WAG injection studies are limited to a few experimental investigations, which are highly time-consuming, expensive, and, more importantly, not comprehensive in terms of sensitivity analysis. Today, with significant developments in computer and data science, it is feasible to introduce robust, fast, and reliable models to forecast the performance of complex EOR processes such as WAG injection. The main objective of this work is to conduct the scaling analysis using the data provided by the validated and reliable implicit-pressure-explicit-saturation (IMPES) mathematical model for the development of a robust empirical model, which is able to predict the recovery factor of a WAG injection at the near-miscible condition.

## 2. Theory and Background

#### 2.1. WAG Mechanisms

#### 2.2. Dimensional Analysis

CO_{2} injection performance. Two different correlations were introduced with respect to different injection scenarios in the Weyburn field. The primary correlation was developed according to the WAG injection in vertical wells and the second correlation was generated based on the horizontal well injection by applying the Kinder Morgan CO_{2} Scoping model and utilizing the field production data. However, in their proposed model, only the oil production rate and CO_{2} and water injection rates were accounted for, and other field or operational variables were not included. Liu et al. [80] suggested a new model for developing dimensionless CO_{2} injection performance such as total injection (DTI), CO_{2} injection (DCI), tertiary oil production (DEOR), and CO_{2} production (DCP) for various WAG injection approaches. A Microsoft Excel VBA program (for injecting CO_{2} pulses) was applied to develop the prototypes for forecasting the system performance. Their new methodology (pulse method) was verified using mechanistic simulation results of finite elements for different WAG injection processes or different CO_{2} injection slug sizes, or both [80]. Jaber et al. [25] introduced a simple data-driven model to evaluate the miscible CO_{2}-WAG injection performance in an Iraqi oil field. They employed a central composite design (CCD) to introduce a proxy model. They implemented an ANOVA to examine the effectiveness of the variables and their combinations within the model. The proposed proxy model determined the incremental oil recovery $(\Delta FOE)$ as a function of reservoir properties and operational conditions including permeability, porosity, ratio of vertical to horizontal permeability, cyclic length, bottomhole pressure, ratio of injected CO_{2} over water slug size, and injected CO_{2} slug size.

#### 2.3. Fundamentals of GEP

## 3. Methodology

#### 3.1. Data Collection

#### 3.2. Governing and Auxiliary Equations

(p_{c}) (see Equation (3)). Capillary pressure is a function of the fluids’ saturations and distribution, while porosity and permeability are rock characteristics.

#### 3.3. Model Assumptions and Limitations

- Gravity forces are neglected.
- The flow direction is considered as 1D horizontal in the system.
- The core and fluids are considered incompressible.
- The core is strongly water-wet and homogeneous.
- Capillary equilibrium holds in the system.
- The temperature of the system is 38 °C and the thermal equilibrium holds in the system.
- The capillary end effects are neglected.

#### 3.4. Design of Experiment (DOE)

(q_{g} and q_{w}), system permeability (K), pore volume injection of fluids (PVI), and number of cycles (N) are chosen to determine the dimensionless numbers and their significance. Using a two-level full factorial DOE, each parameter is studied at two levels with the upper and lower bounds coded as +1 and −1. Table 2 lists the upper and lower levels of all variables.

2^{6} full factorial design. Thus, a total of 96 runs are required with no replicates of each run to obtain the response.
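The coded design described above can be enumerated programmatically. The following is a minimal sketch, assuming the six factors and bounds listed in Table 2; the dictionary keys and the function name `full_factorial` are illustrative and do not come from the original work. It covers only the plain two-level case (2^{6} = 64 combinations); how the run count quoted in the text is assembled is not shown in this section, so the sketch does not attempt to reproduce it.

```python
from itertools import product

# Factor names and (low, high) bounds as listed in Table 2.
# The key strings are illustrative labels, not names used by the authors.
FACTORS = {
    "mu_o [Pa.h]": (1.11e-8, 1.11e-7),
    "q_w [m3/h]": (25e-6, 40e-6),
    "q_g [m3/h]": (25e-6, 40e-6),
    "K [mD]": (65.0, 200.0),
    "PVI": (0.5, 1.0),
    "N": (1, 3),
}


def full_factorial(factors):
    """Return every run of a two-level full factorial design.

    Each run is a pair (coded, actual): `coded` is a tuple of -1/+1 levels
    and `actual` maps each factor name to the matching physical value.
    """
    names = list(factors)
    runs = []
    for coded in product((-1, +1), repeat=len(names)):
        actual = {
            name: factors[name][0] if level == -1 else factors[name][1]
            for name, level in zip(names, coded)
        }
        runs.append((coded, actual))
    return runs


runs = full_factorial(FACTORS)
print(len(runs))  # number of two-level combinations for six factors
```

Each entry of `runs` can then be passed to the simulator to obtain the response (RF) for that combination.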

#### 3.5. Dimensionless Scaling Groups

q_{w} and q_{g} represent the water and gas injection rates; PVI is the pore volume injection during each injection mode (WF or GI); and N denotes the number of injected cycles.

#### 3.6. Analysis of Variance (ANOVA)

#### 3.7. GEP Procedure

- Initializing the population through generating random chromosomes of a certain number of individuals.
- Evaluating the individuals of the population according to the fitness function.
- Selecting some individuals and copying them for the next generation based on their fitness (simple elitism) [96].
- Applying the same procedure for the new population including the selection of the environment, expression of genomes, selection of the population individuals, and reproduction with modification.
- Repeating the previous steps until the termination criteria are met.
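The "expression of genomes" step in the list above turns a linear chromosome into an evaluable expression. The following is a minimal sketch of the standard breadth-first (Karva) decoding used in GEP, assuming a single gene built from the binary operators +, −, × and three hypothetical terminals `a`, `b`, `c`; the names `OPS` and `evaluate` are illustrative and are not taken from the original work.

```python
import operator

# Function set: symbol -> (callable, arity). Only binary operators are used here.
OPS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
}


def evaluate(gene, values):
    """Evaluate one gene written as a K-expression (breadth-first order).

    `gene` is a sequence of symbols; `values` maps terminal symbols to numbers.
    Symbols beyond the coding region are ignored, as in GEP.
    """
    # Assign children level by level: symbol i takes the next `arity`
    # unused positions as its arguments.
    children = []
    next_free = 1
    i = 0
    while i < next_free:
        arity = OPS[gene[i]][1] if gene[i] in OPS else 0
        children.append(list(range(next_free, next_free + arity)))
        next_free += arity
        i += 1

    def value(index):
        symbol = gene[index]
        if symbol in OPS:
            function, _ = OPS[symbol]
            return function(*(value(child) for child in children[index]))
        return values[symbol]

    return value(0)


# The K-expression "+*abc" encodes (b * c) + a; the trailing "ab" is non-coding.
result = evaluate(list("+*abcab"), {"a": 1, "b": 2, "c": 3})
print(result)  # 7
```

Because the non-coding region is simply skipped, any chromosome whose tail contains only terminals decodes to a valid expression, which is what allows the genetic operators to act freely on the linear string.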

#### 3.8. Model Development Steps

- Generating the population using random chromosome individuals and applying correlation formats as parse trees using the functions or operators (×, +, −) and terminals, which are functions of the input variables and the output results (RF of WAG).
- Computing the fitness value for each individual of the generated population using the following objective function (OF):

- Applying the genetic operators on selected chromosomes, including:
  - Replication operator: This operator copies the structure of the chromosomes selected in step 3.
  - Mutation operator: As the most important step in the GEP algorithm, the mutation can occur anytime and at any position in a genome, as long as the mutated chromosome meets the validity criteria. The mutation operator changes the head and tail terminals, while the original structure of the chromosome is preserved.
  - Inversion: The inversion operator is applied only to the heads of genes, where any sequence may be selected and inverted. The operator randomly selects the chromosome, the gene to be modified, and the start and end points of the sequence to be inverted.
  - Transposition and insertion sequence elements: Portions of the genome that can be activated and jump to another place in the chromosome are called the transposable elements of the GEP program. Ferreira [45] divided these elements into three types: “short fragments with either a terminal or function in the first position transpose to the head of genes, short fragments with a function in the first position that transpose to the rest of the head of genes (root IS elements or RIS elements), and entire genes that transpose to commencing of chromosomes.”
  - Recombination: This step normally involves two parent chromosomes, which produce two new chromosomes by combining various parts of the parents through one of three approaches: one-point recombination, two-point recombination, and gene recombination [35]. Accordingly, the new generation is reproduced, and the procedure continues until the termination criteria are met.
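Three of the operators listed above (mutation, inversion, and one-point recombination) can be sketched on a head/tail chromosome as follows. This is an illustrative sketch, not the authors' implementation: the terminal symbols, the mutation rate, and all function names are assumptions, the head size of 8 is taken from the GEP configuration table, and the tail length follows the usual GEP rule t = h(n_max − 1) + 1, which keeps every chromosome decodable.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}  # symbol -> arity
TERMINALS = ["a", "b", "c"]           # hypothetical input symbols
HEAD = 8                              # head size from the GEP configuration table
TAIL = HEAD * (max(FUNCTIONS.values()) - 1) + 1  # t = h(n_max - 1) + 1


def random_gene(rng):
    """Head may hold functions or terminals; tail holds terminals only."""
    head = [rng.choice(list(FUNCTIONS) + TERMINALS) for _ in range(HEAD)]
    tail = [rng.choice(TERMINALS) for _ in range(TAIL)]
    return head + tail


def is_valid(gene):
    """Validity criterion: correct length and a terminal-only tail."""
    return len(gene) == HEAD + TAIL and all(s in TERMINALS for s in gene[HEAD:])


def mutate(gene, rng, rate=0.1):
    """Point mutation; tail positions may only receive terminals."""
    symbols = list(FUNCTIONS) + TERMINALS
    return [
        rng.choice(symbols if i < HEAD else TERMINALS) if rng.random() < rate else s
        for i, s in enumerate(gene)
    ]


def invert(gene, rng):
    """Reverse a randomly chosen sequence inside the head."""
    i, j = sorted(rng.sample(range(HEAD), 2))
    return gene[:i] + gene[i:j + 1][::-1] + gene[j + 1:]


def one_point_recombination(parent_1, parent_2, rng):
    """Swap everything after a random cut point between two parents."""
    cut = rng.randrange(1, len(parent_1))
    return parent_1[:cut] + parent_2[cut:], parent_2[:cut] + parent_1[cut:]


rng = random.Random(0)
parent_a, parent_b = random_gene(rng), random_gene(rng)
child_a, child_b = one_point_recombination(parent_a, parent_b, rng)
offspring = [mutate(child_a, rng), invert(child_b, rng)]
print(all(is_valid(g) for g in offspring))
```

Since each operator respects the head/tail boundary, offspring always satisfy the validity criterion and no repair step is needed.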

## 4. Results and Discussions

#### 4.1. Model Development

(R^{2}) are used for statistical analysis of the developed model. Thus, the error analysis can effectively test the reliability and accuracy of the proposed model. For instance, the R^{2} parameter demonstrates the degree of match between the target data generated by the mathematical model and the RF data calculated using the newly proposed correlation. Equations (A2) to (A6) in Appendix A express the mathematical formulas of the statistical measures used in this study. Table 6 presents the results of statistical error analysis for both the training and testing phases. The low error values as well as the high coefficients of determination (R^{2}_{Training} = 0.93 and R^{2}_{Testing} = 0.92) for both phases confirm the effectiveness and precision of the new GEP correlation.
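For readers who want to compute comparable measures on their own data, the following is a minimal sketch using the common textbook definitions of MSE, RMSE, MAE, and R^{2}. The exact formulas of Equations (A2) to (A6) are not reproduced in this section, so these definitions are assumptions, and the sample numbers are hypothetical rather than data from the study.

```python
import math


def regression_metrics(target, predicted):
    """Return MSE, RMSE, MAE and R^2 for paired target/predicted values."""
    n = len(target)
    residuals = [t - p for t, p in zip(target, predicted)]
    mean_target = sum(target) / n
    sse = sum(r * r for r in residuals)                 # sum of squared errors
    sst = sum((t - mean_target) ** 2 for t in target)   # total sum of squares
    mse = sse / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1.0 - sse / sst,
    }


# Hypothetical recovery factors (fractions), for illustration only.
target_rf = [0.50, 0.60, 0.70, 0.80]
predicted_rf = [0.52, 0.58, 0.71, 0.79]
metrics = regression_metrics(target_rf, predicted_rf)
print(metrics)
```

As a quick consistency check on Table 6, the reported RMSE values agree with the square roots of the reported MSE values to within rounding.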

#### 4.2. Relative Importance (RI) of Input Variables

The coefficient of determination R^{2} = 0.9285 for the training phase and the coefficient of determination R^{2} = 0.9193 for the testing phase confirm that the proposed correlation is reliable and accurate for predicting the oil recovery factor in a WAG injection process.

#### 4.3. Evaluation of Developed Correlation

(s_{gi} = 0). The process was then followed by the first gas injection, which completed the first cycle of the WAG injection (N = 1). The consecutive flooding of water and gas continued for three cycles (N = 3). The process was terminated after the third gas injection, when no significant amount of oil was recovered from the porous system. In the experiments, the process was conducted at a WAG ratio of 1:1 with a constant gas injection rate (e.g., q_{inj} = 25 cm^{3}/h). The RF against the number of injected cycles (N), based on the experimental data, the values predicted by the new correlation, and the values estimated by the mathematical model, is presented in Table 9 and Figure 8.
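The relative errors listed in Table 9 can be checked directly from the tabulated recovery factors. The sketch below assumes the relative error is the absolute deviation divided by the reference value, expressed in percent; this definition is an assumption, chosen because it reproduces the tabulated numbers to two decimals. The function and variable names are illustrative.

```python
# Rows of Table 9: (N, RF_GEP, RF_Target, RF_Exp), all RF values in percent.
TABLE_9 = [
    (0.5, 59.35, 50.08, 49.97),
    (1.0, 69.20, 64.64, 65.59),
    (1.5, 76.08, 71.94, 72.01),
    (2.0, 81.56, 79.37, 79.12),
    (2.5, 86.22, 84.30, 85.09),
    (3.0, 90.32, 92.00, 93.58),
]


def relative_error(predicted, reference):
    """Absolute deviation from the reference value, in percent."""
    return abs(predicted - reference) / abs(reference) * 100.0


errors_vs_exp = [relative_error(gep, exp) for _, gep, _, exp in TABLE_9]
errors_vs_target = [relative_error(gep, tgt) for _, gep, tgt, _ in TABLE_9]

for (n, *_), e_exp, e_tgt in zip(TABLE_9, errors_vs_exp, errors_vs_target):
    print(f"N = {n}: vs. experiment {e_exp:.2f}%, vs. target {e_tgt:.2f}%")
```

The largest deviation occurs after the first waterflood (N = 0.5), and the error against the experimental data stays below 6% for all later half-cycles.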

#### 4.4. Effect of Capillary Number

The capillary number (N_{ca}) is defined as the ratio of viscous forces to capillary forces [101]. There are numerous expressions for the capillary number (N_{ca}). Among the proposed versions, the capillary number introduced by Saffman and Taylor [102] is the most common form, as given below:

q_{w} and q_{g} represent the water and gas injection rates; and K is the permeability of the medium. Since ${\pi}_{1}$ and ${\pi}_{2}$ show approximately the same relative importance within the developed correlation for predicting the RF of the WAG flooding process, a sensitivity analysis on the impact of ${\pi}_{1}$ on the RF of a WAG injection is conducted. To investigate the impact of the capillary number (the inverse of ${\pi}_{1}$) on oil RF, the results of WAG RF for three cycles (N = 1, 2, 3) at three orders of magnitude of ${\pi}_{1}$ are compared in Figure 9. According to the results presented in Figure 9, by increasing ${\pi}_{1}$ from the initial value of 8.16 × 10^{−5} (corresponding to the value at the experimental condition) to 8.16 × 10^{−2} and 8.16 × 10^{−1}, the ultimate RF of WAG injection decreases from 90.32% to 85.79% and 75.03%, respectively. This implies that the RF of a WAG injection process decreases by 16.92% upon a decrease in the capillary number by four orders of magnitude. The same trend of RF reduction at higher values of ${\pi}_{1}$ is noticed at the end of the first and middle injection cycles. Increasing ${\pi}_{1}$ by four orders of magnitude (corresponding to decreasing the capillary number by four orders of magnitude) lowers the ultimate RFs of the first (N = 1) and second (N = 2) cycles by 22% and 23.1%, respectively. This result highlights the dominance of viscous forces at high capillary numbers; when the capillary number is lowered, more oil is trapped in the porous medium and the oil RF decreases during the various cycles of a WAG flooding process.

#### 4.5. Effect of Viscosity Ratio (${\pi}_{3},{\pi}_{4}$)

By increasing ${\pi}_{3}$ from 1.59 (RF^{ultimate} = 90.32%) to 2 (RF^{ultimate} = 84.14%) and 3 (RF^{ultimate} = 76.66%), the ultimate recovery factors of WAG injection after three cycles of injection decrease by 6.84% and 15.12% for ${\pi}_{3}$ = 2 and ${\pi}_{3}$ = 3, respectively. Increasing ${\pi}_{3}$ also significantly decreases the oil recovery at the first and second injection cycles. The RF results at each cycle are provided in Table 10. The results are consistent with previous studies in which a greater viscosity gap between the oil and gas leads to an unfavorably high mobility ratio, bypassing of the oil bank (gas channeling), and early gas breakthrough [20].
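The percentage decreases quoted above are relative to the base-case ultimate RF. A minimal sketch of that arithmetic, using the ultimate (N = 3) recovery factors reported for ${\pi}_{3}$ in Table 10, is given below; the function name `relative_reduction` is illustrative.

```python
def relative_reduction(base, new):
    """Percentage decrease of `new` with respect to `base`."""
    return (base - new) / base * 100.0


# Ultimate RF (%) after three cycles for each oil to gas viscosity ratio (Table 10).
ULTIMATE_RF = [(1.59, 90.32), (2.00, 84.14), (3.00, 76.66)]

base_ratio, base_rf = ULTIMATE_RF[0]
reductions = {
    ratio: relative_reduction(base_rf, rf) for ratio, rf in ULTIMATE_RF[1:]
}

for ratio, drop in reductions.items():
    print(f"pi_3 = {ratio}: ultimate RF is {drop:.2f}% lower than at pi_3 = {base_ratio}")
```

The same function applies to the per-cycle values in Table 10 if the first-cycle or second-cycle RF is used as the base.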

## 5. Summary and Conclusions

R^{2} = 92.85% and MSE = 1.38 × 10^{−3} are attained for the training phase. The results of the relative importance (RI) of the input variables indicate that the number of injected cycles (N, ${\pi}_{7}$) has the highest impact on the developed correlation with an RI of 37.14%. After ${\pi}_{7}$, the most influential parameters are ${\pi}_{3}$, ${\pi}_{4}$, and ${\pi}_{5}$, with RI values of 36.61%, 18.84%, and 2.30%, respectively.

## Nomenclatures

Acronyms | |
---|---|
ANOVA | Analysis of variance |
CCD | Central composite design |
DCI | Dimensionless CO_{2} injection |
DCP | Dimensionless CO_{2} production |
DEOR | Dimensionless tertiary oil recovery |
DOE | Design of experiment |
DTI | Dimensionless total injection |
EOR | Enhanced oil recovery |
ET | Expression tree |
GA | Genetic algorithm |
GEP | Gene expression programming |
GI | Gas injection |
GP | Genetic programming |
IFT | Interfacial tension |
IOR | Improved oil recovery |
IMPES | Implicit-pressure-explicit-saturation |
M | Mobility ratio |
MMP | Minimum miscibility pressure |
MSE | Mean square error |
N | Number of injected cycles |
OF | Objective function |
PSO | Particle swarm optimization |
PVI | Pore volume injection |
RAE | Relative absolute error |
RF | Recovery factor |
RI | Relative importance |
RMSE | Root-mean-square error |
RSE | Residual standard error |
WF | Waterflooding |
WAG | Water-alternating-gas |

Variables and Parameters | |
---|---|
a | Capillary exponent |
C_{ij} | Correlation constant values |
C_{i} | Capillary constant [Pa] |
F | The main effect of factors in ANOVA |
K | Absolute permeability [mD] |
k_{ri} | Relative permeability of phase i |
p | Pressure [Pa] |
p value | The interaction effect of factors in ANOVA |
q | Flowrate [m^{3}/h] |
R^{2} | Coefficient of determination |
s_{i} | Saturation of phase i |
t | Time [h] |
v | Velocity [m/h] |
x | Length [m] |

Greek Letters | |
---|---|
µ | Viscosity [cP] |
ρ | Density [kg/m^{3}] |
σ | Interfacial tension [N/m] |
ϕ | Porosity |
θ | Contact angle |
${\pi}_{i}$ | Dimensionless number |

Subscripts and Superscripts | |
---|---|
ave | Average |
ca | Capillary |
D | Drainage |
ed | Displaced phase (oil) |
exp | Experiment |
g | Gas phase |
ing | Displacing phase |
I | Imbibition |
nw | Nonwetting phase |
o | Oil phase |
og | Oil–gas system |
ow | Oil–water system |
r | Residual phase |
w | Wetting phase |

## Appendix A

**Table A1.** Description of parameters used in Equation (A1) [88].

Parameter | Description |
---|---|
${p}_{c,ij}$ | Capillary pressure (in a three-phase system) |
${\sigma}_{ij}$ | Interfacial tension |
${\theta}_{ij}$ | Contact angle |
${c}_{i}$, ${c}_{j}$ | Capillary entry pressure |
s | Saturation |
$a$ | Capillary exponent |
nw, w | Nonwetting and wetting phases, respectively |
i, j | Existing phases (oil, water, or gas) |

**Figure 1.** A WAG injection process and distribution of phases in a typical reservoir (Modified after [62]).

**Figure 2.** A typical two-gene chromosome with its corresponding mathematical expression (Modified after Gharagheizi et al. [46]).

**Figure 3.** A schematic flowchart of workflow for developing the RF correlation of WAG in this research.

**Figure 5.** Relative importance of all input variables included in the new correlation for RF determination of WAG injection process.

**Figure 8.** Comparison of RF of the WAG process generated by the developed correlation, mathematical model, and experimental tests.

**Figure 9.** Effect of oil inverse capillary number (${\pi}_{1}$) on the RF of the WAG injection using the GEP correlation.

**Figure 10.** Effect of oil to gas viscosity ratio (${\pi}_{3}$) on the recovery performance of near-miscible WAG injection based on the developed correlation.

**Figure 11.** Impact of oil to water viscosity ratio (${\pi}_{4}$) on the recovery performance of near-miscible WAG injection based on the GEP correlation.

Governing Equations | | Auxiliary Equations | |
---|---|---|---|
$\nabla \cdot \left(\frac{{\rho}_{i}K{k}_{ri}}{{\mu}_{i}}\frac{\partial {p}_{i}}{\partial x}\right)+{q}_{i}=\frac{\partial}{\partial t}\left(\varphi {\rho}_{i}{s}_{i}\right),\quad i\in \left\{o,w,g\right\}$ | | ${s}_{o}+{s}_{w}+{s}_{g}=1$ | |
$\rho $ | Density of the fluid | ${p}_{c}={p}_{nw}-{p}_{w}$ | |
$K$ | Rock permeability | ${p}_{c}$ | Capillary pressure |
${k}_{ri}$ | Relative permeability of phase i | ${p}_{nw}$ | Pressure of the nonwetting phase |
$\mu$ | Viscosity of the fluid | ${p}_{w}$ | Pressure of the wetting phase |
$p$ | Pressure | $s$ | Saturation of phases |
$x$ | Spatial location | o | Oil phase |
$q$ | Source/sink term | w | Water phase |
$t$ | Time | g | Gas phase |

Factors | Low Level (−1) | High Level (+1) |
---|---|---|
${\mu}_{o}$ (Pa·h) | 1.11 × 10^{−8} | 1.11 × 10^{−7} |
${q}_{w}$ (m^{3}/h) | 25 × 10^{−6} | 40 × 10^{−6} |
${q}_{g}$ (m^{3}/h) | 25 × 10^{−6} | 40 × 10^{−6} |
K (mD) | 65 | 200 |
PVI | 0.5 | 1 |
N | 1 | 3 |

Variables | Fixed Parameters | Response Variable |
---|---|---|
${\mu}_{o}$ (cP) | ${\sigma}_{ow}$ (N/m) | RF |
${q}_{w}$ (m^{3}/h) | ${\sigma}_{og}$ (N/m) | |
${q}_{g}$ (m^{3}/h) | ${\mu}_{w}$ (cP) | |
K (mD) | ${\mu}_{g}$ (cP) | |
PVI | | |
N | | |

**Table 4.** Analysis of variance (ANOVA) table to assess design parameters (dimensionless groups) in a WAG injection process (d.f. stands for degrees of freedom).

Source | Sum Sq | d.f. | F | p |
---|---|---|---|---|
${\pi}_{1}$ | 0.0092 | 2 | 6.66 | 0.0018 |
${\pi}_{2}$ | 0.0159 | 1 | 10.45 | 0.0010 |
${\pi}_{3}$ | 0.0100 | 4 | 200.24 | <0.0001 |
${\pi}_{4}$ | 0.0153 | 1 | 40.34 | 0.0013 |
${\pi}_{5}$ | 0.0189 | 1 | 12.39 | 0.0007 |
${\pi}_{6}$ | 0.0635 | 1 | 4.67 | 0.0023 |
${\pi}_{7}$ | 0.0614 | 4 | 348.21 | <0.0001 |
Error | 0.1564 | 8 | | |
Total | 0.1942 | 22 | | |

Configuration | Value |
---|---|
Population size | 96 |
No. of chromosomes | 33 |
Head size | 8 |
No. of genes | 4 |
Fitness function | OF |
Map operators | +, −, ^, ×, ÷, √, Ln, Log, … |
No. of constants per gene | 10 |

Statistical Measures | Training | Testing |
---|---|---|
MSE | 1.38 × 10^{−3} | 4.30 × 10^{−3} |
RMSE | 3.72 × 10^{−2} | 6.56 × 10^{−2} |
MAE | 3.06 × 10^{−2} | 5.25 × 10^{−2} |
RSE | 7.15 × 10^{−2} | 23.29 × 10^{−2} |
RAE | 26.85 × 10^{−2} | 47.87 × 10^{−2} |
Correlation coefficient (%) | 96.36 | 87.68 |
R^{2} (%) | 92.85 | 91.93 |

Constant | Value |
---|---|
C_{13} | −4.7530 |
C_{15} | −5.4106 |
C_{18} | −7.5887 |
C_{11} | 6.7068 |
C_{10} | 2.2652 |
C_{28} | 11.5608 |
C_{29} | 0.7480 |
C_{22} | −59.7849 |
C_{35} | −7.7281 |
C_{32} | 9.5931 |
C_{49} | 0.0875 |
C_{42} | 0.4019 |

**Table 8.** The statistics of the input variables to develop the new correlation with regards to the response variable (RF).

Attribute | ${\pi}_{1}$ | ${\pi}_{2}$ | ${\pi}_{3}$ | ${\pi}_{4}$ | ${\pi}_{5}$ | ${\pi}_{6}$ | ${\pi}_{7}$ |
---|---|---|---|---|---|---|---|
Importance | 1.89 × 10^{−2} | 1.70 × 10^{−2} | 3.66 × 10^{−1} | 1.88 × 10^{−1} | 2.30 × 10^{−2} | 1.53 × 10^{−2} | 3.71 × 10^{−1} |
Minimum | 3.16 × 10^{−2} | 4.71 × 10^{−2} | 1.61 | 6.17 × 10^{−2} | 6.25 × 10^{−1} | 5.00 × 10^{−1} | 1.00 |
Maximum | 1.56 × 10^{−1} | 2.32 × 10^{−1} | 16.09 | 6.17 × 10^{−1} | 1.60 | 1.00 | 3.00 |
Average | 8.22 × 10^{−2} | 1.25 × 10^{−1} | 8.62 | 3.30 × 10^{−1} | 1.06 | 7.58 × 10^{−1} | 2.14 |
Median | 7.39 × 10^{−2} | 1.10 × 10^{−1} | 1.61 | 6.17 × 10^{−2} | 1.00 | 1.00 | 2.00 |
Standard deviation | 4.63 × 10^{−2} | 7.19 × 10^{−2} | 7.29 | 2.79 × 10^{−1} | 3.58 × 10^{−1} | 2.52 × 10^{−1} | 8.33 × 10^{−1} |
R^{2} (vs. Response) | 2.62 × 10^{−3} | 2.30 × 10^{−4} | 3.24 × 10^{−1} | 3.24 × 10^{−1} | 4.51 × 10^{−3} | 4.12 × 10^{−3} | 5.67 × 10^{−1} |

**Table 9.** Comparison of RF and relative errors of WAG injection generated by the developed correlation, mathematical model, and experimental work.

N | RF_{GEP} (%) | Relative Error^{GEP-Exp} (%) | RF_{Target} (%) | Relative Error^{GEP-Target} (%) | RF_{Exp} (%) |
---|---|---|---|---|---|
0.5 | 59.35 | 18.77 | 50.08 | 18.51 | 49.97 |
1 | 69.20 | 5.50 | 64.64 | 7.05 | 65.59 |
1.5 | 76.08 | 5.65 | 71.94 | 5.75 | 72.01 |
2 | 81.56 | 3.08 | 79.37 | 2.76 | 79.12 |
2.5 | 86.22 | 1.33 | 84.30 | 2.28 | 85.09 |
3 | 90.32 | 3.48 | 92.00 | 1.83 | 93.58 |

${\pi}_{3}=\frac{{\mu}_{o}}{{\mu}_{g}}$ | N | RF (%) | ${\pi}_{4}=\frac{{\mu}_{o}}{{\mu}_{w}}$ | RF (%) |
---|---|---|---|---|
1.59 | 1 | 69.20 | 0.0611 | 69.20 |
1.59 | 2 | 81.56 | 0.0611 | 81.56 |
1.59 | 3 | 90.32 | 0.0611 | 90.32 |
2.00 | 1 | 63.15 | 0.100 | 62.14 |
2.00 | 2 | 75.46 | 0.100 | 74.89 |
2.00 | 3 | 84.14 | 0.100 | 84.25 |
3.00 | 1 | 57.00 | 0.120 | 59.40 |
3.00 | 2 | 70.00 | 0.120 | 73.42 |
3.00 | 3 | 76.66 | 0.120 | 81.19 |

