Application of Gene Expression Programming (GEP) in Modeling Hydrocarbon Recovery in WAG Injection Process

: Water alternating gas (WAG) injection has been successfully applied as a tertiary recovery technique. Forecasting WAG ﬂooding performance using fast and robust models is of great importance to attain a better understanding of the process, optimize the operational conditions, and avoid high-cost blind tests in laboratory or pilot scales. In this study, we introduce a novel correlation to determine the performance of the near-miscible WAG ﬂooding in strongly water-wet sandstones. We conduct dimensional analysis with Buckingham’s π theorem technique to generate dimensionless numbers using eight key parameters. Seven dimensionless numbers are employed as the input variables of the desired correlation for predicting the recovery factor of a near-miscible WAG injection. A veriﬁed mathematical model is used to generate the required training and testing data for the development of the correlation using a gene expression programming (GEP) algorithm. The provided data points are then separated into two subsets: training (67%) to develop the model and testing (33%) to assess the models’ capability. Conducting error analysis, statistical measures and graphical illustrations are provided to assess the effectiveness of the introduced model. The statistical analysis shows that the developed GEP-based correlation can generate target data with high precision such that the training phase leads to R 2 = 92.85% and MSE = 1.38 × 10 − 3 and R 2 = 91.93% and MSE = 4.30 × 10 − 3 are attained for the testing phase. The relative importance of the input dimensionless groups is also determined. According to the sensitivity analysis, decreasing the oil–water capillary number results in a signiﬁcant reduction in RF in all cycles. Increasing the magnitudes of oil to gas viscosity ratio and oil to water viscosity ratio lowers the RF of each cycle. It is found that oil to gas viscosity ratio has a higher impact on RF value compared to oil to water viscosity ratio due to a higher viscosity gap between the gas and oil phases. It is expected that the GEP, as a fast and reliable tool, will be useful to ﬁnd vital variables including relative permeability in complex transport phenomena such as three-phase ﬂow in porous media.


Introduction
Water alternating gas (WAG) injection is a common enhanced oil recovery (EOR) technique; this recovery method has been recognized as a cost-effective and successful method for greater oil production [1,2]. Over recent decades, there have been some field applications, numerical simulations, and laboratory experiments on WAG injection processes [1,2]. In a study conducted by Skauge et al. [3], it was reported that WAG injection can increase the oil recovery factor by 5-10% in the field scale.
Simulation and numerical modeling of the WAG flooding processes have been investigated in the literature to explore the effect of various key parameters such as the number of injected cycles, WAG ratio [4,5], wettability [6,7], relative permeability and hysteresis [7,8] on the WAG performance. One of the main problems in simulating and optimizing a WAG injection process is the development of reliable and accurate correlations for rock and fluid characteristics and the number of trapped phases in the porous medium [9]. The cyclic process of the WAG operation and the three-phase flow in the reservoir cause extra burden in terms of computational costs and modeling robustness. The main focus of most previous simulation and optimization research investigations related to the WAG flooding has been on the analytical methods of solving the governing equations of three-phase flow, and auxiliary equations such as relative permeability models in the oil production process [10,11].
Despite the availability of several correlations for determination of three-phase relative permeability and capillary pressure for WAG flooding [12,13], they fail in different situations where there is mass transfer between phases, especially when one phase disappears. Dynamic behaviors caused by the evaporation of the trapped oil and dissolution of the injected gas in the residual oil also add extra complications to the WAG process. Many past studies were not able to have a proper evaluation of WAG process performance due to incorporation of inappropriate three-phase relative permeability or capillary pressure expressions/data in their models [14][15][16][17].
In the oil and gas industry, laboratory coreflood experiments have been conventionally used as a representative of actual hydrocarbon reservoirs to evaluate the effectiveness of EOR processes such as WAG flooding prior to field applications. Coreflood experiments can be also used to quantify flow properties such as capillary pressure and relative permeability curves. The acquired data from a small laboratory model can be utilized to predict the behavior of other similar systems. In recent decades, several research investigations have focused on WAG efficiency evaluation [18], process simulation [19], pilot tests [1,2], process governing mechanisms [20], hysteresis effects [21], and management of WAG implementations [22]. However, the WAG process has not been well-developed or understood yet. One of the overlooked aspects in the WAG process is the development of predictive tools prior to conducting pilot and field tests [23]. In some cases, complicated and timely simulations and modeling approaches are employed to examine the efficiency of the WAG process. For instance, the computational time for simulation and optimization of some reservoirs, particularly heterogeneous cases, might be thousands of hours [24]. Moreover, the optimization process conducted by simulation modeling approaches is based on a one parameter at a time process, without involving the effect of interactions between uncertain parameters on the process output [25]. Therefore, fast and low-cost tools for the assessment of WAG injection are required to overcome this issue.
As the first step in the prediction process, it is required to provide a set of relationships between the two systems, i.e., the given system or experimental model which is used to estimate the behavior of the target equivalent system, and the one of actual interest (the prototype system). These relationships are generally known as the scaling laws, similarity laws, or similarity requirements [26]. The scaling process eventually leads to developing dimensionless numbers, which are known as dimensionless scaling groups.
Smart computational tools have been extensively utilized for the prediction of various properties and parameters in health, safety, and chemical and petroleum engineering, including reservoir fluid and rock properties, process optimization, and performance assessment of EOR techniques [27][28][29][30][31][32]. For instance, machine learning and artificial intelligence have been employed to investigate/forecast the unloading gradient pressure in continuous gas-lift systems [33], air specific heat ratios [34], CO 2 absorption in piperazine [35], CO 2 conversion to urea [36], permeate flux during the filtration [37], CO 2 storage efficiency [38], fouling occurrence in membrane bioreactors [39], and the recovery performance of CO 2 -WAG injection processes [40,41]. In recent years, various optimization techniques such as genetic algorithm (GA) and particle swarm optimization (PSO) have been widely used as reliable approaches to optimize different upstream and downstream processes in the oil and gas industry [42]. The primary version of GA was then modified into a new algorithm, called the genetic programming (GP) approach. Gene expression programming (GEP) is a new and updated version of GA that addresses most of the drawbacks and concerns around previous versions [43]. Generally, GEP is able to obtain a solution for regression problems [44]. Unlike in the GP program where the individuals' populations are symboli-Energies 2021, 14, 7131 3 of 28 cally considered as expression trees (ETs), the individuals' populations are regarded as the linear chromosomes in the GEP algorithm [45,46]. The GEP method has been employed in various research subjects in petroleum engineering, including estimating mixture viscosity in solvent-assisted oil recovery process [47], CO 2 solubility in crude oil [48], minimum miscibility pressure (MMP) of live oil systems [49], petroleum emulsions' viscosity [50], surfactant retention in porous media [51], residual gas saturation in spontaneous and forced imbibition processes [52], and oil price [53]. However, the application of this smart technique has not been reported for predicting the oil recovery of the near-miscible WAG injection process in the open-source literature. Near-miscible WAG injection studies are limited to few experimental investigations, which are highly time-consuming, expensive, and more importantly not comprehensive in terms of sensitivity analysis. Today, with significant developments in computer and data science, it is feasible to introduce robust, fast, and reliable models to forecast the performance of complex EOR processes such as WAG injection. The main objective of this work is to conduct the scaling analysis using the data provided by the validated and reliable implicit-pressure-explicit-saturation (IMPES) mathematical model for the development of a robust empirical model, which is able to predict the recovery factor of a WAG injection at the near-miscible condition.
This manuscript is structured as follows. After the introduction, in the theory and background section, we discuss a proper description of the WAG injection technique and a background on dimensional analysis approaches, and the introduction and fundamentals of GEP algorithms. In the methodology section, the application of a dimensionless model and its principles for generating the required data are discussed. Then, the design of experiment is provided, which is required to know the number of required runs, data, and dimensionless scaling groups. Afterwards, the analysis of variance (ANOVA), procedure of the GEP algorithm, and the model development are described. In the results and discussions section, the results of the ANOVA test and the relative importance of the input parameters are elaborated. We also discuss the results of the testing and training phases, statistical error analysis, GEP correlation, and sensitivity analysis. In the last section of this manuscript, the summary and main conclusions are given.

WAG Mechanisms
Primary water flooding (WF) and gas injection (GI) processes are the two most conventional and common techniques in hydrocarbon production among the tertiary and secondary methods [54]. The efficiency of these processes relies on the volumetric sweep efficiency (the macroscopic scale) and the microscopic displacement efficiency [55].
The major drawbacks during GI processes are the gas fingering phenomenon caused by unfavorable (high) mobility ratio and insignificant macroscopic (volumetric) production efficiency [56]; and early gas breakthrough occurrence which has been reported in various pilot and field applications [57][58][59][60] caused by gas fingering and channeling in layers with high permeability. The main issues with the GI performance are related to the mobility of the fluids and reservoir conformance [56]. On the other hand, the cyclic injection of water and gas (in a WAG injection process) reduces the effective permeability of the gas phase, resulting in stabilized fluids' fronts, and thereby enhancing the overall sweep efficiency of the system. Gravity segregation is another governing mechanism during a WAG flooding process, which is caused by the density difference between the active phases in the medium [61]. During the WF cycles, through displacing the oil bank at the bottom layers of the reservoir (bypassed during the GI cycle), gravity segregation enhances the vertical sweep efficiency of the system [61]. Figure 1 schematically depicts a general WAG flooding process including the distribution of the injecting (gas and water) and displaced (oil) phases in a typical porous medium. general WAG flooding process including the distribution of the injecting (gas and water) and displaced (oil) phases in a typical porous medium.  [62]).
The immiscible WAG (IWAG) is generally regarded as a WAG injection process in which the gas phase does not develop any miscibility with the oil phase. However, during a IWAG process, a slight mass transfer between the gas slug and the oil bank may occur, which improves the oil recovery. For a flow system with high gas-oil IFT (IWAG), despite the connectivity of the oil phase, the recovered oil by the film flow is insignificant when there are no other driving forces such as gravity forces. In a miscible gas-injection process, since there is no interface between the oil and gas phases, the fluids' behavior is the same as a single-phase flow in which the oil recovery occurs only through dispersion of phases and molecular diffusion mechanisms. However, WAG injection in the near-miscible gasinjection model is another effective process from operational and economic points of view [20].

Dimensional Analysis
Two conventional techniques are proposed to generate the dimensionless groups: inspectional analysis [63,64], and dimensional analysis [61,62]. The inspectional analysis runs based on the governing differential equations of fluid displacement, while the dimensional analysis performs based on the pertaining variables that directly affect the system's behavior. In the inspectional analysis procedure, the differential governing equations of the process, including the boundary and initial conditions, are transformed into dimensionless configurations through employing normalized variables into the differential equations [63,64]. This transformation leads to obtaining dimensionless forms of both dependent and independent variables, and dimensionless forms of scaling groups. In contrast to the inspectional analysis, in the dimensionless analysis, only the pertinent variables are required, and the governing equations are not transformed. In this technique, to obtain the dimensionless scaling groups, the power products of the variables are rendered to dimensionless forms. This leads to generating a set of homogeneous, linear, and algebraic equations [61,62]. The solutions of these equations provide a series of complete independent, meanwhile not unique, dimensionless numbers. Applying Buckingham's theorem on a set of selected variables results in generating the number and the forms of The immiscible WAG (IWAG) is generally regarded as a WAG injection process in which the gas phase does not develop any miscibility with the oil phase. However, during a IWAG process, a slight mass transfer between the gas slug and the oil bank may occur, which improves the oil recovery. For a flow system with high gas-oil IFT (IWAG), despite the connectivity of the oil phase, the recovered oil by the film flow is insignificant when there are no other driving forces such as gravity forces. In a miscible gas-injection process, since there is no interface between the oil and gas phases, the fluids' behavior is the same as a single-phase flow in which the oil recovery occurs only through dispersion of phases and molecular diffusion mechanisms. However, WAG injection in the near-miscible gas-injection model is another effective process from operational and economic points of view [20].

Dimensional Analysis
Two conventional techniques are proposed to generate the dimensionless groups: inspectional analysis [63,64], and dimensional analysis [61,62]. The inspectional analysis runs based on the governing differential equations of fluid displacement, while the dimensional analysis performs based on the pertaining variables that directly affect the system's behavior. In the inspectional analysis procedure, the differential governing equations of the process, including the boundary and initial conditions, are transformed into dimensionless configurations through employing normalized variables into the differential equations [63,64]. This transformation leads to obtaining dimensionless forms of both dependent and independent variables, and dimensionless forms of scaling groups. In contrast to the inspectional analysis, in the dimensionless analysis, only the pertinent variables are required, and the governing equations are not transformed. In this technique, to obtain the dimensionless scaling groups, the power products of the variables are rendered to dimensionless forms. This leads to generating a set of homogeneous, linear, and algebraic equations [61,62]. The solutions of these equations provide a series of complete independent, meanwhile not unique, dimensionless numbers. Applying Buckingham's π theorem on a set of selected variables results in generating the number and the forms of the dimensionless groups. Many studies have focused on applications of the scaling theory and similarity laws for flow in porous systems [61,[65][66][67][68][69][70][71][72][73][74][75][76]. Initially, Leverett et al. [77] used the dimensionless groups in a research work on immiscible displacement of oil by water, which was later extended by Engleberts and Klinkenberg [78]. Asghari et al. [79] used the data provided by the previous performance of WF in the Weyburn field to generate an empirical correlation for estimating CO 2 injection performance. Two different correlations were introduced with respect to different injection scenarios in the Weyburn field. The primary correlation was developed according to the WAG injection in vertical wells and the second correlation was generated based on the horizontal well injection by applying the Kinder Morgan CO 2 Scoping model and utilizing the field production data. However, in their proposed model, only the oil production rate and CO 2 and water injection rates were accounted for, and other field or operational variables were not included. Liu et al. [80] suggested a new model for developing dimensionless CO 2 injection performance such as total injection (DTI), CO 2 injection (DCI), tertiary oil production (DEOR), and CO 2 production (DCP) for various WAG injection approaches. A Microsoft Excel VBA program (for injecting CO 2 pulses) was applied to develop the prototypes for forecasting the system performance. Their new methodology (pulse method) was verified using mechanistic simulation results of finite elements for different WAG injection processes or different CO 2 injection slug sizes, or both [80]. Jaber et al. [25] introduced a simple data-driven model to evaluate the miscible CO 2 -WAG injection performance in an Iraqi oil field. They employed a central composite design (CCD) to introduce a proxy model. They implemented an ANOVA to examine the effectiveness of the variables and their combinations within the model. The proposed proxy model determined the incremental oil recovery (∆FOE) as a function of reservoir properties and operational conditions including permeability, porosity, ratio of vertical to horizontal permeability, cyclic length, bottomhole pressure, ratio of injected CO 2 over water slug size, and injected CO 2 slug size.

Fundamentals of GEP
Current genetic algorithm (GA) and genetic programing (GP) techniques have been widely used in common regression engineering problems such as function fitting and time series predictions [81][82][83]. In the GA algorithm, individuals are represented as the linear strings with specified lengths (chromosomes) during all evolution levels. This prevents GA from being applied in some complex-function fitting problems [84]. However, in the GP algorithm, individuals are represented as nonlinear elements with different sizes and forms. The GP algorithm is generally able to analyze complicated functions; however, in some cases different object size variations hinder the evolutionary procedure to obtain an optimum solution. On the other hand, gene expression programming (GEP) is a new method in developing computer programs with a focus on learning models and discovering knowledge [43]. GEP combines the features of the GA and GP algorithms, while the chromosome encoding makes it different from these two evolutionary approaches. In the GEP algorithm, the individuals are encoded as the chromosomes and are considered as the linear strings with specified lengths [43]. GEP is more flexible and rigorous in probing the search space by separating the genotype and phenotype. The designed chromosomes in GEP are simple and linear. GEP demonstrates significant advantages over its peers; for instance, compared to the GA, GEP performs approximately 2-4 orders of magnitude faster in analyzing the conventional problems due to its unique individual functioning [43]. GEP is a mature genotype/phenotype system in which the genotype is completely separated from the phenotype. However, in the GP technique, the genotype and phenotype are one element; indeed, it is a simple replicator system. GEP works based on two elements (e.g., chromosomes and ETs). The chromosomes encode the candidate solution and transform the actual solution candidates into an ET. The transition process of the chromosomes into an ET is inspired by the biological process of genes encoded (in DNA) into proteins [85].

Data Collection
The accuracy and reliability of any proposed correlation are strongly linked to the identified appropriate input variables as well as the validity and reliability of employed data points used in the development stage [36,86]. The most important parameters affecting a WAG injection process include the fluid and rock properties and operational conditions. Due to the unavailability of adequate laboratory data for near-miscible WAG flooding processes, a suitable and reliable mathematical model is used [20]. The mathematical simulator approach is a two-dimensional model, with an implicit-pressure-explicit-saturation (IMPES) solution scheme. In the following subsections, the main and auxiliary equations required for developing the mathematical model are provided. For more details about the model development, technical readers may visit the study conducted by Afzali et al. [20].

Governing and Auxiliary Equations
For a three-phase and incompressible flow system, the governing equations to be solved include the mass or molar balances for all present fluids. The final form of the governing equations for an immiscible system with three phases (e.g., oil, water, and gas), after neglecting the phase dispersion and gravity effects and applying the Darcy's Law (as the momentum balance equation) can be written as follows: Equation (2) introduces the relationship between the saturations of three phases existing in the system: For an accurate simulation of a WAG injection process in which three phases are involved, selecting reliable relative permeability and capillary-pressure models accountable for three-phase flow systems is essential. Using direct measurement methods of capillary pressure data (for three-phase systems) from laboratory coreflood tests is a tedious and practically difficult process [87].
The pressure difference between two nonmiscible phases (i.e., nonwetting and wetting phases) is called capillary pressure (p c ) (see Equation (3)). Capillary pressure is a function of fluids' saturations and distribution, and porosity and permeability are the rock characteristics.
The descriptions of all parameters of Equations (1)-(3) are provided in Table 1.
In the current work, we apply a capillary pressure model appropriate for three-phase systems, suggested by Neshat and Pope [88], derived based on the Gibbs free energy. The proposed model is an extended/modified version of a two-phase capillary pressure model introduced by Skjaveland et al. [89]. This capillary pressure model is capable of being applied on all wettability states including water-wet, mixed-wet, and oil-wet systems. The general form and description of the parameters of the employed capillary pressure model are presented by Equation (A1), as given in Appendix A.  (3)).
Density of the fluid

Model Assumptions and Limitations
The following assumptions are considered to develop the mathematical model: The flow direction is considered as 1D horizontal in the system.

•
The core and fluids are considered incompressible.

•
Core is strongly water-wet and homogeneous.

•
The equilibrium of capillary forces is held in the system.

•
The temperature of the system is 38 • C and the thermal equilibrium holds in the system.

•
The capillary end effects are neglected.
In the current work, we simulate an experimental case, in which a 2-inch core sample was utilized.
Since the diameter of the core is small, the gravity effects are insignificant, and assuming one dimensional flow is satisfactory for simulations. For each cycle, the two-phase relative permeability parameters are needed to be tuned and optimized. To do so, experimental two-phase data are used for tuning the two-phase relative permeability models. Afterwards, the tuned two-phase relative permeability models are utilized in the threephase relative permeability model. For each of the WF and GI injection modes in the WAG flooding model, these parameters are optimized repeatedly to incorporate the hysteresis involved in the saturation of phases in the cyclic injection of fluids.
In the GEP algorithm, the number of constants per gene determines the maximum number of constants that can be allocated for a gene. The higher the number of constants, the more complicated the model is and a more accurate algorithm is obtained. Therefore, there should be a balance between the accuracy and degree of complexity while selecting the optimized GEP configuration. The optimal parameters are normally found randomly for development of any new correlation through GEP. Another limitation of the proposed model is impotence of the system analysis during each injection mode. Since the developed correlation only reports the recovery factor at the end of each cycle or injection mode (N = 1-3), other vital features such as instant RF and breakthrough time are not detected or determined.

Design of Experiment (DOE)
Due to the three-phase flow in the porous medium and the cyclic nature of the WAG flooding, the number of influential variables is higher than other conventional oil recovery techniques. Using a design of experiment (DOE) approach can reduce the computational costs required for mathematical modeling runs. The key aspect of a DOE is the selection of controlling factors. In this study, oil viscosity (µ o ), gas and water injection rates (q g and q w ), system permeability (K), pore volume injection of fluids (PVI), and number of cycles (N) are chosen to determine the dimensionless numbers and their significance. Using a two-level full factorial DOE, each parameter is studied at two levels with the upper and lower bounds coded as +1 and −1. Table 2 lists the upper and lower levels of all variables.
In the case of six factors (variables) at two levels, the model run design is called 2 6 full factorial design. Thus, a total of 96 runs are required with no replicates of each run to obtain the response.

Dimensionless Scaling Groups
As mentioned before, having an adequate knowledge of WAG processes helps use Buckingham's π theorem and derive the dimensional groups in the dimensional analysis. We consider six variables, four fixed parameters, and the recovery factor (RF) as the response variable to obtain the dimensionless numbers (see Table 3). Table 3. Variables needed to develop the dimensionless numbers.

Variables
Fixed Parameters Response Variable Hence, seven dimensionless numbers are introduced using Buckingham's π theorem, which are listed below: where σ og and σ ow denote the interfacial tension between the oil-gas and oil-water phases, respectively; µ i refers to the viscosity of the phase i; q w and q g represent the water and gas injection rates; PVI is the pore volume injection during each injection mode (WF or GI); and N introduces the number of injected cycles.

Analysis of Variance (ANOVA)
Lorenzen and Anderson [90] reported that ANOVA is the most accurate method to investigate the significant effects of factors. In the ANOVA table, the F test and p values represent the main and interaction effects, respectively [91]. The p value denotes the probability of error involved in the obtained results [92,93]. Accordingly, the smaller the values of p, the more significant the corresponding coefficient term is [94]. The p value corresponds to an α value of 0.05. For a factor with α lower than 0.05, the factor is considered as a significant parameter. In this study, we perform an ANOVA analysis for the simulation results and corresponding dimensionless numbers. Table 4 presents the results of the ANOVA to statistically assess the significance of each factor. Thus, the relevancy and importance of the input variables in a WAG flooding process can be determined through implementing the ANOVA.

GEP Procedure
As previously mentioned, the GEP algorithm uses two entities: the ETs and chromosomes. A chromosome consists of constant and variable terminals as well as prearranged functions in one or more genes with equal lengths [85]. The function and variables are the input data, while the constant values are generated by the algorithm within a range specified by the user. Each gene contains a head made of functions, variables, and constants, and a tail of terminals [85]. The size of the head (h) is specified by the user; however, the size of the tail (t) is computed as a function of "h" and a parameter "n", which is defined as the number of elements in the function sets. The tail size of a chromosome can be obtained using the following equation: t = h(n − 1) + 1 (11) where t and h are the tail and head of the gene; and n represents the number of elements of the function utilized in the head of the gene. Figure 2 demonstrates an example of a two-gene chromosome containing four functions of ×, ÷, +, and √ , and three terminals including a, b, and c. In Figure 2, both the mathematical and the equivalent expression tree forms using Kara language are illustrated.  [46]).
Every character is set in one spot from zero to seven, which is shown by 01234567. In the case of multigenetic chromosomes, all ETs are connected by their root nodes through a linking function such as Boolean function [95]. The computational procedure of the GEP algorithm is summarized in the following steps [45]: 1. Initializing the population through generating random chromosomes of a certain number of individuals. 2. Fitting the population individuals according to the fitness functions. 3. Selecting some individuals and copying them for the next generation based on their fitness (simple elitism) [96]. 4. Applying the same procedure for the new population including the selection of the environment, expression of genomes, selection of the population individuals, and reproduction with modification. 5. Repeating the previous steps until the termination criteria are met.

Model Development Steps
Using the GEP algorithm, there is no need to assume prespecified correlation formats for accurate prediction of target data. Hence, the GEP's computation process finds the most accurate forms of independent parameters by itself. As previously discussed, the main parameters affecting the RF of a WAG injection process are the oil viscosity, water and gas injection flowrates, permeability of the system, PVI, and the number of injected cycles (N). Therefore, the variables are used to generate the dimensionless numbers; these groups are assumed to be correlating parameters for predicting the RF of WAG flooding: , , , Fixed Parameters) where = ( ).
The following steps are taken to develop a new WAG RF correlation: 1. Generating the population using random chromosome individuals and applying correlation formats as pars trees using the functions or operators (×, +, −), and terminals which are functions of input variables and output results (RF of WAG). 2. Computing the fitness value for each individual of the generated population using the following objective function (OF): Every character is set in one spot from zero to seven, which is shown by 01234567. In the case of multigenetic chromosomes, all ETs are connected by their root nodes through a linking function such as Boolean function [95]. The computational procedure of the GEP algorithm is summarized in the following steps [45]: 1.
Initializing the population through generating random chromosomes of a certain number of individuals.

2.
Fitting the population individuals according to the fitness functions.

3.
Selecting some individuals and copying them for the next generation based on their fitness (simple elitism) [96].

4.
Applying the same procedure for the new population including the selection of the environment, expression of genomes, selection of the population individuals, and reproduction with modification.

5.
Repeating the previous steps until the termination criteria are met.

Model Development Steps
Using the GEP algorithm, there is no need to assume prespecified correlation formats for accurate prediction of target data. Hence, the GEP's computation process finds the most accurate forms of independent parameters by itself. As previously discussed, the main parameters affecting the RF of a WAG injection process are the oil viscosity, water and gas injection flowrates, permeability of the system, PVI, and the number of injected cycles (N). Therefore, the variables are used to generate the dimensionless numbers; these groups are assumed to be correlating parameters for predicting the RF of WAG flooding: where RF WAG = f (π i ).
The following steps are taken to develop a new WAG RF correlation: 1.
Generating the population using random chromosome individuals and applying correlation formats as pars trees using the functions or operators (×, +, −), and terminals which are functions of input variables and output results (RF of WAG).

2.
Computing the fitness value for each individual of the generated population using the following objective function (OF): (13) where N is the number of data points used for the GEP implementation, and subscriptions "prd" and "exp" denote the RF values predicted by the GEP algorithm and the RF values generated by the verified mathematical model to be used as the experimental (or target) data, respectively.

1.
Selecting some individuals and copying them into the next generation based on their fitness (simple elitism). In this work, the tournament method is employed to select adequate varieties of the population in each generation [44,45].

2.
Applying the genetic operators on selected chromosomes, including: -Replication operator: This operator copies the chromosome's structure selected in step 3. -Mutation operator: As the most important step in the GEP algorithm, the mutation can occur anytime and at any position in a genome, as long as the mutated chromosome meets the validity criteria. The mutation operator changes the head and tail terminals, while the original structure of the chromosome is preserved. -Inversion: The inversion operator is only applied to the heads of genes, where any sequence is randomly selected and employed. The inversion operator selects the chromosome, the gene to be modified, and the initiation and termination points of the sequence to be inverted at random.

3.
Transposition and insertion sequence elements: A portion of the genomes, which can be activated and jump to another place in the chromosome, are called the transposable elements of the GEP program. Ferreira [45] divided these elements into three types: "short fragments with either a terminal or function in the first position transpose to the head of genes, short fragments with a function in the first position that transpose to the rest of the head of genes (root IS elements or RIS elements), and entire genes that transpose to commencing of chromosomes." 4.
Recombination: This step normally involves two parent chromosomes to produce two new chromosomes through combining various parts of the parents through three approaches: linking one-point recombination, two-point recombination, and gene recombination [35]. Accordingly, the new generation will be reproduced, and the procedure is continued until the termination criteria are met.
In this study, the data are distributed into two categories: training and testing/validating data subsets by a ratio of 67% and 33%, respectively. The training phase is carried out for the developing the model. After this stage, the validation data set is used to assess the validity of the model. Figure 3 illustrates a schematic flowchart of the procedure applied in this study to develop a correlation for the RF of WAG. Energies 2021, 14, x FOR PEER REVIEW 12 of 29 Figure 3. A schematic flowchart of workflow for developing the RF correlation of WAG in this research. Figure 3. A schematic flowchart of workflow for developing the RF correlation of WAG in this research.

Model Development
In order to develop a new reliable correlation for estimating the RF of a WAG injection using the GEP approach, some important factors/parameters in the GEP strategy including population size, number of chromosomes, head size, number of genes, type of the fitness function, map operators, and number of constants per genes, are required. The optimal or adjusted parameters for this work are listed in Table 5. For instance, the number of constants per gene determines the maximum number of constants that can be allocated for a gene. The higher the number of constants, the more complicated and accurate the algorithm becomes. Therefore, a balance should be reached between the accuracy and degree of complexity while selecting the optimized GEP configuration. The optimal parameters are normally found randomly for the development of any new correlation through GEP. In this modeling method, 67% of data for the oil recovery factor is allocated to the training step. Then, 33% of the entire database is considered as unseen data for the testing phase. It is found that the model is satisfactory and can predict the unseen data within almost the same accuracy obtained in the training phase. The model is developed based on the GEP algorithm using the GeneXproTools software [97]. To examine the accuracy of the newly developed model, a systematic error analysis should be conducted. The statistical parameters including mean square error (MSE), root-mean-square error (RMSE), mean absolute error (MAE), residual standard error (RSE), relative absolute error (RAE), and coefficient of determination (R 2 ) are used for statistical analysis of the developed model. Thus, the error analysis can effectively test the reliability and accuracy of the proposed model. For instance, the R 2 parameter demonstrates the degree of a match between the target data generated by the mathematical model and the calculated RF data using the newly proposed correlation. Equations (A2) to (A6) in Appendix A express the mathematical formulas of the statistical measures used in this study. Table 6 presents the results of statistical error analysis for both the training and testing phases. The low values of error as well as the high magnitudes of correlation of determination (R 2 Training = 0.93 and R 2 Testing = 0.92) for both phases confirm the effectiveness and precision of the new GEP correlation. Moreover, cross plots or parity diagrams and residual scattering error-distribution plots are provided to graphically investigate the error analysis. After conducting the ANOVA test, the selected dimensionless numbers (π 1 to π 7 ) are introduced as the variables to the GEP simulator for generating the correlation. The final form of the developed correlation using the GEP algorithm for predicting the recovery factor of a WAG process is expressed as follows: The constant values of Equations (14)- (18) are listed in Table 7.  (14)- (18).

Constant Value
−7.5887 C 11 6.7068 C 10 2.2652 C 28 11.5608 C 29 0.7480 C 22 −59.7849 C 35 −7.7281 C 32 9.5931 C 49 0.0875 C 42 0.4019 Table 8 shows the statistical information of the input variables, i.e., dimensionless numbers (π 1 to π 7 ) including the minimum, maximum, standard deviation, slope, intercept, correlation, and R-square versus the response variable which is the RF. Figure 4 demonstrates the newly developed correlation in the form of an expression tree (ET) diagram. The generated ET consists of four sub-ETs (four genes) and each sub-ET is linked with the "addition" operator to the others. The input values to the model (π 1 to π 7 ) are expressed as d 0 to d 6 within the sub-ETs.

Relative Importance (RI) of Input Variables
The importance of each input variable (e.g., dimensionless numbers) used for developing a correlation is associated with the weight and effect of the variable on the objective function; this is important for better design and optimization of the corresponding operation. The relative importance of the dimensionless numbers generated in this study is depicted in Figure 5. According to Figure 5, π 7 , or the number of WAG cycles, has the highest impact on the developed correlation with a relative importance (RI) of 37.14%. This is logical since through injecting consecutive WAG injection cycles (and thereby increasing the number of cycles (N)), a higher recovery factor is expected from the porous system. After π 7 , π 3 and π 4 are reported as the most important parameters. According to Figure 5, there is a noticeable difference between the ratio of oil to gas viscosities (π 3 ) and the viscosity ratio of oil to water (π 4 ) in terms of variable importance. This is due to a considerable viscosity difference between the hydrocarbon phase and the injected gas phase, compared to the oil and water viscosity difference. Moreover, the injection rate ratio of the water to the gas (π 5 ), inverse of oil-water capillary number (π 1 ), inverse of oil-gas capillary number (π 2 ), and PVI injection (π 6 ) have relative importance values of 2.30%, 1.89%, 1.69%, and 1.52%, respectively.

Relative Importance (RI) of Input Variables
The importance of each input variable (e.g., dimensionless numbers) used for developing a correlation is associated with the weight and effect of the variable on the objective function; this is important for better design and optimization of the corresponding operation. The relative importance of the dimensionless numbers generated in this study is depicted in Figure 5. According to Figure 5, , or the number of WAG cycles, has the highest impact on the developed correlation with a relative importance (RI) of 37.14%. This is logical since through injecting consecutive WAG injection cycles (and thereby increasing the number of cycles (N)), a higher recovery factor is expected from the porous system. After , and are reported as the most important parameters. According to Figure 5, there is a noticeable difference between the ratio of oil to gas viscosities ( ) and the viscosity ratio of oil to water ( ) in terms of variable importance. This is due to a considerable viscosity difference between the hydrocarbon phase and the injected gas phase, compared to the oil and water viscosity difference. Moreover, the injection rate ratio of the water to the gas ( ), inverse of oil-water capillary number ( ), inverse of oil-gas capillary number ( ), and PVI injection ( ) have relative importance values of 2.30%, 1.89%, 1.69%, and 1.52%, respectively. In Figure 6, the cross plots show a part of the error analysis for the training and testing phases of the employed connectionist tool. As is clear from Figure 6 (panels a and b), a good match is observed between the target data and RF predictions based on the newly developed correlation. The magnitudes of the coefficient of determination R 2 = 0.9285 for In Figure 6, the cross plots show a part of the error analysis for the training and testing phases of the employed connectionist tool. As is clear from Figure 6 (panels a and b), a good match is observed between the target data and RF predictions based on the newly developed correlation. The magnitudes of the coefficient of determination R 2 = 0.9285 for the training phase, and the coefficient of determination R 2 = 0.9193 for the testing phase confirm that the proposed correlation is reliable and accurate for predicting oil recovery factor in a WAG injection process. the training phase, and the coefficient of determination R 2 = 0.9193 for the testing phase confirm that the proposed correlation is reliable and accurate for predicting oil recovery factor in a WAG injection process.  Figure 7 shows the residual plot with the residual error, (Equation (19)) on the yaxis and the predicted recovery factor values on the x-axis.
According to Figure 7, for both testing and training phases, the absolute residual error values are less than 0.1. It is concluded that the obtained data are unbiased within an average of approximately zero around the zero residual line; it also indicates a homoscedastic behavior (constant variance).   Figure 7 shows the residual plot with the residual error, e i (Equation (19)) on the y-axis and the predicted recovery factor values on the x-axis.
the training phase, and the coefficient of determination R 2 = 0.9193 for the testing phase confirm that the proposed correlation is reliable and accurate for predicting oil recovery factor in a WAG injection process.  Figure 7 shows the residual plot with the residual error, (Equation (19)) on the yaxis and the predicted recovery factor values on the x-axis.
According to Figure 7, for both testing and training phases, the absolute residual error values are less than 0.1. It is concluded that the obtained data are unbiased within an average of approximately zero around the zero residual line; it also indicates a homoscedastic behavior (constant variance).  According to Figure 7, for both testing and training phases, the absolute residual error values are less than 0.1. It is concluded that the obtained data are unbiased within an average of approximately zero around the zero residual line; it also indicates a homoscedastic behavior (constant variance).

Evaluation of Developed Correlation
To assess the performance of the newly developed correlation (Equations (14)-(18)), the predicted recovery factors of a three-cycle WAG injection process are compared with the experimental data and the results obtained by the mathematical model. In the selected experiments, the WAG flooding process was conducted at the near-miscible condition (T = 38 • C, and p = 12.7 MPa), and in a strongly water-wet sandstone starting with a primary waterflooding (two-phase flow where s gi = 0). The process was then followed with the first gas injection in which the first cycle of the WAG injection was complete (N = 1). The consecutive flooding of water and gas continued for three cycles (N = 3). The process was terminated after the third gas injection where no significant amount of oil was recovered from the porous system. In the experiments, the process was conducted at a WAG ratio of 1:1 with a constant gas injection rate (e.g., q inj = 25 cm 3 /h). The RF against the number of injected cycles (N), based on the experimental data, predicted values by the new correlation, and the estimated values by the mathematical model, is presented in Table 9 and Figure 8. Table 9. Comparison of RF and relative errors of WAG injection generated by the developed correlation, mathematical model, and experimental work.

Evaluation of Developed Correlation
To assess the performance of the newly developed correlation (Equations (14)-(18)), the predicted recovery factors of a three-cycle WAG injection process are compared with the experimental data and the results obtained by the mathematical model. In the selected experiments, the WAG flooding process was conducted at the near-miscible condition (T = 38 °C, and p = 12.7 MPa), and in a strongly water-wet sandstone starting with a primary waterflooding (two-phase flow where sgi = 0). The process was then followed with the first gas injection in which the first cycle of the WAG injection was complete (N = 1). The consecutive flooding of water and gas continued for three cycles (N = 3). The process was terminated after the third gas injection where no significant amount of oil was recovered from the porous system. In the experiments, the process was conducted at a WAG ratio of 1:1 with a constant gas injection rate (e.g., qinj = 25 cm 3 /h). The RF against the number of injected cycles (N), based on the experimental data, predicted values by the new correlation, and the estimated values by the mathematical model, is presented in Table 9 and Figure 8.   After the primary water flooding (N = 0.5), the recovery factor of RF = 50.00% was obtained in the experimental run. The mathematical model can compute a magnitude of RF = 50.08%, confirming the accuracy and reliability of the numerical model. The newly developed correlation forecasts a RF = 59.35% at the end of the primary waterflooding. Following the production operation by conducting the first gas injection (N = 1), the experiment resulted in the RF= 65.59%; this was predicted by the mathematical model with a value of RF = 64.64%, implying excellent agreement. The proposed correlation predicts the RF = 69.20% for this stage. After that, two more cycles were injected into the porous medium and the final RF = 93.58%, 92.00%, and 90.32% were obtained through the experimental phase, numerical model, and newly developed correlation, respectively. The details of the RF at each injection mode are given in Table 9. The relative errors in estimating the RF at different cycles (N = 1 to 3) are also listed in Table 9. According to Table 9 and Figure 8, it is found that the correlation developed by the GEP algorithm is able to successfully predict the RF of the intermediate cycles, and more importantly, the ultimate RF with a relative error of 3.48% at N = 3.

Effect of Capillary Number
The efficiency of oil recovery techniques depends on the interplay between various forces at the pore scale and the macroscopic scale [98]. The capillary and viscous forces are not in favor of each other when implementing an EOR/IOR process. The capillary forces are the cause of fluid entrapment during immiscible/near-miscible processes. However, viscous forces of the injecting phase act in opposite to the capillary forces. The capillary forces are affected by the interfacial tension (IFT) between phases, wettability state of the system, and the pore geometry in which the blob entrapment of phases occurs. On the contrary, viscous forces are controlled by the rock permeability, applied pressure drop, and the viscosity of the injected fluid (the displacing phase) [99]. In the oil and gas industry, capillary desaturation curves are well recognized for highlighting the properties and geometry of the porous systems as well as the fluids distribution within the pores [100]. Increasing the capillary number has always been set as a target for designing EOR/IOR processes in order to achieve a higher oil recovery from reservoirs. Capillary number (N ca ) is defined as the ratio of viscous forces to capillary forces [101]. There are numerous expressions for the capillary number (N ca ). Among the proposed versions, the capillary number introduced by Saffman and Taylor [102] is the most common form, as given below: where the v is the superficial velocity; µ stands for the viscosity of the displacing phase; and σ refers to the interfacial tension between fluids. In this work, after applying Buckingham's π theorem, variations (inverse) of the capillary number between the oil-water (N ow ca ) and oil-gas (N og ca ) systems are generated as follows: where σ og and σ ow denote the interfacial tension between the oil-gas and oil-water phases, respectively; µ i refers to the viscosity of phase i; q w and q g represent the water and gas injection rates; and K is the permeability of the medium. Since π 1 and π 2 show approximately the same relative importance within the developed correlation for predicting the RF of the WAG flooding process, a sensitivity analysis on the impact of π 1 on the RF of a WAG injection is conducted. To investigate the impact of capillary number (the inverse of π 1 ) on oil RF, the results of WAG RF for three cycles (N = 1, 2, 3) at three orders of magnitude of π 1 are compared in Figure 9. According to the results presented in Figure 9, by increasing the π 1 from the initial value of 8. 16  RF of a WAG injection process decreases by 16.92% upon a decrease in the capillary number by four orders of magnitude. The same trend of RF reduction at higher values of is noticed at the end of the first and middle injection cycles. Increasing by four orders of magnitude (corresponding to decreasing the capillary number by four orders of magnitude) lowers the ultimate RFs of the first (N = 1), and the second (N = 2) cycles by 22% and 23.1%, respectively. This result highlights the dominancy of viscous forces at high capillary numbers, resulting in more oil trapping in the porous medium and a decrease in oil RF during various cycles of a WAG flooding process.

Effect of Viscosity Ratio ( , )
The viscosity of fluids in a three-phase flow of oil, water, and gas affects the mobility ratio (M) of the displaced (oil) and displacing phases, residual oil saturation, and finally the recovery factor of a WAG flooding process. The mobility ratio is defined as follows: In Equation (23) and represent the relative permeability of the displacing (water or gas) and displaced (oil) phases, respectively. and denote the viscosities of the displacing and displaced phases.
In this research, two of the introduced dimensionless groups describe the ratio of oil to gas, and oil to water viscosities ( = , = ), which considerably affect the RF of the WAG flooding. To evaluate the impact of the viscosity ratio, WAG simulations are conducted at different values of and (Figures 10 and 11). All cases are simulated using the developed correlation in all three cycles (N = 1, 2, 3), where the rest of the dimensionless groups are fixed at the base condition corresponding to the experimental data. We evaluate the performance of the WAG flooding process at three values of : 1.59 (the base value), 2, and 3. The simulation outputs are shown in Figure 10. The RF results reveal that in all cycles, by increasing the value, the RF decreases. When increases from 1.59 (RF ultimate = 90.32%) to 2 (RF ultimate = 84.14%), and 3 (RF ultimate = 76.66%), Figure 9. Effect of oil inverse capillary number (π 1 ) on the RF of the WAG injection using the GEP correlation.

Effect of Viscosity Ratio (π 3 , π 4 )
The viscosity of fluids in a three-phase flow of oil, water, and gas affects the mobility ratio (M) of the displaced (oil) and displacing phases, residual oil saturation, and finally the recovery factor of a WAG flooding process. The mobility ratio is defined as follows: In Equation (23) k ring and k red represent the relative permeability of the displacing (water or gas) and displaced (oil) phases, respectively. µ ing and µ ed denote the viscosities of the displacing and displaced phases.
In this research, two of the introduced dimensionless groups describe the ratio of oil to gas, and oil to water viscosities (π 3 = µ o µ g , π 4 = µ o µ w ), which considerably affect the RF of the WAG flooding. To evaluate the impact of the viscosity ratio, WAG simulations are conducted at different values of π 3 and π 4 (Figures 10 and 11). All cases are simulated using the developed correlation in all three cycles (N = 1, 2, 3), where the rest of the dimensionless groups are fixed at the base condition corresponding to the experimental data. We evaluate the performance of the WAG flooding process at three values of π 3 : 1.59 (the base value), 2, and 3. The simulation outputs are shown in Figure 10. The RF results reveal that in all cycles, by increasing the π 3 value, the RF decreases. When π 3 increases from 1.59 (RF ultimate = 90.32%) to 2 (RF ultimate = 84.14%), and 3 (RF ultimate = 76.66%), the ultimate recovery factors of WAG injection after three cycles of injection decrease by 6.84% and 15.12%, for π 3 = 2, and π 3 = 3, respectively. Increasing π 3 also decreases the oil recovery at the first and second injection cycles, significantly. The RF results at each cycle are provided in Table 10. The results are consistent with the previous studies in which the greater viscosity gap between the oil and gas leads to unfavorable high mobility ratio, bypassing the oil bank (gas channeling), and early gas breakthrough [20].    . Impact of oil to gas viscosity ratio ( ) on the recovery performance of near-miscible WAG injection based on the GEP correlation. Figure 11. Impact of oil to gas viscosity ratio (π 4 ) on the recovery performance of near-miscible WAG injection based on the GEP correlation. The sensitivity analysis is also performed, considering different values of π 4 = µ o µ w . The simulations are conducted using the experimental conditions and at three cycles of consecutive injections of water and gas for three values of π 4 : 0.061 (reference value corresponding to the experimental condition), 0.10, and 0.12. The outputs of the simulations for three cycles of WAG injection are demonstrated in Figure 11 and Table 10. According to the simulation outputs of the WAG process using the GEP correlation (Equations (14)-(18)), increasing the values of π 4 from 0.061 to 0.10, and 0.12, lowers the ultimate RF by 6.72% and 10.11%, respectively. This appears to be logical as the π 4 (the ratio of oil to water viscosity) increases the mobility ratio of the oil (as the displaced phase) and water (as the displacing phase) which is unfavorable, resulting in front instability of fluids and considerable residual oil saturation. Comparing Figures 10 and 11 also highlights the superior impact of π 3 on RF of WAG injection over the impact of π 4 ; this is also confirmed while assessing the relative importance of the variables (dimensionless groups). This is due to a higher viscosity difference between oil and gas compared to the difference of oil-water viscosities, leading to higher potential of oil trapping and early breakthrough, and thereby the RF is affected more (greater reduction) at higher values of π 3 , compared to π 4 .

Summary and Conclusions
In this study, we develop a new correlation for predicting the recovery factor (RF) of a near-miscible water alternating gas (WAG) injection process. The model is developed using dimensionless groups made of the key fluid, rock, and process characteristics that impact the RF of a near-miscible WAG flooding process. Seven dimensionless groups are generated using the dimensional analysis approach through employing Buckingham's π theorem. The dimensionless numbers are used as the input variables for the newly introduced evolutionary algorithm gene expression programming (GEP) model to develop a reliable predictive tool for the RF of WAG processes. The accuracy of the proposed correlation is verified using modeling results and data from a laboratory case study taken from the literature in which near-miscible WAG flooding was conducted on a strongly water-wet sandstone core sample at 38 • C and 12.7 MPa. Based on the error analysis, the newly proposed GEP model shows a very good match with the target data. For example, R 2 = 92.85% and MSE = 1.38 × 10 −3 are attained for the training phase. The results of the relative importance (RI) of the input variables indicate that the number of injected cycles (N, π 7 ) has the highest impact on the developed correlation with an RI of 37.14%. After π 7 , π 3 and π 4 are reported as the most essential parameters with RI values of 36.61%, 18.84%, and 2.30%, respectively.
The predicted recovery factors of a three-cycle WAG injection process are compared with the experimental data; the results obtained by the mathematical model show that the correlation developed by the GEP algorithm is able to successfully predict the RF of the intermediate cycles, and more importantly, the ultimate RF with a relative error of 3.48% at N = 3.
According to the sensitivity analysis, increasing oil-water capillary number leads to an increase in RF for all cycles. In addition, an increase in the magnitudes of oil to gas viscosity ratio or oil to water viscosity ratio causes a reduction in RF of each cycle in the WAG flooding process. It is found that the viscosity ratio of oil and gas has a greater influence on the RF value, compared to viscosity ratio of oil and water because of a greater viscosity difference between the oil and gas phases.
For future work, due to the lack of predictive tools for WAG injection processes, it is recommended to conduct similar procedures to develop empirical correlations for WAG injection in other porous systems such as fractured and/or heterogeneous porous media. The presented results in this work are based on experimental data of a near-miscible WAG injection in a strongly water-wet sandstone core. This research can be extended for other miscibility conditions such as miscible/immiscible, at other wettability states and rock lithologies.