Intelligent Prediction of Minimum Miscibility Pressure (MMP) During CO2 Flooding Using Artificial Intelligence Techniques

Carbon dioxide (CO2) injection is one of the most effective methods for improving hydrocarbon recovery. The minimum miscibility pressure (MMP) has a strong effect on the performance of CO2 flooding. Several methods are used to determine the MMP, including slim tube tests, analytical models and empirical correlations. However, experimental measurements are costly and time-consuming, and mathematical models can lead to significant estimation errors. This paper presents a new approach for determining the MMP during CO2 flooding using artificial intelligence (AI) methods. In this work, reliable models are developed for calculating the minimum miscibility pressure of carbon dioxide (CO2-MMP). Actual field data were collected; 105 case studies of CO2 flooding in anisotropic and heterogeneous reservoirs were used to build and evaluate the developed models. The CO2-MMP is determined based on the hydrocarbon compositions, the reservoir conditions and the volume of injected CO2. An artificial neural network, a radial basis function, a generalized neural network and a fuzzy logic system were used to predict the CO2-MMP. The reliability of the models was compared with that of common determination methods; the developed models outperform the current CO2-MMP methods. The best model showed an absolute error of 6.6% and a correlation coefficient of 0.98. The developed models can minimize the time and cost of determining the CO2-MMP. Ultimately, this work will improve the design of CO2 flooding operations by providing a reliable value for the CO2-MMP.


Introduction
Enhanced oil recovery (EOR) processes create favorable conditions for producing oil through interfacial tension (IFT) reduction, wettability alteration or oil viscosity reduction [1]. CO2 injection is one of the most practical and effective EOR methods because it significantly reduces the oil viscosity and improves the sweep efficiency [2,3]. The minimum miscibility pressure (MMP) plays a significant role in the design of gas flooding operations. The MMP can be measured using slim tube tests [4]; however, laboratory measurements are costly and time-consuming [5]. The MMP can also be determined using empirical correlations, but these correlations can lead to significant deviations, with absolute errors of up to 25% [6–10]. Several empirical correlations have been proposed to estimate the MMP from the reservoir conditions and the fluid compositions; they were developed using regression approaches. Common examples are the Glaso, Sebastian et al. and Khazam et al. correlations [6,8,9]. Generally, the accuracy of an empirical correlation increases with the mathematical complexity of its equation. Most empirical correlations are used mainly for fast screening applications [10].
Rostami et al. [34] applied the support vector machine (SVM) technique to estimate the CO2-MMP during CO2 flooding. SVM was used to determine the CO2-MMP for live and dead crude oil systems. The developed model gave accurate predictions, with an average absolute relative deviation (AARD) of less than 3% and a minimum coefficient of determination (R2) of 0.99. However, no direct equation was reported, and the developed SVM model is considered a black-box model. Alfarge et al. [35] used laboratory measurements and field studies to characterize CO2 flooding in shale reservoirs; more than 95 case studies were used. They constructed a proxy system to predict the incremental oil recovery based on the affecting parameters. The relationship between rock properties and incremental oil recovery was explained, and the effect of permeability, porosity, total organic carbon content and fluid saturations on oil production was investigated. They noted that their findings could help to understand the complex recovery mechanisms during CO2-EOR operations.
An intensive literature review shows significant deviations between the measured and predicted CO2-MMP. Analytical and empirical models can lead to considerable estimation errors. Artificial intelligence methods can improve the prediction performance for the CO2-MMP. However, the available AI models were developed from hydrocarbon injection data, not CO2 flooding data, which may lead to unreliable predictions. Hydrocarbon injection and CO2 flooding differ significantly in terms of system disturbance and miscibility mechanisms [10,14,15]. Hydrocarbon injection is usually implemented by injecting a fluid with the same composition as the reservoir fluid, whereas CO2 flooding introduces new components that disturb the reservoir system. First contact miscibility (FCM) is usually associated with hydrocarbon injection, while injecting non-hydrocarbon fluids (such as CO2 and N2) leads to multiple contact miscibility (MCM) [10,15]. Considerable errors can be generated when hydrocarbon injection models are used to predict the CO2 flooding performance [34,35]. Therefore, a reliable model that estimates the CO2-MMP from actual CO2 flooding data is highly needed.
In this paper, a reliable approach is presented to determine the MMP during CO2 miscible flooding. Several artificial intelligence (AI) methods were studied: a neural network, a radial basis function, a generalized neural network and fuzzy logic. The studied models investigate the significance of the reservoir temperature, the oil gravity, the hydrocarbon composition and the injected-gas composition for the CO2-MMP. More than 100 data sets from actual CO2-MMP experiments were used to develop the models and to investigate their reliability. This work introduces an effective approach for estimating the MMP during CO2 flooding, which could be used to refine the current numerical or analytical models and result in a better determination of the CO2-MMP.

Methodology
The data used were gathered from several published papers [7,10,14,15,36,37]. The minimum miscibility pressures (MMP) were measured using slim tube tests. The dataset covers a wide range of reservoir conditions and hydrocarbon compositions; the main inputs are the fluid composition, the reservoir temperature and the molecular weights. The dataset was randomly divided into two groups: a training group (70% of the total data) and a testing group (30% of the total data). Before developing the AI models, a statistical analysis was conducted by determining the minimum, maximum, mean, mode and other parameters, as listed in Table 1. The temperature data span a range of 229 °F, with a minimum of 71 °F, a maximum of 330 °F and an arithmetic mean of 185.67 °F. The MMP values vary between 1100 psia and 5000 psia, with an arithmetic mean of 2583.49 psia. The statistical dispersion of the MMP results was measured by calculating the standard deviation, skewness and kurtosis; values of 876.98, 0.21 and 2.20 were obtained, respectively, which indicates that the data points are spread over a wide range of values. In addition, frequency histograms were obtained for all data to give a rough estimate of the distribution density. The histograms indicate that most of the data set follows a multimodal pattern, as shown in Figure 1. Finally, the correlation coefficient was determined to measure the strength and direction of the linear relationship between the input data and the MMP data (Figure 2). Values of 0.7481, −0.493 and 0.1626 were obtained for the temperature, the mole fraction and the molecular weight, respectively. The correlation coefficient analysis therefore reveals that the MMP has a strong linear relationship with the system temperature, a moderate relationship with the mole fraction, and a weak relationship with the molecular weight.
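The descriptive statistics and the linear correlation analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (divide-by-N) form of the moments and the non-excess definition of kurtosis are assumed because the paper does not state which variants were used, the helper names `describe` and `pearson_r` are hypothetical, and the sample values are invented for illustration rather than taken from Table 1.

```python
import math

def describe(values):
    """Return mean, population standard deviation, skewness and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, divide by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess: a normal distribution gives 3
    }

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only; these are not the paper's data.
temperature_f = [71.0, 120.0, 160.0, 200.0, 250.0, 330.0]
mmp_psia = [1100.0, 1800.0, 2300.0, 2900.0, 3600.0, 5000.0]

stats = describe(mmp_psia)
r = pearson_r(temperature_f, mmp_psia)
print(stats)
print("R(temperature, MMP) =", round(r, 4))
```

A value of R close to +1 or −1 indicates a strong linear relationship, while a value near 0 only rules out a linear one; a nonlinear dependence can still exist.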
The correlation coefficient analysis showed that the molecular weight of C7+ and the mole fraction of C2-C6 have only a weak linear correlation with the MMP; however, these parameters play a significant role in controlling the MMP. Therefore, to improve these relationships, the input data were transformed to different domains using different approaches (e.g., log, power, sigmoidal) until the relationships with the highest correlation coefficients were obtained. The power model, with power values of −1 and −0.5 for the molecular weight and the mole fraction, respectively, gave the best relationships between the input and output parameters (results are listed in Table 2).
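The search for a transformation that maximizes the linear correlation can be sketched as follows. This is a minimal illustration, not the authors' procedure: only power transforms are tried, the function names `power_transform` and `best_exponent` are hypothetical, the candidate exponents are chosen for illustration, and the data are invented.

```python
import math

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def power_transform(values, exponent):
    """Raise every (strictly positive) value to the given exponent."""
    return [v ** exponent for v in values]

def best_exponent(x, y, candidates):
    """Return the candidate exponent whose transform maximizes |R| with y."""
    return max(candidates,
               key=lambda p: abs(pearson_r(power_transform(x, p), y)))

# Hypothetical values for illustration only; these are not the paper's data.
mw_c7_plus = [150.0, 180.0, 210.0, 250.0, 300.0]
mmp_psia = [1500.0, 2100.0, 2600.0, 3300.0, 4200.0]

chosen = best_exponent(mw_c7_plus, mmp_psia, [-1.0, -0.5, 0.5, 1.0])
print("exponent with the highest |R|:", chosen)
```

Ranking by the absolute value of R treats strong negative and strong positive correlations alike, which is the relevant criterion when the transformed variable is fed to a model that can learn either sign.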

Results and Discussion
Different artificial intelligence methods were used to obtain the optimum model, defined as the model with the lowest average absolute percentage error (AAPE) for both the training and testing data and the highest correlation coefficient (R). Appendix A gives the equations used to calculate the AAPE and R. A sensitivity analysis was performed to fine-tune the model parameters; the most suitable models are reported in this paper.
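For reference, the two selection metrics can be computed as in the sketch below. It assumes the usual definitions: AAPE as the mean of the absolute relative errors expressed in percent, and R as the Pearson correlation coefficient between actual and predicted values. The function names and the sample numbers are hypothetical.

```python
import math

def aape(actual, predicted):
    """Average absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient R between actual and predicted values."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical measured and predicted MMP values (psia), for illustration only.
measured = [1500.0, 2200.0, 3000.0, 4100.0]
estimated = [1600.0, 2100.0, 3150.0, 3900.0]

print("AAPE (%):", round(aape(measured, estimated), 2))
print("R:", round(correlation_coefficient(measured, estimated), 4))
```

The two metrics are complementary: a model with a constant bias can have R close to 1 and still a large AAPE, which is why both are used for model selection.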
Initially, the original data were used to predict the MMP; however, significant errors were observed for all AI models. For example, the ANN model gave an error of 41.39% and the fuzzy logic system gave an error of 26.14%. Therefore, data processing techniques were implemented: the data were filtered to remove outliers based on the average values and the standard deviation (SD), and the input data were transformed into another domain using a power model. A trial-and-error technique was used to determine the best combination of input parameters. The fluid composition was categorized into two groups: the first group is the mole percentage of ethane to hexane (C2 to C6 %) and the second group is the molecular weight of heptane plus (MW C7+). The square root of the first group (C2 to C6 %) and the reciprocal of the second group (MW C7+) were then used as input parameters. As a result, the error was reduced significantly; for example, the error of the ANN model decreased from 41.392% to 9.682%. The results from the artificial intelligence techniques are discussed below. Moreover, the problem of local minima was avoided by running the AI models several times with different model parameters. The error profiles for the training and testing data sets were also monitored during model development and validation, and were used to avoid model memorization (overfitting) and local minima.

Artificial Neural Network
Figure 3 illustrates a schematic of the ANN model developed for estimating the CO2-MMP. The model inputs are the reservoir temperature (Temp.), the molecular weight of the heptane plus fraction (MW C7+) and the mole fraction of ethane to hexane (C2-C6 %). The ANN model was trained using the seen data (training dataset), after which it was used to determine the MMP for the unseen data (testing dataset). The model parameters were fine-tuned to minimize the AAPE and maximize the correlation coefficient. Specifically, the number of hidden layers and the number of neurons per layer were varied to find the best ANN model; the three best predictive cases are listed in Table 3. A minimum error of 7.22% and a relatively high correlation coefficient of 0.974 were obtained using one hidden layer with 20 neurons. Figure 4 shows the predicted results against the actual values for the training and testing data for visual validation.

Fuzzy Logic System
Different cases were investigated to optimize the model parameters and achieve the best possible model; the results are presented in Table 4 and Figure 5. The correlation coefficient (R) and the absolute error (AAPE) for the testing data were used to select the optimum model. Increasing the cluster radius led to better results, i.e., higher R-values and lower AAPE for the testing data. Increasing the number of iterations led to worse results, i.e., lower R-values for the testing data, which could be due to memorization; 50 was selected as the optimum number of iterations. Case 7 has the lowest AAPE (9.54%) and can be considered the best possible model.

Generalized Neural Network
A generalized neural network (GRNN) was used to determine the CO2-MMP based on the reservoir temperature and the hydrocarbon composition. This network performed well in predicting the minimum miscibility pressure. Table 5 and Figure 6 summarize the results obtained using the GRNN method. Several cases were investigated to optimize the model parameters. Increasing the spread from 1 to 50 improved the R-value from 0.96 to 0.98. Different training functions were also tested, and "newgrnn" showed the best performance. The best GRNN model (Case 2) showed an error of 7.02% and an R-value of 0.98.
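To make the role of the spread concrete, the sketch below implements the textbook form of a GRNN prediction, which is a Gaussian-kernel-weighted average of the training targets. It is an illustration under assumptions, not the model used in this study: the kernel width is taken to be the spread itself (toolbox implementations may scale it differently), the function name `grnn_predict` is hypothetical, and the data are invented.

```python
import math

def grnn_predict(x_query, train_x, train_y, spread):
    """Kernel-weighted average of training targets (textbook GRNN form)."""
    weights = []
    for x_i in train_x:
        # Squared Euclidean distance between the query and a training sample.
        d2 = sum((q - v) ** 2 for q, v in zip(x_query, x_i))
        weights.append(math.exp(-d2 / (2.0 * spread ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed (query far from every sample):
        # fall back to the plain mean of the training targets.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Hypothetical, already-normalized single-input samples, for illustration only.
train_x = [[-1.0], [-0.3], [0.2], [1.0]]
train_y = [1200.0, 2000.0, 2700.0, 4500.0]

for spread in (0.05, 0.5, 5.0):
    print(spread, round(grnn_predict([0.0], train_x, train_y, spread), 1))
```

A small spread makes the prediction follow the nearest training samples closely, while a large spread smooths the response toward the mean of the training targets; the best value depends on the scaling of the inputs.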

Radial Basis Function
Different model parameters (the goal, the spread, the maximum number of neurons and the number of neurons to add between displays) were studied to find the optimum values. Generally, increasing the goal increases the model tolerance, which increases the AAPE and decreases the R-value; the same trend was observed for the spread. Table 6 summarizes the results obtained for five cases. For this data set, the goal had a minor effect on the solution, and a value of 0.5 was selected. Reducing the spread improved the solution, and a spread of 10 was selected. Increasing the maximum number of neurons (MN) led to memorization and reduced the R-value, and 20 MN is selected as an optimum value. The number of neurons to add between displays (DF) had a small effect on the model accuracy; a DF of 1 was selected. Based on this analysis, the optimum case was obtained using a goal of 0.5, a spread of 10, an MN of 10 and a DF of 1; the resulting R is 0.98 and the AAPE is 6.56% (Case 4). The results are shown in Figure 7.
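The effect of the spread on a radial basis network can be illustrated with a forward pass through a generic network with Gaussian hidden units and a linear output. This is a sketch under assumptions rather than the trained model of this study: the centers, weights and biases are invented, the function names are hypothetical, and the relation between the spread and the hidden-layer bias follows one common convention in which a hidden unit responds with 0.5 when the input lies one spread away from its center.

```python
import math

def bias_from_spread(spread):
    """Hidden-layer bias such that the unit outputs 0.5 at distance = spread."""
    return math.sqrt(math.log(2.0)) / spread

def rbf_forward(x, centers, hidden_bias, out_weights, out_bias):
    """Gaussian hidden layer followed by a linear output neuron."""
    hidden = []
    for c in centers:
        dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
        hidden.append(math.exp(-(hidden_bias * dist) ** 2))
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias

# Hypothetical parameters for a two-neuron network with three inputs.
centers = [[-0.5, 0.1, 0.3], [0.4, -0.2, 0.0]]
out_weights = [0.8, -0.3]
out_bias = 0.1

b = bias_from_spread(10.0)
print(rbf_forward([0.0, 0.0, 0.0], centers, b, out_weights, out_bias))
```

With inputs normalized to the range −1 to 1, a spread of 10 is large compared with the distances between samples, so every hidden unit responds to every input and the network output varies smoothly.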

Validation of the Developed Model
The radial basis network was utilized to extract an empirical correlation: the weights of the hidden layer (w1) and the output layer (w2) were used to derive the empirical equation, and their values are listed in Table 7. The proposed model to predict the CO2-MMP is given by Equations (1) and (2) [23–25,38]. In Equations (1) and (2), N is the total number of neurons, j is the input index, and x1, x2 and x3 are the reservoir temperature, the mole fraction of C2 to C6, and the molecular weight of the heptane plus fraction, respectively. The weights (w) and biases (b) are listed in Table 7. The developed model automatically normalizes the input data into a range between −1 and 1 using the two-point method; Equations (3) and (4) are used to calculate the normalized values.

Furthermore, a comparison study was performed between the different MMP determination approaches. The CO2-MMP was determined using the Glaso [8] empirical correlation, the Yuan et al. [15] analytical method and the developed AI model. Figure 8 and Table 8 summarize the obtained CO2-MMP values. The absolute error and the correlation coefficient were used to select the best determination approach. The analytical equation of Yuan et al. [15] showed the highest error (16.7%) and the lowest correlation coefficient (0.60) among all approaches. An absolute error of 16.4% and a correlation coefficient of 0.67 were obtained using the Glaso [8] empirical correlation. These results suggest that both equations (Yuan et al. [15] and Glaso [8]) were developed from limited experimental results and under several assumptions. The radial basis function AI model showed the best prediction performance, with an absolute error of 6.6% and a correlation coefficient of 0.98. Based on this study, the recommended approach for predicting the CO2-MMP is an AI model with a radial basis function.

In addition, real case studies of hydrocarbon reservoirs flooded with carbon dioxide were used. The data were collected from Kanatbayev et al. and Alomair et al. [39,40]. The CO2 minimum miscibility pressure was determined using the developed AI model, numerical simulation and regression techniques. A numerical approach (in Eclipse 300) was used to determine the CO2-MMP; the reservoir system was segmented into 2000 grid blocks, and the numerical dispersion was corrected using the infinite cell solution [39]. The alternating conditional expectation (ACE) regression algorithm proposed by Alomair et al. [40] was also used to predict the CO2-MMP. The predicted MMP values were compared with the actual values measured using slim tube tests. The actual minimum miscibility pressures, the values predicted by the different methods and the corresponding errors are listed in Table 9. An average absolute error of 10.2% was obtained using the regression approach, and errors between 8.7% and 17.3% were obtained using the numerical approach (Eclipse 300). The AI model developed in this study showed an acceptable prediction performance, with the absolute error varying between 6.4% and 9%.
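The two-point normalization mentioned earlier in this section maps each variable linearly from its minimum and maximum onto the range −1 to 1. The sketch below shows the standard form of this mapping and its inverse; it is an assumption that this matches the paper's Equations (3) and (4), which are not reproduced here. The function names are hypothetical, and the temperature limits of 71 °F and 330 °F are the minimum and maximum quoted in the Methodology section.

```python
def normalize(x, x_min, x_max):
    """Map x linearly from [x_min, x_max] onto [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def denormalize(x_n, x_min, x_max):
    """Inverse mapping from [-1, 1] back to [x_min, x_max]."""
    return (x_n + 1.0) * (x_max - x_min) / 2.0 + x_min

# Temperature limits (deg F) as quoted for the dataset in the Methodology section.
T_MIN, T_MAX = 71.0, 330.0

t_norm = normalize(200.0, T_MIN, T_MAX)
print(t_norm, denormalize(t_norm, T_MIN, T_MAX))
```

The same limits must be used for training and for prediction; an input outside the training range maps outside the interval from −1 to 1, which signals that the model is being extrapolated.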

Conclusions
This paper presents an intelligent model for determining the minimum miscibility pressure (MMP) during CO2 flooding. Artificial intelligence (AI) techniques were used to build a new MMP model: a neural network, a radial basis function, a generalized neural network and a fuzzy logic system. The best predictive model was selected based on the absolute error and the correlation coefficient for the testing data set. Based on this work, the following conclusions can be drawn:

• The proposed AI models are quick and rigorous, and they outperform the current CO2-MMP methods. The developed models can minimize the time and cost of determining the CO2-MMP.

• The developed models investigate the effect of the hydrocarbon components and the injected-gas composition on the MMP during CO2 flooding.

• A new equation was extracted using the optimized radial basis function. The developed equation performed well in determining the CO2-MMP, with an absolute error of 6.6% and a correlation coefficient of 0.98.

• Ultimately, this work will improve the design of CO2 flooding operations by providing a reliable value for the CO2-MMP.

Appendix A. Mathematical Formulas
This appendix presents the formulas used in this study for the error calculation, Glaso [8] empirical correlation and Yuan et al. [15] analytical equation.
Average Absolute Percentage Error (AAPE):

Coefficient of Determination:

where MMP is the predicted minimum miscibility pressure for CO2 injection, M_C7+ is the molecular weight of C7+, P_C2-6 is the percentage of C2 to C6, and a_1 to a_10 are fitting coefficients.

where MMP is the estimated minimum miscibility pressure in psia, C_2-6 is the mole fraction of C2 to C6, and MW_C7+ is the molecular weight of the heptane plus fraction.