A Self-Adaptive Artificial Intelligence Technique to Predict Oil Pressure Volume Temperature Properties

Reservoir fluid properties such as bubble point pressure (Pb) and gas solubility (Rs) play a vital role in reservoir management and reservoir simulation. They also affect the design of the production system. Pb and Rs can be obtained from laboratory experiments on a sample taken at the wellhead or from the reservoir under downhole conditions. However, this process is time-consuming and very costly. To overcome these challenges, empirical correlations and artificial intelligence (AI) models can be applied to obtain these properties. The objective of this paper is to introduce new empirical correlations to estimate Pb and Rs based on three input parameters: reservoir temperature and oil and gas gravities. A total of 760 data points were collected from different sources to build new AI models for Pb and Rs. The new empirical correlations were developed by integrating an artificial neural network (ANN) with a modified self-adaptive differential evolution algorithm, yielding a hybrid self-adaptive artificial neural network (SaDE-ANN) model. The results confirmed the accuracy of the developed SaDE-ANN models in predicting the Pb and Rs of crude oils. This is the first technique that can be used to predict Rs and Pb based on only three input parameters. The developed empirical correlation for Pb predicts Pb with a correlation coefficient (CC) of 0.99 and an average absolute percentage error (AAPE) of 6%. Similar results were obtained for Rs, for which the new empirical correlation achieves a coefficient of determination (R²) of 0.99 and an AAPE of less than 6%. The developed technique will help reservoir and production engineers better understand and manage reservoirs. No additional or special software is required to run the developed technique.


Introduction
Reservoir fluid pressure-volume-temperature (PVT) properties, such as bubble point pressure, gas solubility, and oil and gas formation volume factors and viscosities, are critical in reservoir engineering management and computations. These PVT properties are required to estimate the initial hydrocarbons in place, optimum production schemes, and ultimate hydrocarbon recovery, to design fluid handling equipment, and to make reservoir volumetric estimates. Bubble point pressure (Pb) and gas solubility (Rs) are two of the most critical quantities used to characterize an oil reservoir, so their accurate determination is one of the main challenges in reservoir development and management. Other factors, such as permeability, also affect reservoir management. Jia et al. [1] showed that continuous gas injection is preferred for shale reservoirs with a permeability of 0.01 mD, whereas CO2 huff-n-puff is recommended for ultra-low permeability reservoirs. For CO2 huff-n-puff injection in oil shale reservoirs, the reservoir heterogeneity is not a favorable […] parameters (reservoir temperature and oil and gas gravities); and (2) predict the Rs using the three input parameters together with the predicted Pb as a fourth input parameter.
The proposed methods require no expensive laboratory experiments and are therefore a step toward minimizing PVT laboratory testing. The proposed data-driven models are developed by combining ANN with a modified self-adaptive differential evolution algorithm (MSaDE) [25]. In the subsequent sections of this paper, the proposed hybrid algorithm is referred to as SaDE-ANN.

Artificial Neural Network Modeling
An artificial neural network (ANN) is a computational method inspired by biological neural networks [26,27]. In an ANN, the inputs and outputs are connected through neurons. A typical ANN contains an input layer, one or more hidden layers, and an output layer. The input layer receives the information, and the hidden layer(s) develop the relationship between the input(s) and output(s). Every neuron in one layer is linked to every neuron in the following layer, and every connection has an associated weight [28]. The signal passed between neurons is controlled by these weights and by the neuron biases [29].
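Because every neuron is connected to every neuron in the next layer, the number of trainable parameters follows directly from the layer sizes. The short Python sketch below counts weights and biases for a fully connected network; the function name `count_parameters` is hypothetical, and the layer sizes in the usage lines are the 3-18-17-1 and 4-15-15-1 architectures reported later in this paper.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected feed-forward network.

    layer_sizes lists the neurons per layer, input layer first,
    e.g. [3, 18, 17, 1]. Each pair of adjacent layers contributes
    n_in * n_out connection weights plus n_out biases (one per
    receiving neuron).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


if __name__ == "__main__":
    print(count_parameters([3, 18, 17, 1]))  # 413
    print(count_parameters([4, 15, 15, 1]))  # 331
```

For the two networks, this gives 413 and 331 trainable parameters, respectively, which is one reason the number of neurons has to be tuned against overfitting.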
To avoid overfitting and underfitting, an optimization process is performed to determine the optimum number of neurons [28,30]. Training is the first step in building the network. After the network is trained on the training data, the testing output can be predicted as a weighted average of the outputs of the training dataset, where the weights are calculated from the Euclidean distance between the training and testing data [31,32].
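The prediction rule described above (a weighted average of training outputs, weighted by Euclidean distance) can be sketched in Python. The text does not say how distance is turned into a weight, so this sketch assumes inverse-distance weights; `distance_weighted_prediction` is a hypothetical name, not the authors' code.

```python
import math


def distance_weighted_prediction(train_x, train_y, query, eps=1e-12):
    """Predict the output at `query` as a weighted average of training outputs.

    Weights are derived from the Euclidean distance between `query` and
    each training input. Assumption: weight = 1 / distance, so nearer
    training points contribute more.
    """
    weights = []
    for x, y in zip(train_x, train_y):
        d = math.dist(x, query)  # Euclidean distance (Python 3.8+)
        if d < eps:
            return y  # query coincides with a training point
        weights.append(1.0 / d)
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


if __name__ == "__main__":
    xs = [(0.0, 0.0), (2.0, 0.0)]
    ys = [10.0, 20.0]
    print(distance_weighted_prediction(xs, ys, (1.0, 0.0)))  # 15.0
```

A query halfway between the two training points receives the plain average, while a query at (0.5, 0.0) is pulled toward the nearer point (12.5).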

Methodology
An ANN has several control parameters, such as the number of hidden layers, the number of neurons in each layer, the training and transfer functions, and the ratio of testing to training data. Conventionally, the values of these control parameters are assigned through sensitivity trials. In each trial, different values of one parameter are tested while the other parameters are kept constant, and the value that achieves the minimum error between the measured (real) and predicted outputs is selected. The same process is then applied to each remaining parameter. However, because these control parameters are interdependent, this one-at-a-time "trial" approach does not guarantee optimum results.
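The weakness of the one-at-a-time approach can be seen on a toy problem. In this Python sketch, `error` is a made-up two-parameter error surface (not derived from the paper's data) whose (a - b) term couples the parameters, standing in for interdependent ANN control parameters. Starting from an assumed default b = 2, tuning a and then b stops at (5, 6) with an error of 5.5, while a joint search over the same grid finds (7, 7) with an error of 0.

```python
import itertools


def error(a, b):
    """Toy error surface with an interaction between parameters a and b."""
    return (a - b) ** 2 + 0.5 * (a + b - 14) ** 2


GRID = range(0, 11)  # candidate integer values for each parameter


def one_at_a_time(default_b=2):
    """Tune a with b held at its default, then tune b with a held fixed."""
    best_a = min(GRID, key=lambda a: error(a, default_b))
    best_b = min(GRID, key=lambda b: error(best_a, b))
    return best_a, best_b


def joint_search():
    """Evaluate every (a, b) combination simultaneously."""
    return min(itertools.product(GRID, GRID), key=lambda ab: error(*ab))


if __name__ == "__main__":
    a1, b1 = one_at_a_time()
    a2, b2 = joint_search()
    print("one-at-a-time:", (a1, b1), error(a1, b1))
    print("joint:", (a2, b2), error(a2, b2))
```

An exhaustive grid is only feasible here because the toy problem has two small integer parameters; for a real ANN search space, a stochastic optimizer plays the role of `joint_search`.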
Therefore, the methodology adopted in this paper optimizes these parameters simultaneously to achieve the minimum average absolute percentage error (AAPE) and the maximum CC. The definitions of AAPE and CC are given in Appendix A. The stochastic optimization method used in this paper is a modified self-adaptive differential evolution (MSaDE) algorithm [25]. In MSaDE, the control variables of the differential evolution algorithm, such as the scale factor, crossover rate, and mutation strategy, are self-adapted during each iteration. In this paper, MSaDE is integrated with ANN to optimize the ANN control parameters.
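Optimizing all control parameters at once requires encoding integer and categorical choices in a vector that a continuous optimizer such as differential evolution can search. The paper does not describe its encoding, so the following Python sketch shows one common approach under stated assumptions: each candidate is a vector in [0, 1]^5 decoded into two hidden-layer sizes (up to an assumed 30 neurons, with the number of hidden layers fixed at two), a training function, a transfer function, and a training ratio (assumed range 50% to 80%). The candidate lists are hypothetical; the strings are MATLAB function names, of which only trainbr and logsig, plus a symmetric sigmoid (tansig), are mentioned in this paper.

```python
from dataclasses import dataclass

# Hypothetical search space; the paper reports only the selected options.
TRAINING_FUNCTIONS = ["trainbr", "trainlm", "trainscg"]
TRANSFER_FUNCTIONS = ["tansig", "logsig"]


@dataclass
class AnnConfig:
    hidden1: int        # neurons in the first hidden layer
    hidden2: int        # neurons in the second hidden layer
    training_fn: str
    transfer_fn: str
    train_ratio: float  # share of the data used for training


def _pick(options, u):
    """Map u in [0, 1] to one element of `options` using equal-width bins."""
    return options[min(int(u * len(options)), len(options) - 1)]


def decode(vector, max_neurons=30):
    """Decode a 5-element candidate vector into one ANN configuration."""
    u1, u2, u3, u4, u5 = (min(max(v, 0.0), 1.0) for v in vector)  # clamp
    return AnnConfig(
        hidden1=1 + round(u1 * (max_neurons - 1)),
        hidden2=1 + round(u2 * (max_neurons - 1)),
        training_fn=_pick(TRAINING_FUNCTIONS, u3),
        transfer_fn=_pick(TRANSFER_FUNCTIONS, u4),
        train_ratio=0.5 + 0.3 * u5,  # assumed range: 50% to 80%
    )
```

For example, `decode([0.6, 0.55, 0.1, 0.2, 0.5])` yields 18 and 17 hidden neurons, trainbr, tansig, and a 65% training share. An optimizer then only ever manipulates real-valued vectors, and every candidate it proposes maps to a valid configuration.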
The input parameters to the ANN are reservoir temperature (T), oil gravity (American Petroleum Institute gravity, API), and gas specific gravity (GG). The outputs are bubble point pressure (Pb) and solution gas oil ratio (Rs). As mentioned earlier, ANN modeling consists of two phases: training and testing. In the training phase, the SaDE-ANN optimization runs until one of two conditions is met: (1) the AAPE falls below 5%, or (2) the maximum number of function evaluations (1000) is reached. The optimized SaDE-ANN model is then validated on unseen testing data to predict Pb and Rs from the input parameters T, API, and GG.
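The details of MSaDE [25] are not reproduced here, so this Python sketch substitutes the well-known jDE self-adaptation rule: each individual carries its own scale factor F and crossover rate CR, each is resampled with probability 0.1, and the new values survive only if the trial vector replaces its parent. It uses a single DE/rand/1/bin mutation strategy, whereas MSaDE also adapts the mutation strategy. The defaults `target=5.0` and `max_evals=1000` mirror the two stopping conditions above, treating the objective as an AAPE in percent; `jde_minimize` is a hypothetical name.

```python
import random


def jde_minimize(objective, bounds, pop_size=20, max_evals=1000,
                 target=5.0, seed=0):
    """Minimize `objective` over the box `bounds` with jDE-style self-adaptive DE.

    Stops when the best objective drops below `target` or after
    `max_evals` objective evaluations, whichever comes first.
    Returns (best_vector, best_value, evaluations_used).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    f_par = [0.5] * pop_size   # per-individual scale factors
    cr_par = [0.9] * pop_size  # per-individual crossover rates
    fit = [objective(x) for x in pop]
    evals = pop_size
    best = min(range(pop_size), key=fit.__getitem__)

    while fit[best] >= target and evals < max_evals:
        for i in range(pop_size):
            if evals >= max_evals or fit[best] < target:
                break
            # Self-adapt F and CR (jDE rule, tau1 = tau2 = 0.1).
            f_i = 0.1 + 0.9 * rng.random() if rng.random() < 0.1 else f_par[i]
            cr_i = rng.random() if rng.random() < 0.1 else cr_par[i]
            # DE/rand/1 mutation with three distinct partners.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < cr_i or d == j_rand:
                    v = pop[a][d] + f_i * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            f_trial = objective(trial)
            evals += 1
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i], f_par[i], cr_par[i] = trial, f_trial, f_i, cr_i
                if f_trial < fit[best]:
                    best = i
    return pop[best], fit[best], evals
```

In SaDE-ANN, `objective` would train an ANN from a decoded candidate and return its AAPE; any cheap function, such as a scaled sphere, is enough to exercise the loop.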

Data Analysis and Acquisition
The data points used in this paper were collected from the literature [7–9,33–35] and cover crude oils of different origins and compositions. The sources include Middle East data (Al-Marhoun) [8], Malaysian crude data (Omar and Todd) [34], North Sea data (Glasø) [7], data from fields all over the world (Vazquez and Beggs) [9], and data from the Mediterranean Basin, Africa, the Persian Gulf, and the North Sea (De Ghetto) [35]. Each data point contains the input parameters (reservoir temperature (T), oil gravity (API), and GG) and the output parameters (solution gas oil ratio (Rs) and bubble point pressure (Pb)). Table 1 shows the statistical parameters of the 460 data points studied after outlier removal using the mean-standard deviation method, in which a value of the $j$th parameter ($\mathbf{x}_j$) is considered an outlier if it satisfies the condition in Equation (1).
where $\mathbf{x}_j = [x_{j,1}, x_{j,2}, x_{j,3}, \ldots, x_{j,N}]$ is the data vector for the $j$th parameter, $j = 1, 2, \ldots, J$; $J$ is the total number of input parameters (here, $J = 3$); $\bar{x}_j$ is the mean of the $j$th parameter; $N$ is the total number of data points; and $\sigma_j$ is the standard deviation of the $j$th parameter. The CCs of the input parameters (T, API, and GG) with the output parameters (Pb and Rs) are shown in Figure 1. In this paper, a combined correlation coefficient (cCC) is introduced to express the combined correlation of T, API, and GG with Pb and Rs. cCC is computed for both Pb and Rs to determine which output should be estimated first. It is the arithmetic mean of the CCs of the three input parameters, as given by Equation (2):

$$cCC = \frac{CC_{GG} + CC_{API} + CC_{T}}{3} \quad (2)$$
where $CC_{GG}$, $CC_{API}$, and $CC_{T}$ are the correlation coefficients between the output parameter and GG, oil gravity, and reservoir temperature, respectively. Figure 1 shows that Pb has a higher cCC with the input parameters (0.37) than Rs (0.32). Therefore, it is more convenient to estimate Pb first and then use the estimated Pb together with the three input parameters to predict Rs.
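The two data-analysis steps above, outlier screening and cCC-based ordering of the outputs, can be sketched in Python. Equation (1) did not survive extraction, so `remove_outliers` assumes the common rule |x - mean| <= k·σ with k = 3 and the population standard deviation. `combined_cc` averages signed Pearson CCs as the text describes; if the magnitudes were averaged instead, an `abs()` would be needed. All function names are hypothetical.

```python
import statistics


def remove_outliers(rows, k=3.0):
    """Keep rows whose every parameter lies within k standard deviations
    of that parameter's mean (mean-standard deviation screening).

    rows: equal-length tuples, one value per parameter, e.g. (T, API, GG).
    k: assumed multiplier; the exact threshold of Equation (1) is unknown.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [row for row in rows
            if all(abs(v - m) <= k * s for v, m, s in zip(row, means, stds))]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def combined_cc(input_columns, output):
    """cCC: arithmetic mean of the CCs between each input and the output."""
    ccs = [pearson(col, output) for col in input_columns]
    return sum(ccs) / len(ccs)


def estimation_order(input_columns, outputs):
    """Output names sorted by descending cCC; estimate the first one first."""
    scores = {name: combined_cc(input_columns, y) for name, y in outputs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With columns for T, API, and GG and a dictionary such as `{"Pb": pb_values, "Rs": rs_values}`, `estimation_order` returns the output to model first, which is Pb for the cCC values reported in Figure 1.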

Bubble Point Pressure Estimation
The SaDE-ANN model was built to correlate Pb with T, API, and GG. The optimum ANN configuration, in terms of the lowest AAPE and highest CC, was a 3-18-17-1 structure: the input layer had three neurons representing the input parameters (reservoir temperature, oil API gravity, and gas gravity), the first hidden layer had 18 neurons, the second hidden layer had 17 neurons, and Pb was the only parameter in the output layer. The data were divided into three sets: training (65%), validation (11%), and testing (24%). The optimum training and transfer functions were Bayesian regularization backpropagation and the symmetric sigmoid, respectively. Figure 2 shows the cross plot of the Pb values predicted by the SaDE-ANN model vs. the actual Pb values. The AAPE was 5.18% and the CC was 0.994 for the training data, while for the testing data the AAPE was 6.37% and the CC was 0.993. These results confirm the stability and high accuracy of the SaDE-ANN model for predicting Pb from reservoir temperature, oil API gravity, and GG.
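A three-way division like the 65%/11%/24% split above can be sketched as follows. The paper does not state how data points were assigned to each subset, so this Python sketch assumes a seeded random shuffle, rounds the training and validation counts, and assigns the remainder to testing; `split_indices` is a hypothetical name.

```python
import random


def split_indices(n, train=0.65, val=0.11, seed=42):
    """Shuffle n sample indices and split them into training, validation,
    and testing subsets; the testing share is whatever remains."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n)
    n_val = round(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


if __name__ == "__main__":
    tr, va, te = split_indices(460)
    print(len(tr), len(va), len(te))  # 299 51 110
```

With 460 data points, this yields 299 training, 51 validation, and 110 testing samples; the three subsets are disjoint and together cover every index.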

An AAPE of 6.37% for bubble point pressure prediction is acceptable considering two important factors: (1) this approach is intended for cases in which solution gas oil ratio data are not available, and it depends only on reservoir temperature and oil and gas gravities; and (2) most published bubble point pressure correlations and models, including the proposed model, rely on fluid property data from oilfield service companies, whose quality raises many concerns. In addition, the SaDE-ANN model outperforms other correlations and models that use the solution gas oil ratio as a fourth input in addition to the three inputs used by SaDE-ANN. Figure 3 compares the performance of different models and correlations on the validation data; the SaDE-ANN model has the highest R². Figure 4 compares the models and correlations in terms of AAPE and CC; as shown in the figure, SaDE-ANN has the lowest AAPE and highest CC.

Mathematical Model for Bubble Point Pressure
The mathematical model to estimate Pb, derived from the optimized ANN model with GG, API, and T as input parameters (the range of each parameter is given in Table 1), is given by Equation (3), where $Pb_n$ is the normalized Pb (psi), $Y_j$ (Equation (4)) is the output of the $j$th neuron in the second hidden layer, and $X_i$ (Equation (5)) is the output of the $i$th neuron in the first hidden layer. The remaining symbols are defined as follows:
$N_1$, $N_2$: number of neurons in the first and second hidden layers, respectively;
$i$, $j$: neuron indices in the first and second hidden layers, respectively, as listed in Tables 2 and 3;
$w1_i$, $b1_i$: weights and bias between the input layer and the first hidden layer, listed in Table 2;
$w2_{j,i}$, $b2_j$: weights and bias between the first and second hidden layers; the values of $w2_{j,i}$ are listed in Table 3;
$w3_j$, $b3$: weights and bias between the second hidden layer and the output layer; $b3 = -0.2626$, and the values of $w3_j$ are listed in Table 3;
$\gamma_{g,n}$: normalized GG, calculated by Equation (6);
$API_n$: normalized oil API gravity, calculated by Equation (7);
$T_n$: normalized reservoir temperature (°F), calculated by Equation (8).
$Pb_n$ in Equation (3) is the normalized value of Pb. Pb is calculated as:

$$Pb = \frac{Pb_n + 1}{0.000286} + 126 \quad (9)$$
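Equations (3) to (8) and the weight tables are not reproduced in this text, so the following Python sketch only illustrates the structure implied by the symbol definitions: normalized inputs (GG_n, API_n, T_n) feed the first hidden layer (X_i), then the second hidden layer (Y_j), then a single output Pb_n, which Equation (9) converts to psi. It assumes the symmetric sigmoid (tansig, mathematically equal to tanh) in both hidden layers and a linear output neuron, and it reads Equation (9) as Pb = (Pb_n + 1)/0.000286 + 126. The weights are random placeholders, not the values in Tables 2 and 3; only b3 = -0.2626 comes from the text. Function names are hypothetical.

```python
import math
import random


def tansig(z):
    """Symmetric sigmoid transfer function; equal to tanh(z)."""
    return math.tanh(z)


def pb_normalized(inputs_n, w1, b1, w2, b2, w3, b3):
    """Evaluate a 3-N1-N2-1 network on normalized inputs (GG_n, API_n, T_n).

    w1[i]: input weights of first-hidden neuron i (one per input)
    w2[j][i]: weight from first-hidden neuron i to second-hidden neuron j
    w3[j]: weight from second-hidden neuron j to the output neuron
    """
    x = [tansig(sum(w * u for w, u in zip(w1[i], inputs_n)) + b1[i])
         for i in range(len(b1))]  # X_i, first hidden layer
    y = [tansig(sum(w2[j][i] * x[i] for i in range(len(x))) + b2[j])
         for j in range(len(b2))]  # Y_j, second hidden layer
    return sum(w3[j] * y[j] for j in range(len(y))) + b3  # linear output Pb_n


def pb_from_normalized(pb_n):
    """Equation (9), read as Pb = (Pb_n + 1) / 0.000286 + 126, in psi."""
    return (pb_n + 1.0) / 0.000286 + 126.0


if __name__ == "__main__":
    # Placeholder weights: random stand-ins, NOT the values of Tables 2 and 3.
    rng = random.Random(1)
    n1, n2 = 18, 17
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n1)]
    b1 = [rng.uniform(-1, 1) for _ in range(n1)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n1)] for _ in range(n2)]
    b2 = [rng.uniform(-1, 1) for _ in range(n2)]
    w3 = [rng.uniform(-1, 1) for _ in range(n2)]
    b3 = -0.2626  # output bias reported in the text
    pb_n = pb_normalized([0.1, -0.2, 0.3], w1, b1, w2, b2, w3, b3)
    print(pb_from_normalized(pb_n))
```

Because every hidden output lies in (-1, 1), Pb_n can differ from b3 by at most the sum of the absolute output weights, which bounds the predicted Pb for a given weight set.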

Gas Solubility Estimation
In this section, gas solubility (Rs) is estimated from the Pb predicted by the proposed model together with the three input parameters (T, API, and GG). The optimum ANN configuration, in terms of the lowest AAPE and highest CC, has a 4-15-15-1 structure: the input layer consists of four neurons (oil API gravity, reservoir temperature, gas gravity, and Pb), both the first and second hidden layers consist of 15 neurons, and Rs is the only parameter in the output layer. The data were divided into three sets: training (67%), testing (21%), and validation (12%). The best training function was trainbr (Bayesian regularization backpropagation), and the best transfer function was logsig (log-sigmoid). Figure 5 shows the relative importance of the input parameters (GG, oil API gravity (API), reservoir temperature (T), and the bubble point pressure predicted in the previous step) for the solution gas oil ratio (Rs). As shown in Figure 5, the bubble point pressure has the highest relative importance for Rs, which is why predicting Rs without Pb as an input parameter was very challenging. Therefore, GG, API, T, and the predicted Pb were used as the four inputs for SaDE-ANN to predict Rs.
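The two-step scheme described above (estimate Pb first, then use it as a fourth input for Rs) can be sketched as a small Python function. The models are plain callables standing in for the trained networks, and the input order [T, API, GG, Pb] is an assumption; `predict_pb_then_rs` is a hypothetical name.

```python
from typing import Callable, Sequence

# A model maps a feature vector to a prediction; here, any plain callable.
Model = Callable[[Sequence[float]], float]


def predict_pb_then_rs(t: float, api: float, gg: float,
                       pb_model: Model, rs_model: Model):
    """Estimate Pb from (T, API, GG), then feed the predicted Pb to the
    Rs model as a fourth input. Returns (pb, rs)."""
    pb = pb_model([t, api, gg])
    rs = rs_model([t, api, gg, pb])
    return pb, rs


if __name__ == "__main__":
    # Dummy stand-in models, for illustration only.
    pb, rs = predict_pb_then_rs(200.0, 35.0, 0.8,
                                lambda x: 10.0 * x[0], lambda x: x[3] / 5.0)
    print(pb, rs)  # 2000.0 400.0
```

Any error in the Pb estimate propagates into the Rs input, which is why Pb, the output with the higher cCC, is estimated first.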

Training and testing cross plots of the actual and SaDE-ANN-predicted values of Rs are shown in Figure 6. The R² was 0.99 for both the training and testing data, while the AAPEs of the training and testing data were 5.89% and 6.54%, respectively. These results confirm the capability of the SaDE-ANN model to predict Rs from four parameters: Pb, T, API, and GG.

Mathematical Model for Gas Solubility
The mathematical model extracted from the optimized SaDE-ANN model to estimate Rs from Pb, T, API, and GG (the range of each parameter is given in Table 1) is given by Equation (10), where $Rs_n$ is the normalized Rs (SCF/STB), $Y_j$ is the output of the $j$th neuron in the second hidden layer, and $X_i$ is the output of the $i$th neuron in the first hidden layer. The definitions of $N_1$, […] Table 4; the values of $j$, $b2_j$, $w2_{j,i}$, and $w3_j$ are listed in Table 5; and $b3 = 1.2091$. $Pb_n$ is the normalized bubble point pressure (psi), calculated as:

$$Pb_n = 0.00029\,(Pb - 161.96) - 1$$

The value of $Rs_n$ from Equation (10) is normalized and is converted back to Rs by inverting its normalization.
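The Pb model and the Rs model scale Pb with different constants (0.000286 and 126 in Equation (9) versus 0.00029 and 161.96 in the Pb_n expression above), so the normalized output of the Pb network cannot be passed directly to the Rs network; it has to be converted to psi first. This Python sketch chains the two scalings, assuming Equation (9) reads Pb = (Pb_n + 1)/0.000286 + 126; the function names are hypothetical.

```python
def pb_psi_from_pb_model(pb_n_pb):
    """Equation (9): de-normalize the Pb network output to psi."""
    return (pb_n_pb + 1.0) / 0.000286 + 126.0


def pb_n_for_rs_model(pb_psi):
    """Rs-model scaling of Pb: Pb_n = 0.00029 (Pb - 161.96) - 1."""
    return 0.00029 * (pb_psi - 161.96) - 1.0


def rs_input_from_pb_output(pb_n_pb):
    """Convert the Pb network's normalized output into the Rs network's
    normalized Pb input by passing through psi."""
    return pb_n_for_rs_model(pb_psi_from_pb_model(pb_n_pb))


if __name__ == "__main__":
    print(rs_input_from_pb_output(-1.0))  # approximately -1.0104284
```

A Pb-network output of -1 corresponds to 126 psi, which lies slightly below the lower end (161.96 psi) of the Rs-model scaling; that is why the result is a little below -1 rather than exactly -1.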

Conclusions
Bubble point pressure (Pb) and gas solubility (Rs) have a significant effect on the accuracy of modeling fluid flow in porous media. This paper introduced two data-driven correlations to predict Pb and Rs using reservoir temperature and oil and gas gravities. These empirical correlations were developed using a self-adaptive artificial neural network (SaDE-ANN). SaDE-ANN is a hybrid ANN integrated with a modified self-adaptive differential evolution (MSaDE) algorithm. The correlations proposed using SaDE-ANN were validated against experimental data previously reported in the literature (760 data points).
The developed empirical correlation for Pb predicted Pb with a CC of 0.99 and an average absolute percentage error (AAPE) of 6%. Similar results were obtained for Rs, for which the new empirical correlation achieved a coefficient of determination (R²) of 0.99 and an AAPE of less than 6%.
The proposed correlations showed the highest prediction accuracy compared with other empirical correlations. The proposed method outperformed previously reported methods, achieving the highest CC (0.992) and the lowest AAPE (5.42%) between measured and predicted values. Because the correlations introduced in this paper use only reservoir temperature and oil and gas gravities as inputs to predict Pb and Rs, they reduce the need for the expensive and time-consuming PVT laboratory experiments commonly used to determine Pb and Rs.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A
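The AAPE measures the relative deviation of the estimated values from the experimental data and is defined by:

$$AAPE = \frac{1}{N}\sum_{i=1}^{N} \lvert E_i \rvert$$

where $N$ is the number of data points and $E_i$ is the relative deviation (%) of an estimated value ($Y_{est}$) from an experimental value ($Y_{exp}$):

$$E_i = \frac{Y_{exp,i} - Y_{est,i}}{Y_{exp,i}} \times 100$$

The CC represents the degree of success in reducing the standard deviation by regression analysis and is defined by:

$$CC = \sqrt{1 - \frac{\sum_{i=1}^{N} \left(Y_{exp,i} - Y_{est,i}\right)^2}{\sum_{i=1}^{N} \left(Y_{exp,i} - \bar{Y}_{exp}\right)^2}}$$

where $\bar{Y}_{exp}$ is the mean of the experimental values.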
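The two metrics can be computed as in the Python sketch below. It assumes the standard forms E_i = (Y_exp,i - Y_est,i)/Y_exp,i × 100, AAPE = (1/N) Σ|E_i|, and CC = sqrt(1 - Σ(Y_exp,i - Y_est,i)² / Σ(Y_exp,i - mean(Y_exp))²). The function names are hypothetical, and the clamp at zero (for estimates worse than simply predicting the mean) is an added safeguard, not part of the definition.

```python
import math


def aape(y_exp, y_est):
    """Average absolute percentage error (%) of estimates vs. experiments."""
    errors = [(e - s) / e * 100.0 for e, s in zip(y_exp, y_est)]  # E_i
    return sum(abs(err) for err in errors) / len(errors)


def cc(y_exp, y_est):
    """CC as sqrt(1 - SSE / SST), with SST taken about the mean of y_exp."""
    mean_exp = sum(y_exp) / len(y_exp)
    sse = sum((e - s) ** 2 for e, s in zip(y_exp, y_est))
    sst = sum((e - mean_exp) ** 2 for e in y_exp)
    return math.sqrt(max(0.0, 1.0 - sse / sst))


if __name__ == "__main__":
    print(aape([100.0, 200.0], [110.0, 180.0]))  # 10.0
    print(cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```

A perfect prediction gives CC = 1, and a prediction no better than the experimental mean gives CC = 0. Unlike the Pearson coefficient, this CC penalizes systematic bias as well as scatter.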