Data Analysis and Neuro-Fuzzy Technique for EOR Screening: Application in Angolan Oilfields

Abstract: In this work, a neuro-fuzzy (NF) simulation study was conducted to screen candidate reservoirs for enhanced oil recovery (EOR) projects in Angolan oilfields. First, a knowledge pattern is extracted by combining the searching potential of fuzzy logic (FL) with the learning capability of neural networks (NN) to make a priori decisions. The extracted knowledge pattern is trained and validated against rock and fluid data from successful EOR projects around the world. Data from the Block K offshore Angolan oilfields are then mined and analysed using the box-plot technique to investigate their degree of suitability for EOR projects. The trained and validated model is then tested on the Angolan field data (Block K), where EOR application is yet to be fully established. The results from the NF simulation technique applied in this investigation show that polymer, hydrocarbon gas, and combustion are the suitable EOR techniques.


Introduction
The process of selecting potential candidates for enhanced oil recovery (EOR) operations is a complex task involving the integration of a set of rock and fluid parameters governing the technical and economic performance of a reservoir. It is understood that technical evaluation of EOR techniques is crucial to the success of such projects. However, it is equally important to evaluate the economic viability of an EOR project development, including environmental, commercial, political and governmental factors [1][2][3]. Currently, there is no fully established technique for identifying potential candidates for EOR operations. Operations are generally based on trial and error, with reduced chances of success. In order to increase the chances of success and to make an informed decision, parameters obtained either from successful EOR field applications or from existing knowledge of EOR operations could be effectively utilised. Comparison between these criteria and the reservoir of interest will provide an indication of the possibility of success for future EOR projects [4]. However, matching the parameters from worldwide successful EOR techniques is a challenge from the data mining and screening points of view. This is particularly the case since these parameters may not necessarily be directly dependent on each other. Several methods have been developed and published for screening oil reservoirs, such as data analysis using tables and graphs [5][6][7][8] and artificial intelligence (AI) [3,9-13].
AI in the oil industry has long been studied and its development is relatively mature. The fuzzy-neural approach is important in the area of reservoir characterisation, in which knowledge of reservoir performance forecasts can be derived from well logs [9]. Chung et al. [10] developed a fuzzy expert system for EOR risk analysis incorporating preliminary screening of EOR methods, field performance estimation and economic analysis. The system reduces the requirements for massive laboratory experiments and field data input. Kamari et al. [3] addressed both the selection of an appropriate EOR method, using an artificial neural network (ANN), and the prediction of cash flows, using an economic EOR screening model.
A lot of work has been done on the application of artificial intelligence techniques in EOR projects [14][15][16][17][18][19][20][21][22]. Morel et al. [14] published screening criteria for EOR technologies using 347 successful EOR projects worldwide. The study was based on analysis through fuzzy-logic (FL) membership functions (triangular, Z-shaped and S-shaped), and the results are promising compared to the existing commercial software (EORgui). A neuro-fuzzy (NF) approach to screening reservoir candidates was published in [15], combining the strength of the fuzzy technique in searching data with the learning capability of the neural network (NN) to deduce knowledge in a form analogous to linguistic rules. A data set of 365 successful EOR projects was used to validate the approach and to determine the combination of fluid and rock properties which could best characterise the key parameters that control EOR success.
Kamari et al. [16] presented an AI model based on gene expression programming (GEP) for the prediction of CO2-oil minimum miscibility pressure (MMP) at different reservoir temperatures and oil compositions for live oil. Chen et al. [17] proposed two types of ANN models, the back-propagation neural network (BPNN) and the radial basis function neural network (RBFNN), for CO2 solubility prediction in all types of amine solutions. The models were evaluated by comparing experimental data with the predicted results of eight numerical models from the literature. Furthermore, Saeid et al. [18] developed an adaptive NF inference system (ANFIS) for the estimation of the solubility of hydrogen in heavy oil. To validate the model, both statistical and graphical methods were applied to the training and testing data sets.
The least square support vector machine (LSSVM) technique has been developed to estimate the interfacial tension (IFT) and MMP in paraffin-CO2 systems [19], the permeability of heterogeneous oil reservoirs [20], and surfactant-polymer flooding performance [21]. The proposed models were validated by statistical and graphical error analysis. Abouzar et al. [22] presented an ANN using the cuckoo optimisation algorithm (COA) and teaching-learning-based optimisation (TLBO) to predict the pure and impure CO2 MMP.
As can be seen, most studies on EOR screening have focused on a single data point or have used an insufficient number of well data in the models, thereby ignoring the heterogeneity of the fields. This usually leads to non-linearity of the data for the candidate reservoirs for EOR techniques. Hence, a multi-layered genetic fuzzy perceptron approach based on ANFIS [23] is used in this study. It is practical and makes it easy to define constraints for the NF learning procedure, impose the rule of the fuzzy sets intersection point, and minimise and stabilise the error between the training and validation data sets [15].
It is our motivation to use NF as an AI tool to identify potential reservoir candidates for EOR. The model was implemented using an in-house code and is capable of generating an automatic rule-base from a database of successful worldwide projects, optimising the variables of the fuzzy membership functions and providing interpretation models. The NF algorithm uses a self-organising technique to learn and initialise the membership functions of the input and output variables from a set of training data [24]. This is similar to the work of Zhou and Quek [25], where the pseudo outer-product (POP) learning algorithm was used to identify the fuzzy rules instead of the competitive learning [26] adopted in this paper. The input variables for the NF model consist of training functions (Figure 1), where the hidden layer nodes are varied in order to obtain the lowest root mean square error (RMSE) and non-dimensional error index (NDEI). Further details about the methods are provided in Section 2.3 below. This is the first comprehensive study of its kind in the country, and we believe the model can be used as an important tool for technical field and/or reservoir selection. With the declining production rate within Angolan oilfields, EOR methods are the most plausible means of increasing the recovery factor of the hydrocarbon left in the ground after conventional recovery methods. The application of EOR methods in Angola is very necessary but requires extensive research, the development of cheap and efficient techniques, and the involvement of more expertise.
The data set used in this study comprises 365 successful thermal, miscible gas, chemical and biological EOR projects worldwide. The field data set, consisting of 2994 Angolan oilfield data points, was mined and analysed using the box-plot technique for Block K, which is made up of four (4) areas, 13 fields, 40 reservoirs and 179 wells. The area grouping is based on production allocation associated with the asset (Figure 2 and Table 1). The results of the NF model can be applied as a preliminary step in technically evaluating the suitability of a particular EOR technique in Angola or elsewhere. See Table 1 for a detailed breakdown of the distribution of the areas, fields, reservoirs and wells for Block K.

The Methodology and Approach
The methodology employed in this study consists of three main steps: data mining, data analysis, and technical screening of EOR methods by NF algorithm (Figure 3).

Data Mining
The data for EOR screening comprise two categories. The first category is training or validation data: data derived from laboratory studies, data generated from oil reservoir simulation, and data from successful worldwide projects. Data from successful worldwide projects are the most reliable because their technical and economic capabilities have been proved in practice [3]. The second category is the test data from the reservoirs under investigation.

Data Analysis
In this study, box-plots (Figure 4) were used to represent the distribution of EOR projects against the oil and reservoir properties. These representations illustrate the distribution of oil properties and reservoir characteristics for the available EOR data set. The upper limit of the top whisker represents the maximum value and the lower limit of the bottom whisker represents the minimum value of the data set. All values outside this range (max-min) are considered outliers. The extreme minimum and maximum values could have a negative impact on the EOR criterion, even when the averages are established [7]. The values of both the successful EOR data set and the reservoir under investigation are plotted on one graph for each variable. The following expression can be used to effectively analyse the observed variables:

$$f = [a, b] \cap [c, d] \tag{1}$$

where a and b are the minimum and maximum of the training data set (e.g., successful EOR); c and d are the minimum and maximum of the test data set (e.g., Angolan field), and f is the outcome data set of the investigation. If f is an empty set, then the test variable will be considered unsuitable for the application process under investigation, hence Equation (1) becomes:

$$f = [a, b] \cap [c, d] = \varnothing \tag{2}$$

In summary, the box-plots provide a quick and efficient way to analyse the data, providing basic information about the minimum, maximum, average and range in which the majority of the projects or data set are concentrated. However, the caveat in the use of box-plot analysis is that it does not quantify the degree of uncertainty or consider the weight of each parameter, which requires a robust system such as NF or laboratory tests for full investigation.
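The range comparison described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming that the outcome f is the overlap between the closed training range [a, b] and the closed test range [c, d]; the function name `range_overlap` and the numerical values are hypothetical and are not taken from the study.

```python
from typing import Optional, Tuple


def range_overlap(a: float, b: float, c: float, d: float) -> Optional[Tuple[float, float]]:
    """Return the overlap of [a, b] and [c, d], or None when the ranges do not overlap."""
    low = max(a, c)
    high = min(b, d)
    if low > high:
        return None  # empty set: the variable is treated as unsuitable
    return (low, high)


# Hypothetical depth ranges in ft (illustrative values, not from the paper).
training_range = (1500.0, 9000.0)   # a, b: successful EOR projects
test_range = (6000.0, 11000.0)      # c, d: reservoir under investigation

overlap = range_overlap(*training_range, *test_range)
print(overlap)  # (6000.0, 9000.0)
```

A `None` result corresponds to the empty-set case, in which the variable would be flagged as unsuitable at this quick-look stage.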

Neuro-Fuzzy Technique
The structure of the model is based on a five (5) layered feedforward-backpropagation NN (Figure 5). This structure consists of input, hidden, and output layers. The input layer represents the input variables, whilst the output layer (defuzzification) represents the output decision signals. For the defuzzification, centre of gravity (COG) and min of max (MOM) were employed. In the hidden layers, the layer two (2) nodes function as input and output membership functions, and the nodes of layers three (3) and four (4) act as the fuzzy logic rules AND and OR, respectively [15,27,28]. The operation is carried out in many simple individual processors called neurons. On each layer, each neuron is connected to the neurons in the preceding layer by direct links, each of which has its own weight [27][28][29]. After receiving signals from the preceding neurons, each neuron applies an activation function to its net input to produce its output, where x represents the input signal to a node [27][28][29].
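The roles of the AND and OR rule layers can be illustrated with a short Python sketch. It assumes the common minimum operator for AND and maximum operator for OR; the operators actually used in the in-house code are not stated here, and the function names and membership degrees below are hypothetical.

```python
def rule_strength(antecedent_memberships):
    """Fuzzy AND: firing strength of one rule (minimum operator assumed)."""
    return min(antecedent_memberships)


def combine_rules(rule_strengths):
    """Fuzzy OR: aggregate rules sharing one consequent (maximum operator assumed)."""
    return max(rule_strengths)


# Hypothetical membership degrees for two rules pointing to the same output set.
rule_1 = rule_strength([0.8, 0.6])  # e.g. "depth is medium AND viscosity is low"
rule_2 = rule_strength([0.3, 0.9])
print(rule_1, rule_2, combine_rules([rule_1, rule_2]))  # 0.6 0.3 0.6
```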
The membership functions (MFs) applied in this work are described in [15,30,31], where triangular, trapezoidal and Gaussian membership functions were all tested and the leftmost and rightmost values were shouldered. The choice was based on the specific MFs that adequately match the available successful EOR data with minimum error. Full details of the development of the NF model applied in this work can be found in [15].
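For readers unfamiliar with these shapes, the following Python sketch gives textbook forms of the triangular, trapezoidal and Gaussian membership functions, together with left and right shoulders. It is an illustrative sketch and not the parameterisation of [15]; all function and parameter names are assumptions, and each function expects strictly increasing breakpoints.

```python
import math


def triangular(x, left, peak, right):
    """Triangular MF: 0 at left and right, 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal MF: rises from a to b, equals 1 on [b, c], falls from c to d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, mean, sigma):
    """Gaussian MF centred on mean with spread sigma."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def left_shoulder(x, plateau_end, foot):
    """Leftmost set: 1 up to plateau_end, then falls linearly to 0 at foot."""
    if x <= plateau_end:
        return 1.0
    if x >= foot:
        return 0.0
    return (foot - x) / (foot - plateau_end)


def right_shoulder(x, foot, plateau_start):
    """Rightmost set: 0 up to foot, then rises linearly to 1 at plateau_start."""
    if x <= foot:
        return 0.0
    if x >= plateau_start:
        return 1.0
    return (x - foot) / (plateau_start - foot)


# Hypothetical fuzzy sets on an arbitrary 0-10 scale.
print(triangular(2.5, 0.0, 5.0, 10.0))   # 0.5
print(right_shoulder(12.0, 0.0, 10.0))   # 1.0
```

The shoulders keep the membership at 1 beyond the outermost breakpoints, so values below the lowest or above the highest fuzzy set still belong fully to that set.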
During the learning process, the knowledge extracted from the NF system can be expressed in the form of fuzzy rules by computing the weights, the number of rules and the fuzzy set parameters. These parameters are computed by a machine learning process from the EOR data, with the input fuzzy sets determined by the fuzzy clustering algorithm. The aforementioned parameters can also be determined by engineers and experts in the field. The back-propagation algorithm developed by [32] is used to tune all parameters, with the error propagated from the output towards the input units. The mean square error, E, is defined in terms of the desired output ȳ(x) and the current actual output d (Equation (3)) [15,27,28]. α represents the learning rate coefficient, set in the simulations to 0.01 following an error validation sensitivity analysis. The error derivatives for the input and output of Layers five (5) and two (2), respectively, can be determined as in [15]. Hence, the updated value of w can be determined, and the root mean square error (RMSE) and non-dimensional error index (NDEI) are used to evaluate the prediction error, defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\bar{y}_i(x) - d_i\right)^2} \tag{7}$$

$$\mathrm{NDEI} = \frac{\mathrm{RMSE}}{\sigma(d)} \tag{8}$$

where ȳ(x) is the predicted output, σ(d) is the standard deviation of the target series, and i is the data point index, which varies from 1 to N. Once the NN is successfully trained, it can be used to predict the suitability of the test data for the respective EOR technique under investigation.
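The two error measures can be computed as in the Python sketch below. It uses the usual definitions: RMSE as the square root of the mean squared difference between predicted and target values, and NDEI as the RMSE divided by the standard deviation of the target series. The use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether a population or sample estimate was used, and the sample values are hypothetical.

```python
import math
import statistics


def rmse(predicted, target):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def ndei(predicted, target):
    """Non-dimensional error index: RMSE scaled by the target standard deviation."""
    # Population standard deviation is assumed here.
    return rmse(predicted, target) / statistics.pstdev(target)


# Hypothetical target and predicted series (illustrative values only).
target = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

print(round(rmse(predicted, target), 4))  # 0.1414
print(round(ndei(predicted, target), 4))  # 0.1
```

Because NDEI is scaled by the spread of the target series, it allows variables with very different magnitudes, such as depth and porosity, to be compared on one footing, which RMSE alone does not.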

Application of the Techniques in Angolan Oilfields
Angola is producing approximately 1.7 Mbbls/day under primary or secondary recovery mechanisms. The recovery factor from these mechanisms accounts for about 30% of the original oil in place [29,33], and most of the reservoir fields are maturing, with production and pressure declining very rapidly, as shown in Figures 6 and 7. Observation of the production and pressure patterns across a single block (Block K) consisting of 13 major fields suggests that there is a significant decline in performance. As an example, the reservoir drawdown, which is the primary force driving the fluids into the wellbore, decreases with time, with a significant negative impact on the productivity index (PI) of these wells. Figure 8 shows this trend for wells X6Y13W1 and X7Y29W4. The PI for well X6Y13W1 falls from 61.84 stb/d/psi in 2005 to 10 stb/d/psi over a period of five years. Similarly, for well X7Y29W4, the PI decreases from 2.91 stb/d/psi to 1.29 stb/d/psi over a period of ten (10) months in 2011. These observed trends suggest that an enhancement mechanism needs to be put in place. Hence, improving the performance of these wells is a cost-effective way to reverse the production decline trend, extend field life and improve oil recovery.
The term "easy oil" is vanishing in Angola due to the fact that more than 80% of oil production is from offshore fields and the production is moving towards more remote areas like deepwater and ultra-deepwater where the extraction of oil or field development is very costly. Angola can still produce the remaining oil from existing fields by applying new EOR technologies capable of increasing the recovery factor. Not much work has been published in the area of applied EOR technologies; an area which requires more research is the identification of suitable techniques that could allow further extraction of oil beyond primary and secondary recovery.
Historically, only one deep offshore field case of EOR (i.e., polymer injection), in the Dalia/Camelia fields, has been implemented in Angola [14,34-37]. Considering the large number of fields and the start of production activities dating as far back as 1955, Angola can be considered a potential location for the implementation of EOR techniques. By applying EOR techniques, millions of barrels of oil could be extracted from the existing fields by increasing the recovery factor up to 60% of the oil in the reservoir [33]. Therefore, screening oil reservoirs can be considered the first step before an EOR project implementation in Angola. However, before stating with confidence that the selected EOR technique will likely be technically successful, additional evaluations such as core analysis, reservoir simulation and field pilots are required [4].

Data Analysis
The successful EOR data set used in this model is from 365 worldwide successful EOR projects (Figure 9), divided into ten (10) different EOR techniques: steam, miscible CO2, miscible hydrocarbon gas, polymer, combustion, surfactants, nitrates, microbial, hot water and miscible acid gas. For some techniques, the number of successful projects in the available data set is considered insufficient for performing advanced statistical tests. These techniques, which include miscible acid gas, microbial, hot water, surfactant and nitrates, will not be investigated in the current study. Table 2 lists the oil properties and reservoir characteristics, updated according to the available data set from the worldwide successful EOR projects; it is not intended to present threshold limits, since the ranges could also be affected by economic constraints and scientific development.
The Angolan data set was collected from 13 fields (X1, X2, ..., X13) and consists of reservoir rock and fluid properties, including reservoir depth, oil API gravity, oil viscosity, rock porosity, rock permeability, oil saturation, net pay thickness, reservoir temperature, reservoir pressure, formation water salinity and formation type. These data sets were collected from several reports, including well test, geochemistry, fluid sampling, final well, thermodynamic, geological, drill stem test (DST) and log interpretation reports. No carbonate formations were found in the area under investigation. The area under investigation is an offshore field with a water depth greater than 3500 ft and a sea bottom temperature of approximately 40 °F.
All the mined data were carefully checked for consistency and quality in order to minimise error. The data which were not available are highlighted in Table 3. The box plots were used to identify possible inconsistencies and discrepancies in the data, as the accuracy of the model in predicting the output may be impaired by the presence of outliers [8,39]. Table 4 contains the summary of the minimum, average and maximum values of the variables associated with the area under investigation (Block K). Figure 10 shows a single box-plot for the data sets associated with the successful EOR projects and the Angolan field for each variable. This is aimed at providing information about the distribution and alignment of both data sets.

Neuro-Fuzzy Technique
The modelling process consists of three main stages: training, validation and testing. Data were grouped by variable for each EOR technique (Table 5). For each variable, 80% (4/5) of the data set was selected at random for training and the remaining 20% (1/5) was used as the validation (prediction) set. The set of options which generates the least RMSE and NDEI is selected, and its validation data set is used for the testing process. Forty-five runs were generated for each variable, totalling more than 1350 runs for the six variables of the five EOR techniques. Figures A1-A5 summarise the best selected simulation results. Figure 11 illustrates five options out of the forty-five runs of depth for steam.

Table 5. Data grouped by variable for each EOR technique [15,38].
Steam           145  145  134  145  138  141
CO2             131  130  129  130  107  128
Miscible Gas     37   37   36   37   33   36
Polymer          24   24   24   24   18   21
Combustion       16   16   14   15   15   15
Surfactants *     3    3    3    3    3    3
Nitrates *        2    2    2    2    2    2
Microbial *       3    3    3    3    3    2
Hot water *       2    2    2    2    2    2
Acid gas *

For the test data set (Angolan oilfield data), random selection was used and the data were tested against the already validated data set from the training process. The simulation results determine the EOR techniques suitable for the Angolan oilfield according to the methods and variables investigated. Figures A6-A8 summarise the results of the simulation using the Angolan oilfield data set. The sample testing simulation results for steam are illustrated in Figure 12. However, this is not a binary decision operation, and hence engineering expertise and knowledge from previous operations in the area will be invaluable in evaluating the sensitivity of each variable for decision making.
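The random 80/20 partition into training and validation sets can be sketched as follows. This is a minimal illustration and not the in-house code: the function name `split_train_validation`, the fixed seed and the placeholder records are assumptions made for the example.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into training and validation parts."""
    rng = random.Random(seed)      # fixed seed so the example is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder records standing in for 145 project values of one variable.
records = list(range(145))
train, validation = split_train_validation(records)
print(len(train), len(validation))  # 116 29
```

Repeating the split with different seeds would give one way of generating several candidate options, from which the option with the lowest RMSE and NDEI could be retained.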

Results and Discussion
The use of the database of worldwide successful EOR projects was maximised by tuning the parameters (number of patterns, epochs, mean and standard deviation) of each variable associated with each of the five (5) EOR techniques: steam, CO2, miscible hydrocarbon gas, polymer and combustion. Based on the identified patterns reinforced by the available data set, five unique values of the mean and standard deviation were computed. The weight values were added to the results and then used to predict the degree of success of the different EOR projects. The sample size of the available data is crucial for minimising the error and optimising the simulation outcome.
In this study, the values of the NDEI and overall RMSE associated with the investigated successful EOR projects were computed in the training process for the corresponding oil and reservoir properties. Figure A1 shows that the NF model for steam matches the predicted depth data with an NDEI ranging between 0.04 and 1.8. The RMSE varies from 40 (minimum) to 1183 (maximum). The best match (RMSE = 40, NDEI = 0.04) corresponds to option 2 (see Figure 11 and Figure A1). The predicted or validated data set of this option is then used as the predicted set in the testing process of steam for depth.
The training process was performed for the other parameters and EOR techniques. The best results of each training process are summarised in Figures A1-A5. The error computation is critical to ensure that the NF technique is suitable for the EOR process or technique under investigation. The developed model performed satisfactorily when run with enough training, verification and testing data sets. Each of the groups must have an equal number of data sets. The model provides the degree of suitability of a typical EOR project prior to full field implementation and also makes it possible to separate out the oil properties and reservoir characteristics that could impact EOR projects. The formation type is not included in the model. However, this can be determined by screening criteria from the successful EOR worldwide field data set (Table 2).
Data from Angolan reservoir fields were tested against this trained and validated data. Table 4 presents the data of some of the Angolan oil reservoir fields, which consist mainly of sandstone formations. No carbonate reservoir was encountered in the area investigated. Six variables were investigated: depth, API, viscosity, porosity, permeability, and oil saturation. EOR methods such as surfactants, microbial, nitrates, hot water and miscible acid gas (Table 2) were not investigated due to their small sample sizes. Figure 12 presents the testing process for steam for Area 1 of Block K. There is a good match for saturation (RMSE = 0.29, NDEI = 0.018) and porosity (RMSE = 0.16, NDEI = 0.053). API matches with an RMSE and NDEI of 0.42 and 0.08, whilst the results for the remaining variables are: depth (RMSE = 363, NDEI = 0.38), viscosity (RMSE = 1875, NDEI = 0.322), and permeability (RMSE = 2.25, NDEI = 0.0007). This procedure was performed for the four areas of Block K (Area 1, Area 2, Area 3, Area 4), for the six variables investigated (API, depth, porosity, saturation, permeability and viscosity) and five EOR techniques (miscible gas, steam, CO2, polymer and combustion), and the results are summarised in Table 6.
In order to determine the suitability of a particular technique in an EOR project, variables are considered based on their degree of variance. It is understood that variables such as permeability can vary by up to 3 or 4 orders of magnitude in a geological formation [40]. Three scenarios were investigated: (1) the least RMSE combined with 20 < NDEI ≤ 30%; (2) the least RMSE combined with 10 < NDEI ≤ 20%; (3) the least RMSE combined with NDEI ≤ 10% (Table 6). As this is not a binary decision operation, engineering knowledge of the process is required in decision making. As an example, variables like viscosity and depth for thermal processes (steam and hot water), pressure for gas and steam injection, and temperature for chemical and hot water processes are very sensitive and critical [13]. Permeability is not a critical variable for gas injection [5,6]. Based on the available data and the screening results, the main results for the investigated techniques are summarised in Table 6 and Figures A6-A8. In Scenario 1, polymer is the most suitable EOR method for the areas investigated. Combustion is also suitable; however, due to the small number of successful EOR projects, the results obtained may need further laboratory tests for confirmation before execution. Miscible gas and CO2 are suitable in three out of four areas, whilst steam is suitable in one out of the four areas investigated (Table 6). In Scenarios 2 and 3, the results for polymer, miscible hydrocarbon gas, and combustion remain the same, whereas steam and CO2 are not good candidates because most of the parameters investigated present more than 50% NDEI, which is not within the range of the investigated variables. However, more study is recommended for the CO2 technique due to its importance in CO2 sequestration (Table 6). Figures A10 and A11 show the comparison of the RMSE (Equation (7)) and NDEI (Equation (8)) for the variables investigated (depth, porosity, API, permeability, viscosity, and saturation).
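The three NDEI bands can be expressed as a simple lookup, sketched below in Python. The sketch assumes that the NDEI values reported as fractions (e.g., 0.053) correspond to percentages after multiplying by 100, and it reads the three scenarios as non-overlapping bands; both are interpretations rather than statements from the study, and the function name `ndei_band` is hypothetical. The input values are the Area 1 steam results quoted above.

```python
def ndei_band(ndei_fraction):
    """Return the scenario number (1, 2 or 3) whose NDEI band contains the value, else None."""
    pct = 100.0 * ndei_fraction  # assumed conversion from fraction to percent
    if pct <= 10.0:
        return 3   # NDEI <= 10%
    if pct <= 20.0:
        return 2   # 10 < NDEI <= 20%
    if pct <= 30.0:
        return 1   # 20 < NDEI <= 30%
    return None    # outside all three bands


# NDEI values for steam in Area 1 of Block K, as quoted in the text.
steam_area_1 = {
    "saturation": 0.018,
    "porosity": 0.053,
    "API": 0.08,
    "depth": 0.38,
    "viscosity": 0.322,
    "permeability": 0.0007,
}

bands = {name: ndei_band(value) for name, value in steam_area_1.items()}
print(bands)
```

Under these assumptions, depth and viscosity fall outside all three bands for steam in Area 1, while the other four variables fall in the tightest band. As noted in the text, such a lookup only supports the decision and does not replace engineering judgement on which variables are critical for each technique.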
The simulated and analytical values were computed for the non-regression approach and for five different regression methods: linear, exponential, logarithmic, polynomial, and power. The same set of equations (Equations (7) and (8)) that was used for the model simulation was used to verify the code analytically. As expected, the simulated and analytical calculations matched very well (Figures A10 and A11).
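A comparison of regression forms by RMSE can be sketched in plain Python as below. This is an illustrative sketch and not the procedure used in the study: the exponential, logarithmic and power models are fitted by ordinary least squares on log-transformed data (so they minimise error in the transformed space and require positive inputs), the polynomial form is omitted for brevity, and the data and function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_linear(xs, ys):
    a, b = fit_line(xs, ys)
    return lambda x: a + b * x


def fit_exponential(xs, ys):
    # y = A * exp(b * x), fitted as ln(y) = ln(A) + b * x
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(ln_a) * math.exp(b * x)


def fit_logarithmic(xs, ys):
    # y = a + b * ln(x)
    a, b = fit_line([math.log(x) for x in xs], ys)
    return lambda x: a + b * math.log(x)


def fit_power(xs, ys):
    # y = A * x ** b, fitted as ln(y) = ln(A) + b * ln(x)
    ln_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return lambda x: math.exp(ln_a) * x ** b


def rmse(predicted, target):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target))


# Hypothetical data generated from a power law, y = 2 * x ** 1.5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** 1.5 for x in xs]

fits = {
    "linear": fit_linear(xs, ys),
    "exponential": fit_exponential(xs, ys),
    "logarithmic": fit_logarithmic(xs, ys),
    "power": fit_power(xs, ys),
}
errors = {name: rmse([model(x) for x in xs], ys) for name, model in fits.items()}
best = min(errors, key=errors.get)
print(best)  # power
```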

Conclusions
An NF model provides a powerful technical screening tool for reservoir fields within Angola or around the world. The data set comprises 365 successful EOR projects covering 10 different EOR technologies, of which five were investigated. Data from sixteen major oil-producing countries were used in the developed model, which is based on six different reservoir parameters and could be extended to other reservoir parameters. The model was tested using oil reservoir fields from Angola and can be used to test any data worldwide.
Box plots were used for data analysis and as a quick look at technique suitability. However, box-plots do not reflect the degree of suitability or the behaviour of a given parameter within the investigated range. The trained and validated data were used to compare the simulation RMSE and NDEI output with five different regression methods: linear, exponential, logarithmic, polynomial, and power law. The regression models matched the simulation output to varying degrees. The caveat in the use of regression techniques is that some data points could potentially be excluded during the fitting process. The non-regression simulation approach adopted in this study, however, allows for automated error decay within the defined tolerance limit.
The Angolan field reservoirs from Block K under investigation are good candidates for polymer and combustion, followed by miscible hydrocarbon gas. The screening methods are simply used to determine the suitability or chance of success of an EOR technique. Before stating with confidence that the selected EOR technique will likely be technically successful, additional evaluations such as core analysis, reservoir simulation and field pilots are required.

Figure A7. Statistical data plots of steam, CO2, combustion, miscible gas and polymer injection for Area 2 of the Angolan oilfield (see Table 4).

Figure A8. Statistical data plots of steam, CO2, combustion, miscible gas and polymer injection for Area 3 of the Angolan oilfield (see Table 4).

Figure A9. Statistical data plots of steam, CO2, combustion, miscible gas and polymer injection for Area 4 of the Angolan oilfield (see Table 4).