Optimized ANFIS Model Using Aquila Optimizer for Oil Production Forecasting

Abstract: Oil production forecasting is one of the essential processes for organizations and governments to make necessary economic plans. This paper proposes a novel hybrid intelligence time series model to forecast oil production from two different oil fields in China and Yemen. This model is a modified ANFIS (Adaptive Neuro-Fuzzy Inference System), which is developed by applying a new optimization algorithm called the Aquila Optimizer (AO). The AO is a recently proposed optimization algorithm inspired by the behavior of the Aquila in nature. The developed model, called AO-ANFIS, was evaluated using real-world datasets provided by local partners. In addition, extensive comparisons were made to the traditional ANFIS model and to several modified ANFIS models that use different optimization algorithms. Numeric results and statistics have confirmed the superiority of the AO-ANFIS over the traditional ANFIS and several modified models. Additionally, the results reveal that AO significantly improved the prediction accuracy of ANFIS. Thus, AO-ANFIS can be considered an efficient time series forecasting tool.


Introduction
Accurate forecasting of oil production is a significant and cumbersome task for monitoring and improving oil reservoirs. Hydrocarbon usage constituted the largest share of the globe's energy consumption in 2019, accounting for over 58% of the world's energy consumption [1,2]. Therefore, oil production forecasting plays a crucial role in the life cycle of oil reservoirs, including early resource evaluation and improving recovery. At the same time, various factors influence hydrocarbon resources, including formation heterogeneities, the complexity of fluid flow within the subsurface formation, reservoir properties, and fluid properties, which make the precise prediction of oil production more difficult [3,4]. Three approaches are frequently used to build oil production prediction models for oil reservoirs. First, numerical reservoir simulation (NRS) is considered the optimal traditional means for forecasting oil production. The NRS method relies on a numerical model, which tends to achieve good performance and can evaluate reservoir geological heterogeneity [5,6]. However, NRS models have some limitations: they are time-consuming and cumbersome [7], and they require constructing an accurate static model, history matching, and other dynamic model parameters. Second, analytical techniques are employed to compute various forms of wellbore flow rate adjustments. Depending on the reservoir heterogeneity, well structure complexity, and boundary conditions, some hypotheses are essential for determining the analytical solution [8][9][10]. Third, the conventional decline curve analysis (DCA) technique [11,12] can forecast the production rate by evaluating long-term hydrocarbon production data. The DCA approach employs empirical equations to match the historical production data with a model, including harmonic, hyperbolic, and exponential models [13]. These models are idealized curves and cannot account for the actual formation factors. Thus, it is challenging to ensure accurate performance by employing DCA.
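The three DCA model families mentioned above are commonly written in the Arps form. The following is a minimal Python sketch of that standard form, given only for illustration; the function name `arps_rate` and all parameter values are hypothetical and are not taken from the datasets used in this paper.

```python
import math

def arps_rate(t, qi, di, b):
    """Arps decline-curve production rate at time t.

    qi: initial rate, di: initial decline rate (1/time), b: decline exponent.
    b = 0 gives the exponential model, b = 1 the harmonic model,
    and 0 < b < 1 the hyperbolic model.
    """
    if b == 0:
        return qi * math.exp(-di * t)
    return qi / (1.0 + b * di * t) ** (1.0 / b)

# Hypothetical parameters, for illustration only.
qi, di = 1000.0, 0.1
for label, b in [("exponential", 0.0), ("hyperbolic", 0.5), ("harmonic", 1.0)]:
    rates = [round(arps_rate(t, qi, di, b), 1) for t in (0, 10, 20)]
    print(label, rates)
```

Each curve is fully determined by three constants, which illustrates why such models are smooth idealizations that cannot react to changes in formation or operating conditions.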
The use of numerical simulation to predict oil production is a more reliable and robust technique. Its accuracy depends on the accuracy of the static model and the quality of history matching. However, it is difficult to construct an accurate static model [11,14,15]. Moreover, the parameterization techniques of the static model and the method of integrating objective components have a significant effect on history matching and reservoir prediction [14][15][16]. Although multi-objective optimization can be applied, even a perfect history-matched model can lead to poor prediction [17]. Thus, the history matching approach is demanding and time-consuming, and it requires a lot of work [18].
The applications of deep learning (DL) and machine learning (ML) in the petroleum industry have attracted growing attention [19], particularly in forecasting oil production [20,21], forecasting pressure-volume-temperature (PVT) properties [22,23], optimizing well placement and oil production [24,25], predicting reservoir petrophysical properties, including porosity and permeability [26,27], and detecting oil spills [28]. With the remarkable development of deep learning algorithms, deep learning has been incorporated into the petroleum industry, making it possible to overcome difficult problems in oilfields [21].
Several DL and ML methods were introduced for forecasting oil production [1,20,21]. For example, Sagheer et al. [29] introduced Long Short-Term Memory (LSTM) to predict oil production time series. Fan et al. [1] proposed to incorporate autoregressive integrated moving average (ARIMA) and LSTM to forecast oil production. In [30], an optimized Random Vector Functional Link was introduced for time series forecasting. This model was implemented for oil production in Tahe oilfield. Liu et al. [20] employed LSTM with Ensemble Empirical Mode Decomposition (EEMD) to predict oil production.
One of the most efficient time series prediction models is the Adaptive Neuro-Fuzzy Inference System (ANFIS), which has been employed for different forecasting problems [31][32][33]. In this paper, we propose a modified ANFIS model using a new metaheuristic optimization algorithm called the Aquila Optimizer (AO) [34]. It belongs to a class of nature-inspired optimization algorithms, which are motivated by the behavior of living organisms, such as grey wolves [35], Harris hawks [36], or red foxes [37]. The AO is inspired by the behavior of the Aquila in nature, and it has shown superior performance in solving different complex optimization problems. In this paper, AO is applied to optimize ANFIS parameters to avoid the shortcomings of the traditional ANFIS. First, the AO generates a set of candidates (solutions), each of which represents an ANFIS parameter configuration. Then, each solution is evaluated using the training set. Thereafter, the solution that has the smallest fitness value is selected as the best solution.
In this paper, AO-ANFIS is applied to two real-world historical oil production datasets from the Masila oilfields in Yemen and the Tahe oilfields in China. The evaluation experiments are implemented using several performance measures, and extensive comparisons to several models are also carried out. The main contributions of the current study are:
• A new modified ANFIS model, called AO-ANFIS, is proposed as a time series forecasting model for oil production.
• The AO algorithm is applied to optimize ANFIS parameters to overcome the shortcomings of ANFIS.
• Extensive comparisons to several models are implemented to verify the performance of AO-ANFIS using two real-world datasets.
The remainder of this paper is organized as follows. Section 2 summarizes several oil forecasting studies. The background of the applied methods is described in Section 3. The AO-ANFIS time series forecasting model is described in Section 4. The experiments and the conclusions are presented in Sections 5 and 6, respectively.

Related Work
In this section, we review relevant methods employed for oil production forecasting. Abdullayeva et al. [38] established a hybrid model based on the integration of a Convolutional Neural Network (CNN) and LSTM networks, called CNN-LSTM, to forecast oil production accurately. Calvette et al. [39] implemented a deep learning algorithm in a proxy model to precisely reproduce the simulator by predicting the production history data. Fan et al. [1] proposed a hybrid model incorporating the ARIMA model and LSTM to consider the impact of manual operation and to assess the benefit of combining linear and nonlinear modeling. Wang et al. [40] proposed a hybrid forecasting model that combines linear and non-linear methods in two stages. In the first stage, the non-linear grey model is combined with the metabolism idea to establish the non-linear metabolism grey model (NMGM). In the second stage, the established NMGM is integrated with ARIMA to develop the NMGM-ARIMA method. In [41], based on pressure-rate datasets, a model integrating the non-linear autoregressive model with exogenous inputs (NARX) and LSTM was proposed to investigate synthetic datasets and compare the pressure forecasting results. Zhong et al. [42] proposed a deep learning proxy model to forecast the fluid saturation and reservoir pressure during water flooding in heterogeneous reservoirs. Based on recent developments in deep learning, the coupled generative adversarial network (Co-GAN) was employed to determine the distribution of multidomain high-dimensional image data.
Wang et al. [43] introduced a novel equal probability gene expression programming (EP-GEP) method to eliminate the defects of the conventional Arps decline model in carbonate reservoirs during production decline analysis. The EP-GEP model showed high forecasting accuracy, with lower relative errors than the traditional methods. Yan et al. [44] introduced time series data that can be examined with supervised algorithms and the Internet of Things (IoT), and evaluated the efficiency of oil production forecasting techniques in steam flood scenarios. In addition, a 3% enhancement in oil production was observed based on the established optimal steam distribution plan. Singh et al. [45] proposed a novel neural network (NN) approach that can forecast the gas hydrate saturation (Sh) for any well from various well-log parameters, including bulk density, porosity, and compressional wave (P-wave) velocity, without any well-specific calibration. The findings indicated that the accuracy of the established technique in forecasting Sh was 83%.
Zanjani et al. [46] applied various machine learning approaches, including Artificial Neural Network (ANN), Support Vector Regression (SVR), and Linear Regression (LR), to forecast oil production. The findings indicate that all three approaches produced good forecasts; however, ANN was considered the best approach. Liu et al. [47] proposed an oil production forecasting method, based on the LSTM approach, that considers the trends and correlations of oil production data. Alalimi et al. [30] established an integrated model of Random Vector Functional Link (RVFL) and Spherical Search Optimizer (SSO) to forecast oil production from the Tahe oilfield. The proposed model (SSO-RVFL) was evaluated through comparisons to several optimization methods. Sagheer et al. [29] introduced deep LSTM as a deep learning technique to address the shortcomings of conventional forecasting methods and to provide accurate predictions.

Backgrounds
In this section, we give a brief description of the applied methods, as follows.

ANFIS
The ANFIS approach was established by [48] as a new type of artificial neural network (ANN). The structure of the ANFIS model combines an ANN with a Fuzzy Inference System (FIS). Furthermore, "IF-THEN rules" are applied to generate a mapping between the inputs and outputs, known as the "Takagi-Sugeno inference model". As a result, the ANFIS approach is convenient and reliable for dealing with real-world datasets, as it has a robust learning capability. Owing to these characteristics, the ANFIS approach has been implemented in many applications.
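In the common ANFIS workflow, as drawn in Figure 1, $x$ and $y$ represent the inputs of Layer 1, where $O_{1i}$ indicates the output of node $i$. The ANFIS mathematical model is expressed as follows:

$$O_{1i} = \mu_{A_i}(x), \; i = 1, 2, \qquad O_{1i} = \mu_{B_{i-2}}(y), \; i = 3, 4 \tag{1}$$

$$\mu(x) = \exp\left[-\left(\frac{x - \rho_i}{\alpha_i}\right)^{2}\right] \tag{2}$$

where $\mu$ indicates the generalized Gaussian membership function, $A_i$ and $B_i$ denote the membership values of $\mu$, and $\alpha_i$ and $\rho_i$ represent the premise parameter set. Moreover, Equation (3) can be utilized for the second layer:

$$O_{2i} = w_i = \mu_{A_i}(x) \times \mu_{B_i}(y), \quad i = 1, 2 \tag{3}$$

The output of the third layer is calculated as:

$$O_{3i} = \bar{w}_i = \frac{w_i}{\sum_{i=1}^{2} w_i} \tag{4}$$

in which $w_i$ represents the $i$th output from Layer 2. Furthermore, the output of Layer 4 is generated by Equation (5):

$$O_{4i} = \bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i\right) \tag{5}$$

in which $f$ indicates a function that uses the inputs and the parameters of the network, and $r_i$, $q_i$, and $p_i$ indicate the consequent parameters of node $i$. Finally, Layer 5 generates the output, which is computed as in Equation (6):

$$O_5 = \sum_{i} \bar{w}_i f_i \tag{6}$$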
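The five layers described above can be sketched as a single forward pass. The following Python code is a minimal illustration for a two-input, two-rule first-order Takagi-Sugeno system with Gaussian membership functions. The function names (`gaussian_mf`, `anfis_forward`) and all parameter values are hypothetical and are chosen only to make the example runnable; this is not the authors' implementation.

```python
import math

def gaussian_mf(v, rho, alpha):
    """Generalized Gaussian membership value with center rho and width alpha."""
    return math.exp(-((v - rho) / alpha) ** 2)

def anfis_forward(x, y, premise_a, premise_b, consequent):
    """Forward pass of a two-input Takagi-Sugeno ANFIS.

    premise_a / premise_b: lists of (rho, alpha) for the fuzzy sets A_i / B_i.
    consequent: list of (p, q, r) per rule.
    """
    # Layer 1: membership values of each input.
    mu_a = [gaussian_mf(x, rho, alpha) for rho, alpha in premise_a]
    mu_b = [gaussian_mf(y, rho, alpha) for rho, alpha in premise_b]
    # Layer 2: firing strength of each rule (product of memberships).
    w = [a * b for a, b in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order rule outputs, f_i = p_i*x + q_i*y + r_i.
    rule_out = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: the overall output is the sum of the Layer 4 outputs.
    return sum(rule_out)

# Hypothetical parameter values, for illustration only.
premise_a = [(0.0, 1.0), (1.0, 1.0)]
premise_b = [(0.0, 1.0), (1.0, 1.0)]
consequent = [(1.0, 1.0, 0.0), (2.0, -1.0, 0.5)]
print(anfis_forward(0.5, 0.5, premise_a, premise_b, consequent))
```

The premise parameters (rho, alpha) and the consequent parameters (p, q, r) together form the parameter set that an optimizer can tune.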

Aquila Optimizer (AO)
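In this section, the Aquila Optimizer (AO) is described as follows.
Following [34], AO starts by determining the initial value for a set of $N$ individuals $X$ using the following formula:

$$X_{ij} = r_1 \times (UB_j - LB_j) + LB_j, \quad i = 1, 2, \dots, N, \; j = 1, 2, \dots, Dim \tag{7}$$

In Equation (7), $r_1$ is a random value in $[0, 1]$. $LB_j$ and $UB_j$ denote the lower bound and upper bound at dimension $j$, respectively. $Dim$ is the dimension of the test problem. Similar to other metaheuristic techniques, AO has two phases, namely exploration and exploitation, to update the current individuals. The exploration phase is applied when $t \le \frac{2}{3} T$, and it has two methods; the first one is formulated using Equation (8):

$$X_1(t+1) = X_{best}(t) \times \left(1 - \frac{t}{T}\right) + \left(X_M(t) - X_{best}(t) \times rand\right) \tag{8}$$

In Equation (8), $T$ is the total number of iterations. $X_{best}(t)$ is the best individual obtained so far at the current iteration $t$, while the factor $\left(1 - \frac{t}{T}\right)$ is applied to manage the search during the exploration phase. In addition, $X_M(t)$ is the mean of the individuals, taken in each dimension, and it is computed as:

$$X_M(t) = \frac{1}{N} \sum_{i=1}^{N} X_i(t) \tag{9}$$

In the second exploration method, the AO uses the Levy flight distribution to update the current individual, as formulated in the following equation:

$$X_2(t+1) = X_{best}(t) \times Levy(D) + X_R(t) + (y - x) \times rand \tag{10}$$

where $X_R$ denotes a randomly chosen individual. $Levy(D)$ refers to the Levy flight distribution, defined as:

$$Levy(D) = s \times \frac{u \times \sigma}{|\upsilon|^{1/\beta}}, \qquad \sigma = \frac{\Gamma(1 + \beta) \times \sin\left(\frac{\pi \beta}{2}\right)}{\Gamma\left(\frac{1 + \beta}{2}\right) \times \beta \times 2^{\frac{\beta - 1}{2}}} \tag{11}$$

In Equation (11), $s = 0.01$ and $\beta = 1.5$ are constant values, and $u$ and $\upsilon$ refer to random numbers generated from $[0, 1]$. In Equation (10), $y$ and $x$ are used to simulate the spiral shape as:

$$y = r \times \cos(\theta), \qquad x = r \times \sin(\theta) \tag{12}$$

$$r = r_1 + U \times D_1, \qquad \theta = -\omega \times D_1 + \theta_1, \qquad \theta_1 = \frac{3\pi}{2} \tag{13}$$

where $r_1 \in [0, 20]$ is a random value, $\omega = 0.005$ and $U = 0.00565$ denote small values, and, following [34], $D_1$ denotes the integers from 1 to $Dim$. Following [34], two methods are used to simulate the exploitation ability of the individuals during the search process. The first method depends on the best solution ($X_{best}$) and the average of the individuals' locations ($X_M$), and it is formulated as:

$$X_3(t+1) = \left(X_{best}(t) - X_M(t)\right) \times \alpha - rand + \left((UB - LB) \times rand + LB\right) \times \delta \tag{14}$$

In Equation (14), $rand \in [0, 1]$ denotes a random number, and $\alpha$ and $\delta$ denote the exploitation adjustment parameters.
The second exploitation method depends on $X_{best}$, $Levy$, and the quality function $QF$:

$$X_4(t+1) = QF \times X_{best}(t) - \left(G_1 \times X(t) \times rand\right) - G_2 \times Levy(D) + rand \times G_1 \tag{15}$$

where the main aim of using $QF$ is to balance the search strategies, and it is defined as:

$$QF(t) = t^{\frac{2 \times rand - 1}{(1 - T)^2}}$$

$G_1$ represents different motions applied to track the best solution, and it is defined as:

$$G_1 = 2 \times rand - 1$$

In Equation (15), $G_2$ is decreased from 2 to 0, and it is formulated as:

$$G_2 = 2 \times \left(1 - \frac{t}{T}\right)$$

where $rand$ represents a random value. The full description of the AO algorithm is given in Algorithm 1.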

Algorithm 1 Aquila Optimizer (AO)
1: Set the initial value for the parameters of the AO.
2: Generate initial population X.
3: while (end condition is not met) do
4:    Compute the fitness values for each X i .
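The search process described in this section can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. Several details are assumptions based on common descriptions of AO [34] rather than on the text above: the 50/50 random choice between the two methods inside each phase, the greedy replacement of an individual only when the new position is better, the clipping of positions to the bounds, the angle offset 3π/2 in the spiral, and the default values of `alpha` and `delta`. All function and parameter names are hypothetical.

```python
import math
import random

def levy(dim, beta=1.5, s=0.01):
    """Levy-flight step for each dimension (s and beta as given in the text)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)))
    # u and v are uniform random numbers; 1 - random() keeps v in (0, 1].
    return [s * random.random() * sigma / (1.0 - random.random()) ** (1 / beta)
            for _ in range(dim)]

def aquila_optimizer(fitness, lb, ub, n=20, iters=100,
                     alpha=0.1, delta=0.1, seed=0):
    """Minimize `fitness` over the box [lb, ub]; returns (best, best_fitness)."""
    random.seed(seed)
    dim = len(lb)
    # Random initialization of N individuals inside the bounds.
    pop = [[lb[j] + random.random() * (ub[j] - lb[j]) for j in range(dim)]
           for _ in range(n)]
    fit = [fitness(x) for x in pop]
    b = min(range(n), key=fit.__getitem__)
    best, best_fit = pop[b][:], fit[b]
    for t in range(1, iters + 1):
        mean = [sum(x[j] for x in pop) / n for j in range(dim)]  # X_M(t)
        g1 = 2 * random.random() - 1
        g2 = 2 * (1 - t / iters)
        qf = t ** ((2 * random.random() - 1) / (1 - iters) ** 2)
        for i in range(n):
            lv = levy(dim)
            if t <= (2 / 3) * iters:  # exploration phase
                if random.random() <= 0.5:  # first exploration method
                    new = [best[j] * (1 - t / iters)
                           + (mean[j] - best[j] * random.random())
                           for j in range(dim)]
                else:  # second exploration method (Levy flight and spiral)
                    xr = random.choice(pop)
                    r1 = random.uniform(0, 20)
                    new = []
                    for j in range(dim):
                        d1 = j + 1
                        r = r1 + 0.00565 * d1
                        theta = -0.005 * d1 + 1.5 * math.pi
                        xs, ys = r * math.sin(theta), r * math.cos(theta)
                        new.append(best[j] * lv[j] + xr[j]
                                   + (ys - xs) * random.random())
            else:  # exploitation phase
                if random.random() <= 0.5:  # first exploitation method
                    new = [(best[j] - mean[j]) * alpha - random.random()
                           + ((ub[j] - lb[j]) * random.random() + lb[j]) * delta
                           for j in range(dim)]
                else:  # second exploitation method (quality function QF)
                    new = [qf * best[j] - g1 * pop[i][j] * random.random()
                           - g2 * lv[j] + random.random() * g1
                           for j in range(dim)]
            # Keep the candidate inside the search bounds.
            new = [min(max(v, lb[j]), ub[j]) for j, v in enumerate(new)]
            f_new = fitness(new)
            if f_new < fit[i]:  # greedy replacement
                pop[i], fit[i] = new, f_new
                if f_new < best_fit:
                    best, best_fit = new[:], f_new
    return best, best_fit

# Usage on a simple test function (sphere), for illustration only.
sphere = lambda x: sum(v * v for v in x)
print(aquila_optimizer(sphere, [-5.0, -5.0], [5.0, 5.0], n=10, iters=50))
```

Because replacement is greedy, the best fitness value never increases from one iteration to the next.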

Proposed AO-ANFIS Model
The framework of the developed oil production forecasting model is given in Figure 2. The developed model aims to enhance the ability of the ANFIS network to forecast oil production using the behavior of the AO algorithm. This is achieved by determining the optimal parameters of ANFIS using AO. The developed AO-ANFIS starts by constructing the ANFIS network, followed by splitting the historical oil dataset into training and validation sets, which represent 70% and 30% of the data, respectively. Then, a set of $N$ individuals $X$ is generated, each of which represents a parameter configuration for ANFIS (i.e., we have $N$ configurations). The next step is to use the training part of the dataset to compute the quality of each configuration using the following fitness function (i.e., the root mean square error):

$$RMSE = \sqrt{\frac{1}{N_a} \sum_{i=1}^{N_a} \left(T_i - P_i\right)^2}$$

where $T$ and $P$ refer to the actual and predicted output, respectively, and $N_a$ is the number of training samples. This process is followed by updating the best configuration $X_{best}$, and then updating the other configurations using the operators of AO, as defined in Algorithm 1. Thereafter, the best configuration obtained so far is applied to the validation part, and the quality of the output is computed. In addition to reporting this quality, the proposed model is used for forecasting oil production.
The steps of the developed AO-ANFIS are given in Algorithm 2.
Compute quality of each individual X i .
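The evaluation step of this workflow can be sketched as follows. This is a minimal Python illustration under stated assumptions: the series is synthetic, the split is chronological, and `predict` is a hypothetical linear stand-in for the ANFIS output so that the example stays short. Only the RMSE fitness and the 70%/30% split come from the description above; all names are hypothetical.

```python
import math
import random

def rmse(targets, preds):
    """Root mean square error between actual and predicted outputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds))
                     / len(targets))

def split_series(series, train_ratio=0.7):
    """Chronological split into training and validation parts."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

def make_lagged(series, lags=2):
    """Build (inputs, target) pairs from the previous `lags` values."""
    return [(series[i - lags:i], series[i]) for i in range(lags, len(series))]

def predict(params, inputs):
    """Hypothetical stand-in for the ANFIS output: a linear map of the inputs."""
    *weights, bias = params
    return sum(w * v for w, v in zip(weights, inputs)) + bias

def fitness(params, samples):
    """Fitness of one configuration: RMSE over the given samples."""
    targets = [y for _, y in samples]
    preds = [predict(params, x) for x, _ in samples]
    return rmse(targets, preds)

# Synthetic declining series, for illustration only (not the paper's data).
random.seed(1)
series = [100 * math.exp(-0.02 * k) + random.uniform(-1, 1) for k in range(100)]
train, valid = split_series(series)
train_xy, valid_xy = make_lagged(train), make_lagged(valid)

# N candidate configurations; the smallest training RMSE defines X_best.
candidates = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
x_best = min(candidates, key=lambda c: fitness(c, train_xy))
print("training RMSE:", fitness(x_best, train_xy))
print("validation RMSE:", fitness(x_best, valid_xy))
```

In the full model, the candidates would be updated iteratively by the AO operators instead of being drawn once at random, and `predict` would be replaced by the ANFIS network.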

Masila Oilfield, Yemen
The Masila basin is considered the largest basin in Yemen. It is located in the southern part of Yemen, Hadramout city, with a total area of 1250 km² (see Figure 3). The Masila basin consists of more than 18 oilfields, including the Sunah, N.E. Sunah, Hemiar, N. Camaal, and Tawila oilfields. The Sunah oilfield is the second-largest oilfield in the Masila basin. It is an onshore oilfield located in the N.E. corner of the Masila basin. It is subdivided into three main reservoirs, namely S1, S2, and S3. The S1 reservoir is divided into S1A, S1B, S1C, and S1D. The Qishn formation (Upper Qishn formation) is the main target formation in the S1A reservoir [49,50]. The data were collected from 11/10/1993 to 01/14/2012 and contain 6640 records.

Performance Metrics
In the current study, we employed several performance metrics to evaluate the proposed AO-ANFIS model: the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Coefficient of Determination (R²), Standard deviation (Std), Akaike information criterion (AIC), and Bayesian information criterion (BIC). The definitions of the measures are presented in Table 1, where N represents the size of the testing set. Y and Py denote the target oil production and its predicted value using the model, respectively. Ȳ is the mean of Y. Moreover, k is the number of parameters to be estimated, and L̂ is the maximized value of the likelihood function.
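As an illustration of these measures, the following Python sketch uses their standard textbook definitions, which may differ in detail from the exact forms listed in Table 1. In particular, the Gaussian-error log-likelihood used for AIC and BIC is an assumption made only for this example, and all function names and data values are hypothetical.

```python
import math

def mae(y, p):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def mape(y, p):
    """Mean absolute percentage error (undefined if a target value is zero)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, p)) / len(y)

def r2(y, p):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def gaussian_log_likelihood(y, p):
    """Maximized log-likelihood, assuming i.i.d. Gaussian errors (assumption)."""
    n = len(y)
    sigma2 = sum((a - b) ** 2 for a, b in zip(y, p)) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def aic(log_l, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_l

def bic(log_l, k, n):
    """Bayesian information criterion: k ln(N) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_l

# Hypothetical target and predicted values, for illustration only.
y = [100.0, 110.0, 120.0, 130.0]
p = [102.0, 108.0, 123.0, 129.0]
log_l = gaussian_log_likelihood(y, p)
print(mae(y, p), mape(y, p), r2(y, p))
print(aic(log_l, k=3), bic(log_l, k=3, n=len(y)))
```

Lower values are better for MAE, MAPE, AIC, and BIC, whereas a value of R² closer to 1 indicates a better fit.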

Masila oilfields, Yemen
In this section, we assess the performance of the AO-ANFIS using datasets of Masila oilfields. Additionally, we compared the AO-ANFIS to the traditional ANFIS model and several modified ANFIS models, using several optimization methods, namely, sine cosine algorithm (SCA), grey wolf optimizer (GWO), particle swarm optimization (PSO) algorithm, slime mould algorithm (SMA), and genetic algorithm (GA).
From Table 2, we notice that the AO-ANFIS obtained the best performance in all applied performance measures, namely, RMSE, MAE, R², Std, computation time, AIC, and BIC. For the other models, in terms of RMSE, the GA-ANFIS ranked second, followed by PSO, SMA, GWO, SCA, and the original ANFIS. For MAE, the GA ranked second, followed by PSO, SMA, GWO, SCA, and the original ANFIS. For R², the PSO and GA obtained the second and third ranks, respectively, followed by SMA, GWO, SCA, and the original ANFIS. For Std, the PSO and GA also obtained the second and third ranks, respectively, followed by the original ANFIS, SMA, GWO, and SCA. For AIC and BIC, the PSO ranked second, followed by SMA, GA, GWO, the original ANFIS, and SCA, respectively. Additionally, the prediction results are drawn in Figure 5. As shown in this figure, the proposed AO-ANFIS showed better performance than the other methods.

Results of Tahe Oilfield
The results of the AO-ANFIS and the other compared methods for the Tahe oilfield are presented in this section. As shown in Tables 3 and 4, in terms of RMSE and MAE, the proposed AO-ANFIS outperformed all compared models, with the smallest RMSE and MAE values in nine out of ten wells. The PSO obtained the best RMSE and MAE values in one out of ten wells (well No. 10). Moreover, in terms of R², the proposed AO-ANFIS obtained the best performance in eight out of ten wells, whereas the PSO obtained the best R² value in two out of ten wells (wells No. 9 and 10), as recorded in Table 5. Additionally, Table 6 shows the results of all compared methods in terms of AIC and BIC. The AO obtained the best AIC and BIC results in seven out of ten oil wells (1-6 and 9). The GA obtained the best AIC and BIC results for wells No. 7 and 10, whereas for well No. 8, the best results were obtained by PSO. Furthermore, Figure 6 shows the prediction results of the AO-ANFIS against the compared models. As shown in this figure, AO-ANFIS achieved the values nearest to the target data.

Comparison to Other Forecasting Methods
We also compare the proposed AO-ANFIS model to other well-known time series forecasting methods, namely ARIMA and LSTM. Table 7 presents the comparison results using the oil production datasets of the Almasila oilfield, Yemen. The AO-ANFIS obtained the best results in terms of RMSE, MAE, R², and Std, whereas the ARIMA model obtained the best results in terms of AIC and BIC.

Statistical Tests
In this section, we implement a well-known statistical test, the Friedman test, to further evaluate the AO-ANFIS against the compared models.
The Friedman test results are presented in Table 8 for the Almasila oilfields data and in Table 9 for the Tahe oilfield data. From the results in the tables, we see that the AO-ANFIS obtained the best results in all datasets, in terms of RMSE, except in one oil well (well No. 10), where the PSO-ANFIS recorded the best result. Overall, across all of the evaluation results, the AO-ANFIS method showed high performance and superiority over the other compared models. Thus, AO-ANFIS can be considered an efficient time series forecasting model, which can be further utilized in other time series forecasting applications.
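For readers unfamiliar with the Friedman test, the following Python sketch shows how the mean ranks and the test statistic are commonly computed from a table of error values. It uses the standard chi-square form of the statistic without a tie correction, which is an assumption and may differ from the exact procedure used to produce Tables 8 and 9. The function names and the RMSE values are hypothetical.

```python
def average_ranks(row):
    """Rank values in ascending order (1 = smallest), averaging tied ranks."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def friedman(scores):
    """scores[d][m] is the error of method m on dataset d.

    Returns (mean rank per method, Friedman chi-square statistic).
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    stat = (12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums)
            - 3.0 * n * (k + 1))
    return [s / n for s in rank_sums], stat

# Hypothetical RMSE values (rows: datasets, columns: methods).
scores = [
    [0.10, 0.20, 0.30],
    [0.15, 0.25, 0.35],
    [0.12, 0.22, 0.18],
    [0.11, 0.30, 0.21],
]
mean_ranks, stat = friedman(scores)
print(mean_ranks, stat)
```

A lower mean rank indicates a method that more consistently achieves the smallest error across the datasets.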

Conclusions
In the current study, we proposed an improved ANFIS model, called AO-ANFIS, for oil production time series forecasting. The Aquila Optimizer (AO) is a recently developed metaheuristic optimization algorithm that has shown significant performance in addressing optimization tasks.
In this study, AO is applied to optimize ANFIS parameters to boost its prediction accuracy. The AO-ANFIS is evaluated with different datasets collected from two oilfields, namely, the Tahe oilfields and the Almasila oilfields, in China and Yemen, respectively. We also carried out extensive experimental comparisons to state-of-the-art models, including the traditional ANFIS, in addition to five modified versions of ANFIS using five optimization algorithms, namely PSO, SCA, GA, GWO, and SMA. AO-ANFIS achieved significant results, and it outperformed the mentioned models in terms of RMSE, MAE, and R².
The current AO-ANFIS model could be further developed to achieve more accurate results in future work. For example, applying a mutation strategy could further enhance the search process of the AO algorithm, which will result in improving ANFIS accuracy.