Forecasting Oil Production Flowrate Based on an Improved Backpropagation High-Order Neural Network with Empirical Mode Decomposition

Abstract: Developing a forecasting model for oilfield well production plays a significant role in managing mature oilfields, as it can help to identify production losses earlier. It is very common for mature fields to need more frequent production measurements to detect declining production. This study proposes a machine learning system based on a hybrid empirical mode decomposition backpropagation higher-order neural network (EMD-BP-HONN) for oilfields with less frequent measurements. The individual well characteristics of stationary and non-stationary data create a unique challenge. By utilizing historical well production measurements as time-series features and then decomposing them using empirical mode decomposition, simpler patterns are generated for the model to learn. In this paper, various algorithms were deployed as benchmarks, and the proposed method was then applied to forecast well production. With proper feature engineering, the results show that the proposed method can be an effective way to improve on forecasts obtained by traditional methods.


Introduction
One important activity in the oil industry is to measure well production. Measuring oil production shows how a well performs compared to the simulation result. Moreover, measurement plays a significant role in the phase of declining production: by detecting declining production earlier, petroleum engineers can deliver an appropriate response [1,2]. However, the ideal situation of continuous and periodic measurement is often not viable due to economic and technical challenges [3,4]. A non-continuous, occasional schedule of well production measurement is very common in oilfield operations [5,6].
Commonly, the well production rate is measured using a test separator for some minutes to hours, and a calculation is then applied to represent the well's whole-day production. With more advanced technology, the rate is measured by multiphase flow meters (MPFMs) equipped with several sensors, such as an ultrasonic meter used for gas rate measurement and a capacitance meter that can measure very high water cut, which is barely possible with the conventional method [7,8]. In many oilfields, a production test is not acquired every day for each well because test stations can accommodate only a limited number of wells. Therefore, tasks such as well performance monitoring rely on lagged tests, which leads to late action when something occurs in the well. Hence, petroleum engineers have difficulty detecting declining well production early. For a longer production span, some traditional approaches are commonly used to forecast production, including decline curve analysis (DCA), extrapolation/interpolation and the black oil model [9]. Those forecasting models, however, need to be tuned with proper parameters and sufficient data. The main contributions of this paper are as follows:
1. The introduction of a novel hybrid method incorporating EMD and BP-HONN as the main proposed framework for forecasting short-term oil production.
2. The introduction of a secondary novel hybrid framework utilizing EMD and BP-MLMVN for the same objective.
3. A 25-well dataset from an actual oilfield consisting of stationary and non-stationary series, which is a realistic representation of business challenges. This dataset will be available for future work by other researchers.
4. Experiments showing that the proposed EMD-BP-HONN is significantly better than the other benchmark models.
The remainder of this paper describes the oilfield/reservoir from which the dataset was retrieved, the algorithms used, the selected performance evaluation and, finally, the proposed framework. The results are discussed in the results section alongside a statistical test to evaluate the significant differences among models.

The Reservoir under Study
The experimental data were collected from two sources. The first is previous literature [18], which provides real production data from 5 (five) wells in the Cambay Basin (CB), India. The top depth of the reservoir is between 1380 and 1413 m, with initial reservoir pressure observed at around 144 kg/cm². The reservoir's oil in place was estimated at around 2.47 MMt. This oilfield started production in February 2000 with CB1 (the first well); by September 2009, the cumulative oil produced from the CB1 well was 0.156 MMt. Other wells were drilled in subsequent years. Each well has 63 monthly data points taken from 2004 to 2009.
The descriptive statistics of those datasets are listed in Table 1. In addition to common statistical measurements, the augmented Dickey-Fuller (ADF) test was used to test the stationarity of the data, as per [26,27]. The null hypothesis of a unit root (non-stationarity) cannot be rejected whenever the ADF statistic is above the critical values; hence, all CB wells' production data are categorized as non-stationary. The second source of experimental data comes from an actual oilfield in the Central Sumatra Basin (CS), Indonesia. The field was discovered in 1952 and has been producing since 1971. The field area is around 19,905 acres, and it currently produces from 220 oil producer wells in either single or commingled production. The reservoir consists of multiple productive sands with depths between 4200 and 4400 ft and thicknesses around 0.13-158 ft. Regarding rock properties, the sand has permeability between 10 and 1200 mD and porosity around 10-34%. The oil is considered a light oil type with a specific gravity of 0.8. In this experiment, 25 (twenty-five) wells with both stationary and non-stationary characteristics were selected to evaluate the robustness of the model predictions. Descriptive statistics of those datasets are listed in Table 2. Well datasets with higher ADF statistics belong to the non-stationary category. Another statistical characteristic measured is the Hurst exponent, which indicates the volatility, roughness and smoothness of time-series data [16]. The complete dataset is provided in the Supplementary Material.
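To illustrate how the Hurst exponent separates smooth, trending series from rough, mean-reverting ones, the following is a minimal NumPy sketch of one common estimator (the scaling of lagged differences); the synthetic series, seed and lag range are illustrative choices, not the paper's actual computation.

```python
import numpy as np

def hurst_exponent(ts, max_lag=20):
    """Estimate the Hurst exponent from how the std of lagged differences scales with lag."""
    lags = np.arange(2, max_lag)
    # For a fractional process, std(ts[t+lag] - ts[t]) grows roughly as lag**H.
    tau = [np.std(ts[lag:] - ts[:-lag]) for lag in lags]
    # H is the slope of log(tau) versus log(lag).
    return np.polyfit(np.log(lags), np.log(tau), 1)[0]

rng = np.random.default_rng(42)
white = rng.normal(size=2000)             # uncorrelated noise -> H near 0
walk = np.cumsum(rng.normal(size=2000))   # random walk -> H near 0.5

h_white = hurst_exponent(white)
h_walk = hurst_exponent(walk)
```

A persistent (smooth, trending) production series would give H above 0.5, while a rough, noisy series gives a lower value.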

Higher-Order Neural Networks (HONN)
Most artificial neural networks (ANN) use linear neurons, where a linear connection exists between the input vector and the synaptic-weight vector. Naturally, however, the correlation between input and neuron weight is non-linear. To overcome this, a neural network variant was introduced [18]. The major difference between HONN and a conventional ANN is how the weighted sum of the input vector is calculated. The synaptic operation on the weights and inputs is defined as

v = ∑_{i=0}^{n} w_i x_i (with additional product terms of the inputs for the higher orders),

where x_i is the neuron input of the ith element of X, w_i is the weight of the ith neuron input, x_0 is the bias neuron input, w_0 is the weight of the bias and v is the output of the synaptic operation. Additionally, the somatic operation to calculate the output is described as

y = ϕ(v),

where y is the output of the neural network and ϕ is the activation function. Architecture-wise, as illustrated in Figure 1, HONN has interconnected layers; the input vector is calculated in each correlator (order) and then applied to the weighted sum of those.

Suppose we have a neural network structure as shown in Figure 1. First, we carry out a feed-forward operation with the initial weights and the input, and then calculate the error

E = (1/2)(O_desired − O_out)²,

where O_desired is the desired output and O_out is the neural output (or y). We begin backpropagating the error from the output layer to the hidden layer. Recalling the error formula, the derivative of the error is

dE/dO_out = −(O_desired − O_out).

The derivative of the output-layer node output with respect to the output-layer node input is the derivative of its activation function, and the output-layer node input is calculated by multiplying the weights and the outputs of the high-order hidden-layer nodes.

Then, we continue backpropagating the error from the hidden layer to the input layer. By the chain rule, the derivative of the error with respect to the hidden layer is

dE/dJ_out = (dE/dO_out)(dO_out/dO_in)(dO_in/dJ_out).

We already calculated dE/dO_out and dO_out/dO_in in the previous step. To calculate dO_in/dJ_out, recall the high-order synaptic operation conducted in the feed-forward pass. Likewise, the derivative of the hidden-layer node output with respect to the hidden-layer node input is the derivative of its activation function, and the hidden-layer node input is calculated by multiplying the weights and the outputs of the high-order input-layer nodes.

The error (squared error) is minimized by updating the weight matrix as

W_{k+1} = W_k + ∆W_k,

where the change in the weight matrix, ∆W_k, is proportional to the gradient of the error function,

∆W_k = −η (∂E/∂W_k),

where η > 0 is the learning rate, which affects the performance of the algorithm during the updating process.
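The higher-order synaptic operation and the gradient-descent weight update can be sketched for a single neuron with a quadratic synaptic operation (QSO). This is a minimal illustration in NumPy: the toy target, learning rate and epoch count are made up for the example and are not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def quadratic_features(x):
    """Quadratic synaptic operation: bias, linear terms and all pairwise products."""
    n = len(x)
    quad = [x[i] * x[j] for i in range(n) for j in range(i, n)]
    return np.concatenate(([1.0], x, quad))

# Toy regression target that genuinely needs a second-order term (x0 * x1).
X = rng.uniform(-1, 1, size=(200, 2))
y = np.tanh(0.8 * X[:, 0] * X[:, 1] + 0.3 * X[:, 0])

Phi = np.array([quadratic_features(x) for x in X])  # expanded input vectors
w = rng.normal(scale=0.1, size=Phi.shape[1])        # synaptic weights

eta = 0.05  # learning rate (η > 0)
for epoch in range(800):
    out = np.tanh(Phi @ w)                            # somatic operation y = tanh(v)
    # ΔW = -η ∂E/∂W, backpropagated through the tanh activation
    grad = Phi.T @ ((out - y) * (1 - out ** 2)) / len(y)
    w -= eta * grad

mse = np.mean((np.tanh(Phi @ w) - y) ** 2)
```

A purely linear synaptic operation could not fit the x0·x1 interaction; the quadratic correlator captures it directly, which is the motivation for the higher-order architecture.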

Multi-Layer Neural Network with Multi-Valued Neurons (MLMVN)
MLMVN is a multi-layer neural network consisting of multi-valued neurons (MVN) as its basic neurons, with complex-valued weights; this is the key difference between MLMVN and a classic neural network. This difference makes the learning process in MLMVN simpler and gives it a better capability for generalization.

All neuron inputs and outputs in the network are complex numbers located on the unit circle, and the weights are complex-valued as well. The input/output mapping of a continuous MVN is described by a function of n variables,

f(x_1, . . . , x_n) : O^n → O,

where O is the set of points located on the unit circle, x_i is the neuron input of the ith element of X, w_i is the weight of the ith neuron input, x_0 is the bias neuron input and w_0 is the weight of the bias. The continuous MVN activation function (as in Figure 2) is

P(z) = e^{i·Arg(z)} = z/|z|,

where z = w_0 + w_1 x_1 + · · · + w_n x_n is the weighted sum and Arg(z) is the main value of the argument of the complex number z. Thus, a continuous MVN output is a projection of its complex-valued weighted sum onto the unit circle.

The MVN training is based on the error-correction learning rule (Figure 3) as follows:

W_{r+1} = W_r + (C_r / ((n + 1)|z_r|)) (D − Y) X̄,

where X̄ is the array of complex-conjugated neuron inputs, n is the number of neuron inputs, D is the expected target of the neuron, Y = P(z) is the calculated output of the neuron, r is the number of the epoch step, W_r is the current weighting vector, W_{r+1} is the new weighting vector, C_r is a learning rate and |z_r| is the current absolute value of the weighted sum.

The MLMVN learning algorithm is derivative-free. It is based on the same error-correction learning rule as that of a single MVN. Let MLMVN contain m layers of neurons, and let x_1, . . . , x_n be the network inputs. To obtain the local errors for all neurons, the global error of the network must be shared with these neurons. Therefore, the errors of the mth (output) layer neurons are

δ_{j_m} = (1/t_m) δ*_{j_m},

where δ*_{j_m} is the global error of the j_m-th output neuron, j_m specifies the jth neuron of the mth layer and t_m = N_m + 1. The errors of the hidden layers' neurons are

δ_{j_s} = (1/t_s) ∑_i δ_{i_{s+1}} (w_j^{i_{s+1}})^{−1},

where j_s specifies the jth neuron of the sth layer; t_s = N_{s−1} + 1, s = 2, . . . , m, is the number of all neurons in layer s − 1 plus one, and t_1 = n + 1 (n is the number of network inputs).
After the error backpropagation is completed, the weights of all connecting layers shall be updated using the error-correction learning rule adapted to MLMVN. While the factor 1/|z_r| is used for the hidden neurons, it should not be applied to the output neurons; thus, for the output layer the rule becomes W_{r+1} = W_r + (C_r/(n + 1))(D − Y)X̄.
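The error-correction rule for a single continuous MVN can be sketched as follows. This is a minimal illustration with a synthetic, realizable target: the "teacher" weights `W_true`, the sample count and the learning rate are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def P(z):
    """Continuous MVN activation: project the weighted sum onto the unit circle."""
    return z / np.abs(z)

n = 2
X = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(40, n)))  # inputs on the unit circle
W_true = np.array([0.2 + 0.1j, 1.0 - 0.5j, -0.4 + 0.8j])  # hypothetical teacher weights
D = P(W_true[0] + X @ W_true[1:])                         # realizable unit-circle targets

W = rng.normal(size=n + 1) + 1j * rng.normal(size=n + 1)  # random complex initial weights
C = 1.0                                                   # learning rate C_r
for epoch in range(500):
    for x, d in zip(X, D):
        z = W[0] + W[1:] @ x          # weighted sum
        delta = d - P(z)              # D - Y
        # W_{r+1} = W_r + C/((n+1)|z_r|) (D - Y) X̄, with bias input x_0 = 1
        W[0] += C / ((n + 1) * np.abs(z)) * delta
        W[1:] += C / ((n + 1) * np.abs(z)) * delta * np.conj(x)

err = np.mean(np.abs(P(W[0] + X @ W[1:]) - D))
```

Note that the update is derivative-free: the error is simply distributed along the conjugated inputs, which is what makes MVN learning simpler than gradient-based backpropagation.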

Empirical Mode Decomposition (EMD)
Empirical mode decomposition (EMD), a core part of the Hilbert-Huang transform (HHT), is a method to decompose signals into several simpler signals, called intrinsic mode functions (IMF) [28]. This method is an empirical approach to obtain instantaneous frequency data from a dataset, which preferably has non-stationary and non-linear characteristics [29,30]. Each IMF has to satisfy two conditions: the number of extrema and the number of zero-crossings must be equal or differ by at most one, and the mean of the upper and lower envelopes must be zero. Due to its nature, the number of IMFs obtained cannot be pre-determined; thus, every dataset has its own number of IMFs and a residue (a final monotonic function). The final result of EMD, the aggregation of all its components (and the residue), can be seen as

x(t) = ∑_{i=1}^{n} c_i(t) + r_n(t),

where x(t) is the time-series signal, c_i(t) is the ith IMF and r_n(t) is the residue.
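The sifting step at the heart of EMD can be sketched as follows. This is a deliberately simplified version (linear envelopes via `np.interp` instead of the cubic splines used in real EMD, a fixed number of sifting passes instead of a stopping criterion), applied to a made-up two-tone signal to extract its fastest oscillation as a first IMF.

```python
import numpy as np

def sift_once(x):
    """One simplified sifting pass: subtract the mean of the upper/lower envelopes."""
    t = np.arange(len(x))
    mx = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]  # local maxima
    mn = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]  # local minima
    upper = np.interp(t, mx, x[mx])  # linear envelopes (real EMD uses cubic splines)
    lower = np.interp(t, mn, x[mn])
    return x - (upper + lower) / 2.0

t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 20 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)

imf1 = signal.copy()
for _ in range(5):            # repeat sifting to isolate the fastest oscillation
    imf1 = sift_once(imf1)
residue = signal - imf1       # by construction, imf1 + residue reconstructs the signal

corr = np.corrcoef(imf1, np.sin(2 * np.pi * 20 * t))[0, 1]
```

The extracted `imf1` tracks the fast 20-cycle component, while `residue` carries the slow trend, illustrating why the decomposed components are simpler for a forecasting model to learn.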

Performance Evaluation
In this research, several performance metrics are used: the coefficient of determination (R²), root mean square error (RMSE) and mean absolute percentage error (MAPE), defined as

R² = 1 − ∑_i (y_i − ŷ_i)² / ∑_i (y_i − ȳ)²,
RMSE = √((1/n) ∑_i (y_i − ŷ_i)²),
MAPE = (100%/n) ∑_i |(y_i − ŷ_i)/y_i|,

where y_i is the actual target for the ith component, ŷ_i is the predicted value, ȳ is the mean value and n is the number of data points.
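These three metrics are straightforward to implement; the following sketch evaluates them on a small made-up example (the rates are illustrative, not from the paper's dataset).

```python
import numpy as np

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y, yhat):
    """Root mean square error."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y - yhat) / y))

y = np.array([100.0, 200.0, 400.0])      # actual rates (illustrative)
yhat = np.array([110.0, 190.0, 400.0])   # predicted rates

mape_val = mape(y, yhat)   # 5.0 %
rmse_val = rmse(y, yhat)
r2_val = r2(y, yhat)
```

Note that MAPE is scale-free, which is why it is the primary metric for comparing wells whose production rates differ by orders of magnitude; RMSE and R² remain scale-dependent.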

Framework of the Proposed Model
In this research, a hybrid model combining EMD and HONN with a backpropagation learning method is proposed to forecast the oil flow rate of 30 wells' production data. Another hybrid model of EMD and MLMVN is also introduced as a benchmark. The proposed method includes several steps, as shown in Figure 4. The first step is pre-processing of all datasets, which starts by normalizing all values to the same range. For the CB dataset, the scaler was applied as in the original paper, which calculated the ratio to the maximum production (9500 m³) [15]. For the CS dataset, a min-max scaler of −1 to 1 is applied for HONN and 0 to 1 for MLMVN. The normalized values are then processed with the EMD algorithm, which constructs multiple IMFs (and a residue) for each dataset. Because the proposed learning algorithms are not able to learn from time-series data directly, a lag-feature transformation converts the series to a supervised-learning dataset. For this research, two up to five prior time-series values were selected as inputs (i.e., X_{t−5}, X_{t−4}, X_{t−3}, X_{t−2}, X_{t−1} for a lag feature of five) and the subsequent time-series value as the target (y). The transformed features are then fed into HONN with the backpropagation learning method. For HONN and EMD-HONN, we chose the following hyperparameters: the network architecture is one hidden layer with a tanh-tanh activation function; the initial learning rate is 0.001 and is adaptive to the error of each epoch (if the error decreases, the learning rate is multiplied by 0.7, and by 1.05 if the error increases); the number of backpropagation iterations (epochs) is 600; and the momentum is 0.9. The experiment was repeated with different hidden-neuron configurations from 2 to 10 and with the polynomial synaptic operations of linear (LSO), quadratic (QSO) and cubic (CSO). The best-performing model was selected to compete with the other benchmark models.

For the second hybrid model, EMD-MLMVN, a similar framework was utilized, with BP-HONN replaced by BP-MLMVN. The BP-MLMVN architecture has one hidden layer, with randomized initial weights and 500 training epochs. The experiment was repeated with different hidden-neuron configurations from 2 to 10 to select the best-performing configuration.
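The lag-feature transformation described above can be sketched as a simple sliding window; the toy series below is a stand-in for a normalized production rate series.

```python
import numpy as np

def lag_features(series, n_lags):
    """Turn a 1-D series into a supervised dataset: inputs X_{t-n_lags}..X_{t-1}, target X_t."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])  # the n_lags prior values
        y.append(series[i])             # the subsequent value as the target
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)     # stand-in for a normalized rate series
X, y = lag_features(series, n_lags=5)   # X has shape (5, 5); y is [5, 6, 7, 8, 9]
```

In the proposed framework this transformation is applied to each IMF (and the residue) separately, so each decomposed component gets its own supervised dataset.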
Additionally, three other forecasting methods were utilized for benchmarking. The most basic forecasting method is persistence, or naïve forecasting, which takes the previous value (X_{t−1}) as the forecast for the next step. For more advanced baselines, two time-series forecasting algorithms are used: the autoregressive integrated moving average (ARIMA) and long short-term memory (LSTM). ARIMA is a very popular statistical model for forecasting time-series data. It consists of three components: autoregression (p), differencing (d) and moving average (q). For this experiment, the values of p, d and q were selected from 1 to 4. LSTM, a variant of the recurrent neural network (RNN), can capture non-linear trends in well production forecasting [19]. The hyperparameters for LSTM are the number of hidden units and the activation function. The best of 20, 50 and 200 units with combinations of tanh, ReLU and sigmoid activation functions were selected as the benchmarks for the proposed model.

Result of CB and CS Datasets
The results of the proposed methods and the other benchmark methods for both oilfields are presented in this section. As mentioned in the previous section, the proposed models were run repeatedly with different parameters and configurations, and the best was selected to compete with the benchmark models. An example of the best configuration for several wells can be seen in Table 3, which shows that different lag features and numbers of hidden neurons provide the best MAPE. In summary, as seen in Tables 4-7, the proposed EMD-BP-HONN and EMD-MLMVN outperformed all benchmarked models with the smallest MAPE in 23 out of 30 wells. Interestingly, the ARIMA models have decent results compared to our proposed models. The other measured metrics, RMSE and R², show consistent results, as seen in Tables 5 and 6 for the CB dataset, in which the proposed methods have better forecasting performance than the benchmark models. Additionally, as examples of detailed predictions, the prediction results of the proposed models for CB2 are presented in Figures 5 and 6, and the prediction results for CS18 are presented in Figures 7 and 8.

Another interesting finding, based on the results, is that the hybrid model indeed improves the performance of the base model. As seen in Tables 4 and 7, by implementing EMD in the pre-processing stage, both BP-HONN and BP-MLMVN were improved, on average, by 23% and 34%, respectively.
Regarding comparisons with other studies, one proposed a DLSTM model that outperforms the vanilla HONN model on the Cambay Basin dataset [20]. However, instead of individual well production, the dataset used there is the cumulative production of the five wells in Table 1; the reported MAPE scores are 2.851 and 3.459 for DLSTM and vanilla HONN, respectively. To compare the performance of EMD-BP-HONN and EMD-BP-MLMVN, the same dataset was used, and the result can be seen in Table 8. EMD-BP-HONN continues to show better performance than the methods reported in other papers.

Statistical Tests
In this section, a statistical test was applied to evaluate whether there is a significant difference between the proposed methods and the benchmark models. The Friedman test uses null and alternative hypotheses. The null hypothesis (H0) implies that the mean of each population is equal; thus, there is no significant difference among methods. The alternative hypothesis implies that at least one population mean differs from the rest. If the p-value of the test is less than 0.05, the null hypothesis can be rejected.
Using the Friedman chi-square test (via the SciPy Python library) with MAPE metrics for all datasets, the results are as follows: statistic = 52.828 and p-value = 1.270 × 10⁻⁹. Given this result, the null hypothesis can be rejected. Then, to determine which methods are significantly different, the Nemenyi post hoc test was utilized, and the result is shown in Figure 9. The result shows that EMD-BP-HONN is significantly different from the other benchmark methods, except EMD-BP-MLMVN. EMD-BP-MLMVN significantly differs from persistence and LSTM; however, it is not significantly different from ARIMA and BP-HONN.
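The Friedman statistic that the SciPy call returns can also be computed by hand from the mean ranks; this sketch (on a made-up MAPE matrix, ignoring ties for simplicity) shows the calculation.

```python
import numpy as np

def friedman_statistic(scores):
    """Friedman chi-square for an (n datasets x k methods) score matrix (lower = better)."""
    n, k = scores.shape
    # Rank the methods within each dataset: 1 = best (smallest score). Ties are ignored here.
    ranks = np.array([np.argsort(np.argsort(row)) + 1 for row in scores], dtype=float)
    R = ranks.mean(axis=0)  # mean rank of each method across datasets
    # chi2_F = 12n/(k(k+1)) * sum_j (R_j - (k+1)/2)^2
    return 12.0 * n / (k * (k + 1)) * np.sum((R - (k + 1) / 2.0) ** 2)

# Illustrative MAPE scores: 4 datasets (rows) x 3 methods (columns), method 0 always best.
scores = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.2, 3.3],
    [0.9, 2.1, 2.9],
    [1.2, 2.4, 3.1],
])
stat = friedman_statistic(scores)
```

With k = 3 methods the statistic is compared against a chi-square distribution with k − 1 = 2 degrees of freedom; `scipy.stats.friedmanchisquare` performs the same computation (with tie correction) and also returns the p-value.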



Conclusions
In this study, we introduced the hybrid models EMD-BP-HONN and EMD-BP-MLMVN for oil flow rate forecasting. The decomposition method of EMD was utilized in the pre-processing stage to make the time-series data simpler and thus increase the performance of the forecasting algorithm. The proposed methods were applied to 30 datasets collected from two oilfields, the Cambay Basin, India, and the Central Sumatra Basin, Indonesia. To compare performance, other time-series forecasting methods were tested as well. The proposed methods achieved significant results and outperformed the benchmark models on most datasets. In addition, by applying the decomposition method before the base models, the hybrid models were improved significantly on all datasets.
For future work, the proposed hybrid models, EMD-BP-HONN and EMD-BP-MLMVN, could be improved with a more advanced version of the decomposition method. Parameter selection could also be explored using an optimization algorithm capable of searching for the global optimum of the parameters.