Sensors · Article · Open Access

16 July 2022

Well Performance Classification and Prediction: Deep Learning and Machine Learning Long Term Regression Experiments on Oil, Gas, and Water Production

Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
* Author to whom correspondence should be addressed.
This article belongs to the Section Intelligent Sensors

Abstract

In the oil and gas industry, predicting and classifying oil and gas production for hydrocarbon wells is difficult. Most oil and gas companies use reservoir simulation software to predict future oil and gas production and devise optimum field development plans. However, this process consumes immense resources and is time consuming: each reservoir prediction experiment needs tens or hundreds of simulation runs, each taking several hours or days to finish. In this paper, we attempt to overcome these issues by creating machine learning (ML) and deep learning (DL) models that expedite the process of forecasting oil and gas production. The dataset was provided by the leading oil producer, Saudi Aramco. Our approach reduced the time cost to, in the worst case, a few minutes. Our study covered eight different ML and DL experiments; the most outstanding R2 scores were 0.96 for XGBoost, 0.97 for ANN, and 0.98 for RNN.

1. Introduction

The ability to forecast oil wells’ production prior to drilling is a critical element in oil companies’ decision making. To do so, most oil companies, such as Saudi Aramco, use simulation. However, despite its accuracy, this method is time consuming due to the vast computational power required to perform such a task. What we offer in this paper is an alternative approach that eases this step significantly and dramatically reduces the computational power needed. Figure 1 below compares simulation with our proposed solution to this problem: building an AI model. Simulation processes require significant time and computational resources, whereas our solution uses AI to provide predictions promptly without compromising accuracy. The AI system makes such predictions by applying ML and DL to the provided dataset; once trained, the system can forecast the production of oil wells from a few geological features.
Figure 1. Abstract view of the problem and the proposed solution.
We started our research with data collection. The data were provided by the well-known oil company Saudi Aramco and consist of simulation samples representing multiple oil-well reservoirs with different numbers of wells; in total, data for five different reservoirs were received. The research phases are illustrated in Figure 2. Data preprocessing and the other steps shown in Figure 2 are covered in detail in the upcoming sections.
Figure 2. Flowchart for the research’s experiment, step by step.

3. Methods and Materials

We used eight different methods and compared them to find the best performer. Below, we briefly explain each method and its formulas. We also describe the dataset and the materials that helped us build the AI model.

3.1. Dataset

The five datasets we received from Saudi Aramco are samples of five well reservoirs, each with a different number of surrounding wells, used to predict oil/gas/water production over almost three years. We combined all five datasets into one custom dataset so that the models could learn the effect the number of surrounding wells has on production. The final combined custom dataset had 12 features, 280 dependent variables, and a total of 1968 data points to work with. Each column and its description can be seen in Table 1.
Table 1. Dataset Description.

3.2. Tools

For our experiment we used Anaconda Spyder and the Google Colaboratory online compiler as environments to program in Python. This section describes each library we used and why we used it. Sklearn (scikit-learn) is the main machine learning library; it includes most classification, regression, and clustering techniques, along with dataset splitting and fitting into multiple ML models. TensorFlow and Keras are libraries that facilitate the programming of DL models and allow tuning of parameters such as the number of neurons and hidden layers. NumPy is a library that enables efficient array computation in Python. The Pandas library allows importing and splitting of datasets, and the Matplotlib library provides visual representations of the datasets; it can produce static, interactive, or animated graphs.

3.3. Methods

3.3.1. MLR

Since regression is a way to predict the nature of the relationship between different variables, we use multiple linear regression (MLR) to find the relationships of a dependent variable with numerous independent or predictive factors. In MLR we can predict the dependent variable from two or more variables; as a result, MLR examines the correlation between numerous independent variables and the dependent variable. MLR can also tell us the value of the dependent variable at a specific value of an independent variable, and we can use this information to acquire the dependent variable at any given point []. The general form of multiple regression is shown in Equation (1):
$$Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_n x_n + \varepsilon \tag{1}$$
where Y refers to the dependent variable, x1, x2, x3, …, xn are the independent variables, b1, b2, b3, …, bn are the regression coefficients, a is the constant, and ε is the error. The regression coefficients in this equation stand for the independent contributions made by each independent variable to the forecasting of the dependent variable. Given the independent variables (X), the regression line expresses the most accurate forecast of the dependent variable (Y). There is usually significant variance of the observed points around the fitted regression line, since nature is rarely entirely predictable. The term “residual” refers to the departure of a given point from the regression line. Model fit is measured using R2, also called the coefficient of determination, which is equal to 1 minus the ratio of residual variability.
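As a concrete illustration, the following is a minimal sketch of fitting an MLR model in Python with scikit-learn (one of the tools listed in Section 3.2); the data here are synthetic placeholders, not the study’s dataset:

```python
# Minimal MLR sketch with scikit-learn; X and y are synthetic stand-ins
# for the study's features and a production target.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                        # independent variables
y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) \
    + rng.normal(scale=0.1, size=200)                # known coefficients + noise

mlr = LinearRegression().fit(X, y)
print(mlr.intercept_, mlr.coef_)                     # a (constant) and b1..bn
print("R^2:", mlr.score(X, y))                       # coefficient of determination
```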

3.3.2. PLR

Polynomial linear regression (PLR) is a type of linear regression that can solve the problem when the relation between the variables is non-linear. It can help us determine a curvilinear relationship between the independent and dependent variables. PLR works by fitting the data to the model as a polynomial of the nth degree. We use PLR when plain linear regression cannot capture the pattern in the data and fails to describe the best result. This helps us because the relation in our dataset is not linear []. The polynomial linear regression model is shown in Equation (2):
$$Y = a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_n x_1^n \tag{2}$$
One downside of PLR is that, as the polynomial degree increases, the time cost of the model also increases. A very high-degree PLR model can provide highly accurate results but, in some cases, may cause the model to overfit, while a low-degree model may underfit, as it will not learn or extract all the features in the dataset. We can solve this problem by using the Bayesian information criterion (BIC), an external criterion that can help us determine the best degree for the PLR model.
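A hedged sketch of polynomial regression as a scikit-learn pipeline follows; degree four is used because Section 5.2 reports it as the best-performing degree, and the data are synthetic:

```python
# Polynomial regression sketch: expand features to degree 4, then fit a
# linear model on the expanded features. Data are placeholders.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(300, 1))
y = 1.0 + 0.5 * x[:, 0] - 0.8 * x[:, 0] ** 2 + rng.normal(scale=0.2, size=300)

plr = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
plr.fit(x, y)
print("R^2:", plr.score(x, y))
```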

3.3.3. SVR

SVM is a well-known machine learning method used in classification to predict one output. In our experiment, however, we used SVR, an extension of SVM that handles regression problems, since SVM itself is limited to classification. SVR uses a function f(x) to transform a low-dimensional non-linear dataset into a high-dimensional linear problem in feature space by mapping the data with the function [,]. We still faced a problem: SVM/SVR predict only one output, whereas our experiment required us to predict 35 different outputs for each of oil, gas, and water, covering around three years of predictions. The use of a multi-output regressor is therefore necessary, as it allows the model to act like other multi-target regression methods. After numerous derivation steps, the regression model of SVR can be written as Equation (3):
$$f(x) = \sum_{i=1}^{m} (a_i^* - a_i)\,\delta(x_i, x_j) + b \tag{3}$$
where δ(·,·) refers to a kernel function, which differs depending on the use of the model. In our experiment we used a radial basis function (RBF) kernel.
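The sketch below shows one way to wrap SVR in scikit-learn’s multi-output regressor, as described above; the 35-column target is a synthetic stand-in for the monthly oil/gas/water outputs:

```python
# Multi-output SVR sketch: one RBF-kernel SVR is fitted per output column.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 12))              # 12 input features
Y = rng.normal(size=(150, 35))              # 35 outputs (placeholder values)

X_std = StandardScaler().fit_transform(X)   # SVR benefits from standardization
model = MultiOutputRegressor(SVR(kernel="rbf", gamma="scale"))
model.fit(X_std, Y)
pred = model.predict(X_std)                 # shape (150, 35)
```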

3.3.4. DTR

Considering that our dataset is regression data, we decided to implement decision tree regression (DTR) using Python. Among the reasons for choosing this ML method are how simple it is to implement and validate. Moreover, the computational cost of querying the tree is relatively low [], as shown in Equation (4):
$$O(\log(n_{\text{training samples}})) \tag{4}$$
Moreover, the regression tree is somewhat more complex than the classification tree. Decision tree regression optimally splits the data into sections called leaves by using threshold values to answer the following questions:
  • Does performing the split increase the amount of information we have about our dataset?
  • Does it add some value to the approach we would like to group our data points (information entropy)?
The algorithm stops when it has reached a certain minimal amount of accepted information. This information is then used to build the decision tree from the data produced by each split under the given parameters; lastly, the algorithm takes the average of the terminal leaves’ points (Y) within each split, so when a new point (x1, x2, ...) arrives, the model predicts its result as the average of the Y values, using Equation (5):
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \tag{5}$$
where n is the number of samples. This algorithm improves accuracy by splitting the points and then taking the average of all points in each split, which generally results in a much better prediction for a new element [].
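A minimal DTR sketch follows; max_depth=6 mirrors the depth reported later from the grid search in Section 5.4, while the data are synthetic:

```python
# Decision tree regression sketch: leaves predict the average of their
# training targets, as in Equation (5). Data are placeholders.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

dtr = DecisionTreeRegressor(max_depth=6, random_state=1)
dtr.fit(X, y)
print("R^2:", dtr.score(X, y))
```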

3.3.5. RFR

Random forest regression (RFR) is a regression technique based on ensembles of decision trees; bagging and the random subspace approach are at its core. Bagging is used to generate a variety of decision trees, which are then ensembled to obtain the overall prediction. To train the learner trees, several independent bootstrap samples are constructed from the primary training data D; each bootstrap sample Db is made up of N instances drawn from D. The general equation for the random forest regression prediction is Equation (6):
$$\text{RFR prediction} = \frac{1}{K}\sum_{k=1}^{K} h_k(x) \tag{6}$$
Db covers approximately 2/3 of the distinct instances in D; since instances are drawn with replacement when constructing bootstrap samples, a sample may contain duplicates. For bootstrap samples with input vector x, a total of K independent decision trees are created using the DTR method discussed above. Regression trees are characterized by high variance and low bias. In regression tasks, the random forest prediction is the mean prediction of the K regression trees, hk(x) [].
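The sketch below shows RFR in scikit-learn, averaging K bootstrapped trees as in Equation (6); n_estimators=100 echoes the grid-search value reported in Section 5.5, and the data are synthetic:

```python
# Random forest regression sketch: each tree trains on a bootstrap sample
# and the forest prediction is the mean over the K trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

rfr = RandomForestRegressor(n_estimators=100, criterion="squared_error",
                            random_state=1)
rfr.fit(X, y)
print("R^2:", rfr.score(X, y))
```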

3.3.6. XGBoost

We can define XGBoost simply as a set of decision trees constructed sequentially. In XGBoost, weights are very significant: all of the independent variables are given weights, which are then fed into the decision tree, and which may be used to solve problems including regression and classification. The weights of variables that the tree predicted incorrectly are increased, and these variables are fed into the second decision tree. The individual learners are then combined to form a more powerful and precise model. XGBoost was created with careful consideration of both system optimization and machine learning techniques; the purpose of this model is to push machines to their computational limits in order to create a flexible, portable, and accurate model.
XGBoost is a kind of gradient-boosted decision tree (GBM) optimized for both performance and speed, and it offers many features, such as gradient tree boosting. Because the tree ensemble cannot be improved by standard optimization methods in Euclidean space, the model is instead trained in an incremental (additive) manner. Moreover, regularized learning aids in smoothing the final learned weights, preventing over-fitting; the regularized objective favors models that use simple and predictive functions. Shrinkage, first proposed by Friedman, and column (feature) subsampling are two more techniques used to avoid overfitting in addition to the regularized objective. After each stage of tree boosting, shrinkage scales newly added weights by a factor; like a learning rate in stochastic optimization, it decreases each tree’s influence while allowing future trees to improve the model. XGBoost is well suited when the number of features in the training set is smaller than the number of observations, or when the dataset exclusively contains numeric features. XGBoost works in a similar way to a decision tree in that it creates a specific number of trees depending on the problem, but it does so one by one, with each following tree using the knowledge obtained by the previous tree to enhance it []. To put it another way, any new tree corrects the mistakes made by the prior tree. XGBoost minimizes the following regularized objective, Equation (7): []
$$\mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2 \tag{7}$$
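A hedged XGBoost sketch follows, using the xgboost Python package; max_depth=2, learning_rate=0.4, and gamma=0 echo the values reported later in Section 5.6 (the default tree booster is used here for simplicity), and the data are synthetic:

```python
# XGBoost regression sketch: boosted trees are added one by one, each
# correcting the errors of its predecessors.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

xgb = XGBRegressor(max_depth=2, learning_rate=0.4, gamma=0)
xgb.fit(X, y)
print(xgb.predict(X[:5]))
```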

3.3.7. ANN

An artificial neural network (ANN) is a method that simulates how the biological brain works. It is a deep learning method that consists of three types of layers: an input layer, hidden layers (whose number varies between models), and an output layer. The input layer receives the inputs from the dataset, assigns weight values to them, and passes them to the hidden layers. An activation function is assigned to each neuron and a bias variable is added to the data. The output of each neuron is then sent to the next neuron in the following hidden layer, and each hidden layer can use a different activation function. At the end, all outputs are collected in the output layer, where the final processing predicts where the data belong. For our experiment we used 35 output neurons per experiment, since we are trying to obtain 35 different outputs predicting the production of oil, gas, and water over three years. In essence, an ANN uses a collection of interconnected neurons across multiple layers that receive inputs xi with a weight wij attached to each input, plus a bias value that shifts the activation function by adding a constant with its related weight, as in Equation (8):
$$N_j = \sum_{i=1}^{n} w_{ij} x_i + b_j \tag{8}$$
where Nj represents the combined input arriving at the j-th neuron. The neuron output is computed by the chosen activation function; the output of the j-th neuron can thus be represented as in Equation (9):
$$\text{Output}_j = f(N_j) \tag{9}$$
where f is the chosen activation function, which depends on the type of problem the model is built to answer [].
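A minimal Keras sketch of such a network is shown below; the 12 inputs and 35 outputs match the dataset described in Section 3.1, but the hidden layer sizes are illustrative assumptions, not the paper’s exact architecture:

```python
# ANN sketch: dense layers implementing Equations (8)-(9), with a
# 35-neuron output layer for the production targets. Data are synthetic.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(12,)),
    keras.layers.Dense(64, activation="relu"),   # hidden layer (assumed size)
    keras.layers.Dense(64, activation="relu"),   # hidden layer (assumed size)
    keras.layers.Dense(35),                      # 35 production outputs
])
model.compile(optimizer="adam", loss="mse")

X = np.random.normal(size=(200, 12)).astype("float32")
Y = np.random.normal(size=(200, 35)).astype("float32")
model.fit(X, Y, epochs=5, batch_size=10, verbose=0)  # the paper used 1000 epochs
```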

3.3.8. RNN

An RNN is a type of artificial neural network that processes sequential data to recognize patterns and predict the final outcome. The same calculation takes place for each element of a sequence, and each output is based on the preceding calculations. As part of its internal memory, an RNN can remember the information from the inputs it has received, which helps it gain context and predict the next step. To anticipate the output of a layer, an RNN saves that layer’s output and feeds it back into the input; that is how an RNN works. It is one of the most powerful models for recognizing sequences of words and paragraphs, as well as for predicting time series problems [].

4. AI Model

Figure 3 shows the model that we built. We divided the dataset into training and test portions of 80% and 20%, respectively: the AI model uses 80% of the dataset for training, and during the validation process this split ensures that the model is achieving accurate results. After training and validation finish, we obtain the trained output files and use the remaining 20% of the dataset to verify the results obtained from the model.
Figure 3. AI Model.
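For reference, the 80/20 split described above can be produced as in this short sketch, where X and Y are placeholders for the study’s features and production targets:

```python
# 80/20 train/test split sketch with scikit-learn; shapes mirror the
# dataset description (1968 data points, 12 features, 35 outputs).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1968, 12))
Y = rng.normal(size=(1968, 35))

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=1)
```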

5. Result and Analysis

To compare the results of our eight experiments, we used the R2 score (coefficient of determination) to determine which of the models had the best performance. Later, we applied other testing measures, such as MAE (mean absolute error), MSE (mean squared error), and RMSE (root mean squared error), to the best-performing models. Table 2 shows all the parameters used in the study for the eight models we experimented on. These parameters were optimized by applying a grid search algorithm that compared the models’ results over many different combinations; they produced the R2 scores for each model reported in the following sections.
Table 2. All methods parameters optimized.
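The grid search described above might look like the following hedged sketch, shown here with SVR as the example estimator; the parameter grids in Table 2 are not reproduced, so these values are illustrative only:

```python
# Grid-search sketch: exhaustively compare parameter combinations under
# repeated k-fold cross-validation and keep the best R^2 performer.
import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = X[:, 0] + rng.normal(scale=0.1, size=200)

cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=1)
grid = GridSearchCV(SVR(kernel="rbf"),
                    {"C": [1, 100, 475], "epsilon": [0.01, 0.1]},
                    scoring="r2", cv=cv)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```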

5.1. MLR

For MLR testing we used Sklearn’s LinearRegression to implement the model in Python. The model has a few parameters, such as normalize and fit_intercept; however, those parameters are irrelevant to our model, since the data we received are in an ideal state and normalization does not affect linear regression. Therefore, no further modification to the model or the data was made. The train/test split we performed on the dataset is 80% for training and 20% for testing, and this holds for all methods. The model was used to predict oil production, all productions (oil, water, and gas), and all dependent variables (all productions plus the oil/water/gas ratios). Using k-fold cross-validation with five folds and five repeats, the model produced adequate results for the most part, with R2 of 0.834 for oil production, 0.7684 for gas production, and 0.6666 for water production, but a poor result of −0.02 for all dependent variables, as shown in Figure 4.
Figure 4. MLR’s R2 score results.
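The repeated k-fold evaluation used here (and for the other models) follows the pattern in this sketch; the data are placeholders, so the printed score will not match the reported values:

```python
# Repeated k-fold (5 folds x 5 repeats) R^2 evaluation sketch for MLR.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = X @ rng.normal(size=12) + rng.normal(scale=0.1, size=300)

cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=1)
scores = cross_val_score(LinearRegression(), X, y, scoring="r2", cv=cv)
print("mean R^2:", scores.mean())
```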

5.2. PLR

We tested polynomial regression using the LinearRegression model from the Sklearn library. As with MLR, there was no modification to the model or the data; however, we tested the model using different degrees (from two to eight). As with MLR, three tests were conducted: oil, all productions, and all dependent variables. The model performed better, with an R2 score of 0.966 for oil production, 0.9185 for gas production, and 0.8199 for water production, all using a degree of four in the experiment, as shown in Figure 5.
Figure 5. PLR’s R2 score results.

5.3. SVR

The SVR model experiment produced some good findings, although these results were the lowest among all the models. A repeated k-fold algorithm of five splits and five repeats was used within a grid search algorithm to tune the parameters of SVR. The parameters we found to achieve the highest results are: kernel ‘rbf’, Gamma ‘scale’, C ‘475’, Epsilon ‘0.01’, Max_iter ‘−0.1’, and Tol ‘0.1’. The model achieved an R2 score of 0.9659 for oil production, 0.8129 for gas production, and 0.7543 for water production; an overall R2 score of 0.7401 was found for all dependent variables. All these results were achieved after standardizing the data for the model, as shown in Figure 6.
Figure 6. SVR’s R2 score results.

5.4. DTR

For testing the decision tree regression model, we applied sklearn.tree to implement the model in Python. We also implemented standardization and normalization for the dataset, and we applied the grid search technique to optimize the model parameters and increase the model accuracy as much as possible. We found that these parameters produce the highest results: Criterion ‘absolute_error’, Max_depth ‘6’, Max_features ‘auto’. The decision tree regression model was evaluated with R2 under k-fold cross-validation (five folds, five repeats, and random_state = 1 for standardization) to predict oil production, all productions (oil, water, and gas), and all dependent variables (all productions plus the oil/water/gas ratios). The highest results were for the pure (untransformed) dataset, with R2 of 0.9225 for oil, 0.88 for all productions, and 0.87 for all dependent variables, as shown in Figure 7.
Figure 7. DTR’s R2 score results.

5.5. RFR

For testing the random forest regression model, we applied sklearn.ensemble to implement the model in Python. We also implemented standardization and normalization for the dataset, and we applied the grid search technique to optimize the model parameters and increase the model accuracy as much as possible. We achieved the highest results with: Criterion ‘squared_error’, n_estimators ‘100’, and Max_features ‘auto’. The random forest regression model was evaluated with R2 under k-fold cross-validation (five folds, five repeats, and random_state = 1 for standardization) to predict oil production, all productions (oil, water, and gas), and all dependent variables (all productions plus the oil/water/gas ratios). The highest R2 results were 0.9355 for oil, 0.9247 for gas, and 0.8029 for water production, as shown in Figure 8.
Figure 8. RFR’s R2 score results.

5.6. XGBoost

For the XGBoost training, we used the same procedures as for the previous methods: k-fold cross-validation with five folds and five repeats, grid search, and fitting of the model itself. We used the XGBoost library for ease of implementation, and we also applied standardization and normalization to the data. XGBoost contains many parameters that can affect the result, so after running the grid search on the method several times over some parameters, we took the best parameters and their values: Max_depth ‘2’, Learning Rate ‘0.4’, booster ‘gblinear’, Gamma ‘0’. The model was used to predict oil production, all productions (oil, water, and gas), all dependent variables (all productions plus the oil/water/gas ratios), as well as oil, water, and gas separately. XGBoost achieved an R2 score of 0.9561 for oil, 0.9336 for gas, and 0.8141 for water production, as shown in Figure 9.
Figure 9. XGBoost’s R2 score results.

5.7. ANN

For the ANN model, we began our experiment with a grid search over an abundance of parameters to find the best possible combination. We used a sequential model of three layers within k-fold cross-validation with five splits and three repeats, and we increased the number of epochs to 1000 to ensure better results. The set of parameters is: optimizer ‘Adam’, Activation ‘ReLU’, Init_mode ‘Normal’, Epochs ‘1000’, Batch-size ‘10’, and Learn rate ‘0.3’. This collection of parameters gave the best results in predicting oil production across all three years, with an excellent R2 score of 0.9697, so we applied these parameters to the other tests. Using this model to predict all gas productions across three years yielded a score of 0.9185, with lower results for the water productions at an R2 score of 0.5631. Lastly, we tested how well the model does across all outputs (all productions and ratios) and obtained a good score of 0.8506 compared with the other models, as shown in Figure 10.
Figure 10. ANN’s R2 score results.

5.8. RNN (LSTM)

In the RNN we used LSTM from the Keras library. We started with an LSTM input layer with 104 hidden nodes, then added a hidden layer of 200 nodes, a dropout of 0.2, and a dense output layer of 35 nodes representing each month. Based on our ANN tests, we used Adam as the optimizer and mean_squared_error for the loss. The model was trained with a batch size of 50 for 400 epochs. The model performed well, with an R2 score of 0.9785 for oil prediction, as shown in Figure 11.
Figure 11. RNN’s R2 score results.
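A Keras sketch of this LSTM architecture is given below; the input timestep/feature shape is an assumption, since only the layer sizes are stated above:

```python
# RNN (LSTM) sketch matching the described layers: 104-unit LSTM input
# layer, 200-unit hidden layer, dropout 0.2, and a 35-node dense output.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(1, 12)),              # assumed (timesteps, features)
    keras.layers.LSTM(104, return_sequences=True),  # LSTM input layer
    keras.layers.LSTM(200),                         # hidden layer
    keras.layers.Dropout(0.2),
    keras.layers.Dense(35),                         # one node per output month
])
model.compile(optimizer="adam", loss="mean_squared_error")

X = np.random.normal(size=(100, 1, 12)).astype("float32")
Y = np.random.normal(size=(100, 35)).astype("float32")
model.fit(X, Y, batch_size=50, epochs=5, verbose=0)  # the paper used 400 epochs
```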
Lastly, we applied other evaluation metrics to our ANN model, as its performance was among the best in our experiments. Table 3 shows the results for MAE, MSE, and RMSE.
Table 3. Other evaluation measures on the ANN model.
According to the results of our experiments in Figure 12, the oil predictions perform best, followed by the gas predictions. The water results show a small drop compared to oil and gas, caused by the nature of the dataset, which is more focused on oil and gas.
Figure 12. All models’ R2 score results comparison.

5.9. Experiments Discussion

The main advantage this paper provides is a fast and accurate prediction of a well’s production and characteristics in comparison with the current simulation methods, which take much more time and require more computational power. However, this is limited by the need for many records of wells or simulations of wells with different properties and geological features to accurately represent future wells.
The main limitation of this paper’s experiments is the dataset. The dataset represents wells in Saudi Arabia and, depending on the location, geological features such as the porosity and permeability of rocks differ; this limits the application of this research to wells with similar properties. This limitation, however, can be overcome by acquiring datasets that represent different types of reservoirs and their production records and features. Another issue with the dataset is that the water productions are not as consistent as the gas and oil productions, which led to a significant drop in water production prediction compared to oil and gas production predictions. XGBoost and RNN, however, showed the best water results according to Figure 12. We plan to experiment further with these two models and with similar new methods to overcome the low water-prediction issue.

6. Conclusions

In this paper, we attempted to accelerate the process of predicting oil and gas production using ML and DL methods. The data were passed through a series of transform functions before being fed to the models. Below are the main highlights of our findings:
  • The results we achieved with ANN, XGBoost, and RNN are the highest, with a mean R2 for oil, gas, and water of 0.9627, 0.9012, and 0.926, respectively. We found that ML algorithms performed best with the default dataset while the other algorithms performed better in the custom dataset. Some methods had more significant results if the data were standardized before experimenting, such as SVR with a mean R2 of 0.9014. Other algorithms, however, performed better with a pure dataset such as RFR with a mean R2 of 0.8848. Normalizing the dataset for both the default and the custom datasets did not yield good results and was outperformed by pure and standardized data.
  • After experimenting with the dataset and examining the results for every method selected, it is hard to say that these are the best results we can obtain. There is still plenty of room for improvement to achieve even better results by exploring different methods or a combination of methods. Nevertheless, the results we acquired are satisfactory considering the complexity of the problem.

7. Future Work

The progress and results that we obtained are not final. We aim to improve and test different sets of methods, and even to return to the previous ones and perform further tests to achieve better results. One of the ideas we want to work toward is creating a system that can select an ML or DL model depending on the dataset type, features, and more.

Author Contributions

Conceptualization, N.M.I.; Formal analysis, A.A.A. (Ali A. Alharbi); Methodology, N.M.I.; Project administration, N.M.I.; Software, A.A.A. (Ali A. Alharbi), T.A.A., A.M.A., I.A.A. and A.M.H.; Writing—review & editing, N.M.I., A.A.A. (Ali A. Alharbi), T.A.A., A.M.A., I.A.A., A.M.H., A.S.A., D.A.A., M.K.A. and A.A.A. (Abdullah A. Almuqhim). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We would like to acknowledge Saudi Aramco, and in particular Ali Al-Turki and Osaid Hajjar, for providing us with the required assistance with the dataset.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Abdullayeva, F.; Imamverdiyev, Y. Development of Oil Production Forecasting Method Based on Deep Learning. Optim. Inf. Comput. 2019, 7, 826–839.
  2. al Ajmi, M.D.; Alarifi, S.A.; Mahsoon, A.H. Improving Multiphase Choke Performance Prediction and Well Production Test Validation Using Artificial Intelligence: A New Milestone. In Proceedings of the Society of Petroleum Engineers—SPE Digital Energy Conference and Exhibition 2015, The Woodlands, TX, USA, 3–5 March 2015.
  3. Ghorbani, H.; Wood, D.A.; Choubineh, A.; Tatar, A.; Abarghoyi, P.G.; Madani, M.; Mohamadian, N. Prediction of Oil Flow Rate through an Orifice Flow Meter: Artificial Intelligence Alternatives Compared. Petroleum 2020, 6, 404–414.
  4. Amaechi, U.C.; Ikpeka, P.M.; Xianlin, M.; Ugwu, J.O. Application of Machine Learning Models in Predicting Initial Gas Production Rate from Tight Gas Reservoirs. Rud.-Geol.-Naft. Zb. 2019, 34, 29–40.
  5. Mirzaei-Paiaman, A.; Salavati, S. The Application of Artificial Neural Networks for the Prediction of Oil Production Flow Rate. Energy Sources 2012, 34, 1834–1843.
  6. Han, D.; Kwon, S. Application of Machine Learning Method of Data-driven Deep Learning Model to Predict Well Production Rate in the Shale Gas Reservoirs. Energies 2021, 14, 3629.
  7. Pal, M. On Application of Machine Learning Method for History Matching and Forecasting of Times Series Data from Hydrocarbon Recovery Process Using Water Flooding. Pet. Sci. Technol. 2021, 39, 519–549.
  8. Doan, T.T.; van Vo, M. Using Machine Learning Techniques for Enhancing Production Forecast in North Malay Basin. In Springer Series in Geomechanics and Geoengineering; Springer: Singapore, 2020; pp. 114–121.
  9. Negash, B.M.; Yaw, A.D. Artificial Neural Network Based Production Forecasting for a Hydrocarbon Reservoir under Water Injection. Pet. Explor. Dev. 2020, 47, 383–392.
  10. Guo, Z.; Wang, H.; Kong, X.; Shen, L.; Jia, Y. Machine Learning-Based Production Prediction Model and Its Application in Duvernay Formation. Energies 2021, 14, 5509.
  11. Jabbari, M.; Khushaba, R.; Nazarpour, K.; Han, S.; Zhong, X.; Shao, H.; Pan, S.; Wang, J.; Zhou, W. Prediction on Production of Oil Well with Attention-CNN-LSTM. J. Phys. Conf. Ser. 2021, 2030, 012038.
  12. Xia, L.; Shun, X.; Jiewen, W.; Lan, M. Predicting Oil Production in Single Well Using Recurrent Neural Network. In Proceedings of the 2020 International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering, ICBAIE, Fuzhou, China, 12–14 June 2020; pp. 423–430.
  13. Nguyen-Le, V.; Shin, H. Artificial Neural Network Prediction Models for Montney Shale Gas Production Profile Based on Reservoir and Fracture Network Parameters. Energy 2022, 244, 123150.
  14. Al-Shabandar, R.; Jaddoa, A.; Liatsis, P.; Hussain, A.J. A Deep Gated Recurrent Neural Network for Petroleum Production Forecasting. Mach. Learn. Appl. 2021, 3, 100013.
  15. Al-Wahaibi, T.; Mjalli, F.S. Prediction of Horizontal Oil-Water Flow Pressure Gradient Using Artificial Intelligence Techniques. Chem. Eng. Commun. 2014, 201, 209–224.
  16. Wahid, M.F.; Tafreshi, R.; Khan, Z.; Retnanto, A. Prediction of Pressure Gradient for Oil-Water Flow: A Comprehensive Analysis on the Performance of Machine Learning Algorithms. J. Pet. Sci. Eng. 2021, 208, 109265.
  17. Orrù, P.F.; Zoccheddu, A.; Sassu, L.; Mattia, C.; Cozza, R.; Arena, S. Machine Learning Approach Using MLP and SVM Algorithms for the Fault Prediction of a Centrifugal Pump in the Oil and Gas Industry. Sustainability 2020, 12, 4776.
  18. Alrifaey, M.; Lim, W.H.; Ang, C.K. A Novel Deep Learning Framework Based RNN-SAE for Fault Detection of Electrical Gas Generator. IEEE Access 2021, 9, 21433–21442.
  19. Ahmadi, M.A.; Chen, Z. Machine Learning Models to Predict Bottom Hole Pressure in Multi-Phase Flow in Vertical Oil Production Wells. Can. J. Chem. Eng. 2019, 97, 2928–2940.
  20. Khamehchi, E.; Bemani, A. Prediction of Pressure in Different Two-Phase Flow Conditions: Machine Learning Applications. Measurement 2021, 173, 108665.
  21. Sami, N.A.; Ibrahim, D.S. Forecasting Multiphase Flowing Bottom-Hole Pressure of Vertical Oil Wells Using Three Machine Learning Techniques. Pet. Res. 2021, 6, 417–422.
  22. Song, H.; Du, S.; Wang, R.; Wang, J.; Wang, Y.; Wei, C.; Liu, Q. Potential for Vertical Heterogeneity Prediction in Reservoir Basing on Machine Learning Methods. Geofluids 2020, 2020, 3713525.
  23. Singh, H.; Seol, Y.; Myshakin, E.M. Prediction of Gas Hydrate Saturation Using Machine Learning and Optimal Set of Well-Logs. Comput. Geosci. 2020, 25, 267–283.
  24. Feng, X.; Feng, Q.; Li, S.; Hou, X.; Liu, S. A Deep-Learning-Based Oil-Well-Testing Stage Interpretation Model Integrating Multi-Feature Extraction Methods. Energies 2020, 13, 2042.
  25. Feng, X.; Feng, Q.; Li, S.; Hou, X.; Zhang, M.; Liu, S. Automatic Deep Vector Learning Model Applied for Oil-Well-Testing Feature Mining, Purification and Classification. IEEE Access 2020, 8, 151634–151649.
  26. Ali, A. Data-Driven Based Machine Learning Models for Predicting the Deliverability of Underground Natural Gas Storage in Salt Caverns. Energy 2021, 229, 120648.
  27. Chakraborty, A.; Goswami, D. Prediction of Slope Stability Using Multiple Linear Regression (MLR) and Artificial Neural Network (ANN). Arab. J. Geosci. 2017, 10, 385.
  28. Sinha, P. Multivariate Polynomial Regression in Data Mining: Methodology, Problems and Solutions. Int. J. Sci. Eng. Res. 2013, 4, 962–965.
  29. Wei, W.; Li, X.; Liu, J.; Zhou, Y.; Li, L.; Zhou, J. Performance Evaluation of Hybrid WOA-SVR and HHO-SVR Models with Various Kernels to Predict Factor of Safety for Circular Failure Slope. Appl. Sci. 2021, 11, 1922.
  30. Wang, Z.H.; Liu, Y.M.; Gong, D.Y.; Zhang, D.H. A New Predictive Model for Strip Crown in Hot Rolling by Using the Hybrid AMPSO-SVR-Based Approach. Steel Res. Int. 2018, 89, 1800003.
  31. 1.10. Decision Trees—Scikit-Learn 1.1.1 Documentation. Available online: https://scikit-learn.org/stable/modules/tree.html#tree (accessed on 3 July 2022).
  32. Pekel, E. Estimation of Soil Moisture Using Decision Tree Regression. Theor. Appl. Climatol. 2020, 139, 1111–1119.
  33. Ganesh, N.; Jain, P.; Choudhury, A.; Dutta, P.; Kalita, K.; Barsocchi, P. Random Forest Regression-Based Machine Learning Model for Accurate Estimation of Fluid Flow in Curved Pipes. Processes 2021, 9, 2095.
  34. Pesantez-Narvaez, J.; Guillen, M.; Alcañiz, M. Predicting Motor Insurance Claims Using Telematics Data—XGBoost versus Logistic Regression. Risks 2019, 7, 70.
  35. Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Phys. D Nonlinear Phenom. 2020, 404, 132306.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
