Article

Production Decline Rate Prediction for Offshore High Water-Cut Reservoirs by Integrating Moth–Flame Optimization with Extreme Gradient Boosting Tree

1
State Key Laboratory of Offshore Oil and Gas Exploitation, Beijing 100028, China
2
CNOOC Research Institute Co., Ltd., Beijing 100028, China
3
College of Petroleum Engineering, Yangtze University, Wuhan 430100, China
*
Author to whom correspondence should be addressed.
Processes 2025, 13(7), 2266; https://doi.org/10.3390/pr13072266
Submission received: 4 June 2025 / Revised: 1 July 2025 / Accepted: 15 July 2025 / Published: 16 July 2025
(This article belongs to the Section Energy Systems)

Abstract

The prediction of production decline rate in the development of offshore high water-cut reservoirs predominantly relies on the traditional Arps decline curves. However, the solution process is complex, and the interpretation efficiency is low, making it difficult to meet the demand for rapid prediction of production decline rates. To address this, this paper first identifies the key influencing factors of production decline rate through comprehensive feature engineering. Subsequently, it proposes a novel prediction method for the production decline rate in offshore high water-cut reservoirs by integrating Moth–Flame Optimization with Extreme Gradient Boosting Tree (MFO-XGBoost). This method utilizes seven dynamic and static influencing factors, namely vertical thickness, perforated thickness, shale content, permeability, crude oil viscosity, formation flow coefficient, and well deviation angle, to predict the production decline rate. The forecasting outcomes of the MFO-XGBoost method are then compared with those of standard RF, standard DT, the standalone XGBoost model, and the calculated results from the exponential decline model. Additionally, the forecasting capability of the MFO-XGBoost method is benchmarked against Particle Swarm Optimization–XGBoost (PSO-XGBoost) and Bayesian Optimization–XGBoost methods for predicting the production decline rate in offshore high water-cut reservoirs. The findings from the experiments show that the MFO-XGBoost method can achieve accurate prediction of the production decline rate in offshore high water-cut reservoirs, with a coefficient of determination (R2) reaching 0.9128, thereby providing a basis for strategies to mitigate the production decline rate.

1. Introduction

In the advancement of offshore reservoirs with a high water cut, the production of offshore oilfields typically experiences stages of increasing production, stable production, and declining production [1,2,3,4]. As offshore high water-cut oilfields enter the middle and late stages of development, many mature oilfields face the challenges of accelerated natural decline and increased difficulty in maintaining stable production [5,6,7]. Consequently, studying the production decline patterns and adopting corresponding stable production measures has become crucial. Regarding the research on oilfield production decline patterns, over 20 types of production decline models have been proposed by researchers both domestically and internationally [8,9,10,11]. Analysis reveals that most studies are based on Arps decline curves to determine the connection between the rate of production decline and the time series data [12,13,14,15]. Currently, in the advancement of offshore reservoirs with a high water cut, the prediction of production decline rate using traditional Arps decline curve theory involves a complex solution process and low interpretation efficiency, making it difficult to meet the demand for rapid prediction of production decline rates.
In recent years, data-driven machine learning (ML) techniques have increasingly emerged as a vital tool for the rapid prediction of development indicators in oil and gas field development. These methods can not only process large amounts of dynamic and static data but also provide more accurate predictions of development indicators through complex nonlinear modeling capabilities [16,17]. The swift progress of artificial intelligence (AI) technology is significantly enhancing the development of oil and gas resources like never before [18,19,20,21,22]. Alimohammadi H. et al. [23] employed a multivariate time series approach for predicting production in unconventional resources. Amr S. et al. [24] used ML models to forecast the production of horizontal wells. Bhattacharyya S. et al. [25] used ML models to predict the decline in oil cut of Bakken shale wells. Chaikine I. et al. [26] used ML models to forecast the production of wells under multi-stage hydraulic fracturing conditions, and Chaikine I. et al. [27] also employed ML methods to forecast the output of multi-stage horizontal wells. Chen X. et al. [28] used the LSTM method to predict the productivity of horizontally drilled shale gas wells after volumetric fracturing. Gao Q. et al. [29] applied artificial intelligence technology to predict the production of unconventional natural gas. Artificial intelligence technology is full of unlimited vitality in the oil and gas field [30]. However, existing studies still exhibit limitations in three key aspects: noise robustness, generalization capability under small-sample conditions, and feasibility for real-time deployment.
Therefore, addressing the issue that traditional production decline rate prediction methods in the oil and gas field can no longer meet the demand for rapid prediction, this paper develops a lightweight MFO-XGBoost framework to reduce on-site deployment costs and employs the Extreme Gradient Boosting (XGBoost) algorithm to predict the production decline rate of offshore high water-cut reservoirs. Furthermore, it utilizes the Moth–Flame Optimization (MFO) algorithm to automatically tune the hyperparameters of XGBoost, proposing an MFO-XGBoost method for predicting the production decline rate in offshore high water-cut reservoirs. By identifying the key influencing factors of production decline rate through comprehensive feature engineering, this method uses seven dynamic and static influencing factors, namely vertical thickness, perforated thickness, shale content, permeability, crude oil viscosity, formation flow coefficient, and well deviation angle, to predict the production decline rate in offshore high water-cut reservoirs. The prediction results of the MFO-XGBoost method are then compared with those of standard RF, standard DT, the standalone XGBoost model, and the calculated results from the exponential decline model. Furthermore, the prediction performance of the MFO-XGBoost method is benchmarked against PSO-XGBoost and Bayesian-XGBoost models for predicting the production decline rate in offshore high water-cut reservoirs. The results demonstrate that the MFO-XGBoost method can achieve accurate prediction of the production decline rate in offshore high water-cut reservoirs, with a coefficient of determination (R2) reaching 0.9128, thus providing a basis for strategies to mitigate the production decline rate.

2. Principle of the MFO-XGBoost Method

2.1. XGBoost Algorithm

The XGBoost [31] algorithm builds upon gradient boosting trees, combining multiple decision trees into a strong learner and effectively addressing the limitations of gradient boosting trees in efficiency and scalability.
Given a dataset $D = \{(x_i, y_i) \mid i = 1, 2, \ldots, m\}$, $x_i \in \mathbb{R}^p$, $y_i \in \mathbb{R}$, consisting of m samples and p features, and given K regression trees $f_k$, $k = 1, 2, \ldots, K$, where F is the space of regression trees, the model can be represented as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \quad (1)$$
The objective function is defined as:
$$\mathrm{Obj} = \sum_{i=1}^{m} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \quad (2)$$
where $\hat{y}_i$ represents the predicted value and $y_i$ represents the actual value.
By incorporating a regularization term $\Omega(f_k)$, the result of the t-th iteration of the model can be expressed as shown in Equation (3):
$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i) \quad (3)$$
Substituting Equation (3) into Equation (2), the objective function at the t-th iteration, denoted as $\mathrm{Obj}^{(t)}$, is obtained as shown in Equation (4):
$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{m} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \sigma \quad (4)$$
Carrying out a second-order Taylor expansion of the objective function while including the regularization term $\Omega(f_t)$, as defined in Equation (2), yields Equation (5):
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{m} \left[ \partial_{\hat{y}_i^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right) f_t(x_i) + \frac{1}{2} \partial_{\hat{y}_i^{(t-1)}}^{2} l\left(y_i, \hat{y}_i^{(t-1)}\right) f_t^2(x_i) \right] + \Omega(f_t) + \sigma, \qquad \Omega(f_t) = \gamma T + \frac{1}{2} \lambda \|\omega\|^2 \quad (5)$$
where T and ω denote the number of leaf nodes and the corresponding vector of leaf weights of the tree, respectively; γ is the tree-complexity penalty coefficient; λ is the leaf-weight penalty coefficient; and σ collects the remaining constant terms.
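To make Equation (5) concrete, the sketch below computes the per-sample gradient and Hessian statistics and the resulting closed-form optimal leaf weight, under the illustrative assumption of a squared-error loss $l(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$ (the paper does not fix a particular loss function):

```python
import numpy as np

def grad_hess_squared_loss(y, y_pred_prev):
    """First and second derivatives of l w.r.t. the previous prediction."""
    g = y_pred_prev - y            # g_i = dl / d(yhat^(t-1))
    h = np.ones_like(y)            # h_i = d^2 l / d(yhat^(t-1))^2 = 1
    return g, h

def optimal_leaf_weight(g_leaf, h_leaf, lam):
    """Closed-form leaf weight minimizing the second-order objective on one leaf:
    w* = -sum(g_i) / (sum(h_i) + lambda)."""
    return -g_leaf.sum() / (h_leaf.sum() + lam)

y = np.array([1.0, 2.0, 3.0])
y_prev = np.array([1.5, 1.5, 1.5])    # predictions after t-1 trees
g, h = grad_hess_squared_loss(y, y_prev)
w_star = optimal_leaf_weight(g, h, lam=1.0)   # = -(-1.5) / (3 + 1) = 0.375
```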

2.2. Moth–Flame Optimization Algorithm

Genetic algorithms (GAs) and Bayesian optimization are widely used for hyperparameter tuning. However, the moth–flame optimization (MFO) algorithm offers the following advantages: (1) in high-dimensional spaces, MFO reduces iteration counts by 30–50% compared with GA [32], and (2) its spiral search mechanism achieves a better exploration–exploitation balance than grid search [33,34]. Therefore, MFO [35] was selected to optimize the hyperparameters of the XGBoost model. The algorithmic steps of Moth–Flame Optimization are as follows:
(1) Initialization: Randomly generate the initial positions of the moth population, calculate the fitness value of each moth, and initialize the flame positions with the moth positions.
(2) Flame update: In each iteration, merge the moth and flame populations, sort them based on their fitness values, and select the top N optimal solutions as the new generation of flames.
(3) Update of the moth position: Assign each moth to a flame and update its position accordingly. Continue this process repeatedly until either the maximum number of iterations is achieved or the best solution for the model’s hyperparameters is identified.
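The three steps above can be sketched as a minimal moth–flame optimizer following Mirjalili's logarithmic-spiral scheme. The population size, iteration budget, spiral constant b, bounds, and the sphere test function are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mfo_minimize(fitness, dim, n_moths=20, max_iter=100, lb=-5.0, ub=5.0, b=1.0):
    """Minimal MFO loop: init -> flame update -> spiral moth update."""
    moths = rng.uniform(lb, ub, (n_moths, dim))          # (1) initialization
    flames = flame_vals = None
    best_pos, best_val = None, float("inf")
    for it in range(max_iter):
        vals = np.array([fitness(m) for m in moths])
        if flames is None:                               # first flames = sorted moths
            order = np.argsort(vals)
            flames, flame_vals = moths[order].copy(), vals[order].copy()
        else:                                            # (2) merge, sort, keep top N
            merged = np.vstack([flames, moths])
            merged_vals = np.concatenate([flame_vals, vals])
            order = np.argsort(merged_vals)[:n_moths]
            flames, flame_vals = merged[order], merged_vals[order]
        if flame_vals[0] < best_val:
            best_val, best_pos = float(flame_vals[0]), flames[0].copy()
        # flame count shrinks linearly so late iterations exploit the best flames
        n_flames = max(1, int(round(n_moths - it * (n_moths - 1) / max_iter)))
        for i in range(n_moths):                         # (3) logarithmic spiral
            flame = flames[min(i, n_flames - 1)]
            dist = np.abs(flame - moths[i])
            t = rng.uniform(-1.0, 1.0, dim)
            moths[i] = dist * np.exp(b * t) * np.cos(2 * np.pi * t) + flame
            moths[i] = np.clip(moths[i], lb, ub)
    return best_pos, best_val

# sanity check on a 3-D sphere function
best_pos, best_val = mfo_minimize(lambda x: float(np.sum(x ** 2)), dim=3)
```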

2.3. Workflow of the MFO-XGBoost Model for Predicting Production Decline Rate in Offshore High Water-Cut Reservoirs

The workflow of the MFO-XGBoost method for forecasting the production decline rate in offshore high water-cut reservoirs is demonstrated in Figure 1. The detailed procedures are outlined below:
(1) Data Preprocessing: Perform data preprocessing on the collected data, including the detection and removal of missing values, outliers, and duplicate entries.
(2) Feature Engineering: Employ feature engineering techniques to reduce the dimensionality of the dataset’s features, enhancing the practicality for application in actual oil and gas fields and alleviating the pressure of data acquisition for various oilfields.
(3) Dataset Partitioning and Initialization: Shuffle the dataset and divide it into a training set (70% of the samples) and a testing set (the remaining 30% of the samples). Select an appropriate fitness function, initialize the moth population, and calculate the fitness value for each moth.
(4) Position Update and Iteration: Update the position information of moths and flames. Decrease the number of flames and update the positions of moths using the corresponding formulas. The iteration process continues until the optimal values for the model hyperparameters are found or the maximum number of iterations is reached.
(5) Model Training and Prediction: Assign the optimal hyperparameters obtained to the XGBoost model and retrain it. Apply the trained XGBoost model to the testing set to obtain the final prediction results of the MFO-XGBoost method.
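Step (3) of the workflow, shuffling the dataset and splitting it 70/30, might be sketched as follows; the toy arrays and the fixed random seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def shuffle_split(X, y, train_frac=0.7):
    """Shuffle the samples, then split into 70% training / 30% testing."""
    idx = rng.permutation(len(X))
    cut = int(round(train_frac * len(X)))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

# toy dataset: 10 samples, 2 features
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)
X_tr, y_tr, X_te, y_te = shuffle_split(X, y)
```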

2.4. Evaluation Metrics

For regression problems, the following metrics are typically employed:
(1) Mean Absolute Error (MAE): The mean absolute error is calculated by averaging the absolute differences between the predicted values and the actual values. It reflects the overall deviation of the predictions, and a lower MAE indicates better performance.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (6)$$
(2) Mean Squared Error (MSE): The Mean Squared Error is calculated by averaging the squares of the differences between the predicted values and the actual values. It amplifies the impact of larger errors, and a lower MSE indicates better performance.
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \quad (7)$$
(3) Root Mean Squared Error (RMSE): The RMSE is derived by taking the square root of the MSE. This process brings the metric back to the same scale as the original data, and a smaller RMSE signifies improved performance.
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \quad (8)$$
(4) Coefficient of Determination (R-squared, R2): The coefficient of determination represents the model’s ability to explain the variance in the target variable. A higher R2 value indicates a better fit, with values closer to 1 being desirable.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \quad (9)$$
In Equations (6)–(9), $y_i$ and $\hat{y}_i$ denote the true and predicted production decline rates of the i-th sample in the offshore high water-cut reservoir dataset, respectively, and n is the total number of samples in that dataset. In Equation (9), $\bar{y}$ denotes the mean of the true values in the dataset.
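Equations (6)–(9) can be computed directly; a small NumPy sketch with made-up values:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 as defined in Equations (6)-(9)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                                    # Equation (6)
    mse = np.mean(err ** 2)                                       # Equation (7)
    rmse = np.sqrt(mse)                                           # Equation (8)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # (9)
    return mae, mse, rmse, r2

mae, mse, rmse, r2 = regression_metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
# mae = 0.15, mse = 0.025, r2 = 0.98
```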

3. Application Example

3.1. Overview of the Studied Oilfield Block

The oilfield block investigated in this study is the P oilfield block, an offshore high water-cut reservoir located in China. This block comprises 442 effective production wells, including 75 horizontal wells and 367 directional wells. The fluids in this oilfield block are characterized by high viscosity, high density, high gum content, and low asphaltene content. The formation in this block exhibits a high initial oil saturation pressure, a small pressure difference between initial and bubble-point pressures, a low dissolved gas–oil ratio, and significant variations in formation crude oil viscosity. The formation water in this block is of the sodium bicarbonate type, with a total formation water salinity ranging from 3000 to 8000 mg/L.
This oilfield block is situated within a normal reservoir temperature and pressure system. However, it exhibits weak edge water energy and insufficient natural energy, necessitating the early implementation of artificial energy supplementation for development. Upon commencement of production, the oilfield block achieved a daily oil production of 0.92 × 10⁴ m³, with an overall water cut of 84.7%, a daily water injection rate of 6.19 × 10⁴ m³, and a cumulative water–oil ratio of 0.89. Detailed parameters for a portion of the offshore P oilfield are presented in Table 1.

3.2. Data Acquisition and Processing

In this study, a sample dataset was constructed based on the P offshore high water-cut oilfield block, which includes 442 effective production wells, comprising 75 horizontal wells and 367 directional wells. Information regarding the rate of production decline and both the dynamic and static factors affecting this decline rate was gathered for these 442 wells. The influencing factors include vertical thickness, perforated thickness, porosity, oil saturation, shale content, permeability, crude oil viscosity, mobility, formation flow coefficient, reservoir heterogeneity coefficient, the coefficient that describes the change in permeability at a boundary between two different reservoir zones, production pressure difference, and well deviation angle. The angle-based well deviation angle was calculated through the following steps:
(1) Obtain vertical thickness (α) data.
(2) Obtain measured depth (β) data.
(3) Calculate the radian-based well deviation angle (ψ) using Equation (10):
$$\psi = \arccos\left(\frac{\alpha}{\beta}\right) \quad (10)$$
(4) Convert to the angle-based (degree) well deviation angle (ψ′) using Equation (11):
$$\psi' = \frac{\psi}{\pi} \times 180 \quad (11)$$
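Equations (10) and (11) combine into a one-line conversion; the sketch below is a direct transcription (the function name is ours):

```python
import math

def well_deviation_angle_deg(vertical_thickness, measured_depth):
    """Radian-based deviation angle via Equation (10), converted to degrees
    via Equation (11)."""
    psi = math.acos(vertical_thickness / measured_depth)  # Equation (10)
    return psi / math.pi * 180.0                          # Equation (11)

# a perfectly vertical well (alpha == beta) deviates 0 degrees;
# alpha = beta / 2 gives 60 degrees
angle_vertical = well_deviation_angle_deg(100.0, 100.0)
angle_slanted = well_deviation_angle_deg(50.0, 100.0)
```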
The collected sample dataset was then subjected to missing value detection. A missing value heatmap was plotted, as shown in Figure 2, to visualize the distribution of missing values within the dataset, where dark blue indicates missing values and yellow indicates no missing values. As evident from Figure 2, the reservoir heterogeneity coefficient, the coefficient that describes the change in permeability at a boundary between two different reservoir zones, and production pressure difference exhibited significant missing data. Consequently, these three features were excluded from the sample dataset. Furthermore, any other sample points containing missing values were also removed. Subsequently, the sample dataset underwent duplicate value and outlier detection, revealing no duplicate or outlier entries. After data cleaning, the statistical summary information of the sample dataset is presented in Table 2. Table 2 describes the mathematical distribution of some features in the sample dataset, including the total count, average, standard deviation, lowest value, 25th percentile, median, 75th percentile, and highest value for every column.
Feature engineering was performed on the cleaned sample dataset of production decline rate in offshore high water-cut reservoirs. Given the limitations of individual feature engineering methods, such as strong reliance on linear assumptions, sensitivity to noise and small samples, and a lack of dynamic robustness verification [36], this paper adopts a comprehensive feature engineering approach that integrates five methods: linear regression, Grey Relational Analysis (GRA), Shapley Additive Explanations (SHAP), Pearson correlation coefficient, and Mean Decrease Impurity (MDI). The workflow of this integrated approach is illustrated in Figure 3. The main steps of this method are as follows: First, the sample dataset of production decline rate in offshore high water-cut reservoirs is input into linear regression, GRA, SHAP, Pearson correlation coefficient, and MDI to perform their respective calculations and obtain the corresponding results. Second, the top 70% of features are selected based on the feature engineering evaluation criteria of each method. Then, a majority voting approach is applied, where features identified by three or more methods are considered the finally selected key influencing factors. Finally, the selected key influencing factors are output as the dominant factors for the production decline rate in the offshore high water-cut reservoir sample dataset.
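The majority-voting step can be sketched as follows; the per-method feature selections shown are hypothetical, purely to illustrate the "three or more methods" rule (the actual selections appear in Figures 4–9):

```python
from collections import Counter

def majority_vote(selections, min_votes=3):
    """Keep features chosen by at least `min_votes` of the five methods."""
    votes = Counter(f for sel in selections for f in sel)
    return sorted(f for f, n in votes.items() if n >= min_votes)

# hypothetical top-70% picks of each method (illustration only)
selections = [
    {"permeability", "shale_content", "crude_oil_viscosity"},   # linear regression
    {"permeability", "crude_oil_viscosity", "well_deviation"},  # GRA
    {"permeability", "shale_content", "well_deviation"},        # SHAP
    {"shale_content", "crude_oil_viscosity", "porosity"},       # Pearson
    {"permeability", "crude_oil_viscosity", "well_deviation"},  # MDI
]
key_factors = majority_vote(selections)
```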
The results of linear regression, GRA, SHAP, Pearson correlation coefficient, and MDI are shown in Figure 4, Figure 5, Figure 6, Figure 7, and Figure 8, respectively. In linear regression, GRA, SHAP, Pearson correlation coefficient, and MDI, a larger fitting coefficient with the production decline rate, a higher grey relational grade, a larger SHAP value, a larger absolute value of the Pearson correlation coefficient, and a higher importance score, respectively, indicate that the feature is more likely to be a key influencing factor. The key influencing factors identified by each method are then subjected to majority voting, and the results are shown in Figure 9. As shown in Figure 9, the key influencing factors for the production decline rate in the offshore high water-cut reservoir sample dataset are vertical thickness, perforated thickness, shale content, permeability, crude oil viscosity, formation flow coefficient, and well deviation angle.

3.3. MFO-XGBoost Model Training and Validation

The preprocessed and feature-engineered sample data of production decline rate in offshore high water-cut reservoirs was subjected to Z-score standardization. The data was then divided randomly into a training set and a testing set in a 70:30 ratio.
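The Z-score step might look like the following sketch; fitting the mean and standard deviation on the training set alone is a common convention that we assume here, as the paper does not specify it:

```python
import numpy as np

def zscore_fit_transform(X_train, X_test):
    """Standardize each feature: subtract the training mean, divide by the
    training standard deviation; the test set reuses the training statistics."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return (X_train - mu) / sigma, (X_test - mu) / sigma

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 20.0]])
Z_train, Z_test = zscore_fit_transform(X_train, X_test)
```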
The MFO algorithm was employed to optimize the hyperparameters of the XGBoost model using the training set. The fitness function was set as the average coefficient of determination (R2) obtained from 10-fold cross-validation of the XGBoost model in each iteration. The population size was set to 50 with 200 iterations. The six hyperparameters optimized were: n_estimators (number of trees), learning_rate (step size for weight updates in each iteration), max_depth (the greatest depth of a tree), gamma (the minimum reduction in the loss function needed to justify an additional split on a leaf node), subsample (the proportion of samples to be randomly selected for each tree), and colsample_bytree (the proportion of features to be randomly selected for each tree). After 200 iterations, the coefficient of determination (R2) of the MFO-XGBoost method on the testing dataset reached a maximum value of 0.9128 and converged at the 180th iteration. The iterative optimization process of the optimal hyperparameters for MFO-XGBoost is shown in Figure 10, and the optimal hyperparameter values for the MFO-XGBoost method are listed in Table 3.
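One way MFO's continuous moth positions can be decoded into the six tuned XGBoost hyperparameters is sketched below; the search ranges are our assumptions, since the paper reports only which hyperparameters were tuned and their optimal values (Table 3):

```python
# illustrative search ranges for the six tuned hyperparameters
BOUNDS = {
    "n_estimators":     (50, 500),
    "learning_rate":    (0.01, 0.3),
    "max_depth":        (3, 12),
    "gamma":            (0.0, 5.0),
    "subsample":        (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
}

def decode(position):
    """Map a moth position (six values in [0, 1], in BOUNDS order) to a
    concrete XGBoost hyperparameter dictionary."""
    params = {}
    for p, (name, (lo, hi)) in zip(position, BOUNDS.items()):
        v = lo + p * (hi - lo)
        # integer-valued hyperparameters are rounded
        params[name] = int(round(v)) if name in ("n_estimators", "max_depth") else v
    return params

params = decode([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])  # midpoint of every range
```

In a full run, each decoded dictionary would be handed to an XGBoost regressor and scored by the 10-fold cross-validated R² fitness described above.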
The MFO-XGBoost method, built using the optimal hyperparameter combination, was used to predict the production decline rate in offshore high water-cut reservoirs. The calculated results from the exponential decline model were taken as the true values of the production decline rate. The results are shown in Figure 11. The results indicate that the MFO-XGBoost method achieved a coefficient of determination (R2) of 0.9542 on the training set and 0.9128 on the testing set. This demonstrates that the MFO-XGBoost method exhibits a good predictive performance for the production decline rate in offshore high water-cut reservoirs.

3.4. Comparison of Prediction Performance of Different Models

To further validate the superiority of the MFO-XGBoost model in predicting the production decline rate of offshore high water-cut reservoirs, the prediction results of the MFO-XGBoost method were compared with those of standard RF, standard DT, and the standalone XGBoost model, using the calculated results from the exponential decline model as the true values of the production decline rate. Additionally, MAE, MSE, RMSE, and R2 were employed as evaluation metrics to quantify the prediction performance of each ML model. The results are shown in Figure 12, Figure 13 and Figure 14 and summarized in Table 4.
As evident from Figure 12, Figure 13 and Figure 14 and Table 4, among the standard RF, standard DT, and XGBoost models, XGBoost exhibited the best prediction performance, which inherently stems from its flexible nonlinear modeling capability and targeted optimization strategies. Compared to the standalone XGBoost model, the hyperparameter optimization component of the MFO-XGBoost method significantly improved the prediction performance.
Based on the same dataset, two other prediction models, Bayesian Optimization–XGBoost (Bayesian-XGBoost) and Particle Swarm Optimization–XGBoost (PSO-XGBoost), were also used to predict the production decline rate in offshore high water-cut reservoirs and were compared with the MFO-XGBoost method. The results are shown in Figure 15 and Figure 16 and summarized in Table 5.
As shown in Figure 15 and Table 5, compared to the MFO-XGBoost method, the Bayesian-XGBoost model showed average prediction performance. In the testing set, there were several wells where the predicted production decline rate differed significantly from the calculated results of the exponential decline model, leading to poorer overall prediction performance. As shown in Figure 16 and Table 5, the PSO-XGBoost model exhibited overfitting, with good prediction performance on the training set but a significant decline in prediction performance on the testing set. Thus, the MFO-XGBoost method demonstrated the best prediction performance for the task of predicting the production decline rate in offshore high water-cut reservoirs. The underlying mechanisms for the superior prediction performance of the MFO-XGBoost method are as follows:
(1) MFO optimizes the critical hyperparameter combinations of XGBoost, making the model better adapted to the specific dataset’s feature distribution and noise level, thereby reducing prediction bias.
(2) By optimizing parameters such as subsample and colsample_bytree, MFO controls the randomness of data sampling, enhancing the model’s robustness. Adjusting max_depth balances the tree complexity, avoiding overfitting to the training data.
(3) The hyperparameter space of XGBoost typically exhibits non-convex and multi-modal characteristics. The population intelligence of MFO can effectively traverse this complex space and find better solutions.

4. Conclusions

To address the challenge of predicting production decline rates in offshore high water-cut oil reservoirs, this study proposes an MFO-XGBoost intelligent prediction framework. The key findings are as follows: (1) Multimodal feature fusion identified, for the first time, seven dominant control parameters: vertical thickness, perforated thickness, shale content, permeability, crude oil viscosity, formation flow coefficient, and well deviation angle. This provides a theoretical basis for targeted regulation of high water-cut reservoirs. (2) Three models—MFO-XGBoost, Bayesian-XGBoost, and PSO-XGBoost—were applied to predict production decline rates using actual data from the offshore P oilfield. The test set coefficients of determination (R2) were 0.9128, 0.873, and 0.8981, respectively, demonstrating the MFO-XGBoost model’s superior performance in predicting decline rates for offshore high water-cut reservoirs.

5. Limitations and Future Work

The current study has certain limitations due to its reliance on a specific dataset. The machine learning prediction model was validated only in the offshore P oilfield, a high water-cut reservoir in China, and has not been tested across diverse geological or operational environments. While this confirms the model’s effectiveness and accuracy under the studied conditions, future work should incorporate datasets from other geological settings to investigate potential performance variability. The authors plan to conduct additional experiments if supplementary data become available, further assessing the model’s generalizability.

Author Contributions

Conceptualization, Z.D. and F.M.; methodology, Z.D., C.L. and L.C.; software, Q.C. and Y.D.; validation, Z.D., C.L. and W.X.; formal analysis, C.L. and Y.D.; investigation, Z.D., L.C. and Q.C.; resources, F.M. and W.X.; data curation, Y.D. and Q.C.; writing—original draft preparation, Z.D. and L.C.; writing—review and editing, C.L., W.X. and F.M.; visualization, L.C. and Y.D.; supervision, F.M.; project administration, F.M.; funding acquisition, F.M. and W.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 52104018, 52104017, and 52274030.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Authors Zupeng Ding, Chuan Lu, Long Chen, Qinwan Chong and Yintao Dong were employed by CNOOC Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

AI: Artificial intelligence
RF: Random forest
DT: Decision tree regression
XGBoost: Extreme gradient boosting tree
MFO: Moth–flame optimization
GA: Genetic algorithm
MFO-XGBoost: Integrating moth–flame optimization with extreme gradient boosting tree
PSO-XGBoost: Integrating particle swarm optimization with extreme gradient boosting tree
Bayesian-XGBoost: Integrating Bayesian optimization with extreme gradient boosting tree
GRA: Grey relational analysis
SHAP: Shapley additive explanations
MDI: Mean decrease impurity
MAE: Mean absolute error
MSE: Mean squared error
RMSE: Root mean squared error
ML: Machine learning

References

  1. Wang, R.; Xue, L.L.; Dang, D.Q.; Gao, W.J. Establishment and application of a general formula for production decline equations. Acta Pet. Sin. 2023, 44, 1693–1705. [Google Scholar]
  2. Men, H.W.; Zhang, J.; Wei, H.J.; Zhao, Y.; Gao, W.J. Establishment and application of a pan-functional mathematical model for oil and gas reservoir production cycles. Xinjiang Pet. Geol. 2023, 44, 365–374. [Google Scholar]
  3. Wang, X.; Cai, B.; Li, S. History and prospect of reservoir stimulation technology for oil and gas reservoirs in China National Petroleum Corporation. Pet. Drill. Tech. 2023, 45, 67–75. [Google Scholar]
  4. Zhao, G.Z.; Li, C.L.; He, X. Variation law of development indicators in the extra-high water-cut stage of continental sandstone oil reservoirs. Pet. Geol. Oilfield Dev. Daqing 2023, 42, 50–58. [Google Scholar]
  5. Yu, Q.T. Production decline law of waterflooded oilfields. Pet. Explor. Dev. 1993, 20, 72–80. [Google Scholar]
  6. Ji, B.Y. Seepage flow theory basis of production decline equation. Acta Pet. Sin. 1995, 16, 86–91. [Google Scholar]
  7. Gao, W.J.; Wang, Z.J. Discrimination theory basis and application of production decline equation. Xinjiang Pet. Geol. 1999, 20, 518–521. [Google Scholar]
  8. Arps, J.J. Analysis of decline curves. Trans. AIME 1945, 160, 228–247. [Google Scholar] [CrossRef]
  9. Gentry, R.W. Decline-curve analysis. J. Pet. Technol. 1972, 24, 38–41. [Google Scholar] [CrossRef]
  10. Yu, Q.T. The making and application of generalized decline curve standard charts. Pet. Explor. Dev. 1990, 17, 84–87. [Google Scholar]
  11. Yu, Q.T. Study on the characteristics of seven decline curves. Xinjiang Pet. Geol. 1994, 15, 49–56. [Google Scholar]
Figure 1. The workflow of the MFO-XGBoost model for predicting the production decline rate in offshore high water-cut reservoirs.
Figure 2. Heatmap of missing values in the sample dataset for production decline rates of high water-cut offshore oil reservoirs.
Figure 3. Workflow diagram of comprehensive feature engineering methodology.
Figure 4. Results of the linear regression method.
Figure 5. Results of the GRA method.
Figure 6. Results of the SHAP method.
Figure 7. Results of the Pearson correlation coefficient method.
Figure 8. Results of the MDI method.
Figure 9. Results of majority voting-based integration of comprehensive feature engineering methods.
Figure 10. Iterative optimization process of optimal hyperparameters for MFO-XGBoost model.
Figure 11. Results of MFO-XGBoost model for predicting production decline rates in offshore high water-cut oil reservoirs.
Figure 12. Results of standard random forest model for predicting production decline rates in offshore high water-cut oil reservoirs.
Figure 13. Results of the standard decision tree regression model for predicting production decline rates in offshore high water-cut oil reservoirs.
Figure 14. Results of standard XGBoost model for predicting production decline rates in offshore high water-cut oil reservoirs.
Figure 15. Performance visualization of Bayesian-optimized XGBoost for predictive modeling.
Figure 16. Performance visualization of PSO-optimized XGBoost for predictive modeling.
Table 1. Partial parameter list of a certain offshore P oilfield.

| Parameter | Value |
|---|---|
| Reservoir Oil Saturation Pressure | 6.890–13.720 MPa |
| Reservoir Oil Viscosity | 9.1–944.0 mPa·s |
| Average Relative Density of Natural Gas Samples | 0.769 |
| Pressure Coefficient | 1.00 |
| Pressure Gradient | 0.977 MPa/100 m |
| Temperature Gradient | 3.0 °C/100 m |
Table 2. Mathematical statistical summary of production decline rate sample dataset for offshore high water-cut oil reservoirs.

| Parameter | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| Vertical Thickness/m | 209 | 65.112 | 23.115 | 6.5 | 50.7 | 64.312 | 83.8 | 135.3 |
| Perforated Interval/m | 209 | 55.824 | 21.337 | 2.4 | 42 | 54.987 | 70.8 | 118.8 |
| Porosity/% | 209 | 26.154 | 1.951 | 20.274 | 25.119 | 26.225 | 27.326 | 30.479 |
| Oil Saturation/% | 209 | 66.516 | 7.054 | 48.874 | 63.042 | 66.758 | 71.302 | 84.1 |
| Shale Content/% | 209 | 11.785 | 2.32 | 6.168 | 10.376 | 11.591 | 12.912 | 19.813 |
| Permeability/mD | 209 | 942.283 | 335.933 | 293.048 | 711.526 | 943.44 | 1103.695 | 2256.5 |
| Crude Oil Viscosity (50 °C)/mPa·s | 209 | 175.938 | 143.768 | 48.21 | 80.98 | 124.8 | 193.4 | 830.8 |
| Mobility/(mD/(mPa·s)) | 209 | 7.942 | 4.641 | 1.056 | 4.426 | 7.158 | 10.189 | 23.278 |
| Reservoir Flow Coefficient/(mD/(mPa·s)) | 209 | 450.779 | 333.069 | 6.372 | 211.434 | 395.233 | 581.682 | 1622.779 |
| Deviation Angle/° | 209 | 46.201 | 16.94 | 0 | 35.211 | 47.027 | 55.087 | 89.01 |
| Production Decline Rate | 209 | 0.1 | 0.06 | 0.026 | 0.06 | 0.081 | 0.123 | 0.304 |
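The summary statistics reported in Table 2 (count, mean, sample standard deviation, minimum, quartiles, maximum) can be reproduced for any feature column with the standard library alone. The sketch below uses a small hypothetical decline-rate sample for illustration, not the study's field data.

```python
import statistics

def describe(values):
    """Summarize a sample the way Table 2 does:
    count, mean, sample std, min, quartiles, max."""
    # method="inclusive" treats the data as the whole population range,
    # matching the usual pandas describe() quartiles
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }

# Hypothetical decline-rate sample (fractions per year), for illustration only
sample = [0.026, 0.06, 0.081, 0.123, 0.304]
summary = describe(sample)
print(summary)
```

Applying `describe` to each of the eleven columns would regenerate the rows of Table 2.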
Table 3. Optimal hyperparameter values for the MFO-XGBoost model.

| Hyperparameter | Value |
|---|---|
| n_estimators | 159 |
| learning_rate | 0.065 |
| max_depth | 3 |
| gamma | 0.001 |
| subsample | 0.527 |
| colsample_bytree | 0.711 |
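To illustrate how MFO can arrive at hyperparameter values like those in Table 3, the following is a minimal sketch of Mirjalili's moth–flame update (a logarithmic spiral around sorted flames, with a linearly shrinking flame count). The toy error function, with its optimum placed near learning_rate ≈ 0.065 and max_depth ≈ 3, is a stand-in for the real cross-validated XGBoost error; it is not the paper's implementation, and an integer parameter such as max_depth would be rounded before training.

```python
import math
import random

def mfo_minimize(fitness, bounds, n_moths=20, max_iter=100, seed=42):
    """Minimal Moth-Flame Optimization sketch. bounds: (low, high) per dimension."""
    rng = random.Random(seed)
    dim = len(bounds)
    clip = lambda v, d: min(max(v, bounds[d][0]), bounds[d][1])
    moths = [[rng.uniform(*bounds[d]) for d in range(dim)] for _ in range(n_moths)]
    best, best_fit = None, float("inf")
    for l in range(max_iter):
        # flames = current population sorted by fitness (simplified bookkeeping)
        scored = sorted(((fitness(m), m[:]) for m in moths), key=lambda s: s[0])
        if scored[0][0] < best_fit:
            best_fit, best = scored[0][0], scored[0][1][:]
        # flame count shrinks linearly from n_moths to 1 over the run
        n_flames = round(n_moths - l * (n_moths - 1) / max_iter)
        flames = [m for _, m in scored]
        a = -1 - l / max_iter   # spiral parameter decreases from -1 toward -2
        b = 1                   # logarithmic spiral shape constant
        for i, m in enumerate(moths):
            j = min(i, n_flames - 1)  # surplus moths fly around the last flame
            for d in range(dim):
                dist = abs(flames[j][d] - m[d])
                t = (a - 1) * rng.random() + 1  # t drawn from [a, 1]
                m[d] = clip(dist * math.exp(b * t) * math.cos(2 * math.pi * t)
                            + flames[j][d], d)
    return best, best_fit

# Toy stand-in for XGBoost cross-validation error over (learning_rate, max_depth)
toy_error = lambda p: (p[0] - 0.065) ** 2 + ((p[1] - 3) / 10) ** 2
best, err = mfo_minimize(toy_error, bounds=[(0.001, 1.0), (1, 10)])
print(best, err)
```

The shrinking flame count is what balances exploration (each moth orbiting its own flame early on) against exploitation (all moths converging on the best flames late in the run).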
Table 4. Comparative analysis of standard ML models vs. MFO-XGBoost.

| Dataset | Metric | DT | RF | XGBoost | MFO-XGBoost |
|---|---|---|---|---|---|
| Training | MAE | 0.0216 | 0.0287 | 0.0109 | 0.0101 |
| Training | MSE | 0.0011 | 0.0014 | 0.0002 | 0.0002 |
| Training | RMSE | 0.0328 | 0.0379 | 0.014 | 0.0134 |
| Training | R2 | 0.7261 | 0.6336 | 0.9503 | 0.9542 |
| Test | MAE | 0.0225 | 0.0161 | 0.0141 | 0.011 |
| Test | MSE | 0.0008 | 0.0004 | 0.0004 | 0.0002 |
| Test | RMSE | 0.0285 | 0.0211 | 0.0191 | 0.0146 |
| Test | R2 | 0.6668 | 0.8174 | 0.8503 | 0.9128 |
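The four scores used to compare the models are the usual regression metrics (MAE, MSE, RMSE, R2). A self-contained sketch, using hypothetical predictions rather than the study's data:

```python
import math

def regression_scores(y_true, y_pred):
    """MAE, MSE, RMSE and R2 for a set of predictions."""
    n = len(y_true)
    errs = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n
    mse = sum(e * e for e in errs) / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errs)            # residual sum of squares
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

# Hypothetical decline rates and predictions, for illustration only
y_true = [0.06, 0.08, 0.10, 0.12, 0.30]
y_pred = [0.07, 0.08, 0.09, 0.13, 0.28]
scores = regression_scores(y_true, y_pred)
print(scores)
```

Note that R2 compares the residual error against a constant-mean baseline, which is why it, unlike MAE and RMSE, is scale-free across the two datasets.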
Table 5. Comparative analysis of Bayesian- and PSO-optimized XGBoost vs. MFO-XGBoost.

| Dataset | Metric | MFO-XGBoost | Bayesian-XGBoost | PSO-XGBoost |
|---|---|---|---|---|
| Training | MAE | 0.0101 | 0.0144 | 0.0055 |
| Training | MSE | 0.0002 | 0.0004 | 0.0001 |
| Training | RMSE | 0.0134 | 0.0187 | 0.0072 |
| Training | R2 | 0.9542 | 0.9105 | 0.9868 |
| Test | MAE | 0.011 | 0.013 | 0.0117 |
| Test | MSE | 0.0002 | 0.0003 | 0.0002 |
| Test | RMSE | 0.0146 | 0.0176 | 0.0158 |
| Test | R2 | 0.9128 | 0.873 | 0.8981 |
Ding, Z.; Lu, C.; Chen, L.; Chong, Q.; Dong, Y.; Xia, W.; Meng, F. Production Decline Rate Prediction for Offshore High Water-Cut Reservoirs by Integrating Moth–Flame Optimization with Extreme Gradient Boosting Tree. Processes 2025, 13, 2266. https://doi.org/10.3390/pr13072266
