A Model Embedded with Development Patterns for Oilfield Production Forecasting
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Dear Authors,
I have some comments:
- Please mark the study area in Figure 1a with a black square, not a red one.
- Line 82: The stretching technique in PSO can also improve the performance of the algorithm: Sweilam, N.H., Gobashy, M.M., and Hashem, T. 2007. Using particle swarm optimization with function stretching (SPSO) for inverting gravity data: A visibility study. Journal of Physics and Mathematics, Science Bull., Cairo University.
- Line 192, Section 2.2.3: PPSO suffers from several drawbacks. How did the authors constrain these deficiencies? They include: (1) high computational cost, being slower for large problems; (2) parameter sensitivity, as it requires tuning and is sensitive to setup; (3) premature convergence, with a risk of local minima if the hierarchy collapses; (4) scalability limitations, with poor performance on high-dimensional problems; (5) no standard structure, which makes results hard to compare and approaches fragmented; (6) a weak theoretical basis: unlike canonical PSO, the theoretical analysis (e.g., convergence proofs, stability) of Pyramid PSO is limited or empirical.
- Line 244, Eq. 11: MSE implicitly assumes that the data noise is normally distributed with constant variance (homoscedastic). If the real noise distribution is not Gaussian or has changing variance, MSE is statistically suboptimal and fits data with heteroscedastic or heavy-tailed noise poorly. How can this be overcome?
Comments for author File: Comments.pdf
Author Response
Dear Authors,
I have some comments:
Response: We are deeply grateful for your favorable comments and the meticulous effort you've dedicated to the review of our manuscript.
Comments 1: Please mark the study area in Figure 1a with a black square, not a red one.
Response 1: Thank you for your suggestion. We have added a black dashed box to Figure 1 to indicate the study area; the corresponding explanation is given in the paragraph at line 154.
Comments 2: Line 82: The stretching technique in PSO can also improve the performance of the algorithm: Sweilam, N.H., Gobashy, M.M., and Hashem, T. 2007. Using particle swarm optimization with function stretching (SPSO) for inverting gravity data: A visibility study. Journal of Physics and Mathematics, Science Bull., Cairo University.
Response 2: Thank you very much for pointing this out. We have incorporated the stretching technique into the discussion of PSO performance improvements. This reference has been added to the introduction at line 91, making the overall narrative of PSO development more complete and rigorous.
Comments 3: Line 192, Section 2.2.3: PPSO suffers from several drawbacks. How did the authors constrain these deficiencies? They include: (1) high computational cost, being slower for large problems; (2) parameter sensitivity, as it requires tuning and is sensitive to setup; (3) premature convergence, with a risk of local minima if the hierarchy collapses; (4) scalability limitations, with poor performance on high-dimensional problems; (5) no standard structure, which makes results hard to compare and approaches fragmented; (6) a weak theoretical basis: unlike canonical PSO, the theoretical analysis (e.g., convergence proofs, stability) of Pyramid PSO is limited or empirical.
Response 3: Thanks for your valuable and insightful comments. To address the six major deficiencies of the traditional PPSO, this paper proposes systematic improvements through the chaotic-map-based adaptive pyramidal particle swarm optimization algorithm (CAPPSO), summarized as follows.
(1) The population is initialized through chaos mapping to enhance particle diversity and reduce the number of iterations, thereby lowering computational costs. Experiments demonstrate that chaos initialization enables the algorithm to converge faster on the CEC2022 test functions, significantly improving the efficiency of solving large-scale problems.
(2) An adaptive learning factor at line 387 (Equation 20) is introduced to dynamically balance exploration and exploitation, reducing reliance on initial parameters.
(3) A particle iteration stagnation value is introduced at line 400. When the global optimum has not been updated for consecutive iterations, the positions of particles are randomly reset (line 404, Equation 23) to avoid getting trapped in local optima. Furthermore, chaos mapping introduces perturbations during the initialization phase and when stagnation occurs, preventing convergence stagnation.
(4) The pyramid structure divides particles into multiple layers, with each layer engaging in independent competition and cooperation, thereby mitigating the impact of parameter coupling in high-dimensional problems.
(5) As described in line 528, the pyramid structure is standardized, employing a fixed "2-4-6-8" hierarchical structure for 10-dimensional problems, providing a standardized benchmark for comparison.
(6) We conducted experiments as extensively as possible to verify convergence and stability. Comprehensive experiments on the CEC2022 test function suite (which includes unimodal, multimodal, hybrid, and composite functions) show that CAPPSO exhibits excellent performance in terms of precision, convergence speed (Figure 9), and stability.
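To make measures (1) to (5) more concrete, the following Python code is a minimal, hypothetical sketch of a pyramid PSO with chaotic initialization and a stagnation reset. It is not the authors' CAPPSO: the logistic-map initializer, the linear time-varying schedules standing in for the adaptive learning factor (Equation 20 is not reproduced in this exchange), the rule that each layer learns from a random member of the layer above, and the chaotic re-seeding of the worse half of the swarm after stall_limit non-improving iterations are all assumptions chosen for illustration. The default layer sizes follow the "2-4-6-8" structure mentioned in point (5).

```python
import random


def cappso_sketch(f, dim, low, high, layers=(2, 4, 6, 8), iters=300,
                  stall_limit=15, seed=0):
    """Minimise f over [low, high]^dim with a simplified pyramid PSO.

    Illustrative assumptions (not taken from the manuscript): logistic-map
    initialisation, linearly time-varying learning factors, and a chaotic
    reset of the worse half of the swarm after stagnation.
    """
    rng = random.Random(seed)
    n = sum(layers)
    chaos = [0.37]  # shared logistic-map state, reused for init and resets

    def chaotic_point():
        point = []
        for _ in range(dim):
            x = 4.0 * chaos[0] * (1.0 - chaos[0])  # logistic map, mu = 4
            if not 0.0 < x < 1.0:  # guard against collapse onto 0 or 1
                x = rng.random()
            chaos[0] = x
            point.append(low + (high - low) * x)
        return point

    pos = [chaotic_point() for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    vmax, stall = 0.2 * (high - low), 0

    for t in range(iters):
        frac = t / max(1, iters - 1)
        w = 0.9 - 0.5 * frac   # inertia weight decreases over time
        c1 = 2.5 - 2.0 * frac  # cognitive factor shrinks...
        c2 = 0.5 + 2.0 * frac  # ...while the social factor grows
        # Rank particles by personal best and slice them into pyramid layers.
        order = sorted(range(n), key=lambda i: pbest_val[i])
        tiers, start = [], 0
        for size in layers:
            tiers.append(order[start:start + size])
            start += size
        for k, tier in enumerate(tiers):
            for i in tier:
                # Top layer follows the global best; lower layers follow a
                # random member of the layer directly above them.
                guide = gbest if k == 0 else pbest[rng.choice(tiers[k - 1])]
                for d in range(dim):
                    v = (w * vel[i][d]
                         + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                         + c2 * rng.random() * (guide[d] - pos[i][d]))
                    vel[i][d] = max(-vmax, min(vmax, v))
                    pos[i][d] = max(low, min(high, pos[i][d] + vel[i][d]))
                val = f(pos[i])
                if val < pbest_val[i]:
                    pbest[i], pbest_val[i] = pos[i][:], val
        g = min(range(n), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val, stall = pbest[g][:], pbest_val[g], 0
        else:
            stall += 1
        if stall >= stall_limit:
            # Stagnation: re-seed positions of the worse half chaotically
            # (personal bests are kept, so the global best never worsens).
            for i in order[n // 2:]:
                pos[i], vel[i] = chaotic_point(), [0.0] * dim
            stall = 0
    return gbest, gbest_val


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    best, value = cappso_sketch(sphere, dim=10, low=-10.0, high=10.0)
    print(f"best sphere value after 300 iterations: {value:.3e}")
```

Because personal bests survive the reset, the global best is monotone non-increasing; the reset only re-injects diversity, which is the role the response assigns to the stagnation value and Equation 23.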
Comments 4: Line 244, Eq. 11: MSE implicitly assumes that the data noise is normally distributed with constant variance (homoscedastic). If the real noise distribution is not Gaussian or has changing variance, MSE is statistically suboptimal and fits data with heteroscedastic or heavy-tailed noise poorly. How can this be overcome?
Response 4: Thank you very much for your comments. It's a great honor to discuss this issue with you. From a data perspective, reservoir heterogeneity leads to complex noise distributions in production data. The drastic variations in permeability fields at fault and fracture locations also cause production data to exhibit multimodal distributions or heavy-tailed characteristics. Addressing or avoiding this issue is not solely confined to the later stages of machine learning modeling but is reflected throughout the entire reservoir development process. From the standpoint of real-world production, early-stage geological exploration endeavors to avoid unstable strata as much as possible, thereby ensuring the stability of well drilling operations. Moreover, during the production lifecycle, water injection development considers the injection-production balance, meaning that the volume of fluid extracted from the formation is roughly equivalent to the volume of water injected into it. This approach maintains formation energy stability and, consequently, ensures stable production. Additionally, in this study, the predictive scenario focuses on the production decline phase. The preceding water-free production period and stable production phase are safeguarded for later production stability through measures such as well pattern optimization, zonal water injection, and profile control and water plugging. Thanks to these production techniques, the production data used for neural network training can maintain a relatively stable noise distribution. During the neural network modeling phase, constructing loss functions based on development patterns is also a method to address this issue. The Arps decline curve analysis method derives reservoir development relationships through statistical induction from extensive production data across diverse geological scenarios, implicitly accounting for the impact of complex noise distributions in reservoir development on production data. 
Loss functions constructed under the guidance of these production patterns can assist neural networks in overcoming the influence of outliers during modeling, thereby ensuring the predictive accuracy of the models.
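The argument above rests on how a DCA term enters the loss. As a hedged illustration only, the Python sketch below assumes the textbook Arps form q(t) = q0 / (1 + η·d0·t)^(1/η), with exponential decline q0·exp(-d0·t) when η = 0, and a simple weighted sum of a data-misfit MSE and a misfit to the Arps curve, using the λdata = 0.7 and λphy = 0.3 weights quoted later for Equation (17). The function names and the exact loss structure are assumptions, not the manuscript's equations.

```python
import math


def arps_rate(t, q0, d0, eta):
    """Arps decline rate at time t (standard textbook form, assumed here).

    eta = 0 gives exponential decline, 0 < eta < 1 hyperbolic, eta = 1 harmonic.
    q0 is the initial rate and d0 the initial decline rate.
    """
    if eta == 0:
        return q0 * math.exp(-d0 * t)
    return q0 / (1.0 + eta * d0 * t) ** (1.0 / eta)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def dca_guided_loss(pred, observed, times, q0, d0, eta,
                    lam_data=0.7, lam_phy=0.3):
    """Hypothetical two-term loss: data misfit plus misfit to the Arps curve.

    lam_data and lam_phy mirror the fixed LSTM-DCA weights quoted in the
    review; the manuscript's actual loss may be structured differently.
    """
    physics = [arps_rate(t, q0, d0, eta) for t in times]
    return lam_data * mse(pred, observed) + lam_phy * mse(pred, physics)
```

Note that in this sketch the physics term is itself an MSE, so it does not make the loss robust to heavy-tailed noise by construction; it only pulls predictions toward a smooth decline trend, which is consistent with the more cautious wording of the Round 2 response.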
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript proposes an innovative machine learning approach for oilfield production forecasting by embedding physical decline curve analysis (DCA) into a long short-term memory (LSTM) network. Additionally, the authors introduce an improved particle swarm optimization algorithm (CAPPSO) to optimize the loss function's parameters. The study is methodologically robust and tackles a pertinent challenge in petroleum engineering by combining data-driven techniques with physical constraints to enhance both interpretability and predictive accuracy.
This work is original and well-implemented, and it could be published following revisions to improve clarity and organization. The hybrid framework represents a meaningful contribution to oilfield production forecasting and offers a transferable approach applicable to similar challenges in other domains.
Several sections, particularly those on LSTM and DCA theory, could be made more concise.
Figures 6 and 8 have low quality.
The manuscript needs a clearer comparison with similar physics-informed machine learning approaches in other engineering domains.
How are practical implementation aspects, such as data availability, real-time computation demands, and generalization to different reservoir types, addressed? What are the limitations?
Not all variables are defined close to where they are used (e.g., in the loss function with DCA types).
Figure captions lack detailed descriptions.
Tables 2 and 4 could benefit from confidence intervals or standard deviations to evaluate variability.
Verify that all references are properly formatted and linked (many DOIs are missing).
Author Response
The manuscript proposes an innovative machine learning approach for oilfield production forecasting by embedding physical decline curve analysis (DCA) into a long short-term memory (LSTM) network. Additionally, the authors introduce an improved particle swarm optimization algorithm (CAPPSO) to optimize the loss function's parameters. The study is methodologically robust and tackles a pertinent challenge in petroleum engineering by combining data-driven techniques with physical constraints to enhance both interpretability and predictive accuracy.
This work is original and well-implemented, and it could be published following revisions to improve clarity and organization. The hybrid framework represents a meaningful contribution to oilfield production forecasting and offers a transferable approach applicable to similar challenges in other domains.
Response: Thank you very much for your positive comments and the effort you put into reviewing our manuscript.
Comments 1: Several sections, particularly those on LSTM and DCA theory, could be made more concise.
Response 1: Thank you very much for your positive comments. We have streamlined the LSTM and DCA sections. To provide a clearer and more unambiguous description of the basics of LSTM, we removed the content related to RNN and directly introduced the core structure of LSTM. In addition, references are clearly marked in prominent locations, so that readers can easily consult them for more detail. Similarly, redundant descriptions were removed from the DCA section, and readers are referred to the cited references instead. Moreover, formulas and variables are now defined where they are used.
Comments 2: Figures 6 and 8 have low quality.
Response 2: Thank you very much for your detailed suggestion. We have improved the quality of the figures, including increasing the number of axis tick marks and enlarging the font, so that readers can understand the content more clearly. In addition, the model structure figure (Figure 5) and all experimental result figures (Figures 7-14) are drawn as vector graphics, which ensures that they do not become distorted when zoomed in.
Comments 3: The manuscript needs a clearer comparison with similar physics-informed machine learning approaches in other engineering domains.
Response 3: Thank you very much for your professional comments. In the newly added Remark 1 at line 487 of the manuscript, we take groundwater flow prediction and geological modeling research as examples to discuss the effectiveness and uniqueness of the method proposed in this work. Compared with similar cases in which machine learning models are constructed from mathematical formulas, the proposed method focuses more on practical usability. As mentioned in the article, all data used for modeling in this work, including the input data and the data used to construct the loss function, are available on site. This feature distinguishes our work from the similar studies we reviewed: in on-site settings, those studies often cannot complete modeling because the required data are unavailable.
Comments 4: How are practical implementation aspects, such as data availability, real-time computation demands, and generalization to different reservoir types, addressed? What are the limitations?
Response 4: Thank you very much for your valuable and insightful advice. We will reply in three parts as follows.
(1) All data used in the proposed model are development data that are readily obtainable on site. First, as stated under Input features at line 297, the input and output features are based on injection and production volumes taken from monthly production reports. The geological static parameters used to construct high-order features are well-point permeability and reservoir thickness, rather than permeability-field data, which are difficult to obtain. Second, the loss function at line 237 is constructed from DCA, which not only embeds development rules to guide machine learning modeling but also avoids requiring pressure data, which are almost impossible to obtain on site. These measures ensure data availability during modeling.
(2) The computational resource requirements of machine learning models arise mainly in the modeling phase. We have added computation times to the experimental section of the manuscript at lines 469 and 592. Together with the description of the modeling platform given earlier, this gives readers a clear picture of the resource requirements. Given the current hardware platform and time consumption, the proposed method has relatively low computational requirements.
(3) To further validate the generalization of the proposed model to different reservoir types, a typical marine sedimentary facies reservoir, Brugge, is introduced for experimental verification in Section 3.4 at line 576. The numerical results show that the proposed model still performs well on this reservoir, indicating good generalization across reservoir types and strong practical applicability.
Comments 5: Not all variables are defined immediately near their usage (e.g., in the loss function with DCA types).
Response 5: Thank you very much for your careful checks. We have moved variable definitions closer to where they are used to improve readability, for example, the definition of Arps DCA at line 244 and of δ at line 222.
Comments 6: Figure captions lack detailed descriptions.
Response 6: Thank you very much for your positive comments. After checking the figure captions of the manuscript, we have revised and enriched the inadequate captions (Figures 2, 3, 5, 7, and 10) to make the manuscript easier to read.
Comments 7: Table 2 and 4 could benefit from confidence intervals or standard deviations to evaluate variability.
Response 7: We sincerely thank you for your professional comments. We have included the standard deviation as a performance evaluation index and calculated it in all experiments to better evaluate the performance of the model. Please refer to Tables 2 and 4, as well as the newly added Table 6.
Comments 8: Verify all references are properly formatted and linked (many DOIs missing).
Response 8: Thank you very much for pointing this out. We have added DOIs or links to all references so that readers can locate them more accurately and conveniently.
Reviewer 3 Report
Comments and Suggestions for Authors
The manuscript under review is devoted to the topical problem of oil field forecasting. The authors propose a new model for processing oil well data that allows a machine learning model to incorporate physical information. The article is of technical and academic interest and may be published after revision. The main observations that will help the authors to improve the presentation of the manuscript are presented below.
- The introduction presents rather complete information about the development of machine learning in the field of oil productivity forecasting. I would recommend that the authors start with a description of the accuracy and efficiency problems of the forecasting methods applied before the introduction of new technologies, and show why developing such technologies is necessary at the present stage.
- The technical side of the process, namely how the memory cells in LSTM are modified, is not clearly described. In my opinion, Figures 2 and 3 need to be explained more fully in the text. In addition, Figure 3 should be placed after it is first mentioned rather than before.
- I would recommend devoting some space in the description to the real advantages of the PPSO algorithm over the well-known earlier PSO algorithm.
- It is obvious that the results of any forecast can be confirmed or refuted only by subsequent developments. The authors could apply the proposed methodology to historical data of a specific oil field and compare the obtained result with real production data. Such a comparison could give an unambiguous assessment of the proposed methodology.
- In the conclusions, the authors point out the advantage of the machine learning model in multi-well productivity prediction. This general conclusion requires more precise justification.
- Technically, the manuscript makes a very good impression. It is obvious that the authors are excellent specialists in their field. The disadvantage of the work, in my opinion, is the lack of description of its scientific component. This is an important point for a paper seeking publication in a scientific journal and I recommend that the authors pay attention to it.
Author Response
The manuscript under review is devoted to the topical problem of oil field forecasting. The authors propose a new model for processing oil well data that allows a machine learning model to incorporate physical information. The article is of technical and academic interest and may be published after revision. The main observations that will help the authors to improve the presentation of the manuscript are presented below.
Response: Thanks for your positive evaluation and the effort you've put into reviewing our manuscript.
Comments 1: The introduction presents rather complete information about the development of machine learning in the field of oil productivity forecasting. I would recommend the authors to start with the description of the problem of accuracy and efficiency of forecasting applied before the introduction of new technologies and show the necessity of their development at the present stage.
Response 1: Thank you very much for your professional suggestions. The accuracy and efficiency of predictions are indeed the core driving forces behind the development of new technologies. Therefore, we have adjusted the narrative logic of the introduction and added a discussion of accuracy and efficiency at lines 20, 43, and 97, thereby connecting the development of the various technologies.
Comments 2: The technical side of the process, how the modification of memory cells in LSTM takes place, is not clearly described. In my opinion, Figures 2 and 3 need to be more fully explained in the text. In addition, the position of FIG. 3 should be placed after it is mentioned rather than before it.
Response 2: Thanks for your positive comments. LSTM is one of the classic models widely used in machine learning and was developed as an improvement on the RNN. In this work, to provide a clearer and more unambiguous description of the basics of LSTM, we removed the content related to RNN and directly introduced the core structure of LSTM in the revised manuscript. The advantage of this modification is that it prevents readers from being influenced by the RNN structure and confusing it with the design logic of LSTM. In addition, references are clearly marked in prominent locations, so that readers can easily consult them for more detail.
Comments 3: I would recommend devoting some space in the description to the real advantages of the PPSO algorithm over the well-known earlier PSO algorithm.
Response 3: We greatly appreciate your valuable suggestion. We have systematically summarized the advantages of the PPSO algorithm at lines 225-231, making clear to readers the progress of PPSO compared with PSO.
Comments 4: It is obvious that the results of any forecast can be confirmed or refuted only by subsequent developments. The authors could apply the proposed methodology to historical data of a specific oil field and compare the obtained result with real production data. Such a comparison could give an unambiguous assessment of the proposed methodology.
Response 4: Thank you very much for your valuable and insightful advice. To further validate the proposed model on a specific reservoir, a typical marine sedimentary facies reservoir, Brugge, is introduced for experimental verification in Section 3.4 at line 576. The numerical results show that the proposed model still performs well on this reservoir, indicating good generalization across reservoir types and strong practical applicability.
Comments 5: In conclusion, the authors point out the advantage of the machine learning model in multi-well productivity prediction. This general conclusion requires more precise justification.
Response 5: We greatly appreciate your enlightening suggestion. To provide more solid support for our conclusions, we have added new experiments to evaluate generalization (Section 3.4) and a new performance evaluation metric, the standard deviation (line 435), to assess variability. In addition, Remark 1 at line 487 has been added to describe the main advantage of the proposed method over similar methods in other engineering fields, namely its high on-site applicability. Together, these additions provide more precise justification for the conclusions drawn in this article.
Comments 6: Technically, the manuscript makes a very good impression. It is obvious that the authors are excellent specialists in their field. The disadvantage of the work, in my opinion, is the lack of description of its scientific component. This is an important point for a paper seeking publication in a scientific journal and I recommend that the authors pay attention to it.
Response 6: We sincerely appreciate the time and effort you have devoted to reviewing our manuscript. We are delighted that the technical aspects of our work have made a positive impression on you, and we are truly honored by your recognition of our expertise in this field. We have examined the entire manuscript and made our best efforts to refine the language and logical descriptions. In our future scientific research endeavors, we will adhere to your suggestions, keep learning continuously, and strive to make our paper better conform to the standards of scientific publications.
Reviewer 4 Report
Comments and Suggestions for Authors
Dear Authors,
I sincerely appreciate the opportunity to review this research. I hope that the scientific feedback I provide will guide the authors in refining their paper to the highest standard, ultimately leading to its publication in your esteemed journal. I have now read through your paper, entitled "A model embedded with development patterns for oilfield production forecasting".
The following is the main comment on this paper:
- The rationale for the specific weights (λdata = 0.7, λphy = 0.3) in Equation (17) is not thoroughly justified. A sensitivity analysis or theoretical grounding (e.g., via Bayesian optimization) would strengthen this choice.
- Clarify how η and d0 are dynamically updated during training. Are they treated as trainable parameters or fixed after initial estimation?
- Specify whether the Egg model dataset or code will be publicly shared. Reproducibility is critical for adoption.
- List all hyperparameters (e.g., LSTM layers, hidden units, PSO swarm size) in a table for clarity.
- Figures 8 and 9: Ensure axis labels and legends are legible in the final publication.
- Table 3: Highlight CAPPSO’s superiority more clearly (e.g., boldface for best results).
Author Response
Dear Authors,
I sincerely appreciate the opportunity to review this research. I hope that the scientific feedback I provide will guide the authors in refining their paper to the highest standard, ultimately leading to its publication in your esteemed journal. I have now read through your paper, entitled "A model embedded with development patterns for oilfield production forecasting".
Response: Thanks a lot for your valuable comments and the effort you put into reviewing our manuscript.
The following is the main comment on this paper:
Comments 1: The rationale for the specific weights (λdata = 0.7, λphy = 0.3) in Equation (17) is not thoroughly justified. A sensitivity analysis or theoretical grounding (e.g., via Bayesian optimization) would strengthen this choice.
Response 1: We greatly appreciate your valuable suggestion. A sensitivity analysis or theoretical foundation would indeed clarify the choice of weight coefficients. In this work, the coefficients are chosen from two perspectives. First, for LSTM-DCA, λdata = 0.7 and λphy = 0.3 are fixed; they were selected based on expert experience and parameter-tuning experience, without a sensitivity analysis. This shortcoming is precisely what motivated us to introduce the particle swarm optimization algorithm. Second, in LSTM-DCA-CAPPSO, CAPPSO searches algorithmically for the optimal combination of λdata, λphy, λe, λb, and λm. As shown at line 567 of the manuscript, the improved experimental results indicate that better weight coefficients have been found.
Comments 2: Clarify how η and d0 are dynamically updated during training. Are they treated as trainable parameters or fixed after initial estimation?
Response 2: Thanks for your positive comments. η and d0 are parameters of the Arps DCA, and their values are determined from reservoir production data. Once the development data of a reservoir are given, the corresponding η and d0 are estimated using the method described in lines 266-280 of the manuscript; this step is part of data processing. After they are determined, they remain fixed throughout neural network training.
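The estimation method of lines 266-280 is not reproduced in this exchange. As a hedged illustration of the "estimate once, then freeze" workflow described above, the following Python sketch fits the textbook Arps curve by linearizing it for each candidate η on a grid (q^(-η) is linear in t for η > 0, and ln q is linear in t for η = 0) and keeping the candidate with the smallest squared error in rate space. The grid search, the function names, and the synthetic data are assumptions, not the authors' procedure.

```python
import math


def arps_rate(t, q0, d0, eta):
    """Arps decline rate (assumed textbook form; eta = 0 means exponential)."""
    if eta == 0:
        return q0 * math.exp(-d0 * t)
    return q0 / (1.0 + eta * d0 * t) ** (1.0 / eta)


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_arps(times, rates, eta_grid):
    """Estimate (q0, d0, eta) once, before training; the result is then frozen.

    For a fixed eta > 0, q**(-eta) = q0**(-eta) * (1 + eta * d0 * t) is linear
    in t; for eta = 0, ln q is linear in t. Each candidate eta is scored by the
    squared error of the reconstructed rates, and the best one is kept.
    """
    best = None
    for eta in eta_grid:
        if eta == 0:
            a, s = linear_fit(times, [math.log(q) for q in rates])
            q0, d0 = math.exp(a), -s
        else:
            a, s = linear_fit(times, [q ** (-eta) for q in rates])
            if a <= 0 or s <= 0:
                continue  # no physically meaningful decline for this eta
            q0, d0 = a ** (-1.0 / eta), s / (a * eta)
        sse = sum((arps_rate(t, q0, d0, eta) - q) ** 2
                  for t, q in zip(times, rates))
        if best is None or sse < best[0]:
            best = (sse, q0, d0, eta)
    if best is None:
        raise ValueError("no candidate eta produced a valid decline fit")
    return best[1:]


if __name__ == "__main__":
    months = list(range(36))
    observed = [arps_rate(t, 120.0, 0.08, 0.5) for t in months]  # synthetic
    q0, d0, eta = fit_arps(months, observed, [k / 10 for k in range(11)])
    print(f"q0={q0:.2f}, d0={d0:.4f}, eta={eta}")
```

In a training pipeline of this kind, the returned tuple would simply be passed as constants to the loss, so no gradient flows into η or d0, matching the response's statement that they are not trainable.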
Comments 3: Specify whether the Egg model dataset or code will be publicly shared. Reproducibility is critical for adoption.
Response 3: Thank you very much for your professional evaluation. The Egg reservoir is a publicly available benchmark model for reservoir numerical simulation and can be downloaded from the following website https://github.com/RMiftakhov/EclGym. By using numerical simulators such as Eclipse (employed in this work) or CMG, and following the parameter settings provided in this manuscript, corresponding production data can be obtained.
Comments 4: List all hyperparameters (e.g., LSTM layers, hidden units, PSO swarm size) in a table for clarity.
Response 4: Thanks a lot for your detailed suggestion. The hyperparameters of LSTM-DCA-CAPPSO are summarized in Table 5 for clarity.
Comments 5: Figures 8 and 9: Ensure axis labels and legends are legible in the final publication.
Response 5: We greatly appreciate your enlightening suggestion. We have increased the number of axis tick marks and enlarged the font so that readers can understand the content more clearly. In addition, the model structure figure (Figure 5) and all experimental result figures (Figures 7-14) are drawn as vector graphics, which ensures that they do not become distorted when zoomed in.
Comments 6: Table 3: Highlight CAPPSO’s superiority more clearly (e.g., boldface for best results).
Response 6: Thanks a lot for your detailed suggestion. We have highlighted the advantages of CAPPSO by setting the best results in bold.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
Dear authors,
Thank you for the efforts made to improve the text.
However, I still have the following comments:
- The comment on Eq. 11 (now Eq. 9), "MSE implicitly assumes that the data noise is normally distributed with constant variance (homoscedastic). If the real noise distribution is not Gaussian or has changing variance, MSE is statistically suboptimal and fits data with heteroscedastic or heavy-tailed noise poorly. How can this be overcome?", has not been answered by the authors. Error analysis of newly proposed numerical schemes is a critical issue in inversion algorithms. Please check the effect of noise of varying amplitude on the estimated model parameters.
- Please consider revising the spelling of the added reference (Sweilam …Gobashy…).
- Please add a location map for the case study.
Author Response
Dear authors,
Thank you for the efforts made to improve the text.
However, I still have the following comments:
Comments 1: The comment on Eq. 11 (now Eq. 9), "MSE implicitly assumes that the data noise is normally distributed with constant variance (homoscedastic). If the real noise distribution is not Gaussian or has changing variance, MSE is statistically suboptimal and fits data with heteroscedastic or heavy-tailed noise poorly. How can this be overcome?", has not been answered by the authors. Error analysis of newly proposed numerical schemes is a critical issue in inversion algorithms. Please check the effect of noise of varying amplitude on the estimated model parameters.
Response 1: Thank you very much for your meaningful and insightful comment. We apologize for not providing a clear and satisfactory answer to your question in the first round of review. To better understand how the proposed method performs under non-Gaussian noise, we have added Section 3.4.2. In this section, production data contaminated with heteroscedastic noise and with t-distributed noise (which is heavy-tailed) are used to train the neural networks. According to the test-set results in Tables 7 and 8, neural networks that use MSE alone as the loss function show significant performance degradation, whereas the proposed method, which incorporates DCA into the loss function, remains comparatively stable. A reasonable explanation is that Arps DCA is derived through statistical induction from a large amount of production data under different geological conditions, implicitly accounting for the impact of complex noise distributions on production data. To our understanding, a loss function constructed under the guidance of this development pattern can help neural networks overcome, to a certain extent, complex noise interference during training. However, as stated in Remark 2, we must acknowledge that the proposed method cannot fully handle data containing non-Gaussian noise, and there is still significant room for structural improvements to machine learning models in future work.
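The reviewer's concern can be illustrated independently of the manuscript. The Python sketch below is not a reproduction of Section 3.4.2, whose noise levels and settings are not given in this exchange; it is a hedged example that adds heteroscedastic Gaussian noise (standard deviation proportional to the rate) and scaled Student-t noise to a hypothetical exponential decline curve, and measures what fraction of the total squared error comes from the largest 5% of residuals. The curve, the noise scales, and the helper names are assumptions. Student-t draws are built from the standard construction Z / sqrt(V / ν), with V drawn from a chi-square distribution via random.gammavariate.

```python
import math
import random


def heteroscedastic(rates, rel_sigma, rng):
    """Add Gaussian noise whose standard deviation is proportional to the rate."""
    return [q + rng.gauss(0.0, rel_sigma * q) for q in rates]


def student_t(nu, rng):
    """Draw from Student's t with nu degrees of freedom: Z / sqrt(V / nu),
    where Z ~ N(0, 1) and V ~ chi-square(nu) = Gamma(shape nu/2, scale 2)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)
    return z / math.sqrt(v / nu)


def heavy_tailed(rates, scale, nu, rng):
    """Add scaled Student-t noise (heavy-tailed for small nu)."""
    return [q + scale * student_t(nu, rng) for q in rates]


def top_share_of_squared_error(clean, noisy, frac=0.05):
    """Fraction of the total squared error contributed by the largest
    `frac` of residuals; a large value means a few outliers dominate MSE."""
    sq = sorted(((a - b) ** 2 for a, b in zip(clean, noisy)), reverse=True)
    k = max(1, int(frac * len(sq)))
    return sum(sq[:k]) / sum(sq)


if __name__ == "__main__":
    rng = random.Random(7)
    clean = [100.0 * math.exp(-0.01 * t) for t in range(300)]  # hypothetical
    gauss = [q + rng.gauss(0.0, 2.0) for q in clean]
    heavy = heavy_tailed(clean, 2.0, 2.0, rng)
    print("Gaussian share:", round(top_share_of_squared_error(clean, gauss), 3))
    print("Student-t (nu=2) share:",
          round(top_share_of_squared_error(clean, heavy), 3))
    for rel_sigma in (0.02, 0.05, 0.10):  # increasing noise amplitude
        noisy = heteroscedastic(clean, rel_sigma, rng)
        print(rel_sigma, round(top_share_of_squared_error(clean, noisy), 3))
```

For Gaussian noise the expected share is close to 0.28, whereas for Student-t noise with ν = 2 it is typically much larger, which is the sense in which MSE is dominated by a few outliers under heavy tails. Looping over rel_sigma (or the t scale) and refitting a model on each noisy series is one way to carry out the varying-amplitude check the reviewer requested.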
Comments 2: Please consider revising the spelling of the added reference (Sweilam …Gobashy…).
Response 2: Thanks a lot for your careful check. We have corrected the authors' names in reference [28].
Comments 3: Please add a location map for the case study.
Response 3: We sincerely appreciate your professional advice. As illustrated in Figure 12, after locating the prototype of the Brugge reservoir in a mapping application, we have provided a location map in the form of a screenshot, with the location marked by a red circle.
Reviewer 2 Report
Comments and Suggestions for Authors
Publish as is.
Author Response
Comments 1: publish as is
Response 1: We sincerely thank you for your recognition.
