Article

Identification of Water Flooding Advantage Seepage Channels Based on Meta-Learning

1 Key Laboratory of Oil and Gas Recovery Enhancement, Ministry of Education, Northeast Petroleum University, Daqing 163700, China
2 Sanya Offshore Oil & Gas Research Institute, Northeast Petroleum University, Sanya 572024, China
3 Department of Computer & Information Technology, Northeast Petroleum University, Daqing 163700, China
4 Daqing Oilfield Co., Ltd., First Oil Production Plant, Daqing 163700, China
* Author to whom correspondence should be addressed.
Energies 2023, 16(2), 687; https://doi.org/10.3390/en16020687
Submission received: 1 December 2022 / Revised: 27 December 2022 / Accepted: 29 December 2022 / Published: 6 January 2023
(This article belongs to the Special Issue AI Technologies in Oil and Gas Geological Engineering)

Abstract: As a water injection oilfield enters the high water cut stage, a large number of water flooding advantage seepage channels form in the local reservoir and change dynamically with the water injection process, seriously degrading the water injection development effect. In oilfield production, water injection and fluid production profile test data are direct evidence for identifying advantage seepage channels. In recent years, some scholars have studied the identification of advantage seepage channels with machine learning methods; however, insufficient profile test data limit the quantity and quality of learning samples, leading to problems such as low prediction accuracy of the learned models. The authors therefore propose a new method of advantage seepage channel identification based on meta-learning, using the MAML algorithm to optimize a neural network model so that the model still performs well on training tasks with small sample sizes and low data quality. Finally, the model was applied to actual blocks in the field to identify advantage seepage channels, and the identification results were largely consistent with tracer monitoring results, confirming the feasibility of the method. The approach provides a new solution for identifying advantage seepage channels and for other tasks with low data quality.

1. Introduction

A water injection oilfield enters the high water cut stage in the middle and late stages of development. One of the main factors leading to high water cut is the existence of advantage seepage channels. After a long period of high-intensity water injection, the porosity, permeability, and other physical properties of the reservoir change and its heterogeneity becomes stronger, forming advantage seepage channels [1]. The injected water circulates inefficiently along these channels, which seriously undermines the effect of water injection development.
Currently, the methods for identifying advantage seepage channels at home and abroad mainly include the logging curve method, the well-test analysis method, geochemical methods, the tracer test method, the fuzzy comprehensive evaluation method, and the capacitance resistance model analysis method. The logging curve method [2] determines where advantage seepage channels form in the reservoir by studying reservoir properties from well logs; it is relatively simple but does not reflect reservoir changes well. The well-test analysis method [3] uses interference tests, pulse tests, and other multi-well tests to determine the dynamic connectivity between wells; it requires changing the working regime of wells, interferes with normal production, and has a long testing period and high cost. Geochemical methods [4] use chromatographic fingerprinting techniques to study dynamic reservoir connectivity; although the identification results are accurate, the cost is too high for large-scale use. The tracer test method [5] injects a tracer into injection wells, monitors its output at production wells, and plots the tracer output curve to judge the reservoir condition; the results are accurate, but the method is time-consuming and costly and thus not suitable for large-scale promotion. The fuzzy comprehensive evaluation method [6,7,8] determines indices for identifying low-efficiency circulating oil and water wells by analyzing the factors that form dominant seepage channels and their development characteristics; the results are objective and accurate, but establishing the indices takes development engineers a great deal of time, and the indices are difficult to adjust dynamically later.
The capacitance resistance model analysis method [9,10] enables rapid modeling and simulation of the water-drive oil recovery process, but the model parameters are difficult to solve, and some assumptions in the solution process must be verified for consistency with actual oilfield production.
In recent years, the rapid growth of artificial intelligence technology has spawned a trend of AI adoption in all spheres of life, with a range of AI algorithms being applied to specific projects in finance, healthcare, industry, etc. The oil and gas industry is no exception: there have been attempts at water injection adjustment [11], filling missing oil production data [12], and inter-well connectivity evaluation [13]. How to apply AI technology to extract patterns from vast production data in order to aid production, improve recovery efficiency, and cut production costs is gradually becoming the industry's focal point. Machine learning algorithms such as support vector machines [14], random forests [15], and neural networks [16] have also been applied to the study of advantage seepage channels; compared with traditional identification methods, they save considerable human and material costs and achieve higher identification accuracy, making them suitable for generalization. However, the accuracy and generalization ability of these models still need improvement, and problems remain, such as difficulty adapting to unseen tasks and heavy dependence on the training data set. The development of meta-learning provides a new way of thinking to solve these training problems. Meta-learning [17,18] is a machine-learning paradigm designed to address the lack of generalization of traditional machine-learning models and their poor adaptability to new tasks; it allows a model to learn new tasks faster with fewer samples. A meta-learning model can adapt well to a new task after training on many different tasks, i.e., it trains a model with learning capability, also known as "learning to learn". Therefore, the authors propose a new method based on meta-learning, combined with dynamic and static oilfield data, for advantage seepage channel identification.

2. Methodology

Meta-learning is "learning to learn": the goal is for models to acquire the ability to learn, so that they can quickly master new tasks on the basis of previously learned knowledge.

2.1. The Distinction between Meta-Learning and Machine Learning

The ability to learn quickly is a significant difference between humans and artificial intelligence; humans can use their prior knowledge and experience to learn new techniques rapidly. The training and assessment of meta-learning can be compared to people swiftly adjusting to new activities after acquiring fundamental abilities [19]. For instance, during a child’s development, he or she will be exposed to many new things to recognize; when a child recognizes a bird and then encounters a puppy, he or she can distinguish between the two. In conventional machine learning model training, a huge number of images of puppies and birds are often required for the model to recognize and distinguish between them. As shown in Figure 1, the meta-learning model can gather experience by learning 50 tasks, and when the 51st task is faced, it will perform better than conventional machine learning.
In machine learning, once we select the model and determine the loss function, we feed the data into the model for training and continuously optimize the parameters, which yields a model y = f(x). In meta-learning, the training process has two levels. The first level feeds multiple training tasks (f1, f2, f3, etc.) and their corresponding data sets (x1, x2, x3, etc.) into the model and iteratively optimizes the parameters to obtain f = F(x), where F is used to train each new f; the second level finds the f(x) corresponding to each training task. The f(x) in machine learning directly maps features to labels, whereas F in meta-learning maps a training task to its model (f = F(x)), and it is f that solves the individual task: y = f(x). As shown in Figure 2, this process of finding F is what gives the model the ability to adapt quickly to new tasks, which is also known as learning to learn.

2.2. MAML

The concept of meta-learning has been around since the 1980s, and there are currently a number of ways to implement it. These include the meta-learning method based on recurrent networks, the meta-learning method based on metric learning, the meta-learning method based on initialized parameters with strong generalization, the meta-learning method based on optimizers, etc. Scholars in the industry have paid a great deal of attention to the direction of initialization parameters based on strong generalization, and many meta-learning models with strong practical applications have been built from the MAML algorithm in this direction. The author optimizes the neural network model via the MAML technique.
The Model-Agnostic Meta-Learning (MAML) algorithm was proposed by Finn et al. [20]. "Model-agnostic" means model-independent: the algorithm does not depend on a particular network architecture and can be integrated with most network-based models. The initial parameters of a model play an important role in its convergence speed and in finding the optimal parameter solution. As shown in Figure 3, the core idea of the MAML algorithm is to train a set of well-suited initial model parameters from a small amount of training data and then carry these parameters into subsequent training, so that the model converges quickly and achieves good results.
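The inner and outer updates of MAML can be sketched in a few lines. The following toy example is our own illustration, not the paper's code: it meta-trains the initialization of a linear regression model over a family of related tasks using the first-order approximation of MAML (which drops the second-order term that full MAML backpropagates through the inner step). The task family, sample sizes, and learning rates are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W0 = np.array([2.0, -1.0])        # hypothetical center of the task family

def sample_task(k_shot=5, q_size=15):
    """One task: linear regression y = X @ w, with w drawn near W0.
    Returns a (support, query) pair drawn from the same task."""
    w = W0 + 0.1 * rng.standard_normal(2)
    Xs, Xq = rng.standard_normal((k_shot, 2)), rng.standard_normal((q_size, 2))
    return (Xs, Xs @ w), (Xq, Xq @ w)

def grad(theta, X, y):
    """Gradient of the mean-squared-error loss w.r.t. theta."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)

alpha, beta = 0.1, 0.01           # inner and outer learning rates
theta = np.zeros(2)               # meta-parameters: the shared initialization
avg_query_loss = []
for step in range(200):
    g_sum, l_sum = np.zeros(2), 0.0
    for _ in range(5):            # 5 tasks per outer step
        (Xs, ys), (Xq, yq) = sample_task()
        theta_i = theta - alpha * grad(theta, Xs, ys)   # inner (support) step
        l_sum += np.mean((Xq @ theta_i - yq) ** 2)      # query loss after adaptation
        g_sum += grad(theta_i, Xq, yq)                  # first-order MAML gradient
    theta -= beta * g_sum                               # outer (meta) update
    avg_query_loss.append(l_sum / 5)
```

After meta-training, `theta` sits near the center of the task family, so a single inner gradient step adapts it well to a freshly sampled task; this is the "good initialization" effect that Figure 3 describes.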

3. Establishment of Data Sets

In this study, from the perspective of reservoir engineering, combined with relevant oil recovery mechanisms and engineers' experience [21], it is known that the formation of an advantage seepage channel is mainly related to the dynamic and static parameters of injection-production wells [22,23]. We examine the factors that contribute to the formation of advantage seepage channels and extract the dynamic and static data of injection-production wells to identify them. The static data consist of porosity, permeability, permeability variation coefficient, dart coefficient, reservoir thickness, effective thickness, and sedimentary facies; the dynamic data consist of monthly liquid production, monthly water injection volume, water injection intensity, fluid producing intensity, injection-production well spacing, flowing pressure, and injection-production pressure difference. Table 1 illustrates the format of the example data set.
The level in the table represents the strength of the advantage seepage channel, with 0 denoting a strong advantage seepage channel (S-ASC), 1 a weak advantage seepage channel (W-ASC), and 2 a non-advantage seepage channel (N-ASC). One thousand samples of each of the three types (strong, weak, and non-advantage) are taken to create the data set D. A high-quality data set will considerably improve the model's accuracy; hence, data preprocessing is necessary.

3.1. Feature Screening

Feature screening [24,25] is a very important step in machine learning modeling and directly affects the effectiveness of the final model. However, there is no uniform industry standard for selecting training features. Feature screening methods are mainly divided into filter, wrapper, and embedded methods, and an appropriate method must be selected for each modeling task; combining several screening methods on the basis of a thorough understanding of the business is a sensible strategy. Since the model built in this paper is a neural network, both the wrapper and embedded methods would have to select the optimal feature subset through the model itself, greatly increasing the workload. Therefore, after fully analyzing the formation mechanism of advantage seepage channels and considering the influence of the various contributing factors, this paper uses the filter method to screen the features.
The filter method uses the results of statistical tests to measure the relevance of features to the labels and selects features accordingly. It includes, among others, variance filtering [26], chi-square filtering [27], the F-test [28], and the mutual information method [29]. Table 2 outlines the principles and filtering rules of these four techniques.
In practice, variance filtering alone may discard features that are significant for the labels, owing to issues such as inconsistent data dimensions or an incomplete understanding of the business mechanisms, whereas the mutual information method can capture arbitrary relationships (both linear and nonlinear) between features and labels, unlike the chi-square filter and the F-test. Therefore, this work screens the data using both variance filtering and mutual information: the variance captures the dispersion of the sample data, while the mutual information value establishes the degree of linkage between features and labels. The pertinent calculation results are displayed in Figure 4.
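The two filters can be combined as in the following sketch, which uses scikit-learn's `VarianceThreshold` and `mutual_info_classif` on synthetic data (the feature names and threshold are illustrative assumptions, not the paper's field data):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif

rng = np.random.default_rng(1)
n = 300
perm = rng.standard_normal(n)              # stands in for a feature that drives the label
label = (perm > 0).astype(int)
noise = rng.standard_normal(n)             # irrelevant but high-variance feature
const = np.full(n, 3.0)                    # near-zero-variance feature
X = np.column_stack([perm, noise, const])

# Variance filtering: drop features whose variance falls below a threshold.
vt = VarianceThreshold(threshold=0.1)
kept = vt.fit(X).get_support()             # boolean mask of retained columns

# Mutual information: score each feature's dependence on the label
# (captures nonlinear relationships, unlike the F-test).
mi = mutual_info_classif(X, label, random_state=0)
```

Here variance filtering removes only the constant column, while the mutual information scores separate the informative feature from the merely high-variance one, which is why the two are used together.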
Figure 4 demonstrates that there is almost no correlation between reservoir thickness (RT), porosity (Poro), effective thickness (ET), or fluid-producing intensity (FPI) and the labels, whereas the interlayer permeability variation coefficient of oil wells (IPDCOOW) and the interlayer permeability dart coefficient of oil wells (IPDCOOW) correlate strongly with the labels. However, variance filtering and the mutual information method alone cannot explore the correlation between the features themselves, so this paper uses the Pearson correlation coefficient to study inter-feature correlation.
Pearson correlation measures the linear correlation between two vectors and thus their similarity. The output range is [−1, 1], where 0 indicates no correlation, a negative value a negative correlation, and a positive value a positive correlation; the greater the absolute value, the stronger the correlation. The calculated results are presented on a heat map in Figure 5.
As shown in Figure 5, the Pearson correlation coefficients between the two features, such as reservoir thickness (RT) and effective thickness (ET), monthly liquid production (MLP) and monthly water injection volume (MWIV), monthly liquid production (MLP) and fluid producing intensity (FPI), monthly water injection volume (MWIV) and water injection intensity (WII), are all greater than 0.7, indicating a strong correlation. According to the calculation results of Figure 4 and Figure 5, the optimal feature subset is finally selected, as shown in Table 3. The data samples of dataset D are extracted according to the feature extraction of the optimal feature subset to form dataset D1.
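The pairwise screening against the 0.7 cut-off can be sketched with pandas (a synthetic illustration: only three of the paper's features are mocked up, with FPI deliberately constructed from MLP so that one pair exceeds the threshold):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
mlp = rng.normal(100.0, 20.0, 200)                 # monthly liquid production (MLP)
df = pd.DataFrame({
    "MLP": mlp,
    "FPI": mlp / 5.0 + rng.normal(0.0, 1.0, 200),  # fluid producing intensity, tied to MLP
    "Poro": rng.normal(0.25, 0.03, 200),           # porosity, independent of the others
})

corr = df.corr(method="pearson")                   # symmetric correlation matrix
# Flag feature pairs whose |r| exceeds the 0.7 cut-off used in the text;
# one member of each flagged pair would be dropped from the feature subset.
strong = [(a, b) for a in corr.columns for b in corr.columns
          if a < b and abs(corr.loc[a, b]) > 0.7]
```

Plotting `corr` with a heat-map library reproduces the kind of view shown in Figure 5; the `strong` list is what drives the removal of redundant features such as RT/ET or MLP/MWIV.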

3.2. Filling Missing Values and Outliers

There are often many missing values and outliers in the production data of oilfields, which will affect the analysis of data and modeling prediction. The common situation in the industry is that engineers tend to simply delete and ignore these missing values and outliers or use the traditional statistical filling methods such as median filling and mean filling. All of the above practices tend to ignore the variation pattern in the data itself, resulting in data distortion. In recent years, with the rise of machine learning technology, the era of big data has presented higher requirements for data quality, and how to significantly improve the usability of data through data preprocessing has become a popular area of study, as have numerous algorithms for filling missing values [30].
Since most of the dynamic and static data in the field are obtained from well-point tests and manual records, such as permeability, porosity, and fluid production, there may be missing values and outliers in the data obtained. A random forest model is established to fill in the missing values and outliers [31]. The specific filling process is to use the data feature columns with missing values and outliers as labels and use other feature columns with better data quality as sub-feature sets to build a new data subset. The data samples with complete label column data are used as the training data set to train the model and adjust the parameters to maximize the model accuracy. Then, the feature set of data samples with missing or abnormal label column data is brought into the model to predict the values of the label column, and the purpose of filling in the missing and abnormal values is achieved.
Taking the filling of missing permeability values of a well as an example, both mean filling and random forest filling were applied to the data. As shown in Figure 6, the distribution of the data obtained by random forest filling is closer to the real data.
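The filling procedure described above (train a random forest on the complete rows, then predict the label column for the incomplete rows) can be sketched as follows. This is a synthetic illustration with an assumed porosity-permeability relationship, not the paper's field data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 400
poro = rng.normal(0.25, 0.05, n)
# Assume permeability depends on porosity (a stand-in relationship for the sketch).
perm = 50.0 * poro + rng.normal(0.0, 0.5, n)

mask = rng.random(n) < 0.2                  # pretend 20% of permeability values are missing
X_train, y_train = poro[~mask].reshape(-1, 1), perm[~mask]
X_miss = poro[mask].reshape(-1, 1)

# Train on the complete samples, then predict the "missing" label column.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
rf_fill = rf.predict(X_miss)                # model-based fill
mean_fill = np.full(mask.sum(), y_train.mean())

rf_err = np.mean(np.abs(rf_fill - perm[mask]))
mean_err = np.mean(np.abs(mean_fill - perm[mask]))
```

Because the forest exploits the feature-to-label relationship, its filled values track the true distribution far more closely than a constant mean, mirroring the comparison in Figure 6.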

3.3. Nondimensionalization of the Data

In machine learning training, the data samples obtained are frequently multivariate, and the distribution of each feature may vary substantially, preventing feature comparability. Before modeling, we therefore need to nondimensionalize the data [32].
Dimensionless scaling can be either linear or nonlinear. The most prevalent linear techniques are Min-Max scaling and the StandardScaler, as shown in Table 4.
In machine learning, training data often contain outliers and are unevenly distributed; normalization is the more commonly used method in such settings, and in this paper we choose normalization to nondimensionalize the data.
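The two linear scalers from Table 4 behave quite differently when an outlier is present, as this small sketch with scikit-learn shows (the four-point data set is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [100.0]])   # one extreme outlier

mm = MinMaxScaler().fit_transform(X)           # rescales each feature to [0, 1]
ss = StandardScaler().fit_transform(X)         # zero mean, unit variance per feature
```

Note how the outlier compresses the Min-Max-scaled bulk of the data toward 0 (the value 3 maps to about 0.02), while standardization keeps the spread of the non-outlier points larger; this sensitivity is why the choice of scaler should match the data distribution.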

4. Modelling and Experimental Results

The artificial neural network (ANN) imitates the transmission of information in the neural network of the human brain: data are transformed linearly by the parameters (weights and biases) on neurons and nonlinearly by activation functions. Data pass from the input layer to the hidden layers, where they are processed and passed on to the output layer, a process known as forward propagation; the neuron parameters are then updated iteratively by optimizing the loss function, a process known as backward propagation, as illustrated in Figure 7.
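The forward and backward passes just described can be written out in a few lines of NumPy. This is a toy single-sample illustration with assumed layer sizes, not the paper's network: one tanh hidden layer, a softmax output over three classes, and a single gradient step on the output weights:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal((1, 4))                 # one input sample, 4 features

W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

# Forward propagation: linear transform, nonlinear activation, softmax output.
h = np.tanh(x @ W1 + b1)                        # input -> hidden
logits = h @ W2 + b2                            # hidden -> output
probs = np.exp(logits) / np.exp(logits).sum()   # class probabilities

# Backward propagation: one gradient step on W2 for cross-entropy, true class 0.
loss_before = -np.log(probs[0, 0])
grad_logits = probs - np.array([[1.0, 0.0, 0.0]])   # d(cross-entropy)/d(logits)
W2 -= 0.1 * h.T @ grad_logits                       # parameter update

logits2 = h @ W2 + b2                           # forward pass after the update
probs2 = np.exp(logits2) / np.exp(logits2).sum()
loss_after = -np.log(probs2[0, 0])
```

One update already reduces the loss on this sample; repeating forward and backward passes over many samples is exactly the training loop the BP models below use.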

4.1. Selection of Activation Function

The activation function is a crucial element of a neural network. A network without activation functions is just a linear regression; adding them enables the network to transform input data nonlinearly and thus learn more complex tasks. Common activation functions include sigmoid, tanh, and relu, among others. As indicated in Figure 8, we build three BP neural network models using these three activation functions with a learning rate of 0.1, train them on the data, compute each model's accuracy, evaluate their effects, and select the optimal activation function.
Figure 8 shows that the model using tanh as the activation function is more accurate; hence, tanh is chosen as the activation function.
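A comparison of this kind can be sketched with scikit-learn's `MLPClassifier`, which implements a BP network and exposes the activation as a parameter (scikit-learn calls sigmoid "logistic"). The data set here is synthetic; the paper's own data, network width, and scores are not reproduced:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A synthetic 3-class problem standing in for the advantage-seepage-channel data.
X, y = make_classification(n_samples=600, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

scores = {}
for act in ["logistic", "tanh", "relu"]:       # the three candidate activations
    clf = MLPClassifier(hidden_layer_sizes=(16,), activation=act,
                        learning_rate_init=0.1, max_iter=500, random_state=0)
    scores[act] = clf.fit(Xtr, ytr).score(Xte, yte)
```

Which activation wins depends on the data, which is why the paper runs exactly this kind of side-by-side test rather than assuming an answer.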

4.2. Determination of Neural Network Depth

The evolution of neural networks from single-layer to multi-layer produced a qualitative leap in performance, but this does not imply that more layers always yield a better model; the depth must be chosen for the task at hand. In this research, BP neural network models with one, two, three, and four hidden layers (all linear layers, each followed by a tanh activation) are constructed with a learning rate of 0.1 and trained on the data, and their performance is measured to determine the effect of the number of hidden layers, as shown in Figure 9.
Figure 9 demonstrates that model performance improves as the number of layers increases. With four hidden layers, the accuracy on the training set reaches 100 percent and the accuracy on the test set approaches 90 percent. However, adding layers also adds neurons, so the model must compute more parameters. Since the accuracy achieved by the four-layer model is already good, this article constructs a BP neural network model with four hidden layers and a learning rate of 0.1, as depicted in Figure 10.
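The chosen architecture, four tanh hidden layers trained by backpropagation, can be instantiated in one line with `MLPClassifier`. Again the data and layer widths are assumptions for illustration; the paper does not state its layer sizes here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=900, n_features=8, n_classes=3,
                           n_informative=6, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# Four hidden layers, each followed by tanh; learning rate 0.1 as in the text.
tanh4_bp = MLPClassifier(hidden_layer_sizes=(32, 32, 32, 32), activation="tanh",
                         learning_rate_init=0.1, max_iter=800, random_state=0)
tanh4_bp.fit(Xtr, ytr)
train_acc = tanh4_bp.score(Xtr, ytr)
test_acc = tanh4_bp.score(Xte, yte)
```

The fitted model exposes its depth via `n_layers_` (input + 4 hidden + output = 6 layers), which is a quick structural check that the intended Tanh4-BP topology was built.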

4.3. MAML Optimized Tanh4-BP

The task is the fundamental unit for training the MAML model. In this study, each training task is a three-class classification problem: 20 data samples are collected from each of the three categories, of which 5 are used for training and the remaining 15 for testing, forming a task known as "3-way, 5-shot". The support set is made up of the 3 × 5 training samples, and the query set of the 3 × 15 test samples. Dataset D1 is split 7:3 into the training set D1_train and the testing set D1_test, and D1_train is divided into task sets according to the "3-way, 5-shot" rule. The learning rate α is used for internal training, which updates the parameters within each task, and β for external training, which updates the parameters of the entire MAML model. In this experiment, α is 0.1 and β is 0.01. Figure 11 illustrates the parameter update process of the Tanh4-BP-MAML model. The specific steps are listed below.
  1. Randomly initialize the Tanh4-BP model's parameters θ.
  2. Set the outer training termination condition to 100 iterations.
  3. Randomly select 5 Tasks from the set of Tasks.
  4. Run the following process for the 5 extracted Tasks.
  5. Extract the Task's Support set and feed it into the Tanh4-BP model for training; calculate the loss function L_Ti and its gradient ∇_θ L_Ti, and update the learner parameters: θ_i' = θ − α∇_θ L_Ti.
  6. Extract the Task's Query set for the MAML parameter update.
  7. Execute steps 5 and 6 for each of the five extracted Tasks.
  8. Propagate the updated parameters θ_i' of each task forward on the respective Query set to calculate the loss Loss_i; sum the losses of the 5 tasks, take the gradient with respect to θ, multiply by β, and update the MAML parameters: θ ← θ − β∇_θ Σ_i Loss_i(θ_i').
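The "3-way, 5-shot" task construction used throughout these steps can be sketched as follows (synthetic features standing in for dataset D1; shapes follow the 3 × 5 support / 3 × 15 query split described above):

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy stand-in for dataset D1: 6 features and a 3-class label
# (0 = S-ASC, 1 = W-ASC, 2 = N-ASC), 100 samples per class.
X = rng.standard_normal((300, 6))
y = np.repeat([0, 1, 2], 100)

def make_task(X, y, n_way=3, k_shot=5, q_size=15):
    """Build one 'n-way, k-shot' task: support = n_way x k_shot samples,
    query = n_way x q_size samples, drawn without overlap per class."""
    sup_X, sup_y, qry_X, qry_y = [], [], [], []
    for c in range(n_way):
        idx = rng.choice(np.flatnonzero(y == c), k_shot + q_size, replace=False)
        sup_X.append(X[idx[:k_shot]]);  sup_y.append(y[idx[:k_shot]])
        qry_X.append(X[idx[k_shot:]]);  qry_y.append(y[idx[k_shot:]])
    return (np.vstack(sup_X), np.concatenate(sup_y),
            np.vstack(qry_X), np.concatenate(qry_y))

sx, sy, qx, qy = make_task(X, y)   # one "3-way, 5-shot" task
```

Steps 3 to 8 then repeatedly draw five such tasks, run the inner update on each support set, and accumulate the query losses for the outer MAML update.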
To analyze the effect of data preprocessing, dataset D was also split 7:3 into D_train and D_test. The two training sets, D_train and D1_train, were used to train the Tanh4-BP model and the MAML-optimized Tanh4-BP model (Tanh4-BP-MAML), and the test sets D_test and D1_test were used to validate the models, as shown in Figure 12.
From Figure 12 and Table 5, it can be seen that the Tanh4-BP-MAML model outperforms the Tanh4-BP model for both D_test and D1_test datasets. Meanwhile, the model accuracy of both models on the D_test dataset is higher than that of the D1_test, but the differences are insignificant.

4.4. Practical Applications

Advantage seepage channels were identified in block Y of the X oilfield using a model established by the above method. A total of 892 channels were identified, including 115 strong advantage seepage channels, 263 weak advantage seepage channels, and 514 non-advantage seepage channels. Comparing the model's predictions with the tracer monitoring results in the field, the identification accuracy reaches 90%. As shown in Figure 13, the results demonstrate that advantage seepage channels are well developed in the X oilfield and that they must be managed to improve development efficiency.

5. Conclusions

1. Filling missing values and outliers with a random forest model yields data closer to the true values than the traditional mean-filling approach. When dealing with anomalies in oilfield field data, we recommend choosing a filling algorithm suited to the characteristics of the data rather than defaulting to traditional mean filling.
2. The optimal feature subset was selected by variance filtering, the mutual information method, and the Pearson correlation coefficient, combined with engineering experience. Datasets D and D1 were each used for model training; the results show that the model's performance on the original dataset D is only slightly better than on D1. From the perspective of reducing the computational load and improving the operating efficiency of the model, this result confirms the necessity of feature screening.
3. The MAML-optimized model has greater precision and convergence capability than the original model. The MAML approach can quickly achieve good performance with a small quantity of data, is universal, and can be combined with a wide range of machine learning methods. In oil and gas AI, the MAML algorithm offers a solution to the problem of poor-quality oilfield field data, which is of critical importance.
4. When the model was applied to predict the advantage seepage channels of an actual block, its accuracy reached 90% against the tracer monitoring results, confirming the feasibility of using machine learning algorithms with dynamic and static oilfield data to solve oilfield problems. The smart oilfield is unquestionably a future trend, but practical issues remain to be resolved, such as the low quality of the field data obtained.

Author Contributions

Methodology, B.Z.; Software, J.L.; Investigation, B.Z.; Writing—original draft, C.D.; Writing—review & editing, E.Y.; Project administration, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Natural Science Foundation of China (No. 51834005) and Natural Science Foundation of Heilongjiang Province (No. LH2021E013).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guo, X. Research on the Identification Method of Advantage Seepage Channels Based On Dynamic Data of Oilfield Development. Master’s Thesis, Northeast Petroleum University, Daqing, China, 2021. [Google Scholar]
  2. Li, T. Research on the Identification of Advantage Seepage Channels Based on Logging Curves. Master’s Thesis, Northeast Petroleum University, Daqing, China, 2021. [Google Scholar]
  3. Li, D. Advantages and disadvantages of conventional well test analysis method and improvement countermeasures. Petrochem. Technol. 2017, 24, 228. [Google Scholar]
  4. Guo, J.; Niu, B. A review of inorganic geochemical methods for paleofluidic studies. Chin. Geol. Surv. 2017, 4, 45–49. [Google Scholar]
  5. Di, D.J.; Pang, W.; Mao, J.; Guo, X. Current status and development suggestions of horizontal well output profile testing technology. Pet. Drill. Prod. Technol. 2022, 44, 56–62. [Google Scholar]
  6. Song, K.; Yang, Q.; Fu, Q.; Yang, E.; Sun, Y. Fuzzy comprehensive evaluation method to determine inefficient circulation well formation. Drill. Technol. 2006, 4, 35–37. [Google Scholar]
  7. Geng, W.; Yang, E.; Song, K. Comprehensive evaluation of water drive development effect in the southern part of Songfangtun oilfield based on fuzzy comprehensive evaluation. Pract. Underst. Math. 2014, 44, 76–80. [Google Scholar]
  8. Dong, C.; Song, K.; Shi, C.; Zhu, M.; Cui, X.; Liu, Z. A new method for fast prediction of dynamic indicators of water-driven oil well stratification. Xinjiang Pet. Geol. 2017, 38, 233–239. [Google Scholar]
  9. Zhu, L.H.; Wang, H.T.; Wei, L.Y.; Guo, J.H. Quantitative identification of inefficient and ineffective circulation field based on capacity resistance model. Daqing Pet. Geol. Dev. 2019, 38, 239–245. [Google Scholar]
  10. De Holanda, R.W.; Gildin, E.; Jensen, J.L. A generalized framework for Capacitance Resistance Models and a comparison with streamline allocation factors. J. Pet. Sci. Eng. 2018, 162, 260–282. [Google Scholar] [CrossRef]
  11. Wang, M.; Sun, Y.; Song, K.; Gao, T.; Yang, E. Analysis of factors influencing the water injection volume of water drive in dense well network based on Lasso-Lars. Pract. Underst. Math. 2016, 46, 124–131. [Google Scholar]
Figure 1. The learning progression process of the meta-learning model.
Figure 2. The difference between meta-learning and machine learning.
Figure 3. Two-layer parameter update of MAML model.
Figure 4. Variance and mutual information values of dataset D.
Figure 5. Pearson’s correlation coefficient for dataset D.
Figure 6. Different methods for filling missing values.
Figure 7. Forward and backward propagation of artificial neural networks.
Figure 8. Comparison of the accuracy of different activation function models.
Figure 9. Comparison of the accuracy of different network depth models.
Figure 10. Tanh4-BP network architecture.
Figure 11. Tanh4-BP-MAML model parameter update.
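The two-level (inner/outer) parameter update illustrated in Figures 3 and 11 can be sketched as follows. This is a minimal first-order MAML approximation on a toy linear-regression task family, not the authors' Tanh4-BP implementation; the learning rates `alpha` and `beta`, the task distribution, and the linear model are all assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.01, 0.001    # inner- and outer-loop learning rates (assumed)
theta = rng.normal(size=2)   # meta-parameters of a toy model y = w*x + b

def loss_grad(theta, x, y):
    # Gradient of the MSE loss for y_hat = theta[0]*x + theta[1]
    err = theta[0] * x + theta[1] - y
    return np.array([2 * np.mean(err * x), 2 * np.mean(err)])

def sample_task():
    # Each task: fit y = w*x + b with task-specific w, b;
    # returns a (support, query) pair of small sample sets
    w, b = rng.uniform(-1, 1, size=2)
    x_s, x_q = rng.uniform(-1, 1, 10), rng.uniform(-1, 1, 10)
    return (x_s, w * x_s + b), (x_q, w * x_q + b)

for step in range(1000):
    meta_grad = np.zeros_like(theta)
    for _ in range(5):  # batch of tasks per meta-update
        (x_s, y_s), (x_q, y_q) = sample_task()
        # Inner loop: one gradient step on the task's support set
        theta_i = theta - alpha * loss_grad(theta, x_s, y_s)
        # Outer loop (first-order approximation): evaluate the adapted
        # parameters on the query set and accumulate the meta-gradient
        meta_grad += loss_grad(theta_i, x_q, y_q)
    theta -= beta * meta_grad / 5
```

Note the simplification: full MAML differentiates through the inner update (a second-order term); the sketch above uses the common first-order (FOMAML) shortcut, which keeps the two-level structure visible while remaining a few lines of NumPy.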
Figure 12. Comparison of accuracy of different models with different datasets.
Figure 13. Identification results of advantage seepage channels.
Table 1. Sample data set.

| Feature | Sample 1 | Sample 2 | Sample 3 |
|---|---|---|---|
| Thin layer | S1 | S1 | S1 |
| Water well | W2-1 | W2-2 | W2-3 |
| Oil well | O2-1 | O2-2 | O2-3 |
| Porosity (Poro) | 0.21 | 0.21 | 0.22 |
| Permeability (Perm)/μm² | 0.28 | 0.12 | 0.14 |
| Interlayer permeability variation coefficient of water well (IPVCOWW) | 0.62 | 0.58 | 0.58 |
| Interlayer permeability dart coefficient of water well (IPDCOWW) | 2.18 | 1.90 | 2.33 |
| Interlayer permeability variation coefficient of oil well (IPVCOOW) | 0.53 | 0.64 | 0.763 |
| Interlayer permeability dart coefficient of oil well (IPDCOOW) | 1.77 | 2.12 | 3.003 |
| Reservoir thickness (RT)/m | 1.15 | 0.6 | 1 |
| Effective thickness (ET)/m | 0.73 | 0.425 | 0.58 |
| Sedimentary facies (SF) | 31 | 42 | 23 |
| Monthly liquid production (MLP)/m³ | 91.32 | 88.41 | 107.12 |
| Monthly water injection volume (MWIV)/m³ | 127.56 | 109.32 | 129.41 |
| Water injection intensity (WII)/(m³/(d·m)) | 5.82 | 8.57 | 7.43 |
| Fluid producing intensity (FPI)/(m³/(d·m)) | 4.16 | 6.93 | 6.15 |
| Well spacing (WS)/m | 307.6 | 347.5 | 341.6 |
| Flowing pressure (FP)/MPa | 6.24 | 7.31 | 4.65 |
| Injection-production pressure difference (IPPD)/MPa | 12.48 | 4.37 | 9.3 |
| Level | 2 | 1 | 0 |
Table 2. Summary of common filtering methods.

| Method | Principle | Filtering Rule |
|---|---|---|
| Variance filtering | Filters features by their variance. | Screen out features with zero or near-zero variance. |
| Chi-square filtering | Measures the difference between categorical features and labels. | Select features with large chi-square values and p-values less than 0.05. |
| F-test | Detects the linear relationship between features and labels. | Select features with p-values less than 0.05. |
| Mutual information | Captures any relationship (linear or non-linear) between features and labels; a larger mutual information value indicates a stronger correlation, and vice versa. | The mutual information value is non-negative; features with a value greater than 0 are generally kept. |
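The filtering methods summarized in Table 2 can be sketched with scikit-learn. The feature matrix `X` and label vector `y` below are synthetic placeholders, not the paper's dataset D; the variance threshold `1e-8` is likewise an assumed value.

```python
import numpy as np
from sklearn.feature_selection import (VarianceThreshold, chi2,
                                       f_classif, mutual_info_classif)

rng = np.random.default_rng(0)
X = rng.random((100, 6))                            # placeholder feature matrix
X[:, 5] = 0.5 + rng.normal(0.0, 1e-6, size=100)     # a near-constant feature
y = (X[:, 0] > 0.5).astype(int)                     # placeholder binary label

# Variance filtering: drop features with (near-)zero variance.
X_var = VarianceThreshold(threshold=1e-8).fit_transform(X)

# Chi-square filtering (requires non-negative features).
chi2_stat, chi2_p = chi2(X, y)

# F-test: linear relationship between each feature and the label.
f_stat, f_p = f_classif(X, y)

# Mutual information: captures linear and non-linear dependence.
mi = mutual_info_classif(X, y, random_state=0)

# Apply the rules from Table 2 to pick a feature subset.
keep = [i for i in range(X.shape[1]) if f_p[i] < 0.05 or mi[i] > 0]
```

Here the near-constant sixth column is removed by the variance filter, while the first column (which determines the label) scores highly under both the F-test and mutual information.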
Table 3. Optimal feature subset.

| No. | Feature | No. | Feature |
|---|---|---|---|
| 1 | Interlayer permeability variation coefficient of water well (IPVCOWW) | 7 | Monthly liquid production (MLP) |
| 2 | Interlayer permeability dart coefficient of water well (IPDCOWW) | 8 | Monthly water injection volume (MWIV) |
| 3 | Interlayer permeability variation coefficient of oil well (IPVCOOW) | 9 | Water injection intensity (WII) |
| 4 | Interlayer permeability dart coefficient of oil well (IPDCOOW) | 10 | Fluid producing intensity (FPI) |
| 5 | Well spacing (WS) | 11 | Flowing pressure (FP) |
| 6 | Reservoir thickness (RT) | 12 | Injection-production pressure difference (IPPD) |
Table 4. Summary of dimensionless methods.

| Method | Formula | Feature | Applicable Scenarios |
|---|---|---|---|
| Min-Max scaling | x′ = (x − min(x)) / (max(x) − min(x)) | "Pats flat": maps all features uniformly onto the interval [0, 1]. | The data are stable, with no extreme maximum or minimum values. |
| Standardization (StandardScaler) | x′ = (x − μ) / σ | More "flexible" and "dynamic" scaling. | There are outliers and substantial noise in the data. |
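The two formulas in Table 4 can be worked through directly in NumPy. The sample values below are illustrative placeholders, not figures from the paper's dataset.

```python
import numpy as np

x = np.array([91.32, 88.41, 107.12, 127.56, 109.32])  # placeholder values

# Min-Max scaling: maps the sample onto [0, 1];
# sensitive to extreme maxima and minima.
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean, unit variance;
# more robust when the data contain outliers and noise.
x_std = (x - x.mean()) / x.std()
```

After Min-Max scaling the smallest value becomes 0 and the largest becomes 1, while standardization re-centers the sample at mean 0 with standard deviation 1, which matches the applicability notes in the table.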
Table 5. Comparison of training time of different models with different datasets.

| Model | D_Test | D1_Test |
|---|---|---|
| Tanh4-BP | 102 s | 82 s |
| Tanh4-BP-MAML | 85 s | 70 s |

Dong, C.; Zhang, B.; Yang, E.; Lu, J.; Zhang, L. Identification of Water Flooding Advantage Seepage Channels Based on Meta-Learning. Energies 2023, 16, 687. https://doi.org/10.3390/en16020687
