2.2.2. Representation of Mechanism Knowledge and Fault Symptom Inference for ESP Wells
This section describes how the fault mechanism knowledge of ESP wells is represented and how the fault symptom inference model for ESP wells is constructed.
Figure 2 shows the framework of the ESP fault symptom inference model.
- (1) Representation of Mechanism Knowledge
Real-time working parameters of the ESP wells are obtained from the automated data acquisition equipment of an oil production plant. Six parameters are collected: three-phase current, voltage, wellhead oil pressure, wellhead temperature, wellhead back pressure, and wellhead casing pressure. Using the oil production plant’s yield calculation methods [26,27,28], the liquid production, oil production, and gas production, as well as the oil/gas ratio and liquid/gas ratio of the ESP wells, are then determined.
These parameters characterize different aspects of the working status of the ESP well. For example, a rise in three-phase current that overloads the motor presents two obvious fault symptoms: “increase in three-phase current” and “overload”.
This paper groups fault types into four main categories: electrical system faults, mechanical system faults, environmental factor faults, and fluid-related and other faults.
This study extracted knowledge on the mechanism structure and fault diagnosis of ESPs from the publicly available literature on ESP fault diagnosis and from the historical fault records of ESPs in oil production plants over the past decade. The fault mechanism knowledge of ESPs is classified by fault type and fault symptom characteristics, yielding 50 ESP well fault classes suitable for diagnosis in actual production. This study then constructed a fault tree [29] and a semantic network [30] for ESP wells. Figure 3 shows the fault tree for ESPs, and Figure 4 shows the semantic network of fault symptoms for ESPs.
Different types of working parameters and fault indications have different associations with different fault types. This study employs probabilistic and statistical methods to organize and analyze the correlation between historical fault data and working parameters, forming a parameter weight table between ESP fault types and parameter types, as shown in Table 1.
For example, when an ESP experiences an electrical system fault such as cable burning, electrical parameters such as current and voltage carry more weight (0.48) than the pressure parameters related to the wellhead. The weights in each row of the table sum to 1.
Based on the real-time data of the ESP wells, this study performs time-series analysis and classification of the changes in working status parameters over the past 60 time steps. The changes are categorized into four types (increase, decrease, fluctuation, and zero drop), each of which constitutes a fault manifestation. Fault types are then inferred from these fault manifestations using the semantic network.
For example, if the voltage parameter shows an increasing trend over the recent window, the fault manifestation “voltage increase” is inferred. Based on the semantic network, the ESP well is then inferred to be experiencing a fault of the type “electrical system fault”.
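The four change types described above can be illustrated with a minimal sketch. The function name `classify_trend`, the thresholds `trend_tol`, `fluct_tol`, and `zero_tol`, and the extra "stable" label are assumptions made for illustration; the paper does not state how the classification is implemented.

```python
from statistics import mean, pstdev

def classify_trend(window, trend_tol=0.10, fluct_tol=0.05, zero_tol=1e-6):
    """Label a window of readings with one of the four change types, or 'stable'.

    trend_tol, fluct_tol, and zero_tol are illustrative thresholds,
    not values taken from the paper.
    """
    n = len(window)
    if n < 2:
        raise ValueError("need at least two samples")
    # Zero drop: the reading starts above zero and ends at (approximately) zero.
    if abs(window[0]) > zero_tol and abs(window[-1]) <= zero_tol:
        return "zero drop"
    avg = mean(window)
    scale = max(abs(avg), zero_tol)  # makes the tolerances relative to the signal level
    # Least-squares slope of the readings against the time index.
    t_mean = (n - 1) / 2
    slope = (sum((t - t_mean) * (v - avg) for t, v in enumerate(window))
             / sum((t - t_mean) ** 2 for t in range(n)))
    rel_change = slope * (n - 1) / scale  # fitted change across the window, relative
    if rel_change > trend_tol:
        return "increase"
    if rel_change < -trend_tol:
        return "decrease"
    # No clear trend: a large spread around the mean counts as fluctuation.
    if pstdev(window) / scale > fluct_tol:
        return "fluctuation"
    return "stable"

voltage = [380.0 + 1.0 * t for t in range(60)]  # synthetic rising voltage, 60 time steps
print(classify_trend(voltage))
```

With the synthetic rising voltage above, the window is labeled "increase", which corresponds to the “voltage increase” manifestation in the example.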
Subsequently, the weights of the working parameters associated with this fault type are retrieved from the parameter weight table and applied to the corresponding parameter sequences. The results are concatenated with the fault parameter features to serve as the input for the subsequent model, thereby expanding the feature information contained in the model samples.
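A minimal sketch of this weighting and concatenation step follows. It assumes that the weights are applied by element-wise scaling, which the text does not specify; the function names and all numeric values are hypothetical and are not taken from Table 1.

```python
def weighted_features(sequences, weights):
    """Scale each parameter sequence by its weight and flatten the result.

    Element-wise scaling is an assumption; the text does not name the operation.
    """
    missing = set(sequences) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    features = []
    for name in sorted(sequences):  # fixed order keeps the layout reproducible
        features.extend(weights[name] * value for value in sequences[name])
    return features

def build_model_input(sequences, weights, symptom_features):
    """Concatenate the weighted sequences with the fault parameter features."""
    return weighted_features(sequences, weights) + list(symptom_features)

# Hypothetical weights and readings, not taken from Table 1.
weights = {"current": 0.5, "voltage": 0.5}
sequences = {"current": [10.0, 12.0], "voltage": [380.0, 390.0]}
model_input = build_model_input(sequences, weights, [1.0, 0.0])
```

Sorting the parameter names fixes the position of each parameter in the input vector, so samples built at different times remain comparable.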
- (2) A Fault Symptom Inference Model for ESP Wells Based on Meta-Learning Convolutional Shrinkage Neural Network
To address the limited historical fault data, poor document quality, and complex working status of ESP wells, this paper constructs a meta-learning convolutional shrinkage neural network model.
The model uses meta-learning [31,32,33] to analyze the time-series parameters of the working status of ESP wells and embeds a shrinkage neural network (SNN) [34,35] for thorough learning.
Figure 5 shows the architecture of the inference model, which comprises one input layer, four convolutional layers, four SNNs, and one fully connected layer.
Of the historical fault dataset of the ESP wells, 80% is used as the training set and 20% as the testing set. In both the training set and the testing set, the support set and the query set are separated using a similarity matching method.
The similarity is calculated using Formula (1):

$$\mathrm{sim}(X, Y) = \frac{X \cdot Y}{\lVert X \rVert \, \lVert Y \rVert} \tag{1}$$

In the formula, $X$ and $Y$ are the working parameter sequence vectors of the ESP, and $\lVert X \rVert$ and $\lVert Y \rVert$ are the norms of the vectors $X$ and $Y$. Sample pairs with similarity scores above a predefined threshold are considered similar samples and form the support set. The support set is used for the inner-loop training of the model, where model parameters are updated to learn the classification tasks of fault symptoms, thereby enabling fault symptom inference for the ESP on the query set. The remaining samples form the query set, which is used to evaluate the model’s adaptability to different fault symptom classification tasks of ESPs and to further enhance the model’s few-shot learning capability.
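Formula (1) is described in terms of two vectors and their norms, which matches cosine similarity. Assuming that reading, the similarity-based split into support and query sets can be sketched as follows. The function names, the threshold of 0.9, and the rule that a sample joins the support set when it is similar to at least one other sample are illustrative assumptions.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def split_support_query(samples, threshold=0.9):
    """Return (support_indices, query_indices) for a list of sample vectors.

    A sample enters the support set when its similarity to at least one other
    sample exceeds the threshold; all remaining samples form the query set.
    """
    support = set()
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if cosine_similarity(samples[i], samples[j]) > threshold:
                support.update((i, j))
    support_idx = [i for i in range(len(samples)) if i in support]
    query_idx = [i for i in range(len(samples)) if i not in support]
    return support_idx, query_idx

# Toy two-dimensional samples; real inputs would be working parameter sequences.
samples = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0]]
support_idx, query_idx = split_support_query(samples)
```

In this toy case the first two samples point in almost the same direction and form the support set, while the third is left for the query set.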
For the support set $S_{\mathrm{train}}$ in the training set, the initial parameters $\theta$ are trained using the meta-learning convolutional SNN model. The optimal parameters $\theta^{*}$ are updated by summing all the losses in the query set $Q_{\mathrm{train}}$. In the test set, the optimal parameters $\theta^{*}$ are fine-tuned using the support set, and the fine-tuned parameters $\theta'$ are tested in the query set using the model, forming a process of “training–updating–testing”. Through meta-learning, the model can find a common parameter $\theta^{*}$ suitable for different fault symptom diagnosis tasks of ESP wells, which is calculated using Formula (2):

$$\theta^{*} = \arg\min_{\theta} \sum_{T_i \sim p(T)} L_{T_i}\left(\theta; S_{T_i}\right) \tag{2}$$

In the formula, $\theta^{*}$ represents the optimal parameter value for meta-learning training, $\theta$ represents the parameters of the meta model, $T$ represents all possible fault symptom inference tasks for ESPs, $p(T)$ represents the distribution of tasks, $S_{T_i}$ represents the support set obtained from sampling in task $T_i$, and $L_{T_i}(\theta; S_{T_i})$ represents the loss value obtained from training with support set $S_{T_i}$ on task $T_i$. Formula (2) seeks the $\theta^{*}$ that minimizes the sum of losses obtained from training on the support sets of all tasks.
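The “training–updating–testing” loop can be illustrated with a deliberately small sketch. It is not the paper's convolutional SNN: the model is a single scalar parameter fitted to toy regression tasks, and the outer update uses a first-order approximation (the gradient of the query loss is taken at the adapted parameter without differentiating through the inner step). All names, learning rates, and data are assumptions.

```python
def loss_and_grad(theta, data):
    """Mean squared error of y = theta * x on (x, y) pairs, and its derivative."""
    n = len(data)
    loss = sum((theta * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (theta * x - y) * x for x, y in data) / n
    return loss, grad

def meta_train(tasks, theta=0.0, inner_lr=0.1, outer_lr=0.05, epochs=200):
    """tasks: list of (support, query) pairs. Returns the shared parameter."""
    for _ in range(epochs):
        meta_grad = 0.0
        for support, query in tasks:
            _, g_support = loss_and_grad(theta, support)
            adapted = theta - inner_lr * g_support       # inner loop on the support set
            _, g_query = loss_and_grad(adapted, query)   # loss gradient on the query set
            meta_grad += g_query                         # sum over tasks (first-order)
        theta -= outer_lr * meta_grad                    # outer update of the shared parameter
    return theta

def fine_tune(theta, support, inner_lr=0.1, steps=5):
    """Adapt the shared parameter to a new task using its support set."""
    for _ in range(steps):
        _, g = loss_and_grad(theta, support)
        theta -= inner_lr * g
    return theta

# Two toy tasks whose true slopes are 1.0 and 3.0.
tasks = [([(1.0, 1.0)], [(1.0, 1.0)]), ([(1.0, 3.0)], [(1.0, 3.0)])]
theta_star = meta_train(tasks)
theta_new = fine_tune(theta_star, [(1.0, 3.0)])
```

In this toy setting the shared parameter settles between the two task optima, and a few fine-tuning steps on a support set move it toward the optimum of the selected task, which mirrors the few-shot adaptation described in the text.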
By using the meta-learning convolutional shrinkage neural network, the model links the working parameters of the ESP to the fault symptoms while incorporating the mechanism knowledge of ESP wells. This enables accurate inference of ESP fault symptoms, particularly when training data are scarce.
The model also serves as an important demonstration of the integration of mechanistic knowledge of electric submersible pump wells with deep learning models in this paper.
2.2.3. Knowledge-Integrated ESP Well Fault Diagnosis Model
This section takes the inferred fault symptoms of ESP wells, combines them with the knowledge graph of the ESP well working status, and screens out fault subgraphs and fault result sets. Fault diagnosis of ESP wells is then realized through a dual diagnostic mode of neural networks and expert rules.
Figure 6 shows the framework diagram of the knowledge-integrated ESP well fault diagnosis model.
- (1) Fault Subgraph Query Based on Fault Symptom
The inferred fault symptoms and the fault symptom nodes in the knowledge graph are represented as string sets. The Jaccard similarity algorithm is used to calculate the similarity between strings, and the node with the highest similarity is selected for entity linking.
The Jaccard similarity is calculated using Formula (3):

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{3}$$

In the formula, $A$ and $B$ represent the sets formed by the strings being compared, $|A \cap B|$ represents the size of the intersection of $A$ and $B$, and $|A \cup B|$ represents the size of the union of $A$ and $B$. The closer the Jaccard similarity is to 1, the more similar the two strings are.
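The entity linking step can be sketched as follows. The sketch assumes that each string is turned into a set of lowercase word tokens; the paper does not say whether the sets are built from characters or words. The function names and node labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two strings, compared as sets of lowercase word tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

def link_entity(symptom, node_names):
    """Return the knowledge graph node whose name is most similar to the symptom."""
    return max(node_names, key=lambda name: jaccard(symptom, name))

# Hypothetical symptom node names in the knowledge graph.
nodes = ["three-phase current increase", "voltage increase", "overload"]
linked = link_entity("increase in three-phase current", nodes)
```

Here the inferred symptom shares three of four distinct tokens with the first node (similarity 0.75), so that node is selected for entity linking.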
After entity linking is complete, a breadth-first search strategy is used to obtain the neighboring nodes of each fault symptom node, which are then combined to form a subgraph. A spectral-based graph neural network (GNN) then uses the feature vectors of the subgraph nodes to compute the eigenvector centrality of each node. This process identifies the fault nodes with a higher probability of occurrence in the ESP when these fault symptoms coexist.
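The breadth-first subgraph extraction can be sketched as below. The adjacency-dictionary representation, the `max_hops` limit, and the node labels are assumptions for illustration; the paper does not specify the search depth or the graph storage.

```python
from collections import deque

def bfs_subgraph(graph, seeds, max_hops=1):
    """Collect all nodes within max_hops of any seed node, plus the edges among them.

    graph maps each node to a list of its neighbors (undirected, listed both ways).
    """
    depth = {s: 0 for s in seeds if s in graph}
    queue = deque(depth)  # multi-source BFS: all seeds start at depth 0
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    nodes = set(depth)
    edges = {frozenset((u, v)) for u in nodes for v in graph.get(u, ()) if v in nodes}
    return nodes, edges

# Hypothetical fragment of a symptom/fault knowledge graph.
graph = {
    "decreased current": ["insufficient formation supply", "choked nozzle"],
    "temperature drop": ["insufficient formation supply"],
    "insufficient formation supply": ["decreased current", "temperature drop"],
    "choked nozzle": ["decreased current"],
    "overload": ["cable burning"],
    "cable burning": ["overload"],
}
nodes, edges = bfs_subgraph(graph, ["decreased current", "temperature drop"])
```

Starting from the two symptom nodes, the subgraph contains their directly connected fault nodes and leaves out the unrelated "overload" branch.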
The GNN calculates the embedded representation of nodes through message passing. By iteratively updating the node representations, each embedding captures the information of the node and its neighboring nodes and thus reflects the node’s importance in the network. Let $h_i^{(l)}$ denote the embedding of node $i$ at the $l$-th layer. The embedding of node $i$ is then updated using Formula (4):

$$h_i^{(l+1)} = f\left(h_i^{(l)}, \left\{ h_j^{(l)} : j \in N(i) \right\}\right) \tag{4}$$

In the formula, $f$ represents the update function, $N(i)$ represents the set of adjacent nodes of node $i$, and $\{h_j^{(l)} : j \in N(i)\}$ represents the set of embedded representations of the adjacent nodes of node $i$. When calculating the eigenvector centrality of a node, the eigenvalues are obtained by decomposing the Laplacian matrix of the subgraph. The corresponding eigenvector is then derived, with the absolute value of its first non-zero element representing the node’s eigenvector centrality.
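One possible form of the update function f is shown below as a minimal sketch: each node averages its own embedding with the mean of its neighbors' embeddings. This mean aggregator is an assumption chosen for simplicity; the paper does not state which update function is used, and a trained GNN would also apply learned weights and a nonlinearity.

```python
def message_passing_step(embeddings, neighbors):
    """One synchronous update of all node embeddings.

    New embedding = 0.5 * own embedding + 0.5 * mean of neighbor embeddings.
    Nodes without neighbors keep their embedding unchanged.
    """
    updated = {}
    for node, h in embeddings.items():
        nbr_vectors = [embeddings[j] for j in neighbors.get(node, ())]
        if not nbr_vectors:
            updated[node] = list(h)
            continue
        aggregate = [sum(vals) / len(nbr_vectors) for vals in zip(*nbr_vectors)]
        updated[node] = [0.5 * own + 0.5 * agg for own, agg in zip(h, aggregate)]
    return updated

# Toy three-node path graph a - b - c with two-dimensional embeddings.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
layer1 = message_passing_step(embeddings, neighbors)
```

Applying the step repeatedly spreads information further along the graph, so after several layers each embedding reflects a wider neighborhood.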
The six fault nodes with the highest eigenvector centrality are selected and ranked. Based on the classification of fault types in the ESP fault tree, irrelevant faults are then eliminated and the four most likely fault types are retained as the fault result set.
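The ranking step can be illustrated with a simplified sketch. Note that it swaps the Laplacian decomposition described above for power iteration on the adjacency matrix, which is the more common textbook definition of eigenvector centrality, so the scores are not necessarily those of the paper's method. The function names and the toy graph are assumptions.

```python
import math

def eigenvector_centrality(neighbors, iterations=200, tol=1e-10):
    """Power iteration on (A + I); every node must appear as a key in neighbors.

    Adding the identity keeps the eigenvectors of A but avoids the oscillation
    that plain power iteration shows on bipartite graphs.
    """
    nodes = sorted(neighbors)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score

def top_k(scores, candidates, k):
    """Return the k candidates with the highest centrality (ties broken by name)."""
    return sorted(candidates, key=lambda n: (-scores[n], n))[:k]

# Toy star graph: one hub connected to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = eigenvector_centrality(star)
best = top_k(scores, ["a", "hub", "b"], 1)
```

In the paper's setting, `candidates` would be the fault nodes of the subgraph, with k = 6 for the initial ranking before the fault tree is used to narrow the set to four.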
- (2) Fault Diagnosis Based on BiLSTM and Expert Rules
Bidirectional Long Short-Term Memory (BiLSTM) [36,37] allows the model to process large amounts of time-series data efficiently and to diagnose faults in ESP wells based on the known fault result sets and working parameters, which supports both the timeliness and the accuracy of the model. In parallel, an Expert Rule-based Inference Machine (ERIM) is constructed by incorporating expert rules of ESP fault diagnosis. The ERIM compensates for the neural network models’ low accuracy in distinguishing different working statuses whose working parameters have similar time-series features, thereby enhancing the accuracy and reliability of working status recognition.
The input sequence for fault diagnosis consists of known fault symptom sets, fault result sets, and real-time working parameters of the ESP. BiLSTM is employed to infer and analyze the time-series features within the sequence. By extracting features from time series in both forward and reverse directions, BiLSTM effectively leverages the changing characteristics of working parameters for fault diagnosis. Additionally, the Conditional Random Field (CRF) module is incorporated as a multi-classifier to achieve fault conclusion inference.
Certain changes in the working parameters of ESPs are strongly linked to specific faults and may be mutually exclusive with other faults. The ERIM makes these strong correlations and mutual exclusions between fault symptoms and faults explicit. Leveraging historical fault data and expert knowledge from oil production plants, a set of association degrees between fault symptoms and faults is established to support the inference and diagnosis of ESP faults.
For example, when the symptoms “decreased production”, “temperature drop”, and “decreased three-phase current” occur simultaneously in an ESP well, the possible fault set screened by the eigenvector centrality calculation includes four faults: “insufficient formation supply”, “pump outlet blockage”, “choked nozzle”, and “motor insulation failure”. According to the fault diagnosis knowledge of ESP wells, a “pump outlet blockage” would cause an increase in three-phase current, which is mutually exclusive with “decreased three-phase current”, so the ERIM excludes this fault. Similarly, “motor insulation failure” would lead to an increase in current values, which is also mutually exclusive with “decreased three-phase current”, so it is excluded as well. After these two faults are excluded, the association degree set in the expert rules indicates that “insufficient formation supply” is more likely than “choked nozzle”. Therefore, the final diagnosis result is “insufficient formation supply”.
Table 2 and Table 3 are partial representations of the association degree set and the mutual exclusivity table, respectively. The association degree relationship indicates the likelihood of a particular fault occurring when a certain fault symptom is observed. The mutual exclusivity relationship indicates that when a certain fault symptom is observed, a particular fault will definitely not occur.
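The exclusion-then-ranking logic of the worked example can be sketched as follows. The data structures, the function name `erim_diagnose`, and the rule of summing association degrees over the observed symptoms are assumptions; the association values are hypothetical and are not taken from Table 2 or Table 3.

```python
def erim_diagnose(symptoms, candidates, exclusions, association):
    """Drop candidates excluded by any observed symptom, then rank the rest.

    exclusions: set of (symptom, fault) pairs that cannot co-occur.
    association: dict mapping (symptom, fault) to an association degree.
    Summing the degrees over symptoms is an illustrative scoring rule.
    """
    remaining = [f for f in candidates
                 if not any((s, f) in exclusions for s in symptoms)]
    scores = {f: sum(association.get((s, f), 0.0) for s in symptoms)
              for f in remaining}
    ranked = sorted(remaining, key=lambda f: (-scores[f], f))
    return ranked, scores

symptoms = ["decreased production", "temperature drop", "decreased three-phase current"]
candidates = ["insufficient formation supply", "pump outlet blockage",
              "choked nozzle", "motor insulation failure"]
exclusions = {
    ("decreased three-phase current", "pump outlet blockage"),
    ("decreased three-phase current", "motor insulation failure"),
}
# Hypothetical association degrees, for illustration only.
association = {
    ("decreased production", "insufficient formation supply"): 0.6,
    ("temperature drop", "insufficient formation supply"): 0.5,
    ("decreased three-phase current", "insufficient formation supply"): 0.6,
    ("decreased production", "choked nozzle"): 0.5,
    ("decreased three-phase current", "choked nozzle"): 0.3,
}
ranked, scores = erim_diagnose(symptoms, candidates, exclusions, association)
```

With these illustrative values, the two faults that conflict with “decreased three-phase current” are removed and “insufficient formation supply” ranks first, matching the outcome of the worked example.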
The BiLSTM model and the ERIM each diagnose the ESP well independently and output their own diagnostic result and confidence level. The fault result reasoning module then compares the confidence levels of the two models and selects the conclusion with the higher confidence level as the final diagnostic result.
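The selection rule can be written in a few lines. This sketch assumes that each model reports a (fault label, confidence) pair with comparable confidence scales, and that ties go to the BiLSTM result; neither detail is specified in the paper, and the example values are hypothetical.

```python
def fuse_diagnoses(bilstm_result, erim_result):
    """Return the (fault, confidence) pair with the higher confidence.

    On a tie the BiLSTM result is returned, because max() keeps the first maximum.
    """
    return max((bilstm_result, erim_result), key=lambda result: result[1])

# Hypothetical outputs of the two diagnostic paths.
final = fuse_diagnoses(("choked nozzle", 0.62), ("insufficient formation supply", 0.81))
```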
By driving the fault diagnosis through both the BiLSTM-based model and ERIM, the advantages of data-driven mathematical diagnosis can be fully utilized, combined with the mechanism diagnosis capabilities based on expert experience. This effectively improves the accuracy and reliability of ESP well fault diagnosis. Ultimately, this approach enables rapid identification of fault types, provides interpretable fault reasoning information, reduces the downtime of the ESP, and enhances the operational efficiency and safety of the ESP well system.