Fault Tracking Method for Relay Protection Devices

Abstract: A method of fault tracking for relay protection devices is presented in this paper. Fault tracking means that after the failure of relay protection devices, the anomalies and warning information are obtained through data-mining technology, and then, the fault tracking algorithm is used to find the cause of failure. Let us take microcomputer protection as an example: Firstly, the common failure symptoms and the prior probability of failure causes can be collected through empirical field data. Then, the concept of an event set is proposed; thus, the causes set and the symptoms set of failure can be created. According to the causal relationship between the causes set of failure and symptoms set of failure, the reasoning chain and the corresponding Bayesian network model are built. Then, the probability of failure causes can be obtained through backward reasoning to continue the tracking analysis of failure causes for relay protection devices. Since the data used in modeling are all from statistics, this method has strong applicability and represents a simple and reliable method for the timely determination and elimination of failure in a power system.


Introduction
When a power system fails, the corresponding circuit breaker should be tripped to cut off the fault and limit the power outage. However, if the protection or circuit breaker itself fails, the result is a refuse-operation or maloperation, which is likely to expand the failure range. At present, the mainstream ideas of fault diagnosis of power systems are concerned with the accuracy of protection and circuit breakers. Some fault diagnosis methods can further determine whether the failure is caused by refuse-operation or maloperation, but these methods are unable to identify the internal causes of failure. Therefore, in order to rapidly find the cause and promptly eliminate it when failure occurs, this paper puts forward the concept of fault tracking. Fault tracking [1] refers to the process of using data-mining technology to classify and extract the alarm data inside substations, so as to determine the internal causes of failures. That is, after a known device has failed, the warning information can be used to trace back and find the internal causes of its failure. Logically, this method runs in the opposite direction to traditional fault diagnosis methods. It aims at making full use of various information sources in a power system to extend the function of the fault diagnosis algorithm. At the same time, this idea extends the concept of fault diagnosis: fault diagnosis is no longer restricted to condition monitoring or fault feature recognition and can go deeper into the device to determine the causes of failure. Through fault tracking, the internal causes of power system failure can be diagnosed, from shallow to deep levels. This method is based on alarm data inside substations, so it can also provide a reference for what monitoring information needs to be added.
In this paper, the analysis of various fault types of relay protection devices also provides an important guidance for the maintenance, design and improvement of devices.
In this study, the failure of a relay protection device was taken as an example to construct a fault tracking model. The algorithm of fault tracking for relay protection devices was utilized on the basis of the model. Relay protection devices provide a guarantee for proper operation of an entire power system, which is essential for its reliable, stable and economic operation. At present, most of the research on the failure of relay protection devices focuses on condition-based maintenance [2,3], failure detection [4-6] and the diagnosis of hidden failures [7].
There are still few studies on the failures caused by internal faults of relay protection devices. Some only focus on one or two specific cases, such as separate research and analysis of the secondary circuit problem [8,9]. Others include too few fault types with insufficiently detailed analysis [10,11].
In view of the above analysis, this paper puts forward a fault tracking method for relay protection devices. By utilizing the received abnormal and warning signals, a fault tracking model for relay protection devices can be constructed on the basis of the combination of reasoning chain and Bayesian theory. The reverse reasoning ability of the Bayesian network is used to find causes of failure. This model contains the vast majority of fault types of relay protection devices. Based on related alarm and monitoring information, this method determines which monitoring information should be included. At the same time, the analysis of various fault types and their causes in this paper provides an important reference for the maintenance, design and improvement of relay protection devices.

Composition of Microcomputer Relay Protection
The microcomputer relay protection of a power system refers to a relay protection device based on digital signal processing technology with a microcomputer and microcontroller as the core components. With the progress of computer science, microcomputer relay protection has become a mainstream aspect of relay protection, which is mainly achieved through hardware and software [3,12].

1. Hardware: The hardware of a microcomputer protection device mainly includes the analog input (AI), digital input (DI), central processing unit main system (CPU), digital output (DO), man-machine conversation interface (MMI), communication interface (CI) and power supply unit (PSU). Among them, the analog input is responsible for voltage and current analog acquisition and signal discretization. The digital input is responsible for collecting the contact information from the switchblade, the protection plate and other devices. The CPU main system includes the microprocessor CPU, data memory, program memory, timer, parallel interface and serial interface, and is responsible for the measurement, logic and control functions of relay protection. The digital output is composed of a photoelectric coupler and relay, and is responsible for protection tripping and warning signal output. The power supply circuit provides regulated DC power for the whole device to ensure a reliable power supply.

2. Software: The software of a microcomputer protection device mainly includes the data acquisition, digital signal processing, protection discrimination logic, human-computer interaction program, self-checking program, communication interface program and operating system.

Fault Analysis of Microcomputer Protection Device
The failure of a relay protection device is mainly divided into two categories: refuse-operation and maloperation. Refuse-operation refers to the failure of the protection function module in the relay device to detect the fault, or the failure of the transmission or execution of the tripping signal issued by the relay in the tripping circuit and the breaker control circuit and operation mechanism. When the protection device sends out the tripping signal, if there is no short circuit in the protected range, or if there is a short circuit in the adjacent lower-level equipment and its protection device does not refuse to operate, then this is referred to as protection maloperation [4].
We mainly studied the causes of refuse-operation or maloperation due to the problems of the protection device itself and its secondary circuit. Firstly, the causes were divided into two categories: software settings and hardware problems. The main reasons for refuse-operation and maloperation due to software problems include setting errors or setting value errors. The fault causes in the hardware can be classified by modules, such as the failure of components in the CPU main system and the loss or distortion of analog acquisition data in the data acquisition system. Among them, most of the failures in the man-machine conversation interface are caused by human errors and so are not taken into account. Through the analysis of the accident causes of the protection device and consulting the relevant literature [6,10,13,14], common causes that may directly lead to refuse-operation or maloperation of the protection device were obtained. All the failure causes were put into the causes set of failure, named M, M = {m1, m2, ..., m22}, as shown in Table 1. The prior probability p is the probability of the occurrence of a cause, that is, the ratio of the number of occurrences of the corresponding failure cause to the total number of faults of the protection device, obtained based on References [13,15]. Before and after the failure occurs, the substation receives a large number of abnormal and warning signals. These data are recorded by the fault oscillograph. Through data-mining technology, the useful information related to protection can be sorted and screened out, and then the symptoms of failure related to causes are obtained. Let all the selected symptoms of failure constitute the symptoms set of failure, named S, S = {s1, s2, ..., s35}, as shown in Table 2. When there are enough failure samples, the corresponding relationship between causes and symptoms of failure can be found.
We can describe it using probability statistics in order to qualitatively and quantitatively conduct fault diagnosis or fault tracking. Tables 1 and 2 cover the common causes and symptoms of failure in the current microcomputer protection devices. If other causes or symptoms of failure not in the table appear in practical applications, they can also be added into the table according to the category. With the completion of Table 2, the accuracy of fault tracking is improved. Assuming that the required information was obtained through data-mining technology, this study did not involve specific methods of data mining.

Reasoning Chain
A chain is a dynamic data structure. It consists of nodes that contain event information and pointers that express the relationships between nodes. The chain organizes and manages these nodes to form a new data structure that can achieve specific functions while avoiding a ring network. The reasoning chain shows the causality between events clearly and visually. Reason nodes at the front of the reasoning chain are used to characterize causes of the event [16,17]. The reasoning chain also supports reverse reasoning: the reason node information can be inferred from the subsequent nodes. The simplest form of reasoning chain is denoted by A→B, meaning 'if A, then B'.
For the failure of a relay protection device, these nodes can be divided into causes of failure nodes and symptoms of failure nodes according to the causal relationship. Then, the reasoning chain from the causes set of failure to symptoms set of failure can be constructed. For example, if the current transformer is saturated, then the distorted current waveform is obtained, that is m9→s14.
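As an illustration only, a reasoning chain of this kind can be sketched as a mapping from cause nodes to symptom nodes, with a reverse index for backward reasoning. The two links used below (m9→s14 and m10→{s15, s16, s18, s19}) are the ones stated in the text; the names CHAIN, reverse_index and candidate_causes are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Cause -> symptom links stated in the text; the full chain would hold
# all 22 causes and 35 symptoms from Tables 1 and 2.
CHAIN = {
    "m9": {"s14"},
    "m10": {"s15", "s16", "s18", "s19"},
}


def reverse_index(chain):
    """Build a symptom -> causes index so the chain can be read backwards."""
    index = defaultdict(set)
    for cause, symptoms in chain.items():
        for symptom in symptoms:
            index[symptom].add(cause)
    return dict(index)


def candidate_causes(chain, observed):
    """Return every cause linked to at least one observed symptom."""
    index = reverse_index(chain)
    causes = set()
    for symptom in observed:
        causes |= index.get(symptom, set())
    return causes
```

Calling candidate_causes(CHAIN, {"s16", "s18"}) returns the causes subset that a later probabilistic step would rank; symptoms with no link in the chain simply contribute no candidates.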

Bayesian Network
A Bayesian network is a probabilistic network that combines the Bayesian probability method with graph theory. At the qualitative level, it uses a directed acyclic graph to show the relationship between the nodes more intuitively. At the quantitative level, the Bayesian network better expresses the correlation between symptom nodes and cause nodes through conditional probability distribution. Prior probability and posterior probability are two key factors of Bayes' theorem. The prior probability can be obtained from existing data statistics and calculations. The posterior probability is calculated by prior information and sample data.
In the fault tracking of a relay protection device, 'm' represents the suspected failure causes, namely the hypothesis in Bayesian theory; 's' represents failure symptoms, that is, the evidence supporting the hypothesis. The Bayesian formula [18] is shown in (1):

$$p(m_i \mid s_j) = \frac{p(s_j \mid m_i)\, p(m_i)}{\sum_{k=1}^{l} p(s_j \mid m_k)\, p(m_k)} \qquad (1)$$

In Formula (1): l is the total number of failure causes in Table 1 (in this paper, l = 22); p(m_i) is the prior probability that the suspected failure cause m_i is true; p(s_j | m_i) is the probability of inducing failure symptom s_j when m_i occurs, which is the conditional probability; p(m_i | s_j) is the probability that the failure cause m_i is true when the failure symptom s_j is true, which is the posterior probability.
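The backward step described by the Bayesian formula can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the function name posterior is hypothetical, the sum in the denominator runs over whichever causes are supplied rather than over all 22, and the numbers are invented for illustration and are not the values of Table 1 or Appendix A.

```python
def posterior(priors, conditionals, symptom):
    """Return p(cause | symptom) for every cause via Bayes' theorem.

    priors:       {cause: p(cause)}
    conditionals: {cause: {symptom: p(symptom | cause)}}
    """
    joint = {m: priors[m] * conditionals[m].get(symptom, 0.0) for m in priors}
    evidence = sum(joint.values())
    if evidence == 0.0:
        raise ValueError(f"no supplied cause can produce {symptom!r}")
    return {m: value / evidence for m, value in joint.items()}


# Hypothetical numbers, for illustration only.
example_priors = {"m10": 0.05, "m11": 0.03, "m12": 0.02}
example_conditionals = {
    "m10": {"s16": 0.6},
    "m11": {"s16": 0.5},
    "m12": {"s16": 0.25},
}
example_posterior = posterior(example_priors, example_conditionals, "s16")
```

With these invented inputs the posteriors come out at approximately 0.6, 0.3 and 0.1 for m10, m11 and m12, and they sum to 1 by construction.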
Through the analysis of the Bayesian formula, it can be seen that, when predicting an uncertain phenomenon, it is necessary to generate a prior probability by systematically combining the existing information and statistical data. The probability prediction of unknown events must also be adjusted as new information is collected and experience is accumulated, so as to improve the accuracy of the prediction results. When dealing with failure of relay protection devices, the internal faults of the protection device are causes, and the failure symptoms are results. Fault tracking in this paper refers to the process of finding causes by reverse reasoning when the result of the event is known. To be specific, the purpose is to track the causes of failure according to the known symptom information. When the symptoms of failure occur, the probability of each suspected failure cause is calculated, and then the most likely cause of relay protection device refuse-operation or maloperation can be inferred. Since there are often multiple symptoms in a failure, Formula (2) can be used to calculate the Bayesian suspicion B(m_i) of possible causes corresponding to multiple symptoms [19]. Usually, the cause with the largest Bayesian suspicion is the most likely cause of failure. 'S_x' represents the possible symptoms set corresponding to the cause of failure.
The prior probability of failure causes of relay protection devices is given in Table 1. By consulting a large number of data and actual failure cases, the incidence relation of causes and symptoms of failure can be obtained through analysis and calculation, as shown in the table in Appendix A. The values in the table represent the conditional probability p(s_j | m_i) of the failure symptom s_j when the cause m_i occurs, that is, the ratio between the number of occurrences of the failure symptom s_j and the total number of occurrences of the failure cause m_i.
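Both ratios defined above (the prior as occurrences of a cause over all failures, and the conditional probability as occurrences of a symptom over occurrences of its cause) can be estimated from failure records. The following sketch assumes a hypothetical record format of (cause, set of symptoms) pairs; the function name and the four records are invented for illustration and are not field data.

```python
from collections import Counter


def estimate_probabilities(records):
    """Estimate priors and conditional probabilities from failure records.

    records: list of (cause, symptoms) pairs, one per recorded failure.
    Returns (priors, conditionals) as plain dictionaries.
    """
    cause_counts = Counter(cause for cause, _ in records)
    total = len(records)
    priors = {cause: n / total for cause, n in cause_counts.items()}

    pair_counts = Counter(
        (cause, symptom) for cause, symptoms in records for symptom in symptoms
    )
    conditionals = {}
    for (cause, symptom), n in pair_counts.items():
        conditionals.setdefault(cause, {})[symptom] = n / cause_counts[cause]
    return priors, conditionals


# Hypothetical failure log, for illustration only.
example_records = [
    ("m10", {"s16", "s18"}),
    ("m10", {"s16"}),
    ("m11", {"s15"}),
    ("m10", {"s18", "s19"}),
]
example_priors, example_conditionals = estimate_probabilities(example_records)
```

Because every estimate is a simple frequency ratio, the quality of the result depends directly on how many failures have been recorded and how completely their symptoms were logged.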

Construction of Reasoning Chain
According to the connection relationship shown in the table in Appendix A, the complete reasoning chain model can be constructed. The failure symptoms in Table 2 can be obtained by data-mining technology. For example, the failure of a relay protection device caused by TV secondary circuit multipoint grounding can be seen from four possible failure symptoms: (1) voltage sampling being zero, (2) the abnormal/invalid alarm of voltage sampling data, (3) three-phase voltage drift and (4) voltage waveform distortion. Therefore, the failure cause is m10, and the failure symptoms are s15, s16, s18 and s19. A smaller reasoning chain can be obtained, as shown in Figure 1a (see also Figure 1b). Therefore, in the actual fault tracking calculation, we can establish a simplified reasoning chain model based on the obtained failure symptoms, which helps to reduce the complexity of the network and simplify the calculation.

Construction of Bayesian Networks
From the complete reasoning chain model diagram, it can be seen that a cause may have multiple failure symptoms, and the same symptom also corresponds to more than one failure cause. Therefore, the next step is to use the Bayesian network to conduct reverse reasoning according to the known failure symptoms to determine the most likely failure cause. According to the reasoning chain in Figure 1b, a Bayesian network model can be constructed, as shown in Figure 2. The Bayesian network model is conducive to obtaining fault tracking results conveniently and quickly. In addition, the model takes into account the partial loss of data during transmission. For example, if the s19 data are lost during the transmission, only s15, s16 and s18 are obtained at the station, and the simplified reasoning chain model and the Bayesian network model can still be obtained according to the above three failure symptoms, which contain both the lost data s19 and the unrealized failure symptom s17. In the calculation of Bayesian suspicion, it is unknown whether the failure symptoms s17 and s19 did not occur or whether the data of these two symptoms were lost. Therefore, it is necessary to consider the possibility of data loss and also reduce the interference of unrealized failure symptoms on the results. In this paper, the corresponding posterior probabilities of the failure symptoms (s17 and s19) not obtained in the model are multiplied by 0.1 and then substituted into Formula (2) for calculation. They are therefore taken into account in the calculation, but their influence on the results is reduced.
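The weighting of symptoms that were not received can be sketched as follows. This is an assumption-laden illustration: the function name discount_unobserved and the posterior values are hypothetical, and only the factor 0.1 comes from the text. The sketch stops at the weighting step; the aggregation of Formula (2) into a Bayesian suspicion is not reproduced here.

```python
def discount_unobserved(posteriors, observed, factor=0.1):
    """Scale the posteriors of symptoms that were not received.

    posteriors: {symptom: {cause: p(cause | symptom)}} for every symptom
                in the simplified reasoning chain.
    observed:   set of symptoms actually received at the station.
    factor:     weight applied to unobserved symptoms (0.1 in the text).
    """
    return {
        symptom: {
            cause: (p if symptom in observed else p * factor)
            for cause, p in by_cause.items()
        }
        for symptom, by_cause in posteriors.items()
    }


# Hypothetical posteriors, for illustration only.
example_posteriors = {
    "s15": {"m10": 0.5, "m11": 0.5},
    "s17": {"m10": 0.2, "m11": 0.8},
}
example_weighted = discount_unobserved(example_posteriors, observed={"s15"})
```

Here s15 keeps its posteriors unchanged while the entries for s17 are scaled to one tenth, so an unreceived symptom still contributes to the later calculation but cannot dominate it.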
In addition, for a complex situation where multiple parts have faults at the same time, multiple simple reasoning chains and Bayesian network diagrams can be constructed after the obtained symptoms of failure are classified. Different diagrams do not affect each other, and they are calculated individually to obtain multiple possible causes at the same time. If a certain part of the equipment has multiple failures, maintenance work can also be carried out in accordance with the size of Bayesian suspicion in order to make the maintenance content clearer.

Fault Tracking Process
The fault tracking process of a relay protection device is as follows: After the relevant warning information of relay protection device failure is obtained by using data-mining technology, the symptoms subset of failure is established, and the corresponding causes subset of failure is also obtained according to the causal relationship in the figure in Appendix B.
If all the events in the two subsets are connected by causality in a graph, that is, only one connected graph is formed, then only one reasoning chain model is constructed. On the contrary, if the causes and symptoms of failure belong to various modules, then various independent reasoning chain models are constructed, and the causes subset and symptoms subset of failure are also split into several corresponding groups. The next step is to build a corresponding Bayesian network model according to each reasoning chain. The probability of each suspected failure cause is calculated by Bayesian reverse reasoning and output in descending order. The probabilities of all possible causes are listed in the results, and the cause with the maximum probability is the most likely cause. Multiple models can yield multiple causes. The on-site maintenance of relay protection devices can be used in combination with this approach. When time is short, the field personnel can first check the most likely cause of failure, which can decrease maintenance time significantly and improve the efficiency of the on-site staff.
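The splitting step can be viewed as finding the connected components of the cause-symptom graph. The sketch below is one possible way to do this with a depth-first search; the function name split_into_chains is hypothetical, the sketch assumes cause labels start with "m" and symptom labels with "s" as in Tables 1 and 2, and the specific links are invented for illustration rather than taken from Appendix A.

```python
def split_into_chains(links):
    """Group (cause, symptom) links into independent reasoning chains.

    Returns a list of (causes, symptoms) pairs, one per connected graph.
    """
    adjacency = {}
    for cause, symptom in links:
        adjacency.setdefault(cause, set()).add(symptom)
        adjacency.setdefault(symptom, set()).add(cause)

    seen = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first search collects every node reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        causes = {node for node in component if node.startswith("m")}
        groups.append((causes, component - causes))
    return groups


# Hypothetical links, for illustration only.
example_links = [
    ("m10", "s16"), ("m10", "s18"), ("m11", "s16"),
    ("m19", "s30"), ("m19", "s31"),
]
example_groups = split_into_chains(example_links)
```

With these links the voltage-circuit nodes and the power-supply nodes fall into two separate groups, so each can be given its own Bayesian network and ranked independently.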

Fault Case 1 (Simple Fault)
When an AC double phase-to-ground metal fault occurs at the outlet of a 110 kV incoming line (No.151) in a 110 kV substation, the protection device returns after starting and refuses to operate [9]. After the accident, the substation shows that it has received abnormal voltage sampling data warning signals. When viewing the wave recording file, it can be seen that the AC phase voltage is not zero, the amplitude and phase of the three-phase voltage are offset and the input voltage of the protection device is distorted. During the insulation inspection of the voltage circuit, it is found that there are still earthing points after switching off the earthing point in the control room. After inspection, the secondary circuit of the TV secondary winding in the 110 kV TV terminal box is found to be grounded by a zinc oxide arrester, which has broken down and has caused two-point grounding of the TV secondary circuit. Therefore, when the grounding short-circuit fault occurs in the primary system, an electric potential difference is formed between the two points, which causes the voltage measured by the detection circuit of the protection device to be incorrect and its waveform to be distorted, leading to the incorrect action of the directional element and causing the protection to refuse to operate.
Through post-accident inspection, we can obtain the symptoms of failure: s16 (abnormal data/invalid warning of voltage sampling), s18 (three-phase voltage ripple) and s19 (voltage waveform distortion). The associated causes subset of failure is M_x = {m10, m11, m12}; that is, the most likely cause of protection device failure is one of m10 (multipoint grounding in the secondary circuit of TV), m11 (poor contact or the abruption of the secondary circuit of TV) or m12 (a voltage transformer wiring error). From Table 1 and the table in Appendix A, the prior probabilities of the elements in the causes subset M_x and the conditional probabilities between the symptoms and possible causes of failure are obtained, as shown in Table 3. The posterior probabilities are obtained by substituting the data in Table 3 into Formula (1), as shown in Table 4. Substituting the posterior probabilities into Formula (2) gives the Bayesian suspicion of each cause in M_x, of which m10 has the largest value. Therefore, it can be concluded that the most likely cause of the failure is m10 (multipoint grounding in the secondary circuit of TV), which is consistent with the actual situation. A small amount of data loss does not affect the accuracy of the results, so this method can achieve a certain degree of resistance to data loss.

Fault Case 2 (Complex Faults)
If the alarm and abnormal information of other modules are found simultaneously in the former case, it indicates that more than one part of the relay protection device is likely to have a problem. In addition to the failure symptoms listed in the previous example, there are still s24 (switch indicator light is not on), s28 (protection-switching relay power failure), s30 (DC power supply fault warning) and s31 (DC protection disappears). Therefore, the symptoms subset S_x of failure is S_x = {s24, s25, s26, s28, s29, s30, s31, s32, s33}, and the causes subset M_x of failure is M_x = {m16, m18, m19, m20}. Another Bayesian network model was created as shown in Figure 4.
The prior and conditional probabilities in Tables 1 and 3 were substituted into Formulas (1) and (2). The most likely causes of the failure are therefore m19 (the power supply cannot work properly) and m10 (multipoint grounding in the secondary circuit of TV), the latter calculated from the previous case. In other words, for a complex situation where multiple parts of the device fail at the same time, multiple causes can be obtained by establishing multiple reasoning chains and Bayesian models for calculation, which shows that this method is still effective for complex failure cases.

Conclusions
Taking the relay protection devices as an example, this paper puts forward a fault tracking method, which proposes a new approach to the identification of the causes of equipment failure. Firstly, based on obtaining failure symptom information, the possible causes subset of failure can be constructed. According to the relationship between the two subsets from the statistics, the corresponding reasoning chain is constructed. Then, the Bayesian network is used for reverse reasoning, and the Bayesian suspicion of possible failure causes can be calculated so as to obtain the reason for the failure of the relay protection device. The functions of the reasoning chain and Bayesian network are complementary. On the one hand, the reasoning chain is used to realize the reduction of knowledge and the simplification of failure characteristics to establish the minimum event sets, which can simplify the network structure and contribute to the establishment of a more optimized and intuitive Bayesian network model. On the other hand, using the Bayesian network to deal with causal reasoning can give consideration to the error or lack of alarm signals in the transmission process and has higher accuracy for dealing with uncertainty problems. This method combines a Bayesian network with a reasoning chain to achieve efficient and rapid fault tracking and diagnosis. It is simple, effective and easy to implement. This method is based on empirical data and probability theory. The key to this method is the construction of Tables 1 and 2, which can be obtained through practice accumulation and related experiments. This is also a limitation of the method. The integrity of the two tables has a significant impact on the accuracy of this method. When failure occurs, accurate and complete records of symptoms and causes are needed. 
With the continuous progress of monitoring and detection methods, the failure information contained in Tables 1 and 2 is more abundant and accurate, which provides a more solid foundation for the realization of this method and the accuracy of device fault tracking and diagnosis. On the contrary, if there are only a few statistics to refer to, or if there are omissions in recording the data, it seriously affects the accuracy of the results. With sufficient data, the effectiveness and accuracy are proven by example analysis. In view of the fact that there are almost no existing studies on fault tracking at present, this method creatively deduces the fault cause from fault symptoms. Therefore, this method has strong application value and can be used to determine failure causes quickly, conveniently and accurately. In addition to relay protection devices, this method can also be applied to fault tracking of circuit breakers, communication interfaces and other equipment.

Data Availability Statement:
The data presented in this study are available in the article.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript or in the decision to publish the results.