An Industrial Fault Diagnostic System Based on a Cubic Dynamic Uncertain Causality Graph

This study presents an industrial fault diagnosis system based on the cubic dynamic uncertain causality graph (cubic DUCG), which is used to model and diagnose industrial systems that lack sufficient data for model training. The system is developed with cloud-native technology and contains two main parts: the diagnostic knowledge base and the inference method. The knowledge base is built modularly by domain experts from professional knowledge, and it represents the causality between events in the target industrial system in a visual, graphical form. During inference, the cubic DUCG algorithm dynamically generates the cubic causal graph from the real-time data and performs the logic and probability calculations on the generated cubic DUCG models, visually displaying the dynamic causal evolution of faults. To verify the system's feasibility, we rebuilt a fault-diagnosis model of the secondary circuit system of No. 1 at the Ningde nuclear power plant on the new system. Twenty-four fault cases were used to test the diagnostic accuracy of the system, and all faults were correctly diagnosed. The results show that it is feasible to use the cubic DUCG platform for fault diagnosis.


Introduction
With the increasing complexity of industrial systems, system safety has attracted extensive attention. If minor faults in a system are not detected and eliminated in time, they may cause the failure and paralysis of the entire system and even lead to disastrous consequences [1][2][3][4]. Improving the safety and reliability of the system, and preventing or eliminating faults that affect its regular operation, has therefore become a crucial problem. Fault monitoring and diagnostic technology is an effective way to improve the safety and reliability of complex systems [5][6][7]. Early fault-diagnosis expert systems were mostly rule-based and case-based [8][9][10]. Their diagnostic rules and cases were constructed by experts from experience. Their advantage is that they depend less on data and are interpretable. When the industrial system is relatively simple, those methods are practical. When the industrial system is complex, they are prone to knowledge conflict, repetition, and circulation [11]. In addition, as the knowledge grows, the reasoning efficiency of the system decreases [12,13]. With the development of machine learning, machine-learning algorithms have been applied in fault-diagnosis systems, including SVM [14][15][16], ANN [17,18], and DNN [19][20][21]. Those methods use machine-learning theory to adaptively learn diagnostic knowledge from the collected data, rather than relying on the experience and knowledge of engineers. Fault diagnostic systems based on machine learning have high diagnostic efficiency, and when the training data are sufficient, the diagnostic accuracy of the model is high. However, the models constructed

$B_i$
The B-type variable is the root-cause variable. It is used to represent the root cause/fault that causes variables to occur.

$X_i$
The X-type variable is the consequence or process variable. It is used to represent the result caused by the root-cause variable, and it can also be the cause of other variables.

$G_i$
The G-type variable is the logic-gate variable. It is used to describe the logical relation combination of parent variables.

$D_i$
The D-type variable is the unknown-cause or default-cause variable. When the cause of a variable's occurrence is unknown, the D-type variable is used to represent the root cause that causes it to occur.

$F_{i;j}$
The F-type variable is the weighted functional-event variable. It is used to represent and quantify the causalities between parent variables and child variables.
The causal mechanism between variables of the DUCG is shown in Figure 2. The child event $X_{nk}$ may be caused by one or more of its parent events $V_{ij}$ ($V \in \{B, X, D, G\}$). The parent events cause the child event through the weighted functional events $F_{nk;ij}$, where $F_{nk;ij} = (r_{n;i}/r_n) A_{nk;ij}$ and $A_{nk;ij}$ is the virtual random functional event representing the causal mechanism by which $V_{ij}$ independently causes $X_{nk}$. $a_{nk;ij} = \Pr\{A_{nk;ij}\}$ is the probability that $V_{ij}$ independently causes $X_{nk}$. The virtual random functional events between the parent $V_i$ and the child $X_n$ are represented by the matrix $A_{n;i}$, whose elements are the events $A_{nk;ij}$. The matrix allows incomplete expression: if there is no causality between $V_{ij}$ and $X_{nk}$, then $A_{nk;ij}$ does not exist and is replaced with the symbol "-" in the matrix. When constructing causalities, we only need to give the parameters of the states that we care about, which reduces the difficulty of knowledge-base construction. $r_{n;i}/r_n$, with $r_n = \sum_i r_{n;i}$, is the weight parameter used to normalize the effect of parent variables on child variables. The relationship between parents is the logical weighted exclusive OR of the DUCG weighted set theory [33]. The weighted functional event and the logical weighted exclusive OR enable the DUCG to freely modify (add, delete, or update) the influence of parent variables on child variables.
In order to infer which root causes a child event is related to, the child event is logically expanded along the opposite direction of the causal chain. After the expansion, the child event is expressed by its parent events, and the expansion can be executed recursively until the parent events are B-type variables. The B-type variables are the root causes of other variables and the objects of inference calculation. The expansion expression is given as Equation (1) [27]. For simplicity, Equation (1) can be briefly written as Equation (2).
Logic expression expansion is an essential step of cubic DUCG reasoning. It recursively expands the observation evidence $E$ to the root faults, where $E = E'E''$, $E' = \{X_{ij}, j \neq 0\}$ is the collection of abnormal evidence, and $E'' = \{X_{i0}\}$ is the collection of normal evidence. The result of the logic expression expansion is used to calculate the conditional probability of each hypothesis under the current evidence.
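The weighting scheme described above can be made concrete with a small numerical sketch. It assumes the commonly used DUCG probability form $\Pr\{X_{nk}\} = \sum_i (r_{n;i}/r_n) \sum_j a_{nk;ij} \Pr\{V_{ij}\}$, which is adopted here as an assumption and is not copied from Equations (1) and (2). The function name, the toy parameters, and the use of `None` for the "-" entries are illustrative only.

```python
from typing import Optional, Sequence

# a_{n;i}: rows are child states k, columns are parent states j.
Matrix = Sequence[Sequence[Optional[float]]]


def child_state_probability(
    k: int,
    a_matrices: Sequence[Matrix],
    r_weights: Sequence[float],
    parent_probs: Sequence[Sequence[float]],
) -> float:
    """Pr{X_nk} as a weighted sum over parents i and parent states j.

    None plays the role of the "-" symbol: no causality between V_ij and X_nk.
    """
    r_n = sum(r_weights)  # r_n = sum_i r_{n;i}
    total = 0.0
    for a_ni, r_ni, probs in zip(a_matrices, r_weights, parent_probs):
        contribution = sum(
            a * p for a, p in zip(a_ni[k], probs) if a is not None
        )
        total += (r_ni / r_n) * contribution
    return total


# Hypothetical toy parameters: two parents, each with states {0, 1};
# the child has states {0, 1}. Only the abnormal entries are specified.
a_n1 = [[None, None], [None, 0.8]]
a_n2 = [[None, None], [None, 0.5]]
prob = child_state_probability(
    k=1,
    a_matrices=[a_n1, a_n2],
    r_weights=[1.0, 3.0],
    parent_probs=[[0.9, 0.1], [0.8, 0.2]],
)
print(round(prob, 4))
```

Because the weights are normalized by $r_n$ inside the function, a parent can be added, removed, or re-weighted without touching the parameters of the other parents, which is the flexibility the weighted functional event is meant to provide.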

The Inference Method of the Cubic DUCG
The reasoning process of the cubic DUCG reflects the temporal correlations among events and can represent the sequential causal interactions in fault-spreading processes. The reasoning steps of the cubic DUCG are as follows:
(1) DUCG decomposition. The original DUCG is decomposed into several sub-DUCGs; each sub-DUCG contains one root event $B_i$ and is denoted DUCG($B_i$).
(2) Obtain Slice_DG($B_i$, $t_m$). Slice_DG($B_i$, $t_m$) is the intraslice causality graph at time $t_m$. According to the evidence $E(t_m)$ at $t_m$, the DUCG($B_i$)s are simplified into several Slice_DG($B_i$, $t_m$)s based on the cubic DUCG simplification rules [26,34]. Each Slice_DG($B_i$, $t_m$) contains one root fault and describes the causality between the root fault $B_i$ and the evidence $E(t_m)$ at $t_m$.
(3) Cubic_DG($B_i$, $t_m$) generation. Cubic_DG($B_i$, $t_m$) is the cubic causality graph generated at $t_m$. It is generated by merging the Cubic_DG($B_i$, $t_{m-1}$) generated at $t_{m-1}$ with the Slice_DG($B_i$, $t_m$) obtained at $t_m$, and it describes the propagation process of the evidence related to root fault $B_i$ from time $t_1$ to time $t_m$. In particular, when $t_m = t_1$, Cubic_DG($B_i$, $t_1$) = Slice_DG($B_i$, $t_1$).
(4) Select the valid Cubic_DG($B_i$, $t_m$)s. The reasoning of the cubic DUCG is based on the assumption that there can only be one root fault at the same time. Therefore, if a Cubic_DG($B_i$, $t_m$) cannot explain all the evidence $E(t_m)$ at $t_m$, it is regarded as invalid.
(5) Obtain the hypothesis space $S_H(t_m)$ from the valid Cubic_DG($B_i$, $t_m$)s.
(6) Logic expression expansion. $E(t_m)$ and $H_{kj}E(t_m)$ are expanded based on each respective Cubic_DG($B_i$, $t_m$) until the expressions contain only sums of products of {B-, D-, A-, r-}-type variables. Those expressions are used for the probability calculation.
(7) Probability calculation. Calculate the conditional probability of each $H_{kj}$ in $S_H(t_m)$ under the evidence $E(t_m)$ to evaluate which root fault is more likely to have occurred. The probability is calculated using Equation (3).
An example is used to illustrate the dynamic inference process of the cubic DUCG over a continuous time series. The original DUCG knowledge base is shown in Figure 1. It contains three root faults: $B_1$, $B_2$, and $B_8$. The other variables are intermediate processes or results caused by root faults. The inference process covers three moments: $t_1$, $t_2$, and $t_3$.
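The decomposition and validity check in the reasoning steps can be illustrated with a deliberately simplified sketch. It replaces the cubic DUCG simplification rules [26,34] with plain graph reachability, so it only captures the idea that a root fault is kept when it can explain every abnormal observation. The toy graph and all names are hypothetical and do not reproduce Figure 1.

```python
from collections import deque

# Hypothetical causal edges: parent -> children (not the graph of Figure 1).
EDGES = {
    "B1": ["X3", "X5"],
    "B2": ["X5"],
    "B8": ["X6"],
    "X5": ["X6"],
}


def descendants(root, edges):
    """Variables reachable from a root cause: a crude stand-in for DUCG(B_i)."""
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def valid_roots(roots, edges, evidence):
    """Keep roots whose sub-graph covers every abnormal variable (state != 0)."""
    abnormal = {var for var, state in evidence.items() if state != 0}
    return [b for b in roots if abnormal <= descendants(b, edges)]


evidence_t1 = {"X3": 0, "X5": 1, "X6": 0}
print(valid_roots(["B1", "B2", "B8"], EDGES, evidence_t1))
```

In this toy graph the root B8 cannot reach the abnormal variable X5, so it is discarded, which mirrors the single-fault assumption: a hypothesis that leaves abnormal evidence unexplained is invalid.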
At time $t_1$, suppose the received evidence is $E(t_1) = X_{3,0}X_{5,1}X_{6,0}$.
Step 1.1. DUCG decomposition. By decomposing the original DUCG shown in Figure 1, we obtain three sub-DUCGs: DUCG($B_1$), DUCG($B_2$), and DUCG($B_8$). Each DUCG($B_i$) describes the relationships between the root fault $B_i$ and its related variables; the results are shown in Figure 3. Usually, a green circle stands for normal evidence, a circle in any other color stands for abnormal evidence, and a circle without color means that the state of the variable is unknown.
Step 1.2. Obtain the valid Slice_DG($B_i$, $t_1$)s and generate the Cubic_DG($B_i$, $t_1$)s at $t_1$. By simplifying the DUCG($B_i$)s according to the simplification rules of the cubic DUCG, we obtain the three Slice_DG($B_i$, $t_1$)s shown in Figure 4. Each Slice_DG($B_i$, $t_1$) shows the relationship between the root fault $B_i$ and the current evidence $E(t_1)$ at $t_1$. Slice_DG($B_1$, $t_1$) and Slice_DG($B_2$, $t_1$) can explain the abnormal evidence, so they are valid. In Slice_DG($B_8$, $t_1$), the abnormal evidence $X_{5,1}$ cannot be explained by the root fault $B_8$, so this slice is invalid and is deleted.
Step 1.3. Cubic_DG($B_i$, $t_1$) generation. At time $t_1$, the valid Slice_DG($B_i$, $t_1$)s are used as the Cubic_DG($B_i$, $t_1$)s, as shown in Figure 5. From the two Cubic_DG($B_i$, $t_1$)s, we obtain the hypothesis space $S_H(t_1) = \{H_{1,1}, H_{2,1}, H_{2,2}\} = \{B_{1,1}, B_{2,1}, B_{2,2}\}$.
At time $t_2$, suppose the received abnormal evidence is $X_{6,1}$. Combined with the evidence at $t_1$, the total evidence at $t_2$ is $E(t_2) = X_{3,0}X_{5,1}X_{6,1}$. The reasoning calculation process at $t_2$ is as follows.
Step 2.1. By simplifying the DUCG($B_i$)s based on the evidence $E(t_2)$, we obtain the Slice_DG($B_1$, $t_2$) and Slice_DG($B_2$, $t_2$) shown in Figure 6.
Step 2.2. Generate the Cubic_DG($B_i$, $t_2$)s. The Cubic_DG($B_i$, $t_2$)s at $t_2$ are generated by synthesizing the Cubic_DG($B_i$, $t_1$)s at $t_1$ and the Slice_DG($B_i$, $t_2$)s at $t_2$; we then obtain the Cubic_DG($B_i$, $t_2$)s shown in Figure 7. From the Cubic_DG($B_i$, $t_2$)s, we can see that $X_3$ and $X_5$ did not change, but the state of $X_6$ changed from normal to abnormal.
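The accumulation of evidence across time slices in this example can be sketched by tracking each variable's state over time. This is only a toy stand-in for the time axis of a Cubic_DG: it ignores the causal links entirely, assumes every slice reports the same set of variables, and uses hypothetical function names.

```python
def merge_slices(slices):
    """Build a per-variable state history from time-ordered evidence slices.

    `slices` is a list of dicts {variable: state}; state 0 means normal.
    Every slice is assumed to contain the same variables.
    """
    history = {}
    for evidence in slices:
        for var, state in evidence.items():
            history.setdefault(var, []).append(state)
    return history


def state_changes(history):
    """Report (variable, slice number, old state, new state) for every change."""
    changes = []
    for var, states in history.items():
        for m in range(1, len(states)):
            if states[m] != states[m - 1]:
                # Slice numbers are 1-based, so index m corresponds to t_{m+1}.
                changes.append((var, m + 1, states[m - 1], states[m]))
    return changes


e_t1 = {"X3": 0, "X5": 1, "X6": 0}
e_t2 = {"X3": 0, "X5": 1, "X6": 1}
history = merge_slices([e_t1, e_t2])
print(state_changes(history))
```

For the evidence of the example, the only reported change is that of X6 at the second time slice, from normal to abnormal.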
This was the reasoning process of the cubic DUCG, in which the inference is based on a time series. The algorithm reconstructed the current cubic DUCG from the evidence received at the current moment and the cubic DUCG of the previous moment, showing the causal propagation process over the time series. The DUCG simplification reduced the complex original DUCG to a set of simple Slice_DG($B_i$, $t_m$)s according to the evidence $E(t_m)$ at $t_m$. Each Slice_DG($B_i$, $t_m$) described the relationship between the evidence $E(t_m)$ and the hypothesis $B_i$. The simplification removed the causalities that are impossible and the variables that are irrelevant under the evidence $E(t_m)$, so the computation scale was reduced exponentially without losing accuracy. The Cubic_DG($B_i$, $t_m$) reflected how the evidence changed over time slices and served as the graphical explanation of the hypothesis $B_i$, enhancing the interpretability of the inference results. The logic expression expansion then expressed the logical relationship between the evidence and the hypothesis, which is the premise of the probability calculation. Finally, the reasoning calculation computed the conditional probability of each hypothesis under the current evidence according to the causal effects between variables.
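The last step, turning expanded expressions into conditional probabilities, can be illustrated with a generic normalization. This sketch is not Equation (3): it simply assumes that the joint probability of each hypothesis with the evidence has already been computed, that the hypotheses are mutually exclusive, and that one of them produced the evidence. The numbers and the function name are hypothetical.

```python
def rank_hypotheses(joint_probs):
    """Normalize joint probabilities Pr{H, E} into Pr{H | E} and sort them.

    Assumes the hypotheses are mutually exclusive and that one of them
    must have produced the evidence (single-fault assumption).
    """
    total = sum(joint_probs.values())
    if total <= 0:
        raise ValueError("no hypothesis can explain the evidence")
    posterior = {h: p / total for h, p in joint_probs.items()}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


# Hypothetical joint probabilities for the hypothesis space of the example.
joint = {"B1,1": 0.002, "B2,1": 0.006, "B2,2": 0.002}
ranked = rank_hypotheses(joint)
for name, prob in ranked:
    print(name, round(prob, 3))
```

Sorting the normalized values gives the ranked list of candidate root faults that a diagnostic interface would present to the operator.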

System Design
According to the reasoning mode of the cubic DUCG and the characteristics of industrial diagnostic systems, the cubic-DUCG-based industrial fault diagnostic system was divided into four parts: the communication module, the real-time monitoring and diagnosis module, the inference engine, and the knowledge-editing tool. A summary of each functional module is shown in Figure 10.
The knowledge-editing tool was used by domain experts to design the DUCG knowledge base. The DUCG knowledge base can be built in a modular way: a whole DUCG can be divided into several sub-DUCGs. Generally, each sub-DUCG contains one fault and represents the causal relations between the fault and its related monitoring signals. This modular method reduces the difficulty of constructing a large, complex knowledge base. Figure 11 shows a complete DUCG knowledge base, which was used for the fault diagnosis of the secondary circuit of No. 1 at the Ningde nuclear power plant. This DUCG knowledge base was reconstructed from the knowledge base in [35]. It contains 24 B-type variables that represent 24 different root faults in the secondary circuit of the nuclear power water reactor; the details of the faults are shown in Appendix B. A total of 141 X-type variables describe the intermediate processes or results arising from a root fault, and a total of 1192 F-type variables (the directed red lines) describe the causal relations among variables.
The communication module was used for signal processing. It received the monitoring data from the industrial system, transformed the data into the format required by the DUCG according to the mapping between measurement points and variables, and then transmitted the data to the real-time monitoring and diagnosis module.
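The mapping from measurement points to variable states can be sketched as a small lookup-and-threshold step. The tag ABP004MP and the state $X_{71,2}$ for a low intake pressure are taken from the experiment section; the thresholds, the meaning of state 1, the class name, and the function name are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointMapping:
    """Maps one measurement point to a DUCG X-type variable (illustrative)."""
    variable: int   # index i of X_i
    low: float      # below this value the signal is "low"  -> state 2
    high: float     # above this value the signal is "high" -> state 1 (assumed)


# Hypothetical mapping table: the thresholds are invented for this sketch.
MAPPINGS = {"ABP004MP": PointMapping(variable=71, low=0.5, high=1.5)}


def to_evidence(sample):
    """Convert raw readings {tag: value} into evidence {variable: state}."""
    evidence = {}
    for tag, value in sample.items():
        mapping = MAPPINGS.get(tag)
        if mapping is None:
            continue  # signal not used by the selected knowledge base
        if value < mapping.low:
            state = 2
        elif value > mapping.high:
            state = 1
        else:
            state = 0  # normal
        evidence[mapping.variable] = state
    return evidence


print(to_evidence({"ABP004MP": 0.3, "UNUSED_TAG": 7.0}))
```

Keeping the mapping in a table separate from the conversion logic reflects the description above, in which the module only processes the signals associated with the currently selected knowledge base.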
The inference engine was the core module of the fault diagnostic system. It generated the cubic DUCG and performed continuous causal reasoning based on the abnormal evidence. Its diagnostic results were presented in probabilistic form, and the generated cubic DUCG was used to explain the results. The inference engine was an independent service. Its input data came from the real-time monitoring module, and its inference results were sent back to the real-time monitoring module for user decision making.
The real-time monitoring and diagnosis module was the control and human-computer interaction center. In this module, users could choose the DUCG knowledge base according to the monitoring requirements; the instruction was then sent to the communication module, which received and processed the signals associated with the selected DUCG knowledge base. The monitoring module displayed and monitored the signals in real time.
Together, the four functional modules built the knowledge base, received and processed data, diagnosed faults, and displayed results. Through this system, users could translate knowledge and experience into diagnostic models, which were then used for real-time fault diagnosis. The system was a web application: the web client was implemented with jQuery and HTML, and the server side was implemented in Java with the Spring Boot framework. The system is developed based on cloud-native technology. Compared with the DUCG system based on traditional web technology [35], the system has good scalability and can dynamically increase computing power according to task requirements.

Experiment
To validate the feasibility and diagnostic accuracy of the system, an experiment was conducted on the secondary circuit of No. 1 at the Ningde nuclear power plant. The DUCG knowledge base is shown in Figure 11. The system was deployed on a computer cluster. The inference engine was deployed on one machine of the cluster with an 8-core AMD Ryzen 7 5700G CPU at 4.45 GHz and 128 GB of RAM. The test data were collected from the simulator of the secondary circuit of No. 1 at the Ningde nuclear power plant. A total of 24 fault cases were used to test the system. Each fault case contained several time slices, and each time slice contained 141 signal values. The signal data types included switching values, continuous data, and discrete data. The 141 signals corresponded one-to-one to the 141 variables in the model. During the system test, the simulator sent a group of signal data to the system every other second, and each group was recorded as a time slice. An example of a condensate extraction pump failure is used to demonstrate the diagnostic process of the system.
The fault "condensate extraction pump fault (CEX001PO)" was inserted at the 13th second after the simulator had reached stable operation, and the opening of the pump CEX001PO gradually decreased. When the communication module received the real-time data at the 14th second, one of the variables was in an abnormal state: the intake pressure of ABP401RE was low (ABP004MP), X71,2. Because this was the first time the system received abnormal evidence, this time was marked as t1 and the system started the inference. The inference results and graphical explanation at t1 are shown in Figures 12 and 13.

Figure 13 shows that, because multiple fault sources could cause X71,2, the fault source could not be identified at t1, but the scope of the fault and the probability of each fault could be preliminarily inferred. Sixteen faults could cause X71,2, and the top three faults in the result list were the most likely causes.
At the 15th second, the system did not receive new abnormal evidence, so it did not perform the reasoning calculation. At the 16th second, the communication module received a new abnormal signal (condensate extraction pump (CEX003PO) failure, X195,1). This time was marked as t2. The inference results and graphical explanation are shown in Figures 14 and 15. Comparing the inference results in Figures 12 and 14, we can see that the hypothesis space was further reduced. The hypotheses from the first inference that could not explain the new evidence were excluded; only the hypotheses that could explain all abnormal evidence remained valid. According to the results, the abnormal signals were possibly caused by the condensate extraction pump status (CEX001PO) (B1,1) or the condensate extraction pump status (CEX002PO) (B2,1), and B1,1 was more likely.
Figure 13. The graphical interpretation of the inference hypothesis and abnormal evidence.
At the 17th second, the communication module received more abnormal signals. This time was marked as t3. The third inference results and graphical interpretation are shown in Figures 16 and 17. Because only B1,1 could explain all of the known abnormal evidence, B1,1 was diagnosed as the fault according to the current evidence.
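The way the hypothesis space narrows from t1 to t3 can be sketched with plain set operations. This is a deliberately simplified illustration, not the cubic DUCG algorithm: it covers only the logical exclusion of hypotheses that cannot explain all abnormal evidence and omits the probability calculation and the dynamic graph construction. The cause table is a toy assumption (three candidate faults instead of sixteen), and `B3,1` and `X_extra` are hypothetical names introduced for the example.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Set

# Toy assumption: for each abnormal evidence, the root faults that could cause it.
CAUSES: Dict[str, FrozenSet[str]] = {
    "X71,2": frozenset({"B1,1", "B2,1", "B3,1"}),
    "X195,1": frozenset({"B1,1", "B2,1"}),
    "X_extra": frozenset({"B1,1"}),
}


def narrow_hypotheses(evidence_by_slice: Sequence[Sequence[str]]) -> List[Set[str]]:
    """Return the surviving hypotheses after each slice that brings new evidence."""
    surviving: Optional[Set[str]] = None
    history: List[Set[str]] = []
    for new_evidence in evidence_by_slice:
        if not new_evidence:
            continue  # no new abnormal evidence, so no inference is run
        for item in new_evidence:
            candidates = CAUSES[item]
            # Keep only the hypotheses that explain every piece of evidence so far.
            surviving = set(candidates) if surviving is None else surviving & candidates
        history.append(set(surviving))
    return history


# Seconds 14 to 17 of the example: evidence, nothing new, evidence, evidence.
slices = [["X71,2"], [], ["X195,1"], ["X_extra"]]
for step, hypotheses in enumerate(narrow_hypotheses(slices), start=1):
    print(f"t{step}: {sorted(hypotheses)}")
```

In this toy setting the candidate set shrinks at each inference moment until only B1,1 remains, mirroring the three moments described above. The slice without new evidence produces no inference step.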
This example showed the inference process of the cubic DUCG, which performed inference calculations over the time series. The diagnosis involved three moments, and the moment at which abnormal evidence first appeared was marked as t1. At t1, there was little abnormal evidence, so a specific diagnosis could not be determined, but the range of possible failures could be roughly determined. At t2, new abnormal evidence was added to the diagnosis, which further narrowed the scope of the fault diagnosis. At t3, more evidence arrived, and the fault was uniquely determined. In the next few moments, new evidence continued to be received. However, since only one candidate fault was left and it could explain all abnormal evidence, the diagnosis result did not change, and the diagnosis was completed. At each moment, the system dynamically generated a new cubic DUCG by combining the cubic DUCG obtained from the diagnosis at the previous moment with the new evidence, and then gave the diagnostic results. The graphical interpretations demonstrated how the fault developed and evolved over time, which made it convenient for the operator to understand the development of the fault for troubleshooting.
Table 2 shows the test results of the 24 fault cases. "Fault" is the fault code in the DUCG knowledge base. "Rank First" indicates the first moment when the fault was ranked first in the diagnostic results. "Confirmed Diagnosis" indicates the moment when the fault was confirmed. "Time Consumption" indicates the total reasoning time for diagnosing the fault. "Average Time" represents the average time of each diagnosis. The results show that all 24 fault cases were correctly diagnosed. This indicated that the fault-diagnosis model of the secondary circuit system constructed in this study was accurate and that using the cubic DUCG to construct a complex diagnostic model was feasible. Among the 24 fault cases, 17 could be diagnosed at time t1. The remaining seven faults needed multiple time slices to be diagnosed, but they could all be sorted to the first place in the list of diagnostic faults within four diagnostic time slices.
reasoning efficiency, an inability to deal with the logical cycle or uncertain causalities, and difficulty in managing the growing knowledge.
This study proposed a fault-diagnosis method for a unique industrial system based on the cubic DUCG. The model was built from expert knowledge, experience, and statistical data, and it described the causal mechanism linking faults and abnormal signals. The system could carry out continuous fault diagnosis in time sequence, display the results as probabilities, and graphically represent the propagation of faults over time. The modular construction method of the cubic DUCG knowledge base reduced the modeling difficulty for large and complex knowledge bases and facilitated knowledge management and maintenance. The cubic DUCG could represent logical cycles and uncertain causal relationships, so expert knowledge could be expressed accurately. The reasoning process of the cubic DUCG included model simplification, logical calculation, and probability calculation. This inference method reduced the computational complexity of inference without losing the accuracy of the results, which addressed the high computational complexity of reasoning over large and complex knowledge bases. These characteristics made the cubic DUCG more suitable for industrial system modeling and fault diagnosis based on expert knowledge.
To verify the feasibility and effectiveness of the system, we cooperated with nuclear experts to build the fault diagnostic model of the secondary circuit system of No. 1 at the Ningde nuclear power plant. The model's 24 root faults represented the operation status of 24 leading components of the secondary circuit system, such as the steam turbines and electric generators. A total of 141 variables represented the abnormal signals that the root faults may have caused. The data types of the variables included switching values, discrete values, and continuous values. Experts determined the values of the causal strength between variables based on experience or statistical data. The variables in the model could be mapped one-to-one to the detection points in the secondary circuit system, so they could reflect its operation state. The secondary system simulator generated 24 groups of fault cases to test the model. Each case tested one root fault, and all 24 faults were correctly diagnosed. The test results showed that using the cubic DUCG for fault diagnosis in unique industrial systems was feasible. At the same time, it should be pointed out that the diagnostic accuracy of a diagnosis system based on expert knowledge depends on the accuracy of the model. Therefore, multi-expert joint modeling and third-party auditing of the model are among the methods to ensure the model's accuracy. The current verification data were provided only by the nuclear power plant simulator, and the verification model was limited to 24 faults in the secondary circuit system of the nuclear power plant. Therefore, the verification of the system, with only one application scenario, was not complete and systematic. In future work, we will continue to extend the model and test it using more data. In addition, we will verify the performance of the system in more application scenarios.