Fault Diagnosis Techniques for Nuclear Power Plants: A Review from the Artificial Intelligence Perspective

Abstract: Fault diagnosis plays an important role in complex and safety-critical systems such as nuclear power plants (NPPs). With the development of artificial intelligence (AI), extensive research has been carried out for fast and efficient fault diagnosis based on intelligent methods. This paper presents a review of various AI-based system-level fault diagnosis methods for NPPs. We first discuss the development history of AI. Based on this exposition, AI-based fault diagnosis techniques are classified into knowledge-driven and data-driven approaches. For knowledge-driven methods, we discuss both the early if-then-based fault diagnosis techniques and the current new theory-based ones. The principles, application, and comparative analysis of the representative methods are systematically described. For data-driven strategies, we discuss single-algorithm-based techniques such as ANN, SVM, PCA, DT, and clustering, as well as hybrid techniques that combine algorithms. The advantages and disadvantages of both knowledge-driven and data-driven methods are compared, illustrating the tendency to combine the two approaches. Finally, we provide some possible future research directions and suggestions.


Introduction
Advances in technology have increased the level of automation in industry, but they have made systems increasingly complex, placing higher demands on system safety and reliability. One way to improve system safety and reliability is to improve the quality, reliability, and robustness of system components, but this still cannot eliminate the occurrence of faults [1,2]. Therefore, fault diagnosis has become an important technique to ensure the safety and reliability of industrial systems. For complex nuclear power plant systems, fault diagnosis techniques are designed to monitor whether the system and its components are functioning properly, detect the type of fault at an early stage, and determine the location and severity of the fault to avoid further damage [3].
Fault diagnosis includes fault monitoring, fault location, and fault analysis [4,5]. Fault monitoring determines whether there is a fault in the system or its components. Fault location determines where the fault is. Fault analysis determines the type, severity, and cause of the fault. Traditional fault diagnosis techniques are generally divided into hardware-redundancy-based, model-based, and signal-processing-based methods [6][7][8]. The hardware-redundancy-based method uses redundant components and detects a component fault when the component's output differs from the outputs of its redundant counterparts [9][10][11][12]. The model-based method requires an accurate mathematical model of the system; similar in concept to hardware redundancy, it diagnoses faults by comparing the output of the mathematical model with that of the actual system [13][14][15][16]. The signal-processing-based method applies mathematical or statistical processing to the measured data to extract fault-related information [17][18][19][20].
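The comparison at the heart of the model-based method can be illustrated with a minimal sketch. It assumes that model predictions are already available for each measurement and that a fixed residual threshold is adequate; the function name, the sample values, and the threshold are hypothetical and are not taken from the cited works.

```python
from typing import List, Sequence


def residual_alarms(measured: Sequence[float],
                    predicted: Sequence[float],
                    threshold: float) -> List[bool]:
    """Flag each sample whose |measured - predicted| exceeds the threshold."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [abs(m - p) > threshold for m, p in zip(measured, predicted)]


# Hypothetical coolant temperature readings (deg C) and model predictions.
measured = [290.1, 290.3, 290.2, 293.8, 294.5]
predicted = [290.0, 290.2, 290.3, 290.3, 290.4]

alarms = residual_alarms(measured, predicted, threshold=2.0)
print(alarms)  # [False, False, False, True, True]
```

In practice the threshold would have to account for sensor noise and model error; a fixed value is used here only to keep the sketch short.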
More than two-thirds of the nuclear reactors in service worldwide at the end of 2021 were pressurized water reactors (PWRs); the typical composition of a PWR system is shown in Figure 1. PWR nuclear power plants use light water as coolant and moderator [21] and mainly consist of a nuclear steam supply system [21,22], a turbine generator system [23], and other auxiliary systems [24]. After the coolant absorbs the heat released by nuclear fuel fission, the heat is transferred through the steam generator to the secondary circuit to generate steam, which then enters the turbine to do work and generate electricity [25]. Nuclear power plant systems contain hundreds of subsystems with potential radiological hazards. If a fault occurs during operation, the operator is required to accurately determine the fault type. Fault diagnosis is therefore an important support technology that assists operators in identifying faults. Traditional fault diagnosis techniques for nuclear power plants rely mainly on expert experience, which is somewhat uncertain and subjective. With the advancement of instrumentation and control systems, nuclear power plants generate a large amount of data. Artificial intelligence can process such large amounts of data, and research on AI-based fault diagnosis technology is increasing.
System-level faults are one of the major causes of accidents in nuclear power plants. In the event of a fault, trained operators are faced with hundreds of subsystems and a large number of monitoring and control parameters, and the immense psychological stress can easily lead to misjudgments, which in turn can lead to serious radiological consequences. With the rise of AI, numerous studies on AI-based fault diagnosis have emerged. AI is a technology that emulates human intelligence through computer programs, and it has shown clear advantages in several respects. First, AI can process huge amounts of information in a short period, helping operators extract critical information quickly after a fault occurs. Second, AI can eliminate human error. Even the best experts in the nuclear field can make mistakes, while AI systems built for specific tasks do not suffer from such errors. Third, AI can work continuously, and continuous condition monitoring is essential for nuclear power plants. However, AI does not always work; for example, AI techniques that rely on training can be unreliable when they encounter situations not covered by their training data.

Fault Diagnosis Classification from the AI Perspective
It is important to identify the history of AI development, which helps to establish a framework for AI-based fault diagnosis technology for nuclear power plants.

Development History of AI
Artificial intelligence (AI) was born in 1956, and there are two competing lines of development, namely, symbolism and connectionism [31]. As shown in Figure 2, connectionism, also known as data-driven methods, predates symbolism and originates in early computers and cybernetics [32]. The concept of neural networks was introduced by neuroscientist Warren McCulloch and logician Walter Pitts in 1943 [33]. The development trend in Figure 2 shows the dominance of connectionism in the early stages, and connectionism saw a major development in the late 1950s [34]. Frank Rosenblatt, inspired by the work of many parties, proposed a true connectionism system [35]. In the 1960s and 1970s, a variety of connectionist techniques were developed [36][37][38], such as statistical learning techniques based on decision theory [38] and reinforcement learning techniques [39], with representative works such as Samuel's checkers program [40] and Nilsson's "learning machines" [41].
In the late 1970s, limited by the computing power of the time, the development of connectionism reached a low point, and the school of symbolism gradually emerged. Symbolism, also known as knowledge-driven methods, was defined as artificial intelligence at the Dartmouth Conference in 1956, and connectionism in cybernetics was introduced into AI years later [42]. Symbolism dominated the field of AI from the 1960s to the early 1990s. At the request of the National Aeronautics and Space Administration (NASA) in 1965, Stanford University successfully developed the DENDRAL (Dendritic Algorithm) expert system, which has a very rich knowledge of chemistry and can help chemists infer molecular structures from mass spectrometry data. The completion of this system marked the birth of expert systems [43]. By the mid-1970s, expert systems had gradually matured, the most representative of which was the MYCIN system by Shortliffe et al., which was used to diagnose and treat bloodstream infections and encephalitis infections and could provide prescription recommendations [44]. Another highly successful expert system is the PROSPECTOR system, which was used to assist geologists in detecting mineral deposits and was the first to achieve significant economic benefits [45]. After the mid-1980s, expert systems were widely put into commercial operation.
In conclusion, connectionism is dominant in the current development of AI, while symbolism is developing slowly. However, both routes are one-sided. Connectionism lacks robustness and interpretability, while symbolism lacks data-mining capability and relies excessively on subjective expert opinions and complex combinatorial rules. The two schools are therefore strongly complementary, and their integration is likely to become a major trend in the future development of AI. For convenience of description, this paper refers to symbolism as first-generation AI and connectionism as second-generation AI.

AI-Based Fault Diagnosis Classification
Since the 1980s, AI-based fault diagnosis techniques have been applied in nuclear power plants [53], and subsequent developments have been closely linked to AI techniques. The nuclear accident diagnosis expert system [54] is a typical representative of first-generation AI-based techniques; it consists of a fault diagnosis knowledge base, a comprehensive knowledge base, and a fault diagnosis inference machine. The system obtains fault types by importing the monitored physical symptoms into the inference machine and interacting with the diagnostic knowledge base. Early expert systems were mainly based on simple if-then rules from the computer domain [55], and new theories such as signed directed graphs (SDGs) [56], Bayesian networks (BNs) [57], and dynamic uncertain causality graphs (DUCGs) [58] were gradually introduced afterward. Neural-network-based fault diagnosis technology is a typical application of second-generation AI, which uses historical data to train neural networks to obtain a diagnostic model capable of identifying faults [59]. In addition, scholars have conducted in-depth research on the applications of single and hybrid algorithms. Single algorithms, including artificial neural networks, principal component analysis, support vector machines, decision trees, and unsupervised clustering, were applied to fault diagnosis, and their feasibility was preliminarily verified [60][61][62][63][64]. Subsequently, hybrid algorithms such as fuzzy logic-neural network, laminar model-neural network, principal component analysis-neural network, laminar model-support vector machine, and principal component analysis-convolutional neural network were proposed, and these hybrid algorithms have been reported to have better diagnostic performance than single algorithms [65][66][67][68][69][70][71][72][73].
VOSviewer 1.6.17 is a software tool for constructing and visualizing bibliometric networks. In this paper, we use VOSviewer to conduct statistical and cluster analyses of the literature to identify the hot topics and frontier trends in the field of "nuclear power plant fault diagnosis". A subject search was conducted with the keyword "fault diagnosis", and the research direction was set to "nuclear science technology", covering all the databases subscribed to through the Web of Science. The clustering view shown in Figure 3 was obtained by performing text analysis on 647 relevant papers. The brighter the node color in the graph, the more relevant papers there are. A node closer to the center indicates that the research object receives more attention. The fault diagnosis methods related to AI can be divided into part A and part B, as shown in Figure 3. In part A, "knowledge base", "expert system", and "fuzzy logic" can be seen, and they belong to the first generation of AI technology. Part A is located at the edge of the clustering diagram, which indicates that the application of first-generation AI technology currently receives little attention. In part B, "artificial neural network" and "training" can be seen, and they belong to the second generation of AI technology. Part B has a brighter color and higher centrality, which indicates that fault diagnosis based on second-generation AI is the current research hot spot.
As shown in Figure 4, this paper establishes a new fault diagnosis classification framework from the AI perspective, that is, knowledge-driven and data-driven fault diagnosis methods. Knowledge-driven methods correspond to first-generation AI technology, and data-driven methods correspond to second-generation AI technology. The development of the two methods is then systematically reviewed to help readers understand the progress of fault diagnosis technology from the AI perspective.

Knowledge-Driven Fault Diagnosis Methods
Knowledge-driven fault diagnosis methods for nuclear power plants, also known as expert systems, can be regarded as a combination of the knowledge base and the inference machine. They mainly use the experience accumulated by domain experts in long-term practice. As shown in Figure 5, the knowledge-driven fault diagnosis methods can be divided into two types, the early if-then (Section 3.1) and the current new theories (Section 3.2). These new theories include signed directed graphs, Bayesian networks and dynamic uncertain causality graphs, etc. The inference mechanism is the main difference between them. Finally, we summarize the characteristics of these knowledge-driven methods (Section 3.3).



Fault Diagnosis Methods Based on If-Then
Fault diagnosis technology based on if-then rules mainly involves establishing the knowledge base and the inference engine. William et al. built a fault diagnosis expert system by taking four types of typical accidents (LOFW, SGTR, LOCA, and MSLB) in nuclear power plants as the diagnosis objects [55]. They first established nine if-then rules (Table 1) based on domain knowledge and then constructed an expert system based on these rules. As shown in Figure 6, the system infers forward from known facts until the type of accident is obtained. If there is not enough information to reach a conclusion, the system infers backward to determine what information it still needs. The system then queries the nuclear plant instrumentation or asks the operator to fill in the knowledge gaps. Bergman et al. first used expert systems to diagnose faults in boiling water reactors [74]. Since expert knowledge is uncertain, some scholars have introduced the concept of fuzzy membership into the representation of expert knowledge and used fuzzy logic for inference as a way to deal with this uncertainty [75,76]. Sutton et al. developed a fuzzy expert system for the early detection of steam leakage faults in nuclear power plants [77]. Fuzzy theory is good at describing the uncertainty caused by imprecision, while evidence theory can describe the uncertainty caused by ignorance. Yang et al. proposed an expert system built on a confidence rule base that draws on fuzzy theory, evidence theory, and decision theory [78,79]. The confidence rule base adds the concept of confidence to the if-then rule, which allows it to represent complex causal relationships among various types of uncertain data. The above shows that the early expert systems in the nuclear field focused on the application and improvement of if-then rules.
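The forward and backward steps described above can be sketched in a few lines. The rules below are hypothetical placeholders chosen only for illustration; they are not the nine rules of Table 1, and the symptom names and function names are assumptions.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical rules: (facts that must all hold, concluded accident type).
# These are illustrative only and are NOT the rules of Table 1.
RULES: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"sg_level_low", "feedwater_flow_low"}), "LOFW"),
    (frozenset({"pressurizer_pressure_low", "secondary_radiation_high"}), "SGTR"),
    (frozenset({"pressurizer_pressure_low", "containment_pressure_high"}), "LOCA"),
    (frozenset({"steam_pressure_low", "steam_flow_high"}), "MSLB"),
]


def forward_chain(facts: Set[str]) -> Optional[str]:
    """Forward step: return the first accident whose conditions all hold."""
    for conditions, accident in RULES:
        if conditions <= facts:
            return accident
    return None


def missing_facts(facts: Set[str]) -> Dict[str, Set[str]]:
    """Backward step: for each partially matched rule, list the facts still needed."""
    needed: Dict[str, Set[str]] = {}
    for conditions, accident in RULES:
        if conditions & facts and not conditions <= facts:
            needed[accident] = set(conditions - facts)
    return needed


facts = {"pressurizer_pressure_low"}
print(forward_chain(facts))   # no rule fires yet
print(missing_facts(facts))   # facts to query from instrumentation or the operator

facts.add("secondary_radiation_high")
print(forward_chain(facts))   # a rule now fires
```

The dictionary returned by `missing_facts` plays the role of the query to the instrumentation or the operator; a real system would also need conflict resolution when several rules fire at once.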

Figure 6. Schematic diagram of an expert system based on if-then rules [55].

Signed Directed Graphs
Signed directed graphs (SDGs) are also knowledge-driven methods and do not require an exact mathematical model. SDGs originated in the chemical industry and were proposed by Iri et al. [80]. An SDG consists of nodes and directed arrows between nodes, which can effectively represent the relationships between elements within a system. As shown in Figure 7, the node representation in an SDG is flexible. Nodes a, b, and c can represent not only physical variables, such as pressure and temperature, but also parts of the system, such as switches and valves. Nodes R1 and R2 can represent events, such as a specific fault cause or adverse consequence [10]. The relationships between nodes in an SDG are expressed qualitatively, and it is not necessary to provide the exact quantitative relationships between system nodes, which makes the model easier to establish. Fault diagnosis with an SDG model generally combines reverse inference and forward inference. Assume that nodes a and b are abnormal and c is normal. Reverse inference yields two compatible paths: b → a → R1 and b → a → R2. R1 and R2 are therefore the candidate fault sources. Forward inference is then used to verify each candidate. If R1 were the fault source, node c would deviate from its normal value (too high or too low), but node c is observed to be normal, which is inconsistent with this prediction. Therefore, the candidate fault source R1 is a false solution and should be discarded. Similarly, forward inference from R2 gives predictions that are consistent with the observed values, which means that R2 is a plausible fault source. More detailed information on signed directed graphs can be found in [81][82][83].
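The reverse and forward inference in this example can be sketched as follows. The exact edges of Figure 7 are not reproduced in the text, so the topology below (R1 → a, R1 → c, R2 → a, a → b, all with positive sign) is an assumption chosen to match the narrative; the function names are hypothetical, and the sketch only considers a positive deviation at the root cause.

```python
from typing import Dict, List, Set, Tuple

# Assumed topology and signs; Figure 7 itself is not available in the text.
EDGES: List[Tuple[str, str, int]] = [
    ("R1", "a", +1),
    ("R1", "c", +1),
    ("R2", "a", +1),
    ("a", "b", +1),
]
ROOT_CAUSES: Set[str] = {"R1", "R2"}


def backward_candidates(observed: Dict[str, int]) -> Set[str]:
    """Reverse inference: walk upstream through abnormal nodes to root causes."""
    candidates: Set[str] = set()
    visited: Set[str] = set()
    frontier = [node for node, state in observed.items() if state != 0]
    while frontier:
        node = frontier.pop()
        if node in visited:
            continue
        visited.add(node)
        for src, dst, _sign in EDGES:
            if dst != node:
                continue
            if src in ROOT_CAUSES:
                candidates.add(src)
            elif observed.get(src, 0) != 0:
                frontier.append(src)
    return candidates


def forward_predict(root: str) -> Dict[str, int]:
    """Forward inference: propagate a positive deviation from the root cause."""
    state: Dict[str, int] = {root: +1}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for src, dst, sign in EDGES:
            if src == node and dst not in state:
                state[dst] = sign * state[node]
                frontier.append(dst)
    return state


def diagnose(observed: Dict[str, int]) -> Set[str]:
    """Keep candidates whose forward prediction matches every observed node."""
    plausible: Set[str] = set()
    for root in backward_candidates(observed):
        predicted = forward_predict(root)
        if all(predicted.get(n, 0) == s for n, s in observed.items()):
            plausible.add(root)
    return plausible


# States: +1 = high, -1 = low, 0 = normal.
observed = {"a": +1, "b": +1, "c": 0}
print(backward_candidates(observed) == {"R1", "R2"})  # True
print(diagnose(observed))                             # {'R2'}
```

Under the assumed edges, R1 is rejected because it would also drive node c away from normal, which reproduces the reasoning of the example.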
In the nuclear field, Wu et al. thoroughly studied the application of SDG methods for fault diagnosis and successively combined SDGs with fuzzy theory and correlation analysis for online monitoring and diagnosis of nuclear power plants [91][92][93]. A remarkable feature of SDG-based fault diagnosis is that it can reveal the fault propagation path and comprehensively explain the fault cause. However, when the system is complex, the directed graph gives rise to a combinatorial explosion of rules, which is one of the reasons why this method is not widely used at present.

Bayesian Networks
A Bayesian network is a directed acyclic graph that consists of nodes and directed edges. Nodes include parameter nodes and fault nodes, and related nodes are connected by directed edges. The uncertainty of the relationship between nodes is expressed by a conditional probability table (CPT) [94]. Figure 8 shows a simple Bayesian network model in which X1 is the fault node and has two states ("0" and "1"), and X2 to X5 are parameter nodes, which have i, j, k, and l states, respectively. Assuming that the parameter nodes have two states, their conditional probability tables are shown in Tables 2 and 3. The directed edges between the nodes indicate the dependencies between parent and child nodes, such as X1 and X2. More detailed information on Bayesian networks can be found in [95,96].
Table 2. CPT of parameters X2, X3, and X4.
Table 3. CPT of parameter X5.
Assuming that the state of the parameter node X5 is currently observed as l1, the probability that the fault node X1 is in state 1 (faulty state) is inferred from Equation (1).
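This kind of inference can be illustrated by enumeration with Bayes' rule, P(X1 = 1 | X5 = l1) = P(X1 = 1, X5 = l1) / P(X5 = l1). The sketch below is not a reconstruction of Equation (1), Figure 8, or Tables 2 and 3: it assumes that X1 is the parent of X2, X3, and X4, that these three nodes are the parents of X5, that every node is binary (with state 1 of X5 standing in for l1), and that all probabilities are made-up illustrative values.

```python
from itertools import product
from typing import Dict

# ASSUMED structure: X1 -> X2, X3, X4 and X2, X3, X4 -> X5, all nodes binary.
# All numbers are hypothetical and are NOT the values of Tables 2 and 3.
P_X1: Dict[int, float] = {0: 0.99, 1: 0.01}  # prior of the fault node

# P(Xk = 1 | X1 = x1) for k = 2, 3, 4, keyed by the state of X1.
P_CHILD: Dict[str, Dict[int, float]] = {
    "X2": {0: 0.05, 1: 0.90},
    "X3": {0: 0.10, 1: 0.80},
    "X4": {0: 0.02, 1: 0.70},
}


def p_x5_is_one(x2: int, x3: int, x4: int) -> float:
    """Hypothetical P(X5 = 1 | X2, X3, X4), growing with the number of abnormal parents."""
    return {0: 0.01, 1: 0.30, 2: 0.70, 3: 0.95}[x2 + x3 + x4]


def bernoulli(p_one: float, value: int) -> float:
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x1: int, x2: int, x3: int, x4: int, x5: int) -> float:
    """Joint probability from the chain rule over the assumed graph."""
    p = P_X1[x1]
    for name, value in zip(("X2", "X3", "X4"), (x2, x3, x4)):
        p *= bernoulli(P_CHILD[name][x1], value)
    return p * bernoulli(p_x5_is_one(x2, x3, x4), x5)


def posterior_fault(x5_observed: int) -> float:
    """P(X1 = 1 | X5 = x5_observed), summing out X2, X3, and X4."""
    numerator = 0.0
    evidence = 0.0
    for x1, x2, x3, x4 in product((0, 1), repeat=4):
        p = joint(x1, x2, x3, x4, x5_observed)
        evidence += p
        if x1 == 1:
            numerator += p
    return numerator / evidence


print(f"P(X1=1 | X5=1) = {posterior_fault(1):.4f}")
```

Enumeration is exponential in the number of unobserved nodes, so it is only suitable for small networks such as this one; larger models would need a dedicated inference algorithm.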
Bayesian networks were first used to build regulatory systems and have been used in industrial systems since the 21st century, especially in the area of reliability. Lerner et al. proposed a dynamic Bayesian network (DBN) for tracking and diagnosing complex systems [97]. Przytula et al. proposed an efficient BN generation procedure for diagnosis and applied it to internal combustion locomotives, satellite communication systems, and satellite test equipment [98], which can handle continuous variables representing parameter states and discrete variables representing fault situations. Mahadevan et al. applied the BN concept to a new method for reassessing the reliability of structural systems [99].
In the nuclear field, Wu proposed a fault diagnosis framework for NPPs with BNs as the core, which combines PCA, data fusion, and fuzzy theory to achieve an online diagnosis of NPPs with multi-sensor information [100]. Jones et al. proposed a DBN system for diagnosing the state of nuclear power plants, which can predict the progress of an accident [101]. Oh et al. focused on the diagnostic performance under normal operating conditions and LOCA system states based on a dynamic Bayesian network and adopted the step-by-step diagnostic idea for system states and accident types [102]. Yi Ren et al. proposed a method of uncertainty reliability evaluation combining GO-FLOW and dynamic Bayesian network. This method uses sensitivity analysis to provide input information that contributes most to uncertainty. The uncertainty is then quantified using the DBN algorithm and Monte Carlo simulation to appropriately estimate the analysis results [103]. Zhao et al. combined Bayesian networks with a probabilistic risk assessment to achieve fast prediction of accident source terms. They used Bayesian networks for online fault diagnosis and matched the fault diagnosis results with the accident sequences in probabilistic risk assessment to obtain the source term release class [104][105][106]. Bayesian networks have advantages over if-then and SDG in accommodating missing information and uncertainty inference and quantifying diagnostic results.

Dynamic Uncertain Causality Graphs
In the 1990s, Zhang proposed a probability-based knowledge representation and inference model, the Dynamic Causality Diagram (DCD) [107]. Based on the DCD, Zhang further proposed dynamic uncertain causality graphs (DUCGs), which add conditional action events and default events. DUCGs express uncertain causality graphically by means of independent random events. During inference, the qualitative results are obtained first, and the probabilities are then calculated numerically [108]. Compared with BNs, the DUCG model is greatly simplified by removing unrelated independent events, and inference becomes much easier when evidence of independent connecting events or action events is introduced. In addition, DUCG overcomes a shortcoming of BNs, namely that their concise knowledge representation and inference methods apply in the single-assignment case but not in the multi-assignment case. The reason that explains the state of a variable is called an assignment. A single-assignment variable has only one assignment, and a multiple-assignment variable has more than one. For detailed principles of DUCG theory, readers can refer to [58][109][110][111][112].
In the nuclear field, Deng was the first to establish a DUCG model for fault diagnosis in NPPs and validated the performance of the model with a second-loop feeder pipe leakage fault [113]. Zhang et al. proposed a DUCG method with fault diagnosis and fault process deduction [114]. Zhao et al. proposed a DUCG diagnosis system for CPR1000 reactor type and compared the method with other diagnosis methods, which promoted the development of an intelligent diagnosis system for the CPR1000 [115][116][117][118]. Dong et al. studied the new inference algorithm and industrial fault diagnostic system for nuclear power plants [119,120].
According to our literature study, the application of DUCG in the nuclear field is still in its infancy. There is a lack of specialized books on this theory, which hinders the promotion of DUCG technology to a certain extent. However, DUCG-based medical diagnosis technology is developing more rapidly, and Zhang's team is also conducting related research and promotion [121][122][123][124][125][126].

Summary of Knowledge-Driven Fault Diagnosis Methods
This section focuses on knowledge-driven fault diagnosis methods for NPPs and classifies the existing methods into two types: if-then rules (Section 3.1) and new theories (Section 3.2). Expert systems (if-then), signed directed graphs, Bayesian networks, and dynamic uncertain causality graphs are introduced in detail. The development characteristics of these four methods are shown in Figure 9. The earliest expert systems are based on if-then rules and rely on the rules stored in the expert knowledge base. The signed directed graph method incorporates qualitative knowledge representation between nodes, which greatly improves the inference ability. The Bayesian network introduces uncertain inference, which can accommodate missing information and improve system robustness. Dynamic uncertain causality graphs build on Bayesian networks with multi-assignment inference techniques and achieve higher inference efficiency. These knowledge-driven methods share common drawbacks. They all need a complete knowledge base to be established first, so the efficiency of knowledge acquisition and the completeness of the knowledge base must be improved. In addition, the complex cause-effect relationships within nuclear power plants make the knowledge base large and complex, which calls for more efficient reasoning.

Data-Driven Fault Diagnosis Methods
The data-driven fault diagnosis method for nuclear power plants can be regarded as a combination of a "data base" and an "inference machine". The "data base" is defined as the massive data resources required by the method, which should be distinguished from the concept of the database in the computer field. Additionally, the "inference machine" refers to a trained model based on large amounts of data, which is different from knowledge-driven "inference machines" (Section 3). As shown in Figure 10, the application of data-driven fault diagnosis methods in the nuclear field can be divided into two types: single algorithms (Section 4.1) and hybrid algorithms (Section 4.2). Most hybrid algorithms are improved based on single algorithms and have stronger diagnostic performance. To enable readers to understand the data-driven methods and their research progress in detail, the principles of several representative methods and their application progress in NPP fault diagnosis are introduced in Section 4.1. The research progress of fault diagnosis based on hybrid algorithms is introduced in Section 4.2. Finally, we summarize the characteristics of data-driven methods (Section 4.3).

Fault Diagnosis Methods Based on Single Algorithms
In this section, we present several representative single algorithms and their research progress in the nuclear field and conclude with a brief comparison of these methods.

Artificial Neural Network
Artificial neural networks (ANNs) are mathematical models that mimic the structure and function of biological neural networks. They are used to approximate or evaluate functions [127]. An ANN learns from existing data and generalizes from it, producing a model that can identify patterns automatically. The most common artificial neural network is the back propagation neural network (BPNN), shown in Figure 11, which consists of an input layer, one or more hidden layers, and an output layer, with neurons connected by weights. Each neuron performs two transformation steps internally [128][129][130]. First, the weighted sum of all input values connected to that neuron is calculated. Second, the weighted sum is nonlinearly transformed using an activation function.

The training process of a BPNN is as follows. When a BPNN receives a learning sample, the sample is transmitted from the input layer through the hidden layers to the output layer, which yields the network's response to that input. If the output layer does not produce the expected target output, the error signal enters the back-propagation phase and returns toward the input layer along the original connection paths. The error is reduced by modifying the weights of each layer. As errors are propagated repeatedly, the accuracy of the output layer's predictions increases. Back-propagation continues until the error is sufficiently small, at which point a mapping between input and output has been established and the model has predictive or diagnostic capability. Artificial neural networks have since developed in various forms. The network architecture can be divided into three types: feed-forward neural networks [131], recurrent neural networks [132], and reinforcement networks [133].
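The forward pass and the back-propagation weight update described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited studies: the 2-2-1 network size, the sigmoid activation, the squared-error loss, the learning rate, the initial weights, and the two toy samples (normalized readings labelled 0 for normal and 1 for fault) are all hypothetical choices made for readability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    # Each neuron: (1) weighted sum of its inputs, (2) nonlinear activation.
    xb = x + [1.0]  # a constant 1.0 acts as the bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hidden]
    hb = hidden + [1.0]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
    return hidden, output

def train_step(x, target, w_hidden, w_out, lr=0.5):
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output layer (squared error, sigmoid derivative).
    delta_out = (output - target) * output * (1.0 - output)
    # Propagate the error signal back to the hidden layer along the same weights.
    delta_hidden = [delta_out * w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
    # Modify the weights of each layer to reduce the error.
    for j, v in enumerate(hidden + [1.0]):
        w_out[j] -= lr * delta_out * v
    for j, ws in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            ws[i] -= lr * delta_hidden[j] * v
    return 0.5 * (output - target) ** 2  # loss measured before the update

# Hypothetical toy data: two normalized sensor readings per sample.
samples = [([0.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
# Hypothetical initial weights; the last entry of each list is the bias weight.
w_hidden = [[0.1, -0.2, 0.0], [0.3, 0.2, 0.0]]
w_out = [0.2, -0.1, 0.0]

epoch_losses = []
for _ in range(2000):
    epoch_losses.append(sum(train_step(x, t, w_hidden, w_out) for x, t in samples))

normal_score = forward([0.0, 0.0], w_hidden, w_out)[1]
fault_score = forward([1.0, 1.0], w_hidden, w_out)[1]
print(epoch_losses[0], epoch_losses[-1], normal_score, fault_score)
```

The summed error per epoch shrinks as the error signal is propagated repeatedly, and the trained network scores the fault sample higher than the normal one. A realistic diagnostic model would use many more samples, inputs, and a held-out test set.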
Artificial neural networks can handle complex multimodal, associative, inferential, and memory functions, which makes them well suited to the fault diagnosis of complex nuclear power systems. A neural-network-based fault diagnosis method establishes a diagnostic mapping from the training data; the trained network is then applied to new observations to judge anomalies. Zwingelstein et al. first applied the BPNN to the fault diagnosis of NPPs and preliminarily verified its feasibility [60,134,135]. In addition to BPNNs, recurrent neural networks (RNNs) [136], improved BPNNs [137], self-organizing neural networks [138], and Hopfield neural networks [139] have all been studied in applications. In general, research based on neural networks alone is mostly in the preliminary validation phase, and combining neural networks with other algorithms for diagnosis is the mainstream trend. The related content is presented in Section 4.2.

Support Vector Machine
The basic idea of support vector machines (SVMs) is to divide data into different categories using a hyperplane. Taking the simplest case of binary classification as an example, as shown in Figure 12, each formula represents a different hyperplane. For a linearly separable dataset, w·x + b = 1 and w·x + b = −1 denote the two boundaries of the hyperplane. All hyperplanes that can divide the dataset into two classes lie within these two boundaries. Among all such hyperplanes, the goal of SVM is to find the optimal decision boundary that is farthest from the nearest samples of the different classes, that is, the most robust classification hyperplane. Since nuclear power plant operation data are nonlinear, the hyperplane cannot be established in the same way. The solution is to map the data from the low-dimensional space to a high-dimensional space and find the optimal hyperplane there; the kernel function is the core of this method. More detailed principles of SVM can be found in [133][134][135].
Gottlieb et al. first used support vector machines for the diagnosis of NPP accidents and verified the feasibility of SVM for data classification [135]. Zio et al. used support vector machines in the diagnosis of subsystems such as the feed water system [136] and the first-loop system [137], and in the abnormality monitoring of other components [138,139]. Kim et al. used support vector machines to predict the times of serious accidents to help operators better manage accidents [140]. Abiodun et al. established diagnostic models for different components of NPPs in the form of a support vector set for early fault diagnosis [141].
As with neural network methods, NPP fault diagnosis relying on SVM alone has been less studied. As a fundamental method, the current research involving SVMs is more in the area of hybrid algorithms, which will be presented in subsequent sections.
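To make the hyperplane, margin, and kernel ideas from the SVM description concrete, the following sketch evaluates them for a fixed, hand-picked hyperplane. It does not train an SVM; the weight vector `w`, the offset `b`, the sample points, and the `gamma` parameter of the radial basis function (RBF) kernel are hypothetical values chosen so that the two points lie exactly on the boundaries w·x + b = −1 and w·x + b = 1.

```python
import math

def decision_value(w, b, x):
    # Signed score w·x + b; its sign gives the predicted class.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    # Distance between the boundaries w·x + b = 1 and w·x + b = -1.
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def rbf_kernel(x, z, gamma=1.0):
    # A common kernel that implicitly maps data to a high-dimensional space.
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

# Hypothetical hyperplane and samples (not taken from any cited study).
w, b = [1.0, 1.0], -3.0
negative_sample = [1.0, 1.0]   # lies on w·x + b = -1
positive_sample = [2.0, 2.0]   # lies on w·x + b = +1

print(classify(w, b, negative_sample), classify(w, b, positive_sample))
print(margin_width(w))
print(rbf_kernel(negative_sample, positive_sample))
```

Training an SVM amounts to choosing `w` and `b` so that `margin_width(w)` is as large as possible while every training sample stays on the correct side of its boundary; for nonlinear data, the dot product is replaced by a kernel such as `rbf_kernel`.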

Decision Tree
A decision tree is a tree structure learned from data and used to make decisions. At each step, it selects one attribute of the training samples and assigns the samples to different sets according to their values on that attribute, after which the next round of decisions is made, until all the samples in the same set belong to the same class. Building a decision tree usually involves three steps: feature selection, tree generation, and pruning. As shown in Figure 13, when a decision tree is used for fault diagnosis, the fault feature parameters are tested starting from the root node, and the fault samples are assigned to internal nodes based on the test results. Each internal node corresponds to a value of that feature, so the samples are tested and assigned recursively until they reach a leaf node, which gives the type of fault. For complex industrial systems such as nuclear power plants, overly complex decision trees lead to poor generalization performance. Readers can find more detailed information in [140][141][142][143].
In the field of NPP fault diagnosis, decision trees are more intuitive and interpretable than other algorithms, but pure decision-tree-based fault diagnosis is less applied. Yu et al. first used a decision tree in the fault diagnosis of NPPs and compared and combined it with other algorithms [63,144,145]. Sharanya et al. used decision trees for the diagnosis of cooling tower faults in NPPs. Based on a comparison of several algorithms, they concluded that decision trees have the potential to be combined with other algorithms to construct hybrid models [146].
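The root-to-leaf diagnosis procedure described for decision trees can be illustrated with a small hand-written tree. This is only a sketch of the traversal step: the tree is not learned from data, feature selection and pruning are omitted, and the feature names (`pressure_drop`, `sg_radiation`), thresholds, and the mapping to fault labels are hypothetical and carry no engineering meaning.

```python
# Hypothetical tree: internal nodes test one feature, leaves hold a fault type.
tree = {
    "feature": "pressure_drop",
    "threshold": 0.5,
    "low": {"label": "normal"},
    "high": {
        "feature": "sg_radiation",
        "threshold": 0.5,
        "low": {"label": "LOCA"},
        "high": {"label": "SGTR"},
    },
}

def diagnose(node, sample):
    # Start at the root and follow one branch per test until a leaf is reached.
    while "label" not in node:
        value = sample[node["feature"]]
        node = node["high"] if value > node["threshold"] else node["low"]
    return node["label"]

# Hypothetical normalized feature values for three operating conditions.
print(diagnose(tree, {"pressure_drop": 0.1, "sg_radiation": 0.0}))
print(diagnose(tree, {"pressure_drop": 0.9, "sg_radiation": 0.1}))
print(diagnose(tree, {"pressure_drop": 0.9, "sg_radiation": 0.8}))
```

Because every diagnosis is a readable chain of feature tests, the path taken through the tree doubles as an explanation of the result, which is the interpretability advantage noted above.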

Principal Component Analysis
Principal component analysis (PCA) is a statistical method that converts a set of potentially correlated variables into a set of linearly uncorrelated variables called principal components through an orthogonal transformation. PCA is often used for data dimensionality reduction. As shown in Figure 14, the first step is to move the center of the axes to the center of the data and then rotate the axes to maximize the variance of the data on the C1 axis to retain more information, where C1 is the first principal component. The second step is to find the second principal component C2 so that it has a covariance of 0 with C1 to avoid overlapping with C1 information and maximize the variance of the data in that direction. The third step is to use the same steps as the second step to continue to find the next principal component. Data containing m variables can have up to m principal components. More detailed principles of principal component analysis can be found in [147][148][149].
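The steps above (center the data, rotate the axes to maximize variance along C1, then take C2 uncorrelated with C1) can be sketched for the two-variable case, where the rotation angle has a closed form. The data points are hypothetical, and the closed-form angle applies only to two variables; for m variables, a general eigenvalue or singular value decomposition would be used instead.

```python
import math

def covariance(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)

def pca_2d(points):
    # Step 1: move the origin of the axes to the center of the data.
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Rotation angle that maximizes the variance along the first axis.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    c1 = (math.cos(theta), math.sin(theta))    # first principal component
    c2 = (-math.sin(theta), math.cos(theta))   # second, orthogonal to C1
    scores = [(x * c1[0] + y * c1[1], x * c2[0] + y * c2[1]) for x, y in centered]
    return c1, c2, scores

# Hypothetical pair of strongly correlated measurements.
points = [(2.0, 1.9), (4.0, 4.2), (6.0, 5.8), (8.0, 8.1)]
c1, c2, scores = pca_2d(points)
s1 = [s[0] for s in scores]
s2 = [s[1] for s in scores]
print(c1, c2)
print(covariance(s1, s1), covariance(s2, s2), covariance(s1, s2))
```

The variance along C1 is much larger than along C2, and the covariance between the two projected coordinates is zero up to rounding, so keeping only the C1 coordinate reduces the data from two dimensions to one while retaining most of the variance.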
[157]. In addition, most fault diagnosis methods use PCA as a preprocessing technique to reduce the data dimension and thereby improve the diagnostic performance of hybrid methods, which will be described in subsequent sections.

Clustering
Clustering is an emerging method, and the understanding of clustering is less systematic than that of the aforementioned algorithms. There is not even a chapter on clustering in the well-known textbook [158]. Clustering is an unsupervised learning method, that is, the label information of the training samples is unknown. The goal is to divide the samples in a dataset into several, usually disjoint, subsets, each called a cluster. Note that clustering is significantly different from classification. The preceding algorithms (Sections 4.1.1-4.1.4) essentially solve the classification problem, that is, the label of each sample is known and the data are classified into known categories. Clustering instead divides the data into different subsets according to their inherent nature and rules. Clustering methods can be divided into partition-based, hierarchy-based, density-based, network-based, and model-based methods. More detailed principles of clustering can be found in [159][160][161].
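As an illustration of a partition-based method, the sketch below runs a basic k-means loop on unlabeled points. It is a minimal example under stated assumptions rather than a method from the cited works: the data points, the number of clusters, the initial centers, and the fixed iteration count are all hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centers, iterations=10):
    centers = list(initial_centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: each sample joins the cluster with the nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned samples.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical unlabeled operating points forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centers = kmeans(points, initial_centers=[points[0], points[3]])
print(labels)
print(centers)
```

No labels are supplied at any point: the two groups emerge only from the distances between samples, which is the difference from the classification algorithms discussed earlier.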
In the field of NPP fault diagnosis, clustering is mainly used for fault monitoring. It can distinguish abnormal conditions from normal conditions even when the abnormal conditions were not included in training. Talonen first developed a diagnostic model for early fault identification in NPPs based on a partition method [162]. Podofillini et al. established a dynamic process fault identification model based on model clustering [163]. Mercurio et al. simulated 60 accident samples and classified them into four categories using a clustering method, one of which was a new type of fault that had not been trained beforehand [164]. Sameer et al. clustered turbine fault information from different NPPs and developed a generic fault diagnosis framework [165]. Baraldi

Comparison of Single Algorithms
Some data-driven methods, such as logistic regression and naive Bayes, have not been mentioned. Because of their limited applicability or application prospects, they are not described further in this paper. Table 4 summarizes the characteristics of the above five methods.

Fault Diagnosis Methods Based on Hybrid Algorithms
In the field of NPP fault diagnosis, hybrid algorithms combine the advantages of different single algorithms to obtain better diagnosis results. With the development of AI technology, almost all of the current methods are based on hybrid algorithms. As shown in Table 5, we conducted a detailed survey of the existing hybrid-algorithm-based fault diagnosis methods. In this table, we established six topics based on the five algorithms introduced in Section 4.1. In the first five topics, X stands for other auxiliary algorithms, such as ANN+X, which represents a hybrid diagnostic algorithm with ANN as the main algorithm. The sixth topic is the research literature involving the comparison of each algorithm. According to the literature research, the proportion of the literature on each topic and the distribution of the diagnostic objects are shown in Figure 15.
In general, research on ANN+X algorithms currently occupies the mainstream, which is consistent with the development trend of AI, followed by SVM+X and comparison studies. In terms of diagnosis objects, system-level faults in NPPs, such as LOCA, SGTR, MSLB, and other initiating events, account for the vast majority. However, component-level faults, such as those of valves, feed pumps, and inverters, are less studied. One reason is that most studies are conducted on simulators because real fault data are lacking, while simulation data for system-level faults are more readily available.
ANN+X. This topic is closely related to the development of neural network technology. For example, the RNN has been a very popular technique in recent years; it is good at processing time series data and is widely applied in natural language processing and machine translation. Ye et al. introduced RNN with other algorithms into NPP fault diagnosis [171][172][173], making full use of the time series nature of the data. Some scholars considered the NPP data too complex, so they used other algorithms as front-end techniques to reduce the dimensionality and obtained better diagnostic performance [69,174,175]. Since there are various types of ANNs, each with unique advantages, integrated learning techniques combining multiple networks have been widely studied [158][159][160][161][162]. Ming et al. introduced multilayer flow models to improve the accuracy and interpretability of the neural network [67]. Qian et al. proposed a method to expand the fault diagnosis dataset based on generative adversarial networks (GANs) and demonstrated that the enhanced dataset can improve the performance of various models [163]. In addition, several scholars have studied the optimization of ANN hyperparameters to obtain the settings with the best diagnosis performance [172].
Comparison. In a comparative study, Yao et al. compared the performance of ANN with the PCA, DT, and SVM methods in a more systematic way. They also transformed the state information of NPPs into image form and then exploited the strengths of convolutional neural networks in image recognition for fault diagnosis [164]. In addition, Liu et al. built a hybrid model of SVM and SDG (a knowledge-driven method) and adopted different diagnostic methods for different objects [165]. In sum, the related research is still relatively basic, and systematic comparative research is not yet available.
Other topics. Research on SVM+X follows two main directions. One combines SVM with algorithms that reduce data complexity. The other introduces algorithms for SVM parameter optimization [73,166,167]. The other three topics (PCA+X, DT+X, and Clustering+X) are less studied and can be explored further in terms of integrated learning, interpretability, and hyperparameter optimization.

Summary of Data-Driven Fault Diagnosis Techniques
This section provides a detailed survey of data-driven NPP fault diagnosis methods from the perspectives of single and hybrid algorithms. To help readers grasp the basic principles of these methods quickly, this part keeps formulas to a minimum, explains the core ideas in plain terms, and provides an index of the relevant literature for readers who wish to study further. The survey shows that current research favors hybrid algorithms because a single algorithm often cannot fully satisfy the needs of fault diagnosis. ANN+X is a popular research direction, driven by the current popularity of deep learning. However, for nuclear power plants with high safety and reliability requirements, the inherent uninterpretability of neural networks and their dependence on massive data will hinder practical application.

Table 5. Hybrid-algorithm-based fault diagnosis methods for NPPs.

| Topic | Reference | Algorithms | Diagnostic object | Description |
| --- | --- | --- | --- | --- |
| ANN+X | [171] | WPT and LSTM | NPP converter | The operation mode of the power system is analyzed in depth when a failure occurs. |
| | [174] | WPT and ANN | NPP system faults | Disturbing perturbations in the training set are reduced by WPT. |
| | [173] | RNN, WOLP, and ARTD | NPP system faults | The paper improved the practical applicability and scalability of diagnosis systems to real processes and machinery. |
| | [69] | CN and DBN | NPP system faults | Correlation analysis is used for dimensionality reduction. |
| | [69] | FNN and MSIF | NPP system faults | The system is able to achieve single-fault and some multiple-fault diagnoses. |
| … | … | … | … | … |
| DT+X | [145] | DT and RS | NPP system faults | A parameter reduction method based on neighborhood rough sets was proposed. |
| Clustering+X | [176] | Clustering and FBS | NPP turbine | The paper developed a framework of unsupervised classification of transients. |
| Comparison | [164] | PCA and (SVM, KNN, LDA, DT, and LR) | NPP system faults | State information imaging is used to construct images of the different conditions. |
| | [177] | PCA and SVM | NPP system faults | A three-layer fault classification model was established to diagnose the fault type, location, and degree. |
| | [178] | PCA and ANN | SG tube; RCS pump | The radial basis network provides better prediction and diagnoses faults faster than the Elman neural network. |
| | [165] | ANN, D-S, and SDG | NPP system faults | Different diagnostic methods are adopted for different diagnostic objects. |
| | [179] | ANN and SVM | Feed-water pump | A comparative analysis of ANN and SVM was performed. |

Results
The advantage of knowledge-driven methods is that there is no need to establish a systematic analytical model, and the diagnosis results are highly interpretable and robust. However, these methods have shortcomings. First, expert knowledge is difficult to obtain, and the accuracy depends on the richness of the knowledge base. Second, when there are many inference rules, matching conflicts may occur during inference, resulting in low inference efficiency. The advantages of data-driven methods are that the modeling process is relatively simple and general and that diagnosis can run in real time, but they also have shortcomings. First, fault sample data are difficult to obtain, and real fault sample data for NPPs are almost impossible to obtain. For example, training a neural network requires a large amount of data, which is hard to satisfy for nuclear power plants. Second, the generalization ability of the model is weak: once the actual data differ slightly from the training data, the diagnosis results may be inaccurate. Third, the calculation process is not interpretable, which makes it difficult to convince industry workers. Table 6 compares the advantages and disadvantages of the two types. The table shows that the two types are highly complementary but that a way to truly integrate them has not yet been achieved. In recent years, physics-informed neural networks (PINNs) [180], which embed physical partial differential equations into neural networks for learning solutions, have effectively improved the generalization ability of models and compensated for the disadvantages of purely data-driven methods. The integration of knowledge-driven and data-driven methods represented by PINNs brings new ideas to nuclear power plant fault diagnosis technology. How to combine the two organically and make full use of the knowledge and data resources of nuclear power plants is an important research direction.

Conclusions and Future Directions
This paper reviews the fault diagnosis techniques for NPPs from the perspective of AI. A new fault diagnosis classification framework is established. The fault diagnosis techniques are divided into two types: knowledge-driven and data-driven. The knowledge-driven methods are divided into the early if-then rules and the current new theories (SDG, BN, DUCG, etc.). The principles, application, and comparative analysis of these methods are systematically described. The data-driven methods are divided into two research directions: single algorithms and hybrid algorithms. For single algorithms, the principles, application, and comparative analysis of the five representative algorithms (ANN, SVM, PCA, DT, and clustering) are also given. For hybrid algorithms, a "topic + X" classification scheme is constructed in this paper. The existing fault diagnosis technology based on hybrid algorithms is investigated in detail, and the mainstream trend of current research is given. Finally, the advantages and disadvantages of knowledge-driven and data-driven methods are compared, and combining the two is identified as an important research direction. With the advancement of AI technology, NPP fault diagnosis methods continue to be improved and developed. The following are some possible research directions.

1.
The combination of data-driven and knowledge-driven fault diagnosis methods. At present, their respective theories have matured, but theoretical research that integrates their advantages is still lacking. In the context of the digital transformation of NPPs, data resources are becoming easier to obtain, and knowledge resources can be obtained from an NPP's Deterministic Security Analysis Report (DSAR) and Probabilistic Risk Analysis Report (PRAR). The DSAR and PRAR contain detailed descriptions of fault mechanisms, which can be used to build knowledge-driven models such as Bayesian networks, while the data resources of nuclear power plants can help build data-driven models such as neural networks. For example, when strong interpretability of diagnostic results is required, incorporating knowledge into the model should be considered. We have explored the combination of knowledge-driven and data-driven methods by using the PRAR and DSAR to build Bayesian networks for fault type diagnosis and using data to build neural networks for fault severity diagnosis [181]. Studying new techniques that make full use of these two resources is a field worthy of future research.

2.
On-demand system fault diagnosis. In practice, Zhao et al. classify NPP system-level faults into two types, operational faults and on-demand faults [106]. An operational fault is defined as unexpected abnormal behavior during the operation of a nuclear power plant, such as a rupture of the primary coolant pipes or of the heat transfer pipes of the steam generators. An on-demand fault is defined as the failure of a response system to perform its predetermined function after an operational fault occurs, for example, when high pressure in the primary circuit raises the pressure of the regulator above the set value but the safety relief valve fails to open. At present, there is little research on on-demand faults, although they are of great significance to nuclear safety.

3.
Introduction of digital twin technology. A digital twin is a simulation process that integrates multi-disciplinary, multi-physical-quantity, multi-scale, and multi-probability techniques, making full use of data such as the physical model, sensor updates, and operation history to complete a mapping in virtual space that reflects the whole life cycle of the corresponding physical equipment [182]. At present, most studies are based on data from NPP simulators. These simulators are not high-fidelity models, which means that applying the resulting diagnostic models in an actual NPP carries great uncertainty. Digital twin technology can accurately simulate the actual equipment, which will greatly improve the reliability and safety of practical applications. Nguyen et al. studied a digital twin approach to system-level fault detection and diagnosis for thermal hydraulic systems [183]. Therefore, establishing digital twin models of NPPs is of great significance.

4.
More detailed diagnostic hierarchy. As shown in Figure 15, most of the current studies focus on system-level faults in NPPs, while there are few studies on human factor faults, sensor faults, control room faults, network security faults, etc.

5.
Construction of the generalized model. In many cases, the fault diagnosis models we construct are only applicable to specific tasks. When a new task is encountered, reusing the previous data and experience is an important challenge. Transfer learning provides a possible solution [184]. Some related studies are in progress [185][186][187][188].

6.
Interdisciplinary cooperation. Fault diagnosis is a comprehensive technology involving multiple disciplines (modern control theory, mathematical statistics, signal processing, pattern recognition, artificial intelligence, etc.). Most current fault diagnosis research is limited to intra-disciplinary exploration. Optimal fault diagnosis research should gather multidisciplinary knowledge to drive the technology in a more efficient, sensitive, and intelligent direction. Therefore, a cross-disciplinary perspective is crucial for researchers.

Data Availability Statement: Not applicable.