A Novel Data-Driven Fault Detection Method Based on Stable Kernel Representation for Dynamic Systems

With the steady improvement of advanced manufacturing processes and big data technologies, modern industrial systems have become large-scale. To enhance the sensitivity of fault detection (FD) and overcome the drawbacks of the centralized FD framework, a new data-driven FD method based on the Hellinger distance and subspace techniques is proposed for dynamic systems. Specifically, the proposed approach uses only system input/output data collected via sensor networks, and the distributed residual signals can be generated directly through the stable kernel representation of the process. Based on this, each sensor node can obtain the identical residual signal and test statistic through the average consensus algorithm. In addition, this paper integrates the Hellinger distance into the residual signal analysis to improve the FD performance. Finally, the effectiveness and accuracy of the proposed method are verified on a real multiphase flow facility.


Introduction
As a result of intelligence and informatization, modern industrial processes have evolved to become more complicated. Any abnormal behavior of equipment components may affect productivity or even cause accidents. To guarantee the reliability and stability of industrial systems, fault detection (FD) plays a fundamental role and has received intensive attention from scholars and engineers [1][2][3].
Currently, most FD methods are classified as either model-based or data-driven techniques [4]. In the framework of model-based FD, it is necessary to obtain a precise mathematical representation of the system, and the accuracy of the FD results depends on the accuracy of the modeling. According to the way signals are generated, existing model-based FD methods can be divided into three distinct groups, i.e., parameter estimation techniques, observer-based techniques, and the subspace-based strategy [5]. In practical applications, such methods can successfully implement FD schemes when accurate mathematical models are available. Unfortunately, with the increasing size of modern engineering systems, modeling systems from first principles becomes increasingly challenging.
With the accelerated advancement of sensor networks and data processing methods, data-driven FD strategies have naturally become a critical research topic and are well developed [6][7][8][9][10]. Data-driven FD techniques are typically sorted into multivariate statistics, neural networks, and subspace-technique-aided schemes [11]. Specifically, traditional multivariate statistical methods [12][13][14], e.g., canonical correlation analysis (CCA), have been extensively studied and applied to modern industrial systems. The core of multivariate statistical methods is to analyze the correlation among process variables and then construct appropriate test statistics for FD tasks. This group of methods can solve the FD problem well in static processes. However, such traditional FD strategies do not usually consider dynamic changes in the systems and are therefore usually unable to perform FD tasks in dynamic systems. In the past few years, with increasing attention to statistical learning, neural network-based FD methods have been rapidly developed [15][16][17][18]. This group of methods usually uses historical data generated by the production process to train the neural network model. Then, the models classify the test data to determine whether faults have occurred. Due to the excellent fitting ability of neural networks, neural network-based methods have superior performance in dealing with FD problems for nonlinear systems. However, the training phase of neural networks requires abundant labeled data, which limits their use in practical applications [19]. In recent years, the subspace-technique-aided FD method has been widely studied because of its simple design and lower computational effort [20][21][22][23][24][25]. The core idea behind it is to identify the system parameters from the collected data. Because it takes into account the dynamic behavior of the systems, the subspace-aided FD method performs very well in dealing with process dynamics. In the subspace technique framework, Reference [23] proposes a stable kernel representation (SKR) of the systems. Remarkably, the proposed SKR scheme can directly construct residual generators from process data without identifying complex system models. Based on this, many data-driven FD algorithms have been designed under the SKR framework.
At present, the majority of data-driven FD methods have been designed in a centralized framework, which involves collecting all process data in a central location to carry out the necessary computations and FD tasks. With the growth of industrial process size, the centralized design becomes increasingly demanding in terms of memory and computing power, resulting in poor flexibility and high cost. Therefore, there is growing research interest in distributed data-driven FD [26]. For example, ref. [27] evaluated the effectiveness of multiblock multivariate statistics-based approaches, such as PCA and PLS, for decentralized monitoring and assessed the individual contributions of each block. Although the above approaches can implement distributed FD for each sub-block, they do not take into account the connection among the sub-blocks. On this basis, ref. [28] designed a multiblock PCA strategy in which information interaction among sub-blocks is considered. Similarly, considering the connection among neighboring nodes, ref. [29] designed a distributed CCA algorithm to achieve plant-wide process monitoring. The core idea behind it was to reduce uncertainty through information interaction among neighboring nodes. However, when there are few relevant variables in the historical data, the distributed FD results obtained with this method are usually unreliable. In order to address this defect, ref. [30] designed a distributed, regularized CCA-based process monitoring algorithm. First, the traditional CCA algorithm is executed between the local and the neighboring nodes. In order to eliminate uncorrelated variables from the monitoring data, a GA-based regularization algorithm is then embedded in the traditional CCA technique. Finally, according to the local monitoring results, the corresponding residual signals and test statistics can be generated at the local node.
In terms of technical systems, the multiphase flow facility used in this study is a device to achieve the separation of water, oil, and air at a given flow rate. The test zone, consisting of splitters and supply lines, can provide a mixture of water, oil, and air. The technical system is typically used to verify the effectiveness and accuracy of process monitoring and FD algorithms, such as the latent-variable-analysis-based FD method [31], the Kalman-filter-based FD method [32], and the multivariate-statistics-based process monitoring method [33]. In addition, the technical system used is described in detail in [34,35].
In general, the data-driven distributed FD methods above usually ignore the dynamic changes in the process. Therefore, these data-driven distributed methods have some limitations in dealing with FD problems in dynamic processes. In addition, there are abundant process data in dynamic systems, and the relationship among process variables is complicated. The abundant process data and strong coupling among variables bring new challenges to existing distributed data-driven FD solutions.
Motivated by the aforementioned points, a new data-driven distributed FD strategy is developed for dynamic systems. Compared with previous research, the key contributions of the developed FD solution are as follows:

1. Compared with traditional SKR-based FD approaches, the proposed method is more sensitive to fault information by introducing the Hellinger distance (HD) in the residual signal.

2. The consensus algorithm is embedded in the information interaction among sensor blocks. Therefore, each sensor block can obtain FD results without global fusion operations, thus remarkably improving FD efficiency.

3. It has superior flexibility in the design of the FD framework, particularly when the system models are not accurately obtained.

This work is structured as follows. Section 2 provides information about the system descriptions, Hellinger distance, and the average consensus algorithms. In Section 3, a new distributed FD scheme for dynamic systems is presented. The effectiveness of the proposed FD algorithm is then demonstrated through a multiphase flow facility in Section 4. Finally, Section 5 presents the conclusion and prospects for future work.

System Descriptions
Given an LTI system $H(z)$ with input vector $u \in \mathbb{R}^{u}$ and output vector $y \in \mathbb{R}^{y}$, the input-output (I/O) behavior is characterized as

$$y(z) = H(z)\,u(z), \qquad (1)$$

where $z$ represents the complex variable of the z-transform. In order to analyze the relationship among variables in dynamic systems, the state-space model is used in this study. It not only reflects dynamic behavior in the process data but also provides a concise description, which is usually expressed in a standard form, as follows:

$$x(k+1) = A\,x(k) + B\,u(k), \qquad (2)$$

$$y(k) = C\,x(k) + D\,u(k), \qquad (3)$$

where $A$, $B$, $C$ and $D$ are system parameters; $u(k) \in \mathbb{R}^{u}$, $y(k) \in \mathbb{R}^{y}$, and $x(k) \in \mathbb{R}^{x}$ refer to the system input, output, and state variables, respectively. A sensor network consisting of $k_t$ blocks has been integrated into the considered system, as shown in Figure 1. In the sensor network, the network topology $G$ can be represented using the "node" set $J$ and the "edge" set $K$ as

$$G = (J, K), \quad J = \{1, \dots, k_t\}, \quad K \subseteq J \times J.$$

Hellinger Distance
The Hellinger distance (HD) is a type of f-divergence [36] and is closely related to the Bhattacharyya coefficient. An f-divergence is a function that measures the difference between two probability distributions, and HD in particular evaluates how closely two distributions resemble each other. Suppose that $n(x)$ and $m(x)$ denote two probability density functions (PDFs). Since the probability distributions of the variables are left unspecified in the general definition, the HD between $n(x)$ and $m(x)$ is defined as

$$H(n, m) = \sqrt{\frac{1}{2}\int \left(\sqrt{n(x)} - \sqrt{m(x)}\right)^{2} dx}, \qquad (6)$$

which can also be expressed in the Euclidean norm as

$$H(n, m) = \frac{1}{\sqrt{2}}\left\| \sqrt{n(x)} - \sqrt{m(x)} \right\|_{2}. \qquad (7)$$

Based on the Cauchy-Schwarz inequality, HD is a symmetric bounded metric which satisfies $0 \le H(n, m) \le 1$ and $H(n, m) = H(m, n)$. In addition, according to the Lebesgue theorem, the squared form of HD in (6) is characterized as

$$H^{2}(n, m) = 1 - \int \sqrt{n(x)\,m(x)}\, dx. \qquad (8)$$

In order to perform the FD task, Lemma 1 gives a concise representation of (8), which serves as the basis of the proposed approach.
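Lemma 1. Suppose that $n(x)$ and $m(x)$ are univariate Gaussian PDFs with means $\mu_n$, $\mu_m$ and variances $\sigma_n^{2}$, $\sigma_m^{2}$. Then the squared HD in (8) is further represented as follows:

$$H^{2}(n, m) = 1 - \sqrt{\frac{2\,\sigma_n \sigma_m}{\sigma_n^{2} + \sigma_m^{2}}}\,\exp\!\left(-\frac{(\mu_n - \mu_m)^{2}}{4\left(\sigma_n^{2} + \sigma_m^{2}\right)}\right).$$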
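For two univariate Gaussian PDFs, the squared HD has a well-known closed form, which is the kind of concise representation a Gaussian residual analysis can rely on. The following is a minimal sketch, not the authors' implementation: the function names, the integration range, and the test values are assumptions chosen for illustration. It compares the closed form against a direct numerical evaluation of $1 - \int \sqrt{n(x)\,m(x)}\,dx$.

```python
import math


def hellinger_sq_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form squared Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Bhattacharyya coefficient of the two Gaussians
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    return 1.0 - bc


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def hellinger_sq_numeric(mu1, sigma1, mu2, sigma2, lo=-30.0, hi=30.0, steps=60000):
    """Midpoint-rule approximation of 1 - integral of sqrt(n(x) * m(x)) dx."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += math.sqrt(gaussian_pdf(x, mu1, sigma1) * gaussian_pdf(x, mu2, sigma2)) * dx
    return 1.0 - total


# Hypothetical example: a small mean shift and variance change in a residual
closed = hellinger_sq_gaussian(0.0, 1.0, 0.5, 1.2)
numeric = hellinger_sq_numeric(0.0, 1.0, 0.5, 1.2)
```

The closed form needs only the two means and standard deviations, so it can be evaluated on a sliding window of residual samples without estimating a full density.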

Average Consensus Algorithm
Given a communication network consisting of $k_t$ nodes, the consensus algorithm is an iterative technique that drives all nodes to a common value. For a data vector $\mu_i$ at the i-th sensor block, the consensus technique can be executed as

$$\mu_i(s+1) = v_{i,i}\,\mu_i(s) + \sum_{j \in J_i} v_{i,j}\,\mu_j(s),$$

where $\mu_i(s)$ refers to the calculated value of $\mu_i$ during the s-th iteration and $v_{i,j}$ are the weighting coefficients. Many studies [37][38][39] have designed algorithms to solve the weighting problem. Among them, the Metropolis-Hastings technique not only speeds up the convergence of the iterative algorithm but also enables computation in a distributed manner. Therefore, the Metropolis-Hastings technique is used to construct the weighting factors in this paper.
The weighting factors are assembled as follows:

$$v_{i,j} = \begin{cases} \dfrac{1}{1 + \max\{|J_i|, |J_j|\}}, & j \in J_i, \\[2mm] 1 - \sum_{l \in J_i} v_{i,l}, & j = i, \\[1mm] 0, & \text{otherwise}, \end{cases}$$

where $J_i$ represents all adjacent blocks of the i-th block and $|J_i|$ is their number.
The final consensus results can be presented as

$$\lim_{s \to \infty} \mu_i(s) = \frac{1}{k_t} \sum_{j=1}^{k_t} \mu_j(0),$$

which indicates that the consensus value of each block will converge to the average value of all sensor blocks.
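The iteration above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the four-node ring topology, the initial values, and the function names are hypothetical, and the weights follow the standard Metropolis rule $1/(1 + \max\{d_i, d_j\})$, which may differ in detail from the weights used in the paper.

```python
def metropolis_weights(neighbors):
    """Build Metropolis weights from an undirected graph given as node -> set of neighbors."""
    degree = {i: len(nb) for i, nb in neighbors.items()}
    weights = {i: {} for i in neighbors}
    for i, nb in neighbors.items():
        for j in nb:
            weights[i][j] = 1.0 / (1.0 + max(degree[i], degree[j]))
        # Self-weight makes every row sum to one
        weights[i][i] = 1.0 - sum(weights[i][j] for j in nb)
    return weights


def average_consensus(values, neighbors, iterations=200):
    """Each node repeatedly replaces its value with a weighted mix of its own and its neighbors'."""
    w = metropolis_weights(neighbors)
    mu = dict(values)
    for _ in range(iterations):
        mu = {i: sum(w[i][j] * mu[j] for j in w[i]) for i in mu}
    return mu


# Hypothetical 4-node ring; every node only talks to its two neighbors
neighbors = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
local_values = {0: 1.0, 1: 2.0, 2: 3.0, 3: 6.0}
result = average_consensus(local_values, neighbors)
```

Because the weight matrix is symmetric with rows summing to one, the network-wide sum is preserved at every step, so the common limit is the average of the initial values (3.0 in this example).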

Methodology
In this section, the SKR of the residual generator is first introduced. Considering the probability distribution of fault information, a novel FD strategy is then presented and applied to the FD problem.

SKR
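Considering the process model of (1)-(3) above, the left coprime factorization (LCF) of $H(z)$ is given as follows:

$$H(z) = Q^{-1}(z)\,P(z),$$

where $(P(z), Q(z))$ is called the left coprime pair. A key feature of the LCF under noise-free and fault-free conditions is displayed as

$$\begin{bmatrix} -P(z) & Q(z) \end{bmatrix} \begin{bmatrix} u(z) \\ y(z) \end{bmatrix} = 0,$$

where $\begin{bmatrix} -P(z) & Q(z) \end{bmatrix}$ is denoted as the SKR of (2) and (3) [23]. Therefore, all LTI residual generators can be parameterized as

$$r(z) = R(z) \begin{bmatrix} -P(z) & Q(z) \end{bmatrix} \begin{bmatrix} u(z) \\ y(z) \end{bmatrix},$$

with $R(z)$ a stable parameterization matrix.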
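A first-order example makes the kernel property concrete. The sketch below is an illustration under stated assumptions, not the paper's system: it takes a hypothetical scalar plant $x(k+1) = a\,x(k) + b\,u(k)$, $y(k) = x(k)$, for which one left coprime pair is $P(z) = b z^{-1}$ and $Q(z) = 1 - a z^{-1}$. The resulting residual is $r(k) = y(k) - a\,y(k-1) - b\,u(k-1)$, which vanishes for fault-free, noise-free data and reacts to an additive sensor fault.

```python
def simulate(a, b, u, sensor_fault=None):
    """Simulate x(k+1) = a*x(k) + b*u(k), y(k) = x(k) + f(k), starting from x(0) = 0."""
    x, y = 0.0, []
    for k, uk in enumerate(u):
        f = sensor_fault[k] if sensor_fault is not None else 0.0
        y.append(x + f)
        x = a * x + b * uk
    return y


def residual(a, b, u, y):
    """Apply the kernel [-P(z) Q(z)] to [u; y]: r(k) = y(k) - a*y(k-1) - b*u(k-1)."""
    return [y[k] - a * y[k - 1] - b * u[k - 1] for k in range(1, len(y))]


# Hypothetical plant parameters and a square-wave input
a, b = 0.8, 0.5
u = [1.0 if (k // 5) % 2 == 0 else -1.0 for k in range(40)]

# Fault-free run: the residual is zero up to rounding
r_ok = residual(a, b, u, simulate(a, b, u))

# Constant sensor offset of 0.5 from sample 20 onward
fault = [0.0] * 20 + [0.5] * 20
r_fault = residual(a, b, u, simulate(a, b, u, fault))
```

In the faulty run the residual jumps to 0.5 when the offset appears and then settles at $0.5(1 - a) = 0.1$, which shows why small faults can leave only a faint trace in the raw residual.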

Data-Driven Distributed Fault Detection
Given that a sensor network is integrated into the dynamic system, consider the state-space model of the system together with the noisy sensor measurements, where $x(k) \in \mathbb{R}^{k_l}$ and $u(k) \in \mathbb{R}^{k_m}$ denote the system state and the process input; $y_i(k) \in \mathbb{R}^{k_n}$ is the output vector at the i-th sensor sub-block; $f(k) \in \mathbb{R}^{k_f}$ denotes the unknown faults; and $\sigma(k) \sim \mathcal{N}(0, \Sigma_{\sigma})$ and $\tau_i(k) \sim \mathcal{N}(0, \Sigma_{\tau_i})$ represent the Gaussian process and measurement noise, respectively.
Considering that the measurement data $y_i$, $i = 1, \cdots, k_t$, can be sent to a sensor block, a global model is then constructed. To complete the algorithm implementation of SKR, data models are indispensable in the design process [40]. Assuming there exists a data variable $\kappa_s(k) \in \mathbb{R}^{k_\kappa}$, stacked data structures are built from it, where $k$ denotes the sampling instant and $s$ and $N$ are integers. According to the extended models of $H(z)$ in (17) and (18), a data model (23) is derived by iterative computation at each node. In order to remove the unobservable variable $X_k$, (23) is re-modeled using

$$K_{s,i} = \begin{bmatrix} I & 0 \\ G_{s,i} & F_{s,i} \end{bmatrix} \in \mathbb{R}^{(s+1)(k_n + k_m) \times (n + (s+1)k_m)}.$$

When $s$ is large enough, $K_{s,i}$ must have a left null space $K_{s,i}^{\perp}$, which is called the SKR of the system. Owing to the reliability and robustness of the QR algorithm, the data-driven implementation of $K_{s,i}^{\perp}$ can be executed via QR decomposition and SVD, where $s_p$ represents the past moment. In addition, the noise terms can be identified by the proof in [20]. Observe that

$$\begin{bmatrix} R_{2,1} & R_{2,2} \\ R_{3,1} & R_{3,2} \end{bmatrix}$$

and $K_{s,i}$ have the same left null space. It thus holds that

$$K_{s,i}^{\perp} \begin{bmatrix} R_{2,1} & R_{2,2} \\ R_{3,1} & R_{3,2} \end{bmatrix} = 0.$$

In order to identify the residual signal, $K_{s,i,y}^{\perp}$ in (25) needs to be obtained in a data-driven manner. It has been demonstrated in [40] that this is possible, with $K_{s,i,y}^{\perp} = F_{s,i}^{\perp}$. The residual generator is then obtained from these quantities.

Although the residual signal $r_i(k)$ generated by SKR has the advantages of a simple design and a low computational effort, the robustness of its FD results often becomes weak under actual operating conditions. In order to improve the robustness of the SKR framework for FD applications, the probability distribution of the residual signal deserves further investigation. Based on the idea of HD, an approach to evaluate the similarity between two PDFs is introduced into the SKR framework.
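The idea of reading a kernel directly from I/O data, without a model, can be illustrated on a toy case. The sketch below is deliberately simplified and is not the QR/SVD procedure described above: to stay dependency-free it swaps in an ordinary least-squares fit (normal equations solved by Gauss-Jordan elimination), which recovers the same one-dimensional left null space for a hypothetical noise-free scalar plant with window $s = 1$. All names, the plant, and the data are assumptions for illustration.

```python
import random


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def identify_kernel(u, y):
    """Find v such that v . [u(k-1), u(k), y(k-1), y(k)] is ~0 on fault-free data."""
    rows = [[u[k - 1], u[k], y[k - 1]] for k in range(1, len(y))]
    target = [y[k] for k in range(1, len(y))]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    theta = solve(gram, rhs)
    # Normalize the kernel so that the coefficient of y(k) equals one
    return [-theta[0], -theta[1], -theta[2], 1.0]


def residual(v, u, y):
    return [
        v[0] * u[k - 1] + v[1] * u[k] + v[2] * y[k - 1] + v[3] * y[k]
        for k in range(1, len(y))
    ]


# Hypothetical plant, used only to generate fault-free training data
random.seed(0)
a, b = 0.8, 0.5
u = [random.uniform(-1.0, 1.0) for _ in range(300)]
y, x = [], 0.0
for uk in u:
    y.append(x)
    x = a * x + b * uk

v = identify_kernel(u, y)
r_train = residual(v, u, y)
```

For this plant the identified kernel is close to $[-b,\ 0,\ -a,\ 1]$, so the residual generator has been obtained from data alone. With noisy, multivariable data, a QR/SVD-based null-space computation as described in the text is the more robust choice.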
Considering that the above process noise and measurement noise obey Gaussian distributions, the residual signal satisfies $r_i \sim \mathcal{N}(0, \Lambda_i^{2})$ in the normal (fault-free) historical dataset. In addition, for the actual fault dataset, the residual signal satisfies $\hat{r}_i \sim \mathcal{N}(\theta_i, \hat{\Lambda}_i^{2})$. Therefore, an HD metric (33) for the residual signal $r_i$ at the i-th sensor block can be formed from $f(r_i)$ and $f(\hat{r}_i)$, which denote the PDFs of $r_i$ and $\hat{r}_i$, respectively. According to Lemma 1, (33) can be rewritten in the closed form (34). Based on the proposed FD algorithm, the $T^{2}$ statistic (35) at each sensor block is built from the following quantities: $\hat{h}_i^{2} \in \mathbb{R}^{k_n}$ is obtained by (34) under the actual fault dataset; $\bar{h}_i^{2} \in \mathbb{R}^{k_n}$ denotes the mean term of $h_i^{2}$ under the normal historical dataset; and $\Psi$ is the covariance matrix of $\hat{h}_i^{2} - \bar{h}_i^{2}$.
In order to implement distributed FD, each sensor node needs to evaluate an identical $T^{2}$ test statistic. For this purpose, the average consensus technique is introduced in this framework. The consensus algorithm is applied to the local HD-based quantity at each node, where $s$ denotes the iteration number and the initial value is the locally computed quantity. Furthermore, the iterates converge to the average over all nodes as $s \to \infty$ [5]. As the algorithm runs until convergence, the consensus result can be obtained at each block. Based on the consensus techniques above, an identical quantity is obtained at each node. Therefore, (35) can be rewritten with $\varphi$ denoting the covariance matrix of the consensus quantity. As a result, the $T^{2}$ statistic can be evaluated in parallel at each block. When the amount of data is sufficient, the $T^{2}$ statistic obeys a chi-square distribution with $k_n$ degrees of freedom, $T^{2} \sim \chi^{2}(k_n)$. Specifically, $\chi_{\beta}^{2}(k_n)$ is determined from a $\chi^{2}$ distribution with $k_n$ degrees of freedom such that the probability of exceeding it equals $\beta$. Based on this, the fault detection threshold can be calculated as $\chi_{\beta}^{2}(k_n)$, where $k_n$ is the dimension of the residual data and $\beta$ is the significance level (acceptable false alarm rate).
In addition, the FD logic for each node is as follows: a fault is declared when the $T^{2}$ statistic exceeds the threshold, and the process is considered fault-free otherwise. In summary, with the help of the SKR framework, Hellinger distance, and average consensus algorithm, the distributed FD scheme is given in Algorithms 1 and 2. In addition, the flow chart of the proposed FD algorithm is shown in Figure 2.
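The evaluation step (a quadratic-form $T^{2}$ statistic compared against a chi-square threshold) can be sketched as follows. This is a hedged illustration, not the authors' code: the numeric values are invented, the function names are hypothetical, the centered quadratic form is an assumed reading of the statistic, and the threshold uses the Wilson-Hilferty approximation of the chi-square quantile so that only the standard library is needed. An exact quantile function from a statistics package would normally be preferred.

```python
from statistics import NormalDist


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def t2_statistic(h_hat, h_bar, cov):
    """T^2 = d^T cov^{-1} d with d = h_hat - h_bar (assumed form of the statistic)."""
    d = [x - m for x, m in zip(h_hat, h_bar)]
    w = solve(cov, d)
    return sum(di * wi for di, wi in zip(d, w))


def chi2_threshold(dof, beta):
    """Approximate value exceeded with probability beta (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - beta)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


# Hypothetical HD-based quantities for a 2-dimensional residual
h_hat = [0.30, 0.10]   # computed on the monitored data
h_bar = [0.05, 0.04]   # mean over fault-free historical data
cov = [[0.004, 0.001], [0.001, 0.002]]

t2 = t2_statistic(h_hat, h_bar, cov)
threshold = chi2_threshold(dof=2, beta=0.05)
fault_detected = t2 > threshold
```

Since every node holds the same consensus quantity, each one can run this check locally and reach the same decision without a fusion center.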

Facility Description
In this study, a data set obtained from a multiphase flow plant [35] is used to validate the proposed FD algorithm. The multiphase flow plant can achieve gas-liquid separation at a given flow rate. This device takes into account various working conditions during operation, so it can generate abundant process data from different operating conditions. In addition, the generated process data contain dynamic behavior introduced by changing the set point of the flow rate. The plant is depicted in Figure 3, and its schematic diagram is presented in Figure 4. Specifically, the device comprises geometrically designed pipes and a 1.2 m high liquid-gas splitter. It is capable of providing separate air, oil, and water, as well as mixtures of these fluids. During the operation of the plant, the mixtures are split in a horizontal splitter. The air is returned to the environment, while the water-oil mixture is separated and the liquids are returned to their respective tanks (T100 and T200). The water coalescers ensure complete separation of oil and water before they return to these tanks. The flow conditions of air, water, and oil can be regulated by control valves, which can be operated continuously between fully closed and fully open. In terms of sensor distribution, there are sensors measuring pressure at the air delivery line (PT417) and inside the three-phase splitter (PT501). Other sensors are located at the water delivery line (FIC101, FT102), at the bottom of the two-phase splitter (FT406), at the top of the two-phase splitter (FT404), at the top of the three-phase splitter (PIC501), at the air delivery line (FIC302, FT302), at the top of the water tank (LI101), at the bottom of the three-phase splitter (LI502), and at the top of the water coalescer (LI503).

Fault Injection and Distributed Fault Detection
In order to gather the necessary historical data for this experiment, a SCADA platform is used at a sampling rate of 1 Hz. The data parameters utilized in this validation are outlined in Table 1. A communication topology of the sensor network is represented in Figure 5. To evaluate the effectiveness of the proposed method, the off-line part is first executed. The input flow rate of the training dataset is shown in Figure 6. Then, two typical faults are used to verify FD performance. The first fault scenario involves an incipient fault that arises due to the obstruction of the top separator input, leading to the shutdown of VC404 between 1136 s and 8352 s. The input situation of the faulty dataset is depicted in Figure 7. The residual signal of fault 1 and the HD-based $h^{2}$ statistic at node 1 are shown in Figure 8. The residual signal $r_1(k)$ in Figure 8 indicates the overall trend of the system. However, when the fault amplitude is low, the residual signal is often not effective in capturing abnormal cases. The blue curve in Figure 8 does not change significantly after the fault occurs.
To address this problem, an HD-based metric is implemented in the proposed distributed SKR framework to enhance the sensitivity of the residual signal. Even when the fault intensity is minor, the probability distribution of the residual changes remarkably. As a result, the sensitivity of FD is significantly increased by measuring the HD of the residual signal. The green curve in Figure 8 indicates that the HD-based $h^{2}$ statistic changes significantly after the fault occurs. The performance evaluation focuses on two key metrics: the Missed Alarm Rate (MAR) and the False Alarm Rate (FAR). In this study, the consensus algorithm is embedded in the information interaction among sensor blocks. Each node in the sensor network executes the FD algorithm through average consensus techniques. As a result of consensus, each sensor node obtains identical FD performance. The distributed FD diagrams for fault 1 are displayed in Figures 9 and 10. J(2) and J(8) in Figures 9 and 10 represent the distributed FD results at node 2 and node 8, respectively. In terms of performance metrics, the MAR is 0.0323 and the FAR is 0.0391 at node 2. In addition, the MAR is also 0.0323 and the FAR is also 0.0391 at node 8.

The second fault scenario is an intermittent fault, also known as segment plugging in practical engineering terms. This type of fault commonly occurs in the riser of multiphase flow when the flow rate of liquid and gas is low. The fault was introduced by deliberately reducing the air and water flow rates to levels that induce plugging. In this dataset, two plugging conditions were introduced and eliminated, from 686 s to 1172 s and from 1772 s to 2253 s, during the experiment. Specifically, the plugging fault was first formed at 686 s by continuously reducing the flow rate of air and water. The plugging fault was then removed at 1172 s when the air flow rate was gradually increased.
Additionally, the plugging fault was introduced again from 1772 s to 2253 s by changing the input flow rate. The input situation and detection results for fault 2 are shown in Figures 11-13, respectively. The calculated MAR and FAR at node 1 and node 5 are 0.0610 and 0.0454, respectively.
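The two performance metrics can be computed directly from per-sample alarm decisions and ground-truth fault labels. The sketch below assumes the common definitions (MAR as the fraction of faulty samples without an alarm, FAR as the fraction of fault-free samples with an alarm); the function name and the toy sequences are hypothetical and are not taken from the experiment.

```python
def alarm_rates(alarms, faulty):
    """Return (MAR, FAR) from equal-length per-sample boolean sequences."""
    missed = sum(1 for a, f in zip(alarms, faulty) if f and not a)
    false_alarms = sum(1 for a, f in zip(alarms, faulty) if a and not f)
    n_fault = sum(1 for f in faulty if f)
    n_normal = len(faulty) - n_fault
    mar = missed / n_fault if n_fault else 0.0
    far = false_alarms / n_normal if n_normal else 0.0
    return mar, far


# Toy example: 10 samples, fault active during samples 4..7
faulty = [False] * 4 + [True] * 4 + [False] * 2
alarms = [False, True, False, False, False, True, True, True, False, False]
mar, far = alarm_rates(alarms, faulty)
```

In this toy run one of the four faulty samples is missed and one of the six fault-free samples raises an alarm.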

Comparison Results
In order to show the enhanced FD performance, Table 2 provides four sets of evaluations using MARs and FARs as assessment indicators. In Table 2, traditional SKR and dynamic principal component analysis are centralized designs; the distributed CCA and the proposed scheme are distributed designs.
According to the performance indicators in Table 2, both the FAR and the MAR of the proposed algorithm are significantly lower than those of the other FD algorithms. This indicates that the accuracy and effectiveness of the proposed algorithm are better than those of the traditional SKR algorithm and the other comparison algorithms. The excellent FD performance is mainly the result of the introduction of the Hellinger distance in the SKR framework. Specifically, the Hellinger distance is introduced into the traditional SKR framework to further analyze the fault features of the residual signal. Since the Hellinger distance can accurately measure the difference between two probability distributions, the proposed algorithm is more sensitive to the fault information of the residual signal. As a result, the proposed algorithm can capture the fault information in the residual signal more effectively, which further improves the reliability and accuracy of the FD algorithm.

Conclusions
This study proposes a novel distributed FD method by introducing the Hellinger distance and the average consensus algorithm into the SKR framework. The proposed algorithm has the following three main advantages over existing FD methods. First, this study introduces the Hellinger distance into the traditional SKR framework to further analyze the fault features of the residual signal. Since the Hellinger distance can accurately measure the difference between two probability distributions, the proposed algorithm is more sensitive to the fault information of the residual signal. In addition, the consensus algorithm is embedded in the information interaction among sensor blocks. Based on this idea, each block can obtain FD results without performing global fusion operations. Finally, the proposed algorithm can identify noise terms and the residual signals directly from the process data. It has superior flexibility in the design of the detection framework, particularly when the system models are not accurately obtained. The accuracy and validity of the proposed FD algorithm have been verified via a multiphase flow facility. In addition, fault-tolerant control based on data-driven SKR remains an open problem, and it has the potential to avoid the complex design of control systems. Based on this study, distributed FD with external disturbances and fault-tolerant control based on data-driven SKR will be explored in our future work.