Structural Methodologies for Distributed Fault Detection and Isolation

The increasing complexity and size of cyber-physical systems (e.g., aircraft, manufacturing processes, and power generation plants) is making it hard to develop centralized diagnosers that are reliable and efficient. In addition, advances in networking technology, along with the availability of inexpensive sensors and processors, are causing a shift in focus from centralized to more distributed diagnosers. This paper develops two structural approaches for distributed fault detection and isolation. The first method uses redundant equation sets for residual generation, referred to as minimal structurally-over-determined sets, and the second is based on the original model equations. We compare the diagnosis performance of the two algorithms and clarify the pros and cons of each method. A case study is used to demonstrate the two methods, and the results are discussed together with directions for future work.


Introduction
As the complexity of industrial systems grows, system monitoring and fault diagnosis systems are becoming essential to assure system reliability and functional safety; see for example [1,2] and the references therein. Safety-critical systems must detect and isolate faults quickly and reliably to enable effective safety maneuvers and fault-tolerant control so as not to endanger operations and human lives [3].
Traditional approaches have focused on designing centralized diagnosers for complex systems, e.g., the Aircraft Diagnostic and Maintenance Systems (ADMS) used on modern aircraft systems [4,5]. However, since many industrial applications involve large dynamic systems with many subsystems, distributed approaches to fault detection and isolation are becoming necessary for a number of reasons [6]. Centralized diagnosers are less reliable because they create a single point of failure, and designing centralized diagnosers for large, complex systems may become computationally intractable. Transferring sensor data from the distributed subsystems to a central fault diagnosis unit can become error prone, for example because of packet losses and networking delays, which can then affect the accuracy and timeliness of diagnosis decisions [7]. Furthermore, from a practical point of view, different subsystems are designed by different manufacturers, who may not be willing to pass along all of their knowledge of the subsystems to the system integrator, in order to protect their intellectual property. This makes it difficult for the system integrator to design centralized diagnosers, since they do not have access to the subsystem models.
A number of approaches have been developed for distributed fault detection and isolation in discrete event systems. In the simplest case, a group of distributed fault detection and isolation approaches considers each subsystem as a node that reports its state as "OK" or "faulty" without providing any details of the nature of the fault and how it was inferred. This approach is prevalent for wireless sensor network [8] and computer network [9] diagnosis. For most systems that exhibit hybrid and continuous behaviors, distributed fault detection and isolation is more complicated. In these systems, a subsystem has several components, and a fault could occur in a sensor, actuator, or other component in the subsystem. Therefore, it is not enough to simply declare a subsystem as "OK" or "faulty," since the isolation of component faults requires deeper reasoning processes. Shames et al. [10] used a bank of unknown input observers for distributed Fault Detection and Isolation (FDI) in time-invariant linear systems. A three-layer distributed diagnosis architecture design was proposed in [11]. The work in [12] proposed a distributed residual generation and computation approach for distributed diagnosis.
To achieve accurate fault detection and isolation of a known set of potential faults in a distributed framework, subsystems may have to share data so that the necessary residuals may be derived for fault isolation. Roychoudhury et al. [13] developed an algorithm that searches for the minimal number of additional external measurements to add to each local diagnoser in order to make all faults detectable and isolable in that subsystem. Daigle et al. [14] used a similar approach for distributed fault detection and isolation in mobile robots. Bregon et al. [15] used breadth-first search to find the minimum number of measurements to add to each subsystem to make all the faults detectable and isolable by using information from all local diagnosers. The algorithm guarantees minimum communication among subsystems; however, it is exponential in the number of system measurements. To address this problem, we have proposed a greedy search algorithm that is computationally efficient, but suboptimal. Ferrari et al. [7,16] proposed a similar robust distributed fault detection and identification approach, but they did not address the problem of determining the minimum number of required shared variables between the subsystems.
In this paper, we propose two general approaches for designing a set of distributed diagnosers that together have the same diagnosability performance as centralized approaches: (1) a Minimal Structurally-Over-determined (MSO)-based approach and (2) an equation-based approach. Some of our previous work [17,18] presented an initial version of this distributed diagnosis approach together with first results. In this paper, we present in more detail the problem formulations, the proposed algorithms, and the accompanying proofs for the hypotheses on which these algorithms are based. We compare the computational complexities of these algorithms and their application to a testbed: the spacecraft electrical power distribution system [19]. The advantages and disadvantages of each algorithm are discussed to help health monitoring engineers select the proper approach for a given application.
The first approach uses Minimal Structurally-Over-determined (MSO) set selection [20] and provides globally correct diagnosis results while minimizing the number of measurements shared between different subsystems. Each MSO set used for residual generation represents an analytical redundancy relation in the system [20,21]; however, the total number of MSO sets is exponential in the number of system measurements. To avoid the computational complexity of dealing with a large number of MSO sets, we propose a second algorithm for designing distributed diagnosers that is based on the system equations. This algorithm is computationally efficient, and its solution matches the diagnosability capabilities of a centralized diagnoser. Moreover, the equation-based method does not require access to the global model for diagnoser design, which makes it applicable to large, complex systems, where global system models are likely to be unavailable or unknown.
The rest of this paper is organized as follows. Section 2 presents basic definitions and the running example we use in the paper. Section 3 presents the MSO-based distributed diagnosis approach, and Section 4 presents the equation-based approach. Section 5 discusses the case study, and Section 6 presents the advantages and disadvantages of each approach along with directions for future work.

Basic Definitions and Running Example
This section introduces the basic concepts associated with the distributed diagnosis of dynamic systems.

Definition 1 (System model).
A system model S is a four-tuple: (V, M, E, F), where V is the set of variables, M is the set of measurements, E is the set of equations, and F is the set of system faults.
It is assumed that the sets $V$ and $E$ are sufficient to define the behavior of the system. The system $S$ is partitioned into $k$ subsystems, $S_1, S_2, \ldots, S_k$, where each subsystem model is defined as follows.

Definition 2 (Subsystem model).
A subsystem model $S_i$ ($1 \le i \le k$) associated with a system model, $S$, is also a four-tuple: $(V_i, M_i, E_i, F_i)$, where $V_i \subseteq V$, $M_i \subseteq M$, $E_i \subseteq E$, and $F_i \subseteq F$.

We note that a variable can be shared between two or more subsystems, which describes the connection between the different subsystems.
Example 1. A four-tank system is used as a running example in the paper; see Figure 1. The system is assumed to be divided into four non-overlapping subsystems, where each subsystem consists of one tank and the outlet pipe to its right. Two of the subsystems, 1 and 3, also have external inflows into their tanks. Associated with each subsystem is a set of measurements, $\{y_1, y_2, \ldots, y_6\}$, that are shown as encircled variables in the figure. The first subsystem, $S_1$, in the running example is described by the set of equations (1), where $E_1 = \{e_1, e_2, e_3, e_4, e_5, e_6\}$ defines the set of equations; $V_1 = \{\dot{p}_1, p_1, p_2, q_{in1}, q_1\}$ defines the set of subsystem unknown variables; $M_1 = \{u_1, y_1, y_2\}$ defines the set of subsystem known variables (measurements); and $F_1 = \{f_1, f_2\}$ defines the set of faults associated with this subsystem model. It is assumed that the system parameters ($C_{T1}$ and $R_{P1}$ for the first subsystem) are known. This model representation is used to emphasize the model structure, which is useful, for example, when analyzing structural fault detectability and isolability properties [2]. Similarly, the three other subsystem models are defined by the equations in (2).

In the equations, $p_i$ represents the pressure in tank $i$, and $q_i$ represents the liquid flow through the connecting pipe associated with the adjoining tanks. $q_{ini}$ represents the inflow into tank $i$. The capacity of tank $i$ is represented as $C_{Ti}$, and the pipe resistance is given by $R_{Pi}$. The fault parameters are modeled by $f_i$. Fault $f_1$ represents a leak in Tank 1; $f_2$ represents a clog in the connecting pipe to the right of Tank 1; $f_3$ represents a leak in Tank 2; $f_4$ represents a clog in the connecting pipe to the right of Tank 2; $f_5$ represents a clog in the connecting pipe to the right of Tank 3; and $f_6$ represents a leak in Tank 4.
The subsystem equations as described in Example 1 take on a general form; for example, they may be expressed as state space equations, implicit differential equations, etc. The following definitions describe connections between subsystems.

Definition 3 (First order connected subsystems).
Two subsystems, $S_i$ and $S_j$, are first order connected if and only if they have at least one shared variable.

Definition 4 ($i$th order connected subsystems).
Two subsystems, $S_k$ and $S_j$, are $i$th order connected if there exists a subsystem model $S_m$ that is $(i-1)$th order connected to $S_k$ and is first order connected to $S_j$, or $S_m$ is $(i-1)$th order connected to $S_j$ and is first order connected to $S_k$.
Example 2. In the four-tank example, subsystems $S_1$ and $S_2$ are first order connected, and their shared variables are $V_1 \cap V_2 = \{p_2, q_1\}$. Similarly, $S_1$ and $S_3$ are second order connected because both of them are first order connected to $S_2$.
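Definitions 3 and 4 amount to a shortest-path search on the graph whose nodes are subsystems and whose edges join subsystems that share a variable. The sketch below illustrates this with a breadth-first search. Only $V_1$ is taken from Example 1; the other variable sets, the name `dp1` for $\dot{p}_1$, and the function `connection_order` are illustrative assumptions chosen to reproduce the sharing pattern of the four-tank example.

```python
from collections import deque

# V1 follows Example 1; V2, V3, V4 are hypothetical placeholders that only
# reproduce which neighbouring subsystems share variables.
subsystem_vars = {
    "S1": {"dp1", "p1", "p2", "qin1", "q1"},
    "S2": {"dp2", "p2", "p3", "q1", "q2"},
    "S3": {"dp3", "p3", "p4", "q2", "q3"},
    "S4": {"dp4", "p4", "q3", "q4"},
}


def connection_order(variables, source, target):
    """Smallest i such that source and target are i-th order connected.

    Returns 0 if source == target and None if they are not connected.
    """
    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        name, order = frontier.popleft()
        if name == target:
            return order
        for other, other_vars in variables.items():
            # First order connected: at least one shared variable.
            if other not in seen and variables[name] & other_vars:
                seen.add(other)
                frontier.append((other, order + 1))
    return None


print(connection_order(subsystem_vars, "S1", "S2"))  # 1
print(connection_order(subsystem_vars, "S1", "S3"))  # 2
```

Under these assumed variable sets, the search returns the first and second order connections stated in Example 2.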
In this paper, MSO sets are used as the primary approach for FDI and are defined as follows [20].

Definition 5 (Structurally over-determined set).
Consider a set of equations and its associated variables, measurements, and faults: $(E, V, M, F)$. This set of equations is structurally over-determined (SO) if the cardinality of the set $E$ is greater than the cardinality of the set $V$, i.e., $|E| > |V|$.

Definition 6 (Minimal Structurally-Over-determined (MSO) set).
A set of over-determined equations is minimal structurally-over-determined if it has no proper subset of structurally-over-determined equations.
The MSO sets are minimal sets of equations that can be used to generate residuals, for example by using the Fault Diagnosis Toolbox developed by Frisk and Krysander [22]. MSO sets represent redundant equation sets that capture the redundancies in the system: $MSO_l = (E_l, V_l, M_l, F_l)$. For example, $MSO_{11} = (E_{11}, V_{11}, M_{11}, F_{11})$, where $E_{11} = \{e_1, e_3, e_4, e_5, e_6\}$, $V_{11} = \{\dot{p}_1, p_1, q_{in1}, q_1\}$, $M_{11} = \{u_1, y_1, y_2\}$, and $F_{11} = \{f_1\}$, represents an MSO set in subsystem $S_1$, see (1), of our running example. For brevity, we simply say that a specific equation, variable, measurement, or fault is a member of an MSO set in the rest of the paper, e.g., $f \in MSO_l$. Each MSO set represents a part of the system model that can be used to design a residual that is only sensitive to certain faults. A collection of MSO sets can be used to generate residuals that together can isolate a set of faults.
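To make Definitions 5 and 6 concrete, the following sketch enumerates MSO sets by exhaustive search over equation subsets. It is only meant for very small models, since the number of subsets grows exponentially; dedicated tools such as the toolbox in [22] use far more efficient algorithms. The `structure` dictionary (which unknown variables appear in which equation of $S_1$) is an assumption inferred from the sets $E_{11}$ and $V_{11}$ quoted above, with `dp1` standing for $\dot{p}_1$; the function names are illustrative.

```python
from itertools import combinations

# Assumed structural model of subsystem S1: equation -> unknown variables.
# Inferred from the text for illustration; not the paper's equations (1).
structure = {
    "e1": {"dp1", "qin1", "q1"},
    "e2": {"q1", "p1", "p2"},
    "e3": {"dp1", "p1"},
    "e4": {"qin1"},
    "e5": {"p1"},
    "e6": {"q1"},
}


def is_so(equations, structure):
    """Definition 5: more equations than unknown variables."""
    unknowns = set().union(*(structure[e] for e in equations))
    return len(equations) > len(unknowns)


def mso_sets(structure):
    """Definition 6 by brute force: SO sets without an SO proper subset."""
    found = []
    names = sorted(structure)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            candidate = frozenset(subset)
            # Sizes grow, so any SO proper subset would contain an MSO set
            # that has already been recorded in `found`.
            if is_so(candidate, structure) and not any(m < candidate for m in found):
                found.append(candidate)
    return found


for mso in mso_sets(structure):
    print(sorted(mso))  # ['e1', 'e3', 'e4', 'e5', 'e6']
```

With this assumed structure, the only MSO set found in $S_1$ is the equation set $E_{11}$ of $MSO_{11}$; equation $e_2$ is in no MSO set because $p_2$ appears in no other equation of the subsystem.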
To discuss the fault detectability and isolability properties of the global system and its subsystems, we define global and local fault detectability and isolability as follows.
Definition 7 (Globally-detectable fault).
A fault $f \in F$ is globally detectable in system $S$ if there is a minimal structurally-over-determined set $MSO_l$ in the system, such that $f \in MSO_l$.

Definition 8 (Locally-detectable fault).
A fault $f \in F_i$ is locally detectable in subsystem $S_i$ if there is a minimal structurally-over-determined set $MSO_l$ in the subsystem, such that $f \in MSO_l$.

Example 3. Fault $f_1$ in (1) is locally detectable because $f_1 \in MSO_{11}$. However, $f_2$ is not locally detectable because there is no MSO set in this subsystem that includes $f_2$. To detect $f_2$ locally, the diagnosis subsystem requires additional measurements.

Definition 9 (Globally-isolable fault).
A fault $f_i \in F$ is globally isolable from fault $f_j \in F$ if there exists a minimal structurally-over-determined set $MSO_i$ in the system $S$, such that $f_i \in MSO_i$ and $f_j \notin MSO_i$.

Definition 10 (Locally-isolable fault).
A fault $f_i \in F_i$ is locally isolable from fault $f_j \in F$ if there exists a minimal structurally-over-determined set $MSO_i$ in subsystem $S_i$, such that $f_i \in MSO_i$ and $f_j \notin MSO_i$.
Note that if a fault $f_j$ is locally detectable in a subsystem $S_i$, it is globally detectable as well, and if a fault $f_j$ is locally isolable from a fault $f_k$, it is globally isolable from $f_k$ as well.
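The membership tests in these definitions can be sketched in a few lines. The fault signatures in `mso_faults` are hypothetical and do not come from the four-tank model; they are chosen to show that isolability is not symmetric: a fault can be isolable from another fault without the converse being true.

```python
# Hypothetical fault signatures: the faults contained in each MSO set.
mso_faults = {
    "MSO_a": {"f1"},
    "MSO_b": {"f1", "f2"},
    "MSO_c": {"f3"},
}


def detectable(fault, mso_faults):
    """A fault is detectable if some MSO set contains it."""
    return any(fault in faults for faults in mso_faults.values())


def isolable(fi, fj, mso_faults):
    """fi is isolable from fj if some MSO set contains fi but not fj."""
    return any(fi in faults and fj not in faults for faults in mso_faults.values())


print(detectable("f2", mso_faults))      # True
print(isolable("f1", "f2", mso_faults))  # True, via MSO_a
print(isolable("f2", "f1", mso_faults))  # False, f2 only appears with f1
```

Whether the check is local or global depends only on which MSO sets are passed in: those of a single subsystem or those of the whole system.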

Problem Formulation
The objective in this work is to design a set of distributed diagnosers that together have the same diagnosability as a centralized diagnoser. This means that a distributed approach should detect any fault that is globally detectable and isolate any pair of faults that are globally isolable. In the ideal case, there are enough MSO sets in each subsystem to detect and isolate all of its faults, $F_i$. In that case, no exchange of information is necessary between the different diagnosers. If the independence among diagnosers does not hold, the different subsystems need to share some sensor data with each other to be able to detect and isolate the faults.
To address this problem in an efficient way, an integrated approach is derived to select a set of MSO sets for each subsystem that guarantees full diagnosability and minimum exchange of measurements among subsystems. The general idea is to augment each subsystem with additional measurements that are typically acquired from the (nearest) neighbors of the subsystem, such that all of the faults associated with the extended subsystem model are detectable and isolable. In the worst case, all of the measurements from another subsystem may have to be included to make the current subsystem diagnosable. When such a situation occurs, we say the two subsystems are merged and represented by a common diagnoser; therefore, the total number of independent distributed diagnosers may be less than $k$.
Let $MSO = \{MSO_1, MSO_2, \ldots, MSO_r\}$ denote the set of candidate MSO sets for the system $S$. For each subsystem $S_i$, the objective is to develop an algorithm to select a subset of MSO sets that guarantees maximal structural detectability and isolability for the faults $F_i$ associated with the subsystem, while using a minimum number of measurements from the other subsystems in the system to assure the equivalence of local and global diagnosability, i.e.,

$$\min_{M_o \subseteq M \setminus M_i} |M_o| \quad \text{subject to} \quad D_i(M_o \cup M_i) = D_i(M), \quad I_i(M_o \cup M_i) = I_i(M), \tag{3}$$

where $M_o$ represents the set of measurements we need to communicate to the subsystem $S_i$, and $M_o \cup M_i$ represents the set of measurements that subsystem $S_i$ will use to diagnose all faults associated with it. $M$ represents the set of all measurements in the system. For a given set of measurements, $X$, $D_i(X) \subseteq F_i$ represents the set of detectable faults, and $I_i(X) \subseteq F_i$ represents the set of isolable faults in $F_i$, with $F_i \subseteq F$.

MSO Set Selection for Distributed Fault Detection
For the situation in which the global model is known, $M = \{m_1, m_2, \ldots, m_l\}$ in Equation (3) is the set of all system measurements. Let us assume we can generate $r$ MSO sets given $M$: $MSO = \{MSO_1, MSO_2, \ldots, MSO_r\}$.
Our goal is to design an algorithm that selects $MSO_i \subseteq MSO$ in a way that requires a minimum number of additional measurements $M_o \subseteq M$, $M_i \cap M_o = \emptyset$, i.e., measurements from the system not belonging to subsystem $S_i$, to make all its faults globally detectable and isolable, if possible. Note that this is equivalent to the set-covering problem, which is NP-complete. In the past, heuristic search methods have been adopted for solving this problem; for example, a Temporal Causal Graph (TCG) approach was used in [13]. In this paper, the problem is formulated as a Binary Integer Linear Programming (BILP) problem [23]:

$$\min_{x} \; c^{T} x \quad \text{subject to} \quad A x \le b, \quad x_b \subseteq x, \quad x_k \in \{0, 1\} \;\; \forall x_k \in x_b, \tag{4}$$

where vector $c$ represents the cost weights, matrix $A$ and vector $b$ define linear constraints, $x$ represents the variables, $x_b$ represents the binary variables, and $x_k \in x_b$ represents a scalar binary variable [24].
There are several tools available for solving this problem, e.g., branch and bound algorithms [25] and branch and cut algorithms [26] (see, for example, the MATLAB mixed-integer linear programming documentation at http://www.mathworks.com/help/optim/ug/mixedinteger-linear-programming-algorithms.html).
To formulate Problem (3) as a BILP problem, a binary variable $x(k)$, $1 \le k \le l$, is defined for each measurement $m_k$ in the system; it indicates whether $m_k$ belongs to $M_o$, where $M_o$ is the answer to Problem (3). An additional binary variable $x(k + l)$, $1 \le k \le r$, is defined for each MSO set $MSO_k$ in the system; it indicates whether $MSO_k$ is selected. The objective of minimizing the number of measurements used from the other subsystems is encoded in the cost vector $c$ of (4), where $l$ is the number of system measurements and $r$ is the number of MSO sets in the system. To determine the set of measurements and the selected MSO sets in each distributed diagnoser, a BILP problem with $l + r$ binary variables has to be solved for the subsystem.
Consider subsystem $S_i$ with local faults $F_i$ and the set of system faults, $F$. Each local fault $f_j \in F_i$ has to be locally detectable. Given Definition 8, local detectability of all the faults $f_j \in F_i$ is achieved by adding one constraint row per local fault to the optimization problem (4):

$$A(j, l + k) = \begin{cases} -1 & \text{if } f_j \in MSO_k, \\ 0 & \text{otherwise.} \end{cases}$$

By considering $b(j) = -1$ for $1 \le j \le g$, where $g$ is the number of faults in $F_i$, the solution will contain at least one MSO set to detect each fault.
To guarantee isolability, a further set of constraints is added for each pair of a local fault $f_j \in F_i$ and another system fault in $F$. With $h$ the number of faults in the system, $h = |F|$, these constraints make sure that there is at least one MSO set to isolate each of the subsystem faults from the other faults in the system. Using an MSO set is equivalent to using the measurements that are included in the MSO set. For example, using $MSO_{11}$ from the water tank example in a subsystem diagnoser requires the three measurements $M_{11} = \{u_1, y_1, y_2\}$ to be transmitted to that subsystem. Therefore, a set of constraints is included that captures the relationship between the measurements and the MSO sets in the distributed diagnosis system.
Equation (10) represents this set of constraints in the $A$ matrix, where $|M_j|$ is the cardinality of the set of measurements in $MSO_j$ and $|M|$ is the number of measurements in the system. Furthermore, $b(j) = 0$ for $g \cdot h < j \le g \cdot h + r$, where $r$ is the number of MSO sets in the system.
Example 4. The water tank system has 165 MSO sets. The entire system includes eight measurements, of which subsystem $S_1$ includes three. $S_1$ has two faults of interest, and the goal is to be able to isolate them from the other six faults in the complete system. For the optimization problem (4) for $S_1$, matrix $A \in \mathbb{R}^{177 \times 173}$, i.e., it has 177 rows (two local detectability constraints, 10 local isolability constraints, and 165 constraints that capture the relationship between the MSO sets and the measurements) and 173 columns corresponding to the eight measurements and the 165 MSO sets. Table 1 shows the set of measurements to add for each of the subsystem diagnosers to achieve maximum possible detectability and isolability. To find the optimum measurements, the optimization problem (4) has been solved for each subsystem. Subsystem $S_1$ initially has three measurements, $M_1 = \{u_1, y_1, y_2\}$. To achieve global diagnosability for its faults, $y_3$ must be shared with its diagnoser from subsystem $S_2$. Subsystem $S_2$ is the only subsystem that shares a variable with a second order connected subsystem. All the other subsystems only need to communicate with their first order connected subsystems.
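The selection problem can be illustrated on a toy instance. The sketch below does not call a BILP solver; it simply enumerates every subset of candidate MSO sets, which is the same search space of size $2^r$, and keeps the feasible selection that needs the fewest external measurements. All data in `candidates`, `local_faults`, and `system_faults` are hypothetical and much smaller than the 165 MSO sets of the four-tank system; the function names are illustrative.

```python
from itertools import combinations

# Hypothetical toy instance for one subsystem diagnoser.
local_measurements = {"u1", "y1", "y2"}
local_faults = ["f1", "f2"]
system_faults = ["f1", "f2", "f3"]
candidates = {  # MSO name -> (measurements it uses, faults it contains)
    "MSO_a": ({"u1", "y1", "y2"}, {"f1"}),
    "MSO_b": ({"y1", "y2", "y3"}, {"f2"}),
    "MSO_c": ({"u1", "y1", "y3", "y4"}, {"f1", "f2"}),
}


def feasible(selected):
    """Detectability and isolability constraints for the local faults."""
    fault_sets = [candidates[name][1] for name in selected]
    for f in local_faults:
        if not any(f in fs for fs in fault_sets):
            return False  # f is not detectable with this selection
        for other in system_faults:
            if other != f and not any(f in fs and other not in fs for fs in fault_sets):
                return False  # f is not isolable from `other`
    return True


def select_mso_sets():
    """Exhaustive search: feasible selection with fewest external measurements."""
    best = None
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for selected in combinations(names, size):
            if not feasible(selected):
                continue
            used = set().union(*(candidates[n][0] for n in selected))
            external = used - local_measurements
            if best is None or len(external) < len(best[1]):
                best = (selected, external)
    return best


print(select_mso_sets())  # (('MSO_a', 'MSO_b'), {'y3'})
```

In this toy instance a single external measurement suffices, which mirrors the kind of result reported in Table 1; a BILP solver explores the same space with bounding instead of full enumeration.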
In general, the worst case scenario for a system with connected subsystems will typically require a large number of measurements from other subsystems to be communicated to each subsystem diagnoser.In those situations, distributed subsystem diagnosers help overcome the single point of failure problem, but each subsystem diagnoser may require a large number of measurements to be communicated to it from all of the other subsystems.
Even though there are efficient algorithms to solve BILP problems, this approach will not scale up for larger systems, since the search space is exponential in the number of MSO sets, $2^r$, even if the subsystem diagnoser design is performed off-line. In addition to the computational complexity, the availability of global models for large, complex systems is unlikely because of the issues discussed in Section 1. To overcome this problem, a heuristic search strategy is proposed based on an incremental search algorithm that works with the original system equations instead of MSO sets.

Equation-Based Distributed Fault Detection and Isolation
To avoid the computational complexity of the MSO-based algorithm in the previous section, a distributed diagnosis method that works directly with the system equations is proposed. To recap from earlier work [27], a structural model representation is used. A structural model describes which variables are included in which model equations. A useful tool is the Dulmage-Mendelsohn (DM) decomposition, which decomposes a system model into three parts: (1) under-determined, (2) exactly determined, and (3) over-determined. The over-determined part introduces redundancy in the system description and forms the basis for fault detection and isolation [2]. Figure 2 shows the DM decomposition of subsystem $S_1$ in the running example. The shared variables are shown as encircled variables in the figure. Without loss of generality, it is assumed that every fault parameter is included in exactly one equation, i.e., each fault $f$ appears in one equation $e_f$. This is not a restricting assumption because if a fault is included in more than one equation, we can replace the fault signal by a new variable and add a new equation where the new variable is equal to the fault. Similar to the definitions of detectability and isolability for a structural model in, e.g., [20], local detectability and isolability can be defined in terms of the over-determined part: a fault $f$ is locally detectable in a subsystem if $e_f$ lies in the over-determined part of the subsystem model, and $f$ is locally isolable from a fault $f'$ if $e_f$ lies in the over-determined part of the subsystem model with $e_{f'}$ removed.

In the running example, let $S_{1e10}$ denote subsystem $S_1$ augmented with equation $e_{10}$ from subsystem $S_2$. Note that $e_{10}$ did not add any new unknown variables or faults to the subsystem model. Figure 3 represents the DM decomposition of the augmented subsystem, $S_{1e10}$. This figure shows that $e_2 \in S_{1e10}^{+}$, and therefore, $f_2$ is locally detectable for the augmented subsystem $S_{1e10}$. Figure 4 shows the DM decomposition of $S_{1e10} \setminus e_1$. Equation $e_2$ remains in the over-determined part after $e_1$ is removed; therefore, $f_2$ is locally isolable from $f_1$ in the augmented subsystem.
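The over-determined part used in these tests can be computed from any maximum matching between equations and unknown variables: it consists of the equations reachable from unmatched equations along alternating paths. The sketch below is a minimal illustration of that idea, not a full DM decomposition. The structure of $S_1$ and the assumption that $e_{10}$ contains only the unknown $p_2$ are inferred from the text; `dp1` stands for $\dot{p}_1$, and the function names are illustrative.

```python
def maximum_matching(structure):
    """Augmenting-path bipartite matching; returns {variable: equation}."""
    match = {}

    def augment(eq, visited):
        for var in sorted(structure[eq]):
            if var in visited:
                continue
            visited.add(var)
            if var not in match or augment(match[var], visited):
                match[var] = eq
                return True
        return False

    for eq in sorted(structure):
        augment(eq, set())
    return match


def overdetermined_part(structure):
    """Equations reachable by alternating paths from unmatched equations."""
    match = maximum_matching(structure)
    matched_eqs = set(match.values())
    stack = [e for e in structure if e not in matched_eqs]
    reached = set(stack)
    while stack:
        eq = stack.pop()
        for var in structure[eq]:
            nxt = match[var]  # matched, otherwise the matching is not maximum
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached


# Assumed structure of S1 (equation -> unknown variables).
s1 = {
    "e1": {"dp1", "qin1", "q1"}, "e2": {"q1", "p1", "p2"}, "e3": {"dp1", "p1"},
    "e4": {"qin1"}, "e5": {"p1"}, "e6": {"q1"},
}
s1_e10 = dict(s1, e10={"p2"})                      # S1 augmented with e10
s1_e10_no_e1 = {e: v for e, v in s1_e10.items() if e != "e1"}

print("e2" in overdetermined_part(s1))             # False: f2 not detectable
print("e2" in overdetermined_part(s1_e10))         # True: f2 detectable
print("e2" in overdetermined_part(s1_e10_no_e1))   # True: f2 isolable from f1
```

Under these assumptions, the three checks reproduce the conclusions drawn from Figures 2 to 4.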

Problem Formulation
An equation-based solution approach is formulated for designing a distributed diagnoser. For the given set of subsystems $S_1, S_2, \ldots, S_k$, when there are faults that are not locally detectable or isolable in one or more subsystems, it is necessary to consider the following cases:

1. $f \in F_i$ is not locally detectable.
2. $f_l \in F_i$ and $f_m \in F_i$ are not locally isolable from each other.
3. $f_n \in F_i$ is not locally isolable from $f_o \in F_j$ and $f_o \notin F_i$.
The last case represents a scenario where a subsystem fault is not locally isolable from a fault outside of the subsystem. This scenario can happen because a fault occurrence can have consequences beyond the original subsystem. Designing distributed diagnosers that account for these three scenarios is the focus of this section. After addressing each of these situations, we derive an integrated approach to distributed FDI and derive algorithms that apply to complex, dynamic systems made up of a number of subsystems.
For each subsystem, it might be necessary to augment the subsystem model with additional equations that are typically acquired from the neighbors of the subsystem, such that all of the faults associated with the augmented model are locally detectable and isolable. A set of equations is minimal if there is no proper subset of equations that provides the same detectability and isolability. More formally, the problem of designing a diagnoser for a particular subsystem $S_i$ can be described as follows.
Consider $N_{S_i} = \{S_1, S_2, \ldots, S_l\} \setminus S_i$ as the set of neighboring subsystems of subsystem $S_i$. To address the three situations mentioned above, an algorithm is to be developed to find a minimal equation set $E_o$ in $N_{S_i}$ that guarantees maximal structural detectability and isolability for the subsystem faults $F_i$. In the corresponding optimization problem, $E_l$ represents the set of all the equations in $N_{S_i}$, $D$ represents the set of detectable faults in $F_i$, and $I$ represents the set of faults in $F_i$ that are isolable from the system faults $F$.

Example 6. Consider the first subsystem of the running example, $S_1$. Equation $e_{10}$ makes $f_1$ and $f_2$ detectable and isolable from all the other faults in the system. Therefore, $A_1 = \{e_{10}\}$ is a minimal solution to the problem.
In this section, we present a method to make all the faults in a subsystem locally detectable (Situation (1) above).We also discuss the solution to the fault isolability problem (Situation (2) above) and prove that if we address the first situation, the third situation is automatically taken care of.

Maximum Detectability
Example 7. Consider subsystem $S_1$ in the four-tank example, whose equations are listed in (1). The DM decomposition of this subsystem is shown in Figure 2. $f_2$ is in the just determined part of the subsystem; therefore, the fault is not locally detectable. However, $p_2$ is a shared variable with subsystem $S_2$. Therefore, an equation from subsystem $S_2$, $e_{10}$, can be selected to make $f_2$ locally detectable in the augmented subsystem, $S_{1e10}$ (see Figure 3). Adding measurement equation $e_{10}$ makes $p_2$ known and, therefore, makes the subsystem over-determined.
Note that a variable that only appears in one subsystem (for example $\dot{p}_1$ in $S_1$) cannot become known by adding equations from other subsystems. Therefore, our ability to increase fault diagnosability is limited to the shared variables in the subsystem. More formally, we can prove the following theorem.

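Theorem 1. Consider local subsystem model $S_i = \{V_i, M_i, E_i, F_i\}$ and a fault $f \in F_i$. If $f$ is not locally detectable in the subsystem model where all the shared variables are known, $f$ is not globally detectable.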
Proof. If $e_f$ remains in the just determined part or the under-determined part of the subsystem when all the shared variables have become known, there is no additional equation in the system that can make any of the variables in $e_f$ known. Therefore, the equation cannot be moved to the over-determined part of the structural decomposition.
Therefore, the maximum detectability that can be achieved in each subsystem cannot be more than the detectability when all the shared variables are known. Using Theorem 1, we develop Algorithm 1 and Algorithm 2 to find an upper bound for the number of detectable faults and isolable fault pairs in each subsystem, respectively. Note that our algorithms do not require any information from the neighboring subsystems.
Adopting the following strategy, a minimal set of shared variables can be found that guarantees maximum detectability.

• We assume all the shared variables are known. If a fault is not locally detectable when all the shared variables are known, that fault is removed from the list of detectable faults (see Algorithm 1).
• Each shared variable is moved from the list of known variables to the unknown variables, one at a time, and the list of detectable faults is re-evaluated. If removing the shared variable from the known variables decreases the number of faults in the list of detectable faults, the shared variable is added back to the list of minimal required shared variables. Otherwise, the shared variable is not needed.
Algorithm 3 presents our method to find a minimal set of required shared variables. The algorithm is initialized with the subsystem model and the set of shared variables (for subsystem $S_1$, $p_2$ and $q_1$ are unknown shared variables), and it provides a minimal subset of shared variables that makes all the faults detectable in the subsystem. For Subsystem 1, $V_{1m} = \{p_2\}$ is a possible answer.

Algorithm 3 Minimal shared variables.
    if Detectable-Faults($V_{im}$, $S_i$) not equal DF then

Note that not all of the shared unknown variables may be measured. However, in some cases, it is possible to transfer a set of equations from the neighboring subsystems that can be used with the equations in the subsystem to compute the unknown variables.
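The two-step strategy above can be sketched as follows. This is an illustration in the spirit of Algorithm 3, not a transcription of it: the detectability test uses a maximum matching to find the over-determined part, the structure of $S_1$ and the fault-to-equation map `fault_eq` are assumptions inferred from the text, `dp1` stands for $\dot{p}_1$, and all function names are hypothetical.

```python
def overdetermined(structure):
    """Equations in the over-determined part (alternating-path search)."""
    match = {}  # variable -> equation

    def augment(eq, visited):
        for var in sorted(structure[eq]):
            if var not in visited:
                visited.add(var)
                if var not in match or augment(match[var], visited):
                    match[var] = eq
                    return True
        return False

    for eq in sorted(structure):
        augment(eq, set())
    stack = [e for e in structure if e not in set(match.values())]
    reached = set(stack)
    while stack:
        for var in structure[stack.pop()]:
            if match[var] not in reached:
                reached.add(match[var])
                stack.append(match[var])
    return reached


def detectable_faults(equations, fault_eq, known):
    """Faults whose equation is over-determined when `known` is measured."""
    structure = {e: vs - known for e, vs in equations.items()}
    plus = overdetermined(structure)
    return {f for f, e in fault_eq.items() if e in plus}


def minimal_shared_variables(equations, fault_eq, shared):
    required = set(shared)
    # Upper bound on detectability: all shared variables known (Theorem 1).
    target = detectable_faults(equations, fault_eq, required)
    for var in sorted(shared):
        required.discard(var)          # try to do without this variable
        if detectable_faults(equations, fault_eq, required) != target:
            required.add(var)          # detectability dropped: keep it
    return required


# Assumed structure of S1 and assumed location of its faults.
s1 = {
    "e1": {"dp1", "qin1", "q1"}, "e2": {"q1", "p1", "p2"}, "e3": {"dp1", "p1"},
    "e4": {"qin1"}, "e5": {"p1"}, "e6": {"q1"},
}
fault_eq = {"f1": "e1", "f2": "e2"}

print(minimal_shared_variables(s1, fault_eq, {"p2", "q1"}))  # {'p2'}
```

With the assumed structure, the sketch returns $\{p_2\}$ for subsystem $S_1$, in line with the answer $V_{1m} = \{p_2\}$ given above. The result of this greedy pass can depend on the order in which shared variables are tried, which is why the text speaks of "a possible answer".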

Equation-Based Fault Detection Approach
Given a minimal set of required shared variables, we present our proposed approach to find a minimal set of equations from the neighboring subsystems in order to achieve the maximum possible fault detectability. The procedure is illustrated by solving this problem for subsystem $S_2$, presented in Equation (2), of the running example, and then generalized by developing an algorithm that solves the problem for any subsystem.
Example 8. The corresponding structural decomposition of $S_2$ is shown in Figure 5. Subsystem $S_2$ is just determined; therefore, none of the faults are locally detectable. However, $q_1$ and $p_2$ are shared variables with subsystem $S_1$, and $q_3$ and $p_3$ are shared variables with $S_3$. Algorithm 3 finds $V_{2m} = \{q_1, p_3\}$ as a minimal set of shared unknown variables, which, if transferred from neighboring subsystems, can provide maximum detectability performance. Therefore, to make $f_3$ and $f_4$ locally detectable, the neighboring subsystems are explored to find equations that make the variables $q_1$ and $p_3$ known.
To find a minimal set of just determined equations that includes $q_1$, we start with all equations in $S_1$ that contain $q_1$. These equations are $e_1$, $e_2$, and $e_6$, as shown in Figure 6.
Then, for the additional variables in each equation that are not already in $S_2^0$, additional equations are included. For equation $e_1$, two additional equations are needed, one with $q_{in1}$ and the other one with $\dot{p}_1$. Finally, another equation is needed where $p_1$ is included. Since $p_2 \in S_2^0$, the variable is not considered in this step. To find the other minimal sets in the example, we keep adding the relevant equations to the other sets using the same approach described above. As shown in Figure 6, by sequentially adding equations to the system, we eventually obtain four sets of minimal constraints: $A_2 = \{e_1, e_2, e_3, e_4\}$, $A_3 = \{e_1, e_3, e_4, e_5\}$, $A_4 = \{e_2, e_5\}$, and $A_5 = \{e_6\}$. Figure 6 represents a matching algorithm. In the previous work in [28], a matching algorithm was introduced for finding a minimal set of equations for detecting each discrete mode change during the hybrid system's operation. In this paper, a similar approach is applied to find a minimal set of equations from neighboring subsystems for computing each required shared variable, presented as Algorithm 4.
Algorithm 4 Count matchings.
    for each e ∈ E, which can determine x do
        Let M be M ∪ {e → x}
        Let D be D ∪ {x}.
        Add all the undetermined variables of e to U.
        COUNT-MATCHINGS(M, D, U, E)

If we initialize the algorithm with the set of unknown variables (in Figure 6, $q_1$ is the unknown variable), it provides the set of complete matchings of variables and equations in the neighboring subsystems that include the unknown variables.

Example 9. Figure 7 shows that augmenting $S_2$ with $A_2$ makes $f_3$ detectable. To make $f_4$ locally detectable as well, Algorithm 4 is used to find a minimal set of equations in the neighboring subsystems that includes $p_3$.

Subsystem $S_2$ is just determined, but a subsystem can have an under-determined part as well. For example, consider subsystem $S_3$ in Equation (2), whose DM decomposition is shown in Figure 8. Fault $f_5$ is in the under-determined part of the structure. $q_{in2}$ and $q_3$ are in the just determined part of the system, and we can compute them using $e_{15}$ and $e_{16}$, respectively. However, to compute the other four variables in the subsystem, $p_3$, $q_2$, $\dot{p}_3$, and $p_4$, we only have three constraints, which makes a complete matching between constraints and variables impossible. To make this part of the subsystem just determined, we need to augment the subsystem with a set of equations from the neighboring subsystems.
Unlike previous work [18], where a separate algorithm was developed for subsystems with under-determined parts, Algorithm 3 handles this case automatically. Using Algorithm 3 gives V m3 = {q 2 , q 3 } as a minimal set of required shared variables to make f 5 detectable. Given the set of required shared variables, Algorithm 4 gives A 6 = {e 11 , e 17 , e 18 , e 19 } as a minimal set of equations from neighboring subsystems that can be used to augment S 3 to make f 5 locally detectable. Figure 9 shows the DM decomposition of (S 3 |A 6 ). In some cases, an augmented minimal set, A i , also adds a set of faults F A i to the subsystem model S i . These faults can be sensor faults or faults in other equations. The following theorem states that these faults are locally detectable in subsystem model S i .

Theorem 2. Consider a local subsystem model S i = {V i , M i , E i , F i } and let E k be a minimal set of equations that makes the set of faults F i detectable in the augmented subsystem (M i |C augments ) = {V j , C j , F j }. Then, the set of faults F j in the augmented subsystem (S i |E k ) is locally detectable.
Proof. The proof of this theorem is straightforward: the minimal set makes the part of the system that includes the fault over-determined, and the set itself must lie in the over-determined part as well. This means that the faults associated with the set are detectable.
For example, f 6 is locally detectable in (S 3 |A 6 ); see Figure 9. As long as only fault detection is considered, the augmented faults do not cause any problem. The fault detection algorithm is summarized in Algorithm 5.

Equation-Based Fault Isolation Approach
In this subsection, it is assumed that the set of minimal equations that makes all the faults locally detectable has been derived as described in the previous subsection. It is clear that the locally-detectable faults in each subsystem are locally isolable from the faults in the other subsystems that are not included in the augmented subsystem.

Theorem 3. Consider local subsystem S
Proof. Since f i is detectable, we have e f i ∈ S i + , and since f j ∉ F i , we can say e f j ∉ S i + . Therefore,

Considering Theorem 3, it is straightforward to address the isolability problem. For each fault f i ∈ F i , we remove the associated equation e f i from E i and all the neighboring subsystems. Then, we use Algorithm 5 to make all the remaining faults in F i detectable.
Example 10. In (S 3 |A 6 ) in Figure 9, f 5 is isolable from f 1 , f 2 , f 3 , and f 4 because they are not in the augmented subsystem and f 5 is detectable in this augmented subsystem. To make f 5 isolable from f 6 , we remove e 17 from (S 3 |A 6 ) and S 4 . Applying Algorithm 5 to S 4 \e 17 gives {e 20 } as a minimal set that can make f 5 detectable.
The augmented subsystem (S 3 |A 6 ∪ {e 20 }) will detect f 5 and isolate it from all the other faults in the global system S. Algorithm 6 summarizes the method discussed above.
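The structural tests used in this subsection (a fault is detectable when its equation lies in the over-determined part, and f i is isolable from f j when e f i stays in the over-determined part after e f j is removed) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the six-equation structure, the variable names x1 to x4, and all function names are hypothetical, and the over-determined part is computed with a textbook maximum-matching construction of the DM decomposition.

```python
def maximum_matching(structure):
    """Kuhn's augmenting-path algorithm; returns {variable: equation}."""
    match = {}

    def try_assign(eq, seen):
        for var in sorted(structure[eq]):
            if var in seen:
                continue
            seen.add(var)
            if var not in match or try_assign(match[var], seen):
                match[var] = eq
                return True
        return False

    for eq in structure:
        try_assign(eq, set())
    return match


def overdetermined_part(structure):
    """Equations reachable from unmatched equations by alternating paths."""
    match = maximum_matching(structure)
    matched_eqs = set(match.values())
    frontier = [eq for eq in structure if eq not in matched_eqs]
    reached = set(frontier)
    while frontier:
        eq = frontier.pop()
        for var in structure[eq]:
            partner = match.get(var)
            if partner is not None and partner not in reached:
                reached.add(partner)
                frontier.append(partner)
    return reached


def is_detectable(structure, fault_eq):
    """Structural detectability: the fault equation is in the over-determined part."""
    return fault_eq in overdetermined_part(structure)


def is_isolable(structure, eq_fi, eq_fj):
    """f_i is isolable from f_j if e_fi is over-determined without e_fj."""
    reduced = {e: v for e, v in structure.items() if e != eq_fj}
    return eq_fi in overdetermined_part(reduced)


# Hypothetical structure: equation -> set of unknown variables it contains.
STRUCTURE = {
    "e1": {"x1", "x2"},   # carries fault f1
    "e2": {"x2", "x3"},   # carries fault f2
    "e3": {"x1"},         # sensor equation, carries sensor fault f4
    "e4": {"x2"},         # sensor equation
    "e5": {"x3", "x4"},   # carries fault f3
    "e6": {"x3"},         # sensor equation
}

if __name__ == "__main__":
    print(sorted(overdetermined_part(STRUCTURE)))
    print(is_detectable(STRUCTURE, "e5"))
    print(is_isolable(STRUCTURE, "e1", "e3"))
```

In this toy structure, x 4 appears only in e5, so f3 is not detectable, and x 1 appears only in e1 and e3, so f1 loses its redundancy once the sensor equation e3 is removed.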
Our proposed approach considers the first order neighboring subsystems of subsystem S i and augments minimal constraints from them to maximize diagnosability. If the set of first order neighboring subsystems does not have the required redundancy to achieve maximum diagnosability, the search process continues to the next higher order of neighboring subsystems, as illustrated in Figure 10. The expansion process stops when the distributed approach achieves maximum diagnosability, which in the worst case results in a centralized diagnoser for the whole system. Thus, it is guaranteed that the method will find a distributed set of subsystem diagnosers that achieves the same diagnosability performance as the best centralized diagnoser for the same set of measurements. Algorithm 7 summarizes this approach.
The set of equations and measurements that each subsystem in the running example needs from its neighbors to achieve maximum possible detectability and isolability using the equation-based approach is presented in Table 2. Table 2 shows that all the subsystems of the water tank example share measurements with their first order connected subsystems. This is a practical advantage of this algorithm because the subsystems with shared variables are usually physically closer to each other (corresponding to our definition of nearest neighbors). Another advantage of this algorithm is that not only do we not need a global model for detecting and isolating the faults, but we also do not use the global model in the design process of the supervisory system. This makes the approach suitable for large, complex systems, such as aircraft and power plants, where the global system models are likely to be unavailable or unknown.

Computational Complexity
The time complexity of Algorithm 5 is mostly governed by Algorithm 4 (Count-Matchings), which has exponential complexity O(|U | × |E N |!), where |U | is the number of required unknown variables in the subsystem and |E N | is the number of equations in the neighboring subsystems. Algorithm 6 calls Algorithm 5 for every fault in the subsystem. Therefore, Algorithm 6 has O(|F i | × |U | × |E N |!) time complexity for subsystem S i , where |F i | is the number of faults in the subsystem. Note that in the case that no globally-accurate diagnoser can be derived using neighboring subsystems, the solution gradually expands to include all subsystems. Therefore, the time complexity of our proposed method in Algorithm 7 for subsystem i is O(|F i | × |U | × |E|!), where |E| is the total number of equations in the system.
In practice, Algorithm 4 finds the answer much faster. For example, consider Figure 6, where Algorithm 4 is searching for a set of equations to solve for q 1 . As soon as the algorithm reaches an equation that does not have the required unknown variable, the algorithm discards that equation and, therefore, avoids enumerating the rest of the candidate equations in that branch. To achieve even faster solutions, we can sort the equations by the number of their unknown variables before the search. In this way, the algorithm starts with equations with fewer unknown variables and, therefore, has to expand fewer branches on average.
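The recursive search and the sorting heuristic described above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name, the four-equation toy structure, and the variable names are hypothetical, and two bookkeeping choices (removing the matched variable from the undetermined set and the used equation from the candidate set, and expanding only one undetermined variable per call to avoid enumerating permutations of the same matching) are assumptions of this sketch.

```python
def count_matchings(matching, determined, undetermined, equations):
    """Enumerate matchings that determine every variable in `undetermined`.

    `equations` maps an equation name to the set of unknown variables it contains.
    Returns a list of dicts {equation: variable it is matched to}.
    """
    if not undetermined:
        return [dict(matching)]           # feasible matching found
    results = []
    x = min(undetermined)                 # deterministic choice of the next variable
    # Only equations containing x can determine it; try those with the
    # fewest still-unknown variables first (the sorting heuristic).
    candidates = sorted(
        (e for e in equations if x in equations[e]),
        key=lambda e: (len(equations[e] - determined), e),
    )
    for e in candidates:
        new_determined = determined | {x}
        # Every other unknown of e must now be determined as well.
        new_undetermined = (undetermined - {x}) | (equations[e] - new_determined)
        remaining = {k: v for k, v in equations.items() if k != e}
        results += count_matchings(
            {**matching, e: x}, new_determined, new_undetermined, remaining
        )
    return results


# Hypothetical neighboring-subsystem structure.
NEIGHBOR_EQUATIONS = {
    "ea": {"q1", "p1"},
    "eb": {"p1"},        # e.g., a sensor equation for p1
    "ec": {"q1"},        # e.g., a sensor equation for q1
    "ed": {"q1", "p2"},  # dead end: no other equation contains p2
}

if __name__ == "__main__":
    for m in count_matchings({}, set(), {"q1"}, NEIGHBOR_EQUATIONS):
        print(sorted(m))
```

The branch that starts with ed is abandoned as soon as no remaining equation contains p2, which mirrors the pruning behavior described in the text. The sketch returns every feasible matching it finds and makes no claim that each one is minimal.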
The equation-based solution is exponential in terms of the number of equations in the system. The MSO-based solution is exponential in terms of the number of MSO sets in the system. The total number of MSO sets for fault detection and isolation grows exponentially as the number of measurements increases [29]. Consider Definition 1. The total number of redundancies introduced into the system model is equal to the number of measurements, |M|. Theoretically, each MSO set can include anything from one to |M| measurements. Therefore, the total number of MSO sets, N MSO , is proportional to the number of all possible combinations of the measurements. In general, there are many more MSO sets in a system than equations. For example, the running example in this paper has 20 equations, and the fault diagnosis toolbox generated 165 MSO sets for this system. Therefore, we expect the equation-based approach to solve the problem in a more efficient way, which is demonstrated next.
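The counting argument for N MSO above can be written out explicitly. The expression below is a reconstruction from the surrounding sentences, under the assumption that every non-empty subset of the |M| measurements is counted once; it is not copied from the original display equation.

```latex
N_{MSO} \;\propto\; \sum_{k=1}^{|M|} \binom{|M|}{k} \;=\; 2^{|M|} - 1
```

For |M| = 7 measurements, for instance, the sum evaluates to 127, which illustrates the exponential growth in the number of measurements.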

Case Study
The ADAPT-Lite system is designed to emulate the operation of generic spacecraft electrical power distribution systems [30]. The system has five subsystems: (1) the battery, (2) the Direct Current (DC) electric load, (3) the inverter, (4) the Alternating Current (AC) resistive electric load, and (5) the electric fan as an inductive load for the AC system (see Figure 11). Seven measurements are made on the system: y E240 , y E242 , and y E281 represent DC voltage measurements in the system; y IT240 represents the battery current; y E265 represents the inverter AC output voltage; y IT267 is the inverter AC output current; and y ST516 is the fan rotational speed. Six faults are considered in the system: f E240 and f E242 are sensor faults in y E240 and y E242 , respectively; f dc represents a fault in the DC load; f INV models inverter faults; f ac represents a fault in the AC load; and f fan is a fan fault. The ADAPT-Lite system has several Circuit Breakers (CB236, CB262, CB266, and CB280) and relays (EY244, EY260, EY281, EY272, and EY275) and, therefore, operates as a hybrid system with multiple modes (configurations). In previous work [28], we discussed structural diagnosis for hybrid systems. In this paper, we focus on distributed diagnosis. Therefore, we assume that all the circuit breakers and relays are on and that there is no mode change in the system. The set of equations in each subsystem is derived as follows.
Subsystem 1 (battery): The set of equations: where V a1 is the set of unknown variables in this subsystem, the set of measurements is M a1 = {y E240 , y E242 , y IT240 }, F a1 = { f E240 , f E242 } represents the subsystem faults, and C 0 , C s , and R s are the component parameters in the subsystem. The battery is directly connected to the second subsystem (DC load).

Subsystem 2 (DC load): The DC load is modeled by an electric resistance, R dc . The set of equations for this subsystem is: where V a2 = {v 3 , v dc , i B , i dc , i inv } are unknown variables, M a2 = {y E281 } are measurements, F a2 = { f dc } are faults, and R dc is a component parameter in the subsystem. Subsystems 1 and 2 are first order connected, and their shared variables are V a1 ∩ V a2 = {v 3 , i B }.

Subsystem 3 (inverter): The inverter converts DC power to AC. When there is no fault in the subsystem and the input voltage, v in , is above 18 V, the output voltage, v rms , stays at 120 V. R inv represents the internal resistance in the inverter, and e is the inverter efficiency coefficient. The set of equations for the subsystem is: where V a3 = {v in , v dc , v rms , i inv , i rms } are unknown variables, M a3 = {y E265 } is a measurement, F a3 = { f INV } is a fault, and {e, R inv } are parameters of the subsystem. Subsystems 2 and 3 are first order connected, and their shared variables are V a2 ∩ V a3 = {v dc , i inv }. Subsystems 1 and 3 are second order connected because they have no shared variable and are both first order connected to the second subsystem.

Subsystem 4 (AC load): Like the DC load, the AC load is modeled as an electric resistance, R ac . The set of equations for this subsystem is: where V a4 = {v 4 , v rms , i ac , i rms , i fan } are unknown variables, M a4 = {y IT267 } is the measurement, F a4 = { f ac } is a fault, and {R ac , φ} are parameters of the subsystem. Subsystems 3 and 4 are first order connected, and their shared variable is V a3 ∩ V a4 = {v rms }.

Subsystem 5 (electric fan): The fan rotational speed, ω, is a function of the fan current, i fan . The last subsystem equations are: where V a5 = {i fan , v 4 , ω, ω̇, i̇ fan } are unknown variables, M a5 = {y ST516 } is a measurement, and F a5 = { f fan } is a fault of the subsystem. The fan electrical resistance, R fan , fan inertia, J fan , and fan mechanical resistance, B fan , are the parameters. Subsystems 4 and 5 are first order connected, and V a4 ∩ V a5 = {i fan , v 4 } is the set of shared variables between these subsystems.
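The notion of first and higher order connected subsystems used above can be computed directly from the sets of unknown variables with a breadth-first search. The sketch below is illustrative only: the function name is hypothetical, the variable names are ASCII renderings of the symbols in the text, the derivative variables of Subsystem 5 are written as omega_dot and i_fan_dot, and, since the full set V a1 is not listed in the text, Subsystem 1 is represented only by the two shared variables that are stated (an assumption of this sketch).

```python
from collections import deque

# Unknown-variable sets per subsystem, taken from the text where available.
SUBSYSTEM_VARS = {
    1: {"v3", "i_B"},  # assumption: only the stated shared variables of V_a1
    2: {"v3", "v_dc", "i_B", "i_dc", "i_inv"},
    3: {"v_in", "v_dc", "v_rms", "i_inv", "i_rms"},
    4: {"v4", "v_rms", "i_ac", "i_rms", "i_fan"},
    5: {"i_fan", "v4", "omega", "omega_dot", "i_fan_dot"},
}


def neighbor_orders(subsystem_vars, start):
    """Breadth-first search over subsystems that share unknown variables.

    Returns {subsystem: connection order}, where order 1 means the two
    subsystems share at least one variable, order 2 means they are linked
    through one intermediate subsystem, and so on.
    """
    order = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other, variables in subsystem_vars.items():
            if other not in order and subsystem_vars[current] & variables:
                order[other] = order[current] + 1
                queue.append(other)
    return order


if __name__ == "__main__":
    for s in SUBSYSTEM_VARS:
        print(s, neighbor_orders(SUBSYSTEM_VARS, s))
```

Processing the subsystems in increasing order of this distance corresponds to the expansion strategy of the equation-based method, which first searches the first order neighbors and moves to higher orders only when the required redundancy is not found.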

Distributed Diagnoser Using the MSO-Based Method
For the ADAPT system, there are 258 MSO sets. To find the minimal number of shared measurements, the global MSO set selection algorithm solves an optimization problem for each subsystem. Table 3 shows the set of measurements that needs to be added to each of the subsystem diagnosers to achieve maximum possible detectability and isolability. In the first subsystem, all the faults are locally detectable and isolable, and therefore, this subsystem does not require any additional measurements from the other subsystems. For each of the other subsystems, we have to transfer exactly one measurement to achieve maximum diagnosability. Table 4 shows the MSO sets for each local diagnoser. Note that the global MSO set selection method only minimizes the number of shared variables, but the subsystems may still require equations from the other subsystems. For example, the first subsystem in ADAPT does not require any additional measurement to detect and isolate its faults locally; however, as we can see in Table 4, this subsystem requires several equations from the other subsystems to generate residuals.

Distributed Diagnoser Using the Equation-Based Method
Instead of generating all the MSO sets and selecting a subset of MSO sets for each local diagnoser, Algorithm 7 is used.
The results are shown in Table 5, which summarizes the set of equations needed to augment each subsystem to achieve maximum possible detectability and isolability. In some cases, the first order neighboring subsystems were not enough to detect and isolate all the faults, and the algorithm had to extend the search to include higher order neighbors. For example, for Subsystem 2, the algorithm could not find any solution when it considered the first order neighbors (Subsystem 1 and Subsystem 3). Therefore, it extended the search to a second order neighboring subsystem (Subsystem 4).
Table 5 also presents the set of additional measurements that we need to transfer to each ADAPT subsystem. As mentioned earlier, the equation-based algorithm does not guarantee globally-minimum communication. For example, Subsystem 2 required three measurements from other subsystems (see Table 5). However, Table 3 shows that complete diagnosability was achievable by adding only one additional measurement. To detect and isolate faults in each subsystem, the augmented subsystem equations were used to generate MSO sets. Table 6 shows the MSO sets for each local diagnoser using the equation-based method. The experiment was run on the same desktop, and the total execution time was 0.32 s. This demonstrates the computational advantage of this method.
MSO b51 = {e a19 , e a20 , e a24 , e a25 , e a26 , e a27 }

Designing the Diagnosers
After the augmented subsystem models have been selected, computational tools, for example the fault diagnosis toolbox [22], can be used to generate the set of residuals to be used for each subsystem. For example, the fault diagnosis toolbox generates the following residual from MSO a11 :

r a11 = y E240 − y E242 . (18)

Each residual is sensitive to a set of faults, and the set of residuals for each subsystem can detect all the globally-detectable faults and isolate all the globally-isolable faults in the subsystem. For example, r a11 in (18) is sensitive to f E240 and f E242 and, therefore, can be used to detect these faults. In realistic situations, sensor noise and model uncertainties can have a negative impact on each diagnoser's performance. For example, consider the case where each sensor in the ADAPT-Lite system has an additive noise with a normal distribution, N(0, 0.1). Figure 12 shows that, because of noise, residual r a11 is not zero even when there is no fault in the system. This can degrade fault diagnosis performance by increasing the false positive rate. Moreover, noise can hide the effect of faults on the residuals and lead to high false negative rates. Figure 13 shows residual r a11 when an additive fault f E240 = 0.25 occurs at t = 20 h. Sensor noise conceals the fault signal and makes fault detection and isolation more challenging. To achieve acceptable performance in practice, it is necessary to design a set of hypothesis tests, such as the Z-test [31], to distinguish faults from noise and uncertainty and to determine which residual outputs are significant enough to reject the normal operation assumption and trigger the alarms. Figure 14 shows that a simple hypothesis test can achieve a zero false positive rate and detect f E240 in less than 5 min using residual r a11 .
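A sliding-window Z-test on the residual r a11 = y E240 − y E242 can be sketched as follows. The sketch makes several assumptions that are not in the original text: N(0, 0.1) is read as a standard deviation of 0.1, both sensors are taken to measure the same voltage (the nominal value of 24 V is hypothetical), the sampling rate is one sample per minute, and the window length and threshold are illustrative choices. It is not intended to reproduce Figures 12 to 14.

```python
import math
import random


def simulate_residual(n_samples, fault_start, fault_size, noise_std, seed=0):
    """Simulate r = y_E240 - y_E242 with an additive fault on y_E240."""
    rng = random.Random(seed)
    nominal_voltage = 24.0  # hypothetical value; it cancels in the residual
    residual = []
    for k in range(n_samples):
        fault = fault_size if k >= fault_start else 0.0
        y_e240 = nominal_voltage + fault + rng.gauss(0.0, noise_std)
        y_e242 = nominal_voltage + rng.gauss(0.0, noise_std)
        residual.append(y_e240 - y_e242)
    return residual


def first_alarm(residual, residual_std, window=50, threshold=6.0):
    """Return the index of the first sample at which the Z-test fires.

    The test statistic is the window mean divided by its standard deviation
    under the no-fault hypothesis (zero mean, known residual_std).
    """
    std_of_mean = residual_std / math.sqrt(window)
    for end in range(window, len(residual) + 1):
        mean = sum(residual[end - window:end]) / window
        if abs(mean / std_of_mean) > threshold:
            return end - 1
    return None


if __name__ == "__main__":
    noise_std = 0.1
    # The difference of two independent sensors has std sqrt(2) * noise_std.
    residual_std = noise_std * math.sqrt(2)
    # 24 h at one sample per minute; the fault starts at t = 20 h.
    r = simulate_residual(1440, fault_start=1200, fault_size=0.25,
                          noise_std=noise_std)
    print("first alarm at sample:", first_alarm(r, residual_std))
```

A longer window lowers the false positive rate for a given threshold at the cost of a longer detection delay, which is the trade-off that the hypothesis test design has to settle.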

Discussion
The two proposed algorithms for designing distributed diagnosis systems provide a solution with the maximum possible detectability and isolability that can be achieved for a system given a set of measurements. Unlike previous work, such as [14,15], our proposed methods were based on system models expressed as equations and, therefore, did not need to use the temporal response and event ordering in the diagnosis, all of which are derived properties and, therefore, require additional computation. Using a purely structural approach reduced the overall diagnosability of the system for the given set of measurements. However, it also reduced the number of assumptions we needed to make about the fault characteristics, such as the order of events in the diagnosed subsystems (which can be error-prone), and we did not have to analyze the subsystem dynamics in detail.
The total number of MSO sets was exponential in terms of the system measurements, and the MSO set selection was equivalent to the set covering problem. Therefore, the MSO-based algorithm had a high computational cost, especially for large-scale systems. The algorithm guaranteed that the subsystems shared a minimum number of measurements, implying that we minimized the communication of measurement streams across subsystems of the global system. This is important because sending data between subsystems is costly in large-scale systems. Moreover, it is straightforward to extend the MSO-based approach to robust distributed diagnosis by considering the residuals' robustness performance in the selection process [32,33].
The equation-based algorithm found a minimal set of equations from neighboring subsystems that guaranteed the maximum possible detectability and isolability that can be achieved for the system given a set of measurements. The number of equations was significantly smaller than the number of MSO sets. Therefore, the second algorithm was computationally more efficient. Moreover, the second algorithm did not need to use the global model in the design process of the supervisory system. This makes the algorithm feasible for large-scale complex systems. However, it did not guarantee that the number of shared variables among the subsystems was globally minimum.

Conclusions
Two algorithms are proposed for designing distributed diagnosers where the amount of sensor data shared between different local diagnosers is minimized. The first method generates the MSO sets for the given global system and selects a subset for each subsystem with the minimum required shared variables. Having all the MSO sets computed in advance makes robustness analysis possible for robust distributed MSO set selection. The second algorithm uses a heuristic equation-based approach, which is computationally more efficient and, therefore, suitable for large-scale systems. The ADAPT system case study was used to compare the two algorithms and illustrate the advantages of each method. In future work, we will consider noise and uncertainty in the design step and will extend the distributed diagnosis design problem to robust distributed fault detection and isolation using different methods for decoupling noise and uncertainties.

Figure 2. Dulmage-Mendelsohn (DM) decomposition of the first subsystem model. The figure represents the set of equations in the just determined part, S 1 0 , the set of equations in the over-determined part, S 1 + , and the set of unknown variables in each equation.

Figure 6. Finding the minimal sets of equations in S 1 to compute q 1 .

1 :
input: current matching M 2: input: sets of determined variables D and undetermined variables U , set of equations E 3: if U = ∅ then 4: return M as a feasible (minimal) matching.5: for each x ∈ U do 6:

Figure 10. Expanding the search environment to the higher order connected subsystems.
5 is mostly governed by Algorithm 4 (Count-Matchings) that has exponential complexity O(|U | × |E N |!), where |U | is the number of required unknown variables in the subsystem and |E N | is the number of equations in the neighboring subsystems.Algorithm 6 calls Algorithm 5 for every fault in the subsystem.Therefore, Algorithm 6 has O(|F i | × |U | × |E N |!)

Figure 12. Residual r a11 for 24 h when each sensor in the ADAPT-Lite system has an additive noise with a normal distribution, N(0, 0.1), and the system is in normal operation.

Figure 13. Residual r a11 for 24 h when each sensor in the ADAPT-Lite system has an additive noise with a normal distribution, N(0, 0.1), and f E240 = 0.25 occurs at t = 20 h.

Figure 14. A hypothesis test can achieve a zero false positive rate and detect f E240 in 4:48 using residual r a11 . The step function represents the detection time.

Table 1. Set of augmented measurements for each subsystem model.
Note that these definitions are equivalent to Definitions 8 and 10 since an MSO set is an over-determined equation set. Consider Definition 11 and Figure 2. Fault f 1 is locally detectable because e 1 ∈ S 1 + , but f 2 is not locally detectable since e 2 ∉ S 1 + . To expand the over-determined part and make f 2 detectable, the diagnosis subsystem needs to include at least one additional equation. The extension to the original subsystem is defined as:

Definition 13. (Augmented subsystem) Given subsystem S i and a set of equations, E k , the augmented subsystem model S iE k = (S i |E k ) is (V iE k , M iE k , E iE k , F iE k ), where V iE k is the union of V i and the unknown variables that appear in E k , M iE k is the union of M i and the known variables that appear in E k , E iE k is the union of E i and E k , and F iE k is the union of F i and the possible faults associated with E k . Consider the running example. S 1e10 = (S 1 |e 10

where S i + is the over-determined part of subsystem S i .

Definition 12. (Locally isolable) A fault f i ∈ F i is locally isolable from fault f j ∈ F if e f i ∈ (S i \e f j ) + , where (S i \e f j ) + is the over-determined part of subsystem S i without equation e f j .

Table 3. Set of augmented measurements for each ADAPT subsystem using the global method.

Table 4. Set of Minimal Structurally-Over-determined (MSO) sets for each local diagnoser using the MSO-based method.

Table 5. Set of augmented equations and measurements for each subsystem model using the equation-based approach.