This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

To improve the reliability of fault detection, sensor locations should be designed according to an optimization criterion, subject to constraints imposed by detectability and identifiability. Reliability requires minimizing the probabilities of undetectability and false alarms caused by random factors in sensor readings; these probabilities depend not only on the sensor readings but also on fault propagation. This paper introduces reliability criteria based on the missed and false alarm probabilities of each sensor and on the system topology, or connectivity, derived from the directed graph. A heuristic algorithm for the optimization problem is presented. Finally, the proposed method is illustrated on a boiler system.

Fault detection plays a necessary and important role in the safety of large-scale industrial systems. Its basis and data source are the measurements from sensors. Measurement technology and sensor quality have progressed significantly in the past several decades, but problems remain: not all process variables of concern can be measured, owing to economic and technical limitations, and the reliability of sensors cannot be assured. In large-scale systems, the components are interconnected and the variables are therefore correlated, which constitutes information on system topology with causality. After a fault occurs, it not only shows up as a local phenomenon but also propagates to other components or variables. Hence the sensor location problem should be considered from the viewpoint of the whole system, so that the origin and type of a fault can be found.

In order to measure the fault detection quality related to sensor location, some criteria are defined in Kawabata

In engineering practice, sensors are often faulty, meaning that they may fail to give adequate readings. For example, the reading may remain unchanged when the true value has deviated, which is called a missed alarm; or the sensor may raise an alarm in a normal operating state, known as a false alarm. We should therefore allow for some redundancy in sensors in case of failures. More commonly, these two kinds of error arise not from real sensor faults but from the choice of the alarm threshold combined with measurement noise, which is inevitable. If the threshold is strict in order to suppress the missed alarm probability, the reading will be sensitive to random noise and temporary deviations, resulting in a high false alarm probability. If we relax the threshold and accept a larger region as normal, the number of false alarms will decrease at the price of more missed alarms. Missed alarms and false alarms are therefore two aspects of reliability, and a trade-off must be made between them. This can be clearly illustrated via a receiver operating characteristic (ROC) curve [

With increasing complexity in process industrial systems, traditional mathematical models are difficult to obtain. Hence, graph-based models are proposed in the modeling analysis. Based on the signed directed graph (SDG) model, Raghuraj,

This paper is structured as follows: The criteria of fault detection, especially the reliability criterion regarding false and missed alarms in sensor readings, are presented in Section 2. Section 3 explains how to use graph theory to obtain the reachability measure between faults and process variables measured by sensors, which is needed in the optimization criteria. In Section 4, the optimization algorithm for the sensor location is proposed to improve the reliability of fault detection, followed by a case study to illustrate the application in Section 5. Finally some concluding remarks are given in the last section.

Some basic criteria must be met in every fault detection problem. In addition, there are optimization criteria that account for faulty sensors or unreliable sensor measurements.

The nodes in the SDG are classified into two types–variables and fault origin actors, which are denoted as _{i}_{j}

Starting from the fault node

Regarding detectability, each fault should be detected by at least one sensor. The definition of detectability appears below:

If there exists at least one sensor placed in the nodes of

Because the propagation time is ignored here, only the leaf nodes need to be considered as candidate sensor locations [

Based on the SDG, disregarding the cases that some variables cannot be measured, sensors need to be placed only on the leaf nodes.

According to the weak connection condition (i.e., the corresponding undirected graph is connected), each fault origin has at least one path to the leaf nodes, thus placing sensors on the leaf nodes can meet the detectability criterion. Assume that a sensor location with

Different faults have different behaviors. Represented in the SDG, the reachable nodes from the faults are different. So we must place sensors on these different nodes to identify the different faults. The definition of identifiable faults as noted in [

If there exists at least one sensor on the nodes of $R(F_1)$ (measuring the corresponding variables), and these sensor nodes are not within the nodes of $R(F_2)$, in other words, if there are sensors in the nodes of $I(F_1, F_2) = R(F_1) \cup R(F_2) - R(F_1) \cap R(F_2)$, then we say that faults $F_1$ and $F_2$ are identifiable.

Detectability and identifiability are two independent concepts. A fault can be detectable without being identifiable. Conversely, identifiability does not in general imply detectability, because a single sensor can distinguish two faults while leaving one of them undetected. Usually, however, we assume that faults are considered for identifiability only when they are detectable. Thus the identifiability criterion is stronger.
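The set definitions above can be illustrated with a short sketch. The reachability sets for F2 and F3 are read from the reachability table of the case study; the function names and the two candidate sensor placements are hypothetical and serve only as an illustration.

```python
# Reachability sets read from the case-study reachability table
# (variables whose row has a 1 in the column of the given fault).
reach = {
    "F2": {"FR-01", "LIC-01"},
    "F3": {"TI-07", "FR-01", "FR-02", "FI-03", "LIC-01"},
}

def identifiability_set(r1, r2):
    """Nodes reachable from exactly one of the two faults (union minus intersection)."""
    return (r1 | r2) - (r1 & r2)

def detectable(r, sensors):
    """True if at least one placed sensor lies in the reachability set."""
    return bool(r & sensors)

def identifiable(r1, r2, sensors):
    """True if at least one placed sensor lies in the identifiability set."""
    return bool(identifiability_set(r1, r2) & sensors)

ident = identifiability_set(reach["F2"], reach["F3"])

# Hypothetical placements: the first detects both faults but cannot tell them apart.
only_shared = {"FR-01", "LIC-01"}
with_fr02 = {"FR-01", "LIC-01", "FR-02"}
print(sorted(ident))
print(identifiable(reach["F2"], reach["F3"], only_shared))
print(identifiable(reach["F2"], reach["F3"], with_fr02))
```

With sensors only on the shared nodes, both faults are detectable but not identifiable; one extra sensor inside the identifiability set is enough to separate them.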

It should be noted that the signs of the nodes and branches can help identify different faults because some sensors are not only able to activate the alarm, but also indicate the direction of the departure from the normal values. For this case, we can split a node into two, one may show a higher deviation, and the other may show a lower deviation [

Detectability and identifiability are necessary conditions for fault detection. However sensor readings are not always reliable, which affects the reliability of fault detection. Let F_{i}_{j}_{i}_{i}_{j}_{j}_{j}_{i}_{j}_{ij}

As shown in

These numbers can be obtained by experiments. The missed alarm probability of sensor S_{j}_{j}_{j}

For each fault F_{i}_{i}_{i}_{j}_{j}_{j}_{ij}_{i}

On the other hand, we need to be concerned about the false alarm problem. For the variable S_{j}_{j}

The calculations of missed alarms and false alarms are dual problems, in which adding sensors will reduce the undetectability whilst increasing the false alarm probability. Here the false alarm probability reflects the influence of a sensor's false alarm on the whole system.
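The duality can be made concrete with a minimal sketch. It assumes that the undetectability of a fault is its occurrence probability multiplied by the missed alarm probabilities of all sensors placed on variables it reaches, and that the system false alarm probability grows additively with each sensor; both forms are assumptions chosen because they reproduce the values in the case-study tables. The function and variable names are hypothetical.

```python
from math import prod

def undetectability(f_i, reach, s, x):
    """Probability that the fault occurs and every sensor that can see it misses it.

    f_i   : occurrence probability of the fault
    reach : variables reachable from the fault
    s     : missed alarm probability of one sensor on each variable
    x     : number of sensors placed on each variable
    """
    return f_i * prod(s[j] ** x[j] for j in reach)

# Data for fault F2, taken from the case-study tables.
f_F2 = 0.1
reach_F2 = ["FR-01", "LIC-01"]
s = {"FR-01": 0.15, "LIC-01": 0.01}        # missed alarm probabilities
v = {"FR-01": 0.0034, "LIC-01": 0.0076}    # false alarm contribution per sensor
x = {"FR-01": 1, "LIC-01": 1}              # one sensor on each variable

u_before = undetectability(f_F2, reach_F2, s, x)
fa_before = sum(x[j] * v[j] for j in x)

x["LIC-01"] += 1                           # add a redundant level sensor
u_after = undetectability(f_F2, reach_F2, s, x)
fa_after = sum(x[j] * v[j] for j in x)

print(f"undetectability: {u_before:.1e} -> {u_after:.1e}")
print(f"false alarm:     {fa_before:.4f} -> {fa_after:.4f}")
```

The extra sensor lowers the undetectability of F2 by a factor equal to its missed alarm probability, while the false alarm total rises by that sensor's own contribution.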

When a fault occurs, it will be measured not only by the adjacent sensors directly, but also by the influenced sensors due to propagation between variables. In order to describe the propagation, the SDG has been proposed as a qualitative model which uses nodes and arcs to denote the variables and their causal relations [

The signed adjacency matrix is an equivalent representation of the SDG, whose elements 0, +1 or −1 correspond to the arc signs in the SDG. For the (

Given the adjacency matrix ^{k}^{k}

The diagonal elements in

The reachability matrix can also be obtained by graph traversal instead of matrix computations. The depth-first search method can be used to find the paths. The graph traversal method has several advantages over the matrix computation method. First, the paths are obtained in addition to the reachability matrix, which is intuitive and can help with fault propagation and other analyses [

Strictly speaking, reachability is also probabilistic, because the connectivity may be broken for various reasons. This random factor, however, is quite small compared with the uncertainty in the measurements, so it is ignored and reachability is regarded as a binary value.
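As a minimal sketch of the graph traversal approach, the following code builds a binary reachability matrix by depth-first search. The small graph, its node names and the function name are hypothetical; arc signs are stored but ignored, since only binary reachability is needed here.

```python
def reachable_from(adj, start):
    """Iterative depth-first search; returns all nodes reachable from start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, {}):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical SDG: each arc carries a sign (+1 or -1).
adj = {
    "F1": {"A": +1},
    "A": {"B": -1, "C": +1},
    "B": {"D": +1},
    "C": {},
    "D": {"B": +1},   # cycle B -> D -> B
}

nodes = sorted(adj)   # ['A', 'B', 'C', 'D', 'F1']
R = [[1 if v in reachable_from(adj, u) else 0 for v in nodes] for u in nodes]

for name, row in zip(nodes, R):
    print(name, row)
```

A diagonal entry equals 1 only when the node lies on a cycle, as for B and D in this example.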

The two criteria, detectability and identifiability, should be met first when deciding the sensor location. Yang and Xiao [

In the trade-off between false alarms and missed alarms, missed alarms are often considered to be more important because we do not want to lose a real fault. Thus the algorithm handles this criterion first. Meanwhile, we hope that the false alarm probabilities will be as small as possible, so we integrate the treatment of false alarms into the whole algorithm.

If we consider all the faults, then we want to minimize the total undetectability probabilities for all the faults, each one of which is a probability that no sensors indicate the alarm for the corresponding fault. According to the assumption of origin of a single fault [

This optimization problem cannot be solved analytically for the following reasons. First, this problem does not have a continuous solution space; instead it is an integer programming problem. Thus we should update the solution (_{j}_{j}_{j}_{0} is the cost limit. In addition, there may be other constraints due to technical or other reasons. Sometimes we have more constraints such as the number limit of sensors. Thirdly, the initial value of the problem is obtained according to the criteria of detectability and identifiability, and the _{j}_{j}

On the other hand, the false alarm probability of the system is the product of the probability that no fault has occurred and the probability that at least one of the sensors indicates a false alarm. The problem can then be expressed as:

We introduce a similar assumption that at most one sensor will indicate a false alarm. This assumption is reasonable when the false alarm probability is small. Thus the false alarm problem can also be formalized as an integer optimization problem:

This expression is an approximation in order to simplify the computation. When adding a sensor, we can just add a _{j}

When trying to reduce the undetectability by adding a sensor, one is concerned not with the total number of missed alarms but the number for each fault or some specific faults. Thus the summation in

If we pay less attention to the false alarm probability, then we can treat it as a constraint and just set a limit _{0}. Then we obtain the simplified algorithm:

Initialization:

Get _{i}, u_{j}_{j}

Get _{ij}

Get the minimal _{j}

Calculate _{j}

Calculate _{j}_{j}

Let the index set of

Calculate _{i}

Select the maximal value from _{I}

Let the set of _{Ij}_{I}_{Ij}

Select the minimal _{J}_{I}_{J}_{j}_{∈}_{AI} u_{j}_{I}_{j}

Place a sensor on variable _{J}_{J}

Update the false alarm probability _{j}_{I}

Check the cost and other constraints. If they are met, then go on; if not, then delete _{I}

Go to step (2) and update the undetectability.
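The steps above can be sketched as a greedy loop. This is an illustration under stated assumptions, not a definitive implementation: the selection rule (add a sensor on the reachable variable with the smallest missed alarm probability) is an assumption that reproduces the sensors chosen in the case study, the loop simply stops when the false alarm limit would be exceeded, and cost constraints are omitted. The function name, the limit `v_limit = 0.03` and the restriction of the data to faults F2 and F6 are hypothetical choices.

```python
from math import prod

def greedy_placement(f, s, v, d, x, v_limit, max_steps=100):
    """Greedy sketch: repeatedly reinforce the fault with the largest undetectability.

    f[i] : occurrence probability of fault i
    s[j] : missed alarm probability of one sensor on variable j
    v[j] : false alarm contribution of one sensor on variable j
    d[i] : set of variables reachable from fault i
    x[j] : initial number of sensors on variable j
    v_limit : upper limit on the total false alarm probability (assumed constraint)
    """
    x = dict(x)                      # do not modify the caller's placement
    added = []

    def undetect(i):
        return f[i] * prod(s[j] ** x[j] for j in d[i])

    def total_false_alarm():
        return sum(x[j] * v[j] for j in x)

    for _ in range(max_steps):
        worst = max(f, key=undetect)                     # fault with maximal undetectability
        best = min(sorted(d[worst]), key=lambda j: s[j]) # most reliable reachable variable
        x[best] += 1
        if total_false_alarm() > v_limit:                # constraint violated: undo and stop
            x[best] -= 1
            break
        added.append(best)
    return added, x, {i: undetect(i) for i in f}

# Subset of the case-study data (faults F2 and F6 only).
f = {"F2": 0.1, "F6": 0.001}
s = {"FR-01": 0.15, "LIC-01": 0.01, "TIC-01": 0.25, "TI-07": 0.25, "FI-03": 0.15}
v = {"FR-01": 0.0034, "LIC-01": 0.0076, "TIC-01": 0.0020, "TI-07": 0.0019, "FI-03": 0.0038}
d = {"F2": {"FR-01", "LIC-01"}, "F6": {"TIC-01", "TI-07", "FI-03"}}
x0 = {"FR-01": 1, "LIC-01": 1, "TIC-01": 1, "TI-07": 0, "FI-03": 1}

added, x_final, U_final = greedy_placement(f, s, v, d, x0, v_limit=0.03)
print(added)
print(U_final)
```

On this subset the sketch first reinforces F2 with a second sensor on LIC-01, then F6 with a second sensor on FI-03, and stops when a third addition would exceed the assumed limit.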

The algorithm is illustrated as a flow chart in

As an example, we choose a 65 tonnes per hour steam boiler system of a type widely used in the power and petrochemical industries, and simulate its operation under both normal and abnormal conditions using the simulation software Personal Simulator [

Five typical faults are considered here, all of which are complicated faults that have influences on multiple variables. The faults with their probabilities are listed in

The system's SDG is shown as

Initially all the variables have sensors except TI-07, FA, FH, FM, FL, which meets the criteria of detectability and identifiability for these five faults. Now we want to reduce the undetectability, so the algorithm we presented is applied. The execution procedure is recorded in

By adding two sensors, on LIC-01 and FI-03, the maximal undetectability among all the faults is reduced 100-fold, from 1.5e-4 to 1.5e-6, while the total false alarm probability of all the sensors increases by only 12.9%. In real systems, the corresponding level sensors and flow meters are indeed installed with some redundancy.
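The false alarm figures quoted above can be checked with a short calculation. The sketch assumes the additive approximation, in which each sensor contributes its tabulated system-level false alarm probability to the total; the values are copied from the last column of the reachability table for the variables that carry a sensor initially, and the function name is hypothetical.

```python
# False alarm contribution of one sensor on each initially instrumented variable.
v = {
    "TIC-01": 0.0020, "AI-01": 0.0020, "PI-03": 0.0030, "PI-05": 0.0030,
    "FR-01": 0.0034, "FR-02": 0.0038, "FI-03": 0.0038, "FR-04": 0.0040,
    "FI-06": 0.0040, "FR-07": 0.0040, "FI-08": 0.0040, "PIC-01": 0.0109,
    "PIC-02": 0.0080, "PIC-03": 0.0080, "PIC-04": 0.0080,
    "LIC-01": 0.0076, "LIC-02": 0.0090,
}
x = {name: 1 for name in v}      # initial placement: one sensor per variable

def total_false_alarm(x, v):
    """Additive approximation: sum of per-sensor contributions."""
    return sum(x[j] * v[j] for j in x)

before = total_false_alarm(x, v)
x["LIC-01"] += 1                 # first added sensor
x["FI-03"] += 1                  # second added sensor
after = total_false_alarm(x, v)
increase_pct = 100 * (after - before) / before

print(f"{before:.4f} -> {after:.4f} (+{increase_pct:.1f}%)")
```

The totals agree with the iteration table: 0.0885 initially and 0.0999 after both additions, a relative increase of about 12.9%.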

In order to test that the approximation from

In this case, the variables that are not affected by any of the five faults can also be ignored in the procedure, because placing sensors on them has no influence on the reliability. Other optimization methods can also be applied directly to the same objective and constraints to obtain the optimal solution in one step. We tried this on the example and obtained the same results.

In industrial systems, alarm monitoring design is a very important issue, in which the trade-off between missed alarms and false alarms should be treated appropriately. Two levels of design problems deserve attention: (1) at the local level, threshold selection, data filtering and alarm triggering are the key problems to be solved; (2) at the system level, topology expression and sensor location for alarm rationalization are important. In this paper, we have described and solved the sensor location problem, addressing this trade-off with the help of the topology expressed by the SDG. The optimization objective is the minimization of all the fault undetectabilities in the system. The false alarm probability and the cost limit are used as constraints.

The problem described in this paper is based on some simplifications. For example, the sensors on the same variable are assumed to have the same fixed missed alarm and false alarm probabilities. In reality, however, the sensors can differ and their thresholds are not necessarily the same, so the problem formulation can be generalized to a more accurate form. Moreover, multiple sensors usually do not just add redundancy but also provide more information through fusion. Future work could also combine the system-level problem with the local-level problem.

The work was supported by the National Natural Science Foundation of China (Grant No. 60736026, 60904044), China Postdoctoral Science Foundation (Grant No. 20080440386) and NSERC (Natural Sciences and Engineering Research Council of Canada) - Suncor - Matrikon - iCORE Industrial Research Chair Program.

element of a matrix

set

_{i}

cost to be paid when placing a sensor on variable

_{0}

cost limit

_{ij}

reachability from fault

_{i}

fault

fault node

_{i}

occurrence probability of fault

serial number of fault

identifiability set

serial number of sensor

measurable node, number of measurable nodes

variable node, number of faults

reachability set

reachability matrix

sensor

_{j}

missed alarm probability of sensor

_{i}

undetectability probability of F

_{i}

false alarm probability of sensor

_{i}

false alarm probability of sensor

_{j}

number of sensors placed on variable

adjacency matrix

Bipartite graph to show the relations between faults and sensors.

Confusion matrix to show the terminology of missed alarms and false alarms.

Flow chart of the optimization algorithm.

Boiler system flow sheet.

(a) SDG of Boiler system. (b) Fault propagation of fault F3.

Typical faults and their occurrence probabilities in the system.

Faults | Description | Consequences | Probabilities |
---|---|---|---|
F2 | Steam drum full of water | Inlet reduced heavily | 0.1 |
F3 | Lack of water in steam drum | Water level decreases gradually | 0.05 |
F4 | Fire extinguishment | All the gas muzzles are extinguished; pressure and temperature of the stream decrease | 0.01 |
F5 | Power off | A series of complex phenomena | 0.001 |
F6 | Failure in the cooler | Temperature of overheated steam reduces; cooling water reduces abnormally, etc. | 0.001 |

Sensor missed alarm probabilities and false alarm probabilities.

Missed alarm probability | False alarm probability | Sensors |
---|---|---|
0.25 | 0.002 | TIC-01, TI-07, AI-01 |
0.2 | 0.003 | PI-03, PI-05 |
0.15 | 0.004 | FR-01, FR-02, FI-03, FR-04, FI-06, FR-07, FI-08 |
0.08 | 0.005 | FH, FM, FL, FA |
0.02 | 0.008 | PIC-01, PIC-02, PIC-03, PIC-04 |
0.01 | 0.009 | LIC-01, LIC-02 |

Reachability from faults to variables.

Variable | F2 | F3 | F4 | F5 | F6 | System-level false alarm probability per sensor |
---|---|---|---|---|---|---|
TIC-01 | 0 | 0 | 0 | 1 | 1 | 0.0020 |
TI-07 | 0 | 1 | 1 | 1 | 1 | 0.0019 |
AI-01 | 0 | 0 | 1 | 1 | 0 | 0.0020 |
PI-03 | 0 | 0 | 0 | 1 | 0 | 0.0030 |
PI-05 | 0 | 0 | 0 | 1 | 0 | 0.0030 |
FR-01 | 1 | 1 | 1 | 1 | 0 | 0.0034 |
FR-02 | 0 | 1 | 1 | 1 | 0 | 0.0038 |
FI-03 | 0 | 1 | 1 | 1 | 1 | 0.0038 |
FR-04 | 0 | 0 | 1 | 1 | 0 | 0.0040 |
FI-06 | 0 | 0 | 0 | 0 | 0 | 0.0040 |
FR-07 | 0 | 0 | 0 | 1 | 0 | 0.0040 |
FI-08 | 0 | 0 | 0 | 1 | 0 | 0.0040 |
FH | 0 | 0 | 0 | 0 | 0 | 0.0050 |
FM | 0 | 0 | 0 | 0 | 0 | 0.0050 |
FL | 0 | 0 | 0 | 0 | 0 | 0.0050 |
FA | 0 | 0 | 0 | 1 | 0 | 0.0050 |
PIC-01 | 0 | 0 | 1 | 1 | 0 | 0.0109 |
PIC-02 | 0 | 0 | 0 | 0 | 0 | 0.0080 |
PIC-03 | 0 | 0 | 0 | 0 | 0 | 0.0080 |
PIC-04 | 0 | 0 | 0 | 0 | 0 | 0.0080 |
LIC-01 | 1 | 1 | 1 | 1 | 0 | 0.0076 |
LIC-02 | 0 | 0 | 0 | 1 | 0 | 0.0090 |

Iterative procedure of the algorithm.

Step | Undetectability of F2 | Undetectability of F3 | Undetectability of F4 | Undetectability of F5 | Undetectability of F6 | Sensor added | Total false alarm probability |
---|---|---|---|---|---|---|---|
0 | 1.5e-4 | 1.7e-6 | 1.3e-8 | 5.7e-17 | 3.8e-5 | none | 0.0885 |
1 | 1.5e-6 | 1.7e-8 | 1.3e-10 | 5.7e-19 | 3.8e-5 | LIC-01 | 0.0961 |
2 | 1.5e-6 | 2.5e-9 | 1.9e-11 | 8.5e-20 | 5.6e-6 | FI-03 | 0.0999 |