State Tracking and Fault Diagnosis for Dynamic Systems Using Labeled Uncertainty Graph

Cyber-physical systems such as autonomous spacecraft, power plants and automotive systems become more vulnerable to unanticipated failures as their complexity increases. Accurate tracking of system dynamics and fault diagnosis are essential. This paper presents an efficient state estimation method for dynamic systems modeled as concurrent probabilistic automata. First, the Labeled Uncertainty Graph (LUG) method in the planning domain is introduced to describe the state tracking and fault diagnosis processes. Because the system model is probabilistic, the Monte Carlo technique is employed to sample the probability distribution of belief states. In addition, to address the sample impoverishment problem, an innovative look-ahead technique is proposed to recursively generate most likely belief states without exhaustively checking all possible successor modes. The overall algorithms incorporate two major steps: a roll-forward process that estimates system state and identifies faults, and a roll-backward process that analyzes possible system trajectories once the faults have been detected. We demonstrate the effectiveness of this approach by applying it to a real world domain: the power supply control unit of a spacecraft.


Introduction
Today's real-world engineering systems are a product of careful design and manufacturing. These systems usually undergo rigorous testing and validation before deployment. However, due to wear and tear from sustained operations, degradation and faults in system components still occur. In addition, unlikely events and unanticipated situations can also create faults. To avoid these negative effects, it is imperative to accurately track system behavior and timely detect and isolate faults [1][2][3][4].
Model-based diagnosis techniques are frequently used to solve these problems. The key idea is to detect the discrepancies between the actual system behavior and the predictions of a model [5]. Traditionally, two distinct scientific communities have employed different kinds of models to implement their own model-based diagnosis: • The Fault Detection and Isolation (FDI) methods capture system behavior using differential equation models, whose foundations are based on control and statistical decision theories.
• The Diagnosis (DX) methods use qualitative model and logical approaches, with foundations in the fields of computer science and artificial intelligence.
More specifically, the FDI community has proposed three typical methods to track and diagnose system behavior using the system model: (1) parameter estimation based methods that estimate the value of particular parameter [6]; (2) state estimation based methods that use observers or filters to estimate unknown variables [7]; and (3) parity space based methods that design a set of residuals by eliminating the unknown variables [5]. On the other hand, many researchers in DX community assume that the system can be modeled as a Discrete Event System (DES) [8] at some level of abstraction. A DES model is characterized by a set of discrete states, a set of observable and unobservable events, and transitions between discrete states. The dynamic behavior of DES is described by partitioning time into discrete points at which events occur [9]. On the basis of this system model, the goal of diagnosis is to find unobservable fault events or discrete fault states. Until now, this technique has been widely applied into many domains such as power transmission lines [10] and telecommunication networks [11,12].
Over the years, some researchers and practitioners in both communities are dedicated their efforts to understanding and bridging the FDI and DX approaches. Cordier et al. [13] gave a systematic comparison of the analytical redundancy relations (ARR) and conflicts, but their analysis was only applied to the diagnosis of static systems. Bregon et al. [14] compared three different structural fault isolation techniques from both communities for linear continuous dynamic systems. Meanwhile, Travé-Massuyès [15] further discussed the facets of diagnosis in the FDI and DX communities, and exemplified how different theories of these areas can be synergistically combined to provide better diagnostic solutions and achieve improved fault management.
In this paper, we only focus on the DX area. In this area, a typical diagnosis approach of DES is based on a diagnoser [16,17], which uses a deterministic finite state machine without emitted events. The diagnosis problem is addressed by compiling the original finite state machine into one that contains only observable transitions and produces the same language in terms of observations. The weakness of this approach is the feasibility of the compilation of the large-scale complex system model into a reasonable size. Some approaches were put forward to overcome this dilemma such as off-line compiler technique [18], distributed diagnosis [19] and hierarchical diagnosis [20].
Simulation-based approach [21,22] is another general method. In this approach, the temporal evolution of system is decomposed into a set of state constraints that hold for the state at each time-step and a set of sequence constraints that restrict the possible transition sequences of such state, so the diagnosis process is performed by checking whether the state constraints and sequence constraints are consistent with the observations. The main problem of this approach is that the number of all possible system trajectories becomes too large to process only after a few time steps.
To address this problem, Williams [21] and Kurien [22] proposed a k Best-First Trajectory Enumeration (BFTE). Unfortunately, trajectory probability is significantly underestimated in this approach, because it ignores the additional trajectories that lead to the same state. Considering this shortcoming, Martin [23] presented a k Best-First Belief State Enumeration (BFBSE) that increases estimator accuracy and uses less memory and computation time. After that, Williams and Ragno [24] introduced a conflict-based A* (CDA*) search algorithm into these methods, so the belief state search process is further accelerated by eliminating subspaces around each state that are inconsistent with observations. For these k best methods, how to choose a suitable k value is the key issue. For real-time operation, a large value usually brings more computational complexity, while a small one loses estimator accuracy and even results in misdiagnosis. Moreover, symbolic techniques are also a feasible method. [9] exploited Ordered Binary Decision Diagrams (OBDD) to encode system model and belief state, so the complete belief state can be estimated. However, it still limits its applicability to relatively simple system. This paper develops a novel approximate simulation-based approach to track both a variety of operational modes of the system and arbitrary combinations of fault conditions, which will enable to determine the most likely system states and trajectories for dynamic diagnosis. First of all, LUG [25] is introduced to describe online monitoring and fault diagnosis of discrete systems. Moreover, the LUG scheme employs the Monte Carlo (MC) technique to sample belief state distribution. The differences between our approach and classical MC technique are twofold. First, the particles in our approach are only focused on sample but not on-line filtering, and particles are assumed to be unit weight throughout simulation. Second, every particle is tagged with unique symbol, so the system evolution trajectories can be easily obtained, and numerous trajectories do not need to be preserved as system evolves. Finally, since Monte Carlo techniques incur sample impoverishment problem, the observation information is combined with prior information to recursively generate the most likely belief states. Although this technique of using observation information has been employed in the literatures [7] and [26], our approach does not need to exhaustively consider all possible successor modes, so it can cope with large number of discrete modes.
The rest of this paper is structured as follows: Section 2 gives some basic definitions about component and system model, simulation-based dynamic diagnosis, and classical belief state update. Section 3 introduces the LUG to describe the dynamic diagnosis process of discrete systems. In Section 4, our approach is formally illustrated and analyzed in detail. Section 5 describes the experimental results on a real-world model: a portion of the power supply control unit of spacecraft. Finally, a conclusion is presented in the last section.

Theoretical Background
This section summarizes basic formalisms and concepts for dynamic diagnosis. The notion of time adopted in this paper consists of a discrete sequence of time step, which is derived from the assumption that the system can be viewed as a synchronous transition system [22].

Component and System Model
Supposed that the system to be diagnosed has n individual components, the component model can be built as a tuple: where: S + are the mode assignment at time t and t + 1, respectively. Guard is the transition event. Some events are observable (e.g., commands issued from external actuator), while the rest are unobservable (e.g., autonomous or fault events). Prob is a transition probability from t S to 1 t S + .
The overall system model G is modeled as a composition of synchronous operation on components. Formally, we can give definition of system model as follows: where 1 2 , ,..., n G G G denote the n component model of the system. SV (system variable) is a set of system I/O variables, which establishes the interconnection with the outside world. SC (system constraint) is a set of constraints that captures the interconnections among components.

Simulation-Based Dynamic Diagnosis
In this subsection, we firstly give the formal definition of static diagnosis, and then extend it into simulation-based dynamic diagnosis. For static systems, the aim of diagnosis problem is to check the joint consistency between observation and a set of constraints including component constraints and system constraints [27]. Since the constraints for each component depend on certain nominal or fault behavior mode, the formal definition of static diagnosis can be given as follow: where y is the observation. State constraints ( ) stateconstraints mode are composed of the component constraints in particular behavior mode mode and system constraints.
If the observations are available not just for one snapshot of system behavior, but for a whole observation period, the diagnosis for static systems will be extended into dynamic systems. The behavior model for dynamic systems not only has the state constraints, but also the relations between states across time: According to the definition of component model, the relations between states ( ) temporalconstraints mode correspond to the transition relation δ . Therefore, the dynamic diagnosis must check the individual observation snapshot for consistency with the state constraints, and satisfy the restriction of the pairs of adjacent states, which can be denoted as: Formally, the definition of simulation-based dynamic system is modeled as follow: where: • G is an entire system model. On the basis of Equations (5) and (6), the aim of simulation-based dynamic diagnosis is to find the system state t B at each time step t, which is consistent with observation t y , and is transitioned from 1 t B − .

Classical Belief State Update
A well known problem of model-based diagnosis is that the number of possible trajectories will become so large to be unmanageable as the system evolves. Typically, the number of trajectory hypotheses is exponential in the number of discrete modes and time steps considered. A general solution is the introduction of a preference criterion among belief states.
Most of the preference criteria are based on quantitative probability. A belief state is a probability distribution over the states of a system, which represents the likelihood of the system in any single state given observation and command sequence. Assuming that the system is Markovian, the belief state is then computed using standard Hidden Markov Model (HMM) belief state update equations: where α denotes the normalization term. This process is known as recursive estimation [28].
It includes two steps: Prediction, in which given a current belief state prior distribution For simulation-based dynamic diagnosis, the Prediction step uses transition relation ( ) temporalconstraints mode to propagate the system dynamics into the future by only considering the current belief state and commands, while the Update step evaluates the estimates by checking the consistency between observation y and state constraints ( ) stateconstraints mode . In this paper, the observation probability distribution is defined using a consistency approach similar to the Livingstone [21,22], and can be calculated as follow: if stateconstraints B y

Exploitation of LUG for State Tracking and Fault Diagnosis
A planning graph represents a relaxed look-ahead of the belief state space that identifies propositions reachable at different depths. It can be typically described as layered graphs of vertices where each level t consists of a proposition layer t P and an action layer t A . More specifically, proposition layer t P denotes the set of propositions at level t, while action layer t A includes all actions that have all of their precondition propositions in t P . Edges between the layers describe the propositions in action preconditions (from 1 [29]. LUG, proposed by Bryce [25,29,30], extends traditional planning graph in the following two ways. Firstly, since uncertainty is considered in Bryce's approach, an extra effect layer t ε +Δ is introduced into each level t, where 0 Δ → . An effect is in the effect layer t ε +Δ if its associated action is in the action layer t A and every one of its antecedent propositions is in t P . As a result, the uncertainty planning graph can be denoted as a sequence of layers ( ) . Secondly, LUG implicitly represents multiple uncertainty planning graphs by collapsing the graph connectivity into one uncertainty planning graph and uses annotations, called labels, to retain information about multiple worlds. Bryce has successfully applied this data structure to solve the probabilistic conformant planning problem with actions whose effects are uncertain.
The probabilistic conformant planning problem is closely related to state tracking and fault diagnosis of discrete systems [31], because the task of generating most likely belief states to match given observations can be viewed as a probabilistic plan generation problem. In particular, both two problems exhibit many similar features, such as finite state space, the uncertainty of the initial state and action effects, reachable goals and so on. Therefore, LUG is introduced to represent dynamic diagnosis process of discrete systems. Proposition layer, action layer and effect layer in LUG correspond to the possible belief states, transition events and the possible results of transition in simulation-based dynamic diagnosis.
In order to address the exponential increasing problem of possible belief states, the Monte Carlo (MC) technique [32] is employed to sample belief states in our method. First, unlike the traditional method using exact quantitative probability, we turn to approximation of probability by means of particles. The number of particles in a particular belief state represents the likelihood of this belief state. This approximate strategy allows our approach to focus on the highly probable belief states, without checking a prohibitively large number of unlikely belief states. Second, the MC technique in this paper is only used for sampling, and particles are assumed to be unit weight throughout simulation. Unlike generic particle filter, our work does not use observations to weight particles for re-sampling. For instance, assuming that belief state 1 B has 100 particles and a transition occurs from belief state 1 B to belief state 2 B with probability 0.95, there will be 95 particles in belief state 2 B after performing this transition. Third, every particle is tagged with unique symbol, which can be used to analyze the system possible evolution trajectories.
A simple circuit shown in Figure 1 is used as an illustrative example. The relay in this circuit is modeled as an automaton with five discrete modes: S1: open, S2: closed, S3: stuck_open, S4: stuck_closed and S5: unknown. The mode transition is probabilistic. When the initial mode is S1: open with the command close, the possible successor modes include S2: closed, S3: stuck open and S5: unknown with the transition probability 0.989, 0.01 and 0.001, respectively.  According to the formal definition of classical belief state update in Section 2.3, the Prediction step in LUG is from proposition layer t P to effect layer t ε +Δ , and the Update step is between effect layer t ε +Δ and proposition layer 1 t P + . The label below each belief state contains the tagged particle i x . At time t, 1000 particles are initialized using prior distribution of current belief state. If current mode of relay is unknown, the uniform distribution will be adopted. As can be seen from Figure 2, four possible belief states: S1, S3, S4 and S5 with 398, 200, 202 and 200 particles are captured at time-step t + 1, and possible system evolution trajectories also can be obtained by the tagged particles.

Proposed State Tracking and Fault Diagnosis Algorithm
This section presents the dynamic diagnosis process of discrete systems based on LUG in greater detail. First of all, a novel one step look-ahead technique is introduced to capture the fault mode with low likelihood. Moreover, the overall algorithm is described. Finally, the correctness, incompleteness and computational complexity are analyzed from a theoretical view point.

One Step Look-Ahead
As mentioned earlier, this paper introduces unit weight particles to evaluate the likelihood of belief state and filter out a prohibitively large amount of unlikely belief states. However, unit weight particles will bring a serious problem called sample impoverishment. Now the relay example is also used to describe this situation. To simplify the problem, belief state estimation in a single time-step is illustrated by using one initial belief state S2 with 200 particles. As can be seen from Figure 3, Prediction step is firstly executed to generate two possible transition results: 1 ϕ with 198 particles and 2 ϕ with two particles. Effect 3 ϕ is discarded, because it cannot be assigned a particle with the low transition probability 0.001. Then, Update step uses current observation to update belief state distribution. If successor belief state S1 at time t + 1 is in conflict with observation, only S4 will occupy all the 200 particles after normalization. In this case, classical method leads to losing the possible solution S5 resulted from effect 3 ϕ . The reason is that fault events usually have a very low prior probability. When a system is in its normal condition, the high probability transition results are consistent with observations, so only the solutions with low likelihood are removed. However, once a fault occurs, maybe no particles can transition into fault state. Therefore, the real fault cannot be reliably detected. Figure 3. Simple one-step belief state estimation.
To tackle sample impoverishment problem, our proposed algorithm adopts a novel strategy that combines prior transition probability and observation information to choose the most likely successor belief states. In particular, best-first A* search [33] is employed to recursively calculate the a posteriori transition probability: where i O P and i T P denote the observation probability and the prior probability for transition i, respectively. Once all the particles can be assigned according to a set of generated a posteriori transition probability 1 2 { , ,..., } i R R R P P P , the search process will be terminated.
Unfortunately, computing the normalization term Therefore, we transform this equation into another form, and then employ an approximate strategy to converge the real value: In Equation (10) (11) where l denotes a set of inconsistent transitions that have been generated during the enumeration process. Note that the normalization term needs to be recalculated, once a new inconsistent transition is determined.  Table 1 shows the estimation process of the relay example using look-ahead technique, where R P is a list to preserve all the obtained a posteriori transition probability. In first loop, path 2 1 S S → is generated. Since it is an inconsistent path, the a posteriori transition probability 1 R P is set to 0. Second loop analyzes path 2 4 S S → , which is consistent with the observation. The a posteriori transition probability for this path is calculated as 2 0.01/ (1 0.989) 0.91 After that, we continue to generate the third path 2 5 S S → , because these two expanded paths cannot occupy all the 200 particles. Although the prior transition probability for this path is only 0.001, the a posteriori transition probability increases to 3 0.001/ (1 0.989) 0.09 Since 200 particles can be assigned using generated a posteriori transition probability list (0; 0.91; 0.09), the search is terminated. Correspondingly, belief state S4 and S5 own 182 and 18 particles. We see that our proposed algorithm can capture an additional belief state 5 S , when compared to the traditional method.

Description of the Approach
Overall, our state tracking and fault diagnosis approach for dynamic systems are composed of two main steps: • A fast roll forward process that uses the forward propagation to extract the likely belief states at each time-step.
• A quick roll back process using tagged particles to generate the possible trajectories.
The roll forward process is shown in Algorithm 1. Since the procedures have been described in Section 4.1, we will not provide more explanations.
Algorithm 2 describes the roll back process to generate the possible trajectories The simple relay model is again considered as an example to further describe the combination of roll forward and roll back process. Assumed that only S2 with 1000 particles is available at time t − 1, Command open is issued at time t − 1. At time t, S1 and S5 are consistent with observation. After executing command close at next time step, S3 and S5 match with measurement. Figure 4 shows the LUG structure for relay at these two successive time steps. It is easy to see that the probability of belief state S3 and S5 at time t + 1 are 90.8% and 9.2%. Three different evolution trajectories can be rolled back to obtain as Trajectory1 = {S2, open, S1, close, S3}, Trajectory2 = {S2, open, S1, close, S5} and Trajectory3 = {S2, open, S5, close, S5} with the probability 90.8%, 9.1% and 0.1%, respectively. Figure 4. Two time-step state estimation using LUG for relay.

Correctness and Incompleteness
Since the proposed approach runs for a whole observation period to track and diagnose the system, we should prove the correctness of the Prediction step and Update step at each single time-step. As mentioned earlier, the aim of the Prediction step is to estimate the system state at the next time-step based on the current belief state, commands and temporal constraints, while the Update step checks the consistency between observations and state constraints in the estimated system state. Struss and Dressler [34] derived a correctness result for the consistency test in a static system. This result is relevant to the present approach, because it guarantees the correctness of the Update step. In this subsection, we extend this analysis to our approximate simulation-based dynamic diagnosis.
For the Prediction step, the proposition logics corresponding to the transition relation in each component model are performed to reason the possible successor modes. Without loss of generality, all the possible transition results in our proposed approach can be divided into three disjoint classes: • Case 1: the effect i ϕ can be assigned more than one particle according to the prior transition probability T P , and the observation is consistent with successor belief state (See path 2 in Figure 3). • Case 2: the effect i ϕ can also be distributed more than one particle, but the observation refutes the successor belief state (See path 1 in Figure 3).
• Case 3: the effect i ϕ cannot be assigned one particle using the prior transition probability T P (See path 3 in Figure 3).
For Case 1 and Case 2, these possible successor modes can be assigned more than one particle, and may be kept or dropped after the consistency test. In terms of Case 3, since these successor modes usually have low prior transition probability, the number of the remaining particles determines whether the proposed approach needs to check the consistency for these modes. Actually most of the probability space can be covered by just a few modes in the state estimation of discrete systems. If all the obtained belief distribution can reach the estimation accuracy determined by a predefined number of particles, all the remaining low likelihood modes can be discarded.
On the basis of the above analysis, the present approach is more efficient than using a consistency test for every mode, and achieves a close enough approximation. Moreover, it can be implemented as an anytime algorithm, and the trade-off between accuracy and time efficiency can be achieved by varying the number of particles. Finally, this analysis process also reveals the correctness and incompleteness of our approach.

Complexity
In this subsection, the complexity in a single time-step will be analyzed. Previously, we assumed that the system to be diagnosed is modeled as n concurrent individual components with b possible successor modes, and the number of particles is p_num. In addition, the computational complexity for a single consistency test is evaluated as a constant C to simplify the problem.
Martin et al. [23] analyzed the complexity of best-first A* search for a single initial state, and concluded that the best case complexity is roughly n b × and the worst case complexity is n b . In terms of the roll forward process, since the number of all the possible successor modes is n b in a single time-step, all the particles transition into the successor mode with the lowest prior probability in the worst case, and the complexity is roughly ( ) In the best case, all the particles transition into the first possible successor mode, so the complexity is ( ) O n b C × + . For the roll back process, the worst case has two different conditions: (1) all the particles exist in different modes ( _ n p num b < ); or (2) each possible successor mode captures more than one particle ( _ n p num b > ). Considering these two cases together, the complexity of this transversal process is (min( _ , )) n O p num b . On the other hand, the best case complexity is also (1) O when all the particles are in a single mode. As a summary, Table 2 shows the complexity for our proposed algorithm.

Experimental Results
We apply our state tracking and fault diagnosis approach on a simulation model of a real-world system-a selected subset of the power supply control unit of a spacecraft. This subsystem, shown in Figure 5, consists of an input Sig_in from a battery and five outputs: (1) output Sig_out1 directly connected to Load A; (2) output Sig_out2 connected to Load B that is controlled by relay K1; (3) output Sig_out3 connected to Load C that is controlled by hot backup DC/DC module (DC/DC_h); (4) output Sig_out4 connected to Load D that is controlled by both hot backup DC/DC module and relay K2; and (5) output Sig_out5 connected to Load E that is controlled by cool backup DC/DC module (DC/DC_c). An external actuator issues commands cmd1, cmd2, cmd3 and cmd4 to control the relay K1, K2 and cool backup module. In our experiment, six sensors are used to collect observations: system input: Sig_in and system outputs: Sig_out1, Sig_out2, Sig_out3, Sig_out4 and Sig_out5.
The schematics of the hot backup DC/DC module and cool backup DC/DC module are presented in Figure 6. Four components main1, main2, spare1 and spare2 are voltage converting units. Figure 6a shows the hot backup DC/DC module. The function is that component selector selects the voltage with higher value from main1 and spare1 to output. In the cool backup DC/DC module (see Figure 6b), the external commands cmd3 or cmd4 switch relays K3 and K4 and determine the output voltage.  This selected subset of the power supply control unit involves nine components: four voltage converting units, four relays and one selector. More specifically, the voltage converting unit has five different discrete modes: nominal (M1), overvoltage protection (M2), overvoltage protection failure (M3), voltage conversion failure (M4) and unknown mode (M5). Table 3 gives the mode transition matrix for this component. In addition, the relays and selector also contain five discrete modes. For lack of space, the transition matrixes for these components are not shown in this paper. Therefore, we can calculate that the system can potentially operate in roughly 9 5 1953125 = distinct modes at each time-step, and the full system trajectories will even reach up to 1953125 t as the system evolves.
Several groups of simulations were conducted on a test set, which includes the nominal scenario and the occurrence of a fault in one, two components and three components at the same time. The experimental results refer to a C++ implementation of the diagnostic algorithm using a personal computer featuring an Intel (R) Core (TM) i3 CPU with 2.27 GHz, 4GB RAM (Lenovo, Kunshan, China), and are presented in the following subsections.

Basic Results
The aim of these simulations is to evaluate the space and time performance results of our state tracking and fault diagnosis method. For these simulations, nominal, single fault, two faults and three faults are considered, and the number of particles is set to 500.
The good experimental time complexity results are confirmed by looking at the computational cost in terms of CPU time. Table 4 reports the average and the maximum CPU time for single-step mode estimation. The average time increases when more faults are injected. However, the CPU time is very low with three faults so that we claim that the algorithm can perform on-line.
For the belief state search problem, the number of expanded nodes is used to measure the space performance of algorithms. Moreover, since the consistency function usually consumes plenty of computing resources, the so-called times of consistency function are also employed to qualitatively evaluate the time performance. On the basis of the above consideration, the average and maximum number for these two values are also evaluated in Table 5. As expected, these two values will increase slowly as more faults are considered, and generally reach a maximum value at the fault detection time, because a large amount of nodes are expanded to check the consistency with observation at that time.

Number of Particles
In this subsection, we conduct a set of simulations in the nominal scenario with 10 time-steps to test the sensitivity of the number of particles to the performance of our approach. The number of particles varies from 100 to 1000 and typical experimental results are shown in Figure 7. As can be seen from this figure, the performance of our method is relevant to the number of particles. As the number of particles increases, more belief states and trajectories are obtained, and the time consumption also goes up.

Comparison with Other Algorithms
We now compare the performance of our approach with two k best methods: (1) k best BFTE algorithm and (2) k best CDA* algorithm with respect to the following aspects: (1) estimation accuracy; (2) the consumed time as the number of obtained belief states increases; and (3) the sensitivity of different approaches' performance to estimation time steps. As discussed earlier, k best methods choose k best trajectories or belief states to track system dynamics, and the value of k determines their estimation accuracy and performance. Blackmore et al. [35] pointed out that their estimation accuracy depends on whether or not k is large enough for real belief state distribution. In other words, when the distribution over belief state is relatively flat, k best methods maybe lead to losing the solution. Compared to these methods, our approach is robust for this situation. Generally speaking, the number of particles directly determines the estimation accuracy in our approach. Assumed that 100 particles are used, the loss of belief state probability density is less than 1% at each time step. If the particles increase to 500 or 1000, the loss will reduce to less than 0.2% or even 0.1%. Therefore, the number of obtained belief states at each time-step is dynamic adaptive and critically dependent on current belief state distribution. For a relatively concentrated distribution, our algorithm just needs to calculate a smaller number of belief states. On the other hand, more belief states will be obtained, when the desired distribution is relatively flat. Figure 8 shows the maintained belief state probability density over many cycles. Since the k best CDA* algorithm only improves computational performance but not estimation accuracy when compared to k best BFTE algorithm, only k best CDA* algorithm is shown in this figure. It is easy to find that the reduction in probability density is exponential in the number of time steps for both LUG and k best algorithm, but the rate of decay is clearly slow for our proposed method.
In second experiment, we investigate a set of simulations with 10 time-steps to show the time consumption of different algorithms varying predefined parameter. In Table 6, P N , B N and T N denote the number of particles, belief states and trajectories, respectively. It is easy to see that k best CDA* algorithm has a better time performance than the k best BFTE algorithm. Moreover, the difference between the proposed approach and k best CDA* algorithm can be analyzed in case that the same number of belief states B N or trajectories T N are obtained. When the k value is smaller than 3, the time performance of k best CDA* algorithm is good enough. However, when the k value is set to 10 ( 10 T N = ), k best CDA* algorithm captures four belief states, but the time result is 4185.69 ms. On the other hand, the proposed method ( 100 p N = ) can captures eight belief states, and only consumes 263.87 ms. Therefore, the proposed method achieves more estimation accuracy and consumes less time, and this advantage becomes significantly apparent as the number of obtained belief states or trajectories increases.  Figure 9. The performance results for different time step. Figure 9 shows the performance results as the estimated time-step increases for the third experiment. The number of particles in our approach is set to 100 and 500, while both BFTE and CDA* consider the value of k as 1 and 5 together. As can be seen from Figure 9a, the time consumption of BFTE and CDA* with k = 5 increase sharply, and the other curves go up smoothly. Since Figure 9a cannot clearly show the differences among our approach, BFTE and CDA* with single-estimation, Figure 9b zooms in these curves. This figure shows that our approach with 500 particles has more time consumption than 100 particles. It is in line with our previous analysis in Section 5.2. Similarly, we can also see that single-estimate results for BFTE and CDA* outperform our approach.
As a summary, k best BFTE and CDA* algorithm are well suited for a system with a relatively concentrated belief state distribution, while our approach can be applied for the systems with either concentrated or flat distributions. Moreover, our approach has better estimation accuracy and outperforms the k best BFTE and CDA* algorithms for sufficiently sized belief states.

Conclusions
In this paper, we propose a novel simulation-based fault diagnosis approach, which models the systems as concurrent probabilistic automata and applies LUG to state tracking and fault diagnosis of these systems. Moreover, the MC technique is introduced into this scheme, so our algorithm is anytime, and can balance between accuracy and time efficiency by varying the number of particles. On the one hand, the particles control the breadth of best-first A* search and maintain most likely belief states; on the other hand, the tagged particles can be used to generate system evolution trajectories. Finally, this paper analyzes the sample impoverishment problem resulted from the MC technique, and employs a novel recursively one step look-ahead strategy to mitigate this situation and improve the estimation accuracy.
The method has been successfully applied to a non-trivial real-world example: a power supply control unit of a spacecraft. The experimental results show its satisfactory performance including estimation accuracy, time and space complexity. It is also possible to diagnose the system without making any simplifying assumption such as single fault. In future work, we will introduce some variance into our predefined probability transition matrix, because the fixed transition probability in our experiment is relatively simple. Moreover, distributed diagnosis techniques can efficiently decrease the computational complexity for large-scale complex systems, so this is another research direction for the future.