Function Based Fault Detection for Uncertain Multivariate Nonlinear Non-Gaussian Stochastic Systems Using Entropy Optimization Principle

In this paper, the fault detection in uncertain multivariate nonlinear non-Gaussian stochastic systems is further investigated. Entropy is introduced to characterize the stochastic behavior of the detection errors, and the entropy optimization principle is established for the fault detection problem. The principle is to maximize the entropies of the stochastic detection errors in the presence of faults and to minimize the entropies of the detection errors in the presence of disturbances. In order to calculate the entropies, the formulations of the joint probability density functions (JPDFs) of the stochastic errors are presented in terms of the known JPDFs of both the disturbances and the faults. By using the novel performance indexes and the formulations for the entropies of the detection errors, new fault detection design methods are provided for the considered multivariate nonlinear non-Gaussian plants. Finally, a simulation example is given to illustrate the efficiency of the proposed fault detection algorithm.

Among the above results, for dynamic stochastic systems, the filter-based approach has been shown to be effective; the noises or disturbances are generally supposed to be Gaussian [21,22], and the filter design objective is to realize a minimum variance residual. However, in many chemical and manufacturing processes, the inputs involved are non-Gaussian [19,20,23-26]. For example, in paper making, the length of fibre and the size of filling molecular weight are important input variables, but neither obeys a Gaussian distribution. This is simply because most random variables and random processes in a paper web formation system are bounded. Actually, even for stochastic systems with Gaussian inputs, nonlinearities in the system may lead to non-Gaussian outputs. For non-Gaussian variables (vectors), it is well known that the expectation and variance are insufficient to characterize their statistics. As a result, new measures should be adopted. Entropy is a known measure that describes the average information contained in a given PDF (JPDF), and it has been widely used in the information, thermodynamics and control fields [6,8,27-30]. By minimizing the entropy, higher order moments can be minimized with respect to the random variables [13,26,27,31-33].
Nevertheless, since entropy is an integral operation on the PDF, while a PDF is a nonlinear function with an integral constraint and a positivity constraint, it is not easy to find the dynamical relationships between the PDFs of the input and of the output concerned, and further to formulate the entropies of the output distributions. Besides, for random vectors, the formulation of the JPDF of the output becomes much more complicated than in the case of random variables, even if the transformation is linear. For example, in [6], $\pi_{k+1} \in [a, b]^n$ is supposed to be a non-Gaussian continuous random vector and $D_k \in \mathbb{R}^{m \times n}$; after the multivariate mapping $\theta_{k+1} = D_k \pi_{k+1}$, the JPDF of $\theta_{k+1}$ has to be discussed for three different cases based on $D_k$, and $\theta_{k+1}$ is no longer continuous unless $D_k$ is invertible. Thus, the existing concept of entropy has to be extended. In fact, when applied to FD, the approach presented in [6] holds only for linear systems with a special form of system matrices.
In this paper, the filter-based FD approach for uncertain multivariate nonlinear non-Gaussian stochastic systems is further investigated using the entropy optimization principle. Firstly, a novel filtering and filter-based FD framework is established to construct a residual such that the fault can be detected from the changes of the residual. Secondly, the formulations relating the JPDFs of the disturbances and faults to those of the detection errors are established. The entropy optimization principles are then presented to calculate the gain matrix of the optimal FD filter.
The remainder of this paper is organized as follows. Section 2 presents the preliminaries. In Section 2.1 and Section 2.2, the nonlinear difference model and the corresponding filter model are described. In Section 2.3, the definition of entropy is introduced for the detection errors and the basic relationships between the entropy of the input and that of the output are formulated. The main results are given in Section 3, where the entropy optimization performance index is proposed in Section 3.1 and the corresponding formulations for the entropies are established in Section 3.2. In order to calculate the JPDFs of the detection errors, simplified algorithms are provided in Section 3.3, and the FD filtering algorithms are finally given in Section 3.4 to compute the optimal FD filter gain using the proposed entropy optimization principle. A simple simulation example is provided in Section 4 to demonstrate the effectiveness of the main results. Conclusions are given in Section 5.
In the following, unless stated otherwise, matrices are assumed to have compatible dimensions. The identity and zero matrices are denoted by $I$ and $0$, respectively, with appropriate dimensions. For a square matrix $M$, its inverse and determinant are denoted by $M^{-1}$ and $\det M$, respectively. For two real vectors $v_1$ and $v_2$, the notation $v_1 \succeq v_2$ means that every element of $v_1$ is no less than the corresponding element of $v_2$, and $\|v_1\|_2$ denotes the Euclidean norm. For a multivariate nonlinear smooth function $y = g(x)$, $\frac{\partial g(x)}{\partial x}$ denotes its Jacobian matrix. For a random vector $z$, $P\{z \preceq \tau\}$ represents the joint probability of the event $z \preceq \tau$, $F_z(\cdot)$ denotes its joint probability distribution function, $\gamma_z(\cdot)$ denotes the corresponding joint probability density function (JPDF), and $H(z)$ and $\varepsilon(z)$ denote its entropy and expectation, respectively.

Plant Models
Consider a multivariate nonlinear stochastic discrete-time system described by where $v_k$ is the disturbance acting on the output and $\delta_k \in \mathbb{R}^q$ is the fault to be detected. $F_0(\cdot, \cdot, \cdot)$ is a known multivariate Borel measurable and smooth nonlinear function of its arguments, $\Delta F$ represents the uncertainty satisfying $\|\Delta F\|_2 \le \delta_0$, and $\left|\det \frac{\partial F_0(\cdot, \cdot, \cdot)}{\partial w_k}\right| \neq 0$ holds for any $w_k \in [a, b]^p$. It should be pointed out that $\delta_k$, $w_k$ and $v_k$ are supposed to be arbitrary bounded independent random vectors rather than Gaussian ones, which differs from the existing FD approaches based on Kalman filtering theory. It is noted that $\delta_k$ can also represent an abrupt change of the model parameters. This model is therefore a generalization of those where only additive faults or unexpected changes of model parameters are considered.
Generally speaking, two groups of approaches can be used to determine $\gamma_{x_0}(\tau)$, $\gamma_{\delta}(\tau)$, $\gamma_{v}(\tau)$ and $\gamma_{w}(\tau)$. One is direct measurement using advanced instruments such as digital cameras. For example, with the developments in image processing, digital cameras have been used to measure the distribution of the flame in a flame combustion system, which can be further transformed into the temperature distribution. The other is the kernel estimation technique based on an open loop test [34]. In some practical processes (e.g., paper web formation control and particle size distribution control), large amounts of data can be stored and used to analyze the models of both the disturbances and the faults, from which some probabilistic properties can also be obtained; the random vectors involved (including both the disturbances and the faults) may also exhibit non-Gaussian statistical behaviour [14,15,18,32,35].
and $m \le n$ hold. Further, $C_k$ has full row rank at every sample time, and its first $m$ columns also have full rank (otherwise, the columns can be re-arranged to guarantee this assumption). $D_k \in \mathbb{R}^{m \times m}$ is an invertible matrix.
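The kernel estimation route mentioned above can be illustrated with a minimal sketch. It assumes a scalar disturbance, a Gaussian kernel and Silverman's rule-of-thumb bandwidth; none of these choices, nor the Beta-distributed test data, come from the paper or from [34]. They only show how a bounded non-Gaussian PDF might be recovered from recorded samples.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Gaussian kernel density estimate of a scalar PDF, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distances between grid points and samples: (len(grid), len(samples)).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
# Hypothetical bounded, non-Gaussian disturbance record: Beta(2, 5) samples on [0, 1].
w_samples = rng.beta(2.0, 5.0, size=2000)
grid = np.linspace(-0.5, 1.5, 401)
# Silverman's rule of thumb (an assumption made for this sketch only).
h = 1.06 * w_samples.std() * len(w_samples) ** (-1 / 5)
pdf_hat = gaussian_kde_1d(w_samples, grid, h)
# Trapezoidal check that the estimate integrates to approximately one.
area = float(np.sum(0.5 * (pdf_hat[1:] + pdf_hat[:-1]) * np.diff(grid)))
```

For random vectors the same idea applies with a multivariate kernel, at a cost that grows quickly with the dimension.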

Filter and Error Dynamics
For the nonlinear dynamic system given by (1), the filter can be described by where $U_k \in \mathbb{R}^{n \times m}$ is the filter gain matrix to be determined. Combining (1) with (2), the resulting estimation error $e_k = x_k - \hat{x}_k$ satisfies The residual signal for the fault detection is therefore defined by Related to (4), the vector denoted by can be regarded as a deterministic term with an unknown matrix gain $U_k$. The main difficulty is that the term $F_0(x_k, \delta_k, w_k)$ in (4) is both nonlinear and multivariate at each sample time. As the detection error, $\hat{e}_k$ is supposed to be defined on $[\alpha, \beta]^m$, where $\alpha$ and $\beta$ can also be chosen as $-\infty$ and $+\infty$, respectively. In the following, we will recursively establish the relationships between the JPDFs of $x_0$, $\delta_k$, $v_k$ and $w_k$ and that of $\hat{e}_k$.
Remark 1. In [8], $e_k$ is considered instead of $\hat{e}_k$ for simplicity, based on a strict assumption, which may lead to considerable conservatism. As such, in this note, $\hat{e}_k$ will be considered directly, so that $\hat{e}_k$ should be affected primarily by the fault $\delta_k$ and minimally by the disturbances $v_k$ and $w_k$.

Entropy and Its Formulation
H(z) can be regarded as a measure of randomness of z, which is defined as follows.
Definition 1. The entropy for a continuous random vector $z$ is defined by where Based on Remark 1, the main task for fault detection is to find $U_k$ such that $H(\hat{e}_k)$ is influenced maximally by $\delta_k$ and minimally by $w_k$ or $v_k$.
It can be shown that $H(\hat{e}_k)$, as the conditional entropies of as well as the undetermined gain $U_k$, which can be further represented by In order to calculate the entropies, the JPDFs of the errors have to be formulated in advance. The following result reveals the relationship between the JPDFs of the input and output, subject to a multivariate nonlinear mapping.
Lemma 1. For a multivariate Borel smooth function $\phi = \Gamma(\sigma)$, if $\sigma$ is a random vector with known JPDF $\gamma_{\sigma}(\tau)$, the JPDF of $\phi$ can be given by where, for a given $\tau$, $\Omega_k(\tau)$ is the domain of definition of the random vector $\sigma$ with
Proof: For a given $\tau$, $\Omega_k(\tau)$ is the set consisting of all events $\rho$ such that $\Gamma(\rho) \preceq \tau$. For the function concerned, $\phi$ is also a random vector. Based on probability theory [27], the joint distribution function of $\phi$ can be given by Hence, Equation (8) can be obtained by taking derivatives on both sides of (9).
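As a numerical illustration of this definition, the sketch below assumes that the entropy is the standard differential entropy $H(z) = -\int \gamma_z(\tau) \ln \gamma_z(\tau)\, d\tau$ and evaluates it for a scalar PDF tabulated on a grid. The function name and the two test densities are illustrative choices, not taken from the paper.

```python
import numpy as np

def differential_entropy(pdf_values, grid):
    """Approximate H = -integral of p * ln(p) over the grid by the trapezoidal rule."""
    p = np.asarray(pdf_values, dtype=float)
    # Use the convention 0 * ln 0 = 0 outside the support of the PDF.
    safe_p = np.where(p > 0, p, 1.0)
    integrand = np.where(p > 0, -p * np.log(safe_p), 0.0)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid)))

grid = np.linspace(-10.0, 10.0, 20001)
# Bounded non-Gaussian example: uniform PDF on [0, 2], whose entropy is ln 2.
uniform_pdf = np.where((grid >= 0.0) & (grid <= 2.0), 0.5, 0.0)
# Standard Gaussian PDF, whose entropy is 0.5 * ln(2 * pi * e).
gauss_pdf = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)

h_uniform = differential_entropy(uniform_pdf, grid)
h_gauss = differential_entropy(gauss_pdf, grid)
```

A narrower PDF yields a smaller entropy, which is the property the FD design exploits when it shapes the distribution of the residual.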

Q.E.D
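The two relations used in this proof can be written out as follows. This is a reconstruction from the wording of the proof only, under the assumption that $\phi$ has $m$ components (the symbol $m$ is borrowed from the linear case discussed next); the numbering and exact typesetting of the original (8) and (9) may differ.

```latex
% Joint distribution function of \phi (the relation the proof calls (9)):
F_{\phi}(\tau) = P\{\phi \preceq \tau\} = P\{\Gamma(\sigma) \preceq \tau\}
             = \int_{\Omega_k(\tau)} \gamma_{\sigma}(\rho)\, d\rho,
\qquad
\Omega_k(\tau) = \{\rho : \Gamma(\rho) \preceq \tau\}.

% JPDF of \phi, obtained by differentiating with respect to every component of \tau:
\gamma_{\phi}(\tau) = \frac{\partial^{m} F_{\phi}(\tau)}{\partial \tau_1 \cdots \partial \tau_m}
                    = \frac{\partial^{m}}{\partial \tau_1 \cdots \partial \tau_m}
                      \int_{\Omega_k(\tau)} \gamma_{\sigma}(\rho)\, d\rho.
```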
The next step is to consider the special relationship between $\hat{e}_k$ and the terms $e_k$, $v_k$ in (5), which is related to the linear mapping $\phi = \Theta\sigma$, where $\Theta \in \mathbb{R}^{m \times n}$ is a compatible real matrix.
In the following, Lemma 2 and Lemma 3 will be given respectively according to whether the concerned matrix Θ is invertible.
Lemma 2. If $m = n$ and $\Theta$ is invertible, the following relationship holds
Proof: If $\Theta$ is invertible, it can be verified that for a given $\tau$, there exists $\tau_0$ such that $\tau_0 = \Theta^{-1}\tau$. Thus, by using Lemma 1, we have This process can also be regarded as a generalization of the classical relationship for random variables (see, e.g., Chapter 6 of [27]). Q.E.D
In the other case, $\Theta$ is not invertible. Corresponding to the structure of $C_k$, suppose that $\Theta$ has full row rank. In this case, there exist a lower-triangular invertible matrix $T_1$ and an upper-triangular invertible matrix $T_2$ such that where $I_m$ represents the $m$-dimensional identity matrix and $T_2$ has positive diagonal elements.
To simplify the presentation, we denote with compatible dimensions. This implies
Lemma 3. If $\Theta \in \mathbb{R}^{m \times n}$ has full row rank and $m < n$, then for the linear mapping $\phi = \Theta\sigma$, the following relationship holds with the auxiliary integral argument $\eta$ being denoted by $\eta :=$ , where $T_1$ and $T_2$ are defined in (11) and $\tau^{(2)}$ is defined in (12).
Based on Lemma 1, task 1 and 3 can be solved as follows. From (12), it can be shown that the JPDFs of $\phi$ and $\sigma$ should satisfy On the other hand, (13) implies that
which implies that Lemma 3 is consistent with Lemma 2 when both the row rank and the column rank of $\Theta$ are $m$.
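The invertible case can be checked numerically. The sketch below uses the classical change-of-variables formula $\gamma_{\phi}(\tau) = |\det \Theta|^{-1} \gamma_{\sigma}(\Theta^{-1}\tau)$, which is what a result of the Lemma 2 type is expected to reduce to, and compares it with a Monte Carlo estimate. The matrix, the test point and the uniform input density are illustrative assumptions.

```python
import numpy as np

def pdf_sigma(s):
    """JPDF of sigma: uniform on the unit square [0, 1]^2 (illustrative choice)."""
    s = np.asarray(s, dtype=float)
    return float(np.all((s >= 0.0) & (s <= 1.0)))

def pdf_phi(tau, theta):
    """JPDF of phi = theta @ sigma for an invertible theta (change of variables)."""
    sigma_point = np.linalg.solve(theta, tau)  # computes theta^{-1} tau
    return pdf_sigma(sigma_point) / abs(np.linalg.det(theta))

theta = np.array([[2.0, 1.0], [0.0, 1.0]])  # hypothetical invertible matrix
tau = np.array([1.5, 0.5])                  # point inside the image of the square

# Monte Carlo estimate: fraction of samples of phi in a small box around tau,
# divided by the area of the box.
rng = np.random.default_rng(1)
sigma = rng.uniform(0.0, 1.0, size=(200_000, 2))
phi = sigma @ theta.T
half = 0.05
inside = np.all(np.abs(phi - tau) <= half, axis=1)
pdf_mc = inside.mean() / (2.0 * half) ** 2
pdf_formula = pdf_phi(tau, theta)
```

The two values agree up to sampling error. For $m < n$ the matrix cannot be inverted, and the density of $\phi$ instead requires integrating out the remaining $n - m$ coordinates, which is the situation Lemma 3 addresses.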

Performance Indexes
Following the above procedures, the fault detection objective can be achieved by monitoring the changes of $\hat{e}_k$ at every sample time, where the entropies of $\hat{e}_k$ resulting only from $v_k$ and $w_k$ are minimized, while the entropies resulting from $\delta_k$ are maximized. After the corresponding performance indexes are established, the filter gain involved can be designed based on the entropy optimization principle.
Generally speaking, there are two main tasks in the following procedures. The first is to provide appropriate optimization principles to achieve the FD objectives, which are represented by the entropies of $\hat{e}_k$. The second is to formulate the entropies and PDFs of $\hat{e}_k$ in terms of the PDFs of $x_0$, $v_k$, $w_k$ and $x_k$, $\delta_k$, as well as $\Delta F$.
In this subsection, we will focus on the entropy optimization principle. The JPDF of $\hat{e}_k$ ($e_k$) in the presence of $v_k$, $w_k$ and $\Delta F$ and in the absence of $\delta_k$ can be represented by $\gamma^0_{\hat{e}_k}$ ($\gamma^0_{e_k}$), while that in the presence of $\delta_k$, $\Delta F$ and in the absence of $v_k$, $w_k$ can be represented by $\gamma^1_{\hat{e}_k}$ ($\gamma^1_{e_k}$). Similarly, we denote as the entropy of $\hat{e}_k$ in the presence of $v_k$, $w_k$, $\Delta F$ and in the absence of $\delta_k$ (in the presence of $\delta_k$, $\Delta F$ and in the absence of $v_k$, $w_k$). Concretely it can be further denoted that for $i = 0, 1$, where Based on (2), (4), (5) and Assumption 3, it can be claimed that only the first $m$ rows of $U_k$ are related to $H^i(\hat{e}_{k+1})$, while the last $n - m$ rows are irrelevant to the entropy. This means that in this case, (19) are redundant vectors for the optimization, where $U_{k,m+1} = \cdots = U_{k,n} = 0$ can be selected under this circumstance.
Since $U_k$ is an $n \times m$ matrix, in order to use conventional optimization techniques, we denote where $U_{ki}$ is the $i$-th row vector of $U_k$ and thus $u_k \in \mathbb{R}^{mn \times 1}$ is a stretched column vector. The entropy optimization performance index is proposed as follows where $R_1$ and $R_2$ are a pre-specified matrix weight and a pre-specified constant weight, respectively. If there exists a filter such that $J_N(\hat{e}_k)$ is minimized for each sample time $N$, then it is called an entropy optimization FD filter.
Remark 3. It is noted that entropy optimization corresponds to variance optimization for Gaussian signals [32]. For example, the entropy optimization principle for FD problems parallels the main results in [2], where linear Gaussian systems were studied and the minimax technique was applied to the variance of errors.
Remark 4. To enhance the FD filtering performance, the expectation vector $\varepsilon(\hat{e}_k)$ can also be included in the cost function $J_N(\hat{e}_k)$. More general performance indexes than $J_N(\hat{e}_k)$ can also be considered, where the entropies of the error resulting from $x_0$, $v_k$, $w_k$ and $\delta_k$ are considered simultaneously. However, since the two performance indexes make no essential difference in the optimization context, we will focus on $J_N(\hat{e}_k)$ for the FD problem in the following subsections.
Remark 5. In the entropy optimization performance index, $R_1$ and $R_2$ are two pre-specified weights. $R_1$ corresponds to the energy. If there is no restriction on the energy, $R_1$ can be selected as 0. Generally, the smaller $R_1$ is, the more easily the fault can be detected. A weight $R_3$ can also be added to $H^0(\hat{e}_k)$. In that case only the ratio between $R_2$ and $R_3$ matters, so we set $R_3 = 1$ and tune $R_2$. It is noted that a cumulative performance index is adopted in this paper. If there is no sum operation, the index corresponds to the "greedy" one [36]. However, it has been shown that the use of long-range prediction can overcome the problem of stabilizing a non-minimum phase plant with unknown or variable dead-time. As such, in this paper, (20) is employed. Moreover, entropy optimization is equivalent to variance optimization for Gaussian signals, so the entropy optimization principle is consistent with the results for linear Gaussian systems.
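The trade-off described in Remarks 3-5 can be made concrete with a small sketch. The cost form below is an assumption pieced together from the surrounding description (disturbance entropy with unit weight, fault entropy rewarded through $R_2$, and a quadratic energy penalty on $u_k$ through $R_1$); it is not the paper's index (20), whose exact expression is not reproduced here. All function names and numerical values are hypothetical.

```python
import numpy as np

def stage_cost(h0, h1, u, r1, r2):
    """Assumed per-step cost: H0 - R2 * H1 + u^T R1 u (illustrative form only)."""
    u = np.asarray(u, dtype=float)
    return h0 - r2 * h1 + float(u @ r1 @ u)

def cumulative_index(h0_seq, h1_seq, u_seq, r1, r2):
    """Sum of the assumed stage costs over the horizon (cumulative index)."""
    return sum(stage_cost(h0, h1, u, r1, r2)
               for h0, h1, u in zip(h0_seq, h1_seq, u_seq))

r1 = np.eye(2)   # energy weight, playing the role of R1
r2 = 0.1         # fault-entropy weight, playing the role of R2
h0_seq = [0.2, 0.1]                          # made-up disturbance entropies
h1_seq = [1.0, 1.5]                          # made-up fault entropies
u_seq = [np.zeros(2), np.array([1.0, 0.0])]  # made-up stretched gain vectors
j_value = cumulative_index(h0_seq, h1_seq, u_seq, r1, r2)
```

Under this assumed form, lowering $R_1$ makes large gains cheaper, which is consistent with the remark that a smaller $R_1$ eases detection, while keeping only one term of the sum gives the "greedy" variant.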

Formulations for the Error JPDFs
In this subsection, the formulations for the JPDFs related to the FD performance index will be discussed. The JPDFs $\gamma^i_{e_k}$ have to be calculated in order to formulate $\gamma^i_{\hat{e}_k}$ based on Lemmas 1-3, and then to provide the representation of $H^i(\hat{e}_k)$ ($i = 0, 1$).
For a given $\tau \in [\alpha, \beta]^n$, the domains of definition of several functions involved are defined as follows where the notation $\preceq$ has been defined at the end of the Introduction section.
To summarize, the following result can be obtained.
Proof: This result can be obtained similarly to Lemma 1 using the independence property in probability theory.

Q.E.D
For the FD problem, $\hat{e}_k$ is used as the residual to detect the fault. In order to calculate the entropies of $\gamma^i_{\hat{e}_k}(\tau)$ ($i = 0, 1$), we denote two auxiliary vectors as follows and, corresponding to (11), we suppose that there exist $T_{1k}$ and $T_{2k}$ such that
Proof: The formulations of $\gamma^i_{s_{1k}}(\tau)$ can be obtained using Lemma 2 and Lemma 3. Noting that $\hat{e}_k = s_{1k} + s_{2k}$, the result follows. Q.E.D

Simplified Calculation of the Error JPDFs
In the above results, the formulations of the entropies have been reduced to the differentials of some multiple integrals that depend on their integral domains.To simplify the design methods, the following assumption is introduced.
] $\neq 0$ holds for any $x_k$ in its operation domain.
In contrast to Theorem 1, under Assumption 4, simplified algorithms can be further obtained to calculate $\gamma^i_{e_k}(\tau)$ and then to formulate $\gamma^i_{\hat{e}_k}(\tau)$ ($i = 0, 1$) by constructing the following auxiliary multivariate functions.
Theorem 3. Under Assumptions 1-4, $\gamma^0_{e_k}(\tau)$ and $\gamma^1_{e_k}(\tau)$ can be calculated by and where and
Proof: Firstly, from (25) and (28), it can be claimed that is a one-to-one mapping from $(\delta_{k+1}, x_k)$ to $(\delta_{k+1}, e_{k+1})$ under Assumption 4. Thus, the PDF of $(\delta_{k+1}, e_{k+1})$ can be represented by that of $(\delta_{k+1}, x_k)$ (see, e.g., [27]). On the other hand, the random vectors $x_k$ and $\delta_{k+1}$ can be regarded as mutually independent under Assumption 1. As such, based on the special structure described by (25) and the notations in (23), it can be verified that which means that (30) holds. Similarly, by considering the auxiliary vector and function described by $\eta^0_{k+1} = \Psi^0(w_{k+1}, x_k)$, we can obtain (32). This procedure can also be used to prove (29) and (31). Q.E.D
With $\gamma^i_{e_k}(\tau)$ ($i = 0, 1$), $\gamma^i_{\hat{e}_k}(\tau)$ can be obtained by using Theorem 2, so that $H^i(\hat{e}_k)$ can also be formulated for $J_N(\hat{e}_k)$.

Optimal FD Filter Design Strategy
Based on the above procedures, it can be claimed that the optimization is required for the performance index $J_N(\hat{e}_k)$.
It is noted that (20) leads to where The optimal filtering strategy can be obtained through where an explicit function from other arguments to $u_k$ can be further determined. The principle of optimality can thus result in the optimal FD filtering law for the whole process. To simplify the filter structure, a recursive design procedure is formulated in the following. Denote As a function of $e_k$, $w_k$ and $\tau$, it can be approximated to give where The recursive algorithm can be provided to determine the gain of the entropy optimization FD filter.
Theorem 4. An entropy optimization FD filtering strategy for $J_N$ subject to the nonlinear error model (4) is given by for a weight $R_1 > 0$ satisfying
Proof: Firstly, it can be seen that Substituting (39) and (42) into (37) yields the recursive suboptimal control law (40) for all $k = 0, 1, 2, \cdots, +\infty$, under condition (41). It should be pointed out that the above algorithm given by (40) results from a necessary condition for optimization. To guarantee the sufficiency, the following second-order derivative condition should also be satisfied which can be guaranteed if $R_1$ is selected sufficiently large. Q.E.D
Remark 6. The real-time suboptimal FD filter design algorithm can be summarized as follows:
• Initialize $x_0$, $\hat{x}_0 = \varepsilon(x_0)$ and $u_0$;
• At the sample time $k$, compute $\gamma^i_{e_{k+1}}(\tau)$, $\tau \in [\alpha, \beta]^n$, based on Theorem 3;
• At the sample time $k$, compute $\gamma^i_{\hat{e}_{k+1}}(\tau)$, $\tau \in [\alpha, \beta]^m$, and then obtain $H^i(\hat{e}_{k+1})$ ($i = 0, 1$) via (18);
• Calculate $\Delta u_k$ and $u_k$ using Equations (40) and (38);
• Increase $k$ by 1 to the next step.
The schematic diagram and flow chart are given in Figures 1 and 2 to illustrate how to apply our function based fault detection method to multivariate uncertain non-Gaussian system step by step.
According to (2), the filter can be formed as follows: Thus, the estimation error system is driven by and thus In the simulations, $x_0$ is uniformly distributed on $[0, 1]^2$ and thus $\hat{x}_0 = \varepsilon(x_0) = [0.5, 0.5]^T$, $u_0$ is set to $[0, 0]^T$, and the weights are set to $R_1 = 1$ and $R_2 = 0.1$. Figures 3-5 show the signals for the fault and disturbances, while Figures 6-8 show the corresponding PDFs of the signals. Figures 9 and 10 are provided to demonstrate the residual signals $\hat{e}_k$ in the absence and in the presence of fault emerging respectively. When the random fault occurs, the detection error increases significantly. Figure 11 is the 3-D mesh of the system output $y$, which shows a remarkable change of the PDF $\gamma_y$ when the fault occurs at sample time 50. Figure 12 shows the optimization performance index along time $t$. The result shows that a satisfactory FD performance can be obtained through the error dynamics by optimizing the entropies of the detection errors.

Conclusions
Since the classical FD approach using Kalman filter theory is insufficient for stochastic systems with non-Gaussian variables, a new FD framework is proposed in this paper for dynamic multivariate nonlinear stochastic systems. The entropy optimization principle is established for the nonlinear detection error system concerned, which is represented by a non-Gaussian stochastic system. The main design principle is to maximize the entropies of the residual errors when the faults occur and to minimize the entropies of the residual errors resulting from other stochastic noises. For this purpose, new relationships are provided to calculate the JPDFs of the detection error in terms of the JPDFs of both the disturbances and the faults. As such, recursive approaches can be constructed to formulate the entropies of the detection errors. Combining these formulations with the novel performance indexes based on the entropy optimization principles, recursive algorithms are provided to calculate the gain of the optimal FD filters. The advantages of the proposed fault detection approach are summarized as follows:
• The detected fault and system noises do not have to be Gaussian.
• The entropy optimization principle for FD problems parallels the main results in [2], where linear Gaussian systems were studied and the minimax technique was applied to the variance of errors. It therefore generalizes variance optimization for Gaussian signals.
• The fault detection algorithm only uses the JPDF of the residual signal to calculate the filter gain matrix. There is no need to measure the system output PDF. This constitutes a major advantage compared with [37-39], which require the output PDF to be measurable.
• This fault detection approach is applicable to multivariate and uncertain systems. It is a generalization of the method in [8], where only single-input single-output systems are considered.
However, several issues still need to be studied in the future, namely how to determine the threshold and how to perform fault diagnosis. Moreover, as a branch of stochastic distribution control (SDC), fault detection using the entropy optimization principle still remains at the theoretical research stage [8,37,40]. Its real-world application requires more research effort.

Figure 9. The residual value when fault occurs.

Figure 10. The residual value when no fault occurs.

Figure 11. 3-D mesh of the system output y.

Figure 12.