A Distributed Fault Diagnosis and Cooperative Fault-Tolerant Control Design Framework for Distributed Interconnected Systems

This paper investigates a design framework for a class of distributed interconnected systems, where a fault diagnosis scheme and a cooperative fault-tolerant control scheme are included. First of all, fault detection observers are designed for the interconnected subsystems, and the detection results will be spread to all subsystems in the form of a broadcast. Then, to locate the faulty subsystem accurately, fault isolation observers are further designed for the alarming subsystems in turn with the aid of an adaptive fault estimation technique. Based on this, the fault estimation information is used to compensate for the residuals, and then isolation decision logic is conducted. Moreover, the cooperative fault-tolerant control unit, where state feedback and cooperative compensation are both utilized, is introduced to ensure the stability of the whole system. Finally, the simulation of intelligent unmanned vehicle platooning is adopted to demonstrate the applicability and effectiveness of the proposed design framework.


Introduction
With the rapid development of sensing and communication technologies, modern engineering systems are increasingly networked and distributed [1]. Further, large-scale distributed systems such as power grids and vehicle platoons are generally interconnected, physically or informationally [2][3][4][5]. Such systems are thus referred to as distributed interconnected systems, which are composed of several subsystems in different locations linked through coupling mechanisms. On the other hand, the increasing size and complexity of distributed interconnected systems make faults more likely. Besides, due to the interconnection, fault diagnosis for distributed interconnected systems is challenging, as an incipient fault occurring in any subsystem can propagate from one subsystem to another and even result in the collapse of the whole system. Research on fault diagnosis and fault-tolerant control for distributed interconnected systems is therefore receiving considerable attention [6][7][8][9].
For the most part, the fault diagnosis approaches for distributed interconnected systems can be divided, according to the information used by the diagnostic units, into three categories: centralized, decentralized, and distributed fault diagnosis [10]. The centralized fault diagnosis approach employs a centralized diagnostic unit to collect the information of the whole system and then conducts fault diagnosis for all subsystems. In [11], an interconnected system with disconnected interconnections and packet dropouts was augmented into a switched system, and then a centralized robust fault detection filter was further designed. Obviously, the centralized approach requires high computation as well as communication and is not easy to expand, so it is not suitable for large-scale distributed interconnected systems. In the decentralized fault diagnosis approach, each subsystem is functions". In [25], the cycle-small-gain theorem was utilized to ensure the closed-loop stability of interconnected systems, and a fault-tolerant control scheme that considered both rigid and flexible component faults was proposed. However, the use of the small gain theorem generally leads to a conservative result, and the fault-tolerant objective is only to guarantee the stability of faulty systems. To the best of our knowledge, most investigations on fault-tolerant control for the distributed interconnected systems are limited to basic stability analysis, whereas other dynamic and static properties have not been covered in great detail.
Inspired by the above considerations, a distributed fault diagnosis and cooperative fault-tolerant control design framework for distributed interconnected systems is proposed in this paper. Specifically, the contributions of this paper are as follows:

1. A novel fault diagnosis framework, which is mainly composed of fault detection observers and fault isolation observers, is developed for a general class of distributed interconnected systems with actuator faults. By transmitting the state estimation information in the form of a broadcast communication and carrying out several decision logic schemes in the cloud processing unit based on the residuals to achieve fault detection, isolation, and estimation, the problem of fault propagation can be solved as well;
2. A cooperative fault-tolerant control scheme, where LQR controllers for the healthy subsystems and a cooperative fault-tolerant controller for the faulty subsystem are utilized respectively, is also proposed to guarantee the stability and performance of the whole system;
3. Different from the conventional isolation decision logic, the adaptive method is employed to estimate the fault and the fault estimation information is used to modify the residuals. In this way, the subsystem with an actuator fault can be located where the residual value is less than the threshold rather than exceeding the threshold as usual.
This paper is organized as follows. In Section 2, the framework of distributed fault diagnosis and cooperative fault-tolerant control is introduced briefly, followed by the corresponding design objective. Section 3 presents the main results, including the design of fault detection observer, fault isolation observer, and cooperative fault-tolerant controller. Section 4 is dedicated to the simulation of intelligent unmanned vehicle platooning to demonstrate the applicability and effectiveness of the proposed design scheme. Ultimately, some conclusions and possible future research directions are presented in Section 5.

Problem Description
The design framework of distributed fault diagnosis and cooperative fault-tolerant control for distributed interconnected systems is depicted in Figure 1 and mainly includes the monitoring and control units (MCUs) and the cloud processing unit. The whole distributed interconnected system consists of $p$ subsystems and is modeled as
$$x(k+1) = A x(k) + B u(k) + B_f f(k) + B_v v(k), \qquad y(k) = C x(k) + E w(k), \tag{1}$$
where $x(k)$ denotes the state vector, with $x_i(k)$ the $i$th subsystem state; $u(k)$ denotes the input vector, with $u_i(k)$ the $i$th subsystem input; $y(k)$ denotes the output vector, with $y_i(k)$ the $i$th subsystem output; $f(k)$ represents the actuator failure; $v(k)$ stands for the process noise, with $v_i(k)$ the $i$th subsystem process noise; and $w(k) \in \mathbb{R}^{s}$ denotes the measurement noise, with $w_i(k) \in \mathbb{R}^{s/p}$ the $i$th subsystem measurement noise. $A$, $B$, $B_f$, $B_v$, $C$ and $E$ in Equation (1) can be decomposed into blocks associated with the individual subsystems.
The $i$th subsystem can be further given as Equation (2), where $N$ is the set of all subsystems and $N_i$ is the set of subsystems other than the $i$th subsystem. $x_j(k)$ and $u_l(k)$ represent the $j$th ($j \neq i$) subsystem state and the $l$th subsystem input. Note that $f_i(k)$ denotes the actuator failure of the $i$th subsystem.
It can be found that each subsystem is equipped with an MCU, which consists of the following components:
(1) A fault detection observer (FDO), which is governed by Equation (3), where $\hat{x}_i(k)$ and $\hat{x}_j(k)$ are the state estimations of the $i$th and $j$th subsystems respectively, $\hat{y}_i(k)$ represents the output estimation of the $i$th subsystem, $L_i$ is the detection observer gain, and $r_i(k)$ stands for the residual of the $i$th subsystem generated by the FDO.
(2) A fault isolation observer (FIO), which is activated when there is an alarm provided by the corresponding FDO and can be described by Equation (4), where $\text{Alarm} + \overline{\text{Alarm}} = N_i$ and $\hat{x}_i^q(k)$ is the state estimation of the $i$th subsystem given by the FIO. $G_i$ is the isolation observer gain of the $i$th subsystem, $\hat{y}_i^q(k)$ represents the corresponding output estimation, and $r_i^q(k)$ is the residual of the $i$th subsystem generated by the FIO. $\hat{f}_i(k)$ stands for the fault estimation and $\Gamma_i$ is the weighting matrix.
(3) A controller, which keeps the faulty system stable and is constructed with a positive definite matrix $R_i$ and the local optimal gain $K_i$, which is determined from a Riccati equation. The state estimation of $x_i(k)$ used by the controller is provided by the cloud processing unit, and $S_i(k)$ represents the cooperative compensation vector from other subsystems in the faulty case.
The processing flow of the cloud processing unit is shown in detail in Figure 2. It can be seen that the cloud processing unit is responsible for receiving, processing, and broadcasting information. It mainly performs three functions: (i) obtaining the state estimations $\hat{x}_i$ and $\hat{x}_i^q$ from the monitoring unit; (ii) accomplishing fault detection and isolation based on the residual signals and broadcasting the results; and (iii) providing the corresponding state estimation to the control unit.
Based on this, the fault detection and isolation schemes are given in Figure 3. The conventional fault detection observer is used to detect whether a fault occurs, and a residual value exceeding the threshold indicates that there is a fault in the process. Meanwhile, the fault isolation observer based on the adaptive fault estimation method is adopted to achieve fault isolation in combination with an unconventional isolation decision logic. Specifically, the adaptive method is employed to estimate the fault, and the fault estimation information is used to modify the residuals. In this way, the subsystem with actuator faults is located where the residual value is less than the threshold rather than exceeding it as usual. It is noteworthy that the isolation decision logic in this paper is contrary to the detection decision logic and different from the conventional method [26].
In this paper, the design objective is to locate the fault accurately and achieve cooperative fault-tolerant control. Hence, this paper studies the design of a novel fault diagnosis framework together with the detection and isolation observer gains $L_i$ and $G_i$, the controller gain $K_i$, and the cooperative compensation vector $S_i(k)$.
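To make the role of the FDO residual more concrete, the following is a minimal sketch of a Luenberger-type detection observer for one scalar subsystem that receives its neighbour's state estimate by broadcast. The observer form, the scalar values `a_ii`, `a_ij`, `b`, `c`, `gain_l`, and the helper names `fdo_step` and `run_fdo` are assumptions made for illustration; they are not taken from Equation (3).

```python
from typing import List, Tuple


def fdo_step(x_hat_i: float, x_hat_j: float, u_i: float, y_i: float,
             a_ii: float, a_ij: float, b: float, c: float,
             gain_l: float) -> Tuple[float, float]:
    """One step of an assumed Luenberger-type detection observer.

    Returns the next state estimate and the residual r_i = y_i - c * x_hat_i.
    """
    residual = y_i - c * x_hat_i
    x_hat_next = a_ii * x_hat_i + a_ij * x_hat_j + b * u_i + gain_l * residual
    return x_hat_next, residual


def run_fdo(steps: int, fault_at: int, fault_size: float) -> List[float]:
    """Simulate subsystem i with an additive actuator fault from step fault_at."""
    a_ii, a_ij, b, c, gain_l = 0.8, 0.1, 1.0, 1.0, 0.5  # hypothetical values
    x_i, x_j = 1.0, 0.5          # true states
    x_hat_i, x_hat_j = 0.0, 0.5  # estimates (neighbour estimate assumed exact)
    residuals = []
    for k in range(steps):
        f = fault_size if k >= fault_at else 0.0
        y_i = c * x_i  # noise-free measurement for clarity
        x_hat_i, r = fdo_step(x_hat_i, x_hat_j, 0.0, y_i,
                              a_ii, a_ij, b, c, gain_l)
        residuals.append(r)
        # True dynamics: local state, neighbour coupling, faulty actuator.
        x_i = a_ii * x_i + a_ij * x_j + b * (0.0 + f)
        x_j = 0.9 * x_j
        x_hat_j = x_j  # broadcast of the neighbour's estimate
    return residuals


residuals = run_fdo(steps=60, fault_at=30, fault_size=0.4)
```

In this toy run the residual decays towards zero before the fault and settles at a nonzero value afterwards, which is the behaviour that a threshold test on the residual relies on.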

Remark 1.
It is worthwhile to note that only the single fault case is considered in this paper. Meanwhile, a cooperative controller with fault-tolerant ability is introduced to keep the faulty subsystem stable, and LQR controllers are employed so that the healthy subsystems, which may be affected by the faulty subsystem, can remain stable.

Remark 2.
The subsystems in Figure 1 are physically interconnected. From the mathematical viewpoint, the physical interconnection can be seen from the state matrix $A$. If the matrix $A_{ij}$ ($i \neq j$), a non-diagonal block of the matrix $A$, is not equal to zero, the $i$th subsystem and the $j$th subsystem are physically interconnected. In addition, the monitoring units in Figure 1 are informationally interconnected. To be specific, the monitoring units acquire the state estimation information from other interconnected subsystems through broadcast communication, and all subsystems can use this information once it has been broadcast.
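The block test described above can be sketched in a few lines. The example below assumes that the state matrix is stored as a nested list of blocks; the three-subsystem chain and the names `is_zero` and `physical_neighbours` are hypothetical and only illustrate how non-zero off-diagonal blocks reveal physical interconnection.

```python
from typing import Dict, List

Matrix = List[List[float]]


def is_zero(block: Matrix, tol: float = 1e-12) -> bool:
    """True when every entry of the block is numerically zero."""
    return all(abs(v) <= tol for row in block for v in row)


def physical_neighbours(blocks: List[List[Matrix]]) -> Dict[int, List[int]]:
    """Map each subsystem i to the subsystems j != i with A_ij != 0 (1-based)."""
    p = len(blocks)
    return {
        i + 1: [j + 1 for j in range(p)
                if j != i and not is_zero(blocks[i][j])]
        for i in range(p)
    }


Z = [[0.0]]
# Hypothetical chain: subsystem 2 is driven by 1, and subsystem 3 by 2.
A_blocks = [
    [[[0.9]], Z, Z],
    [[[0.1]], [[0.8]], Z],
    [Z, [[0.2]], [[0.7]]],
]
neighbours = physical_neighbours(A_blocks)
```

Here `neighbours[2]` lists subsystem 1 because the block in row 2, column 1 is non-zero, whereas subsystem 1 has no physical neighbours in this example.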

Remark 3.
Fault isolation is achieved by making use of the adaptive fault estimation observer which is not applicable for a sensor fault. This is the reason we do not consider a sensor fault. If the fault isolation observer based on the adaptive fault estimation method is replaced by some other sensor fault estimation observer, the problem of sensor fault diagnosis can be considered.

Remark 4.
Similar to what has been given in [27], the necessary conditions for the existence of the observer are that $(A_{ii}, C_i)$ ($i = 1, \cdots, p$) and $(A, C)$ are observable, which can be guaranteed by the PBH rank criteria $\operatorname{rank}\begin{bmatrix} C_i^T & sI_i - A_{ii}^T \end{bmatrix} = n/p$ and $\operatorname{rank}\begin{bmatrix} C^T & sI - A^T \end{bmatrix} = n$ respectively.
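A PBH rank check of this kind can be sketched as follows. The code stacks $C$ on top of $sI - A$, which has the same rank as the transposed form used in the remark, and evaluates it only at the eigenvalues of $A$, where the rank can drop. The eigenvalues are supplied by the caller, and the example matrices and function names are assumptions chosen for illustration.

```python
from fractions import Fraction
from typing import List, Sequence


def rank(rows: Sequence[Sequence[float]]) -> int:
    """Matrix rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def pbh_observable(A: List[List[float]], C: List[List[float]],
                   eigenvalues: Sequence[float]) -> bool:
    """PBH test: [C; sI - A] must have rank n at every eigenvalue s of A."""
    n = len(A)
    for s in eigenvalues:
        s_i_minus_a = [[(s if i == j else 0) - A[i][j] for j in range(n)]
                       for i in range(n)]
        stacked = [list(row) for row in C] + s_i_minus_a
        if rank(stacked) < n:
            return False
    return True


# Hypothetical subsystem: position and velocity with a unit sampling step.
A_ii = [[1, 1], [0, 1]]  # upper triangular, so its only eigenvalue is 1
position_only = pbh_observable(A_ii, [[1, 0]], eigenvalues=[1])
velocity_only = pbh_observable(A_ii, [[0, 1]], eigenvalues=[1])
```

Measuring only the position keeps the pair observable, while measuring only the velocity does not, since the position can then never be reconstructed.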

Main Results
The presentation of the main results is divided into three sections: (i) the design of fault detection observer; (ii) the design of fault isolation observer; and (iii) the design of cooperative fault-tolerant controller.

The Design of Fault Detection Observer
In the design of the fault detection observer, the only design parameter is the observer gain $L_i$ in Equation (3). To this end, we define the state error as $e_i(k) = x_i(k) - \hat{x}_i(k)$, and the dynamics of the error system, Equation (6), are obtained from Equations (2) and (3). The asymptotic stability and $H_\infty$ performance of the error system given in Equation (6) are ensured by the following theorem.
Theorem 1. Suppose that the associated matrix condition holds, where the scalars $\varepsilon_m, \cdots, \varepsilon_p$ ($m, \cdots, p \in N_i$) are positive and do not exceed 1. Then the error system (6) is asymptotically stable and satisfies the $H_\infty$ performance $\|r_i(k)\|_2 < \gamma_i \|d_i(k)\|_2$, and the detection observer gain $L_i$ can be obtained from the solution.
Proof of Theorem 1. Consider a Lyapunov function candidate for the error system (6). The stability of the error system (6) is satisfied if and only if the condition $\Xi_i < 0$ holds. By applying the Schur complement twice, $\Xi_i < 0$ can be further expressed as an equivalent matrix inequality; based on this, $\varepsilon_m, \cdots, \varepsilon_p$ are further handled by using the Schur complement $p - 1$ times. Hence, Theorem 1 can be proven.
The root mean square (RMS) norm of the residual $r_i(k)$ is selected as the residual evaluation function. The threshold is the maximum influence of the disturbance on the residual evaluation function in the fault-free case. The fault detection decision logic then compares the residual evaluation function with the threshold, where $\varepsilon_i = 1$ denotes that the $i$th subsystem generates an alarm signal, which is broadcast by the cloud processing unit, and $\varepsilon_i = 0$ denotes that the $i$th subsystem is healthy.
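The evaluation and decision steps can be illustrated with a short sketch. It assumes scalar residuals over a finite window and approximates the threshold empirically as the largest RMS value seen over recorded fault-free runs; the function names and the sample data are hypothetical.

```python
import math
from typing import Dict, Sequence


def rms(residuals: Sequence[float]) -> float:
    """Root mean square of a window of scalar residual samples."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def detection_threshold(fault_free_runs: Sequence[Sequence[float]]) -> float:
    """Empirical threshold: largest RMS value over fault-free runs."""
    return max(rms(run) for run in fault_free_runs)


def detect(windows: Dict[int, Sequence[float]],
           thresholds: Dict[int, float]) -> Dict[int, int]:
    """Alarm flag per subsystem: 1 if the RMS exceeds the threshold, else 0."""
    return {i: int(rms(w) > thresholds[i]) for i, w in windows.items()}


# Hypothetical fault-free residual records used to set the threshold.
threshold = detection_threshold([[0.1, -0.1, 0.05], [0.08, 0.12, -0.1]])
alarms = detect(
    {1: [0.5, 0.6, 0.55], 2: [0.05, -0.02, 0.03]},
    {1: threshold, 2: threshold},
)
```

In this example subsystem 1 raises an alarm and subsystem 2 does not. For vector residuals, the squared term would become the inner product of the residual with itself.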

The Design of Fault Isolation Observer
According to the above fault detection results, the alarming subsystems are first put into the fault set, and fault isolation is then conducted only within the fault set. Meanwhile, the FIO given in Equation (4) takes advantage of the output signals from sensors as well as the input signals to generate the state estimation $\hat{x}_i^q(k)$ and the fault estimation $\hat{f}_i(k)$. Based on this, a novel framework of fault isolation is proposed to locate the fault accurately. For this purpose, the state estimation error and the fault estimation error are defined, and from Equations (2) and (4) the error dynamical system is described by Equation (8), where $\Delta f_i(k) = f_i(k+1) - f_i(k)$ denotes the variation in the fault.
According to the error system (8), the augmented system is given by Equation (9), where the augmented error vector contains $e_i^q(k)$ and $E_i = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \end{bmatrix}$. The asymptotic stability and $H_\infty$ performance of the error system given in Equation (9) are ensured by the following theorem.
Theorem 2. For a given scalar $\delta_i > 0$, if there exist symmetric matrices $P_{ii} = P_{ii}^T > 0$ and $Q_{ii} = Q_{ii}^T > 0$ such that the condition in Equation (10) holds, where $P_i = \operatorname{diag}(P_{ii}, Q_{ii}, I, I, \cdots, I)$, then the error dynamics (9) satisfy the $H_\infty$ performance index $\|e_{f_i}(k)\|_2 < \delta_i \|d_i(k)\|_2$. Further, the weighting matrix and isolation observer gain are given as $\Gamma_i = Q_{ii}^{-1} Y_i$ and $G_i = P_{ii}^{-1} X_i$ respectively.
Proof of Theorem 2. Consider a Lyapunov function candidate for the error system (9). The stability of the error system (9) is satisfied if and only if the condition in Equation (11) holds. Theorem 2 can be proven by substituting $P_i$, $A_i$ and $B_{d_i}$ into Equation (11).
Furthermore, the residual evaluation in the fault set is carried out again, with the RMS norm chosen as the residual evaluation function.
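The reason a compensated residual can fall below the threshold for the truly faulty subsystem can be seen in a small simulation. The sketch below assumes a scalar subsystem with a constant actuator fault and a simple adaptive update that integrates the residual; the update law, the values of `gain_g` and `gamma`, and the function names are illustrative assumptions rather than the design given by Theorem 2.

```python
from typing import List, Tuple


def fio_step(x_hat: float, f_hat: float, u: float, y: float,
             a: float, b: float, c: float,
             gain_g: float, gamma: float) -> Tuple[float, float, float]:
    """One step of an assumed adaptive fault estimation observer."""
    residual = y - c * x_hat
    # The fault estimate enters the observer like the fault enters the plant.
    x_hat_next = a * x_hat + b * (u + f_hat) + gain_g * residual
    f_hat_next = f_hat + gamma * residual
    return x_hat_next, f_hat_next, residual


def run_fio(steps: int, fault: float) -> Tuple[List[float], List[float]]:
    """Simulate a constant actuator fault and its adaptive estimate."""
    a, b, c, gain_g, gamma = 0.9, 1.0, 1.0, 0.5, 0.2  # hypothetical values
    x, x_hat, f_hat = 0.0, 0.0, 0.0
    residuals, estimates = [], []
    for _ in range(steps):
        y = c * x  # noise-free measurement for clarity
        x_hat, f_hat, r = fio_step(x_hat, f_hat, 0.0, y,
                                   a, b, c, gain_g, gamma)
        residuals.append(r)
        estimates.append(f_hat)
        x = a * x + b * (0.0 + fault)
    return residuals, estimates


fio_residuals, fault_estimates = run_fio(steps=200, fault=0.5)
```

With these values the estimation error dynamics are stable, so the fault estimate approaches the true fault and the compensated residual shrinks towards zero. This matches the isolation logic of the paper, where a small residual in the FIO marks the faulty subsystem.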

Remark 5.
Alarm signals from FDOs are broadcast by the cloud processing unit. To realize fault isolation, the alarming subsystems then employ FIOs in turn after the use of FDOs. Meanwhile, the healthy subsystems, which do not generate alarm signals, use FDOs at all times.

The Design of Cooperative Fault-Tolerant Controller
In the two previous subsections, fault detection and isolation have been realized by the design of the fault detection and isolation observers. However, the stability of the whole system cannot be guaranteed, because the fault is not accommodated during fault diagnosis. For this reason, a cooperative fault-tolerant control scheme for the distributed interconnected system is presented in this subsection.

Theorem 3.
To ensure that a distributed interconnected system with an actuator failure can maintain stability, the optimal cooperative fault-tolerant control law is designed by combining a local control gain with a cooperative compensation vector.
Proof of Theorem 3. The global optimization problem for the control of the distributed interconnected system with a failure uses a cost that is a quadratic function of the state and input vectors, where $Q_i$ is a positive semidefinite matrix. When one subsystem fails, the effect of cooperative compensation can be achieved through the Hamiltonian for each subsystem, Equation (15), where $\lambda_i(k+1)$ is the adjoint vector. According to Equation (15), the necessary optimality conditions, obtained from optimal control theory [28], are given by Equations (16) and (17). Further, the cooperative compensation is calculated through a feedback control, described by Equation (18). Substituting Equations (18) and (2), without the fault $f_i(k)$ and the noise $v_i(k)$, into Equation (16) yields Equation (19). By combining Equations (18) and (2), again without the fault $f_i(k)$ and the noise $v_i(k)$, Equation (17) can be rewritten as Equation (20). Moreover, substituting Equation (20) into Equation (19) leads to Equation (21). Theorem 3 can be proven by comparing Equations (18) and (21).
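For the local part of the controller, a standard discrete-time LQR gain can be obtained by iterating the Riccati recursion until it converges. The sketch below does this for a scalar subsystem and adds a caller-supplied compensation term. The scalar model, the sign convention $u = -K\hat{x} + S$, and the function names are assumptions for illustration; the compensation vector $S_i(k)$ of Theorem 3 is not reproduced here.

```python
def scalar_dare(a: float, b: float, q: float, r: float,
                iterations: int = 500) -> float:
    """Fixed-point iteration of the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def lqr_gain(a: float, b: float, q: float, r: float) -> float:
    """Local optimal state-feedback gain for x(k+1) = a x(k) + b u(k)."""
    p = scalar_dare(a, b, q, r)
    return a * b * p / (r + b * b * p)


def control(x_hat: float, k_gain: float, s_comp: float = 0.0) -> float:
    """Assumed control law: state feedback plus an additive compensation."""
    return -k_gain * x_hat + s_comp


# Hypothetical subsystem with unit weights.
p_star = scalar_dare(a=1.0, b=1.0, q=1.0, r=1.0)
k_local = lqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)
u_now = control(x_hat=2.0, k_gain=k_local)
```

For this example the Riccati solution converges to the golden ratio and the closed-loop pole `1.0 - k_local` lies inside the unit circle, so the healthy subsystem is stabilized by the local gain alone.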

Simulation Example
In this section, the proposed fault diagnosis and cooperative fault-tolerant control scheme is applied to the simplified model of intelligent unmanned vehicle platooning [29], which is shown in Figure 4. A desired separation distance $\Delta_o$ between adjacent vehicles and a desired average velocity $V_o$ are assigned under normal operating conditions. Furthermore, the variable $\Delta D_i(k)$ ($i = 2, 3, 4$) represents the deviation from the desired separation distance, while the variable $\Delta V_i(k)$ ($i = 1, 2, 3, 4$) represents the deviation from the desired velocity. $\Delta d_i(k)$ is the real separation distance between the $i$th and $(i-1)$th vehicles at time $k$, and $V_i(k)$ is the real velocity of the $i$th vehicle at time $k$. The state vector and output vector in Equation (1) are built from these deviation variables.
Figure 4. Intelligent unmanned vehicle platooning made up of four vehicles.
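As a small worked example of the deviation variables, the sketch below computes them for a four-vehicle platoon. It assumes that each deviation is the real value minus the desired value; the sign convention, the numbers, and the function name are hypothetical.

```python
from typing import List, Tuple


def deviations(gaps: List[float], velocities: List[float],
               gap_ref: float, v_ref: float) -> Tuple[List[float], List[float]]:
    """Spacing and velocity deviations from the desired operating point.

    gaps holds the real separation distances for vehicles 2, 3, 4 and
    velocities holds the real velocities for vehicles 1, 2, 3, 4.
    """
    delta_d = [g - gap_ref for g in gaps]
    delta_v = [v - v_ref for v in velocities]
    return delta_d, delta_v


delta_d, delta_v = deviations(
    gaps=[10.5, 9.0, 10.0],
    velocities=[20.0, 21.0, 19.5, 20.0],
    gap_ref=10.0,
    v_ref=20.0,
)
```

Under this convention a positive spacing deviation means that a vehicle has fallen behind its desired gap, and all deviations are zero when the platoon runs at the desired spacing and velocity.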
The motion of each vehicle is first characterized by differential equations with the help of Newton's second law, and the state-space representation of the four-vehicle platoon is then obtained by expanding the nonlinear term in a Taylor series. According to Theorem 1, the four parameters $\gamma_1$, $\gamma_2$, $\gamma_3$, and $\gamma_4$ are computed as 1.1339, 1.1339, 1.1339, and 1.0003 respectively, together with the detection observer gains. Then, by solving the condition in Theorem 2, we obtain the $H_\infty$ performance levels $\delta_4 = 1.8338$ and $\delta_3 = 1.8633$, together with the isolation observer parameters. Further, by solving Riccati Equation (9), the local optimal gains are obtained. In the simulation, the process and measurement noise are assumed as $v(k) = w(k) = \begin{bmatrix} 0.1 & 0.1 & 0.1 & 0.1 \end{bmatrix}^T \cdot U_{-1}^{1}$. Meanwhile, a fault is assumed to have occurred in the 1st vehicle.
The simulation results of fault detection are depicted in Figure 5a-d. It can be observed that the 1st and 2nd vehicles generate alarm signals, while the 3rd and 4th vehicles do not. Thus, the fault detection logic table can be listed as Table 1.
Table 1 indicates that the faulty vehicle is among the alarming vehicles, and that the 3rd and 4th vehicles are healthy because they generate no alarm signals. Hence, the fault set is defined as {the 1st vehicle, the 2nd vehicle}. The next step is to determine whether the faulty vehicle is the 1st vehicle or the 2nd vehicle. For this purpose, fault isolation for the 1st vehicle and the 2nd vehicle is carried out in turn.
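The step from alarm flags to the fault set is simple enough to state in code. The sketch below uses the alarm pattern reported for the four vehicles; the function name `fault_set` and the dictionary layout are assumptions.

```python
from typing import Dict, List


def fault_set(alarms: Dict[int, int]) -> List[int]:
    """Vehicles whose detection flag is 1, in ascending order."""
    return sorted(i for i, flag in alarms.items() if flag == 1)


# Detection flags as reported: vehicles 1 and 2 alarm, vehicles 3 and 4 do not.
detection_flags = {1: 1, 2: 1, 3: 0, 4: 0}
candidates = fault_set(detection_flags)
```

Only the vehicles in `candidates` need to run a fault isolation observer, which keeps the isolation effort proportional to the number of alarms rather than to the size of the platoon.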
The simulation results of fault isolation for the 1st vehicle and the 2nd vehicle are shown in Figures 6 and 7 respectively. It can be seen from Figure 6a,b that the residual assessment values of the 1st and 2nd vehicles are both less than the threshold. However, the residual assessment values of the 1st and 2nd vehicles are both over the threshold in Figure 7a,b after the occurrence of the fault. Based on this, the fault isolation logic table can be listed as Table 2.

Types | Numbers | Decision Results
Fault isolation for the 1st vehicle | The 1st vehicle (FIO) | 0

Combining the simulation results and the fault isolation logic, it can be found that the faulty vehicle is the 1st vehicle.
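The isolation decision can be sketched as a check over the candidates in the fault set. The example encodes the outcomes reported for Figures 6 and 7 and assumes the rule that a candidate is declared faulty when every residual in the fault set stays below its threshold while that candidate runs its FIO. The rule as coded and the name `isolate` are an interpretation for illustration.

```python
from typing import Dict, List


def isolate(below_threshold: Dict[int, Dict[int, bool]]) -> List[int]:
    """Candidates whose isolation round leaves all residuals below threshold.

    below_threshold[q][i] is True when, with the FIO running on candidate q,
    the residual evaluation of vehicle i is below its threshold.
    """
    return [q for q, flags in below_threshold.items() if all(flags.values())]


# Reported outcomes: both residuals below the threshold in the round for
# vehicle 1, both above the threshold in the round for vehicle 2.
observed = {
    1: {1: True, 2: True},
    2: {1: False, 2: False},
}
faulty = isolate(observed)
```

This reproduces the conclusion that the 1st vehicle is the faulty one, and it makes the inversion with respect to the detection logic explicit: here a residual below the threshold is the positive indication.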
Meanwhile, it can also be found from Figure 8 that the fault estimation value of the 1st vehicle follows the fault value rapidly and accurately. To guarantee the stability of the whole intelligent unmanned vehicle platoon, a cooperative controller with fault-tolerant ability is applied to the 1st vehicle and LQR controllers are used for the other three vehicles; the fault-tolerant results are shown in Figure 9. The malfunction of the 1st vehicle brings about fault propagation within the platoon, so that the displacement curves of the other three vehicles are no longer parallel with each other for a period of time. However, the displacement curves of the four vehicles become parallel again under the action of cooperative fault-tolerant control, which demonstrates the effectiveness of the fault-tolerant control scheme proposed in this paper.

Conclusions
In this paper, a distributed fault diagnosis and cooperative fault-tolerant control framework was developed. To be specific, a fault detection observer was first designed for each subsystem, and the generated alarm signals were broadcast by the cloud processing unit. After that, fault isolation observers and isolation decision logic were used for alarming subsystems in turn to locate the fault accurately. Furthermore, the control unit with the effect of cooperative compensation was constructed to avoid the system instability caused by the faulty subsystem.
It is notable that the scheme in this paper can be employed only when a single fault occurs in the system, so it remains challenging to handle actuator faults and sensor faults simultaneously in a distributed interconnected system. Meanwhile, it would be meaningful to combine the proposed method with the techniques in [8,18,19,20], which focus on optimizing the information shared among subsystems, so as to reduce communication costs. These may represent directions for our future work.