Observability Decomposition-Based Decentralized Kalman Filter and Its Application to Resilient State Estimation under Sensor Attacks

This paper considers a discrete-time linear time invariant system in the presence of Gaussian disturbances/noises and sparse sensor attacks. First, we propose an optimal decentralized multi-sensor information fusion Kalman filter based on the observability decomposition when there is no sensor attack. The proposed decentralized Kalman filter deploys a bank of local observers who utilize their own single sensor information and generate the state estimate for the observable subspace. In the absence of an attack, the state estimate achieves the minimum variance, and the computational process does not suffer from the divergent error covariance matrix. Second, the decentralized Kalman filter method is applied in the presence of sparse sensor attacks as well as Gaussian disturbances/noises. Based on the redundant observability, an attack detection scheme by the χ2 test and a resilient state estimation algorithm by the maximum likelihood decision rule among multiple hypotheses, are presented. The secure state estimation algorithm finally produces a state estimate that is most likely to have minimum variance with an unbiased mean. Simulation results on a motor controlled multiple torsion system are provided to validate the effectiveness of the proposed algorithm.


Introduction
As control systems operate through network communication and become more complex due to increased connectivity, security against adversarial attacks is becoming more important and receiving attention [1][2][3][4]. In fact, attacks on control systems took place in reality [5][6][7][8], and many studies have been conducted on the security issues of systems whose measurements have been compromised by adversaries because sensors are one of the vulnerable points to malicious attackers in dynamical systems [9][10][11][12][13][14][15].
Among them, the state estimation problem when some of sensors are corrupted by attackers, often called a sparse sensor attack, has been investigated, and several solutions have been recently proposed [10][11][12][13][14][15]. The reference [10] introduces the basic concepts of the secure state estimation problem and formulates it as a non-convex combinatorial optimization problem. The problem is shown to be transformed into a convex optimization problem by using the results developed in the field of compressed sensing [16,17] under additional limiting assumptions. The relationship between this resilient state estimation problem and the notion of strong observability was revealed in [11]. A necessary and sufficient condition for the solvability of this problem is derived in [12,15] with the notion of redundant observability, more specifically, it requires the redundancy of observability twice as much as the sparsity of sensor attacks. A method to alleviate the computational complexity of the logic for finding a combination of non-attacked sensors, is proposed in [13,14]. In [15], the estimator is designed by a set of local observers with only a single sensor, and the decoder uses an error correction algorithm to generate a final state estimate based on the data collected from each local observer.
(1) The proposed algorithm successfully estimates the state variable under sparse sensor attacks as well as Gaussian disturbances/noises. Our algorithm ensures the minimum variance, while [19] simply guarantees that its covariance is no worse than the worst case scenario with high probability; (2) We only assume that the system is redundant observable, which is known as an equivalent condition for the secure state estimation to be solvable under sparse sensor attacks. Note that [20] requires additional assumptions to reformulate the problem as a convex problem, and further, the combination of Kalman filter and the secure estimator implicitly supposes that the estimation error for the attack signal follows a zero-mean Gaussian distribution, which may not be true when the attack signal is intelligently designed in a coordinated way. The reference [21] needs the system matrix to be nonsingular, and both references [20] and [21] have additional assumptions about the closed-loop system; (3) The construction of the local observer is completely decentralized, and the overall size of the observer is relatively small. As the combinatorial logic is embedded in the fusion center, we do not have to prepare all possible combinations of observers. Note that [19] does not utilize any decomposition, and thus, it asks for all combinations of observers. The local decomposition presented in [21] is not fully decentralized because the decomposition is performed using the global information of the output matrix and the Kalman gain; (4) As a by-product obtained during the derivation process, the optimal decentralized information fusion Kalman filter scheme is developed based on the observability decomposition. Compared with the results in [24,25], the proposed scheme does not suffer from the numerical computational errors resulting from the diverging error covariance matrix. The algorithm in this paper guarantees that each error covariance matrix in the local observer converges by the observability decomposition, and this method can also be widely used for the multi-sensor information fusion Kalman filters that do not consider any attacks.
The rest of the paper is organized as follows. The remaining of this section introduces the notation used throughout the paper. The system model and problem formulation are given in Section 2. Section 3 presents the optimal multi-sensor information fusion Kalman filter based on the observability decomposition. We then give the attack detection algorithm by χ 2 test and the attack-resilient state estimation scheme by the multiple hypothesis test in Section 4. Finally, simulation results with a servo motor system are given in Section 5, and we provide our concluding remarks in Section 6. The preliminary results of this paper were studied in [28].
Notation: Throughout this paper, the following notations are adopted. For a set S, the number of elements in the set S is denoted by |S|. For a column vector y ∈ R p and its i-th element y i , supp(y) denotes the number of nonzero elements of the vector y, that is, supp(y) := {i ∈ [p] : y i = 0} where the symbol [p] is used to represent the subset of natural numbers {1, 2, · · · , p} ⊂ N. The number of nonzero elements of a vector y is defined by the 0 norm, and it is written as y 0 := |supp(y)|. We say that the vector y is q-sparse if its 0 norm is less than or equal to q, that is, y 0 ≤ q.
For an index set I ⊂ [p] and a vector y ∈ R p (or a matrix C ∈ R p×n ), y I ∈ R |I| (or C I ∈ R |I|×n ) denotes the vector (or the matrix) obtained from y (or C) by eliminating all i-th rows such that i ∈ I c . Similarly, for two index sets I, J ⊂ [p] and a matrix P ∈ R p×p , P I,J ∈ R |I|×|J | denotes the matrix obtained from P by eliminating all i-th rows and all j-th columns such that i ∈ I c and j ∈ J c . Let a finite sequence represents the j-th partition among total p partitions when a vector z ∈ R µ is partitioned by the sequence {µ i }. This notation is extended to a subset J ⊂

System Modeling and Problem Formulation
The plant and the attack model under consideration are presented, and the problem formulation is given in this section.

Plant Modeling with Gaussian Disturbances and Noises
A discrete-time linear time invariant (LTI) system under Gaussian disturbances and noises given by is considered. In the plant dynamics of (1), x ∈ R n is the state variable vector, u ∈ R m is the control input vector, and y ∈ R p is the sensor output vector. Furthermore, the dynamics is disrupted by the process disturbance d ∈ R n , and the sensors are corrupted by the measurement noise n ∈ R p . There are a total of p sensors that measure the system outputs, and the i-th sensor's measurement at time k is denoted by where c i is the i-th row of the output matrix C, which implies that C = [c 1 c 2 · · · c p ] .
Here, stochastic assumptions on the disturbance d(k), the noise n(k) and the initial state x(0) of the system (1) are formally stated as follows.
Assumption 1. The disturbance d(k) and measurement noise n(k) are independent and identically distributed (i.i.d.) white Gaussian process with zero-mean and covariance matrices Q and R, respectively. More specifically, where the symbol E[·] represents the expected value of a random variable and δ kt is the Kronecker delta function. Furthermore, the initial state x(0) is a Gaussian distributed random variable with the meanx 0 and covariance matrix P 0 , and is independent of d(k) and n(k).

Attack Modeling with Sparse Sensor Attacks
Among various attack scenarios [3], we consider false data injection attacks on sensors. Adversarial attackers can inject arbitrary inputs to some (not all) sensors so that a part of the measurements is compromised. Some additive inputs may be induced by cyber or physical tampering with the sensors, or adversaries may penetrate into the communication network on the output side of the plant because those communication links are not secure. In both cases, the attack is characterized by the attack vector a ∈ R p as in y a (k) = y(k) + a(k) = Cx(k) + n(k) + a(k) = Cx(k) + n a (k) (2) where y a ∈ R p denotes sensor readings with a potential attack, while y ∈ R p is the original healthy sensor data affected by the measurement noise only. Similarly, n a ∈ R p represents the total sensor contamination signal including both the noise n and the attack a.
Here, it is assumed that the adversaries can compromise only a part of the sensors, not all of them. Assuming that the attacker's resources are limited, we suppose that the attacker can contaminate up to q out of p measurement outputs. Therefore, a formal condition on the sparsity of the attack vector a can be given as follows.
This assumption tells more than a(k) 0 ≤ q for all k ≥ 0, in the sense that the compromised sensor channels are not altered for all time. In practice, this may be the case because it takes quite a long time and much effort to infiltrate into a new sensor from a malicious attacker's point of view. Thus, without loss of generality, it can be assumed that the attack channels remain the same in the long term although it is not revealed to the controller which channels are attacked. However, if the attacked sensor channel changes but does not change frequently, the resilient state estimation scheme to be presented is still applicable. We will simply refer to this assumption as a "q-sparse sensor attack".

Problem Formulation
For the given discrete-time LTI system (1) under Assumptions 1 and 2, this paper investigates how to design an estimator that can recover the state variable x correctly. First, the Gaussian distributed disturbances/noises are handled appropriately, and the optimality in the sense of minimum variance should be recovered. Second, the security against the sparse sensor attack is enhanced, and the attack-resilient estimation with the unbiased state estimate should be achieved. More specifically, this paper considers the problem of proposing a secure and robust state estimation algorithm that generates the estimate that is most likely to have the minimum variance and to be unbiased. In this process, the concept of "redundant observability", which characterizes the ability of coping with the sparse sensor attack, is utilized to ensure successful state estimation.
The basic condition for the observability of the system (1) with the attack model (2) satisfying Assumption 2, is given in the following assumption. Note that the assumption of "2q redundant observability" is an equivalent condition for the system to be observable under q-sparse sensor attacks ( [15], Proposition 2,3,6). Here, the state estimation problem becomes challenging because this redundant observability does not guarantee for the entire states to be recovered with only a single sensor. Assumption 3. The system (1), or the pair (A, C), is 2q redundant observable. In other words, each pair (A, C I ) is observable for any I ⊂ [p] satisfying |I| ≥ p − 2q.

Kalman Observability Decomposition with Single Sensor
Since conventional Luenburger observers or Kalman filters typically have the form of the whole state estimatesx are affected by the single sensor attack signal due to the observer gain K. In other words, any single non-zero component of a can alter all components of the state estimatex. Hence, we design a collection of observers where each local observer utilizes only a single sensor information so that an attack signal for one sensor channel only interferes with the corresponding local observer and leaves other local observers unaffected.
Consider a single-output system P i : where the i-th component of y a (k) in (2), y a i (k), is the output and the dynamics is given by (1). Since the pair (A, c i ) is not necessarily observable, an estimator of the system (3) generally recovers only an (observable) portion of the full state x. The Kalman observability decomposition, which clearly describes the observable portion of the system, is now briefly introduced. For the single-output system (3), the observability matrix is written as and we denote µ i as the rank of the observability matrix G i . The null space of G i , N (G i ), is the so-called unobservable subspace, and the column range space of G i , R(G i ), is often called the observable subspace. One can define the similarity transformation as where Z i ∈ R n×µ i is the matrix whose columns are th orthonormal basis of R(G i ) and W i ∈ R n×(n−µ i ) is the matrix whose columns are the orthonormal basis of N (G i ). Here, the size of those matrices is determined by Note that the observable subspace R(G i ) is the span of column vectors in Z i and the unobservable subspace N (G i ) is the span of column vectors in W i . Since the matrix [Z i W i ] is orthogonal, we have Moreover, because the unobservable subspace is A-invariant, any columns of AW i belong to N (G i ) = R(W i ). Therefore, the Kalman observability decomposition of the system (3) is obtained by the transformation (5) as Finally, the state x ∈ R n is decomposed into the observable sub-state z i ∈ R µ i and the unobservable sub-state w i ∈ R n−µ i . Further, the observable part of (6) can simply be written as P o i : where S i := Z i AZ i and t i := c i Z i .

Decentralized Multi-Sensor Kalman Filter
Even though the Kalman filter can be applied to unobservable linear systems, the error covariance matrix may not converge in that case. According to ([29], Theorem 26), the detectability of the system is a sufficient condition for the convergence of the error covariance matrix in Kalman filtering. Since detectability is a slightly weaker concept than observability, the results in this paper dealing with observability can be generalized to the concept of detectability with slight modifications. The design of local state estimators for the observable subsystem (7) in the form of Kalman filters using only single sensor information, is derived in this subsection. By its construction, the pair (Z i AZ i , c i Z i ), or simply denoted as (S i , t i ), is observable, and thus, the error covariance matrix of the Kalman filter designed for the system (7) converges to a positive semidefinite matrix ( [29], Theorem 26). Now, we design a decentralized Kalman filter with each single sensor output, which constitutes the local observer. Then, the design of an information fusion scheme, which collects all the information on state estimates and error covariance matrices from the decentralized Kalman filters, will be discussed in the next subsection. For the simplicity of the derivation, we assume that there are no attacks at this time, that is, a(k) ≡ 0. Thus, n a (k) and y a (k) are interpreted as n(k) and y(k), respectively, in this section.
Stochastic assumptions on the disturbance d(k) and the noise n(k) of the system (1) are formally stated in Assumption 1 where the covariance matrix R of the measurement noise n(k) is partitioned as Finally, the assumption for each measurement noise n i (k) (which is the same as n a i (k) in this section) of the system (3) can be written as follows: The local observer is designed by a Kalman filter for the observable subsystem (7). To this end, letẑ i (k|k − 1) be the estimate of z i (k) based on observations from y a (0) to y a (k − 1). Similarly,ẑ i (k|k) is the estimate of z i (k) after we process the measurement y a (k) at time k. Following the conventional notations in a Kalman filter, we use the terms P i (k|k − 1) and P i (k|k) to denote the estimation error covariance ofẑ i (k|k − 1) andẑ i (k|k), respectively. Thus, We have Then, the Kalman filter has the following form of whereẑ The above Equations (10) describe the recursive form of how the state estimateẑ i , the Kalman gain K i , and the error covariance matrix P i evolve. The error covariance P i of the i-th local observer defined in (8), is governed by Equations (10d) and (10e), which ensure that the covariance matrix P i (k|k) can be calculated by the following recursive form: with the initial value of Similarly, the error cross covariance P ij of the i-th and j-th local observers can be defined by and the recursive formula for P ij is derived here. To this end, define the estimation error and we have that By substituting (14a) into (14b), the dynamics of the errorz i (k|k) is obtained as The errorsz i (k|k) andz j (k|k) for i = j may be correlated; thus, by using (15), the error cross covariance betweenz i (k|k) andz j (k|k) can be computed recursively. From the recursive form of (15), note thatz i (k|k) is a linear combination of elements in Therefore, by Assumption 1, we have (i) n a i (k + 1) and d(k) are orthogonal, (ii)z i (k|k) and d(k) are orthogonal, and (iii)z i (k|k) and n a j (k + 1) are orthogonal. Using these facts, one can derive the recursive form of the error cross covariance betweenz i (k|k) andz j (k|k) as follows: with the initial value of

Optimal Information Fusion Based on Observability Decomposition
Based on the equivalence Z i x = z i in (5) and the definitionz i =ẑ i − z i in (13), we havê Stacking Equations (18) for all i ∈ [p] leads to the following equation of Finally, (19) is written in a compact form aŝ where the matrix is composed of the similarity transformation matrices Z i 's and v a (k) is used for a simple notation ofz(k|k). In Equation (20), denotes the size of the stacked vector.
It should be noted that all the information in (20) except the actual state x(k), are known or accessible to us. In Section 3.1, the matrix Φ is generated from the orthonormal basis of the observable subspace R(G i ) where G i is the observability matrix given by (4). In Section 3.2, each local observer O i in (9) provides the state estimateẑ i for the observable sub-state z i . Now, the stochastic properties of the last term are analyzed. First, its mean is zero becausez i (k|k) is a linear combination of elements in (16) by the Formula (15), and Assumption 1 ensures that every component in (16) has a zero mean. Second, the covariance matrix of v a (k) can be obtained since the error covariance matrix P i is computed by each local observer L i in (11), and the error cross covariance matrix P ij is generated by the second layer of the multi-sensor Kalman filter L ij in (17) with collected information from local observers (see Figure 1 for the structure of the proposed Kalman filter). In summary, we have where which can be recursively computed by (11) and (17). Finally, Equation (20) depicts a linear model with the measured data vectorẑ, the known matrix Φ, the noise vector v a with a zero-mean Gaussian distribution, and the unknown vector x to be estimated. Based on the statistical estimation and detection theory [30,31], an elaborate derivation process to recover the optimal estimate of x in (20), is now presented. The minimum variance unbiased estimator (MVUE) for the data model (20) with v a satisfying v a ∼ N(0 µ×1 , P) is introduced as follows. (24) and the corresponding covariance matrix ofx MVUE is which achieves the minimum covariance in the sense that Px MVUE ≤ Px for any type of estimatorx.
Proof. The results directly follows from the Gauss-Markov Theorem ( [30], Theorem 6.1). However, we provide a direct proof for the readers convenience, and it follows the procedure in the proof of ([24], Theorem 1) or ( [25], Theorem 1). We introduce a linear unbiased estimatorx = Ωẑ and, from the unbiased assumption, it follows that Thus, we have ΩΦ = I n×n .
Let the covariance matrix of the estimation errorx :=x − x be P x . Then, the estimation errorx is obtained that and the covariance matrix P x can be computed as In order to find the minimum variance estimator, set the trace of the covariance matrix P x as the performance index J := tr(P x ) = tr ΩPΩ .
The Lagrangian [32] associated with J becomes where Λ ∈ R n×n is a matrix representing the Lagrange multipliers. By solving Combining (26) and (27) results in the following equation of Therefore, the matrix inversion lemma ( [33], Section 2.3) yields the solution as (24), is obtained , and the corresponding covariance matrix in (25) is computed by Theorem 1 explains how the optimal estimate is computed. The information fusion center D calculates the MVUE by (24) and its covariance by (25). In summary, the whole structure of the decentralized multi-sensor information fusion Kalman filter is shown in Figure 1. The first layer is composed of the local observer O i , which generates the estimatê z i and the Kalman gains K i as given in (9) and (10). A part of the local observer O i , denoted as L i , provides the error covariance matrix P i . The second layer L ij collects the Kalman gain K i 's from the first layer and gives the error cross covariance matrix P ij by (17). Finally, the third layer operates as an optimal information fusion center D as described in Theorem 1 and computes the optimal estimate with the minimum covariance.

Remark 1.
Note that Gauss-Markov Theorem ( [30], Theorem 6.1) gives the best linear unbiased estimator (BLUE) for the measurementẑ = Φx + v a where v a is a random variable, whose probability density function (PDF) is not restricted to a Gaussian distribution, with a zero mean and covariance P. Since the BLUE is also the MVUE for Gaussian data, the results of Theorem 1 also follow directly from the Gauss-Markov Theorem. The state estimatex MVUE given in Theorem 1 is the optimal estimate since it achieves the minimum variance with an unbiased mean. A special case of Theorem 1 is considered in ([24], Theorem 1) and ( [25], Theorem 1) for an information fusion scheme; however, the scheme in [24,25] may not be successful for a system whose local systems with a single sensor are not observable because the covariance matrix P could diverge in that case, whereas the covariance matrix P does not diverge in our scheme due to the Kalman observability decomposition.

Effect of Sparse Sensor Attack on Information Fusion Kalman FIlter
In the previous section, we assumed that all sensors were attack-free, that is, a(k) ≡ 0. Hence, n a i (k) and y a i (k) in (3) and (7) were regarded as non-attacked noise n i (k) and output y i (k), respectively. The effects of a sparse sensor attack satisfying Assumption 2 on the information fusion Kalman filter developed in Section 3 are investigated in this subsection.
By linearity, the Kalman filter in (10) can be divided into two parts withẑ i =: g i + e i as in Note that g i (k + 1|k + 1) and e i (k + 1|k + 1) have the same dynamics with (10a), while the incoming signal y a i (k + 1) is divided into two parts with y i (k + 1) and a i (k + 1) assigned to the dynamics of g i (k + 1|k + 1) and e i (k + 1|k + 1), respectively. Similarly, g i (k + 1|k) and e i (k + 1|k) have the same dynamics with (10b), whereas the incoming signal u(k) is solely assigned to the dynamics of g i (k + 1|k). By setting the initial conditions as g i (0|0) =ẑ i (0|0) = Z ix 0 and e i (0|0) = 0 µ i ×1 , it easily follows from (10a) and (10b) that Finally, the local observer O i in (9) Now, define the attack-free estimation error and we have that which is the same as (14) and (15) with n a i replaced by n i . By (29) and (31), the total state-estimation error defined in (13) satisfies and, from (30b) and (32c), its dynamic equation is given as follows: which is a rewrite of (15) using the fact n a i = n i + a i . For notational simplicity,ẑ i (k|k), v i (k|k), and e i (k|k) are denoted byẑ i (k), v i (k), and e i (k), respectively. Then, Equation (19) becomes which can be written in a compact form aŝ The above Equation (36) is nothing but (20) with v a replaced by v + e. The properties of v are exactly identical with those of v a in (22) because the derivation in (22) is under the assumption of a ≡ 0 meaning e ≡ 0 in this case. Thus, we have where P(k) simply denotes P(k|k) in (23). The attack-induced signal e(k) = [e 1 (k), · · · , e p (k)] evolves according to Equation (30b) (or equivalently (28b) and (28d)) with an initial value of e i (0) = e i (0|0) = 0 µ i ×1 . Therefore, we have e i ≡ 0 µ i ×1 for the healthy sensor with a i ≡ 0, while e i ≡ 0 µ i ×1 generally holds for the attacked sensor with a i ≡ 0. Finally, the stacked error vector e ∈ R µ partitioned by the sequence {µ i }, is ({µ i }-stacked) q-sparse by Assumption 2.

Detection of Sparse Sensor Attack
In the previous subsection, the measurement data have the formẑ = Φx + v + e ∈ R µ with unknown signals x, v, and e where the noise-induced signal v can be considered as a random variable whose distribution is N(0 µ×1 , P) and the attack-induced signal e is ({µ i }stacked) q-sparse. To investigate the properties of the matrix Φ in the measurement data, we borrow the definition of ({µ i }-stacked) q-error detectability and its characterization from [15]. There is a slight modification in the following Definition 1 and Lemma 1 from [15].
They do not append any additional zeros, whereas [15] adds additional zeros to match the size of all partitioned vectors and matrices.
Accordingly, the matrix Φ ∈ R µ×n is not ({µ i }-stacked) q-error detectable if and only if there exist x, x ∈ R n satisfying x = x , and ({µ i }-stacked) q-sparse e ∈ R µ such that Φx + e = Φx . In other words, the matrix Φ ∈ R µ×n is ({µ i }-stacked) q-error undetectable if and only if there exist a non-zero x e ∈ R n and ({µ i }-stacked) q-sparse e ∈ R µ such that Φx e = e. Typically, in terms of vectors, the vector e ∈ R µ is said to be undetectable with respect to Φ ∈ R µ×n if e = Φx e ∈ R µ for some x e ∈ R n .
With the estimatex of x obtained by MVUE of (24) in Theorem 1, we can calculate the estimated output Φx and generate a residual signal r, which is a difference between the real measurement and the estimated output, that is, r :=ẑ − Φx. Then, the residual r becomes another random variable whose distribution is also Gaussian. Finally, the mean and covariance of the Gaussian distributed random variable r is computed in the following theorem.
Second, because it easily follows that the covariance matrix is calculated as
Theorem 2 clarifies the mean and covariance of the Gaussian random variable r, and further, characterization of undetectable attacks with statistical analysis is also given. Now, one can derive a detection criterion of ({µ i }-stacked) q-sparse errors based on the property of the residual signal r, assuming that Φ ∈ R µ×n is ({µ i }-stacked) q-error detectable and that e ∈ R µ is actually ({µ i }-stacked) q-sparse. This detection strategy is summarized in the following theorem.

Theorem 3. For a finite sequence {µ
which means thatx is an unbiased estimate of x.
Proof. From Theorem 2, the ({µ i }-stacked) q-sparse e satisfies e = Φx e ∈ R µ for some x e ∈ R n if and only if E[r] = 0 µ×1 . However, any non-zero e = Φx e ∈ R µ for some x e ∈ R n is not ({µ i }-stacked) q-sparse by Lemma 1. (iii) since Φ ∈ R µ×n is ({µ i }-stacked) q-error detectable. Therefore, the ({µ i }-stacked) q-sparse e = Φx e ∈ R µ should be zero, and the result directly follows. Furthermore, the property of an unbiased estimate (with minimum variance) is easily obtained from Theorem 1.
From the observation of Theorems 2 and 3, the problem of detecting a non-zero ({µ i }stacked) q-sparse error signal e with a ({µ i }-stacked) q-error detectable coding matrix Φ ∈ R µ×n can be rephrased as: Given the residual signal r, which comes from the Gaussian Therefore, the statistical decision theory [31] is helpful in this situation. More precisely, the χ 2 test for fault detection [22,23], which is widely used to detect unwanted error signals, such as faults or attacks, can be applied.
One can simply apply the χ 2 test to detect the presence of error signals in the ({µ i }-stacked) measurementẑ given by (36), and its operating scheme is summarized in Algorithm 1. Initially, the attack detection alarm indicator f is set to 0, and then the residual r is computed according to Equation (38). Without any error signal (that is, e = 0 µ×1 ), the residual r follows a Gaussian distribution N(0, P − Φ(Φ P −1 Φ) −1 Φ ), which is shown in (39). Now, define the standardized residual ζ := P − Φ(Φ P −1 Φ) −1 Φ ) − 1 2 r whose distribution becomes N(0 µ×1 , I µ×µ ). Thus, the 2-norm of ζ denoted by g := ζ ζ is an observation from a random variable g, which satisfies a χ 2 distribution with µ degrees of freedom (DOF), g ∼ χ 2 µ . This means that g cannot be far away from zero. Finally, when g is greater than a threshold ∆ TH , the attack detection alarm is triggered by setting f = 1. Here, ∆ TH is the predetermined threshold value, and it decides the probability of false alarm and the probability of error detection. For example, when the threshold ∆ TH is chosen such that where p g (x) denotes the PDF of the χ 2 µ distribution, the probability of false alarm becomes δ. As the probability of false alarm δ becomes smaller, the probability of error detection also decreases, which implies that there is a trade-off between the small false alarm and the high error detection ratio. Thus, one needs to choose ∆ TH as a good compromise between these two conflicting requirements.

Algorithm 1 Detection scheme based on the χ 2 test
Input:ẑ Output: f Initialization:

Secure State Estimation under a Sparse Sensor Attack
In this subsection, an attack-resilient and secure state estimation scheme, which reconstructs the optimal estimate for the state x under Assumptions 1-3, is developed. First, characterization of the matrix Φ defined in (21) under Assumption 3 is given as follows.
where G i is the observability matrix given in (4), the followings are equivalent: has full column rank.
(iv) The pair (A, C) is observable under q-sparse sensor attacks.
Note that the redundancy for observability is 2q, which is twice the sparsity of the attack signal. This is the key point of constructing the state estimation algorithm. We can examine each subset J k ⊂ [p] of sensors whose size is p − q. In other words, we have ( p q ) number of subsets J 1 , J 2 , · · · , J ( p q ) where J k ⊂ [p] and |J k | = p − q for k = 1, 2, · · · , ( p q ). Since Φ is ({µ i }-stacked) 2q-error detectable by Assumption 3 and Lemma 2.(ii), it easily follows that Φ is q-error detectable for J k with |J k | = p − q. This means that, even after removing any q sensors, the remaining outputs still have q redundancy for observability. Therefore, the detection scheme of Theorem 3, which relies on the ({µ i }-stacked) q-error detectability of the coding matrix, can be applied for each subset J k ⊂ [p] satisfying |J k | = p − q.
The configuration of the secure state estimator, which replaces the information fusion center D in Figure 1, is sketched in Figure 2, and its operation is described in Algorithm 2. Before explaining the operation, let Ψ denote (Φ P −1 Φ) −1 Φ P −1 where Φ and P are given in (21) and (23), respectively. Furthermore, the notation for a sub-matrix is slightly abused for simplicity. For example, P J , Φ J , and Ψ J denote P Figure 2. Configuration of the resilient estimation scheme with Gaussian disturbance/noise. Initially, an attack-free index set J * , a state estimatex, a standardized residual's norm g, and a fault alarm signal f , are set to [p], Ψẑ, 0, and 0, respectively. The algorithm continually checks if there is any attack in the index set J * based on Algorithm 1. For the given index set J * , the algorithm essentially calculates the MVUEx = Ψ J * ẑ J * , the residual r =ẑ J * − Φ J * x, the standardized residual ζ = (P J * − Φ J * Ψ J * P J * ) − 1 2 r, and its 2-norm g = ζ ζ only with the measurement and covariance data from the subset J * ⊂ [p]. Recall from Theorem 2 that, if e j = e J * , and thus, g = ζ ζ is an observation from a random variable g J * , which satisfies a χ 2 distribution with µ J * DOF, Therefore, g is used to detect the presence of attack in the subset J * by the χ 2 test. We compare g with the threshold ∆ J * TH , which is designed before running the algorithm and determines the probability of false alarm and the probability of error detection. If g ≤ ∆ J * TH , the index set J * is declared to be attack-free by setting f = 0 and the algorithm simply maintains the selected optimal index set J * . Otherwise, when g is greater than the threshold ∆ J * TH , the attack detection alarm is triggered by setting f = 1, and the algorithm starts the process of searching new attack-free index set.

end while
In order to find a new attack-free index set and, consequently, to recover the state x from the new index set, we search all subsets J k 's in [p] whose size is p − q. For a detailed explanation, let J 1 , J 2 , · · · , J ( p q ) be the set {J ⊂ [p] : |J | = p − q}. For each subset J k where k ∈ ( p q ) , the computing module C k calculates the 2 r J k , and its 2-norm g J k = ζ J k ζ J k only with the measurement and covariance data from the subset J k . Now, the new optimal subset J * is decided by the maximum likelihood (ML) decision rule with the values of g J k 's, and the selector S chooses the optimal index set J * by the ML decision rule. To this end, we wish to distinguish between ( p q ) hypotheses, H 1 , H 2 , · · · , H ( p q ) , which are given as follows: H k : the set J k is attack-free, i.e., e j = e Let us denote g k as a random variable such that g J k is a single observation from g k , whereas g J k denotes a random variable such that Note that, if the sensors indexed by J k are attack-free, then the random variable g k as well as g J k follows the χ 2 distribution with µ J k DOF. The ML decision rule choose the hypothesis H k * and the corresponding optimal index set J k * that maximize the likelihood p g k g J k ; H k , which is the PDF of g k being equal to the observation g J k under the hypothesis H k (that is, under the condition that there is no attack signal in the measurements indexed by J k ). Therefore, we have where the last equality comes from the fact that g k ∼ χ 2 µ J k under the hypothesis H k so that it follows the PDF of the χ 2 distribution. Therefore, from the index set J k * corresponding to the ML hypothesis H k * , the MVUE of the newly selected optimal index set J * (= J k * ), x J * , becomes the final suboptimal estimate of x.

Remark 2.
The proposed algorithm selects the subset of sensors J * ⊂ [p], which is most likely to be attack-free with |J * | = p − q. Moreover, if the selected set J * is actually attack-free, it gives the minimum variance with unbiased estimation. In short, Algorithm 2 generates a state estimate, which is most likely to have minimum variance with unbiased mean. However, we say that it is a suboptimal estimate of x instead of the optimal estimate because the decentralized multi-sensor information fusion Kalman filter cannot ensure to achieve the centralized optimal covariance even without attack as illustrated in ( [24], Section 5).

Remark 3.
Note that Algorithm 2 needs to prepare ( p q ) candidates and compare all those candidates. The time complexity of the error correction algorithm depends on the number of combinations ( p q ), and thus, it has the polynomial time complexity of O(p min{q, p−q} ). Therefore, the proposed algorithm may not be scalable for very large p with q ≈ p/2 due to the combinatorial nature of the algorithm. The time complexity could be reduced by imposing additional restrictive assumptions as done in [20,21] which reformulate the problem into a convex optimization problem. However, in our scheme demanding minimal assumptions, the comibinatorial algorithm only needs to operate when an attack is detected. In addition, most of the time, only the attack detection algorithm requiring a small amount of computation, is executed. Another advantage of the proposed algorithm is that its space complexity is linear with the number of sensors p, that is, O(p). The total memory size required for local observers is ∑ p i=1 µ i ≤ np, whereas if all possible combinations of estimator candidates are configured as real observers, the observer's size becomes n( p q ).

Simulation Results
We consider a motor-controlled multi-DOF torsion system [34] as depicted in Figure 3. A continuous-time state-space model of the system when the control input is the torque τ (N·m) generated by the servo motor is given by with the matrices where are the state variable and the output measurement, respectively. Here, the unit for angular positions θ's and the unit for angular velocitiesθ's are (rad) and (rad/s), respectively. The parameters are borrowed from [34], and we have that J 1 = 0.0022, J 2 = J 3 = 0.000545 (kg·m 2 ) for the moment of inertia, b 1 = 0.015, b 2 = b 3 = 0.0015 (N·m/(rad/s)) for the viscous damping ratio, and k 1 = k 2 = 1 (N·m/rad) for the flexible stiffness.
Note that the dynamics are the same as those of the three inertia system considered in [15]; however, Figure 3 additionally considers the servo motor system given as follows: which generates the torque τ (N·m) from the input voltage of u (V). The parameters for the servo system are also borrowed from [34], and we have that η g = 0.9 for the gearbox efficiency, K g = 70 for the total gear ratio, η m = 0.69 for the motor efficiency, k t = 0.00768 (N·m/A) for the motor current torque constant, k m = 0.00768 (V/(rad/s)) for the motor back electromotive force (EMF) constant, and R m = 2.6 (Ω) for the motor armature resistance. Thus, the final continuous-time plant with the voltage u (V) as an input signal is obtained as with the matrices and the same C c as in (44). Finally, the zero-order hold equivalent model of (46) is used for the discrete-time model P in (1), and the matrices are calculated by with the sampling time of T s = 0.002 (s). By examining all possible combinations of sensors, it follows that the system P in (1) with A and C given in (48) is 2-redundant observable, and hence it is observable under 1-sparse sensor attack by Lemma 2. In addition, the disturbance d and the noise n are assumed to satisfy Assumption 1 with , and the initial state x(0) of the system (46) satisfies x(0) ∼ N(x 0 , P 0 ) as stated in Assumption 1 with the meanx 0 and the covariance P 0 given bȳ The simulation is performed under 1-sparse sensor attack on the third sensor with the signal shown in Figure 4b, which is made to mimic the motion pattern by the natural frequency as observed in Figure 4c,d. Moreover, the attack starts at 2 second, which is the same time when the square pulse input u is injected into the system as described in Figure 4a. Even under the attack signal, the resilient state estimation with multi-sensor information fusion Kalman filter based on the observability decomposition developed in Sections 3 and 4 works well. The states are recovered with a small error as demonstrated in Figure 4c,d, which are the state estimation results for θ 3 andθ 3 , respectively.  In this simulation, the threshold ∆ TH for the attack detection is chosen by δ = 0.05 in (41) so that the cumulative density function (CDF) satisfies ∆ TH 0 p g J * (x)dx = 0.95 where p g J * is the PDF of a random variable g J * , which satisfies a χ 2 distribution with µ J * DOF, as stated in (42). Since Figure 4e shows that the 2-norm of the standardized residual, g, exceeds the threshold ∆ TH at the instant of 2 second, which is the initiation time of the attack, it is judged that there is an attack (the lines from 8 to 9 in Algorithm 2) and the estimation scheme begins to search the indices of attack-free sensors (the lines from 10 to 16 in Algorithm 2). As a result of the search algorithm, a new set of sensor indices is found by the ML decision rule (the line 16 in Algorithm 2), and the attacked third senor is excluded from 2 second as depicted in Figure 4f.

Conclusions
In this paper, the multi-sensor information fusion Kalman filter proposed in [24,25] was improved using the observability decomposition to ensure the convergence of the error covariance matrix of each local observer. The local observer of a decentralized Kalman filter with only a single sensor was designed for an observable subspace instead of the entire n-dimensional state vector without any global information. Then, the proposed decentralized information fusion Kalman filter was applied to the secure state estimation problem where some of sensors were compromised by a malicious attacker.
To cope with the zero-mean Gaussian distributed disturbances/noises, a local Kalman filter replaced the partial Luenberger observer designed in [15], where bounded disturbances/noises were considered in the state estimation problem under sparse sensor attacks. When there was no attack, the proposed algorithm guaranteed an optimal state estimate in the sense of minimum variance, and it generated a state estimate that was most likely to have the minimum variance with an unbiased mean in the presence of sparse sensor attacks.
The proposed algorithm can be applied to cyber-physical systems, including complex sensor networks operating based on linear dynamics under sparse sensor attacks as well as Gaussian disturbances/noises. We imposed the minimal assumption of the redundant observability, which is known to be the equivalent condition to solve the problem. Furthermore, the computational time was alleviated by running only a relatively light attack detection scheme for most of the execution time, and the memory size of the observer was reduced by constructing local observers only for observable subspaces.
One possible direction of future research is to develop a distributed attack-resilient state estimator. While this paper proposed a decentralized Kalman filter scheme, the fusion center collects all the data from each sensors. Although the construction of local Kalman filters is decentralized, the information fusion method is still centralized. Therefore, it is necessary to develop a fully distributed attack-resilient state estimation technique for a general sensor network without any central information fusion center.

Conflicts of Interest:
The authors declare no conflict of interest.