Allocation of Eavesdropping Attacks for Multi-System Remote State Estimation

In recent years, remote state estimation for cyber–physical systems (CPSs) under eavesdropping attacks has become a growing concern. For multi-system CPSs in the presence of an eavesdropper, we study the optimal attack energy allocation problem for remote state estimation based on the channel signal-to-interference-plus-noise ratio (SINR). We assume there are N sensors, and these sensors use a shared wireless communication channel to send their state measurements to the remote estimator. Because its power is limited, the eavesdropper can attack at most M of the N channels. Our goal is to use the Markov decision process (MDP) framework to maximize the eavesdropper's state estimation error and thereby determine the eavesdropper's optimal attack allocation. We propose a backward induction algorithm based on the MDP to obtain the optimal attack energy allocation strategy; compared with the traditional induction algorithm, it has a lower computational cost. Finally, numerical simulation results verify the correctness of the theoretical analysis.


Introduction
Cyber-physical systems (CPSs) are considered to be among the revolutionary technologies arising from continuous breakthroughs and innovations in information technology and in the manufacturing industry [1]. A CPS is a multidimensional, complex system that deeply integrates control, communication and computing (3C) technology. It can realize large-scale information acquisition and intelligent control of the physical world through the cognition, communication and control of physical objects, so that the network can monitor the specific actions of a physical entity in a real-time, reliable, remote and safe way [2,3]. CPSs are widely used in aerospace, industrial production, advanced automobile systems, energy storage, environmental monitoring, national defense, infrastructure construction, intelligent buildings, smart grids, transportation systems and telemedicine [4]. With the rapid development of networked computing, sensing and control systems, CPS technology is used ever more widely; at the same time, emerging network attacks make wireless CPSs very fragile, so security has become a primary consideration [5-7].
For the security of a system's remote state estimation, malicious network attacks take many forms, but they fall into three main categories: denial-of-service (DoS) attacks, integrity attacks (including replay and false data injection) and eavesdropping attacks [8]. DoS attacks are designed to interfere with wireless communication channels, and they can cause a significant decline in estimation accuracy in CPSs [9]. Peng [10] and Zhang [11] formulated the optimal attack power allocation for remote state estimation in a multi-system as a Markov decision process (MDP) problem. Integrity attacks can corrupt the transmitted data packets while remaining stealthy [12,13]. In Ref. [14], an important scenario is designed from the attacker's point of view, in which a false data injection attack can completely and covertly destroy a CPS. In addition, the channel may be subject to eavesdropping attacks, which can lead to serious economic losses and even pose a threat to personal safety through the interception of private data [15,16]. For example, in an intelligent transportation system, eavesdroppers can infer the path planning of vehicles by monitoring location information, which makes eavesdropping attacks easy to mount [17,18]. In the existing research, data encryption is the main method for protecting system privacy from eavesdropping attacks [19-21].
Recently, the issue of remote state estimation in the presence of eavesdroppers has attracted widespread attention from researchers. Eavesdropping attacks are divided into passive and active types, and several estimation and control problems have been studied in the presence of active attacks. Han [22] studied active eavesdropping on fading channels and proposed an interference-assisted eavesdropping method to improve the probability of successful monitoring. Yuan [23] constructed a two-player nonzero-sum game between the sensor and the active eavesdropper, in which each side aims to minimize its own estimation error covariance and maximize the opponent's. Ding [24] formulated the trade-off between stealthiness and eavesdropping performance as a constrained MDP and proposed an optimal strategy for active eavesdropping.
The works above indicate clear progress in the design of active eavesdropping schemes; this paper mainly studies passive eavesdropping attacks. Tsiamis [25] proposed a confidentiality mechanism that randomly hides sensor information and explored the trade-off between user utility and control-theoretic confidentiality through optimization methods. Huang [26] proposed a new encryption strategy that accounts for the cost of the encryption process, and then proved the optimality of deterministic encryption strategies and the existence of Markov strategies over a finite time horizon. Wang [27] theoretically established structural properties of the optimal transmission schedule for both known and unknown eavesdropper estimation errors. In Ref. [28], transmission scheduling for remote state estimation systems with eavesdroppers on packet-dropping links was studied. Yuan [29] transformed the system model into an MDP to obtain the optimal transmission schedule that minimizes the age of information (AoI) of the CPS while keeping the eavesdropper's AoI above a certain level, and proved that the optimal transmission schedule has a threshold structure in the AoI of the CPS and of the eavesdropper, respectively. In Ref. [30], the problem is formulated as a Stackelberg game, and a strategy is studied for maximizing the secure transmission rate between sensor and controller in the presence of malicious eavesdroppers and jammers. After analyzing the influence of different strategies on eavesdropping performance, Zhou [31] studied a multi-output system and proposed a decryption scheduling scheme that minimizes the expected estimation error under an energy constraint.
Most of the existing literature studies optimal transmission strategies for the sensors from the perspective of the remote estimator. In contrast to [27,28], this paper studies optimal attack energy allocation strategies from the perspective of the eavesdropper. Moreover, the previous literature mainly focuses on CPSs with an eavesdropper in a single system over a finite time horizon, and pays little attention to eavesdroppers in a multi-system setting over an infinite time horizon. In this paper, we study the optimal attack allocation problem for remote state estimation in a multi-system CPS under eavesdropping attacks over an infinite time horizon. Our goal is to maximize the eavesdropper's state estimation error, so as to determine the eavesdropper's optimal attack allocation. The contributions of this paper are as follows:

1. We propose a multi-system eavesdropping attack model based on the channel SINR, which reveals the relationship between attack power and packet arrival rate.

2. Over an infinite time horizon and under an energy constraint, we obtain the optimal attack scheduling strategy by constructing an MDP and using the Bellman equation.

3. According to the given algorithm, we obtain the optimal attack energy allocation strategy and verify it through simulation experiments.
Notations: The following symbols are used throughout the paper. N is the set of natural numbers. The n-dimensional Euclidean space is denoted by R^n. S^n_+ (S^n_++) is the set of n × n positive semi-definite (positive definite) matrices. Tr(X) is the trace of a matrix X; X^T is the transpose of X and X^-1 denotes the inverse of X. X > 0 and X ≥ 0 indicate that X is positive definite and positive semi-definite, respectively. For functions g and h, g ∘ h(x) stands for the composition g(h(x)), and h^n(x) = h(h^(n-1)(x)) with h^0(x) = x. E[·] denotes expectation and P[·] denotes probability.

Problem Setup

System Model
Figure 1 shows the system architecture. We consider N general discrete-time time-invariant stochastic systems:

x_i(k+1) = A_i x_i(k) + ω_i(k),
y_i(k) = C_i x_i(k) + v_i(k),

where k ∈ N is the time index, and x_i(k) ∈ R^n and y_i(k) ∈ R^m denote the state of the ith system and the measurement vector taken by its sensor at time k, respectively. The process noise ω_i(k) ∈ R^n and the observation noise v_i(k) ∈ R^m are assumed to be independent and identically distributed (i.i.d.) zero-mean Gaussian noises with covariance matrices Q_i ≥ 0 and R_i > 0, respectively. The initial state x_i(0) of the ith system is a zero-mean Gaussian random variable, independent of ω_i(k) and v_i(k), with covariance F_i(0) ≥ 0. We also assume that the pair (A_i, C_i) is observable.

The sensors are assumed to be smart sensors with some computing power: each sensor first uses its collected measurements to compute a local state estimate and then transmits this local estimate to the remote estimator. We use x̂_i(k) and F̂_i(k) to denote the ith sensor's local minimum mean-squared error (MMSE) estimate of the state and the corresponding error covariance [32], which are computed by a standard Kalman filter. For ease of notation, we define the Lyapunov and Riccati operators h_i, g_i as

h_i(X) = A_i X A_i^T + Q_i,
g_i(X) = X − X C_i^T (C_i X C_i^T + R_i)^(-1) C_i X.

Under the assumptions of detectability and stabilizability, it is well known that the posterior estimation error covariance of the Kalman filter converges exponentially from any initial condition to a unique steady-state value [33]; accordingly, we take F̂_i(k) = F̄_i for k ≥ 1, where the steady-state error covariance F̄_i is the unique positive semi-definite solution of g_i ∘ h_i(X) = X [34].
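As a concrete illustration of the operators above, the steady-state covariance F̄ can be computed by iterating the composed map g ∘ h to its fixed point. The following sketch does this for a hypothetical scalar system; the values of A, C, Q and R are illustrative and are not the paper's parameters.

```python
# Fixed point of g∘h for a scalar system: F̄ solves g(h(X)) = X.
# A, C, Q, R below are illustrative values, not the paper's parameters.

def h(X, A=1.2, Q=0.5):
    """Lyapunov operator: h(X) = A X A^T + Q (scalar case)."""
    return A * X * A + Q

def g(X, C=1.0, R=0.8):
    """Riccati operator: g(X) = X - X C^T (C X C^T + R)^(-1) C X (scalar case)."""
    return X - X * C * (C * X * C + R) ** -1 * C * X

def steady_state_cov(X0=0.0, tol=1e-12, max_iter=10_000):
    """Iterate X <- g(h(X)) until convergence; the limit is F̄."""
    X = X0
    for _ in range(max_iter):
        X_next = g(h(X))
        if abs(X_next - X) < tol:
            return X_next
        X = X_next
    raise RuntimeError("Riccati iteration did not converge")
```

The same iteration applies verbatim to the matrix case once the scalar products are replaced by matrix operations.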

Attack Model Based on SINR
To model random data losses due to fading and interference, we assume that communication between the sensor and the remote estimator or the eavesdropper takes place over an additive white Gaussian noise (AWGN) channel using quadrature amplitude modulation (QAM). Data packets sent by the sensor are quantized and mapped to QAM symbols. Digital communication theory then gives the following relationship between the symbol error rate (SER) and the SINR [5,36]:

SER_℘(k) = 2Q(√(α · SINR_℘(k))), with Q(x) = (1/√(2π)) ∫_x^∞ e^(−t²/2) dt,

where ℘ ∈ {e, a} and α > 0 is a parameter; ℘ = e refers to the remote estimator side and ℘ = a to the eavesdropper side.
Consider the remote estimator side first. The channel SINR for the remote estimator at time k is [24]

SINR^e_i(k) = Φ_i p_i(k) / σ²_{i,e},

where Φ_i is the channel gain of the ith communication channel between the sensor and the remote estimator, p_i(k) ≥ 0 is the transmission power used by sensor i for the QAM symbol at time k, and σ²_{i,e} is the AWGN power of the ith channel between the sensor and the remote estimator. Define a binary random variable ζ^e_i(k) ∈ {0, 1} indicating whether the remote estimator successfully receives the packet at time k. Then, the packet arrival rate at the remote estimator is

λ^e_i(k) = P[ζ^e_i(k) = 1] = 1 − 2Q(√(α · SINR^e_i(k))).
Next, consider the eavesdropper side. The SINR of the ith channel at the eavesdropper at time k, denoted SINR^a_i(k), depends on the channel gain Ψ_i of the ith communication channel between the sensor and the eavesdropper, the attack power a_i(k) that the eavesdropper devotes to the ith channel, and the AWGN power σ²_{i,a} of the channel between the sensor and the eavesdropper. Similarly, we use a binary random variable ζ^a_i(k) to indicate whether the eavesdropper eavesdrops successfully. Then, the probability of successful eavesdropping is

λ^a_i(k) = P[ζ^a_i(k) = 1] = 1 − 2Q(√(α · SINR^a_i(k))).

The processes ζ^e_i(k) and ζ^a_i(k) are assumed to be mutually independent.
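The packet arrival rates on both sides follow directly from the SER–SINR relation via the Gaussian Q-function. A minimal sketch (the modulation parameter α is illustrative):

```python
import math

def Q(x):
    """Gaussian tail function Q(x) = (1/sqrt(2π)) ∫_x^∞ e^(-t²/2) dt."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def packet_arrival_rate(sinr, alpha=1.0):
    """λ = 1 - SER with SER = 2 Q(sqrt(α · SINR)) (QAM over AWGN).
    alpha is an illustrative modulation parameter, not the paper's value."""
    ser = 2.0 * Q(math.sqrt(alpha * sinr))
    return 1.0 - ser
```

As expected, λ is 0 at zero SINR (every symbol is in error) and increases monotonically toward 1 as the SINR grows, which is exactly the lever the eavesdropper's attack power acts on.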

Remote State Estimation
Using the local estimates received by the remote estimator, the MMSE state estimate x̂^e_i(k) and the corresponding estimation error covariance F^e_i(k) at the remote estimator at time k follow the iteration

x̂^e_i(k) = x̂_i(k) if ζ^e_i(k) = 1, and x̂^e_i(k) = A_i x̂^e_i(k−1) otherwise;
F^e_i(k) = F̄_i if ζ^e_i(k) = 1, and F^e_i(k) = h_i(F^e_i(k−1)) otherwise.

Similarly, denoting by x̂^a_i(k) and F^a_i(k) the MMSE state estimate and the corresponding error covariance at the eavesdropper at time k, we have

x̂^a_i(k) = x̂_i(k) if ζ^a_i(k) = 1, and x̂^a_i(k) = A_i x̂^a_i(k−1) otherwise;
F^a_i(k) = F̄_i if ζ^a_i(k) = 1, and F^a_i(k) = h_i(F^a_i(k−1)) otherwise.

Therefore, defining S_i ≜ {F̄_i, h_i(F̄_i), h²_i(F̄_i), . . .}, this set contains all possible values of F^e_i(k) and F^a_i(k).
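The covariance recursion above can be checked with a short simulation: on a success the covariance resets to F̄, and each consecutive failure applies h once, so after τ consecutive failures the covariance equals h^τ(F̄). A sketch for a hypothetical scalar system (A and Q are illustrative values):

```python
def h(X, A=1.2, Q=0.5):
    """Lyapunov operator h(X) = A X A^T + Q (scalar, illustrative A and Q)."""
    return A * X * A + Q

def error_cov_from_holding_time(tau, F_bar):
    """F(k) = h^tau(F̄): apply h once per consecutive dropped packet."""
    X = F_bar
    for _ in range(tau):
        X = h(X)
    return X

def run_estimator_cov(arrivals, F_bar):
    """Recursion from the text: F(k) = F̄ on success, h(F(k-1)) on failure."""
    F, out = F_bar, []
    for received in arrivals:
        F = F_bar if received else h(F)
        out.append(F)
    return out
```

Running the recursion on the arrival pattern [1, 0, 0] reproduces F̄, h(F̄), h²(F̄), matching the holding-time formula used in the MDP formulation below.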

Problem Formulation
Specifically, we consider the following problem. From the perspective of the eavesdropper, over an infinite time horizon, find the attack allocation that, subject to the eavesdropper's limited energy, maximizes the eavesdropper's state estimation error; this is formalized as Problem 1.

Optimal Attack Schedule
In this section, we formulate Problem 1 as a discrete-time MDP and solve it. We also give an algorithm for finding the optimal eavesdropping attack strategy.

MDP Formulation
For notational convenience, let τ^e_i(k) (resp. τ^a_i(k)) denote the holding time at the estimator (resp. eavesdropper), i.e., the time elapsed since the last successful reception on channel i up to time k:

τ^e_i(k) = 0 if ζ^e_i(k) = 1, and τ^e_i(k) = τ^e_i(k−1) + 1 otherwise,

and analogously for τ^a_i(k). Clearly τ^e_i(k) ∈ S^e_i = N (and τ^a_i(k) ∈ S^a_i = N), so we obtain

F^e_i(k) = h_i^(τ^e_i(k))(F̄_i), F^a_i(k) = h_i^(τ^a_i(k))(F̄_i).

We describe the dynamics of the CPS under eavesdropping attacks as an MDP, written mathematically as the tuple ⟨S, A, P, r(·)⟩, whose elements are as follows.
State space: s_i(k) = (τ^e_i(k − 1), τ^a_i(k − 1)) is the state of process i at time k, where τ^e_i(k − 1) and τ^a_i(k − 1) are the holding times at the remote estimator side and the eavesdropper side, respectively. The overall state at time k is s(k) = (s_1(k), s_2(k), . . ., s_N(k)), which takes values in the countable state space S, where S_i ≜ S^e_i × S^a_i and S = S_1 × · · · × S_N.
Action space: the action for channel i at time k is its attack power a_i(k), which takes values in a finite set of power levels {0, . . ., ā_i}, where ā_i is the maximum attack power for channel i. The joint action is a(k) ≜ (a_1(k), a_2(k), . . ., a_N(k)) ∈ A.
Transition probability: let P_i(s_i(k+1) | s_i(k), a_i(k)) denote the probability that the state of channel i moves from s_i(k) to s_i(k+1) under action a_i(k), where s_i(k), s_i(k+1) ∈ S_i and a_i(k) ∈ A. For brevity, write the state at time k as s_i(k) = (j_1, j_2). Since ζ^e_i(k) and ζ^a_i(k) are independent, the nonzero transition probabilities are

P_i((0, 0) | (j_1, j_2), a_i(k)) = λ^e_i(k) λ^a_i(k),
P_i((0, j_2 + 1) | (j_1, j_2), a_i(k)) = λ^e_i(k) (1 − λ^a_i(k)),
P_i((j_1 + 1, 0) | (j_1, j_2), a_i(k)) = (1 − λ^e_i(k)) λ^a_i(k),
P_i((j_1 + 1, j_2 + 1) | (j_1, j_2), a_i(k)) = (1 − λ^e_i(k)) (1 − λ^a_i(k)),

where λ^a_i(k) depends on the attack power a_i(k).

Reward function: let r(·) be the immediate reward function. The single-stage reward at time k is independent of the action taken and depends only on the current state.
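Because the success processes on the two sides are independent, each channel state (j_1, j_2) has exactly four reachable successors, so the kernel can be tabulated directly. A minimal sketch (λ^e and λ^a are supplied here as plain numbers):

```python
def transition_probs(state, lam_e, lam_a):
    """Successors of s_i = (j1, j2): a holding time resets to 0 on a success
    (probability λ) and increments by 1 on a failure (probability 1 - λ);
    the two sides are independent, so the joint probabilities multiply."""
    j1, j2 = state
    return {
        (0,      0):      lam_e       * lam_a,        # both sides succeed
        (0,      j2 + 1): lam_e       * (1 - lam_a),  # estimator only
        (j1 + 1, 0):      (1 - lam_e) * lam_a,        # eavesdropper only
        (j1 + 1, j2 + 1): (1 - lam_e) * (1 - lam_a),  # both sides fail
    }
```

In the full model, lam_a would be evaluated from the eavesdropper-side SINR at the chosen attack power, which is how the action enters the kernel.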
A randomized decision rule of the eavesdropper is a strategy sequence π = {(a_1(1), . . ., a_N(1)), (a_1(2), . . ., a_N(2)), . . .}, where each decision rule is a stochastic kernel from the history set H to A, and Π denotes the set of all such feasible strategies. Based on the process state s(k), the attacker chooses the action a(k) = a(s(k)), so π = {a(s(1)), . . ., a(s(k)), . . .}. Then, for an initial state s(0) = s ∈ S, the expected total reward under a strategy π ∈ Π is

J_π(s) = E_π[ Σ_k r(s(k)) | s(0) = s ],

and its optimal value is J*(s) = sup_{π∈Π} J_π(s). Define the average value function under policy π ∈ Π as the function V: S → R. We then have the following theorem.
Theorem 1. According to MDP theory, the optimal value J*(s) can be obtained by solving the following optimality (Bellman) equation:

J*(s) = r(s) + max_{a∈A} Σ_{s'∈S} P(s' | s, a) J*(s'),    (31)

where s = (s_1, . . ., s_N) ∈ S is the initial state.
The optimal attack strategy of the eavesdropper is

a*(s) ∈ arg max_{a∈A} Σ_{s'∈S} P(s' | s, a) J*(s').    (32)

Proof of Theorem 1. Following Chapter 8 of reference [37], Theorem 1 is obtained by substituting our state transition probability matrix (27) and immediate cost function (28).
From Equation (8.4.2) in [37], we obtain the corresponding value recursion, where π_k is the decision rule at time k and r and P are shorthand for the reward vector and the transition matrix. Since a history-dependent strategy contains many decision rules, r_{π_1} and P_{π_1} can be decomposed accordingly. Rewriting the finite-horizon optimality Equation (4.5.1) in [37] in our setting then yields the optimality (Bellman) Equation (31) and the optimal attack strategy of the eavesdropper (32). Thus, Theorem 1 is proved.

Remark 1. Note that for a finite-horizon MDP, the action a(k) taken at time k is non-stationary and depends on the current state at time k.

Remark 2. The optimal attack energy allocation strategy of (29) can be obtained using the optimality (Bellman) Equation (31); in addition, the optimal strategy is deterministic and Markov, which helps us identify the structural characteristics of the optimal allocation strategy.

Policy Iteration Algorithm
The MDP proposed in this paper has an infinite state space. However, from the structure of the state transitions in the system model, we find that when the eavesdropper's attack energy is limited, the transition rule effectively confines the system state within a finite range. Therefore, although the MDP has an infinite state space, we can treat it as a finite-horizon MDP, which makes it convenient to design an algorithm for the optimal attack strategy.
Over a finite horizon, the solution of the optimality equation is the optimal value function from decision time k to the terminal decision time T. Based on the MDP constructed above, we provide a specific backward induction algorithm that solves it and yields the optimal attack strategy, i.e., Algorithm 1.

Algorithm 1 Backward induction algorithm for optimal allocation strategy
Require: σ_a, ā, T, S, A, s.
Ensure: The optimal value J*_T(s); the optimal deterministic Markov policy π*.
Step 1: Calculate F̄_i, the packet arrival rates λ^e_i(k), λ^a_i(k) and the holding times τ^e_i(k), τ^a_i(k).
Step 2: Calculate the state transition matrices P_i.
Step 3: Set k = T and, for all s(k) ∈ S, compute J*(s) + V(s) by (31).
Step 4: Substitute k − 1 for k.
Step 5: For each s(k) ∈ S, compute J*(s(k)) from the stage-(k + 1) values via the Bellman recursion (31).
Step 6: Record a maximizing action a*(s(k)) for each s(k) ∈ S.
Step 7: If k = 0, output J*_T(s) and π* = (a*(0), a*(1), . . ., a*(T − 1)); otherwise, go to Step 4.

Remark 3. In Algorithm 1, it is assumed that a*(s(k)) = 1 in order to reduce the computational cost and complexity compared with the traditional algorithm.

Remark 4. We can derive the eavesdropper's state estimate at each time to ensure the feasibility of Algorithm 1.
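A hedged sketch of the backward induction in Algorithm 1 for a single channel on a truncated state space is given below. The success-probability model lam_a, the transmission success rate lam_e, the power levels and the state-only reward are all illustrative stand-ins, not the paper's exact quantities.

```python
def lam_a(a, base=0.2, gain=0.06):
    """Illustrative eavesdropping success probability, increasing in attack power a."""
    return min(0.95, base + gain * a)

def successors(state, a, lam_e=0.6, N0=16):
    """Truncated single-channel transition kernel; holding times are capped at N0."""
    j1, j2 = state
    la = lam_a(a)
    up1, up2 = min(j1 + 1, N0), min(j2 + 1, N0)
    return {(0, 0): lam_e * la, (0, up2): lam_e * (1 - la),
            (up1, 0): (1 - lam_e) * la, (up1, up2): (1 - lam_e) * (1 - la)}

def backward_induction(T, states, actions, reward):
    """Finite-horizon backward induction (Algorithm 1 sketch):
    J_k(s) = r(s) + max_a Σ_{s'} P(s'|s,a) J_{k+1}(s')."""
    J = {s: reward(s) for s in states}   # terminal values at stage T
    policy = []
    for _ in range(T):                   # stages k = T-1, ..., 0
        Jn, pi = {}, {}
        for s in states:
            best_a, best_v = None, float("-inf")
            for a in actions:
                v = sum(p * J[sp] for sp, p in successors(s, a).items())
                if v > best_v:
                    best_a, best_v = a, v
            Jn[s] = reward(s) + best_v
            pi[s] = best_a
        J, policy = Jn, [pi] + policy
    return J, policy

# Illustrative run: truncated holding-time states, three power levels, and a
# stand-in state-only reward (the eavesdropper-side holding time).
states = [(i, j) for i in range(17) for j in range(17)]
J, policy = backward_induction(T=3, states=states, actions=[0, 5, 10],
                               reward=lambda s: float(s[1]))
```

The returned policy is deterministic and Markov, one decision rule per stage, matching the output format π* = (a*(0), . . ., a*(T − 1)) of Algorithm 1.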

Discussion and Illustrative Example
In this section, based on the MDP model above, we provide numerical simulations to illustrate the optimal attack energy allocation strategy of Problem 1. Consider systems (1) and (2) with β = 0.5. The parameters of the systems and channels are shown in Table 1. Suppose the energy constraint is ā = 10. In Table 1, a = 0, a = 5, a = 7 and a = 10 indicate that the possible attack power levels of channel 1 are 0, 5, 7 and 10, respectively, where 0 means no attack; ā_2 = 10 is defined in the same way. According to the number of systems and the number of channels attacked simultaneously, the numerical simulations are divided into the following three cases. We use Algorithm 1 to calculate the optimal attack energy allocation strategy and the optimal average return.

Single System
Let us first consider the case with only one system, and study the optimal attack energy allocation when attacking its channel at different times. We use the data for Sensor 1 in Table 1. The strategy set of the eavesdropper is {(0, 0), (10, 0), (0, 10), (5, 5)}, where (0, 0) means no attack, (10, 0) means attacking with energy 10 at the first moment and not attacking at the second moment, and (0, 10) and (5, 5) are interpreted similarly. Assume that the transmission power of the sensor is p(k) = 0.6. Through calculation, the steady-state estimation error covariance is F̄ = 1.9755. Assume the AWGN power of the channel between the sensor and the estimator is σ²_e = 0.3, and that of the channel between the sensor and the eavesdropper is σ²_a = 0.5. Although this paper studies the infinite time horizon, to simplify the computation we use the truncated set N_0 = {0, 1, . . ., 16}. The optimal strategy is computed with Algorithm 1.
In Figure 2, we use (τ^e(k), τ^a(k)) to represent the state, and the optimal strategy is shown, where the purple and red symbols represent policy 1 = (0, 0) and policy 2 = (0, 10), respectively.

Conclusions
This paper studied the optimal attack energy allocation for multi-system remote state estimation in CPSs with an eavesdropper over an infinite time horizon. Based on the channel SINR, a wireless communication model was constructed. With the eavesdropper's energy limited, the optimal value and the optimal attack energy allocation strategy were found using MDP theory, and the results were verified numerically. From the theoretical and numerical analysis we draw the following conclusion: in a multi-system, the optimal energy allocation strategy is to attack a channel with higher energy when the estimation error covariance at the eavesdropper is large, and with lower energy when it is small. Moreover, the optimal attack strategy exhibits a clear threshold structure. In future work, we will prove the threshold structure of the optimal allocation strategy and study the case in which a detector is present.

Figure 2. A single system's optimal energy allocation.

Figure 4. Optimal action of (τ^e(k), τ^a(k)). The meaning of circles and stars is the same as in Figure 3.

4.3. Dual System (Not Attacking, Attacking One Channel or Attacking Two Channels)

Figure 6. Optimal action of (τ^e(k), τ^a(k)). The meaning of circles, triangles, pentagrams and stars is the same as in Figure 5.

Table 1. Parameters for sensors and attack power.

Table 2. AWGN power and transmission power.

Table 3. Attack power levels for the dual system (not attacking or attacking one channel).

Table 4. Attack power levels for the dual system (not attacking, attacking one channel or attacking two channels).