Extended Distributed State Estimation: a Detection Method against Tolerable False Data Injection Attacks in Smart Grids

False data injection (FDI) is considered one of the most dangerous cyber-attacks in smart grids, as it may lead to energy theft from end users, false dispatch in the distribution process, and device breakdown during power generation. In this paper, a novel kind of FDI attack, named tolerable false data injection (TFDI), is constructed. Such attacks exploit the traditional detector's tolerance of observation errors to bypass traditional bad data detection. A method based on extended distributed state estimation (EDSE) is then proposed to detect TFDI in smart grids. The smart grid is decomposed into several subsystems using graph partition algorithms. Each subsystem is extended outward to include the adjacent buses and tie lines, generating an extended subsystem. The Chi-squares test is applied to detect false data in each extended subsystem. Through decomposition, the false data stand out distinctively from normal observation errors and the detection sensitivity is increased. Extensive TFDI attack cases are simulated in the Institute of Electrical and Electronics Engineers (IEEE) 14-, 39-, 118- and 300-bus systems. Simulation results show that the detection precision of the EDSE-based method is much higher than that of the traditional method, while the proposed method significantly reduces the associated computational costs.


Nomenclature:
$p_{sub}$: The number of subsystems after decomposition
subsys_k: The label of a subsystem after decomposition ($1 \le k \le p_{sub}$)
$N$: The number of buses in a power system
$M$: The number of transmission lines in a power system
$M_{TIE}$: The number of tie lines
$N_k$: The number of buses in subsys_k
$N_k^E$: The number of buses in subsys_k after extension
$M_k$: The number of transmission lines in subsys_k
$M_k^E$: The number of transmission lines in subsys_k after extension
bus $i$: Load bus/generator in the power system, labeled according to the definition in the Institute of Electrical and Electronics Engineers (IEEE) standard case ($1 \le i \le N$)
$L_{i,j}$: The transmission line connecting bus $i$ and bus $j$
$P_{i,j}$: The active line power flow from bus $i$ to bus $j$, observed on bus $i$
$x$: State variables in the power system, $x = [x_1, x_2, \ldots, x_n]^T$
$z$: Measurements in the power system, $z = [z_1, z_2, \ldots, z_m]^T$
$e$: Measurement noise, $e = [e_1, e_2, \ldots, e_m]^T$
$R$: The diagonal measurement covariance matrix
$h(x)$: The nonlinear function relating measurements $z$ to state variables $x$
$o$: The degree of freedom in the power system
$T_{o,p}$: The threshold for $o$ degrees of freedom corresponding to a detection confidence with probability $p$
$x_k$: State variables in subsys_k
$h_k(x_k)$: The nonlinear function relating measurements $z_k$ to state variables $x_k$ in subsys_k
$o_k$: The degree of freedom in subsys_k
$a_k$: The false data injection attack (if it exists) in subsys_k
$G$: The graph model for a given smart grid
[…]: The extension status of the $i$-th tie line in subsys_k
IDL: Injected data level, the relative injected error against the original value of the measurement

Introduction
In smart grids, information techniques are applied to provide a desirable infrastructure for real-time measurement, transmission, decision and control. For this purpose, many sensors are deployed across millions of buildings and streets. They are connected to the information network, raising the issue of how to protect the system against false data injection (FDI) attacks, which are launched by hijacking and tampering with communication, or by illegal access to and control of electrical devices [1]. Moreover, FDI attacks are quite attractive to hackers, since the data in a smart grid can be easily monetized (e.g., hackers can manipulate their energy costs by modifying smart meter readings). The false data may also mislead the control center into taking erroneous actions, which can be extremely dangerous for smart grids.
Power system state estimation (SE) has been regarded as an effective way to process bad data since the pioneering work of Schweppe in 1970 [2]. It is applied in supervisory control and data acquisition (SCADA) systems to reduce observation errors, detect bad data and estimate the electrical states of power systems, typically bus voltage magnitudes and phase angles, by processing the set of real-time redundant measurements.
It is widely believed that bad data detection methods, such as the energy conservation test, the Chi-squares test and the normalized residuals test [3], can protect smart grids against FDI attacks. While relatively effective against random noise, these detectors cannot detect specialized and highly structured false data that conforms to the network topology and to particular physical laws. Recent works [4][5][6][7][8][9][10] have demonstrated that an adversary armed with knowledge of the network configuration can construct FDI attacks on SE that avoid detection. However, there are still obstacles to launching such attacks. First, the attacker has to know the configuration of the power system, which is in general not easy to obtain. Second, the attacker has to access a sufficient number of smart meters. Some smart meters are protected by different mechanisms, and it is almost impossible to get access to every meter. Third, some smart meter readings, such as active power, reactive power and voltage, are read-only. The attacker can only falsify writable configuration parameters such as the current transformer (CT) ratio and time. Finally, even when the above conditions are satisfied, the attacker still has to solve a non-deterministic polynomial (NP)-complete problem to find a sparse attack vector, which has a high computational cost. Simulation results indicate that the attacker may need to compromise almost 80% of all meters to ensure finding an attack vector for targeted FDI attacks (unconstrained case) in the Institute of Electrical and Electronics Engineers (IEEE) 118-bus system [8]. The authors of [11] presented a "generic FDI", which could bypass bad data detection and did not require solving the NP-complete problem. However, to launch such an attack, the attacker has to know all sensor measurements and state values of the power system. In our work, it is shown that light-weight false data can bypass traditional bad data detection methods by exploiting their tolerance of observation errors. In experiments, when we injected false data into each bus in the IEEE 14-, 39-, 57-, 118- and 300-bus standard systems, the Chi-squares test achieved a relatively low detection precision. These attacks exploit the detector's tolerance of normal cumulative random noise and hide among normal measurement errors. This kind of attack is named tolerable false data injection (TFDI) in this paper. Compared with the strict conditions required by the undetectable FDI attack, the TFDI only requires the attacker to manipulate meters on the target transmission lines. It is a relatively easy and practicable approach for attackers to falsify specific measurements with limited knowledge of the system configuration and restricted access to smart meters.
Various advanced methods have been proposed to detect false data. Many researchers have studied how to find the important meters in FDI attacks, and investigated various security strategies to protect the important measurements, such as independent verification [11] and data encryption [12,13]. These methods would require extra investment in system implementation. Phasor measurement units (PMUs) are increasingly deployed in power systems, providing accurate, synchronous, and secure sampling of the system states. How to economically deploy PMUs to best help the state estimator detect FDI attacks has become an interesting problem [6,14]. In addition, many smart algorithms are applied to detect false data, such as the geometrically designed residual filter [15] and the adaptive cumulative sum (CUSUM) test [16]. Zonouz et al. [17] presented a security-oriented cyber-physical state estimation (SCPSE) system, in which the suspicious nodes in the cyber network are removed and SE is applied to detect false data with the remaining measurements; however, the observability of the remaining measurements is a major problem for the SCPSE [17]. In fully distributed power systems, distributed state estimation (DSE) is applied to detect false data. It is a two-level process: the local level is in charge of filtering the local bad data, and the coordination level is applied to detect boundary bad data [18,19]. However, the coordination-level SE faces constraints on observability.
In this paper, we propose a bad data detection method based on extended distributed state estimation (EDSE). With this method, a power system is decomposed into several subsystems using graph partition algorithms. For each subsystem, buses are classified into three groups: internal buses, boundary buses and adjacent buses. Each subsystem is extended outward to include the adjacent buses and tie lines, generating an extended subsystem. SE and the Chi-squares test are applied to detect whether there is any false data in each extended subsystem. Through decomposition, the false data stand out from normal observation noise and the detection sensitivity is improved. To verify the effectiveness of the EDSE-based method, extensive TFDI attack cases are designed to inject false data into the IEEE 14-, 39-, 118- and 300-bus systems. These TFDI attacks exploit the detector's tolerance of normal cumulative random noise and hide among normal measurement errors. They keep the test statistic lower than the threshold to bypass the Chi-squares test. After decomposition, false data no longer have enough space to hide behind normal measurement errors, so they stand out prominently and the detection sensitivity is increased. The IEEE 14-bus system is selected to illustrate how the attack is constructed and how the EDSE-based false data detection method works. The IEEE 39-, 118- and 300-bus systems are used to carry out the simulations and to discuss the detection performance, computational complexity and tunable parameters. Simulation results demonstrate that the detection accuracy of the EDSE-based method is, on average, much higher than that of the traditional bad data detection method, and that the computation cost is reduced by over 90% in the IEEE 300-bus system.
To summarize, the contributions of this work are as follows: (1) The possibility of constructing random TFDI attacks, which are much easier to launch than the well-known FDI method in [8], is demonstrated; (2) Several cases are designed and numerically analyzed to show how TFDI attacks bypass the traditional bad data detection method, and to demonstrate their potential risks; (3) A new method is proposed to detect the injected false data. A graph model is introduced to decompose the smart grid automatically, instead of the manual partition based on grid topology and geographical information used in power system DSE. Its detection accuracy is shown to be much higher than that of traditional methods, and its computational complexity is significantly lower.
The rest of this paper is organized as follows: the background of SE and bad data detection is given in Section 2; the TFDI attack is introduced in Section 3; in Section 4, a TFDI attack scenario is demonstrated to explain how bad data detection is bypassed and to show the potential motivations and risks; the methodology of EDSE is presented in Section 5; in Section 6, the proposed method is tested with IEEE standard systems, and the results and analysis are presented; the concluding remarks and future work are given in Section 7.

SE
Power system SE is widely used to ensure the safe and economical operation of power systems. The state variables are related to the measurements as shown in Equation (1):
$$z = h(x) + e \qquad (1)$$
where $x = [x_1, x_2, \ldots, x_n]^T$ is the vector of state variables; $z = [z_1, z_2, \ldots, z_m]^T$ is the vector of meter measurements; $h(x)$ is the nonlinear function relating measurements to state variables; and $e = [e_1, e_2, \ldots, e_m]^T$ is the measurement error. For a well-calibrated system, these errors can be considered to follow a zero-mean Gaussian distribution [3].
In SE, measurements are usually quantities that can be observed easily, such as line power flows, bus power injections, bus voltage magnitudes, and line current flow magnitudes. The state variables are usually complex phasor voltages, which cannot be measured conveniently. Both the measurements and the state variables obey the same constraints, such as power balance and Kirchhoff's laws. When polar coordinates are used for a system containing $N$ buses, the state vector contains $(2N - 1)$ elements: $N$ bus voltage magnitudes and $(N - 1)$ phase angles. In general, there are more measurements than state variables ($m > n$), since there are more lines than buses and more kinds of measurements than state variables. Essentially, power system SE is a process that uses real-time redundant measurements to improve data accuracy and to automatically filter out erroneous information caused by random interference. The objective is to find an estimate $\hat{x}$ of $x$ that is the best fit of the measurement $z$ according to Equation (1). The problem is usually solved by the weighted least squares (WLS) algorithm [3]. The SE can be formulated as a quadratic optimization problem:
$$\min_{x} J(x) = [z - h(x)]^T R^{-1} [z - h(x)] \qquad (2)$$
where $R^{-1}$ is the inverse of the measurement covariance matrix. Newton's method can be applied to solve this optimization problem. At iteration $k$, the increment can be calculated by:
$$\Delta x^{(k)} = G(x^{(k)})^{-1} H^T(x^{(k)}) R^{-1} [z - h(x^{(k)})] \qquad (3)$$
where $H(x) = \partial h(x) / \partial x$ is the measurement Jacobian and $G(x) = H^T(x) R^{-1} H(x)$ is the gain matrix.
The convergence criterion is the following:
$$\max_i |\Delta x_i^{(k)}| < \varepsilon_x \qquad (4)$$
where $\varepsilon_x$ is a predefined threshold.
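As a minimal sketch of the WLS iteration described above, the Python snippet below estimates a single state variable from three redundant measurements. The measurement functions, noise levels and helper names are illustrative assumptions, not the power-flow equations used in this paper; with only one state, the gain matrix reduces to a scalar.

```python
def wls_estimate(z, sigma, h, jacobian, x0, eps=1e-8, max_iter=50):
    """Gauss-Newton WLS for one state variable (hypothetical toy helper)."""
    x = x0
    for _ in range(max_iter):
        residual = [zi - hi for zi, hi in zip(z, h(x))]
        H = jacobian(x)
        # Scalar gain G = H^T R^-1 H, with R = diag(sigma_i^2).
        gain = sum(Hi * Hi / s ** 2 for Hi, s in zip(H, sigma))
        # Right-hand side H^T R^-1 [z - h(x)].
        rhs = sum(Hi * ri / s ** 2 for Hi, ri, s in zip(H, residual, sigma))
        dx = rhs / gain
        x += dx
        if abs(dx) < eps:  # convergence criterion on the increment
            break
    return x


# Assumed toy measurement model: z = [x, 2x, x^2] + e.
def h(x):
    return [x, 2.0 * x, x * x]


def jacobian(x):
    return [1.0, 2.0, 2.0 * x]


z = [1.52, 2.97, 2.26]      # noisy readings of a true state near 1.5
sigma = [0.02, 0.02, 0.02]  # assumed measurement standard deviations
x_hat = wls_estimate(z, sigma, h, jacobian, x0=1.0)
```

Because three measurements constrain one unknown, the estimate averages out part of the noise, which is the role redundancy plays in SE.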

Bad Data Detection
Sensor measurements might be inaccurate because of device misconfiguration, device failures, malicious actions or other errors. The Chi-squares test is a common approach for detecting bad data according to the measurement residuals:
$$J(\hat{x}) = [z - h(\hat{x})]^T R^{-1} [z - h(\hat{x})] = \sum_{i=1}^{m} \frac{(z_i - h_i(\hat{x}))^2}{\sigma_i^2} \qquad (5)$$
where $\sigma_i^2$ is the $i$-th diagonal element of $R$. Assuming that all state variables are mutually independent and the sensor errors follow a normal distribution, the measurement residual $J(\hat{x})$ follows a $\chi^2$ distribution with $(m - n)$ degrees of freedom. The steps of the Chi-squares test are as follows:
(1) Solve the WLS estimation problem and compute the measurement residual $J(\hat{x})$.
(2) Determine the threshold $\chi^2_{(m-n),p}$ through a hypothesis test with a detection confidence level $p$.
(3) If $J(\hat{x}) \ge \chi^2_{(m-n),p}$, then bad data will be suspected. Otherwise, the measurements will be assumed to be free of bad data.
There is a trade-off between the false positive rate and the false negative rate. A low threshold leads to a high false alarm rate, whereas a high threshold leads to more missed detections. According to [4], $p = 95\%$ is an empirical value.
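The test can be sketched in a few lines of Python. To keep the example dependency-free, the threshold below uses the Wilson-Hilferty approximation of the $\chi^2$ quantile, which is an assumption of this sketch and not the paper's procedure; an exact quantile function such as `scipy.stats.chi2.ppf` would normally be used. The residual values are made up for illustration.

```python
from statistics import NormalDist


def chi2_threshold(dof, p=0.95):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z_p = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z_p * c ** 0.5) ** 3


def chi_squares_test(residuals, sigmas, n_states, p=0.95):
    """Return (J, threshold, suspected) for residuals z_i - h_i(x_hat)."""
    J = sum((r / s) ** 2 for r, s in zip(residuals, sigmas))
    dof = len(residuals) - n_states  # m - n
    threshold = chi2_threshold(dof, p)
    return J, threshold, J >= threshold


sigmas = [1.0] * 6
# Six measurements, two states: normal noise only.
J_ok, thr, flag_ok = chi_squares_test([1.0] * 6, sigmas, n_states=2)
# Same case with one grossly wrong measurement.
J_bad, _, flag_bad = chi_squares_test([1.0] * 5 + [3.0], sigmas, n_states=2)
```

In the first call the statistic stays below the threshold, while the single large residual in the second call pushes it above.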

TFDI
Most research on FDI construction follows the same idea: the attacker finds an attack vector $a$ equal to $Hc$, where $c$ is the error injected into the state estimate. Then the manipulated measurement $z_a = z + a$ can pass the bad data detection and identification of direct-current (DC) SE [8,9]. Thus, the measurement residual is:
$$\|z_a - H\hat{x}_a\| = \|z + a - H(\hat{x} + c)\| = \|z - H\hat{x} + (a - Hc)\| = \|z - H\hat{x}\| \qquad (6)$$
where $\hat{x}_a = \hat{x} + c$ is the state estimated from $z_a$. From the perspective of the attacker, finding such an attack vector $a$ in the real world is almost unattainable. Firstly, the topology of the power system is one of the most closely guarded secrets of most power companies, so it is difficult to obtain the measurement matrix $H$. Secondly, solving $a = Hc$, which in real systems is an ultra-high-dimensional equation, is difficult. It becomes an NP-hard problem when the attacker wants to inject specific data with a limited number of compromised meters. Moreover, if the system topology changes, the FDI attack will trigger bad data detection.
Because observation errors are inevitable and the false alarm rate must be kept low, a tolerance mechanism for measurement errors in SE is necessary. Instead of solving the problem in Equation (6), the attacker can construct a TFDI that keeps the estimated residual below the threshold:
$$J(\hat{x}_a) = [z_a - h(\hat{x}_a)]^T R^{-1} [z_a - h(\hat{x}_a)] < \chi^2_{(m-n),p}$$
where $\hat{x}_a$ is the state estimated from the manipulated measurement $z_a = z + a$. Moreover, there is a high probability that the false data will not be detected when the attacker manipulates the data on both sides of the same transmission line. There are four power flow measurements per line: in each direction, there is a pair of active and reactive powers. Since the active power is related to economic interests, it is more attractive for attackers to falsify. On the transmission line $L_{i,j}$ (between bus $i$ and bus $j$), $P_{i,j}$ denotes the active power from bus $i$ to bus $j$, observed on bus $i$, and $P_{j,i}$ denotes the active power from bus $j$ to bus $i$, observed on bus $j$. Let $P_{i,j}$ be the original active power from bus $i$ to bus $j$ and $P^{a}_{i,j}$ the falsified value. The injected data level (IDL) is defined to represent the relative injected error against the original measurement:
$$\mathrm{IDL} = \frac{|P^{a}_{i,j} - P_{i,j}|}{|P_{i,j}|} \times 100\%$$
Compared with the strict conditions required by the undetectable FDI attack, the TFDI only needs the attacker to manipulate meters on the target transmission lines. Moreover, from [8], it can be seen that the probability of finding an attack vector for a targeted FDI (unconstrained case) in an IEEE 300-bus system is about 20%, even if the attacker can compromise 60% of all smart meters. In experiments, traversal attacks are conducted in the IEEE 57- and 300-bus systems. The probabilities of constructing a TFDI are shown in Table 1. It can be seen that the probability of constructing a TFDI attack is much higher than that of an undetectable FDI. In addition, we modify the active power on each bus in the IEEE 39-, 57- and 118-bus systems with different IDLs. The Chi-squares test achieves a relatively low detection precision, as shown in Table 2.
Furthermore, as the scale of the power system grows, the tolerance of measurement errors increases accordingly. We can see from Table 2 that it is easier for attackers to bypass the detection in a larger system. It should be noted that the attackers construct the TFDI according to their limited information and constrained access to smart meters. They do not care about the observability of the system. TFDI attacks exploit the detector's tolerance of normal cumulative random noise and hide among normal measurement errors. They only falsify some smart meter measurements and have no influence on system observability. The TFDI scheme described above is compatible with both alternating current (AC) and DC models and is easy to carry out. The TFDI is therefore an easy and practicable attack, of which power engineers and security personnel should be aware. In this paper, we discuss countermeasures against this attack scheme.

Smart Meter Intrusion
Cyber techniques are the foundation of the FDI. The basic target of cyber-attacks is to obtain the authorization to perform invalid operations on smart meters or network communications. For most smart meters, the communication protocol is Modbus/TCP or DNP 3.0/TCP. By default, the port of Modbus/TCP is 502 and the port of DNP 3.0/TCP is 20000. The attacker can first scan all hosts in the network segment, trying to find devices with port 502 or 20000 open. Such hosts are then marked as suspicious. The attacker can further communicate with these devices to obtain their product types and make sure they are smart meters.
Two strategies can be used to access smart meters:
(1) Password cracking is the traditional method of intruding into devices. The modification of smart meter settings often requires authentication. However, considering their limited computational resources and storage, smart meters are not equipped with complex password mechanisms. For the smart meters in this simulation, the password is made up of four numerical digits and only several seconds are needed to crack it.
(2) Plaintext transmission is another vulnerability that can be used to access smart meters. Some smart meters are equipped with complex password mechanisms. However, for most smart meters, the communication protocol used is Modbus/TCP or DNP 3.0/TCP, in which information is transmitted as plaintext. Attackers can monitor the traffic flow to identify critical operations on smart meters that require authentication, such as modifications of system time and IP addresses, and firmware updates. If a packet containing authentication information is identified, attackers can capture the password and obtain access to the smart meters.
With a successful intrusion, the attacker can change measurement values. For most smart meters, measurement values such as active power and reactive power are read-only. However, some settings such as time and CT ratio are writable. A CT is used for the measurement of alternating electric currents. The CT ratio $K$ is defined as $K = I_1 / I_2$, where $I_1$ is the primary current and $I_2$ is the secondary current. The reported values of active power and reactive power increase or decrease in proportion to the change of $K$. The attacker can therefore change the CT ratio to manipulate the power flow measurements.
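The proportional effect of the CT ratio can be illustrated with a short calculation. The sketch below assumes that the reported power scales linearly with the factor by which $K$ is changed, and that the injected data level (IDL) is the absolute relative change of the reading; the function names are hypothetical. The sample value is the $P_{4,5}$ reading of the IEEE 14-bus case used later in the paper, with the CT ratio doubled.

```python
def falsified_power(original_mw, ct_ratio_factor):
    """Reported power after the CT ratio is multiplied by ct_ratio_factor."""
    return original_mw * ct_ratio_factor


def injected_data_level(original_mw, falsified_mw):
    """Relative injected error against the original measurement, in percent."""
    return abs(falsified_mw - original_mw) / abs(original_mw) * 100.0


p_45 = -61.16                              # original reading in MW
p_45_attack = falsified_power(p_45, 2.0)   # CT ratio doubled
idl = injected_data_level(p_45, p_45_attack)
```

Doubling the ratio doubles the reading, which under the assumed definition corresponds to an IDL of 100%.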

A TFDI Attack on IEEE 14-Bus System
A simulation case is constructed to inject false data into the IEEE 14-bus system, as shown in Figure 1. Measurements of active power are changed by falsifying the CT ratio of smart meters. This attack case demonstrates how traditional bad data detection is bypassed by hiding the injected data among normal observation errors, and is used to analyze the potential risks of TFDI attacks. The original loads on bus 5 and bus 4 are 7.60 MW and 47.80 MW, respectively. The power flow on the transmission line $L_{5,4}$ is 61.16 MW. In the attack case, the hacker tries to move 61.16 MW of power load from bus 5 to bus 4. Thus, the loads on bus 5 and bus 4 and the power flow on the transmission line $L_{5,4}$ are modified to −53.56 MW, 108.96 MW and 122.32 MW, respectively, to maintain the power balance of these buses. The revised data are analyzed with SE, and $J(\hat{x})$ obtained from Equation (5) is equal to 67.5471, which is less than the threshold 72.1532. The traditional bad data detection method is therefore unable to detect this attack, and the measurements will be assumed to be free of false data.
The result provided by SE is the basis for the energy management system (EMS). The EMS is a system of computer-aided tools used by operators of electric utility grids to monitor, control, and optimize the performance of the generation and transmission system. Adverse consequences will occur if the EMS is misled. Two potential risks are as follows:

Risk 1: Energy Theft
Energy theft is the most common and attractive motivation for hackers to launch TFDI attacks. In this case, the hacker tries to reduce the measurement of active power on bus 5. A TFDI attack is launched and 61.16 MW of power load is moved from bus 5 to bus 4. According to the current tariff published by the Pacific Gas & Electric Corporation [20], the electricity price is $0.18590/kWh. If this attack lasts for one day, customers on bus 5 may see their costs unjustly lowered by $272,871. It should be noted that the load on bus 5 is changed to −53.56 MW. At first sight, it seems unreasonable for a load to change from a positive value to a negative value. However, this is quite normal under the smart grid paradigm, in which demand response plays an indispensable role. Energy-intensive industries, such as iron, steel and cement enterprises, typically operate their own captive power plants and energy storage devices [21]. When they have sufficient energy for production, these enterprises can participate in demand-side bidding and feed power back to the grid for economic benefit. This may tempt such enterprises to falsify smart meter values and mislead the power company into believing that electrical energy is being fed back into the power grid.
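The modified values quoted above can be checked with simple arithmetic. In this sketch the shifted amount is taken to be the original $L_{5,4}$ flow of 61.16 MW, which is what the modified loads and flow correspond to; line losses are ignored and the variable names are illustrative.

```python
load_bus5, load_bus4, flow_54 = 7.60, 47.80, 61.16  # original values in MW
shift = 61.16                                       # load moved from bus 5 to bus 4

new_load_bus5 = load_bus5 - shift   # load removed from bus 5
new_load_bus4 = load_bus4 + shift   # load added to bus 4
new_flow_54 = flow_54 + shift       # extra flow on the line from bus 5 to bus 4

# Net power each bus exchanges with the rest of the grid (losses ignored).
net_bus5_before = -load_bus5 - flow_54
net_bus5_after = -new_load_bus5 - new_flow_54
net_bus4_before = flow_54 - load_bus4
net_bus4_after = new_flow_54 - new_load_bus4
```

The net exchange of both buses with the remaining network is unchanged, which is why the falsified data remain consistent with the power balance seen by the estimator.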
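The daily cost figure for the energy theft scenario follows from the tariff and the shifted load. The short calculation below reproduces it, assuming the 61.16 MW shift is sustained for a full 24 hours at a flat price.

```python
price_usd_per_kwh = 0.18590   # tariff quoted from [20]
shifted_load_mw = 61.16       # load hidden from the bus 5 meter
hours = 24                    # attack duration of one day

energy_kwh = shifted_load_mw * 1000 * hours
unbilled_cost_usd = price_usd_per_kwh * energy_kwh
```

The result is about 272,871 USD, matching the figure stated for a one-day attack.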

Risk 2: Cracking Economic Dispatch
Economic dispatch is the short-term determination of the optimal output of a number of electricity generation facilities, so as to minimize the overall operating cost while serving the system load in a robust and reliable manner. To achieve economic dispatch, the optimal power flow (OPF) is applied to solve the load flow and determine a new set of generator outputs that reduces the generation cost [22]. The cost of each generator is usually considered to be quadratic in its power generation:
$$C(p) = a p^2 + b p + c$$
where $p$ is the generator's output; and $a$, $b$ and $c$ are parameters determined by the properties of the generator.
We calculate the OPF of this region with MATPOWER, a toolbox developed at Cornell University [23]. The result shows that if the attack illustrated in Figure 1 is launched, the loads of bus 4 and bus 5 and the power flow on the line will change. The value of injected power on bus 5 is negative after the modification. Thus bus 5 pretends to feed energy back to the power grid, which misleads the control center into reducing the generation output on bus 1, bus 2 and bus 3 and increasing the generation output on bus 8 to meet the increased demand on bus 4. The output of each generator is adjusted to pursue the lowest generation cost [24]. Under normal conditions, the optimal total generation cost is $8081.5/h. After the hacker launches the attack, the output of bus 8 rises sharply and the optimal total generation cost of the system increases by 5%.
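MATPOWER solves the full OPF with network constraints. As a much simpler illustration of why quadratic costs determine the dispatch, the sketch below computes the closed-form equal-incremental-cost dispatch of two generators serving a fixed demand, with no network model and no output limits. This is a simplification swapped in for illustration, not the OPF used in the paper, and all coefficients are hypothetical.

```python
def generation_cost(p, a, b, c):
    """Quadratic generation cost C(p) = a*p^2 + b*p + c."""
    return a * p * p + b * p + c


def dispatch_two_units(demand, unit1, unit2):
    """Split demand so that both units have equal marginal cost 2*a*p + b."""
    a1, b1, _ = unit1
    a2, b2, _ = unit2
    p1 = (2.0 * a2 * demand + b2 - b1) / (2.0 * (a1 + a2))
    return p1, demand - p1


unit1 = (0.01, 2.0, 10.0)   # hypothetical (a, b, c)
unit2 = (0.02, 1.5, 5.0)    # hypothetical (a, b, c)
p1, p2 = dispatch_two_units(100.0, unit1, unit2)
total = generation_cost(p1, *unit1) + generation_cost(p2, *unit2)
```

If falsified measurements change the demand seen by the dispatcher, the computed split and the total cost change with it, which is the mechanism behind Risk 2.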

EDSE-Based Bad Data Detection
As shown in Section 2.2, the threshold of the Chi-squares test is set to tolerate unpredictable and inevitable measurement noise. Attackers can carefully construct TFDI attacks that hide in normal measurement noise. When the number of measurements grows, the Chi-squares test has to tolerate larger cumulative normal observation errors from all measurements. If the large system can be reasonably decomposed, false data will not have enough space to hide among normal measurement noise. Based on this idea, an EDSE-based bad data detection method is proposed to handle TFDI attacks.
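The growth of this tolerance can be made concrete. Without an attack, the test statistic has an expected value equal to the degrees of freedom, so the gap between the threshold and the degrees of freedom indicates how much extra residual an attacker can add on average before the test fires. The sketch below evaluates that gap for a few sizes; it uses the Wilson-Hilferty approximation of the $\chi^2$ quantile as a stand-in for an exact quantile function, and the chosen sizes are arbitrary.

```python
from statistics import NormalDist


def chi2_threshold(dof, p=0.95):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z_p = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z_p * c ** 0.5) ** 3


def tolerance_margin(dof, p=0.95):
    """Threshold minus the expected attack-free statistic (which equals dof)."""
    return chi2_threshold(dof, p) - dof


sizes = (20, 50, 200, 1000)
margins = [tolerance_margin(dof) for dof in sizes]
```

The margin grows with the degrees of freedom, so a smaller subsystem leaves less room for injected data to stay below its threshold.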

Power System Decomposition
For a smart grid with $N$ buses and $M$ transmission lines, the weighted undirected graph model of the power system can be established as $G = \{V, E\}$, where $V$ is the set of vertices representing load buses or generators, and $E$ is the set of edges representing the transmission lines. The adjacency matrix of the graph is denoted by $A = \{a_{i,j}\}$, $i, j = 1, 2, \ldots, N$. The element $a_{i,j}$ is non-zero when bus $i$ and bus $j$ are directly connected, and it also indicates the physical properties of the connection between the two buses. For the modeled graph, the weight of a branch can be determined in the following ways:
• The basic topology of the power system ($a_{i,j} = 1$ if bus $i$ and bus $j$ are connected);
• The impedance of transmission lines;
• The line power flow at each sampling time.
In this paper, we use the impedance ($Z = R + jX$) of transmission lines as the weight of edges, which reflects the electrical distance between buses. $R$ is the resistance and $X$ is the reactance of the transmission line. Compared with $X$, the value of $R$ is very small. Therefore, the absolute value of the line reactance $|X|$ is chosen as the edge weight. The large graph is divided into several subgraphs using clustering algorithms, such as the L-bounded Graph Partition Method (LGPM) [25], K-Medoid [26], and Chameleon [27]. In this paper, the LGPM is applied to graph decomposition, since it is relatively stable and not affected by the choice of initial clustering centers. The main process of LGPM is illustrated in Table 3.

Table 3. The main process of LGPM (input data: the adjacency matrix).
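The graph model can be sketched as follows: each branch contributes a symmetric entry weighted by the absolute value of its reactance. The four-bus branch list and its reactance values are hypothetical, and the partitioning step itself (LGPM) is not reproduced here.

```python
def build_adjacency(n_buses, branches):
    """Weighted undirected adjacency matrix with a[i][j] = |X| of line (i, j)."""
    A = [[0.0] * n_buses for _ in range(n_buses)]
    for bus_i, bus_j, reactance in branches:
        weight = abs(reactance)
        A[bus_i - 1][bus_j - 1] = weight  # buses are numbered from 1
        A[bus_j - 1][bus_i - 1] = weight  # undirected graph: symmetric entry
    return A


# Hypothetical branches: (from bus, to bus, reactance X in per unit).
branches = [(1, 2, 0.06), (1, 3, 0.22), (2, 3, 0.20), (3, 4, 0.17)]
A = build_adjacency(4, branches)
```

A zero entry means that the two buses are not directly connected; a clustering algorithm would take this matrix as its input.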

Subsystem Extension
After graph partitioning, the power system graph is decomposed into several subgraphs. Accordingly, the power system is decomposed into $p_{sub}$ non-overlapping subsystems connected to each other by tie lines. Let $M_{TIE}$ denote the number of tie lines. In subsystem $k$ (denoted subsys_k, $k = 1, \ldots, p_{sub}$), there are $N_k$ buses and $M_k$ lines. Let $n_k$ denote the number of state variables and $m_k$ the number of measurements; they should satisfy the following equations: […] It should be noted that sufficient measurement redundancy must be ensured in each subsystem to carry out the SE, i.e., $m_k > n_k$. For each subsystem, buses can be grouped into three categories, as shown in Figure 2:
(1) Internal buses, all of whose directly connected buses belong to the subsystem;
(2) Boundary buses, whose neighbors are this subsystem's internal buses and at least one bus from another subsystem;
(3) Adjacent buses, which are boundary buses of another subsystem with a connection to at least one boundary bus in this subsystem.
Tie lines connect the boundary buses of different subsystems. According to the result of decomposition, the tie lines are not included in any subsystem, so false data injected into measurements on tie lines would not be detected. If there are measurements on tie lines, the subsystem should be extended to contain the adjacent buses, which helps to identify attacks on tie lines between the original subsystems.
Let $M_k^{TIE}$ denote the number of tie lines directly connected to subsys_k. The extension status of the tie lines can be described by a binary set […]. Thus, the numbers of buses and transmission lines in the subsystem are updated as follows: […], where $N_k^E$ denotes the number of buses and $M_k^E$ denotes the number of transmission lines after subsystem extension. In Figure 2, the solid line represents the first-step decomposition. Then, the subsystem is extended to contain the adjacent buses. The dashed lines represent the boundaries of the new subsystems after extension.
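The bus classification and extension step can be sketched on the IEEE 14-bus topology. The partition used here, buses 1-5 and buses 6-14, is inferred from the tie lines $L_{5,6}$, $L_{4,7}$ and $L_{4,9}$ reported in Section 6.1. The sketch assumes that every tie line of a subsystem is extended, that the state count is $2N - 1$ in polar coordinates, and that each line carries four power flow measurements; the helper names are hypothetical.

```python
# IEEE 14-bus branch list (from bus, to bus).
BRANCHES = [(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5),
            (4, 7), (4, 9), (5, 6), (6, 11), (6, 12), (6, 13), (7, 8),
            (7, 9), (9, 10), (9, 14), (10, 11), (12, 13), (13, 14)]


def extend_subsystem(buses, branches):
    """Classify buses of one subsystem and extend it over all its tie lines."""
    buses = set(buses)
    internal_lines = [(i, j) for i, j in branches if i in buses and j in buses]
    # A tie line has exactly one end inside the subsystem.
    tie_lines = [(i, j) for i, j in branches if (i in buses) != (j in buses)]
    boundary = {i if i in buses else j for i, j in tie_lines}
    adjacent = {j if i in buses else i for i, j in tie_lines}
    return {
        "internal": buses - boundary,
        "boundary": boundary,
        "adjacent": adjacent,
        "buses": buses | adjacent,             # buses after extension
        "lines": internal_lines + tie_lines,   # lines after extension
    }


def se_dimensions(extended):
    """Return (states, measurements) for an extended subsystem."""
    n_states = 2 * len(extended["buses"]) - 1
    n_measurements = 4 * len(extended["lines"])
    return n_states, n_measurements


subsys_1 = extend_subsystem(range(1, 6), BRANCHES)
subsys_2 = extend_subsystem(range(6, 15), BRANCHES)
dims_1 = se_dimensions(subsys_1)
dims_2 = se_dimensions(subsys_2)
```

Under these assumptions the extended subsystems have 8 and 11 buses, with 15 states and 40 measurements in the first and 21 states and 52 measurements in the second, in line with the figures given for the IEEE 14-bus case in Section 6.1.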

Subsystem SE and Bad Data Detection
In this paper, we adopt a "line-only" method [28], in which the active and reactive power flows on all transmission lines are selected as measurements. Each subsystem possesses its own state estimator. The state estimators of the subsystems can be run separately and in parallel, each with respect to its own slack bus. The SE of subsys_k can be formulated as follows:
$$\min_{x_k} J_k(x_k) = [z_k - h_k(x_k)]^T R_k^{-1} [z_k - h_k(x_k)] \qquad (11)$$
where $z_k$ is the measurement vector, $x_k$ is the state vector, $h_k(\cdot)$ is the nonlinear function relating $z_k$ to $x_k$, and $R_k$ is the measurement covariance matrix of subsys_k. For SE, enough measurement redundancy must be ensured to eliminate random errors. It is obvious that […]. In addition, a global system will be divided into at most $m + 1$ subsystems after $m$ transmission lines are removed, so $p_{sub} \le M_{TIE} + 1$. Therefore, we can obtain: […]. In addition, the upper bound of the number of extended tie lines is $M_k^{TIE}$. We can obtain: […]. The above discussion shows that the local degree of freedom $o_k$ is lower than the global one. The threshold of the Chi-squares test in each subsystem is therefore lower than that in the entire system, and the threshold is reasonably reduced without changing the detection confidence.
It should be noted that the EDSE-based method is also effective against other kinds of FDI that exploit the detection threshold's tolerance of normal observation noise. Since it is difficult to find an attack vector $a$ that strictly satisfies $a = Hc$, some researchers try to relax this condition. The authors of [11] proposed that the "generic FDI attack" would be a future direction for attackers to inject false data without being detected. Such an attack does not require the strict condition $a = Hc$, as long as the attacker chooses an attack vector $a$ satisfying the following equation: […] where $\tau$ is the detection threshold. The attacker could then still inject false data without being detected. The basic idea of the EDSE method is to improve the detection accuracy by decomposing the system into several subsystems. Through decomposition, false data do not have enough space to hide behind normal measurement errors. They stand out prominently and the detection sensitivity is increased. The method therefore also helps to defend against this kind of attack.
Compared with other DSE papers [19,29,30], the features of the proposed method are as follows:
(1) Although the system is divided into several subsystems, the SEs of all subsystems are conducted by a single computing center, which has knowledge of the global system's estimate.
(2) Since this paper focuses only on bad data detection in smart grids, it is not necessary to construct the aggregation level [30], which is required for each subsystem to achieve convergence of the entire power system's states to those of the centralized estimation. The computational complexity is therefore remarkably reduced.
(3) In this paper, subsystems overlap, because each original subsystem is extended to contain the adjacent buses. After the first-step decomposition, the tie lines are not included in any subsystem, so false data injected into the measurements on tie lines would not be detected. Subsystem extension helps to identify attacks on tie lines between the original subsystems.
(4) A clustering algorithm is introduced for system decomposition. Through the graph establishment, the configuration of the power system is taken into consideration. The method is therefore suitable for systems which do not have obvious physical or geographic features.

Experiments and Analysis
In this section, the performance of the EDSE-based method is studied. In Section 6.1, three attack cases are simulated on the IEEE 14-bus system, and bad data detection based on EDSE is applied to detect these attacks. In Section 6.2, the IEEE 39-bus system is used to present a statistical comparison of the detection performance of the traditional and EDSE-based methods. In Section 6.3, we discuss some TFDI attacks which are not detected by the EDSE-based method. The evaluation of time complexity is shown in Section 6.4, and in Section 6.5 the proper number of subsystems is discussed.

Attack Cases on IEEE 14-Bus Systems
Three attack cases are constructed on the IEEE 14-bus system, as shown in Figure 3. The decomposition of the IEEE 14-bus system is carried out by the LGPM. As shown in Table 4, the IEEE 14-bus system is divided into two subsystems, "subsys_1" and "subsys_2"; $L_{5,6}$, $L_{4,7}$ and $L_{4,9}$ are tie lines. In subsys_1, there are 8 buses, including the adjacent buses bus 6, bus 7 and bus 9. The number of state variables $n_1$ is 15 and the number of measurements $m_1$ is 40. The degree of freedom $o_1$ in this subsystem is $m_1 - n_1 = 25$. According to the property of the Chi-squares distribution, the threshold of bad data suspicion is 43.77. In subsys_2, there are 11 buses, 21 state variables and 52 measurements, and the threshold is 51.00. This indicates that the local thresholds are much lower than the global one.

To test the performance of EDSE-based bad data detection, three attack cases are constructed, as shown in Table 5. In Table 5, $L_{i,j}$ denotes the transmission line into which the false data are injected, and $P_{i,j}$ denotes the active power flow from bus i to bus j, observed at bus i. The active powers $P_{i,j}$ and $P_{j,i}$ are modified at the same time to guarantee the balance of line power flow. The original measurements are simulated by MATPOWER, and Gaussian noise is then added. It should be noted that there is a tiny difference between $P_{i,j}$ and $P_{j,i}$: these two active power measurements are observed at the two ends of the transmission line, and some power is lost on the line.

For an attacker, it is not easy to change the active power to arbitrary values, because active power is usually read-only. As explained in Section 4.1, attackers can change the active power by falsifying the CT ratio. In Attack Cases 1-3, they scale the CT ratio to 2 times, 3 times and 1.5 times its original value, respectively. In Attack Case 1, false data are injected only into subsys_1: $P_{4,5}$ is modified from −61.16 MW to −122.32 MW, and $P_{5,4}$ is modified from 61.67 MW to 122.34 MW. In Attack Case 2, false data are injected only into subsys_2: $P_{6,13}$ is modified from 17.75 MW to 53.24 MW, and $P_{13,6}$ is modified from −17.54 MW to −52.61 MW. In Attack Case 3, false data are injected into the tie line between subsys_1 and subsys_2: $P_{5,6}$ is modified from 66.13 MW to 99.20 MW, and $P_{6,5}$ is modified from −66.13 MW to −99.20 MW.

As shown in Table 6, the global values of $J(\hat{x})$ are 54.91, 66.04 and 54.73 in the three attack cases, respectively. They are all lower than the threshold $T_{o,p}$ (72.15), so the injected false data cannot be detected by the global test. When we adopt the EDSE-based method in Attack Case 1, we find that, in subsys_1, $J(\hat{x})$ is 51.98, which is higher than the local threshold $T^{1}_{o,p}$ (43.77). Similarly, in Attack Case 2, the EDSE-based method can detect the false data in subsys_2. In Attack

A slight modification of the original measurement is not worthwhile for attackers, while a substantial falsification is easy to detect. We test the detection precision with |IDL| from 10% to 50%. For each transmission line, we modify the measurement of the active power on this line, and then record the total number of successful detections after traversing all the transmission lines. The final detection precisions of the Chi-squares test and the EDSE-based method are illustrated in Figure 5. When IDL = −10%, 84.8% of the injected data are detected by the EDSE-based method, while the detection precision of the Chi-squares test is only 63%. When IDL = +10%, the detection precisions of the Chi-squares test and the proposed method are 76.1% and 89.1%, respectively. The detection precision of the EDSE-based method is remarkably higher than that of the Chi-squares test for low |IDL|. As the difference between the original measurements and the injected data increases, the performance of the Chi-squares test approaches that of the EDSE-based method. When IDL is higher than 50% or lower than −60%, all injected data can be detected by both the proposed EDSE-based method and the Chi-squares test. Overall, the EDSE-based method improves the detection precision most markedly at low injection levels.
It can be seen that the EDSE approach cannot detect all the TFDI attacks, especially when the |IDL| is very low. However, attackers construct the attack according to the configuration of the global system; they know neither the decomposition model nor the testing thresholds of the subsystems. In addition, a very slight modification of the original measurements is not very attractive for attackers. They are more likely to maximize the injection, so long as the attack does not trigger the detector of the global system. Therefore, the EDSE approach would achieve a better performance in practice than in the traversal attack simulation shown in Figure 5.

Discussion of Undetected TFDI Attacks
Although the EDSE-based method greatly improves the detection precision over the traditional method, some TFDI attacks with a slight IDL still cannot be detected, as shown in Figure 5. In this section, we perform further analysis on the conditions under which a TFDI attack can or cannot be detected by the EDSE detector.

of subsys_2 is close to that of the original system. For larger systems, the improvement is more pronounced.
For the IEEE 118-bus system, the computation time is 0.791 s with the traditional method, while the EDSE-based method takes only 0.2074 s in serial mode, which is about 26.3% of the traditional method, and 0.084 s in parallel mode, about 1.3% of the traditional method. For the IEEE 300-bus system, the computation time with the EDSE-based method is 7.5 s in serial mode, which is only 1/16 of that of the traditional method. The complexity and computation time of bad data detection can thus be reduced dramatically by using the proposed EDSE-based method, and even more execution time can be saved if the detection is conducted in parallel. This makes it possible to carry out online bad data detection with the EDSE-based method in a real smart grid consisting of hundreds of buses and thousands of transmission lines.

Optimal Number of Subsystems
In this section, we discuss the optimal number of subsystems in EDSE. We decompose the IEEE 39-bus system into different numbers of subsystems and then analyze the running time and detection precision. False data with IDL = 15% are injected into a pair of active power flow measurements each time. The detailed results are shown in Table 10. At first, the detection precision increases with the number of subsystems. The EDSE reaches the maximum detection precision (93.5%) when the power system is decomposed into five subsystems. The precision begins to decrease when the number of subsystems exceeds six, because the original system is then divided into some of the "Isolated Islands" mentioned in Section 6.3. In addition, excessive decomposition cannot guarantee observability.
For the IEEE 39-bus system, the optimal number of subsystems can be set to 3-5, which represents a compromise between the running time and detection precision.In practice, the number of subsystems is strongly associated with the topology of the power system, execution mode (parallel or serial) and response time.The optimal number of subsystems can be selected offline according to the practical requirements.

Conclusions and Future Work
In this paper, TFDI attack cases are designed to explain how hackers can modify the data in smart grids and evade the traditional bad data detection methods in power systems. These attacks make use of the tolerance of the Chi-squares testing threshold and hide in normal observation errors. We discuss the potential risks of such attacks in smart grids, and evaluate the losses caused by energy theft and by tampering with economic dispatch on the IEEE 14-bus system.
To solve this problem, the EDSE-based method is proposed to detect TFDI attacks. The basic idea of this method is to improve the sensitivity of bad data detection by dividing a complex system into several subsystems. The EDSE-based method consists of the following steps: (1) decompose the power system into a certain number of subsystems using clustering algorithms; (2) extend each subsystem to include the adjacent buses; and (3) carry out SE and bad data detection in each subsystem.
Extensive TFDI attack cases are simulated on the IEEE 14-, 39-, 118- and 300-bus systems. The results show that the EDSE-based method achieves a much higher detection precision than the traditional method. Moreover, the EDSE presents a new idea for online bad data detection, since its computational complexity is greatly reduced and its detection process can be further sped up by analyzing all extended subsystems in parallel.
Since FDI in smart grids is a cyber-physical attack which introduces interactive reactions in both the cyber network and the power grid, a cyber-physical fusion strategy is considered to be a better solution.
In the future, we will further study a detection method which fuses the EDSE with the abnormality detection of traffic flow. Even if the EDSE does not detect the bad data, illegal access to smart meters would still trigger the alarms deployed in the communication network.


Figure 1. An attack case on the IEEE 14-bus system.

1. Normalize a nonnegative symmetric matrix A' from A and make it doubly stochastic;
2. Spectral partition: calculate the N largest eigenvectors $\{U_i\}$;
3. A general clustering algorithm (k-means or EM), using $\{U_i\}$ and N as inputs, is adopted to get the attribution of each vertex;
4. Generate the adjacency matrix for each subgraph.
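The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the doubly stochastic scaling uses Sinkhorn-style iterations with added self-loops, the clustering step is a plain k-means with a deterministic initialization, and the six-vertex toy graph and all function names are hypothetical. The argument `k` plays the role of N in the list.

```python
import numpy as np


def doubly_stochastic(A, iters=500):
    """Step 1: scale a nonnegative symmetric matrix to be doubly stochastic."""
    S = A.astype(float) + np.eye(len(A))   # self-loops: an assumption to avoid zero rows
    for _ in range(iters):
        S /= S.sum(axis=1, keepdims=True)  # normalize rows
        S /= S.sum(axis=0, keepdims=True)  # normalize columns
    return (S + S.T) / 2.0                 # remove residual asymmetry


def kmeans(X, k, iters=50):
    """Step 3: plain k-means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels


def spectral_partition(A, k):
    S = doubly_stochastic(A)
    _, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
    U = vecs[:, -k:]                       # step 2: eigenvectors of the k largest eigenvalues
    labels = kmeans(U, k)
    parts = [np.flatnonzero(labels == j) for j in range(k)]
    subgraphs = [A[np.ix_(p, p)] for p in parts]   # step 4: adjacency of each subgraph
    return labels, subgraphs


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

labels, subgraphs = spectral_partition(A, k=2)
print(labels)
```

In this toy case the edge 2-3 is the only connection between the two clusters, so it plays the role of a tie line between the resulting subgraphs.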

Figure 2. Decomposition of a power system.
$h_k(\cdot)$ is a non-linear vector function indicating the relationship between the measurements and the state variables in subsystem k; $e_k$ is the random Gaussian error; and $a_k$ is a sparse vector whose non-zero elements are the attack values injected into specific measurements. The Chi-squares test is carried out in each subsystem to detect bad data. Referring to Section 2.2, the local threshold $T^{k}_{o,p}$ of bad data in subsys_k is determined by the local degree of freedom $o_k$. The difference between the global and the local thresholds is shown below:
In subsys_1, $J(\hat{x})$ exceeds the local threshold $T^{1}_{o,p}$ (43.77); in subsys_2, $J(\hat{x})$ is 25.22, which is below the local threshold $T^{2}_{o,p}$ (51.00). This implies that there is false data in subsys_1.
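The decision logic for Attack Case 1 can be sketched with the values quoted in the text: a global $J(\hat{x})$ of 54.91 against the threshold 72.15, and local values of 51.98 and 25.22 against 43.77 and 51.00. The function and variable names are hypothetical, and the sketch only reproduces the comparison step, not the state estimation itself.

```python
# Thresholds and J(x_hat) values as quoted for Attack Case 1 on the IEEE 14-bus system.
GLOBAL_THRESHOLD = 72.15
LOCAL_THRESHOLDS = {"subsys_1": 43.77, "subsys_2": 51.00}

global_j = 54.91
local_j = {"subsys_1": 51.98, "subsys_2": 25.22}


def flagged(j_value: float, threshold: float) -> bool:
    """Chi-squares test decision: suspect bad data when J exceeds the threshold."""
    return j_value > threshold


global_alarm = flagged(global_j, GLOBAL_THRESHOLD)
local_alarms = {name: flagged(j, LOCAL_THRESHOLDS[name]) for name, j in local_j.items()}
suspects = [name for name, alarm in local_alarms.items() if alarm]

print("global alarm:", global_alarm)
print("suspected subsystems:", suspects)
```

The global test stays silent, while the local test in subsys_1 raises an alarm, which also localizes the injection to that extended subsystem.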

Table 1. Success probability of finding a tolerable false data injection (TFDI) attack. IDL: injected data level; IEEE: the Institute of Electrical and Electronics Engineers.

Table 2. Detection precision of the Chi-squares test against TFDI attacks.

Table 9. Running time comparison.

Table 10. Discussion on the number of subsystems.