Anomaly Detection in Power System State Estimation: Review and New Directions

Foundational and state-of-the-art anomaly-detection methods for power system state estimation are reviewed. Traditional components for bad data detection, such as chi-square testing, residual-based methods, and hypothesis testing, are discussed to explain the motivations for recent anomaly-detection methods given the increasing complexity of power grids, energy management systems, and cyber-threats. In particular, state estimation anomaly-detection methods based on data-driven quickest-change detection and artificial intelligence are discussed, and directions for research are suggested, with particular emphasis on considerations for the future smart grid.


Introduction
Since its introduction by Schweppe in the late 1960s [1,2], power system state estimation has proved an integral component of Energy Management Systems (EMSs). Schweppe's proposed nonlinear static state estimation (SSE) provides estimates of the actual network status, which can then be leveraged for subsequent analysis, including contingency evaluation and power flow studies [3]. Soon after, strategies for mitigating erroneous measurement data [4,5] were developed to ensure the fidelity of the power system state estimates. SSE and dynamic state estimation (DSE) both share a rich history of research [6-8]; however, SSE has seen more real-world implementation. Nevertheless, DSE shows great promise for enhancing legacy SSE-based EMSs [9], especially with the increased adoption of synchrophasor measurements [10], and thus anomaly-detection methods using both approaches are surveyed.
Numerous sources of state estimation error have been identified and formulated in the literature, including measurement, parameter, and topology discrepancies with respect to the system model. More recently, with the integration of EMSs into sophisticated computer networks, the potential for cyber-security vulnerabilities became apparent. What new considerations must be made when bad data are malicious? Stealthy false data injection attacks [11], for example, were formulated as an exercise in fooling legacy bad-data-detection schemes. That said, attacks on cyber-physical systems have yielded very real consequences, including equipment damage and rolling blackouts [12]. Anomaly-detection techniques that can properly handle these manufactured instances of bad data, and thus improve bad data processing in state estimation generally, are surveyed in this review. This review also aims to highlight considerations for future approaches to anomaly detection in state estimation, including implementation-based research in the face of increasingly dynamic load and generation profiles, the complexity of distributed cyber-physical infrastructure, and pushes for combined SSE and DSE approaches that provide higher-fidelity EMS information to improve control, efficiency, and stability in the future smart grid. Because the field of anomaly detection covers a wide range of approaches, this survey limits its scope to power system state estimation, which is a central component of EMSs and is expected to remain as such well into the future [9].
Articles selected for this review were chosen based on their impact on the field of power system state estimation anomaly detection. For earlier foundational works, the authors sought to include papers with lasting influence and citation impact for bad data detection generally. Particular emphasis was placed on real-world implementation in modern EMSs. More recent works required consideration of cyber-attacks and/or error types designed to circumvent the approaches of older works. Because many of these approaches have yet to be implemented in EMSs, selected papers were required to demonstrate notable metrics of improvement compared to legacy detection methods.
The contributions of this work include:
• Providing a history of legacy bad data detection and error types in power system state estimation, and the connection to newer detection approaches and cyber-attack types.
• Surveying various sources of state estimation cyber-threats and the challenges they pose to anomaly-detection schemes.
• An overview of newer approaches for anomaly detection based on quickest-change detection and AI.
• Considerations for future research, including the incorporation of dynamic load profiles, autocorrelated data, and asynchronous measurements.
This review is organized as follows. Section 2 provides a brief theory of static and dynamic state estimation generally and the components used for bad data detection. Section 3 describes the theory and physical meaning behind the three main types of error in state estimation: measurement, parameter, and topology. Section 4 outlines the traditional methodologies developed for bad data detection and identification, which often serve as a basis for many modern approaches. Section 5 discusses malicious data attacks designed specifically to circumvent traditional bad data detection. Section 6 describes more modern approaches that aim to overcome these pitfalls. Section 7 provides a summary and considerations for future work.

Static State Estimation
One of the most used models to perform power system SE is the Weighted Least Squares (WLS) estimator [7]. A power system with n buses and d measurements can be modeled through a set of nonlinear algebraic equations in the measurement model:

z = h(x) + e,  (1)

where z ∈ R^d is the measurement vector, x ∈ R^N is the state variable vector, h : R^N → R^d is a continuously differentiable nonlinear function, and e ∈ R^d is the measurement error vector. Each measurement error e_i is assumed to follow a zero-mean Gaussian distribution. N = 2n − 1 is the number of unknown state variables, i.e., the complex voltages at each bus.
In the traditional WLS approach, the state vector estimate in (1) is determined by minimizing the weighted norm of the residual [13], represented with the cost function J(x):

J(x) = [z − h(x)]^T W [z − h(x)],  (2)

where W = R^−1 is the inverse of the measurement error covariance matrix, otherwise known as the weight matrix. Linearizing the measurement model (1) yields

∆z = H∆x + e,  (3)

where H = ∂h/∂x is the Jacobian matrix of h at the current state estimate. The estimate of the linearized state vector is then given by

∆x̂ = (H^T W H)^−1 H^T W ∆z.  (4)

The estimated value of the measurement vector mismatch ∆z is given by

∆ẑ = H∆x̂ = P∆z,  (5)

where P = H(H^T W H)^−1 H^T W denotes the linear projection or "hat" matrix. The idempotent matrix P also has the following properties [7]:

PP = P,  PH = H,  (I − P)H = 0.  (6)

These properties facilitate an expression for the measurement residuals [8]:

r = ∆z − ∆ẑ = (I − P)∆z = (I − P)(H∆x + e) = (I − P)e = Se,  (7)

where S = I − P is known as the residual sensitivity matrix, which was first recognized in [5] for representing the sensitivity of the measurement residual to the measurement error during bad data processing. Also useful is the residual covariance matrix Ω [7]:

E[r] = E[Se] = 0,  (8a)
Ω = Cov(r) = E[rr^T] = SRS^T = SR.  (8b)

The residual covariance matrix is used for the detection and identification of bad data, as well as providing insight into the degree of interaction; these concepts will be elaborated upon further in Section 3.
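To make these quantities concrete, the following is a minimal numpy sketch of one linearized WLS iteration and the associated residual matrices; the Jacobian, weights, and measurement values are illustrative assumptions rather than any particular test system.

import numpy as np

rng = np.random.default_rng(0)
d, N = 10, 4                                  # measurements and state variables (toy sizes)
H = rng.normal(size=(d, N))                   # assumed Jacobian at the current estimate
sigma = 0.01 * np.ones(d)
R = np.diag(sigma**2)                         # measurement error covariance
W = np.linalg.inv(R)                          # weight matrix, W = R^-1

dx_true = rng.normal(0.0, 0.1, N)
dz = H @ dx_true + rng.normal(0.0, sigma)     # linearized measurement mismatch, Eq. (3)

G = H.T @ W @ H                               # gain matrix
dx_hat = np.linalg.solve(G, H.T @ W @ dz)     # WLS state update, Eq. (4)
P = H @ np.linalg.solve(G, H.T @ W)           # hat matrix, Eq. (5)
S = np.eye(d) - P                             # residual sensitivity matrix, Eq. (7)
r = S @ dz                                    # measurement residuals
Omega = S @ R                                 # residual covariance, Eq. (8b)
print("largest normalized residual:", np.max(np.abs(r) / np.sqrt(np.diag(Omega))))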

Dynamic State Estimation
SSE does not consider any history of the measurement vector z, but instead provides a snapshot of the system. This "memoryless" assumption of SSE proved sufficient for real-time monitoring in early EMSs. For one, power networks were not as regimented at the distribution level, with far fewer microgrids, distributed energy resources, and net load dynamics compared to today's systems. Secondly, the measurement data fed to the state estimator almost always came from measurement devices with slow sampling rates, such as the 2-4 s range of SCADA. One might argue, then, that the true bottleneck for capturing dynamic behavior in state estimation was slow metering rates. That said, Schweppe's formulation arrived shortly after the introduction of the Kalman filter in 1961 [14], which inspired power researchers to seek formulations beyond the still-developing SSE. The practical hangup of slow meter sampling rates would be relieved somewhat with the introduction of synchronized phasor measurements in the 1980s [10]. Phasor Measurement Units (PMUs) provide not only higher sampling rates compared to SCADA but also GPS synchronization to avoid the uncertainty associated with asynchronicity.
Like SSE, dynamic state estimation (DSE) encompasses a wide range of methods. Early DSE formulations considered the same set of measurements and state variables as those used in SSE: active and/or reactive power flows and injections, and complex bus voltages. Other approaches seek to better capture load dynamics by considering generator rotor angle and speed as differential-algebraic state variables [9,15,16]; however, this review will primarily consider DSE-based anomaly-detection implementations that use algebraic state variables.
DSE can be accomplished by modeling the power system as a discrete-time dynamic system. The Kalman filter is used [17] to estimate the state variables at time k through prediction and measurement update steps upon each iteration:

Predict:
x̂_k|k−1 = A_k x̂_k−1|k−1,
F_k|k−1 = A_k F_k−1|k−1 A_k^T + Q_k.

Update:
K_k = F_k|k−1 H_k^T (H_k F_k|k−1 H_k^T + R_k)^−1,
x̂_k|k = x̂_k|k−1 + K_k (z_k − H_k x̂_k|k−1),
F_k|k = (I − K_k H_k) F_k|k−1,

where, at time k, A_k is the state transition matrix, K_k is the Kalman gain matrix, and H_k is the measurement matrix. F_k|k and F_k|k−1 denote the state covariance matrix estimates based on measurements up to times k and k − 1, respectively. Q_k and R_k are the process and observation noise covariance matrices, respectively. The authors of the first Kalman filter power system DSE approach [18] hinted at its compatibility with anomaly-detection methods, which, at the time, were being studied for SSE. Early work soon after [19,20] formulated bad data detection by analyzing the innovation process:

ν_k = z_k − H_k x̂_k|k−1.

Additional approaches for bad data processing in DSE include asymmetry analysis based on the skewness of the normalized estimation error [17,21]. DSE anomaly detection research remains an active field [16,22], especially since dynamic load and generation profiles are commonplace in microgrid systems with distributed energy resources (DERs).
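As an illustration of innovation-based monitoring, the following is a minimal numpy sketch of the predict/update recursion above applied to a toy static system; the system matrices, injected bias, and detection threshold are all assumptions for demonstration only.

import numpy as np

def kalman_step(x, F, A, H, Q, R, z):
    x_pred = A @ x                             # state prediction
    F_pred = A @ F @ A.T + Q                   # covariance prediction
    nu = z - H @ x_pred                        # innovation
    S_nu = H @ F_pred @ H.T + R                # innovation covariance
    K = F_pred @ H.T @ np.linalg.inv(S_nu)     # Kalman gain
    x_new = x_pred + K @ nu                    # measurement update
    F_new = (np.eye(len(x)) - K @ H) @ F_pred
    return x_new, F_new, nu, S_nu

rng = np.random.default_rng(0)
A, H = np.eye(2), np.eye(2)                    # toy 2-state, 2-measurement system
Q, R = 1e-4 * np.eye(2), 1e-2 * np.eye(2)
x, F = np.zeros(2), np.eye(2)
for k in range(50):
    bias = np.array([0.5, 0.0]) if k >= 30 else np.zeros(2)   # bad data from k = 30
    z = rng.normal(0.0, 0.1, 2) + bias
    x, F, nu, S_nu = kalman_step(x, F, A, H, Q, R, z)
    d2 = nu @ np.linalg.solve(S_nu, nu)        # normalized innovation statistic
    if d2 > 9.21:                              # chi-square threshold, 2 dof, ~99%
        print(f"k={k}: innovation anomaly, d2={d2:.1f}")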

Bad Data Types and Considerations
Bad data can be classified as either single or multiple. For single bad data, one measurement in the system is corrupted with a large error. Multiple bad data describe more than one measurement being in error and can be further classified by the degree of interaction and conformity [7]. Multiple bad data are said to interact when the residuals are highly correlated, whereas conformity describes the degree to which gross errors are "masked" in the residual (i.e., nonconforming errors present as high normalized residuals) [8]. Another illustration of how error is not always fully reflected in the residual is the concept of leverage points [23-26], which can hinder the effectiveness of largest-residual methods. Leverage points arise as a consequence of system topology, parameter values, and measurement placement and are usually caused by the following: (i) injection and flow measurements near branches with a small X/R ratio; (ii) injection measurements near buses with a large number of incident branches; and (iii) a measurement with a large weight [6,27]. Even a single leverage point can compromise bad data detectability.
Gross errors that exist beyond the acceptable noise limit of the state estimation model can be categorized into three types: measurement, parameter, and topology. Each of these errors indicates a discrepancy between the measurement data and the model and is described further in the following subsections.

Measurement Error
Measurement error is inevitable given the limitations of metering equipment accuracy. Meters can fail or degrade, introducing bias and compromising both accuracy and the Gaussian error assumption: empirical studies of synchrophasor errors have yielded heavy-tailed error distributions such as Cauchy, Student's t, logistic, and Laplace [28,29]. Further, the communications infrastructure itself may contribute to measurement error in the case of failure or interference [7]. Particularly egregious measurement errors that suggest physically impossible grid conditions, such as negative bus voltage magnitudes or magnitudes several times larger or smaller than nominal values, are filtered through pre-processing [8], but more "agreeable" measurement errors can nevertheless affect the accuracy of state estimates.

Parameter Error
Parameter errors suggest discrepancies between the measurement data and the system model. While Schweppe in his original formulation [1] did recognize the impact of erroneous model parameters, such errors were not considered in the network model. For example, a parameter error might arise when the variability in a line-impedance value due to extreme weather conditions is not taken into account. The mismatch between the measurement data and the line impedance database value, which is used in the Y-admittance matrix for power flow calculations, would be reflected in the state estimation result.
A simple alteration of (1) yields an augmented model [30] and its linearization:

z = h(x, p) + e,  ∆z ≈ H∆x + H_p(p − p_0) + e = H∆x − H_p∆p + e,

where H_p = ∂h/∂p is the parameter Jacobian, p is the true parameter value, p_0 is the erroneous parameter value, and ∆p = p_0 − p is the parameter error. Stuart and Herget [31] investigated the impact of parameter errors on SSE by simulating erroneous values for line impedance, measurement error variance, and transformer tap settings. Of particular note was an observed relationship between the severity of error and lightly loaded lines.
Parameter errors can be thought of as a special case of multiple bad data in which only the measurements pertaining to the erroneous model parameter are in error. As such, studies have been performed with the goal of differentiating between the two. In [32], it was shown through analysis of the state estimation error distribution that parameter errors are reflected only in the measurement functions with erroneous parameter values. Parameter estimation itself has been treated as a process separate from state estimation. A practical implementation of this was first developed in [33], in which a sensitivity-based WLS estimation approach is used to both identify and estimate the parameter error.

Topology Error
Like parameter errors, topology errors suggest discrepancies in the measurement model. System topology describes the bus-branch network configuration at the time of state estimation. Topology processing, which precedes state estimation, normally determines the correct statuses of manual switches and circuit breakers. A topological discrepancy, such as a branch outage unaccounted for by the topology processor, would be reflected in the Jacobian measurement matrix H, which requires accurate bus-branch connection logic for the calculation of power flow. Topology errors can significantly compromise state estimation accuracy through multiple conforming bad data [7]. Early work showed that such topology errors can be reflected in the state estimation error [34,35] and that normalized residual methods could be used for detection. Other approaches suggest incorporating the statuses of switching devices themselves as additional state variables [36], aiding in the identification of topology errors as such.

Bad Data Detection
To preserve the accuracy of state variable estimates, bad data must be detected, identified, and either eliminated or corrected. Whether the source of the bad data is measurement-, parameter-, or topology-based, detection is the first step. The classical components of bad data detection can be broadly categorized into three main branches, often used in conjunction with one another: chi-square (χ²) testing, residual-based methods, and hypothesis testing.
Chi-Square Testing
For a set of d independent random variables {X_i, i = 1, 2, . . ., d}, each with a standard Gaussian distribution, the sum Y = Σ_{i=1}^{d} X_i² follows a χ² distribution with d degrees of freedom. This follows the form of the cost function defined in (2) and can be written as the performance index [8]

J(x̂) = Σ_{i=1}^{d} (z_i − h_i(x̂))² / σ_i²,  (16)

assuming that the measurement errors are independent and distributed e_i ∼ N(0, σ_i²). J(x̂) then follows a χ² distribution with d − N degrees of freedom, where d is the number of measurements and N is the number of unknown state variables.
A critical value C = χ²_(d−N),p can then be obtained based on the degrees of freedom d − N and the desired detection confidence with probability p = 1 − α, where α is a constraint on the false alarm probability. If J(x̂) ≥ C, then bad data are suspected; otherwise, the measurements are assumed to be free of bad data. χ² testing proved valuable for the detection of bad data even in the early history of SSE [5], where it was quickly recognized that neither χ² testing nor normalized residual methods dominate in general, though χ² testing often proved better for multiple bad data.
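A minimal sketch of the χ² detection test, assuming the predicted measurements h(x̂), standard deviations, and toy values below; these numbers are illustrative only.

import numpy as np
from scipy.stats import chi2

def chi_square_test(z, h_hat, sigma, N, alpha=0.01):
    """Return the performance index J, critical value C, and detection flag."""
    d = len(z)
    J = np.sum(((z - h_hat) / sigma) ** 2)     # performance index, Eq. (16)
    C = chi2.ppf(1 - alpha, df=d - N)          # critical value at p = 1 - alpha
    return J, C, J >= C

rng = np.random.default_rng(1)
sigma = 0.01 * np.ones(10)
h_hat = rng.normal(1.0, 0.05, 10)              # stands in for h(x_hat)
z = h_hat + rng.normal(0.0, sigma)             # measurements with nominal noise
z[3] += 0.08                                   # gross error on measurement 3
J, C, bad = chi_square_test(z, h_hat, sigma, N=4)
print(f"J = {J:.1f}, C = {C:.1f}, bad data suspected: {bad}")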

Residual-Based Methods
The χ² test soon became commonplace for bad data detection in WLS SSE for a specified constraint on the false alarm probability α, after which residual analysis could be performed to identify the measurement(s) in error [37]. However, in the case of single bad data in larger networks, the analysis of both the weighted and normalized residuals also proved viable for detection due to a more pronounced response in the presence of gross errors when compared to χ² testing. The use of normalized residuals for bad data detection was introduced in [5]. Using the diagonal entries Ω_ii of the residual covariance matrix Ω, the normalized residuals can be defined as

r_i^N = |r_i| / √Ω_ii,  i = 1, 2, . . ., d.

It was shown in [5] that, after bad data had been detected through means such as the χ² test, a list of the normalized residuals in descending order could be obtained. The largest normalized residual could be used to identify the measurement in error, after which the measurement was removed and the state estimation re-run. If bad data were still detected, the procedure would repeat until all erroneous measurements were eliminated. Further techniques were developed to correct measurements contaminated with bad data, rather than eliminating them [8]. Correction keeps the measurement structure intact, which is especially important in cases of limited redundancy.
Both the detection and identification of bad data can be achieved without χ² testing by comparing the largest normalized residual to a statistical threshold chosen according to the desired sensitivity [7]. The case studies in [5] demonstrated that, in the case of multiple bad data, either interacting or noninteracting, no consensus could be reached as to whether χ² testing or the largest normalized residual test proved superior for bad data detection. A geometric interpretation of the normalized residuals was developed in [38], significantly improving the generalizability of multiple interacting bad data detection. The residual, the difference between estimated and actual measurements, continues to be a vital component in state estimation anomaly detection, including in newer formulations to be expanded upon in Section 6.
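The elimination procedure above can be sketched compactly for a linearized model; the Jacobian, weights, and threshold below are illustrative assumptions, and observability checks are reduced to a simple redundancy margin.

import numpy as np

def lnr_identification(z, H, W, threshold=3.0):
    """Largest-normalized-residual loop: identify, remove, re-estimate."""
    idx, removed = np.arange(len(z)), []
    while len(idx) > H.shape[1]:               # keep a redundancy margin
        Hs, zs = H[idx], z[idx]
        Ws = W[np.ix_(idx, idx)]
        G = Hs.T @ Ws @ Hs
        x_hat = np.linalg.solve(G, Hs.T @ Ws @ zs)
        r = zs - Hs @ x_hat                    # residuals
        S = np.eye(len(idx)) - Hs @ np.linalg.solve(G, Hs.T @ Ws)
        Omega = S @ np.linalg.inv(Ws)          # residual covariance
        rN = np.abs(r) / np.sqrt(np.maximum(np.diag(Omega), 1e-12))
        j = int(np.argmax(rN))                 # largest normalized residual
        if rN[j] < threshold:
            break                              # no remaining suspect measurements
        removed.append(int(idx[j]))
        idx = np.delete(idx, j)                # remove and re-estimate
    return removed

rng = np.random.default_rng(2)
H = rng.normal(size=(8, 3))                    # assumed linearized Jacobian
x = np.array([1.0, -0.5, 0.2])
z = H @ x + rng.normal(0.0, 0.01, 8)
z[5] += 0.1                                    # gross error on measurement 5
print("removed measurements:", lnr_identification(z, H, np.eye(8) / 0.01**2))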

Hypothesis Testing
Hypothesis testing is a statistical method for deciding between accepting a null hypothesis H_0 or an alternative hypothesis H_1 based on available observations. In power system state estimation, the hypotheses are formulated as follows:

H_0 : the suspected measurement contains no gross error;
H_1 : the suspected measurement contains a gross error.

The first work to use hypothesis testing identification (HTI) for bad data in power system state estimation [39] developed regions of acceptance between H_0 and H_1 by comparing the estimation error to a threshold dependent on the measurement standard deviation and a pre-selected constraint on the false alarm probability α. New results for this HTI method were presented in [40], where the optimality of the linear estimator is established along with a decision strategy based on a constraint on the missed detection probability β. In [41], the authors bridge the gap between theory and practice by implementing HTI on eight test systems, showcasing its strengths in detecting multiple interacting bad data. For bad data identification, HTI methods show significant advantages over methods based on normalized residuals, which may be strongly correlated [7]. HTI techniques have also demonstrated potential for discerning error type, such as in topology error identification [42,43].

When Bad Data Become Malicious
The introduction of the concept of false data injection attacks (FDIAs) [11] helped to highlight the limitations of classical bad-data-detection methods. What if bad data are malicious and/or statistically derived to avoid conventional detection? The basic idea of FDIAs is that an attacker can design an injection of multiple interacting bad data, which is then applied to the measurement vector z. Consider the representation z_a = z + a, where a = (a_1, a_2, . . ., a_m)^T is a vector of malicious data. The attacker's goal is to design a to alter the state estimates, which EMSs use to make operating decisions, but without triggering bad data detection. Ramifications of undetected attacks include compromised system stability [12] and negative economic impact [44]. The success of such attacks is largely dependent on the information available to the attacker, such as the number of meters compromised, state estimates, system topology, and Jacobian structure, to name a few.
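The stealth condition of [11] can be verified numerically for the linearized model: if the attack vector lies in the column space of the Jacobian, a = Hc, the residual is unchanged and residual-based detection is blind to the attack. A small sketch, with an assumed toy Jacobian:

import numpy as np

rng = np.random.default_rng(3)
H = rng.normal(size=(10, 4))                   # assumed linearized Jacobian
W = np.eye(10)
P = H @ np.linalg.solve(H.T @ W @ H, H.T @ W)  # hat matrix
S = np.eye(10) - P                             # residual sensitivity matrix

z = H @ rng.normal(size=4) + rng.normal(0.0, 0.01, 10)
c = np.array([0.2, -0.1, 0.0, 0.3])            # attacker's intended state bias
a = H @ c                                      # stealth attack: a in column space of H
print("residual unchanged:", np.allclose(S @ z, S @ (z + a)))   # True, since SH = 0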
Denial-of-service (DoS) attacks are another source of mismatch between the measurement data fed to the state estimator and the true power system state. Mechanisms for DoS attacks are numerous [45], including communication channel jamming, packet flooding, and the compromising of metering devices such as SCADA and PMUs so that data are not updated for that region of the power grid. For state estimation, DoS attacks are typically modeled as a set of measurements that are no longer available, which can negatively impact state variable accuracy. If stealthiness is desired, care would need to be taken on the attacker's part so as not to render the system unobservable. FDIAs can also be designed to create a topology error attack [46-48], in which a conventionally nondetectable mismatch between measurement data and topology processing can compromise system stability and cost-effective operation.
The authors of [49] present FDIA strategies from both the attacker and defender perspectives. For the attacker, it is typically assumed that there is a cost associated with the information obtained. With this in mind, an algorithm is presented to find the minimal set of measuring devices required to manufacture an unobservable attack. In [50], a comparative analysis of the FDIA impact between so-called DC and AC SSE is conducted. DC SSE considers active power measurements only, with bus voltage angles as the state variables. In contrast, the complete AC SSE considers both active and reactive power measurements, with bus voltage magnitudes and angles as the state variables. Such a study was important because the DC model had warranted far more attention in the FDIA research space at the time, despite the full nonlinear AC model finding use in real-world EMS applications [51,52].
Impacts of FDIAs on Kalman filter DSE approaches were studied in [53], where it was found that the unscented Kalman filter (UKF) [54] yielded better performance compared to the extended Kalman filter (EKF) [55] and the enhanced EKF [56]. Further, an online nonparametric cumulative sum (CUSUM) approach was proposed to detect anomalies based on distribution changes in the state estimation error. This is related to quickest-change detection approaches, which will be elaborated upon further in Section 6.1. A Kalman filter state estimation approach was proposed in [57], where a Euclidean detector was used to overcome the shortcomings of the χ² test for detecting statistically derived FDIAs as well as DoS attacks.
The FDIA formulation highlighted a need for improved bad data detection, and the classification of bad data as such would also need improvement. Common confusion matrix metrics like false negatives and false positives become harder to minimize when stealth FDIAs can closely resemble power system events like transients, switching, and sudden load changes. Further, with the increasing push towards the cyber-physical operation of the smart grid [58], many new points of entry for cyber-attack became apparent, such as Internet of Things (IoT) infrastructure [59], communication channels [60], and distributed computing [61]. The intersection of model-based and data-driven solutions should grow to better handle the bad-data-detection limitations posed by FDIAs. With state estimation anticipated to remain a vital component of EMSs, new formulations based on quickest-change detection and AI should be developed for improved anomaly detection.

Recent Approaches

Quickest-Change Detection
Quickest-change detection (QCD) is concerned with detecting a possible change in the distribution of a monitored observation sequence [62], which is indicative of an anomaly in a stochastic environment. The general goal of QCD theory is to design algorithms that detect these changes with the smallest possible detection delay, subject to a constraint on false alarms.
Three main ingredients are needed in the QCD problem [63]: an observed stochastic process {X_n, n = 1, 2, . . .}, a change time τ_a at which the statistical properties of the process undergo a change, and a decision maker that declares a change time τ_s based on observations of the stochastic process. A false alarm is defined as an instance of the decision maker declaring a change before the change occurs: I{τ_s < τ_a}. The constraint on false alarms follows from the Neyman-Pearson hypothesis testing formulation [64], which is foundational to the QCD problem.
The Neyman-Pearson Lemma [65] establishes the optimal test for binary hypothesis testing, involving the null (H_0) and alternate (H_1) hypotheses. For a single observation X:

H_0 : X ∼ p,  H_1 : X ∼ q.

Then, comparing the likelihood ratio q(X)/p(X) to a threshold value is the most powerful test in terms of deciding which hypothesis is true while minimizing missed detection subject to a constraint on false alarms [66]. The likelihood ratio plays a fundamental role in recursive sequential-change-detection algorithms such as Page's CUSUM [67] and the Shiryaev-Roberts procedure [68], each of which enjoys optimality properties in terms of minimizing false alarms and the detection delay (τ_s − τ_a)^+ = max(0, τ_s − τ_a). These properties are given proper discussion in [62].
QCD approaches have shown great promise for power system anomaly-detection applications, such as line outage detection and identification [69-71]. QCD has further application in detecting changes in the state estimation error, which has been proposed for fault and FDIA detection. The first QCD approach for state estimation FDIA detection implemented an adaptive approach using the CUSUM statistic:

W_n = max(0, W_n−1 + L(Z_n)),  W_0 = 0,

where {Z_n, n = 1, 2, . . .} is the observed stochastic process and L is the log-likelihood ratio. Sample plots of a subtle change in a Gaussian observation process, along with the corresponding CUSUM statistic, are included in Figure 1. Because the exact form of the post-change distribution q is not known, the authors in [72,73] used a Rao test-based approximation [74] of the generalized likelihood ratio test for CUSUM-based FDIA detection. A low-complexity Orthogonal Matching Pursuit CUSUM (OMP-CUSUM) approach in [75] accounts for the unknown post-change distribution by maximizing the cumulative log-likelihood ratio to detect FDIAs that are sparse (i.e., only a small number of meters are assumed accessible to the attacker).
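For reference, a minimal CUSUM sketch for a known Gaussian mean shift; the pre- and post-change parameters and the threshold are illustrative assumptions, whereas the works above must estimate or approximate the unknown post-change distribution.

import numpy as np

def cusum(samples, mu0, mu1, sigma, threshold):
    """Page's CUSUM for a Gaussian mean shift from mu0 to mu1."""
    W = 0.0
    for n, z in enumerate(samples):
        # log-likelihood ratio of N(mu1, sigma^2) vs. N(mu0, sigma^2)
        L = ((z - mu0)**2 - (z - mu1)**2) / (2.0 * sigma**2)
        W = max(0.0, W + L)                     # CUSUM recursion
        if W >= threshold:
            return n                            # declared change time
    return None

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0.0, 1.0, 500),  # pre-change samples
                    rng.normal(0.5, 1.0, 200)]) # small mean shift at n = 500
print("declared change at n =", cusum(x, 0.0, 0.5, 1.0, threshold=20.0))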
Both centralized and distributed CUSUM-based approaches for FDIA detection are proposed in [76], replacing the unknown parameters of the post-change distribution with their maximum likelihood estimates (MLEs). For the centralized case, the observed stochastic process of interest is the projection of the measurement vector onto the orthogonal complement of the Jacobian range space, R^⊥(H). This is expressed as ỹ_n = (I − P_n)y_n, where P is the previously defined linear projection ("hat") matrix. The distributed case partitions the power system into areas and estimates the state variables through the alternating direction method of multipliers (ADMM) [77], where each area i has its own observed process {ỹ_n^i, n = 1, 2, . . .}. These approaches outperformed the adaptive-CUSUM approach of [72,73], due in part to the improved detection of FDIAs with negative and larger elements of the attack vector a.
The work in [78] incorporates a Kalman filter approach and separately evaluates DoS attacks and FDIAs. Better detection performance was observed for stealth FDIAs in particular, in which perfect system topology knowledge allows an attacker to inject false data along the column space of H. Four Kalman filtering techniques in [53] were evaluated using nonparametric CUSUM, in which both the pre- and post-change distributions p and q are unknown. Hybrid FDIA/jamming attacks are assessed for the Kalman filter CUSUM-based detector in [79]. The distinction between persistent and non-persistent attacks was made as well. Most CUSUM-based detectors assume persistence in the change in the observed stochastic process, and so an intermittent attack series could be designed to increase the detection delay. Thus, the Generalized Shewhart Test, which can detect significant instantaneous increases in L, is presented as a countermeasure against stealthy, non-persistent FDIAs. A relaxed generalized CUSUM (RGCUSUM) algorithm is presented in [80] for FDIA detection. A relaxation on maximizing the post-change likelihood over the unknown parameters yielded a more computationally efficient algorithm than its generalized CUSUM counterpart. A normalized Rao CUSUM-based detector with a time-varying dynamic model was employed in [81] to better distinguish between FDIAs and sudden load changes.
The work in [82] also assesses the Shiryaev-Roberts (SR) procedure, along with CUSUM, for change detection. In contrast to CUSUM, the optimality of the SR procedure pertains to detecting changes that occur at a distant time horizon [83,84]. The SR procedure is defined recursively as

R_n = (1 + R_n−1) L(X_n),  R_0 = 0,

where L(X_n) = q(X_n)/p(X_n) is the likelihood ratio, with a change declared once R_n crosses a pre-selected threshold. Further, the modified CUSUM and SR procedure algorithms [85] are employed in the same work as evaluation benchmarks for a so-called DeepQCD algorithm for online cyber-attack detection, which uses deep recurrent neural networks to detect changes in transient cases and with autocorrelated observations.
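A corresponding sketch of the SR recursion under the same illustrative Gaussian mean-shift assumptions as the CUSUM example; the threshold is again an arbitrary demonstration value.

import numpy as np

def shiryaev_roberts(samples, mu0, mu1, sigma, threshold):
    """SR recursion R_n = (1 + R_{n-1}) * LR(x_n) with R_0 = 0."""
    R = 0.0
    for n, z in enumerate(samples):
        LR = np.exp(((z - mu0)**2 - (z - mu1)**2) / (2.0 * sigma**2))
        R = (1.0 + R) * LR
        if R >= threshold:
            return n                            # declared change time
    return None

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0.0, 1.0, 500),
                    rng.normal(0.5, 1.0, 200)])
print("declared change at n =", shiryaev_roberts(x, 0.0, 0.5, 1.0, threshold=1e4))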

AI Approaches
FDIA detection can be framed as a binary classification problem in which the measurement vector z is determined to be either normal (negative class) or anomalous (positive class). One of the first works to use semi-supervised and supervised learning for FDIA detection [86] explored perceptron, support vector machine (SVM), k-nearest neighbors (k-NN), and sparse logistic regression algorithms for supervised learning. Semi-supervised learning, in which unlabelled test data are incorporated in training, was explored with semi-supervised SVMs. Many valuable takeaways were garnered from this work, including considerations of power system size and computational complexity; however, stealthy FDIAs were not considered. An Extended Nearest Neighbors (ENN) algorithm was proposed in [87] to better handle the imbalanced data problem (i.e., cases in which the number of negative-class samples greatly exceeds or falls significantly below the number of positive-class samples). Classification performance was then compared to SVM and k-NN algorithms. The work in [88] used a method based on the margin-setting algorithm (MSA), typically used in image-processing applications, in which hypersphere decision boundaries were formed through labeled PMU time-series data. The MSA approach yielded superior classification performance compared to standard artificial neural networks (ANNs) and SVM.
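In this spirit, a minimal supervised-classification sketch using scikit-learn; the synthetic measurement vectors and sparse attack model below are assumptions for illustration and do not reproduce the setups of the cited works.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(6)
n, d = 2000, 20
X_normal = rng.normal(1.0, 0.05, (n, d))       # nominal measurement vectors
X_attack = X_normal.copy()
for i in range(n):
    cols = rng.choice(d, size=3, replace=False)
    X_attack[i, cols] += rng.normal(0.15, 0.05, 3)   # sparse injections on 3 meters
X = np.vstack([X_normal, X_attack])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 0 = normal, 1 = anomalous

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))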
Unsupervised principal component analysis (PCA) showed utility in the construction of stealthy and blind FDIAs, as well as in developing robust detection methods [89,90]. PCA is again employed in [91] as a preprocessing step to project higher-dimensional correlated measurement data to a lower dimension, removing the correlation between data and magnifying the distance between normal and anomalous measurements. For performance comparison, the authors implemented a supervised distributed ADMM-based SVM, which could only outperform the PCA-based anomaly detection when the training set was large. Mahalanobis distance-based ensemble detection methods demonstrated success for FDIA detection in [92-95], including in high-fidelity real-time simulation.
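A sketch of the PCA preprocessing idea: fit components on historical measurements, then score new samples by their reconstruction distance from the retained subspace. The correlated data generator and the injected deviation are assumptions for illustration.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)
A = rng.normal(size=(10, 3))                   # measurements driven by 3 latent factors
X_train = rng.normal(size=(1000, 3)) @ A.T + rng.normal(0.0, 0.05, (1000, 10))
pca = PCA(n_components=3).fit(X_train)         # retain the dominant subspace

def anomaly_score(x):
    """Reconstruction error after projecting onto the retained components."""
    x_hat = pca.inverse_transform(pca.transform(x.reshape(1, -1)))[0]
    return float(np.linalg.norm(x - x_hat))

x_normal = rng.normal(size=3) @ A.T            # consistent with the learned subspace
x_anom = x_normal.copy()
x_anom[4] += 1.0                               # injected deviation on one channel
print(anomaly_score(x_normal), anomaly_score(x_anom))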
Reinforcement learning (RL)-based QCD approaches are explored in [82,96]. The QCD problem can be formulated as a case of optimal stopping, in which a decision to exercise must be made to minimize cost [97,98]. In QCD, this is understood as declaring a stopping time τ_s at a cost relative to the actual change time τ_a. For the Markov Decision Process (MDP) component of RL, one can either seek to maximize reward or minimize cost [99]. Two components of the cost are constructed [97]: one for continuing (associated with missed detection) and one for stopping (associated with false alarm). The authors in [96] use a model-free state-action-reward-state-action (SARSA) approach to learn the expected future cost for each state-action pair in a Q-table. The authors opt for a quantization scheme for learning when faced with the continuous observation space. Because the actual change time τ_a is a hidden state, a partially observable Markov decision process (POMDP) formulation is used. This RL approach significantly outperformed the Euclidean distance [57] and cosine-similarity metric [100] based detectors in terms of minimizing the mean probability of false alarm and detection delay for various cyber-attack types, including hybrid FDI/jamming, DoS, and network topology attacks.
Neural network and deep learning approaches also show promise for malicious and standard bad data detection. A Deep-Belief-Network-based classifier is proposed in [101] using Conditional Gaussian-Bernoulli Restricted Boltzmann Machines in the hopes of revealing higher-dimensional temporal features of stealthy FDIAs. The temporal correlation between measurements and the state estimates is analyzed through Recurrent Neural Networks (RNNs) for FDIA detection in [102]. A nonlinear autoregressive exogenous (NARX) model configuration for ANNs is explored in [103] for the detection of stealthy optimized FDIAs. The authors in [104] consider a limited set of target labels for attacked measurement data, an example of semi-supervised learning. Autoencoders, used for dimensionality reduction and feature extraction, are integrated into a generative adversarial network. The framework compensates for the limited labeled data set by using two neural networks: one generative, responsible for creating fake samples, and the other discriminative, responsible for distinguishing between real and generated samples.

Conclusions and Suggestions for Future Work
A survey of legacy bad-data-detection procedures has been presented, along with their limitations with respect to malicious bad data. Cyber-attack formulations such as the FDIA highlight the need for better bad data detection by pointing out the theoretical manipulation of grid-operating procedures by bad actors. Even if one argues that the FDIA formulation is more of a theoretical exercise than a practical concern, it still points to shortcomings in legacy bad data detection. Standard bad data and physical line faults under the leverage-point conditions discussed earlier are difficult to detect for reasons similar to those of statistically derived stealth FDIAs. Newer methods such as QCD and AI seek to overcome the limitations of legacy bad-data-detection techniques by leveraging features such as temporal patterns in the measurement data and probability density changes in the state estimation error.
Increased access to real state estimation measurement data would aid greatly in assessing the practicality of QCD and AI anomaly-detection formulations. For example, a QCD formulation assuming independent and identically distributed (i.i.d.) observations may be compromised under dynamic load and generation profiles, in which case the measurement data exhibit complicating factors like autocorrelation, as investigated in [82]. The robustness of newer anomaly-detection strategies to asynchronous measurement data should also be investigated. Until synchronized measurement data for state estimation become standard, uncertainty of this type should be quantified so that it is not treated as a false-positive source of anomalous behavior. The availability of time-series data such as SCADA and/or PMU measurements for multi-bus systems would aid state estimation researchers in quantifying uncertainty and measurement correlation. It is also recommended that future work incorporate dynamic load and generation profiles to better reflect the future directions of the modern smart grid. This was a motivation in the work of [81], which highlighted the importance of discerning anomalies from dynamic load behavior.

Figure 1. Example of an observation sequence with a small mean shift and the corresponding CUSUM evolution.