Distribution System State Estimation and False Data Injection Attack Detection with a Multi-Output Deep Neural Network

Abstract: Distribution system state estimation (DSSE) has been introduced to monitor distribution grids; however, due to the incorporation of distributed generations (DGs), traditional DSSE methods are not able to reveal the operational conditions of active distribution networks (ADNs). DSSE calculation depends heavily on real measurements from measurement devices in distribution networks. However, the accuracy of real measurements and DSSE results can be significantly affected by false data injection attacks (FDIAs). Conventional FDIA detection techniques are often unable to identify FDIAs in measurement data. In this study, a novel deep neural network (DNN) approach is proposed to simultaneously perform DSSE calculation (i.e., regression) and FDIA detection (i.e., binary classification) using real measurements. The classification nodes in the DNN identify which measurement on which phasor measurement unit (PMU), if any, was affected. We aim to show that the proposed method can perform DSSE calculation and identify FDIAs from the available measurements simultaneously with high accuracy. We compare the proposed method to the traditional approach of detecting FDIAs and performing SE calculations separately; moreover, DSSE results are compared with the weighted least squares (WLS) algorithm, which is a common model-based method. The proposed method achieves better DSSE performance than the WLS method and the separate DSSE/FDIA method in the presence of erroneous measurements; it also executes faster than the other methods. The effectiveness of the proposed method is validated using two FDIA schemes in two case studies: one using a modified IEEE 33-bus distribution system without DGs, and the other using a modified IEEE 69-bus system with DGs. The results show that the accuracy and F1-score of the proposed method are better than when performing binary classification only. The proposed method successfully detected the FDIAs on each PMU measurement. Moreover, the DSSE results from the proposed method are better than those of the regression-only method and the WLS method in the presence of bad data.


Introduction
The concept of false data injection attacks (FDIAs) in power systems was first studied in [1]; additional studies about the damaging effects and threats of these new attacks quickly followed [2,3]. Because of the transition from traditional power networks to smart grids, more smart devices and communication infrastructures are required to enable the reliable and efficient performance of smart grids [4]. However, despite the progress in the power network structure, attackers attempt to disrupt the performance of power networks by manipulating the data obtained from measurement devices [5]. The goal of attackers is to falsify the data and measurement information in power networks. Therefore, state estimation (SE) results can be influenced by FDIAs due to the dependence of state estimation calculations on measurement information and network topology [6]. SE results are submitted to the control center for further processing, including optimal power flow, contingency analysis, security analysis, etc. If FDIAs cannot be successfully detected, the control center performance will be severely degraded, and both economic and physical risks may arise from a wrong decision in a control center [7][8][9].
The goal of bad data detection (BDD) is to determine the existence of erroneous data in measurement information. Traditional BDD schemes compute the $\ell_2$-norm of the measurement residual after SE calculations are performed [10,11]. However, FDIAs can successfully bypass most conventional BDD algorithms, and attackers can inject bad data into measurement data and introduce arbitrary errors into the output of the SE [12]. State estimation techniques, attacks, and defense strategies on transmission networks have been well established [13,14]. Unfortunately, these approaches cannot be applied in distribution networks (DNs) due to their differences with transmission networks [15]. Some features of DNs include:
1. Unlike transmission networks, which have a mesh structure, DNs generally have a radial or weakly meshed configuration.
2. DNs generally consist of many more buses than transmission networks, making installation of measurement devices at all buses in DNs economically impractical.
3. DNs normally have large resistance-to-reactance (r/x) ratios compared to transmission lines.
In addition to these differences, renewable energy sources are becoming more common in DNs [16][17][18]. These sources typically introduce higher variance and inconsistency, making it more difficult to perform state estimation. For these reasons, new SE methods should be developed that consider the characteristics of DNs: multiple renewable energy sources [19], electric vehicles, variable loads, etc. Moreover, the integration of different technologies and components in active distribution networks (ADNs) must emphasize the security aspects of these networks, including the ability to detect cyberattacks such as FDIAs [20,21].
In the last decade, machine learning approaches have been widely used and developed for control and monitoring in power networks [22][23][24][25]. In [26], a machine learning approach is used for an energy storage program and load management in power networks. Moreover, due to limitations in detecting FDIAs using conventional (model-based) BDD methods, machine learning approaches have been applied widely to identify malicious data injection. Faster execution time and accurate results are two main advantages of using machine learning approaches over conventional BDD methods [27][28][29]. Machine learning algorithms are based on the data obtained from the power networks, unlike model-based approaches, which are based on network topologies as well as measurement data.
Different machine learning methods, including supervised [30][31][32], semi-supervised [33,34], and unsupervised learning [35,36], are used to explore the detection of FDIAs in many different fields. In [37], a machine learning approach is utilized to identify cyberattacks such as structured query language injection attacks (SQLIAs). In [38], a new machine learning method is proposed to identify false data injection attacks on an Internet of Things (IoT) system. In [39], a supervised machine learning algorithm is used to classify different failure parts of a wind turbine.
In this study, we focus our machine learning efforts on FDIA detection in power distribution networks. FDIA detection is considered a supervised binary classification problem. In [40], the abilities of different machine learning approaches are tested to identify attacks in ADNs. Furthermore, various scenarios are considered to verify that FDIAs can be identified using machine learning methods. In [41], a hybrid weighted least absolute value (WLAV) method is proposed to use supervisory control and data acquisition (SCADA) and micro-PMU measurements for three-phase unbalanced distribution networks. The robustness of the WLAV and WLS estimators is compared against potential FDIAs, and it is shown that WLAV performs better in enhancing the security of the distribution grid. In [42], two distributed sparse state estimation and attack detection methods are studied to make a DN observable and to perform FDIA detection locally in distribution networks. In [43], the affine interval state estimation method is applied to identify attacks in measurement data by considering the upper and lower boundaries of the state variables. In [44], a new method is proposed to identify types of faults and cyberattack locations simultaneously by utilizing a deep neural network (DNN) method. The authors call their method fault and attack location and classification (FALCON); this method is categorized as a multi-output classification approach. In [45], a new method is proposed to identify the presence of corrupted measurement data and the location of compromised micro-PMUs in order to ignore the corrupted measurement devices as a defense strategy. In [46], two separate DNN models are designed to perform DSSE and topology detection from available synchronized measurements for unbalanced ADNs.
In existing research, DSSE calculation and bad data detection on state variables and measurement data are performed separately by using model-based or data-based approaches. In some cases, multi-output classification problems are solved by a single DNN model. In contrast, this paper uses a single DNN model for simultaneously performing DSSE and FDIA detection using PMU measurements as inputs. Different scenarios are considered for FDIAs to verify the proposed method; moreover, the performance of the proposed method is compared to the case where DSSE calculation and FDIA detection are performed by separate DNN models. To compare the robustness of data-based and model-based approaches, the WLS method is used to obtain state variables in the presence of FDIAs on PMU measurements.
The main contributions of this paper are summarized as follows:
1. We design a single DNN model to simultaneously perform DSSE calculation and FDIA detection based on PMU measurement inputs. The results are compared to the case where DSSE calculation and FDIA detection are performed separately using two independent DNN models.
2. Having N + 1 classification nodes, where N is the number of PMU measurements, allows the DNN to identify which PMU measurement, if any, was affected by an FDIA, or if none of the measurements were affected.
3. The performance of the proposed method is investigated for FDIA detection on PMU measurements with different attack scenarios.
4. To make a comparison between data-based and model-based approaches, DSSE calculation is performed using the WLS as a model-based approach.
5. The effectiveness of the proposed method is tested for passive and active distribution networks: the IEEE 33-bus and 69-bus distribution networks, respectively.
6. We show that the proposed method accurately calculates state estimation variables, even in the presence of erroneous measurements.
7. The execution times of the proposed method, the disjoint DNN models for DSSE calculation and FDIA detection, and the model-based approach are compared. The results indicate that performing FDIA detection and DSSE calculation simultaneously leads to a significant decrease in execution time.

Power System State Estimation
State estimation calculations are essential for continually improving the performance and management of power networks [47]. Different state estimation techniques have been developed and used for transmission networks for several years, but these methods cannot be used at the distribution level directly due to differences between transmission and distribution networks. Distribution system state estimation (DSSE) enhances monitoring and control of distribution grids effectively and efficiently. Moreover, state estimation results can be utilized for load forecasting, stability analysis, optimal power flow, bad data detection, and energy market analysis [48,49]. The weighted least squares (WLS) method is one of the traditional methods that is effective in both distribution and transmission grids.

WLS Formulation for State Estimation
The general measurement model for the state estimation problem can be expressed as:

$$z = h(x) + e_z$$

where $z = [z_1, \ldots, z_M]^T$ is the vector of measurements, and $M$ is the number of available measurements; $h = [h_1, \ldots, h_M]^T$ is the vector of measurement functions, which is commonly nonlinear. The relationship between the available measurements and the state vector is given by $h(x)$. The state vector is $x = [x_1, \ldots, x_N]^T$, where $N$ is the number of state variables. Lastly, $e_z \sim \mathcal{N}(0, R_z)$ is the measurement noise vector, which is assumed to be a zero-mean Gaussian random variable with covariance matrix $R_z = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_M^2)$. For instance, the power flow equations can be expressed as:

$$P_i = V_i \sum_{j} V_j \left( G_{ij} \cos\theta_{ij} + B_{ij} \sin\theta_{ij} \right)$$

$$Q_i = V_i \sum_{j} V_j \left( G_{ij} \sin\theta_{ij} - B_{ij} \cos\theta_{ij} \right)$$

In these equations, the sums run over all buses $j$; $P_i$ and $Q_i$ are the real and reactive power injections at bus $i$, respectively; $V_i$ is the voltage magnitude at bus $i$; $G_{ij}$ and $B_{ij}$ are the real and imaginary parts of the nodal admittance matrix element $Y_{ij}$, respectively; and $\theta_{ij} = \theta_i - \theta_j$ is the phase angle difference between buses $i$ and $j$.
Analogous equations describe the active ($P_{ij}$) and reactive ($Q_{ij}$) power flow from bus $i$ to bus $j$. When WLS is performed for the state estimation calculation, the objective function is defined as:

$$J(x) = \sum_{i=1}^{M} w_i \left( z_i - h_i(x) \right)^2 \tag{6}$$

In this formulation, $w_i$ is the weight associated with the $i$th measurement, and $M$ is the number of available measurements to perform the SE calculation. The difference between the measurement values ($z$) and the function corresponding to the state vector ($h(x)$) is expressed as $z - h(x)$, and it is called the measurement residual in the literature. Equation (6) can be defined in matrix form as:

$$J(x) = [z - h(x)]^T W [z - h(x)]$$

where $W_{M \times M}$ is a diagonal matrix whose diagonal elements correspond to the weights $w_i$. The iterative Gauss-Newton (IGN) method is commonly used to minimize the objective function $J(x)$ [47]. In IGN, the following equation is solved at each iteration $k$:

$$G(x_k) \, \Delta x_k = H_k^T W [z - h(x_k)]$$

where $H_k = H(x_k)$ is the Jacobian matrix at iteration $k$, and $G(x_k) = H_k^T W H_k$ is the gain matrix. $\Delta x_k$ is the state update vector used to calculate the new state as follows:

$$x_{k+1} = x_k + \Delta x_k$$

The iterative calculation continues until a predefined convergence criterion is reached. The largest absolute value of the state update vector ($\Delta x_k$) is compared to a predefined tolerance threshold $\varepsilon$. When $\max(|\Delta x_k|) < \varepsilon$, the calculation stops, and the state vector from the last iteration is taken as the WLS estimate.
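The iterative Gauss-Newton procedure described above can be illustrated with a minimal sketch. The two-state measurement model `h`, its Jacobian, the noise levels, and the choice $w_i = 1/\sigma_i^2$ are hypothetical stand-ins chosen for illustration; they are not the power flow equations or the weights used in this paper.

```python
import numpy as np

def h(x):
    # Hypothetical nonlinear measurement functions of a 2-element state (assumption)
    return np.array([x[0], x[1], x[0] * x[1], x[0] ** 2 + x[1] ** 2])

def jacobian(x):
    # Analytical Jacobian H(x) = dh/dx of the toy model above
    return np.array([
        [1.0, 0.0],
        [0.0, 1.0],
        [x[1], x[0]],
        [2.0 * x[0], 2.0 * x[1]],
    ])

def wls_gauss_newton(z, sigma, x0, tol=1e-8, max_iter=50):
    # Assumed weighting: w_i = 1 / sigma_i^2 on the diagonal of W
    W = np.diag(1.0 / sigma ** 2)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        H = jacobian(x)
        G = H.T @ W @ H                              # gain matrix G(x_k)
        dx = np.linalg.solve(G, H.T @ W @ (z - h(x)))
        x = x + dx                                   # x_{k+1} = x_k + dx_k
        if np.max(np.abs(dx)) < tol:                 # stop when max|dx| < tolerance
            break
    return x

x_true = np.array([1.02, 0.97])
sigma = np.array([0.01, 0.01, 0.02, 0.02])
z = h(x_true)                                        # noise-free measurements for clarity
x_hat = wls_gauss_newton(z, sigma, x0=[1.0, 1.0])
```

With noise-free measurements and a starting point near the true state, the estimate `x_hat` should agree with `x_true` to within the tolerance; with noisy measurements it would only approximate it.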
The state vector of the power grid can be defined as a set of variables; once the state variables are calculated, other electric power quantities can be computed from these states [50,51]. In node voltage distribution system state estimation (NV-DSSE), voltage magnitudes and phase angles for all buses are considered as state variables. State vectors can be represented in polar coordinates, where $\delta_i$ and $V_i$ are the voltage phase angle and magnitude at bus $i$, respectively, and $N$ is the number of buses. It is assumed that there are no measurement devices installed at the slack bus and that only conventional measurements are available in the distribution grid. The voltage magnitude of the slack bus is 1 p.u. and its phase angle is zero ($\delta_1 = 0$). However, if there is a measuring device at the slack bus, the state vector will be defined as $x = [\delta_{1∅}, \ldots, \delta_{N∅}, V_{1∅}, \ldots, V_{N∅}]$, where the phase angle $\delta_{1∅}$ is no longer zero [52][53][54].
There are two main differences between the non-PMU configuration and the PMU configuration.First, the definition of the mathematical equation relating measurements to physical parameters of the distribution grids is altered.Second, the Jacobian matrix has a different structure [55].

False Data Injection Attacks (FDIAs)
To evaluate the operating status of power networks, including voltage magnitude and phase angle of buses, state estimation is performed on the basis of available measurements. Unfortunately, the state variables can be manipulated by injecting false data into meter measurements [56], which reduces the accuracy of DSSE results.
The measurement vector $z$ can be changed to a falsified measurement vector $z_a$ when attackers inject malicious data:

$$z_a = z + a$$

In this formulation, $a \in \mathbb{R}^{M \times 1}$ is the malicious data vector that is injected into the measurement vector. The erroneous measurement vector, $z_a$, can lead to an inaccurate system state $\hat{x}_a = \hat{x} + c$, where $c$ is the resultant error in the state vector. The FDIAs cannot be identified by bad data detection approaches if an attacker knows the structure of the power system $h$. For example, the FDIAs can bypass the BDD if $a = h(\hat{x}_a) - h(\hat{x})$, which causes the residual before and after the attack to be the same:

$$\| z_a - h(\hat{x}_a) \|_2 = \| z + a - h(\hat{x}_a) \|_2 = \| z - h(\hat{x}) \|_2$$

The general effect of FDIAs on measurements and DSSE procedures is shown in Figure 1. The measurement vector is manipulated by the FDIA vector ($a$), which modifies it to become $z_a = z + a$. Falsified measurements and network topology are fed into the DSSE calculation, which can be performed using model-based or data-based approaches. The SE results are then sent to the control and management center for further processing, including bad data detection using appropriate methods.
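The bypass condition above can be checked numerically in a simplified setting. The sketch below assumes a linear measurement model $h(x) = Hx$ with a randomly generated matrix `H` and equal weights, which is an illustrative simplification of the nonlinear model; in that case the stealthy attack reduces to $a = Hc$.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 3))            # hypothetical linear measurement matrix (assumption)
W = np.eye(6)                          # equal weights for simplicity (assumption)
x_true = np.array([1.0, 0.5, -0.2])
z = H @ x_true + 0.01 * rng.normal(size=6)

def wls_estimate(meas):
    # Closed-form WLS solution for a linear model: (H^T W H)^-1 H^T W z
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ meas)

def residual_norm(meas):
    # l2-norm of the measurement residual used by residual-based BDD
    return np.linalg.norm(meas - H @ wls_estimate(meas))

c = np.array([0.1, -0.05, 0.2])        # error the attacker wants in the state estimate
a = H @ c                              # stealthy attack vector a = h(x + c) - h(x)
z_a = z + a

shift = wls_estimate(z_a) - wls_estimate(z)
```

In this linear case the estimate shifts by exactly `c` while `residual_norm(z_a)` equals `residual_norm(z)`, so a residual threshold test cannot distinguish the attacked measurements from the clean ones.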

Machine Learning Approach to Detect FDIAs
Machine learning is a form of artificial intelligence that gives computers the ability to learn without being explicitly programmed [57]. FDIA detection is defined as a supervised binary classification problem. The main objective of a binary classifier for FDIAs is to classify measurements as being either secure ($z$) or attacked ($z_a = z + a$). A binary classification problem can be defined as:

$$y_i = \begin{cases} 0, & a_i = 0 \\ 1, & a_i \neq 0 \end{cases}$$

where $y_i = 0$ and $y_i = 1$ indicate there is no attack or there is an attack on a measurement, respectively, and $a$ is the attack vector.
A deep neural network (DNN) is a machine learning model inspired by the organization or structure of the human brain. DNNs are one of the fastest growing artificial intelligence technologies. DNN methods have been proposed widely to detect FDIAs with high accuracy [58,59]; however, this technique requires more time and data for the training phase [60,61]. In feed-forward DNN models, the information flows in a single direction from the input, through optional hidden layers, to the output, as shown in Figure 2.

A DNN consists of activation functions, weights, neurons, an input layer, hidden layers, and an output layer. The input layer comprises neurons that receive the input variables and transfer them to subsequent layers in the network. The number of neurons in the input layer must be the same as the number of features or attributes in the dataset. The hidden layers are placed between the input and the output layers; the number of hidden layers and the number of neurons in each layer are determined experimentally. The weights in the network are constantly updated so the output can reliably predict an outcome based on the original input. The strength or magnitude of the connection between two neurons is called a weight. The value of the weights is usually small and falls within the range of 0 to 1. Neurons have two important roles: first, they determine the sum of the weighted inputs, and second, they initiate an activation process to normalize the sum. Weights are associated with each input of the neuron. The network learns these weights during the learning phase. The activation function, which can be either linear or nonlinear, is the decision-making center at the neuron output. Three common activation functions are sigmoid, tanh, and rectified linear unit (ReLU).
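The two roles of a neuron described above, a weighted sum followed by an activation, can be sketched in a few lines. The `neuron` helper, the bias term, and the numeric values are illustrative assumptions and are not taken from the proposed model.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def relu(s):
    return max(0.0, s)

def neuron(inputs, weights, bias, activation):
    # Role 1: sum of the weighted inputs (plus an assumed bias term)
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Role 2: pass the sum through the activation function
    return activation(s)

inputs = [1.0, 2.0]
weights = [0.5, -0.25]                 # weighted sum is 0.5 - 0.5 = 0.0

out_sigmoid = neuron(inputs, weights, 0.0, sigmoid)
out_tanh = neuron(inputs, weights, 0.0, math.tanh)
out_relu = neuron(inputs, weights, 0.0, relu)
```

For a weighted sum of zero, the sigmoid output is 0.5 while tanh and ReLU both return 0, which shows how the same sum is mapped differently by each activation function.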

Methodology
As mentioned earlier, DSSE calculation and FDIA detection are typically performed on measurements separately. In conventional cases, DSSE calculation is performed in the first stage, and then BDD is performed to identify FDIAs on measurements from the SE results.
In data-based approaches, as shown in Figure 3a,b, two separate DNNs are considered: one to execute DSSE calculations and one to perform the binary FDIA detection. In this study, as shown in Figure 3c, FDIA detection and DSSE calculation are performed simultaneously using a single DNN model. The method is compared to traditional approaches that perform BDD and DSSE calculation using two independent DNN models. Traditional approaches use a regression-based DNN to perform DSSE calculation and a classification-based DNN to detect FDIAs. A description of regression-based DNNs and classification-based DNNs in the context of DSSE and BDD is provided in [62]. In this paper, we assume the attacker injects false data into the original measurements by directly modifying the measurement vector: $z \rightarrow z_a$. (This can also be modeled as $z_a = z + a$.)

In this study, the bus voltage and branch current magnitudes are considered as available measurements, which are obtained from the PMU devices installed on a limited number of buses in the distribution network. The attack models ($V_a$ and $I_b$) on voltage and branch current magnitudes are expressed in (13) and (14), respectively.
The proposed DNN model consists of an input layer, two hidden layers, and an output layer. The number of neurons in the input layer is equal to the number of available PMU measurements ($N$). The first and second hidden layers have 900 and 400 neurons, respectively. The tanh function is used as the activation function for both hidden layers. The output layer consists of $N_b$ regression nodes and $N + 1$ classification nodes. $N_b$ is the number of state variables: the voltage magnitudes and phase angles of all buses. $N$ is the number of PMU measurements in the distribution network: one voltage magnitude and one current magnitude per PMU. The output nodes corresponding to regression use a linear activation function. The classification nodes in the output layer use a softmax activation function, allowing the algorithm to identify only one output node as being the most likely output. Having $N + 1$ classification nodes allows the DNN to identify which PMU measurement, if any, was affected by an FDIA. Note that if multiple measurements are attacked with the same vector injection, the algorithm will only report one of them because the softmax function is used as the classification layer activation function. Stochastic gradient descent (SGD) is used as the optimizer. All other hyperparameters are set to their default values in TensorFlow 2.4.1 for Python.
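The layer structure described above can be illustrated with a forward pass written in NumPy rather than TensorFlow. This is only a sketch under stated assumptions: the weights are random and untrained, the sizes `N = 6` and `N_B = 66` are illustrative, the helper names are hypothetical, and treating the last class index as "no attack" is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_B = 6, 66                         # illustrative sizes: PMU measurements, state variables

def dense(n_in, n_out):
    # Random, untrained weights and zero biases (for shape illustration only)
    return rng.normal(scale=0.05, size=(n_in, n_out)), np.zeros(n_out)

W1, b1 = dense(N, 900)                 # first hidden layer: 900 neurons
W2, b2 = dense(900, 400)               # second hidden layer: 400 neurons
W_reg, b_reg = dense(400, N_B)         # regression head: N_b linear nodes
W_cls, b_cls = dense(400, N + 1)       # classification head: N + 1 softmax nodes

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(z):
    h1 = np.tanh(z @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    x_hat = h2 @ W_reg + b_reg                   # DSSE estimate (linear activation)
    p_attack = softmax(h2 @ W_cls + b_cls)       # probability per class
    return x_hat, p_attack

batch = rng.normal(loc=1.0, scale=0.02, size=(4, N))
x_hat, p_attack = forward(batch)
attacked_index = p_attack.argmax(axis=1)         # index N is assumed to mean "no attack"
```

Both heads share the two hidden layers, so a single forward pass yields the state estimate and the attack classification together. Because softmax probabilities sum to one across the $N + 1$ nodes, only one class can be reported per sample, consistent with the limitation noted above.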
In order to evaluate the DSSE results (i.e., the regression outputs), mean percent error (MPE) and mean absolute error (MAE) are calculated using (15) and (16). In both equations, $\hat{x}$ is the estimated value, $x$ is the actual value, and $n$ refers to the data set size.
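The two regression metrics can be sketched as follows. MAE is taken as the mean of $|\hat{x} - x|$; the MPE form used here, the mean of $|\hat{x} - x| / |x|$ expressed in percent, is an assumption, since other definitions of percent error exist. The sample values are illustrative.

```python
def mean_absolute_error(estimated, actual):
    # MAE: average absolute difference between estimate and actual value
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)

def mean_percent_error(estimated, actual):
    # Assumed MPE form: average absolute relative error, in percent
    return 100.0 * sum(abs(e - a) / abs(a) for e, a in zip(estimated, actual)) / len(actual)

v_true = [1.00, 0.98, 0.95, 0.96]      # illustrative voltage magnitudes in p.u.
v_est = [1.01, 0.98, 0.94, 0.97]

mae = mean_absolute_error(v_est, v_true)
mpe = mean_percent_error(v_est, v_true)
```

Note that a percent error defined this way is undefined for actual values of zero, which matters for phase angles at or near the reference bus.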
Accuracy and F1-score, defined below, are used for binary classification outputs:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{17}$$

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

In (17), TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives. Precision and recall are calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

Energies 2023, 16, 2288

A confusion matrix can be used to analyze the performance of a given classifier. Correctly classified and misclassified outcomes are represented on the on and off diagonals of the confusion matrix, respectively. Where these values are nonzero, we highlight the entries as blue (correct) or red (incorrect).
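As a small worked example of these classification metrics, the sketch below counts the confusion matrix entries for a hypothetical set of ten labels and computes accuracy, precision, recall, and F1-score from them. The label vectors and function names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    # Count true/false positives and negatives for binary labels (1 = attacked)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
```

Here TP = 3, TN = 5, FP = 1, and FN = 1, giving an accuracy of 0.8 and precision, recall, and F1-score of 0.75 each. When attacked samples are a small minority, accuracy alone can look high even for a poor detector, which is why the F1-score is reported alongside it.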

Case Study I
For one of the case studies, the effectiveness of the proposed method is evaluated on the IEEE 33-bus distribution network (shown in Figure 4). "True" values of electrical parameters are calculated by power flow calculations. Measurements are randomly generated based on their probability density functions for each Monte Carlo trial. The assumptions are defined as follows:
1. The number of Monte Carlo trials is chosen as $N_{MC} = 12{,}000$.
2. A Gaussian distribution, with $3\sigma = 50\%$ of the nominal value, is considered for power injection on the buses.
3. Three PMUs are assumed as the available measurement devices in the network, located at buses 9, 16, and 31 [63]. Voltage and branch current magnitudes from each PMU are used for FDIA detection and for performing DSSE using the proposed method. For each PMU, 12,000 samples are given according to the number of Monte Carlo trials.
4. Two types of attack vectors are injected into available measurements: (1) $z_a = \{(1 \pm 0.05)z, (1 \pm 0.1)z\}$, as in [40]; (2) $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$, where $\mu_z$ and $\sigma_z$ are the average and standard deviation of each measurement vector, respectively. In the second case, we assume that the attacker knows the distribution of each measurement and wants to falsify measurements based on the true measurement distribution.
5. For training and test sets, 67% and 33% of the data were used, respectively.
6. The pseudo-measurements of active and reactive power injections and flows are generated to make the system observable and to perform WLS calculations with the inclusion of the PMU measurements.
7. Voltage magnitudes and phase angles for all the buses are considered as state variables, where $\delta_i$ and $V_i$ are the voltage phase angle and magnitude at bus $i$, respectively, and $N$ is the number of buses. It is assumed that there are no measurement devices installed at the slack bus and that $\delta_1 = 0$ and $V_1 = 1$ p.u.
8. The standard deviation is considered as 50% of the nominal value for pseudo-measurements and 3% of the actual value of active and reactive power flow measurements. A Gaussian error, with $3\sigma = 1\%$, is added to PMU measurements (voltage and branch current magnitudes) to model uncertainty. To model the zero injection buses, the error of virtual measurements is considered to be $10^{-8}$.
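The first attack type listed above, together with labels for the $N + 1$ classification nodes, can be sketched as follows. This is a hypothetical construction: the function name, the rule that each attacked sample has exactly one falsified channel, the use of class index $N$ for "no attack", and the synthetic clean data are all assumptions made for illustration.

```python
import random

SCALES = (0.95, 1.05, 0.90, 1.10)      # the factors (1 ± 0.05) and (1 ± 0.1)

def inject_scaling_attacks(samples, attack_fraction=0.10, seed=0):
    rng = random.Random(seed)
    n_samples = len(samples)
    n_channels = len(samples[0])
    n_attacked = round(attack_fraction * n_samples)
    attacked_rows = set(rng.sample(range(n_samples), n_attacked))
    attacked, labels = [], []
    for i, row in enumerate(samples):
        row = list(row)                          # copy so the clean data stay untouched
        label = n_channels                       # class N: no attack (assumed convention)
        if i in attacked_rows:
            label = rng.randrange(n_channels)    # one falsified channel per attacked sample
            row[label] *= rng.choice(SCALES)
        attacked.append(row)
        labels.append(label)
    return attacked, labels

data_rng = random.Random(1)
# Synthetic stand-in for 6 PMU channels (3 PMUs x voltage and current magnitude)
clean = [[data_rng.gauss(1.0, 0.01) for _ in range(6)] for _ in range(1000)]
attacked, labels = inject_scaling_attacks(clean)
```

The resulting `attacked` rows would serve as network inputs and `labels` as classification targets, with the true state from the power flow solution as the regression target.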
In Figure 5, true and attacked measurements for 120 consecutive samples are shown. After an FDIA, the true measurements are changed to falsified values, and we aim to identify these erroneous measurements using the proposed method.
Next, the FDIA detection and DSSE results from the proposed method are discussed, along with those of the corresponding binary classification-only or regression-only DNN model and the WLS method. All model performances are analyzed for the case where 10% of each of the available measurements ($N_a = 1200$) are falsified randomly by FDIAs ($z_a = \{(1 \pm 0.05)z, (1 \pm 0.1)z\}$).
Table 1 shows the results for the proposed method and the corresponding binary classification-only method. It shows that the proposed method successfully detected the FDIAs on the PMU measurements when false data $z_a = \{(1 \pm 0.05)z, (1 \pm 0.1)z\}$ are injected randomly in 10% of available measurements. The accuracy and F1-score of the proposed method are 0.930 and 0.889, respectively, which are better than when performing the binary classification only (i.e., 0.882 and 0.814, respectively).
The results of DSSE calculation from the proposed method, the regression-only method, and the WLS method are shown in Figure 6. The proposed method performs better in both MPE and MAE than the other methods in the presence of bad data. When no bad data are present, both of the DNN-based methods (regression-only and combined regression and classification) have similar performance for the state estimation. The WLS estimator has the worst performance in both cases, and the performance is significantly worse in the presence of bad data measurements.


False Data Injection Attacks on Measurements with $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$
In this case, we assume that the attacker knows the distribution of each measurement and falsifies measurements based on the true measurement distribution.
In Figure 7, 103 measurements before and after FDIAs are shown. We aim to identify the falsified measurements by applying the proposed method. The FDIA detection and DSSE regression results are shown for the case where the FDIA vector is constructed with $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$. It is clear from the figure that the attacked samples come from the same distribution as the true measurement samples, making them more difficult to identify.
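The distribution-based attack described in this subsection can be sketched in a few lines. The sketch assumes that $\mu_z$ and $\sigma_z$ are the sample mean and (population) standard deviation of the measurement series being attacked; the function name and the synthetic series are hypothetical.

```python
import math
import random
import statistics


def inject_gaussian_fdia(z, attack_fraction=0.05, seed=0):
    """Replace a random subset of samples with draws from N(mu_z, sigma_z).

    Hypothetical helper: mu_z and sigma_z are estimated from z itself, which
    models an attacker who knows the distribution of the measurement.
    """
    rng = random.Random(seed)
    mu_z = statistics.mean(z)
    sigma_z = statistics.pstdev(z)
    n_attacked = round(attack_fraction * len(z))
    attacked_idx = sorted(rng.sample(range(len(z)), n_attacked))
    z_attacked = list(z)
    labels = [0] * len(z)
    for i in attacked_idx:
        z_attacked[i] = rng.gauss(mu_z, sigma_z)
        labels[i] = 1
    return z_attacked, labels


# Synthetic per-unit voltage magnitudes with a slow periodic variation.
z_true = [1.0 + 0.01 * math.sin(k / 5.0) for k in range(200)]
z_att, y = inject_gaussian_fdia(z_true)
print(sum(y), "of", len(y), "samples falsified")
```

Because the falsified values are drawn from the same distribution as the true ones, simple range or threshold checks on individual samples are unlikely to separate them, which is consistent with the lower F1-scores reported for this scheme.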

The results in Table 2 for the proposed method and for the corresponding binary classification-only method show that the proposed method successfully detected most FDIAs on all three PMUs when false data $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$ were injected randomly into 5% of the available measurements. Furthermore, the proposed method works better than an independent binary classification method. The accuracy and F1-score of the proposed method (0.924 and 0.556, respectively) are better than when performing binary classification only (0.909 and 0.403, respectively).
Figure 8 shows the results of the DSSE calculation from the proposed method, the regression-only method, and the WLS method. The proposed method performs better in the MPE and MAE criteria than the other methods in the presence of bad data. As in the previous attack scenario, when no bad data are present, both DNN-based methods (regression-only and combined regression and classification) have similar state estimation performance. The WLS estimator has the worst performance in both cases, and its performance is significantly worse in the presence of bad data measurements.
Table 3 shows the results for the proposed method and the corresponding binary classification-only method. The proposed method successfully detected most FDIAs on all three PMUs when false data $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$ were injected randomly into 10% of the available measurements, and it works better than an independent binary classification method. The accuracy and F1-score of the proposed method (0.907 and 0.856, respectively) are better than when performing binary classification only (0.839 and 0.403, respectively).
Figure 9 shows the results of the DSSE calculation from the proposed method, the regression-only method, and the WLS method. The proposed method performs better in the MPE and MAE criteria than the other methods in the presence of bad data. Once again, when no bad data are present, both DNN-based methods (regression-only and combined regression and classification) have similar state estimation performance. The WLS estimator has the worst performance in both cases, and its performance is significantly worse in the presence of bad data measurements.
Table 4 shows the execution time for the proposed method, the regression-only method, and the WLS method when the testing data are the same for each method. The execution time is decreased significantly when regression and binary classification are performed simultaneously by a joint DNN model. The WLS execution time is ten times greater than that of the other methods because, as mentioned earlier, the Jacobian matrix, which is based on the physical parameters of the network, is recalculated in each iteration. Therefore, applying the proposed data-based approach decreases the execution time significantly.
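To illustrate why the WLS estimator is comparatively slow, the following sketch runs a Gauss-Newton WLS iteration on a toy two-state problem. The measurement functions here are hypothetical stand-ins (two direct measurements and one product term), not the power flow equations of a distribution network; the point is only that the Jacobian and gain matrix are rebuilt at every iteration.

```python
def h(x):
    """Toy nonlinear measurement model (assumption, not power flow equations)."""
    v1, v2 = x
    return [v1, v2, v1 * v2]


def jacobian(x):
    """Partial derivatives of h with respect to the two states."""
    v1, v2 = x
    return [[1.0, 0.0], [0.0, 1.0], [v2, v1]]


def wls_estimate(z, sigmas, x0, tol=1e-10, max_iter=50):
    """Gauss-Newton WLS: x <- x + (H^T W H)^-1 H^T W (z - h(x))."""
    x = list(x0)
    w = [1.0 / s ** 2 for s in sigmas]  # weights are inverse variances
    m = len(z)
    for iteration in range(1, max_iter + 1):
        H = jacobian(x)  # recomputed at every iteration
        r = [zi - hi for zi, hi in zip(z, h(x))]
        # Gain matrix G = H^T W H (2x2) and right-hand side g = H^T W r.
        G = [[sum(w[k] * H[k][i] * H[k][j] for k in range(m)) for j in range(2)]
             for i in range(2)]
        g = [sum(w[k] * H[k][i] * r[k] for k in range(m)) for i in range(2)]
        # Solve the 2x2 system G dx = g with Cramer's rule.
        det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
        dx = [(g[0] * G[1][1] - G[0][1] * g[1]) / det,
              (G[0][0] * g[1] - G[1][0] * g[0]) / det]
        x = [x[0] + dx[0], x[1] + dx[1]]
        if max(abs(d) for d in dx) < tol:
            return x, iteration
    return x, max_iter


x_true = [1.02, 0.97]
z_meas = h(x_true)  # noise-free measurements for the illustration
x_hat, n_iter = wls_estimate(z_meas, sigmas=[0.01, 0.01, 0.02], x0=[1.0, 1.0])
print(x_hat, "after", n_iter, "iterations")
```

A trained DNN, by contrast, produces its estimate in a single forward pass, with no linear system to assemble or solve at inference time.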

Case Study II
The modified IEEE standard 69-bus distribution network (shown in Figure 10) is chosen for the second case study, and the effectiveness of the proposed method is evaluated using this system. The system is suitably adapted to include a mix of commercial and residential loads and DGs. A set of experimental data (available for a period of one year), obtained from Open Energy Information (OpenEI) [64], is utilized with a simulation time step of 1 h. Three photovoltaic (PV) panels are placed [65] on buses 21, 62, and 64 with maximum generation of 929.7 kW, 1075.2 kW, and 992.5 kW, respectively. The hourly power generation data of a photovoltaic system are computed and adopted based on actual data for Bozeman, MT, USA [66,67]. The reactive power of bus $i$ is defined from its active power and power factor $Pf_i(t)$, where $Pf_i(t) \sim \mathrm{Unif}(0.85, 0.95)$.
1. Three PMUs, located at buses 20, 60, and 67 [63], are assumed to be the available measurement devices in the network. The bus voltage and branch current magnitudes from each PMU are used for detection of FDIAs and for performing DSSE using the proposed method. We have 8760 samples with a temporal resolution of one sample per hour for each PMU, resulting in data for one year.
2. Two types of attack vector are injected into the available measurements: (1) $z_a = \{(1 \pm 0.05)z, (1 \pm 0.1)z\}$, and (2) $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$, where $\mu_z$ and $\sigma_z$ are the average and standard deviation of each measurement vector, respectively. In the second case, we assume that the attacker knows the distribution of each measurement and falsifies measurements based on the true measurement distribution.
3. All the data are then split into train and test sets with a ratio of 67% to 33%.
4. The pseudo-measurements of active and reactive power injections and flows are generated to make the system observable and to perform WLS calculations with the inclusion of PMU measurements.
5. Voltage magnitudes and phase angles of all buses are considered as state variables, where $\delta_i$ and $V_i$ are the voltage phase angle and magnitude of bus $i$, respectively, and $N$ is the number of buses. It is assumed that there are no measurement devices installed at the slack bus and that $\delta_1 = 0$ and $V_1 = 1$ p.u.
6. The standard deviation is considered as 50% of the nominal value for pseudo-measurements and 3% of the actual value of active and reactive power flow measurements. A Gaussian error, with $3\sigma = 1\%$, is added to the PMU measurements (i.e., the voltage and branch current magnitudes) to model the uncertainty. To model zero injection buses, the error in virtual measurements is considered to be $10^{-8}$.
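The data preparation steps listed above can be sketched as follows. Several details are assumptions made for illustration: the reactive power is computed with the standard power-factor relation $Q = P \tan(\arccos(Pf))$ (the defining equation is not reproduced in this text), "$3\sigma = 1\%$" is read as three standard deviations equalling 1% of the true value, and the train/test split is taken chronologically. All function names are hypothetical.

```python
import math
import random

rng = random.Random(0)


def reactive_power(p_kw, pf):
    """Reactive power from active power and power factor (standard relation)."""
    return p_kw * math.tan(math.acos(pf))


def add_pmu_noise(value, rng, three_sigma=0.01):
    """Add Gaussian error whose 3-sigma band is 1% of the true value (assumption)."""
    sigma = three_sigma * abs(value) / 3.0
    return rng.gauss(value, sigma)


def split_train_test(samples, train_ratio=0.67):
    """Chronological 67%/33% split (the ordering of the split is an assumption)."""
    n_train = int(train_ratio * len(samples))
    return samples[:n_train], samples[n_train:]


# One year of hourly samples: hypothetical active power with a random power factor.
hours = 8760
p_load = [100.0 + 20.0 * math.sin(2.0 * math.pi * t / 24.0) for t in range(hours)]
q_load = [reactive_power(p, rng.uniform(0.85, 0.95)) for p in p_load]

noisy = add_pmu_noise(1.0, rng)  # e.g. a 1.0 p.u. voltage magnitude
train, test = split_train_test(list(zip(p_load, q_load)))
print(len(train), len(test), round(noisy, 4))
```

With 8760 hourly samples, this split yields 5869 training samples and 2891 test samples.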
The results in Table 5 for the proposed method and the corresponding binary classification-only method show that the proposed method successfully detected most FDIAs on each PMU measurement when the false data, $z_a = \{(1 \pm 0.05)z, (1 \pm 0.1)z\}$, are injected randomly into 10% of the available measurements. The false data are the same for the independent binary classification method. The table shows that the accuracy and F1-score of the proposed method (0.954 and 0.5334, respectively) are better than when performing binary classification only (0.916 and 0.521, respectively).
The results of the DSSE calculation from the proposed method, the regression-only method, and the WLS method are shown in Figure 11. The proposed method performs better with respect to the MPE and MAE criteria than the other methods in the presence of bad data. When no bad data are present, both DNN-based methods (regression-only and combined regression and classification) have similar state estimation performance. The WLS estimator has the worst performance in both cases, and its performance is significantly worse in the presence of bad data measurements.
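The two regression criteria used throughout these comparisons can be computed as in the sketch below. It assumes that MPE denotes the mean absolute percentage error relative to the true state value and that MAE is the mean absolute error; the sample values are hypothetical and are not taken from the paper's figures.

```python
def mean_absolute_error(true_values, estimates):
    """MAE: average absolute deviation between true and estimated states."""
    return sum(abs(t - e) for t, e in zip(true_values, estimates)) / len(true_values)


def mean_percentage_error(true_values, estimates):
    """MPE in percent (assumed to be the mean absolute percentage error).

    Requires nonzero true values, which holds for voltage magnitudes.
    """
    total = sum(abs((t - e) / t) for t, e in zip(true_values, estimates))
    return 100.0 * total / len(true_values)


# Hypothetical per-unit voltage magnitudes and their estimates.
v_true = [1.00, 0.98, 0.95]
v_est = [1.01, 0.97, 0.95]
mae = mean_absolute_error(v_true, v_est)
mpe = mean_percentage_error(v_true, v_est)
print(f"MAE = {mae:.5f} p.u., MPE = {mpe:.3f}%")
```

A percentage-based criterion is not well defined for states whose true value is zero, such as the slack bus angle, so phase angles may need to be evaluated with MAE or with the reference bus excluded.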


False Data Injection Attacks on Measurements with $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$
The results in Table 6 for the proposed method and the corresponding binary classification-only method show that the proposed method successfully detected most FDIAs on each PMU measurement when false data $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$ are injected randomly into 5% of the available measurements, and that it works better than an independent binary classification method.
The accuracy and F1-score of the proposed method (0.9723 and 0.722, respectively) are better than when performing binary classification only (0.970 and 0.702, respectively).
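The gap between accuracy and F1-score in these results follows from class imbalance: when only 5% of the samples are attacked, correctly labelling the many unattacked samples dominates the accuracy, while the F1-score depends only on how the attacked class is handled. The sketch below computes both metrics from predicted labels; the counts in the example are hypothetical and are not taken from Table 6.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels (1 = attacked, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2.0 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Hypothetical example: 100 samples, 5 attacked; 3 detected, 1 false alarm.
y_true = [1] * 5 + [0] * 95
y_pred = [1, 1, 1, 0, 0] + [1] + [0] * 94
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, F1 = {f1:.3f}")
```

In this example the accuracy is 0.97 even though two of the five attacks are missed, whereas the F1-score is about 0.67.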
Figure 12 shows the results of the DSSE calculation from the proposed method, the regression-only method, and the WLS method. The proposed method performs better in the MPE and MAE criteria than the other methods in the presence of bad data, which is similar to what was seen in the previous case study. When no bad data are present, both DNN-based methods (regression-only and combined regression and classification) have similar state estimation performance. The WLS estimator has the worst performance in both cases, and its performance is significantly worse in the presence of bad data measurements.
Table 7 shows the results for the proposed method and the corresponding binary classification-only method. The proposed method successfully detected most FDIAs on each PMU measurement when false data $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$ are injected randomly into 10% of the available measurements, and it works better than an independent binary classification method. The accuracy and F1-score of the proposed method (0.9619 and 0.792, respectively) are also better than when performing binary classification only (0.957 and 0.770, respectively).
The results of the DSSE calculation from the proposed method, the regression-only method, and the WLS method are shown in Figure 13. The proposed method performs better in the MPE and MAE criteria than the other methods in the presence of bad data. Once again, when no bad data are present, both DNN-based methods (regression-only and combined regression and classification) have similar state estimation performance. The WLS estimator has the worst performance in both cases, and its performance is significantly worse in the presence of bad data measurements.
In Table 8, the execution times for the proposed method, the regression-only method, and the WLS method are shown when the testing data are the same for each method. The execution time is decreased significantly when regression and binary classification are performed simultaneously by a joint DNN model. The WLS execution time is ten times greater than that of the other two methods. As mentioned previously, each iteration recalculates the Jacobian matrix, which is based on the physical parameters of the network, and this increases the execution time. Therefore, applying the proposed data-based approach decreases the execution time significantly.

Conclusions
In this paper, a new method using a DNN approach is proposed to simultaneously perform DSSE calculation and FDIA detection on measurements in distribution networks. Voltage magnitudes and phase angles are defined as the state vector variables in this study. The proposed method considers the constraints of DG penetration and the limitations on the installation of measurement tools in distribution networks, making it more applicable in real-world settings. A DNN model with two hidden layers is designed to perform both regression (DSSE calculation) and binary classification (FDIA detection), and the results are compared with those obtained when regression and binary classification are carried out with two separate DNN models using the same hyperparameters as the proposed DNN model. Moreover, DSSE calculation based on the WLS method, using PMU measurements and pseudo-measurements, is performed to compare data-based and model-based approaches. In this work, we showed that the proposed method can perform DSSE calculation accurately from corrupted measurements while simultaneously identifying FDIAs on those measurements with high accuracy. We consider two case studies to verify the proposed method: the IEEE 33-bus system without DGs and the IEEE 69-bus system with DGs. MPE and MAE values are used to evaluate the DSSE results for the proposed method, the disjoint DNN method, and the WLS method. Accuracy and F1-score are used to evaluate the binary classification task. False data vectors are defined as being of two types: (1) $z_a = \{(1 \pm 0.05)z, (1 \pm 0.1)z\}$, and (2) $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$.
For the 33-bus case study, DSSE is performed using the proposed method, achieving 0.93% and 1.99% MPE for the first and second false data vectors, respectively. The accuracy for bad data detection is 93% and 92% for the first and second false data vectors, respectively, when 10% of each of the PMU measurements are corrupted by FDIAs. The execution time for the proposed method (min: 0.40 s, max: 0.63 s) is much shorter than for the WLS method (min: 23.49 s, max: 30.91 s). For the 69-bus case study, DSSE is performed using the proposed method, achieving 1.98% and 2.01% MPE for the first and second false data vectors, respectively. The accuracy for bad data detection is 0.95 and 0.72 for the first and second false data vectors, respectively, when 10% of each of the PMU measurements are corrupted by FDIAs. The execution time for the proposed method (min: 0.15 s, max: 0.63 s) is much shorter than for the WLS method (min: 43 s, max: 51 s). The difference in execution time between the simultaneous and disjoint DNN models was insignificant.
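The joint model summarized in these conclusions can be pictured as a shared trunk with two output heads. The following forward-pass sketch is a rough illustration under stated assumptions, not the authors' architecture: the layer widths, ReLU and sigmoid activations, one classification node per measurement, and an unweighted sum of mean squared error and binary cross-entropy as the joint loss are all choices made for the example, and no training loop is included.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one weight row per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def init_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiOutputDNN:
    """Two shared hidden layers feeding a regression and a classification head."""

    def __init__(self, n_meas, n_states, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.h1 = init_layer(n_meas, n_hidden, rng)
        self.h2 = init_layer(n_hidden, n_hidden, rng)
        self.reg_head = init_layer(n_hidden, n_states, rng)  # state estimates
        self.cls_head = init_layer(n_hidden, n_meas, rng)    # attack flags

    def forward(self, z):
        a = relu(dense(z, *self.h1))
        a = relu(dense(a, *self.h2))
        states = dense(a, *self.reg_head)
        attack_prob = sigmoid(dense(a, *self.cls_head))
        return states, attack_prob


def joint_loss(states, attack_prob, x_true, labels, cls_weight=1.0):
    """MSE on the states plus binary cross-entropy on the attack flags."""
    eps = 1e-12
    mse = sum((s - x) ** 2 for s, x in zip(states, x_true)) / len(x_true)
    bce = -sum(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
               for p, y in zip(attack_prob, labels)) / len(labels)
    return mse + cls_weight * bce


# Hypothetical sizes: 6 PMU measurements in, 4 state variables out.
model = MultiOutputDNN(n_meas=6, n_states=4)
z = [1.0, 0.98, 0.97, 0.35, 0.30, 0.28]
states, attack_prob = model.forward(z)
loss = joint_loss(states, attack_prob,
                  x_true=[0.0, -0.01, 1.0, 0.98], labels=[0, 0, 1, 0, 0, 0])
print(len(states), len(attack_prob), round(loss, 4))
```

Sharing the hidden layers means both outputs come from a single forward pass, which is the structural reason one joint model can replace two separate models at inference time.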

Figure 1. General Effect of FDIAs on Measurements and DSSE Procedure.

Figure 3. (a) A regression DNN architecture for DSSE calculation; (b) a binary classification DNN model for FDIA detection on measurements; (c) the proposed DNN model configuration to perform DSSE calculation and FDIA detection on measurements simultaneously.


Figure 8. DSSE Results Obtained Using the Proposed Method, the Regression-only DNN Model, and WLS with or without Bad Data when $N_a = 600$ (5%) and $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$.

Figure 9. DSSE Results Obtained Using the Proposed Method, the Regression-only DNN Model, and WLS with or without Bad Data when $N_a = 1200$ (10%) and $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$.

Figure 10. The Modified IEEE Standard 69-Bus Distribution System.

Figure 12. DSSE Results Obtained Using the Proposed Method, the Regression-only DNN Model, and WLS with or without Bad Data when $N_a = 438$ (5%) and $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$.

Figure 13. DSSE Results Obtained Using the Proposed Method, the Regression-only DNN Model, and WLS with or without Bad Data when $N_a = 876$ (10%) and $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$.

Table 1. Confusion Matrix Showing Bad Data Detection Results and Accuracy Values Obtained Using the Proposed Method and the Binary Classification-only DNN Model when $N_a = 1200$ (10%) and $z_a = \{(1 \pm 0.05)z, (1 \pm 0.1)z\}$.

Table 2. Confusion Matrix Showing Bad Data Detection Results and Accuracy Values Obtained Using the Proposed Method and the Binary Classification-only DNN Model when $N_a = 600$ (5%) and $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$.

Table 4. Execution Time for the Proposed Method, Regression-only Method, and WLS Method.

Table 6. Confusion Matrix Showing Bad Data Detection Results and Accuracy Values Obtained Using the Proposed Method and the Binary Classification-only DNN Model when $N_a = 438$ (5%) and $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$.

Table 7. Confusion Matrix Showing Bad Data Detection Results and Accuracy Values Obtained Using the Proposed Method and the Binary Classification-only DNN Model when $N_a = 876$ (10%) and $z_a \sim \mathcal{N}(\mu_z, \sigma_z)$.

Table 8. Execution Time for the Proposed Method, Regression-only Method, and WLS Method.