A Hybrid Framework for Detecting and Eliminating Cyber-Attacks in Power Grids

The work described in this paper aims to detect and eliminate cyber-attacks in smart grids that disrupt the process of dynamic state estimation. This work makes use of an unsupervised learning method, called hierarchical clustering, in an attempt to create an artificial sensor to detect two different cyber-sabotage cases, known as false data injection and denial-of-service, during the dynamic behavior of the power system. The detection process is conducted by using an unsupervised learning-enhanced approach, and a decision tree regressor is then employed for removing the threat. The dynamic state estimation of the power system is done by Kalman filters, which provide benefits in terms of the speed and accuracy of the process. Measurement devices in utilities and buses are vulnerable to communication interruptions between phasor measurement units and operators, who can be easily manipulated by false data. While Kalman filters are incapable of detecting the majority of such cyber-attacks, this article shows that the proposed unsupervised machine learning method is able to detect more than 90 percent of the mentioned attacks. The simulation results on the IEEE 9-bus system with 3 machines and the IEEE 14-bus system with 5 machines verify the efficiency of the proposed approach.


Introduction
Many industries are becoming more modernized as technology advances, including power systems [1]. High-speed internet is being used as the primary means of communication between various sectors of the power grid. Cyber-attacks pose a significant threat to various industries, as technologies increasingly rely on wireless communications, and most power system operations, such as energy management programs, state estimation, and optimal power flow, depend on safe and reliable communications [2,3]. Dynamic state estimation (DSE) is an important tool for monitoring and controlling the power network, especially when the system is operating in the transient mode [4]. It tracks the behavior of the power system in this mode and usually employs Kalman filters to perform the state estimation process. Modernized power systems, known as smart grids, rely heavily on wireless communication, making them vulnerable to cybercriminals who can tamper with data derived from phasor measurement units (PMUs) [5]. DSE usually uses PMU data as inputs for estimating the dynamic behavior of the power system, so the communication between PMUs and central control is of paramount importance and must be protected against cyber-attacks.
Because traditional methods based on various types of Kalman filters are only effective against certain types of cyber-sabotage [5], a machine learning-enhanced method is needed to improve the detection of these attacks. In this article, clustering and ... In Towards Data Science, Lorraine Li used decision trees for regression and classification problems [31], and in [32], the authors propose a power system toolbox for MATLAB. An EKF/UKF toolbox is proposed in [33]. In [34], a toolbox named PSAT is proposed for dynamic analysis of the power system. It is worth noting that all of the mentioned toolboxes are employed to produce the simulation results in this paper.
Reference [35] proposes a machine learning-based approach to detect FDI attacks in the power system. In [36], the authors propose a linear approach to detect cyber-attacks and outliers in PMU-based power system state estimation, and in [37], a supervised learning-based approach is proposed to detect DoS attacks in smart grids. Table 1 shows a comparison of the methods studied in this article and others in the same field. As illustrated in Table 1, this article contributes to the field in at least two ways. Firstly, it employs two machine learning-based methods to detect and eliminate the cyber-threats. Secondly, it draws a comparison between Kalman filters and a proposed hybrid machine learning method for the same purpose. The rest of this paper is organized as follows. Section 2 formulates models for DSE, fourth-order generator, cyber-attacks and the two Kalman filters utilized for simulation. Machine learning methods are described in Section 3. In Section 4, the proposed approach is detailed, and in Section 5 the proposed method is examined by different case studies, with the results of the simulations being illustrated as well. The paper is concluded in Section 6.

Methods
This section firstly presents the generator model of the power system and then describes attack models.

DSE and the Generator Model
The procedure for estimating the dynamic state of a power system is relatively well known and can be found in many related articles [12,18,23-26]. The relationship between the system's dynamic states and the measurements is formulated in (1), where x is the vector containing the states of the system, y is the measurement vector, u is the control vector, v is the process noise, w is the measurement noise, and k is the iteration index. The non-linear state and measurement functions f and h are given in (3) and (4). It is worth noting that in this article, we assume that x, y, and u are shaped as follows.
where δ and ω are the rotor angle and rotor speed, respectively, ω^y is the measurement of the rotor speed, P_e is the electrical power of the generator, and P_m is the mechanical input power of the generator. U and ϕ represent the voltage magnitude and phase angle of the respective bus, E_f is the field voltage of the synchronous generator, and T_m is the mechanical torque derived from the governor. The fourth-order transient swing equations are formulated below [12,18,23-26].
T_J represents the inertia time constant, T_e is the electromagnetic torque, and D is the damping coefficient. X_d and X_q are the d-axis and q-axis reactances, respectively, while X′_d and X′_q are the d-axis and q-axis transient reactances. T_d0 and T_q0 are the d-axis and q-axis transient time constants, i_d and i_q are the d-axis and q-axis output currents, and E_d and E_q are the d-axis and q-axis voltages of the generator.
where δ^y is the measurement of the rotor angle, and P_e^y is the measurement of the electrical power. Both E_f and T_m are control features that can be obtained from governor and exciter models depending on the power utility, and in both the IEEE 9-bus and IEEE 14-bus systems, the power utilities are assumed to be steam power plants. The random noises v and w were produced by random value generation ranging from "Gaussian" to "Central Limit".
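For reference, the relationship between states and measurements described at the start of this subsection is usually written in the discrete-time form below. This is only a sketch built from the symbol definitions given above (x, y, u, v, w, f, h, k); the time-index convention is an assumption and may differ from the paper's own numbered equations.

```latex
\begin{aligned}
x_{k} &= f\left(x_{k-1}, u_{k-1}\right) + v_{k-1}, \\
y_{k} &= h\left(x_{k}, u_{k}\right) + w_{k}.
\end{aligned}
```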
In this article, two different Kalman filters, the extended Kalman filter (EKF) and the unscented Kalman filter (UKF), are used for the forecasting and filtering stages to improve the speed and accuracy of the DSE. Kalman filtering is an algorithm that estimates unknown variables from measurements observed over time. Kalman filters have proven themselves in a wide variety of applications, are relatively simple and easy to use, and require little computing power. The primary aim of employing Kalman filters in our study is to make use of the minimization ability of both EKF and UKF in a non-linear space to reduce the covariance of the squared error between estimated and real states. The two filters take different approaches to accomplishing this task. Both EKF and UKF formulations can be found in [27], and the two final stages of DSE in this paper, i.e., forecasting and filtering, are based on these. Figure 1 depicts the DSE procedure with the addition of EKF and UKF.
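The forecasting and filtering stages mentioned above can be illustrated with a deliberately simplified example. The sketch below is a scalar, linear Kalman filter, not the EKF or UKF used in the paper; the function name `kalman_step`, the model constants `a` and `c`, and the noise variances `q` and `r` are all illustrative assumptions.

```python
import random


def kalman_step(x_est, p_est, y, a=1.0, c=1.0, q=1e-4, r=1e-2):
    """One forecast/filter cycle of a scalar linear Kalman filter.

    Assumed (hypothetical) model: x_k = a * x_{k-1} + v,  y_k = c * x_k + w,
    with process-noise variance q and measurement-noise variance r.
    """
    # Forecasting stage: propagate the state and its error variance.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Filtering stage: correct the forecast with the new measurement.
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new


random.seed(0)
true_state = 1.0           # constant state to be tracked
x_est, p_est = 0.0, 1.0    # poor initial guess with large uncertainty
for _ in range(50):
    y = true_state + random.gauss(0.0, 0.1)   # noisy measurement
    x_est, p_est = kalman_step(x_est, p_est, y)
```

After a few dozen measurements the estimate settles near the true state and the error variance shrinks, which is the behavior the EKF and UKF aim for in the non-linear case.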

Attack Models
Cyber-criminals can easily manipulate the DSE process by changing the measurement data derived from PMUs. Numerous attack scenarios are possible in cyber-enhanced sabotage, particularly in dynamic situations, ranging from false data injection (FDI) and denial-of-service (DoS) to spoofing attacks. Malfunctions and bad data frequently cause disasters and significantly impact the power network's monitoring system, which is responsible for the grid's smooth operation. This section goes over two different attack scenarios. Figure 2 depicts a brief summary of how cyber-attacks are carried out on power grids.

FDI Attack
Assume the state vector derived from Kalman filters as x = [x_1, x_2, x_3, ..., x_n]^T, in which n is the number of states, and the measurement vector as y = [y_1, y_2, y_3, ..., y_m]^T, where m is the number of measurements. The invader targets measurements for the FDI attack because the DSE relies heavily on PMU records and can be easily manipulated by incorrect data. As shown in (1), measurements depend on the states and the control vector, so the residual ε is defined as the difference between the measured value and the value calculated from the states and the control vector. It is worth noting that in the optimum situation, the residual is exactly zero. Assume the attack vector as A = [A_1, A_2, A_3, ..., A_m]^T. Under the FDI attack, the residual is computed from the attacked measurements, and the outcome of the process is an incorrect x, which will significantly impact operators making critical decisions and may also lead to power outages. In conclusion, when the residual takes the shape of A, the attack has succeeded, and the system states are going to be inaccurate, as illustrated in Figure 3. The subplot in Figure 3 shows how the true measurements change under the FDI attack. As illustrated in that figure, some of the measurements randomly increase or decrease.
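The effect of the attack vector on the residual can be sketched numerically. The example below assumes the additive form y_A = y + A and the residual ε = y − h(x, u), which is one common reading of the description above rather than the paper's exact equations. The measurement values, the function names, and the standard deviation of 0.2 are hypothetical; the normal distribution mirrors the one reported later for the FDI simulations.

```python
import random


def residual(y_measured, y_calculated):
    """Element-wise difference between measured and calculated values."""
    return [ym - yc for ym, yc in zip(y_measured, y_calculated)]


def inject_fdi(y, attack):
    """Assumed additive FDI model: y_A = y + A."""
    return [yi + ai for yi, ai in zip(y, attack)]


random.seed(1)
y_calc = [1.00, 0.98, 1.02, 0.75]   # hypothetical values of h(x, u)
y_true = list(y_calc)               # noise-free measurements for clarity
attack = [random.gauss(0.0, 0.2) for _ in y_true]   # attack vector A

y_attacked = inject_fdi(y_true, attack)
eps_clean = residual(y_true, y_calc)         # all zeros in the ideal case
eps_attacked = residual(y_attacked, y_calc)  # takes the shape of A
```

In this idealized setting the clean residual is zero and the attacked residual reproduces A, which matches the statement that a residual shaped like A indicates a successful attack.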

DoS Attack
This attack is based on data loss, and it can mislead operators by disrupting communication between PMUs and data centers, resulting in power system failures, such as blackouts. A DoS attack can be simulated in a variety of ways, and the Bernoulli distribution [24] method is considered in this article. DoS attacks can be carried out at various intervals or concurrently during the grid's transient time. Assume that y = [y_1, y_2, y_3, ..., y_m]^T is the measurement vector, in which m is the number of measurements. The interval of the DoS attack is assumed to be a vector named t_D, and the attack lasts until t_Dk, so the time vector is t_D = [t_D0, t_D1, t_D2, ..., t_Dk]. By employing a Bernoulli distribution, the attack vector A is generated for the instants in t_D.
The DoS attack results in a new, attacked measurement vector named y_A, and, as in (6), the residual ε after the DoS attack is calculated from y_A instead of y. Therefore, by corrupting the measured data, the attack causes the DSE to fail to estimate the true states, and operators will be misled by the wrong information delivered by the DSE at several different times (within t_D). In Figure 4, the mechanism of the DoS attack is illustrated. The subplot in Figure 4 shows how the true measurements change under the DoS attack. As illustrated in that figure, some of the measurements, depending on the Bernoulli probability, become zeros.
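A minimal sketch of the Bernoulli-based DoS model follows. It assumes that each measurement is independently lost with a fixed probability and that a lost measurement is received as zero, i.e., y_A is the element-wise product of y and a 0/1 mask. The names `bernoulli_mask`, `apply_dos`, and `loss_prob`, as well as the sample values, are hypothetical.

```python
import random


def bernoulli_mask(m, loss_prob, rng):
    """Draw m Bernoulli samples: 0 means the packet is lost, 1 means delivered."""
    return [0 if rng.random() < loss_prob else 1 for _ in range(m)]


def apply_dos(y, mask):
    """Assumed DoS model: lost measurements are replaced by zeros."""
    return [yi * ai for yi, ai in zip(y, mask)]


rng = random.Random(42)
y = [1.00, 0.98, 1.02, 0.75, 1.01, 0.99]   # hypothetical PMU measurements
mask = bernoulli_mask(len(y), loss_prob=0.5, rng=rng)
y_attacked = apply_dos(y, mask)
```

Raising `loss_prob` increases the share of zeroed measurements, which corresponds to a more intense attack.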

Energies 2021, 14

AI-Based Methods
In this article, an unsupervised learning method, hierarchical clustering (HC), and a supervised method, the decision tree regressor (DTR), are employed to facilitate the detection and elimination process.

Hierarchical Clustering
HC is a broad category of clustering algorithms that construct nested clusters by successively merging or splitting them. This cluster hierarchy is portrayed as a tree (or dendrogram). The tree's root is a single cluster that collects all of the samples, while the leaves are clusters with just one sample [28], which makes HC a suitable method for detecting cyber-attacks on a power system from the sample data derived from PMUs. In this article, the agglomerative type of HC is employed, which is a bottom-up approach: each observation begins in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Figure 5 represents the dendrogram of agglomerative HC.
The distance metrics for HC range from the Euclidean distance to the Mahalanobis distance. The most common distance metric for agglomerative clustering is the Euclidean distance. Assume that a and b are two different data vectors. The Euclidean distance is formulated as below.
$$ d(a, b) = \lVert a - b \rVert_2 = \sqrt{\sum_{i} (a_i - b_i)^2} $$
The linkage criterion determines the distance between sets of observations as a function of the pairwise distances between observations. Commonly used linkage criteria are complete-linkage clustering and single-linkage clustering. Assume that A and B are two sets of observations. The single-linkage clustering is formulated as below.
$$ D(A, B) = \min \{\, d(a, b) : a \in A,\ b \in B \,\} $$
where d is the distance metric.
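The Euclidean metric and the single-linkage criterion can be combined into a small agglomerative routine. The sketch below is a plain, unoptimized illustration and not the modified HC used in the paper; the stopping rule (`threshold`), the convention that a cluster left with a single sample is treated as an outlier, and the sample values are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two data vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def single_linkage(cluster_a, cluster_b):
    """Smallest pairwise distance between members of two clusters."""
    return min(euclidean(a, b) for a in cluster_a for b in cluster_b)


def agglomerate(points, threshold):
    """Bottom-up merging; stops when the closest clusters exceed threshold."""
    clusters = [[p] for p in points]   # every observation starts alone
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break                      # remaining clusters are too far apart
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical rotor-speed samples (p.u.); the last one mimics injected data.
samples = [(1.00,), (1.01,), (0.99,), (1.02,), (1.60,)]
clusters = agglomerate(samples, threshold=0.1)
outliers = [c[0] for c in clusters if len(c) == 1]
```

Here the four consistent samples merge into one cluster while the injected sample stays isolated, so it is the only one flagged.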

Decision Tree
Decision trees are a non-parametric supervised learning method for classification and regression tasks [29]. The goal is to learn basic decision rules from data features to construct a model for prediction. Used at the elimination stage, DTR is employed as a prediction method to prevent the manipulation of the operators and the resulting disasters. A supervised learning technique requires labeled data to train the model, and, for that purpose, we simulated numerous dynamic events and trained the tree regressor with different non-attacked data. DTR can work with time series and continuous values such as those encountered in power networks, making this method well suited for this purpose [30]. DTR does not necessarily require pre-scaling or pre-processing of data, which is useful in the case of generators' angles, as the angle is forecasted in degrees. Missing values in the power system data also do not affect the process of building a decision tree to any considerable extent. Finally, when combined with bagging as described next, DTR is less prone to overfitting and learns quickly compared to other methods.
Assume a training data set X = {x_1, x_2, ..., x_n} with the responses Y = {y_1, y_2, ..., y_n}, in which n is the number of samples. Bagging repeatedly (B times) draws a random sample with replacement from the training set to boost the accuracy. Therefore, samples of training data X_b, Y_b, b = 1, 2, ..., B, are created, a tree regressor is fitted to each of them, and unseen samples x′ are then predicted. A considerable advantage is that, with respect to the whole forest trained on the bagged datasets, the variance is decreased without increasing the bias, meaning that the model is not sensitive to noise. Denoting the tree regressor fitted on the b-th sample by f_b, the predicted value for test data x′ is calculated as follows.
$$ \hat{f}(x') = \frac{1}{B} \sum_{b=1}^{B} f_b(x') $$
and the standard deviation can be formulated as below:
$$ \sigma = \sqrt{\frac{\sum_{b=1}^{B} \left( f_b(x') - \hat{f}(x') \right)^2}{B - 1}} $$
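The bagging procedure described above can be sketched in a few lines. To stay short, the example replaces a full decision tree regressor with a depth-one regression tree (a stump), which is a simplification and not the paper's model; `fit_stump`, `bagged_predict`, `n_trees`, and the toy data are hypothetical names and values.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold, two leaf means."""
    best = None
    for t in sorted(set(xs))[1:]:   # candidate thresholds keep both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                # all inputs identical: predict the mean
        mean = statistics.fmean(ys)
        return lambda x: mean
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr


def bagged_predict(xs, ys, x_new, n_trees=25, seed=0):
    """Average the predictions of stumps fitted on bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        tree = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(tree(x_new))
    # Mean over the B trees and sample standard deviation (divisor B - 1).
    return statistics.fmean(preds), statistics.stdev(preds)


xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
ys = [0.0] * 5 + [1.0] * 5          # step-shaped toy response
mean_pred, std_pred = bagged_predict(xs, ys, x_new=0.85)
```

The spread of the individual tree predictions gives a rough indication of how uncertain the averaged estimate is.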

Problem Formulation
Both machine learning methods described in the previous section are employed to spot and eliminate cyber-attacks. To tackle the cyber-sabotage problem in the power system, a modified version of HC is employed. The main idea behind this approach is to identify and eliminate the anomalous data and to reduce the features used as input to the DTR so that the algorithm predicts the correct states of the system. The challenge of the proposed method is to maintain the high accuracy of its predicted states (rotor angle and rotor mechanical speed), since reducing the dimension of the main features strongly influences the accuracy of this method. The main features chosen in this work are as follows. In which fea is the vector of features, and the clustering distortion is formulated as below [28].
where s_i is the ith sample, µ_c are the cluster centroids, m is the number of clusters, and c = [1, 2, 3, ..., k].
HC is a vital tool for detecting anomalies: when a data point is far from the other tree roots or leaves, it is usually clustered as an outlier with respect to the threshold set for the method. Therefore, for each measurement type (voltage, speed, etc.), an HC algorithm is utilized to detect the attacked data. As the data derived from PMUs flow in, the HC accepts the new data and starts clustering. If a data point is clustered as an outlier, the algorithm sends it to the DTR, which uses the other features to predict the real value of the attacked data. The predicted value is then sent to the DSE, while HC deletes the attacked data from its database. If the data point is not an outlier, it is sent directly to the DSE for state estimation. To use a decision tree for regression, we need an impurity metric appropriate for continuous variables, so we define the impurity measure as the weighted mean squared error (MSE) of the child nodes [31].
$$ I(t) = \mathrm{MSE}(t) = \frac{1}{N_t} \sum_{i \in D_t} \left( a^{(i)} - \hat{a}_t \right)^2, \qquad \hat{a}_t = \frac{1}{N_t} \sum_{i \in D_t} a^{(i)} $$
where N_t is the number of samples at leaf t, D_t is the training subset at that leaf, a^(i) is the true target value, and â_t is the estimated target value. It is worth noting that the mentioned equations are used for the training process of the DTR. Figure 6 illustrates the flowchart of the proposed method.
For comparing different methods, some indices are defined and used as follows [12,36,37].
where N is the number of samples, while x̂_i and x_i^true are the estimated and true states, respectively. S_a and S_ad are the numbers of attacked data and detected attacked data, respectively. M_c is the number of Monte Carlo replications, which in this article is set to 100. T and t_0 are the end and the starting time of the period in which the cyber-attack was launched, respectively. The first index evaluates the estimation results, while the second one is the attack classification ratio. The last index represents the least squared error measure.
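The weighted-MSE impurity used for training the DTR, described earlier in this section, can be checked on a small example. The sketch below follows the standard regression-tree formulation, in which a split is scored by the sample-weighted MSE of its two child nodes; the function names and the target values are hypothetical.

```python
def mse(targets):
    """Mean squared error of targets around their own mean (leaf estimate)."""
    mean = sum(targets) / len(targets)
    return sum((a - mean) ** 2 for a in targets) / len(targets)


def weighted_child_mse(left, right):
    """Impurity of a split: child MSEs weighted by their share of samples."""
    n = len(left) + len(right)
    return len(left) / n * mse(left) + len(right) / n * mse(right)


# Hypothetical target values forming two well-separated groups.
parent = [0.9, 1.0, 1.1, 2.9, 3.0, 3.1]
good_split = weighted_child_mse(parent[:3], parent[3:])   # separates the groups
bad_split = weighted_child_mse(parent[:1], parent[1:])    # cuts off one sample
```

A tree-growing algorithm would pick the candidate split with the lowest weighted impurity, which here is the one that separates the two groups.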

Simulation and Results
Here, the proposed method was tested on the IEEE 3-machine 9-bus system and the IEEE 5-machine 14-bus system. The data of these test systems are derived by using the MATLAB power system toolbox [32], and the EKF and UKF methods are from the EKF/UKF toolbox [33]. All tests are conducted with MATLAB 2020a and Python 3.8. A sudden load fluctuation occurred at t = 0.1 and lasted for 1 s in both test systems. The PMU sample rate is 120 samples per second, and a PMU is utilized at each generator bus.
Two case studies are presented in this article, and various cyber-attacks are employed for the simulation process. Both FDI and DoS attacks are simulated with different attack vectors and probabilities, as illustrated in Table 2. It is worth noting that the base rotor speed is 376.8 rad/s for both case studies, and the base generator angle is 1 degree. The cyber-attacks were launched over t = 4.2 s and exerted a significant influence on the DSE. The HC clustered all the features simultaneously by taking the distortion level of the features into account, and the DTR was responsible for clearing the attack and correcting the states. It is worth noting that the DTR was trained with data from numerous different contingencies, ranging from three-phase faults to lightning strikes, all of which are available in a MATLAB power system toolbox named PSAT [34].

FDI-first scenario
In the three scenarios of FDI cyber-attack, the "Normal Distribution" is employed with different standard deviations for simulating the attacks [35]. In the DoS cases, a "Packet Loss Ratio" is utilized for simulating the DoS attack process with four different intensities. Figure 7 shows the schematic of the IEEE 3-machine 9-bus and IEEE 5-machine 14-bus systems. The whole simulation time is about 10 s, while the distortion constant is set to 10 for the IEEE 9-bus and 30 for the IEEE 14-bus. Figure 8 illustrates the first generator's states derived from the DSE, aided by EKF, UKF, and the proposed method under the three FDI cyber-attack scenarios. Figure 9 shows the dynamic states of the mentioned generator calculated by DSE under DoS cyber-sabotages for the IEEE 3-machine 9-bus test system.
From Figure 8a-c, it is clear that the proposed method boosted the accuracy of the DSE, especially during the FDI cyber-attacks, a task in which both EKF and UKF performed poorly. It is worth noting that before the cyber-attack, all three methods accurately estimated the dynamic states of the network. After the cyber-attack was launched, however, the Kalman filters failed to detect and eliminate the attacks. The situation deteriorates in the case of DoS attacks. From subplots (a) to (d) of Figure 9, it can be observed that the mentioned filters almost failed to eliminate the attacks, while the proposed DTR-based method properly detected and eliminated the attack vectors.
In Figure 10a, an example of an attacked dataset detected by the HC method is illustrated, while in Figure 10b, a feature is shown which is not attacked. Both of the mentioned figures are heatmaps plotted with the scatter function in Python with "cmap" set to cool. The former is the rotor speed of the second generator, and the latter is the voltage angle of bus three. Figure 11a,b shows the clustering inertia of both mentioned features. The accuracy of the proposed method depends significantly on the accurate functioning of the clustering method, which identifies the attacked features.
Figure 10. (a) Rotor speed of the second generator in IEEE 9-bus system under the FDI attack; (b) voltage angle of the third bus in IEEE 9-bus system.
The proposed indices are calculated and compared to another related study in Tables 3 and 4 under different attack scenarios in the IEEE 3-machine 9-bus test system. It is worth noting that the method proposed in [12] is RCKF.
From Figure 11a,b, it is clear that as soon as the attacked measurement of rotor speed enters the HC, the distortion of only one cluster rises rapidly, and the injected data is eliminated, while all of the voltage angle data are correct and the distortion for only one cluster is smaller than d. The second index has the same value for both rotor speed and angle, as it measures the detection accuracy of HC and does not depend on any individual features. By taking the first index into account, the proposed method works slightly better than that of [12], which shows the higher accuracy of the proposed method.
From Tables 3 and 4, it can be seen that the HC-DTR-based dynamic state estimation outperformed the RCKF technique. The HC model managed to detect the attacked data better than the Kalman filter algorithm, and the DTR predicted the actual values more robustly than the method conducted in [12]. Figure 12 illustrates the generator's states in the IEEE 5-machine 14-bus system under three different FDI attack scenarios, while Figure 13 shows the generator's states under DoS attack scenarios. In this test system, only UKF was employed as an alternative method due to the low accuracy of EKF.
As is clear from Figure 12, the accuracy of the proposed machine learning-based method is far better than that of the UKF, even in more extensive FDI attack scenarios. Similarly, Figure 13 shows that the DoS attack is detected and eliminated by the proposed method, a task at which the UKF failed. The DTR method shows considerable potential in eliminating different cyber-attacks, as illustrated in the mentioned figures for both the IEEE 3-machine 9-bus test system and the IEEE 5-machine 14-bus test system. Figure 14 illustrates the rotor speed data of the second generator and the voltage angle data of the third bus, showing both the attacked and the true features. Both mentioned figures are heatmaps plotted by the scatter function in Python with "cmap" set to warm, while Figure 15 shows the cluster inertia of both features. It is worth noting that the HC method clusters the data simultaneously, which is essential for rapid response against cyber-attacks.
The proposed indices are listed in Tables 5 and 6 for rotor angle and rotor speed, respectively, and compared to results from two other related studies [36,37] for the IEEE 5-machine 14-bus test system. It is worth mentioning that [36] proposed a non-linear method based on a novel Kalman filter for detecting and eliminating the FDI attack, while [37] employed a support vector machine classification-based method for diagnosing the DoS attack.
It is clear from Tables 5 and 6 that the proposed method achieves better detection accuracy in the case of DoS attacks than that of [36] and higher accuracy in estimating the dynamic states of the case study than that of [37]. The other indices show that the proposed method is fully capable of eliminating DoS and FDI cyber-attacks and outperforms the techniques in [36,37].
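The elimination step, in which a regression tree supplies replacement values for flagged measurements, can be sketched as below. This is only an illustration under stated assumptions: the tree is a small hand-written 1-D regression tree, the only feature is the sample time (the regressors actually used are not specified in this section), and `fit_tree`, `predict`, `repair`, and the sample data are hypothetical names and values. A library model such as scikit-learn's `DecisionTreeRegressor` would normally be used instead.

```python
from statistics import fmean


def sse(ys):
    """Sum of squared errors around the mean, used as the split cost."""
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_tree(xs, ys, max_depth=3):
    """Fit a 1-D regression tree; leaves are floats, internal nodes are dicts."""
    pairs = sorted(zip(xs, ys))
    best = None  # (cost, split position k) of the best split found so far
    if max_depth > 0:
        for k in range(1, len(pairs)):
            if pairs[k - 1][0] == pairs[k][0]:
                continue  # cannot split between identical feature values
            cost = sse([y for _, y in pairs[:k]]) + sse([y for _, y in pairs[k:]])
            if best is None or cost < best[0]:
                best = (cost, k)
    if best is None:
        return fmean(ys)  # leaf: predict the mean of the training targets
    k = best[1]
    left, right = pairs[:k], pairs[k:]
    return {
        "threshold": (pairs[k - 1][0] + pairs[k][0]) / 2,
        "left": fit_tree([x for x, _ in left], [y for _, y in left], max_depth - 1),
        "right": fit_tree([x for x, _ in right], [y for _, y in right], max_depth - 1),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree


def repair(times, values, flagged, max_depth=3):
    """Replace flagged samples with tree predictions trained on unflagged samples."""
    clean = [(t, v) for i, (t, v) in enumerate(zip(times, values)) if i not in flagged]
    tree = fit_tree([t for t, _ in clean], [v for _, v in clean], max_depth)
    return [predict(tree, t) if i in flagged else v
            for i, (t, v) in enumerate(zip(times, values))]


# Hypothetical rotor speed series (p.u.); index 4 was flagged by the detector.
times = [0, 1, 2, 3, 4, 5, 6, 7]
values = [1.00, 1.00, 1.01, 1.01, 1.50, 1.02, 1.02, 1.03]
print(repair(times, values, flagged={4}))
```

Because each leaf predicts the mean of clean training targets, the replacement value always lies within the range of the unflagged measurements, so the injected spike cannot propagate to the estimator.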

Conclusions
A two-stage machine learning-based method was proposed in this paper to tackle the cyber-sabotage issue in the smart grid: data are clustered with an HC method to detect the attack, and a DTR is then used to eliminate it. This paper contributes to the area of DSE in power networks by using an unsupervised learning method for attack detection and a decision tree regressor for attack elimination. The technique was capable of detecting and eliminating cyber-attacks and tracking the dynamic states of the power system, which can help human operators avoid wrong decisions during transient periods of power system operation. The proposed method carried out these tasks better than previous methods based on the traditional Kalman filter and support vector machines. By correctly diagnosing the attack vectors, the proposed method provides operators with accurate state estimates, decreasing the risk of blackouts or other disasters caused by wrong commands. However, the full efficiency of the proposed method is yet to be tested in a large-scale power grid network, and the cost of doing so was not considered in the present study. Our future work will also focus on developing effective methods for distinguishing between faults, cyber-attacks, damaged PMUs, and measurement noise in power networks.

Conflicts of Interest:
The authors declare no conflict of interest.