Covert Cyber Assault Detection in Smart Grid Networks Utilizing Feature Selection and Euclidean Distance-Based Machine Learning



Introduction
The emerging smart grid (SG) concept as a cyber-physical complex organization is being implemented through a composition of communications networks overlaying traditional power systems. Due to the electrical energy flow's vital dependency on communications technologies in the SG, its vulnerability to new malicious types of cyber attack is very high. Nation states are apprehensive about power grid privacy and security. Therefore, vigorous and secure communications management is essential to all aspects of the SG. The security, privacy and integrity of data and the information network have become a prime focus of research activities in SGs. Traditionally, bulk storage of the generated electricity is not possible, and hence, its generation should be closely equated to consumption; otherwise, there can be a deviation in the electrical quantities. Thus, the power control center (PCC) needs to monitor the power network closely to make sure that the operation of the power system is safe and reliable. State estimation (SE) is a fundamental approach employed in an energy management system (EMS) to monitor states in power networks.
The fundamental elements of a power system (generation, transmission and consumption), along with communications links, are illustrated in Figure 1. Distributed sensors, actuators and meters, designated as remote terminal units (RTUs), are employed in electric power grids to collect measurements, including bus power injections and branch power flows. These measurements are combined at the PCC via communications links and are then used to estimate the states (i.e., bus voltage angles). These state variables form the basis of decisions by the EMS about automatic generation control (AGC) and optimal power flow (OPF) to keep electric power systems in a safe operating zone. A communications infrastructure is indispensable for efficient monitoring and intelligent control in an SG; however, it is also prone to malicious cyber-assault threats [1][2][3], because it offers certain incentives to the attacker. The unidirectional flow of information in legacy power networks (i.e., from RTUs to the PCC) makes it all the more important to study a particular type of malicious behavior that targets the integrity of the measurement data by inserting a deceptive bias value into the SE. Such malicious activity goes mostly undetected by the bad-data detection (BDD) systems in the legacy PCC. We term this kind of attack a covert cyber deception (CCD) assault; it is also known as a false data injection (FDI) attack or a cyber stealthy deception (CSD) attack, among other names [4]. Identification and removal of the anomalies injected by a CCD assault are critically important because of their negative impact on the safety and reliability of SGs.
Methods reported in the literature to mitigate the effects of CCD assaults on SGs can be broadly divided into two classes: (1) protection-based defense; and (2) detection-based defense. Ozay et al. [5] utilized a variety of machine learning-based schemes to detect the CCD assault in the SG. Sandeep et al. [6] proposed a joint-transformation-based scheme to detect the CCD assault. They utilized the Kullback-Leibler distance to measure the difference between probability distributions obtained from measurement variations. However, they did not employ feature selection (FS)-based techniques to tackle the dimensionality issue that arises with increasing power system sizes. Esmalifalak et al. [7] used a PCA-based feature extraction (FE) technique to tackle the dimensionality issue in the state estimation-measurement feature (SE-MF) dataset and then employed a statistical anomaly detection (AD) mechanism to detect the CCD assault. Unlike the existing schemes, in this paper, we focus on the selection of the discriminating features from the SE-MF dataset utilizing a genetic algorithm (GA) to tackle the curse of dimensionality [8]. The optimal features selected from the SE-MF dataset are then used as input by the two proposed Euclidean distance (ED)-based AD schemes for the detection of a CCD assault. Contrary to the FE-based approach, the proposed FS-based method does not alter the original representation of the data.

Motivation
Normal data are consistent with physical laws, such as Kirchhoff's current and voltage laws, whereas compromised data affected by a CCD assault are inconsistent with these laws. Normal and compromised data will therefore have different distributions and will tend to form different clusters, which are distinguishable in a feature space of suitable dimensions. This fundamental distinction motivates distance-based anomaly detection schemes for the detection of CCD assaults. Unsupervised anomaly detection (AD) techniques are effective at differentiating between data that have different underlying distributions, particularly when the data are not labeled. Similarly, supervised anomaly detection techniques can be employed to detect anomalies in labeled data. Thousands of sensors or RTUs are employed in power grids, spanning a vast geographical area. In practice, a defense mechanism can be designed to protect only a limited set of critical RTUs and the corresponding measurement features (MFs). Therefore, in the context of SG cyber-security, the selection of distinctive features from the SE-MF dataset becomes a promising strategy to detect the CCD assault and tackle the curse of dimensionality [8].

Related Works
The benefits and risks involved in utilizing communications technologies alongside legacy electrical power systems have been widely reviewed [9][10][11][12][13]. Intrusion into a communications network by a malicious user who targets the integrity of the data can have a catastrophic impact on the secure and reliable operation of an SG [14][15][16]. Therefore, in the context of the security of SGs, understanding the nature of the assault and identifying compromised data have been a focus of research in electric power systems. The conventional state estimator in a PCC utilizes BDD to single out and remove bad data before state estimation. However, Liu et al. [17] demonstrated that a smart attacker who has information on the network topology can construct a set of falsified data that evades legacy BDD. This type of attack is known as an unobservable (or covert) cyber assault. Many schemes for constructing intrusion assaults against state estimation, and the subsequent defense measures against them, have been discussed in the literature [4][13][14][15][16][17][18][19][20][21]. Li et al. [18] proposed a decentralized conjunctive rule-based majority voting algorithm to detect compromised or assaulted phasor measurement units. Huang et al. [19] proposed cumulative sum hypothesis test-based bad-data detection in a state estimator. Xie et al. [22] demonstrated that a data integrity assault can systematically result in a considerable economic loss in real-time market operations. Similarly, Esmalifalak and colleagues [23] studied the financial impact of a false data injection attack on electric power market operations. An encryption-based security mechanism integrated into power system devices was proposed in [24] to improve the security of the power system against FDI attacks.
Methods reported in the literature to mitigate the effects of CCD assaults on SGs can be broadly divided into two classes: (1) protection-based defense; and (2) detection-based defense. Thousands of sensors or RTUs are employed in power grids, spanning a vast geographical area. In practice, a defense mechanism can be designed to protect only a limited set of critical RTUs and the corresponding measurement features (MFs). Therefore, in the context of SG cyber-security and computational complexity, the selection of distinctive features becomes a promising strategy to detect the CCD assault in real time [5]. Sandeep et al. [6] proposed a joint-transformation-based scheme to detect the CCD assault. They utilized the Kullback-Leibler distance to measure the difference between probability distributions obtained from measurement variations. However, they did not employ feature selection (FS)-based techniques to tackle the dimensionality issue that arises with increasing power system sizes. Esmalifalak et al. [7] used a PCA-based FE technique to tackle the dimensionality issue in the SE-MF dataset and then employed a statistical AD mechanism to detect the CCD assault.
In summary, existing works on CCD assault detection in SGs have generally considered only feature extraction or transformation [5][6][7] in the context of cybersecurity and the curse of dimensionality. To the best of our knowledge, the selection of distinguishing features from the SE-MF dataset in the context of SG security is still an open problem.

Contributions
In this paper, we focus on the selection of the discriminating features from the SE-MF dataset utilizing the genetic algorithm (GA) to tackle the curse of dimensionality [8]. The optimal features selected from the SE-MF dataset are then used as input by the two proposed Euclidean distance (ED)-based AD schemes for the detection of a CCD assault. Contrary to the FE-based approach, the proposed FS-based method does not alter the original representation of the data. The main contributions of this paper can be summarized as follows:

• We study intelligently-crafted CCD assaults on the SE-MF dataset, and we investigate how such an assault goes undetected in legacy systems that use bad-data detectors.
• To tackle the increasing computational complexity with the growing sizes of power systems, we use GA for the selection of independent and discriminating features from the SE-MF dataset. The selection of discriminative features leads to lower computational costs, a shorter time delay and improved accuracy.
• First, we propose an ED-based AD scheme to detect the presence of outliers in the unlabeled SE-MF dataset. Next, we extend the first scheme to propose a detection mechanism for the labeled SE-MF dataset. In both schemes, the optimal features selected through the GA are employed as input.
• We use the IEEE standard 14-bus, 39-bus, 57-bus and 118-bus test systems to evaluate the efficiency of the proposed schemes. The performance evaluation shows that the proposed schemes provide better accuracy, in comparison to existing AD-based schemes.

Paper Organization
The remainder of this paper is organized as follows. In Section 2, we present the system model and explain the behavior of a CCD assault in SG networks. In Section 3, we first describe the GA-based FS mechanism and then describe the two proposed ED-based AD schemes to detect CCD assaults. Simulation results are presented in Section 4. We conclude the paper in Section 5. Table 1 lists the abbreviations used throughout the paper, and Table 2 lists the CCD assault notations and concepts.

Covert Cyber Deception Assault
State estimation at the PCC is the essential instrument for ensuring the reliable and sustainable functioning of electrical power networks [25]. As illustrated in Figure 1, the measurement data collected from RTUs via communications networks are used by the state estimator to determine the system states over time. The state estimation problem is to approximate the power system state variables from the measurement data.

Legacy Bad-Data Detectors in PCCs
The measurement data and the state variables are related through the following alternating current (AC) power flow observation model:
$$Z_{meter} = h(\delta) + e, \tag{1}$$
where $h(\delta)$ is a non-linear relationship between the measurement data, $Z_{meter}$, and the state vector $\delta$; $e = [e_1, e_2, \ldots, e_m]^T$ is the Gaussian measurement noise vector with standard deviation $\sigma$. Using a linear or direct current (DC) power flow model, the observation model in (1) is simplified, with a small sacrifice of accuracy, as follows [26,27]:
$$Z_{meter} = H\delta + e. \tag{2}$$
In a DC power flow problem, the Jacobian matrix $H$ can be approximated as follows:
$$H = \left.\frac{\partial h(\delta)}{\partial \delta}\right|_{\delta = 0}, \tag{3}$$
where $H$ is composed of topology and impedance data only. One objective of (2) is to determine the estimated state, $\hat{\delta}$, that is the best fit for the meter measurements. In other words, the best estimate minimizes the weighted least square (WLS) estimation error, $(Z_{meter} - H\hat{\delta})^T \Omega\, (Z_{meter} - H\hat{\delta})$. By applying the WLS statistical estimation criterion, the estimated voltage phase angle is given as follows:
$$\hat{\delta} = (H^T \Omega H)^{-1} H^T \Omega\, Z_{meter} = W Z_{meter}, \tag{4}$$
where $W = (H^T \Omega H)^{-1} H^T \Omega$ and $\Omega$ is a diagonal matrix with elements $\sigma_i^{-2}$, $\sigma_i^2$ being the variances of meter errors. Noise in the wireless medium, faulty meters or malicious user behavior (like CCD assaults) can be potential sources of abnormal data in estimated measurements. Current power systems use a residual-based detector for BDD to protect state estimation [28]. The difference between the observed meter measurements $Z_{meter}$ and the estimated measurements $\hat{Z}$ is the residual, $R$, and it is expressed as follows:
$$R = Z_{meter} - \hat{Z} = Z_{meter} - H\hat{\delta}. \tag{5}$$
Because $WH = I$, the residual reduces to $R = (I - HW)e$, so its expected value and covariance are:
$$E(R) = 0, \tag{6}$$
$$\mathrm{cov}(R) = (I - HW)\,\Omega^{-1}. \tag{7}$$
The BDD present in the PCC performs the $l_2$-norm test [28] to compare the results with a predefined threshold. The hypothesis of not being attacked is accepted if we have:
$$\|R\|_2 = \sqrt{\sum_{i=1}^{m} R_i^2} < \lambda, \tag{8}$$
where $R_i$ is the $i$-th component of the residual vector $R$ and $\lambda$ is the threshold.
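The WLS estimate and the residual test described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the 4-meter, 2-state matrix `H`, the noise level and the threshold `lam` are hypothetical toy values, and the function names `wls_estimate` and `residual_norm` are introduced here only for illustration.

```python
import numpy as np

def wls_estimate(H, z, sigma):
    """Return (delta_hat, W) for the DC model z = H @ delta + e."""
    omega = np.diag(1.0 / np.asarray(sigma, dtype=float) ** 2)  # inverse meter variances
    gain = H.T @ omega @ H
    W = np.linalg.solve(gain, H.T @ omega)  # W = (H^T Omega H)^{-1} H^T Omega
    return W @ z, W

def residual_norm(H, z, delta_hat):
    """l2 norm of the residual R = z - H @ delta_hat."""
    return float(np.linalg.norm(z - H @ delta_hat))

# Toy system (hypothetical numbers, not an IEEE test case): 4 meters, 2 states.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
delta_true = np.array([0.10, -0.05])
sigma = np.full(4, 0.01)
rng = np.random.default_rng(0)
z = H @ delta_true + rng.normal(0.0, sigma)

delta_hat, W = wls_estimate(H, z, sigma)
lam = 0.05  # assumed detection threshold
passes_bdd = residual_norm(H, z, delta_hat) < lam
```

Solving the normal equations with `np.linalg.solve` avoids forming an explicit matrix inverse, which is usually the more stable choice numerically.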

The Covert Cyber Deception Assault
An attacker who is familiar with the matrix $H$ can initiate an assault by altering the values of the meter measurements. Let $Z_{assault} = Z_{meter} + a$, where $a \in \mathbb{R}^{m \times 1}$ denotes the malicious data injected into the meter measurement data vector. If the malicious user constructs the vector $a$ as follows:
$$a = Hc, \tag{9}$$
where $c \in \mathbb{R}^{n \times 1}$ is an arbitrary non-zero vector, the legacy BDD cannot detect such an assault. The reason is as follows. Let $\hat{\delta}_{assault}$ denote the estimate of the state variables using the assaulted meter measurements $Z_{assault}$, i.e., $\hat{\delta}_{assault} = W Z_{assault} = W Z_{meter} + W a = \hat{\delta} + WHc = \hat{\delta} + c$.
Now, the $l_2$ norm of the residual for the assaulted measurement $Z_{assault}$ is as follows:
$$\|Z_{assault} - H\hat{\delta}_{assault}\|_2 = \|Z_{meter} + a - H(\hat{\delta} + c)\|_2 = \|Z_{meter} - H\hat{\delta} + (a - Hc)\|_2 = \|Z_{meter} - H\hat{\delta}\|_2. \tag{10}$$
The residual calculated with assaulted measurements is the same as it is for normal measurements. Hence, $Z_{assault}$ will be able to deceive the BDD statistical test presented in (8) and will change the system states, resulting in crucial operational failures [4,17]. This sort of assault is termed an unobservable (or covert) attack [4]. Under these assumptions, the observation model in the presence of the CCD assault can be described as follows:
$$Z_{assault} = H\delta + a + e, \tag{11}$$
where $a$ is the non-zero assault vector.
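The invariance of the residual under an assault of the form a = Hc can be checked numerically. The sketch below assumes a random full-column-rank stand-in for the Jacobian and equal meter variances (so that Ω is the identity). All sizes and values are hypothetical, and `a_naive` is an unstructured single-meter attack added here only for contrast.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3                          # hypothetical meter and state counts
H = rng.normal(size=(m, n))          # stand-in for a full-column-rank Jacobian
W = np.linalg.solve(H.T @ H, H.T)    # WLS gain with Omega = I (equal variances)

z = H @ rng.normal(size=n) + 0.01 * rng.normal(size=m)  # clean measurements

c = np.array([0.2, -0.1, 0.05])      # arbitrary non-zero state bias
a_covert = H @ c                     # structured assault a = Hc
a_naive = np.zeros(m)
a_naive[0] = 0.5                     # unstructured assault on a single meter

def residual_norm(z_obs):
    """l2 norm of z_obs - H @ (W @ z_obs)."""
    return float(np.linalg.norm(z_obs - H @ (W @ z_obs)))

r_clean = residual_norm(z)
r_covert = residual_norm(z + a_covert)    # matches r_clean up to rounding
r_naive = residual_norm(z + a_naive)      # typically much larger than r_clean
state_shift = W @ (z + a_covert) - W @ z  # equals c: the estimate is biased
```

The covert assault shifts the state estimate by exactly `c` while leaving the residual untouched, which is why a threshold test on the residual alone cannot reveal it.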

CCD Assault Detection Using Euclidean Distance-Based Anomaly Detection Schemes
In this section, we discuss a two-tier mechanism for the detection of CCD assaults in the SE-MF dataset. The curse of dimensionality [8] increases the computational complexity when the number of measurement features grows with the size of the power system. Moreover, not all of the SE-MF dataset attributes are equally helpful in producing plainly distinguishable clusters in the feature space, which can have a negative impact on the performance of a detection method. Therefore, we first use the FS-based approach to select an optimal subset of features that results in more tightly-packed and distinctly-separable clusters of feature vectors in the resulting subspace. Through FS, we can also reduce the measurement and storage requirements at the PCC, as well as the training and prediction times [29]. The selected optimal features are then employed as input to the two ED-based AD schemes (EDADS-1 and EDADS-2) to detect CCD assaults in SG communications networks. When the class labels are not given in the SE-MF dataset, we utilize the proposed EDADS-1 to identify the outliers as potential assaults. On the other hand, when the data are supplemented with class labels, we propose EDADS-2 to distinguish between the normal and compromised data. In the following subsections, we explain the GA and the proposed detection schemes.

Dimensionality Reduction Using Genetic Algorithm-Based Feature Selection
The goal of FS is to choose, from a given set of features, a subset that yields the minimum classification error. Works reported on dimensionality reduction affirm that FS techniques retain data characteristics for interpretability. Furthermore, with FS methods, overfitting is reduced because less redundant data remain, and modeling accuracy is improved [29][30][31][32][33][34]. Janecek et al. [30] studied the interrelationship between numerous dimensionality reduction approaches (encompassing feature subset selection) and FE with different flavors of PCA techniques, and tested the effects of these methods on classification accuracy with two different types of datasets (email data and drug discovery data). The results revealed that feature transformation using PCA is highly sensitive to the type of data. Generally, FS methods can be divided into three categories: filters, wrappers, and embedded/hybrid methods. Wrapper methods generally give better performance than filter approaches because they use the target classifier, such as K-nearest neighbors or a support vector machine, during feature selection. For a large dataset, however, wrapper methods are computationally expensive. The filter approach is known to be more computationally efficient than wrapper methods and performs well with large datasets [29,35]. In this paper, considering the delay-sensitive nature and increasing sizes of power systems, we use a filter-based FS mechanism that is independent of any learning algorithm or classifier. Working as a preprocessor, the filter-based FS selects features by considering their scores in different statistical tests for correlation with the outcome variable. We use GA to select the subset of features from the SE-MF dataset that is the best at discriminating compromised data from normal data. GA has been widely used for FS purposes in machine learning and is considered suitable for large combinatorial problems [30,31,36].
However, recently, particle swarm optimization (PSO) and other evolutionary and metaheuristic algorithms have gained the attention of researchers due to their lower complexity and simplicity. PSO may be a promising method for FS and an interesting topic for future works in SG security. The GA emulates biological evolution and Darwinian selection [36]. The evolution mechanism of living beings is believed to follow natural selection, i.e., living species that are better suited to their environment thrive, whereas species that are at a disadvantage in their environment go extinct. Following the same principle, GA improves a given solution by incrementally choosing better possible solutions, while eliminating worse solutions.
The quality of each solution is calculated using a fitness function based on the objective function. The set of $m$-dimensional SE-MF data vectors in $\mathbb{R}^m$ is given as input to the GA as follows:
$$X = \{X_1, X_2, \ldots, X_K\}, \quad X_k \in \mathbb{R}^m. \tag{12}$$
The GA yields a set of $n$-dimensional vectors in the subspace $\mathbb{R}^n$, described as:
$$\tilde{X} = \{\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_K\}, \quad \tilde{X}_k \in \mathbb{R}^n, \tag{13}$$
where $K$ denotes the number of vectors in the set. It is notable that the GA reduces the dimensionality of each vector in the set, i.e., $n \ll m$, without affecting the cardinality of the set of vectors. The selected dimensions are chosen to optimize the fitness function. Hence, each $\tilde{X}_k$ denotes an instance of the feature vector in the subspace that optimizes the fitness function. The fitness function $F$ adopted in this paper is given as follows:
$$F = \frac{\bar{C}}{\bar{S}}. \tag{14}$$
In (14), $\bar{C}$ is the mean compactness of the classes and is expressed as follows:
$$\bar{C} = \frac{1}{L}\sum_{i=1}^{L} C_i, \tag{15}$$
and the mean separability, denoted by $\bar{S}$ in (14), is the average separation between any two classes in an $L$-class problem, obtained as follows:
$$\bar{S} = \frac{2}{L(L-1)}\sum_{i=1}^{L-1}\sum_{j=i+1}^{L} S_{ij}. \tag{16}$$
In this paper, we are dealing with a binary classification problem. Therefore, $L = 2$, i.e., normal and compromised SE-MF measurement data. The GA finds a feature subspace that minimizes the ratio of the mean intra-class compactness to the mean inter-class separability, as defined below and illustrated in Figure 2. To measure the compactness of a given class, the GA calculates the mean or centroid, $\mu^{(i)}$, of class $i$ as follows:
$$\mu^{(i)} = \frac{1}{N}\sum_{k=1}^{N} X_k^{(i)}. \tag{17}$$
Here, $N$ is the total number of samples of class $i$. After that, the compactness of class $i$ is determined by finding the mean value of the Euclidean norm, as follows:
$$C_i = \frac{1}{N}\sum_{k=1}^{N} \left\| X_k^{(i)} - \mu^{(i)} \right\|_2. \tag{18}$$
The Euclidean distance between the centroids of two classes describes the separability between the two classes $i$ and $j$. It is determined as follows:
$$S_{ij} = \left\| \mu^{(i)} - \mu^{(j)} \right\|_2. \tag{19}$$
The GA encodes the SE-MF data into chromosomes, which go through crossovers and mutations. Thus, new generations of chromosomes are yielded, which replace their parents, provided they are healthier, i.e., their fitness or objective function value is smaller.
This process is iterated over many generations until there is no further improvement in the fitness function [29]. A binary encoding scheme is used to represent the features or attributes of the SE-MF dataset as chromosomes. A chromosome is simply a string of binary ones and zeroes, where one indicates that a certain SE-MF feature is selected and zero means it is rejected. The index of each one and zero in the chromosome corresponds to a distinct SE-MF attribute. In the beginning, the GA randomly selects different subsets of the SE-MF. In other words, a primary population of chromosomes (a string of ones and zeroes) initiates the algorithm. A new population of chromosomes is created by subjecting the primary (parent) chromosomes to crossover and mutations. Two parent chromosomes exchange information or swap fragments at randomly-chosen crossover points during the crossover process. However, during the mutation process, the bits are flipped at randomly-selected positions in a chromosome. Then, based on their respective fitness function value, chromosomes are ranked in the evaluation process. Finally, the chromosomes that minimize the proposed fitness function are selected to produce new chromosomes. This process is repeated for many generations until there is no further decrease in the value of the proposed fitness or objective function.
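The chromosome encoding, crossover, mutation and fitness ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the fitness is taken to be the mean intra-class compactness divided by the centroid separation (lower is better). Selection simply keeps the fitter half of the population, and crossover uses a single cut point. The names `fitness` and `ga_select`, the parameter values, and the synthetic data are all hypothetical.

```python
import numpy as np

def fitness(mask, X, y):
    """Mean intra-class compactness over centroid separation (lower is better)."""
    if not mask.any():
        return np.inf                        # an empty feature subset is invalid
    Xs = X[:, mask]
    centroids, compactness = [], []
    for label in (0, 1):                     # L = 2: normal vs. compromised
        Xc = Xs[y == label]
        mu = Xc.mean(axis=0)
        centroids.append(mu)
        compactness.append(np.linalg.norm(Xc - mu, axis=1).mean())
    separation = np.linalg.norm(centroids[0] - centroids[1])
    return np.mean(compactness) / separation if separation > 0 else np.inf

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.02, seed=0):
    """Return a boolean mask over features found by a simple elitist GA."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    pop = rng.random((pop_size, m)) < 0.5    # random initial chromosomes
    for _ in range(generations):
        scores = np.array([fitness(ch, X, y) for ch in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]  # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            pa, pb = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, m)                         # single crossover point
            child = np.concatenate([pa[:cut], pb[cut:]])
            child ^= rng.random(m) < p_mut                   # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ch, X, y) for ch in pop])
    return pop[np.argmin(scores)]

# Synthetic data: only the first two of eight features separate the classes.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 8))
X[y == 1, :2] += 4.0
best = ga_select(X, y)
```

Because the fitter half of each generation is carried over unchanged, the best fitness value never gets worse from one generation to the next.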

Euclidean Distance-Based Anomaly Detection Scheme 1
For a large number of given data points, the points that vary significantly from the average of the data are called outliers or anomalies. Anomaly detection is a class of machine learning applications, and it has many areas of utilization, such as data cleaning, diagnosis, fraud detection, intrusion detection, and so on. Different types of anomaly detection techniques have been proposed in the literature, such as model-based, distance-based and statistical-based methods [37]. Considering the scenario where the class labels are not provided in the SE-MF dataset, we propose an ED-based anomaly detection scheme (EDADS-1), depicted in Figure 3. The measurement samples are periodically collected at the PCC via the RTUs installed in different locations of the electrical power network. The historical dataset is formulated at the beginning. During the FS process, the optimal features of the SE-MF dataset are selected through GA. Then, the optimal features dataset is divided into training and testing subsets. It is worth mentioning that the training and testing subsets carry both normal and compromised data. In Step 1, the centroid vector $C_d = \{\mu_1, \mu_2, \ldots, \mu_n\}$ is calculated by finding the mean, $\mu_r$, of each feature of the training data subset, where $\mu_r = \frac{1}{a}\sum_{i=1}^{a} X^{(r)}_{R_i}$, $\forall r \in \{1, 2, \ldots, n\}$. In Step 2, the Euclidean distance between each training sample and the centroid is calculated to form a distance vector, $D_R = \{D_{R_1}, D_{R_2}, \ldots, D_{R_a}\}$, with $D_{R_p} = \| X_{R_p} - C_d \|_2$. In Step 3, the average distance, $D_{R\_avg} = \frac{1}{a}\sum_{p=1}^{a} D_{R_p}$, is calculated. The average distance can be considered a virtual boundary around the normal dataset.
In the testing phase, the Euclidean distance of each test sample from the centroid $C_d$ is calculated to form the test distance vector $D_T = \{D_{T_1}, D_{T_2}, \ldots, D_{T_b}\}$. Finally, the test is performed to identify each new test point as being normal (if $D_{T_q} < D_{R\_avg}$) or an outlier (if $D_{T_q} > D_{R\_avg}$). The proposed EDADS-1 algorithm has appreciably low computational complexity, because only the distance of one sample needs to be calculated to identify a new data point as being normal or a potential CCD assault. Additionally, it can identify the outliers in the SE-MF dataset as potential assaults even when the class labels are not provided. Moreover, as the historical SE-MF dataset grows, the average distance $D_{R\_avg}$ becomes closer to the actual value, and the detection becomes more accurate.
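The three training steps and the test rule of EDADS-1 translate almost directly into code. The following is a minimal sketch assuming the GA-selected features are already arranged as rows of a NumPy array; the function names and the synthetic two-feature data are hypothetical.

```python
import numpy as np

def edads1_fit(X_train):
    """Steps 1-3: centroid, training distances, and their average."""
    centroid = X_train.mean(axis=0)                        # Step 1
    d_train = np.linalg.norm(X_train - centroid, axis=1)   # Step 2
    return centroid, float(d_train.mean())                 # Step 3: D_R_avg

def edads1_predict(X_test, centroid, d_avg):
    """Flag a test sample as an outlier when its distance exceeds D_R_avg."""
    d_test = np.linalg.norm(X_test - centroid, axis=1)
    return d_test > d_avg                                  # True = outlier

# Unlabeled training mix: 90 normal-like and 10 compromised-like samples.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(0.0, 0.1, size=(90, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])
centroid, d_avg = edads1_fit(X_train)
flags = edads1_predict(np.array([[0.0, 0.0], [5.0, 5.0]]), centroid, d_avg)
```

In this toy setting the few distant samples pull the average distance well above the typical distance of the normal cluster, so the point near the origin falls inside the virtual boundary and the distant one falls outside.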

Euclidean Distance-Based Anomaly Detection Scheme 2
Using the Euclidean distance-based method, we can detect a CCD assault with improved accuracy when the labels (normal versus compromised) are given in the SE-MF dataset. In this subsection, we explain the second proposed scheme, EDADS-2, shown in Figure 4. In the beginning, the GA is applied to select the optimal features subset from the SE-MF historical dataset. The resulting optimal features data subset, consisting of compromised and normal data, is divided into the training dataset, $R_p \in X$, $\forall p \in \{1, 2, 3, \ldots, a\}$, and the testing dataset, $T_q \in X$, $\forall q \in \{1, 2, 3, \ldots, b\}$. The labels are used to select the normal set, $X_{RN}$, from the training dataset.
[Figures 3 and 4: flow diagrams of the proposed EDADS-1 and EDADS-2, with Step-I (Centroid), Step-II (Train Distance) and Step-III (Average Distance).]

Experimental Results
In this section, we evaluate the performance of the proposed EDADS-1 and EDADS-2. We performed the simulations using MATLAB 2017b. The proposed schemes were evaluated through experiments using the standard 14-bus, 39-bus, 57-bus and 118-bus IEEE test systems. The experimental results were averaged over 20 iterations for each bus system. Figure 5 illustrates the IEEE 39-bus system [38], also known as the New England 10-machine system. Because of space limitations, figures for the other IEEE bus systems employed for testing in this work are not included. To simulate the operation of the power network, we used the Matpower 6.0 toolbox [39] to generate the configuration of these test systems (especially the Jacobian matrix). We employed the AC power flow model and used DC power flow analysis to approximate the state vectors and measurement dataset. In a $B$-bus system, the state variable vector $\delta \in \mathbb{R}^n$ is composed of $(B - 1)$ bus voltage phase angles, and the meter measurement vector consists of active power injections into the buses and branch active power flows. To make the comparison representative of a real-world power network, we used stochastic loads with uniform load distributions similar to [7], i.e., in the range $[0.9 \times B_0,\ 1.1 \times B_0]$, where $B_0$ is the base load. In these simulations, the active power measurement features, including the active power injections into the buses and active power flows on the branches, are the input to the GA for the selection of optimal features.
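The stochastic load model described above can be sketched as follows. This is an illustration only: the function name `sample_load_profiles` and the base-load values are hypothetical and not taken from an IEEE test case, base loads are assumed positive, and the power flow solution that would turn these loads into measurements is not shown.

```python
import numpy as np

def sample_load_profiles(base_load, n_samples, seed=0):
    """Draw loads uniformly from [0.9 * B0, 1.1 * B0] for every bus."""
    rng = np.random.default_rng(seed)
    base_load = np.asarray(base_load, dtype=float)
    scale = rng.uniform(0.9, 1.1, size=(n_samples, base_load.size))
    return scale * base_load      # one row per sample, one column per bus

base = np.array([20.0, 90.0, 50.0])         # hypothetical base loads in MW
profiles = sample_load_profiles(base, 360)  # e.g., one sample every 4 min over 24 h
```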

GA-Based Feature Selection
For GA-based feature selection, we set the number of chromosomes in each population to 50 and the maximum number of generations to 80. Stochastic universal sampling (SUS) is used as the selection operator, and uniform crossover is employed. The crossover rate is 0.63 and the mutation rate is 0.018. Because the GA randomly selects different subsets of the SE-MF dataset to create a primary population of chromosomes, we run the GA 30 times and keep only those features that were selected in more than 70% of the 30 runs. Table 3 lists the average number of selected features from the application of the GA to the SE-MF dataset for various IEEE standard systems.
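The rule of keeping only features chosen in more than 70% of the repeated GA runs amounts to a frequency vote over the binary chromosomes. A minimal sketch, assuming each run returns one boolean mask over the features; the function name `stable_features` and the example masks are hypothetical.

```python
import numpy as np

def stable_features(masks, min_fraction=0.7):
    """Indices of features selected in more than min_fraction of the runs."""
    masks = np.asarray(masks, dtype=bool)
    frequency = masks.mean(axis=0)          # per-feature selection frequency
    return np.flatnonzero(frequency > min_fraction)

runs = 30
masks = np.zeros((runs, 5), dtype=bool)
masks[:, 0] = True       # chosen in every run
masks[:24, 1] = True     # chosen in 80% of the runs
masks[:15, 2] = True     # chosen in 50% of the runs
masks[:10, 3] = True     # chosen in about 33% of the runs
selected = stable_features(masks)
```

Voting across runs reduces the sensitivity of the final feature set to the random initial population of any single run.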

3D Representation of the Proposed EDADS-1 and EDADS-2
In this subsection, a 3D pictorial representation of the proposed schemes (EDADS-1 and EDADS-2) is given in Figures 6 and 7, respectively. To represent the workings of the proposed schemes, we use three features of the standard IEEE 57- and 118-bus systems. In Figure 6, the centroid (indicated in the figure) is calculated using training data consisting of normal and compromised optimal features. The average of the distances of all the samples from the centroid defines a virtual boundary, as shown in Figure 6. The data points lying outside of the virtual boundary are termed anomalous data or outliers. In Figure 7, by contrast, the centroid (indicated in the figure) is calculated on the basis of data labeled as normal only. The distances of all the samples from the centroid are sorted in descending order, and then the top 10 percent of the distances (farthest from the centroid) are averaged. This average distance defines a virtual boundary, as shown in Figure 7. The samples lying outside the boundary are considered outliers. The basic performance metrics used in this work, i.e., accuracy, $F_1$ score and receiver operating characteristic (ROC) curves, are presented in the following subsections.
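The EDADS-2 boundary described for Figure 7 can be sketched as below. This is a hedged illustration, not the authors' code: it assumes the distances that are sorted are those of the normal-labeled training samples, that label 0 marks normal data, and that the top 10 percent is rounded up to a whole number of samples. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def edads2_fit(X_train, y_train, top_fraction=0.10):
    """Centroid of normal samples and mean of their farthest 10% distances."""
    X_normal = X_train[y_train == 0]                  # labels pick the normal set
    centroid = X_normal.mean(axis=0)
    d = np.sort(np.linalg.norm(X_normal - centroid, axis=1))[::-1]  # descending
    k = max(1, int(np.ceil(top_fraction * d.size)))
    return centroid, float(d[:k].mean())

def edads2_predict(X_test, centroid, d_boundary):
    """Flag a test sample as an outlier when it lies outside the boundary."""
    return np.linalg.norm(X_test - centroid, axis=1) > d_boundary

rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(0.0, 0.1, size=(90, 2)),   # normal samples
    rng.normal(5.0, 0.1, size=(10, 2)),   # compromised samples
])
y_train = np.array([0] * 90 + [1] * 10)
centroid, d_boundary = edads2_fit(X_train, y_train)
flags = edads2_predict(np.array([[0.0, 0.0], [5.0, 5.0]]), centroid, d_boundary)
```

Because the centroid and boundary are built from normal data only, the boundary hugs the normal cluster instead of being stretched by compromised samples.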

Receiver Operating Characteristic Curves
Figures 8 and 9 illustrate the ROC curves for the proposed EDADS-1 and EDADS-2, respectively, employing the standard IEEE 14-, 39-, 57- and 118-bus systems for testing. The ROC curve is obtained by plotting the true positive rate (TPR) against the false positive rate (FPR). FPR is defined as the probability that normal data are identified as compromised; it is related to the specificity of our detection scheme (FPR = 1 − specificity). The sensitivity of our scheme is defined as the probability that compromised data are identified as assaulted, and TPR is used as the measure of sensitivity. From Figures 8 and 9, we can see that the area under the curve is close to one in all cases. This means that the detection performance of the proposed schemes is close to ideal, which validates their good performance. In the next subsections, we elaborate on the accuracy and $F_1$ score for the proposed schemes.
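An ROC curve for a distance-based detector can be traced by sweeping the decision threshold over the observed distances. The sketch below assumes that a larger distance means "more likely compromised" and that label 1 marks compromised samples; the function names and the six example scores are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold over scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Start above every score (nothing flagged), then lower the threshold.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    pos, neg = (labels == 1).sum(), (labels == 0).sum()
    tpr = np.array([((scores >= t) & (labels == 1)).sum() / pos for t in thresholds])
    fpr = np.array([((scores >= t) & (labels == 0)).sum() / neg for t in thresholds])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

distances = [0.2, 0.3, 0.4, 2.0, 2.5, 0.35]   # distance of each sample to the centroid
labels = [0, 0, 0, 1, 1, 1]                   # 1 = compromised
fpr, tpr = roc_points(distances, labels)
area = auc(fpr, tpr)
```

In this example one compromised sample sits inside the normal range, so the area falls short of one; perfectly separated scores would give an area of exactly one.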

Accuracy
Calculating the accuracy is a standard way to evaluate anomaly detection algorithms. It is a single-number summary of the performance of the proposed algorithm and can be calculated as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where true positives (TP) are the samples that the proposed algorithm detects as positive and that are, in fact, positive. Similarly, true negatives (TN) are the samples that the proposed algorithm detects as negative and that are, in fact, negative; FP and FN denote the false positives and false negatives, respectively. Figure 10 shows the accuracy of the proposed schemes (EDADS-1 and EDADS-2) for various IEEE standard bus systems according to a varying number of training samples. The efficiency of the learning algorithm can be improved by increasing the amount of learning data. We compare the performance of the proposed schemes with that of the statistical model-based anomaly detection method [7], which uses the FE technique for dimensionality reduction, and with that of the neighborhood component analysis (NCA) technique. NCA is a supervised learning method for classifying multivariate data into distinct classes according to a given distance metric over the data. The results show that the proposed FS-based schemes (EDADS-1 and EDADS-2) have higher CCD assault detection accuracy. EDADS-2 exhibits slightly higher performance than EDADS-1 since it employs only the normal features for training, utilizing the labeled data. Hence, the resulting average distance is close to the value that is required to accurately separate the normal class from the compromised one. Figure 10 also shows that detection accuracy increases with the number of training data samples. In addition, the NCA-based FS technique has lower detection accuracy than the proposed schemes, although it performs better than the statistical-based method [7].

F1 Score
Next, we utilize the $F_1$ score as another metric of detection accuracy. The $F_1$ score is considered a measure of the precise detection or classification of the subject dataset. The $F_1$ score is obtained as follows:
$$F_1 = \frac{2\, P_r R_e}{P_r + R_e},$$
where $P_r$ is the precision, calculated as follows:
$$P_r = \frac{TP}{TP + FP}.$$
True positives (TP) are samples that the proposed algorithm detects as positive and that are, in fact, positive. The predicted positives, $TP + FP$, may include both compromised and normal sample points, but the algorithm detects them all as positive. $R_e$ is the recall, calculated as follows:
$$R_e = \frac{TP}{TP + FN}.$$
Figure 11 shows the $F_1$ score of the proposed schemes for different IEEE standard bus systems. The $F_1$ score of the statistical model-based anomaly detection method [7] employing FE for dimensionality reduction is included for comparison. The proposed schemes are also compared with the NCA-based scheme. The proposed FS-based schemes have a higher $F_1$ score for all test cases, whereas the FE-based scheme requires many historical samples from the SE-MF dataset for learning to achieve a higher $F_1$ score. The proposed EDADS-2 has a higher $F_1$ score than EDADS-1 because it employs only normal data samples for training. Hence, the average of the distances of the normal training samples from their centroid is close to the value required for accurately separating the normal class from the compromised class. The performance of the NCA-based scheme is lower than that of the proposed schemes; however, it performs better than the FE-based scheme [7].
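The accuracy, precision, recall and F1 score can all be computed from the four confusion-matrix counts. A minimal sketch, assuming label 1 marks compromised (positive) samples; the function name `detection_metrics` and the eight example labels are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # guard against no predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # guard against no actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # one missed assault, one false alarm
metrics = detection_metrics(y_true, y_pred)
```

With two true positives, four true negatives, one false positive and one false negative, the accuracy is 0.75 while precision, recall and F1 are all 2/3. This shows how accuracy can look better than F1 when normal samples dominate.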
Next, to investigate the impact of the number of compromised load profiles (compromised samples), we consider different numbers of compromised load profiles, i.e., 24, 30, 36, 40, 45 and 60. The SE-MF dataset load profile comprises 360 samples collected through sensors or RTUs at regular intervals of four minutes over 24 h. We use 75% of the data for training and the rest of the samples for testing. Figure 12 shows the F1 score as a measure of the accuracy of the proposed FS-based detection schemes for the standard IEEE 14-, 39-, 57- and 118-bus systems. Figure 12 shows that the proposed FS-based methods achieve a detection accuracy, measured by the F1 score, of more than 90% for all the employed test systems.
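The sample counts in this experiment follow from simple arithmetic, sketched below in Python. The variable names are hypothetical, and expressing each compromised count as a share of the full 360-sample profile is an assumption made for illustration; the text does not state how the compromised samples are distributed between the training and test sets.

```python
HOURS = 24             # observation window stated in the text
INTERVAL_MINUTES = 4   # sampling interval stated in the text
TRAIN_FRACTION = 0.75  # share of samples used for training

n_samples = HOURS * 60 // INTERVAL_MINUTES   # 360 samples in total
n_train = int(n_samples * TRAIN_FRACTION)    # 270 training samples
n_test = n_samples - n_train                 # 90 test samples
print(n_samples, n_train, n_test)

# Compromised-profile counts considered in the experiment.
compromised_counts = (24, 30, 36, 40, 45, 60)
for n_compromised in compromised_counts:
    share = n_compromised / n_samples
    print(f"{n_compromised} compromised samples = {share:.1%} of the profile")
```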

Execution Time Comparison
In this subsection, we compare the execution time of the proposed schemes with that of the existing schemes. Table 3 shows that the proposed schemes (EDADS-1 and EDADS-2) consume less time for feature selection and anomaly detection than the existing schemes. The feature selection time of the two proposed schemes is similar because both schemes utilize GA. However, the detection time of EDADS-2 is slightly less than that of EDADS-1. The reason is that EDADS-2 utilizes only normal features for training from the test data. On the other hand, EDADS-1 employs all the features from the test data for training because the data are unlabeled in this case. Table 3 shows that the execution time of NCA is higher than that of the proposed schemes. Table 3 also shows that the FE technique [7] employing PCA requires more time. The PCA-based approach achieves dimensionality reduction by transforming original samples with binary values into new samples with numeric values, which makes its execution time much longer than that of the other schemes.
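To make the notion of detection time concrete, the following Python sketch times the training and detection phases of a very small centroid-distance detector. It is only an illustration under stated assumptions: the threshold rule (a hypothetical `scale` factor times the mean training distance), the function names `fit` and `detect`, and the two-feature sample values are all invented here and do not reproduce the EDADS-1 or EDADS-2 procedures or the timings in Table 3.

```python
import math
import time
from statistics import fmean


def fit(train, scale=2.0):
    """Return the centroid of `train` and a distance threshold.

    The threshold is `scale` times the mean Euclidean distance of the
    training samples to their centroid (an assumed rule, for illustration).
    """
    center = [fmean(column) for column in zip(*train)]
    mean_distance = fmean([math.dist(x, center) for x in train])
    return center, scale * mean_distance


def detect(samples, center, threshold):
    """Label a sample 1 (anomalous) if it lies beyond the threshold, else 0."""
    return [1 if math.dist(x, center) > threshold else 0 for x in samples]


# Hypothetical two-feature samples; real inputs would be the selected features.
normal_train = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.0, 1.0)]
test_samples = [(1.0, 1.0), (5.0, 5.0)]

start = time.perf_counter()
center, threshold = fit(normal_train)
fit_seconds = time.perf_counter() - start

start = time.perf_counter()
labels = detect(test_samples, center, threshold)
detect_seconds = time.perf_counter() - start

print(labels)  # [0, 1]
print(f"fit: {fit_seconds:.6f} s, detect: {detect_seconds:.6f} s")
```

In this sketch, training on fewer samples means fewer distance computations in `fit`, which is consistent with the general observation that restricting training to normal samples can shorten execution time. Actual timings depend on the hardware, the dataset size and the implementation.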

Conclusions
In this paper, we propose two FS-based anomaly detection schemes for the detection of CCD assaults in SG communications networks. In the proposed schemes, GA is employed to select discriminative features from historical SE-MF datasets. The selected optimal features are used as the input for two Euclidean distance-based anomaly detection schemes (EDADS-1 for unlabeled data and EDADS-2 for labeled data) to detect anomalies/outliers in the smart-grid SE measurement samples. To validate the performance of the proposed schemes, we utilize the standard IEEE 14-bus, 39-bus, 57-bus and 118-bus systems. In addition, we utilize data collected from active power injections into the buses and active power flow measurements in the branches as the learning data, and we study the accuracy of our detection methods under CCD attack. The test results show that the proposed ED-based FS schemes have reasonably improved detection accuracy compared to the PCA-based FE and NCA-based FS schemes in the occasional operational environment. The low computational complexity of the proposed schemes enables the identification of outliers or anomalies in a short time. In the future, we will extend our work to more diverse attack scenarios, and we will incorporate a learning mechanism that automatically updates the Euclidean distances with incoming test data to improve the detection accuracy.