Mitigating Missing Rate and Early Cyberattack Discrimination Using Optimal Statistical Approach with Machine Learning Techniques in a Smart Grid

Abstract: In the Industry 4.0 era of smart grids, blackouts and cascading failures caused by cyberattacks are a significant and highly challenging concern because existing Intrusion Detection Systems (IDSs) fall short in missing rate, response time, and detection accuracy. An early attack detection mechanism with a reduced missing rate and decreased response time is therefore critical. The development of an Intelligent IDS is vital to the mission-critical infrastructure of a smart grid to prevent physical sabotage and processing downtime. This paper aims to develop a robust Anomaly-based IDS using a statistical approach with a machine learning classifier to discriminate cyberattacks from natural faults and man-made events and thereby avoid blackouts and cascading failures. A novel statistical approach with a machine learning (SAML) classifier, based on Neighborhood Component Analysis, ExtraTrees, and AdaBoost for feature extraction, bagging, and boosting, respectively, is proposed with optimal hyperparameter tuning for the early discrimination of cyberattacks from natural faults and man-made events. The proposed model is tested using the publicly available Industrial Control Systems Cyber Attack Power System (Triple Class) dataset with a three-bus/two-line transmission system from Mississippi State University and Oak Ridge National Laboratory. Furthermore, the proposed model is evaluated for scalability and generalization using the publicly accessible IEEE 14-bus and 57-bus system datasets of False Data Injection (FDI) attacks. The test results show higher detection accuracy, lower missing rates, decreased false alarm rates, and reduced response time compared to the existing approaches.


Introduction
The mission-critical infrastructure of Cyber-Physical Power Systems (CPPS) [1], such as a smart grid, has been targeted in cyberwarfare to cause physical sabotage, large-scale load loss, blackouts, and cascading failures [2][3][4]. In most cases of power grid disturbances, fault analysis and diagnosis [5] are conducted using state estimation methods [6] and time-series analysis [7]. However, the continuously increasing rate of network traffic makes it difficult for cyber analysts to spot new patterns of behavior in the network. The current data volume, velocity, and variety across firewalls make it more difficult for cyber analysts to monitor successfully. Moreover, 61% of firms admit that, without Artificial Intelligence (AI), they cannot spot critical threats [8]. In the current era of Industry 4.0, researchers and scientists have started to apply AI in cybersecurity to employ Intelligent IDS in Wide Area Measurement Systems (WAMS) to protect smart grids from advanced cyberattacks [9,10]. These challenges have motivated researchers to provide a defense-in-depth approach that predicts anomalies earlier for smart grids with machine learning techniques [11][12][13][14] and deep learning techniques [15,16]. Moreover, to enhance the cybersecurity aspects of smart grids, a study on False Data Injection (FDI) attacks provides insights into the paradigm shift of power systems toward digitalization, the vulnerability of the protocols, various detection methods, and a mitigation strategy [17].
Generally, an IDS is broadly classified into one of three distinct groups: Signature-based IDS, Specification-based IDS, or Anomaly-based IDS [18,19]. A Signature-based IDS is not sufficient for the growing cyber threats from motivated attackers since it needs frequent updates; it is limited to known attacks and fails to recognize unknown attacks. A Specification-based IDS is complex and resource-intensive, requires system expertise, and is only partially effective in identifying unknown attacks [20]. In comparison, an Anomaly-based IDS can detect and recognize unknown attacks, but with a high false positive rate (FPR). So, a robust Anomaly-based IDS is required to detect anomalies with higher accuracy, low false negatives (missing rate), and low false positives (false alarms) with decreased response times. In the mission-critical infrastructure of a smart grid, even a few instances of misclassification might have fatal consequences for the power system's stability and reliability, which necessitates thorough investigation [21]. Our proposed SAML-Triple approach is an alternative solution to the existing Specification-based IDS (state estimation approach) and Signature-based IDS.
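For reference, the missing rate and false alarm rate named above can be written in the standard confusion-matrix form. The expressions below are the usual textbook definitions, assuming the attack class is treated as the positive class (one-vs-rest in the triple-class setting); TP, TN, FP, and FN are the usual true/false positive/negative counts and are notation introduced here.

```latex
\[
\mathrm{FNR}\ (\text{missing rate}) = \frac{FN}{FN + TP}, \qquad
\mathrm{FPR}\ (\text{false alarm rate}) = \frac{FP}{FP + TN}, \qquad
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
```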
The scope of this paper includes developing a robust Anomaly-based IDS that discriminates cyberattacks from natural faults and man-made events with an early attack detection mechanism with reduced missing attacks, increased specificity, and fewer false alarms with high detection accuracy. This work will support the nation's smart grid mission and private industries to provide in-depth defense in detecting cyberattacks at the physical layer when the system is in a critical situation compromised by an attacker or intruder within the system or externally. The significant contributions of this paper are presented below:

• The novel mechanism of a Statistical Approach with a Machine Learning (SAML) classifier based on Neighborhood Component Analysis (NCA), ExtraTrees, and AdaBoost for feature extraction, bagging, and boosting, respectively, is proposed with optimal hyperparameter tuning for the early discrimination of cyberattacks in a smart grid with the three-bus/two-line transmission system of triple-class datasets (No Events/Natural Events/Attack Events);
• The proposed model is evaluated for generalization and scalability with IEEE 14-bus and 57-bus system datasets of False Data Injection (FDI) attacks to prove the robustness;
• The early response time is set to less than 8.3 ms on a 120 samples/second system to detect the attack before the system collapses.
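One plausible reading of the 8.3 ms target is that it corresponds to a single sampling interval of the 120 samples/second system, so that a decision is available before the next measurement arrives. Under that assumption (the symbol T_s is introduced here for illustration), the arithmetic is:

```latex
\[
T_s = \frac{1}{120\ \mathrm{s^{-1}}} \approx 8.33\ \mathrm{ms}
\]
```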
The proposed SAML-Triple approach considers two preprocessing aspects: INFAZ (treating "INFinity" Attack Events records as zero) and INFAD (dropping those records). The two distinct aspects are compared in order to overcome the existing drawbacks discussed in the related work in Section 2. SMOTE is applied to balance the dataset so that each class contributes an equal number of samples, with stratified sampling. A train-test split with an 80:20 ratio is used for the SAML-Triple approach.
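The balancing and splitting steps can be sketched as follows. This is a minimal NumPy illustration of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbors) followed by a stratified 80:20 split, not the implementation used in this work. The function names, the toy data, the class sizes, and k = 5 are all assumptions made for the example.

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Synthesize n_new points by interpolating between a minority sample
    and one of its k nearest minority-class neighbors (the SMOTE idea).
    Requires at least two samples in X_min."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]  # k nearest neighbors per point
    base = rng.integers(0, n, size=n_new)        # random base samples
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

def balance(X, y, rng=0):
    """Oversample every class up to the size of the largest class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, cnt in zip(classes, counts):
        if cnt < target:
            X_parts.append(smote_like(X[y == c], target - cnt, rng=rng))
            y_parts.append(np.full(target - cnt, c))
    return np.vstack(X_parts), np.concatenate(y_parts)

def stratified_split(y, test_frac=0.2, rng=0):
    """Return boolean train/test masks that keep class proportions."""
    rng = np.random.default_rng(rng)
    test_mask = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        test_mask[idx[: int(round(test_frac * len(idx)))]] = True
    return ~test_mask, test_mask

# Toy imbalanced data (hypothetical): 60 / 25 / 15 records in three classes
data_rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], [60, 25, 15])
X = data_rng.normal(size=(100, 4)) + y[:, None]

X_bal, y_bal = balance(X, y)
train_mask, test_mask = stratified_split(y_bal)
```

The order mirrors the text (balance first, then split); in the toy run every class ends up with the same number of records in both the training and the test partition.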
This paper is split into seven sections: Section 2 discusses the related work regarding triple-class classification with the proposed techniques, drawbacks of the existing approaches, and challenges addressed. Section 3 describes the proposed approach with a process flow diagram of the preprocessing techniques of feature engineering: handling "INFinity" Attack Events records with two preprocessing aspects (INFAZ and INFAD), feature scaling, handling imbalanced data using SMOTE, and the statistical approach of NCA with a hyperparameter optimization strategy combined with a machine learning classifier. Moreover, the description of the publicly available ICS Cyber Attack Power System datasets with operational scenario categories is presented in detail. Section 4 describes the proposed methodology, which deals with the statistical approach of feature extraction using NCA and optimal parameter/hyperparameter tuning with the (ET + AdB) ML classifier. Algorithm 1 gives the pseudocode for data preprocessing and the NCA transformation, whereas Algorithm 2 gives the pseudocode for optimal hyperparameter tuning, which finds the number of components 'N' and the maximum number of iterations 'I' of NCA and applies the best parameters for each ML classifier. Section 5 provides the implementation details of the data preparation, hyperparameter settings for the models, test case scenarios, tools for implementation, and evaluation metrics. Section 6 includes the results analysis and discussion of the three-bus/two-line transmission system (triple-class dataset) and the IEEE 14-bus and 57-bus system datasets of FDI attacks for generalization and scalability in detail using tables, graphs, and a confusion matrix. The performance metrics of FNR (or missing rate), response time, FPR (or false alarm), and accuracy are considered to compare the results. Finally, Section 7 discusses the conclusion and the scope of future work.

Related Works
In the related works, some researchers recently addressed the problem of discriminating cyberattacks from Natural Events and No Events with their proposed techniques and approaches.
Upadhyay et al. [22] used Gradient Boosting Feature Selection (GBFS), an ensemble learning technique, to reduce the features from 128 to 15 and obtained an accuracy of 96.50% with a tree-based machine learning classifier. GBFS combines multiple weak learners' predictions to create a strong predictive model. Removing features based on the feature importance scores provided by GBFS does not necessarily improve the model. Some features might be important in combination with others, and removing them individually might not result in a better-performing model. GBFS relies solely on feature importance scores, which limits the feature selection in the triple-class dataset. Also, each of the 15 datasets yields a different combination of 15 features from the 128 features in Table 1. The authors did not establish which 15 features were the best across the 15 datasets to discriminate Attack Events from Natural Events and No Events.
Furthermore, the same author group, Upadhyay et al. [23], proposed an integrated framework for an IDS for SCADA-based power grids, in which they used Recursive Feature Elimination (a feature selection technique) to reduce the features from 128 to 30 and nine heterogeneous ensemble classifiers with the majority voting stacking concept to achieve an improved accuracy of 97.95% for the same triple-class dataset. Recursive Feature Elimination (RFE) removes the least important features based on a model's performance until the desired number of features is selected. The drawback of the RFE approach is that it may struggle when dealing with highly correlated features, and removing one of a set of correlated features might not necessarily improve the model's performance. It could lead to the loss of valuable information, with degraded performance in extracting the features and discriminating the Attack Events from Natural Events and No Events. The first drawback of [22,23] is that the authors' perspectives on preprocessing the "INFinity" Attack Events records are contradictory. The second drawback is that, in [22], the author group listed the top 15 promising influential features for attack classification; in contrast, in [23], they did not list the top 30 influential features, which leaves the move from 15 to 30 promising features unsupported. However, they achieved a slight improvement in accuracy to 97.95% in [23], compared to 96.50% in [22], for the triple-class datasets.
Hu et al. [24] used a Stacked Denoising Autoencoder (SDAE) to reduce the features from 128 to 60. They classified them using an Extreme Gradient Boosting (XGBoost) classifier for the triple-class dataset with an accuracy of 90.48%. The authors used the deep learning model of SDAE to learn the complex input data representation to extract the features. SDAEs have several hyperparameters, including the number of layers, the number of nodes in each layer, learning rates, and corruption levels. Selecting appropriate hyperparameters is challenging, and suboptimal choices can result in poor model performance in extracting the features and discriminating the Attack Events from Natural Events and No Events. Moreover, the same author group, Hu et al. [25], used Multiple Autoencoders (AE) to reduce the features from 128 to 30 and classified them using a Random Forest machine learning classifier to discriminate the triple-class dataset, with an improved accuracy of 91.78%. Multiple Autoencoders (AE) learn the complex representation of input data to extract features. Hyperparameter tuning of Multiple Autoencoders and the adaptive boosting mechanism is challenging and requires extensive experimentation to find an optimal combination. This can degrade model performance in extracting the features and discriminating the Attack Events from Natural Events and No Events. The first common drawback of both works [24,25] is that they do not mention the preprocessing of "INFinity" Attack Events records. The second drawback is that the feature selection is inconclusive, with wide variation in the number of optimal features: 60 features in [24] and 30 features in [25], for only a slight improvement in accuracy from 90.48% to 91.78%.
Gumaei et al. [26] used the Correlation-based Feature Selection Technique for selecting the optimal features with KNN as a machine learning classifier to discriminate the triple-class datasets with an accuracy of around 91.87%, where each of the 15 datasets was processed individually to select the optimal features, with the number selected varying from eight to eleven. Correlation-based feature selection measures linear relationships between variables, and it may not capture the true association between the features and the target variable if the relationships are non-linear. Also, it may not capture complex interactions or dependencies involving multiple features simultaneously. This can result in a poor choice of features, which may not be accurate enough to discriminate Attack Events from Natural Events and No Events. Moreover, this paper recommended, as future scope, increasing the accuracy with less computational time based on hybrid feature selection techniques.
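The limitation of linear correlation noted above can be illustrated with a small synthetic example. The sketch below is not taken from [26]; the variable names and the quadratic relationship are assumptions chosen only to show that a feature can fully determine a target while having a Pearson correlation close to zero.

```python
import numpy as np

# Symmetric grid of feature values (hypothetical toy feature)
x = np.linspace(-1.0, 1.0, 201)

y_linear = 2.0 * x + 1.0   # target that depends linearly on x
y_quadratic = x ** 2       # target fully determined by x, but non-linearly

# Pearson correlation coefficient between feature and target
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"linear target:    r = {r_linear:.3f}")
print(f"quadratic target: r = {r_quadratic:.3f}")
```

A purely correlation-based ranking would score the feature highly for the linear target and discard it for the quadratic one, even though it is equally informative in both cases.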
Ankitdeshpandey and Karthi, R. [27] applied Principal Component Analysis (PCA) as an unsupervised feature extraction technique to reduce the dimensionality to 31 principal components to discriminate the triple-class dataset using ML and DL classifiers with an accuracy of 91.14% for Random Forest, 89.91% for DNN, and 76.90% for SVM; all three classifiers showed low detection rates. PCA focuses on capturing the global variance in the data along the principal components in which the data vary. It does not preserve the local structure or relationships within the data, which might result in misclassification between the three classes. The limitation of this paper is that they tested only a reduced set of around 13,200 samples randomly selected from the entire 15 datasets.
Hink et al. [28] developed the original datasets by considering the scenario of an insider attack (or compromised system) in a smart grid. They investigated power system disturbances and cyberattack discrimination using machine learning applications. This group's dataset provided the first proof of concept for research on machine learning applications for developing an IDS. The limitation of this paper is that they tested only 1% of the records, randomly sampled from the entire 15 datasets, across all three classification formats: Binary Class, Triple Class, and Multiclass. The sample comprised 294 No Events records, 3711 Attack Events records, and 1221 Natural Events records, used across all three classification formats. Using Information Gain as a feature selection technique, 40 features were selected as optimal features, and the triple-class dataset was classified using the AdaBoost + JRipper ML classifier with an accuracy of 95.0%. Information Gain assumes that features are independent, and this assumption is often violated in the case of highly correlated triple-class datasets. As a result, Information Gain may not accurately reflect the true importance of features, leading to suboptimal feature selection, which may not be efficient in discriminating between the three classes. This original author group recommended, as future work, evaluation with a broader range of power system data, learning algorithms, classification strategies, and labeled data amounts.
Agrawal et al. [29] applied the concept of dynamic retraining with drift detection toward robust power grid attack protection using LightGBM. They classified the triple-class dataset with an accuracy of 95.30% for the complete 128 features and 97.1% for the top ten features selected using the ExtraTrees approach as the feature selection technique. The drawback of this paper is that they removed the "INFinity" Attack Events records, which may lead to missing rates or SCADA inoperability. The feature importance scores provided by ExtraTrees as a feature selection method may not always reflect the true importance of features, especially in the presence of highly correlated features. The algorithm may arbitrarily assign importance to one of the correlated features, leading to potential bias in feature selection and misclassification between the three classes.
Sunku Mohan et al. [30] investigated the problem using Power Domain Knowledge. These authors employed manual feature selection by filtering out features of positive-, negative-, and zero-sequence components and logs. They manually selected 36 features for this triple-class dataset for discriminating cyberattacks, Natural Events, and load variation in SCADA smart grid systems. They applied a Rule-based Machine Learning AdaBoost classifier to discriminate the triple-class dataset with an accuracy of 97.25%. The limitation of this paper is that manual feature selection is domain-specific, which makes the approach a partially specification-based IDS, even though the classification was performed through a machine learning classifier. This manual feature selection may not generalize or scale to different architectures and may require complex logical calculations.
Bitirgen K and Filik ÜB [31] developed a hybrid model, PSO-CNN-LSTM, by combining particle swarm optimization (PSO) with a convolutional neural network (CNN) and long short-term memory (LSTM) to optimize the features for better triple-class classification, with an accuracy of 96.92%. The authors utilized PSO as a metaheuristic optimization algorithm to explore the search space, along with CNN to develop input features with complicated mathematical operations and LSTM to preserve both the short-term and long-term dependencies of time-series data. The drawback of this paper is that they did not mention the preprocessing of "INFinity" Attack Events records or the number of selected features, even though they achieved better detection accuracy. In the case of a highly correlated triple-class dataset, the effectiveness of PSO is influenced by the starting positions and velocities of particles, and it may struggle to handle redundant features effectively. If multiple features are highly correlated, PSO might select only one, leading to potential bias in the feature selection and misclassification between the three classes. The graphical representation shown in this paper is only a feature representation before applying the model and does not clearly show the clusters of the labels.
In terms of solution approaches, NCA indirectly involves eigenvalues and eigenvectors in the computation of the transformation matrix A. In [32], the author efficiently used eigenvalues and eigenvectors for optimal sensor placement with multi-objective robust optimization. In [33], the author incorporated Adaptive PCA (A-PCA) for extracting the best features from the network traffic for the IDS. The eigenvalues and eigenvectors of the covariance matrix provide valuable information regarding the principal axes and magnitudes of variation in the data. This information can guide the selection of appropriate dimensions or components for representation.

• The drawback and challenge faced by most of the researchers is that they tried to discriminate the No Events/Natural Events/Attack Events of triple-class datasets with the number of selected features varying from eight to sixty, depending on the adopted feature selection technique [22,23,26,28,29,31], to improve the detection accuracy. The feature importance scores obtained from the suboptimal feature selection of the existing methods indicate poor choices of features, leading to potential bias in feature selection and weaker model performance when discriminating attacks.
• The authors of [24,25,27] used unsupervised dimensionality reduction for feature extraction (Autoencoders in [24,25] and PCA in [27]). Feature extraction using deep learning methods lacks an optimal combination of extracted features because of the many hyperparameters involved. Moreover, dimensionality reduction (feature extraction) using PCA [27] fails to capture the local structure or relationships within the data, which might result in misclassification between the three classes.

• The other major drawback is that dropping the feature column "PA:Z" (Apparent Impedance for Four Relays) or removing the rows of "INFinity" Attack Events records is contradictory, as doing so excludes attack scenarios, which might increase the missing rate or false negative rate. If left unprocessed, these records may have a massive impact on the SCADA systems, making them inoperable, with potentially fatal consequences for the power system's stability and reliability. We have addressed this problem in our proposed SAML-Triple approach by treating the "INFinity" values as zero (INFAZ).

Research Gap Identified
Due to the highly complex correlation between the features of the Triple Class Power System Cyberattack Dataset, the existing approaches are ineffective in selecting the optimal subset of features. The existing approaches [22,23,26,28,29,31] select suboptimal features based on feature importance scores, leading to potential feature selection bias, and do not effectively discriminate cyberattacks from Natural Events and No Events. The rest of the existing approaches [24,25,27] use feature extraction techniques such as deep learning Autoencoders and PCA. Such techniques do not yield an optimal combination of extracted features because of the many hyperparameters involved. Moreover, the existing approaches provide lower accuracy in the discrimination of cyberattacks. We propose an alternative approach, SAML-Triple, to overcome the shortcomings of the existing approaches in discriminating cyberattacks from natural faults and man-made events. The proposed work focuses on improving the accuracy as well as mitigating the missing rates with earlier cyberattack detection.

• To avoid ambiguity regarding the different numbers of selected features, we utilized NCA as a supervised feature extraction method for dimensionality reduction. This method does not rely on feature importance scores; instead, it converts high-dimensional data into lower-dimensional data in a new transformation space suitable for complex and highly correlated data. It preserves both the global and the local neighborhood structure relationships between the data records in the dataset. The proposed SAML-Triple adopts NCA as a feature extraction technique with optimal parameter/hyperparameter tuning and the (ET + AdB) ML classifier for discriminating cyberattacks from natural faults and man-made events.

• We addressed the "INFinity" Attack Events records in the feature column "PA:Z" (Apparent Impedance for Four Relays) by replacing them with zero (INFAZ). In the context of the power domain, an INFinity value of "PA:Z" can be processed either as zero or as a value above its limit to avoid the missing rate [21]. Here, the two preprocessing aspects of INFAZ (zero) and INFAD (dropping) were both applied in the SAML-Triple work to compare the results with those of other existing works.
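The two preprocessing aspects can be sketched on a toy array as follows. This is a minimal illustration under assumptions, not the preprocessing code of this work: the array contents, the function names `infaz` and `infad`, and the integer label coding (0 = No Events, 1 = Natural Events, 2 = Attack Events) are hypothetical, and the last column merely stands in for a "PA:Z" impedance feature.

```python
import numpy as np

# Toy stand-in for the dataset: the last column plays the role of "PA:Z"
X = np.array([[1.0, 0.5, 12.0],
              [0.9, 0.4, np.inf],
              [1.1, 0.6, 15.0],
              [0.2, 0.1, np.inf]])
y = np.array([0, 2, 1, 2])  # hypothetical coding: 0 none, 1 natural, 2 attack

def infaz(X):
    """INFAZ: keep every record and replace infinite values with zero."""
    return np.where(np.isinf(X), 0.0, X)

def infad(X, y):
    """INFAD: drop every record that contains an infinite value."""
    keep = ~np.isinf(X).any(axis=1)
    return X[keep], y[keep]

X_az = infaz(X)          # same number of records, attack rows retained
X_ad, y_ad = infad(X, y)  # fewer records, attack rows with INF removed

print("INFAZ keeps", len(X_az), "records; INFAD keeps", len(X_ad))
```

In this toy case INFAD removes both attack records, which illustrates why dropping such rows can hide attack scenarios from the classifier.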

Proposed Approach
The Process Flow Diagram in Figure 1 shows the steps involved in SAML-Triple, covering the feature engineering aspects and the hyperparameter optimization strategy of the model for discriminating cyberattacks from natural faults and man-made events.
In SAML-Triple, the data source is the publicly available ICS Cyber Attack Triple Class (No Events/Natural Events/Attack Events) Power System Datasets [34]. The data preprocessing steps in feature engineering are carried out with two aspects: handling "INFinity" Attack Events records as zero (INFAZ) and dropping "INFinity" Attack Events records (INFAD) for the feature columns of "PA:Z" (Apparent Impedance for Four Relays). The two distinct data preprocessing aspects are adopted in SAML-Triple so that the results can be compared with the existing approaches. Continuing with feature engineering, StandardScaler and LabelEncoder were applied, followed by SMOTE to balance the dataset so that each class label contributes an equal number of records, with stratified sampling. For both aspects, optimal features are extracted by utilizing NCA as a feature extraction technique to find the optimal number of components 'N' with a maximum number of iterations 'I'. In SAML-Triple, a train-test split with an 80:20 ratio was applied. After the train-test split, optimal hyperparameter tuning with GridSearchCV (10-fold cross-validation) is applied for each of the ML classifiers to exhaustively search for the best parameters from the grid of provided parameters. A pool of ML classifier algorithms [35] is applied for training and testing the datasets: ExtraTrees with AdaBoost (ET + AdB), ExtraTrees (ET), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbor (KNN), and XGBoost (XGB). The SAML-Triple approach of NCA with the (ET + AdB) ML classifier performs well in discriminating cyberattacks on both aspects with better performance metrics. The performance metrics of FNR (or missing rate), response time, FPR (or false alarm), and accuracy were compared in this work across the various ML classifiers.
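The scaling, NCA, and (ET + AdB) steps described above can be sketched with scikit-learn as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses synthetic data from `make_classification` instead of the ICS dataset, omits SMOTE, places NCA inside a pipeline so that its 'N' and 'I' are tuned together with the classifier, uses 3-fold instead of 10-fold cross-validation to keep the run short, and all grid values and ensemble sizes are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class stand-in for the triple-class dataset (assumption)
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

# Stratified 80:20 train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# ExtraTrees (bagging-style base learner) boosted by AdaBoost: "ET + AdB"
et_adb = AdaBoostClassifier(
    ExtraTreesClassifier(n_estimators=10, max_depth=5, random_state=0),
    n_estimators=5, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),                              # feature scaling
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),  # feature extraction
    ("clf", et_adb),                                          # classification
])

# Hypothetical grid over the NCA components 'N' and max iterations 'I'
param_grid = {"nca__n_components": [2, 4], "nca__max_iter": [20, 50]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)

accuracy = search.score(X_test, y_test)
print("best NCA settings:", search.best_params_)
print("held-out accuracy:", round(accuracy, 3))
```

Keeping the scaler and NCA inside the pipeline means they are refit on each training fold during the grid search, which is one common design choice for avoiding leakage from validation folds.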

Industrial Control Systems (ICS) Cyber Attack Power System Dataset Testbed Description
The publicly available ICS Cyber Attack Power System Dataset [34] was developed by Mississippi State University in collaboration with Oak Ridge National Laboratory in 2014.
Figure 2 represents the power system framework configuration of a 3-Bus/2-generator system developed by the authors of [36]. It is assumed that an intruder has already compromised the system, acquired access to the substation network, and sent commands to the substation switch. The intruder (or attacker) could be a former or present employee of the company or an external source from outside the network. Since the IEDs (Intelligent Electronic Devices) lack an internal validation system to distinguish between genuine and fraudulent faults, they employ a distance protection technique to trip the breakers on detected faults. Operators can manually trip breakers BR1 through BR4 by sending orders to IEDs R1 through R4. Usually, manual overriding is performed during line maintenance or when other system components fail. SAML-Triple considers this framework, which comprises various operational scenarios [34], such as Single Line-to-Ground faults, line maintenance, remote tripping command injection attacks, relay setting change attacks, and FDI attacks, to ensure that cyberattack discrimination is valid during normal routine operations, including manipulated breakers. SAML-Triple considers 15 triple-class datasets for evaluation to discriminate cyberattacks from Natural Events and No Events. Each of the 15 datasets has approximately 5000 records with 128 feature columns and one marker column as the target label for classification. Table 1 provides the description of the dataset's 128 features. A detailed description of the dataset features is available in [34]. The operational scenarios (37 power system events) are categorized into three labels (No Events (1)/Natural Events (8)/Attack Events (28)). Table 2 details how the event scenarios, with scenario numbers running up to 41, are split into the three class labels: No Events (41), Natural Events (1 to 6, 13, 14), and Attack Events (7 to 12, 15 to 20, 21 to 30, and 35 to 40).

• No Events: represents normal system operation with changes in loads.
• Natural Events: represents the system with Single Line-to-Ground (SLG) faults at various percentages of fault location in L1 and L2, together with Line Maintenance (L1 and L2).
• Attack Events: represents the data injection attack (SLG fault replay), remote tripping command injection attack, and relay setting change attack with various percentages of fault location.
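The scenario-to-class split listed above can be expressed as a small lookup. The sketch below encodes only the scenario numbers stated in the text; the set names and the helper `triple_class_label` are hypothetical names introduced for illustration.

```python
# Scenario numbers per class, as listed in the text
NO_EVENTS = {41}
NATURAL_EVENTS = set(range(1, 7)) | {13, 14}
ATTACK_EVENTS = (set(range(7, 13)) | set(range(15, 21))
                 | set(range(21, 31)) | set(range(35, 41)))

def triple_class_label(scenario_id: int) -> str:
    """Map a scenario number to its triple-class label."""
    if scenario_id in NO_EVENTS:
        return "No Events"
    if scenario_id in NATURAL_EVENTS:
        return "Natural Events"
    if scenario_id in ATTACK_EVENTS:
        return "Attack Events"
    raise ValueError(f"scenario {scenario_id} is not part of the triple-class split")

# Sanity check against the counts stated in the text: 1 + 8 + 28 = 37 events
counts = (len(NO_EVENTS), len(NATURAL_EVENTS), len(ATTACK_EVENTS))
print(counts, sum(counts))
```

Scenario numbers that do not appear in the split (for example 31 to 34) raise an error rather than being silently assigned to a class.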

Various Types of Operational Scenarios
• SLG or short-circuit fault. A short circuit in the power line can occur anywhere along the line; the percentage range specifies the fault location.
• Line maintenance. This event category is performed when one or more relays are disabled on a specific line so maintenance can be completed for that line.
• FDI attack. Here, the intruder imitates a valid fault by changing values of parameters such as current, voltage, and sequence components. The intruders mimic a valid SLG fault by synchronizing with the phasor measurements, followed by sending an illicit trip command to relays at the ends of the transmission line. This attack involves altering the parameters of current, voltage, phase angles, sequence components, and so on to blind the operator without raising the alarm and to induce a blackout. This attack imposes a physical or large-scale load loss and substantial economic loss. Similar to SLG faults, the replayed faults can occur at any location in the transmission line with various percentage ranges (10-19%, 20-79%, and 80-90%).
• Remote tripping command injection attack. This attack type arises when an intruder's system on the communications network sends unauthorized relay trip commands to relays at the terminals of a transmission line. The command injection attacks are performed against a single relay (R1 to R4) or two relays (R1 and R2 or R3 and R4).
• Relay setting change attack. The intruders alter the relay settings in a distance protection scheme to cause a malfunction in the relay operation. As a result, the relay fails to recognize valid faults or commands. The faults can occur in any location on the transmission line with relay R1, R2, R3, or R4 disabled.
In this framework [34], a PMU device estimates the magnitude and phase angle of a phasor quantity (such as voltage or current) by synchronizing with a common time source. Each of the four integrated PMU/relay units measures 29 features, giving 116 PMU measurement columns in total. Following the PMU measurement columns, there are 12 columns for control panel logs, Snort alerts, and relay logs, which together make up the 128 feature columns, followed by the marker/target column.

Methodology
Neighborhood Components Analysis [37] is a supervised non-parametric statistical feature extraction technique based on the K-Nearest Neighbor (KNN) method. KNN classifies a data point by assigning it to the majority class among its closest neighbors under a given distance metric (Euclidean by default). The drawback of KNN is that, as a lazy learner, it must store and search the entire training set at prediction time, which is computationally and memory expensive for larger datasets. On the other hand, NCA uses the Mahalanobis distance as a distance measure to maximize the selection accuracy, that is, to minimize the leave-one-out (LOO) classification error on the training dataset, using a stochastic nearest neighbor approach, which reduces the prediction complexity. It classifies any given test record well and speeds up the procedure for faster discrimination. The computational complexity comparison is described in detail under Section 4.3.

• Both NCA and PCA apply a linear transformation to a lower-dimensional space.
• NCA [37] (supervised learning) is a statistical feature extraction technique that employs a method similar to k-Nearest Neighbor in which the neighborhoods of records with the same labels are packed together more densely than those with different labels.

Neighborhood Components Analysis in the Context of Power Domain
NCA [37] effectively classifies multivariate data into different classes using the Mahalanobis distance metric, which measures the distance between a data point and a distribution. It is beneficial for multivariate anomaly detection and classification on highly imbalanced datasets when a correlation between distinct groups or clusters of data is required. It works well when two or more features are highly correlated and have different scales of values. This matches the power domain context of the ICS Cyber Attack Triple-Class dataset, in which the associated features of Voltage (V) and Current (I) are related.
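The effect of feature scale on the distance measure can be illustrated with a small sketch. The readings and the per-feature scales below are hypothetical values chosen for illustration only; in NCA the matrix 'A' is learned from the labeled data rather than fixed by hand.

```python
def transform(A, x):
    """Apply the linear map A (a list of rows) to the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def mahalanobis_sq(A, x, y):
    """Squared distance (x - y)^T A^T A (x - y), computed as ||Ax - Ay||^2."""
    ax, ay = transform(A, x), transform(A, y)
    return sum((p - q) ** 2 for p, q in zip(ax, ay))


# Hypothetical readings: [voltage magnitude, current magnitude].
x = [130000.0, 400.0]
y = [130500.0, 900.0]

identity = [[1.0, 0.0], [0.0, 1.0]]            # reduces to squared Euclidean distance
rescale = [[1 / 1000, 0.0], [0.0, 1 / 100]]    # assumed per-feature scales

d_euclid = mahalanobis_sq(identity, x, y)
d_scaled = mahalanobis_sq(rescale, x, y)
print(f"squared Euclidean: {d_euclid:.2f}, with rescaling A: {d_scaled:.2f}")
```

With the identity matrix, the voltage and current differences contribute equally even though the current more than doubles while the voltage barely moves; with the rescaling matrix, the current difference dominates the distance.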
NCA accomplishes the same goal as the K-Nearest Neighbor algorithm but with the Mahalanobis distance as the distance measure, employing a stochastic nearest neighbor selection approach that maximizes the selection accuracy and minimizes the leave-one-out (LOO) classification error on the training data. In the power domain context of cyberattack detection, the analysis combines multiple dependent variables (features) to predict the single outcome of the classification (target column). In SAML-Triple, the dataset contains multivariate data with the features of Voltage Phase Angle and Magnitude, Current Phase Angle and Magnitude, Frequency for relays, Frequency Delta (df/dt) for relays, Apparent Impedance and its angle for relays, and the log features, which together predict the single outcome (target column) of attack discrimination and classification.
For the triple-class dataset taken, let 'X' represent the matrix of the original feature vectors of the dataset. Each row of 'X', denoted as $x_i$, corresponds to the feature vector of a single data point. The matrix 'A' represents the linear transformation applied to the original feature vectors to obtain the transformed feature vectors. The dataset comprises data records categorized into three distinct classes, each represented by feature vectors. Through iterative optimization with conjugate gradient descent, NCA aims to learn the matrix 'A' that maximizes the probability of accurate classification in the transformed feature space. This transformation enhances the discrimination between classes, facilitating more effective classification. The gradient-based optimization process updates the elements of 'A' to maximize a predefined cost function, which typically measures the preservation of nearest neighbor relationships in the transformed space. Maximizing this objective function corresponds to minimizing the leave-one-out (LOO) classification error, leading to better performance of the NCA algorithm on the triple-class dataset. Ultimately, the resulting matrix 'A' encapsulates the learned transformation, enabling improved classification performance on the triple-class dataset.
Let $p_{ij}$ be the probability that point $x_j$ is selected as the neighbor of point $x_i$. The probability that points are correctly classified when $x_i$ is used as a reference is

$$p_i = \sum_{j:\, C_j = C_i} p_{ij} \tag{1}$$

Maximizing the $p_i$ of (1) for all $x_i$ minimizes the LOO error. Then, $p_{ij}$ is defined using the softmax function of the squared Euclidean distance between a given LOO classification point and every other point in the transformed space:

$$p_{ij} = \frac{\exp(-d_{ij})}{\sum_{k \neq i} \exp(-d_{ik})} \tag{2}$$

where $d_{ij}$ is the distance measure between points $x_i$ and $x_j$, provided by

$$d_{ij} = (x_i - x_j)^T Q \,(x_i - x_j) = \lVert A x_i - A x_j \rVert^2 \tag{3}$$

Here, $x_i$ and $x_j$ are original feature vectors of the matrix 'X' that are projected into another vector space with the transformation matrix 'A', and 'Q' is a symmetric, positive semi-definite matrix that defines the Mahalanobis metric, represented as $Q = A^T A$. The transformed feature vectors are obtained by multiplying the original feature vectors by the transformation matrix 'A', i.e., $A x_i$ and $A x_j$, respectively. Substituting $d_{ij}$ of (3) in (2), $p_{ij}$ is represented below in (4):

$$p_{ij} = \frac{\exp\left(-\lVert A x_i - A x_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert A x_i - A x_k \rVert^2\right)}, \quad \text{where } p_{ii} = 0 \tag{4}$$

Now, the objective function of NCA is maximized using LOO classification with the stochastic neighbor selection rule, which can be expressed in terms of the cost function $f(A)$ in (5). This cost function $f(A)$ pulls points from the same class closer together.

$$f(A) = \sum_i \sum_{j:\, C_j = C_i} p_{ij} = \sum_i p_i \tag{5}$$

where $C_i$ is the class label of sample i. Since the cost function is differentiable with respect to 'A', it is optimized using a gradient-based method. Maximizing the objective function $f(A)$ using a gradient-based optimizer such as conjugate gradient descent relies on the gradient of $f(A)$ with respect to 'A', which is represented in (6). The terms used in (6) are as follows:

• $x_i$ represents the feature vector of sample i.
• $x_{ij}$ is the j-th element of the feature vector of sample i.
• y represents the output of the transformation Ax.
The larger the f(A) during training, the better the test performance. The convergence criteria in NCA are met by observing the changes in the objective function across iterations and stopping the optimization process when the change becomes negligible or a threshold is reached, in addition to setting a maximum number of iterations to prevent the algorithm from running indefinitely.
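The objective and the stopping rule described above can be sketched in a few lines. This is a minimal illustration on a hypothetical six-point, two-class toy dataset, not the implementation used in this work: plain gradient ascent with a finite-difference gradient stands in for the conjugate gradient optimizer, and the step size, tolerance, and iteration cap are assumed values.

```python
import math


def transform(A, x):
    """Apply the linear map A (a list of rows) to the vector x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def nca_objective(A, X, labels):
    """f(A): sum over i of the softmax probability mass on same-class neighbors."""
    Z = [transform(A, x) for x in X]
    total = 0.0
    for i, zi in enumerate(Z):
        # Unnormalized softmax weights exp(-||Ax_i - Ax_k||^2), with p_ii = 0.
        w = [0.0 if k == i else
             math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zk)))
             for k, zk in enumerate(Z)]
        denom = sum(w)
        if denom > 0.0:
            total += sum(wk for wk, c in zip(w, labels) if c == labels[i]) / denom
    return total


def numerical_gradient(A, X, labels, eps=1e-5):
    """Central finite-difference estimate of df/dA (stand-in for the analytic gradient)."""
    grad = [[0.0] * len(row) for row in A]
    for r, row in enumerate(A):
        for c, old in enumerate(row):
            row[c] = old + eps
            up = nca_objective(A, X, labels)
            row[c] = old - eps
            down = nca_objective(A, X, labels)
            row[c] = old  # restore the entry before moving on
            grad[r][c] = (up - down) / (2 * eps)
    return grad


def fit_nca(A, X, labels, step=0.5, tol=1e-6, max_iter=50):
    """Gradient ascent on f(A); stops on a negligible gain or after max_iter rounds."""
    A = [row[:] for row in A]
    f_old = nca_objective(A, X, labels)
    for _ in range(max_iter):
        grad = numerical_gradient(A, X, labels)
        trial = [[a + step * g for a, g in zip(ra, rg)] for ra, rg in zip(A, grad)]
        f_new = nca_objective(trial, X, labels)
        if f_new <= f_old:
            step /= 2  # overshoot: keep A and retry with a smaller step
            continue
        gain, A, f_old = f_new - f_old, trial, f_new
        if gain < tol:
            break  # change in the objective has become negligible
    return A, f_old


# Hypothetical toy data: the second feature separates the two classes.
X = [[0.0, 0.0], [2.0, 0.2], [4.0, 0.1], [1.0, 1.0], [3.0, 1.2], [5.0, 1.1]]
labels = [0, 0, 0, 1, 1, 1]
A0 = [[0.5, 0.5]]  # 1 x 2 matrix: project two features onto one component

f_start = nca_objective(A0, X, labels)
A_fit, f_end = fit_nca(A0, X, labels)
print(f"f(A) before: {f_start:.3f}, after: {f_end:.3f}, A = {A_fit}")
```

Because a step is only accepted when it increases f(A), the objective never decreases across iterations, and f(A) is bounded above by the number of samples.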
In the smart grid context, NCA is utilized to detect or discriminate cyberattacks in ICS power system datasets, and the transformation matrix 'A' plays a crucial role in mapping the input features to a lower-dimensional space where the attack detection or discrimination tasks are performed. In this scenario, the input features from the ICS power system cyberattack dataset, such as the voltage, current, their phasor values, and the frequency, cover all 128 features considered, which are described in Table 1. The transformation matrix 'A' transforms these high-dimensional input features into a lower-dimensional space, where the inherent structure of the data relevant to attack detection or discrimination is captured better. The parameters of the transformation matrix 'A' correspond to the individual elements of the matrix, denoted as $A_{ij}$, which determine the contribution of each input feature to each dimension of the reduced space. During the optimization process in NCA, these parameters are adjusted iteratively by a gradient-based optimization algorithm to maximize the objective function, which quantifies how effectively the reduced space discriminates cyberattacks from natural and normal instances in the power system data.
The final values of these parameters after convergence represent the optimized transformation matrix 'A', which provides an effective representation of the power system data for attack detection or discrimination purposes.
The significance of the proposed SAML-Triple approach is that it utilizes NCA, a supervised, non-parametric statistical feature extraction (dimensionality reduction) technique. NCA considers the local neighborhood structure between data points belonging to each of the three classes (No Events, Natural Events, and Attack Events) and employs the Mahalanobis distance measure (3) and the transformation matrix 'A' to transform the data into a new dimension while preserving the global variance of the data along the principal components. The probability of selecting the closest data points of each class (1) is calculated using the softmax function (2) or (4). NCA leverages class labels to learn a transformation that not only reduces dimensionality but also enhances discriminative information for classification tasks. The objective function of NCA (5) is maximized using LOO classification with the stochastic nearest neighbor selection approach, so it pulls points from the same class closer together for each of the three classes.
Further, the cost function (5) is optimized using conjugate gradient descent (6) for faster convergence of the training process. The algorithm aims to learn a linear transformation, represented by the weight matrix 'A', that improves the accuracy of the (non-linear) k-Nearest Neighbor classifier with the Mahalanobis distance measure on the training data. Therefore, NCA effectively maintains both the local and global variance associations of the data, making it particularly well-suited for extracting features from complex datasets that exhibit both linear and non-linear dependencies. The optimal number of NCA components is found by hyperparameter tuning together with the ML classifier parameters, which helps to discriminate clearly between Attack Events, Natural Events, and No Events.

Comparison of Computation Complexity
The computational efficiency of both KNN and NCA algorithms involves two phases: training complexity and prediction complexity.
The KNN algorithm stores all the training instances (n) in memory, so its training phase is memory intensive, and its prediction complexity increases with the number of training instances and the dimensionality of the feature space (d), which makes it less scalable for large datasets due to the curse of dimensionality. For each test instance, KNN computes the distance metric (Euclidean distance by default) between the test instance and every training instance. The memory complexity of KNN is O(n * d) since all training instances need to be stored in memory. KNN therefore belongs to the family of lazy learners. Beyond a point, KNN is not effective in discriminating attacks because of the Euclidean distance metric.
NCA involves a more computationally intensive training phase because a gradient-based optimization algorithm is used to learn the transformation matrix 'A'. The computational complexity per iteration depends on the number of training instances (n) and the dimensionality of the feature space (d). Once the transformation matrix 'A' is learned, making predictions with NCA involves applying this transformation to new instances, which has a computational complexity of O(d * m), where 'm' is the reduced dimensionality. In the training phase, NCA uses the Mahalanobis distance as the distance measure to maximize the selection accuracy, that is, to minimize the leave-one-out (LOO) classification error, using the stochastic nearest neighbor approach. By learning a transformation that optimizes this objective function, it offers lower prediction complexity and better generalization performance. The memory complexity of NCA depends on the size of the transformation matrix 'A', which is typically O(d * m), where 'd' is the original dimensionality and 'm' is the reduced dimensionality.
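To make the O(n * d) versus O(d * m) comparison concrete, the short calculation below plugs in figures reported elsewhere in this paper for the first dataset (9278 training records after SMOTE, 128 features, 31 NCA components). It is a back-of-the-envelope count of stored values and multiply-accumulate operations per test record, not a measured benchmark.

```python
n_train = 9278   # training records of the first dataset after SMOTE
d = 128          # original feature columns
m = 31           # NCA components reported for the proposed approach

# KNN keeps every training record and compares each query against all of them.
knn_stored_values = n_train * d
knn_ops_per_query = n_train * d

# NCA keeps only the learned matrix A and applies one matrix-vector product.
nca_stored_values = d * m
nca_ops_per_query = d * m

ratio = knn_ops_per_query / nca_ops_per_query
print(f"KNN: {knn_stored_values:,} values, NCA transform: {nca_stored_values:,} values")
print(f"KNN needs about {ratio:.0f}x more operations per query than the NCA transform")
```

The count covers only the NCA transformation step; the cost of the downstream classifier that consumes the m transformed components is separate.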
Overall, NCA has lower prediction complexity than KNN. KNN requires a more memory-intensive training phase, whereas NCA requires more computationally intensive training. Comparatively, NCA achieved better accuracy in discriminating cyberattacks and outperforms KNN. Using the Mahalanobis distance in the NCA algorithm further improves accuracy. This is also demonstrated by the results provided below.
The proposed SAML approach using NCA achieved 97.51% accuracy compared to 90.64% for KNN, an improvement of 6.87 percentage points in attack classification accuracy, as shown in the line graph in Figure 3.

The pool of machine learning classifier algorithms [35] used for training and testing the datasets consists of (ET + AdB), ET, RF, DT, KNN, and XGB. For the proposed approach of SAML-Triple, the classification algorithm applied is the ExtraTrees (bagging) classifier as a base classifier with the AdaBoost (boosting) technique. The ExtraTrees classifier works conceptually like Random Forest but differs in the construction of the decision trees. It uses a random selection of features and thresholds at each decision tree with the initial training sample and aggregates the output of multiple decision trees as a "forest". This randomness makes the ExtraTrees classifier less prone to overfitting. For each decision tree, the data partition with the best features is constructed based on the mathematical criterion of the Entropy Index. Further, the AdaBoost technique is applied to transform weak learners into stronger ones by reassigning weights to each incorrectly classified instance.
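One way to assemble such a classifier is sketched below with scikit-learn, under stated assumptions: the data are synthetic three-class blobs standing in for the power system records, and the component count, iteration cap, and estimator counts are placeholder values rather than the tuned hyperparameters of this work. The base estimator is passed positionally because the keyword name differs across scikit-learn versions.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data (assumption: stands in for the real dataset).
X, y = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# ExtraTrees with the entropy split criterion serves as the boosted base learner.
base = ExtraTreesClassifier(n_estimators=10, criterion="entropy", random_state=0)

model = make_pipeline(
    StandardScaler(),                                   # standardize the features
    NeighborhoodComponentsAnalysis(n_components=2, max_iter=20, random_state=0),
    AdaBoostClassifier(base, n_estimators=5, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy on synthetic data: {accuracy:.3f}")
```

Placing the scaler, NCA, and the boosted ensemble in one pipeline ensures that the transformation is fitted on the training split only and merely applied to the test split.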

Implementation Details
The proposed approach of SAML-Triple is implemented to discriminate triple-class events such as No Events, Natural Events, and Attack Events. Table 5 represents the preprocessing aspects, train-test split ratio, and SMOTE records (before and after). The preprocessing aspects for SAML-Triple are shown with an example for the first dataset out of fifteen datasets. A similar process is followed for the rest of the datasets.

N-Estimator: Number of trees in the forest
Criterion: The quality measure of the split

Table 3 represents the parameter specifications and optimal hyperparameter tuning of SAML-Triple with NCA. For SAML-Triple, the optimal hyperparameter tuning results were obtained from Table 8 as an average over the 15 datasets, and the same procedure was followed for the other ML classifiers.

Tools for Implementation and Evaluation Metrics
The tool used for implementation is the online Google Colab data analytics platform (free subscription). It runs Python 3 on Google Compute Engine with 13 GB of RAM, a 2-core Xeon CPU @ 2.20 GHz, and 108 GB of disk space.
SAML-Triple utilizes the NCA algorithm, which requires this configuration to find the optimal 'N' Component and maximum Iteration 'I' for the specified range, as provided in Table 3. The confusion matrix represented in Figure 4 is used to evaluate the performance of a machine learning model on the testing data for the triple class. It summarizes the model's predictions against the actual values for the classification problem. For the performance evaluation of the IDS, SAML-Triple adopts standard metrics, such as accuracy, FNR, FPR, response time, precision, recall, and F1-score. Figure 4 represents the confusion matrix of SAML-Triple with actual versus predicted classes. The true positives for the three scenarios (marked in green) are TP (AE), correctly classified as Attack Events; TP (NaE), correctly classified as Natural Events; and TP (NoE), correctly classified as No Events, whereas the true negatives for the three cases relate to the respective left and right diagonal elements (marked in red). For each class, the false negatives are the remaining cells of its horizontal row, and the false positives are the remaining cells of its vertical column. In the attack scenario, F (AE-NaE) and F (AE-NoE) are Attack Events misclassified as Natural Events and No Events, respectively. In the Natural Events scenario, F (NaE-AE) and F (NaE-NoE) are Natural Events misclassified as Attack Events and No Events. In the No Events scenario, F (NoE-AE) and F (NoE-NaE) are No Events misclassified as Attack Events and Natural Events.

Results Analysis and Discussion
SAML-Triple provides a robust solution for discriminating Attack Events from Natural Events and No Events, represented by tables, graphs, and a confusion matrix.

SAML-Triple: Triple-Class Datasets (No Events/Natural Events/Attack Events)-Test Results
Table 10 represents the potential impact of the SMOTE operation on the datasets. The SMOTE operation achieved higher accuracy for all 15 triple-class datasets with an equal number of records considered from each class through stratified sampling, as specified in Table 4 in Section 5, Implementation Details. Figure 5 indicates that, without SMOTE, the dataset is imbalanced across the three labels and shows decreased accuracy for all 15 datasets (blue line), with an average accuracy of 95.73%, compared to an average accuracy of 97.51% with the SMOTE operation (red line). Figure 6 implies that NCA_INFAZ (blue bar) is comparatively higher in accuracy than PCA_INFAZ (yellow bar) across the 15 datasets, and the same is true for NCA_INFAD (red bar) vs. PCA_INFAD (green bar). From another perspective, NCA_INFAD (red bar) performs comparatively better than NCA_INFAZ (blue bar); however, INFAD is not an ideal case because it may lead to missed attacks, whereas NCA_INFAZ (blue bar) still provides good accuracy of more than 96% across the 15 datasets.
Figure 7 depicts the SAML-Triple (INFAZ) approach using the PCA representation of the three events with the (ET + AdB) ML classifier (e.g., Dataset-1). The three axes, x, y, and z, represent PCA-1, PCA-2, and PCA-3, respectively. The values on these three axes represent the top three components with high variance from the transformed 128 features, which retain the maximum relative information.
Figure 8 depicts the SAML-Triple approach using the NCA representation with discrimination of the three events using the (ET + AdB) ML classifier (e.g., Dataset-1). The three axes, x, y, and z, represent NCA-1, NCA-2, and the marker (target) column. The values on the first two axes represent the top two components with high variance from the transformed 128 features, which retain the maximum relative information. In contrast, the third axis is the marker (target)-labeled column. The figure clearly shows the discrimination of Attack Events, Natural Events, and No Events, since the marker (target)-labeled third axis aids in discriminating the processed component-level values per the mathematical objective defined in Section 4. Figure 8a,b show the training and testing samples of Attack Events. Figure 8c,d show the training and testing samples of Natural Events. Figure 8e,f show the training and testing samples of No Events.
Table 8 represents a comparison of the performance metrics of SAML-Triple using NCA with the (ET + AdB) ML classifier for the two preprocessing aspects. Testing samples of 20% were taken for evaluation across the 15 datasets with INFAZ and INFAD. The accuracy, precision, recall, and F1-score performance metrics were compared. On average, accuracies of 97.51% and 98.25% were achieved with INFAZ and INFAD, respectively. Each of the 15 datasets is executed with optimal hyperparameter tuning to find the optimal 'N' Component and maximum Iteration 'I' for NCA as the feature extraction technique and the optimal 'N' estimator for the (ET + AdB) ML classifier. Table 12 compares the SAML-Triple classification accuracy across various test cases for the two preprocessing aspects of INFAZ and INFAD. The test results across the various test cases achieved more than 96% average accuracy for both preprocessing aspects.
Table 13 represents the SAML-Triple performance metrics comparison of NCA with the (ET + AdB) ML classifier obtained from Table 8. The results for the rest of the ML classifiers were obtained using a similar approach. The performance metrics of precision, recall, F1-score, accuracy, FPR, FNR, and testing time were compared for the distinct preprocessing aspects of INFAZ and INFAD. Table 14 represents the SAML-Triple (INFAZ) average response time comparison across various ML classifiers for a 120 samples/second system [36]. The average response time was obtained by batch processing test records at 120 samples/second for 10 rounds. The data generated from the power system framework [34] were collected in the PDC (Phasor Data Concentrator). For a synchrophasor system [36] with 120 samples/second, there are 8.3 ms between samples, and this interval can be used to process each sample with the SAML-Triple (INFAZ) approach. The proposed SAML-Triple (INFAZ) IDS processes a sample within 0.23 ms, well below the 8.3 ms available between samples for a 120 samples/second system [36]. The Attack Events records of "INFinity" are processed in the SAML-Triple (INFAZ) approach, whereas other existing approaches are lagging in this aspect, as discussed in Section 2.
If the Attack Events records are left unprocessed (INFAD), it might have fatal consequences for the stability of the power system and cause the system to collapse. The SAML-Triple (INFAZ) approach, with a response time of 0.23 ms per sample, can alert the prevention system to take further action to regain the power system's stability. This response time is critical in the mission-critical infrastructure of the smart grid, where decisions must be made quickly, and it is achieved through our proposed work of SAML-Triple (INFAZ). Figure 3 has an x-axis with the various ML classifiers and two y-axes, with the left y-axis representing the accuracy metric and the right y-axis representing the response time metric. Figure 3 implies that the SAML-Triple (INFAZ) NCA with the (ET + AdB) ML classifier performs better, with an accuracy of 97.51% and a response time of 0.23 ms, detecting an attack within the 8.3 ms between samples (120 samples/second) [36]. Even though the DT classifier has a lower response time of 0.09 ms compared to 0.23 ms for the (ET + AdB) ML classifier, the accuracy of the DT classifier remains low at 93.17%.
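The timing budget can be checked with simple arithmetic, and a per-sample response time can be measured with a small harness. The sketch below is illustrative: `dummy_predict` is a hypothetical stand-in for a trained classifier's prediction call, and averaging over 10 rounds of 120-record batches is one plausible reading of the batch procedure described above.

```python
import time

SAMPLES_PER_SECOND = 120
budget_ms = 1000 / SAMPLES_PER_SECOND   # interval between samples, about 8.33 ms
reported_ms = 0.23                      # response time reported for (ET + AdB)
headroom = budget_ms / reported_ms      # how many times the budget exceeds it


def average_response_ms(predict, batch, rounds=10):
    """Average per-sample latency of `predict` over several batch rounds."""
    start = time.perf_counter()
    for _ in range(rounds):
        predict(batch)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / (rounds * len(batch))


def dummy_predict(batch):
    """Hypothetical placeholder for model.predict; labels every record as class 0."""
    return [0 for _ in batch]


batch = [[0.0] * 128 for _ in range(SAMPLES_PER_SECOND)]  # one second of records
measured_ms = average_response_ms(dummy_predict, batch)

print(f"budget per sample: {budget_ms:.2f} ms, headroom at 0.23 ms: {headroom:.1f}x")
print(f"measured placeholder latency: {measured_ms:.5f} ms per sample")
```

Replacing `dummy_predict` with the prediction call of a fitted model gives a latency figure that can be compared directly against the 8.3 ms interval.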
Figure 10 represents the confusion matrix of SAML-Triple (INFAZ) for the first dataset out of the 15 datasets after the SMOTE operation represented in Table 5. The total number of samples in the first dataset before SMOTE is 4966, with 173 (No Events), 927 (Natural Events), and 3866 (Attack Events). After applying SMOTE to the first dataset, the total number of sample records was 11,598, with an equal number of 3866 records for each of the three events (No Events, Natural Events, and Attack Events). Out of the 11,598 records, 9278 were taken for training and 2320 for testing, a ratio of 80:20 as per the Pareto principle. The 9278 training records (80%) contain an almost equal number of about 3093 samples from each of the three events, and the 2320 testing records (20%) contain an almost equal number of about 774 samples from each of the three events. With the testing samples of 2320 records, the confusion matrix in Figure 10 depicts the True Label vs. Predicted Label, with reference to the confusion matrix in Figure 4. The labels in the confusion matrix in Figure 10 represent Attack Events (0), Natural Events (1), and No Events (2). From the confusion matrix of SAML-Triple (INFAZ) in Figure 10, in the attack scenario, 758 samples (marked in yellow) were correctly classified as Attack Events (true positive), whereas 16 samples (marked in violet) were misclassified as Natural Events (false negative). In the Natural Events scenario, 762 samples (marked in yellow) were correctly classified as Natural Events (true positive), whereas 11 samples (marked in violet) were misclassified as Attack Events (false negative). In the No Events scenario, 773 samples (marked in yellow) were correctly classified as No Events, and there were no misclassifications. Furthermore, each false negative of one class is a false positive of the class it was assigned to: the 16 misclassified Attack Events are false positives for Natural Events, and the 11 misclassified Natural Events are false positives for Attack Events.
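The per-class rates implied by these counts can be recomputed directly. The sketch below rebuilds the 3 x 3 matrix from the counts quoted above; the zero cells are inferred from the stated row totals (774, 773, and 773) rather than read from Figure 10, so they should be treated as an assumption.

```python
LABELS = ["Attack Events", "Natural Events", "No Events"]

# Rows are true labels, columns are predicted labels (first dataset, INFAZ).
cm = [
    [758, 16, 0],   # Attack Events: 16 misclassified as Natural Events
    [11, 762, 0],   # Natural Events: 11 misclassified as Attack Events
    [0, 0, 773],    # No Events: no misclassifications
]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total


def per_class_rates(cm, k):
    """Return (FNR, FPR) for class k treated one-vs-rest."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # rest of row k
    fp = sum(row[k] for row in cm) - tp     # rest of column k
    tn = sum(sum(row) for row in cm) - tp - fn - fp
    return fn / (fn + tp), fp / (fp + tn)


print(f"records: {total}, accuracy: {accuracy:.4f}")
for k, name in enumerate(LABELS):
    fnr, fpr = per_class_rates(cm, k)
    print(f"{name}: FNR = {fnr:.4f}, FPR = {fpr:.4f}")
```

For this single dataset, the missing rate (FNR) of the Attack Events class is 16 out of 774 records; the averaged figures in the summary tables cover all 15 datasets and therefore differ.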
Table 16 compares the SAML-Triple accuracy metrics of NCA with the (ET + AdB) ML classifier vs. other existing approaches. SAML-Triple achieved higher accuracies of 97.51% and 98.25% with INFAZ and INFAD, respectively. The INFAZ aspect can address the missing rate, whereas the INFAD aspect does not deal with the missing rate. Both aspects were compared with the existing approaches, together with the number of features selected or extracted for classification. The SAML-Triple (INFAZ) approach outperforms the other existing approaches with an accuracy of 97.51%. The proposed approach with 31 components preserves the data's local and global variance associations, making it well-suited for extracting features from highly complex correlated datasets that exhibit linear and non-linear dependencies. Specifically, we addressed the "INFinity" Attack Events records in the feature column of "PA:Z" (Apparent Impedance for Four Relays) by replacing them with "Zero" (INFAZ), which avoids the missing rate and helps maintain the power system's stability and reliability. Meanwhile, the existing approaches [22,23,26,28,29,31] do not select an optimal feature subset with feature importance scores, leading to potential feature selection bias. They lack sufficient performance in discriminating Attack Events from Natural Events and No Events. The rest of the existing approaches [24,25], which use deep learning methods as feature extraction techniques, lack the optimal combination of extracted features due to several hyperparameter factors. Moreover, feature extraction using PCA [27] fails to capture the local structure or relationships within the data, which might result in misclassification between the three classes. Also, the author in [30] undertook manual feature selection, which may not be suitable for the generalizability and scalability of the model to different architectures and may require complex logical calculations. Meanwhile, our SAML-Triple (INFAZ) can be scalable and generalizable to the IEEE 'N' bus system for different architectures.
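The two preprocessing aspects can be contrasted in a few lines. The sketch below assumes that INFAZ replaces infinite values with zero, as described above, and that INFAD discards the records containing infinite values; the latter reading and the sample records are assumptions made for illustration.

```python
import math


def infaz(rows):
    """INFAZ: replace every infinite value with zero and keep all records."""
    return [[0.0 if math.isinf(v) else v for v in row] for row in rows]


def infad(rows):
    """INFAD (assumed meaning): drop every record that contains an infinite value."""
    return [row for row in rows if not any(math.isinf(v) for v in row)]


# Hypothetical records: [apparent impedance, voltage magnitude].
records = [
    [12.5, 131000.0],
    [float("inf"), 129500.0],   # record with an "INFinity" impedance reading
    [9.8, 130200.0],
]

kept_with_zero = infaz(records)
kept_after_drop = infad(records)
print(f"INFAZ keeps {len(kept_with_zero)} records, INFAD keeps {len(kept_after_drop)}")
```

Under INFAZ the record with the infinite reading remains available to the classifier, whereas under the assumed INFAD behavior it never reaches the classifier and therefore cannot be detected.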
The limitation of the ICS Cyber Attack Power System Triple-Class Dataset [34] is that it contains fewer No Events records than records of the other two events. Furthermore, there are fewer Natural Events records than Attack Events records. Due to this imbalance, SMOTE is required to balance the dataset, as depicted in Table 4. Its potential impact is shown in Table 10, which compares the accuracy metric without SMOTE vs. with SMOTE.
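The record counts reported for the first dataset can be reproduced with a short calculation. The sketch assumes that oversampling raises every class to the majority-class count and that the test split size is rounded up, which matches the reported 9278/2320 split; neither detail is stated explicitly in the text.

```python
import math

# Class counts of the first dataset before SMOTE.
before = {"No Events": 173, "Natural Events": 927, "Attack Events": 3866}

target = max(before.values())                   # majority-class size
after = {label: target for label in before}     # balanced counts after oversampling
synthetic = {label: target - count for label, count in before.items()}

total_before = sum(before.values())
total_after = sum(after.values())

n_test = math.ceil(0.20 * total_after)          # assumed rounding rule for the 20% split
n_train = total_after - n_test

print(f"before: {total_before}, after: {total_after}")
print(f"synthetic records added per class: {synthetic}")
print(f"train: {n_train}, test: {n_test}")
```

Most of the No Events and Natural Events records after balancing are synthetic, which is why the accuracy with and without SMOTE is reported separately.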

SAML-Triple: Generalization and Scalability for IEEE 'N' Bus System
For the generalization and scalability of the proposed approach of SAML-Triple, we have considered the IEEE 14- and 57-bus system datasets from J. Sakhnini et al. [13], generated with the MATPOWER library. The dataset is posted publicly as open source at the GitHub link [38]. The author simulated an FDI attack on the IEEE 14- and 57-bus systems with 10,000 training records and 1000 testing records for each bus system. The IEEE 14-bus system has 34 feature columns, whereas the IEEE 57-bus system has 137 feature columns.
Table 17 represents the parameter specifications of the feature extraction technique (NCA) with the specified range to find the optimal 'N' component and maximum iteration 'I' for the IEEE 14- and 57-bus systems [38]. The parameter specification range for the IEEE 14-bus system is 2 to 10 NCA components, with an iteration range of 5 to 20 and a learning rate of 0.001. In contrast, the range for the IEEE 57-bus system is 60 to 90 NCA components, with an iteration range of 5 to 20 and a learning rate of 0.001. The range for each bus system is set below the number of actual features. Table 18 represents the generalization and scalability of the proposed SAML-Triple approach to the IEEE 14- and 57-bus systems with an accuracy metric comparison. For the IEEE 14- and 57-bus datasets [38], the author [13] used the Binary Cuckoo Search (BCS) optimization algorithm as a Heuristic Feature Selection approach to select the optimal feature subset. BCS is susceptible to converging to local optima, especially in complex and irregular fitness landscapes. It may struggle to select the features due to premature convergence and suboptimal solutions. Identifying suitable parameter values for this dataset [38] may require extensive tuning, which may limit its ability to find globally optimal or near-optimal solutions. Our proposed SAML-Triple approach utilizing NCA preserves the data's local and global variance associations, making it well-suited for extracting features from highly complex correlated datasets that exhibit linear and non-linear dependencies. The detailed significance of the proposed approach is explained in Methodology Section 4. Figures 11 and 12 represent the accuracy vs. parameter range graphs for the IEEE 14- and 57-bus systems, respectively. The proposed approach of SAML-Triple utilizing NCA with the (ET + AdB) ML classifier outperforms the existing approach of BCS with Support Vector Machine (SVM). Based on the parameter specification range from Table 18, the proposed approach is fine-tuned to obtain a higher accuracy of 93.94% at two components, fifteen iterations, and a 0.001 learning rate, compared to 90.69% with eleven features for the IEEE 14-bus system. For the IEEE 57-bus system, the proposed approach yields 90.92% accuracy at 90 components and five iterations with a 0.001 learning rate, compared to 88.59% with 94 features.
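The search over the NCA parameter ranges can be organized as an exhaustive grid search. The sketch below uses the ranges quoted above; the step sizes (1 for components, 5 for iterations) are assumptions, and `toy_score` is a hypothetical placeholder for "fit NCA and the classifier with these parameters and return the test accuracy", so its optimum carries no meaning.

```python
from itertools import product


def grid_search(score, component_range, iteration_range):
    """Return (best_score, n_components, max_iter) over the full parameter grid."""
    best = None
    for n_components, max_iter in product(component_range, iteration_range):
        candidate = (score(n_components, max_iter), n_components, max_iter)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


# Ranges from the text; the step sizes are assumed.
ieee14_components = range(2, 11)     # 2 to 10 NCA components
ieee57_components = range(60, 91)    # 60 to 90 NCA components
iterations = range(5, 21, 5)         # 5, 10, 15, 20 iterations


def toy_score(n_components, max_iter):
    """Hypothetical scoring function with an arbitrary peak at (4, 10)."""
    return -abs(n_components - 4) - abs(max_iter - 10) / 10


best_score, best_n, best_iter = grid_search(toy_score, ieee14_components, iterations)
grid_size_14 = len(ieee14_components) * len(iterations)
grid_size_57 = len(ieee57_components) * len(iterations)
print(f"IEEE 14-bus grid: {grid_size_14} settings, IEEE 57-bus grid: {grid_size_57}")
print(f"best placeholder setting: {best_n} components, {best_iter} iterations")
```

Under these assumed step sizes, the 57-bus grid is more than three times larger than the 14-bus grid, so the tuning cost grows with the size of the bus system.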

Overall Summary of the Proposed Work
Table 19 gives the overall summary of the SAML-Triple approach with various performance metrics. In the SAML-Triple approach, comparatively higher accuracy values of 97.51% and 98.25% were obtained with INFAZ and INFAD, respectively. Low FNR (missing rate) values of 2.49% and 1.75% and low FPR (false alarm) values of 1.24% and 0.88% were obtained with INFAZ and INFAD, respectively. The testing time of the IDS was 0.23 ms, which detects an attack within the 8.3 ms between samples, so the system admin is alerted early to activate the prevention system. Hence, the SAML-Triple approach of NCA with the (ET + AdB) ML classifier performs better in the discrimination of cyberattacks in the triple class (No Events/Natural Events/Attack Events). Based on the comparison between the results of the existing approaches and our proposed work of SAML-Triple (INFAZ), it is concluded that SAML-Triple (INFAZ) alone can conduct this triple-class event discrimination with robustness. The robustness of the proposed SAML-Triple approach was tested for generalizability and scalability with the IEEE 14-bus and 57-bus system datasets of the FDI attacks [38]. The proposed approach achieved 93.94% and 90.92% accuracy for the IEEE 14-bus and 57-bus systems, respectively, compared to the existing approach with 90.69% and 88.59% accuracy.
For both the ICS Cyber Attack Power System Triple-Class Dataset and the IEEE 14- and 57-bus system datasets of the FDI attacks, Tables 8 and 18 demonstrate that the NCA parameters of the N-components and iterations have a significant impact on the accuracy performance metrics.

Conclusions and Future Works
In the mission-critical infrastructure of a smart grid, the proposed approach of SAML-Triple (INFAZ) addresses the specific problem of cyberattack discrimination from power system disturbances with a reduced missing rate and decreased response time. This paper proposes a novel mechanism of the statistical approach with Neighborhood Component Analysis as a feature extraction technique, with optimal hyperparameter tuning, combined with the (ET + AdB) ML classifier. In the SAML-Triple approach, three events (No Events, Natural Events, and Attack Events) were discriminated with the highest accuracy of 97.51% and 98.25% by preprocessing with INFAZ and INFAD, respectively, compared to the existing approaches. The overall summary section provides insights into other performance metrics such as FNR, FPR, and response time with better results. Several test cases were executed to test the robustness of the model, which achieved more than 95% accuracy. Thus, the SAML-Triple (INFAZ) approach performs as a robust Anomaly-based IDS with a low missing rate of 2.49%, a lower response time of 0.23 ms, a decreased false alarm rate of 1.24%, and a high detection accuracy of 97.51%. Our proposed novel approach addresses the privacy and access control violations of cyberattacks in the smart grid infrastructure to minimize the processing downtime, large-scale load loss, blackouts, and cascading failures. The robustness of the proposed model was evaluated with the IEEE 14-bus and 57-bus system datasets of FDI attacks for generalization and scalability, and accuracy values of 93.94% and 90.92%, respectively, were achieved.
In future work, the proposed approach will be extended to find the attack-specific location and the type of attack established from the attacker's end in the Multiclass dataset available from the same data source. The proposed approach can also be extended for scalability with other IEEE 'N' bus systems. Since data records are treated statistically with the proposed approach, it can be extended to other cyber-physical systems for anomaly detection. This proposed work can be extended to any WAMS Testbed as a smart grid by including a few more attacks from generator-side faults and insider attacks. The potential of insider attacks has not been investigated much in the context of a smart grid, and future work could focus on defining such attacks and mitigating them.

Figure 1. Process flow diagram of the SAML-Triple approach of Anomaly-based IDS in a smart grid using dataset [34].

4.4. Proposed Algorithm-SAML
Algorithm 1 depicts the pseudocode for data preprocessing and NCA transformation. The input for Algorithm 1 consists of the triple-class datasets for SAML-Triple. The output of Algorithm 1 is used to obtain the fitted N-Component List (N-CompList) and Iteration List (ItrList) for the given datasets, using an 80:20 train-test ratio. Step 1 reads each input dataset into a data frame, iterating separately over the 15 individual datasets. In Step 2, data wrangling is carried out with INFAZ and INFAD. In Step 3, the features are split into the independent columns (X) and the dependent target column (y). Step 4 applies standardization (standard scaler) to bring all the features to a common scale without distorting the differences in the ranges of values. In Step 5, a label encoder encodes the target/marker column as y_label. In Step 6, SMOTE is applied to X′ and y_label to balance the dataset into X′′, y′′. Step 7 uses stratified sampling to perform a train-test split with equal samples from each class. Steps 8 and 9 perform fit and transform on the train data and transform on the test data, respectively, with the ranges specified in Table 3. Finally, Step 10 stores the train and test records, fitted and transformed with the Optimal N-CompList and ItrList, in the pickle format. The pickle format stores the object in a file as a byte stream so that it can be reloaded whenever necessary.
Algorithm 2 depicts the pseudocode for optimal hyperparameter tuning to find the 'N' Component (N) and Iteration (I) of NCA with the best parameters for each of the ML classifiers applied. The pool of ML classifiers (C i) used to evaluate the performance consists of the (ET + AdB) ML classifier, ET, DT, RF, KNN, and XGBoost. The input for Algorithm 2 is the output of Algorithm 1, that is, the N-CompList and ItrList of the stored NCA train and test data in the pickle format. The output includes the optimal hyperparameters for each classifier applied to classify Attack Events from Natural Events and No Events. In Step 1 of SAML-Triple, the stored pickle NCA data (train and test) are loaded and assigned to NCA_X_train, NCA_y_train, NCA_X_test, and NCA_y_test, respectively, for each of the 15 datasets separately. Step 2 finds the Optimal 'N' Component (N) and Iteration (I) of NCA through GridSearchCV (10-fold cross-validation), which exhaustively searches for the best parameters among the specified parameters provided in Table 3, separately for each of the ML classifiers applied. Step 3 finds the best hyperparameters from the trained model for each ML classifier (C i) applied. Step 4 performs the predictions on the test data and computes various performance metrics for each of the ML classifiers (C i) applied. Finally, the test result prediction is used to discriminate the Attack Events from Natural Events and No Events in the triple class.
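As a rough illustration of Steps 4, 5, and 7 of Algorithm 1, the following standard-library sketch standardizes the features, encodes the marker column, and performs a stratified 80:20 split. It is not the authors' implementation: the function names, the toy data, and the string markers are hypothetical, and SMOTE (Step 6), the NCA fit and transform (Steps 8 and 9), and the pickle storage (Step 10) are deliberately left out.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def standardize(rows):
    """Step 4: scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def encode_labels(labels):
    """Step 5: map each distinct marker to an integer code (sorted order)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes


def stratified_split(X, y, test_ratio=0.2, seed=0):
    """Step 7: hold out test_ratio of every class (80:20 by default)."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


# Toy, already balanced data standing in for the post-SMOTE X'', y''.
markers = ["Attack"] * 10 + ["Natural"] * 10 + ["NoEvents"] * 10
features = [[float(i), float(i % 3)] for i in range(30)]

X_std = standardize(features)
y_label, classes = encode_labels(markers)
X_train, X_test, y_train, y_test = stratified_split(X_std, y_label)
```

Because the split is drawn per class, a balanced input yields the same number of test samples for every class, which is the property Step 7 relies on.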

Figure 6 shows the SAML-Triple accuracy (bar) graph of NCA vs. PCA with the (ET + AdB) ML classifier for the two preprocessing aspects, INFAZ and INFAD. Figure 6 implies that NCA_INFAZ (blue bar) achieves higher accuracy than PCA_INFAZ (yellow bar) across the 15 datasets, and the same is true for NCA_INFAD (red bar) vs. PCA_INFAD (green bar). From another perspective, NCA_INFAD (red bar) performs comparatively better than NCA_INFAZ (blue bar); however, INFAD is not the ideal case because it may lead to missed attacks, whereas NCA_INFAZ (blue bar) still provides good accuracy of more than 96% across the 15 datasets.

Figure 7 depicts the SAML-Triple (INFAZ) approach using the PCA representation of three events with the (ET + AdB) ML classifier (e.g., Dataset-1). The three axes, x, y, and z, represent PCA-1, PCA-2, and PCA-3, respectively. The values on these three axes represent the top three components with high variance from the transformed 128 features, which retain the maximum relative information. Figure 7a represents the Attack Events training samples, Figure 7b represents the Natural Events training samples, and Figure 7c represents the No Events training samples. Figure 7d represents the test samples, which might belong to any of the three events and are tedious to discriminate, particularly between the Attack Events and Natural Events.

Figure 9a represents the accuracy (bar) graph comparison of SAML-Triple with two preprocessing aspects, INFAZ and INFAD. Figure 9a implies that the (ET + AdB) ML classifier (blue bar) achieved a higher accuracy of 97.51% and 98.25% for INFAZ and INFAD, respectively, than the rest of the ML classifiers. Figure 9b is an FNR graph (missing rate) comparison of SAML-Triple with two preprocessing aspects. Figure 9b implies that the (ET + AdB) ML classifier (blue bar) achieved the lowest FNR values of 2.49% and 1.75% for INFAZ and INFAD, respectively, compared to the rest of the ML classifiers. Figure 9c is an FPR graph (false alarm) comparison of SAML-Triple with two distinct preprocessing aspects. Figure 9c implies that the (ET + AdB) ML classifier (blue bar) achieved the lowest FPR values of 1.24% and 0.88% for INFAZ and INFAD, respectively, compared to the rest of the ML classifiers.
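The reported rates are consistent with micro-averaging over the three classes, although the source does not state which averaging scheme was used, so this reading is an assumption. Under micro-averaging with K classes, FNR equals 1 − accuracy and FPR equals (1 − accuracy)/(K − 1), which matches the INFAZ figures (100% − 97.51% = 2.49%) and explains why each FPR (1.24%, 0.88%) is about half of the corresponding FNR (2.49%, 1.75%). The sketch below checks these identities on a hypothetical confusion matrix; `micro_rates` and the matrix values are illustrative only.

```python
def micro_rates(cm):
    """Return (accuracy, FNR, FPR) micro-averaged over all classes.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    tp = fn = fp = tn = 0
    for c in range(k):
        tp_c = cm[c][c]
        fn_c = sum(cm[c]) - tp_c                       # class c samples that were missed
        fp_c = sum(cm[r][c] for r in range(k)) - tp_c  # other classes flagged as c
        tn_c = total - tp_c - fn_c - fp_c
        tp, fn, fp, tn = tp + tp_c, fn + fn_c, fp + fp_c, tn + tn_c
    accuracy = sum(cm[c][c] for c in range(k)) / total
    return accuracy, fn / (fn + tp), fp / (fp + tn)


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
cm = [
    [95, 3, 2],
    [4, 94, 2],
    [1, 3, 96],
]
acc, fnr, fpr = micro_rates(cm)
```

For this toy matrix, 15 of 300 samples are misclassified, so the micro-averaged FNR is 15/300 and the micro-averaged FPR is 15/600.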

Figure 9. SAML-Triple: comparison of various ML classifiers with two preprocessing aspects: (a) accuracy graph; (b) false negative rate; (c) false positive rate.

Table 2. SAML-Triple with event scenario split for triple class.

NCA aims to optimize the selection of local neighborhood relationships for maximum classification accuracy, whereas PCA focuses on capturing global variance without optimization.
PCA [27] (Unsupervised Learning) of the statistical approach is a feature extraction technique that projects the matrix into a linear space of lower dimensionality. It transforms a set of correlated variables into a new set of uncorrelated variables called principal components. These principal components are sorted in descending order of variance, capturing the maximum amount of information from the original data in the first few components.
• NCA takes this further by clustering data based on the matrix's dimensionality reduction results with the label.
• Overall, p_i is the probability associated with sample i.
• p_ik is the probability that sample i belongs to class k.
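To make the class probabilities concrete, the sketch below follows the standard NCA formulation, in which each sample picks a neighbour with a softmax over negative squared distances and the probability of class k is the total weight of the neighbours labelled k. This is an assumption about the intended definition, since the source equations are not reproduced here. The points are taken as already projected (the learned linear transform is omitted), and `class_probabilities` and the toy data are hypothetical.

```python
import math


def class_probabilities(points, labels):
    """Return a list of dicts: result[i][k] is the probability that
    sample i is assigned to class k by stochastic neighbour selection."""
    classes = sorted(set(labels))
    table = []
    for i, xi in enumerate(points):
        weights = {k: 0.0 for k in classes}
        for j, xj in enumerate(points):
            if j == i:
                continue  # a sample never selects itself
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            weights[labels[j]] += math.exp(-sq_dist)
        z = sum(weights.values())
        table.append({k: w / z for k, w in weights.items()})
    return table


# Two tight, well-separated toy clusters in the projected space.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]
labels = [0, 0, 1, 1]

p_ik = class_probabilities(points, labels)
# Probability that each sample is assigned to its own (true) class.
p_i = [p_ik[i][labels[i]] for i in range(len(points))]
# NCA searches for the projection that maximizes this sum.
objective = sum(p_i)
```

With well-separated clusters every p_i is close to 1, so the objective approaches the number of samples; a projection that mixes the classes would drive it toward 0.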

Algorithm 2. Pseudocode for optimal hyperparameter tuning to find the 'N' Component and Iteration 'I' of NCA with the best parameters for each classifier (C i) applied
INPUT: N-CompList, ItrList of stored NCA train and test data in pickle format
OUTPUT: Discrimination of Attack Events from Natural Events and No Events in TRIPLE

Table 4 represents the SAML-Triple (INFAZ) SMOTE records (before and after) for the 15 datasets. Each of the 15 datasets is highly imbalanced: Attack Events records make up 64.73% to 78.13% of the distribution, while normal records (Natural Events and No Events) make up 21.87% to 35.27%. The imbalanced original data (without SMOTE) comprise 4405 rows of No Events records, 18,309 rows of Natural Events records, and 55,663 rows of Attack Events records, totaling 78,377 records across the triple-class datasets. With SMOTE (balanced dataset), all three classes are raised to the Attack Events count of 55,663, for a total of 166,989 records. Following the Pareto principle, an 80:20 train-test split is performed for both the before-SMOTE and after-SMOTE cases.
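The record counts above can be checked with a few lines of arithmetic. The sketch below only reproduces the totals quoted in the text; the variable names are illustrative, and the number of synthetic rows per class is derived here rather than reported in the source.

```python
# Record counts quoted in the text (original, imbalanced data).
counts = {"No Events": 4405, "Natural Events": 18309, "Attack Events": 55663}

total_before = sum(counts.values())
majority = max(counts.values())

# SMOTE oversamples each minority class up to the majority class size.
synthetic = {name: majority - n for name, n in counts.items()}
total_after = majority * len(counts)

# Overall share of Attack Events before balancing, in percent.
attack_share = 100 * counts["Attack Events"] / total_before
```

Here `total_before` is 78,377 and `total_after` is 166,989, matching the text, and the overall Attack Events share of roughly 71% falls inside the 64.73% to 78.13% per-dataset range.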

Table 4. SAML-Triple (INFAZ) before and after SMOTE with the train-test split of the 15 datasets.

Table 5. SAML-Triple with preprocessing aspects and train-test split ratio before and after SMOTE records for the first dataset.

Preprocessing Aspects INFAZ INFAD Applying SMOTE on each of the 15 Datasets Separately
(e.g., Dataset

Table 6 represents the terminology used in the specifications of the parameters.

Table 6. The terminology used in the specifications of the parameters.

Table 7 represents the parameter specifications of the feature extraction technique (NCA) with the specified range to find the optimal 'N' Component and maximum Iteration 'I'. The parameter specification range for SAML-Triple lies between 20 and 35 components in the NCA component list, and iterations range from 2 to 10 with a two-step increment. The range is set based on trials and broad studies in related work in Section 2.
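The size of this search space can be enumerated directly. The sketch below assumes that both bounds are inclusive and that the component list advances in steps of one; neither detail is stated explicitly in the text, and the variable names are illustrative.

```python
from itertools import product

# Assumed inclusive ranges: 20..35 components (step 1), 2..10 iterations (step 2).
n_components_list = list(range(20, 36))
iteration_list = list(range(2, 11, 2))

# Every (N, I) pair for which an NCA transform would be fitted.
grid = list(product(n_components_list, iteration_list))

n_datasets = 15
transforms_total = len(grid) * n_datasets
```

Under these assumptions there are 16 × 5 = 80 candidate (N, I) pairs per dataset, or 1200 fitted transforms across the 15 datasets, which is why storing the transformed train and test data in pickle files for reuse is practical.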

Table 8. SAML-Triple: performance metrics comparison of NCA with AdaBoost (ET*) classifiers in two preprocessing aspects.

Table 9 represents the possible test case scenarios. SAML-Triple has seven test cases with the Attack Events label as (0), the Natural Events label as (1), and the No Events label as (2).
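One reading that is consistent with the count of seven is that each test case corresponds to a non-empty combination of the three event labels (2^3 − 1 = 7). This is an assumption, since the contents of Table 9 are not reproduced here; the sketch below simply enumerates those combinations.

```python
from itertools import combinations

# Label encoding stated in the text.
labels = {0: "Attack Events", 1: "Natural Events", 2: "No Events"}

# All non-empty combinations of the three labels, smallest groups first.
test_cases = [
    combo
    for size in range(1, len(labels) + 1)
    for combo in combinations(sorted(labels), size)
]

for number, combo in enumerate(test_cases, start=1):
    names = ", ".join(labels[code] for code in combo)
    print(f"Test case {number}: labels {combo} ({names})")
```

The enumeration gives three single-event cases, three two-event cases, and one case containing all three events.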

Table 11 represents the SAML-Triple accuracy metric comparison of the feature extraction techniques, NCA vs. PCA, with the (ET + AdB) classifier and the two preprocessing aspects, INFAZ and INFAD. The NCA with the (ET + AdB) classifier provides a higher average accuracy of 97.51% and 98.25% for INFAZ and INFAD, respectively, than the PCA with the (ET + AdB) classifier, which reaches 97.25% and 97.69%. The accuracy for the SAML-NCA with (ET + AdB) with INFAZ and INFAD was obtained from Table 8, and a similar approach was carried out for the PCA technique.

Table 11. SAML-Triple: comparison of NCA vs. PCA with two preprocessing aspects.

Table 12. SAML-Triple: classification accuracy of various test cases with two preprocessing aspects.

Table 13. SAML-Triple: performance metrics comparison of various ML classifiers with two preprocessing aspects.

Table 14. SAML-Triple (INFAZ): average response time of various ML classifiers for 120 samples/second system.
Table 15 represents the SAML-Triple (INFAZ) accuracy with average response time and per sample response time comparison across various ML classifiers.

Table 16. Accuracy metric comparison of SAML-Triple with other existing approaches.
* represents the highest accuracy achieved through the proposed approach.

Table 17. Parameter specifications of feature extraction technique (NCA) for IEEE 'N' bus system.

Table 18. Generalization and scalability of proposed SAML-Triple approach to IEEE 'N' bus systems with accuracy metric comparison.
* represents the highest accuracy achieved through the proposed approach.

Table 19. Overall summary of SAML-Triple approach with performance metrics.