Correction: Almalaq et al. Deep Machine Learning Model-Based Cyber-Attacks Detection in Smart Power Systems. Mathematics 2022, 10, 2574.

decrease feature redundancy and learning time while minimizing data information loss. Furthermore, the proposed model investigates the potential of deep

confidentiality, and availability. In addition to transmitting, distributing, monitoring, and controlling electricity, a smart grid (SG) would greatly enhance energy effectiveness and reliability. Such systems may fail and result in temporary damage to infrastructure [5]. Power grids are nowadays regarded as essential infrastructure by many societies, which have developed security measures and policies related to them [6]. Phasor measurement units (PMUs) are adopted in modern electrical systems to improve reliability as these systems become more complex in structure and design. Utilizing the gathered information for quick decision making is one of their advantages. There remains the possibility that a hacker exploits vulnerabilities to cause overloaded-branch tripping, which can lead to cascading failures and, therefore, to considerable damage to SG systems [7]. As operators monitor and manage the energy grid, they must consider possible attacks on the grid. To accomplish this, much energy and grid expertise is required. However, deep machine learning (DML) methods are used because of their capability to recognize patterns and learn, as well as their ability to quickly identify potential security boundaries [8].
4-In the contribution section (Section 1.3), Ref. [17] should be [20]; also, (1), (2), and (3) should be updated to: (1) A new classification model based on the Decision Tree (DT) and the auto-encoder technique has been proposed as a binary classifier to detect attacks, with the aim of increasing the detection accuracy and decreasing the false positive index. (2) A Principal Component Analysis (PCA) applied to the raw data of PMUs as an effective feature selection model reduces feature redundancy and learning time while minimizing the loss of data information. This approach has been shown to be effective in various evaluations, as it significantly improves the performance of models. (3) A new process for handling abnormal data, such as non-numbers and infinity values in data sets, is proposed. This approach could significantly enhance accuracy in comparison to conventional processes for processing abnormal data.
6-The section (Section 2.1, paragraph 2) "This experiment applied a data set that contains 128 features recorded using PMUs 1 to 4 and relay snort alarms and logs (Relay and PMU have been combined). A synchronous phasor, or PMU, measures electric waves on a power network using a common time source. A total of 29 features could be measured by every PMU. The data set also contains 12 columns of log data from the control panel and one column of an actual tag. There are three main categories of scenarios in the multiclass classification data set: No Events, Events, Intrusion, and Natural Events. Table 1 summarizes the scenarios, and a brief explanation of each category is provided in the data set" should be "The experiment applied a data set that contained 128 features recorded using PMUs from 1 to 4 and relay snort alarms and logs, as in Ref. [24]. A PMU measures electric waves on a power network using a common time source. A total of 29 features could be measured by every PMU [24]. The data set also contains twelve columns of log data from the control panel and one column of an actual tag. There are three main categories of scenarios in the multiclass classification data set: No Events, Events, Intrusion, and Natural Events such as storms and tornadoes [25]".
10-Table 1. Explanation of scenarios should be removed.

11-In Section 2.2, Reference "[22]" should be "[26]".

12-In Section 2.2, References "[11,12]" should be "[14,15]".

13-In Section 2.2, "Technically, it is possible to construct novel features using a combination of attributes that could help more effectively utilize possible types of data instances, which could be used in machine learning models for better application. It is worth noting that we made use of the random forest method to create and classify features. Finally, based on anticipation weighted voting, 37 various case studies were implemented for simulation purposes." should be updated to "Technically, it is possible to construct novel features using the PCA as the feature selection model, which could help to more effectively utilize possible types of data instances that could be used in machine learning models for better application".
14-Section 2.3 should be inserted after Section 2.2 as follows: The presence of redundant or irrelevant features could impede the performance of a machine learning classifier, causing slow convergence or complete failure. To address this issue, this paper employs a PCA, also known as the Karhunen-Loeve transform or the eigenvector regression filter [27]. A PCA reduces dimensionality by eliminating the weakest principal components, resulting in a lower-dimensional projection of the raw feature data that preserves maximal data variance. This reduction is achieved through an orthogonal, linear projection operation. It is worth noting that the PCA operation does not result in any loss of generality.
The projected data matrix Y ∈ R^(S×P) contains the P principal components of X, where P is less than or equal to N:

Y = XC, (1)

The key step involves determining the projection matrix C ∈ R^(N×P), which can be accomplished by finding the eigenvectors of X's covariance matrix or by solving a singular value decomposition (SVD) problem for X [28]:

X = UDV^T, (2)

The orthogonal matrices U ∈ R^(S×S) and V ∈ R^(N×N) represent the column and row spaces of X, respectively, while D is a diagonal matrix that contains the singular values λ_n, for n = 0, ..., N − 1, arranged in non-increasing order along its diagonal. It has been demonstrated [28] that the projection matrix C can be derived from the first P columns of V, with P being the desired number of principal components:

C = [v_0, v_1, ..., v_(P−1)], (3)

in which v_n ∈ R^(N×1) defines the n-th right singular vector of X. The singular values in D from (2) indicate the standard deviations of X along the principal directions in the space spanned by the columns of C, and λ_n^2 represents the variance of X's projection along the n-th principal component direction. Variance is often used as a measure of the amount of information contributed by a component to the data representation. To evaluate this, the cumulative explained variance ratio of the principal components is typically examined and expressed as a fraction.
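As a minimal numerical sketch (not the paper's code), the PCA projection described above can be computed via NumPy's SVD: the data are centered, the first P right singular vectors form the projection matrix C, the projected data matrix is Y = XC, and the cumulative explained variance ratio is derived from the squared singular values.

```python
import numpy as np

def pca_svd(X, P):
    """Project the S x N data matrix X onto its first P principal
    components via SVD. Returns the S x P projection Y, the N x P
    projection matrix C, and the cumulative explained variance ratio."""
    # Center the data so the SVD of X relates to the covariance matrix.
    Xc = X - X.mean(axis=0)
    # X = U D V^T; the rows of Vt are the right singular vectors v_n,
    # returned with singular values in non-increasing order.
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    C = Vt[:P].T                 # first P columns of V
    Y = Xc @ C                   # projected data matrix Y = X C
    var = d ** 2                 # proportional to variance per component
    cum_ratio = np.cumsum(var[:P]) / var.sum()
    return Y, C, cum_ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
Y, C, ratio = pca_svd(X, 3)
```

Examining `ratio` against a threshold (e.g., 0.95) is the usual way to choose P, matching the cumulative explained variance criterion mentioned in the text.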
15-The title of Section 2.4 should be "Diagnosing Attack Behavior Model Structure".

16-In Section 2.4, Reference [24] should be added after "electrical grids" in the first paragraph, and one sentence "(In this paper, PCA is used as the feature selection model)" should be added at the end of the second paragraph.

17-

Properties Making
During property making, 16 novel features have been extracted from every PMU measurement feature and incorporated into the original data set in preparation for the next step. Raw data are mainly used for extracting novel features based on the corresponding computations.

Data Processing
It is important to process the data prior to sending it to the machine learning model. Normalization is an important part of data processing; its benefit is that it speeds up and improves the accuracy of the iterations used to find the optimal gradient-descent solution. Among the most common data normalization techniques are z-score standardization and min-max standardization. Basically, min-max standardization works by linearly mapping the original data to the range [0, 1], as shown below:

x* = (x − x_min)/(x_max − x_min), (6)

In addition, z-score standardization, also known as standard deviation standardization, is mostly applied for characterizing deviations from the average. Data processed with this technique follow the standard normal distribution, i.e., the standard deviation and the average are equal to one and zero, respectively. The transformation function is given below, where μ denotes the mean of the data and σ the standard deviation. This study adopts this normalization process:

x* = (x − μ)/σ, (7)

A data set may contain not-a-number (NAN) and infinity (INF) values, which are usually substituted with the mean value or zero. For the data set applied here, a novel replacement process is proposed to avoid underflows in the final replacement value and to keep the data from being overly discrete.
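A minimal sketch of the two normalization techniques, Equations (6) and (7), in plain Python (illustrative only; the function names are ours):

```python
def min_max(xs):
    """Min-max standardization (Equation (6)): linearly map the
    values to the interval [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def z_score(xs):
    """Z-score standardization (Equation (7)): subtract the mean
    and divide by the (population) standard deviation, so the
    result has mean 0 and standard deviation 1."""
    mu = sum(xs) / len(xs)
    sigma = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mu) / sigma for x in xs]

data = [10.0, 12.0, 14.0, 16.0, 18.0]
mm = min_max(data)
zs = z_score(data)
```

In practice the training-set statistics (min/max or μ/σ) would be stored and reused to transform the test set, so that no test information leaks into training.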
The log_mean value is used for replacing the NAN and INF values present in the data. It can be calculated as follows:

18-Section "2.4" should be updated to "2.5", "2.4.1" updated to "2.5.1", and "2.4.2" updated to "2.5.2", and the other numbers reordered accordingly.
19-The sentence "Table 2 shows the name, explanation, and extraction process of the extracted feature" should be removed, and Table 2 as well.

Establish Classifier Layouts
Following a series of tests using various machine learning classifiers, a DT classifier was selected due to its superior performance.The sigmoid layer's fusion activation function is defined by the equation provided below.
In which F_1 defines the sigmoid layer's fusion activation function, y_k denotes the k-th pattern's tag, t_k defines the k-th pattern's prediction, and w_l and w_s define the stable pattern and the unstable pattern, respectively [31].
A for loop was used to test the Autoencoders (AEs) with varying numbers of layers, neurons, batch sizes, loss and activation functions, optimizers, epochs, and dropout layers in order to improve accuracy and the f-measure.Both Stacked Autoencoder (SAE) and Deep Neural Network (DNN) models utilize the Binary Cross-Entropy (BCE) cost function and the Rectified Linear Unit (ReLU) activation function to achieve optimal performance, as represented by the performance metrics.
In which x defines the observation.
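As a minimal illustration (the paper's exact notation is not reproduced in this correction), the Binary Cross-Entropy cost used by the SAE and DNN models takes the conventional form below, averaged over observations:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Conventional binary cross-entropy cost, averaged over the
    observations; predictions are clipped to (eps, 1-eps) to avoid
    taking the logarithm of zero."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

loss = binary_cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
```

The loss shrinks toward zero as the predictions approach the true labels, which is why it pairs well with a final sigmoid layer whose output lies in (0, 1).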

Proposed Machine Learning
An advanced deep learning approach is presented to build a powerful detector for the system. The proposed approach involves building deep base models to learn representative features. To ensure diversity among the base models, multiple deep autoencoders were created, including an SAE, a Denoising Autoencoder (DAE), and linear decoder methods. Each of these models was trained using a unique data set generated through the Bootstrap method. To this end, the characteristics were first selected. Secondly, deep base models were developed to adaptively learn hidden characteristics from the exploited indexes obtained.
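The Bootstrap step described above (each deep base model trained on its own resample) can be sketched as follows; this is an illustrative helper of our own naming, not the paper's code:

```python
import random

def bootstrap_samples(data, n_models, seed=0):
    """Generate one bootstrap resample (sampling with replacement,
    same size as the original data) per deep base model, so each
    SAE / DAE / linear-decoder model trains on a distinct data set."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_models)]

# Three resamples, one per base autoencoder (SAE, DAE, linear decoder).
data = list(range(100))
samples = bootstrap_samples(data, n_models=3)
```

Because each resample draws with replacement, the base models see overlapping but distinct data, which is what gives the ensemble its diversity.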
22-Section 3.1 "Data Set" section should be modified as follows: A multiclass classification data set for ICS cyber-attacks is used in the present study. There are several terms applied in machine learning that require an explanation. A true positive (TP) is a positive sample that the layout predicts to be positive, a false positive (FP) is a negative sample that the layout predicts to be positive, a false negative (FN) is a positive sample that the model predicts to be negative, and a true negative (TN) is a negative sample that the model predicts to be negative. The suggested layout is evaluated using accuracy, precision, recall, and F1 score. The F1 score is basically the harmonic mean of precision and recall; these metrics are calculated according to the following equations:

Accuracy = (TP + TN)/(TP + TN + FP + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), F1 = 2 × Precision × Recall/(Precision + Recall).

23-Remove Table 3.

24-Section 3.2.1 should be updated as follows:
Actually, the main purpose of this research is to show the significant and successful role of deep learning models in reinforcing the smart grid against various cyber-attacks. In this regard, the proposed model would detect and stop cyber-hacking at the installation location rather than focusing on the cyber-attack type. Therefore, the localization procedure would be attained through the diverse detection models located in the smart grid, but cyber-attack type detection requires more data, which can be produced later based on the recorded abnormal data.
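As a minimal sketch, the evaluation metrics defined in the corrected "Data Set" section can be computed directly from the four confusion counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from the confusion
    counts (TP, FP, FN, TN); F1 is the harmonic mean of precision
    and recall."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example confusion counts (illustrative values only).
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

A production version would guard the divisions against zero denominators (e.g., when a classifier predicts no positives at all).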
25-Section 3.2.2 "Outcomes" should be updated as follows: In order to determine the need for various models (fault analysis), we performed some comparative experiments according to various PMU kinds. In one group, properties of localization/segmentation are sent to the related DML model for training, and in the other, all features are sent to various machine learning models. Moreover, it is shown in Table 1 that data can be effectively split according to the PMU resources. Splitting the data can enhance the accuracy of classification models as well as reduce data dimensions, enhance training speed, and minimize computing resources. Several corresponding experiments were conducted on various ways of replacing abnormal values in the data. Table 2 shows the outcomes. The replacement method is shown in the left column, and the suggested approach is represented by log_mean. Zero denotes a process that replaces NAN and INF with zero values, and mean denotes a process that replaces them with the mean value. The proposed model is utilized as a trial model, and accuracy is adopted as the assessment metric, that is, the log_mean column in Table 2.
Applying the log_mean technique for replacing the unusual values in the data is intuitively the best approach. According to the outcomes, the suggested process for handling abnormal values has proven successful.
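The replacement strategies compared above can be sketched as follows. The zero and mean strategies follow the text directly; the log_mean variant is shown here as log(1 + mean), which is only an assumed form, since this correction does not reproduce the exact formula:

```python
import math

def replace_abnormal(values, method="mean"):
    """Replace NAN and INF entries using one of the strategies
    compared in the text: 'zero' substitutes 0, 'mean' substitutes
    the mean of the finite entries, and 'log_mean' (the paper's
    proposed strategy) is sketched here as log(1 + mean) -- an
    assumption, as the exact formula is not reproduced here."""
    finite = [v for v in values if math.isfinite(v)]
    mean = sum(finite) / len(finite)
    fill = {"zero": 0.0, "mean": mean, "log_mean": math.log1p(mean)}[method]
    return [v if math.isfinite(v) else fill for v in values]

col = [1.0, 2.0, float("nan"), 3.0, float("inf")]
```

The intuition from the text is that a logarithm-based fill keeps the replacement value small (avoiding overly discrete data) without collapsing it to zero.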
Table 3 shows the suggested method with PCA in comparison with other feature selection methods. As can be seen, the accuracy rate of the suggested method with PCA is better than that of the other methods.
Comparison experiments were also conducted to verify feature selection. First, the significance of the original features is determined, and afterward, they are arranged based on significance. A variety of feature combinations was selected for training, and Table 4 shows these outcomes. The approach was verified practically through a comparative test. The test extracts the test group and training group from 15 multiclass data sets in a 9:1 ratio at random, and afterward, these data sets are combined into one training group. The training group is transferred to the layout for training and learning. Table 5 presents the outcomes of the 15 test sets transferred to the model for practically simulating the efficiency of the model applications. It is apparent that the model's accuracy has decreased. This is because data interaction occurs as the amount of data increases, changing the model, and whenever all the data are combined, there will unavoidably be abnormal points and noise. Because such noise and anomalies were not separated during training, the model's indexes alter and the robustness decreases. Firstly, the efficacy of the features created through feature construction engineering in the model is determined by sorting the significance of the features. Model interpretability can be determined from the significance of the features. Weights, gain, cover, and so on are general indicators of feature significance [25].
The test trains and tests 15 sets of multiclass classification data sets, respectively, and uses accuracy as the assessment metric [24]. The accuracy of the trial data sent to the layout before and after optimization based on the main 128 properties is shown in Figures 3-5. The classification accuracy of the trial group on various layouts with default variables is shown in Figure 3, and the accuracy of the trial group on the layouts applying optimized variables is represented in Figure 4. For a more intuitive visualization of the variation in accuracy after the layouts are optimized, Figures 3 and 4 are combined, and the mean of the accuracy values for all sets is adopted, i.e., Figure 5. Figures 3-5 show that the SVM layout with default variables has an accuracy of approximately 0.30, but after optimization, it grows to 0.85, which represents a nearly 200% improvement. Other models have also improved significantly in accuracy after optimization. The best accuracy of the proposed model is 0.925. The test set had better performance on the model suggested in this study in comparison with the conventional DML and CNN, as shown in Figures 3-5.
A true decision is obtained when the detection layout produces the correct result. Conversely, a false decision indicates a false response from the cyber-attack detection layout and can lead to decreased reliability. Therefore, it is important to develop an anomaly detection layout with low false rates. Four criteria, namely the Correct Reject Rate (CR), Miss Rate (MR), False Alarm Rate (FR), and Hit Rate (HR), can be used to assess the effectiveness of the layout. To better understand these criteria, a confusion matrix is provided in Table 6. In order to evaluate the performance of the suggested detection mechanism for detecting cyber-attacks and anomalies in smart grids, the cyber-attack models are applied. The evaluation outcomes were recorded and are presented in Tables 7 and 8.
From the tables, it can be observed that the proposed detection mechanism is highly effective in detecting cyber-attacks, with a detection accuracy rate of over 97%. This indicates that the suggested detection method is capable of accurately detecting FDI attacks and can be considered an efficient solution to the problem. The evaluation results demonstrate the effectiveness of the suggested detection mechanism for detecting cyber-attacks in smart grids and highlight the potential of deep machine learning methods with PCA and DT for addressing challenges in the field of cyber security.

26-The conclusion section should be updated as follows:

Conclusions
This study proposes a new deep model and feature selection approach for identifying faults and cyber-attacks in electrical systems using various smart grid information and data analyses. Different DML assessment indexes with PCA and DT were used to evaluate the suggested model and conventional DML methods. The results showed that the information analysis process improves the model's accuracy and that the proposed layout detects various types of behavior in smart grids efficiently. Machine learning with PCA and DT can be used in the power grid to assist operators in making decisions, such as detecting abnormalities in data gathering and estimating the system status if the data readings from any PMU are unusual. According to the results, the proposed method can accurately and efficiently detect cyber-attacks in smart grids. Furthermore, the study concluded that the proposed model demonstrates good performance in detecting destructive attacks of different intensities. The outcomes of two different metrics, namely the detection rate and the confusion matrix, support the precision and reliability of the proposed anomaly detection approach.
27-The citations should be reordered and updated as follows:

Figure 2 .
Figure 2. Overview of layout to detect cyber-attacks in smart grids.


Table 1 .
Transfer diverse characteristics to the layout for comparison.

Table 2 .
Diverse methods to process INF and NAN.

Table 3 .
Accuracy rate of various feature selection methods.

Table 4 .
Assessment of characteristics chosen.

Table 5 .
Layout accuracy on 15 trial sets in the actual simulation.

Table 6 .
Confusion matrix of proposed scheme.

Table 7 .
The proposed detection scheme.


Table 8 .
Confusion matrix of the proposed detection scheme.