A Smart Grid AMI Intrusion Detection Strategy Based on Extreme Learning Machine

Abstract: The smart grid is vulnerable to network attacks, so its intrusion detection systems require a high detection rate and a fast detection speed. With its fast training speed and strong generalization ability, the extreme learning machine (ELM) is well suited to intrusion detection in the smart grid. In this paper, the ELM is applied to the field of smart grid intrusion detection. Because the randomness of the input weights and hidden layer bias in the ELM cannot guarantee the optimal performance of the ELM intrusion detection model, a GA-ELM algorithm based on a genetic algorithm (GA) is proposed, in which the GA is used to optimize the input weights and hidden layer bias of the ELM. Firstly, the input weights and hidden layer bias of the ELM are mapped to the chromosome vector of the GA, and the test error of the ELM model is set as the fitness function of the GA. Then, the parameters of the ELM intrusion detection model are optimized by genetic operations; the input weights and bias corresponding to the minimum test error are selected to improve the performance of the ELM model. Compared with the ELM and the online sequential extreme learning machine (OS-ELM), the GA-ELM effectively improves the accuracy, detection rate and precision of intrusion detection and reduces the false positive rate and false negative rate.


Introduction
With the passage of time, the smart grid has made great progress at both the technical and practical levels, and the ensuing smart grid security issues have attracted more and more attention. If the smart grid is attacked, people's lives will be seriously affected. For example, in 2009, the smart meter system of a US power grid company was attacked by hackers, resulting in enormous economic losses [1]. In 2010, Stuxnet targeted a number of energy facilities, hitting the industry hard [2]. In 2015, a power system in Ukraine was targeted by a Denial of Service (DoS) network attack, resulting in a large-scale blackout in the region [3]. Smart grids clearly bring new challenges to society as well as convenience; hence, targeted research must be carried out. Accordingly, the United States, Europe, China and many other countries have successively carried out research on smart grid security and explored the field of smart grid intrusion detection. Because signature-based methods cannot detect some new attacks against AMI and require frequent updates, research mainly focuses on anomaly-based and specification-based intrusion detection.
Anomaly-based IDS uses statistical measures to identify deviations from predefined normal behavior. In recent years, IDS based on machine learning (ML) and data mining technology has been a research focus in network security. A similar approach was adopted when developing IDS for AMI.
Zhang and others described a distributed IDS for AMI and Supervisory Control and Data Acquisition (SCADA) systems [23]. The system relies on anomaly-based sensors deployed in the HAN (home area network), NAN (neighborhood area network), WAN (wide area network) and SCADA environments. The IDS sensors collect security-related information from the communication streams, and two ML algorithms (support vector machine (SVM) and clonal selection algorithm (CLONALG)/Airs2Parallel) are used to process the data to identify malicious behavior. These algorithms need to be well trained to perform well; however, attack samples in AMI are rare. In addition, traditional ML algorithms are difficult to implement in embedded systems. Krishna and others proposed an anomaly detection method that combines the PCA (principal component analysis) feature selection algorithm and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm to verify the integrity of smart meter measurements [24]. They used an open smart meter database obtained from an actual deployment to simulate normal electricity consumption behavior. Mohammadi and others proposed an anomaly- and signature-based intrusion detection method for the neighborhood area network in AMI [25]. This method considers various attacks against the physical layer, media access control (MAC) layer, transport layer and network layer, as well as the specific needs of the NAN. Mustafa et al. [26] first introduced data mining technology to the IDS of AMI and studied various data stream mining algorithms that performed well on the KDDCUP99 dataset; they later expanded this work [27], considering more data stream mining algorithms and analyzing the feasibility of different IDS on three different components of the AMI (smart meter, data concentrator and AMI data transponder). Aiming at the infrastructure of the AMI, Fadwa et al. proposed a real-time distributed intrusion detection system (DIDS) [28] using a multi-layer implementation and data stream mining technology; an unsupervised online clustering technique called Mini-Batch K-means monitors the data flow in the AMI to determine whether there is abnormal traffic.
The specification-based IDS models the ideal behavior of the system through its functions and security policies. Any sequence of operations performed outside the specification is considered a violation and a suspected intrusion. Specification-based IDS can be seen as a "stricter" anomaly-based IDS.
For network intrusion detection, Berthier and others used a specification-based method to monitor the ANSI C12.22 protocol through dedicated sensors deployed in the NAN [29]. The uniqueness of this solution is that it uses a formal method to prove that the specification-based checker provides sufficient coverage of the AMI security policy. Jokar and others proposed a hierarchical specification-based IDS for the HAN; built on the IEEE 802.15.4 standard, it can detect abnormal behavior at the physical layer and MAC layer [30]. Ali and others proposed a mutation-based IDS that made the attacker's behavior unpredictable and, at the same time, ensured the certainty of the system [31]. They used the event logs collected from the intelligent collector to model AMI behavior and then used the invariant specifications generated from the AMI behavior and variable configurations to verify these logs. In addition, they further extended their work [32,33], using a fourth-order Markov chain to model the event log and construct a specification written in linear temporal logic (LTL), and proposed a method of configuring randomization modules to resist and circumvent imitation attacks. Ruanand and others proposed a hybrid IDS framework that combined specification-based and signature-based techniques to handle known and unknown attacks [34]. The framework can also be deployed on devices with limited resources in the AMI. In [35], Gao and others proposed a scheme named IELM (the incremental extreme learning machine); in this scheme, security-related features are selected from the network traffic, and a better detection accuracy is obtained on the NSL-KDD standard dataset [36].
For host intrusion detection, Tabizi and others proposed a model-based smart meter IDS construction technology [37] that uses LTL to represent the temporal relationships between monitored entities and is suitable for smart meters with low processing capacity, limited memory and other constraints. They implemented the IDS on an open-source smart meter platform, SegMeter, and verified its effectiveness. Liu and others used colored Petri nets to describe the information flow between the units in the smart meter and proposed a threat model for the smart meter [38]. For devices with low processing power and limited memory capacity, a collaborative intrusion detection mechanism against false data injection attacks was proposed.
In summary, there is a large body of literature on intrusion detection for smart grids, and scholars have explored detection based on host information and on network information, as well as anomaly-based, specification-based and signature-based detection.
Under these different classification methods, the studies have achieved some success, but none of them completely solves the problems of IDS in smart grids. New research methods are constantly emerging, especially with today's development of artificial intelligence technology; intrusion detection methods combining machine learning and deep learning [39][40][41] are constantly being proposed. However, a mature intrusion detection scheme has not yet been formed, and there is no benchmark for evaluation.
In addition, how to solve the security problem of AMI in smart grids is also a research focus. Therefore, the research in this paper on an AMI intrusion detection strategy for smart grid security that uses the artificial intelligence method ELM is of theoretical and practical significance.

Research Foundation
In 2004, Professor Guang-bin Huang proposed the ELM, an effective single hidden layer feed-forward neural network with a high generalization ability that requires as little manual intervention as possible; while guaranteeing a certain learning accuracy, it greatly increases the speed of the algorithm [42]. Compared with traditional neural networks, it saves a lot of time and cost. This is because the algorithm does not need to repeatedly optimize the input weights from the input layer to the hidden layer or the offsets of the hidden layer neurons during training; instead, the input weights and hidden layer offsets are obtained through random initialization. During training, only the number of hidden layer neurons needs to be determined; the output weights of the model can then be obtained directly to complete the model training.
Suppose a single hidden layer feed-forward neural network is provided with N training samples $(x_i, t_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ is the sample input vector and $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$ is the target output vector; n is the number of features of an input sample, and m is the total number of classes of the training samples. If this single hidden layer neural network has L hidden nodes, then the network output can be expressed as:

$$o_j = \sum_{i=1}^{L} \beta_i \, g(\omega_i \cdot x_j + b_i), \quad j = 1, \ldots, N \tag{1}$$

where $g(\cdot)$ is the activation function, $\omega_i = [\omega_{i1}, \omega_{i2}, \ldots, \omega_{in}]^T$ is the input weight vector between the input layer and the ith hidden node, $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T$ is the output weight vector between the ith hidden node and the output layer, $b_i$ is the offset of the ith hidden node, $\omega_i \cdot x_j$ is the inner product of the input weight vector and the training sample and $o_j$ is the actual output of the network model. The single hidden layer feed-forward neural network model diagram is shown in Figure 1.

The goal of the single hidden layer feed-forward neural network is to minimize the error of the output result, which is:

$$\sum_{j=1}^{N} \left\| o_j - t_j \right\| = 0 \tag{2}$$

where $t_j$ is the expected output. According to Formulas (1) and (2), $\beta_i$, $\omega_i$ and $b_i$ exist to make the following formula true:

$$\sum_{i=1}^{L} \beta_i \, g(\omega_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N$$

In matrix form, the expression can be simplified to:

$$H\beta = T$$

In the formula, H is the output matrix of the hidden layer, β is the output weight matrix from the hidden layer to the output layer and T is the expected output matrix. H, β and T are expressed as follows:

$$H = \begin{bmatrix} g(\omega_1 \cdot x_1 + b_1) & \cdots & g(\omega_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\omega_1 \cdot x_N + b_1) & \cdots & g(\omega_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}$$

In most cases, $H\beta = T$ cannot be satisfied exactly. In order to train the model, a set of parameters $\hat{\omega}_i$, $\hat{\beta}$ and $\hat{b}_i$ is sought so that the following equation holds:

$$\left\| H(\hat{\omega}_i, \hat{b}_i)\hat{\beta} - T \right\| = \min_{\omega_i, b_i, \beta} \left\| H(\omega_i, b_i)\beta - T \right\|$$

When dealing with such problems, the traditional neural network algorithm continuously optimizes the parameters during the iteration process, resulting in a longer model training time.
In the model training of the extreme learning machine, the input weights and the offsets of the hidden layer are randomly initialized. The hidden layer output matrix H is thereby determined, so the model becomes a linear system, $H\beta = T$, whose output weights β can be obtained by least squares. The formula is as follows:

$$\hat{\beta} = H^{+} T$$

In the formula, $H^{+}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix H.
In summary, the learning process of the extreme learning machine is as follows:
Input: The training set $(x_i, t_i)$, i = 1, . . . , N, the number of hidden layer nodes L and the activation function g(x).
Output: The output weight β from the hidden layer to the output layer.
(1) Randomly initialize the input weights $\omega_i$ and the offsets of the hidden layer $b_i$.
(2) Calculate the output matrix of the hidden layer H.
(3) Calculate the output weight from the hidden layer to the output layer β.
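The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `elm_train` and `elm_predict`, the uniform initialization range [-1, 1] and the toy two-class data are assumptions made only for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Train an ELM. X: (N, n) inputs, T: (N, m) one-hot targets."""
    n_features = X.shape[1]
    # Step (1): random input weights and hidden offsets (range is an assumption).
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step (2): hidden layer output matrix H, shape (N, L).
    H = sigmoid(X @ W + b)
    # Step (3): output weights by least squares, beta = pinv(H) @ T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Network output o_j for every row of X."""
    return sigmoid(X @ W + b) @ beta

# Toy data (hypothetical): two well-separated Gaussian blobs with 4 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 4)), rng.normal(2, 1, (100, 4))])
labels = np.repeat([0, 1], 100)
T = np.eye(2)[labels]                      # one-hot target matrix

W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
train_accuracy = (pred == labels).mean()
print(f"training accuracy: {train_accuracy:.3f}")
```

Note that no iterative optimization appears anywhere: once W and b are drawn, training reduces to a single pseudo-inverse, which is the source of the ELM's training speed.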
The genetic algorithm is a global optimization algorithm for finding the optimal solution [43]. It simulates Darwin's theory of biological evolution and the genetic mechanisms of nature. In the genetic algorithm, the search space of the problem is mapped to a genetic space, and each possible solution in the genetic space is encoded as a vector, which corresponds to a chromosome in biological evolution; each element of the vector corresponds to a gene of the chromosome. Through duplication, crossover, mutation and other operations, the chromosomes evolve continuously and are then decoded to find the optimal solution to the problem. The genetic algorithm is highly parallel and self-adaptive, its principle and operation are simple and it has been widely used in many optimization fields. The algorithm steps are as follows:
(1) Population initialization
Initializing the population includes determining the population size M, the maximum number of generations N, the crossover probability $P_x$ and the mutation probability $P_m$; the initial population Y(0) is composed of M randomly generated chromosomes.
(2) Individual fitness calculation Let the t generation population be Y(t), and further, set f (y) as the population, where y ∈ M, M = y 1 , . . . , y m , y i = {X i , . . . , X m } and m represent the number of genes in each chromosome; then, the calculation formula of f (y) is as follows: Let fit(y) be the fitness function, where y ∈ y 1 , . . . , y M if it is an optimized neural network algorithm; then, the calculation formula of the fitness function of each chromosome is as follows: In (11), O j is the predicted value output of the j chromosome, and T j is the actual value output of the model; n represents the total number of input data.

(3) Individual evolution
Individual evolution imitates the biological evolution principle of nature during model training; this process includes the following three steps:
• Selection operation: Based on the fitness of the individuals in the population, L parent pairs are randomly selected from the generation-t population Y(t).
• Crossover operation: Crossover is the key operation in the GA and determines its global convergence. The key criterion is that the offspring chromosomes should inherit the excellent characteristics of the parent chromosomes and, at the same time, remain feasible themselves. L/2 chromosomes of the parents are randomly selected; when the probability drawn for the chromosomes is less than the crossover probability $P_x$, crossover is performed at one or more randomly chosen points to obtain two offspring chromosomes, and L intermediate individuals are finally obtained by crossover.
• Mutation operation: After the L intermediate individuals are obtained through crossover, mutation is performed according to the mutation probability $P_m$ by changing alleles, which finally yields L candidate individuals.
(4) Child selection
Select M chromosomes from the L candidate individuals according to fitness and form the new generation population Y(t + 1) by the fitness ratio method. If the probability of a certain individual being selected is $P_x$, then $P_x$ is calculated as:

$$P_x = \frac{\mathrm{fit}(y_i)}{\sum_{j=1}^{L} \mathrm{fit}(y_j)}$$

In the formula, $\mathrm{fit}(y_i)$ is the fitness value of the ith candidate, $\sum_{j} \mathrm{fit}(y_j)$ is the sum of the fitness values of all candidates in the population and $y_i$ represents the ith candidate in the population.
The specific selection steps are as follows:
• Calculate the fitness value of each candidate individual.
When the genetic algorithm reaches the maximum number of generations N, the optimization ends. At this time, the individual with the highest fitness value is selected as the global optimal solution of this model optimization.
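The genetic operators described in steps (3) and (4) can be illustrated with a short NumPy sketch. The function names, the single crossover point and the uniform re-sampling used for mutation are assumptions for illustration; the text only specifies fitness-ratio selection, crossover at one or more random points with probability P_x and allele changes with probability P_m. Fitness-ratio selection favors larger fitness values, so an error to be minimized would first have to be mapped to a positive score.

```python
import numpy as np

def selection_probabilities(fitness):
    """Fitness-ratio rule: P(i) = fit(y_i) / sum_j fit(y_j). Needs positive fitness."""
    fitness = np.asarray(fitness, dtype=float)
    return fitness / fitness.sum()

def roulette_select(fitness, n_select, rng):
    """Draw n_select chromosome indices with probability proportional to fitness."""
    return rng.choice(len(fitness), size=n_select, p=selection_probabilities(fitness))

def single_point_crossover(parent_a, parent_b, p_x, rng):
    """With probability p_x, swap the tails of two parents at a random cut point."""
    if rng.random() < p_x:
        point = rng.integers(1, len(parent_a))   # cut point in 1 .. len-1
        child_a = np.concatenate([parent_a[:point], parent_b[point:]])
        child_b = np.concatenate([parent_b[:point], parent_a[point:]])
        return child_a, child_b
    return parent_a.copy(), parent_b.copy()

def mutate(chrom, p_m, rng, low=-1.0, high=1.0):
    """Replace each gene independently with probability p_m (range is an assumption)."""
    mask = rng.random(chrom.shape) < p_m
    out = chrom.copy()
    out[mask] = rng.uniform(low, high, size=mask.sum())
    return out

rng = np.random.default_rng(0)
fitness = np.array([1.0, 2.0, 3.0, 4.0])
probs = selection_probabilities(fitness)         # 0.1, 0.2, 0.3, 0.4
picked = roulette_select(fitness, 100, rng)

a, b = np.zeros(6), np.ones(6)
child_a, child_b = single_point_crossover(a, b, p_x=1.0, rng=rng)
unchanged = mutate(a, p_m=0.0, rng=rng)
print(probs, child_a, child_b)
```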

Data Selection
In a smart grid, the advanced metering infrastructure (AMI) is connected to the internet and is therefore susceptible to network intrusion. The AMI can be divided into a three-layer network model: the first-level network is composed of smart meters and controllable appliances and is susceptible to DoS and probing attacks; the second-level network is the network between the smart meters and the concentrator and is susceptible to user-to-root (U2R) attacks; the third-level network is the network between the concentrator and the data processing center and is susceptible to remote-to-local (R2L) attacks [4]. In response to these four types of network attacks, this paper selects the KDDCUP99 dataset [44], which contains all four attack types, for the verification experiments.
KDDCUP99 is composed of labeled training data and unlabeled test data. The training set consists of five million connection records, and the test set consists of two million records. Each record has 41 feature attributes and a class identifier. The feature attributes include the basic features of the transmission control protocol (TCP) connection (nine in total), the content features of the TCP connection (13 in total), time-based network traffic statistical features (nine in total) and host-based network traffic statistical features (10 in total). The class identifier is either normal or abnormal. The abnormal records are divided into four attack types, which are further subdivided into 39 attacks. The training set contains 22 attacks, and the test set contains the remaining 17 attacks.
The four types of abnormalities are DoS, R2L, U2R and probing. In addition, KDDCUP99 provides a 10% training subset that contains most categories of the full training set, and this subset is used as our experimental data. The 10% subset is still relatively large, with 494,021 records, so further sampling is required. In this experiment, 10,000 records were drawn as the training set, and 10,000 records were drawn as the test set. In the 10% KDDCUP training subset, probing, U2R and R2L have fewer records. To enhance the model's detection of these three types, this paper uses all the probing, U2R and R2L records, with half of each type used as training data and the other half as test data. For the normal and DoS data, following [45], sampling is completed at a ratio of 1:3. The training and test datasets of this paper are shown in Table 1.

Technical Evaluation Indicators
Six indicators are mainly used to evaluate the smart grid intrusion detection algorithm, namely, the test error, correct rate, false negative rate, detection rate, false alarm rate and detection accuracy [45,46]. In the experiment, true positive (TP) refers to the number of normal records that are correctly identified as normal, false positive (FP) refers to the number of abnormal records that are mistakenly identified as normal, true negative (TN) refers to the number of abnormal records that are correctly identified as abnormal and false negative (FN) refers to the number of normal records that are mistakenly identified as abnormal. The test error refers to the difference between the actual output type and the expected output type, where $TO_j$ is the jth expected output value and $TR_j$ is the jth actual output value. The correct rate refers to the proportion of all records that are correctly detected. The false negative rate refers to the proportion of abnormal records that are mistakenly recognized as normal among all abnormal records, the detection rate refers to the proportion of correctly detected types (T) in the total number of types (S), the false alarm rate refers to the proportion of normal records that are mistakenly recognized as abnormal among all normal records and the detection accuracy refers to the proportion of correctly identified normal records among all records identified as normal. The indicators are defined as follows:

$$\text{Correct rate} = \frac{TP + TN}{TP + FP + FN + TN}$$

$$\text{False negative rate} = \frac{FP}{FP + TN}$$

$$\text{Detection rate} = \frac{T}{S}$$

$$\text{False alarm rate} = \frac{FN}{FN + TP}$$

$$\text{Detection accuracy} = \frac{TP}{TP + FP}$$
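As a worked example of these definitions, the sketch below evaluates four of the indicators from a set of counts. It follows the convention used in this section, in which the "positive" class is normal traffic; the function name `ids_metrics` and the example counts are hypothetical and are not taken from the experiments.

```python
def ids_metrics(tp, fp, tn, fn):
    """Indicators as defined above, with 'positive' meaning normal traffic.

    tp: normal records identified as normal
    fp: abnormal records mistakenly identified as normal
    tn: abnormal records identified as abnormal
    fn: normal records mistakenly identified as abnormal
    """
    return {
        "correct_rate": (tp + tn) / (tp + fp + fn + tn),
        "false_negative_rate": fp / (fp + tn),   # missed attacks / all attacks
        "false_alarm_rate": fn / (fn + tp),      # false alarms / all normal
        "detection_accuracy": tp / (tp + fp),
    }

def detection_rate(correct, total):
    """Detection rate T / S for one class or for the whole test set."""
    return correct / total

# Hypothetical counts: 100 normal and 100 abnormal records.
m = ids_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts, the correct rate is 185/200 = 0.925, the false negative rate is 5/100 = 0.05, the false alarm rate is 10/100 = 0.1 and the detection accuracy is 90/95.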

ELM Intrusion Detection Algorithm Based on Optimization
For the traditional ELM, the input weights from the input layer to the hidden layer and the offsets of the hidden layer neurons are assigned by random initialization, and the output weights are obtained by the least squares method. During training, only the number of hidden layer neurons needs to be determined; the output weights of the model can then be obtained directly to complete the training, which greatly improves the training speed of the model. Smart grid intrusion detection requires fast training and detection, so the extreme learning machine has a unique advantage. However, because the input weights and hidden layer biases are random and are not refined by gradient descent or a similar iterative procedure, the performance of the ELM intrusion detection model cannot be guaranteed to be optimal. The network security of the smart grid is of great importance: training and detection must be fast so that the system can respond to the security situation of the smart grid in a timely manner, and the response must be correct to avoid wasting human and financial resources, so the intrusion detection performance of the model is crucial. The genetic algorithm is a classic optimization algorithm based on the natural selection mechanism and the principles of genetics; it can optimize a model through operations such as selection, crossover and mutation. To retain the rapid training of the extreme learning machine while further improving its classification accuracy on smart grid big data, we propose a genetic algorithm-based ELM intrusion detection algorithm, GA-ELM. The genetic algorithm is used to optimize the input weights and hidden layer biases of the extreme learning machine in order to obtain the optimal ELM intrusion detection model.
The steps of the intrusion detection algorithm based on GA for the ELM are as follows: (1) Population initialization Let the size of the population be S; that is, there are S chromosomes, where each chromosome x i includes the A·B input weights from the input layer to the hidden layer and the B offsets of the hidden layer. The initial population is the first-generation population of this genetic algorithm, as follows: In the formula, a ij is the input weight from the input layer to the hidden layer, and b ik is the offset of the hidden layer neurons, where i = 1, 2, . . . , m, j = 1, 2, . . . , A and k = 1, 2, . . . , B; the weights and offsets of the first-generation population are obtained by random initialization.
(2) Fitness setting and calculation In the GA-ELM, the genetic algorithm optimizes the input weight of the extreme learning machine ω i and the offset of the hidden layer neurons b i . Each chromosome x i of the population is composed of ω i and b i , and the expression of x i is: According to the theory of the extreme learning machine, the least squares method can be used to obtain the output weight matrix from the hidden layer to the output layer: Setting O j as the actual output value of the j chromosome and target output value T j , the fitness function of the intrusion detection algorithm of the ELM based on genetic learning is defined as follows:

(3) Parameter optimization
Calculate the fitness of each chromosome in the population, and apply selection, crossover and mutation to the population according to the fitness to produce the new generation. When the number of generations reaches the maximum, the GA optimization of the input weights and hidden layer offsets of the ELM ends, and the optimal parameters are selected as the input weights from the input layer of the extreme learning machine to the hidden layer and the offsets of the hidden layer neurons.
According to the above steps, the flow chart of the intrusion detection algorithm based on the genetic algorithm for the extreme learning machine is shown in Figure 2.
Energies 2020, 13, x FOR PEER REVIEW 11 of 19

After receiving the data from the smart grid communication network, the genetic algorithm-based extreme learning machine intrusion detection system initializes the extreme learning machine intrusion detection model. Then, the genetic algorithm is used to optimize the input weights and hidden layer neuron offsets of the ELM to obtain the optimal intrusion detection model. Finally, the trained model detects intrusions and produces the corresponding indicators, determines whether to trigger the alarm system and reports the detection results to the data processing center. This algorithm not only meets the smart grid's requirements for model training speed but also further improves the detection accuracy through the genetic algorithm.
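The overall GA-ELM procedure can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the text: the chromosome is the flattened input weight matrix followed by the offsets, the error rate on a held-out evaluation set stands in for the test error, the score 1/(1 + error) is used so that fitness-ratio selection favors low error, crossover uses a single point and the best chromosome seen so far is recorded. The function names and toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(chrom, n_in, n_hidden):
    """Split a flat chromosome into input weights (n_in x L) and offsets (L,)."""
    W = chrom[: n_in * n_hidden].reshape(n_in, n_hidden)
    return W, chrom[n_in * n_hidden:]

def elm_error(chrom, X_tr, T_tr, X_ev, y_ev, n_hidden):
    """Solve beta by least squares for this chromosome; return the evaluation error rate."""
    W, b = decode(chrom, X_tr.shape[1], n_hidden)
    beta = np.linalg.pinv(sigmoid(X_tr @ W + b)) @ T_tr
    pred = (sigmoid(X_ev @ W + b) @ beta).argmax(axis=1)
    return float((pred != y_ev).mean())

def ga_elm(X_tr, T_tr, X_ev, y_ev, n_hidden, pop_size=20, n_gen=100,
           p_x=0.7, p_m=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n_genes = X_tr.shape[1] * n_hidden + n_hidden
    pop = rng.uniform(-1, 1, size=(pop_size, n_genes))   # generation 0: plain ELMs
    best_chrom, best_err, history = None, np.inf, []
    for gen in range(n_gen + 1):
        errs = np.array([elm_error(c, X_tr, T_tr, X_ev, y_ev, n_hidden) for c in pop])
        if errs.min() < best_err:                        # remember the best so far
            best_err, best_chrom = float(errs.min()), pop[errs.argmin()].copy()
        history.append(best_err)
        if gen == n_gen:
            break
        fit = 1.0 / (1.0 + errs)                         # assumed error-to-fitness map
        parents = pop[rng.choice(pop_size, size=pop_size, p=fit / fit.sum())]
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):              # single-point crossover
            if rng.random() < p_x:
                pt = rng.integers(1, n_genes)
                children[i, pt:] = parents[i + 1, pt:]
                children[i + 1, pt:] = parents[i, pt:]
        mask = rng.random(children.shape) < p_m          # mutation
        children[mask] = rng.uniform(-1, 1, size=mask.sum())
        pop = children
    return best_chrom, best_err, history

# Toy data (hypothetical): two overlapping Gaussian blobs with 3 features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (60, 3)), rng.normal(1, 1, (60, 3))])
y = np.repeat([0, 1], 60)
idx = rng.permutation(120)
X, y = X[idx], y[idx]
X_tr, y_tr, X_ev, y_ev = X[:80], y[:80], X[80:], y[80:]
T_tr = np.eye(2)[y_tr]

best_chrom, best_err, history = ga_elm(X_tr, T_tr, X_ev, y_ev,
                                       n_hidden=8, pop_size=10, n_gen=5)
print("best error per generation:", history)
```

The defaults mirror the population size, number of generations and probabilities reported in the experiment section; the toy call uses much smaller values only to keep the example fast. Every fitness evaluation costs one pseudo-inverse, so the GA multiplies the ELM training cost by roughly the population size times the number of generations.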

Experiment
This experiment is conducted on the KDDCUP99 dataset. To verify the effectiveness and superiority of the GA-ELM, it is compared with the intrusion detection algorithms based on the extreme learning machine and the online sequential (OS)-ELM [47].

Intrusion Detection Technology Based on the GA-ELM
Following the steps of the GA-ELM intrusion detection algorithm, we first set the experimental parameters of the GA-ELM algorithm. By drawing on experience and testing multiple settings, we selected a relatively good set of parameters: the population size is 20, the maximum number of generations is 100, the mutation probability is 0.01 and the crossover probability is 0.7. After many experiments, the GA-ELM intrusion detection model was found to perform better when the number of hidden layer neurons is 480, and the Sigmoid function is used as the activation function of the hidden layer neurons. The evolution process of intrusion detection based on the GA-ELM is shown in Figure 3. The figure shows that the GA-ELM algorithm performed 100 iterations; the 0th iteration corresponds to the original ELM. The test error gradually decreases as the number of iterations increases, indicating that the GA-ELM algorithm is effective and that the input weights of the original ELM and the offsets of the hidden layer neurons are effectively optimized. The confusion matrix is shown in Figure 4. For ease of comparison, the confusion matrix of the ELM-based intrusion detection, also with 480 hidden neurons, is given in Figure 5. The confusion matrices show the detailed detection results of the GA-ELM algorithm for normal, DoS, probing, R2L and U2R, from which we can initially see that the detection rates of the GA-ELM for normal, DoS, probing, R2L and U2R are higher than those of the ELM. For the small-sample intrusions R2L and U2R, the GA-ELM increased the detection rate of R2L by 12.8% and the detection rate of U2R by 23.1% compared with the ELM, which is a remarkable improvement.
Furthermore, through the confusion matrix, technical indicators such as the correct rate, false alarm rate, detection rate and detection accuracy can be calculated, which will be further demonstrated in Section 6.

Experiment
This experiment is tested on the KDDCUP99. In order to verify the effectiveness and superiority of the GA-ELM, it is compared with the intrusion detection algorithm based on the extreme learning machine and the online sequential (OS)-ELM [47] for comparison.

Intrusion Detection Technology Based on the GA-ELM
Based on the GA-ELM intrusion detection algorithm steps, the first set of the experimental parameters of the GA-ELM algorithm, by referring to experience and testing multiple parameters, we selected a set of relative optimal parameters for setting and set the population size to 20. The maximum genetic algebra is 100, the mutation probability is 0.01 and the crossover probability is 0.7. After a lot of experiments, when the number of the hidden layer of neurons is 480, the performance of the GA-ELM intrusion detection model is better, and the excitation function of the hidden layer of neurons adopts the Sigmoid function. Further, the evolution process of intrusion detection based on the GA-ELM is shown in Figure 3. It can be drawn from the figure that the GA-ELM algorithm performed 100 iterations. At the 0th iteration, it is the original ELM. The test error gradually decreases with the increase of the number of iterations, indicating that the GA-ELM algorithm is effective, and the input weight of the original ELM and the offset of the hidden layer neurons are effectively optimized. The confusion matrix is shown in Figure 4. For the convenience of comparison, the confusion matrix based on the intrusion detection technology of the ELM is given. The number of hidden neurons is also 480, as shown in Figure 5. According to the confusion matrix, we can understand the specific situation of the GA-ELM algorithm for normal, DoS, probing, R2L and U2R detection, from which we can initially see that GA-ELM algorithm-based intrusion detection has normal, DoS, probing, R2L and U2R detection; the detection rate is higher than the intrusion detection based on the ELM. When faced with a small sample intrusion of R2L and U2R, compared with the ELM, the GA-ELM increased the detection rate of R2L by 12.8% and the detection rate of U2R by 23.1%. The effect is remarkable. 
Furthermore, through the confusion matrix, technical indicators such as the correct rate, false alarm rate, detection rate and detection accuracy can be calculated, which will be further demonstrated in Section 6.
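The parameter settings above can be illustrated with a minimal genetic-algorithm sketch. Only the population size, number of generations, crossover probability and mutation probability come from the text; the tournament selection, single-point crossover, uniform re-sampling mutation, elitism and the gene range [-1, 1] are assumptions made for illustration, and `toy_error` is a stand-in for the ELM test error that the paper uses as the fitness function. The chromosome is assumed to be the flattened input weights followed by the hidden-layer biases.

```python
import random

# Hyperparameters reported in the text.
POP_SIZE = 20
MAX_GENERATIONS = 100
CROSSOVER_PROB = 0.7
MUTATION_PROB = 0.01


def tournament(pop, scores, rng):
    """Binary tournament selection (assumed operator): lower error wins."""
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a][:] if scores[a] <= scores[b] else pop[b][:]


def run_ga(fitness, chrom_len, rng):
    """Minimize `fitness` over real-valued chromosomes in [-1, 1].

    Returns the best chromosome of the final generation and the best
    fitness per generation (index 0 is the initial random population).
    """
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(chrom_len)]
           for _ in range(POP_SIZE)]
    history = []
    for gen in range(MAX_GENERATIONS + 1):
        scores = [fitness(c) for c in pop]
        best_idx = min(range(POP_SIZE), key=scores.__getitem__)
        best = pop[best_idx][:]
        history.append(scores[best_idx])
        if gen == MAX_GENERATIONS:
            break
        children = [best[:]]  # elitism (assumption): keep the best as-is
        while len(children) < POP_SIZE:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            if rng.random() < CROSSOVER_PROB:
                cut = rng.randrange(1, chrom_len)  # single-point crossover
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for i in range(chrom_len):
                    if rng.random() < MUTATION_PROB:
                        child[i] = rng.uniform(-1.0, 1.0)  # re-sample gene
                children.append(child)
        pop = children[:POP_SIZE]
    return best, history


# Toy dimensions (hypothetical): 3 inputs, 4 hidden neurons.
N_INPUTS, N_HIDDEN = 3, 4
chrom_len = N_INPUTS * N_HIDDEN + N_HIDDEN  # input weights + biases


def toy_error(chromosome):
    """Stand-in fitness; a real run would train an ELM and return test error."""
    return sum(g * g for g in chromosome)


best, history = run_ga(toy_error, chrom_len, random.Random(0))
print(history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best error recorded in `history` never increases, which mirrors the gradually decreasing curve described for Figure 3.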

Intrusion Detection Technology Based on the OS-ELM
According to the principle of the ELM, only the number of hidden-layer neurons needs to be determined for training, but the training stage requires the entire dataset at once. Since datasets in the smart grid field are large, obtaining all the original data for every training run is time-consuming and inefficient. For this reason, the team of Professor Guang-Bin Huang proposed the online sequential ELM (OS-ELM). The OS-ELM continuously updates the output weights as new data arrive, instead of retraining on the entire dataset, which greatly improves the generalization ability of the ELM. Hence, we applied the OS-ELM to smart grid intrusion detection as a comparison experiment alongside the GA-ELM-based and ELM-based intrusion detection algorithms. In the experiment, the number of hidden-layer neurons of the OS-ELM and the GA-ELM is set to 480, the same as for the ELM. The initial training set of the OS-ELM contains 5000 samples, and the model is then updated with 1000 samples at a time until 10,000 data updates are completed, at which point model training is complete. The confusion matrix is shown in Figure 6. It shows that the detection rate of OS-ELM-based intrusion detection for normal, DoS, probing, R2L and U2R is lower than that of the GA-ELM algorithm. For R2L and U2R, compared with the OS-ELM, the GA-ELM algorithm increased the detection rate of R2L by 7.8% and that of U2R by 11.6%, a significant improvement. Furthermore, through the confusion matrix, technical indicators such as the correct rate, false alarm rate, detection rate and detection accuracy can be calculated, which will be further demonstrated in Section 6.3.
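For reference, the chunk-by-chunk update of the output weights described above is usually written in the OS-ELM literature as the recursion below. The symbols are assumptions and may differ from the notation used elsewhere in this paper: $H_k$ is the hidden-layer output matrix of the $k$-th chunk, $T_k$ the corresponding targets, $\beta^{(k)}$ the output weights after the $k$-th update and $P_k$ an auxiliary matrix.

```latex
\begin{align}
% Initialization on the first batch (5000 samples in this experiment)
P_0 &= \left(H_0^{\top} H_0\right)^{-1}, &
\beta^{(0)} &= P_0 H_0^{\top} T_0 \\
% Update when chunk k+1 arrives (1000 samples per chunk in this experiment)
P_{k+1} &= P_k - P_k H_{k+1}^{\top}
           \left(I + H_{k+1} P_k H_{k+1}^{\top}\right)^{-1} H_{k+1} P_k \\
\beta^{(k+1)} &= \beta^{(k)} + P_{k+1} H_{k+1}^{\top}
                 \left(T_{k+1} - H_{k+1}\,\beta^{(k)}\right)
\end{align}
```

The initial batch needs at least as many samples as hidden neurons for $H_0^{\top} H_0$ to be invertible; 5000 initial samples with 480 hidden neurons meets this count requirement.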

Comparison of Detection Indexes
First, the variation of the test errors of the three algorithms (GA-ELM, OS-ELM and ELM) with the number of hidden-layer neurons is investigated. In addition, the confusion matrices in the previous two sections show intuitively how each of the three algorithms classifies traffic in smart grid intrusion detection, and technical indicators such as the correct rate, false alarm rate, detection rate and detection accuracy can be calculated from them. This section compares the number of erroneous detections, the detection rate and the other detection indicators of the three algorithms.

Influence of the Number of Hidden Layer Neurons on the Test Error
To examine the effect of the number of hidden-layer neurons on the test error while keeping model performance in view, the number of hidden-layer neurons in the three algorithms (GA-ELM, OS-ELM and ELM) is varied from 120 to 480 with a step of 40, for a total of 10 experiments; the test results are shown in Figure 7.
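The grid of hidden-layer sizes described above can be enumerated as follows. This is a small sketch: `sweep` and `dummy_evaluate` are hypothetical names, and `dummy_evaluate` returns a placeholder value rather than a real test error, so only the grid itself reflects the text.

```python
def hidden_layer_grid(start=120, stop=480, step=40):
    """Hidden-layer sizes from start to stop inclusive."""
    return list(range(start, stop + 1, step))


def sweep(evaluate, model_names=("GA-ELM", "OS-ELM", "ELM")):
    """Return {model: [(n_hidden, test_error), ...]} over the grid."""
    grid = hidden_layer_grid()
    return {name: [(n, evaluate(name, n)) for n in grid]
            for name in model_names}


def dummy_evaluate(name, n_hidden):
    """Placeholder, NOT a real model: lets the loop run end to end."""
    return 1.0 / n_hidden


grid = hidden_layer_grid()
curves = sweep(dummy_evaluate)
print(len(grid), grid)
```

Replacing `dummy_evaluate` with a function that trains the named model and returns its test error would yield the three curves plotted against the number of hidden-layer neurons.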
It can be seen from Figure 7 that the test error of the three algorithms gradually decreases as the number of hidden-layer neurons grows. The test error of the GA-ELM algorithm is the lowest, indicating that the genetic algorithm optimizes the input weights of the ELM network model and the biases of the hidden-layer neurons. For the same test error, the GA-ELM needs far fewer hidden-layer neurons than the OS-ELM and the ELM, which shows that the network structure of the GA-ELM is more compact. In addition, the test errors of the OS-ELM and the ELM differ little, but the OS-ELM can update the model faster, which makes it more applicable to the smart grid than the ELM algorithm. In summary, the test error of the GA-ELM is lower, its network structure is more compact and it is more widely applicable to the smart grid.

Evaluation of Other Indicators
According to the confusion matrix, technical indicators such as the correct rate, false alarm rate, detection rate and detection accuracy can be calculated. This section compares the number of erroneous detections, the detection rate and the other detection indicators of the three algorithms. The indicators are shown in Tables 2-4, respectively.
Table 4. Comparison of the detection indexes of the three methods.
It can be seen from Tables 2-4 that GA-ELM-based intrusion detection has a lower number of erroneous detections and a lower false alarm rate than OS-ELM-based and ELM-based intrusion detection, and a higher accuracy than the other two. This shows that the genetic algorithm-based extreme learning machine can effectively optimize the input weights of the ELM and the biases of the hidden-layer neurons. By improving the performance of the model, it retains the training speed of the extreme learning machine while improving accuracy in smart grid intrusion detection.
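As a hedged illustration of how such indicators follow from a confusion matrix, the sketch below collapses a five-class matrix into attack versus normal and applies the common textbook definitions. These definitions are assumptions and may not match the exact formulas used in this paper, and the matrix values are invented solely to make the arithmetic checkable; they are not results from the experiments.

```python
CLASSES = ["normal", "DoS", "probing", "R2L", "U2R"]

# Hypothetical counts: cm[i][j] = samples of true class i predicted as j.
cm = [
    [90, 4, 3, 2, 1],    # normal
    [5, 190, 3, 1, 1],   # DoS
    [4, 2, 42, 1, 1],    # probing
    [6, 1, 1, 11, 1],    # R2L
    [3, 0, 1, 1, 5],     # U2R
]


def binary_metrics(cm, normal=0):
    """Attack-vs-normal indicators from a multi-class confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tn = cm[normal][normal]                       # normal kept as normal
    fp = sum(cm[normal]) - tn                     # normal flagged as attack
    fn = sum(cm[i][normal] for i in range(n) if i != normal)  # missed attacks
    tp = total - tn - fp - fn                     # attacks flagged as attack
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "miss_rate": fn / (tp + fn),
    }


def per_class_detection_rate(cm):
    """Fraction of each true class that is labeled correctly."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


metrics = binary_metrics(cm)
per_class = per_class_detection_rate(cm)
for name, rate in zip(CLASSES, per_class):
    print(f"{name}: {rate:.2f}")
print(metrics)
```

Note that the binary collapse counts an attack assigned to the wrong attack class as detected, whereas the per-class detection rate does not; which convention applies depends on the definitions adopted for the tables.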

Conclusions
In this paper, the application of the extreme learning machine to smart grid AMI intrusion detection was analyzed, along with the impact of the randomness of the input weights and hidden layer biases of the extreme learning machine. Owing to its strong optimization ability, the genetic algorithm was proposed as a way to raise model accuracy on top of the fast training and detection speed already provided by the extreme learning machine. By mapping the input weights and hidden layer biases to chromosome vectors and using the test error as the fitness function of the genetic algorithm, the GA-based algorithm optimizes these parameters through genetic operations, thereby enhancing the performance of the extreme learning machine model. The best performance is achieved by selecting the input weights and hidden layer biases that correspond to the minimum test error. This paper also presented detailed examinations and comparisons of three algorithms: the OS-ELM, the ELM and the GA-ELM. It was shown that the GA-ELM most effectively improved the performance of the ELM network, creating a more compact network structure while further improving the accuracy, precision and detection rate of smart grid intrusion detection. The algorithm also greatly reduced the false alarm rate and the missing report rate, which represents a new way of combining algorithms in the field of smart grid intrusion detection.
In follow-up work, when conditions permit, we will collect real smart grid network intrusion data and further research smart grid security strategies.