Article

Machine-Learning-Based Anomaly Detection for GOOSE in Digital Substations

by Hong Nhung-Nguyen 1,†, Mansi Girdhar 2,†, Yong-Hwa Kim 3,* and Junho Hong 2

1 Department of AI and Software Engineering, School of Computing, Gachon University, Seongnam-si 1342, Gyeonggi-do, Republic of Korea
2 Department of Electrical and Computer Engineering, University of Michigan-Dearborn, Dearborn, MI 48128, USA
3 Department of Artificial Intelligence, Korea National University of Transportation, Uiwang-si 16106, Gyeonggi-do, Republic of Korea
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Energies 2024, 17(15), 3745; https://doi.org/10.3390/en17153745
Submission received: 14 June 2024 / Revised: 19 July 2024 / Accepted: 26 July 2024 / Published: 29 July 2024
(This article belongs to the Section F: Electrical Engineering)

Abstract

Digital substations have adopted a large amount of information and communication technology (ICT) and cyber–physical systems (CPSs) for monitoring and control. As a result, cyber attacks on substations have been increasing and have become a major concern. An intrusion-detection system (IDS) can help detect and identify the abnormal behaviors of hackers. In this paper, a Deep Neural Network (DNN)-based IDS is proposed to detect malicious generic object-oriented substation event (GOOSE) communication over the process and station bus network, followed by the multiclassification of the cyber attacks. For training, both the abnormal and the normal substation networks are monitored, captured, and logged, and then the proposed algorithm is applied to distinguish normal events from abnormal ones within the network communication packets. The designed system is implemented and tested with a real-time IEC 61850 GOOSE message dataset using two different approaches. The experimental results show that the proposed system can successfully detect intrusions with an accuracy of 98%. In addition, a comparison shows that the proposed IDS outperforms a support vector machine (SVM)-based IDS.

1. Introduction

ICT penetration and significant advancements in electronic equipment in the power sector have helped traditional power systems advance significantly toward an intelligent grid [1]. Due to the interconnectedness and synergy between various cyber–physical entities and layers of the grid infrastructure, as well as the easier accessibility of IEEE 802.3 Ethernet-based networks, the grid has become a valuable target for knowledgeable adversaries [2]. Because the substation automation system (SAS) accommodates a number of remote protection, monitoring, and control tasks, it has introduced numerous cyber vulnerabilities that compromise the confidentiality, integrity, and availability (CIA) of data [3]. Exploiting these security flaws can corrupt a CPS or cause communication and/or physical devices to malfunction, which might cause a cascade event or a blackout. For instance, the Ukraine power grid’s supervisory control and data acquisition (SCADA) systems and computers were the subject of a large cyber attack in 2015 that used the “BlackEnergy” malware. This attack resulted in temporary power disruptions and remote substation switching. With the dramatic escalation in cyber attacks, cyber security has become paramount for ensuring reliability and security in power substations. Therefore, it is critical to analyze the cyber security vulnerabilities of a substation.
GOOSE is one of the communication protocols described in the IEC 61850 standard [4], which replaces the traditional hardwired signals with Ethernet-based communications and provides direct, reliable, and high-speed data transmission between intelligent electronic devices (IEDs) within a substation and between substations [5,6,7]. It is a time-critical protocol for transmitting time-sensitive and high-priority information, e.g., protection-tripping commands across substations. Moreover, due to the plaintext and multicast nature of GOOSE messages, they are highly susceptible to cyber intrusions if poorly secured and so could cause notable damage to electric power systems [8]. GOOSE messages are transmitted in plaintext, making them easily readable and modifiable by malicious actors intercepting the network traffic. This lack of encryption poses a significant risk of data breaches and unauthorized modifications. The multicast nature of GOOSE messages means that they are sent to multiple devices, increasing the potential attack surface. A hacker needs to compromise only one device to intercept or inject malicious messages into the network. Many researchers and white hat hackers have demonstrated cyber attacks on the GOOSE protocol in the past. For example, the authors of [9] demonstrated a layer-two-based spoofing cyber attack against GOOSE messages, while a denial-of-service (DoS) attack is implemented in [10], which causes GOOSE poisoning over the subscriber IED. Also, the original version of IEC 61850 and other protocols do not include cyber security features. This raises many concerns, as the communication infrastructure may allow hackers to gain access, establish malicious communications in a substation, and impact its behavior.
In order to overcome this limitation, the IEC 62351 security standard was published to manage the security conditions of the communication standards [11]. The literature has identified a number of ICT measures that aim to secure GOOSE message exchanges by implementing the recommendations of the IEC 62351-6 security standard. For instance, the first version proposed an authentication scheme using the Rivest–Shamir–Adleman (RSA) digital signature [12]. The latest version proposed more efficient authentication algorithms, such as the symmetric cryptography-based hash-based message authentication code (HMAC) and Galois message authentication code (GMAC), as outlined in [13]. Many substations still operate with legacy equipment that lacks modern security features. Updating such equipment to comply with the latest security standards (IEC 62351) is a continuous and resource-intensive process, involving firmware and hardware upgrades that can disrupt normal operations [14]. For example, to ensure secure end-to-end communications, it may be necessary to update the passwords or certificates on several power devices, e.g., energy meters, switches, and controllers, at power plants and substations built years ago.
Additionally, security measures that have been put in place need to be regularly updated and maintained to strengthen communication in line with changing security risks. It might involve implementing cyber security features into IED software architecture or updating various outdated security methods. However, the low computing capacity of IEDs prevents proprietary vendors from putting these security measures into practice. The latency restrictions inherent to GOOSE protocol operations further complicate the integration of advanced security features.
To date, neither the IEC 62351 recommendation nor exclusive vendor solutions have been fully adopted to enhance GOOSE communication security. As a result, many GOOSE protocol implementations are still open to hacker attacks. Cyber security design incorporates the idea of subsidiary schemes, e.g., an IDS, for detecting the aberrant behavior of remote hackers in addition to the existing encryption and authentication procedures [15]. IDSs operate as the second line of defense and detect intrusions using the anomaly-detection approach and the signature-based detection method [16]. To secure the GOOSE-based grid communication, a variety of knowledge-based IDSs have already been proposed (elaborated on in Section 2).
However, these IDSs have limitations, because they focus on statistical analysis based on the characteristics of the most recent GOOSE communications. Additionally, when the transmissions adhere to the GOOSE semantics, these works fail to detect malicious GOOSE packets that are injected between two time frames. Therefore, for efficient operation, it would be ideal if the subscriber IED had knowledge of potential attack scenarios so that it could locate the erroneous messages more accurately. To this end, the subscriber IEDs can be programmed with the desired intelligence using machine learning (ML)-based techniques.
The implementation of an ML-based IDS for detecting anomalies in GOOSE communication plays a crucial role in addressing the growing cyber security threats faced by substations and enhancing the overall resilience of power systems. By providing the early detection of anomalies, adaptability to new threats, a reduction in false positives, and an automated threat response, the IDS ensures robust protection against cyber attacks. Furthermore, its continuous monitoring, real-time alerts, improved incident response, enhanced situational awareness, and support for compliance collectively strengthen the resilience and reliability of power systems.
Existing IDSs predominantly rely on statistical analysis and predefined rules to detect intrusions. While these methods are useful against general attack patterns, they struggle to identify sophisticated or unknown attacks, particularly those adhering to the semantic requirements of the IEC 61850 protocol but harboring malicious intent. The high rates of false positives and negatives in current IDS solutions present a significant challenge. False positives can inundate operators with excessive alerts, while false negatives allow attacks to go undetected, thereby compromising system security. Additionally, many IDSs lack the capability for real-time analysis, which is crucial for detecting and mitigating attacks on time-critical protocols like GOOSE. Delays in detecting and responding to intrusions can lead to significant damage before any corrective actions are implemented. Most research efforts have focused on developing IDSs for specific components of smart grids, such as SCADA systems or advanced metering infrastructure (AMI), with limited attention to the security of GOOSE messages. The few studies addressing GOOSE security often lack comprehensive evaluations or practical implementations, resulting in gaps in the understanding of real-world applicability and effectiveness. Although ML offers promising solutions for anomaly detection, its application to GOOSE message security remains nascent. Current ML-based IDS proposals, such as those discussed in [17], are limited in scope and do not fully leverage the potential of advanced ML algorithms to detect subtle and sophisticated attacks. This underscores the need for more comprehensive and practical ML-based solutions to enhance the security of GOOSE messages in smart grids.
DNNs have become a cornerstone in the development of advanced IDSs for the cyber security of GOOSE messages in substations. Despite their effectiveness, DNNs are susceptible to adversarial attacks (e.g., evasion attacks and poisoning attacks), where small, often imperceptible perturbations in input data can lead to incorrect predictions or classifications by the model. This vulnerability poses significant risks in a substation environment where the integrity and reliability of GOOSE messages are paramount. To counter these vulnerabilities, several mitigation strategies can be employed, e.g., adversarial training, defensive distillation, and feature squeezing. Given the evolving nature of adversarial tactics, it is crucial to maintain a proactive stance in IDS development. Continuous research and development efforts, coupled with regular updates to the IDS model, are essential to stay ahead of emerging threats. Incorporating the latest advancements in adversarial machine learning will further enhance the robustness and reliability of DNN-based IDSs for GOOSE message security.
The Deep Neural Network (DNN) model processes GOOSE packets by following a comprehensive series of steps. Initially, the model performs feature extraction, where it identifies and isolates relevant attributes or patterns from the GOOSE packets. Following this, the extracted features undergo transformation, a process where they are converted or normalized to a format suitable for further analysis. Finally, the transformed features are fed into a classification algorithm within the DNN, which evaluates them to make informed decisions or predictions about the GOOSE packets, thus enabling effective analysis and response.
The main contributions of this work are outlined as follows: (1) STRIDE threat modeling is presented to identify the cyber vulnerabilities associated with a digital substation; (2) a novel DNN-based IDS is developed to distinguish malicious IEC 61850 GOOSE messages; and (3) the performance of the proposed IDS is estimated against the traditional ML classification methods, e.g., SVM.
The remainder of this paper is organized as follows: Section 2 provides some background on the state-of-the-art IDSs associated with the security of GOOSE messages and their drawbacks. Section 3 discusses potential GOOSE vulnerabilities that an adversary can exploit to impact the grid operations, followed by STRIDE threat modeling and an attack tree. In Section 4, the proposed DNN-based IDS for detecting abnormalities in the GOOSE communication is designed and implemented. Section 5 shows the experimental validation, and a comparison is made between the performance of the proposed IDS and SVM-based IDS. Finally, the paper draws its conclusions in Section 6.

2. Related Works

Digitization has greatly raised cyber risks and attacks on intelligent grids [18]. As mentioned, the main communication standard IEC 61850 is susceptible to a wide range of cyber attacks, including injection attacks, DoS assaults, spoofing, replay attacks, and eavesdropping [19]. In addition to applying fundamental IEC 62351 security measures, substantial research has been conducted in the past on building extremely effective and lightweight IDSs with a focus on securing smart grids, advanced metering infrastructure (AMI), SCADA systems, and substations [20,21].
This section presents the existing literature on cutting-edge IDSs, discusses their shortcomings, and provides a succinct comparison of earlier efforts and the proposed IDS. For instance, a survey presented in [22] focuses on IDS implementations for protocols specified by the IEC 61850 standards (i.e., GOOSE, SV, and MMS). Similarly, the work of [23] focused on active power-limiting attacks using the manufacturing message specification (MMS) protocol and developed a signature-based IDS for substations. In [24], the authors created a behavior-based IDS and used it to examine the GOOSE and MMS protocols to find unusual occurrences. In a different study [25], the researchers developed a specification-based IDS devoted to protecting substations utilizing IEC 61850-based MMS, GOOSE, and SV communication. That system detected a number of cyber attacks, including DoS, man-in-the-middle (MITM), and packet injection attacks. An integrated anomaly-detection system (ADS) is used in another example [26] to safeguard IEC 61850 substations by identifying anomalous multicast GOOSE and SV messages. In another study [27], the researchers presented a statistical anomaly-detection method to detect DoS attacks against GOOSE network communication. Lastly, in [28], the authors developed ED4GAP, a network-level system for the efficient detection of GOOSE-based poisoning attacks.
Although there has been some dedicated research conducted on designing different IDSs for the smart grid domain in the literature, there are huge gaps that have not been fully addressed. The internal processing cost and the increased hardware design constraints are the key negatives. Particularly, the suggested IDSs successfully identify replay attacks, DoS assaults, and other malicious GOOSE and protocol-related activities.
However, due to their high specialization, poor recall, and greater rate of false negatives, these IDSs have a restricted ability to detect attacks and are unable to identify masquerade attacks. Additionally, the earlier research lacked numerical analysis and a thorough description of the IDSs’ design. ICSs, including SCADA systems, are characterized by rhythmic communication patterns, and ML algorithms may readily exploit these patterns, which are imperceptible to humans. Additionally, by developing comparable data-driven models of typical behavior, ML algorithms can analyze enormous amounts of cyber security statistics and identify anomalous events. Thus, a wide range of cyber attack situations can be covered by ML-based IDSs. These ML algorithms, also referred to as cyber security data science, have more advanced computing, storage, and data-gathering capabilities [29]. As a result, increasing focus is being placed on researching ML applications for intrusion detection in multicloud setups.
In the literature, ML algorithms have been designed for intrusion detection in SCADA systems. For example, the researchers of [30,31] employed convolutional neural network (CNN)-based approaches for network intrusion detection in SCADA networks. Some existing efforts include the proposal of an ML-based IDS for identifying unauthorized communication messages. For example, the work of [32] identified malicious MMS messages with a CNN-based IDS, while the study of [33] detected malicious SV messages by utilizing decision tree (DT), random forest (RF), and artificial neural network (ANN)-based IDSs. To the best of the authors’ knowledge, there are no proposals that employ ML-based IDSs to detect unusual GOOSE communication, except in [17], where the proposed IDS distinguishes normal and abnormal GOOSE transmission over the process bus network using different ML algorithms, e.g., k-nearest neighbors (KNN), DT, RF, SVM, and adaptive boosting (AdaBoost).
Furthermore, the work in [17] reports accuracy and detection times using the sequence number (sqNum) and state number (stNum) features of the GOOSE packet frame. However, it could fail to detect GOOSE attacks that follow the semantic requirements of IEC 61850 (e.g., increasing the stNum and resetting the sqNum for the new data). In order to increase efficiency and accuracy, the IDS proposed in this study evaluates the features extracted from the GOOSE datasets. Additionally, it uses the DNN to categorize STRIDE cyber attacks. The results of the IDS with multiple packets as input further support the viability of the proposed approach. Because the proposed IDS extracts more features for the model, its effectiveness increases.
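To make this limitation concrete, the following minimal Python sketch implements a purely rule-based stNum/sqNum check of the kind a subscriber might apply. The `GooseHeader` class, the `follows_goose_semantics` function, and the assumption that sqNum restarts at 0 after a state change are illustrative only; they are not taken from [17] or from the proposed IDS.

```python
from dataclasses import dataclass


@dataclass
class GooseHeader:
    """Hypothetical container for the two counters discussed in the text."""
    st_num: int  # state number, increases when the data change
    sq_num: int  # sequence number, increases with each retransmission


def follows_goose_semantics(prev: GooseHeader, cur: GooseHeader) -> bool:
    """Return True if `cur` is a plausible successor of `prev`.

    Assumed rules (illustrative):
      - retransmission: same stNum, sqNum incremented by one
      - new event: stNum incremented by one, sqNum reset to 0
    """
    retransmission = cur.st_num == prev.st_num and cur.sq_num == prev.sq_num + 1
    new_event = cur.st_num == prev.st_num + 1 and cur.sq_num == 0
    return retransmission or new_event


legit_a = GooseHeader(st_num=5, sq_num=10)
legit_b = GooseHeader(st_num=5, sq_num=11)
injected = GooseHeader(st_num=6, sq_num=0)   # crafted to obey the rules
legit_c = GooseHeader(st_num=5, sq_num=12)   # genuine follow-up frame

accepted_injected = follows_goose_semantics(legit_b, injected)
accepted_legit_after = follows_goose_semantics(injected, legit_c)
```

In this toy run the injected frame is accepted because it obeys the counter rules, while the genuine frame that follows it is rejected as stale. A check based only on these two counters therefore cannot separate a semantically valid injection from a real event, which motivates using additional features.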

3. Cyber Security Vulnerability of a Digital Substation

A substation communication network is not only at risk from physical attacks; it is also vulnerable to cyber attacks. Since most substations are unmanned, they can be remotely accessed for control and maintenance operations. This makes it simpler for an adversary to reach the cyber layer utilizing different wireless substation network mechanisms. Furthermore, these remote access points (RAPs) may not be equipped with sufficient security measures, e.g., a firewall that is improperly configured or has weak passwords, making them potential points of vulnerability that hackers could exploit to trip multiple circuit breakers (CBs) without authorization or gain access to crucial data [34]. Thus, it is essential to identify known vulnerabilities and potential cyber security threats in a substation in order to design defenses against nefarious attacks.

3.1. Cyber–Physical System of a Digital Substation

A high-level summary of an SAS is displayed in Figure 1. The station, bay, and process levels make up the substation’s organizational structure. A user interface, gateways, databases, routers, servers, firewalls, global positioning system (GPS), workstations, engineering facilities, and RAPs are a few examples of the equipment that may be present at the station level. These devices enable communication between substations and external systems. Protection and control (P&C) IEDs and phasor measurement units (PMUs) are bay-level equipment. The sensors, actuators, switchyard equipment, current transformers (CTs), voltage transformers (VTs), and CBs are installed in the substation field at the process level. A digital substation employs IEC 61850-based communication protocols, e.g., GOOSE, SV, and MMS. GOOSE, a peer-to-peer protocol, is used for sending tripping signals from protective IEDs to CBs over the process bus network. Sampled voltage and current values (SV) are transmitted between merging units (MUs) and IEDs. MMS, with a client–server structure, is used for monitoring, control, and reporting between the user interface and the IEDs. The proposed ML-based IDS is located at the bay level, so it can access both station and process bus networks.
The test setup includes the Omicron, merging unit (MU), protection intelligent electronic devices (PIEDs), and Ethernet LAN switches for both the process bus and station bus, as depicted in Figure 2. The Omicron generates analog three-phase currents (I) and voltages (V), which are received by the MU. The MU then sends sampled values (SVs) over the process bus switch to the PIEDs. If an overcurrent fault is detected, the PIED (specifically the SEL 421) uses GOOSE messages to instruct the Omicron to trip the circuit breaker (CB). This CB trip command, utilizing GOOSE messages, is transmitted via the station bus switch. Similarly, a hacker can generate malicious GOOSE packets.

3.2. STRIDE Modeling

Recent works have demonstrated that cyber attacks on the infrastructure are possible and can cause severe financial, privacy, and safety harms. STRIDE threat modeling is a method for recognizing the security vulnerabilities that threat agents can exploit in a CPS. It supports reliable system operation by assessing the potential effect and risk associated with the exploitation of each vulnerability and by guiding the design of appropriate threat-mitigation solutions. Therefore, STRIDE is applied to analyze the complex system design by building data-flow diagrams (DFDs) and identifying cyber threats against each system entity in the substation.
STRIDE is an iterative threat modeling tool that has been applied to many CPSs. A DFD is a graphical representation of the data flow and comprises various entities, as displayed in Figure 3. The trusted boundaries and untrusted boundaries indicate potential threats and attack surfaces. STRIDE also includes a high-level summary of the connectivity and conceivable vulnerabilities within a digital substation [35]. It defines six different types of security threats as described below:
  • Spoofing: Masquerading as a legitimate source, procedure, or system entity by faking data. In a spoofing attack, an adversary can pretend to be a publisher IED and send a multicast GOOSE message to the SCADA control center, thereby fooling a receiver into believing the message comes from a trusted source. This could ultimately lead to disastrous scenarios, particularly under fault conditions.
  • Tampering: An unauthorized alteration of legitimate information. Also called data corruption, this attack is widely known as a false data injection (FDI) attack. For example, the hacker performs packet sniffing or eavesdropping to learn parameters such as the source and destination MAC addresses and the data scope of the GOOSE frame. The hacker then clones the MAC address of the GOOSE publisher and attempts to alter the contents of the GOOSE message by injecting anomalous data.
  • Repudiation: Denying or disowning a specific action executed in the system. Since most of the operations at a substation are exchanged via communication protocols, there is a risk that both publishers and subscribers will deny their abnormal activities. For example, compromised IEDs can deny the control commands from the publishing IEDs, or IEDs can drop legitimate GOOSE messages, which disrupts the normal operation of the substation.
  • Information disclosure: A data breach or unauthorized access to security-sensitive information. For instance, the communication between IEDs is highly vulnerable to interception by the threat agent. Therefore, the hacker might gain access to sensitive information (e.g., header details) inside the application protocol data unit (APDU) of the GOOSE frame during the data exchange. The hacker may impersonate a valid IED and use these details to compromise the IED processing and disrupt message traffic.
  • Denial of Service: Also called a flooding attack, it disrupts timely access to network services for intended users as a consequence of a hacker’s attempt to jam and overload the bus network by transmitting a continuous stream of malicious traffic. The purpose of a DoS attack is to interrupt the communication.
  • Elevation of Privilege: Occurs when a threat actor obtains more privileges, and therefore access to more data in the system, than a legitimate user with specified authority. Primarily, information disclosure or authentication failure may cause this. The entire substation could be compromised due to data poisoning.
STRIDE is applied in this paper to show the vulnerabilities that an adversary could exploit to intrude into the substation system through gaps in the IEC 61850 communication protocol, and how these vulnerabilities could be used to generate abnormal GOOSE packets. It further underscores potential threats and attack surfaces by using DFDs. The purpose of STRIDE here is to highlight the impact of abnormal GOOSE messages generated through different attack types that could harm the operations of the substation network.

3.3. Potential Cyber Attack Scenarios

After evaluating the cyber security vulnerabilities by STRIDE, this section discusses multiple attack scenarios by which the hackers can attack and disrupt the normal operations of the substation by transmitting falsified GOOSE communication through tampering, spoofing, and DoS attacks.
Threats may arise from different sources, e.g., authorized (internal agents) or unauthorized (external agents) users. Therefore, this section demonstrates a combination of cyber security failure scenarios, including problems due to compromised tool functionality, data integrity attacks, communication failures, or human errors. It describes a cyber security analysis by identifying the adversary’s objectives of attack using a graphical, structured tree notation of multiple coupling attack leaves from each node.
Cyber security in power systems is deploying cutting-edge technology and strategies to address the hazards of malicious activities due to the constant evolution and growing complexity of cyber attacks. Therefore, it necessitates the use of visual aids, e.g., attack modeling approaches, to better comprehend or perceive cyber attacks. Bruce Schneier first introduced the attack tree, which is an essential modeling tool for threat analysis [36]. It is a technique for simulating and depicting the series of circumstances that could lead to a successful cyber attack on a host or network (e.g., a substation network). In addition, it is a way to describe potential dangers and attack routes or vectors for cyber invasions. Additionally, it enables the analysts to appreciate or recognize the underlying possible weaknesses. Figure 4 shows potential scenarios for the ways in which different cyber attacks (e.g., spoofing, DoS, and tampering) might cause unauthorized CB operations in a substation. An intruder may cause an undesirable opening of CBs using spoofing, DoS, or tampering attacks in a substation. For instance, a DoS attack can be conducted by a combination of either a semantic attack or IED poisoning, causing the substation communication channel to be hijacked and flooded. The spoofing attack fabricates GOOSE frames with a high stNum. Consequently, both attacks will lead to the discarding of legitimate GOOSE frames. Another case scenario may arise in which an IED might be shipped with a backdoor (e.g., a supply chain attack). The compromised IED may provide an external communication path to the hacker. Then, it will lead to a malicious CB switching attack. A hacker may infiltrate the substation network by compromising the substation gateway and then access the human–machine interface (HMI) so they can maliciously send crafted or modified commands to cause undesirable operations in the CB. 
Another attack scenario of a supply chain attack is an MITM attack, where a hacker either injects malicious codes or updates malicious firmware on critical devices (e.g., IEDs). Then, this corrupt firmware might be downloaded by different devices, including IEDs, hence compromising them and further changing the status of the CBs. To support more cyber intrusion scenarios in a digital substation, a number of attack scenarios can be developed to examine the attack vectors and potential cyber threats.
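The attack-tree idea described above can be sketched in a few lines of Python. The tree below is a loose, partial reading of the scenarios in the text rather than a reproduction of Figure 4; the `Node` class, the gate labels, and the node names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """Attack-tree node: a leaf action, or an OR/AND combination of children."""
    name: str
    gate: str = "LEAF"            # "LEAF", "OR", or "AND"
    children: list = field(default_factory=list)


def attack_paths(node):
    """Enumerate every set of leaf actions that achieves the node's goal."""
    if node.gate == "LEAF":
        return [[node.name]]
    child_paths = [attack_paths(child) for child in node.children]
    if node.gate == "OR":
        # Any single child suffices.
        return [path for paths in child_paths for path in paths]
    # AND: one path from every child must be combined.
    return [sum(combo, []) for combo in product(*child_paths)]


# Hypothetical tree assembled from the scenarios described in the text.
root = Node("Unauthorized CB operation", "OR", [
    Node("Spoofed GOOSE frames with high stNum"),
    Node("DoS", "OR", [Node("Semantic attack"), Node("IED poisoning")]),
    Node("IED shipped with backdoor"),
    Node("Gateway route", "AND", [
        Node("Compromise substation gateway"),
        Node("Access HMI and send crafted commands"),
    ]),
])

paths = attack_paths(root)
for path in paths:
    print(" + ".join(path))
```

Each printed line is one combination of leaf actions that reaches the root goal. Listing the paths in this way is one simple method for checking which leaves a defense such as an IDS needs to cover.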

4. Proposed Deep Neural Network

In this section, we propose a DNN to build intrusion-detection models based on network traffic data. First, we extract features from the GOOSE APDU and preprocess them for use as the DNN input. Then, we apply the DNN to detect the three types of cyber attacks (i.e., spoofing, tampering, and DoS) identified by STRIDE threat modeling.

4.1. Data Processing

4.1.1. Data Generation

GOOSE messages are generated and periodically sent throughout the substation network when certain power system events (e.g., the tripping of CBs, line faults, transformer saturation, voltage dips, and other interruptions) occur either due to cyber attacks or otherwise. Since the proper interpretation of a fault and an event is critical for the reliability and continuous operation of the power system, an event analysis is performed to have valuable insight into the conditions and behaviors of various power system protective equipment. Consequently, based on the historical event logs, the behavior of the most recent event is analyzed by comparing it with the regular normal behavior of the power system to evaluate faults.

4.1.2. Feature Extraction

Feature extraction in DNNs is a process where the network automatically learns to identify and extract relevant features from raw input data during training. To apply a DNN in an IDS based on traffic data, however, an explicit feature extraction step is also required, because the raw network packets captured with Wireshark [38] and Pyshark [37] are unsuitable as direct input to the ML model. This preprocessing step identifies and isolates the most relevant characteristics of the raw data, which in this case are network traffic related to substation events. In this paper, we extract the critical features of the GOOSE packet by using Pyshark [37]. The data are captured in *.pcap format and saved in *.csv format, which consists of the header and the raw data records. In Deep Learning (DL), the features of the data significantly impact the performance of the trained model. Therefore, the selected features are the most important information extracted from the GOOSE packet. Table 1 shows the features extracted from the GOOSE message. As shown in Table 1, we use all nine features extracted from the packet and feed them into the input of the model.
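The conversion from parsed packets to a CSV file can be sketched as follows. This example uses only the Python standard library and starts from packets that have already been parsed into dictionaries, so it does not show the Pyshark capture step. The `FEATURES` list is a hypothetical placeholder (the nine features actually used are those listed in Table 1), and `packets_to_csv` is an illustrative helper, not the authors' code.

```python
import csv
import io

# Hypothetical placeholder columns; the paper's nine features are given in Table 1.
FEATURES = ["stNum", "sqNum", "timeAllowedtoLive"]


def packets_to_csv(packets, features=FEATURES):
    """Write a header row plus one numeric row per packet, keeping only `features`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=features, lineterminator="\n")
    writer.writeheader()
    for packet in packets:
        # Parsed fields typically arrive as strings; convert them to integers.
        writer.writerow({name: int(packet[name]) for name in features})
    return buffer.getvalue()


# Toy input standing in for fields parsed from a *.pcap capture.
packets = [
    {"stNum": "5", "sqNum": "10", "timeAllowedtoLive": "2000", "goID": "demo"},
    {"stNum": "5", "sqNum": "11", "timeAllowedtoLive": "2000", "goID": "demo"},
]

csv_text = packets_to_csv(packets)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same writer could target a file on disk.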

4.2. Intrusion-Detection System Based on Deep Neural Network

In the proposed method, the features are collected from the GOOSE traffic network and the DNN is used for performing training processes and classifying the connection records as normal or anomalous.
As shown in Table 1, features are extracted from the incoming GOOSE message. Besides the classification method in which the model is trained on one packet per input, we experiment with a model input of three packets. After data preprocessing, we perform data labeling for two cases: one packet per data sample and three packets per data sample. The labeled training data are then used to train the proposed model. After the training process, the built model is evaluated with the test data file. The proposed DNN for multiclass classification is utilized to detect attacks (e.g., DoS, tampering, and spoofing) from the STRIDE threat model.
The DNN model contains an input layer, hidden layers, and an output layer. The output of each hidden layer is obtained by multiplying its input by a weight matrix, adding a bias, and then applying an activation function.
In a DNN model, the activation function is applied to the outputs of the neurons in the hidden layers: the weighted sums pass through a nonlinear activation function before being passed to the next layer [39].
Here, rectified linear units (ReLUs) are employed as the activation function in the hidden layers [40] and are defined as $f(x) = \max\{0, x\}$, where $x$ is the input to the ReLU function; $\max\{0, x\}$ returns $x$ if $x$ is positive and 0 otherwise.
In the output layer, the output is given by
$$z = [z_1, \ldots, z_S]^T = \sigma(h),$$
where $z_m$ is the predicted score for the $m$-th category among the $S$ classes; $h = [h_1, \ldots, h_S]^T$ is the output of the last fully connected layer; and $\sigma(h)$ is the softmax function, which is given by
$$z_m = [\sigma(h)]_m = \frac{e^{h_m}}{\sum_{j=1}^{S} e^{h_j}}.$$
Categorical cross-entropy is used as the loss function in the proposed DNN and is defined as
$$\mathrm{Loss} = -\log z,$$
where $z = z_m$ from Equation (4) if $m$ is the corresponding target index.
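The three definitions above (ReLU, softmax, and categorical cross-entropy for a single target index) can be checked numerically with a short pure-Python sketch. The logit values in `h` are hypothetical, and subtracting the maximum logit before exponentiation is a standard numerical-stability step that does not change the softmax result.

```python
import math


def relu(x):
    """ReLU activation: returns x if x is positive, otherwise 0."""
    return max(0.0, x)


def softmax(h):
    """Softmax over a list of logits h_1..h_S."""
    shift = max(h)  # subtracting the max leaves the result unchanged
    exps = [math.exp(value - shift) for value in h]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, target_index):
    """Categorical cross-entropy for one sample with a one-hot target."""
    return -math.log(z[target_index])


# Hypothetical output of the last fully connected layer for S = 4 classes
# (normal, DoS, tampering, spoofing).
h = [2.0, 1.0, 0.1, -1.0]
z = softmax(h)
loss = cross_entropy(z, target_index=0)
```

The loss is zero only when the softmax output for the target class equals 1 and grows as that output approaches 0.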
To minimize the loss function, several gradient-based optimization algorithms have been studied in previous research, such as Momentum, AdaGrad, and Adam [41,42,43]. We choose the Adam optimizer with a learning rate of 0.001 to update the learnable network parameters.
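For readers unfamiliar with Adam, the following sketch applies the update rule to a single scalar parameter with the learning rate of 0.001 stated above. The decay rates `beta1 = 0.9` and `beta2 = 0.999` and `eps = 1e-8` are the defaults from the original Adam paper and are assumed here, since the text does not report them; the quadratic objective is a toy stand-in for the network loss.

```python
import math


def adam_minimize(grad, w, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run `steps` Adam updates on a scalar parameter w."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


def toy_gradient(w):
    """Gradient of the toy objective (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_after_1 = adam_minimize(toy_gradient, w=0.0, steps=1)
w_after_100 = adam_minimize(toy_gradient, w=0.0, steps=100)
```

Because the update is normalized by the gradient magnitude, the first step moves the parameter by approximately the learning rate regardless of how large the gradient is.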

5. Performance Analysis for Intrusion Detection

In this section, we analyze the performance of intrusion detection using DNNs with a single packet and with three packets of GOOSE. The evaluations cover three case studies of GOOSE cyber intrusion and detection: (1) spoofing attacks, (2) tampering attacks, and (3) DoS attacks. The latest version of IEC 62351-6 proposes the implementation of a GOOSE replay protection state machine to identify out-of-order stNum and sqNum values. However, if the injected abnormal GOOSE packet complies with the IEC 61850-8-1 semantics, identifying the cyber attack with this new state machine is hard. For example, assume that the current GOOSE stNum is 5 and sqNum is 12, and an attacker injects an abnormal GOOSE packet with stNum 6 and sqNum 1. The new GOOSE replay protection state machine (IEC 62351-6) cannot detect the anomaly of the injected packet since it complies with the IEC 61850-8-1 semantics. Therefore, we simulated three different types of GOOSE attacks based on IEC 61850-8-1 semantics, and the proposed ML-based IDS is designed to mitigate these new problems. The training was conducted with two different strategies: (1) a single packet including normal and abnormal GOOSE messages and (2) three packets with mixed normal and abnormal packets.
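The stNum/sqNum example can be illustrated with a deliberately simplified ordering check. The function `in_order` below is a hypothetical stand-in, not the state machine specified in IEC 62351-6: it only assumes that a packet is accepted when sqNum increases within the same stNum, or when stNum increases.

```python
def in_order(previous, new):
    """Simplified ordering rule (an assumption, not the IEC 62351-6 text).

    Each argument is a (stNum, sqNum) pair.
    """
    prev_st, prev_sq = previous
    new_st, new_sq = new
    if new_st == prev_st:
        return new_sq > prev_sq  # same state: sequence must advance
    return new_st > prev_st      # new state: status number must advance


current = (5, 12)   # stNum 5, sqNum 12, as in the example above
injected = (6, 1)   # attacker's packet with stNum 6, sqNum 1
replayed = (5, 3)   # an old packet sent again

injected_accepted = in_order(current, injected)
replayed_accepted = in_order(current, replayed)
```

Under this rule the replayed packet is rejected while the injected packet is accepted, which is the gap that a classifier over additional packet features is intended to cover.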
To avoid overfitting in a machine learning model, the dataset is generally split into three sections: the training set, the validation set, and the test set. This method helps to confirm that the model generalizes well to new, unseen data. In our experiments, the input data were split into a training set and a testing set containing 90% and 10% of the streams of GOOSE message samples, respectively. Within the training set, 10% of randomly selected GOOSE message stream samples are used for validation. In addition, to build and execute DNNs, we used the TensorFlow and Keras frameworks [39].
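The split described above can be sketched as follows. This is an illustration under assumptions: it shuffles individual samples rather than streams, uses a fixed seed for repeatability, and does not stratify by class. The sample count of 1798 is the total of the single-packet dataset (1430 + 300 + 34 + 34).

```python
import random


def split_dataset(samples, test_fraction=0.10, val_fraction=0.10, seed=0):
    """Hold out a test set, then take a validation set from the training part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test_set = shuffled[:n_test]
    train_full = shuffled[n_test:]
    n_val = round(len(train_full) * val_fraction)
    val_set = train_full[:n_val]
    train_set = train_full[n_val:]
    return train_set, val_set, test_set


# Sample indices stand in for the actual feature rows.
train_set, val_set, test_set = split_dataset(range(1798))
sizes = (len(train_set), len(val_set), len(test_set))
```

With classes as small as 34 samples, a stratified split would keep each attack type represented in all three subsets; whether the authors stratified is not stated.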

5.1. Intrusion Detection with a Single Packet

Detecting a DoS attack with a single GOOSE packet can be a challenging task, especially if the packet is crafted to comply with the semantics of the IEC 61850 standard, since the amount of data available for analysis is limited. However, it is possible to use an ML-based IDS to detect DoS attacks with a high degree of accuracy by training the model on a diverse dataset of normal and anomalous GOOSE packets and using preprocessing techniques to extract relevant features from the GOOSE packet. In the proposed work, the training dataset includes a representative sample of legitimate GOOSE packets as well as packets that were generated artificially to simulate a DoS attack. The dataset includes packets with different sizes, contents, and source addresses to provide the model with a broad understanding of legitimate and malicious traffic. Once the DNN model is trained, it can take a single GOOSE packet as input and classify it as normal or anomalous. The model is designed to analyze the features of the GOOSE packet, such as the packet size, source and destination addresses, and timing information, to identify patterns associated with a DoS attack; it can then flag any packet that falls outside of the learned distribution.
The proposed IDS employs a DNN to enhance the accuracy and efficiency of detecting cyber attacks within GOOSE message exchanges in smart grids. The structure of the DNN, shown in Table 2, is designed to capture complex patterns indicative of malicious activities.
The proposed DNN, which processes a single GOOSE packet, consists of one input layer, three hidden layers, and one output layer. Features extracted from the GOOSE packet are fed into the input layer of the network, which then processes the data through the hidden layers to learn complex patterns and representations. The final output layer produces the network’s prediction or classification based on the learned features. This architecture aims to efficiently capture and utilize the critical features within the GOOSE packet for accurate analysis and decision making.
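To give a sense of the model's size, the following sketch counts the trainable parameters implied by the layer widths in Table 2 (9 inputs, then fully connected layers of 32, 64, 128, and 4 neurons). It assumes plain fully connected layers with one bias per neuron and no other trainable components, which the text does not explicitly confirm.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# Layer widths taken from Table 2 (single-packet model).
single_packet_layers = [9, 32, 64, 128, 4]
n_parameters = dense_parameter_count(single_packet_layers)
print(n_parameters)  # 11268
```

The per-layer contributions are 320, 2112, 8320, and 516 parameters, so most of the capacity sits in the 64-to-128 layer.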
Table 3 summarizes the experimental dataset, which consists of three attack types and a normal type. It contains 1430 normal samples and 300, 34, and 34 samples of DoS, tampering, and spoofing attacks, respectively.
Hyperparameter optimization, or tuning, is performed during the training process to enhance the performance of the model. In this study, a DNN is trained using various parameters, including layer type, batch size, number of hidden layers, and epochs. Specifically, the number of hidden layers is varied from two to eight, and batch sizes are adjusted among 16, 32, 64, 128, and 256. For consistent comparisons, a batch size of 64 is chosen, and each experiment is run for 150 epochs to facilitate continuous evaluation. The learning rate is fixed at 0.001 throughout the experiments to maintain stability in the training process. The best combination of these parameters is selected to achieve the highest performance by the end of the tuning process, as shown in Table 4.
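The search space described above can be enumerated with a few lines of Python. This is only a sketch of the candidate configurations: whether the authors evaluated the full grid or varied one parameter at a time is not stated, and the dictionary keys are hypothetical names.

```python
from itertools import product

# Ranges taken from the text; the learning rate is fixed at 0.001.
hidden_layer_options = range(2, 9)            # two to eight hidden layers
batch_size_options = [16, 32, 64, 128, 256]

grid = [
    {"hidden_layers": n_layers, "batch_size": batch, "learning_rate": 0.001}
    for n_layers, batch in product(hidden_layer_options, batch_size_options)
]
n_configurations = len(grid)
```

Each configuration would then be trained and scored on the validation set, and the best-scoring one retained.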
Table 5 shows the performances of the DNN algorithm and SVMs, where the SVM models use the nine-feature set and the two-feature set, respectively. For the SVM with two features, two features of the GOOSE packet frame, i.e., sqNum and stNum, are used to distinguish the normal and abnormal GOOSE messages [17].
The proposed DNN achieved an accuracy of 97% in detecting attacks using a single GOOSE packet. The accuracy of the proposed method is 3.5 percentage points higher than that of the SVM using nine features and 12 percentage points higher than that of the SVM using two features [17]. Figure 5 shows the recognition accuracy for normal messages and for each type of attack using the confusion matrix.
Based on the results, even a single packet of a GOOSE attack (e.g., tampering and spoofing) can be detected by the proposed ML-based IDS model.
The F1-score is expressed as
$$\text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
and
$$\mathrm{Recall} = \frac{TP}{TP + FN}.$$
Here, TP (true positive) is the number of cases correctly classified as attacks, FP (false positive) is the number of instances misclassified as attacks, and FN (false negative) is the number of instances misclassified as nonattacks.
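As a worked example of these formulas, the sketch below computes precision, recall, and F1-score from confusion counts. The counts used (two true positives, no false positives, two false negatives) are hypothetical; they were chosen because they give a precision of 100% and a recall of 50%, which yields an F1-score of about 67%.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a small attack class.
precision, recall, f1 = precision_recall_f1(tp=2, fp=0, fn=2)
```

With very few test samples per class, a single missed detection changes recall by a large amount, so scores for small classes should be read with that granularity in mind.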
The F1-score is an essential metric for evaluating the performance of DNNs, providing a comprehensive measure that accounts for both precision and recall. It is especially useful in scenarios involving imbalanced datasets and is widely used for model tuning and comparison. Therefore, in this study, we evaluated the proposed model based on the F1-score as shown in Table 6.
Table 6 presents the precision, recall, and F1-score for the proposed Deep Neural Network. Tampering and spoofing attacks achieved identical scores: a precision of 100% and a recall of 50%. For the normal class and DoS attacks, precision scores of 99% and 88% and recall scores of 99% and 100%, respectively, are obtained. For the F1-score, tampering and spoofing both obtained 67%. It is noted, however, that the results for the “tampering” and “spoofing” attack classes are anomalous.
To better comprehend the benefits of the proposed method, Figure 6a,b show the t-distributed stochastic neighbor embedding (t-SNE) representations of the input data before training and of the last hidden layer of the DNN, respectively, where t-SNE is a tool that embeds high-dimensional vectors into 2D space [44]. As shown in Figure 6a, the abnormal and normal connection records lie very close together, so it is difficult to recognize attacks directly from the input connection records. Figure 6b indicates that the vectors of the last hidden layer of the proposed DNN are more widely separated than the input vectors, showing that the proposed DNN learned patterns that help categorize attacks. In particular, the features in the DNN after training are much more separable, specifically the normal and DoS features.

5.2. Intrusion Detection with Three Packets

In many previous works, ML-based methods performed intrusion detection on only one packet to detect malicious network activity or network intrusions. This methodology is not very accurate, because the patterns involved in an intrusion often do not appear in a single packet but are distributed across multiple packets. As a result, ML algorithms that see one packet at a time cannot capture these patterns and fail to analyze the network traffic properly. DoS attack detection using one packet is particularly difficult, since an attacker can send multiple malformed packets to every port on the targeted server, and the server may be unable to recognize the forged packets because each transmitted packet is similar to a normal packet. Therefore, it is necessary to process multiple packets rather than a single packet for more accurate intrusion detection [45].
Figure 7 presents the process of feature extraction using three packets. Given n packets, applying a sliding window of three packets yields a total of n − 2 labeled windows. Here, if a window contains an abnormal pattern, the window is labeled as an attack. The input vector can be defined as
$$X_t = \left[ x_{t-2}^{T},\; x_{t-1}^{T},\; x_{t}^{T} \right],$$
where $x_t$ in (2) has nine features for a GOOSE packet. Table 7 shows the number of samples for the DNN model with three packets. To evaluate our proposed method for the network IDS, we adopted the GOOSE dataset.
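The sliding-window construction can be sketched in a few lines of Python. The function below builds one flattened 27-element input per window and labels a window as an attack if any of its packets is abnormal. The choice to give the window the label of its most recent abnormal packet is an assumption made for illustration, as are the placeholder feature values.

```python
def make_windows(packets, labels, width=3):
    """Build flattened sliding windows and one label per window."""
    windows = []
    window_labels = []
    for end in range(width - 1, len(packets)):
        start = end - width + 1
        # Concatenate the feature vectors of the packets in the window.
        flat = [value for pkt in packets[start:end + 1] for value in pkt]
        windows.append(flat)
        abnormal = [lab for lab in labels[start:end + 1] if lab != "normal"]
        # Assumption: use the label of the most recent abnormal packet.
        window_labels.append(abnormal[-1] if abnormal else "normal")
    return windows, window_labels


# Six placeholder packets with nine features each; packet 1 is a DoS packet.
packets = [[i] * 9 for i in range(6)]
labels = ["normal", "DoS", "normal", "normal", "normal", "normal"]
X, y = make_windows(packets, labels)
```

For n = 6 packets this produces n − 2 = 4 windows; the first two contain the DoS packet and are labeled as attacks, while the last two are labeled normal.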
As shown in Table 7, the experimental dataset comprises three attack types and a normal type, with 1433 normal samples and 360, 102, and 102 samples of DoS, tampering, and spoofing attacks, respectively.
After hyperparameter optimization, the structure of the DNN model with three packets is obtained, as shown in Table 8, where the input matrix X t is flattened into a vector to enable comparison and conduct more experiments with different types of machine learning models such as the SVM and LSTM.
Although the LSTM model with an input of three packets yields promising results, the proposed DNN model achieves a better performance and higher efficiency in all cases. Table 9 compares the proposed DNN-based IDS with the SVM-based and LSTM-based IDSs using classification accuracy as the evaluation metric, where the LSTM-based IDS consists of LSTM modules and an output layer for classification. As shown in Table 9, the DNN outperforms the SVM by a margin of 3.5% in accuracy and also performs better than the LSTM model. This is because the number of input features is small (27).
Figure 8a,b illustrate the confusion matrices obtained on the test set for the proposed DNN and the LSTM model. In particular, we find that the attacks are distinguished effectively by the proposed DNN with three packets. The outcomes of the LSTM model and the proposed DNN model are similar, but the proposed DNN achieves a higher classification accuracy: the average classification accuracy of the LSTM model is 97%, whereas the proposed DNN model achieves 98%.
Table 10 shows the classification report of the DNN model with three packets. The precision scores for normal, DoS, tampering, and spoofing are 99%, 97%, 89%, and 100%, respectively, and the corresponding recall scores are 100%, 94%, 80%, and 100%. From Table 6 and Table 10, the F1-scores obtained using the DNN with three packets for normal, DoS, tampering, and spoofing are 99%, 96%, 84%, and 100%, respectively, whereas the DNN model with a single packet yields F1-scores of 99%, 94%, 67%, and 67%. Hence, the DNN model with three packets achieves a higher accuracy and F1-score than the single-packet approach. Although tampering has the lowest performance, its precision (89%), recall (80%), and F1-score (84%) are all at least 80%, which supports the effectiveness of the proposed model for anomaly detection with an input of three packets.
In our problem, the goal is to detect attacks that deviate from normal data, so the tampering result, combined with the accuracy for the other attack types, shows the effectiveness of the proposed method in detecting network vulnerabilities. These results validate that the proposed methods can identify cyber attacks with high accuracy based on GOOSE message parameters. The model-learning process with an input of three packets also improves considerably on the single-packet input. Hence, the single-packet IDS model can be used for early warning detection, and the three-packet IDS model can be used for the confirmation of GOOSE cyber attacks.

6. Conclusions

This paper proposes supervised ML-based intrusion-detection models that identify anomalies in GOOSE messages in order to handle the above-mentioned problems. The proposed method detects malicious GOOSE activity by using three packets. The F1-score, which is suited to measuring performance on imbalanced data, is used to assess the effectiveness of the proposed classification using the GOOSE dataset. With input data of three packets, the F1-scores are high: 99%, 96%, 84%, and 100% for normal, DoS, tampering, and spoofing, respectively. Furthermore, a performance comparison between the DNN-based and SVM-based IDSs was conducted. The experimental results show that the DNN-based IDS outperforms traditional ML algorithms, such as the SVM, and can successfully detect cyber attacks with high performance. In addition, the proposed IDS can identify anomalies that the GOOSE replay protection state machine (defined in IEC 62351-6) cannot detect.
Future research could focus on integrating additional security measures beyond anomaly detection. Exploring methods to enhance encryption, authentication, and access-control protocols in conjunction with ML-based IDSs could further bolster the resilience of GOOSE-based communication networks against evolving cyber threats. While this study demonstrated promising results with the current dataset, future efforts could benefit from larger and more diverse datasets. Increasing the volume and variety of data would enable more robust training and validation of ML models, enhancing their generalizability and effectiveness in real-world scenarios.
The real-world deployment of ML-based IDSs for GOOSE message security presents unique challenges and opportunities. Future research should focus on addressing practical deployment issues, such as system integration, scalability, performance optimization, and compliance with industry standards. Conducting field trials and case studies in operational substations would provide valuable insights into the practical implications and efficacy of the proposed IDS solutions.
Continued research could involve further performance comparisons between different ML algorithms and optimization techniques. Exploring advanced neural network architectures, ensemble methods, or hybrid approaches could potentially enhance detection accuracy and efficiency compared to traditional ML algorithms like the SVM. Given the dynamic nature of cyber threats, future research should also emphasize the adaptability of IDSs to emerging attack vectors and tactics. Developing adaptive and self-learning systems capable of autonomously updating their detection capabilities based on real-time threat intelligence would be pivotal in maintaining robust cyber security defenses.
In future studies, we intend to increase the dataset for tampering and spoofing attacks to validate the proposed method and address the scalability of the proposed DNN-based IDS, ensuring real-time detection for large-scale digital substations with high volumes of data traffic.

Author Contributions

Conceptualization, Y.-H.K., H.N.-N., J.H. and M.G.; formal analysis, Y.-H.K., H.N.-N., J.H. and M.G.; writing—original draft preparation, Y.-H.K., H.N.-N., J.H. and M.G.; writing—review and editing, Y.-H.K., H.N.-N., J.H. and M.G.; funding acquisition, Y.-H.K. and J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Research Foundation of Korea (NRF), grant funded by the Korea Government [Ministry of Science and ICT (MSIT)] (no. 2022R1F1A1074975), and in part by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry, and Energy (MOTIE) of the Republic of Korea (no. 20221A10100011).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Nomenclature

The following nomenclature is used in this manuscript:
| Abbreviation | Meaning |
| --- | --- |
| CNN | Convolutional neural network |
| ANN | Artificial neural network |
| FFT | Fast Fourier transform |
| COMTRADE | Common format for transient data exchange for power systems |
| FTDD | Fusion of time domain descriptors |
| NB | Naive Bayes |
| IED | Intelligent electronic device |
| SS | Substation |
| SLG | Single line to ground fault |
| DLG | Double line to ground fault |
| DLL | Double line to line fault |
| TLG | Three line to ground fault |
| t-SNE | t-distributed Stochastic Neighbor Embedding |

References

  1. Narayan, A.; Krueger, C.; Goering, A.; Babazadeh, D.; Harre, M.C.; Wortelen, B.; Luedtke, A.; Lehnhoff, S. Towards Future SCADA Systems for ICT-reliant Energy Systems. In Proceedings of the International ETG-Congress 2019; ETG Symposium, Esslingen, Germany, 8–9 May 2019; pp. 1–7.
  2. Elgargouri, A.; Elmusrati, M. Analysis of Cyber-Attacks on IEC 61850 Networks. In Proceedings of the 2017 IEEE 11th International Conference on Application of Information and Communication Technologies (AICT), Moscow, Russia, 20–22 September 2017; pp. 1–4.
  3. Madonsela, B.; Davidson, I.E.; Mulangu, C. Advances in Telecontrol and Remote Terminal Units (RTU) for Power Substations. In Proceedings of the 2018 IEEE PES/IAS PowerAfrica, Cape Town, South Africa, 28–29 June 2018; pp. 827–832.
  4. Ajjarapu, V.; Christy, C. Automated analysis of power system events. IEEE Power Energy Mag. 2005, 3, 48–55.
  5. Huang, W. A Practical Guide of Troubleshooting IEC 61850 GOOSE Communication. In Proceedings of the 2018 IEEE/PES Transmission and Distribution Conference and Exposition (T&D), Denver, CO, USA, 16–19 April 2018; pp. 1–6.
  6. Aljohani, T.; Almutairi, A. Modeling time-varying wide-scale distributed denial of service attacks on electric vehicle charging stations. Ain Shams Eng. J. 2024, 15, 102860.
  7. Elrawy, M.F.; Fioravanti, C.; Oliva, G.; Michael, M.K.; Setola, R. A Geometrical Approach to Enhance Security Against Cyber Attacks in Digital Substations. IEEE Access 2024, 12, 18724–18738.
  8. Youssef, T.A.; Esfahani, M.M.; Mohammed, O. Data-Centric Communication Framework for Multicast IEC 61850 Routable GOOSE Messages over the WAN in Modern Power Systems. Appl. Sci. 2020, 10, 848.
  9. Hoyos, J.; Dehus, M.; Brown, T.X. Exploiting the GOOSE protocol: A practical attack on cyber-infrastructure. In Proceedings of the 2012 IEEE Globecom Workshops, Anaheim, CA, USA, 3–7 December 2012; pp. 1508–1513.
  10. Kush, N.; Branagan, M.; Foo, E.; Ahmed, E. Poisoned GOOSE: Exploiting the GOOSE protocol. In Proceedings of the Conferences in Research and Practice in Information Technology Series; ACS: Auckland, New Zealand, 2014; Volume 149.
  11. Hussain, S.M.S.; Ustun, T.S.; Kalam, A. A Review of IEC 62351 Security Mechanisms for IEC 61850 Message Exchanges. IEEE Trans. Ind. Inform. 2020, 16, 5643–5654.
  12. Ahmed, N.; Khan, M.Z.R. A Secure IoT Based Grid-Connected Inverter using RSA Algorithm. In Proceedings of the 2021 31st Australasian Universities Power Engineering Conference (AUPEC), Perth, Australia, 26–30 September 2021; pp. 1–5.
  13. Hussain, S.M.S.; Farooq, S.M.; Ustun, T.S. Analysis and Implementation of Message Authentication Code (MAC) Algorithms for GOOSE Message Security. IEEE Access 2019, 7, 80980–80984.
  14. Rodríguez, M.; Lázaro, J.; Bidarte, U.; Jiménez, J.; Astarloa, A. A Fixed-Latency Architecture to Secure GOOSE and Sampled Value Messages in Substation Systems. IEEE Access 2021, 9, 51646–51658.
  15. Radoglou-Grammatikis, P.I.; Sarigiannidis, P.G. Securing the Smart Grid: A Comprehensive Compilation of Intrusion Detection and Prevention Systems. IEEE Access 2019, 7, 46595–46620.
  16. Chen, Y.; Hong, J.; Liu, C.C. Modeling of Intrusion and Defense for Assessment of Cyber Security at Power Substations. IEEE Trans. Smart Grid 2018, 9, 2541–2552.
  17. Ustun, T.S.; Hussain, S.M.S.; Ulutas, A.; Onen, A.; Roomi, M.M.; Mashima, D. Machine Learning-Based Intrusion Detection for Achieving Cybersecurity in Smart Grids Using IEC 61850 GOOSE Messages. Symmetry 2021, 13, 826.
  18. Wang, D.; Li, Y.; Dehghanian, P.; Wang, S. Power Grid Resilience to Electromagnetic Pulse (EMP) Disturbances: A Literature Review. In Proceedings of the 2019 North American Power Symposium (NAPS), Wichita, KS, USA, 13–15 October 2019; pp. 1–6.
  19. Xu, Y.; Yang, Y.; Li, T.; Ju, J.; Wang, Q. Review on cyber vulnerabilities of communication protocols in industrial control systems. In Proceedings of the 2017 IEEE Conference on Energy Internet and Energy System Integration (EI2), Beijing, China, 26–28 November 2017; pp. 1–6.
  20. Prisco, A.F.S.; Freddy Duitama, M.J. Intrusion detection system for SCADA platforms through machine learning algorithms. In Proceedings of the 2017 IEEE Colombian Conference on Communications and Computing (COLCOM), Cartagena, Colombia, 16–18 August 2017; pp. 1–6.
  21. Tong, W.; Lu, L.; Li, Z.; Lin, J.; Jin, X. A Survey on Intrusion Detection System for Advanced Metering Infrastructure. In Proceedings of the 2016 Sixth International Conference on Instrumentation & Measurement, Computer, Communication and Control (IMCCC), Harbin, China, 21–23 July 2016; pp. 33–37.
  22. Quincozes, S.E.; Albuquerque, C.; Passos, D.; Mossé, D. A survey on intrusion detection and prevention systems in digital substations. Comput. Netw. 2021, 184, 107679.
  23. Kang, B.; Mclaughlin, K.; Sezer, S. Towards A Stateful Analysis Framework for Smart Grid Network Intrusion Detection. In Proceedings of the 4th International Symposium for ICS & SCADA Cyber Security Research 2016 (ICS-CSR), Belfast, UK, 23–25 August 2016.
  24. Kwon, Y.; Kim, H.K.; Lim, Y.H.; Lim, J.I. A behavior-based intrusion detection technique for smart grid infrastructure. In Proceedings of the 2015 IEEE Eindhoven PowerTech, Eindhoven, The Netherlands, 29 June–2 July 2015; pp. 1–6.
  25. Yang, Y.; Xu, H.Q.; Gao, L.; Yuan, Y.B.; McLaughlin, K.; Sezer, S. Multidimensional Intrusion Detection System for IEC 61850-Based SCADA Networks. IEEE Trans. Power Deliv. 2017, 32, 1068–1078.
  26. Hong, J.; Liu, C.C.; Govindarasu, M. Integrated Anomaly Detection for Cyber Security of the Substations. IEEE Trans. Smart Grid 2014, 5, 1643–1653.
  27. Elbez, G.; Keller, H.B.; Bohara, A.; Nahrstedt, K.; Hagenmeyer, V. Detection of DoS Attacks Using ARFIMA Modeling of GOOSE Communication in IEC 61850 Substations. Energies 2020, 13, 5176.
  28. Bohara, A.; Ros-Giralt, J.; Elbez, G.; Valdes, A.; Nahrstedt, K.; Sanders, W.H. ED4GAP: Efficient Detection for GOOSE-Based Poisoning Attacks on IEC 61850 Substations. In Proceedings of the 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Tempe, AZ, USA, 11–13 November 2020; pp. 1–7.
  29. Hong, W.C.; Huang, D.R.; Chen, C.L.; Lee, J.S. Towards Accurate and Efficient Classification of Power System Contingencies and Cyber-Attacks Using Recurrent Neural Networks. IEEE Access 2020, 8, 123297–123309.
  30. Maglaras, L.; Jiang, J. Intrusion Detection in SCADA systems using machine learning techniques. In Proceedings of the IEEE Science and Information Conference (SAI), London, UK, 27–29 August 2014.
  31. Yang, H.; Cheng, L.; Chuah, M.C. Deep-Learning-Based Network Intrusion Detection for SCADA Systems. In Proceedings of the 2019 IEEE Conference on Communications and Network Security (CNS), Washington, DC, USA, 10–12 June 2019; pp. 1–7.
  32. Dantas, D.T.; Li, H.; Charton, T.; Chen, L.; Zhang, R. Machine learning based anomaly-based intrusion detection system in a full digital substation. In Proceedings of the 15th International Conference on Developments in Power System Protection (DPSP 2020), Liverpool, UK, 9–12 March 2020; pp. 1–6.
  33. Ustun, T.S.; Hussain, S.M.S.; Yavuz, L.; Onen, A. Artificial Intelligence Based Intrusion Detection System for IEC 61850 Sampled Values Under Symmetric and Asymmetric Faults. IEEE Access 2021, 9, 56486–56495.
  34. Rajkumar, V.S.; Tealane, M.; Ştefanov, A.; Palensky, P. Cyber Attacks on Protective Relays in Digital Substations and Impact Analysis. In Proceedings of the 2020 8th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, Sydney, NSW, Australia, 21 April 2020; pp. 1–6.
  35. Girdhar, M.; Hong, J.; Lee, H.; Song, T.J. Hidden Markov Models based Anomaly Correlations for the Cyber-Physical Security of EV Charging Stations. IEEE Trans. Smart Grid 2021, 13, 3903–3914.
  36. Falco, G.; Viswanathan, A.; Caldera, C.; Shrobe, H. A Master Attack Methodology for an AI-Based Automated Attack Planner for Smart Cities. IEEE Access 2018, 6, 48360–48373.
  37. Pyshark: Python Wrapper for Tshark, a Packet Capture Tool. 2024. Available online: https://github.com/KimiNewt/pyshark (accessed on 13 June 2024).
  38. Wireshark Foundation. Wireshark User’s Guide, 2014. Available online: https://www.wireshark.org/ (accessed on 13 June 2024).
  39. Gulli, A.; Pal, S. Deep Learning with Keras; Packt Publishing: Birmingham, UK, 2017.
  40. Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, USA, 11–13 April 2011; Volume 15, pp. 315–323.
  41. Duchi, J.; Hazan, E.; Singer, Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159.
  42. Zeiler, M.D. Adadelta: An adaptive learning rate method. arXiv 2012, arXiv:1212.5701.
  43. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations, Banff, AB, Canada, 14–16 April 2014.
  44. van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
  45. Gwon, H.; Lee, C.; Keum, R.; Choi, H. Network Intrusion Detection based on LSTM and Feature Embedding. arXiv 2019, arXiv:1911.11552.
Figure 1. Cyber–physical system of a digital substation.
Figure 2. Proposed hardware in the loop (HIL) testbed.
Figure 3. STRIDE threat model of a digital substation.
Figure 4. Examples of tampering, spoofing, and DoS attack scenarios.
Figure 5. Confusion matrix of DNN with input shape of 1 packet.
Figure 6. Visualized data using t-distributed stochastic neighbor embedding (t-SNE) (a) with input data and (b) with a feature vector of last hidden layer for the DNN model.
Figure 7. Sliding window data for proposed method with 3 packets.
Figure 8. Confusion matrices of the proposed DNN and LSTM models.
Table 1. Features extracted from the GOOSE packet.

| Feature | Description | Data Type |
| --- | --- | --- |
| Destination | Destination MAC address | String |
| Appid | Application identification | Int |
| Goose_time | Time to stNum increase | Int |
| Goose_pkt_length | GOOSE message length | Int |
| Goose_pkt_datSet | Control block reference | String |
| Goose_pkt_timetolive | Max. wait time of frame | Int |
| Goose_pkt_stNum | Frame status number | Int |
| Goose_pkt_sqNum | Frame sequence number | Int |
| Goose_data | GOOSE packet data | Bool |
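
Table 1 mixes string, integer, and Boolean fields, so each packet has to become a fixed-length numeric vector before a model can consume it. The sketch below shows one way this could be done. The `GooseFeatures` class, the `CategoryIndex` helper, the integer-index encoding of the two string fields, and the sample packet values are all hypothetical; they are not taken from the article, which may encode these fields differently.

```python
from dataclasses import dataclass


class CategoryIndex:
    """Maps each distinct string to a small integer, in order of first appearance."""

    def __init__(self):
        self._index = {}

    def encode(self, value: str) -> int:
        if value not in self._index:
            self._index[value] = len(self._index)
        return self._index[value]


@dataclass
class GooseFeatures:
    # Field order and types follow Table 1.
    destination: str
    appid: int
    goose_time: int
    goose_pkt_length: int
    goose_pkt_datset: str
    goose_pkt_timetolive: int
    goose_pkt_stnum: int
    goose_pkt_sqnum: int
    goose_data: bool

    def to_vector(self, mac_index: CategoryIndex, datset_index: CategoryIndex) -> list:
        """Return the 9 features as floats (an assumed encoding, not the authors')."""
        return [
            float(mac_index.encode(self.destination)),
            float(self.appid),
            float(self.goose_time),
            float(self.goose_pkt_length),
            float(datset_index.encode(self.goose_pkt_datset)),
            float(self.goose_pkt_timetolive),
            float(self.goose_pkt_stnum),
            float(self.goose_pkt_sqnum),
            1.0 if self.goose_data else 0.0,
        ]


# Made-up packet values, for illustration only.
mac_index, datset_index = CategoryIndex(), CategoryIndex()
packet = GooseFeatures("01:0c:cd:01:00:01", 1, 0, 150,
                       "IED1/LLN0$GO$gcb01", 2000, 1, 0, True)
vector = packet.to_vector(mac_index, datset_index)
```

The resulting `vector` has nine entries, matching the input width listed for the single-packet model.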
Table 2. Structure of proposed DNN model with single packet.

| Layer Type | Activation | Number of Neurons | Output Shape |
| --- | --- | --- | --- |
| Input layer | - | - | 9 |
| Fully connected | ReLU | 32 | 32 |
| Fully connected | ReLU | 64 | 64 |
| Fully connected | ReLU | 128 | 128 |
| Fully connected | Softmax | 4 | 4 |
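
To make the layer stack in Table 2 concrete, the following is a minimal pure-Python sketch of its forward pass. It is not the authors' implementation: the weights are random placeholders rather than trained values, each layer is assumed to carry a bias term, and the function names are illustrative.

```python
import math
import random

# Input width, three ReLU layers, and the 4-class softmax output (Table 2).
LAYER_SIZES = [9, 32, 64, 128, 4]


def init_layers(sizes, seed=0):
    """Random weights and zero biases; placeholders, not trained parameters."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def softmax(z):
    peak = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """Apply ReLU after every hidden layer and softmax after the last layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = softmax(z) if i == len(layers) - 1 else [max(0.0, v) for v in z]
    return x


def count_parameters(sizes):
    """Weights plus biases of every fully connected layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


layers = init_layers(LAYER_SIZES)
probs = forward([0.5] * 9, layers)      # four class probabilities
n_params = count_parameters(LAYER_SIZES)
```

Under the bias assumption, this stack has 11,268 trainable parameters, and `probs` holds one probability per class (normal, DoS, tampering, spoofing).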
Table 3. Experimental dataset for DNN model with one packet.

| Type of Data | Normal | DoS | Tampering | Spoofing |
| --- | --- | --- | --- | --- |
| Number of samples | 1430 | 300 | 34 | 34 |
Table 4. Hyperparameter optimization.

| Optimal Parameters | Value |
| --- | --- |
| Batch size | 64 |
| Number of hidden layers | 4 |
| Number of epochs | 300 |
| Learning rate | 0.001 |
| Optimizer algorithm | Adam |
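
As a reminder of what the optimizer in Table 4 does with the learning rate of 0.001, here is a minimal sketch of the Adam update rule [43] applied to a single scalar parameter. The values of `beta1`, `beta2`, and `eps` are the defaults suggested in the Adam paper and are an assumption here, since Table 4 does not list them; the quadratic objective is a toy example unrelated to the GOOSE dataset.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t >= 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
first_step = None
for t in range(1, 101):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t)
    if t == 1:
        first_step = w
```

Because the update is normalized by the running gradient magnitude, the first step moves the parameter by roughly the learning rate itself, regardless of how large the raw gradient is.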
Table 5. Comparison of classification accuracy of the model trained with input shape of 1 packet.

| Methods | Average Classification Accuracy |
| --- | --- |
| Proposed DNN | 97% |
| SVM with 9 features | 93.5% |
| SVM with 2 features [17] | 85% |
Table 6. Classification report of DNN with 1 packet.

| Classes | Precision (%) | Recall (%) | F1-Score (%) |
| --- | --- | --- | --- |
| Normal | 99 | 99 | 99 |
| DoS | 88 | 100 | 94 |
| Tampering | 100 | 50 | 67 |
| Spoofing | 100 | 50 | 67 |
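
The F1-scores in Table 6 follow from the precision and recall columns as their harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them from the rounded percentages in the table; the helper name is illustrative, and small rounding differences are possible in general because the inputs are already rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs in percent, copied from Table 6.
table6 = {
    "Normal": (99, 99),
    "DoS": (88, 100),
    "Tampering": (100, 50),
    "Spoofing": (100, 50),
}
f1_by_class = {name: round(f1_score(p, r)) for name, (p, r) in table6.items()}
```

For tampering and spoofing, a recall of 50% caps the F1-score at 67% even though precision is 100%, which is why these two classes stand out in the single-packet report.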
Table 7. Experimental dataset for the DNN model with three packets.

| Type of Data | Normal | DoS | Tampering | Spoofing |
| --- | --- | --- | --- | --- |
| Number of samples | 1433 | 360 | 102 | 102 |
Table 8. Structure of proposed DNN model with three packets.

| Layer Type | Activation | Number of Neurons | Output Shape |
| --- | --- | --- | --- |
| Input layer | - | - | 9 × 3 |
| Flatten layer | - | - | 27 |
| Fully connected | ReLU | 32 | 32 |
| Fully connected | ReLU | 64 | 64 |
| Fully connected | ReLU | 128 | 128 |
| Fully connected | Softmax | 4 | 4 |
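
The 9 × 3 input and the 27-element flatten layer in Table 8 correspond to grouping three consecutive packets of nine features each. A minimal sketch of that windowing step is given below. The window stride of one packet and the function names are assumptions, and the packet values are dummies; how the article labels each window is not shown here.

```python
def sliding_windows(packets, window=3, step=1):
    """Group consecutive packets into overlapping windows of `window` packets."""
    return [packets[i:i + window] for i in range(0, len(packets) - window + 1, step)]


def flatten(window):
    """Concatenate the per-packet feature vectors into one flat vector."""
    return [value for packet in window for value in packet]


# Five dummy packets with 9 placeholder features each (values are made up).
packets = [[float(p)] * 9 for p in range(5)]
windows = sliding_windows(packets)          # windows over packets 0-2, 1-3, 2-4
flat_inputs = [flatten(w) for w in windows]
```

Each entry of `flat_inputs` has 27 values, which matches the output shape of the flatten layer that feeds the first fully connected layer.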
Table 9. Comparison of classification accuracy of the models with 3 packets.

| Methods | Average Classification Accuracy |
| --- | --- |
| Proposed DNN | 98% |
| SVM | 94.5% |
| LSTM | 97% |
Table 10. Classification report of DNN with 3 packets.

| Classes | Precision (%) | Recall (%) | F1-Score (%) |
| --- | --- | --- | --- |
| Normal | 99 | 100 | 99 |
| DoS | 97 | 94 | 96 |
| Tampering | 89 | 80 | 84 |
| Spoofing | 100 | 100 | 100 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nhung-Nguyen, H.; Girdhar, M.; Kim, Y.-H.; Hong, J. Machine-Learning-Based Anomaly Detection for GOOSE in Digital Substations. Energies 2024, 17, 3745. https://doi.org/10.3390/en17153745
