Article

Smart Grid Intrusion Detection System Based on Incremental Learning

1
State Grid Jinhua Electric Power Company, Jinhua 321000, China
2
China Electric Power Research Institute, Haidian, Beijing 100192, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(19), 3820; https://doi.org/10.3390/electronics14193820
Submission received: 28 August 2025 / Revised: 18 September 2025 / Accepted: 23 September 2025 / Published: 26 September 2025

Abstract

With the rapid development of information and communication technology, the intelligent transformation of the traditional power grid continues to accelerate. As an important innovation in the field of power services, the smart grid fundamentally transforms the traditional power supply process, relying on an agile and efficient communication network to realize two-way interaction between users and the grid, which significantly improves power supply flexibility and service quality. However, this two-way communication is vulnerable to various network attacks, and most current intrusion detection schemes struggle to effectively identify emerging attack types; even when incremental learning methods are adopted, they are often trapped by the catastrophic forgetting problem. To meet these challenges, this paper proposes a smart grid intrusion detection system (Grid-IDS). By establishing a tree-structured incremental learning method, Grid-IDS can not only accurately detect existing attacks, but also incrementally learn new attack types, while alleviating the catastrophic forgetting caused by incremental learning. Experiments show 99.65% accuracy on CICIDS2017 with performance superior to baselines, and competitive accuracy and precision on WUSTL-IIoT-2018, indicating good generalization under heterogeneous traffic.

1. Introduction

The smart grid is a relatively new concept that evolved from the combination of renewable energy integration, advances in digital communication, and the increasing demand for reliable power supply [1]. Driven by continuous technological progress, the smart grid has developed rapidly. The number of terminal devices and control units of all kinds in the smart grid has greatly increased, enriching the functions and services of the power grid. With the help of communication technology, the smart grid not only realizes the interconnection of components within the power grid (as illustrated in Figure 1), but can also communicate with external users and distributed energy sources in diverse ways. The emergence and development of the smart grid has brought many opportunities for improving efficiency and overall performance [2]. Related studies also indicate that frequency stability control and orderly electricity market operation—such as robust super-twisting LFC and holistic risk-aware market design—are integral to reliable smart-grid operation [3,4].
However, with the popularity of the smart grid [5], its security risks have gradually become prominent. In the smart grid environment, business units are scattered and independent, creating a conflict between data privacy protection requirements and network security monitoring [6]. Crucially, widespread cyberattacks can undermine frequency-control loops such as LFC [3], and disrupt market clearing and price formation [4]. This reality highlights the urgent need for intrusion detection systems (IDS) that can continuously learn and adapt in real time to these evolving threats.
In recent years, several significant cyberattacks have targeted smart grids and energy infrastructures, highlighting the vulnerability of these critical systems. One of the most notable attacks occurred in 2015, when Ukraine’s power grid was hit by a cyberattack, causing a massive blackout. The attackers used spear-phishing emails to install malware (BlackEnergy3) on the network, granting them remote access to the control systems. This allowed them to shut down 30 substations, leaving over 230,000 customers without power for several hours [7]. Another high-profile attack was the 2012 breach of Saudi Aramco, where the Shamoon malware destroyed over 30,000 computers, disrupting operations but not causing physical damage [8]. Similarly, the 2021 Colonial Pipeline attack underscored the growing threat to critical energy supply chains. Hackers infiltrated the company’s network using compromised employee credentials and deployed ransomware, halting fuel supplies to the U.S. East Coast. The company eventually paid a $4.4 million ransom [9].
These cases underscore the growing need for robust cybersecurity measures in smart grids. The attacks on Ukraine’s grid and Colonial Pipeline illustrate how cyber threats have evolved from targeting isolated industrial systems to infiltrating interconnected, complex infrastructures, affecting both public safety and economic stability. As attacks become more frequent and sophisticated, it is crucial to enhance the resilience of smart grid systems by developing adaptive, real-time cybersecurity solutions.
Moreover, the dependence of smart grid systems on communication networks makes them vulnerable to network attacks, posing significant risks to grid reliability [5]. As a network-embedded infrastructure, smart grids must be able to detect network attacks and respond appropriately in a timely manner [6].
The security of smart grids is increasingly challenged by sophisticated cyberattacks, making the design of robust IDS crucial for addressing these threats. In addition to real-world case studies on these attacks, Hardware-in-the-Loop (HIL) validation has become an essential tool for testing the resilience of IDS in smart grid environments. By combining real-time simulations with physical hardware components, HIL enables researchers to simulate diverse attack scenarios and evaluate IDS performance in realistic settings. For instance, HIL can be used to simulate various cyberattack strategies on power grid components, such as denial-of-service attacks or false data injections, to assess how well the IDS can detect and respond to these threats [10].
This validation method not only ensures that IDSs perform under controlled conditions but also allows for the dynamic evaluation of how IDS models adapt to new, evolving attacks [11]. This is particularly valuable for testing IDS solutions in scenarios where new attack types may emerge or existing methods evolve. In fact, the application of HIL in smart grid cybersecurity has been demonstrated in several studies [12,13,14,15], where real-world attack data were fed into physical and virtual smart grid systems, allowing for real-time testing of IDS performance under attack conditions.
With the continuous evolution of attack methods, traditional IDS often struggle to cope with the ever-changing threats of network attacks, especially when dealing with new attacks. Due to the complexity and high dynamics of smart grids, existing IDS solutions face significant limitations: these methods can only handle known attack types but fail to adapt to the emergence of new attacks in real time [16]. Additionally, the high requirements of smart grids for real-time processing, computing power, and storage resources make retraining models to handle new attacks a costly and time-consuming process. Many existing smart grid IDS methods have failed to effectively address this issue, leading to significant declines in efficiency when faced with new attack threats.
Unlike traditional static learning methods, incremental learning [17] offers an adaptive approach by updating the model step by step instead of retraining the entire system. This reduces computation costs and training time, making it especially suitable for new attack modes that frequently appear in the smart grid environment. Incremental learning can quickly adapt to new attack types without losing the ability to detect known attacks.
Given the continuous rise in the number of attack types in the smart grid, it has become an inevitable requirement to incorporate incremental learning into IDS design. This method allows the model to dynamically adapt to new attack types and provide continuous response capability for power grid security. However, with the continuous input of new data, the model is prone to catastrophic forgetting, where the ability to identify previously learned attack types declines significantly, and excessive parameter updates can lead to a sharp increase in computational overhead [18]. The experimental results of this paper show that this problem can be effectively alleviated by a structured design, where a tree structure shows unique advantages in balancing model updates and knowledge retention due to its modular expansion characteristics.
Therefore, this paper proposes a tree-based incremental learning method, which trains a new model with new attack type data and then connects it to the original model as a child node, realizing a tree structure. This incremental learning method effectively avoids catastrophic forgetting and increases the ability of the original model to identify new types of attacks, thus ensuring the continuous protection of the smart grid.
The remainder of this paper is organized as follows: Section 2 reviews the related works on IDS in smart grid scenarios. Section 3 highlights the main contributions of this study. Section 4 presents the overall system architecture, including data preprocessing, model training, and the incremental learning mechanism. Section 5 reports the experimental setup, results, and performance analysis of the proposed scheme. Finally, Section 6 concludes the paper and discusses potential future work.

2. Related Works

To protect the security of the smart grid, many scholars have focused on researching IDS in the smart grid scenario. By analyzing the information from the power grid network, IDS can detect whether there is an attack, monitor the running state of the grid in real time, and ensure its security. Currently, smart grid IDS methods are mainly divided into time-frequency-based IDS [19,20] and machine learning or deep learning-based IDS [21,22,23,24]. Among them, IDS based on deep learning can provide higher detection accuracy and robustness compared to other types. However, despite numerous innovative schemes, existing smart grid IDS still face key issues, such as insufficient response to diversified attack modes, catastrophic forgetting, and computational overhead.
Many schemes based on specific standards or models have limitations in handling diversified attack modes. Quincozes et al. [25] proposed a synthetic traffic generation framework based on the IEC-61850 standard [26] to address the lack of real data for training, testing, and evaluating IDS. However, this framework struggles to respond flexibly to evolving attack patterns and cannot effectively detect new attack behaviors. Similarly, federated learning-based IDS, such as the one proposed by Wen et al. [27], improve training efficiency but have limited detection capabilities when facing complex and evolving attacks. Basheer et al. [1] proposed a deep learning IDS based on a graph convolutional network (GCN) to identify complex threats and maintain the integrity and reliability of the power grid. While it has advantages in real-time detection, its feature extraction is not targeted enough to accurately capture the distinctive features of different attack types. Mohammed et al. [28] constructed a dual hybrid IDS for detecting false data injection in the smart grid. It combines feature selection and deep learning classifiers to improve detection accuracy and robustness, but it struggles to cover the full range of new injection methods when dealing with diverse false data injection attack modes.
In terms of catastrophic forgetting and computational cost, many schemes based on federated learning or deep learning face challenges. Hamdi [29] pointed out that the efficiency of federated learning IDS dropped significantly in unconventional scenarios where the distributions of training and test data were inconsistent. A follow-up IDS [30] combining centralized learning and federated learning improved the ability to identify unknown attacks, but learning new attack patterns weakened the ability to recognize previously learned ones. At the same time, computational cost increases sharply with model updates and data growth. For instance, the federated learning IDS for the smart grid based on fog-edge support vector machines proposed by Noshina et al. [31] shares learning parameters to ensure data privacy and collaborative learning, but it struggles to avoid catastrophic forgetting in continuous learning and consumes substantial computational resources. Additionally, the fog computing-based AI integration framework proposed by Alsirhani et al. [32] combines machine learning and deep learning to improve detection accuracy and address the class imbalance problem, but it does not avoid the catastrophic forgetting and high computational overhead caused by model integration. Pasumponthevar et al. [33] combined Kalman filtering with recurrent neural networks, achieving a high classification accuracy of 97.3%. However, it still faces catastrophic forgetting in continual learning and complex attack scenarios, and its computational resource requirements are high. The challenges of different intrusion detection methods are also outlined in Table 1.
Although research in recent years has significantly improved the performance of smart grid IDS, most existing schemes [1,25,27,28,29,30,31,32,33] still face the following challenges:
(1)
Insufficient response to diversified attack modes: Smart grids have a variety of communication modes [34], making them vulnerable to various attacks. Therefore, an IDS needs to detect a wide range of attacks comprehensively. However, most existing schemes focus on detecting known attack types and are not well equipped to handle new or unknown attack patterns, especially in environments where attack methods are constantly evolving.
(2)
Catastrophic forgetting and computational overhead: With the development of technology, new attack methods continue to emerge. If the model is retrained only with new attack samples, it can easily lead to catastrophic forgetting [35]. If all historical data are used for retraining, it incurs significant computational cost and time. Many existing IDS methods fail to effectively solve the dilemma of how to continue learning new attacks while retaining existing knowledge.

3. Contributions

Aiming at the security threats and challenges faced by IDS in smart grid application scenarios, this paper proposes Grid-IDS based on incremental learning. Although a variety of IDS approaches have achieved certain results in current research, obvious limitations remain, especially in dealing with new attacks, catastrophic forgetting, and data imbalance, for which no fully effective solution has yet been achieved. In view of these problems, the contributions of this paper are reflected in the following aspects:
Introduce class-incremental learning into the smart grid intrusion detection scenario: Unlike traditional machine learning models, which lack dynamic update capabilities, this work pioneers the application of class-incremental learning to smart grid detection. While conventional incremental learning primarily focuses on known attack types and struggles with catastrophic forgetting and concept drift, the proposed class-incremental mechanism can continuously learn new attack categories while retaining existing knowledge, significantly reducing the high costs associated with frequent complete retraining in traditional machine learning methods.
Propose a tree-structured class-incremental learning mechanism: This paper proposes a class-incremental learning mechanism based on a tree structure. By dynamically integrating models of new attack types as child nodes into the existing model, a hierarchically expandable detection structure is formed. This effectively adapts to novel attacks while avoiding the issues of catastrophic forgetting and the extensive retraining overhead required by traditional methods, significantly enhancing the system’s adaptability and efficiency in dynamic network environments.
Alleviate the data imbalance problem: In smart grid environments, normal traffic significantly outnumbers attack traffic, causing traditional IDS to be biased toward identifying normal traffic. To address this challenge, this paper employs SMOTE to oversample minority classes, successfully improving the model’s detection performance under imbalanced data conditions and further enhancing the model’s robustness and reliability.
Significantly improve detection performance and resource utilization: Through extensive experiments on the CICIDS2017 dataset, this study validates the exceptional performance of the Grid-IDS system across multiple evaluation metrics including precision, recall, and F1-Score. Compared with four baseline approaches (DNN-batch, Hoeffding Tree, ImFace and T-DFNN), the proposed system demonstrates superior performance across all metrics, achieving an average accuracy of 99.65%. Notably, it exhibits enhanced adaptability and stability when confronted with novel attack types. Furthermore, the system demonstrates significant advantages in inference efficiency. After three incremental learning cycles, the maximum classification time per network packet is only 1.5688 ms, underscoring its remarkable practical utility and scalability.

4. System Architecture

In this section, we first introduce the overall architecture of the proposed scheme, and then present the working principles and implementation steps of each module.

4.1. Overall Architecture

This paper focuses on building a smart grid intrusion detection system based on incremental learning, which is named Grid-IDS. The system has two core capabilities. On the one hand, it can accurately detect all kinds of attacks suffered by the smart grid on the bus and network sides; on the other hand, it can dynamically adapt to and identify new attack categories through an incremental learning mechanism. As shown in Figure 2, the overall architecture is mainly composed of two core components: an intrusion detection model and an incremental learning module, where the intrusion detection model is divided into two submodules: data preprocessing and model training. As shown in the intrusion detection model flow in Figure 2, the existing raw training data of the smart grid should be collected first, and then imported into the data preprocessing pipeline. This stage consists of three steps:
After obtaining the original training data, the system proceeds to the data preprocessing stage. The first step is data cleaning, which handles missing values in the training data by either deleting the rows containing them or imputing them (e.g., mean imputation). The second step is to determine whether the training data are balanced; if not, the SMOTE method [36] is used for oversampling to achieve better training results. The third step is to standardize the data to reduce the influence of differences in scale, features, and distribution on the model. After preprocessing, the data are input into a one-dimensional convolutional neural network (1D-CNN) for training.
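The three preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming tabular flow features in a pandas DataFrame with a string label column (the column names and the "BENIGN" normal-class label are hypothetical placeholders), using scikit-learn for standardization:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame, label_col: str = "Label", normal_label: str = "BENIGN"):
    """Sketch of the three preprocessing steps: cleaning, balance check, standardization."""
    # Step 1: data cleaning -- drop rows containing missing values.
    df = df.dropna()
    # String labels -> integers: 0 for the normal class, 1..n for attack types.
    classes = sorted(df[label_col].unique(), key=lambda c: (c != normal_label, c))
    mapping = {c: i for i, c in enumerate(classes)}
    y = df[label_col].map(mapping).to_numpy()
    X = df.drop(columns=[label_col]).to_numpy(dtype=float)
    # Step 2: check balance (SMOTE oversampling would be applied here if imbalanced).
    counts = np.bincount(y)
    imbalanced = counts.max() > 2 * counts.min()
    # Step 3: Z-score standardization of the features.
    X = StandardScaler().fit_transform(X)
    return X, y, imbalanced
```

The 2:1 imbalance threshold is illustrative only; in practice the decision would follow the per-class ratios of the dataset at hand.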
However, in the actual operation of the smart grid, new types of attacks may emerge constantly. When Grid-IDS needs to identify these new attack categories, the system enters the incremental learning stage. At this stage, the system collects and processes the emerging attack data and integrates it into the existing learning process. Through incremental learning, the system can dynamically adjust its classification ability, not only accurately identifying the original attack types, but also effectively classifying new attack types.
Finally, when smart-grid data requires security inspection, the data to be inspected is input into the updated and optimized Grid-IDS. The system classifies the input data according to the learned knowledge and classification rules, and outputs the corresponding detection results, thus providing strong assurance for the safe operation of the smart grid. Next, the implementation details of each part will be further described based on the overall architecture of the system.

4.2. Data Preprocessing

It is essential to preprocess the training data before model training because the original training data may have problems such as incomplete data, inconsistent data types and unbalanced classes, which can seriously affect the model training performance. The data preprocessing process mainly consists of data cleaning, checking whether the training data is balanced, and data standardization. The details are as follows.

4.2.1. Data Cleaning

The first step in the data preprocessing stage is data cleaning, which aims to remove the noise and anomalies in the original training data and provide a high-quality data foundation for the subsequent model training. This process mainly covers two key steps: invalid value processing and string encoding.
(1)
Invalid value processing
The original data may contain null values and other invalid items, which negatively affect model training if left unprocessed. Two targeted processing strategies are available: directly removing the rows with missing values, which ensures data integrity, or applying mean imputation, which preserves the number of samples and reduces the influence of missing values on subsequent analysis. Considering training stability and accuracy, this paper deletes the rows containing null values.
(2)
String encoding
In the training data, attack types are usually labeled as strings, while model training requires numerical input, so these string labels must be converted into numeric values. Specifically, the normal type is labeled as 0, and the attack types are labeled as 1 to n in turn. Through this encoding, the model can better distinguish different types of attack data, improving classification performance and detection accuracy.

4.2.2. SMOTE

In practical network traffic, benign flows vastly outnumber attacks, yielding highly imbalanced training data and biasing classifiers toward majority classes. Among common remedies, we adopt SMOTE on the training split of each incremental batch. Undersampling removes a large portion of benign traffic and degrades the model’s calibration to the background distribution required at inference. Random oversampling duplicates minority instances and tends to overfit the 1D-CNN. Class weighting or focal losses can mitigate bias but require per-increment retuning as class priors shift in a class-incremental protocol, which destabilizes probability calibration and decision thresholds. By contrast, SMOTE synthesizes minority samples via linear interpolation among k-nearest neighbors, increasing diversity without discarding majority data while keeping the training recipe invariant across increments. In practice, SMOTE is applied only to the training set after data splitting and feature standardization, with sampling ratios set according to each class’s degree of imbalance. We prefer vanilla SMOTE over boundary-focused variants (e.g., ADASYN [37], Borderline-SMOTE [38]) to avoid noisy synthetic points for extremely small classes. A schematic is shown in Figure 3. SMOTE is implemented as follows:
(1)
Calculation of k-nearest neighbor:
For each sample x in the minority sample set, the distance between it and all samples in the minority sample set is calculated based on Euclidean distance, and the k-nearest neighbor of the sample is determined.
(2)
Determine the sampling rate:
Set the sampling rate n according to the sample imbalance ratio. For each minority sample x, n samples are randomly selected from its k nearest neighbors; the selected neighbors are denoted x_i (i = 1, 2, …, n).
(3)
Constructing new samples:
Generate a new sample according to Formula (1) for each randomly selected neighbor x_i:

x_new = x + rand(0, 1) × (x_i − x),   i = 1, 2, …, n,   (1)

where rand(0, 1) is a random number uniformly distributed in the interval [0, 1]. This interpolation enriches the minority samples, improves the balance of the data distribution, helps avoid overfitting, and provides high-quality sample data for model training.
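The three steps above can be sketched from scratch in numpy; this is an illustrative implementation of Formula (1) rather than a production routine (real pipelines would typically use a library such as imbalanced-learn):

```python
import numpy as np

def smote(X_min: np.ndarray, n_new: int, k: int = 5, rng=None) -> np.ndarray:
    """Generate n_new synthetic minority samples per Formula (1):
    x_new = x + rand(0,1) * (x_i - x), with x_i a random one of the k nearest neighbors."""
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n_new):
        # Pick a minority sample x at random.
        x = X_min[rng.integers(len(X_min))]
        # Its k nearest neighbors within the minority set (Euclidean distance, excluding x).
        d = np.linalg.norm(X_min - x, axis=1)
        nn = np.argsort(d)[1:k + 1]
        xi = X_min[rng.choice(nn)]
        # Linear interpolation between x and the chosen neighbor.
        out.append(x + rng.random() * (xi - x))
    return np.asarray(out)
```

Because each synthetic point lies on a segment between two real minority samples, the new samples stay inside the convex hull of the minority class.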

4.2.3. Data Standardization

The third step in the data preprocessing stage is data standardization, which brings features to a common scale and reduces the impact of outliers and scale differences on model training, thereby improving learning performance and generalization. In this paper, the Z-score standardization method is used, and the formula is as follows:
z = (x − μ) / σ,   (2)
where x denotes a sample value, μ the mean of the sample data, σ the standard deviation of the sample data, and z the standardized value. Z-score standardization gives the processed data zero mean and unit standard deviation. The standardized data are then used for model training.
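Formula (2), applied column-wise to a feature matrix, can be sketched as follows (the zero-variance guard for constant features is an implementation detail added here, not part of the formula):

```python
import numpy as np

def zscore(X: np.ndarray) -> np.ndarray:
    """Column-wise Z-score standardization: z = (x - mu) / sigma (Formula (2))."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard: constant features map to zeros instead of NaN
    return (X - mu) / sigma
```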

4.3. Model Training

After data preprocessing, we use a 1D-CNN to carry out model training, thus constructing the original Grid-IDS. As a classic deep learning algorithm, CNN significantly reduces the number of parameters and shortens the calculation time by virtue of the shared convolutional kernel. This algorithm is not only widely used in the field of image recognition but has also achieved remarkable results in natural language processing. Specifically, while two-dimensional convolutional neural networks are suitable for image processing, 1D-CNNs are more appropriate for time-series data such as text and sensor data.
In view of the fact that smart grid data are essentially time-series data, this paper selects the 1D-CNN for data classification. The 1D-CNN model consists of a one-dimensional convolution layer, a pooling layer, a dropout layer, and a fully connected layer [39]. The convolution layer is responsible for extracting key features from the input data; the pooling layer filters the features extracted by the convolution layer to reduce the data dimension; the dropout layer effectively prevents the model from overfitting by randomly ignoring some neurons; and the fully connected layer maps the outputs of the previous layers to the sample label space to achieve classification. The model trained at this stage can classify the existing attack types.
While 1D-CNNs have proven highly effective for time-series data classification, it is important to justify why they are preferred over other sequential models like RNNs and Transformers in the context of smart grid traffic. Unlike RNNs, which are specifically designed for sequential data and are capable of capturing temporal dependencies, 1D-CNNs excel at learning local patterns over fixed-length windows of data [40]. This makes 1D-CNNs particularly effective in environments like smart grid traffic, where local patterns (e.g., short-term fluctuations in traffic) play a key role in detecting intrusions. RNNs, on the other hand, often suffer from issues like vanishing gradients during training and can be computationally intensive when handling long sequences [41].
Furthermore, compared to Transformer models, which are powerful for capturing long-range dependencies in sequential data, 1D-CNNs offer a simpler architecture with fewer parameters, leading to reduced computational overhead. Transformers, though highly effective in sequence modeling, require significant computational resources and are often overkill for tasks where the temporal dependencies are short-range and local [42]. In contrast, 1D-CNNs offer a more efficient solution, as they are capable of extracting relevant features with a lower computational cost, making them better suited for real-time intrusion detection in smart grid systems where speed and efficiency are critical.
The following briefly introduces the implementation steps of the 1D-CNN:
Consider a set of one-dimensional input feature sequences S_l (l = 1, 2, …, L). Each convolutional feature map C_m (m = 1, 2, …, M) is connected to multiple input feature sequences through a local weight matrix W_m of size L × F (where F is the convolution kernel length), which determines how many input units each output unit depends on. The corresponding mapping operation is called convolution in the field of signal processing.
The convolutional feature value of each unit can be obtained by the following formula:
c_{m,k} = σ( Σ_{l=1}^{L} Σ_{f=1}^{F} s_{l, f+k−1} · w_{l,m,f} + w_{0,m} ),   (3)

where σ(·) denotes the activation function; c_{m,k} denotes the k-th unit in the m-th convolutional feature map C_m; s_{l,k} is the k-th unit of the l-th input feature sequence S_l; and w_{l,m,f} is the f-th coefficient of the kernel (weight matrix) that connects S_l to C_m. The final convolutional feature map can also be written as:
C_m = σ( Σ_{l=1}^{L} S_l ∗ W_{l,m} ),   m = 1, 2, …, M,   (4)

where ∗ denotes the convolution operator. Next, a max-pooling operation is applied to the feature maps. The purpose of pooling is to reduce the dimensionality of feature signals and enhance invariance to small perturbations. The formula of the pooling layer is as follows:
p_{m,k} = max_{n=1,…,N} c_{m, (k−1)·q+n},   (5)
where N is the pooling window size and q is the stride (the step size of the window) on the convolutional feature map. After pooling, a fully connected layer is typically used. The fully connected layer mainly acts as a classifier in convolutional neural networks, so its details are omitted here for brevity. Typically, the model uses multiple convolution layers, pooling layers, and fully connected layers, and the final output corresponds to a probability distribution over the output classes.
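As a concrete check of the convolution and pooling formulas above, the following numpy sketch computes one feature map from L input sequences and then max-pools it. Variable names mirror the notation in the text; the ReLU activation is an assumption for illustration:

```python
import numpy as np

def conv1d_feature_map(S, W, b, act=lambda z: np.maximum(z, 0.0)):
    """One feature map C_m from L input sequences (convolution formula).
    S: (L, T) input sequences; W: (L, F) kernel for this map; b: bias w_{0,m}."""
    L, T = S.shape
    _, F = W.shape
    K = T - F + 1  # number of valid output positions
    C = np.empty(K)
    for k in range(K):
        # c_{m,k} = act( sum over l, f of s_{l, f+k} * w_{l,m,f} + b )
        C[k] = act(np.sum(S[:, k:k + F] * W) + b)
    return C

def max_pool(C, N, q):
    """Pooling formula: p_k = max over a window of size N taken with stride q."""
    K = (len(C) - N) // q + 1
    return np.array([C[k * q:k * q + N].max() for k in range(K)])
```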
As illustrated in Figure 4, the model comprises three convolution–pooling blocks followed by two fully connected layers. The first convolutional layer consists of 32 pointwise (1 × 1) filters with ReLU activation, transforming the input into a (k, 32) representation, where k is the input dimension; a max-pooling layer is then applied. The next two convolutional layers each use 64 pointwise filters and are likewise paired with pooling. After three rounds of convolution and pooling, a Flatten layer converts the multidimensional features into a one-dimensional vector, and two fully connected layers with ReLU and Softmax activations produce the final predictions. Cross-entropy is used as the training loss.
We select three convolution layers to balance representational power and real-time latency. Pointwise 1 × 1 convolutions perform channel lifting and nonlinear feature mixing without changing sequence length; stacking three such layers yields progressively more discriminative intermediate representations that help separate fine-grained attack patterns, while keeping the parameter count and computation modest. Under our hardware and latency budget, this depth maintains per-sample inference in the millisecond range and provides consistent accuracy gains over 1–2 layers, whereas going deeper to four layers brings diminishing returns and higher latency. The three-layer design also offers a stable intermediate feature space that facilitates knowledge retention and transfer in the subsequent tree-based incremental learning. For these reasons, three convolution layers are adopted as the default configuration.

4.4. Incremental Learning

In the incremental learning component, this paper builds on the incremental learning framework proposed by Data and Aritsugi [43] and improves upon it. Specifically, we select a 1D-CNN, which is better suited to traffic data, as the base classifier for model training, and use SMOTE to address the class imbalance that may exist in the training data. This incremental learning method is also applied in the construction of Grid-IDS; its principle is shown in Figure 5.
This study addresses catastrophic forgetting in incremental learning by employing a tree structure. The tree structure allows the model to add new classes at the leaf nodes of the existing model during each incremental learning phase, instead of retraining the entire network. By training only new data for each new class, the model retains the representations of previously learned classes, preventing forgetting. Compared to replay-based methods, the tree structure provides a more efficient solution.
In incremental learning, replay-based methods [44] mitigate catastrophic forgetting by storing historical data and continuously replaying it during subsequent learning stages. However, this method has a significant drawback: as the dataset grows, there is a need to store and repeatedly use past samples, which increases storage costs [45]. This issue becomes especially problematic in large-scale datasets and when class imbalance is severe. In contrast, the tree structure employed in this work learns new classes by performing local incremental updates and adding new nodes, avoiding the need to retrain the entire model. As a result, the tree structure reduces both storage and computational costs, providing a more efficient solution to catastrophic forgetting compared to traditional replay-based methods. Moreover, by maintaining independent nodes for each class and performing local incremental updates, the tree structure avoids the performance degradation typically caused by global updates in traditional incremental learning approaches. Therefore, the tree structure provides a more efficient and stable incremental learning mechanism, especially when dealing with concept drift and new attack types.
The most important part of the incremental learning method in this paper is the model node. The model node consists of two parts: the model and the model label mapping. The model classifies the input data into several labels, and these output labels are connected to other models through mapping. This mapping is realized by key-value pairs, where the output label is the key, and the value is the connected model or null, indicating that the label is not connected to any model. Table 2 shows the implementation of model nodes.
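A minimal Python sketch of this model-node structure is given below; `ModelNode` and `attach` are illustrative names, and `model` stands in for any trained classifier (a 1D-CNN in the paper):

```python
class ModelNode:
    """A node of the incremental tree: a classifier plus a label-to-child map.

    The mapping is a dict of key-value pairs where each key is one of the
    model's output labels and the value is either a connected child
    ModelNode or None (the label is not connected to any model).
    """
    def __init__(self, model, labels):
        self.model = model
        # Initially, every output label maps to None (no child attached).
        self.children = {label: None for label in labels}

    def attach(self, label, child):
        """Connect an output label to a child node grown in a later increment."""
        self.children[label] = child
```

A node created for labels 0, 1, 2 starts with all three labels mapped to `None`; after an increment, `attach(0, child)` links label 0 to the newly trained child model.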
When input data needs to be classified, the root node is used for classification first, and the mapping of the root node's output labels determines whether classification continues at a child node or terminates. This process is similar to a decision tree. The following sections describe the incremental process and the classification process in detail.
When the original training data is obtained, we build the first model following the structure in Table 2; this model becomes the root node of the incremental tree. The original data is used to train the root node, and each of its output labels is mapped to NULL, indicating that the root is not yet connected to any child node.
As shown in Figure 5, it is assumed that the initial training data includes three classes with labels 0, 1, and 2. We first use this dataset to train a 1D-CNN model as the root node of Grid-IDS. As the smart grid operates, new attack types may appear, such as classes 3 and 4. When incremental learning is needed for these new classes, the specific steps are as follows:
  • Input the data of classes 3 and 4 into the root-node classifier. Because the root node is trained only on old data, it will misclassify the new data into one or more existing classes. For example, in the second part of Figure 5, part of the class-3 data is classified as label 0; we mark this subset as $c_3^1$. The other part is classified as label 2 and labeled $c_3^2$. Meanwhile, all class-4 data are classified as label 2 and labeled $c_4$.
  • Retrain models using the partition produced above. As shown in the third part of Figure 5, $c_3^1$ (classified as label 0) is merged with the original class-0 data, and a new model is obtained by retraining. Likewise, $c_3^2$ and $c_4$ (classified as label 2) are merged with the original class-2 data and retrained to obtain another new model.
  • After retraining, the models form a tree structure in which each node is a 1D-CNN model, as shown in the fourth part of Figure 5. When new data needs to be classified, it is first input to the root node for prediction. If the label output by the root node is not connected to any child node, that label is the final classification result; if the label is connected to another node, the data are passed to the connected node for prediction, and so on recursively until the predicted label is not connected to any node. For example, in the updated model, input data $D$ is fed to the root node. Assuming the prediction is 0 and label 0 is connected to node $n_1$, $D$ is passed to $n_1$. If the prediction there is again 0 and label 0 is not connected to any node at this level, the category of $D$ is 0.
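The routing-and-retraining step can be sketched as follows. This is a simplified illustration, not the paper's implementation: `Node` is a minimal stand-in for a model node, callables replace trained 1D-CNNs, and the hypothetical `train_child` represents retraining on the merged old and new data:

```python
from collections import defaultdict

class Node:
    """Minimal stand-in for a model node: a callable classifier plus a
    label-to-child mapping (None means the label is a leaf)."""
    def __init__(self, model, labels):
        self.model = model                        # callable: sample -> label
        self.children = {l: None for l in labels}

def increment(node, new_samples, train_child):
    """One incremental step: route new-class samples through the existing
    node, group them by the old label they are (mis)classified as, and
    attach a child trained on each non-empty group."""
    routed = defaultdict(list)
    for x, true_label in new_samples:
        routed[node.model(x)].append((x, true_label))
    for old_label, group in routed.items():
        node.children[old_label] = train_child(old_label, group)
    return node
```

In the Figure 5 scenario, samples of classes 3 and 4 routed to label 0 become the training set of one child, those routed to label 2 become the training set of another, and labels with no routed samples keep no child.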
The pseudo code of the incremental process is shown in Table 3:
In addition, data screening is needed when training the new model, and the pseudo code of data screening is shown in Table 4:

4.5. Classification Process

After the incremental process, the model can classify both previously seen (old) types and newly added types. The classification is a recursive procedure. The first step is to classify the input at the root node and then check whether the output label is linked to another node. If the output label is linked to another node, the linked node is invoked recursively to classify the sample again, and this continues until the output label is not linked to any further node.
If the output label is not linked to any node, that label is taken as the final classification result. As shown in Figure 6, the root node assigns the input to label 1; because label 1 is not linked to any node, the final classification result is label 1.
If the output label is linked to another node, recursive classification is required until the label reached is not linked to any node; that label is then the final result. As shown in Figure 7, the root node assigns the input to label 2; since label 2 is linked to node $c_2$, the samples classified as label 2 are recursively passed to node $c_2$ for further classification. Node $c_2$ assigns the input to label 5; because label 5 is not linked to any node, the final classification result is label 5.
In addition, the model may classify a given type of data via multiple nodes. As shown in Figure 8, the root node assigns the input to labels 0 and 2. At this point both label 0 and label 2 are linked to other nodes, so the data routed to label 0 are recursively classified at node $c_1$, yielding label 4, and the data routed to label 2 are recursively classified at node $c_2$, also yielding label 4. Since neither instance of label 4 is linked to any further node, the final classification result for this group of data is label 4.
The pseudo code of the classification process is given in Table 5.
During classification, data selection is required when connected nodes are traversed, and the pseudocode is shown in Table 6.
In addition, the pseudocode of the function UpdateLabel, which updates the original classification result, is shown in Table 7:
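As a complement to the pseudocode tables, the recursive classification procedure can be sketched in a few lines of Python; `Node` is a minimal stand-in for a model node, and the callables replace trained 1D-CNN classifiers:

```python
class Node:
    """Minimal model node: a callable classifier plus a label-to-child map."""
    def __init__(self, model, labels):
        self.model = model                        # callable: sample -> label
        self.children = {l: None for l in labels}

def classify(node, x):
    """Predict at this node; if the predicted label is linked to a child
    node, descend recursively, otherwise the label is final."""
    label = node.model(x)
    child = node.children.get(label)
    return label if child is None else classify(child, x)
```

For the Figure 7 example, a root whose label 2 is linked to a child $c_2$ returns label 1 directly when the root predicts label 1, and returns the child's label 5 when the root predicts label 2.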
The classification process traverses nodes at different levels of the tree; thus, the inference time for one input equals the sum of per-node classification times along the path from root to leaf. As increments proceed, the tree can grow and reduce IDS throughput. We therefore employ two complementary measures: pruning and fallback full retraining when necessary.
(1) Pruning mechanism. For a node's output label $i$, let $n_c$ be the number of samples belonging to a new class $c$ in the current increment, and $n_c^i$ the number of those samples predicted as label $i$ at this node. We define the misassignment ratio:
$$\phi_{c,i} = \frac{n_c^i}{\max(1, n_c)}.$$
If the maximum misassignment ratio over all new classes is below the threshold, $\max_c \phi_{c,i} < \alpha$, and the absolute count of new-class samples routed to label $i$ is insufficient ($\sum_c n_c^i < m_{\min}$, to avoid unstable child nodes), we do not train a new child model for label $i$ in this round (i.e., the branch is pruned). Concretely, as in Figure 5, if the new class 4 has $X$ samples and fewer than $\alpha X$ are predicted as label 0, then the child of label 0 is not trained in this increment. Importantly, the samples involved in pruning are not permanently discarded: they are buffered and revisited in later increments once their presence becomes significant ($\phi_{c,i} \geq \alpha$) or their cumulative count reaches the minimum support ($\sum_c n_c^i \geq m_{\min}$). This lightweight rule limits the effective branching factor and average depth, thereby curbing latency growth while preserving accuracy. Here $\alpha \in (0, 1)$ controls whether the new-class presence at label $i$ is significant enough to justify a new branch, and $m_{\min}$ sets the minimum support. A larger $\alpha$ (or a larger $m_{\min}$) prunes more aggressively, reducing depth and latency but potentially causing a slight recall drop on rare classes; a smaller $\alpha$ does the opposite.
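The pruning decision for a single output label $i$ reduces to a two-condition test, sketched below (function and argument names are illustrative):

```python
def should_prune(n_ci, n_c, alpha, m_min):
    """Pruning rule for one output label i (a sketch of the rule above).

    n_ci maps each new class c to the number of its samples predicted as
    label i at this node; n_c maps each new class c to its total sample
    count in this increment. The branch is pruned (no child trained this
    round) when every misassignment ratio phi_{c,i} = n_c^i / max(1, n_c)
    is below alpha AND the total new-class count routed to i is below m_min.
    """
    phi_max = max(n_ci[c] / max(1, n_c[c]) for c in n_c)
    total = sum(n_ci.values())
    return phi_max < alpha and total < m_min
```

For example, with $\alpha = 0.001$ and $m_{\min} = 50$, a label that receives no new-class samples is pruned, while a label receiving 5 of 1000 class-3 samples ($\phi = 0.005 \geq \alpha$) gets a child.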
If, despite pruning, the per-decision latency exceeds the real-time requirement of the smart grid scenario as increments accumulate, we trigger full retraining on the aggregated dataset and then resume incremental updates on the refreshed model. With these mechanisms, learning new attack types reuses only a subset of old data, which reduces training cost and mitigates catastrophic forgetting, enabling the model to adapt dynamically to emerging attacks while maintaining robust detection performance and practical real-time latency.
Let the output space be $Y = \{\text{Benign}\} \cup A$, where $A$ is the set of attack types. For deployment, a binary alarm is obtained by collapsing all $y \in A$ into a single Attack label, i.e., $g(y) = \mathbb{1}[y \in A]$; otherwise we return the fine-grained label $y$.
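The collapse $g(y)$ is a one-line mapping; the attack set below is an illustrative subset of $A$, not the full label set of either dataset:

```python
# Hypothetical attack set A; in the paper, A is the set of learned attack types.
ATTACKS = {"DoS", "FTP-Patator", "SSH-Patator"}

def deploy_label(y):
    """Return the binary alarm g(y) = 1[y in A] alongside the
    fine-grained label y for operator inspection."""
    return (1 if y in ATTACKS else 0), y
```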

5. Experiment

To evaluate the effectiveness of the proposed Grid-IDS, we use two complementary datasets: CICIDS2017 [46] and WUSTL-IIoT-2018 [47]. Grid-IDS supports both fine-grained attack-type classification and binary intrusion detection within a unified pipeline. Accordingly, on CICIDS2017 we keep the native multi-class labels and report per-class metrics; on the WUSTL-IIoT-2018 dataset, which contains only Benign and Attack labels, we train and evaluate in binary mode using identical preprocessing and model settings. This setup demonstrates that Grid-IDS operates in both modes without any architectural changes. In what follows, we describe the datasets and experimental environment, specify the model parameters and evaluation metrics, and then present the results and performance analysis.

5.1. Dataset Description

5.1.1. CICIDS2017 Dataset

CICIDS2017 is an open network intrusion traffic dataset released in 2017 and widely used in network intrusion detection research. It covers normal traffic and 14 attack types, including DoS, FTP-Patator, and SSH-Patator, and each traffic sample contains 78 features. See Table 8 for its flow distribution.
Records with missing values were removed from the dataset. As the raw sample counts in Table 8 show, CICIDS2017 suffers from severe class imbalance, so SMOTE is used for oversampling.
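The paper uses the standard SMOTE algorithm; the numpy sketch below is a simplified illustration of its core idea (each synthetic minority sample is a random interpolation between a minority sample and one of its $k$ nearest minority neighbours), not the library implementation:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling sketch for a minority class.

    X_min: (n, d) array of minority-class samples.
    n_new: number of synthetic samples to generate.
    """
    rng = rng or np.random.default_rng(0)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise distances within the minority class; a point is not its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))            # random minority sample
        j = rng.choice(neighbours[i])           # one of its k nearest neighbours
        gap = rng.random()                      # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synth)
```

Because every synthetic point is a convex combination of two real minority samples, the generated data stays inside the minority class's convex hull, which is what keeps SMOTE from fabricating implausible traffic features.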
To support the model's incremental learning process, we divide the dataset into multiple batches, each of which performs incremental learning on one or several categories; the specific division is given in Table 8. To guard against over-fitting, each batch is split into training and test samples at a ratio of 8:2, so that the model's generalization can be reliably assessed.

5.1.2. WUSTL-IIoT-2018 Dataset

The WUSTL-IIoT-2018 dataset focuses on network traffic and attack detection in the SCADA scenario. As shown in Table 9, this dataset contains only six basic features: source port (Sport), total number of packets (TotPkts), total number of bytes (TotBytes), number of source packets (SrcPkts), number of destination packets (DstPkts), and number of source bytes (SrcBytes). Given the limited feature set, we primarily use this dataset to demonstrate and verify the detection performance of the proposed model.
In terms of labeling, the WUSTL-IIoT-2018 dataset uses binary labels. As shown in Table 9, the "Target" feature labels the traffic type with only two values: "0" (no attack) and "1" (attack). Unlike CICIDS2017, it neither subdivides specific attack types nor provides richer feature fields, which makes the dataset relatively easy to handle. Table 10 reports the SCADA data distribution, i.e., the raw instance counts for Target 0 and Target 1. Clearly, there is severe class imbalance, with normal traffic far exceeding attack traffic; we therefore apply SMOTE to oversample the minority class.

5.2. Experimental Environment and Parameter Setting

The experiments are based on Python 3.8.3 and TensorFlow 2.8.3. The machine uses an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz (Intel Corporation, Santa Clara, CA, USA) with 256 GB of memory, which ensures fast data reading and writing for the large experimental datasets. The operating system is Ubuntu 20.04.1 LTS. The 1D-CNN model parameters are listed in Table 11.
When a 1D-CNN is applied to an IDS, the filters are a key factor affecting performance. The model configuration in this study was determined through extensive experiments: three convolutional layers with 256 filters each. Adam with a learning rate of 0.01 is selected as the optimizer, and categorical cross-entropy is used as the loss function, which effectively measures the prediction error. ReLU activation is used in the input layer and Softmax in the output layer. The number of training epochs is set to 20, and dropout is used to avoid over-fitting and improve generalization.

5.3. Evaluating Indicator

In order to comprehensively evaluate the performance of the model, this paper selects Precision, Recall, F1-score and Accuracy as evaluation indexes, and the calculation formula of each index is as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
$$\mathrm{F1\text{-}score} = \frac{2 \times TP}{2 \times TP + FP + FN},$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where $TP$ stands for true positive, $TN$ for true negative, $FP$ for false positive, and $FN$ for false negative, all determined from the confusion matrix shown in Table 12.
The confusion matrix systematically categorizes the model's predictions, providing the counts needed to compute each evaluation metric and accurately reflecting the model's performance in identifying normal and abnormal traffic.
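The four indicators follow directly from the confusion-matrix counts, as in this short sketch:

```python
def metrics(tp, tn, fp, fn):
    """Compute Precision, Recall, F1-score, and Accuracy from
    confusion-matrix counts (assumes tp+fp and tp+fn are nonzero)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy
```

For example, with 50 true positives, 40 true negatives, 5 false positives, and 5 false negatives, precision and recall are both about 0.909 and accuracy is 0.90.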

5.4. Experimental Results and Analysis

5.4.1. Results on CICIDS2017

The performance of the model proposed in this paper on CICIDS2017 dataset is shown in Table 13.
As can be seen from Table 13, the model performs well on the CICIDS2017 dataset. In the second batch, Precision, Recall, and F1-score exceed 0.9880 for all categories except Brute Force, SQL Injection, and XSS; the indicators for Benign, Heartbleed, and several other attack types reach 1, and those for Brute Force and SQL Injection remain above 0.8496. The distribution of XSS traffic is similar to that of normal traffic, which degrades its recognition. During incremental learning, the three increments have little influence on the recognition of old classes: Precision, Recall, and F1-score fluctuate by about 0.01, and accuracy fluctuates by at most 0.636%. This performance is attributable to the 1D-CNN's suitability for time-series data, which enables effective feature extraction, and to the incremental learning design of the model structure, which reduces the catastrophic forgetting problem.
The model structure validated on the CICIDS2017 dataset is shown in Figure 9. After applying the pruning strategy during validation, eight candidate nodes marked with “X” are removed. Because α is set to 0.1%, the proportion of discarded samples is small, so the impact on evaluation metrics is negligible while the overall complexity is reduced.
More importantly, the tree-based organization helps mitigate catastrophic forgetting. New attack types are accommodated by attaching new child models under existing nodes rather than rewriting the parameters of previously trained nodes. As visualized in Figure 9, branches learned in earlier batches remain intact; only weak or unstable candidates are pruned, and the parent models that encode historical decision boundaries are preserved. During inference, samples belonging to previously learned classes still traverse the original paths from root to leaf, so their predictions do not depend on parameters updated for newer classes. A small α prevents premature specialization when evidence is limited, and the samples involved in pruning are buffered and can be revisited in later increments. Together, localized updates, parameter preservation at parent nodes, and conservative pruning maintain performance on old classes while absorbing new ones, thereby alleviating forgetting in practice.

5.4.2. Results on WUSTL-IIoT-2018

On the WUSTL-IIoT-2018 dataset, our model attains an accuracy of 94.27% and a precision of 94.27%, with a recall of 1.00, yielding an F1-score of 97.05%. The perfect recall indicates that all malicious instances are captured (no missed attacks), while the slightly lower precision and accuracy indicate a small number of false positives, an acceptable trade-off for IIoT and SCADA security, where missed alarms are costlier. Overall, the high F1-score confirms balanced performance and robust generalization on this dataset. A more detailed head-to-head comparison with other schemes is provided in Section 5.5.2.
Across CICIDS2017 and WUSTL-IIoT-2018, the evaluation metrics are generally consistent and satisfactory. We attribute this to the 1D-CNN backbone being suitable for time-series traffic, the use of imbalance-handling techniques that stabilize training, and a tree-based incremental learning design that adds branches for new classes rather than overwriting existing parameters, which helps reduce catastrophic forgetting.

5.5. Performance Analysis

5.5.1. Performance on CICIDS2017

We measured the per-message classification time of the incremental smart-grid model, as shown in Table 14. After each increment, the classification time grows as the model depth increases. Even after three increments, the maximum classification time for a single message on the CICIDS2017 dataset is 1.5688 ms, still far below the 10 ms real-time detection requirement.
In addition, to systematically verify the effectiveness of the proposed Grid-IDS method, we compare it with T-dfnn [43], DNN-batch [48], ImFace [48], and Hoeffding Tree [49] across several dimensions; the comparison results are summarized in Table 15, Table 16 and Table 17.
As shown in Table 15, in terms of precision, the proposed Grid-IDS method achieves higher precision values than the other four methods in 9 out of the 12 attack types. For DoS Slowhttptest, its detection precision is second only to the Hoeffding Tree method, with a difference of less than 0.2%, indicating that Grid-IDS can more accurately identify attack traffic while minimizing the misclassification of normal data as attacks.
For recall, as shown in Table 16, Grid-IDS demonstrates the best recall performance in 8 out of the 12 traffic data types. For FTP-Patator and DDoS detection, the differences in recall compared to the best-performing method (T-dfnn [43]) are both less than 0.01, which can be considered acceptable. Notably, Grid-IDS achieves markedly superior precision and recall for SQL injection traffic, indicating a significant advantage in capturing certain real attack traffic and substantially reducing the number of undetected attacks.
In terms of the F1-score, as presented in Table 17, Grid-IDS shows significant advantages in 9 out of the 12 attack traffic detection tasks. For DDoS traffic identification, it trails the best method (T-dfnn [43]) by only 0.28%, while outperforming T-dfnn [43] in the other 11 traffic types. As a comprehensive metric balancing precision and recall, the outstanding F1-score of Grid-IDS fully demonstrates its superior overall performance in intrusion detection tasks. Therefore, it can be concluded that Grid-IDS adopts a model more suitable for traffic data, enabling in-depth mining of traffic features and enhancing the model’s ability to identify attack patterns. In addition, the model effectively addresses the issue of data imbalance, optimizes the distribution of the training dataset, and thereby significantly improves the model’s stability and generalization capability.

5.5.2. Performance on WUSTL-IIoT-2018

In this section, we evaluate the performance of the proposed Grid-IDS model on the WUSTL-IIoT-2018 dataset. The results are shown in Table 18 below.
On the WUSTL-IIoT-2018 dataset, our model delivers accuracy and precision comparable to Diaba et al. [51], higher than Ahakonye et al. [52], yet lower than Yousuf et al. [50]. In contrast, our recall reaches 1.00 (the best among all methods), and the F1-score of 97.05% is also the highest in the table. This pattern reflects a deliberate trade-off: the model favors capturing all malicious instances, which can slightly increase false positives and thus lower precision and accuracy relative to the strongest baseline. From an application perspective, such a recall-oriented profile is reasonable for IIoT/SCADA security, where missed detections are usually more costly than a small number of false alarms. At the same time, the leading F1-score indicates that the overall balance between detecting attacks and controlling false positives is solid. The results show that our approach is competitive across metrics, with room to further improve precision and accuracy without sacrificing recall.
Overall, across CICIDS2017 and WUSTL-IIoT-2018, Grid-IDS maintains competitive accuracy and leading F1-score with a recall-first profile: it wins or ties on most CICIDS2017 classes with millisecond-level latency and achieves perfect recall on WUSTL-IIoT-2018 while keeping precision competitive. These cross-dataset results indicate that the 1D-CNN backbone, imbalance handling, and tree-based incremental learner generalize well to heterogeneous traffic. Moving forward, we will focus on light calibration/post-processing to further improve precision and accuracy without compromising recall or real-time constraints.

6. Conclusions

Smart grid security is fundamental to a stable energy supply. This paper proposes Grid-IDS, an incremental intrusion detection system with three modules: data preprocessing, model training, and incremental learning. By introducing a tree-based incremental strategy, Grid-IDS monitors attacks in real time, learns new attack types while retaining prior knowledge, and mitigates catastrophic forgetting. On CICIDS2017, Grid-IDS attains an average detection accuracy of 99.65% and consistently outperforms baseline approaches including T-dfnn [43], DNN-batch [48], ImFace [48], and Hoeffding Tree [49]. On WUSTL-IIoT-2018, it maintains competitive accuracy and precision, comparable to Diaba et al. [51] and higher than Ahakonye et al. [52], while achieving state-of-the-art F1 and perfect recall, indicating good generalization under heterogeneous traffic.
Despite these promising results, several limitations remain. First, our evaluation uses public datasets rather than live smart-grid traffic, which may constrain generalizability. Second, prolonged incremental updates could still degrade performance or increase computational overhead. Finally, although latency is acceptable in our testbed, real-time performance under resource-constrained deployments requires further validation. Future work will integrate Grid-IDS with SCADA/PMU infrastructures to assess deployment feasibility, address concept drift in highly dynamic traffic, and incorporate multi-source data (e.g., device logs and control signals). We also plan to explore distributed or federated learning to enhance scalability, privacy, and adaptability, paving the way for real-world deployment in smart grids.

Author Contributions

Conceptualization, X.N.; methodology, S.J.; project administration, K.Y.; writing—original draft, C.A.; validation, Y.Z.; writing—review and editing, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Project of State Grid Zhejiang Electric Power Co., Ltd. “Research and Application of Load-side Security Technology for Sensing Equipment and Networking” grant number [5211JH240008].

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Authors Xuming Ni, Shuo Jiang, and Kan Yu were employed by the State Grid Jinhua Electric Power Company; Chunyan An, Yuchen Zhang, and Hairui Huang were employed by the China Electric Power Research Institute. The authors declare that this study received funding from State Grid Zhejiang Electric Power Co., Ltd. The funder had the following involvement with the study: security requirements analysis.

Abbreviations

The following abbreviations are used in this manuscript:
IDS: Intrusion detection system
AI: Artificial intelligence
CNN: Convolutional neural network
Grid-IDS: Smart grid intrusion detection system

References

  1. Basheer, L.; Ranjana, P. A deep learning framework for intrusion detection system in smart grids using graph convolutional network. Eng. Res. Express 2025, 7, 015257. [Google Scholar] [CrossRef]
  2. Fabio, M.; Francesco, M.; Antonella, S. A Method for Intrusion Detection in Smart Grid. Procedia Comput. Sci. 2022, 207, 327–334. [Google Scholar] [CrossRef]
  3. Abdelaal, A.K.; El-Hameed, M.A. Application of robust super twisting to load frequency control of a two-area system comprising renewable energy resources. Sustainability 2024, 16, 5558. [Google Scholar] [CrossRef]
  4. Xiao, D.; Peng, Z.; Lin, Z.; Zhong, X.; Wei, C.; Dong, Z.; Wu, Q. Incorporating financial entities into spot electricity market with renewable energy via holistic risk-aware bilevel optimization. Appl. Energy 2025, 398, 126449. [Google Scholar] [CrossRef]
  5. Otuoze, A.O.; Mustafa, M.W.; Larik, R.M. Smart grids security challenges: Classification by sources of threats. J. Electr. Syst. Inf. Technol. 2018, 5, 468–483. [Google Scholar] [CrossRef]
  6. Zhai, F.; Yang, T.; Chen, H.; He, B.; Li, S. Intrusion Detection Method Based on CNN–GRU–FL in a Smart Grid Environment. Electronics 2023, 12, 1164. [Google Scholar] [CrossRef]
  7. Case, D.U. Analysis of the Cyber Attack on the Ukrainian Power Grid; Electricity Information Sharing and Analysis Center (E-ISAC): Washington, DC, USA, 2016; Volume 388, p. 3. [Google Scholar]
  8. Alladi, T.; Chamola, V.; Zeadally, S. Industrial control systems: Cyberattack trends and countermeasures. Comput. Commun. 2020, 155, 1–8. [Google Scholar] [CrossRef]
  9. Voas, J.; Kshetri, N.; DeFranco, J.F. Scarcity and global insecurity: The semiconductor shortage. IT Prof. 2021, 23, 78–82. [Google Scholar] [CrossRef]
  10. Barragán-Villarejo, M.; García-López, F.D.P.; Marano-Marcolini, A.; Maza-Ortega, J.M. Power system hardware in the loop (PSHIL): A holistic testing approach for smart grid technologies. Energies 2020, 13, 3858. [Google Scholar] [CrossRef]
  11. Montoya, J.; Brandl, R.; Vishwanath, K.; Johnson, J.; Darbali-Zamora, R.; Summers, A.; Hashimoto, J.; Kikusato, H.; Ustun, T.S.; Ninad, N.; et al. Advanced laboratory testing methods using real-time simulation and hardware-in-the-loop techniques: A survey of smart grid international research facility network activities. Energies 2020, 13, 3267. [Google Scholar] [CrossRef]
  12. Li, Y.; Yan, J. Cybersecurity of smart inverters in the smart grid: A survey. IEEE Trans. Power Electron. 2022, 38, 2364–2383. [Google Scholar] [CrossRef]
  13. Rahman, M.; Ali, M.; Rahman, A.; Sun, W. A real-time cyber-physical hil testbed for cybersecurity in distribution grids with ders. In Proceedings of the 2024 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 19–22 February 2024; pp. 1–5. [Google Scholar]
  14. Singh, V.K.; Govindarasu, M. Cyber kill chain-based hybrid intrusion detection system for smart grid. In Wide Area Power Systems Stability, Protection, and Security; Springer International Publishing: Cham, Switzerland, 2020; pp. 571–599. [Google Scholar]
  15. Hu, C.; Yan, J.; Liu, X. Reinforcement learning-based adaptive feature boosting for smart grid intrusion detection. IEEE Trans. Smart Grid 2022, 14, 3150–3163. [Google Scholar] [CrossRef]
  16. Abdulganiyu, O.H.; Ait Tchakoucht, T.; Saheed, Y.K. A systematic literature review for network intrusion detection system (IDS). Int. J. Inf. Secur. 2023, 22, 1125–1162. [Google Scholar] [CrossRef]
  17. Van de Ven, G.M.; Tuytelaars, T.; Tolias, A.S. Three types of incremental learning. Nat. Mach. Intell. 2022, 4, 1185–1197. [Google Scholar] [CrossRef]
  18. Song, X.; Shu, K.; Dong, S.; Cheng, J.; Wei, X.; Gong, Y. Overcoming catastrophic forgetting for multi-label class-incremental learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2024; pp. 2389–2398. [Google Scholar]
  19. Yu, D. Research on Anomaly Intrusion Detection Technology in Wireless Network. In Proceedings of the International Conference on Virtual Reality and Intelligent Systems 2018, Changsha, China, 10–11 August 2018; pp. 540–543. [Google Scholar]
  20. Attia, M.; Sedjelmaci, H.; Senouci, S.M.; Aglzim, E.-H. A new intrusion detection approach against lethal attacks in the smart grid: Temporal and spatial based detections. In Proceedings of the Global Information Infrastructure and Networking Symposium 2015, Guadalajara, Mexico, 28–30 October 2015; pp. 1–3. [Google Scholar]
  21. Shabad, P.K.R.; Alrashide, A.; Mohammed, O. Anomaly Detection in Smart Grids using Machine Learning. In Proceedings of the IEEE Industrial Electronics Society 2021, Toronto, ON, Canada, 13–16 October 2021; pp. 1–8. [Google Scholar]
  22. Tang, H. Intrusion Detection Method Based on Improved Neural Network. In Proceedings of the International Conference on Smart Grid and Electrical Automation 2018, Changsha, China, 9–10 June 2018; pp. 151–154. [Google Scholar]
  23. Wang, Z.; Zhang, F.; Wang, H.; Zhang, X.; Lu, W.; Zhang, C.; Wang, L.; Wang, B. Network Intrusion Detection Method for Smart Grid Based on PCA-ISBO-GRU-AM. In Proceedings of the International Conference on Renewable Energy and Power Engineering 2024, Beijing, China, 25–27 September 2024; pp. 97–101. [Google Scholar]
  24. Jasper, J.; Praveen, B.M.; Berlin Shaheema, S. Res2-UNeXt Combined with Federated Learning for Cyber-Attack Detection and Classification in Multi Area Smart Grid Power System. In Proceedings of the IEEE Silchar Subsection Conference 2024, Agartala, India, 15–17 November 2024; pp. 1–6. [Google Scholar]
  25. Quincozes, S.E.; Albuquerque, C.; Passos, D.; Mossé, D. ERENO: A Framework for Generating Realistic IEC–61850 Intrusion Detection Datasets for Smart Grids. IEEE Trans. Dependable Secur. Comput. 2024, 21, 3851–3865. [Google Scholar] [CrossRef]
  26. IEC 61850:2025 SER; Communication Networks and Systems for Power Utility Automation. International Electrotechnical Commission (IEC): Geneva, Switzerland, 2025.
  27. Wen, M.; Zhang, Y.; Zhang, P.; Chen, L. IDS-DWKAFL: An intrusion detection scheme based on Dynamic Weighted K-asynchronous Federated Learning for smart grid. J. Inf. Secur. Appl. 2025, 89, 103993. [Google Scholar] [CrossRef]
  28. Mohammed, S.H.; Singh, M.S.J.; Al-Jumaily, A.; Islam, M.T.; Islam, S.; Alenezi, A.M.; Soliman, M.S.; Nejad, M.G. Dual-hybrid intrusion detection system to detect False Data Injection in smart grids. PLoS ONE 2025, 20, e0316536. [Google Scholar] [CrossRef] [PubMed]
  29. Hamdi, N. Investigating the Efficiency of a Federated Learning-Based Intrusion Detection System for Smart Grid. SN Comput. Sci. 2025, 6, 245. [Google Scholar] [CrossRef]
  30. Hamdi, N. A hybrid learning technique for intrusion detection system for smart grid. Sustain. Comput. Inform. Syst. 2025, 46, 101102. [Google Scholar] [CrossRef]
  31. Tariq, N.; Alsirhani, A.; Humayun, M.; Alserhani, F.; Shaheen, M. A fog-edge-enabled intrusion detection system for smart grids. J. Cloud Comput. 2024, 13, 43. [Google Scholar] [CrossRef]
  32. Alsirhani, A.; Tariq, N.; Humayun, M.; Alwakid, G.N.; Sanaullah, H. Intrusion detection in smart grids using artificial intelligence-based ensemble modelling. Clust. Comput. 2025, 28, 238. [Google Scholar] [CrossRef]
  33. Pasumponthevar, M.K.; Jeyaraj, P.R. Kalman reinforcement learning-based provably secured smart grid false data intrusion detection and resilience enhancement. Electr. Eng. 2024, 107, 2883–2901. [Google Scholar] [CrossRef]
  34. Abrahamsen, F.E.; Ai, Y.; Cheffena, M. Communication Technologies for Smart Grid: A Comprehensive Survey. Sensors 2021, 21, 8087. [Google Scholar] [CrossRef] [PubMed]
  35. Pathak, R.K.; Yadav, V. Improvised Progressive Neural Network (iPNN) for Handling Catastrophic Forgetting. In Proceedings of the International Conference on Electronics and Sustainable Communication Systems 2020, Coimbatore, India, 2–4 July 2020; pp. 143–148. [Google Scholar]
  36. Elreedy, D.; Atiya, A.F. A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance. Inf. Sci. 2019, 505, 32–64. [Google Scholar] [CrossRef]
  37. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, 1–8 June 2008; pp. 1322–1328. [Google Scholar]
  38. Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 878–887. [Google Scholar]
  39. Qazi, E.U.H.; Almorjan, A.; Zia, T. A One-Dimensional Convolutional Neural Network (1D-CNN) Based Deep Learning System for Network Intrusion Detection. Appl. Sci. 2022, 12, 7986. [Google Scholar] [CrossRef]
  40. Ige, A.O.; Sibiya, M. State-of-the-art in 1d convolutional neural networks: A survey. IEEE Access 2024, 12, 144082–144105. [Google Scholar] [CrossRef]
  41. Noh, S.H. Analysis of gradient vanishing of RNNs and performance comparison. Information 2021, 12, 442. [Google Scholar] [CrossRef]
  42. Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110. [Google Scholar] [CrossRef]
  43. Data, M.; Aritsugi, M. T-dfnn: An incremental learning algorithm for intrusion detection systems. IEEE Access 2021, 9, 154156–154171. [Google Scholar] [CrossRef]
  44. Amalapuram, S.K.; Channappayya, S.S.; Tamma, B. Augmented memory replay-based continual learning approaches for network intrusion detection. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
  45. Prasath, S.; Sethi, K.; Mohanty, D.; Bera, P.; Samantaray, S.R. Analysis of continual learning models for intrusion detection system. IEEE Access 2022, 10, 121444–121464. [Google Scholar] [CrossRef]
  46. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSP 2018, 1, 108–116. [Google Scholar]
  47. Teixeira, M.A.; Salman, T.; Zolanvari, M.; Jain, R.; Meskin, N.; Samaka, M. SCADA System Testbed for Cybersecurity Research Using Machine Learning Approach. Future Internet 2018, 10, 76. [Google Scholar] [CrossRef]
  48. Zhang, B.; Xia, H.; Zhang, G.; Gao, Z. Incremental Intrusion Detection Based on Multi-feature Fusion Autoencoder. Comput. Syst. Appl. 2023, 32, 42–50. (In Chinese) [Google Scholar] [CrossRef]
  49. Domingos, P.; Hulten, G. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 20–23 August 2000; pp. 71–80. [Google Scholar]
  50. Yousuf, O.; Mir, R.N. DDoS attack detection in Internet of Things using recurrent neural network. Comput. Electr. Eng. 2022, 101, 108034. [Google Scholar] [CrossRef]
  51. Diaba, S.Y.; Anafo, T.; Tetteh, L.A.; Oyibo, M.A.; Alola, A.A.; Shafie-Khah, M.; Elmusrati, M. SCADA securing system using deep learning to prevent cyber infiltration. Neural Netw. 2023, 165, 321–332. [Google Scholar] [CrossRef]
  52. Ahakonye, L.A.C.; Nwakanma, C.I.; Lee, J.-M.; Kim, D.-S. Scada intrusion detection scheme exploiting the fusion of modified decision tree and chi-square feature selection. Internet Things 2023, 21, 100676. [Google Scholar] [CrossRef]
Figure 1. The network along the smart grid infrastructure.
Figure 2. Grid-IDS system architecture.
Figure 3. Schematic diagram of SMOTE.
Figure 4. Model structure diagram.
Figure 5. The principle of incremental learning.
Figure 6. Classification with no downstream node.
Figure 7. Classification with a downstream node.
Figure 8. Multi-node classification.
Figure 9. Model structure after CICIDS2017 verification.
Table 1. Summary of IDS Methods and Challenges.

Method | Advantages | Challenges
Synthetic Traffic Generation (IEC 61850) [25] | Solves lack of real data for training/testing/validation of IDS | Struggles to handle evolving attack patterns and cannot detect new attack behaviors
Federated Learning-based IDS [27,29,30,31] | Improves training efficiency and supports collaborative learning | Limited detection capabilities in the face of complex, evolving attack methods; catastrophic forgetting
Graph Convolution Network-based IDS [1] | Effective in identifying complex threats and maintaining grid reliability | Insufficient feature extraction to accurately capture unique attack characteristics
Hybrid IDS (Feature Selection + Deep Learning) [28] | Improves detection accuracy and robustness | Difficult to cover all new injection methods, especially diverse false data injection attacks
Table 2. Pseudo code of the definition of model nodes.

Class ModelNode
1:  M = Null  // model
2:  model_map = {}  // label-to-downstream-node map
3:  def train(self, x, y):  // model node initialization
4:    self.M = model_train(x, y)  // train the model on the data
5:    for l in set(y):  // iterate over the distinct labels
6:      self.model_map[l] = Null  // mapping is initialized to null
7:  def classify(self, x):  // classify the input data
8:    x_label = Classification(self.M, x)
9:    return x_label
10: def get_link_node(self, l):  // get the node mapped to label l
11:   node = self.model_map[l]
12:   return node
13: def set_link_node(self, l, m):  // reset the node mapped to label l
14:   self.model_map[l] = m
Table 3. Pseudo code of incremental process.

IncrementalTraining(X, Y, N)
Input: X: training data, Y: training data labels, N: model node
Output: N: model node
1:  L ← N.classify(X)  // classification result L of data X at node N
2:  for i in set(L):
3:    XL, YL ← SelectTrainingData(X, Y, L, i)  // select the training data whose output label is i
4:    linkedN ← N.get_link_node(i)  // get the node mapped to label i at node N
5:    if linkedN ≠ NULL then  // label i is already mapped to a node
6:      linkedN ← IncrementalTraining(XL, YL, linkedN)  // recursively train the mapped node
7:    else  // label i has no mapped node yet
8:      newN ← ModelNode()  // create a new node
9:      newN.train(XL, YL)  // train the new node on the selected data
10:     N.set_link_node(i, newN)  // update the mapping
11: return N
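The node definition and incremental-training pseudocode of Tables 2 and 3 can be sketched in runnable Python. As an assumption for illustration only, a nearest-centroid classifier stands in for the per-node 1D-CNN that Grid-IDS actually trains; the class and function names mirror the pseudocode, while the node model itself is a placeholder:

```python
import math


class ModelNode:
    """One tree node: a classifier plus a label -> downstream-node map.
    A nearest-centroid model stands in for the paper's per-node 1D-CNN."""

    def __init__(self):
        self.centroids = {}   # label -> feature centroid (the node's model M)
        self.model_map = {}   # label -> downstream ModelNode (or None)

    def train(self, X, y):
        by_label = {}
        for xi, yi in zip(X, y):
            by_label.setdefault(yi, []).append(xi)
        for label, rows in by_label.items():
            n = len(rows)
            self.centroids[label] = [sum(col) / n for col in zip(*rows)]
            self.model_map[label] = None   # mapping initialized to null

    def classify(self, X):
        labels = []
        for x in X:
            # predict the label whose centroid is nearest to x
            label = min(self.centroids,
                        key=lambda lab: math.dist(x, self.centroids[lab]))
            labels.append(label)
        return labels

    def get_link_node(self, label):
        return self.model_map.get(label)

    def set_link_node(self, label, node):
        self.model_map[label] = node


def select_training_data(X, Y, L, label):
    """Table 4: keep only samples whose node-level prediction L equals label."""
    XL = [x for x, l in zip(X, L) if l == label]
    YL = [y for y, l in zip(Y, L) if l == label]
    return XL, YL


def incremental_training(X, Y, node):
    """Table 3: route new data down the tree, growing a downstream node
    wherever an output label has no mapped model yet."""
    L = node.classify(X)
    for label in set(L):
        XL, YL = select_training_data(X, Y, L, label)
        linked = node.get_link_node(label)
        if linked is not None:
            incremental_training(XL, YL, linked)   # recurse into the child
        else:
            child = ModelNode()
            child.train(XL, YL)
            node.set_link_node(label, child)
    return node
```

The tree-growth logic is model-agnostic: swapping the centroid stand-in for the 1D-CNN of Table 11 leaves `incremental_training` unchanged.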
Table 4. Pseudo code of data filtering.

SelectTrainingData(X, Y, L, i)
Input: X: training data, Y: training data labels, L: all labels output by the model, i: label to be selected
Output: XL: selected training data, YL: selected training data labels
1: XL, YL ← []
2: j ← 0
3: for m in range(0, length(X) − 1):
4:   if row(L, m) = i then  // if the m-th predicted label equals the label to be selected
5:     row(XL, j) ← row(X, m)  // copy the m-th row of X into XL
6:     row(YL, j) ← row(Y, m)  // copy the m-th row of Y into YL
7:     j ← j + 1
8: return XL, YL
Table 5. Pseudo code of classification process.

Classification(X, N)
Input: X: classification data, N: model node
Output: Y: classification label result
1: Y ← N.classify(X)
2: for i in set(Y):
3:   linkedN ← N.get_link_node(i)  // get the node mapped to label i at node N
4:   if linkedN ≠ NULL then  // label i is mapped to a downstream node
5:     XL, YI ← SelectData(X, Y, i)  // select the data whose output label is i
6:     newY ← Classification(XL, linkedN)  // recursively classify at the mapped node
7:     Y ← UpdateLabel(Y, newY, YI)  // update the original classification result
8: return Y
Table 6. Pseudo code of data selection.

SelectData(X, Y, i)
Input: X: training data, Y: labels of the training data after classification, i: label to be selected
Output: XL: selected training data, YI: indices of the selected labels
1: XL, YI ← []
2: j ← 0
3: for m in range(0, length(X) − 1):
4:   if row(Y, m) = i then  // if the m-th label equals the label to be selected
5:     row(XL, j) ← row(X, m)  // copy the m-th row of X into XL
6:     row(YI, j) ← m  // record the index m in YI
7:     j ← j + 1
8: return XL, YI
Table 7. Pseudo code of original classification updating.

UpdateLabel(Y, newY, YI)
Input: Y: labels after classification, newY: labels after reclassification at the downstream node, YI: indices of the data to be updated
Output: Y: updated classification labels
1: for i in range(0, length(newY) − 1) do
2:   j ← row(YI, i)  // get the i-th index to update
3:   row(Y, j) ← row(newY, i)  // overwrite the original label
4: return Y
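The recursive classification of Tables 5–7 can likewise be sketched as three small Python functions. The `_StubNode` below is a hypothetical stand-in for a trained ModelNode (Table 2) — a fixed prediction rule plus a label-to-child map — kept minimal so the refinement logic stays visible:

```python
class _StubNode:
    """Hypothetical stand-in for a trained ModelNode: a fixed prediction
    rule plus a label -> downstream-node map."""

    def __init__(self, rule, model_map=None):
        self.rule = rule
        self.model_map = model_map or {}

    def classify(self, X):
        return [self.rule(x) for x in X]

    def get_link_node(self, label):
        return self.model_map.get(label)


def select_data(X, Y, label):
    """Table 6: return the samples predicted as `label` and their indices."""
    XL = [x for x, y in zip(X, Y) if y == label]
    YI = [i for i, y in enumerate(Y) if y == label]
    return XL, YI


def update_label(Y, new_Y, YI):
    """Table 7: write the downstream node's labels back into the result."""
    for idx, new_y in zip(YI, new_Y):
        Y[idx] = new_y
    return Y


def classification(X, node):
    """Table 5: classify at this node, then let each mapped downstream
    node refine the slice of data it is responsible for."""
    Y = node.classify(X)
    for label in set(Y):
        linked = node.get_link_node(label)
        if linked is not None:
            XL, YI = select_data(X, Y, label)
            Y = update_label(Y, classification(XL, linked), YI)
    return Y


# A root node separates benign from attack traffic; a downstream node
# (added incrementally) refines "attack" into finer classes.
root = _StubNode(lambda x: 'attack' if x >= 10 else 'benign',
                 {'attack': _StubNode(lambda x: 'dos' if x >= 20 else 'scan')})
print(classification([1, 12, 25], root))  # ['benign', 'scan', 'dos']
```

Because only the slice routed to a downstream node is reclassified, labels learned by earlier nodes are never overwritten unless a child explicitly refines them — which is the mechanism Grid-IDS uses to relieve catastrophic forgetting.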
Table 8. Data Distribution of CICIDS2017.

Batch | Attack Type | Number of Original Messages | Number of Messages After SMOTE
0 | Benign | 2,273,097 | 2,273,097
0 | FTP-Patator | 7938 | 79,350
0 | SSH-Patator | 5897 | 58,970
1 | DoS GoldenEye | 10,293 | 51,465
1 | DoS Hulk | 231,073 | 231,073
1 | DoS Slowhttptest | 5499 | 54,990
1 | DoS slowloris | 5796 | 57,960
2 | Heartbleed | 11 | 110,000
2 | Brute Force | 1507 | 75,350
2 | Sql Injection | 21 | 10,500
2 | XSS | 652 | 32,600
3 | Infiltration | 36 | 180,000
3 | Bot | 1966 | 195,600
3 | DDoS | 128,027 | 128,027
3 | Port Scan | 158,930 | 158,930
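Table 8's rare classes (Heartbleed, Sql Injection, Infiltration) are expanded with SMOTE before training. A minimal pure-Python sketch of the interpolation step — an illustrative reimplementation, not the library implementation the authors presumably used [36] — looks like this:

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Minimal SMOTE sketch: each synthetic point is a random interpolation
    between a minority-class sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()   # interpolation factor in [0, 1)
        # synthetic sample lies on the segment between x and its neighbour
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Applied per class, this is how the 11 original Heartbleed flows can be grown toward the balanced counts of Table 8; because every synthetic point lies between two real minority samples, the oversampled class stays inside its original feature region.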
Table 9. The feature values contained in the WUSTL-IIoT-2018 dataset.

Feature | Notion
Source Port (Sport) | Port number of the source
Total Packets (TotPkts) | Total transaction packet count
Total Bytes (TotBytes) | Total transaction bytes
Source Packets (SrcPkts) | Source/Destination packet count
Destination Packets (DstPkts) | Destination/Source packet count
Source Bytes (SrcBytes) | Source/Destination transaction bytes
Target | Label for traffic type
Table 10. Data Distribution of SCADA.

Category | Number of Original Messages
Without Attacks (Target 0) | 6,634,581
With Attacks (Target 1) | 403,402
Total | 7,037,983
Table 11. 1D-CNN model parameter settings.

Parameter | Value
Activation Function (Input) | relu
Epoch | 20
Dropout | 0.1
Activation Function (Output) | softmax
Optimizer | Adam
Learning Rate | 0.01
Loss Function | categorical crossentropy
Table 12. Confusion matrix.

Actual Class | Predicted Class: X | Predicted Class: Y
X | TP | FN
Y | FP | TN
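The per-class metrics reported in the following tables follow directly from the Table 12 counts; a small sketch of the standard definitions:

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from the confusion-matrix
    counts of Table 12, taking X as the positive class."""
    precision = tp / (tp + fp)          # of the predicted X, how many are X
    recall = tp / (tp + fn)             # of the actual X, how many are found
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# e.g. 90 TP, 10 FP, 10 FN, 890 TN
p, r, f1, acc = metrics(90, 10, 10, 890)
```

As a sanity check against the results below, the batch-3 Brute Force row of Table 13 (precision 0.8450, recall 0.8727) yields F1 ≈ 0.8587 under these definitions, matching the reported value up to rounding of the inputs.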
Table 13. Model representation using CICIDS2017 dataset. (Accuracy is reported once per batch.)

Batch | Class | Precision | Recall | F1-Score | Accuracy
0 | Benign | 1 | 1 | 1 | 99.992%
0 | FTP-Patator | 0.9994 | 0.9982 | 0.9988 |
0 | SSH-Patator | 0.9976 | 0.9992 | 0.9984 |
1 | Benign | 1 | 1 | 1 | 99.970%
1 | FTP-Patator | 0.9994 | 0.9982 | 0.9988 |
1 | SSH-Patator | 0.9968 | 0.9992 | 0.9980 |
1 | DoS GoldenEye | 0.9985 | 0.9998 | 0.9992 |
1 | DoS Hulk | 1 | 0.9999 | 0.9999 |
1 | DoS Slowhttptest | 0.9956 | 0.9937 | 0.9946 |
1 | DoS slowloris | 0.9953 | 0.9954 | 0.9954 |
2 | Benign | 1 | 1 | 1 | 99.267%
2 | FTP-Patator | 0.9994 | 0.9982 | 0.9988 |
2 | SSH-Patator | 0.9968 | 0.9992 | 0.9980 |
2 | DoS GoldenEye | 0.9985 | 0.9998 | 0.9992 |
2 | DoS Hulk | 0.9999 | 0.9999 | 0.9999 |
2 | DoS Slowhttptest | 0.9956 | 0.9937 | 0.9946 |
2 | DoS slowloris | 0.9889 | 0.9954 | 0.9922 |
2 | Heartbleed | 1 | 1 | 1 |
2 | Brute Force | 0.8497 | 0.8727 | 0.8611 |
2 | Sql Injection | 0.8994 | 0.8913 | 0.8953 |
2 | XSS | 0.7223 | 0.6704 | 0.6954 |
3 | Benign | 1 | 1 | 1 | 99.372%
3 | FTP-Patator | 0.9994 | 0.9982 | 0.9988 |
3 | SSH-Patator | 0.9968 | 0.9992 | 0.9980 |
3 | DoS GoldenEye | 0.9985 | 0.9998 | 0.9992 |
3 | DoS Hulk | 0.9999 | 0.9999 | 0.9999 |
3 | DoS Slowhttptest | 0.9956 | 0.9937 | 0.9946 |
3 | DoS slowloris | 0.9889 | 0.9954 | 0.9922 |
3 | Heartbleed | 1 | 1 | 1 |
3 | Brute Force | 0.8450 | 0.8727 | 0.8587 |
3 | Sql Injection | 0.8994 | 0.8913 | 0.8953 |
3 | XSS | 0.7097 | 0.6704 | 0.6895 |
3 | Infiltration | 1 | 1 | 1 |
3 | Bot | 1 | 1 | 1 |
3 | DDoS | 1 | 0.9924 | 0.9962 |
3 | Port Scan | 1 | 1 | 1 |
Table 14. Performance analysis of CICIDS2017.

Batch | Maximum Classification Time of a Single Message (ms)
0 | 0.0522
1 | 0.4163
2 | 0.9890
3 | 1.5688
Table 15. Comparison results in Precision.

Class | DNN-Batch | DFNN-All | T-dfnn | Hoeffding Tree | ImFace | ML-AIDS | Grid-IDS
FTP-Patator | 0.9900 | 0.999 | 0.998 | 1 | 0.9838 | - | 0.9994
SSH-Patator | 0.8668 | 0.984 | 0.977 | 0.9615 | 0.9722 | - | 0.9968
DoS GoldenEye | 0.9197 | 0.991 | 0.996 | 0.9776 | 0.9831 | - | 0.9985
DoS Hulk | 0.9999 | 0.990 | 0.985 | 0.9782 | 0.9988 | - | 0.9999
DoS Slowhttptest | 0.9354 | 0.977 | 0.979 | 0.9973 | 0.9829 | - | 0.9956
Heartbleed | 0 | 1 | 0.967 | 1 | 1 | - | 1
Brute Force | 0.6853 | 0.706 | 0.705 | 0.9412 | 0.9962 | 0.58 | 0.845
Sql Injection | 0 | 0.860 | 0.725 | 0 | 0.75 | 0 | 0.8994
XSS | 0 | 0.759 | 0.582 | 0.6 | 0.9835 | 1 | 0.7097
Bot | 0.9427 | 0.936 | 0.962 | 0.9702 | 0.9287 | - | 1
DDoS | 0.8426 | 1 | 0.999 | 0.9998 | 0.9987 | - | 1
Port Scan | 0.9994 | 0.994 | 0.994 | 0.9987 | 0.9988 | - | 1
Table 16. Comparison results in Recall.

Class | DNN-Batch | DFNN-All | T-dfnn | Hoeffding Tree | ImFace | ML-AIDS | Grid-IDS
FTP-Patator | 0.9969 | 0.9999 | 0.997 | 0.995 | 0.9969 | - | 0.9982
SSH-Patator | 0.9981 | 0.988 | 0.98 | 0.9949 | 0.95 | - | 0.9992
DoS GoldenEye | 0.9908 | 0.993 | 0.991 | 0.9956 | 0.9913 | - | 0.9998
DoS Hulk | 0.893 | 0.994 | 0.995 | 0.3526 | 0.9992 | - | 0.9999
DoS Slowhttptest | 0.9873 | 0.992 | 0.992 | 0.9891 | 0.9927 | - | 0.9937
Heartbleed | 0 | 1.0000 | 1 | 0.5 | 1 | - | 1
Brute Force | 0.8538 | 0.985 | 0.963 | 0.1063 | 0.8738 | 0.99 | 0.8727
Sql Injection | 0 | 0.475 | 0.275 | 0 | 0.5 | 0 | 0.8913
XSS | 0 | 0.065 | 0.064 | 0.0231 | 0.9154 | 0.01 | 0.6704
Bot | 0.7532 | 0.518 | 0.458 | 0.9949 | 0.9949 | - | 1
DDoS | 0.9995 | 1 | 0.999 | 0.9981 | 0.997 | - | 0.9924
Port Scan | 0.9968 | 1 | 0.999 | 0.9986 | 0.9991 | - | 1
Table 17. Comparison results in F1-score.

Class | DNN-Batch | DFNN-All | T-dfnn | Hoeffding Tree | ImFace | ML-AIDS | Grid-IDS
FTP-Patator | 0.9934 | 0.999 | 0.998 | 0.9975 | 0.9903 | - | 0.9988
SSH-Patator | 0.9235 | 0.986 | 0.978 | 0.9779 | 0.961 | - | 0.998
DoS GoldenEye | 0.9539 | 0.992 | 0.994 | 0.9865 | 0.9872 | - | 0.9992
DoS Hulk | 0.9434 | 0.992 | 0.99 | 0.5183 | 0.999 | - | 0.9999
DoS Slowhttptest | 0.9606 | 0.984 | 0.985 | 0.9932 | 0.9878 | - | 0.9946
Heartbleed | 0 | 1.0000 | 0.983 | 0.6667 | 1 | - | 1
Brute Force | 0.7604 | 0.823 | 0.814 | 0.191 | 0.931 | 0.73 | 0.8587
Sql Injection | 0 | 0.612 | 0.399 | 0 | 0.6 | 0 | 0.8953
XSS | 0 | 0.119 | 0.115 | 0.0444 | 0.9482 | 0.02 | 0.6895
Bot | 0.8373 | 0.667 | 0.621 | 0.9824 | 0.9607 | - | 1
DDoS | 0.9144 | 1 | 0.999 | 0.9989 | 0.9978 | - | 0.9962
Port Scan | 0.9998 | 0.997 | 0.997 | 0.9987 | 0.9989 | - | 1
Table 18. Model representation using WUSTL-IIoT-2018 (all values in %).

Scheme | Ref. | Accuracy | Precision | Recall | F1-Score
Yousuf et al. | [50] | 99.09 | 97.12 | 95.05 | 96.45
Diaba et al. | [51] | 94.22 | 93.64 | 93.73 | 94.38
Ahakonye et al. | [52] | 93 | 93 | 93 | 93
Ours | - | 94.27 | 94.27 | 100 | 97.05
Ni, X.; Jiang, S.; Yu, K.; An, C.; Zhang, Y.; Huang, H. Smart Grid Intrusion Detection System Based on Incremental Learning. Electronics 2025, 14, 3820. https://doi.org/10.3390/electronics14193820