This section reviews related work on IDS from two perspectives: (1) from a data perspective, strategies for addressing the class imbalance problem that is pervasive in intrusion detection datasets; and (2) from a model perspective, the use of deep learning algorithms to construct effective intrusion detection models. Class imbalance, in which normal traffic samples vastly outnumber attack samples, is a critical factor influencing IDS performance. Owing to their strong representation learning abilities, deep learning models have been widely adopted in intrusion detection and have shown promise in detecting advanced and evasive threats. The remainder of this section discusses existing research on these two challenges.
2.1. Data-Level Approaches for Class Imbalance in IDS
Data-level mitigation methods typically operate by rebalancing the training data distribution, aiming to enhance the model’s attention to and learning from minority-class instances.
Oversampling techniques based on traditional sampling algorithms, particularly the Synthetic Minority Over-sampling Technique (SMOTE) and its variants, have been favored by researchers due to their ease of implementation and effectiveness. The STB algorithm proposed by Li-Hua Li [12] integrates SMOTE with XGBoost, effectively improving model accuracy in intrusion detection tasks. Mhamad Bakro [13] also combined SMOTE with Random Forest (RF) to detect various network attacks, achieving promising results. However, when generating synthetic samples, the SMOTE algorithm only considers local information among minority class samples while ignoring the distribution of the majority class. This can lead to the generation of samples within majority class regions, creating noisy samples or increasing class overlap, potentially degrading classification performance.
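The interpolation step described above can be illustrated with a minimal NumPy sketch. It is a simplified, SMOTE-style procedure rather than the implementation used in [12] or [13]; the function name `smote_like_oversample` and the toy attack samples are assumptions made for illustration only.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances among minority samples only; the
    # majority class plays no role in where synthetic points are placed.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Hypothetical toy data: two small, well-separated attack clusters.
X_attack = np.array([[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]])
X_synth = smote_like_oversample(X_attack, n_new=6, k=3)
```

Because `k=3` forces each sample to count points from the other cluster as neighbours, some synthetic points can fall on the line segment between the two clusters. If majority-class traffic occupies that region, those points are exactly the noisy or overlapping samples discussed above.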
To tackle these challenges, researchers have introduced several improved methods, such as Borderline-SMOTE [14]. Chunhui Zhang [15] applied the Borderline-SMOTE method to synthesize minority class attack samples and additionally adopted Isolation Forest and the Local Outlier Factor (LOF) for noise reduction within the dataset. Chao Chen [16] proposed an optimization strategy based on Borderline-SMOTE that reassigns threshold values for weighting coefficients, demonstrating improved effectiveness in handling imbalanced classification tasks. Arjun Puri [17] proposed a resampling model combining K-Means SMOTE with Edited Nearest Neighbors (ENN), showing superior performance on binary imbalanced datasets, especially as the percentage of noise increases. Sedat Korkmaz [18] introduced a novel hybrid method based on Differential Evolution oversampling combined with ENN and evaluated its performance using 44 highly imbalanced datasets, demonstrating its ability to alleviate imbalance in most datasets. Subhajit Chatterjee [19] employed a hybrid resampling strategy that integrates Adaptive SMOTE for oversampling with ENN for undersampling.
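The cleaning step shared by the ENN-based hybrids above can be sketched as follows. This is a simplified variant that edits every class by a majority vote of the k nearest neighbours; the cited works may restrict editing to particular classes or use different voting rules. The function name and toy data are illustrative assumptions.

```python
import numpy as np

def edited_nearest_neighbours(X, y, k=3):
    """Drop every sample whose label disagrees with the majority of its
    k nearest neighbours (simplified ENN applied to all classes)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # exclude the sample itself
    nn = np.argsort(dists, axis=1)[:, :k]
    # Fraction of the k neighbours that share the sample's own label.
    agree = (y[nn] == y[:, None]).mean(axis=1)
    keep = agree > 0.5
    return X[keep], y[keep]

# Hypothetical toy data: a normal cluster (label 0), an attack cluster
# (label 1), and one synthetic attack point placed inside the normal cluster.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
              [0.05, 0.05]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
X_clean, y_clean = edited_nearest_neighbours(X, y, k=3)
```

In this toy case only the stray attack point inside the normal cluster is removed, which is the behaviour that makes ENN a natural follow-up to an oversampler that may place samples in majority regions.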
Despite these improvements in specific scenarios, these methods essentially remain confined to the interpolation or replication of existing minority samples. Consequently, they are inherently limited in capturing the complex intrinsic patterns and full diversity of real-world attack data.
Generative Adversarial Networks (GANs) [20] have shown tremendous potential in addressing the class imbalance problem. However, original GANs and their image-focused variants are not directly applicable to the tabular network traffic data commonly encountered in intrusion detection. To address this, researchers developed the Conditional Tabular Generative Adversarial Network (CTGAN), significantly expanding the application domain of GANs. Sudeshna Das [21] utilized CTGAN to generate synthetic network traffic data and constructed an IDS using lightweight feed-forward and convolutional neural networks. Omar Habibi [22] highlighted the limitations of existing GAN models in understanding complex datasets and modeling realistic tabular data, and subsequently used CTGAN to synthesize IoT botnet data and address zero-day threats. Basim Ahmad Alabsi [23] introduced a CTGAN-based IDS aimed at detecting DDoS and DoS attacks in IoT networks. The approach used tabular data synthesized by CTGAN to train a suite of shallow and deep learning classifiers, leading to enhanced detection performance.
Although CTGAN demonstrates superiority over traditional GANs in generating tabular data, it still faces critical limitations when applied to industrial control protocols. First, its generation process does not inherently adhere to the rigid syntactic and structural constraints of network protocols, often resulting in realistic-looking but functionally invalid packets that would be rejected by real network stacks. Second, it lacks explicit mechanisms for optimizing class boundaries, meaning it might produce minority class samples that overlap with majority class samples, blurring the decision boundary and complicating subsequent classification tasks.
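To make the first limitation concrete, the sketch below checks synthetic records against a few structural rules and measures how many would be rejected. The record fields, the whitelist of function codes, and the length bound are all hypothetical and do not correspond to any specific protocol or to the method proposed in this paper.

```python
from dataclasses import dataclass

# Hypothetical constraints, chosen only for illustration.
ALLOWED_FUNCTION_CODES = {1, 2, 3, 4}
MAX_PAYLOAD_LEN = 64

@dataclass
class Record:
    function_code: int
    declared_len: int   # length announced in the header
    payload_len: int    # actual payload length

def is_protocol_valid(rec):
    """Return True only if the record satisfies every structural rule."""
    return (rec.function_code in ALLOWED_FUNCTION_CODES
            and rec.declared_len == rec.payload_len
            and 0 <= rec.payload_len <= MAX_PAYLOAD_LEN)

# Hypothetical generator output: statistically plausible rows, some of
# which break the rules above.
synthetic = [
    Record(3, 8, 8),      # valid
    Record(9, 8, 8),      # unknown function code
    Record(1, 10, 7),     # header length does not match payload
    Record(2, 300, 300),  # payload exceeds the maximum length
]
valid = [r for r in synthetic if is_protocol_valid(r)]
rejection_rate = 1 - len(valid) / len(synthetic)
```

A generator trained only to match marginal and joint statistics has no reason to respect such rules, so post-hoc filtering of this kind discards samples rather than teaching the generator to produce valid ones.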
2.2. Model-Level Approaches for Intrusion Detection
At the model level, researchers in the field of network intrusion detection have continually explored various sophisticated deep learning architectures. The goal is to more effectively extract latent information from network traffic data, thereby enhancing classification performance. Deep learning models are highly favored due to their powerful capabilities in feature learning and pattern recognition. Convolutional Neural Networks (CNN), with their excellent mechanisms for local perception and weight sharing, exhibit significant advantages when processing data possessing spatial structures.
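Local perception and weight sharing can be illustrated with a one-dimensional convolution written in NumPy. This is a minimal sketch, not a model from the cited works; the feature vector and kernel values are hypothetical.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide one shared kernel over x (cross-correlation, no padding).
    Each output depends only on a local window of the input."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    width = len(kernel)
    n_out = len(x) - width + 1
    return np.array([np.dot(x[i:i + width], kernel) for i in range(n_out)])

# Hypothetical feature sequence with a short burst in the middle.
features = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
edge_kernel = np.array([-1.0, 1.0])  # responds to changes between neighbours
response = conv1d_valid(features, edge_kernel)
```

The same two weights are reused at all five positions, so the number of parameters is independent of the input length, and the response is nonzero only where the burst starts and ends.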
Shalini Subramani et al. [24] proposed a model combining CNN with fuzzy inference. A fuzzy CNN framework incorporating spatial and temporal constraints was employed for malicious node detection, enabling subsequent tracking of network and system activities. Ogobuchi Daniel Okey et al. [25] employed a transfer learning-based IDS built upon CNN architectures. They retrained five pre-trained CNN models on a specified dataset. Their experiments indicated that an ensemble model, developed using model averaging with three selected models (InceptionV3, MobileNetV3Small, and EfficientNetV2B0), demonstrated the best performance in image classification tasks. Abdulrahman Mahmoud Eid et al. [26] first employed SMOTE to achieve data balance and subsequently optimized the hyperparameters of a CNN model for IDS classification, validating its generalization ability on the UNSW-NB15 dataset. Amani K. Samha et al. [27], addressing the challenge of identifying zero-day attacks in IDS, constructed a hybrid model combining a CNN and a Deep Watershed Autoencoder (CNN-DWA) for attack identification. Their experiments showed a 3.51% improvement in accuracy compared with a traditional CNN.
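Model averaging, as mentioned for the ensemble in [25], is commonly realized by averaging the class probabilities predicted by each member model. The sketch below shows that reading under stated assumptions; the exact averaging scheme in [25] may differ, and the probability values are hypothetical.

```python
import numpy as np

def average_ensemble(prob_list):
    """Average per-model class probabilities and return the mean
    probabilities together with the predicted class per sample."""
    stacked = np.stack(prob_list, axis=0)   # (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)
    return mean_probs, mean_probs.argmax(axis=1)

# Hypothetical outputs of three models for two samples and two classes
# (column 0: benign, column 1: attack).
model_a = np.array([[0.6, 0.4], [0.3, 0.7]])
model_b = np.array([[0.4, 0.6], [0.2, 0.8]])
model_c = np.array([[0.7, 0.3], [0.6, 0.4]])
mean_probs, labels = average_ensemble([model_a, model_b, model_c])
```

Averaging lets a confident majority outweigh a single dissenting model: for the second sample, `model_c` alone would predict benign, but the averaged probabilities favour the attack class.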
On the other hand, Long Short-Term Memory (LSTM) networks, a crucial variant of Recurrent Neural Networks (RNNs), use gating mechanisms to largely mitigate the gradient problems, in particular vanishing gradients, that affect traditional RNNs. Given the intrinsic temporal nature of network traffic data, LSTM is adept at capturing long-term dependencies within the data, enabling the identification of anomalous behavior patterns concealed within the time dimension. Such patterns might reflect attacker activities like penetration, reconnaissance, or data exfiltration over extended periods. Mohit Sewak et al. [28] conducted experiments with various hyperparameter configurations for LSTM. They observed that, due to the increasing complexity of malware and network protocols, the performance of LSTM networks configured for IDS is highly sensitive to factors like the number of hidden layers, input sequence length, and specific architectural choices. Rubayyi Alghamdi et al. [29] proposed a deep ensemble-based IDS utilizing the Lambda architecture. This approach employed LSTM for binary classification to distinguish between malicious and benign traffic, and an integrated classifier combining LSTM, CNN, and Artificial Neural Networks (ANN) for multi-class classification to detect specific attack types. Pradeepkumar Bhale [30], tackling the issue of inconsistent accuracy in existing IDS when handling high-rate or low-rate DDoS attacks, proposed a distributed IDS solution named OPTIMIST. Its IDS module first synthesizes data using WGAN and then performs offline training based on LSTM. Jun Gao [31] combined LSTM with Feedforward Neural Networks (FNNs) for intrusion detection in SCADA networks, noting that the system could detect both temporally uncorrelated and correlated attacks.
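The gating mechanisms referred to at the start of this paragraph can be written out as a single LSTM time step. The sketch follows the standard LSTM formulation with random, untrained weights; the dimensions and the input sequence are hypothetical, and none of the cited models is reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])           # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much old state to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much state to expose
    g = np.tanh(z[3 * H:4 * H])   # candidate cell content
    c = f * c_prev + i * g        # additive cell-state update
    h = o * np.tanh(c)
    return h, c

# Hypothetical sizes: 5 features per time step, 4 hidden units, 10 steps.
rng = np.random.default_rng(0)
D, H, T = 5, 4, 10
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(T, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state is updated additively (`f * c_prev + i * g`) rather than through repeated matrix multiplication, which is why information and gradients can persist over many time steps when the forget gate stays close to one.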
To leverage the complementary strengths of CNN in spatial feature extraction and LSTM in modeling temporal dependencies, researchers have not only employed these models individually but have also actively explored hybrid models that integrate both. The aim is to simultaneously capture both spatial and temporal characteristics within network traffic data, thereby achieving more comprehensive and precise intrusion detection.
Azizjon Meliboev [32] compared the performance of models including CNN, LSTM, RNN, and GRU on specific datasets of malicious traffic records for classifying network activity as benign or malicious, finding that the CNN-LSTM combination yielded the best results over 100 epochs. Yung-Chung Wang [33] compared the effectiveness of a combined CNN-LSTM model against single models on the CSE-CIC-IDS2018 dataset, noting that while both achieved high accuracy, their inference times differed. Asaad Balla et al. [34] conducted comparative experiments using the Morris power and CICIDS2017 datasets, demonstrating that applying CNN-LSTM to balanced datasets resulted in improved model performance. P Rajesh Kanna [35] highlighted the potential of integrated CNN-LSTM models to enhance large-scale IDS and proposed a unified IDS model featuring an optimized CNN (OCNN) and a Hierarchical Multi-scale LSTM (HMLSTM). The proposed model achieved accuracy rates exceeding 90%.
Despite the advancements discussed above, significant challenges remain in the current landscape of intrusion detection for AMI. First, regarding data augmentation, traditional oversampling methods like SMOTE often introduce noise and class overlap, while standard GAN variants focus on statistical approximation but frequently ignore the rigid syntactic constraints of industrial protocols. This results in the generation of semantically invalid packets that degrade the training quality of downstream classifiers. Second, in terms of detection models, existing deep learning architectures such as ResNet or LSTM often prioritize detection accuracy at the expense of computational efficiency, making them unsuitable for deployment on resource-constrained edge devices. Furthermore, methods that reshape tabular data into image formats rarely address the artifactual spatial bias introduced by such transformations. Consequently, there is a clear need for a unified framework that simultaneously guarantees protocol-compliant data synthesis and achieves lightweight, robust anomaly detection. To address these specific limitations, this paper proposes the MC-CGAN to enforce protocol validity and the ADS-Net to ensure efficient, bias-free classification.