2.1. Anomaly Detection
Because generative AI can rapidly produce dozens or even hundreds of new attack variants, the likelihood that such attacks evade existing detection models increases substantially. This underscores the urgent need for adaptive, continual-learning-based detection systems capable of evolving in real time to counter emerging threats.
Table 1 summarizes related studies in this domain, highlighting prior efforts in feature engineering, model design, and anomaly detection strategies for intrusion detection.
Kim et al. [9] vectorized data for malicious script detection by applying several preprocessing techniques: Abstract Syntax Tree (AST) [10], n-gram [11], and fuzzy hash [12]-based structural features. These feature vectors were fed into an LGBM model, which demonstrated robust detection performance even in code environments where both high- and low-level feature extraction is difficult due to encoding and obfuscation.
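As a concrete illustration of this style of pipeline, the sketch below turns raw script strings into character n-gram count vectors and trains a LightGBM classifier on them; the toy scripts, labels, and hyperparameters are ours rather than those of [9], and AST- and fuzzy-hash-derived vectors would be concatenated onto the same feature matrix.

```python
# Minimal sketch: character n-gram features feeding an LGBM classifier.
# The samples and parameters are illustrative, not the setup of [9].
from sklearn.feature_extraction.text import CountVectorizer
from lightgbm import LGBMClassifier

scripts = [
    "eval(unescape('%61%6c%65%72%74'))",    # obfuscated-looking (toy)
    "document.write(atob(payload))",        # obfuscated-looking (toy)
    "console.log('hello world')",           # benign (toy)
    "function add(a, b) { return a + b; }"  # benign (toy)
]
labels = [1, 1, 0, 0]  # 1 = malicious, 0 = benign

# Character-level n-grams tolerate light obfuscation better than tokens.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 4))
X = vectorizer.fit_transform(scripts)

# AST- and fuzzy-hash-based vectors would be hstack-ed onto X here.
model = LGBMClassifier(n_estimators=50, min_child_samples=1)
model.fit(X, labels)
```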
Panwar et al. [13] utilized Information Gain [14] as a filter-based feature selection method to address the high dimensionality of the CICIDS 2017 [15] dataset. They computed the entropy-based importance of each feature, ranked all 77 features by their information-gain scores, and grouped them into seven subsets. The selected subsets were evaluated with multiple classifiers, including Random Forest (RF), Bayes Net (BN), Random Tree (RT), and Naive Bayes (NB) [16,17,18,19]. Their experiments showed that selecting a smaller set of relevant features not only reduced computational complexity but also improved detection performance: the Random Forest classifier achieved 99.86% accuracy using just 22 selected features, while the J48 classifier obtained a slightly higher 99.87% with 52 features, albeit at a longer execution time. This highlights the importance of effective feature reduction for both accuracy and efficiency in real-time intrusion detection systems.
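The ranking step admits a compact sketch; below, scikit-learn's mutual information is used as a stand-in for the entropy-based information gain of [13,14], scoring synthetic stand-in data with 77 columns and training a Random Forest on the 22 top-ranked features.

```python
# Illustrative information-gain-style feature ranking; mutual information
# serves as a proxy for the entropy-based scores in [13,14].
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 77))            # stand-in for 77 CICIDS 2017 features
y = (X[:, 0] + X[:, 5] > 0).astype(int)   # synthetic labels

scores = mutual_info_classif(X, y, random_state=0)
top22 = np.argsort(scores)[::-1][:22]     # keep the 22 best-ranked features

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X[:, top22], y)
```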
Maseer, Z.K. et al. [20] preprocessed the CICIDS 2017 dataset by handling missing and infinite values, normalizing the data to the range [−3, 3], and applying z-score standardization, resulting in 38 refined features [21,22]. They evaluated ten machine learning algorithms, including k-Nearest Neighbors (k-NN), Decision Tree (DT), and Naïve Bayes (NB) [23,24]. Among these, k-NN, DT, and NB showed superior performance in detecting web-based attacks. The results indicate that proper feature scaling and model selection can significantly improve the accuracy and efficiency of intrusion detection systems (IDS).
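A minimal version of this scaling step, assuming the bounding to [−3, 3] is applied as a clip after standardization, looks as follows; the data are synthetic.

```python
# Sketch of z-score standardization with values bounded to [-3, 3].
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(loc=10.0, scale=5.0, size=(1000, 38))  # 38 refined features

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance
X_clipped = np.clip(X_std, -3.0, 3.0)      # bound extreme outliers
```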
Vibhute, A.D. et al. [25] applied MinMax normalization [26] to the UNSW-NB15 and NSL-KDD [27,28] datasets after handling inconsistencies and missing or zero values. To reduce dimensionality, they used an RF-based feature selection method, selecting the 15 most significant features out of 41. These were used to train a convolutional neural network (CNN) composed of convolutional, pooling, batch normalization, and fully connected layers [29]. The proposed model achieved 99.00% accuracy on the test set, demonstrating that combining effective feature selection with deep learning can significantly enhance detection performance in complex network environments.
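The RF-based selection step can be sketched in a few lines: rank features by Random Forest importance and keep the top 15 of 41. The data and labels below are synthetic placeholders.

```python
# Illustrative RF-based feature selection ahead of a CNN, as in [25].
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 41))          # stand-in for 41 raw features
y = (X[:, 3] - X[:, 7] > 0).astype(int)  # synthetic labels

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top15 = np.argsort(rf.feature_importances_)[::-1][:15]
X_reduced = X[:, top15]                   # input to the downstream CNN
```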
More, S. et al. [30] preprocessed the UNSW-NB15 dataset by removing null values, correcting inconsistent data types, normalizing feature values, and eliminating highly correlated attributes through correlation analysis. The study employed various machine learning models, including Logistic Regression (LR) [31], Support Vector Machine (SVM) [32], Decision Tree (DT), and Random Forest (RF), all of which achieved accuracy scores above 0.9. These results suggest that combining exploratory data analysis with proper feature selection can enhance the reliability and robustness of intrusion detection systems.
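The correlation-analysis step admits a compact sketch: drop one feature from every pair whose absolute Pearson correlation exceeds a threshold (the 0.9 cutoff below is our assumption; [30] does not state one here).

```python
# Illustrative pruning of highly correlated attributes.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.normal(size=(1000, 10)),
                  columns=[f"f{i}" for i in range(10)])
df["f9"] = df["f0"] * 0.99 + rng.normal(scale=0.01, size=1000)  # near-duplicate

corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
df_pruned = df.drop(columns=to_drop)  # drops the redundant "f9"
```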
Altulaihan, E. et al. [33] applied a series of data refinement steps, including null-value removal, categorical encoding [34], noise elimination, and numerical feature scaling, to the IoTID20 dataset [35]. These procedures were designed to enhance data consistency and reduce irrelevant variation prior to feature selection and model training. The findings indicate that such preprocessing improves the quality of the extracted features and enables more reliable anomaly detection in IoT environments, particularly when dealing with noisy or high-dimensional data.
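A hedged sketch of such a refinement pipeline is shown below, combining null removal, one-hot categorical encoding, and numeric scaling; the column names are hypothetical rather than taken from IoTID20.

```python
# Illustrative refinement pipeline: null removal, categorical encoding,
# and numeric scaling. Column names are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp", None],
    "bytes": [120.0, 48.0, 3000.0, 77.0],
})
df = df.dropna()  # null-value removal

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["proto"]),
    ("num", StandardScaler(), ["bytes"]),
])
X = pre.fit_transform(df)  # ready for feature selection and training
```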
The reviewed studies collectively emphasize that effective preprocessing, feature engineering, and dimensionality reduction are critical to enhancing the performance of IDS. Proper feature scaling, normalization, and handling of missing or inconsistent data significantly improve model stability and detection precision across diverse datasets, including CICIDS 2017, UNSW-NB15, and IoTID20. Furthermore, the integration of optimized feature subsets with advanced classifiers or deep learning architectures demonstrates superior performance, underscoring the necessity of coupling data refinement with appropriate model design. These findings highlight that a systematic combination of preprocessing, feature selection, and model optimization is essential for developing real-time, adaptive IDS capable of addressing evolving cyber threats.
Table 1. Summary of anomaly detection.
| References (Year) | Preprocessing Methods Used | Datasets Used | Training Models |
|---|---|---|---|
| Kim, K. et al. (2024) [9] | AST [10], n-gram [11], fuzzy hash [12] | Self-collected dataset | LGBM [8] |
| Panwar, S.S. et al. (2022) [13] | Information gain [14] | CICIDS 2017 [15] | RF [16], BN [17], RT [18], NB [19] |
| Maseer, Z.K. et al. (2021) [20] | Normalization [21], standardization [22] | CICIDS 2017 [15] | NB [19], KNN [23], DT [24] |
| Vibhute, A.D. et al. (2024) [25] | MinMax normalization [26] | CICIDS 2017 [15], UNSW-NB15 [27], NSL-KDD [28] | RT [18], CNN [29] |
| More, S. et al. (2024) [30] | Normalization [21] | UNSW-NB15 [27] | RF [16], DT [24], LR [31], SVM [32] |
| Altulaihan, E. et al. (2024) [33] | Categorical encoding [34] | IoTID20 [35], NSL-KDD [28] | RF [16], KNN [23], DT [24], SVM [32] |
2.2. Continual Learning
Continual learning (CL) is a training paradigm that enables a model to sequentially learn new data or tasks while retaining previously acquired knowledge. It aims to mitigate catastrophic forgetting and allows the model to adapt to evolving environments and shifting data distributions over time [36,37].
Liang, Y.S. et al. [38] proposed a continual learning framework that introduces task-specific Low-Rank Adaptation (LoRA)-like branches, consisting of fixed dimensionality-reduction matrices and learnable adapters. By projecting each new task's gradients into a subspace orthogonal to the gradients of previous tasks, the method effectively mitigates task interference. This architecture enables efficient and modular task expansion without retraining the pre-trained model, offering a well-balanced trade-off between stability and plasticity while preserving parameter efficiency.
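The projection idea can be conveyed with a toy example: strip from the new task's gradient its components along an orthonormal basis spanned by previous tasks' gradients, so that updates leave earlier tasks' behavior untouched. This is a schematic reading of [38], not its exact procedure.

```python
# Toy orthogonal gradient projection for continual learning.
import numpy as np

def project_orthogonal(grad, basis):
    """Remove from `grad` its components along the orthonormal columns of `basis`."""
    for k in range(basis.shape[1]):
        u = basis[:, k]
        grad = grad - (grad @ u) * u
    return grad

prev = np.linalg.qr(np.random.randn(64, 3))[0]  # basis of old-task gradient directions
g_new = np.random.randn(64)
g_proj = project_orthogonal(g_new, prev)
assert np.allclose(prev.T @ g_proj, 0, atol=1e-8)  # no interference with old tasks
```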
Yu, J. et al. [39] proposed a parameter-efficient continual learning framework built upon Contrastive Language–Image Pretraining (CLIP), integrating a Mixture-of-Experts (MoE) architecture with task-specific routers and LoRA adapters. To address catastrophic forgetting and support task-specific adaptation, the model selectively activates a subset of experts for each task while keeping the remaining experts frozen. The framework also introduces a Distribution Discriminative Auto-Selector (DDAS), which automatically identifies the task by analyzing the input distribution, enabling fully automated continual learning without explicit task IDs. Extensive experiments on multi-domain Task-Incremental Learning (TIL) and Class-Incremental Learning (CIL) benchmarks demonstrate that the method surpasses previous approaches while maintaining CLIP’s zero-shot generalization capability.
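The routing mechanism at the heart of such designs can be sketched compactly: a router scores all experts and only the top-k are evaluated and mixed. The dimensions, k, and expert shapes below are illustrative, not the configuration of [39].

```python
# Minimal top-k expert routing in the spirit of an MoE layer.
import torch
import torch.nn as nn

d, n_experts, k = 32, 8, 2
router = nn.Linear(d, n_experts)
experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))

x = torch.randn(4, d)
scores = router(x)                            # (batch, n_experts)
topk_vals, topk_idx = scores.topk(k, dim=-1)  # activate only k experts
weights = torch.softmax(topk_vals, dim=-1)

out = torch.zeros_like(x)
for b in range(x.size(0)):                    # mix the selected experts
    for j in range(k):
        out[b] += weights[b, j] * experts[topk_idx[b, j]](x[b])
```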
Gao, Z. et al. [40] proposed the Consistent Prompting (CPrompt) framework, which enhances continual learning by incorporating two consistency-driven modules: Classifier Consistency Learning (CCL) and Prompt Consistency Learning (PCL). CCL mitigates catastrophic forgetting through smooth regularization, preventing outdated classifiers from overpowering current predictions. PCL addresses the misalignment between task-specific prompts and classifiers by introducing random prompt selection and auxiliary supervision. Experimental results on multiple benchmarks demonstrate that CPrompt significantly improves both accuracy and robustness in class-incremental learning settings while maintaining parameter efficiency.
Marczak, D. et al. [41] presented Maximum Magnitude Selection (MagMax), a novel continual learning framework designed for exemplar-free settings. The method leverages sequential fine-tuning and task-vector merging, selecting the parameters with the highest magnitude of change across tasks to construct a merged task representation that preserves essential knowledge while minimizing interference. Empirical results show that a small subset of high-impact parameters largely determines performance and that sequential fine-tuning significantly reduces parameter sign conflicts. These findings demonstrate the effectiveness of MagMax in enabling efficient knowledge consolidation and stable continual adaptation.
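The merging rule itself is simple enough to state in code: for every parameter, keep the per-task delta with the largest magnitude, then add the merged delta back to the pre-trained weights. The values below are toy numbers.

```python
# Toy maximum-magnitude merging of task vectors, in the spirit of MagMax.
import numpy as np

theta_pre = np.zeros(6)                        # pre-trained parameters (toy)
task_vectors = np.array([                      # theta_task - theta_pre, per task
    [ 0.1, -0.5,  0.0,  0.3, -0.1,  0.2],
    [-0.4,  0.2,  0.6, -0.1,  0.0, -0.3],
])

winner = np.abs(task_vectors).argmax(axis=0)   # task with max |delta| per parameter
merged = task_vectors[winner, np.arange(task_vectors.shape[1])]
theta_merged = theta_pre + merged              # consolidated model
```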
Le, M. et al. [42] proposed a unified continual learning framework that combines Mixture-of-Experts (MoE), prefix tuning, and a novel Non-linear Residual Gating (NoRGA) mechanism. By interpreting prefix tuning as a modular expert-selection process within the MoE formulation, the framework enables task-specific adaptation without modifying the backbone. It integrates both shared pre-trained experts and prefix-specific experts, selected through a dynamic gating function. To overcome the limitations of linear gating, NoRGA introduces non-linear activation and residual connections, significantly improving parameter estimation under limited data. Theoretical analysis further establishes convergence guarantees and demonstrates the statistical efficiency of the proposed method.
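One way to picture a non-linear residual gate is sketched below: the expert scores combine a non-linearly transformed path with a plain linear path acting as the residual. This is our loose reading of the idea; the exact formulation in [42] may differ, and all dimensions are illustrative.

```python
# Loose sketch of a non-linear gate with a residual linear path.
import torch
import torch.nn as nn

class NonlinearResidualGate(nn.Module):
    def __init__(self, d, n_experts):
        super().__init__()
        self.score = nn.Linear(d, n_experts)  # shared scoring head
        self.hidden = nn.Linear(d, d)         # non-linear transform path

    def forward(self, x):
        h = torch.tanh(self.hidden(x))        # non-linear path
        return torch.softmax(self.score(h) + self.score(x), dim=-1)

gate = NonlinearResidualGate(d=16, n_experts=4)
probs = gate(torch.randn(2, 16))              # mixture weights over experts
```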
Although these approaches demonstrate progress in addressing catastrophic forgetting, they also reveal limitations: LoRA- and MoE-based methods introduce additional task-specific modules that increase model complexity over time; automated selectors like DDAS assume stable distributions that may not hold in adversarial environments; consistency-driven strategies such as CPrompt require careful tuning and struggle with ambiguous task boundaries; exemplar-free designs like MagMax reduce plasticity; and unified frameworks with complex gating mechanisms face optimization challenges. These limitations highlight that while continual learning methods have advanced in benchmark settings, their scalability, efficiency, and adaptability to unpredictable real-world conditions—particularly in intrusion detection—remain unresolved.
To address this gap, our work introduces a lightweight continual learning framework that leverages the encoder–memory module of memory-augmented autoencoders while replacing the decoder with a gradient-free LightGBM (LGBM) classifier. This design reduces computational overhead, avoids reconstruction bias, and enables efficient memory updates, making it well-suited for real-time intrusion detection where both adaptability and efficiency are essential.
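To make the design concrete, the sketch below wires an encoder to a soft memory-addressing step and hands the memory-augmented features to a gradient-free LGBM classifier; every name, shape, and value here is a hypothetical placeholder rather than our exact implementation.

```python
# Hypothetical sketch of the proposed pipeline: encoder -> memory
# addressing -> LGBM classifier (no decoder, no reconstruction loss).
import torch
import torch.nn as nn
import torch.nn.functional as F
from lightgbm import LGBMClassifier

d_in, d_lat, n_slots = 38, 16, 32
encoder = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_lat))
memory = torch.randn(n_slots, d_lat)          # memory slots (fixed in this toy)

def memory_features(x):
    z = encoder(x)                            # latent code
    attn = F.softmax(z @ memory.T, dim=-1)    # soft addressing over slots
    z_hat = attn @ memory                     # memory-augmented representation
    return torch.cat([z, z_hat], dim=-1)

X = torch.randn(256, d_in)                    # stand-in for flow features
y = (X[:, 0] > 0).long()                      # synthetic labels

feats = memory_features(X).detach().numpy()   # gradient-free hand-off
clf = LGBMClassifier(n_estimators=100, min_child_samples=5)
clf.fit(feats, y.numpy())
```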