Cognitive Network Intrusion Detection Systems: Anomaly and Malware Detection for Zero-Day Attack Resilience

Gunawan, Jimmy Agung; Singgih, Moses Laksono; Ginardi, Raden Venantius Hari

doi:10.3390/network6020041

by Jimmy Agung Gunawan, Moses Laksono Singgih * and Raden Venantius Hari Ginardi

Interdisciplinary School of Management Technology, Institut Teknologi Sepuluh Nopember, Surabaya 60264, Indonesia

* Author to whom correspondence should be addressed.

Network 2026, 6(2), 41; https://doi.org/10.3390/network6020041 (registering DOI)

Submission received: 15 December 2025 / Revised: 29 May 2026 / Accepted: 15 June 2026 / Published: 18 June 2026

(This article belongs to the Special Issue Latest Advancements in Machine Learning Applications for Cybersecurity)

Abstract

Traditional Network Intrusion Detection Systems (NIDSs) face persistent challenges in detecting zero-day attacks due to concept drift, high false-positive rates, and limited adaptability. This research introduces a Cognitive Network Intrusion Detection System (CNIDS) whose central novelty is that effective zero-day handling does not arise from any single mechanism but from the interaction between continual representation learning, persistent vector memory, and human-aligned feedback. By reframing zero-day resilience as a continuous learning process rather than a static detection task, CNIDS emphasizes adaptive operational behavior over raw automated accuracy. The proposed framework integrates Continual Pre-Training (CPT) to align representations with evolving traffic, Supervised Fine-Tuning (SFT) to preserve precision on known attacks, and a Human-in-the-Loop Reinforcement Signal (HRS) that converts low-confidence alerts into structured learning updates. These components are unified through a vector database that functions as long-term episodic memory, enabling similarity-based reasoning and cross-dataset generalization. Ablation results show that disabling any component degrades zero-day adaptation: removing CPT increases drift sensitivity, removing vector memory prevents knowledge retention, and removing human feedback collapses learning to static inference. Using a class-exclusion zero-day protocol on NSL-KDD, UNSW-NB15, and CICIDS2017, CNIDS raises zero-day detection from 0% to 18.2% while maintaining precision above 80% and stabilizing false positives.

Keywords:

cognitive network intrusion detection; human-in-the-loop learning; vector database memory; zero-day adaptation; network security

1. Introduction

The rapid evolution of cyberattacks poses critical challenges to the security of networks. Conventional Intrusion Detection Systems (IDSs) largely depend on static signatures or pre-trained models that cannot easily adapt to novel threats. AI-enhanced Cognitive Network Intrusion Detection Systems (CNIDSs) offer new capabilities for contextual reasoning and adaptive learning. This study proposes a next-generation IDS framework that integrates a vector database (vectorDB) with machine learning (ML) and human-in-the-loop decision-making for robust and explainable intrusion detection. An IDS is a cornerstone of modern cybersecurity infrastructure, serving as a vigilant sentinel that monitors network traffic and system activities for signs of malicious behavior [1,2]. The evolution of IDS technology has progressed through distinct phases [3], from early signature-based systems that relied on predefined attack patterns to anomaly-based approaches that utilize statistical methods and machine learning (ML) algorithms [4,5,6]. However, traditional IDS architectures face fundamental limitations when confronted with the sophisticated threat landscape of contemporary cyberspace [3,6,7,8]. The deployment challenges of current IDS implementations are multifaceted and severe in nature. Organizations report that up to 80% of security alerts constitute false positives [9,10], overwhelming security teams and creating dangerous blind spots where genuine threats can go undetected [11,12,13]. An average enterprise processes over 11,000 alerts daily, with 30% remaining uninvestigated owing to resource constraints and alert fatigue [14,15,16]. Furthermore, traditional systems struggle to analyze encrypted traffic [17], detect zero-day attacks [10,18,19], and adapt to dynamic threats [20,21], creating significant vulnerabilities in modern network environments [7]. 
Artificial Intelligence (AI), particularly in the form of ML, offers transformative potential to address these challenges through enhanced contextual reasoning [8,22], pattern recognition [23,24,25], and adaptive learning capabilities [22,24,25,26]. ML can process both structured network data and unstructured security logs, enabling deeper threat analysis and more accurate classification of security events [27,28,29]. The exceptional performance of ML-enhanced systems, which achieve detection accuracies exceeding 97% while reducing false-positive rates by up to 50%, demonstrates the significant advantages of AI integration in cybersecurity applications [25,30,31,32]. Fine-tuning ML through specialized training methodologies is crucial for achieving optimal cybersecurity performance [28,33,34]. CPT enables models to absorb domain-specific knowledge from unlabeled network traffic and security logs, thereby adapting to organization-specific communication patterns and baseline behaviors [35,36]. This approach has demonstrated performance improvements of up to 40.7% while requiring only 40% of the traditional training resources [37,38,39]. Supervised Fine-Tuning leverages expert-labeled cybersecurity datasets to teach precise attack pattern recognition, enabling the accurate identification of known threats and attack signatures [40,41,42]. The integration of specialized cybersecurity datasets, such as merged CyNER and APTNER resources, provides comprehensive training for security-specific entity recognition and threat classification [43,44,45]. Rather than competing with fully automated zero-day detectors, this work focuses on accelerating the transition from uncertainty to actionable knowledge through human-aligned adaptation. The contribution lies in demonstrating how continual representation learning, vector-memory reasoning, and analyst feedback jointly reduce zero-day exposure time while preserving precision and operational trust. 
This research does not claim state-of-the-art zero-day detection accuracy under fully automated inference. Instead, it evaluates how quickly CNIDS converts zero-day uncertainty into reliable knowledge while preserving precision and operational trust. Our contributions are (i) a deployment-oriented CPT–SFT–HRS architecture with a vector memory that enables similarity-aware decisions and cross-dataset reuse; (ii) a class-exclusion protocol for zero-day evaluation across NSL-KDD, UNSW-NB15, and CICIDS2017; and (iii) a focus on learning velocity and post-feedback precision stabilization as operational metrics that better reflect real-world resilience than static zero-day accuracy alone. This research further bounds simulated human feedback (delay, label noise assumptions) to present an upper-bound scenario for analyst availability and to maintain a conservative interpretation of gains.

2. Literature Study

Recent advancements in Network Intrusion Detection Systems (NIDSs) have increasingly emphasized the integration of machine learning (ML) and real-time analytics to address evolving cybersecurity threats. Traditional NIDS architectures often rely on static rule-based engines or isolated supervised models, limiting their adaptability to novel attack patterns. In contrast, CNIDS introduces a dual-process architecture that combines pretrained models, Supervised Fine-Tuning, and feedback-driven human reinforcement within a unified cognitive loop. As illustrated in Figure 1, the system comprises two interconnected workflows: (1) a training pipeline that processes heterogeneous data sources through normalization, feature extraction, and multi-stage learning to populate a vectorDB and (2) a detection cycle that continuously monitors real-time traffic, generates alerts, and incorporates analyst feedback to identify and assimilate zero-day threats. This closed-loop design enables the CNIDS to evolve dynamically, bridging the gap between automated inference and expert-guided learning.

2.1. Continual Pre-Training (CPT)

CPT continuously adapts internal representations to evolving traffic patterns. An LSTM-based temporal model processes incoming feature streams to capture long-term dependencies and mitigate concept drift [46,47,48,49,50]. The process begins with a pre-trained base model and incorporates curated or synthetic datasets relevant to the target domain. To preserve existing knowledge while integrating new information, CPT employs techniques such as mixing general and domain-specific data, curriculum learning, and replay buffers. Hyperparameters, such as the learning rate and replay ratio, are carefully tuned to balance retention and adaptation, thereby minimizing the risk of catastrophic forgetting [51,52,53]. Model selection for CPT: The LSTM architecture was chosen because network traffic exhibits strong temporal dependencies, including packet order, burst patterns, and inter-arrival times [54,55,56], which recurrent networks are specifically designed to model. Moreover, LSTM offers a favorable trade-off between expressiveness and inference latency, making it suitable for real-time intrusion detection. In preliminary experiments, a Support Vector Machine (SVM) with an RBF kernel performed poorly on high-dimensional, non-linear traffic features without extensive manual kernel tuning, and it does not naturally accommodate sequential data. Generative Adversarial Networks (GANs) were not adopted because generating adversarial network flows for zero-day simulation would introduce substantial complexity and is orthogonal to our cognitive learning loop, which emphasizes representation adaptation and human feedback rather than synthetic data generation [57,58]. Transformer-based models, while capable of higher accuracy, have significantly higher inference costs (memory and latency) that are currently prohibitive for many deployment scenarios, especially where real-time processing on edge or limited-resource infrastructure is required. 
Nevertheless, we acknowledge transformers as a promising direction and list transformer-based IDS as a key area for future work. CPT operates on both labeled and unlabeled data, enabling proactive adaptation before explicit attack labels are available. In this research, CNIDS assesses incoming traffic using the CPT module to adapt its representations.

2.2. Supervised Fine-Tuning (SFT)

SFT is a direct and efficient method for adapting a pre-trained ML model to perform specific tasks using labeled data [59,60]. It is widely used to teach models to handle well-defined objectives, such as classification, summarization, or translation, by aligning outputs with human expectations through explicit examples. Although originally developed for LLMs, the same Supervised Fine-Tuning principles apply to our network traffic classifiers. The process begins with a general-purpose LLM and a curated dataset of input–output pairs, such as questions and their correct answers. During training, the model minimizes the prediction errors using a token-level cross-entropy loss function and updates its weights through gradient descent. Padding and label shifting are applied to ensure accurate sequence alignment, particularly in conversational or instruction-following formats [60,61,62]. SFT is based on supervised learning, in which the model learns directly from ground-truth labels. It is commonly implemented using a prompt-completion format, making it suitable for tasks that require structured responses [59,63,64,65]. Its simplicity and effectiveness make it ideal for domain adaptation, especially when the objectives are clear and labeled data are available. SFT enables models to specialize without sacrificing their general capabilities, making it a foundational step in many AI training pipelines across industries. The SFT module performs real-time classification by matching incoming traffic vectors with stored representations in the vectorDB. Based on the similarity thresholds and confidence scores, the traffic is classified as benign, a known attack, or uncertain. The SFT module provides explainable decisions by explicitly referencing the matched historical patterns.

2.3. Feedback-Driven Online Reinforcement (Human-in-the-Loop Reinforcement Signal)

HRS is an advanced method for aligning ML with human values and preferences [66,67,68]. HRS begins with a pre-trained ML model and introduces a reward model trained on human feedback using pairwise comparisons or scalar ratings to evaluate the output quality. The model is then fine-tuned using reinforcement learning algorithms, such as Proximal Policy Optimization (PPO), to optimize the responses and maximize human-aligned rewards [69,70,71]. Key concepts include human-in-the-loop learning, where feedback actively shapes model behavior, and balancing exploration with exploitation to ensure novelty and reliability. HRS enhances model helpfulness, safety, and coherence, offering fine-grained control over the outputs. This is a critical technique for enhancing ethical alignment and minimizing harmful or irrelevant responses in open-ended generation tasks, thereby making AI systems more trustworthy and operationally relevant in real-world applications [72,73]. HRS represents a powerful approach to continuous system improvement by incorporating real-time analyst feedback into model optimization [74,75,76]. This methodology enables systems to learn analyst preferences, reduce false positives, and ensure that alerts are aligned with operational priorities [68,77]. Research has demonstrated that HRS can improve model safety by 34.2% and helpfulness by 34.3% while significantly reducing the burden on security teams through intelligent alert prioritization and contextual enrichment [67,78,79,80]. When the classification confidence is insufficient or anomalous behavior is detected, the CNIDS invokes human analyst feedback. A summary of the methods for enhancing the IDS is presented in Table 1.

2.4. Related Works and Research Gap

Existing IDS solutions include classical ML, deep learning, and hybrid models. Recent advances have introduced transformer- and graph-based anomaly detection. However, limited work has explored vector-memory IDSs, especially ones that combine ML with continual learning capabilities and focus on zero-day attacks. This research positions the proposed framework as an extension of this emerging research direction. Despite significant advances in AI-driven cybersecurity, two critical research gaps persist that limit the effectiveness of current IDS implementations.

2.4.1. Gap 1: Limited Integration of Multimodal Learning with Domain-Specific Adaptation

Current network IDSs predominantly address either structured traffic analysis or unstructured log processing, yet few integrate both modalities within a unified machine learning (ML) framework. Although existing ML-based approaches demonstrate potential for traffic analysis, they generally lack comprehensive domain adaptation mechanisms. The absence of CPT strategies tailored to cybersecurity domains leads to suboptimal performance when encountering novel attack vectors or evolving network environments. This limitation is particularly pronounced in enterprise settings, where network behavior changes dynamically due to infrastructure modifications, new applications, and evolving business processes. Prior work has shown that most ML-based IDSs rely on manually engineered flow features while disregarding raw log semantics [10] and that effective integration between alert correlation and heterogeneous data sources remains elusive [12]. Furthermore, dataset integration across different formats persists as an open challenge [21]. Specific to the lack of cybersecurity-oriented Continual Pre-Training, existing pre-training methods for network traffic are predominantly static and fail to adapt to evolving attack patterns [35], a limitation that directly motivates our use of CPT.

2.4.2. Gap 2: Insufficient Human–AI Collaboration Frameworks for Real-Time Threat Response

Existing intrusion detection systems (IDSs) inadequately integrate human expertise through systematic feedback mechanisms. Current deployments generate an overwhelming volume of alerts, with false-positive rates reaching 80%, which induces analyst fatigue and increases the risk of missed critical threats [13,15]. Prior controlled experiments have demonstrated that false alarm rates exceeding 30% significantly degrade analyst performance [13], and cloud-based anomaly detection systems have been shown to suffer from false-positive rates above 80% in practice [15]. Therefore, security operations centers devote more than half of their operational time to false alerts, often at the expense of genuine threat detection [77]. Although HRS has proven effective in aligning machine learning behavior with human preferences in domains such as natural language processing and robotics, its application within cybersecurity remains underexplored [66,67,68]. The absence of structured frameworks for incorporating analyst feedback into continuous model improvement creates a persistent disconnect between automated detection capabilities and operational security requirements, thereby limiting the practical effectiveness of ML or AI-driven IDS in real-world deployments.

3. Methodology and Experimental Setup

The proposed CNIDS framework employs a comprehensive three-stage training pipeline designed to optimize ML performance for intrusion detection tasks. The methodology incorporates established cybersecurity datasets and evaluation metrics to ensure a rigorous performance assessment and real-world applicability.

3.1. Overall Architecture

The proposed CNIDS adopts a layered and service-oriented architecture designed to address evolving network behaviors and zero-day attacks through continuous learning and human feedback. The system integrates CPT, SFT, and HRS into a unified cognitive loop. CNIDS is composed of five principal layers:

Data acquisition and preprocessing;
Unified vectorDB, which functions as a long-term episodic memory rather than a generative retriever;
Cognitive learning core;
Decision and response layer;
Human-in-the-loop feedback layer.

This modular design ensures scalability, adaptability to concept drift, and operational robustness in real-time settings. This study uses scalar human feedback as a reinforcement signal for model and memory updates, rather than full policy optimization (e.g., PPO).

3.2. Data Acquisition and Feature Representation

This study employs three widely used intrusion detection benchmarks—NSL-KDD, UNSW-NB15, and CICIDS2017—which collectively provide diverse representations of benign and malicious network traffic. These datasets are selected to ensure robustness across varying traffic patterns and attack taxonomies. All preprocessing steps are designed to maintain consistency, prevent data leakage, and support reproducible evaluation under the class-exclusion protocol.

3.2.1. Data Preprocessing and Encoding

To standardize feature distributions, all numerical features are normalized using z-score normalization, $x' = \frac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are computed strictly from the training set to avoid leakage. Categorical features are encoded using training-set-only transformations (e.g., label or target encoding). Categories not observed during training are mapped to an “unknown” token during testing. This ensures that unseen patterns introduced by zero-day samples are not inadvertently learned during preprocessing.
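
The leakage-free preprocessing described above can be illustrated with a minimal sketch. It assumes NumPy arrays for the numerical features and plain label encoding for the categorical ones; the function names, the `<unknown>` token string, and the toy values are hypothetical and are not taken from the study's code.

```python
import numpy as np

def fit_zscore(train):
    """Compute per-feature mean and standard deviation from the training set only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return mu, sigma

def apply_zscore(x, mu, sigma):
    """Normalize any split with statistics that were fitted on the training set."""
    return (x - mu) / sigma

def fit_label_encoder(train_values):
    """Map each training category to an integer; index 0 is reserved for unseen values."""
    vocab = {"<unknown>": 0}
    for value in train_values:
        vocab.setdefault(value, len(vocab))
    return vocab

def encode(values, vocab):
    """Encode categories, sending anything not seen in training to the unknown token."""
    return [vocab.get(value, vocab["<unknown>"]) for value in values]

# Toy data (illustrative only).
train_num = np.array([[1.0, 10.0], [3.0, 30.0]])
test_num = np.array([[5.0, 20.0]])
mu, sigma = fit_zscore(train_num)
test_scaled = apply_zscore(test_num, mu, sigma)

vocab = fit_label_encoder(["tcp", "udp", "tcp"])
codes = encode(["udp", "sctp"], vocab)  # "sctp" was never seen in training
```

Because the test split never contributes to `mu`, `sigma`, or `vocab`, a withheld attack class cannot influence the preprocessing step.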

3.2.2. Feature Inclusion and Selection

Feature selection is governed by an explicit, data-driven criterion based on Information Gain (IG), $IG(f) = H(y) - H(y \mid f)$; only features satisfying $IG(f) \geq 0.4$ are retained. This threshold ensures that selected features provide meaningful discriminatory power while avoiding arbitrary manual exclusion. The resulting feature space is therefore consistent, reproducible, and aligned with the underlying class structure.
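
As a rough illustration of the IG criterion, the sketch below computes $H(y) - H(y \mid f)$ for discrete features and keeps those at or above the 0.4 threshold. It assumes entropies measured in bits (base-2 logarithm) and already-discretized features; the source does not state the logarithm base or the binning scheme, so both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """IG(f) = H(y) - H(y | f) for one discrete feature column."""
    h_y_given_f = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # Weight each conditional entropy by the frequency of the feature value.
        h_y_given_f += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_y_given_f

def select_features(X, y, threshold=0.4):
    """Return the column indices whose information gain meets the threshold."""
    return [j for j in range(X.shape[1]) if information_gain(X[:, j], y) >= threshold]

# Toy data: column 0 determines the label, column 1 is uninformative.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 1, 1])
kept = select_features(X, y)
```

To respect the leakage constraint stated earlier, the gain values would be computed on the training split only.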

3.2.3. Correlation Analysis Between Known and Zero-Day Samples

To ensure that the class-exclusion protocol does not produce a trivial or random evaluation, this research analyzes the relationship between known and withheld classes using two complementary measures. Mutual Information (MI) between features and class labels is computed as $MI(f; y) = \sum_{f, y} p(f, y) \log \frac{p(f, y)}{p(f)\, p(y)}$. The analysis shows that while high-MI features for known classes remain partially informative, several features exhibit reduced MI for withheld classes, indicating a shift in feature relevance. This confirms that zero-day samples are not trivially captured by the same feature dependencies. Cosine similarity between feature vectors is computed as $\mathrm{sim}(x_i, x_j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert \lVert x_j \rVert}$ and is used to compare intra-known similarity with known-to-zero similarity. Empirically, this research observes $E[\mathrm{sim}_{\text{known-known}}] \gg E[\mathrm{sim}_{\text{known-zero}}]$, indicating that zero-day samples occupy distinct regions in the feature space. This separation introduces a non-trivial detection challenge consistent with open-set conditions. As shown by the MI and cosine similarity analyses, the feature space does not trivially separate zero-day samples from normal traffic; instead, the zero-day samples are detectable in principle (they are not pure noise) but lie far from known attack clusters. Thus, this is not a random trial: the withheld classes are chosen to maximize distributional shift while remaining realistic, and the feature space is analyzed to ensure that detection is not trivially impossible.
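
The similarity comparison above can be sketched as follows. This is a minimal illustration on made-up two-dimensional vectors, not the study's analysis; the helper name and the `exclude_self` option are hypothetical. It only shows how the two expectations, intra-known and known-to-zero, could be estimated as mean pairwise cosine similarities.

```python
import numpy as np

def mean_cosine(A, B, exclude_self=False):
    """Mean pairwise cosine similarity between the rows of A and the rows of B."""
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    sims = A_unit @ B_unit.T
    if exclude_self:
        # Only meaningful when A and B are the same set: drop each row's match with itself.
        off_diagonal = ~np.eye(len(A), dtype=bool)
        return float(sims[off_diagonal].mean())
    return float(sims.mean())

# Toy vectors: known samples cluster near one axis, withheld samples near the other.
known = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]])
zero_day = np.array([[0.1, 1.0], [0.0, 0.9]])

sim_known_known = mean_cosine(known, known, exclude_self=True)
sim_known_zero = mean_cosine(known, zero_day)
```

A large gap between the two means is what the text describes as zero-day samples occupying distinct regions of the feature space.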

3.2.4. Implications for Zero-Day Detection

These results demonstrate that zero-day samples are distributionally shifted yet structurally related to known classes. Consequently, detection cannot rely solely on static classification boundaries. Instead, the framework leverages CPT to adapt feature representations, VectorDB to establish similarity-based boundaries, and HRS to incorporate new knowledge.

3.2.5. Experimental Validity

The class-exclusion protocol is designed to maximize distributional shift while preserving detectability, ensuring a controlled and reproducible evaluation. Therefore, it should be interpreted as a systematic approximation of zero-day conditions rather than a fully unconstrained real-world scenario.
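
A class-exclusion split of this kind can be sketched in a few lines. The example assumes NumPy arrays with string class labels; the function name, the toy labels, and the choice of withheld class are hypothetical and do not reproduce the study's selection of withheld categories.

```python
import numpy as np

def class_exclusion_split(X, labels, withheld):
    """Separate samples of withheld classes from the pool used for supervised training."""
    is_withheld = np.isin(labels, list(withheld))
    known = (X[~is_withheld], labels[~is_withheld])   # visible during training
    zero_day = (X[is_withheld], labels[is_withheld])  # introduced only at test time
    return known, zero_day

# Toy data: five flows, with "probe" treated as the withheld (zero-day) class.
X = np.arange(10, dtype=float).reshape(5, 2)
labels = np.array(["benign", "dos", "probe", "dos", "benign"])
(known_X, known_y), (zero_X, zero_y) = class_exclusion_split(X, labels, withheld={"probe"})
```

In an evaluation, the withheld partition would be added to the test stream alongside held-out known traffic, so that the detector meets the excluded class for the first time at inference.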

3.3. Unified Vector Database (VectorDB)

The unified vectorDB serves as long-term memory for CNIDS. It stores feature vectors and their semantic annotations, including the attack labels and zero-day indicators. Formally, the vectorDB is defined as $V = \{(x_i, y_i, z_i)\}_{i=1}^{N}$, where $y_i \in \{0, 1\}$ denotes benign or malicious traffic and $z_i \in \{0, 1\}$ indicates known or zero-day attacks. This memory enables efficient similarity searches, anomaly scoring, and incremental knowledge accumulation across various domains.
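
The triple structure of the memory can be illustrated with a small in-memory sketch. It assumes that $y = 1$ marks malicious traffic and $z = 1$ marks a zero-day entry, which the text implies but does not state, and it uses a NumPy brute-force search rather than the persistent store used in the study; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class VectorMemory:
    """In-memory stand-in for the vectorDB: stores (x_i, y_i, z_i) triples."""
    vectors: list = field(default_factory=list)
    malicious: list = field(default_factory=list)  # y_i (assumed: 1 = malicious)
    zero_day: list = field(default_factory=list)   # z_i (assumed: 1 = zero-day)

    def add(self, x, y, z):
        """Append one annotated feature vector to the memory."""
        self.vectors.append(np.asarray(x, dtype=float))
        self.malicious.append(int(y))
        self.zero_day.append(int(z))

    def nearest(self, x):
        """Return (index, cosine similarity) of the most similar stored vector."""
        stored = np.vstack(self.vectors)
        x = np.asarray(x, dtype=float)
        sims = (stored @ x) / (np.linalg.norm(stored, axis=1) * np.linalg.norm(x))
        best = int(np.argmax(sims))
        return best, float(sims[best])

memory = VectorMemory()
memory.add([1.0, 0.0], y=0, z=0)  # benign pattern
memory.add([0.0, 1.0], y=1, z=0)  # known attack pattern
idx, sim = memory.nearest([0.1, 0.9])
```

Returning the index of the matched entry, not only its label, is what lets a decision be traced back to a specific historical pattern.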

3.4. Foundational Cognitive Learning Framework

3.4.1. Unsupervised Continued Pre-Training (CPT)

CPT continuously adapts internal representations to evolving traffic patterns. An LSTM-based temporal model processes incoming feature streams to capture long-term dependencies and mitigate concept drift. The hidden state update is defined as $h_t = \mathrm{LSTM}(x_t, h_{t-1})$, where $h_t$ represents the learned temporal context at time $t$. CPT operates on both labeled and unlabeled data, enabling proactive adaptation before explicit attack labels are available. In the CPT phase, the LSTM-based temporal model is trained on unlabeled and partially labeled traffic streams to learn evolving network patterns, which are encoded in the hidden state $h_t$. Data from each dataset were introduced sequentially to emulate the concept drift over time. The CPT module continuously updates its parameters without resetting the previously learned representations, thereby allowing for incremental adaptation to new traffic behaviors.

3.4.2. Supervised Task-Specific Fine-Tuning (SFT)

Following the CPT, Supervised Fine-Tuning was performed using the labeled samples stored in the vectorDB. Known attack and benign samples were used to refine the classifier’s decision boundaries. The SFT module was trained incrementally in batches to reflect online deployment scenarios rather than offline retraining. The SFT module performs real-time classification by matching incoming traffic vectors with stored representations in the vectorDB. Cosine similarity is used to quantify the closeness between vectors, $\mathrm{sim}(x, x_i) = \frac{x \cdot x_i}{\lVert x \rVert \lVert x_i \rVert}$; based on the similarity thresholds and confidence scores, the traffic is classified as benign, a known attack, or uncertain. The SFT module provides explainable decisions by explicitly referencing the matched historical patterns.
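
The three-way decision described above can be sketched as a nearest-neighbor lookup with a similarity cut-off. The threshold value of 0.9, the function name, and the toy vectors are assumptions for illustration; the source does not report its thresholds, and this sketch uses similarity alone, without the separate confidence score the text mentions.

```python
import numpy as np

def classify_by_similarity(x, stored, labels, threshold=0.9):
    """Label x after its most similar stored vector, or 'uncertain' if nothing is close enough."""
    x = np.asarray(x, dtype=float)
    stored = np.asarray(stored, dtype=float)
    sims = (stored @ x) / (np.linalg.norm(stored, axis=1) * np.linalg.norm(x))
    best = int(np.argmax(sims))
    similarity = float(sims[best])
    if similarity >= threshold:
        # Returning the matched index keeps the decision traceable to a stored pattern.
        return labels[best], best, similarity
    return "uncertain", best, similarity

stored = [[1.0, 0.0], [0.0, 1.0]]
labels = ["benign", "known attack"]
close = classify_by_similarity([0.05, 1.0], stored, labels)  # near the attack pattern
far = classify_by_similarity([1.0, 1.0], stored, labels)     # between both patterns
```

Samples that come back as "uncertain" are the ones that would be routed to an analyst in the feedback stage.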

3.4.3. Reinforcement Signal Acquisition via Human Feedback

When the classification confidence is insufficient or anomalous behavior is detected, the CNIDS invokes human analyst feedback. This feedback acts as a reinforcement signal to refine the system’s behavior and enrich the vectorDB warehouse.

Feedback is modeled as a reward signal $r_t$ (Equation (1)):

$$r_t = \begin{cases} +1, & \text{confirmed attack}, \\ -1, & \text{false positive}, \\ +2, & \text{zero-day confirmation}. \end{cases} \qquad (1)$$

During inference, traffic instances with low classification confidence or high anomaly scores were flagged as ambiguous. These instances were presented to a simulated human analyst for labeling. The resulting feedback was incorporated through the HRS mechanism, which updated both the model parameters and the vectorDB. Zero-day samples identified through this process were considered novel attack classes and were subsequently integrated into the training dataset. The HRS mechanism ensures continuous alignment between automated detection and expert knowledge. In this research, human feedback is simulated as a scalar signal $r_t$ applied to low-confidence or anomalous instances, with a feedback delay of under 7 min reflecting best-case analyst availability. Labels are treated as binary confirmations (no abstentions) without additional contextual notes; thus, the reported gains represent an upper bound on real-world performance. This study did not model conflicting labels or analyst fatigue, and it capped the feedback volume per interval to avoid unrealistic burst learning. These assumptions bound the bias while making the adaptation effect measurable and reproducible.
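
A minimal sketch of the reward mapping in Equation (1), together with a per-interval cap on feedback volume, is given below. The enum, the function name, the alert identifiers, and the cap of two items are hypothetical; the source states that a cap was applied but does not report its value.

```python
from enum import Enum

class Verdict(Enum):
    """Analyst verdicts, one per branch of Equation (1)."""
    CONFIRMED_ATTACK = "confirmed attack"
    FALSE_POSITIVE = "false positive"
    ZERO_DAY = "zero-day confirmation"

# Scalar reward r_t for each verdict, as given in Equation (1).
REWARD = {
    Verdict.CONFIRMED_ATTACK: 1,
    Verdict.FALSE_POSITIVE: -1,
    Verdict.ZERO_DAY: 2,
}

def collect_feedback(flagged, max_per_interval):
    """Turn flagged (alert_id, verdict) pairs into rewards, capped per interval."""
    return [(alert_id, REWARD[verdict]) for alert_id, verdict in flagged[:max_per_interval]]

flagged = [
    ("a1", Verdict.ZERO_DAY),
    ("a2", Verdict.FALSE_POSITIVE),
    ("a3", Verdict.CONFIRMED_ATTACK),
]
feedback = collect_feedback(flagged, max_per_interval=2)  # "a3" waits for the next interval
```

The cap is what prevents a burst of simulated labels from being absorbed in a single update.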

3.5. Decision and Response Layer

The final decision integrates the outputs of the CPT and SFT using a confidence-aware ensemble mechanism. Depending on the predicted threat level, the system triggers appropriate responses, including alert generation, traffic blocking, and escalation to human analysts for further investigation. This design balances detection accuracy with operational efficiency.

3.6. Mathematical Model and Problem Formulation

3.6.1. Problem Definition

Given a continuous stream of network traffic represented by feature vectors $\{x_t\}_{t=1}^{\infty}$, the objective of CNIDS is to learn a decision function as follows:

$$f(x_t) \to \{\text{benign}, \text{known attack}, \text{zero-day attack}\} \qquad (2)$$

while adapting to evolving network behaviors and minimizing false alarms, with a continued focus on zero-day attack patterns.

3.6.2. Anomaly Scoring Function

An anomaly score is computed to quantify the deviation from the known traffic patterns, as shown in Equation (3):

$$A(x) = \alpha \left(1 - \max_{i} \mathrm{sim}(x, x_i)\right) + \beta\, \sigma(x) \qquad (3)$$

where $\sigma(x)$ denotes feature variance and $\alpha + \beta = 1$. The traffic patterns in this study were saved in vectorDB for each model (NSL-KDD, UNSW-NB15, CICIDS2017, and the zero-day attack).

3.6.3. Zero-Day Detection Criterion

To evaluate the zero-day detection performance, selected attack categories from each dataset were withheld during the initial Supervised Fine-Tuning phase of the model. These unseen attacks were introduced only during the testing phase, thereby simulating realistic zero-day scenarios. The detection performance for these withheld attacks was used to assess the ability of the system to identify novel threats without prior explicit labels. The zero-day likelihood is determined by combining the anomaly and similarity measures as follows:

$$Z(x) = I\left(\max_{i} \mathrm{sim}(x, x_i) < \tau\right) \cdot A(x) \qquad (4)$$

where $\tau$ is a similarity threshold and $I(\cdot)$ is the indicator function.
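
Equations (3) and (4) can be combined into a short sketch. It assumes that $\sigma(x)$ is the variance across the components of a single feature vector, which is one reading of "feature variance", and it uses illustrative values $\alpha = \beta = 0.5$ and $\tau = 0.5$; none of these choices is reported in the source, and the function names are hypothetical.

```python
import numpy as np

def max_similarity(x, stored):
    """Largest cosine similarity between x and any stored vector."""
    sims = (stored @ x) / (np.linalg.norm(stored, axis=1) * np.linalg.norm(x))
    return float(sims.max())

def anomaly_score(x, stored, alpha=0.5):
    """Equation (3): A(x) = alpha * (1 - max_i sim(x, x_i)) + beta * sigma(x)."""
    beta = 1.0 - alpha  # enforces alpha + beta = 1
    return alpha * (1.0 - max_similarity(x, stored)) + beta * float(np.var(x))

def zero_day_score(x, stored, tau=0.5, alpha=0.5):
    """Equation (4): Z(x) = I(max_i sim(x, x_i) < tau) * A(x)."""
    indicator = 1.0 if max_similarity(x, stored) < tau else 0.0
    return indicator * anomaly_score(x, stored, alpha)

# Toy memory of known patterns and two queries.
stored = np.array([[1.0, 0.0], [0.9, 0.1]])
familiar = np.array([1.0, 0.05])  # close to stored patterns
novel = np.array([0.0, 1.0])      # far from every stored pattern

z_familiar = zero_day_score(familiar, stored)
z_novel = zero_day_score(novel, stored)
```

The indicator acts as a gate: a sample that resembles anything in memory receives a zero-day score of zero regardless of its anomaly score.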

3.6.4. Unified Learning Objective

The overall learning objective integrates supervised loss, representation adaptation, and reinforcement feedback as follows:

$$\min_{\theta} \sum_{t} L_{\mathrm{SFT}}(\theta) + \lambda L_{\mathrm{CPT}}(\theta) - \gamma E[r_t] \qquad (5)$$

where $\lambda$ and $\gamma$ control the influences of continual learning and human feedback, respectively. The proposed IDS framework comprises four primary components:

Preprocessing and Feature Extraction: parsing and embedding of network flow features;
ML-Enhanced Detection Engine: semantic reasoning for anomaly classification;
Vector-Memory-Augmented IDS for Threat Intelligence Integration: context-aware augmentation using security feeds;
Human-in-the-Loop Verification: Final confirmation to mitigate false-positive results.

The performance of the CNIDS is evaluated using standard classification and detection metrics, including accuracy, precision, recall, and F1-score. In addition, the improvement after HRS, which leads to precision under analyst confirmation, was defined as the proportion of previously unseen attacks that were correctly identified as malicious or anomalous. To assess robustness against concept drift, performance metrics were monitored over time as new traffic distributions were introduced. The false-positive rate (FPR) was measured to evaluate the operational feasibility of real-world deployments.

Figure 2 shows the overall process diagram, which outlines how the proposed CNIDS detects and adapts to zero-day attacks in the system.

3.7. Dataset Selection and Preprocessing

The experimental framework utilized three benchmark datasets, CICIDS2017, NSL-KDD, and UNSW-NB15, which represent diverse attack scenarios and network environments. CICIDS2017 contains over 2 million records with 78 attributes, encompassing modern attack types, including brute-force, DoS, web, infiltration, botnet, DDoS, and port scan attacks. NSL-KDD provides 42 features with resolved redundancy issues from the original KDD’99 dataset, whereas UNSW-NB15 offers 49 features with contemporary attack vectors. Data preprocessing included normalization, feature selection using an Information Gain threshold of 0.4, and conversion to appropriate input formats for neural network processing.

3.8. Implementation Details and Experimental Workflow

This experimental research adopts a class-exclusion zero-day simulation, consistent with prior IDS studies. The baselines include the ML Isolation Random Forest, CNN, LSTM, and Transformer-based IDS models called ML Ensemble. The metrics covered include accuracy, precision, recall, F1-score, and latency. The experiments used a 70/30 train–test split with 5-fold cross-validation and bootstrapped confidence intervals. Statistical tests (t-test/Wilcoxon) were applied at the 0.05 significance level. To validate the proposed framework, this research evaluated it using three public IDS datasets (NSL-KDD, UNSW-NB15, and CICIDS2017) and one real dataset drawn from our system logs (the zero-day Attack Database). Figure 3 shows the process plan as implemented in Python scripts (version 3.12.3 in this study). The main detection pipeline in Figure 3 proceeds through the following steps.

1. The client sends a POST request to the detection endpoint of our pipeline with a JSON payload containing the features;
2. In app.py, the request is handled by the detect or full_pipeline function;
3. The function calls
- sft_engine.process_query() to obtain the SFT results;
- cpt_engine.predict() to get the CPT result.
4. The results are combined using calculate_ensemble_result();
5. If a zero-day is detected, the function triggers rhs_engine.process_feed() in the background;
6. The response is sent back to the client with the detection results;
7. In addition, the system can be trained, tested, and reported on using the respective endpoints and scripts.
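The request flow in the steps above can be sketched without a web framework. The function and method names (detect, process_query, predict, calculate_ensemble_result, process_feed, rhs_engine) are taken from the text as written; the stub engines, the averaging rule inside calculate_ensemble_result, and the 0.5 threshold are hypothetical placeholders, since the source does not specify how the two results are combined.

```python
import json
import threading

class StubEngine:
    """Hypothetical stand-in that returns a fixed label and confidence."""
    def __init__(self, label, confidence):
        self.label, self.confidence = label, confidence

    def process_query(self, features):   # SFT entry point named in the text
        return {"label": self.label, "confidence": self.confidence}

    def predict(self, features):         # CPT entry point named in the text
        return {"label": self.label, "confidence": self.confidence}

class StubFeedbackEngine:
    """Hypothetical feedback engine that only queues samples for review."""
    def __init__(self):
        self.queue = []

    def process_feed(self, features, result):
        self.queue.append((features, result))

def calculate_ensemble_result(sft, cpt, threshold=0.5):
    # Assumed rule: average the confidences, keep the more confident label,
    # and flag a possible zero-day when the average falls below the threshold.
    confidence = (sft["confidence"] + cpt["confidence"]) / 2
    label = sft["label"] if sft["confidence"] >= cpt["confidence"] else cpt["label"]
    return {"label": label, "confidence": confidence,
            "zero_day": confidence < threshold}

def detect(payload, sft_engine, cpt_engine, rhs_engine):
    """Steps 1-6: parse the JSON body, query both engines, combine, respond."""
    features = json.loads(payload)["features"]
    sft = sft_engine.process_query(features)
    cpt = cpt_engine.predict(features)
    result = calculate_ensemble_result(sft, cpt)
    worker = None
    if result["zero_day"]:
        # Feedback processing runs in the background so the response is not delayed.
        worker = threading.Thread(target=rhs_engine.process_feed,
                                  args=(features, result))
        worker.start()
    return result, worker

rhs_engine = StubFeedbackEngine()
response, worker = detect(json.dumps({"features": [0.1, 0.7, 0.3]}),
                          StubEngine("benign", 0.45), StubEngine("attack", 0.35),
                          rhs_engine)
if worker is not None:
    worker.join()   # only needed here so the example can inspect the queue
```

Returning the worker thread is a convenience for the example; a deployed service would more likely hand the sample to a task queue and return only the detection result.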

Figure 3 shows the sequence diagram for the detection steps above, illustrating the main components and their interactions. It describes three data flows:

Forward Flow: Request → Feature Extraction → Engine Processing → Decision → Response;
Feedback Flow: Detection Results → Human Feedback → HRS Engine → Database Updates → Improved Detection;
Training Flow: Data → Model Training → Model Storage → Deployment → Inference.

The proposed architecture shown in Figure 3 offers five key advantages:

Adaptability: Continuous retraining ensures resilience against concept drift and zero-day attacks;
Explainability: Human feedback introduces interpretability and traceability;
Scalability: Microservice separation enables the independent scaling of inference and training workloads;
Dataset Agnosticism: Unified vector representation supports heterogeneous training sources;
Operational Practicality: SQLite-backed vectorDB enables lightweight deployment while preserving auditability.
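To make the SQLite-backed vector memory concrete, the following is a minimal sketch using only the Python standard library. The table schema, the JSON encoding of embeddings, the class name VectorMemory, and the use of cosine similarity with a linear scan are all assumptions for illustration; the source does not describe the storage layout.

```python
import json
import math
import sqlite3

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class VectorMemory:
    """Hypothetical reference memory: one row per stored embedding."""
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS vectors ("
            "id INTEGER PRIMARY KEY, label TEXT, source TEXT, embedding TEXT)")

    def add(self, embedding, label, source):
        # The source column records the originating dataset, which supports auditing.
        self.db.execute(
            "INSERT INTO vectors (label, source, embedding) VALUES (?, ?, ?)",
            (label, source, json.dumps(embedding)))
        self.db.commit()

    def nearest(self, query):
        """Return (similarity, label, source) of the best match, or None if empty."""
        best = None
        for label, source, raw in self.db.execute(
                "SELECT label, source, embedding FROM vectors"):
            score = cosine(query, json.loads(raw))
            if best is None or score > best[0]:
                best = (score, label, source)
        return best

memory = VectorMemory()
memory.add([1.0, 0.0, 0.0], "DoS", "NSL-KDD")
memory.add([0.0, 1.0, 0.0], "PortScan", "CICIDS2017")
match = memory.nearest([0.9, 0.1, 0.0])
```

A linear scan keeps the sketch dependency-free and is adequate for small memories; larger deployments would need an approximate nearest-neighbor index.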

3.9. Zero-Day Simulation Protocol

To evaluate the ability of the proposed Cognitive Network Intrusion Detection System (CNIDS) to handle previously unseen threats, this study adopts a class-exclusion zero-day simulation protocol, a controlled experimental design widely used in intrusion detection research. In this setting, selected attack categories are intentionally excluded from the supervised training phase and introduced only during testing, thereby approximating zero-day conditions where labeled examples are unavailable at deployment time. Unlike random exclusion strategies, the selection of withheld classes in this study follows a predefined, theory-driven criterion to ensure both realism and analytical validity. Specifically, three selection principles are applied:

Semantic Distance from Dominant Attack Families: Withheld classes are chosen to be structurally and behaviorally distinct from the majority attack categories present in the training data. For example, volumetric attacks (e.g., DoS/DDoS) are separated from application-layer or privilege-escalation attacks (e.g., infiltration or backdoor activity). This ensures that zero-day samples occupy different regions in the feature space, making detection non-trivial.
Low Class Frequency (Rarity Constraint): Attack categories with relatively low representation in the original dataset are prioritized for exclusion. This prevents the model from implicitly learning their characteristics during training and better reflects the rarity typically associated with zero-day threats in real-world environments.
Real-World Plausibility: The selected classes correspond to attack types that are commonly observed as emerging or evolving threats in operational cybersecurity settings (e.g., infiltration, credential abuse, or backdoor activity), thereby enhancing the ecological validity of the evaluation.

Under this protocol, the model is trained exclusively on benign traffic and known attack classes, without exposure to the withheld categories. During evaluation, instances belonging to the excluded classes are treated as zero-day samples, and detection performance is assessed without prior label-dependent adaptation.

This protocol, shown in Table 2, is therefore best interpreted as a controlled approximation of zero-day conditions, designed to evaluate learning adaptability rather than absolute zero-day detection accuracy under fully unconstrained environments.

3.9.1. Formal Definition of Class-Exclusion Protocol

The full labeled dataset is defined as

$$D = \{(x_i, y_i)\}_{i=1}^{N}, \quad y_i \in C,$$

where $x_i \in \mathbb{R}^{d}$ is the feature vector, $y_i$ is the class label, and $C = C_{known} \cup C_{zero}$ is the full class set. This research partitions the class space into known classes $C_{known}$ (visible during training) and zero-day classes $C_{zero}$ (withheld during training) such that $C_{known} \cap C_{zero} = \emptyset$.

The training dataset is restricted to known classes only,

$$D_{train} = \{(x_i, y_i) \mid y_i \in C_{known}\},$$

and the evaluation dataset includes both known and zero-day classes:

$$D_{test} = \{(x_j, y_j) \mid y_j \in C_{known} \cup C_{zero}\}.$$

The operational zero-day definition is as follows: a sample is considered zero-day if

$$y_j \in C_{zero} \quad \text{and} \quad (x_j, y_j) \notin D_{train}.$$

The distributional shift property means that the protocol induces a class-conditional distribution shift,

$$P(x \mid y \in C_{zero}) \neq P(x \mid y \in C_{known});$$

this ensures that zero-day samples lie in distinct feature regions, making detection non-trivial.
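The construction of the training and test sets under this protocol can be sketched as follows. The function name, the toy class labels, and the choice to hold out 30% of the known-class samples for testing are illustrative assumptions; only the rule that withheld classes never enter the training set comes from the definition above.

```python
import random

def class_exclusion_split(dataset, zero_classes, test_fraction=0.3, seed=0):
    """Split (x, y) pairs so that zero-day classes appear only in the test set."""
    rng = random.Random(seed)
    known = [s for s in dataset if s[1] not in zero_classes]
    zero = [s for s in dataset if s[1] in zero_classes]
    rng.shuffle(known)
    n_test = int(round(len(known) * test_fraction))
    test_known, train = known[:n_test], known[n_test:]
    # D_train holds known classes only; D_test holds known and zero-day classes.
    return train, test_known + zero

# Toy dataset with one withheld class ("infiltration").
dataset = ([([0.1 * i], "benign") for i in range(10)] +
           [([5.0 + i], "dos") for i in range(10)] +
           [([50.0 + i], "infiltration") for i in range(3)])
train, test = class_exclusion_split(dataset, zero_classes={"infiltration"})
```

Because every sample of a withheld class is routed to the test set, the condition that a zero-day sample is absent from the training set holds by construction.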

3.9.2. Connection to Zero-Day Detection Criterion

Under the class-exclusion protocol, all stored reference vectors in the vectorDB,

$$V = \{(x_i, y_i)\}, \tag{6}$$

satisfy $y_i \in C_{known}$. Thus, for any test sample $x$:

If $x$ belongs to a class in $C_{known}$, then

$$\max_i \operatorname{sim}(x, x_i) \geq \tau \tag{7}$$

If $x$ belongs to a class in $C_{zero}$, then

$$\max_i \operatorname{sim}(x, x_i) < \tau \quad \text{(expected under distributional shift)} \tag{8}$$

This interpretation formalizes the definition of the class-exclusion protocol, where the similarity threshold $\tau$ functions as a demarcation line, separating regions characterized by established knowledge from those marked by uncertainty. The anomaly term $A(x)$ operationalizes this distinction by ensuring that conditions of low similarity in conjunction with high deviation yield a pronounced signal indicative of potential zero-day anomalies. The class-exclusion protocol ensures that the similarity-based decision boundary in the Zero-Day Detection Criterion is not trivially satisfied, as zero-day samples are structurally absent from the vector memory. This enforces a genuine open-set detection condition rather than closed-set classification.
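The similarity test against the vector memory, gated by the threshold and weighted by the anomaly term, can be sketched in a few lines. Cosine similarity, the threshold value of 0.9, and the centroid-distance anomaly score are assumptions for illustration only; the source does not specify the similarity function or the form of the anomaly term in this section.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def zero_day_score(x, memory, tau, anomaly):
    """Indicator(max_i sim(x, x_i) < tau) multiplied by the anomaly term A(x)."""
    max_sim = max(cosine(x, v) for v, _ in memory)   # memory must be non-empty
    indicator = 1.0 if max_sim < tau else 0.0
    return indicator * anomaly(x)

def centroid_distance(memory):
    """Hypothetical A(x): Euclidean distance from the centroid of stored vectors."""
    dim = len(memory[0][0])
    centre = [sum(v[k] for v, _ in memory) / len(memory) for k in range(dim)]
    return lambda x: math.dist(x, centre)

# Reference memory holds known-class vectors only, as the protocol requires.
memory = [([1.0, 0.0], "dos"), ([0.9, 0.1], "dos"), ([0.8, 0.2], "benign")]
A = centroid_distance(memory)
known_like = zero_day_score([0.95, 0.05], memory, tau=0.9, anomaly=A)
novel = zero_day_score([-0.2, 1.0], memory, tau=0.9, anomaly=A)
```

A sample close to stored vectors scores zero regardless of its anomaly term, while a dissimilar sample passes the gate and is ranked by how far it deviates.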

4. Results

The experimental evaluation of the proposed CNIDS is designed to assess adaptive performance under uncertainty, rather than relying solely on static zero-day detection accuracy. In practical deployment contexts, zero-day attacks are inherently absent from the training distribution, and fully automated detection systems typically exhibit either negligible detection capability or unacceptably high false-positive rates. Accordingly, the evaluation framework is structured around four complementary dimensions: (i) baseline automated inference, (ii) zero-day detection under the class-exclusion protocol, (iii) the impact of human-in-the-loop feedback (HRS), and (iv) ablation analysis to quantify the contribution of individual system components. In addition, statistical significance testing and temporal learning dynamics are reported to provide a comprehensive performance characterization. All experiments adhere to the class-exclusion zero-day protocol described in Section 3.9. For each dataset (NSL-KDD, UNSW-NB15, CICIDS2017), a 70/30 train–test split is applied, with selected attack classes withheld during training and introduced exclusively at the evaluation stage. Reported results are averaged over five cross-validation runs and accompanied by 95% bootstrap confidence intervals to ensure robustness and reproducibility.
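As a sketch of how a 95% bootstrap confidence interval over repeated runs can be obtained, the percentile method is shown below. The source does not state which bootstrap variant was used, so the percentile method, the number of resamples, and the per-fold scores are illustrative assumptions and not values from the experiments.

```python
import random
import statistics

def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples))
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Illustrative per-fold scores (not taken from the paper).
fold_scores = [0.81, 0.84, 0.83, 0.86, 0.82]
low, high = bootstrap_ci(fold_scores)
```

With only five folds the interval is coarse, which is one reason to report it alongside the significance tests rather than in place of them.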

4.1. Baseline Performance

Before incorporating human feedback, CNIDS was evaluated as a purely supervised classifier (i.e., SFT only, without CPT activation or HRS). Table 3 summarizes baseline performance on known attacks.

The system achieved high recall and F1-score for known attacks, confirming that SFT alone is effective for previously seen threat patterns. However, as shown in the next subsection, this supervised model failed to generalize to withheld zero-day classes, with detection rates near zero.

4.2. Zero-Day Detection Under Class-Exclusion

Under the class-exclusion protocol, the baseline (SFT-only) was tested on the withheld attack categories. Across all datasets, the initial zero-day detection rate (ZDR) was 0% for the SFT-only configuration, as the model had never seen those classes during training. When the full CNIDS (SFT + CPT + vector memory) was run without human feedback (i.e., autonomous inference only), ZDR increased marginally to 10.13% (373 out of 3681 zero-day samples), with a precision of 40.76% and an FPR of 2.6% on normal traffic. As shown in Figure 4, the system missed 3308 zero-day samples and generated 69 false alarms. Following feedback integration around 13:40, detection rates improved sharply to 21–26%, stabilizing in the optimized phase at 18–25% with precision levels between 81 and 94% and low false alarm rates (0.79–2.61%).
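The autonomous-inference detection rate follows directly from the stated counts, taking ZDR as the fraction of zero-day samples that were flagged:

$$\mathrm{ZDR} = \frac{373}{3681} \approx 0.1013 = 10.13\%, \qquad 3681 - 373 = 3308 \ \text{missed zero-day samples}.$$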

Comparative analysis confirmed the statistical significance of the feedback integration. Zero-day detection improved from 0% to 18.2%, the precision increased to 82.86%, and the false alarms remained controlled. The feedback loop demonstrated rapid learning (<7 min response time), effective knowledge retention, and adaptability to novel patterns. While immediate improvements were evident, further enhancements in feedback volume, quality, and automation are required to achieve production-ready zero-day detection.

Overall, the results validate the effectiveness of reinforcement through HRS in enabling zero-day detection capabilities, transforming the system from a purely supervised classifier into a hybrid model capable of adapting to emerging threats. As shown in Figure 5, accuracy also rose sharply once the adaptive HRS took effect. The decision-making logic of the CNIDS is governed by a conditional routing mechanism that dynamically selects between automated classification and human-guided reinforcement, as formalized in Equation (9):

$$\mathrm{CNIDS}(x) = \begin{cases} f_{\theta}(x), & \text{if } P \geq \tau \\ \mathrm{HRS}(x), & \text{otherwise} \end{cases} \tag{9}$$

where $x$ is the input feature vector; $f_{\theta}(x)$ is the supervised model prediction; $\mathrm{HRS}(x)$ is the output of the HRS mechanism; $P$ is the confidence or similarity score; and $\tau$ is the decision threshold.

Figure 5. Accuracy Trend for Zero-day Attack Detection.

This formulation enables adaptive routing based on the model’s confidence. When the system is confident ($P \geq \tau$), it proceeds with autonomous classification. Otherwise, it defers to HRS for human-guided evaluation, which is particularly useful for zero-day threats and ambiguous traffic patterns. Tests on several benchmark datasets showed that CNIDS achieves high detection accuracy and is especially effective at spotting new and unseen attacks compared with traditional methods. CNIDS offers a flexible, adaptive, and practical approach to maintaining network security. The zero-day detection rate reported in Table 4, measured under analyst confirmation, was defined as the proportion of previously unseen attacks that were correctly identified as malicious or anomalous. After enabling human feedback (HRS) and allowing the system to learn from low-confidence alerts, ZDR improved to 18.2% and precision rose to 82.86%, while FPR remained below 3% (Table 4). These results indicate that zero-day detection is not feasible with supervised learning alone; the combination of Continual Pre-Training, vector memory, and human feedback is required.
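The routing rule in Equation (9) can be sketched as a single conditional. The threshold value of 0.8, the toy model, and the analyst queue below are hypothetical stand-ins; only the rule itself (classify automatically when the confidence meets the threshold, otherwise defer to HRS) comes from the text.

```python
def cnids_route(x, model, hrs, tau=0.8):
    """Return the model's prediction when confidence P >= tau, else defer to HRS."""
    label, confidence = model(x)
    if confidence >= tau:
        return {"decision": label, "route": "auto", "confidence": confidence}
    return {"decision": hrs(x), "route": "hrs", "confidence": confidence}

# Hypothetical stand-in for the supervised model: returns (label, confidence).
def toy_model(x):
    score = sum(x) / len(x)
    return ("attack", score) if score >= 0.5 else ("benign", 1.0 - score)

# Hypothetical stand-in for HRS: queue the sample for analyst review.
pending = []
def toy_hrs(x):
    pending.append(x)
    return "pending_review"

confident = cnids_route([0.95, 0.9, 0.85], toy_model, toy_hrs)
uncertain = cnids_route([0.6, 0.5, 0.55], toy_model, toy_hrs)
```

The first sample is classified automatically, whereas the second falls below the threshold and is queued, which is the path a zero-day sample is expected to take.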

For comparison, this research evaluated several baseline models under the same class-exclusion protocol (Table 5). CNIDS (with HRS) significantly outperforms all baselines in zero-day detection while maintaining low FPR.

4.3. Impact of Human Feedback (HRS)

Figure 6 shows the temporal evolution of zero-day detection precision after the first HRS intervention (at approximately 13:40 min into the evaluation). Prior to feedback, ZDR was effectively zero. Within 5 min of receiving analyst confirmations (reward signal $r_t = +2$ for zero-day samples), the detection rate rose to 18–26% across datasets, stabilizing at 18.2% (average) with precision between 81% and 94% and FPR between 0.8% and 2.6%. The learning velocity was high: the system assimilated novel attack patterns in less than 7 min, as measured by the time from the first low-confidence alert to reliable detection. The feedback mechanism also reduced false positives on normal traffic. Before HRS, the autonomous system generated occasional false alarms on edge cases; after incorporating analyst correction (reward $r_t = -1$ for false positives), the FPR decreased from an initial 8.2% (on known attacks) to below 3% on zero-day runs. This demonstrates that HRS aligns system behavior with operational expectations. Precision rises rapidly within 5 min and stabilizes at approximately 18–19%.
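One way to read the reward scheme is as a mapping from analyst verdicts to a reward and a memory update. The sketch below uses the reward values stated in the text (+2 for confirmed zero-day samples, -1 for false positives); the class name, the verdict strings, and the decision to store corrected samples as labeled reference vectors are assumptions, since the source does not describe how the rewards are consumed.

```python
REWARD_ZERO_DAY = 2          # r_t = +2 for analyst-confirmed zero-day samples
REWARD_FALSE_POSITIVE = -1   # r_t = -1 for alerts the analyst rejects

class FeedbackLoop:
    """Hypothetical HRS bookkeeping: rewards plus a growing reference memory."""
    def __init__(self):
        self.memory = []         # (vector, label) pairs available to later lookups
        self.total_reward = 0

    def apply(self, vector, analyst_verdict):
        """Convert one analyst verdict into a reward and a memory update."""
        if analyst_verdict == "zero_day":
            reward, label = REWARD_ZERO_DAY, "confirmed_zero_day"
        elif analyst_verdict == "false_positive":
            reward, label = REWARD_FALSE_POSITIVE, "benign"
        else:
            raise ValueError(f"unknown verdict: {analyst_verdict}")
        # Assumed update: the corrected sample becomes a labeled reference vector.
        self.memory.append((vector, label))
        self.total_reward += reward
        return reward

loop = FeedbackLoop()
rewards = [loop.apply([0.2, 0.9], "zero_day"),
           loop.apply([0.1, 0.1], "false_positive"),
           loop.apply([0.3, 0.8], "zero_day")]
```

Storing rejected alerts as benign references is one plausible way in which analyst corrections could lower the false-positive rate on similar traffic.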

4.4. Ablation Analysis

To assess the necessity of individual components, this research conducted a qualitative ablation study by removing each of the three core modules: CPT, vector memory, and HRS. Table 6 summarizes the observed impact on zero-day adaptation and precision.

Key observations:

Removing CPT increased sensitivity to concept drift, with delayed adaptation to novel traffic patterns and reduced stability after feedback integration. This confirms that continual representation alignment is essential for dynamic environments.
Removing vector memory limited the system’s ability to retain and reuse previously assimilated attack patterns, resulting in inconsistent responses to recurring zero-day instances and diminished instance-level explainability.
Removing HRS caused the system to revert to static inference behavior, showing minimal improvement in zero-day handling and a higher incidence of ambiguous alerts.

No single component independently enables adaptive intrusion handling; rather, the interaction of CPT, persistent vector memory, and structured feedback is required for controlled adaptation, precision preservation, and operational robustness.

4.5. Statistical Significance and Temporal Behavior

All reported differences were tested for statistical significance using paired t-tests and Wilcoxon signed-rank tests (α = 0.05). The improvement in zero-day detection rate after HRS (from 10.1% to 18.2%) was significant (p < 0.01) across all three datasets. Precision increases were also significant (p < 0.01). The reduction in FPR after analyst feedback (from 8.2% to <3% on normal traffic) was significant (p = 0.02). To characterize learning velocity, this research recorded the cumulative zero-day detection rate as a function of time after the first feedback signal. Within the first 2 min, ZDR rose to 12%; after 5 min, it reached 18%; and after 10 min, it stabilized at approximately 18–20% with no further significant increase (plateau). This rapid initial learning (τ90 ≈ 6 min) indicates that the combination of vector memory and HRS enables fast assimilation of novel attack patterns. No catastrophic forgetting was observed; the system retained high precision on known attacks throughout the feedback integration period.
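The source does not state how τ90 was computed. Assuming it denotes the time at which the cumulative detection rate first reaches 90% of its final (plateau) value, it can be estimated from a sampled curve by linear interpolation, as in this sketch. The function name and the sample curve are illustrative and are not the paper's measurements.

```python
def time_to_fraction(times, values, fraction=0.9):
    """First time at which `values` reaches `fraction` of its final value,
    using linear interpolation between consecutive samples."""
    target = fraction * values[-1]
    if values[0] >= target:
        return times[0]
    for k in range(1, len(values)):
        if values[k] >= target:
            t0, t1 = times[k - 1], times[k]
            v0, v1 = values[k - 1], values[k]
            # v1 > v0 here because the previous sample was still below the target.
            return t0 + (target - v0) / (v1 - v0) * (t1 - t0)
    return None

# Illustrative curve: minutes since first feedback vs. cumulative detection rate.
minutes = [0, 1, 2, 3, 4]
rate = [0.0, 4.0, 7.0, 9.0, 10.0]
t90 = time_to_fraction(minutes, rate)
```

The estimate depends on which value is taken as the plateau and on the sampling interval, so both should be reported with any τ90 figure.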

Figure 7 shows a control chart with the intervention point clearly marked. Before feedback, accuracy fluctuated around 60–65% (due to zero-day samples being misclassified); after HRS activation, accuracy rose to 82–85% and remained stable over subsequent test windows. A cumulative sum (CuSum) analysis, provided in the Supplementary Material, confirms a statistically significant shift in the mean accuracy after feedback.
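A cumulative sum analysis of this kind can be sketched with a one-sided upper CUSUM. The exact procedure used in the Supplementary Material is not described in the text, so the slack value, the alarm threshold, and the accuracy series below are illustrative assumptions shaped only loosely after the ranges quoted above.

```python
def cusum(values, target, slack=0.0):
    """One-sided upper CUSUM: S_t = max(0, S_{t-1} + (x_t - target - slack))."""
    s, path = 0.0, []
    for x in values:
        s = max(0.0, s + (x - target - slack))
        path.append(s)
    return path

def first_alarm(path, threshold):
    """Index of the first window whose CUSUM statistic exceeds the threshold."""
    return next((i for i, s in enumerate(path) if s > threshold), None)

# Illustrative accuracy per test window in percent (not the paper's data):
# five pre-feedback windows followed by five post-feedback windows.
accuracy = [62, 64, 61, 63, 65, 83, 84, 82, 85, 83]
baseline = sum(accuracy[:5]) / 5          # pre-feedback mean used as the target
path = cusum(accuracy, target=baseline, slack=2.0)
alarm_at = first_alarm(path, threshold=10.0)
```

In this toy series the statistic stays at zero before the intervention and crosses the threshold in the first post-feedback window, which is the signature of a sustained upward shift in the mean.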

Lastly, this research notes that the feedback delay was fixed at <7 min, representing a best-case analyst availability scenario. The discussion (Section 5) addresses the implications of larger delays and potential label noise.

5. Discussion

Deploying an IDS in real-world environments often reveals a host of operational and architectural pitfalls that undermine its effectiveness. One of the most critical issues is the presence of blind spots in network visibility, where segments of traffic or endpoints evade monitoring because of misconfigured sensors, encrypted payloads, or architectural gaps. CNIDS should be understood as an adaptive framework rather than a standalone zero-day detector.

As shown in Table 7, these blind spots allow threats to propagate. Compounding this is alert fatigue, where excessive false positives overwhelm analysts, leading to neglected monitoring and missed threats. Many IDS implementations rely excessively on signature-based detection, which, although effective against known threats, often fails to identify novel or polymorphic attacks. Poor baseline modeling further exacerbates detection challenges; without a clear understanding of normal behavior, anomaly-based systems generate noise rather than insight. In specialized environments, such as pharmaceutical R&D networks or hybrid cloud infrastructures, generic rule sets often prove inadequate, failing to account for domain-specific workflows and data flows.

As shown in Table 8, the lack of feedback loops between detection, analyst input, and model retraining stifles system adaptability and prevents continuous improvement and contextual learning. Resource overload, whether in terms of CPU, memory, or analyst bandwidth, can cripple IDS performance, particularly when deployed at scale without proper tuning or prioritization. Addressing these pitfalls requires a modular, feedback-driven approach that integrates domain expertise, dynamic baselines, and scalable resource management to ensure that IDS deployments remain both responsive and resilient. All IDS deployment pitfalls are shown in Table 8, and the strategies to overcome them are shown in Table 9.

In the design and deployment of IDS, selecting appropriate detection engine options is critical for achieving robust threat identification across diverse operational environments. Detection engines typically fall into three primary model types: signature-based, anomaly-based, and hybrid. Signature-based engines rely on predefined patterns of known threats, offering high precision for established attack vectors but limited adaptability to new exploits. Anomaly-based models, often leveraging statistical profiling or ML algorithms, detect deviations from established baselines, making them suitable for identifying zero-day attacks and insider threats, albeit with higher false-positive rates. Hybrid models aim to balance these strengths by integrating rule-based heuristics with adaptive-learning mechanisms. Use cases vary widely depending on the deployment context; enterprise networks may prioritize signature-based engines for compliance and low-latency detection, whereas industrial control systems or cloud-native architectures benefit from anomaly-based or hybrid approaches that accommodate dynamic traffic and evolving threat surfaces. Tools for detection engines include open-source platforms such as Snort and Suricata for signature-based detection, as well as frameworks such as Zeek or custom Python-based pipelines for anomaly detection. ML toolkits such as Scikit-learn, TensorFlow, and PyTorch are increasingly integrated into IDS workflows to support model training and refinement. Modular platforms such as Apache Kafka and Elasticsearch facilitate scalable data ingestion and real-time analytics. The choice of detection engine must align with the threat model, resource constraints, and feedback integration strategy of the system to ensure continuous improvement and operational resilience. This research optimized the AI-Ops-driven IDS, as shown in Table 10.

Key tools for evaluating IDS performance include MLflow, which facilitates the tracking of experiments, model parameters, and detection metrics over time. For real-time system monitoring, Prometheus paired with Grafana offers robust visualization and alerting capabilities. Figure 8 shows the Unified Modeling Language (UML) component diagram for building the CNIDS.

The experimental results confirm that CNIDS effectively addresses several key challenges in modern intrusion detection. First, the unified vector warehouse enables cross-domain generalization and reduces dependence on dataset-specific features. Second, continued pre-training allows the system to adapt to evolving traffic patterns without requiring costly retraining. Third, incorporating human feedback provides a controlled mechanism for assimilating novel attack knowledge and improving long-term robustness. Despite these advantages, certain limitations exist. Reliance on human feedback introduces latency and scalability constraints in high-throughput settings. Although the LSTM-based CPT module captures temporal dependencies, more expressive architectures may further enhance the representation learning. These limitations point to promising directions for future research, including automated feedback modeling and distributed learning strategies. This research adopts a class-exclusion zero-day simulation protocol, consistent with prior intrusion detection studies, in which selected attack categories are withheld during supervised training and introduced only during evaluation to emulate previously unseen threats.

The ablation analysis (Table 6 in Section 4.4) indicates that no single component enables effective zero-day handling in isolation. Instead, adaptive performance emerges from the interaction between continual representation learning, persistent vector memory, and human-aligned feedback. To further validate the necessity of each CNIDS component under the class-exclusion zero-day protocol (Section 3.9), this research analyzes how the protocol interacts with CPT, VectorDB, and HRS. The formal dependency mapping under class exclusion follows the definitions in Section 3.9.1, and the detection relies on

$$Z(x) = \mathbb{I}\left(\max_i \operatorname{sim}(x, x_i) < \tau\right) \cdot A(x).$$

The detection framework is underpinned by three critical dependencies, each of which governs a distinct dimension of system robustness. First, CPT enables representation adaptation, ensuring that the distributional alignment between training and test data is preserved. In its absence, the divergence between $P(x_{test})$ and $P(x_{train})$ produces poor feature alignment, which in turn manifests as elevated false negatives and a diminished capacity for timely adaptation following feedback. Second, VectorDB establishes the similarity boundary by providing a stable reference memory against which new inputs can be evaluated. Without this component, the maximum similarity measure $\max_i \operatorname{sim}(x, x_i)$ becomes undefined or unstable, leading to the collapse of the formalized decision boundary. Third, HRS facilitates knowledge assimilation, enabling the transition from uncertainty to established knowledge by mapping zero-day classes ($C_{zero}$) into known categories ($C_{known}$). In the absence of HRS, detection remains static, with zero-day signals failing to evolve into recognized knowledge. Together, these dependencies highlight the necessity of CPT, VectorDB, and HRS as foundational mechanisms for adaptive, memory-driven, and knowledge-integrative detection. The class-exclusion protocol enforces an open-set condition where zero-day classes are absent from training. Under this constraint, ablation results demonstrate that CPT, vector memory, and HRS are not independent improvements but jointly necessary components to operationalize Equation (9) for adaptive zero-day detection, as shown in Table 11.

6. Conclusions

This research presents CNIDS, a cognitive and continuously adaptive intrusion detection architecture that synergistically integrates CPT, SFT, and HRS. Unlike traditional static IDS models, CNIDS supports lifelong learning by dynamically incorporating new labeled data and human expertise into the decision-making process. The unified vectorDB enables seamless learning across heterogeneous datasets, whereas the microservice-oriented deployment ensures scalability and modular extensibility. The incorporation of HRS significantly enhances system resilience against unknown and zero-day attacks by introducing explainability and controlled adaptability, yielding a human-aligned adaptive learning pipeline that converts zero-day uncertainty into learnable knowledge faster than static IDS. Future work will focus on migrating to distributed vector databases, incorporating attention-based deep learning models, and evaluating CNIDS in large-scale real-world network environments. The proposed framework provides a solid foundation for intelligent, self-evolving cybersecurity systems, with the following key innovations:

Ensemble Approach: combines rule-based (SFT) and ML-based (CPT) detection;
Zero-day Focus: specialized algorithms for detecting novel attacks;
Continuous Learning: HRS engine improves the system based on feedback;
Comprehensive Testing: multiple testing strategies that ensure robustness;
Detailed Reporting: automated and human-readable reports.

A comprehensive evaluation of the CNIDS demonstrates its strong capability in detecting known attacks, achieving consistent recall and high F1-scores across multiple test runs. However, the baseline performance revealed limited effectiveness in identifying zero-day threats, with detection rates averaging only 10.13%. The integration of HRS significantly improved zero-day detection, raising the precision to above 80% and enabling adaptive learning within minutes of feedback incorporation. These findings highlight the importance of human-guided reinforcement in bridging the gap between automated detection and expert judgment. Beyond performance metrics, the architecture of CNIDS introduces several novel contributions to the field of network intrusion detection:

Human-Aligned Adaptive IDS Architecture, a cognitive intrusion detection framework that integrates continual representation learning, supervised classification, and human feedback into a unified adaptive loop, explicitly designed to handle uncertainty arising from zero-day attacks;
Vector-Memory-Augmented Detection, a unified vector database that acts as long-term episodic memory, enabling similarity-based reasoning, cross-dataset generalization, and explainable instance-level decisions across heterogeneous intrusion datasets;
Feedback-Driven Zero-Day Assimilation, a Human-in-the-Loop Reinforcement mechanism that transforms ambiguous or low-confidence detections into structured learning signals, improving precision and reducing false positives without requiring full retraining;
Learning Velocity as an Evaluation Perspective, an experimental analysis emphasizing adaptation speed, precision after feedback, and operational robustness, rather than relying solely on static zero-day detection rates;
Deployment-Oriented Design, a modular microservice architecture with persistent model storage, standardized logging, and reproducible workflows, supporting realistic integration into operational security environments.

In summary, CNIDS represents a production-ready cognitively enhanced intrusion detection framework that combines supervised learning, anomaly detection, and human feedback into a self-improving defense system. Its novelty lies in the integration of cognitive mechanisms, unified data representation, and microservice-based deployment, positioning it as a forward-looking solution for resilient cybersecurity in dynamic network environments. However, CNIDS should be understood as an adaptive framework rather than a standalone zero-day detector. The evaluation measures how quickly CNIDS converts zero-day uncertainty into reliable knowledge while preserving precision and operational trust. Our contributions are (i) a deployment-oriented CPT–SFT–HRS architecture with a vector memory that enables similarity-aware decisions and cross-dataset reuse; (ii) a class-exclusion protocol for zero-day evaluation across NSL-KDD, UNSW-NB15, and CICIDS2017; and (iii) a focus on learning velocity and post-feedback precision stabilization as operational metrics that better reflect real-world resilience than static zero-day accuracy alone. This research further bounds simulated human feedback (delay, label noise assumptions) to present an upper-bound scenario for analyst availability and to maintain a conservative interpretation of gains.

Although the current evaluation demonstrates the effectiveness of the CNIDS in detecting known attacks and its emerging capability for zero-day discovery through feedback-driven reinforcement, several avenues remain for further enhancement. First, automated feedback ingestion is a critical next step. Currently, human analysts provide structured reinforcement signals that enable the system to assimilate novel attack patterns. Automating this process through integration with external threat intelligence feeds and security information-sharing platforms would reduce latency, improve scalability, and accelerate the assimilation of zero-day threats. Second, the system requires larger and more diverse zero-day datasets to strengthen its anomaly detection capabilities. Expanding the training corpora with heterogeneous traffic sources, simulated adversarial attacks, and real-world incident data would improve generalization and reduce false alarms. This aligns with the unified vectorDB design, which facilitates cross-dataset embedding and supports broader generalization. Third, integration with global threat intelligence ecosystems would enhance adaptability. By linking CNIDS to CVE repositories, malware databases, and collaborative cybersecurity networks, the system could continuously update its knowledge base, ensuring resilience against evolving attack vectors and concept drift. Finally, future work will explore real-time feedback automation and microservice orchestration to enable near-instantaneous adaptation and reproducibility across diverse operational environments. These directions will advance the CNIDS toward a fully autonomous, cognitively enhanced intrusion detection framework capable of sustaining long-term resilience under dynamic network conditions.

7. Patents

Articles are licensed under an open access Creative Commons CC BY 4.0 license, meaning that anyone may download and read the research for free. In addition, the article may be reused and quoted, provided that the original published version has been cited.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/network6020041/s1.

Author Contributions

Conceptualization, J.A.G., M.L.S. and R.V.H.G.; methodology, J.A.G.; software, J.A.G.; validation, J.A.G., M.L.S. and R.V.H.G.; formal analysis, J.A.G. and M.L.S.; investigation, J.A.G.; resources, J.A.G.; data curation, J.A.G., M.L.S. and R.V.H.G.; writing—original draft preparation, J.A.G., M.L.S. and R.V.H.G.; writing, J.A.G.; writing—review and editing, J.A.G., M.L.S. and R.V.H.G.; visualization, J.A.G.; supervision, M.L.S. and R.V.H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The public datasets used in this research are NSL-KDD (https://www.kaggle.com/datasets/hassan06/nslkdd accessed on 26 January 2026), CICIDS2017 (https://www.kaggle.com/datasets/sateeshkumar6289/cicids-2017-dataset accessed on 26 January 2026), and UNSW-NB15 (https://www.kaggle.com/datasets/mrwellsdavid/unsw-nb15 accessed on 26 January 2026). The materials of this research (scripts, plot reports, csv files, etc.), including the Python scripts and the generated dataset, can be downloaded at https://its.id/m/JAGnetworkMDPI. Extended data files were also uploaded to the Zenodo repository named Harnessing Uncertainty for Knowledge Insight: A Cognitive Network IDS for Feedback-Driven Zero-Day Adaptation (https://doi.org/10.5281/zenodo.20426228 published 28 May 2026). The project contains the following underlying data: Figure 1, Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8 show the NIDS for Zero-Day Attack preparation 2; result.docx contains every script and how-to guide for the research study developed; and supplement data.zip contains all the data supplements and the results provided, including the datasets generated in the file Cognitive IDS.zip.

Acknowledgments

We would like to express our deepest gratitude to the Interdisciplinary School of Management Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia, and the university staff for their support of this study. We thank our families, colleagues, superiors, mentors and friends for their constant support, encouragement, and inspiration, both online and offline. During the preparation of this manuscript, the authors used GPT4All, ver. 3.10.0, to check the grammar and as an offline AI engine for data extraction from literature assets, along with Zotero-AI for indexing and reference management. Some context and rephrasing were performed using PaperPal AI. The authors have reviewed and edited the output and research experiments and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript.

BOT	A bot attack is a type of cyberattack where automated scripts, known as bots, are used to perform malicious activities.
CICIDS2017	Canadian Institute for Cybersecurity Intrusion Detection Systems 2017
CNIDS	Cognitive Network Intrusion Detection Systems
CPT	Continual Pre-Training
DDoS	Distributed Denial of Service
DMZ	Demilitarized Zone
HRS	Human-in-the-Loop Reinforcement Signal
IDS	Intrusion Detection Systems
ML	Machine Learning
NSL-KDD	Refined version of the original KDD’99 dataset
PPO	Proximal Policy Optimization
SFT	Supervised Fine-Tuning
UNSW-NB15	The dataset created by the IXIA PerfectStorm tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS)
UML	Unified Modeling Language, a standardized modeling language used to specify, visualize, construct, and document the artifacts of software systems.
VectorDB	A specialized database designed to store, index and search high-dimensional vector representations of data known as embeddings. Unlike traditional databases that rely on exact matches, vector databases use similarity search techniques such as cosine similarity or Euclidean distance to find items that are semantically or visually similar.
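
The similarity search described in the VectorDB entry can be illustrated with a minimal sketch, assuming a tiny in-memory list in place of a real vector database. The function names, labels, and three-dimensional embeddings below are hypothetical and are not taken from the CNIDS implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_a * norm_b)


def top_k_similar(query, memory, k=2):
    """Return the k (label, score) pairs most similar to the query."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in memory]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stored embeddings: (label, vector)
memory = [
    ("benign", [1.0, 0.0, 0.0]),
    ("dos", [0.0, 1.0, 0.0]),
    ("probe", [0.0, 0.7, 0.7]),
]
result = top_k_similar([0.0, 0.9, 0.1], memory, k=2)
```

A production vector database would replace the linear scan with an approximate nearest-neighbor index, but the ranking criterion is the same.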

Figure 1. Overview of the CNIDS Cognitive Loop, Combining Continual Pre-Training, Supervised Fine-Tuning, and Human Feedback.

Figure 2. System Integration Diagram for Research Completion.

Figure 3. High-Level Diagram for CNIDS Processing.

Figure 4. Zero-day Detection Results for CNIDS under Autonomous and HRS Inferences.

Figure 6. Temporal Evolution of Zero-day Detection Precision after the First HRS Intervention.

Figure 7. Chart of Accuracy Over Time with Intervention Point.

Figure 8. UML Component Diagram for CNIDS.

Table 1. Summary Table of Methods for IDS Enhancement.

Technique	Goal	Data Type	Optimization
CPT	Domain adaptation	Unlabeled domain-specific text	Continued pre-training
SFT	Task-specific tuning	Labeled input–output pairs	Supervised learning
HRS	Human-aligned behavior	Human preference markings	Human Reinforcement Feedback

Table 2. Class-Exclusion Configuration for zero-day simulation.

Dataset	Withheld Attack Classes	Samples Removed	Removed Dominant Training Classes	Selection Rationale
NSL-KDD	U2R (e.g., buffer_overflow, rootkit), R2L (e.g., guess_passwd, imap)	~1100	DoS, Probe	Privilege-escalation and credential attacks are semantically distinct from volumetric attacks and are low-frequency, making them suitable proxies for rare zero-day behavior.
UNSW-NB15	Analysis, Backdoor	~2000	Generic, Exploits, DoS	These classes represent stealthy and persistent attack patterns with lower frequency and different behavioral signatures compared to high-volume attacks.
CICIDS2017	Infiltration and Web Attack	~3500	DDoS, DoS, PortScan, Botnet	Multi-stage and application-layer attacks differ structurally from traffic-based attacks and reflect realistic emerging threats.
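
The class-exclusion configuration in Table 2 can be sketched as a simple data split: every sample of a withheld class is kept out of training and held back as a zero-day proxy. The following is a minimal sketch under stated assumptions; the function name, the `(features, label)` record layout, the test fraction, and the toy labels are hypothetical and do not reproduce the authors' scripts.

```python
import random


def class_exclusion_split(records, withheld, test_fraction=0.2, seed=0):
    """Split (features, label) records so withheld classes never reach training."""
    rng = random.Random(seed)
    known = [r for r in records if r[1] not in withheld]
    zero_day = [r for r in records if r[1] in withheld]
    rng.shuffle(known)
    n_test = int(len(known) * test_fraction)
    test_known, train = known[:n_test], known[n_test:]
    return train, test_known, zero_day


# Toy NSL-KDD-style labels; U2R and R2L play the withheld role as in Table 2.
labels = ["normal", "DoS", "Probe", "U2R", "R2L"] * 4
records = [([i], label) for i, label in enumerate(labels)]
train, test_known, zero_day = class_exclusion_split(records, {"U2R", "R2L"})
```

At evaluation time the `zero_day` set is presented only to the trained system, so any detection of those samples cannot come from supervised exposure to their labels.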

Table 3. Baseline performance on known attack detection (averaged across three datasets).

Metric	Known Attack Detection
Accuracy	92.4% (±1.2%)
Precision	89.7% (±1.5%)
Recall/Detection Rate (DR)	94.1% (±0.9%)
False-Positive Rate (FPR)	8.2% (±1.4%)
F1-Score	91.8% (±1.1%)
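
As a rough consistency check on Table 3, the F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean definition. This is only a sketch: the helper names are hypothetical, and because the table reports averages over three datasets, the F1 of averaged precision and recall need not match an averaged F1 exactly in general.

```python
def precision(tp, fp):
    """Fraction of raised alerts that are true attacks."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true attacks that are detected (detection rate)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp, tn):
    """Fraction of benign samples that are flagged as attacks."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Precision and recall as reported in Table 3 (89.7% and 94.1%).
f1_from_table = f1_score(0.897, 0.941)
f1_percent = round(f1_from_table * 100, 1)
```

Under these definitions the recomputed value rounds to the 91.8% F1-score listed in the table.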

Table 4. Comparison of zero-day detection performance across configurations.

Configuration	ZDR (%)	Precision (%)	FPR (%)
SFT only (supervised)	0.0	—	—
CNIDS without HRS	10.1	40.8	2.6
CNIDS with HRS (5 min feedback)	18.2	82.9	2.6

Table 5. Zero-day detection performance after CNIDS with HRS integration.

Model	Known Attack Accuracy (%)	Zero-Day Detection Rate (%)	FPR (%)
Random Forest	89.2	2.3	11.8
CNN (1D)	91.5	4.1	9.2
LSTM (Standalone)	92.8	6.7	8.5
Transformer (Small)	93.1	7.2	7.9
CNIDS (ours)	94.2	18.2	2.6

Table 6. Ablation Analysis of CNIDS Components (Qualitative Impact).

Configuration	CPT	Vector Memory	HRS	Expected Impact on Zero-Day Detection	Expected Impact on Precision	Operational Interpretation
Full CNIDS	✓	✓	✓	Moderate improvement over time	High (≥80%)	Adaptive system with rapid learning
No Human Feedback	✓	✓	✗	Near-zero detection of unseen attacks	Moderate	Static anomaly scoring, no adaptation
No Vector Memory	✓	✗	✓	Limited novelty generalization	Moderate–Low	Feedback lacks memory persistence
No CPT	✗	✓	✓	Slower adaptation, higher drift	Moderate	Memory present but representations stale
SFT Only (Baseline)	✗	✗	✗	0% (by design)	High (for known attacks)	Conventional supervised IDS

Table 7. Experimental Setup.

Phase	Task	Tooling
Data Prep	Normalize CICIDS/UNSW/Real Logs datasets	Pandas, Scikit-learn
Modeling	Train ML classifiers	HuggingFace Transformers, Python Sklearn (SVM, neural network, random forest, gradient boosting), Python Keras LSTM
Evaluation	Benchmark metrics	MLFlow, Weights and Biases
Deployment	Containerize and monitor	Docker, Prometheus or Log Management

Table 8. Common Pitfalls in IDS Deployment.

Pitfall	Issue	Impact	Fix
Blind Spots in Network Visibility	IDS sensors are not placed at strategic choke points (e.g., inside/outside firewalls, DMZ).	Missed lateral movements or internal threats.	Use port mirroring, SPAN, or network taps to ensure complete traffic visibility.
Alert Fatigue and Neglected Monitoring	The IDS generates alerts, but these are not actively reviewed or triaged.	Critical threats go unnoticed, and IDSs become post-incident forensic tools.	Integrate with SIEMs and automate alert prioritization using ML or HRS.
Overreliance on Signature-Based Detection	Static signatures fail to detect zero-day or polymorphic attacks.	Sophisticated threats can bypass detection.	Combine signature-based detection with anomaly detection and ML-based semantic analysis.
Poor Baseline Modeling	Inadequate profiling of “normal” traffic leads to a high number of false positives.	Analyst time is wasted and trust in the system erodes.	Use unsupervised learning or CPT to adapt the baselines.
Generic Rule Sets in Specialized Environments	Applying IT-centric rules to OT or IoT networks.	Misses protocol-specific threats (e.g., Modbus, DNP3).	Tailor rules to the environment and collaborate with domain experts.
Lack of Feedback Loop	The IDS does not evolve based on analyst input or a changing threat landscape.	Static performance and increasing irrelevance.	Implement HRS or active learning to refine detection and alerting over time.
Resource Overload	An IDS consumes excessive CPU/memory, especially with deep packet inspection.	Network latency or dropped packets.	Offload preprocessing to edge devices or use scalable cloud-native architectures.

Table 9. Simulated Pitfalls and Mitigation Strategies.

Pitfall	Simulation Strategy	Mitigation Benchmark
Alert Fatigue	Inject excessive false positives	HRS-based alert ranking, analyst feedback loop
Blind Spots	Remove traffic from internal segments	Multi-sensor fusion, ML-based log correlation
Signature Overreliance	Use only static rules (Snort-like)	Hybrid detection: anomaly + ML semantic matching
Poor Baseline Modeling	Randomize benign traffic profiles	CPT on unlabeled traffic to adapt baseline
Resource Overload	Simulate high-throughput traffic	Benchmark latency with edge preprocessing
Semantic Drift in ML	Use outdated log formats	CPT with recent traffic logs, continual learning

Table 10. Detection Engine Options.

Model Type	Use Case	Tooling
Random Forest, Gradient Boosting, SVM, ANN, LSTM, CNN	Fast baseline detection of traffic that the supervised-trained model leaves undetected	Scikit-learn, TensorFlow (in cpt_engine.py)
Supervised Dataset	Anomaly or normal detection (NSL_KDD style)	Sqlite3, Python Pickle (in sft_engine.py)
Zero-day Detection	Semantic log analysis	Feedback Pattern (in hrs_engine.py)
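
The feedback pattern listed for zero-day detection can be illustrated with a minimal sketch of how low-confidence alerts might be routed to an analyst and how the analyst's labels might be stored for later similarity lookups. Everything here is an assumption made for illustration: the class names, the 0.6 confidence threshold, and the list-based memory are hypothetical and do not reflect the contents of hrs_engine.py.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    alert_id: str
    embedding: list
    predicted_label: str
    confidence: float


@dataclass
class FeedbackMemory:
    """Stand-in for persistent vector memory (hypothetical)."""
    entries: list = field(default_factory=list)

    def add(self, embedding, label):
        self.entries.append((list(embedding), label))


def route_alerts(alerts, threshold=0.6):
    """Separate alerts handled automatically from those needing analyst review."""
    auto, review = [], []
    for alert in alerts:
        (auto if alert.confidence >= threshold else review).append(alert)
    return auto, review


def apply_feedback(review_queue, analyst_labels, memory):
    """Store analyst-confirmed labels for reviewed alerts; return how many were stored."""
    stored = 0
    for alert in review_queue:
        label = analyst_labels.get(alert.alert_id)
        if label is not None:
            memory.add(alert.embedding, label)
            stored += 1
    return stored


alerts = [
    Alert("a1", [0.1, 0.9], "dos", 0.95),
    Alert("a2", [0.5, 0.5], "unknown", 0.40),
    Alert("a3", [0.6, 0.4], "unknown", 0.55),
]
auto, review = route_alerts(alerts)
memory = FeedbackMemory()
stored = apply_feedback(review, {"a2": "zero_day"}, memory)
```

In this sketch only the alert the analyst actually labeled is written to memory; unreviewed alerts stay in the queue and leave the stored knowledge unchanged.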

Table 11. Protocol–Component Interaction Analysis.

Configuration	Protocol Effect	Result
Full CNIDS	Handles distributional shift	Adaptive zero-day detection
Without CPT	Poor feature alignment	Drift sensitivity
Without VectorDB	No similar reference	Boundary collapse
Without HRS	No learning from zero-day	Static performance

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gunawan, J.A.; Singgih, M.L.; Ginardi, R.V.H. Cognitive Network Intrusion Detection Systems: Anomaly and Malware Detection for Zero-Day Attack Resilience. Network 2026, 6, 41. https://doi.org/10.3390/network6020041
