Conditional Counter-Inspection with Curriculum-Biased Experts for Lightweight 5G Intrusion Detection

Tahori, Khaoula; Fatani, Imade Fahd Eddine; Moughit, Mohamed

doi:10.3390/fi18030116

by Khaoula Tahori ¹, Imade Fahd Eddine Fatani ¹,* and Mohamed Moughit ¹,²,*

¹ Sciences and Techniques for the Engineer Laboratory (LASTI), National School of Applied Science, University Sultan Moulay Slimane, Khouribga 25000, Morocco

² Artificial Intelligence, Modeling & Computational Engineering Laboratory (AIMCE), ENSAM Casablanca, University Hassan II, Casablanca 20190, Morocco

* Authors to whom correspondence should be addressed.

Future Internet 2026, 18(3), 116; https://doi.org/10.3390/fi18030116

Submission received: 22 January 2026 / Revised: 12 February 2026 / Accepted: 13 February 2026 / Published: 25 February 2026

Abstract

In contemporary 5G network environments, intrusion detection systems must balance detection accuracy with operational efficiency, as improvements in one dimension are often achieved at the expense of the other. This study addresses this trade-off by proposing a lightweight two-stage intrusion detection architecture that augments a standard decision-tree classifier with a conditional counter-inspection mechanism. At inference time, a global decision tree produces an initial classification for each traffic record, which is selectively validated by a small set of class-biased expert trees trained under controlled minority exposure. Only experts associated with the opposite class of the initial prediction are activated, and decision revision is governed by a unanimous-dissent rule, ensuring conservative and deterministic correction while avoiding over-correction. Experiments conducted on the 5G-NIDD dataset in a binary benign/malicious setting show that the proposed architecture consistently improves upon the standalone decision tree, reducing false negatives from 51 to 27 (−47.1%) and false positives from 48 to 30 (−37.5%), and achieving an F1-score of 0.99981 on a held-out test set. Ablation and paired statistical tests confirm that these gains arise from selective validation and the unanimous-dissent mechanism rather than from uniform ensembling. The complete pipeline operates in the microsecond inference regime per record, evaluates fewer models on average than flat voting strategies, and preserves full interpretability through deterministic decision paths, making it suitable for practical and resource-constrained 5G intrusion detection deployments.

Keywords:

5G security; intrusion detection system (IDS); decision trees; conditional routing; cost-sensitive inference; ensemble learning; interpretability; 5G-NIDD; unanimous dissent; class-biased experts

1. Introduction

In contemporary 5G network environments, intrusion detection systems (IDSs) increasingly face a structural trade-off between detection accuracy and operational efficiency. As traffic becomes more heterogeneous, dense, and latency-sensitive, driven by diverse service classes such as eMBB, mMTC, and uRLLC, improvements in detection performance are frequently achieved at the expense of computational cost, interpretability, or deployability. Designing IDSs that are both accurate and practical under such constraints remains an open challenge.

Traditional signature-based intrusion detection mechanisms are increasingly ineffective in this setting, as they fail to generalize to evolving and previously unseen attack patterns. Recent surveys emphasize that adaptive, data-driven detection approaches are essential for modern 5G infrastructures [1]. Consequently, learning-based IDS—spanning classical machine learning (ML) and deep learning (DL) techniques—have become the dominant research direction in 5G security.

Early learning-based IDS studies were largely conducted on synthetic or semi-synthetic benchmarks such as NSL-KDD, UNSW-NB15, and CICIDS2017. Although these datasets enabled reproducible experimentation, they have been widely criticized for outdated traffic profiles and limited realism [2,3]. To address these shortcomings, recent work has shifted toward datasets collected from functional 5G testbeds. The 5G-NIDD dataset represents a notable step in this direction, offering realistic benign and malicious traffic derived from an operational 5G environment. Studies based on 5G-NIDD report lower—but more credible—performance ceilings, with classical ML models typically achieving accuracies around 99.4–99.6% under rigorous preprocessing and leakage-free protocols [4].

Classical machine learning approaches—including decision trees, random forests, support vector machines, and k-nearest neighbors—remain attractive for intrusion detection due to their transparency, low computational cost, and ease of deployment. Tree-based models in particular are valued for their explicit decision rules, which align well with auditing and explainability requirements in security-critical systems [2,5]. However, such models typically rely on a single global decision boundary. As traffic distributions become more heterogeneous and imbalanced, this global boundary becomes increasingly fragile, leading to persistent class-specific errors that are difficult to resolve through tuning alone [4].

Deep learning models address these representational limitations by learning hierarchical or temporal features directly from traffic data. Convolutional and recurrent architectures, attention mechanisms, and Transformer-based models have reported strong detection performance on 5G-NIDD and related datasets [6]. Transfer learning strategies further extend deep models to data-scarce scenarios, as demonstrated in frameworks such as DTL-5G [7], while autoencoder–hypernetwork combinations have reported near-ceiling accuracy under controlled conditions [8]. Despite these advances, deep learning–based IDS often introduce substantial computational overhead, increased memory footprint, and limited interpretability. Their reliance on complex preprocessing pipelines and heavy architectures complicates deployment in latency-sensitive or resource-constrained 5G environments, particularly at the network edge.

To balance accuracy and practicality, hybrid and ensemble IDS architectures have also been explored. Common strategies include combining multiple learners through stacking, boosting, or voting, or pairing deep feature extractors with classical classifiers [9,10]. While these approaches can improve robustness, they typically rely on uniform aggregation, evaluating all models for every record regardless of uncertainty. As a result, they inherit much of the latency and complexity of their most expensive components and do not fully resolve the trade-off between accuracy, efficiency, and interpretability [11].

A key observation motivating this work is that intrusion detection errors are not uniformly distributed across classes. In practice, a classifier trained on heterogeneous traffic is frequently distracted by competing benign and malicious patterns, leading to class-specific misclassifications near decision boundaries. Rather than increasing model depth or aggregating more learners to average out this ambiguity, we argue that a more effective strategy is structured decision validation: selectively re-examining only those predictions that are most likely to be erroneous, using specialized expertise tailored to the predicted class.

Based on this insight, we propose conditional counter-inspection, a lightweight two-stage intrusion detection architecture. An initial decision-tree classifier acts as a fast filter, providing a coarse prediction for each traffic record. Validation is then performed conditionally: only experts biased toward the class opposite to the initial prediction are evaluated, and their predictions are combined through a unanimous-dissent rule. The initial classification is revised if and only if all counter-class experts disagree with it. This design enables targeted correction of difficult cases while preserving efficiency and interpretability.

Experiments conducted on the 5G-NIDD dataset in a binary benign/malicious setting demonstrate that the proposed architecture consistently outperforms both a standalone decision tree and flat voting over the same experts. Statistical and ablation analyses confirm that the observed gains arise from selective validation and the unanimous dissent mechanism rather than from expert aggregation or increased model complexity. At the same time, the proposed architecture operates with microsecond-level inference latency per record, a compact memory footprint, and fully auditable decision paths, making it suitable for practical and resource-constrained 5G intrusion detection deployments.

The main contributions of this work are summarized as follows:

1. Routing-based counter-inspection architecture

We propose a conditional counter-inspection architecture that departs from flat voting and uniform ensemble aggregation. Instead of evaluating all component models for every traffic record, the proposed design performs routing-based inference, activating only a subset of class-biased expert trees conditioned on the initial prediction. This selective execution reduces unnecessary model evaluations while preserving corrective capability.

2. Noise-aware expert specialization through controlled training

We introduce a curriculum-biased training strategy that enables expert models to specialize by restricting exposure to non-target class samples. By controlling the input distribution during training, each expert focuses on class-specific decision regions, improving discrimination near class-ambiguous boundaries without increasing model depth or capacity.

3. Deterministic unanimous-dissent decision logic

We formalize a unanimous-dissent rule for conditional validation, under which the initial classification is revised only when all counter-class experts agree on the opposite label. This decision logic enforces conservative correction, avoids over-adjustment, and preserves full interpretability through deterministic decision paths.

4. Deployability-oriented evaluation and causal analysis

We provide a comprehensive evaluation on the 5G-NIDD dataset, including detection performance, robustness, efficiency, and routing cost measured as the number of tree traversals per record. Through ablation studies and paired statistical testing, we show that performance gains arise from routing-based selective validation and the unanimous-dissent decision logic, rather than from expert multiplicity or flat aggregation.

2. Materials and Methods

2.1. Overview of the Conditional Counter-Inspection Pipeline

This work proposes a lightweight intrusion detection architecture that augments a standard decision-tree classifier with a conditional counter-inspection layer. The pipeline is composed of two stages:

Initial filtering stage: a global decision tree, trained on the full training dataset, provides a fast and interpretable initial classification.
Conditional counter-inspection stage: a small set of class-biased decision-tree experts selectively validates the initial prediction. Only experts biased toward the class opposite to the initial decision are evaluated, and decision revision is governed by a unanimous-dissent rule.

Crucially, all models in the pipeline are standard CART decision trees trained with default settings and a fixed random seed. Performance gains arise solely from training data exposure and selective validation, not from architectural complexity or hyperparameter tuning. The overall conditional counter-inspection architecture is illustrated in Figure 1.

Lightweight Design Clarification

In this work, “lightweight” refers to three concrete properties:

All components are shallow decision trees, ensuring low model complexity.
Inference cost is strictly bounded: each record traverses at most four trees (one global model and up to three counter-class experts), compared to seven in flat voting.
The framework does not rely on boosting, deep neural networks, probabilistic gating, or calibration layers.

The routing mechanism activates only class-relevant experts, avoiding full ensemble evaluation while preserving corrective capacity.

2.2. Dataset and Experimental Preparation

2.2.1. Dataset Description

We evaluate the proposed architecture on the publicly available 5G-NIDD dataset [12], which was collected from an operational 5G testbed and contains realistic benign traffic alongside diverse attack scenarios. The dataset is widely used for intrusion detection research in modern mobile networks and provides a representative flow-level view of 5G traffic under both normal and adversarial conditions [13].

In this study, we operate on the released tabular representation of 5G-NIDD, which describes each network flow using a rich set of timing, size, loss, and protocol-related attributes.

In addition to 5G-NIDD, we evaluate the proposed architecture on the UNSW-NB15 benchmark, an intrusion detection dataset introduced by Moustafa and Slay [14]. UNSW-NB15 contains modern synthetic benign and malicious network traffic generated within a controlled cyber range environment. We use the official training and testing split provided by the authors to ensure reproducibility and comparability with prior studies.

2.2.2. Preprocessing

To ensure data integrity and compatibility with decision-tree models, the following preprocessing steps are applied:

Categorical encoding: All categorical features (Proto, sDSb, dDSb, Cause, State) are transformed using one-hot encoding.
Missing value handling: Missing entries are replaced with an explicit Unknown category to preserve records and avoid biased imputation.
Column filtering: Auxiliary metadata fields (Attack Type, Attack Tool) are removed to prevent label leakage, as their presence would implicitly disclose the class during training and evaluation. This filtering also reflects realistic deployment conditions.
Label mapping: The target variable is standardized to binary values: 0 (benign) and 1 (malicious).
Leakage prevention: Hash-based row checks and feature-level consistency tests are performed post-encoding to verify that no flow appears in both training and testing sets.
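The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' code: the label column name `Label`, the raw label strings `Benign` and `Malicious`, and the helper names `preprocess`, `row_hash`, and `shared_rows` are assumptions introduced here, and the sketch only fills missing values in the categorical fields.

```python
import hashlib

CATEGORICAL = ["Proto", "sDSb", "dDSb", "Cause", "State"]  # listed in the text
LEAKAGE_COLUMNS = ["Attack Type", "Attack Tool"]           # listed in the text
LABEL_COLUMN = "Label"                                     # assumed column name
LABEL_MAP = {"Benign": 0, "Malicious": 1}                  # assumed raw label strings


def preprocess(rows):
    """Apply the listed steps to a list of dict records; returns (features, labels)."""
    cleaned = []
    for row in rows:
        # Column filtering: drop metadata that would disclose the class.
        row = {k: v for k, v in row.items() if k not in LEAKAGE_COLUMNS}
        # Missing value handling: explicit "Unknown" category.
        for col in CATEGORICAL:
            if row.get(col) in (None, ""):
                row[col] = "Unknown"
        cleaned.append(row)

    # Categorical encoding: one indicator per observed (feature, value) pair.
    vocab = {col: sorted({r[col] for r in cleaned}) for col in CATEGORICAL}
    features, labels = [], []
    for row in cleaned:
        encoded = {
            k: v for k, v in row.items()
            if k not in CATEGORICAL and k != LABEL_COLUMN
        }
        for col, values in vocab.items():
            for value in values:
                encoded[f"{col}={value}"] = 1 if row[col] == value else 0
        features.append(encoded)
        # Label mapping: 0 (benign) and 1 (malicious).
        labels.append(LABEL_MAP[row[LABEL_COLUMN]])
    return features, labels


def row_hash(feature_row):
    """Stable hash of an encoded row, used for the train/test overlap check."""
    canonical = "|".join(f"{k}:{feature_row[k]}" for k in sorted(feature_row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def shared_rows(train_features, test_features):
    """Count test rows whose hash also appears in the training set."""
    train_hashes = {row_hash(r) for r in train_features}
    return sum(row_hash(r) in train_hashes for r in test_features)
```

In this sketch a leakage-free split corresponds to `shared_rows(train, test) == 0`.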

After preprocessing, the dataset is stratified into 80% training (972,712 flows) and 20% testing (243,178 flows). The test split remains strictly unseen until final evaluation.

2.2.3. Terminology and Unit of Prediction

Although the 5G-NIDD dataset originates from raw packet captures, all experiments in this work are conducted on the publicly released tabular CSV representation, in which each row corresponds to a single traffic record derived from flow-level aggregation. Accordingly, throughout this paper, the term record denotes one dataset row, and all latency and efficiency metrics are reported per record. The term packet is reserved exclusively for raw pcap-level analysis, which is outside the scope of this study.

2.3. First Layer: Global Decision Tree

The first layer of the pipeline is a global CART decision tree trained on the full training dataset without class bias, reweighting, or resampling. This model serves as a fast and interpretable initial filter, capturing general decision boundaries across both benign and malicious traffic.

Formally, the global model produces an initial prediction:

$$G(x) \in \{0, 1\},$$

where 0 denotes benign traffic and 1 denotes malicious traffic.

Although effective, a single global decision boundary may remain vulnerable to ambiguity near class boundaries due to exposure to heterogeneous traffic patterns. The second layer of the pipeline is specifically designed to selectively validate such borderline cases.

2.4. Second Layer: Counter-Inspection

2.4.1. Motivation: Specialization Through Controlled Data Exposure

Rather than increasing model complexity or introducing heterogeneous learners, the counter-inspection layer is constructed by training multiple identical decision-tree models on deliberately biased data subsets. The central idea is to control what each expert observes, not how it learns.

Each expert is trained to be class-dominant: it fully observes one class while being exposed only to a limited fraction of the opposite class. This deliberate imbalance induces distinct inductive biases among experts while preserving model simplicity, interpretability, and homogeneity.

All experts remain standard CART decision trees trained with identical settings; differences in behavior arise exclusively from differences in data exposure rather than from architectural choices or hyperparameter tuning.

2.4.2. Curriculum-Biased Training Set Construction

Let $D_{B}$ and $D_{M}$ denote the benign and malicious subsets of the training data, respectively. For each fraction

$$f \in \{0.1, 0.2, 0.3\},$$

we construct biased training sets as follows:

$$D_{M}^{f} = D_{M} \cup \mathrm{Record}_{f}(D_{B}), \tag{1}$$

$$D_{B}^{f} = D_{B} \cup \mathrm{Record}_{f}(D_{M}), \tag{2}$$

where $\mathrm{Record}_{f}(D)$ denotes a subset containing a fraction $f$ of the records in $D$. Each training set preserves all records of the expert's dominant (majority) class while including only a fraction $f$ of the opposite (minority) class. This ensures that each expert remains majority-dominant while being selectively exposed to counter-class patterns.

Smaller fractions yield narrow specialists, while larger fractions produce broader experts with increased contextual awareness.
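The construction in Equations (1) and (2) can be sketched as below. The text does not specify how the fraction of the opposite class is selected, so this sketch assumes uniform random sampling without replacement under a fixed seed; the function name `build_biased_sets` and the dictionary keys are illustrative only.

```python
import random


def build_biased_sets(benign, malicious, fractions=(0.1, 0.2, 0.3), seed=0):
    """Return {("M", f): D_M^f, ("B", f): D_B^f} following Equations (1)-(2)."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    sets = {}
    for f in fractions:
        # Malicious-biased set: all malicious records plus a fraction f of benign.
        benign_part = rng.sample(benign, int(round(f * len(benign))))
        sets[("M", f)] = malicious + benign_part
        # Benign-biased set: all benign records plus a fraction f of malicious.
        malicious_part = rng.sample(malicious, int(round(f * len(malicious))))
        sets[("B", f)] = benign + malicious_part
    return sets
```

Each of the six resulting sets would then be used to fit one expert tree with the same settings as the global model.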

2.4.3. Expert Training

Using the biased datasets defined above, six experts are trained independently:

Malicious-biased experts: $\{E_{M,0.1}, E_{M,0.2}, E_{M,0.3}\}$.
Benign-biased experts: $\{E_{B,0.1}, E_{B,0.2}, E_{B,0.3}\}$.

All experts are trained using CART decision trees with identical hyperparameters. The class exposure ratios for all expert models are reported in Table 1.

Because all experts share the same learning algorithm and capacity, any observed specialization emerges solely from controlled data exposure, ensuring a transparent and interpretable design.

We adopt three experts per class, corresponding to the first three curriculum steps (0.10–0.30). A saturation analysis (Appendix A) shows that these steps capture the majority of performance gains, while additional experts yield diminishing returns or bias erosion at increased cost.

2.4.4. Expert Number and Curriculum Justification

The proposed architecture employs three malicious-biased experts and three benign-biased experts, corresponding to minority exposure fractions $\{0.10, 0.20, 0.30\}$. This choice is informed by a depth analysis of standalone expert performance across increasing exposure levels (Appendix A).

Validation results show that increasing minority exposure from 0.10 to 0.20 yields a substantial improvement in F1-score and a marked reduction in false-positive rate for malicious-biased experts. However, gains beyond 0.30 exhibit clear saturation, with diminishing improvements in F1 and only marginal reductions in FPR. Similar behavior is observed for benign-biased experts: fraction 0.20 achieves peak F1 performance, while further increases introduce bias erosion, reflected by rising false-positive rates and stagnating F1.

The plateau analysis confirms that the first three curriculum steps capture the majority of specialization benefits, while extending exposure to 0.40 produces negligible or inconsistent gains relative to additional computational overhead.

Using three experts per class also ensures a minimal odd-number configuration compatible with the unanimous-dissent rule, preventing tie conditions while preserving bounded inference cost (maximum four tree traversals).

Therefore, the selection of six experts represents a principled trade-off between specialization depth, correction stability, and lightweight deployment constraints.

2.5. Full Pipeline: Conditional Counter-Inspection

2.5.1. Conditional Expert Activation

The global decision tree G(x) provides the initial prediction:

$$G(x) \in \{0, 1\},$$

where 0 denotes benign and 1 denotes malicious traffic.

Rather than querying all experts, counter-inspection is conditional:

If G(x) = 0, only malicious-biased experts are activated.
If G(x) = 1, only benign-biased experts are activated.

This design restricts computation to counter-class validation, focusing inference on class-specific uncertainty while avoiding unnecessary expert evaluation.

2.5.2. Counter-Inspection Semantics

Expert validation is performed exclusively by the set of counter-class experts associated with the initial prediction. Each expert provides an independent assessment of whether the initial decision should be reconsidered. The role of this layer is not to refine the decision progressively, but to validate or veto the initial classification through collective agreement. This formulation ensures that decision revision occurs only under strong and consistent counter-evidence.

2.5.3. Unanimous-Dissent Rule and Logical Formulation

Decision revision is governed by a unanimous-dissent rule. The global decision is revised if and only if all counter-experts unanimously disagree.

The final prediction $\hat{y}(x)$ is defined as:

$$\text{if } G(x) = 0 \text{ and } \forall f : E_{M,f}(x) = 1 \;\Rightarrow\; \hat{y}(x) = 1, \tag{3}$$

$$\text{if } G(x) = 1 \text{ and } \forall f : E_{B,f}(x) = 0 \;\Rightarrow\; \hat{y}(x) = 0, \tag{4}$$

$$\text{otherwise } \hat{y}(x) = G(x). \tag{5}$$

This logical formulation directly captures the counter-inspection mechanism and ensures deterministic, conservative decision revision. The full step-by-step inference process is detailed in Algorithm 1.

Algorithm 1: Conditional counter-inspection inference procedure with unanimous-dissent decision rule.

Require: Feature vector x
Ensure: Final prediction $\hat{y}(x) \in \{0, 1\}$

1: Compute the initial prediction using the global decision tree: $y_{G} \leftarrow G(x)$
2: if $y_{G} = 0$ (initially classified as benign) then
3:     for each malicious-biased expert $E_{M,f}$, evaluate $E_{M,f}(x)$
4:     if any $E_{M,f}(x) = 0$ then
5:         return $\hat{y}(x) = 0$ // retain benign classification
6:     else (all malicious-biased experts disagree)
7:         return $\hat{y}(x) = 1$ // unanimous dissent → revise to malicious
8: else ($y_{G} = 1$, initially classified as malicious)
9:     for each benign-biased expert $E_{B,f}$, evaluate $E_{B,f}(x)$
10:     if any $E_{B,f}(x) = 1$ then
11:         return $\hat{y}(x) = 1$ // retain malicious classification
12:     else (all benign-biased experts disagree)
13:         return $\hat{y}(x) = 0$ // unanimous dissent → revise to benign

This design ensures that the initial decision is revised only under unanimous counter-class agreement, preventing over-correction while preserving precision gains and reducing unnecessary model evaluations.
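As a rough illustration of Algorithm 1, the routing and revision logic might look like the sketch below. The models are represented as plain callables returning 0 or 1; the function name `counter_inspect` and the returned evaluation count are assumptions added here for clarity, not part of the original pipeline.

```python
from typing import Callable, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]  # maps a feature vector to 0 or 1


def counter_inspect(
    x: Sequence[float],
    global_tree: Classifier,
    malicious_experts: Sequence[Classifier],
    benign_experts: Sequence[Classifier],
) -> Tuple[int, int]:
    """Return (final label, number of models evaluated) for one record."""
    y_g = global_tree(x)
    # Route to the experts biased toward the class opposite to the initial label.
    experts = malicious_experts if y_g == 0 else benign_experts
    votes = [expert(x) for expert in experts]
    # Revise only if every activated expert contradicts the global tree.
    if votes and all(v != y_g for v in votes):
        return 1 - y_g, 1 + len(votes)
    return y_g, 1 + len(votes)
```

Because the rule requires unanimity, an implementation could stop at the first expert that agrees with the global tree without changing the final label; the sketch evaluates all three experts to mirror Algorithm 1.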

Unlike classical ensemble methods, the proposed architecture does not aggregate predictions from multiple models. Instead, it performs conditional validation of a single global decision, where expert models act as veto mechanisms rather than contributors to a combined score. As a result, expert multiplicity alone does not guarantee performance gains, which arise specifically from selective routing and the unanimous-dissent decision rule.

2.6. Distinction from Cascaded and Gated Expert Architectures

While the proposed architecture shares superficial similarities with cascaded classifiers and gated expert systems, it differs fundamentally in its objective and decision logic. Traditional cascades are designed to progressively increase model complexity, where early stages reject easy samples to reduce computational load in subsequent stages. Their primary goal is computational efficiency through hierarchical filtering. In contrast, our architecture does not escalate complexity across stages; it maintains homogeneous shallow trees and introduces a corrective validation layer that targets class-specific errors of a single primary model.

Similarly, classical ensemble and mixture-of-experts models aggregate multiple predictors to improve average generalization performance. In such systems, all models typically contribute symmetrically through weighted voting, probability averaging, or learned gating networks that dynamically assign responsibility. These frameworks optimize predictive diversity and collective strength. By contrast, the proposed method does not perform prediction aggregation. The global tree remains the sole predictive authority, and expert models are activated only to validate or challenge its output. Their role is strictly corrective rather than contributive.

Moreover, veto-based or confidence-driven systems often rely on probabilistic thresholds or meta-learned gating mechanisms to override predictions. In contrast, our framework introduces a deterministic unanimous-dissent rule: a label is modified only when all activated experts contradict the global decision. This eliminates probabilistic arbitration and avoids additional calibration or meta-learning layers.

Expert specialization is also constructed differently. Rather than using bootstrap resampling, random feature subspaces, or architectural heterogeneity to induce diversity, specialization is achieved through controlled minority exposure during training. This curriculum-biased mechanism shapes experts into targeted detectors of class-specific misclassifications rather than alternative global classifiers.

Therefore, the methodological novelty lies not in the presence of multiple trees per se, but in the integration of (i) role separation between prediction and validation, (ii) selective activation of expert subsets rather than full ensemble aggregation, (iii) curriculum-induced specialization, and (iv) deterministic unanimous correction under bounded inference traversal. This combination distinguishes the proposed framework from classical ensemble aggregation, cascade filtering, and probabilistic gating architectures.

2.7. Multi-Class Scalability

Although experiments in this work focus on binary intrusion detection, the conditional counter-inspection mechanism extends directly to multi-class settings.

Let $G$ be a primary classifier such that:

$$\hat{y} = G(x), \quad \hat{y} \in \{1, \dots, K\}, \tag{6}$$

where $K$ denotes the number of classes.

For each class $k \in \{1, \dots, K\}$, we define a class-specific expert group:

$$\varepsilon_{k} = \{E_{k,1}, \dots, E_{k,M}\}.$$

Each expert $E_{k,j}$ is trained using controlled exposure so that class $k$ remains dominant in its training distribution, while other classes are included in limited proportions. This induces class-specific specialization without architectural modification.

During inference, once the global classifier produces a prediction $\hat{y} = G(x)$, only the experts in $\varepsilon_{\hat{y}}$ are activated. We define a unanimous-dissent condition as:

$$D(x) = 1 \iff E_{\hat{y},j}(x) \neq \hat{y} \ \text{ for all } j \in \{1, \dots, M\}.$$

The final prediction is then defined as:

$$\hat{y}' = \begin{cases} \hat{y}, & \text{if } D(x) = 0, \\ \arg\max_{c \neq \hat{y}} \sum_{j=1}^{M} \mathbb{1}\left(E_{\hat{y},j}(x) = c\right), & \text{if } D(x) = 1. \end{cases} \tag{7}$$

That is, the global decision is retained unless all activated experts unanimously contradict it. When unanimous dissent occurs, the label is reassigned to the most frequently predicted alternative class among the activated experts.
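A minimal sketch of this multi-class rule, following Equation (7) as written, is given below. The text does not say how ties among alternative classes are broken, so the sketch assumes the smallest class index wins; the function name `multiclass_counter_inspect` and the dictionary layout of the expert groups are likewise illustrative.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

Model = Callable[[Sequence[float]], int]  # maps a feature vector to a class index


def multiclass_counter_inspect(
    x: Sequence[float],
    global_model: Model,
    expert_groups: Dict[int, Sequence[Model]],
) -> int:
    """Apply Equation (7): keep G(x) unless all activated experts dissent."""
    y_hat = global_model(x)
    votes = [expert(x) for expert in expert_groups[y_hat]]
    if not votes or any(v == y_hat for v in votes):
        return y_hat  # D(x) = 0: at least one expert agrees with G
    counts = Counter(votes)  # D(x) = 1: no vote equals y_hat
    best = max(counts.values())
    # Assumed tie-break: smallest class index among the most frequent alternatives.
    return min(label for label, n in counts.items() if n == best)
```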

This extension preserves the same structural principles as the binary case:

A single primary predictor;
Selective activation of only class-relevant experts;
Deterministic correction under unanimous dissent;
Bounded inference traversal.

A full empirical evaluation of the multi-class extension is left for future work.

3. Results

This section is structured to progressively evaluate the proposed framework. Section 3.1 isolates the added value of the conditional counter-inspection layer by analyzing how it augments the behavior of the base decision-tree model in terms of accuracy, error correction, efficiency, interpretability, and robustness. Building on this analysis, Section 3.2 evaluates the complete two-layer pipeline and positions its performance and deployability relative to state-of-the-art intrusion detection methods on the 5G-NIDD dataset.

3.1. Added Value of Conditional Counter-Inspection

3.1.1. Incremental Enhancement: Impact of Conditional Counter-Inspection

We first evaluate the decision-tree model that constitutes the base layer of the proposed pipeline, trained on the full training set. This global decision tree serves as a transparent reference point for all subsequent comparisons and enables a direct assessment of the incremental improvements introduced by conditional counter-inspection. As reported in Table 2, the standalone tree already achieves strong detection performance on the 5G-NIDD test set, with an F1-score of 0.99966. Nevertheless, it still produces non-negligible false negatives and false positives, which are critical in intrusion detection settings.

We then assess whether augmenting this base model with the conditional counter-inspection layer improves detection quality without modifying or retraining the global tree. Table 3 reports the resulting performance comparison, while Table 4 details the corresponding confusion-matrix statistics. The proposed pipeline achieves an F1-score of 0.99981, reflecting consistent improvements across all metrics. In particular, false negatives are reduced from 51 to 27 (−47.1%), and false positives from 48 to 30 (−37.5%). These gains indicate that targeted counter-inspection effectively corrects class-specific errors made by the global tree, reducing both missed intrusions and false alarms without altering the base model.

From an operational perspective, these improvements are significant. Reducing false negatives directly limits undetected attacks, while reducing false positives alleviates analyst workload and alert fatigue. Importantly, these benefits are achieved through selective validation rather than through retraining, tuning, or increasing the complexity of the global model.

3.1.2. Cost-Sensitive Evaluation

In intrusion detection systems, error types do not carry equal consequences. False negatives (missed attacks) typically incur significantly higher operational risk than false positives. To reflect this asymmetry, we evaluate the global model (G) and the proposed conditional counter-inspection (CS) architecture under a cost-sensitive formulation:

$$\mathrm{Risk} = \lambda \cdot \mathrm{FN} + \mathrm{FP}, \tag{8}$$

where $\lambda$ represents the penalty assigned to false negatives.

Table 5 reports the results for $\lambda \in \{5, 10, 20, 50\}$. The proposed architecture consistently reduces the total cost-weighted risk across all penalty levels.

Key observations:

At $\lambda = 20$, risk falls from 1068 (G) to 570 (CS).
At $\lambda = 50$, risk falls from 2598 (G) to 1380 (CS).

The relative reduction in total risk stays between roughly 45% and 47% across all tested penalty factors (for example, 46.6% at $\lambda = 20$ and 46.9% at $\lambda = 50$), indicating stability of the correction mechanism under increasingly severe attack cost assumptions.
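The cost-weighted risk in Equation (8) can be recomputed directly from the error counts reported for the test set (51 false negatives and 48 false positives for G, 27 and 30 for CS). The short sketch below does so for each penalty value; the function name `weighted_risk` is an illustrative choice.

```python
def weighted_risk(fn: int, fp: int, lam: float) -> float:
    """Equation (8): Risk = lambda * FN + FP."""
    return lam * fn + fp


# Error counts reported for the held-out test set.
GLOBAL_TREE = {"fn": 51, "fp": 48}
PIPELINE = {"fn": 27, "fp": 30}

for lam in (5, 10, 20, 50):
    risk_g = weighted_risk(lam=lam, **GLOBAL_TREE)
    risk_cs = weighted_risk(lam=lam, **PIPELINE)
    reduction = 100 * (risk_g - risk_cs) / risk_g
    print(f"lambda={lam:>2}: G={risk_g}, CS={risk_cs}, reduction={reduction:.1f}%")
```

For $\lambda = 20$ and $\lambda = 50$ this reproduces the values quoted above (1068 versus 570, and 2598 versus 1380).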

These results confirm that the proposed validation framework improves not only aggregate metrics but also operationally meaningful cost-sensitive risk, reinforcing its suitability for real-world 5G intrusion detection environments.

3.1.3. Ablation Analysis: Source of Performance Gains

To identify the mechanisms responsible for the observed performance improvements, we conduct two controlled ablation studies. The first isolates the effect of conditional routing, while the second evaluates the impact of the decision revision rule, independently of routing and model composition.

Ablation A: Does conditional routing matter beyond flat voting?

Objective: This ablation examines whether the observed performance gains arise from conditional routing itself, rather than from the presence of multiple expert models.

Compared configurations: All configurations rely on the same CART models and training data; only the inference strategy differs:

G (Global tree): a single decision tree trained on the full dataset (1 tree per record).
FV (Flat voting): uniform aggregation of all models, where every record is evaluated by the global tree and all six experts (7 trees per record).
Proposed pipeline: a conditional counter-inspection strategy in which the global prediction is validated only by counter-class experts, and revised under unanimous dissent (4 trees per record on average, −42.86% routing cost).

Results and statistical significance: Table 6 reports paired McNemar tests comparing error rates between configurations on the test set (N = 243,178).

Key observations:

The proposed pipeline significantly improves over the global tree (p = 1.35 × 10⁻⁶) and over flat voting (p = 2.06 × 10⁻⁶).
Flat voting does not yield a statistically significant improvement over the global tree (p = 1.0), despite evaluating all experts.
Because flat voting and the proposed pipeline reuse exactly the same expert models, these results isolate conditional routing as the primary source of improvement, rather than expert multiplicity.
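For readers unfamiliar with the paired test used in Table 6, the following Python sketch shows an exact (binomial) McNemar test computed from discordant pairs. It is a minimal illustration under stated assumptions: the text does not specify whether the exact or the chi-square variant was used, the helper names are hypothetical, and the toy labels below are not drawn from 5G-NIDD.

```python
from math import comb


def discordant_counts(y_true, pred_a, pred_b):
    """Count records where exactly one of the two models is correct.

    Returns (b, c): b = A correct and B wrong, c = A wrong and B correct.
    """
    b = c = 0
    for y, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (pa == y), (pb == y)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    return b, c


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree in correctness
    k = min(b, c)
    # Binomial(n, 0.5) tail probability P(X <= k), doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy data (hypothetical, for illustration only).
y_true = [1, 1, 0, 0, 1, 0]
pred_a = [1, 0, 0, 1, 1, 0]
pred_b = [1, 1, 0, 0, 0, 0]
b, c = discordant_counts(y_true, pred_a, pred_b)
print(b, c, mcnemar_exact(b, c))
```

Only records on which the two configurations differ in correctness enter the test, which is why two configurations with nearly identical error sets (such as G and FV) can yield p = 1.0 even on a very large test set.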

Deployability implication: Flat voting incurs the full ensemble cost (7 trees per record) without measurable benefit, whereas the proposed pipeline achieves higher accuracy with fewer evaluations (4 trees/record), making conditional routing both more effective and more efficient.

Ablation B—Does the unanimous-dissent flip rule matter?

Objective: This ablation isolates the impact of the decision revision rule, independently of routing or model composition.

Variant definition (fixed routing, different dissent thresholds): All variants follow identical conditional routing: the global prediction is computed first, and only counter-class experts are evaluated. They differ only in the rule used to revise the decision:

Any-dissent rule: flip the global decision if at least one counter-class expert disagrees (highly permissive).
Majority-dissent rule: flip if at least two out of three counter-class experts disagree (moderately permissive).
Unanimous-dissent rule (proposed): flip only if all counter-class experts disagree (conservative).
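The three revision rules above differ only in the dissent threshold, which can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `should_flip` and the string rule identifiers are assumptions introduced here.

```python
from typing import Sequence


def should_flip(global_pred: int, expert_preds: Sequence[int],
                rule: str = "unanimous") -> bool:
    """Decide whether counter-class experts overturn the global prediction."""
    n = len(expert_preds)
    dissent = sum(1 for p in expert_preds if p != global_pred)
    if rule == "any":
        return dissent >= 1            # at least one expert disagrees
    if rule == "majority":
        return 2 * dissent > n         # with n = 3: at least two disagree
    if rule == "unanimous":
        return n > 0 and dissent == n  # every consulted expert disagrees
    raise ValueError(f"unknown rule: {rule!r}")


# Global tree predicts 0; two of the three counter-class experts predict 1.
votes = [1, 1, 0]
for rule in ("any", "majority", "unanimous"):
    print(rule, should_flip(0, votes, rule))
```

In this example the any-dissent and majority-dissent rules flip the decision, whereas the unanimous-dissent rule keeps the global prediction, which illustrates why the proposed rule is the most conservative of the three.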

This controlled setup eliminates confounding factors related to routing, training data, or model capacity.

Performance comparison: Table 7 summarizes accuracy and F1-score for each variant.

Key observations:

The unanimous-dissent rule achieves the strongest overall performance (Accuracy = 0.99977, F1-score = 0.99981).
The majority-dissent rule performs slightly worse but remains competitive (Accuracy = 0.99973, F1-score = 0.99978).
The any-dissent rule degrades performance below that of the global tree (Accuracy = 0.99954, F1-score = 0.99962), indicating excessive and erroneous decision reversals.

In addition to performance metrics, paired McNemar tests were conducted to statistically validate the effect of the decision revision rule under fixed routing and identical trained components. The unanimous-dissent rule significantly outperforms the any-dissent variant (p = 2.0 × 10⁻⁸), yielding marked reductions in both false-positive and false-negative rates, while the difference between unanimous and majority dissent is not statistically significant (p = 0.1698). Full statistical results are reported in Appendix B.

3.1.4. Efficiency and Footprint

To verify that the observed accuracy gains do not incur proportional computational overhead, we evaluate inference latency and model footprint. As reported in Table 8, the proposed pipeline operates in the microsecond inference regime (≈1–2 µs per record under CPU-only execution), while substantially reducing the number of model evaluations compared to flat voting. Inference latency was measured using best-of-n per-record timing on a single CPU environment and should be interpreted as indicative rather than absolute.
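A best-of-n per-record timing measurement of the kind described above can be sketched as follows. This is a hedged illustration only: `toy_predict` is a hypothetical stand-in for a tree traversal, the synthetic records are not 5G-NIDD data, and absolute numbers obtained this way depend on the interpreter and hardware.

```python
import time


def best_of_n_per_record_ns(predict, records, n_repeats=5):
    """Return the lowest mean per-record latency (ns) over n_repeats passes."""
    best = float("inf")
    for _ in range(n_repeats):
        start = time.perf_counter_ns()
        for record in records:
            predict(record)
        elapsed = time.perf_counter_ns() - start
        best = min(best, elapsed / len(records))
    return best


def toy_predict(record):
    # Hypothetical stand-in for a shallow decision path: two threshold tests.
    return 1 if record[0] > 0.5 and record[1] <= 64 else 0


# Synthetic two-feature records (illustrative only).
records = [((i % 10) / 10, i % 128) for i in range(10_000)]
latency_ns = best_of_n_per_record_ns(toy_predict, records)
print(f"best-of-5 latency: {latency_ns / 1000:.3f} µs/record")
```

Taking the minimum over repeated passes suppresses scheduling and cache noise, which is one reason such figures are better read as indicative lower bounds than as absolute latencies.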

Specifically, conditional counter-inspection evaluates, on average, four decision trees per record, whereas flat voting requires seven trees per record, corresponding to a 42.86% reduction in routing cost (measured as the number of tree traversals per record). This reduction is achieved without sacrificing detection performance, confirming that efficiency gains stem from selective model activation rather than architectural simplification.
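The routing-cost figure follows directly from the tree counts. The short Python sketch below reproduces it, assuming (as an interpretation of the description above) that all three counter-class experts are evaluated for every record, with no early exit; the function name is hypothetical.

```python
def trees_per_record(n_experts_per_class: int, conditional: bool) -> int:
    """Tree traversals per record for a global tree plus two expert groups."""
    if conditional:
        # Global tree + experts of the counter class only.
        return 1 + n_experts_per_class
    # Flat voting: global tree + experts of both classes.
    return 1 + 2 * n_experts_per_class


flat = trees_per_record(3, conditional=False)
cond = trees_per_record(3, conditional=True)
reduction = 100 * (flat - cond) / flat
print(f"flat={flat}, conditional={cond}, reduction={reduction:.2f}%")
```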

In addition, the complete serialized pipeline—including the global tree and all six expert models—remains compact, with a total size of approximately 201 kB. This footprint corresponds to Python pickle serialization of scikit-learn CART models and provides a practical estimate of deployable memory usage. The combination of CPU-only inference, compact model size, and the absence of complex pre-processing supports deployment in resource-constrained 5G environments, such as edge nodes or embedded monitoring systems.

3.1.5. Interpretability and Auditability of the Full Pipeline

Because all learners in the proposed architecture are homogeneous CART decision trees, interpretability is preserved at two complementary levels: (i) model-level transparency for the global tree and each class-biased expert, and (ii) pipeline-level auditability enabled by explicit conditional routing and the unanimous-dissent rule.

(i) Model-level specialization (Global vs. experts): To verify that the six class-biased experts are not redundant copies of the global tree, we report impurity-based feature importances for each model. Table 9 summarizes the top-8 ranked features for the global model (G) and the experts (EM1–EM3: malicious-biased experts; EB1–EB3: benign-biased experts). The global tree is dominated by Seq and sTtl, whereas experts exhibit distinct emphasis patterns. For instance, malicious-biased experts (EM*) remain strongly driven by Seq/sTtl, while benign-biased experts (EB*) elevate other discriminative signals (e.g., sMeanPktSz, SrcWin, and occasionally X). This diversity supports the intended effect of class bias: despite identical hyperparameters, experts exhibit data-induced specialization rather than acting as redundant replicas of the global tree.

For clarity, we focus on the top-ranked features, as lower-ranked attributes contribute marginally to the decision process. Overall, the pipeline’s discriminative power is primarily driven by Seq and sTtl across the global tree and malicious experts, while benign experts exhibit complementary reliance on size-related features (e.g., sMeanPktSz, SrcWin), supporting the intended specialization effect. The distribution of feature importance across the global tree and the class-biased experts is further visualized in Figure 2, which highlights both shared dominant predictors and specialization patterns induced by biased training exposure.

Quantitative Overlap Analysis

To complement the qualitative feature-importance inspection, we quantify structural similarity between the global tree and each expert using Top-k feature overlap and Jaccard similarity (k = 8).

Overlap values range from 62.5% to 100%, with Jaccard coefficients between 0.45 and 1.00. These results indicate that while experts remain partially aligned with the global model—preserving core discriminative signals—they also exhibit measurable structural divergence induced by controlled minority exposure.

This quantitative analysis confirms that specialization is neither arbitrary nor enforced through architectural heterogeneity; rather, it emerges naturally from curriculum-biased training while maintaining alignment with the global decision structure.
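The overlap and Jaccard values can be recomputed from the top-8 rankings listed in Table 9. The following Python sketch does so using only the feature names transcribed from that table; the dictionary and function names are assumptions introduced for illustration.

```python
# Top-8 features per model, transcribed from Table 9 (rank order preserved).
TOP8 = {
    "G":   ["Seq", "sTtl", "State_ECO", "sMeanPktSz",
            "SrcBytes", "dTtl", "pLoss", "DstWin"],
    "EB1": ["sMeanPktSz", "SrcWin", "X", "Seq",
            "SrcBytes", "DstWin", "sTtl", "dMeanPktSz"],
    "EB2": ["sTtl", "Seq", "State_ECO", "dTtl",
            "sMeanPktSz", "pLoss", "SrcBytes", "DstWin"],
    "EB3": ["sTtl", "Seq", "State_ECO", "sMeanPktSz",
            "dTtl", "pLoss", "DstWin", "X"],
    "EM1": ["Seq", "sTtl", "sMeanPktSz", "State_ECO",
            "TotBytes", "dTtl", "State_ACC", "X"],
    "EM2": ["Seq", "sTtl", "State_ECO", "sMeanPktSz",
            "SrcBytes", "dTtl", "X", "State_ACC"],
    "EM3": ["Seq", "sTtl", "State_ECO", "sMeanPktSz",
            "SrcBytes", "dTtl", "TotBytes", "State_ACC"],
}


def topk_overlap(expert: str, k: int = 8) -> float:
    """Share of the global tree's top-k features also in the expert's top-k."""
    g, e = set(TOP8["G"][:k]), set(TOP8[expert][:k])
    return len(g & e) / k


def jaccard(expert: str, k: int = 8) -> float:
    """Jaccard similarity between the two top-k feature sets."""
    g, e = set(TOP8["G"][:k]), set(TOP8[expert][:k])
    return len(g & e) / len(g | e)


for name in ("EB1", "EB2", "EB3", "EM1", "EM2", "EM3"):
    print(f"{name}: overlap={100 * topk_overlap(name):.1f}%  "
          f"jaccard={jaccard(name):.2f}")
```

With these rankings, EB1 and EM1 give the lowest similarity (5 of 8 shared features, Jaccard 5/11) and EB2 the highest (identical top-8 set), consistent with the reported ranges.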

Controlled Disagreement Analysis

To characterize the behavior of the counter-inspection mechanism, we analyze routing-aware disagreement between the global model and the experts. Unanimous counter-class dissent occurs in only 0.0296% of test instances, with disagreement entropy equal to 0.009 bits.

Although disagreement is rare, these events correspond precisely to the subset of instances where correction is necessary. The resulting selective reversals reduce both false positives and false negatives, as demonstrated in Section 3.1.1 and Section 3.1.2.

The low entropy indicates that routing behavior is highly deterministic rather than unstable. Thus, the expert layer does not introduce ensemble noise; instead, it performs sparse, high-precision corrections on ambiguous boundary cases.

(ii) Pipeline-level auditability (end-to-end routing trace): Beyond per-model transparency, our pipeline is auditable at inference time because routing is deterministic: the global tree predicts first, then only counter-class experts are consulted, and the decision flips only under unanimous dissent. Table 10 illustrates a representative test record where our pipeline corrects the global prediction. The trace explicitly records: (1) the global prediction, (2) which experts were triggered by conditional routing, (3) their predictions, and (4) whether unanimous dissent was activated. This provides a concrete, record-level explanation of why the final decision differs from the global model.
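The routing logic and the trace it produces can be sketched as follows. This is a minimal Python illustration under explicit assumptions: models are represented as plain callables rather than fitted trees, labels are encoded as 0 = benign and 1 = malicious (consistent with Table 10, but not stated there), and all names (`counter_inspect`, the stub models) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], int]


def counter_inspect(record: Sequence[float],
                    global_model: Model,
                    experts: Dict[int, List[Tuple[str, Model]]]):
    """Run conditional counter-inspection and return (final, flipped, trace).

    experts maps a class label to the (name, model) pairs biased toward it.
    """
    trace = []
    g = global_model(record)
    trace.append(("G", "G", g))

    counter_class = 1 - g  # binary setting: consult the opposite class only
    votes = []
    for name, expert in experts[counter_class]:
        pred = expert(record)
        votes.append(pred)
        trace.append(("Expert", name, pred))

    # Unanimous dissent: flip only if every consulted expert disagrees with G.
    flipped = len(votes) > 0 and all(v != g for v in votes)
    final = counter_class if flipped else g
    trace.append(("Decision", "Proposed pipeline", final))
    return final, flipped, trace


# Stub models reproducing the situation of Table 10 (illustrative only).
experts = {
    1: [("EM3", lambda r: 1), ("EM2", lambda r: 1), ("EM1", lambda r: 1)],
    0: [("EB1", lambda r: 0), ("EB2", lambda r: 0), ("EB3", lambda r: 0)],
}
final, flipped, trace = counter_inspect([0.0], lambda r: 0, experts)
for step, model, pred in trace:
    print(step, model, pred)
print("unanimous dissent:", flipped)
```

Because every step appends to `trace`, the record-level explanation is obtained as a by-product of inference rather than through a separate post hoc analysis.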

We emphasize that the reported feature importances are impurity-based and therefore descriptive rather than causal; however, when combined with explicit decision paths and the routing trace, they provide a practical and lightweight explanation mechanism suitable for IDS auditing in latency-constrained environments.

3.1.6. Robustness Across Data Splits

To assess the stability of the proposed pipeline under data re-partitioning, we evaluate the full architecture across five independent stratified 80/20 splits (random seeds: 11, 22, 33, 44, 55) while keeping the methodology unchanged, including the expert fractions (0.1/0.2/0.3). Results are reported for the global tree (G), flat voting (FV), and the proposed pipeline.

Error-rate stability (FPR/FNR): Figure 3a,b report the false positive rate (FPR) and false negative rate (FNR) across splits. The proposed pipeline yields consistently lower FNR (fewer missed attacks) and generally lower FPR (fewer false alarms), showing that the observed performance gains translate into operationally meaningful error reductions.
Compact stability summary: Figure 3c summarizes robustness using the mean ± standard deviation of F1 across splits. The proposed pipeline achieves the highest mean F1 while maintaining low variance, supporting stable gains under data re-partitioning.
F1-score stability: Figure 3d reports the F1-score across the five splits. The proposed pipeline consistently achieves the highest F1-score, indicating that the improvement is reproducible and not driven by a particular partition.
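The re-partitioning protocol can be illustrated with a small stratified splitter driven by the five seeds listed above. The Python sketch below uses only the standard library and toy labels; it is an assumption-laden illustration (the splitting routine actually used is not specified here), and the function name is hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) preserving per-class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


# Toy labels (not 5G-NIDD): 400 benign (0) and 600 malicious (1) records.
labels = [0] * 400 + [1] * 600
for seed in (11, 22, 33, 44, 55):
    train_idx, test_idx = stratified_split(labels, 0.2, seed)
    n_mal = sum(labels[i] for i in test_idx)
    print(f"seed={seed}: test size={len(test_idx)}, malicious={n_mal}")
```

Per-split metrics computed on each test partition can then be aggregated, for example with `statistics.mean` and `statistics.stdev`, to obtain the mean ± standard deviation summary shown in Figure 3.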

Robustness Under Attack Prevalence Shift

To evaluate stability under varying class distributions, we simulate different malicious prevalence levels ranging from 50% down to 0.1%, while preserving the same trained models. Results show that the proposed pipeline consistently maintains strong recall (up to 1.000 at extreme imbalance) while reducing false positives relative to the global tree.

At 0.1% prevalence (0.001), the pipeline achieves perfect recall (FNR = 0) and improves precision from 0.6667 (G) to 0.7619, demonstrating better discrimination under extreme rarity. Across all tested prevalence levels, the proposed pipeline (CS) preserves or improves F1-score and maintains a lower FPR (0.000314 vs. 0.000502 for G).

These results confirm that the correction mechanism remains stable under severe class imbalance, a critical requirement in operational IDS environments where attack traffic is typically rare.
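The effect of prevalence on precision can be made concrete with a short calculation. The Python sketch below assumes that the prevalence shift is simulated by keeping all benign test records and subsampling the malicious class; this is an assumption about the protocol, although it happens to reproduce the precision values quoted above for 0.1% prevalence. The benign count and false-positive counts come from Table 4, and the function names are hypothetical.

```python
N_BENIGN = 95_547            # TN + FP in Table 4
FP = {"G": 48, "CS": 30}     # false positives from Table 4


def malicious_count(prevalence: float, n_benign: int = N_BENIGN) -> int:
    """Malicious records needed so that they form `prevalence` of the set."""
    return round(prevalence * n_benign / (1 - prevalence))


def precision_at(prevalence: float, fp: int, recall: float = 1.0) -> float:
    """Precision when benign records (and hence FP) are held fixed."""
    tp = recall * malicious_count(prevalence)
    return tp / (tp + fp)


for model in ("G", "CS"):
    p = precision_at(0.001, FP[model])
    print(f"{model}: precision at 0.1% prevalence = {p:.4f}")
```

Since the false-positive count is fixed by the benign traffic, precision degrades as attacks become rarer, and the model with fewer false alarms retains the higher precision.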

3.2. Benchmarking on 5G-NIDD: Performance and Deployability

3.2.1. Classical Benchmarking Against ML Baselines (Closest Experimental Alignment)

We first position the proposed pipeline against classical machine-learning baselines reported under a closely aligned supervised, tabular IDS setting on 5G-NIDD. To reflect deployment practicality, we additionally report serialized model size when it is explicitly disclosed in the source study [15].

Table 11 summarizes representative ML baselines and ensembles together with their reported model sizes (when available), and includes the proposed pipeline as a reference point.

All reported results correspond to binary benign/malicious classification on 5G-NIDD, as stated in the original study [15]; differences in preprocessing or optimization strategies may exist, and results are included to contextualize accuracy–footprint trade-offs under the same dataset family, rather than to claim strict experimental equivalence.

Beyond near-ceiling predictive quality, the proposed pipeline avoids the typical footprint cost associated with ensemble methods. Performance gains are achieved through structured specialization and conditional routing rather than always-on aggregation. Based on the serialized sizes reported in Table 11, the proposed pipeline (≈201 kB) is approximately 187× smaller than LightGBM, 229× smaller than Random Forest, and over 2100× smaller than KNN, while operating in the same high-accuracy regime.

3.2.2. Deployability-Oriented Benchmarking Against Deep Learning Systems (Cost at Similar Reported Accuracy)

Deep learning IDS studies are often not strictly like-for-like with tabular ML pipelines because they frequently change representation, optimize different objectives (binary vs. multi-class or attack-specific detection), and make stronger hardware assumptions. Rather than forcing strict experimental equivalence, we adopt a reported-value, deployment-driven comparison that asks a narrower question: among studies reporting strong detection performance and explicitly disclosing inference time, what is the reported inference-cost regime?

In 5G monitoring pipelines, inference latency directly constrains whether detection can be performed inline, near-inline, or only offline; therefore, order-of-magnitude differences between microsecond- and millisecond-scale inference are operationally decisive, even at similar accuracy levels.

Accordingly, Table 12 lists recent deep IDS models that report both performance and inference time, alongside the proposed pipeline. Figure 4 visualizes the latency regime gap (µs vs. ms) using only disclosed values.

These visuals do not claim strict parity across methodologies; instead, they show that the proposed pipeline operates in the microsecond-per-record regime while maintaining near-ceiling detection quality, whereas the reported deep models lie in the millisecond regime under their respective setups. This cost-centric view directly supports suitability for low-latency, high-throughput 5G monitoring.

Using only reported values, the proposed pipeline sits in the microsecond inference regime (≈1–2 µs per record), while the cited deep models that disclose inference time typically fall in the 0.06–2.78 ms range per record, i.e., between roughly one and three orders of magnitude slower, even when their predictive quality is strong.

This supports the claim that our architecture delivers high accuracy at a deployability-friendly latency scale suitable for low-latency, high-throughput 5G monitoring, without needing representational transformations or deep inference stacks.

Together, these two benchmarking tiers show that the proposed pipeline attains state-of-the-art detection quality within a uniquely lightweight and low-latency operating regime, motivating the design analysis presented next.

3.3. Cross-Dataset Validation on UNSW-NB15

To assess the robustness of the proposed conditional counter-inspection architecture beyond 5G-NIDD, we conducted additional experiments on the UNSW-NB15 benchmark using its official train/test split.

UNSW-NB15 presents a substantially more heterogeneous traffic distribution than 5G-NIDD, with broader feature variability and a more challenging class structure. On this dataset, the standalone global decision tree (G) achieved an accuracy of 89.18%, with a false positive rate (FPR) of 3.70% and a false negative rate (FNR) of 14.16%.

When the conditional counter-inspection layer was applied, the overall accuracy improved to 89.62%. More importantly, the mechanism produced simultaneous reductions in both major error types:

False positives decreased from 2074 to 1515 (−559 instances, corresponding to a 26.9% relative reduction).
False negatives decreased from 16,904 to 16,688 (−216 instances).
FPR decreased from 3.70% to 2.71%.
FNR decreased from 14.16% to 13.98%.
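The relative reductions implied by the counts above can be verified with a few lines of Python. The sketch uses only the figures quoted in this section; the helper name is an assumption.

```python
def relative_reduction(before: int, after: int) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


fp_before, fp_after = 2074, 1515      # false positives (G vs. CS)
fn_before, fn_after = 16_904, 16_688  # false negatives (G vs. CS)

fp_drop = 100 * relative_reduction(fp_before, fp_after)
fn_drop = 100 * relative_reduction(fn_before, fn_after)
print(f"FP: -{fp_before - fp_after} instances ({fp_drop:.2f}%)")
print(f"FN: -{fn_before - fn_after} instances ({fn_drop:.2f}%)")
```

The calculation makes the asymmetry explicit: on UNSW-NB15 the false-positive reduction is large in relative terms, whereas the false-negative reduction is comparatively modest.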

The reduction in false positives is particularly significant for operational intrusion detection systems, where excessive alerts can overwhelm analysts and degrade response efficiency. In large-scale 5G deployments processing millions of flows daily, a relative reduction of approximately 27% in false positives translates into a substantial decrease in unnecessary investigations. Simultaneously, reducing false negatives enhances protection against undetected malicious activity.

These findings confirm that the proposed conditional validation mechanism remains effective under heterogeneous traffic conditions and does not depend on dataset-specific separability properties. While this study focused on interpretable decision-tree models to preserve simplicity and bounded inference cost, further gains may be achievable by exploring alternative lightweight base classifiers within the same counter-inspection framework in future work.

4. Discussion

This study addresses a recurring limitation in learning-based intrusion detection for 5G networks: the structural trade-off between detection accuracy and operational efficiency. While prior work has predominantly pursued higher accuracy through increased model complexity—via deeper architectures or larger ensembles—the results presented here demonstrate that substantial performance gains can instead be achieved through architectural selectivity.

The proposed conditional counter-inspection architecture builds on the observation that misclassifications in intrusion detection are often class-specific and concentrated near ambiguous decision regions. Rather than relying on a single global model or uniformly aggregating multiple experts to resolve this ambiguity, the proposed approach introduces selective validation, whereby only experts relevant to the initial prediction are consulted. This design departs from conventional ensemble strategies that evaluate all models for every instance and therefore incur unnecessary computational cost.

Experimental results on the 5G-NIDD dataset confirm that this selective validation strategy yields meaningful improvements. Compared to a strong baseline decision tree trained on the full dataset, the proposed architecture simultaneously reduces both false positives and false negatives—a result that is rarely achieved through simple post-processing or threshold tuning. Importantly, ablation experiments show that these gains do not arise from the mere presence of additional experts. Flat voting over the same expert models fails to improve performance, whereas conditional counter-inspection combined with a unanimous-dissent decision rule yields statistically significant error reduction. This indicates that decision logic and selective activation, rather than ensemble size, are the primary drivers of improvement.

From a deployment perspective, the proposed architecture preserves the advantages traditionally associated with classical machine learning models. Inference remains in the microsecond regime per record, model size remains on the order of hundreds of kilobytes, and training time is measured in seconds rather than minutes or hours. These properties contrast with many deep learning–based intrusion detection pipelines, which often require complex preprocessing, GPU acceleration, and substantially larger memory footprints. While such models may offer advantages for fine-grained or multi-class analysis, their operational cost can limit applicability in latency-sensitive or resource-constrained 5G environments.

Interpretability is another important strength of the proposed approach. Because all components are decision trees, each prediction can be traced through explicit decision paths, both at the initial classification stage and within the counter-inspection layer. This transparency is particularly valuable in security-critical settings, where auditability and explainability are often as important as raw detection performance.

Finally, although the present study focuses on binary benign/malicious detection using tabular flow-level records, the architectural principles introduced here are not inherently limited to this setting. Conditional routing, class-biased specialization, and unanimous-dissent decision rules could be extended to multi-class detection, alternative data representations, or hybrid pipelines combining lightweight models at different stages. These directions represent promising avenues for future research.

5. Conclusions

This work introduced a conditional counter-inspection architecture for lightweight intrusion detection in 5G networks. Starting from a single decision-tree classifier trained on the full dataset, the proposed approach augments initial predictions with a selectively activated layer of class-biased experts. By consulting only counter-class experts and revising decisions under a unanimous-dissent rule, the architecture corrects residual classification errors without incurring the cost of uniform ensemble evaluation.

Extensive experiments on the 5G-NIDD dataset demonstrate that the proposed architecture achieves near-ceiling detection performance while simultaneously reducing both false positives and false negatives relative to a strong baseline and to flat expert aggregation. These gains are obtained with microsecond-level inference latency, compact model size, and full interpretability, making the approach particularly suitable for practical and resource-constrained 5G deployment scenarios.

Overall, the results suggest that structured decision validation offers a compelling alternative to increasing model complexity, enabling intrusion detection systems that are both accurate and operationally efficient.

Author Contributions

Conceptualization, K.T.; Data curation, K.T.; Formal analysis, K.T., I.F.E.F. and M.M.; Funding acquisition, K.T.; Investigation, K.T.; Methodology, K.T.; Project administration, K.T.; Resources, K.T.; Software, K.T.; Supervision, I.F.E.F. and M.M.; Validation, K.T., I.F.E.F. and M.M.; Writing—original draft, K.T.; Writing—review and editing, K.T., I.F.E.F. and M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The experiments in this study were conducted using the publicly available 5G-NIDD dataset. The implementation code is available from the corresponding author upon reasonable request.

Acknowledgments

The authors acknowledge the developers and maintainers of the 5G-NIDD dataset for making this resource publicly available.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

IDS	Intrusion Detection System
CART	Classification and Regression Tree
DL	Deep Learning
ML	Machine Learning
FPR	False Positive Rate
FNR	False Negative Rate

Appendix A

This appendix reports the standalone performance of class-biased expert models as a function of minority-class exposure during training. The goal is to quantify how increasing exposure fractions affect discrimination quality and to identify saturation and bias-erosion effects that inform the selection of the expert curriculum used in the main pipeline. All results are obtained on a training-internal validation split and are reported independently of the routing and decision logic used at inference time.

Interpretation and design implications

For malicious-biased experts (Table A1), increasing minority exposure from 0.10 to 0.20 yields a substantial improvement, primarily driven by a large reduction in false-positive rate. Subsequent increases provide only marginal gains, with performance largely saturating beyond fraction 0.30. This indicates diminishing returns, where additional exposure improves robustness only slightly while increasing model redundancy.

For benign-biased experts (Table A2), the optimal trade-off is reached earlier. Fraction 0.20 achieves the highest F1-score with a false-positive rate (0.000131) second only to that of fraction 0.10 (0.000065). Further increases lead to bias erosion: false-positive rates rise and overall F1 stagnates or degrades, despite minor recall fluctuations. This behavior reflects increased contamination from the malicious class, which weakens specialization.

Although marginal improvements persist beyond fraction 0.30 in some metrics, these gains are inconsistent and come at the cost of additional models and higher inference overhead. Taken together, these results confirm that the first three curriculum steps (0.10, 0.20, 0.30) capture the majority of specialization benefits, while extending the curriculum further yields limited or adverse effects. This supports the selection of three experts per class as a principled and cost-sensitive design choice.

Table A1. Malicious-biased experts—performance vs. minority exposure (validation on training-internal split).

Fraction	F1	FNR	FPR	Recall
0.10	0.999560	0.000093	0.001217	0.999907
0.20	0.999678	0.000110	0.000824	0.999890
0.30	0.999708	0.000152	0.000667	0.999848
0.40	0.999729	0.000152	0.000602	0.999848

Table A2. Benign-biased experts—performance vs. minority exposure (validation on training-internal split).

Fraction	F1	FNR	FPR	Recall
0.10	0.999737	0.000483	0.000065	0.999517
0.20	0.999780	0.000356	0.000131	0.999644
0.30	0.999750	0.000381	0.000183	0.999619
0.40	0.999750	0.000322	0.000275	0.999678

Appendix B

Table A3. Paired McNemar tests isolating the effect of the decision revision rule under fixed routing and fixed trained pipeline components.

Comparison	p-Value	ΔError	ΔFPR	ΔFNR
Pipeline (any-dissent) → Unanimous	2.0 × 10⁻⁸	−0.000226	−0.000408	−0.000108
Pipeline (majority-dissent) → Unanimous	0.1698	−0.000033	−0.000084	0.000000

References

Hamroun, C.; Fladenmuller, A.; Pariente, M.; Pujolle, G. Intrusion Detection in 5G and Wi-Fi Networks: A Survey of Current Methods, Challenges, and Perspectives. IEEE Access 2025, 13, 40950–40976.
Kasongo, S.M.; Sun, Y. A Deep Learning Method with Wrapper-Based Feature Extraction for Wireless Intrusion Detection Systems. Comput. Secur. 2020, 92, 101752.
Sood, K.; Nosouhi, M.R.; Nguyen, D.D.N.; Jiang, F.; Chowdhury, M.; Doss, R. Intrusion Detection Scheme with Dimensionality Reduction in Next Generation Networks. IEEE Trans. Inf. Forensics Secur. 2023, 18, 965–979.
Bouke, M.A.; Abdullah, A. An Empirical Assessment of Machine Learning Models for 5G Network Intrusion Detection: A Data Leakage-Free Approach. e-Prime—Adv. Electr. Eng. Electron. Eng. 2024, 8, 100590.
Turukmane, A.V.; Devendiran, R. M-MultiSVM: An Efficient Feature Selection Assisted Network Intrusion Detection System Using Machine Learning. Comput. Secur. 2023, 137, 103587.
Djaidja, T.E.T.; Brik, B.; Senouci, S.M.; Boualouache, A.; Ghamri-Doudane, Y. Early Network Intrusion Detection Enabled by Attention Mechanisms and Recurrent Neural Networks. IEEE Trans. Inf. Forensics Secur. 2024, 19, 7783–7793.
Farzaneh, B.; Shahriar, N.; Al Muktadir, A.H.; Towhid, M.S.; Khosravani, M.S. DTL-5G: Deep Transfer Learning-Based DDoS Attack Detection in 5G and Beyond Networks. Comput. Commun. 2024, 228, 107927.
Ilias, L.; Palmos, S.; Doukas, G.; Blika, A.; Kiokes, G.; Ntanos, C.; Askounis, D. Convolutional Autoencoders Coupled with Hypernetworks for Recognizing Attacks in 5G Networks and Beyond. IEEE Open J. Commun. Soc. 2025, 6, 7885–7898.
Lazzarini, R.; Tianfield, H.; Charissis, V. A Stacking Ensemble of Deep Learning Models for IoT Intrusion Detection. Knowl.-Based Syst. 2023, 279, 110941.
Mahmoud, L.; Liyanage, M.; Singla, J.; Gangopadhyay, S. DSEM-NIDS: Enhanced Network Intrusion Detection System Using a Deep Stacking Ensemble Model. IEEE Open J. Commun. Soc. 2025, 6, 955–967.
Xylouris, G.; Vekraki, A.; Christopoulou, M.; Kourtis, M.A.; Markakis, E.K.; Trakadas, P. Advancing Predictive Security for Consumer Applications in Beyond 5G/6G Networks with Annotated Datasets. IEEE Trans. Consum. Electron. 2025, 71, 5108–5118.
Samarakoon, S.; Siriwardhana, Y.; Porambage, P.; Liyanage, M.; Chang, S.-Y.; Kim, J.; Kim, J.; Ylianttila, M. 5G-NIDD: A Comprehensive Network Intrusion Detection Dataset Generated over 5G Wireless Network; IEEE Dataport: Piscataway, NJ, USA, 2022.
Ilias, L.; Doukas, G.; Lamprou, V.; Ntanos, C.; Askounis, D. Convolutional Neural Networks and Mixture of Experts for Intrusion Detection in 5G Networks and beyond. arXiv 2024.
Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Military Communications and Information Systems Conference (MilCIS); IEEE: New York, NY, USA, 2015; pp. 1–6.
Kim, H.; Lee, J.; Park, J.-G. SITRAN: Self-Supervised IDS With Transferable Techniques for 5G Industrial Environments. IEEE Internet Things J. 2024, 11, 35465–35476.
Hadi, H.J.; Cao, Y.; Li, S.; Xu, L.; Hu, Y.; Li, M. Real-time fusion multi-tier DNN-based collaborative IDPS with complementary features for secure UAV-enabled 6G networks. Expert Syst. Appl. 2024, 252, 124215.
Harshdeep, K.; Sumalatha, K.; Mathur, R. DeepTransIDS: Transformer-Based Deep learning Model for Detecting DDoS Attacks on 5G NIDD. Results Eng. 2025, 26, 104826.

Figure 1. End-to-end inference workflow of the proposed conditional counter-inspection architecture.

Figure 2. Feature-importance distribution across the global tree and experts.

Figure 3. Robustness of the proposed pipeline across five independent stratified data splits. (a) False positive rate (FPR), (b) false negative rate (FNR), (c) mean ± standard deviation of F1-score across splits, and (d) F1-score per split.

Figure 4. Deployability benchmarking based on reported inference latency.

Table 1. Training datasets and class exposure ratios for class-biased expert trees.

Expert	Training Dataset	Benign Exposure	Malicious Exposure
$E_{M, 0.1}$	$D_{M}^{0.1}$	10%	100%
$E_{M, 0.2}$	$D_{M}^{0.2}$	20%	100%
$E_{M, 0.3}$	$D_{M}^{0.3}$	30%	100%
$E_{B, 0.1}$	$D_{B}^{0.1}$	100%	10%
$E_{B, 0.2}$	$D_{B}^{0.2}$	100%	20%
$E_{B, 0.3}$	$D_{B}^{0.3}$	100%	30%
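The exposure ratios in Table 1 can be read as a subsampling recipe: each expert sees every record of the class it is biased toward and a fixed fraction of the other class. The Python sketch below illustrates this reading on toy labels. Uniform random subsampling, the label encoding (0 = benign, 1 = malicious), and the function name are assumptions; Table 1 specifies only the ratios.

```python
import random


def biased_subset(labels, biased_class, fraction, seed=0):
    """Indices for an expert biased toward `biased_class`.

    Keeps all records of `biased_class` and a random `fraction` of the rest.
    """
    rng = random.Random(seed)
    keep = [i for i, y in enumerate(labels) if y == biased_class]
    other = [i for i, y in enumerate(labels) if y != biased_class]
    n_other = round(fraction * len(other))
    keep.extend(rng.sample(other, n_other))
    return sorted(keep)


# Toy labels (not 5G-NIDD): 1000 benign (0) and 1000 malicious (1) records.
labels = [0] * 1000 + [1] * 1000
for fraction in (0.1, 0.2, 0.3):
    idx = biased_subset(labels, biased_class=1, fraction=fraction, seed=42)
    n_benign = sum(1 for i in idx if labels[i] == 0)
    n_malicious = len(idx) - n_benign
    print(f"E_M,{fraction}: benign={n_benign}, malicious={n_malicious}")
```

Swapping `biased_class` to 0 yields the benign-biased counterparts, so the six training sets of Table 1 follow from one routine and three fractions.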

Table 2. Baseline performance of the global decision tree on the 5G-NIDD test set.

Model	Accuracy	Precision	Recall	F1-Score
Global Tree (G)	0.99959	0.99967	0.99965	0.99966

Table 3. Test-set performance comparison.

Model	Accuracy	Precision	Recall	F1-Score
Global Tree (G)	0.99959	0.99967	0.99965	0.99966
Proposed pipeline	0.99977	0.99980	0.99982	0.99981

Table 4. Confusion-matrix comparison (absolute counts).

Model	TN	FP	FN	TP	FPR	FNR
Global Tree (G)	95,499	48	51	147,580	0.000502	0.000345
Proposed pipeline	95,517	30	27	147,604	0.000314	0.000183

Table 5. Cost-sensitive evaluation of G vs. CS.

Model	Lambda	FN	FP	Risk
Global tree (G)	5	51	48	303
Proposed pipeline (CS)	5	27	30	165
Global tree (G)	10	51	48	558
Proposed pipeline (CS)	10	27	30	300
Global tree (G)	20	51	48	1068
Proposed pipeline (CS)	20	27	30	570
Global tree (G)	50	51	48	2598
Proposed pipeline (CS)	50	27	30	1380

Table 6. McNemar paired tests with error deltas on the test set (N = 243,178).

Comparison (A → B)	p-Value	ΔError (B − A)	ΔFPR (B − A)	ΔFNR (B − A)
G → Proposed pipeline	1.35 × 10⁻⁶	−0.000173	−0.000188	−0.000163
FV → Proposed pipeline	2.06 × 10⁻⁶	−0.000169	−0.000178	−0.000163
G → FV	1.0	−0.000004	−0.000010	0.000000
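
For readers unfamiliar with the test in Table 6, the following is a minimal sketch of a two-sided exact (binomial) McNemar test. McNemar's test depends only on the discordant pairs, i.e., records that exactly one of the two models classifies correctly. Which variant the authors used (exact or chi-squared) is not stated in this section, and the discordant counts are not reported here, so the inputs below are purely illustrative.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: records only model A classifies correctly
    c: records only model B classifies correctly
    Under the null hypothesis, min(b, c) ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Illustrative inputs only (not the paper's discordant counts).
print(mcnemar_exact(0, 1))   # one discordant record
print(mcnemar_exact(0, 5))   # five discordant records, all favoring model B
```

A single discordant record cannot yield a significant result under this test, which is at least compatible with the p-value of 1.0 in the G → FV row, where the error rates differ by roughly one record out of 243,178.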

Table 7. Ablation performance on the test set (N = 243,178).

Method	Accuracy	F1-Score
Global tree (G)	0.99959	0.99966
Flat voting (FV)	0.99960	0.99967
Proposed pipeline (unanimous dissent)	0.99977	0.99981
Proposed pipeline (any-dissent variant)	0.99954	0.99962
Proposed pipeline (majority-dissent variant)	0.99973	0.99978
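
The three dissent variants compared in Table 7 differ only in how many activated experts must contradict the initial prediction before it is revised. The sketch below is one plausible reading of those rules for binary labels; the function name `revise` and the exact thresholds (in particular, strict majority) are assumptions.

```python
def revise(initial, expert_preds, rule="unanimous"):
    """Return the final binary label given the initial prediction of G and
    the predictions of the activated experts (hypothetical helper)."""
    n = len(expert_preds)
    dissent = sum(p != initial for p in expert_preds)
    if rule == "unanimous":
        flip = n > 0 and dissent == n      # every expert must disagree
    elif rule == "majority":
        flip = dissent > n / 2             # strict majority disagrees
    elif rule == "any":
        flip = dissent >= 1                # a single dissenter suffices
    else:
        raise ValueError(f"unknown rule: {rule}")
    return 1 - initial if flip else initial

# G predicts 0; two of three experts predict 1.
preds = [1, 1, 0]
for rule in ("unanimous", "majority", "any"):
    print(rule, "->", revise(0, preds, rule))
```

Under this reading, the unanimous rule is the most conservative of the three, which matches the ordering in Table 7, where the any-dissent variant falls below the standalone global tree.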

Table 8. Efficiency and footprint of the proposed pipeline.

Metric	Value
Inference latency (Proposed pipeline)	1–2 µs/record
Routing reduction vs. flat voting	−42.86%
Serialized model size (G + experts)	201 kB
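
The routing reduction in Table 8 is consistent with a simple model count, assuming that flat voting evaluates all seven trees (G plus the six experts of Table 1), whereas the proposed pipeline evaluates G plus the three opposite-class experts for every record:

$$\frac{(1 + 3) - (1 + 6)}{1 + 6} = -\frac{3}{7} \approx -42.86\%$$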

Table 9. Feature importance rankings for the global decision tree and class-biased expert models (EB1–EB3, EM1–EM3).

Rank	EB1	EB2	EB3	EM1	EM2	EM3	G
1	sMeanPktSz (0.335)	sTtl (0.533)	sTtl (0.516)	Seq (0.574)	Seq (0.561)	Seq (0.548)	Seq (0.472)
2	SrcWin (0.247)	Seq (0.259)	Seq (0.319)	sTtl (0.305)	sTtl (0.338)	sTtl (0.358)	sTtl (0.427)
3	X (0.199)	State_ECO (0.182)	State_ECO (0.143)	sMeanPktSz (0.054)	State_ECO (0.050)	State_ECO (0.055)	State_ECO (0.079)
4	Seq (0.130)	dTtl (0.009)	sMeanPktSz (0.010)	State_ECO (0.040)	sMeanPktSz (0.031)	sMeanPktSz (0.022)	sMeanPktSz (0.011)
5	SrcBytes (0.053)	sMeanPktSz (0.007)	dTtl (0.007)	TotBytes (0.021)	SrcBytes (0.013)	SrcBytes (0.009)	SrcBytes (0.004)
6	DstWin (0.025)	pLoss (0.005)	pLoss (0.003)	dTtl (0.002)	dTtl (0.002)	dTtl (0.003)	dTtl (0.003)
7	sTtl (0.007)	SrcBytes (0.003)	DstWin (0.002)	State_ACC (0.001)	X (0.001)	TotBytes (0.001)	pLoss (0.002)
8	dMeanPktSz (0.002)	DstWin (0.001)	X (0.000)	X (0.001)	State_ACC (0.001)	State_ACC (0.001)	DstWin (0.001)

Table 10. Example routing trace for one test record (G → counter-inspection → Unanimous dissent).

#	Step	Model	Pred	Unanimous Dissent
0	G	G	0	—
1	Expert	EM3	1	—
2	Expert	EM2	1	—
3	Expert	EM1	1	—
4	Decision	Proposed pipeline	1	True

[Trace] record_idx = 10,406 (Proposed pipeline corrects G).
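
The trace in Table 10 can be reproduced with a minimal sketch of the routing logic. It assumes that labels are encoded as 0 = benign and 1 = malicious (consistent with a prediction of 0 by G activating the EM experts), that all opposite-class experts are always evaluated, and that models are plain callables. The stand-in lambdas and the `experts_by_class` structure are hypothetical and are not the authors' implementation.

```python
def counter_inspect(record, global_tree, experts_by_class):
    """Return (final_prediction, unanimous_dissent, trace) for one record.

    global_tree: callable mapping a record to 0 or 1
    experts_by_class: {0: [(name, model), ...], 1: [...]}, keyed by the
    class each expert is biased toward (hypothetical structure).
    """
    initial = global_tree(record)
    opposite = 1 - initial
    trace = [("G", "G", initial)]
    votes = []
    # Only experts biased toward the class opposite to G's prediction run.
    for name, expert in experts_by_class[opposite]:
        pred = expert(record)
        votes.append(pred)
        trace.append(("Expert", name, pred))
    # Revise only if every activated expert contradicts G.
    unanimous = bool(votes) and all(v == opposite for v in votes)
    final = opposite if unanimous else initial
    trace.append(("Decision", "Proposed pipeline", final))
    return final, unanimous, trace

# Stand-in models that mimic the predictions shown in Table 10.
experts = {
    1: [("EM3", lambda r: 1), ("EM2", lambda r: 1), ("EM1", lambda r: 1)],
    0: [("EB3", lambda r: 0), ("EB2", lambda r: 0), ("EB1", lambda r: 0)],
}
final, unanimous, trace = counter_inspect({}, lambda r: 0, experts)
for step, model, pred in trace:
    print(step, model, pred)
print("unanimous dissent:", unanimous)
```

Because every step is a deterministic tree evaluation, the same trace can be logged for any record to audit why a decision was or was not revised.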

Table 11. Classical ML benchmarking on 5G-NIDD (tabular supervised IDS), with size when reported.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	Serialized Model Size
LightGBM	99.98	99.98	99.99	99.98	37.6 MB
Random Forest	99.97	99.97	99.98	99.97	46.1 MB
SVM (Linear)	99.50	99.43	99.74	99.59	–
Logistic Regression	99.74	99.73	99.84	99.79	–
KNN	99.96	99.97	99.97	99.96	432 MB
SITRAN	99.98	99.88	99.99	99.93	4.6 MB
Proposed pipeline (ours)	99.98	99.98	99.98	99.98	≈0.20 MB (≈201 kB)

Note: “–” indicates that the serialized model size was not reported in the original study.

Table 12. Deployability-oriented comparison at strong performance (reported values).

Study/Model	Representation	Task	Reported Performance (5G-NIDD)	Reported Inference Time
Proposed pipeline (ours)	tabular, record-level	binary	99.98% acc/99.98% F1	≈1–2 µs/record
Fusion Multi-Tier DNN (ESWA 2024) [16]	multi-tier DNN + fusion	binary	99.15% detection accuracy	~118 ms/inference (Jetson TX2)
DeepTransIDS (Transformer) [17]	Transformer-based deep learning model	binary	98.8% accuracy (binary)	2.78 ms/record
DTL-5G (BiLSTM TL4, Obj2) [7]	transfer learning DL	binary DDoS detection	94.46% acc/94.40% F1	0.15 ms (CPU)/0.08 ms (GPU)

Reported inference times correspond to the experimental setups described in the original studies.

