1. Introduction
With the rapid growth of connected systems and AI-enabled services, cyber attacks have become an increasingly persistent threat to individuals and organizations. AI-driven intrusion detection has therefore attracted substantial attention, since learning-based models can improve attack prediction and reduce potential damage when compared to purely manual or rule-based approaches [1]. At the same time, security operators and stakeholders often require transparent decision rationales to build trust in black-box AI systems, motivating the integration of explainable AI (XAI) into cybersecurity pipelines [2].
To contextualize this trend, our review of publications indexed on ScienceDirect indicates a sustained increase in research output at the intersection of cybersecurity, explainable AI, and IoT security.
Figure 1 summarizes the publication growth from 2002 to 2025 for cybersecurity in general, as well as the smaller (but increasing) body of work that jointly considers cybersecurity with XAI and IoT.
Intrusion detection systems (IDSs) are widely used to monitor network behavior and raise alarms upon detecting abnormal or malicious activity. IDS methods are commonly categorized into signature-based and anomaly-based systems [3]. Signature-based IDS compare traffic against known attack patterns (signatures), offering effective detection of known threats but requiring frequent updates and typically failing against unseen attacks. In contrast, anomaly-based IDS learn profiles of normal behavior and can therefore detect novel or evolving attacks, although they may suffer from higher false positives under non-stationary traffic conditions [4]. Because IoT and wireless environments are exposed to diverse threats, such as denial-of-service (DoS) and distributed denial-of-service (DDoS) attacks and malware variants, hybrid IDS designs that combine signature-based filtering with anomaly-based learning are often preferable [5,6].
Recent work highlights the importance of robust data collection, feature engineering, and realistic evaluation. For example, high-speed packet capture considerations have been studied in software-based capture architectures [7,8,9], while deployment considerations such as IDS sensor placement have also been investigated [10]. Dataset choice remains critical: classic benchmarks such as KDD Cup’99 and NSL-KDD are historically important but may not represent real-world traffic characteristics, motivating careful selection of modern datasets and leakage-aware protocols [11].
Deep learning and hybrid learning pipelines have shown strong promise in improving detection performance by learning discriminative patterns directly from traffic features or sequences. Convolutional Neural Network (CNN)-based intrusion detection has been widely explored due to its ability to automatically extract informative patterns and reduce reliance on manual feature design [12]. Several recent studies report strong performance using deep models, feature engineering, or hybrid frameworks, while also noting practical challenges such as class imbalance, computational cost, and generalization [13,14,15,16]. For example, CNN-based approaches have achieved high multi-class and binary detection accuracy on CIC IoT-DIAD 2024 [17], while edge-oriented anomaly detection has been explored using autoencoders with transfer learning [18]. Hybrid CNN feature extraction with XGBoost classification, combined with SHAP explanations, has also demonstrated very high accuracy for device identification and attack detection [19]. Other directions include SDN-IoT anomaly detection using DNN-integrated controllers [20], as well as tabular deep learning (TabNet) for intrusion detection and generalization analysis [21]. Additionally, efficient processing techniques (e.g., adaptive quantization) have been investigated to improve deployment practicality without compromising detection accuracy [22].
Beyond accuracy, explainability is increasingly viewed as a deployment requirement in security contexts. XAI can support analyst trust, facilitate triage, and assist with model auditing by clarifying which features contribute most to predictions [23]. Common explanation techniques include LIME and SHAP [24], and feature-importance behavior has been compared across linear and nonlinear models [25]. XAI has also been applied to improve interpretability in host-based intrusion detection and anomaly explanation, including validation/perturbation analyses and reference-based explanation strategies [26,27]. Lightweight, explanation-driven compression methods have additionally been proposed to reduce model size while preserving performance [28]. Grad-CAM-inspired explainable approaches have also been explored for intrusion detection to make deep model decisions more interpretable [1]. Recent contributions further reflect ongoing interest in IoT and cyber–physical security, spanning large-scale intrusion detection and traffic assessment, DoS detection in IoT environments [29], integrity monitoring for embedded neural networks, and UAV system security [30,31,32,33].
Table 1 summarizes representative studies using the CIC IoT-DIAD 2024 dataset and highlights the strong performance of deep and hybrid pipelines, alongside ongoing challenges such as generalization, resource efficiency, and trustworthy explanations.
Despite the reported performance of recent IDS models, data leakage is not always explicitly controlled, particularly when supervised feature selection is performed prior to train/test partitioning. If features are ranked using the full dataset, information from samples that later appear in the evaluation split can influence the selected subset and inflate performance estimates; this is especially problematic under class imbalance, where accuracy may remain high while minority-class performance degrades. For this reason, leakage-aware feature selection is treated in this study as a necessary step for reliable evaluation. Results are reported under both a leakage-prone (biased) protocol and a leakage-aware (unbiased) protocol to quantify the impact of selection timing on generalization, alongside macro-averaged metrics and SHAP-based explanations of the CNN embedding used by XGBoost.
Motivated by these findings, we present an explainable hybrid IDS for CIC IoT-DIAD 2024 using 1D-CNN embeddings and XGBoost that targets strong detection performance with interpretable outputs. Our main contribution is leakage-aware evaluation rather than a novel architecture, via biased vs. unbiased feature-selection comparisons and SHAP analysis of dominant latent dimensions [23,24].
This paper is organized as follows. Section 2 presents the overall system workflow and data acquisition process, including network feature extraction, preprocessing, and feature selection. Section 3 reports the experimental results and discussion, including classification performance and explainability analysis. Finally, Section 4 concludes the paper and outlines directions for future work.
2. System Overview and Data Acquisition
Figure 2 summarizes the end-to-end workflow adopted in this study. Network traffic generated by IoT devices is first captured as packet traces (PCAP) and then transformed into structured tabular records (CSV) for machine learning analysis. The workflow consists of network feature extraction, data preprocessing and feature selection, and downstream learning for intrusion detection and multi-class attack classification.
The CIC IoT-DIAD 2024 dataset [34] is used as the primary benchmark. Compared to traditional IDS benchmarks such as KDD Cup’99 and NSL-KDD, which are widely used for historical comparison, CIC IoT-DIAD 2024 is intended to reflect contemporary IoT environments. It was collected in an IoT testbed with diverse device types and a wide range of IoT-relevant attack behaviors, making it suitable for multi-class intrusion detection under realistic traffic characteristics [11,34]. In addition, it is a dual-function dataset designed for IoT device identification and anomaly/attack detection, collected at the Canadian Institute for Cybersecurity in an IoT topology comprising 105 devices and including 33 distinct attacks. These attacks are grouped into seven high-level categories: DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai [34].
34]. In the present work, emphasis is placed on intrusion detection and multi-class attack classification using the labels provided in the selected dataset partition, while device identification is treated as an optional comparison task when alignment with prior work is required.
Although the dataset includes multiple attack types, the class distribution in the selected partition is highly imbalanced. In particular, DoS/DDoS flooding categories account for most traffic records, whereas several web- and brute-force-related attacks occur with substantially fewer samples.
Table 2 reports the sample count and corresponding percentage for each class in the selected partition. To further clarify the binary composition, the benign-to-malicious ratio is also reported based on the same class totals; for this partition, it is 1:18.7 (Benign:Malicious), computed from Table 2. This imbalance is important when interpreting macro-averaged metrics, since macro-F1 assigns equal weight to each class and can therefore be disproportionately influenced by low-support minority classes.
The dataset provides both packet-based and flow-based feature representations extracted from PCAP files. Flow-based features are extracted using standard flow metering tools such as CICFlowMeter and are intended primarily for anomaly detection and attack classification, whereas packet-based features are derived from packet-level analysis and can be used for both device identification and anomaly/attack detection. The extracted features cover protocol and statistical descriptors such as flow duration, packet-length statistics, and inter-arrival timing, as well as higher-layer attributes relevant to IoT fingerprinting and security analytics such as TLS handshake fields, HTTP host and user–agent strings, DNS characteristics, and stream/jitter/channel metrics computed over multiple time intervals.
Prior to learning, the CSV tables are processed in a scalable manner suitable for large datasets. Data are read and processed in manageable blocks, then concatenated after cleaning and transformation. Preprocessing includes handling missing values, converting categorical/string fields via label encoding where applicable, and scaling numeric features using standardization to improve optimization stability and classifier performance. Labels are encoded into integer classes to support both binary (benign vs. malicious) and multi-class (attack-type) learning settings.
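As a minimal illustration of this block-wise preprocessing, the sketch below reads a CSV in fixed-size chunks, fills a missing numeric field, and label-encodes the class column. The column names (`flow_duration`, `label`), the chunk size, and the zero-fill policy are illustrative assumptions rather than the exact schema or imputation rule used in the pipeline.

```python
import csv
import io

def preprocess_chunks(csv_text, chunk_size=2):
    """Read a CSV in fixed-size blocks, fill missing numeric values,
    and label-encode the string 'label' column (hypothetical schema)."""
    label_map = {}            # class name -> integer id, built on the fly
    cleaned_rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            cleaned_rows.extend(_clean(chunk, label_map))
            chunk = []
    if chunk:                 # flush the final partial block
        cleaned_rows.extend(_clean(chunk, label_map))
    return cleaned_rows, label_map

def _clean(chunk, label_map):
    out = []
    for row in chunk:
        # Missing numeric field -> 0.0 (illustrative imputation policy).
        dur = float(row["flow_duration"]) if row["flow_duration"] else 0.0
        name = row["label"]
        if name not in label_map:
            label_map[name] = len(label_map)   # deterministic first-seen encoding
        out.append((dur, label_map[name]))
    return out

demo = "flow_duration,label\n1.5,Benign\n,DoS\n2.0,Benign\n3.5,DDoS\n"
rows, mapping = preprocess_chunks(demo, chunk_size=2)
```

Because the label map is built incrementally, the same mapping object must be reused across all chunks so that encodings stay consistent over the whole table.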
To improve the reliability of the evaluation, a leakage-aware protocol is applied so that feature selection and normalization are performed using training data only within each split, and the resulting transformations are then applied to the corresponding test data. In addition, a leakage-aware feature-selection strategy is considered by contrasting a biased selection, which ranks features using the full dataset prior to splitting, against an unbiased selection, which ranks features using training-only information, thereby reducing label leakage and over-optimistic performance estimates. Following hybrid intrusion-detection pipelines reported on CIC IoT-DIAD, a compact feature set can be constructed by ranking features using model-based importance and retaining the top-k features for downstream learning [19]. Correlation analysis between selected features and labels can further assist in identifying overly label-correlated attributes that may indicate leakage or dataset artifacts.
After feature selection, a hybrid learning approach is employed in which a CNN learns a compact representation from the selected tabular features, and a gradient-boosted decision tree classifier, XGBoost, is used for final classification. Explainability is integrated using SHAP values to quantify feature contributions and analyze feature interactions for multi-class decisions, enabling both global interpretability (overall feature relevance) and local interpretability (per-sample decision rationale) in the intrusion-detection setting.
In this work, the CNN is used as a supervised representation learner that maps the selected predictors into a compact latent space to capture nonlinear feature interactions prior to downstream classification. XGBoost is then applied to the learned embedding, leveraging its strong performance on structured data and its ability to model complex decision boundaries. A direct ablation comparing XGBoost trained on the selected features (no CNN) versus XGBoost trained on CNN-derived embeddings was conducted, and the corresponding results are reported in Section 3.
3. Experimental Results and Discussion
An end-to-end pipeline was implemented to evaluate a hybrid IDS for multi-class attack classification using the CIC IoT-DIAD 2024 dataset. The original dataset comprised 84 traffic features and was refined through preprocessing to enable reliable feature extraction. Features with zero variance were removed and missing values were addressed, reducing the dimensionality to 73 features. To support efficient processing of the large dataset, chunk-wise feature scaling was adopted: the data were partitioned into manageable subsets, and the global mean and variance were incrementally updated across subsets. The learned global mean and variance were then applied consistently to all subsets, which were subsequently recombined into a single normalized dataset for modeling.
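The chunk-wise scaling step can be sketched with the standard merge formula for combining per-chunk counts, means, and sums of squared deviations (Chan et al.'s pairwise update). This is one common way to realize the incremental global mean/variance described above, not necessarily the exact implementation used here.

```python
def chunk_stats(chunk):
    """Per-chunk count, mean, and sum of squared deviations (M2)."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2

def merge_stats(a, b):
    """Combine two (n, mean, M2) summaries into one (pairwise merge)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

# Stream over chunks, then recover global mean/variance for standardization.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
total = chunk_stats(chunks[0])
for c in chunks[1:]:
    total = merge_stats(total, chunk_stats(c))
n, mean, m2 = total
variance = m2 / n   # population variance used for z-score scaling
```

The merged statistics match what a single pass over the concatenated data would produce, which is what allows the learned global mean and variance to be applied consistently to every subset afterwards.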
The pipeline was implemented with scalability in mind, using block-wise ingestion during preprocessing and reducing the input dimensionality to a compact top-k subset prior to downstream learning. After the normalization parameters and selected predictors are determined within a given split, inference operates on a fixed-length feature vector per record, which supports streaming or near-real-time scoring when the same feature-extraction procedure is available. Applicability to other IoT intrusion datasets is expected when comparable flow- or packet-derived features and consistent label definitions are provided; in such cases, the same preprocessing, leakage-aware feature ranking, and CNN–XGBoost stages can be applied without structural changes, while cross-dataset evaluation is used to quantify transferability under distribution shift.
Label encoding was applied to the attack classes by mapping each categorical class name to a numerical identifier, thereby enabling supervised learning with multi-class classifiers.
Table 3 reports the class names and their corresponding encoded labels. After data cleaning, an additional dimensionality-reduction stage was carried out using supervised feature selection. The objective was to retain a compact subset of predictors that preserves strong discriminative power for multi-class attack-type classification, while avoiding over-optimistic performance estimates that can arise when label-related information inadvertently leaks into the selection process. Accordingly, a top-k subset (k = 25) was retained based on RF-derived feature-importance ranking with respect to the encoded attack labels.
Although the evaluated dataset partition contains multiple attack types, the class distribution is highly imbalanced, with a small number of traffic categories dominating the records while several attack classes are comparatively under-represented. To make the imbalance explicit, Table 2 reports the number of samples per class.
Given the high dimensionality and heterogeneity of network traffic attributes, supervised feature selection is often adopted to reduce redundancy, improve computational efficiency, and focus learning on the most informative predictors for attack discrimination. However, feature selection is also a common source of inadvertent information leakage when the ranking step is performed using data that later appear in the evaluation split, which can bias the reported performance upward and weaken the reliability of comparisons between studies [35]. For this reason, two feature-selection protocols were evaluated; the protocols differed only in the timing of feature ranking relative to the train/test partitioning:
- (i) Biased (leakage-prone) protocol: Feature importance was computed using the full dataset prior to any train/test split. As a result, samples that subsequently appear in the evaluation set can influence the ranking step, which may inflate reported performance and lead to an inaccurate characterization of generalization.
- (ii) Unbiased (leakage-aware) protocol: Feature importance was computed using the training subset only, after partitioning. The resulting ranked list was then used to select the top-k predictors, and the same selected subset was applied to the corresponding held-out test subset. This protocol preserves evaluation-set independence and yields a more defensible estimate of performance on unseen samples.
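The difference between the two protocols is purely one of timing, which the sketch below makes concrete: the same ranking routine is run either before or after the split. A simple class-mean-separation score stands in for RF importance, and the data are synthetic.

```python
import random

def rank_features(X, y, top_k):
    """Stand-in scorer for RF importance: rank features by the absolute
    difference of their per-class means (binary labels assumed here)."""
    scores = []
    for j in range(len(X[0])):
        m0 = [r[j] for r, t in zip(X, y) if t == 0]
        m1 = [r[j] for r, t in zip(X, y) if t == 1]
        scores.append(abs(sum(m1) / len(m1) - sum(m0) / len(m0)))
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return order[:top_k]

random.seed(0)
X = [[random.random() for _ in range(5)] for _ in range(40)]
y = [i % 2 for i in range(40)]
train_idx, test_idx = list(range(30)), list(range(30, 40))

# Biased (leakage-prone): rank on ALL samples, split afterwards --
# evaluation samples influence which features survive.
biased = rank_features(X, y, top_k=2)

# Unbiased (leakage-aware): rank on the training split only; the held-out
# rows in test_idx never touch the ranking step.
X_tr = [X[i] for i in train_idx]
y_tr = [y[i] for i in train_idx]
unbiased = rank_features(X_tr, y_tr, top_k=2)
```

Under the unbiased protocol this ranking must be recomputed inside every cross-validation fold, which is the source of the extra cost noted below.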
While the above protocols explicitly address leakage arising from the timing of supervised feature selection, additional leakage mechanisms may also be present in IoT intrusion datasets. Temporal correlation may arise when temporally adjacent or near-duplicate records are distributed across training and test sets. Device overlap may occur when samples from the same device appear in both splits. Session or flow dependence may also be introduced when multiple records from the same capture context are separated across splits. Mitigation of these effects typically requires time-aware partitioning and group-aware splitting based on device or session identifiers, when such metadata are available. In this work, leakage mitigation is limited to feature-selection leakage; thus, results under random K-fold cross-validation should be interpreted as estimates rather than guarantees of performance on unseen devices or time periods.
Feature importance was estimated using an RF model with 100 decision trees, with Gini impurity as the split criterion. Impurity-based importance scores were aggregated across trees to produce a single ranking, after which the top-k predictors were retained. To maintain scalability on large CSV tables, importance estimation and subsequent processing were executed in a chunk-wise manner using blocks of 100,000 records. In addition, K-fold cross-validation was used to form training and testing partitions. Under the biased protocol, five folds were employed. Under the unbiased protocol, three folds were used to reduce computational overhead, since feature ranking must be repeated using training data only within each split. It should be noted that using different fold counts can introduce a potential confound in direct comparisons, since cross-validation variance and the effective training fraction differ across K. The lower-K setting in the unbiased protocol was adopted as a practical constraint due to repeated within-split feature ranking, and it is generally conservative because fewer folds can slightly reduce performance estimates. To isolate the effect of leakage-aware feature selection from the choice of fold count, we additionally report a matched-fold comparison in Section 3, where both protocols are evaluated under the same 3-fold cross-validation.
For transparency and reproducibility, the selected predictors are summarized in Table 4. The table reports the union of the top-k features obtained under both protocols and indicates which features were retained by each protocol. Several predictors were consistently selected, including flow inter-arrival statistics, throughput/rate descriptors, packet-length measures, and port identifiers, suggesting that these attributes provide stable class-discriminative information for IoT attack classification. Differences between the two ranked lists are expected: ranking on the full dataset (the biased protocol) can implicitly incorporate evaluation-set characteristics and alter the resulting importance ordering, whereas training-only ranking (the unbiased protocol) constrains selection to patterns supported by the training distribution and therefore provides a leakage-aware basis for downstream evaluation.
To further characterize the selected subsets, Pearson correlation analysis was computed for the top-k features obtained under each protocol with respect to the encoded attack labels. The corresponding correlation heatmaps are shown in Figure 3 and summarize linear dependencies among the selected predictors. The strongest correlation patterns occur within semantically similar feature groups, most notably among flow inter-arrival statistics and packet-length descriptors. Rate and throughput features also exhibit consistent coupling, indicating partial redundancy within the selected set while still preserving class-discriminative information for downstream learning.
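The pairwise correlations behind such heatmaps follow directly from the definition of the Pearson coefficient. In the sketch below, the toy columns are hypothetical stand-ins for inter-arrival and port features, chosen to show how redundant descriptors surface as near-unit correlations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Toy columns: two redundant inter-arrival statistics and an unrelated one.
iat_mean = [1.0, 2.0, 3.0, 4.0]
iat_max  = [2.0, 4.0, 6.0, 8.0]   # perfectly collinear with iat_mean
dst_port = [80.0, 22.0, 443.0, 80.0]

r_redundant = pearson(iat_mean, iat_max)   # collinear pair -> coefficient 1
r_unrelated = pearson(iat_mean, dst_port)
```

A full heatmap is simply this coefficient evaluated over every feature pair; values near ±1 within a feature group flag the partial redundancy discussed above.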
Two CNN models with identical architectures and training settings were trained using the RF-selected feature subsets obtained under the biased and unbiased protocols. In both cases, the CNN was trained in a supervised manner and subsequently used as a deep feature extractor: the penultimate dense layer produced a 128-dimensional embedding for each traffic record, which served as the learned representation for downstream classification.
To qualitatively assess the separability of the learned embeddings, the 128-dimensional CNN outputs were projected to two dimensions using t-distributed stochastic neighbor embedding (t-SNE). The resulting visualization is shown in Figure 4. It should be noted that t-SNE is used here as a qualitative visualization tool, and the observed separation patterns are illustrative rather than definitive. When the biased (leakage-prone) feature subset was used, the projected clusters appeared to exhibit clearer separation and reduced overlap among several classes. This visual difference is reported only as a qualitative observation and is not used as definitive evidence of leakage. Because the biased protocol computes feature ranking using the full dataset prior to evaluation, it can incorporate test-set information into the selected feature subset; therefore, any conclusions regarding leakage are grounded in the experimental protocol definition and the quantitative cross-validation results, rather than the t-SNE visualization. In contrast, the embeddings obtained with the unbiased (leakage-aware) feature subset showed less pronounced separation and more controlled overlap across classes, which is consistent with a more conservative representation learned under a strictly held-out feature-ranking protocol.
The CNN architecture and hyperparameters used for feature extraction are summarized in Table 5. The network, including the final softmax layer (Dense-2), was trained end-to-end for 30 epochs using a softmax cross-entropy loss and the Adam optimizer. After training, the softmax layer is removed and the output of the penultimate Dense-1 layer is used as a 128-dimensional embedding for feature extraction.
A 1D-CNN was selected because the input to the network is a one-dimensional vector of engineered tabular predictors; convolution along this feature axis provides a lightweight mechanism to learn local feature interactions and nonlinear combinations without imposing an artificial 2D/3D spatial structure that is more appropriate for images or volumetric data.
The selected predictors are tabular and therefore do not possess an inherent spatial topology as in images; accordingly, no claim is made that the feature axis encodes a physical neighborhood structure. Instead, the 25-dimensional input is provided to the network in a fixed, deterministic order (kept identical across training and evaluation), and 1D convolutions are employed primarily as a parameter-sharing mechanism to learn nonlinear feature interactions with fewer parameters than a comparably sized fully connected MLP. To support this design choice and to quantify the contribution of the representation-learning stage, an ablation study was conducted under the same evaluation setting by comparing against two baselines trained directly on the selected features: (i) a standard MLP and (ii) a direct XGBoost model without the CNN embedding stage.
Table 6 summarizes the aggregate results, indicating that the CNN embeddings → XGBoost pipeline yields higher macro-F1 than these baselines while maintaining comparable accuracy and weighted-F1.
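The parameter-sharing argument can be made concrete with a minimal valid-mode 1D convolution over a feature vector; the 3-tap kernel and the synthetic 25-dimensional record below are illustrative, not the trained weights.

```python
def conv1d(x, kernel, bias=0.0):
    """Valid-mode 1D convolution (cross-correlation) along the feature axis:
    the same small kernel is reused at every position (parameter sharing)."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

# A 25-dimensional record scanned by one 3-tap filter: 3 weights + 1 bias
# cover all 23 output positions, versus 25 weights per unit in a dense layer.
record = [float(i) for i in range(25)]
feature_map = conv1d(record, kernel=[0.5, 0.0, -0.5], bias=0.0)
```

Because the fixed feature ordering is identical across training and evaluation, the filter always sees the same local feature triples, so the learned interactions remain well-defined even though the axis has no physical topology.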
The 128-dimensional embedding produced by the CNN feature extractor was used as the input representation for the final classifier. Classification was performed using XGBoost, a gradient-boosted decision tree ensemble in which trees are added sequentially and each new tree is fitted to reduce the residual errors of the current ensemble. XGBoost was used with its standard implementation settings and a multi-class objective (softprob), with the number of classes set to the number of encoded attack categories. In this setting, the CNN acts as a representation learner that maps the selected tabular features into a compact latent space, while XGBoost operates on this latent space to learn nonlinear decision boundaries between the attack classes. Consistent with the two feature-selection protocols, two XGBoost classifiers were trained: one using embeddings extracted from the CNN trained on the biased feature subset, and one using embeddings extracted from the CNN trained on the unbiased feature subset.
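The sequential residual-fitting principle behind gradient boosting can be illustrated on a toy 1D regression problem, where each "tree" is a single-threshold stump fitted to the current residuals. This is a didactic sketch of the boosting mechanism, not the XGBoost configuration used in the experiments.

```python
def fit_stump(xs, residuals):
    """Best single-threshold stump minimizing squared error on residuals."""
    best = None
    for thr in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lv if x <= thr else rv)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv

def boost(xs, ys, rounds=3, lr=1.0):
    """Each new stump is fitted to the residuals of the current ensemble."""
    pred = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return pred

xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 1.0, 1.0]
pred = boost(xs, ys, rounds=3)
```

XGBoost follows the same additive-residual scheme but fits full regression trees to gradient/Hessian statistics of the chosen objective (here, multi-class softprob) rather than raw residuals.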
Model performance was quantified using accuracy as well as macro- and weighted-averaged precision, recall, and F1-score. Macro-averaged metrics assign equal weight to each class and are therefore informative under class imbalance, whereas weighted metrics reflect aggregate performance while accounting for the class support. Discriminative ability was further assessed using one-vs.-rest ROC curves, reporting the macro-averaged ROC–AUC, and using the macro-averaged precision–recall (PR) score.
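The macro/weighted distinction can be reproduced from a small confusion matrix; the two-class counts below are hypothetical and chosen to exaggerate the imbalance effect.

```python
def per_class_prf(conf):
    """Precision/recall/F1 per class from a confusion matrix
    (rows = true class, columns = predicted class)."""
    k = len(conf)
    out = []
    for c in range(k):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(k)) - tp
        fn = sum(conf[c]) - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        out.append((p, r, f1, sum(conf[c])))   # support = row total
    return out

def macro_weighted_f1(conf):
    stats = per_class_prf(conf)
    total = sum(s for _, _, _, s in stats)
    macro = sum(f1 for _, _, f1, _ in stats) / len(stats)           # equal class weight
    weighted = sum(f1 * s for _, _, f1, s in stats) / total         # support weight
    return macro, weighted

# Majority class predicted well, rare class poorly: weighted-F1 stays high
# while macro-F1 is dragged down by the minority class.
conf = [[95, 5],    # 100 majority samples
        [4, 1]]     #   5 minority samples
macro, weighted = macro_weighted_f1(conf)
```

This is exactly the pattern observed in the results that follow: high accuracy and weighted-F1 alongside a much lower macro-F1 driven by low-support classes.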
Under the biased feature-selection protocol, the CNN → XGBoost pipeline achieved an accuracy of 0.9302, a macro-F1 of 0.5862, and a weighted-F1 of 0.9295. The macro-precision and macro-recall were 0.7323 and 0.5443, respectively, while the weighted precision and weighted recall were 0.9355 and 0.9302. A macro ROC–AUC of 0.9897 and a macro PR score of 0.6134 were obtained.
Under the unbiased feature-selection protocol, performance increased slightly and more consistently across the macro-averaged criteria. An accuracy of 0.9324 was achieved, with a macro-F1 of 0.5911 and a weighted-F1 of 0.9321. The macro-precision and macro-recall were 0.7422 and 0.5501, respectively, while the weighted precision and weighted recall were 0.9378 and 0.9324. The macro ROC–AUC increased to 0.9905 and the macro PR score to 0.6218.
Because the initial evaluation used different fold counts across protocols for computational tractability, a matched 3-fold comparison was additionally conducted to remove this potential confound. In particular, the biased (leakage-prone) protocol was re-evaluated under the same 3-fold cross-validation setting adopted for the unbiased (leakage-aware) protocol.
Table 7 reports mean ± standard deviation across folds for key aggregate metrics, together with the corresponding deltas (Unbiased−Biased) and paired-test p-values. Under the matched 3-fold setting, the aggregate metrics are very close across protocols and the fold-wise differences are not statistically significant, indicating that the overall conclusions are not driven by the earlier fold-count mismatch.
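The paired test on fold-wise deltas reduces to a one-sample t statistic on the differences. The three deltas below are hypothetical placeholders (not the values in Table 7), and the constant 4.303 is the standard two-sided t critical value for df = 2 at the 0.05 level.

```python
import math

def paired_t(deltas):
    """Paired t statistic on fold-wise metric deltas (Unbiased - Biased)."""
    n = len(deltas)
    mean = sum(deltas) / n
    var = sum((d - mean) ** 2 for d in deltas) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical macro-F1 deltas across the three matched folds.
deltas = [0.008, -0.002, 0.006]
t_stat = paired_t(deltas)

# Two-sided critical value for df = 2 at the 0.05 level is about 4.303;
# |t| below it means the difference is not significant at this level.
significant = abs(t_stat) > 4.303
```

With only three folds (df = 2) the critical value is large, which is the "limited statistical power" caveat raised in the discussion below.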
Confusion matrices for both protocols are reported in Figure 5. The dominant mass remains on the diagonal for high-support categories, while most residual errors are concentrated in a small subset of minority classes. Consistent with the per-class behavior reflected by the macro-averaged metrics, the highest misclassification rates occur for the rare web- and brute-force-related categories, including Dictionary Brute Force, XSS, Uploading Attack, and SQL Injection, whereas major DoS/DDoS classes are detected more reliably. A higher number of correct predictions was observed under the unbiased protocol, which is consistent with the expected benefit of leakage-aware feature selection: by ensuring that the test portion remains unseen during feature ranking, the resulting evaluation provides a more defensible estimate of generalization performance.
Figure 6 presents the one-vs.-rest ROC curves for all attack classes together with the macro-averaged ROC–AUC for the CNN–XGBoost classifier under the biased and unbiased feature-selection protocols. In both cases, the majority of ROC traces remain close to the upper-left region of the ROC space, indicating that high true-positive rates are achieved at relatively low false-positive rates across a wide range of decision thresholds. This behavior suggests that the learned decision function provides discriminative capability beyond chance-level prediction, particularly for the high-support classes. However, the lower macro-F1 indicates that this discriminative capability is not uniform across all classes under severe imbalance; most residual errors occur in a small subset of minority web/brute-force categories (see Figure 5 and Table 8). A modest increase in macro-averaged ROC–AUC is observed for the unbiased protocol relative to the biased protocol, which is consistent with the expectation that leakage-aware feature selection yields a more defensible estimate of performance on unseen data and reduces the risk of overly optimistic evaluation.
In addition to ROC–AUC, aggregate performance was summarized using both macro-averaged and weighted-averaged metrics. A noticeable gap between these two averaging schemes was observed for both protocols. This discrepancy is commonly associated with pronounced class imbalance: weighted averages are dominated by majority classes and can therefore remain high even when minority classes exhibit lower recall and F1-scores, whereas macro averages assign equal weight to each class and are thus more sensitive to minority-class performance degradation. Consequently, macro-averaged metrics provide a more conservative view of performance in imbalanced multi-class intrusion-detection settings, while weighted metrics reflect overall effectiveness under the empirical class distribution.
Overall, the biased and unbiased protocols yielded very similar performance, with only small differences in point estimates. Under the matched 3-fold setting (Table 7), paired testing on the fold-wise deltas (Unbiased−Biased) did not indicate statistical significance; therefore, these deltas should be interpreted cautiously and may fall within cross-validation variability (noting that three folds provide limited statistical power). Rather than claiming large performance gains, we emphasize that the unbiased (leakage-aware) protocol is methodologically preferable because it enforces evaluation-set independence during feature ranking and therefore provides a more defensible generalization estimate. To provide a class-granular assessment, Table 8 reports per-class precision, recall, and F1-score for both protocols, highlighting attack categories that remain challenging and indicating where differences, if any, are most apparent.
To support interpretability, SHAP analysis was conducted to quantify feature contributions to the model decisions. Since the unbiased protocol provides the more reliable generalization estimate and achieved marginally improved overall performance, SHAP explanations were computed for the corresponding unbiased CNN–XGBoost model. This analysis enables both global interpretation (ranking the most influential predictors across the dataset) and local interpretation (explaining individual predictions and misclassifications), thereby improving transparency and supporting analyst-driven validation of the learned intrusion-detection behavior.
Because the CNN produces a 128-dimensional latent representation, the SHAP analysis in this subsection reflects the contribution of the CNN-extracted latent features (feature indices 1–128) to the final XGBoost decision function of the unbiased (leakage-aware) model, rather than to the original network traffic variables.
Accordingly, the SHAP explanations in this subsection should be interpreted as latent-space attributions that quantify how the downstream XGBoost classifier leverages the CNN embedding dimensions to form its decisions. Although the latent dimensions are learned from the original traffic-feature vector, they represent compressed combinations of multiple inputs and therefore do not admit a one-to-one correspondence with individual traffic variables. Any security or behavioral interpretation should thus be treated as indirect unless an explicit latent-to-input attribution analysis is performed.
A practical linkage back to input variables can be obtained by treating the trained CNN as a deterministic mapping from the selected traffic features to the latent embedding. After the most influential latent dimensions are identified (e.g., by SHAP), input-level relevance can be estimated through a complementary attribution step on the CNN, such as gradient-based methods (saliency or integrated gradients) computed with respect to those dominant latent dimensions. Alternatively, controlled perturbations can be applied in the original input space while tracking the induced changes in (i) the dominant latent dimensions and (ii) the final XGBoost outputs. This provides an analyst-oriented pathway to relate influential latent factors to concrete traffic-feature indicators without altering the evaluation protocol used in this study. A quantitative implementation of these input-level mapping procedures is outside the scope of the present study and is left for future work.
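As an illustration of the perturbation-based variant of this linkage, the sketch below replaces the trained CNN with a toy deterministic encoder (a hypothetical random linear-tanh map, used purely as a stand-in) and estimates input-level relevance for one assumed SHAP-dominant latent dimension by finite differences:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy sizes; the actual pipeline maps the selected traffic features to 128 latent dims.
n_inputs, n_latent = 10, 8

# Stand-in for the trained CNN encoder: any deterministic mapping works for the
# procedure; a fixed random linear-tanh map is used here for illustration only.
W = rng.normal(size=(n_inputs, n_latent))

def encode(x):
    return np.tanh(x @ W)

x0 = rng.normal(size=n_inputs)  # a reference sample in input space
dominant = 3                    # index of a SHAP-dominant latent dim (hypothetical)

# Controlled one-at-a-time perturbations: bump each input feature and record
# the induced change in the dominant latent coordinate.
eps = 1e-3
base = encode(x0)[dominant]
relevance = np.empty(n_inputs)
for i in range(n_inputs):
    x = x0.copy()
    x[i] += eps
    relevance[i] = (encode(x)[dominant] - base) / eps  # finite-difference sensitivity

# Input features ordered by their influence on the dominant latent dimension.
ranking = np.argsort(-np.abs(relevance))
print("most influential input features:", ranking[:3])
```

The same loop extends directly to tracking changes in the final XGBoost outputs instead of (or in addition to) the latent coordinates.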
Figure 7 summarizes the global distribution of absolute SHAP values across all 128 latent features and all evaluated samples. A strongly right-skewed (heavy-tailed) distribution is observed: most latent dimensions exhibit near-zero contribution for the majority of samples, whereas a small subset attains substantially larger SHAP magnitudes. This concentration of attribution mass indicates that the downstream classifier relies primarily on a limited number of highly informative latent directions, while many remaining dimensions contribute marginally. Such behavior is consistent with a representation-learning stage that compresses discriminative structure into a sparse set of salient latent factors, which may also suggest that further latent-space pruning or dimensionality sensitivity analysis could be feasible without severely degrading performance.
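The concentration of attribution mass can be quantified as the share of total mean |SHAP| carried by the top-ranked latent dimensions. The |SHAP| matrix below is mock data with a hand-built heavy tail, used purely to illustrate the computation:

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_latent = 500, 128

# Mock |SHAP| matrix (samples x latent dims) with a heavy-tailed profile:
# a handful of dimensions carry most of the attribution mass, as in Figure 7.
scale = np.full(n_latent, 0.01)
scale[rng.choice(n_latent, size=6, replace=False)] = 1.0  # few dominant dims (synthetic)
abs_shap = np.abs(rng.normal(size=(n_samples, n_latent))) * scale

# Global importance per latent dimension and the attribution share of the top 10.
mean_abs = abs_shap.mean(axis=0)
order = np.argsort(-mean_abs)
top10_share = mean_abs[order[:10]].sum() / mean_abs.sum()
print(f"top-10 dims carry {top10_share:.1%} of total attribution mass")
```

A top-10 share well above one half, as produced here, is the quantitative signature of the sparse reliance described above.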
To resolve how the learned representation supports specific attack categories, class-wise mean absolute SHAP values were computed and aggregated for the top 20 latent features, as shown in
Figure 8. Each bar reports the mean contribution of a latent feature, with the stacked segments indicating how that feature contributes across different classes. Several latent dimensions exhibit class-dependent importance (i.e., stronger attribution for a subset of attacks), suggesting that parts of the learned representation encode patterns that are more distinctive for particular behaviors. At the same time, a consistently dominant latent dimension is apparent: feature 66 attains the highest mean impact across most classes, indicating that it captures broadly discriminative structure shared by multiple attack types and benign traffic. This global relevance suggests that feature 66 may encode a high-level factor correlated with general traffic intensity or temporal/structural irregularities that manifest across many attacks. In contrast, the remaining top-ranked latent features show comparatively more selective contributions, which is consistent with additional latent factors specializing in finer-grained class separation. Collectively, these observations indicate that the CNN representation contains both shared (cross-class) and class-specific latent cues, which are subsequently leveraged by XGBoost to form decision boundaries. From an analyst perspective, the concentration of attribution in a small set of latent dimensions also suggests that model behavior is driven by a limited number of dominant factors, which can simplify downstream auditing and robustness checks.
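The class-wise aggregation underlying this analysis amounts to grouping |SHAP| values by class and averaging per latent dimension. A minimal sketch on mock data (synthetic |SHAP| values and labels, not the study's outputs) follows:

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_latent, n_classes = 600, 128, 4

abs_shap = np.abs(rng.normal(size=(n_samples, n_latent)))  # mock |SHAP| values
labels = rng.integers(0, n_classes, size=n_samples)        # mock class labels

# Class-wise mean |SHAP| per latent dimension (shape: classes x latent dims).
classwise = np.stack(
    [abs_shap[labels == c].mean(axis=0) for c in range(n_classes)]
)

# Rank features by overall mean contribution, then keep the top 20 with their
# per-class decomposition, i.e. the segments of a stacked-bar summary.
overall = classwise.mean(axis=0)
top20 = np.argsort(-overall)[:20]
stacked = classwise[:, top20]  # each column decomposes one bar into class segments
print("top-20 latent feature indices:", top20)
```

Class-selective latent factors then appear as columns of `stacked` dominated by one or two class segments, while globally discriminative factors contribute roughly evenly across classes.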
Overall, the SHAP results support two complementary conclusions: (i) the learned 128-dimensional CNN representation is effectively utilized in a sparse manner by the XGBoost classifier, with decision-making dominated by a small subset of latent dimensions; and (ii) within that subset, both globally discriminative latent factors (e.g., feature 66) and class-selective latent factors are present, providing evidence that the hybrid pipeline combines shared attack-related signatures with more specialized class-specific cues.