Closed-Set Heterogeneous Domain Adaptation for IoT Intrusion Detection: An Anchor-Based Benchmark Across Single- and Multi-Source Transfer
Highlights
- This paper introduces a controlled closed-set heterogeneous domain adaptation (HDA) benchmark for Internet of Things (IoT) intrusion detection with fixed contexts, representation contracts, anchor references, and twenty-seed paired statistical testing; at the labelled-target ratio, GGA recovers – of the source-only-to-oracle headroom across C1–C4, while JSTN recovers – in the contemporary-source MS-HDA family and – in the mixed-vintage family.
- A same-budget comparison shows that DA’s advantage is not universal: some contexts favour adaptation, some show scarcity crossover, and some favour direct target-side supervised learning; PMGN and CWAN method-coverage checks confirm that most deployment directions are not driven by a single selected method, and a compact ToN-IoT second-target confirmatory experiment shows the framework remains meaningful under a different IoT/IIoT target while confirming target-conditioned magnitudes.
- Closed-set HDA results should be interpreted against source-only, matched-budget target-only, and oracle target-only anchors, not as raw leaderboard scores.
- Native DA is deployment-valuable only when it recovers more operationally meaningful headroom than direct target-side labelling under the same resolved context and labelled-target budget.
Abstract
1. Introduction
- (1) A controlled closed-set HDA benchmark for IoT intrusion detection. The paper defines fixed resolved contexts covering SS-HDA and MS-HDA transfer into Edge-IIoTset, with declared source–target tuples, labelled target budgets, representation contracts, label contracts, fixed target test identities, and leakage-safe preprocessing.
- (2) Anchor-based headroom interpretation with paired statistical evidence. The paper introduces a deployment-oriented analysis based on GapClosure and a matched-budget target-only comparison, converting raw Macro-F1 improvements into claims about recovered supervision headroom and adaptation-versus-labelling trade-offs. Each DA-versus-target-only comparison is supported by paired seed-level statistical evidence, including bootstrap 95% confidence intervals, paired t-tests, Wilcoxon signed-rank checks, and effect size reporting.
- (3) Native-regime empirical characterisation without pooled leaderboards. The paper evaluates GGA and JSTN as primary selected native exemplars for SS-HDA and MS-HDA, respectively, rather than pooling methods into a single leaderboard. At the labelled target ratio, GGA recovers – of the source-only-to-oracle headroom across C1–C4, while JSTN recovers – in the contemporary-source MS-HDA family and – in the mixed-vintage family.
- (4) Method coverage and target-confirmatory evidence. PMGN and CWAN method coverage checks at the labelled target ratio test whether the main deployment patterns depend on a single method choice within each regime. A compact ToN-IoT second-target confirmatory experiment tests whether the qualitative headroom recovery framework remains visible under a different IoT/IIoT target.
- (5) Reproducibility-oriented disclosure. The paper reports split manifests, feature admissibility decisions, excluded feature lists, raw-label mappings, paired statistical-test outputs, and evidence bundle schema elements so that the reported scores can be traced back to their resolved evaluation contexts.
- RQ1: How does native SS-HDA behave under fixed source–target tuples, labelled target budgets, and representation contracts?
- RQ2: How does native MS-HDA behave under contemporary-source and mixed-vintage source-pair conditions?
- RQ3a: How much source-only-to-oracle target supervision headroom is recovered by native closed-set HDA within each resolved context?
- RQ3b: When compared at the same labelled target budget, does native DA recover more supervision headroom than direct target-only supervised learning?
- RQ4: Are the headline deployment patterns sensitive to using only one selected method per regime, or are they supported by secondary PMGN and CWAN method coverage checks?
- RQ5: Do the qualitative headroom recovery patterns remain visible under a second IoT/IIoT target dataset, ToN-IoT, or are the observed magnitudes strongly target-conditioned?
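The headroom questions in RQ3a and RQ3b can be made concrete with a small sketch. It assumes that GapClosure is the fraction of the source-only-to-oracle Macro-F1 gap recovered by a method, which is the usual reading of "recovered headroom"; the formal definition belongs to Section 3.6. The function names and the example scores are illustrative assumptions, not reported results.

```python
def gap_closure(score: float, source_only: float, oracle: float) -> float:
    """Fraction of the source-only-to-oracle gap recovered by `score`.

    Assumed form: (score - source_only) / (oracle - source_only).
    """
    headroom = oracle - source_only
    if headroom <= 0:
        raise ValueError("oracle must exceed source-only for headroom to be defined")
    return (score - source_only) / headroom


def same_budget_difference(da: float, target_only: float,
                           source_only: float, oracle: float) -> float:
    """GapClosure of DA minus GapClosure of the matched-budget target-only anchor."""
    return (gap_closure(da, source_only, oracle)
            - gap_closure(target_only, source_only, oracle))


# Illustrative Macro-F1 values (made up for the example, not taken from the paper).
source_only, oracle = 0.40, 0.90
da, target_only = 0.70, 0.65

print(round(gap_closure(da, source_only, oracle), 3))
print(round(same_budget_difference(da, target_only, source_only, oracle), 3))
```

Under this reading, a positive same-budget difference indicates that adaptation recovered more headroom than direct target-side labelling in that context, and a negative one indicates the reverse.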
Section Roadmap
2. Related Work
2.1. Closed-Set Heterogeneous Transfer for Intrusion Detection
2.2. Datasets, Label Contracts, and Feature Heterogeneity
2.3. Evaluation Discipline and Anchor-Based DA Interpretation
2.4. Positioning of the Present Paper
3. Benchmark Setup
3.1. Scope, Controls, and Research Question Mapping
- RQ1: How does the native SS-HDA method behave under fixed source–target tuples, labelled target budgets, and representation contracts?
- RQ2: How does the native MS-HDA method behave under contemporary-source and mixed-vintage multi-source transfer conditions?
- RQ3a: How much source-only-to-oracle target supervision headroom is recovered by native closed-set HDA across the reported regimes, ratios, and representation contracts?
- RQ3b: Under what conditions is native HDA preferable to matched-budget target-only learning, and when does direct target-side labelling dominate?
- RQ4: Are the headline deployment patterns sensitive to using only one selected method per regime, or are they supported by secondary PMGN and CWAN method coverage checks?
- RQ5: Do the qualitative headroom recovery patterns remain visible under a second IoT/IIoT target dataset, ToN-IoT, or are the observed magnitudes strongly target-conditioned?
3.2. Datasets and Closed-Set Label Contract Derivation
3.3. Representation Contracts: Intersection and Union
3.4. Closed-Set Evaluation Contexts
3.5. Methods and Native Role Declarations
3.6. Anchors and the GapClosure Framing
3.7. Metrics, Aggregation, Run Policy, and Admissibility
4. Native Closed-Set DA Results
4.1. Anchor Performance Across Closed-Set Contexts
4.2. SS-HDA Results: GGA Across Single-Source Tuples
4.3. MS-HDA Results: JSTN Across Multi-Source Tuples
4.4. Representation Contract Sensitivity
4.5. Label Budget and Seed Stability Profiles
5. GapClosure and Supervision Headroom Analysis
5.1. Per-Context GapClosure
5.2. Regime Family GapClosure and Sign-Based Deployment Incidence
5.3. Same-Budget Comparison: DA Versus Matched-Budget Target-Only Learning
5.4. Method Coverage Checks at the Labelled Target Ratio
5.5. Second-Target Confirmatory Evidence Using ToN-IoT
5.6. Residual Headroom and Deployment Decision Synthesis
Deployment Rule
6. Synthesis and Deployment Implications
7. Critical Evaluation and Scope Boundaries
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| DA | Domain adaptation |
| HDA | Heterogeneous domain adaptation |
| SS-HDA | Single-source semi-supervised heterogeneous domain adaptation |
| MS-HDA | Multi-source semi-supervised heterogeneous domain adaptation |
| IoT | Internet of Things |
| IIoT | Industrial Internet of Things |
| IDS | Intrusion detection system |
| NIDS | Network intrusion detection system |
| GGA | Geometric Graph Alignment |
| JSTN | Joint Semantic Transfer Network |
| PMGN | Prototype-Matching Graph Network |
| CWAN | Conditional Weighting Adversarial Network |
| CORAL | Correlation alignment |
| Macro-F1 | Macro-averaged F1-score |
| ROC-AUC | Area under the receiver operating characteristic curve |
| PR-AUC | Area under the precision–recall curve |
| OSDA | Open-set domain adaptation |
| OSDN | Open-set domain network |
| DoS | Denial-of-service |
| DDoS | Distributed denial-of-service |
| LightGBM | Light Gradient Boosting Machine |
Appendix A. Implementation Disclosure and Reproducibility Artefacts
| Item | Disclosure |
|---|---|
| Evaluation scope | Closed-set heterogeneous domain adaptation only. The main benchmark contexts are C1–C8, covering SS-HDA and MS-HDA under Intersection and Union representation contracts. The confirmatory second-target contexts are T1–T4, using ToN-IoT at the labelled target ratio. |
| Target datasets | Edge-IIoTset is used as the main fixed IoT/IIoT target dataset for C1–C8. ToN-IoT is used as a compact second-target confirmatory dataset for T1–T4. |
| Source configurations | CICIDS2017, UNSW-NB15, CICIDS2017 + UNSW-NB15, and CICIDS2017 + NSL-KDD are used for the Edge-IIoTset contexts according to the context definitions in the main paper and the split summary in Table A2. CICIDS2017 and CICIDS2017 + UNSW-NB15 are used for the ToN-IoT confirmatory contexts according to Table 6. |
| Method roles | GGA is treated as the primary selected SS-HDA exemplar in C1–C4 and T1–T2. JSTN is treated as the primary selected MS-HDA exemplar in C5–C8 and T3–T4. PMGN and CWAN are used as method coverage checks for C1–C4 and C5–C8, respectively, and are not pooled with the primary GGA/JSTN rows. |
| Representation contracts | Each context is executed under a declared representation contract. Intersection retains only admissible shared features, while Union materialises a deterministic admissible superset defined before execution. Feature admissibility decisions and excluded feature lists are stored as repository artefacts and summarised in the Appendices. |
| Label contracts | All reported closed-set results use the declared five-class contract: normal, scan, DoS, DDoS, and exploit/credential abuse. Raw-label mappings, merged families, and excluded target-specific families are recorded in the label mapping appendix and repository manifest. |
| Repeated run policy | Each fixed method–context–ratio configuration is repeated across twenty seeds. |
| Paired comparison rule | For DA-versus-matched-budget target-only comparisons, paired differences are computed under the same context, representation contract, labelled target ratio, seed, and labelled target subset. The per-run split manifest records the seed-specific labelled-target subset identity used for both the DA run and the matched-budget target-only anchor. |
| Primary metric | Macro-F1 on the fixed target test split. Accuracy and Weighted-F1 are retained as secondary operational summaries. GapClosure, same-budget GapClosure difference, residual headroom, and paired statistical outputs are derived from the Macro-F1 records. |
| Neural optimiser and training budget | Adam optimiser; initial learning rate ; weight decay ; mini-batch size 256; maximum training budget 100 epochs; early stopping patience 15 epochs. |
| Model selection rule | Model selection uses only context-permitted validation data. The fixed target test split is never used for preprocessing fit, hyperparameter selection, early stopping, threshold selection, model selection, or post hoc calibration. |
| Anchor learner | Source-only, CORAL, matched-budget target-only, and oracle target-only anchors use the same LightGBM multiclass reference learner. Fixed settings are learning rate = 0.05, num_leaves = 31, feature_fraction = 0.8, bagging_fraction = 0.8, bagging_freq = 1, min_data_in_leaf = 20, maximum rounds = 500, and early stopping = 50 rounds. |
| Paired statistical outputs | For DA-versus-matched-budget target-only comparisons, the repository stores seed-level paired difference records, bootstrap confidence intervals, paired t-test outputs, Wilcoxon signed-rank outputs, effect-size summaries, and Holm–Bonferroni correction records. The Wilcoxon output record includes the computation mode used for each comparison, distinguishing exact signed-rank calculations from asymptotic calculations where ties or zero paired differences require a fallback. The full paired statistical outputs are reported in Appendix F. |
| Software and hardware manifest | The repository contains the execution environment file and hardware manifest, including the operating system, Python version 3.11.6, package versions, CPU, memory, accelerator configuration where applicable, random-seed configuration, and execution timestamp. |
| Repository artefacts | The repository stores executable scripts, configuration files, context specifications, split manifests, feature admissibility records, excluded feature lists, label mapping files, prediction files, score files where available, per-run metric summaries, GapClosure summaries, seed-level paired difference records, paired statistical test outputs, ToN-IoT confirmatory manifests, and evidence bundle records (schema in Appendix G). |
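Two of the paired statistical outputs listed in the table above can be sketched with the standard library alone. The sketch assumes a percentile bootstrap over seed-level paired differences and Holm's step-down rule; the resample count, the random seed, and the example scores are illustrative assumptions, and the paired t-test and Wilcoxon signed-rank test are omitted because they would normally come from a statistics library.

```python
import random
from statistics import mean


def paired_bootstrap_ci(da_scores, target_only_scores,
                        n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference (DA - target-only)."""
    if len(da_scores) != len(target_only_scores):
        raise ValueError("scores must be paired seed by seed")
    diffs = [d - t for d, t in zip(da_scores, target_only_scores)]
    rng = random.Random(seed)
    # Resample the paired differences with replacement and record each mean.
    means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lo, hi


def holm_bonferroni(p_values, alpha=0.05):
    """Return one reject flag per hypothesis under Holm's step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained once one test fails
    return reject


# Illustrative twenty-seed Macro-F1 records (made up, not results from the paper).
da = [0.70 + 0.002 * (i % 5) for i in range(20)]
target_only = [0.66 + 0.003 * (i % 4) for i in range(20)]

mean_diff, ci_low, ci_high = paired_bootstrap_ci(da, target_only)
print(ci_low > 0)  # the interval excludes zero for these illustrative scores
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

Pairing by seed matters here: the differences are formed within each seed before resampling, which matches the paired comparison rule in the table.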
Appendix B. Split Manifest Summary
| Ctx. | Regime | Domain Tuple | Source Train | Source Val. | Target Unlabelled | Target Val. | Target Test |
|---|---|---|---|---|---|---|---|
| C1/C2 | SS-HDA | CICIDS2017 → Edge-IIoTset | 160,000 | 20,000 | 100,000 | 20,000 | 200,000 |
| C3/C4 | SS-HDA | UNSW-NB15 → Edge-IIoTset | 140,000 | 18,000 | 100,000 | 20,000 | 200,000 |
| C5/C6 | MS-HDA | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | 160,000 + 140,000 | 20,000 + 18,000 | 100,000 | 20,000 | 200,000 |
| C7/C8 | MS-HDA | CICIDS2017 + NSL-KDD → Edge-IIoTset | 160,000 + 18,000 | 20,000 + 4500 | 100,000 | 20,000 | 200,000 |
| Ratio | Target Labelled Subset | Interpretation |
|---|---|---|
| | 2000 | Severe labelled target scarcity. |
| | 10,000 | Moderate labelled target budget. |
| | 20,000 | Largest labelled target budget in the Edge-IIoTset closed-set benchmark. |
| Target | Target Train Pool | Labelled Budget | Effective Labelled Train | Target Validation | Target Test |
|---|---|---|---|---|---|
| ToN-IoT | 120,000 | 12,000 | 9600 | 2400 | 40,000 |
| Closed-Set Class | Target Train Pool | Labelled Budget | Effective Labelled Train | Target Validation | Target Test |
|---|---|---|---|---|---|
| normal | 36,000 | 3600 | 2880 | 720 | 12,000 |
| scan | 24,000 | 2400 | 1920 | 480 | 8000 |
| DoS | 24,000 | 2400 | 1920 | 480 | 8000 |
| DDoS | 18,000 | 1800 | 1440 | 360 | 6000 |
| exploit/credential abuse | 18,000 | 1800 | 1440 | 360 | 6000 |
| Total | 120,000 | 12,000 | 9600 | 2400 | 40,000 |
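The ToN-IoT split counts above are internally consistent with a simple stratified rule, which the following sketch reproduces. The 10% labelled fraction and the 80/20 train/validation division are inferred from the table's numbers and are assumptions of the sketch; the text does not state them as the generating rule.

```python
# Per-class target train pool sizes copied from the ToN-IoT table above.
train_pool = {
    "normal": 36_000,
    "scan": 24_000,
    "DoS": 24_000,
    "DDoS": 18_000,
    "exploit/credential abuse": 18_000,
}

# Assumed percentages, inferred from the table rather than stated in the text.
LABELLED_PERCENT = 10  # labelled budget as a share of the train pool
TRAIN_PERCENT = 80     # effective labelled train as a share of the budget


def split_counts(pool_size: int) -> tuple[int, int, int]:
    """Return (labelled budget, effective labelled train, target validation)."""
    budget = pool_size * LABELLED_PERCENT // 100
    effective_train = budget * TRAIN_PERCENT // 100
    validation = budget - effective_train
    return budget, effective_train, validation


rows = {name: split_counts(size) for name, size in train_pool.items()}
totals = tuple(sum(column) for column in zip(*rows.values()))

print(rows["normal"])  # (3600, 2880, 720)
print(totals)          # (12000, 9600, 2400)
```

The per-class rows and the totals match the table, so the class proportions of the pool are preserved in the labelled budget and its train/validation division.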
Appendix C. Raw Label Mapping and Label Contract Disclosure
| Dataset Family | Justification |
|---|---|
| CICIDS2017 | Fine-grained attack families were merged only where a broader benchmark class had a credible cross-dataset counterpart in the selected source–target tuples. Bot, Infiltration, and Heartbleed were excluded because they did not support a stable shared closed-set contract and would have introduced tuple-specific classes rather than comparable benchmark classes. |
| UNSW-NB15 | Source-native families were consolidated into broader benchmark classes to match the benchmark’s coarse closed-set contract rather than the original dataset taxonomy. Reconnaissance is mapped to scan, DoS is retained as DoS, and Exploits are merged into exploit/credential abuse. The Generic family is mapped under the DDoS/disruptive traffic class only as a coarse flow statistics approximation: in the realised benchmark interface, it provides high-volume disruptive traffic patterns that are closer to the retained DDoS-style operational class than to the exploit/credential-abuse class, although its original semantic definition is not identical to that of DDoS. Backdoor, Shellcode, Analysis, Fuzzers, and Worms were excluded because credible one-to-one alignment with the retained benchmark classes was weak or absent across the selected tuples. Retaining them would therefore have reduced the comparability. |
| NSL-KDD | The legacy taxonomy was mapped to the benchmark’s coarse classes only where a stable semantic correspondence was defensible, such as Probe to scan and R2L/U2R-style behaviours to exploit/credential abuse. NSL-KDD does not provide a stable direct DDoS family under the standard taxonomy; in C7/C8, DDoS support is therefore provided by the CICIDS2017 source side where applicable. The dataset is used only as a legacy auxiliary source in MS-HDA, and its limitations are disclosed rather than hidden through artificial relabelling. |
| Edge-IIoTset | Target-side labels were consolidated into shared benchmark classes to create a stable closed-set evaluation target across heterogeneous sources. MITM, Backdoor, Ransomware, and Uploading were excluded because they were not credibly supported across the selected source tuples and would have made the target contract richer than the transferable source-side contract. |
| ToN-IoT | The confirmatory target is mapped into the same five-class closed-set contract used for the Edge-IIoTset benchmark. Normal, scan, DoS, DDoS, and exploit/credential abuse are retained where a defensible mapping exists. Injection, password, and XSS are merged into exploit/credential abuse; Backdoor, Ransomware, and MITM are excluded because they would introduce target-specific classes not stably supported by the selected source tuples. |
| Benchmark Class | Raw Labels or Raw Families Mapped to the Benchmark Class | Status |
|---|---|---|
| CICIDS2017 | ||
| normal | BENIGN | Retained |
| scan | PortScan | Retained |
| DoS | DoS Hulk, DoS GoldenEye, DoS slowloris, DoS Slowhttptest | Merged |
| DDoS | DDoS | Retained |
| exploit/credential abuse | FTP-Patator, SSH-Patator, Web Attack – Brute Force, Web Attack – XSS, Web Attack – Sql Injection | Merged |
| (Excluded families) | Bot, Infiltration, Heartbleed | Excluded from closed-set contract because these families have weak or unstable correspondence to the retained five-class benchmark contract in the selected source–target tuples. |
| UNSW-NB15 | ||
| normal | Normal | Retained |
| scan | Reconnaissance | Merged |
| DoS | DoS | Retained |
| DDoS | Generic | Mapped under the benchmark’s coarse disruptive traffic convention; treated as a flow statistics approximation rather than a semantic DDoS label. |
| exploit/credential abuse | Exploits | Merged |
| (Excluded families) | Analysis, Backdoor, Fuzzers, Shellcode, Worms | Excluded from closed-set contract |
| NSL-KDD | ||
| normal | normal | Retained |
| scan | ipsweep, nmap, portsweep, satan, mscan, saint | Probe family merge |
| DoS | back, land, neptune, pod, smurf, teardrop, apache2, mailbomb, processtable, udpstorm | Merged |
| DDoS | No direct stable raw DDoS family is available in the standard NSL-KDD taxonomy. In C7/C8, DDoS class supervision is therefore source-asymmetric: CICIDS2017 provides DDoS examples, while NSL-KDD does not contribute DDoS-labelled training instances. This asymmetry is preserved in the benchmark rather than addressed through synthetic relabelling. | Not mapped for NSL-KDD |
| exploit/credential abuse | R2L/U2R-style families, including ftp_write, guess_passwd, imap, multihop, phf, warezclient, warezmaster, buffer_overflow, loadmodule, perl, and rootkit. | Merged where retained |
| (Excluded families) | Rare or unstable raw aliases without sufficient class support after benchmark filtering, including standard NSL-KDD rare-family aliases where applicable. | Excluded from closed-set contract |
| Edge-IIoTset main target | ||
| normal | Normal or dataset-equivalent benign label. | Retained |
| scan | Port_Scanning, Fingerprinting, Vulnerability_scanner | Merged |
| DoS | DoS_HTTP, DoS_TCP, DoS_UDP, DoS_ICMP, and spelling-normalised aliases. | Merged |
| DDoS | DDoS_HTTP, DDoS_TCP, DDoS_UDP, DDoS_ICMP, and spelling-normalised aliases. | Merged |
| exploit/credential abuse | Password, SQL_injection, XSS, and spelling-normalised aliases. | Merged |
| (Excluded families) | MITM, Backdoor, Ransomware, Uploading, and any additional raw families present in the preprocessed Edge-IIoTset variant that are not listed under the retained five-class contract. | Excluded from closed-set contract; the complete machine-readable raw label disposition list is preserved in the repository. |
| ToN-IoT confirmatory target | ||
| normal | normal or dataset-equivalent benign label. | Retained |
| scan | scanning, scan, and spelling-normalised aliases. | Merged |
| DoS | dos and spelling-normalised aliases. | Retained |
| DDoS | ddos and spelling-normalised aliases. | Retained |
| exploit/credential abuse | injection, password, xss, and spelling-normalised aliases. | Merged |
| (Excluded families) | backdoor, ransomware, mitm | Excluded from closed-set confirmatory contract |
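The mapping table above lends itself to a small lookup structure. The sketch below transcribes only the CICIDS2017 and UNSW-NB15 rows to stay short; the dictionary layout and the function name `to_benchmark_class` are illustrative assumptions, not the repository's actual label mapping files.

```python
from typing import Optional

# Raw-label to benchmark-class mappings transcribed from the table above.
LABEL_CONTRACT = {
    "CICIDS2017": {
        "BENIGN": "normal",
        "PortScan": "scan",
        "DoS Hulk": "DoS",
        "DoS GoldenEye": "DoS",
        "DoS slowloris": "DoS",
        "DoS Slowhttptest": "DoS",
        "DDoS": "DDoS",
        "FTP-Patator": "exploit/credential abuse",
        "SSH-Patator": "exploit/credential abuse",
        "Web Attack – Brute Force": "exploit/credential abuse",
        "Web Attack – XSS": "exploit/credential abuse",
        "Web Attack – Sql Injection": "exploit/credential abuse",
    },
    "UNSW-NB15": {
        "Normal": "normal",
        "Reconnaissance": "scan",
        "DoS": "DoS",
        "Generic": "DDoS",  # coarse flow-statistics approximation, per the table
        "Exploits": "exploit/credential abuse",
    },
}

# Families excluded from the closed-set contract, per the table.
EXCLUDED = {
    "CICIDS2017": {"Bot", "Infiltration", "Heartbleed"},
    "UNSW-NB15": {"Analysis", "Backdoor", "Fuzzers", "Shellcode", "Worms"},
}


def to_benchmark_class(dataset: str, raw_label: str) -> Optional[str]:
    """Return the benchmark class, or None when the family is excluded."""
    if raw_label in EXCLUDED[dataset]:
        return None
    try:
        return LABEL_CONTRACT[dataset][raw_label]
    except KeyError:
        # Fail loudly so an unlisted raw label is never silently relabelled.
        raise KeyError(f"unmapped raw label {raw_label!r} in {dataset}") from None


print(to_benchmark_class("UNSW-NB15", "Generic"))     # DDoS
print(to_benchmark_class("CICIDS2017", "Heartbleed"))  # None
```

Raising on unknown labels, rather than defaulting to a class, keeps the contract closed: any raw label outside the table must be dispositioned explicitly.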
Appendix D. Excluded-Feature Lists for Edge-IIoTset Representation Contracts
| Context(s) | Contract(s) | Dataset/Source | Original Feature Name | Normalised Feature Name | Exclusion Rationale |
|---|---|---|---|---|---|
| C1/C2: CICIDS2017 → Edge-IIoTset; 11 excluded features | |||||
| C1; C2 | C1 = Intersection; C2 = Union | CICIDS2017 | Flow ID | flow_id | Identifier-like field; excluded to avoid record-level artefacts and non-transferable metadata. |
| C1; C2 | C1 = Intersection; C2 = Union | CICIDS2017 | Source IP | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C1; C2 | C1 = Intersection; C2 = Union | CICIDS2017 | Destination IP | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C1; C2 | C1 = Intersection; C2 = Union | CICIDS2017 | Timestamp | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C1; C2 | C1 = Intersection; C2 = Union | CICIDS2017 | Label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| C1; C2 | C1 = Intersection; C2 = Union | Edge-IIoTset | frame.time | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C1; C2 | C1 = Intersection; C2 = Union | Edge-IIoTset | ip.src_host | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C1; C2 | C1 = Intersection; C2 = Union | Edge-IIoTset | ip.dst_host | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C1; C2 | C1 = Intersection; C2 = Union | Edge-IIoTset | tcp.payload | tcp_payload | Payload-like field; excluded because raw payload content is not represented consistently across domains and risks non-transferable encoding artefacts. |
| C1; C2 | C1 = Intersection; C2 = Union | Edge-IIoTset | http.file_data | http_file_data | Free-text or payload-like field; excluded because it cannot be represented consistently across domains. |
| C1; C2 | C1 = Intersection; C2 = Union | Edge-IIoTset | Attack_type | attack_type | Label-proximal field; excluded because it directly records the target-side attack family. |
| C3/C4: UNSW-NB15 → Edge-IIoTset; 16 excluded features | |||||
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | id | record_id | Record-management metadata; excluded because it identifies dataset rows rather than transferable network behaviour. |
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | srcip | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | dstip | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | sport | src_port | Endpoint metadata; excluded specifically for the UNSW-NB15 → Edge-IIoTset tuple because the UNSW raw port representation could not be reconciled with the target-side port encoding under the declared representation contracts. |
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | dsport | dst_port | Endpoint metadata; excluded specifically for the UNSW-NB15 → Edge-IIoTset tuple because the UNSW raw port representation could not be reconciled with the target-side port encoding under the declared representation contracts. |
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | Stime | start_time | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | Ltime | last_time | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | attack_cat | attack_category | Label-proximal field; excluded because it records the source-side attack family. |
| C3; C4 | C3 = Intersection; C4 = Union | UNSW-NB15 | Label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| C3; C4 | C3 = Intersection; C4 = Union | Edge-IIoTset | frame.time | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C3; C4 | C3 = Intersection; C4 = Union | Edge-IIoTset | ip.src_host | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C3; C4 | C3 = Intersection; C4 = Union | Edge-IIoTset | ip.dst_host | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C3; C4 | C3 = Intersection; C4 = Union | Edge-IIoTset | http.file_data | http_file_data | Free-text or payload-like field; excluded because it cannot be represented consistently across domains. |
| C3; C4 | C3 = Intersection; C4 = Union | Edge-IIoTset | http.request.full_uri | http_request_full_uri | Free-text or unstable string field; excluded because URI strings are not consistently transferable across domains. |
| C3; C4 | C3 = Intersection; C4 = Union | Edge-IIoTset | mqtt.topic | mqtt_topic | Protocol-specific categorical metadata; excluded because it is target-specific and not retained in the executable source–target interface. |
| C3; C4 | C3 = Intersection; C4 = Union | Edge-IIoTset | Attack_type | attack_type | Label-proximal field; excluded because it directly records the target-side attack family. |
| C5/C6: CICIDS2017 + UNSW-NB15 → Edge-IIoTset; 19 excluded features | |||||
| C5; C6 | C5 = Intersection; C6 = Union | CICIDS2017 | Flow ID | flow_id | Identifier-like field; excluded to avoid record-level artefacts and non-transferable metadata. |
| C5; C6 | C5 = Intersection; C6 = Union | CICIDS2017 | Source IP | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C5; C6 | C5 = Intersection; C6 = Union | CICIDS2017 | Destination IP | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C5; C6 | C5 = Intersection; C6 = Union | CICIDS2017 | Timestamp | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C5; C6 | C5 = Intersection; C6 = Union | CICIDS2017 | Label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| C5; C6 | C5 = Intersection; C6 = Union | UNSW-NB15 | id | record_id | Record-management metadata; excluded because it identifies dataset rows rather than transferable network behaviour. |
| C5; C6 | C5 = Intersection; C6 = Union | UNSW-NB15 | srcip | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C5; C6 | C5 = Intersection; C6 = Union | UNSW-NB15 | dstip | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C5; C6 | C5 = Intersection; C6 = Union | UNSW-NB15 | Stime | start_time | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C5; C6 | C5 = Intersection; C6 = Union | UNSW-NB15 | Ltime | last_time | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C5; C6 | C5 = Intersection; C6 = Union | UNSW-NB15 | attack_cat | attack_category | Label-proximal field; excluded because it records the source-side attack family. |
| C5; C6 | C5 = Intersection; C6 = Union | UNSW-NB15 | Label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| C5; C6 | C5 = Intersection; C6 = Union | Edge-IIoTset | frame.time | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C5; C6 | C5 = Intersection; C6 = Union | Edge-IIoTset | ip.src_host | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C5; C6 | C5 = Intersection; C6 = Union | Edge-IIoTset | ip.dst_host | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C5; C6 | C5 = Intersection; C6 = Union | Edge-IIoTset | http.file_data | http_file_data | Free-text or payload-like field; excluded because it cannot be represented consistently across domains. |
| C5; C6 | C5 = Intersection; C6 = Union | Edge-IIoTset | http.request.full_uri | http_request_full_uri | Free-text or unstable string field; excluded because URI strings are not consistently transferable across domains. |
| C5; C6 | C5 = Intersection; C6 = Union | Edge-IIoTset | mqtt.topic | mqtt_topic | Protocol-specific categorical metadata; excluded because it is target-specific and not retained in the executable multi-source interface. |
| C5; C6 | C5 = Intersection; C6 = Union | Edge-IIoTset | Attack_type | attack_type | Label-proximal field; excluded because it directly records the target-side attack family. |
| C7/C8: CICIDS2017 + NSL-KDD → Edge-IIoTset; 20 excluded features | |||||
| C7; C8 | C7 = Intersection; C8 = Union | CICIDS2017 | Flow ID | flow_id | Identifier-like field; excluded to avoid record-level artefacts and non-transferable metadata. |
| C7; C8 | C7 = Intersection; C8 = Union | CICIDS2017 | Source IP | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C7; C8 | C7 = Intersection; C8 = Union | CICIDS2017 | Destination IP | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C7; C8 | C7 = Intersection; C8 = Union | CICIDS2017 | Timestamp | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C7; C8 | C7 = Intersection; C8 = Union | CICIDS2017 | Label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| C7; C8 | C7 = Intersection; C8 = Union | NSL-KDD | label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| C7; C8 | C7 = Intersection; C8 = Union | NSL-KDD | difficulty | difficulty_level | Record-management metadata; excluded because it describes dataset construction rather than transferable network behaviour. |
| C7; C8 | C7 = Intersection; C8 = Union | NSL-KDD | num_outbound_cmds | num_outbound_cmds | Source-specific legacy telemetry; excluded because it is constant or unsupported in the target feature interface. |
| C7; C8 | C7 = Intersection; C8 = Union | NSL-KDD | is_host_login | is_host_login | Source-specific legacy telemetry; excluded because it is not retained in the target-side executable interface. |
| C7; C8 | C7 = Intersection; C8 = Union | NSL-KDD | hot | hot | Source-only content-derived telemetry; excluded because the target does not provide a comparable measurement. |
| C7; C8 | C7 = Intersection; C8 = Union | NSL-KDD | num_failed_logins | num_failed_logins | Source-only host/session telemetry; excluded because the target does not provide a comparable measurement. |
| C7; C8 | C7 = Intersection; C8 = Union | NSL-KDD | num_compromised | num_compromised | Source-only host-compromise telemetry; excluded because the target does not provide a comparable measurement. |
| C7; C8 | C7 = Intersection; C8 = Union | Edge-IIoTset | frame.time | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| C7; C8 | C7 = Intersection; C8 = Union | Edge-IIoTset | ip.src_host | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C7; C8 | C7 = Intersection; C8 = Union | Edge-IIoTset | ip.dst_host | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| C7; C8 | C7 = Intersection; C8 = Union | Edge-IIoTset | tcp.payload | tcp_payload | Payload-like field; excluded because raw payload content is not represented consistently across domains and risks non-transferable encoding artefacts. |
| C7; C8 | C7 = Intersection; C8 = Union | Edge-IIoTset | http.file_data | http_file_data | Free-text or payload-like field; excluded because it cannot be represented consistently across domains. |
| C7; C8 | C7 = Intersection; C8 = Union | Edge-IIoTset | http.request.full_uri | http_request_full_uri | Free-text or unstable string field; excluded because URI strings are not consistently transferable across domains. |
| C7; C8 | C7 = Intersection; C8 = Union | Edge-IIoTset | mqtt.topic | mqtt_topic | Protocol-specific categorical metadata; excluded because it is target-specific and not retained in the executable multi-source interface. |
| C7; C8 | C7 = Intersection; C8 = Union | Edge-IIoTset | Attack_type | attack_type | Label-proximal field; excluded because it directly records the target-side attack family. |
Appendix E. ToN-IoT Second-Target Confirmatory Disclosure
| Ctx. | Domain Tuple | Protocol | Shared Raw Candidates | Retained Features | Union-Only Materialised | Excluded |
|---|---|---|---|---|---|---|
| T1 | CICIDS2017 → ToN-IoT | Intersection | 36 | 24 | 0 | 12 |
| T2 | CICIDS2017 → ToN-IoT | Union | 36 | 43 | 19 | 12 |
| T3 | CICIDS2017 + UNSW-NB15 → ToN-IoT | Intersection | 25 | 21 | 0 | 17 |
| T4 | CICIDS2017 + UNSW-NB15 → ToN-IoT | Union | 25 | 52 | 31 | 17 |
| Context(s) | Contract(s) | Dataset/Source | Original Feature Name | Normalised Feature Name | Exclusion Rationale |
|---|---|---|---|---|---|
| T1/T2: CICIDS2017 → ToN-IoT; 12 excluded features | |||||
| T1; T2 | T1 = Intersection; T2 = Union | CICIDS2017 | Flow ID | flow_id | Identifier-like field; excluded to avoid record-level artefacts and non-transferable metadata. |
| T1; T2 | T1 = Intersection; T2 = Union | CICIDS2017 | Source IP | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T1; T2 | T1 = Intersection; T2 = Union | CICIDS2017 | Destination IP | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T1; T2 | T1 = Intersection; T2 = Union | CICIDS2017 | Timestamp | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| T1; T2 | T1 = Intersection; T2 = Union | CICIDS2017 | Label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| T1; T2 | T1 = Intersection; T2 = Union | ToN-IoT | ts | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| T1; T2 | T1 = Intersection; T2 = Union | ToN-IoT | src_ip | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T1; T2 | T1 = Intersection; T2 = Union | ToN-IoT | dst_ip | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T1; T2 | T1 = Intersection; T2 = Union | ToN-IoT | src_port | src_port | Endpoint metadata; excluded in the ToN-IoT confirmatory interface because the raw port representation was not retained after source–target feature reconciliation under the declared representation contracts. |
| T1; T2 | T1 = Intersection; T2 = Union | ToN-IoT | dst_port | dst_port | Endpoint metadata; excluded in the ToN-IoT confirmatory interface because the raw port representation was not retained after source–target feature reconciliation under the declared representation contracts. |
| T1; T2 | T1 = Intersection; T2 = Union | ToN-IoT | type | attack_type | Label-proximal field; excluded because it records the target-side attack family. |
| T1; T2 | T1 = Intersection; T2 = Union | ToN-IoT | label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| T3/T4: CICIDS2017 + UNSW-NB15 → ToN-IoT; 17 excluded features | |||||
| T3; T4 | T3 = Intersection; T4 = Union | CICIDS2017 | Flow ID | flow_id | Identifier-like field; excluded to avoid record-level artefacts and non-transferable metadata. |
| T3; T4 | T3 = Intersection; T4 = Union | CICIDS2017 | Source IP | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T3; T4 | T3 = Intersection; T4 = Union | CICIDS2017 | Destination IP | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T3; T4 | T3 = Intersection; T4 = Union | CICIDS2017 | Timestamp | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| T3; T4 | T3 = Intersection; T4 = Union | CICIDS2017 | Label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| T3; T4 | T3 = Intersection; T4 = Union | UNSW-NB15 | id | record_id | Record-management metadata; excluded because it identifies dataset rows rather than transferable network behaviour. |
| T3; T4 | T3 = Intersection; T4 = Union | UNSW-NB15 | srcip | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T3; T4 | T3 = Intersection; T4 = Union | UNSW-NB15 | dstip | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T3; T4 | T3 = Intersection; T4 = Union | UNSW-NB15 | Stime | start_time | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| T3; T4 | T3 = Intersection; T4 = Union | UNSW-NB15 | Ltime | last_time | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| T3; T4 | T3 = Intersection; T4 = Union | UNSW-NB15 | attack_cat | attack_category | Label-proximal field; excluded because it records the source-side attack family. |
| T3; T4 | T3 = Intersection; T4 = Union | UNSW-NB15 | Label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
| T3; T4 | T3 = Intersection; T4 = Union | ToN-IoT | ts | timestamp | Timestamp-derived field; excluded to avoid capture schedule artefacts and non-transferable temporal metadata. |
| T3; T4 | T3 = Intersection; T4 = Union | ToN-IoT | src_ip | src_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T3; T4 | T3 = Intersection; T4 = Union | ToN-IoT | dst_ip | dst_ip | Identifier-like field; excluded to avoid host identity leakage and non-transferable addressing artefacts. |
| T3; T4 | T3 = Intersection; T4 = Union | ToN-IoT | type | attack_type | Label-proximal field; excluded because it records the target-side attack family. |
| T3; T4 | T3 = Intersection; T4 = Union | ToN-IoT | label | label | Label-proximal field; excluded because it directly encodes the prediction target. |
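The exclusion records above can be applied mechanically before any transform is fitted. The following is a minimal sketch, assuming each flow is a plain dictionary keyed by raw column names. The name map and exclusion set copy the CICIDS2017 rows of the table; the helper `apply_exclusions`, the sample record, and the use of `Flow Duration` as a stand-in for a retained feature are hypothetical and are not taken from the benchmark code.

```python
# Raw-to-normalised names for the CICIDS2017 rows listed in the table above.
NORMALISED_NAMES = {
    "Flow ID": "flow_id",
    "Source IP": "src_ip",
    "Destination IP": "dst_ip",
    "Timestamp": "timestamp",
    "Label": "label",
}
# Normalised names excluded from the feature interface (same rows).
EXCLUDED = {"flow_id", "src_ip", "dst_ip", "timestamp", "label"}


def apply_exclusions(record, rename=NORMALISED_NAMES, excluded=EXCLUDED):
    """Return (features, label): normalise names and drop excluded fields.

    The label is returned separately so it never enters the feature vector.
    """
    features, label = {}, None
    for raw_name, value in record.items():
        name = rename.get(raw_name, raw_name)
        if name == "label":
            label = value
        elif name not in excluded:
            features[name] = value
    return features, label


# Hypothetical record; "Flow Duration" stands in for any retained feature.
record = {
    "Flow ID": "f-001",
    "Source IP": "10.0.0.5",
    "Destination IP": "10.0.0.9",
    "Timestamp": "2017-07-03 09:00:00",
    "Flow Duration": 1200,
    "Label": "DoS",
}
features, label = apply_exclusions(record)
print(features, label)
```

Keeping the label out of the returned feature dictionary mirrors the label-proximal exclusion rationale: the prediction target is carried alongside the features but never inside them.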
Appendix F. Paired Statistical Test Outputs
| Ctx. | Method | Ratio | Protocol | Mean Paired F1 | 95% CI | Adj. | Adj. | ||
|---|---|---|---|---|---|---|---|---|---|
| Primary Edge-IIoTset comparisons: GGA/JSTN, C1–C8, all ratios | |||||||||
| C1 | GGA | Intersection | <0.001 | <0.001 | |||||
| C1 | GGA | Intersection | |||||||
| C1 | GGA | Intersection | <0.001 | <0.001 | |||||
| C2 | GGA | Union | <0.001 | <0.001 | |||||
| C2 | GGA | Union | <0.001 | <0.001 | |||||
| C2 | GGA | Union | <0.001 | <0.001 | |||||
| C3 | GGA | Intersection | <0.001 | <0.001 | |||||
| C3 | GGA | Intersection | <0.001 | <0.001 | |||||
| C3 | GGA | Intersection | <0.001 | <0.001 | |||||
| C4 | GGA | Union | <0.001 | <0.001 | |||||
| C4 | GGA | Union | <0.001 | <0.001 | |||||
| C4 | GGA | Union | <0.001 | <0.001 | |||||
| C5 | JSTN | Intersection | <0.001 | <0.001 | |||||
| C5 | JSTN | Intersection | <0.001 | <0.001 | |||||
| C5 | JSTN | Intersection | <0.001 | <0.001 | |||||
| C6 | JSTN | Union | <0.001 | <0.001 | |||||
| C6 | JSTN | Union | |||||||
| C6 | JSTN | Union | <0.001 | <0.001 | |||||
| C7 | JSTN | Intersection | <0.001 | <0.001 | |||||
| C7 | JSTN | Intersection | <0.001 | <0.001 | |||||
| C7 | JSTN | Intersection | <0.001 | <0.001 | |||||
| C8 | JSTN | Union | |||||||
| C8 | JSTN | Union | <0.001 | <0.001 | |||||
| C8 | JSTN | Union | <0.001 | <0.001 | |||||
| Edge-IIoTset method coverage comparisons: PMGN/CWAN at | |||||||||
| C1 | PMGN | Intersection | <0.001 | <0.001 | |||||
| C2 | PMGN | Union | <0.001 | <0.001 | |||||
| C3 | PMGN | Intersection | <0.001 | <0.001 | |||||
| C4 | PMGN | Union | <0.001 | <0.001 | |||||
| C5 | CWAN | Intersection | <0.001 | <0.001 | |||||
| C6 | CWAN | Union | <0.001 | <0.001 | |||||
| C7 | CWAN | Intersection | <0.001 | <0.001 | |||||
| C8 | CWAN | Union | <0.001 | <0.001 | |||||
| ToN-IoT second-target confirmatory comparisons at | |||||||||
| T1 | GGA | Intersection | <0.001 | <0.001 | |||||
| T2 | GGA | Union | <0.001 | <0.001 | |||||
| T3 | JSTN | Intersection | <0.001 | <0.001 | |||||
| T4 | JSTN | Union | |||||||
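The adjusted values in this table come from a family-wise correction over paired seed-level comparisons. The sketch below illustrates the two generic ingredients, a Holm step-down adjustment and a percentile bootstrap interval for the mean paired difference, using only the standard library. The input p-values and the twenty paired differences are hypothetical placeholders, not results from this paper, and the function names are illustrative.

```python
import random
import statistics


def holm_bonferroni(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def bootstrap_ci_mean(diffs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical raw p-values for one inference family of three comparisons.
adjusted = holm_bonferroni([0.01, 0.04, 0.03])

# Hypothetical per-seed differences (DA minus matched-budget target-only).
diffs = [0.021, 0.018, 0.025, 0.019, 0.022, 0.017, 0.024, 0.020, 0.023, 0.016,
         0.026, 0.019, 0.021, 0.018, 0.022, 0.020, 0.024, 0.017, 0.023, 0.021]
lower, upper = bootstrap_ci_mean(diffs)
print(adjusted, (lower, upper))
```

Applying the correction separately within each declared family (primary, method coverage, confirmatory) keeps the number of hypotheses `m` tied to the claims that family supports.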
Appendix G. Evidence Bundle Schema
| Artifact Field | Description | DA or Coverage | Anchor Runs |
|---|---|---|---|
| run_signature | Stable identifier binding method or anchor type, target dataset, context, protocol, ratio, seed, and reporting view. | Yes | Yes |
| target_dataset_role | Indicator of whether the run belongs to the main Edge-IIoTset benchmark target or the ToN-IoT second-target confirmatory setting. | Yes | Yes |
| resolved_context_spec | Fully resolved context specification, including domain tuple, supervision regime, representation contract, label contract, labelled target budget, and fixed target test identity. | Yes | Yes |
| context_fingerprint | Stable fingerprint or equivalent identifier for the resolved context, used to verify context identity across runs and derived analyses. | Yes | Yes |
| split_manifest_ref | Manifest filename or identifier, together with the split-identity record used to bind source partitions, target-training partitions, validation data, and fixed target test rows. | Yes | Yes |
| labelled_target_subset_ref | Seed-specific labelled-target subset identifier or hash. Required for semi-supervised DA, method coverage checks, matched-budget target-only anchors, and paired DA-versus-target-only comparisons. | Yes | Cond. |
| seed | Fixed random seed used for the execution. | Yes | Yes |
| label_contract_ref | Reference to the closed-set five-class label contract and raw-label mapping used by the run. | Yes | Yes |
| representation_provenance | Realised feature interface, retained columns, materialised Union-only fields where applicable, excluded feature record, preprocessing scope, applied transform sequence, and fitted transform state where applicable, such as scaler parameters or encoder vocabularies fitted only on permitted training partitions. | Yes | Yes |
| method_role | Declared role of the evaluated method: primary selected exemplar, method coverage check, confirmatory target run, source-only anchor, CORAL anchor, matched-budget target-only anchor, or oracle target-only anchor. | Yes | Yes |
| method_provenance | DA method name, declared native regime, implementation provenance, target label use contract, and fixed method configuration. | Yes | No |
| anchor_provenance | Anchor type, learner configuration, permitted supervision, ratio dependence flag, and target label budget where applicable. | No | Yes |
| train_config | Learner configuration, optimiser or learner settings, early stopping policy, fixed parameters, validation rule, and training budget record. | Yes | Yes |
| validation_record | Validation metric history or selected validation score used for model selection where applicable. The fixed target test split is never used for model selection. | Yes | Yes |
| prediction_file_ref | Path or reference to predictions on the fixed target test split. | Yes | Yes |
| score_file_ref | Path or reference to class scores or probabilities, where available and semantically meaningful. | Yes | Yes |
| metrics_summary | Target test metrics for the run, including Macro-F1, accuracy, Weighted-F1, and other reported metrics where semantically appropriate. | Yes | Yes |
| gapclosure_inputs | Source-only and oracle target-only references used when a run contributes to GapClosure, same-budget GapClosure difference, or residual headroom reporting. | Yes | Cond. |
| paired_comparison_ref | Reference to the paired DA-versus-matched-budget target-only comparison record when the run contributes to paired statistical testing. | Cond. | Cond. |
| statistical_family_ref | Declared inference-family identifier used for Holm–Bonferroni correction: primary Edge-IIoTset comparisons, PMGN/CWAN method coverage comparisons, or ToN-IoT confirmatory comparisons. | Cond. | Cond. |
| qa_core_status | PASS, WARN, or FAIL checks for split identity, target test isolation, labelled subset identity, label consistency, representation contract match, leakage boundaries, and required output completeness. | Yes | Yes |
| qa_extended_warnings | Scarcity warnings, class support warnings, representation sensitivity warnings, target-confirmatory warnings, and method role warnings, such as attempting to interpret a coverage check row as a primary method row or pooling confirmatory target rows with main benchmark rows. | Yes | Yes |
| run_status | Completion status, failure code if applicable, aggregation eligibility, and reason for exclusion from aggregation where applicable. | Yes | Yes |
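A schema like this lends itself to an automated completeness check before aggregation. The sketch below is one possible reading, assuming a bundle is a dictionary keyed by the artifact field names in the table. Only the fields marked "Yes" are enforced; conditional ("Cond.") fields are left unchecked. The rule that a `FAIL` core status blocks aggregation is an assumption made for illustration, and the function names are hypothetical.

```python
# Fields marked "Yes" for both DA/coverage runs and anchor runs.
COMMON_REQUIRED = {
    "run_signature", "target_dataset_role", "resolved_context_spec",
    "context_fingerprint", "split_manifest_ref", "seed", "label_contract_ref",
    "representation_provenance", "method_role", "train_config",
    "validation_record", "prediction_file_ref", "score_file_ref",
    "metrics_summary", "qa_core_status", "qa_extended_warnings", "run_status",
}
# Fields marked "Yes" only in the DA or coverage column.
DA_REQUIRED = {"labelled_target_subset_ref", "method_provenance", "gapclosure_inputs"}
# Fields marked "Yes" only in the anchor column.
ANCHOR_REQUIRED = {"anchor_provenance"}


def missing_fields(bundle, is_anchor):
    """List required fields that are absent (or None) for this run type."""
    required = COMMON_REQUIRED | (ANCHOR_REQUIRED if is_anchor else DA_REQUIRED)
    return sorted(name for name in required if bundle.get(name) is None)


def eligible_for_aggregation(bundle, is_anchor):
    """Assumed rule: complete required fields and no FAIL core status."""
    complete = not missing_fields(bundle, is_anchor)
    return complete and bundle.get("qa_core_status") != "FAIL"


# Hypothetical DA bundle with placeholder values for every required field.
da_bundle = {name: "placeholder" for name in COMMON_REQUIRED | DA_REQUIRED}
da_bundle["qa_core_status"] = "PASS"
print(eligible_for_aggregation(da_bundle, is_anchor=False))
print(missing_fields(da_bundle, is_anchor=True))
```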
References
- da Costa, K.A.P.; Papa, J.P.; Lisboa, C.O.; Muñoz, R.; de Albuquerque, V.H.C. Internet of Things: A survey on machine learning-based intrusion detection approaches. Comput. Netw. 2019, 151, 147–157.
- Layeghy, S.; Baktashmotlagh, M.; Portmann, M. DI-NIDS: Domain invariant network intrusion detection system. Knowl.-Based Syst. 2023, 273, 110626.
- Yuan, X.; Han, S.; Huang, W.; Ye, H.; Kong, X.; Zhang, F. A simple framework to enhance the adversarial robustness of deep learning-based intrusion detection system. Comput. Secur. 2024, 137, 103644.
- Zhang, J.; Li, Y.; Zhang, L. Heterogeneous network intrusion detection via domain adaptation in IoT environment. Internet Technol. Lett. 2025, 8, e531.
- Jin, H. Cross-Protocol Domain Gap in Internet of Things Intrusion and Anomaly Detection: An Empirical Internet Protocol-to-Bluetooth Low Energy Study of Domain-Adversarial Training. Sensors 2026, 26, 1184.
- Apruzzese, G.; Pajola, L.; Conti, M. The Cross-evaluation of Machine Learning-based Network Intrusion Detection Systems. IEEE Trans. Netw. Serv. Manag. 2022, 19, 5152–5169.
- Tavallaee, M.; Stakhanova, N.; Ghorbani, A.A. Toward credible evaluation of anomaly-based intrusion-detection methods. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2010, 40, 516–524.
- Hamouda, D.; Ferrag, M.A.; Benhamida, N.; Seridi, H.; Ghanem, M.C. Revolutionizing intrusion detection in industrial IoT with distributed learning and deep generative techniques. Internet Things 2024, 26, 101149.
- Ferrag, M.A.; Friha, O.; Kantarci, B.; Tihanyi, N.; Cordeiro, L.; Debbah, M.; Hamouda, D.; Al-Hawawreh, M.; Choo, K.K.R. Edge Learning for 6G-Enabled Internet of Things: A Comprehensive Survey of Vulnerabilities, Datasets, and Defenses. IEEE Commun. Surv. Tutor. 2023, 25, 2654–2713.
- Rahman, M.M.; Shakil, S.A.; Mustakim, M.R. A survey on intrusion detection system in IoT networks. Cyber Secur. Appl. 2025, 3, 100082.
- Sun, S.; Zhou, L.; Wang, Z.; Han, L. Robust intrusion detection based on personalized federated learning for IoT environment. Comput. Secur. 2025, 154, 104442.
- Sommer, R.; Paxson, V. Outside the closed world: On using machine learning for network intrusion detection. In Proceedings of the 2010 IEEE Symposium on Security and Privacy; IEEE: Piscataway, NJ, USA, 2010; pp. 305–316.
- Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A survey of network-based intrusion detection data sets. Comput. Secur. 2019, 86, 147–167.
- Chou, D.; Jiang, M. A Survey on Data-driven Network Intrusion Detection. ACM Comput. Surv. 2021, 54, 182.
- Wu, J.; Wang, Y. TriHID: Towards verifiable domain adaptation-based IoT intrusion detection in heterogeneous environment. Expert Syst. Appl. 2026, 298, 129543.
- Wu, J.; Dai, H.; Wang, Y.; Ye, K.; Xu, C. Heterogeneous Domain Adaptation for IoT Intrusion Detection: A Geometric Graph Alignment Approach. IEEE Internet Things J. 2023, 10, 10764–10777.
- Wu, J.; Wang, Y.; Xie, B.; Li, S.; Dai, H.; Ye, K.; Xu, C. Joint Semantic Transfer Network for IoT Intrusion Detection. IEEE Internet Things J. 2023, 10, 3368–3383.
- Wang, Z.; Luo, Y.; Huang, Z.; Baktashmotlagh, M. Prototype-Matching Graph Network for Heterogeneous Domain Adaptation. In Proceedings of the 28th ACM International Conference on Multimedia (ACM MM ’20); Association for Computing Machinery: New York, NY, USA, 2020; pp. 2104–2112.
- Yao, Y.; Li, X.; Zhang, Y.; Ye, Y. Multisource Heterogeneous Domain Adaptation With Conditional Weighting Adversarial Network. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 2079–2092.
- Sun, B.; Feng, J.; Saenko, K. Return of Frustratingly Easy Domain Adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Palo Alto, CA, USA, 2016; Volume 30.
- Alsaedi, A.; Moustafa, N.; Tari, Z.; Mahmood, A.; Anwar, A.N. TON-IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 2020, 8, 165130–165150.
- Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning. IEEE Access 2022, 10, 40281–40306.
- Wang, Q.; Wang, X.; Liu, H.; Wang, Y.; Ren, J.; Zhang, B. A Domain Adaptive IoT Intrusion Detection Algorithm Based on GWR-GCN Feature Extraction and Conditional Domain Adversary. IEEE Internet Things J. 2024, 11, 41223–41234.
- Wu, J.; Dai, H.; Kent, K.B.; Yen, J.; Xu, C.; Wang, Y. Open set dandelion network for IoT intrusion detection. ACM Trans. Internet Technol. 2024, 24, 4.
- Jing, T.; Xia, H.; Liu, H.; Ding, Z. Interpretable Novel Target Discovery through Open-Set Domain Adaptation. ACM Trans. Multimed. Comput. Commun. Appl. 2025, 21, 147.
- Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018); SciTePress: Setúbal, Portugal, 2018; Volume 2018, pp. 108–116.
- Moustafa, N.; Slay, J. The significant features of the UNSW-NB15 and the KDD99 data sets for Network Intrusion Detection Systems. In Proceedings of the 2015 4th International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, BADGERS 2015; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2017; pp. 25–31.
- Ragab, M.; Eldele, E.; Tan, W.L.; Foo, C.S.; Chen, Z.; Wu, M.; Kwoh, C.K.; Li, X. ADATIME: A Benchmarking Suite for Domain Adaptation on Time Series Data. ACM Trans. Knowl. Discov. Data 2023, 17, 1–18.
- Zhao, H.; Zhang, S.; Wu, G.; Moura, J.M.F.; Costeira, J.P.; Gordon, G.J. Adversarial Multiple Source Domain Adaptation. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31, pp. 8559–8570.
- Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
- Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
- Davis, J.; Goadrich, M. The Relationship Between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 233–240.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 3146–3154.






| Research Strand | Representative Works | Main Contribution of the Strand | How the Present Paper Extends It |
|---|---|---|---|
| Closed-set HDA methods | GGA [16]; JSTN [17]; PMGN [18]; CWAN [19] | Develop native methods for heterogeneous transfer under limited target supervision, prototype-based cross-domain matching, or multi-source source-side supervision. | Evaluates primary selected native SS-HDA and MS-HDA exemplars under fixed contexts, labelled target budgets, representation contracts, and anchor references; uses PMGN and CWAN as method coverage checks to reduce dependence on a single selected method. |
| Unsupervised and open-set DA for IDS | GWR-GCN [23]; OSDN [24]; OSDA work [25] | Addresses settings where target labels are unavailable or where unknown target classes may appear. | Recognises these as important but distinct settings with different supervision contracts and output semantics; avoids mixing them with closed-set claims. |
| Heterogeneous IDS datasets and benchmark resources | CICIDS2017 [26]; UNSW-NB15 [27]; NSL-KDD [7]; ToN-IoT [21]; Edge-IIoTset [22]; TriHID [15] | Provide realistic source and target domains for studying heterogeneous IDS transfer. | Uses multiple source configurations with Edge-IIoTset as the main fixed target and ToN-IoT as a compact second-target confirmatory check, while making label contracts, feature contracts, split identities, and target-conditioned interpretation explicit. |
| Cross-dataset IDS evaluation discipline | Apruzzese et al. [6]; Chou and Jiang [14]; Jin [5] | Shows that dataset choice, protocol design, and evaluation context can materially change conclusions. | Operationalises this concern through fixed resolved contexts, leakage-safe preprocessing, repeated seeds, paired statistical testing, admissibility checks, and anchor-based interpretation. |
| Anchor-based DA benchmarking | AdaTime [28]; multi-source DA benchmarks [29] | Uses source-only and target-aware references to contextualise adaptation gains. | Extends anchor logic into a closed-set IoT IDS headroom analysis using GapClosure, matched-budget target-only comparison, and deployment direction interpretation. |
| Dataset Family | Benchmark Classes Retained | Merged Raw Families | Excluded Families or Notes |
|---|---|---|---|
| CICIDS2017 | normal, scan, DoS, DDoS, exploit/credential abuse | DoS subtypes are merged into DoS; credential and application abuse behaviours are merged into exploit/credential abuse | Bot, Infiltration and Heartbleed are excluded from the closed-set contract because they do not provide stable cross-dataset alignment in the reported tuples |
| UNSW-NB15 | normal, scan, DoS, DDoS, exploit/credential abuse | Generic is mapped into DDoS, where it functions as high-volume disruptive traffic; exploits are mapped into exploit/credential abuse | Families with weak semantic alignment to the target contract are excluded rather than retained as unstable benchmark classes |
| NSL-KDD | normal, scan, DoS, DDoS, exploit/credential abuse | Probe is mapped to scan; DoS variants are mapped to DoS/DDoS according to benchmark-side consolidation; U2R/R2L-style behaviours are mapped into exploit/credential abuse where retained | Used only as a legacy auxiliary source in MS-HDA; not interpreted as a modern IoT source by itself |
| Edge-IIoTset | normal, scan, DoS, DDoS, exploit/credential abuse | Target attack families are consolidated into the five benchmark classes where alignment is credible | Target-side families outside the closed-set contract are excluded from this paper rather than treated as unknown classes |
| ToN-IoT confirmatory target | normal, scan, DoS, DDoS, exploit/credential abuse | injection, password, and xss are merged into exploit/credential abuse | backdoor, ransomware, and mitm are excluded from the closed-set confirmatory contract; the full ToN-IoT mapping is reported in Appendix E |
| Domain Tuple | Protocol | Shared Raw Candidates | Retained Final Features | Numeric | Categorical | Union-Only Materialised | Excluded |
|---|---|---|---|---|---|---|---|
| CICIDS2017 → Edge-IIoTset | Intersection | 26 | 22 | 20 | 2 | 0 | 11 |
| CICIDS2017 → Edge-IIoTset | Union | 26 | 34 | 29 | 5 | 12 | 11 |
| UNSW-NB15 → Edge-IIoTset | Intersection | 21 | 17 | 15 | 2 | 0 | 16 |
| UNSW-NB15 → Edge-IIoTset | Union | 21 | 33 | 27 | 6 | 16 | 16 |
| CICIDS2017 + UNSW-NB15 → Edge-IIoTset | Intersection | 18 | 14 | 12 | 2 | 0 | 19 |
| CICIDS2017 + UNSW-NB15 → Edge-IIoTset | Union | 18 | 36 | 29 | 7 | 22 | 19 |
| CICIDS2017 + NSL-KDD → Edge-IIoTset | Intersection | 19 | 16 | 13 | 3 | 0 | 20 |
| CICIDS2017 + NSL-KDD → Edge-IIoTset | Union | 19 | 31 | 25 | 6 | 15 | 20 |
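The counts in this table obey two simple identities that can be checked mechanically: numeric plus categorical features equal the retained total, and each Union interface equals its Intersection counterpart plus the Union-only materialised fields. The sketch below encodes the eight rows exactly as reported and verifies both identities. The tuple layout and the helper name `check_contract_counts` are illustrative choices, not part of the benchmark code.

```python
# (sources, protocol, shared, retained, numeric, categorical, union_only, excluded)
ROWS = [
    ("CICIDS2017", "Intersection", 26, 22, 20, 2, 0, 11),
    ("CICIDS2017", "Union", 26, 34, 29, 5, 12, 11),
    ("UNSW-NB15", "Intersection", 21, 17, 15, 2, 0, 16),
    ("UNSW-NB15", "Union", 21, 33, 27, 6, 16, 16),
    ("CICIDS2017 + UNSW-NB15", "Intersection", 18, 14, 12, 2, 0, 19),
    ("CICIDS2017 + UNSW-NB15", "Union", 18, 36, 29, 7, 22, 19),
    ("CICIDS2017 + NSL-KDD", "Intersection", 19, 16, 13, 3, 0, 20),
    ("CICIDS2017 + NSL-KDD", "Union", 19, 31, 25, 6, 15, 20),
]


def check_contract_counts(rows):
    """Return a list of human-readable inconsistencies (empty if none)."""
    retained_by_key = {(r[0], r[1]): r[3] for r in rows}
    problems = []
    for sources, protocol, _, retained, numeric, categorical, union_only, _ in rows:
        if numeric + categorical != retained:
            problems.append(f"{sources}/{protocol}: type counts do not sum")
        if protocol == "Intersection" and union_only != 0:
            problems.append(f"{sources}: Intersection has Union-only fields")
        if protocol == "Union":
            base = retained_by_key[(sources, "Intersection")]
            if base + union_only != retained:
                problems.append(f"{sources}: Union total is inconsistent")
    return problems


problems = check_contract_counts(ROWS)
print(problems)
```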
| Ctx | Domain Tuple | Setting | Protocol | Ratio | Target Test | Main Mismatch Source | Permitted References |
|---|---|---|---|---|---|---|---|
| C1 | CICIDS2017 → Edge-IIoTset | SS-HDA | Intersection | | 200,000 | Environment shift and feature interface mismatch | SO, CORAL, DA, TO |
| C2 | CICIDS2017 → Edge-IIoTset | SS-HDA | Union | | 200,000 | Environment shift and materialised schema mismatch | SO, CORAL, DA, TO |
| C3 | UNSW-NB15 → Edge-IIoTset | SS-HDA | Intersection | | 200,000 | Collection process shift and feature interface mismatch | SO, CORAL, DA, TO |
| C4 | UNSW-NB15 → Edge-IIoTset | SS-HDA | Union | | 200,000 | Collection process shift and materialised schema mismatch | SO, CORAL, DA, TO |
| C5 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | MS-HDA | Intersection | | 200,000 | Contemporary source diversity and target mismatch | SO, CORAL, DA, TO |
| C6 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | MS-HDA | Union | | 200,000 | Contemporary source diversity under materialised schema mismatch | SO, CORAL, DA, TO |
| C7 | CICIDS2017 + NSL-KDD → Edge-IIoTset | MS-HDA | Intersection | | 200,000 | Mixed-vintage source divergence and target mismatch | SO, CORAL, DA, TO |
| C8 | CICIDS2017 + NSL-KDD → Edge-IIoTset | MS-HDA | Union | | 200,000 | Mixed-vintage source divergence under materialised schema mismatch | SO, CORAL, DA, TO |
| Ratio | Target Labelled | Target Unlabelled | Target Test |
|---|---|---|---|
| | 2,000 | 100,000 | 200,000 |
| | 10,000 | 100,000 | 200,000 |
| | 20,000 | 100,000 | 200,000 |
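Because each labelled-target budget is redrawn per seed, it helps to make the draw reproducible and to record an identifier for the resulting subset. The sketch below is a minimal illustration, assuming uniform sampling without replacement from a pool of row identifiers and a SHA-256 digest as the subset identifier. The pool size of 100,000 identifiers, the sampling scheme, and the function name are assumptions for illustration; the benchmark's actual subsampling procedure may differ (for example, it may be stratified by class).

```python
import hashlib
import random


def labelled_subset(pool_ids, budget, seed):
    """Draw a seed-determined labelled subset and return (ids, digest)."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(pool_ids, budget))
    payload = ",".join(str(i) for i in chosen).encode("utf-8")
    return chosen, hashlib.sha256(payload).hexdigest()


# Hypothetical pool of target-training row identifiers.
pool = list(range(100_000))

# Budgets taken from the table above; seed values are placeholders.
subset_a, digest_a = labelled_subset(pool, budget=2_000, seed=1)
subset_b, digest_b = labelled_subset(pool, budget=2_000, seed=1)
subset_c, digest_c = labelled_subset(pool, budget=2_000, seed=2)
print(len(subset_a), digest_a == digest_b, digest_a == digest_c)
```

Reusing the same digest for the DA run and the matched-budget target-only run at a given seed is what makes the later comparison paired.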
| Ctx. | Source Domain(s) | Target | Regime | Protocol | Ratio | Method |
|---|---|---|---|---|---|---|
| T1 | CICIDS2017 | ToN-IoT | SS-HDA | Intersection | | GGA |
| T2 | CICIDS2017 | ToN-IoT | SS-HDA | Union | | GGA |
| T3 | CICIDS2017 + UNSW-NB15 | ToN-IoT | MS-HDA | Intersection | | JSTN |
| T4 | CICIDS2017 + UNSW-NB15 | ToN-IoT | MS-HDA | Union | | JSTN |
| Method | Benchmark Role | Source Setting | Target Supervision Contract | Evaluated Contexts | Reporting Role |
|---|---|---|---|---|---|
| GGA | Primary SS-HDA exemplar | One labelled source domain | | C1–C4, all ratios; T1–T2 at | Primary result |
| PMGN | Secondary SS-HDA coverage method | One labelled source domain | | C1–C4 at | Coverage check |
| JSTN | Primary MS-HDA exemplar | Multiple labelled source domains | | C5–C8, all ratios; T3–T4 at | Primary result |
| CWAN | Secondary MS-HDA coverage method | Multiple labelled source domains | | C5–C8 at | Coverage check |
| Anchor | Role in This Paper | Target Label Use | Ratio Dependence |
|---|---|---|---|
| Source-only | Naive cross-domain transfer floor without adaptation | No target labels | Fixed within context family |
| CORAL | Lightweight second-order alignment reference under the same split and representation contract | No target labels; uses unlabelled target-training features | Fixed within context family |
| Target-only, matched-budget | Direct supervised target reference using the same labelled target budget available to the DA method | Uses ratio-specific labelled target subset | Ratio-specific |
| Target-only, oracle | Upper target-native reference using the full legal target training pool | Uses full legal target training labels | Fixed within context family |
| Item | Value |
|---|---|
| Repeated-run policy | Twenty runs per fixed method–context–ratio configuration, with the same seed determining model initialisation and labelled target subsampling. |
| Seed list | . |
| Aggregation rule | Mean ± standard deviation across twenty seeds within fixed context, ratio, representation contract, and declared method role. Paired DA-versus-target-only statistical tests are reported separately. |
| Primary SS-HDA exemplar | GGA in C1–C4 across , , and |
| Secondary SS-HDA coverage check | PMGN in C1–C4 at |
| Primary MS-HDA exemplar | JSTN in C5–C8 across , , and |
| Secondary MS-HDA coverage check | CWAN in C5–C8 at |
| Neural optimiser | Adam |
| Learning rate | |
| Weight decay | |
| Mini-batch size | 256 |
| Maximum epochs | 100 |
| Early stopping patience | 15 epochs |
| Closed-set model selection metric | Validation Macro-F1 on context-permitted validation data |
| Target test use | Never used for training, preprocessing fit, hyperparameter selection, threshold choice, model selection, or post hoc calibration |
| Anchor learner | LightGBM multiclass with fixed benchmark settings for source-only, CORAL, matched-budget target-only, and oracle target-only rows |
| Admissibility requirement | Split identity, label contract, representation protocol, leakage boundary, method role flag, predictions, metrics, and provenance must be complete before aggregation |
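The stopping rule in this table (at most 100 epochs, patience of 15 epochs, selection on validation Macro-F1) can be expressed independently of any particular network. The sketch below replays a sequence of validation scores and reports the selected epoch and the epoch at which training would stop. The score trace is synthetic and the function name is hypothetical; the strict-improvement criterion is an assumption, since the table does not state a minimum improvement threshold.

```python
def select_with_early_stopping(val_scores, max_epochs=100, patience=15):
    """Return (best_epoch, best_score, last_epoch) for a validation trace.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation score, or when `max_epochs` is reached.
    """
    best_score, best_epoch, waited, last_epoch = float("-inf"), -1, 0, -1
    for epoch, score in enumerate(val_scores[:max_epochs]):
        last_epoch = epoch
        if score > best_score:
            best_score, best_epoch, waited = score, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_score, last_epoch


# Synthetic trace: improves for eleven epochs, then plateaus below the best.
trace = [0.50 + 0.01 * i for i in range(11)] + [0.55] * 50
best_epoch, best_score, last_epoch = select_with_early_stopping(trace)
print(best_epoch, round(best_score, 3), last_epoch)
```

The scores passed in must come from context-permitted validation data only; the fixed target test split plays no part in this selection.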
| Ctx | Domain Tuple | Setting | Protocol | Source-Only Macro-F1 | CORAL Macro-F1 | Matched-Budget Target-Only Range | Oracle Target-Only |
|---|---|---|---|---|---|---|---|
| C1 | CIC17 → Edge | SS-HDA | Intersection | 0.558 | 0.669 | 0.829–0.901 | 0.940 |
| C2 | CIC17 → Edge | SS-HDA | Union | 0.541 | 0.642 | 0.806–0.862 | 0.931 |
| C3 | UNSW → Edge | SS-HDA | Intersection | 0.487 | 0.595 | 0.801–0.874 | 0.940 |
| C4 | UNSW → Edge | SS-HDA | Union | 0.469 | 0.570 | 0.778–0.858 | 0.931 |
| C5 | CIC17+UNSW → Edge | MS-HDA | Intersection | 0.561 | 0.677 | 0.857–0.931 | 0.950 |
| C6 | CIC17+UNSW → Edge | MS-HDA | Union | 0.551 | 0.677 | 0.854–0.915 | 0.940 |
| C7 | CIC17+NSL → Edge | MS-HDA | Intersection | 0.518 | 0.635 | 0.861–0.916 | 0.949 |
| C8 | CIC17+NSL → Edge | MS-HDA | Union | 0.500 | 0.612 | 0.859–0.923 | 0.941 |
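These anchors define the headroom against which any score can be read. The sketch below assumes GapClosure is the fraction of the source-only-to-oracle Macro-F1 gap recovered by a given score, which matches how the paper describes recovered headroom but is written here as an assumption rather than the paper's exact formula. It uses the C1 anchor values from the table; the function name is illustrative.

```python
def gap_closure(score, source_only, oracle):
    """Fraction of the source-only-to-oracle headroom recovered by `score`."""
    headroom = oracle - source_only
    if headroom <= 0:
        raise ValueError("oracle must exceed source-only for headroom to exist")
    return (score - source_only) / headroom


# C1 anchors from the table above (Macro-F1).
SOURCE_ONLY, CORAL, ORACLE = 0.558, 0.669, 0.940
TARGET_ONLY_LOW, TARGET_ONLY_HIGH = 0.829, 0.901  # matched-budget range

coral_gc = gap_closure(CORAL, SOURCE_ONLY, ORACLE)
target_low_gc = gap_closure(TARGET_ONLY_LOW, SOURCE_ONLY, ORACLE)
target_high_gc = gap_closure(TARGET_ONLY_HIGH, SOURCE_ONLY, ORACLE)
print(round(coral_gc, 3), round(target_low_gc, 3), round(target_high_gc, 3))
```

Under this reading, a DA method is only ahead of direct labelling at a given budget if its GapClosure exceeds that of the matched-budget target-only anchor for the same context and seed.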
| Ctx | Tuple | Protocol | Ratio | Source-Only (Transfer Anchor) | CORAL (Transfer Anchor) | GGA (Primary Native DA) | Oracle (Target Ref.) |
|---|---|---|---|---|---|---|---|
| C1 | CICIDS2017 → Edge-IIoTset | Intersection | | 0.558 | 0.669 | | 0.940 |
| C1 | CICIDS2017 → Edge-IIoTset | Intersection | | 0.558 | 0.669 | | 0.940 |
| C1 | CICIDS2017 → Edge-IIoTset | Intersection | | 0.558 | 0.669 | | 0.940 |
| C2 | CICIDS2017 → Edge-IIoTset | Union | | 0.541 | 0.642 | | 0.931 |
| C2 | CICIDS2017 → Edge-IIoTset | Union | | 0.541 | 0.642 | | 0.931 |
| C2 | CICIDS2017 → Edge-IIoTset | Union | | 0.541 | 0.642 | | 0.931 |
| C3 | UNSW-NB15 → Edge-IIoTset | Intersection | | 0.487 | 0.595 | | 0.940 |
| C3 | UNSW-NB15 → Edge-IIoTset | Intersection | | 0.487 | 0.595 | | 0.940 |
| C3 | UNSW-NB15 → Edge-IIoTset | Intersection | | 0.487 | 0.595 | | 0.940 |
| C4 | UNSW-NB15 → Edge-IIoTset | Union | | 0.469 | 0.570 | | 0.931 |
| C4 | UNSW-NB15 → Edge-IIoTset | Union | | 0.469 | 0.570 | | 0.931 |
| C4 | UNSW-NB15 → Edge-IIoTset | Union | | 0.469 | 0.570 | | 0.931 |
| Ctx | Tuple | Protocol | Ratio | Src. (Transfer Anchor) | CORAL (Transfer Anchor) | JSTN (Primary Native DA) | Oracle (Target Ref.) |
|---|---|---|---|---|---|---|---|
| C5 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | Intersection | | 0.561 | 0.677 | | 0.950 |
| C5 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | Intersection | | 0.561 | 0.677 | | 0.950 |
| C5 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | Intersection | | 0.561 | 0.677 | | 0.950 |
| C6 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | Union | | 0.551 | 0.677 | | 0.940 |
| C6 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | Union | | 0.551 | 0.677 | | 0.940 |
| C6 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | Union | | 0.551 | 0.677 | | 0.940 |
| C7 | CICIDS2017 + NSL-KDD → Edge-IIoTset | Intersection | | 0.518 | 0.635 | | 0.949 |
| C7 | CICIDS2017 + NSL-KDD → Edge-IIoTset | Intersection | | 0.518 | 0.635 | | 0.949 |
| C7 | CICIDS2017 + NSL-KDD → Edge-IIoTset | Intersection | | 0.518 | 0.635 | | 0.949 |
| C8 | CICIDS2017 + NSL-KDD → Edge-IIoTset | Union | | 0.500 | 0.612 | | 0.941 |
| C8 | CICIDS2017 + NSL-KDD → Edge-IIoTset | Union | | 0.500 | 0.612 | | 0.941 |
| C8 | CICIDS2017 + NSL-KDD → Edge-IIoTset | Union | | 0.500 | 0.612 | | 0.941 |
| Tuple Family | Regime | Method | Ratio | Macro-F1, Intersection (I) | Macro-F1, Union (U) |
|---|---|---|---|---|---|
| CICIDS2017 → Edge-IIoTset | SS-HDA | GGA | | 0.855 | 0.791 |
| CICIDS2017 → Edge-IIoTset | SS-HDA | GGA | | 0.877 | 0.816 |
| CICIDS2017 → Edge-IIoTset | SS-HDA | GGA | | 0.885 | 0.824 |
| UNSW-NB15 → Edge-IIoTset | SS-HDA | GGA | | 0.782 | 0.731 |
| UNSW-NB15 → Edge-IIoTset | SS-HDA | GGA | | 0.814 | 0.762 |
| UNSW-NB15 → Edge-IIoTset | SS-HDA | GGA | | 0.824 | 0.776 |
| CICIDS2017 + UNSW-NB15 → Edge-IIoTset | MS-HDA | JSTN | | 0.840 | 0.877 |
| CICIDS2017 + UNSW-NB15 → Edge-IIoTset | MS-HDA | JSTN | | 0.863 | 0.900 |
| CICIDS2017 + UNSW-NB15 → Edge-IIoTset | MS-HDA | JSTN | | 0.883 | 0.904 |
| CICIDS2017 + NSL-KDD → Edge-IIoTset | MS-HDA | JSTN | | 0.888 | 0.853 |
| CICIDS2017 + NSL-KDD → Edge-IIoTset | MS-HDA | JSTN | | 0.917 | 0.885 |
| CICIDS2017 + NSL-KDD → Edge-IIoTset | MS-HDA | JSTN | | 0.922 | 0.893 |
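
Every score in these tables is a Macro-F1, that is, the unweighted mean of per-class F1, so rare attack classes count as much as frequent ones. The sketch below shows that computation with the standard library only. The class names and predictions are made up, and averaging over the classes observed in either vector is an assumption; under a fixed label contract the class list would presumably be passed in explicitly.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred), key=str)
    per_class = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no
        # predictions cannot occur here because labels come from the data.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical toy labels, not benchmark data.
y_true = ["benign", "benign", "dos", "dos", "scan", "scan"]
y_pred = ["benign", "benign", "dos", "benign", "scan", "dos"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```
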
| Summary Group | Method | Ratio | Mean Macro-F1 Across Contexts | Context-Level Range | Mean Seed-Level Std |
|---|---|---|---|---|---|
| SS-HDA, C1–C4 | GGA | | 0.790 | 0.731–0.855 | 0.007 |
| SS-HDA, C1–C4 | GGA | | 0.817 | 0.762–0.877 | 0.008 |
| SS-HDA, C1–C4 | GGA | | 0.827 | 0.776–0.885 | 0.007 |
| MS-HDA contemporary-source, C5–C6 | JSTN | | 0.858 | 0.840–0.877 | 0.009 |
| MS-HDA contemporary-source, C5–C6 | JSTN | | 0.881 | 0.863–0.900 | 0.007 |
| MS-HDA contemporary-source, C5–C6 | JSTN | | 0.894 | 0.883–0.904 | 0.006 |
| MS-HDA mixed-vintage, C7–C8 | JSTN | | 0.870 | 0.853–0.888 | 0.008 |
| MS-HDA mixed-vintage, C7–C8 | JSTN | | 0.901 | 0.885–0.917 | 0.007 |
| MS-HDA mixed-vintage, C7–C8 | JSTN | | 0.907 | 0.893–0.922 | 0.008 |
| Ctx | Tuple | Regime | Primary Native Method | GapClosure (1st ratio) | GapClosure (2nd ratio) | GapClosure (3rd ratio) |
|---|---|---|---|---|---|---|
| C1 | CICIDS2017 → Edge-IIoTset | SS-HDA | GGA | 0.778 | 0.835 | 0.857 |
| C2 | CICIDS2017 → Edge-IIoTset | SS-HDA | GGA | 0.642 | 0.707 | 0.727 |
| C3 | UNSW-NB15 → Edge-IIoTset | SS-HDA | GGA | 0.651 | 0.722 | 0.745 |
| C4 | UNSW-NB15 → Edge-IIoTset | SS-HDA | GGA | 0.566 | 0.633 | 0.663 |
| C5 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | MS-HDA | JSTN | 0.718 | 0.776 | 0.827 |
| C6 | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | MS-HDA | JSTN | 0.837 | 0.897 | 0.908 |
| C7 | CICIDS2017 + NSL-KDD → Edge-IIoTset | MS-HDA | JSTN | 0.858 | 0.926 | 0.936 |
| C8 | CICIDS2017 + NSL-KDD → Edge-IIoTset | MS-HDA | JSTN | 0.800 | 0.872 | 0.891 |
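
The per-context values above are consistent with GapClosure being the fraction of the source-only-to-oracle headroom that the DA method recovers, i.e. (DA − source-only) / (oracle − source-only). That formula is inferred here from the reported anchors and is not quoted from the paper's definition. The sketch recomputes C1 and C2 at the first ratio from the rounded Macro-F1 values in the tables above.

```python
def gap_closure(da: float, source_only: float, oracle: float) -> float:
    """Fraction of source-only-to-oracle headroom recovered by a DA method.

    Assumed definition (inferred from the tabulated anchors):
        (da - source_only) / (oracle - source_only)
    """
    headroom = oracle - source_only
    if headroom <= 0:
        raise ValueError("oracle must exceed source-only for headroom to exist")
    return (da - source_only) / headroom


# Rounded anchors and GGA scores copied from the tables above (first ratio).
c1 = gap_closure(da=0.855, source_only=0.558, oracle=0.940)  # C1, Intersection
c2 = gap_closure(da=0.791, source_only=0.541, oracle=0.931)  # C2, Union
print(round(c1, 3), round(c2, 3))
```

Because the inputs are already rounded to three decimals, the recomputed values agree with the tabulated 0.778 and 0.642 only to within about 0.001.
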
| Summary Group | Contexts | Primary Native Method | Ratio | Descriptive Mean GapClosure | Between-Context Spread |
|---|---|---|---|---|---|
| SS-HDA, CICIDS2017 source | C1–C2 | GGA | | 0.710 | 0.096 |
| SS-HDA, CICIDS2017 source | C1–C2 | GGA | | 0.771 | 0.091 |
| SS-HDA, CICIDS2017 source | C1–C2 | GGA | | 0.792 | 0.092 |
| SS-HDA, UNSW-NB15 source | C3–C4 | GGA | | 0.609 | 0.060 |
| SS-HDA, UNSW-NB15 source | C3–C4 | GGA | | 0.678 | 0.063 |
| SS-HDA, UNSW-NB15 source | C3–C4 | GGA | | 0.704 | 0.058 |
| MS-HDA, contemporary-source | C5–C6 | JSTN | | 0.778 | 0.084 |
| MS-HDA, contemporary-source | C5–C6 | JSTN | | 0.837 | 0.086 |
| MS-HDA, contemporary-source | C5–C6 | JSTN | | 0.868 | 0.057 |
| MS-HDA, mixed-vintage | C7–C8 | JSTN | | 0.829 | 0.041 |
| MS-HDA, mixed-vintage | C7–C8 | JSTN | | 0.899 | 0.038 |
| MS-HDA, mixed-vintage | C7–C8 | JSTN | | 0.914 | 0.032 |
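
The group summaries above can be reproduced from the per-context GapClosure values. The sketch below takes the first numeric column of the per-context table for C1–C4 and computes the group mean and spread. Treating the between-context spread as the sample standard deviation (n − 1 denominator) is an assumption: it reproduces the tabulated figures, but the table does not name the estimator.

```python
from statistics import mean, stdev

# Per-context GapClosure at the first ratio, copied from the table above.
gap_closure_by_ctx = {"C1": 0.778, "C2": 0.642, "C3": 0.651, "C4": 0.566}

# Summary groups as declared in the table (single-source contexts only here).
groups = {
    "SS-HDA, CICIDS2017 source": ["C1", "C2"],
    "SS-HDA, UNSW-NB15 source": ["C3", "C4"],
}

summary = {}
for name, contexts in groups.items():
    values = [gap_closure_by_ctx[c] for c in contexts]
    # Assumed estimator: sample standard deviation across contexts.
    summary[name] = (mean(values), stdev(values))

for name, (m, s) in summary.items():
    print(f"{name}: mean={m:.3f}, spread={s:.3f}")
```

With only two contexts per group, the spread is a descriptive indicator of how far the Intersection and Union contexts sit apart, not an inferential interval.
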
| Summary Group | Contexts | Ratio | DA > Target Only | Approx. Tie | DA < Target Only |
|---|---|---|---|---|---|
| SS-HDA, CICIDS2017 source | C1–C2 | | 1/2 | 0/2 | 1/2 |
| SS-HDA, CICIDS2017 source | C1–C2 | | 0/2 | 1/2 | 1/2 |
| SS-HDA, CICIDS2017 source | C1–C2 | | 0/2 | 0/2 | 2/2 |
| SS-HDA, UNSW-NB15 source | C3–C4 | | 0/2 | 0/2 | 2/2 |
| SS-HDA, UNSW-NB15 source | C3–C4 | | 0/2 | 0/2 | 2/2 |
| SS-HDA, UNSW-NB15 source | C3–C4 | | 0/2 | 0/2 | 2/2 |
| MS-HDA, contemporary-source | C5–C6 | | 1/2 | 0/2 | 1/2 |
| MS-HDA, contemporary-source | C5–C6 | | 0/2 | 1/2 | 1/2 |
| MS-HDA, contemporary-source | C5–C6 | | 0/2 | 0/2 | 2/2 |
| MS-HDA, mixed-vintage | C7–C8 | | 1/2 | 1/2 | 0/2 |
| MS-HDA, mixed-vintage | C7–C8 | | 1/2 | 0/2 | 1/2 |
| MS-HDA, mixed-vintage | C7–C8 | | 0/2 | 1/2 | 1/2 |
| Ctx | Regime | Tuple | |||
|---|---|---|---|---|---|
| C1 | SS-HDA | CICIDS2017 → Edge-IIoTset | |||
| C2 | SS-HDA | CICIDS2017 → Edge-IIoTset | |||
| C3 | SS-HDA | UNSW-NB15 → Edge-IIoTset | |||
| C4 | SS-HDA | UNSW-NB15 → Edge-IIoTset | |||
| C5 | MS-HDA | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | |||
| C6 | MS-HDA | CICIDS2017 + UNSW-NB15 → Edge-IIoTset | |||
| C7 | MS-HDA | CICIDS2017 + NSL-KDD → Edge-IIoTset | |||
| C8 | MS-HDA | CICIDS2017 + NSL-KDD → Edge-IIoTset |
| Analysis Block | Ctx. | Method | Protocol | Ratio | Mean Paired F1 [95% CI] | Adj. | Adj. | | |
|---|---|---|---|---|---|---|---|---|---|
| Edge-IIoTset primary native DA examples | | | | | | | | | |
| SS-HDA primary | C1 | GGA | Intersection | | | 0.601 | 0.240 | | |
| SS-HDA primary | C4 | GGA | Union | | | <0.001 | <0.001 | | |
| MS-HDA primary | C7 | JSTN | Intersection | | | <0.001 | <0.001 | | |
| MS-HDA primary | C8 | JSTN | Union | | | <0.001 | <0.001 | | |
| Edge-IIoTset method coverage examples | | | | | | | | | |
| SS-HDA coverage | C1 | PMGN | Intersection | | | <0.001 | <0.001 | | |
| MS-HDA coverage | C7 | CWAN | Intersection | | | <0.001 | <0.001 | | |
| ToN-IoT second-target confirmatory examples | | | | | | | | | |
| SS-HDA confirmatory | T2 | GGA | Union | | | <0.001 | <0.001 | | |
| MS-HDA confirmatory | T3 | JSTN | Intersection | | | <0.001 | <0.001 | | |
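
The paired comparison behind this table rests on seed-matched differences between DA and matched-budget target-only Macro-F1. A minimal sketch of a paired percentile-bootstrap 95% confidence interval for the mean difference follows. The twenty per-seed scores are synthetic, and the resample count, percentile method, and function name are illustrative assumptions, not the paper's reported procedure.

```python
import random
from typing import List, Sequence, Tuple


def paired_bootstrap_ci(da: Sequence[float], target_only: Sequence[float],
                        n_resamples: int = 5000, alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Mean paired difference (DA minus target-only) with a percentile CI.

    Seeds are resampled as pairs, so each resample keeps the DA and
    target-only scores of a given seed together.
    """
    if len(da) != len(target_only) or not da:
        raise ValueError("need two non-empty, equally long score vectors")
    diffs = [a - b for a, b in zip(da, target_only)]
    n = len(diffs)
    rng = random.Random(seed)
    boot = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo = boot[int((alpha / 2) * n_resamples)]
    hi = boot[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(diffs) / n, lo, hi


# Synthetic per-seed Macro-F1 for twenty seeds (hypothetical, not paper data).
gen = random.Random(42)
da_scores: List[float] = [0.90 + gen.gauss(0, 0.008) for _ in range(20)]
to_scores: List[float] = [0.86 + gen.gauss(0, 0.008) for _ in range(20)]

mean_diff, ci_lo, ci_hi = paired_bootstrap_ci(da_scores, to_scores)
print(f"mean paired difference = {mean_diff:+.3f} [{ci_lo:+.3f}, {ci_hi:+.3f}]")
```

The paired t-test and Wilcoxon signed-rank check are not reproduced here; with SciPy available they would typically be run on the same two vectors via `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon`, followed by whichever multiple-comparison adjustment the study applies.
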
| Ctx | Regime | Protocol | Primary Method | Primary GapClosure | Coverage Method | Coverage GapClosure |
|---|---|---|---|---|---|---|
| C1 | SS-HDA | Intersection | GGA | 0.835 | PMGN | 0.821 |
| C2 | SS-HDA | Union | GGA | 0.707 | PMGN | 0.686 |
| C3 | SS-HDA | Intersection | GGA | 0.722 | PMGN | 0.701 |
| C4 | SS-HDA | Union | GGA | 0.633 | PMGN | 0.606 |
| C5 | MS-HDA | Intersection | JSTN | 0.776 | CWAN | 0.812 |
| C6 | MS-HDA | Union | JSTN | 0.897 | CWAN | 0.927 |
| C7 | MS-HDA | Intersection | JSTN | 0.926 | CWAN | 0.910 |
| C8 | MS-HDA | Union | JSTN | 0.872 | CWAN | 0.854 |
| Ctx | Regime | Primary | Coverage | Direction Agreement | Interpretation |
|---|---|---|---|---|---|
| C1 | SS-HDA | −0.002 | −0.016 | Agree | Both methods obtain approximate ties with matched-budget target-only. |
| C2 | SS-HDA | −0.045 | −0.067 | Agree | Both methods fall below target-only under the Union stress condition. |
| C3 | SS-HDA | −0.070 | −0.090 | Agree | Both methods fall below target-only; PMGN is slightly weaker. |
| C4 | SS-HDA | −0.152 | −0.180 | Agree | Both methods show target-only dominance in the hardest SS-HDA setting. |
| C5 | MS-HDA | −0.129 | −0.093 | Agree | Both methods remain below target-only; CWAN is closer. |
| C6 | MS-HDA | −0.005 | +0.025 | Diverge | Borderline: JSTN is tied, while CWAN is slightly above target-only. |
| C7 | MS-HDA | +0.058 | +0.042 | Agree | Both methods support a DA-favourable deployment direction. |
| C8 | MS-HDA | −0.069 | −0.087 | Agree | Both methods remain below the strong target-only reference under Union. |
| Ctx | Target | Regime | Protocol | Method | DA Macro-F1 | DA GapClosure | Direction | |
|---|---|---|---|---|---|---|---|---|
| T1 | ToN-IoT | SS-HDA | Intersection | GGA | 0.650 | +0.036 | DA > TO | |
| T2 | ToN-IoT | SS-HDA | Union | GGA | 0.667 | −0.034 | DA < TO | |
| T3 | ToN-IoT | MS-HDA | Intersection | JSTN | 0.680 | +0.058 | DA > TO | |
| T4 | ToN-IoT | MS-HDA | Union | JSTN | 0.675 | −0.014 | Tie |
| Summary Group | Contexts | Primary Native Method | Ratio | Mean Residual | Between-Context Spread |
|---|---|---|---|---|---|
| SS-HDA | C1–C4 | GGA | | 0.341 | 0.088 |
| SS-HDA | C1–C4 | GGA | | 0.276 | 0.083 |
| SS-HDA | C1–C4 | GGA | | 0.252 | 0.081 |
| MS-HDA, contemporary-source | C5–C6 | JSTN | | 0.222 | 0.084 |
| MS-HDA, contemporary-source | C5–C6 | JSTN | | 0.163 | 0.086 |
| MS-HDA, contemporary-source | C5–C6 | JSTN | | 0.132 | 0.057 |
| MS-HDA, mixed-vintage | C7–C8 | JSTN | | 0.171 | 0.041 |
| MS-HDA, mixed-vintage | C7–C8 | JSTN | | 0.101 | 0.038 |
| MS-HDA, mixed-vintage | C7–C8 | JSTN | | 0.086 | 0.032 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chizari, M.; Khan Ali Mirza, Q.; Alam, A.; Chizari, H. Closed-Set Heterogeneous Domain Adaptation for IoT Intrusion Detection: An Anchor-Based Benchmark Across Single- and Multi-Source Transfer. Sensors 2026, 26, 3610. https://doi.org/10.3390/s26113610

