From Edge Transformer to IoT Decisions: Offloaded Embeddings for Lightweight Intrusion Detection
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In the field of IoT cybersecurity, the integration of large language models has long been constrained by the resource limitations of IoT devices, making lightweight deployment an urgent and persistent demand. This manuscript proposes the SEED framework, which achieves efficient intrusion detection through collaborative embedding offloading between edge and IoT devices. The research issue is closely aligned with practical application.
However, the following issues need attention:
(1) Several claims presented in this paper are unfounded and lack theoretical justification. For example, in Section 3.1: “LLMs have demonstrated strong potential in detecting abnormal behaviors and previously unseen attacks, including zero-day exploits…”. This viewpoint lacks both credibility and empirical robustness. Similar deficiencies abound in this manuscript.
(2) The entire manuscript lacks strong theoretical innovation, largely consisting of integrated applications of existing techniques and methods.
(3) The organization of the manuscript lacks logical rigor and contains numerous grammatical and logical errors.
(4) The most critical deficiency in this manuscript is the absence of theoretical novelty.
Comments on the Quality of English Language
The English needs refining!
Author Response
Comment 1: Several claims presented in this paper are unfounded and lack theoretical justification. For example, in Section 3.1: “LLMs have demonstrated strong potential in detecting abnormal behaviors and previously unseen attacks, including zero-day exploits…”. This viewpoint lacks both credibility and empirical robustness. Similar deficiencies abound in this manuscript.
Response 1: Thank you for the remark. We have tried to provide clear justification for our claims throughout the paper.
Comment 2: The entire manuscript lacks strong theoretical innovation, largely consisting of integrated applications of existing techniques and methods.
Response 2: We agree with the comment, thank you. The work is supported by both our previous work [23] and state-of-the-art work [10], which already provide strong theoretical innovation. In this work, we extended them to a two-tier IoT architecture.
Comment 3: The organization of the manuscript lacks logical rigor and contains numerous grammatical and logical errors.
Response 3: Thank you for the remark. Clearer recommendations would have permitted us to further improve the paper structure. We have done our best to improve the revised version.
Comment 4: The most critical deficiency in this manuscript is the absence of theoretical novelty.
Response 4: Thank you for pointing this out; we agree with the comment. However, since this work builds upon established theories presented in [10] and [23], our contribution lies not in introducing new theoretical foundations but in extending and operationalising those theories within a two-tier IoT architecture. We therefore expanded the system overview and added Algorithm 1 to clearly explain the processes and network data management, which were not fully discussed in the previous version. All theoretical components are properly cited to avoid plagiarism or self-plagiarism, while the novelty of this manuscript resides in the system-level formulation and integration rather than in new theory development.
Reviewer 2 Report
Comments and Suggestions for Authors
This paper investigates how LLM–style Transformer representations can be practically leveraged in resource-constrained IoT security settings. It proposes SEED, a two-tier Edge–IoT intrusion detection framework in which a heavily compressed BERT-based model deployed at the edge generates semantic embeddings of network traffic, while ultra-lightweight neural classifiers on IoT devices perform real-time detection using these embeddings.
Technical comments and weaknesses:
The notion of "semantic embeddings" for network traffic is central to SEED, yet the paper could further discuss how these semantics relate to classical network features (e.g., flow statistics, protocol states) and whether the learned representations capture higher-level network behaviors rather than dataset-specific patterns.
The paper positions SEED within the broader AIoT and LLM ecosystem, but please add a discussion clarifying the boundary between language semantics and network semantics to help readers better understand what aspects of LLM capability are truly being transferred to network intrusion detection.
While the Edge–IoT split is motivated by resource constraints, it would be useful to discuss alternative architectural trade-offs (e.g., partial on-device embedding generation or hierarchical multi-edge setups) and in which deployment scenarios SEED’s design is most advantageous.
Given the ethical and privacy section, a more explicit discussion of governance and operational responsibility, such as who controls the edge model, how updates are audited, and how trust is established between IoT devices and the edge, should be provided to convey the practical relevance of the framework without requiring further empirical validation.
The current literature is weak. Given the strong emphasis on LLMs and networks, the authors should explicitly relate their approach to broader perspectives on LLMs in networking and networked systems. Please cite and discuss below:
Large Language Models Meet Next-Generation Networking Technologies: A Review (https://www.mdpi.com/1999-5903/16/10/365)
Role of Generative AI in AI-Based Digital Twins in Industry 5.0 and Evolution to Industry 6.0 (https://www.mdpi.com/2076-3417/15/18/10102)
Author Response
Comment 1: The notion of "semantic embeddings" for network traffic is central to SEED, yet the paper could further discuss how these semantics relate to classical network features (e.g., flow statistics, protocol states) and whether the learned representations capture higher-level network behaviors rather than dataset-specific patterns.
Response 1: Agreed. This aspect is elaborated in the fifth paragraph of the system overview section of the revised version. In addition, owing to their architecture, Transformer-based models can learn high-level network behaviour, provided that the datasets used for training are rich enough.
Comment 2: The paper positions SEED within the broader AIoT and LLM ecosystem, but please add a discussion clarifying the boundary between language semantics and network semantics to help readers better understand what aspects of LLM capability are truly being transferred to network intrusion detection.
Response 2: Thank you for pointing this out. Indeed, in the first submission, we did not clearly establish this boundary. In the revised version, we have included a discussion in the introduction stating that the model was trained in a completely different context (a dedicated vocabulary) built from network traffic, leading to a domain-dependent model. Further elaboration on why this works is provided in [10] and [23].
Comment 3: While the Edge–IoT split is motivated by resource constraints, it would be useful to discuss alternative architectural trade-offs (e.g., partial on-device embedding generation or hierarchical multi-edge setups) and in which deployment scenarios SEED’s design is most advantageous.
Response 3: Agreed. In this work, however, we wanted to offload as much of the workload as possible from the resource-constrained devices to speed up inference. Feature learning is one of the bottlenecks in the learning process, so leaving that computationally costly process to better-resourced devices removes this overhead from the constrained devices.
Comment 4: Given the ethical and privacy section, a more explicit discussion of governance and operational responsibility, such as who controls the edge model, how updates are audited, and how trust is established between IoT devices and the edge, should be provided to convey the practical relevance of the framework without requiring further empirical validation.
Response 4: Agreed. A dedicated study of these aspects is currently under way.
Comment 5: The current literature is weak. Given the strong emphasis on LLMs and networks, the authors should explicitly relate their approach to broader perspectives on LLMs in networking and networked systems. Please cite and discuss below:
Response 5: Thank you for the resources. We have reviewed them and used them to strengthen the revised version.
Reviewer 3 Report
Comments and Suggestions for Authors
- While SEED is positioned as novel, the manuscript does not sufficiently differentiate it from prior embedding-based or edge-assisted IDS approaches, particularly those using distilled or lightweight BERT variants (e.g., SecurityBERT, federated BERT-based IDS).
- Write a justification for the choice of BERT over other Transformer architectures.
- Communication overhead analysis is insufficiently quantified. Therefore, add a small table or paragraph in Section 5.2 quantifying communication overhead.
- The manuscript correctly notes that embeddings may leak sensitive information, but the discussion remains conceptual. Expand Section 5.2 (limitations) or Section 6 (Ethics and Privacy) with a concrete threat model.
- EdgeBERT is compared implicitly to full BERT in terms of training time, but no side-by-side detection performance table is provided. Add a comparison table in Section 5.2 or move relevant results to an appendix.
- Terminology inconsistency: “LLM” vs “Transformer”. Clarify early that encoder-only Transformers are treated as LLMs in this work, or consistently use “Transformer-based model”.
- Algorithms 1 and 2 are clear conceptually but omit batch size, optimizer, learning rate and embedding extraction layer (CLS token? pooled output?)
- Add a brief data preprocessing description in Section 5.1.
Author Response
Comment 1: While SEED is positioned as novel, the manuscript does not sufficiently differentiate it from prior embedding-based or edge-assisted IDS approaches, particularly those using distilled or lightweight BERT variants (e.g., SecurityBERT, federated BERT-based IDS).
Response 1: To avoid self-plagiarism, we refer to our previous work [23], where more details are provided; this work is supported by both [10] and [23]. The embeddings generator is derived from those methods.
Comment 2: Write a justification for the choice of BERT over other Transformer architectures.
Response 2: Thank you for this remark. Justification has been provided in the revised version, specifically in the penultimate paragraph of the introduction.
Comment 3: Communication overhead analysis is insufficiently quantified. Therefore, add a small table or paragraph in Section 5.2 quantifying communication overhead.
Response 3: Agreed. This is an area for improvement that we are currently investigating.
Comment 4: The manuscript correctly notes that embeddings may leak sensitive information, but the discussion remains conceptual. Expand Section 5.2 (limitations) or Section 6 (Ethics and Privacy) with a concrete threat model.
Response 4: Thank you for the comment; we agree. This is part of our ongoing research, and we intend to dedicate a separate work to the weaknesses of embeddings, which is why we left this as a future direction.
Comment 5: EdgeBERT is compared implicitly to full BERT in terms of training time, but no side-by-side detection performance table is provided. Add a comparison table in Section 5.2 or move relevant results to an appendix.
Response 5: Thank you for pointing this out. Comparing both models would indeed be valuable; however, we did not perform this comparison due to the high computational cost of training the full BERT model. We only ran a preliminary simulation to estimate the order of magnitude of the training time for a single epoch. In this work, we deliberately focus on reduced model variants, as our goal is to achieve faster inference and a lightweight architecture. While the full BERT model would likely yield better performance, it falls outside the scope of our computational and design constraints.
Comment 6: Terminology inconsistency: “LLM” vs “Transformer”. Clarify early that encoder-only Transformers are treated as LLMs in this work, or consistently use “Transformer-based model”.
Response 6: Thank you for the remark. A clarification has been provided as a footnote in the introduction.
Comment 7: Algorithms 1 and 2 are clear conceptually but omit batch size, optimizer, learning rate and embedding extraction layer (CLS token? pooled output?)
Response 7: These details have been provided in the sixth paragraph of the system overview section.
Comment 8: Add a brief data preprocessing description in Section 5.1.
Response 8: Algorithm 1 has been added to describe the process.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have carefully revised the manuscript in accordance with the previous comments. In its current form, the manuscript may be considered for acceptance.
Comments on the Quality of English Language
The English needs refining!
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have made considerable efforts to address the comments, and I am satisfied with the revised manuscript.
