Artificial Intelligence for Cybersecurity in IoT-Edge Systems: A Structured Review of Methods, Datasets, Evaluation, and Deployment Challenges

Xue, Qingshui; Xue, Pandong; Wang, Zhimin; Ma, Haifeng

doi:10.3390/electronics15112409



School of Computer Science and Information Engineering, Faculty of Intelligent Technology, Shanghai Institute of Technology, Shanghai 201418, China


Electronics 2026, 15(11), 2409; https://doi.org/10.3390/electronics15112409

Submission received: 26 April 2026 / Revised: 28 May 2026 / Accepted: 31 May 2026 / Published: 1 June 2026


Abstract

The convergence of the Internet of Things (IoT), edge computing, and artificial intelligence (AI) is reshaping cyber defense in distributed cyber–physical environments. IoT-edge systems expose heterogeneous, resource-constrained, and intermittently connected devices to threats that unfold close to sensing and control processes, making purely signature-based or rule-based defenses increasingly insufficient. This article presents a structured review of AI for cybersecurity in IoT-edge systems from a systems-oriented perspective. Rather than surveying AI for IoT security in general, it organizes the literature around four practical lenses: AI methods, datasets and benchmarks, evaluation practice, and deployment constraints. The review reconstructs a workspace-verifiable corpus of 96 references, emphasizes literature published between January 2023 and April 2026 while retaining foundational benchmark papers, and uses a conservative 26-paper empirical subset for paper-level gap coding. Because this subset was purposively sampled and the original retrieval logs were not preserved, coded counts are interpreted as recoverable reporting signals and comparability indicators rather than field-level prevalence estimates. The synthesis further stratifies the coded evidence by task, model family, dataset, application scenario, metric type, and deployment signal, and translates deployment feasibility into a minimum reporting checklist and an edge-hardware decision matrix. Within this evidence boundary, recent work remains dominated by intrusion and anomaly detection, with continued use of traditional machine learning, deep learning, federated learning, explainable AI, and graph-based approaches. However, experimentation remains concentrated on a small set of public benchmarks, while latency, memory, energy, communication overhead, operational robustness, and reproducibility are reported inconsistently. The field is therefore constrained less by classifier novelty than by benchmark concentration, weak deployment reporting, limited response-and-mitigation analysis, sparse coverage of authentication, access-control, and trust-management tasks, and a lack of reproducible edge-aware evaluation.

Keywords:

IoT security; edge computing; cybersecurity; intrusion detection; anomaly detection; federated learning; explainable AI; datasets; benchmarking; deployment

1. Introduction

IoT-edge systems expand the attack surface of digital infrastructure in ways that differ substantially from classical enterprise and cloud-centered environments. They combine sensing devices, gateways, edge servers, wireless protocols, local controllers, and cloud backends into a layered but often loosely administered architecture in which devices differ in computation, memory, energy budget, firmware quality, connectivity, update cadence, and physical exposure [1,2]. Reviews of IIoT-edge security and IoT-edge intrusion detection establish that industrial IoT, healthcare IoT, transportation systems, and smart agriculture face sub-second response requirements, cyber–physical safety dependencies, intermittent bandwidth, noisy telemetry, and geographically distributed data generation [2,3,4], while embedded Edge-AI and TinyML work identifies constrained computation, memory, and energy as deployment-level limits [5,6]. Recent broad reviews on intelligent IoT, resilient IoT systems, and AIoT implementation reinforce this architectural and operational view by showing that security, privacy, and deployment feasibility are increasingly inseparable in distributed edge ecosystems [7,8,9,10]. As a result, cyber defense in IoT-edge environments is not only a detection problem but also a systems problem involving architectural placement, model efficiency, communication cost, and operational trustworthiness.

Traditional security mechanisms remain necessary, but they are increasingly insufficient as standalone defenses in these settings. Static signatures and manually engineered rules are effective for known patterns and policy enforcement, yet they struggle with polymorphic malware, botnet adaptation, zero-day behavior, traffic drift, and multi-stage attacks that emerge across distributed nodes, as documented in IoT intrusion-detection reviews [3,11,12,13], a federated-security analysis [14], and a botnet-focused IoT detection review [15]. The problem is intensified because IoT traffic is often heterogeneous, application-specific, and imbalanced, which makes handcrafted thresholds brittle and difficult to transfer across domains. Recent IDS and Edge-AI reviews note that many published IoT intrusion detection systems achieve strong laboratory accuracy yet still fall short on real-time operation, lightweight execution, explainability, or scalability under realistic load [3,4,16,17,18]. Deployment-strategy and roadmap papers make the same point from a systems perspective by emphasizing implementation feasibility, benchmark realism, and operational integration [19,20,21]. Broader cybersecurity and anomaly-detection surveys echo this criticism, showing that performance-centric comparison still dominates over deployment realism and cross-domain robustness [7,22,23].

These limitations explain why AI has become central to recent IoT-edge cybersecurity research. Supervised and unsupervised machine learning are widely used to separate benign and malicious traffic patterns, deep learning models are adopted to discover temporal and structural attack signatures, federated learning is promoted to preserve privacy and reduce raw-data centralization, and explainable or trustworthy AI is increasingly introduced to support analyst interpretation and operational confidence. Edge-learning surveys support the shift toward AI-enabled distributed defense [24], explainable-AI work supports the need for interpretable cyber decisions [25,26], and federated-learning studies support privacy-preserving collaborative detection [14,27]. Deployment-oriented IDS and ML-based security surveys further show that these methods must be judged against systems constraints as well as predictive performance [18,19]. Recent work on lightweight transformer-style intrusion detection, feature optimization, and edge-oriented ensembles suggests that the field is also becoming more explicit about compactness and deployment cost, not only predictive power [28,29,30]. At the same time, deploying AI close to devices or gateways raises new questions. High accuracy does not guarantee feasibility on constrained hardware. Privacy-preserving collaboration increases communication overhead. Explainability may improve trust, but it can also impose latency and memory costs. Model robustness becomes a first-order requirement when attackers target both the network and the model.

This review is written to address that systems-level gap. Its contribution is not another broad survey of AI in IoT security; instead, it provides a structured synthesis centered on methods, datasets, evaluation, and deployment. Compared with earlier broad reviews of IoT applications, attack surfaces, and AI-enabled IoT trends [31,32,33], the present review intentionally narrows the boundary to IoT-edge cybersecurity and asks what is actually known about deployable AI defense under edge constraints. It should therefore be read as a structured review grounded in a transparent search scope definition, workspace-verifiable corpus reconstruction, and paper-level coding.

This review makes six concrete contributions:

It proposes a two-dimensional taxonomy that links AI method families with deployment objectives in IoT-edge cybersecurity rather than cataloguing models in isolation.
It synthesizes datasets, evaluation practice, and deployment constraints as a connected evidence problem rather than as separate background topics.
It provides a conservative, evidence-based gap analysis over a coded empirical subset, exposing how weakly standardized deployment reporting remains across the literature.
It introduces evidence-stratified statistics and a deployment reporting framework that connect coded study characteristics to hardware tiers, runtime constraints, and comparable benchmark conditions.
It extends the deployment analysis beyond detection accuracy by adding response-and-mitigation considerations, dataset-bias interpretation, and a cross-cutting matrix of AI method vulnerabilities against IoT-edge threats.
It reframes the field’s central bottleneck from classifier novelty to reproducibility, deployment readiness, and edge-aware evaluation.

Table 1 positions the manuscript against representative review streams and makes this narrower contribution explicit. Four research questions then guide the review. RQ1 asks which AI methods are most commonly used for cybersecurity tasks in IoT-edge systems. RQ2 asks which datasets, benchmarks, and experimental settings dominate the recent literature. RQ3 asks how studies evaluate both detection quality and deployment feasibility. RQ4 asks which limitations continue to prevent trustworthy and deployable AI-driven cyber defense for IoT-edge environments. Figure 1 summarizes the review framework used to connect these questions.

Table 1 shows that prior reviews provide valuable coverage of IoT IDS, edge AI, federated learning, and trustworthy AI, but they usually treat datasets, deployment metrics, mitigation, and AI-model vulnerability as separate concerns. The present review is positioned differently: it asks whether the available evidence is sufficient to support deployable IoT-edge cyber defense claims under resource, communication, robustness, and reproducibility constraints.

2. Background: IoT-Edge Cybersecurity Landscape

The IoT refers to interconnected physical devices equipped with sensing, communication, and sometimes actuation capabilities. Edge computing complements this paradigm by moving parts of storage, analytics, and inference closer to the data source rather than relying exclusively on remote cloud infrastructure [1,2,5]. In practical IoT-edge architecture, sensing devices generate raw signals, gateways or local aggregators perform preliminary filtering and protocol translation, edge nodes execute latency-sensitive processing or local AI inference, and cloud services handle long-term storage, global correlation, and heavier retraining workloads [2,4,5,13]. Recent system-level work on edge AI deployment and IoT-edge cybersecurity frameworks describes the same architecture with stronger emphasis on orchestration, locality, and layered trust boundaries [34,35]. This layered organization reduces latency and network load, but it also creates a distributed trust boundary in which security monitoring must be coordinated across many weak points rather than concentrated at a single perimeter.
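The layered placement logic described above can be sketched as a simple feasibility check. The sketch assumes, purely for illustration, four tiers with hypothetical memory budgets and network round-trip times (these numbers are not drawn from the reviewed studies), and it simplifies by treating inference time as identical on every tier. The helper `place_detector` is hypothetical: it returns the tier closest to the data source whose memory fits the model and whose round trip plus inference time meets the response deadline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    memory_kb: int   # memory available to a detector (hypothetical budget)
    rtt_ms: float    # network round trip from the data source (hypothetical)


# Illustrative tiers, ordered from closest to farthest from the data source.
TIERS = (
    Tier("device", memory_kb=256, rtt_ms=0.0),
    Tier("gateway", memory_kb=64_000, rtt_ms=5.0),
    Tier("edge server", memory_kb=4_000_000, rtt_ms=15.0),
    Tier("cloud", memory_kb=64_000_000, rtt_ms=80.0),
)


def place_detector(model_kb: float, infer_ms: float, deadline_ms: float,
                   tiers: tuple[Tier, ...] = TIERS) -> str | None:
    """Return the closest tier that fits the model in memory and meets the
    deadline, or None if no tier satisfies both constraints.

    Simplification: inference time is assumed identical on every tier.
    """
    for tier in tiers:
        fits_memory = model_kb <= tier.memory_kb
        meets_deadline = tier.rtt_ms + infer_ms <= deadline_ms
        if fits_memory and meets_deadline:
            return tier.name
    return None


print(place_detector(model_kb=1_200, infer_ms=4.0, deadline_ms=20.0))  # gateway
```

A `None` result marks a detector that may look accurate on paper but has no feasible placement under the stated memory and latency constraints.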

Within this architecture, cybersecurity tasks extend beyond classic intrusion detection. The most common task in the literature is traffic-based intrusion or anomaly detection, where models classify flows or sessions as benign or malicious using statistical, packet, or flow-derived features. Earlier IoT IDS reviews define this detection stream and its design choices [3,11,12,13,14], while recent IDS and deployment-roadmap papers show how the same stream is being extended toward edge-aware, lightweight, and operational settings [16,17,18,19,20,21]. Botnet and malware detection form a closely related stream, often centered on Mirai-, Bashlite-, or DDoS-like behaviors in network traffic [15,36,37]. Another stream addresses malicious traffic classification more broadly, including DoS, DDoS, scanning, spoofing, injection, and reconnaissance patterns, usually in multi-class settings over public benchmarks such as Edge-IIoTset and CICIoT2023 [36,38,39,40].

The threat surface is broader than these dominant tasks. Authentication and access control remain critical in distributed IoT-edge environments because device spoofing, rogue node enrollment, and weak trust management can undermine higher-layer analytics before detection even begins [2]. Privacy leakage is equally important because edge analytics often processes sensitive local data in healthcare, industrial control, transportation, and home environments. IoMT-focused analyses show that privacy, data fusion, and security monitoring are deeply entangled in medical edge settings, where the cost of both leakage and misclassification is unusually high [41,42]. Federated learning and other privacy-preserving approaches are increasingly proposed to reduce direct data sharing, although they do not eliminate leakage risk from model updates, metadata, or side channels [40,43,44,45,46]. Finally, AI introduces its own security concerns, including poisoning, adversarial perturbation, concept drift exploitation, and explanation misuse. These threats are especially relevant when models are trained across non-IID clients, updated online, or deployed in high-stakes cyber–physical settings [24,25,26,27].

From an application perspective, the literature clusters around several recurring environments. Industrial IoT and cyber–physical systems dominate deployment-oriented reviews because they combine critical infrastructure, real-time control, and stringent reliability requirements [2,4]. Healthcare IoT and Internet of Medical Things scenarios are another active domain, motivated by privacy sensitivity and patient safety [41,43,47]. Smart transportation, smart agriculture, and smart city settings also appear frequently because they expose heterogeneous edge devices to unstable connectivity and location-dependent traffic characteristics [48,49,50]. Anomaly-detection surveys in broader smart environments show that these scenarios also blur the boundary between cyber anomaly, operational anomaly, and safety anomaly [23]. Across these domains, the recurring systems challenge is the same: how to achieve timely, accurate, and trustworthy security analytics without assuming cloud-like resources or idealized network conditions.

3. Review Methodology

This article adopts a structured review design focused on transparent scope definition, study selection logic, and evidence extraction. The protocol targets literature explicitly positioned at the intersection of IoT or IIoT, edge computing or edge intelligence, cybersecurity tasks, and AI-driven methods. Because this topic evolves quickly, the review prioritizes studies published between January 2023 and April 2026, while retaining earlier foundational surveys and benchmark papers that remain central to the recent experimental landscape, especially TON_IoT, Edge-IIoTset, and N-BaIoT [36,37,38,51].

3.1. Search Scope and Core Search Strings

The search space used to assemble the present corpus covers Scopus, Web of Science, IEEE Xplore, ACM Digital Library, ScienceDirect, SpringerLink, and MDPI. Three core search strings were designed to capture the literature from complementary angles. These search strings are reported here explicitly because the manuscript is positioned as a structured review with transparent search scope rather than as a closed-box narrative synthesis.

Search String 1:
(“IoT” OR “Internet of Things” OR IIoT OR “edge computing” OR “edge AI”) AND
(cybersecurity OR security OR “intrusion detection” OR “anomaly detection”) AND
(“artificial intelligence” OR AI OR “machine learning” OR “deep learning” OR “federated learning”) AND
(review OR survey OR “systematic review”)

Search String 2:
(“IoT-edge” OR “edge-enabled IoT” OR “IoT-edge systems”) AND
(“AI-driven cybersecurity” OR “AI for cybersecurity”) AND
(review OR survey)

Search String 3:
(“intrusion detection” OR “anomaly detection” OR botnet OR malware) AND
(IoT OR IIoT OR “edge computing”) AND
(“machine learning” OR “deep learning” OR “federated learning” OR explainable) AND
(review OR survey)

These core strings were further expanded with terms such as trustworthy AI, privacy-preserving, lightweight, distributed learning, smart grid, healthcare IoT, and industrial IoT whenever the initial results gave insufficient coverage of deployment, trust, or domain-specific CPS security.
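Because each core string is an AND of OR-groups, it can be generated from concept lists, which makes expansion terms easy to append to the relevant group. The following is a minimal sketch; the helper names are hypothetical, it emits straight ASCII quotes rather than the typeset curly quotes shown above, and it does not model database-specific field syntax such as Scopus `TITLE-ABS-KEY(...)`.

```python
from __future__ import annotations


def or_group(terms: list[str]) -> str:
    """Join alternatives with OR, quoting multi-word or hyphenated phrases."""
    quoted = [f'"{t}"' if (" " in t or "-" in t) else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups: list[str]) -> str:
    """AND together OR-groups, mirroring the structure of the core strings."""
    return " AND ".join(or_group(g) for g in groups)


# Search String 3, expressed as four concept groups.
STRING_3 = build_query(
    ["intrusion detection", "anomaly detection", "botnet", "malware"],
    ["IoT", "IIoT", "edge computing"],
    ["machine learning", "deep learning", "federated learning", "explainable"],
    ["review", "survey"],
)
print(STRING_3)
```

An expansion term such as `"lightweight"` would be appended to the method group list before calling `build_query`, so the AND/OR structure stays unchanged.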

Because the three core strings intentionally privilege review and survey retrieval, representative empirical studies, dataset-originating papers, and deployment-framework papers were added through targeted follow-up searches and backward or forward snowballing from verified surveys, benchmark papers, and domain-specific reviews. These follow-up searches combined the same IoT-edge cybersecurity terms with dataset, benchmark, federated intrusion detection, non-IID evaluation, TinyML, latency, memory, communication overhead, robustness, and explainable intrusion detection. This step was used to build the contextual anchor layer and the conservative coded empirical subset; it is not reported as a retrieval-stage count because the raw database logs were not preserved.

3.2. Inclusion, Exclusion, and Extraction Rules

Studies were retained when they satisfied four conditions: they clearly addressed AI or machine learning for cybersecurity in IoT, IIoT, or edge-enabled IoT environments; they were review papers, benchmark papers, or representative original studies with extractable details on methods, datasets, metrics, or deployment evidence; the full text was available in English; and the security task was relevant to cyber defense rather than general IoT optimization. Studies were excluded when they focused only on general IoT applications without security, security without AI, purely cloud-based settings that ignored edge constraints, or abstract cryptographic schemes detached from operational IoT-edge scenarios. Editorials, posters, and duplicate records were also removed.
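The inclusion and exclusion rules above can be encoded as an ordered predicate that also records why a record was dropped. This is an illustrative sketch only: the field names, their defaults, and the order in which rules are checked are assumptions, not the review's actual screening instrument. Storing a reason per decision is one way to retain the kind of screening trail that Section 3.4 notes was not preserved.

```python
from __future__ import annotations

from dataclasses import dataclass

ELIGIBLE_TYPES = {"review", "benchmark", "empirical"}
EXCLUDED_TYPES = {"editorial", "poster"}


@dataclass
class Candidate:
    # Hypothetical screening flags; defaults describe a study that passes every rule.
    uses_ai: bool = True                 # AI or machine learning is central
    security_task: bool = True           # cyber defense, not general IoT optimization
    iot_edge_setting: bool = True        # IoT, IIoT, or edge-enabled IoT environment
    cloud_only: bool = False             # purely cloud-based, ignores edge constraints
    abstract_crypto_only: bool = False   # cryptographic scheme detached from operations
    study_type: str = "empirical"
    extractable_details: bool = True     # methods, datasets, metrics, or deployment evidence
    english_full_text: bool = True
    duplicate: bool = False


def screen(c: Candidate) -> tuple[bool, str]:
    """Apply the inclusion and exclusion rules in order; return (keep, reason)."""
    if c.duplicate or c.study_type in EXCLUDED_TYPES:
        return False, "editorial, poster, or duplicate"
    if not c.uses_ai:
        return False, "security without AI"
    if not c.security_task:
        return False, "no cyber-defense task"
    if not c.iot_edge_setting or c.cloud_only:
        return False, "not an IoT-edge setting"
    if c.abstract_crypto_only:
        return False, "abstract cryptographic scheme"
    if c.study_type not in ELIGIBLE_TYPES or not c.extractable_details:
        return False, "no extractable details"
    if not c.english_full_text:
        return False, "no English full text"
    return True, "included"


print(screen(Candidate(cloud_only=True)))  # (False, 'not an IoT-edge setting')
```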

To improve practical value, the review did not rely solely on paper titles and abstracts. A quality-oriented extraction form was used to record the security task, AI method family, dataset or testbed, metric set, deployment evidence, preprocessing transparency, and reported limitations of each retained study. This was important because many papers in the area report strong detection performance but omit details on feature extraction, class balance treatment, hyperparameters, or on-device feasibility. Recent performance-evaluation surveys and domain-specific dataset analyses show that exactly these omissions often prevent reliable cross-paper comparison [22,52]. The resulting synthesis therefore prioritizes studies that contribute to at least one of four core themes. Survey and framework papers support the method-taxonomy and deployment-realism layers [2,3,4,5,6,16,17,18,19,20,21], dataset-originating and benchmark-analysis papers support the benchmark-practice layer [36,37,38,51,52], and representative empirical papers support the evaluation-completeness and gap-coding layers, including deep and federated evidence [39,40,43,44,45,46], domain and edge testbeds [48,49,50], and explainability, graph, and optimization studies [53,54,55,56].
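One way to represent a row of the extraction form described above is a record whose unreported fields stay empty, so omissions can be listed mechanically rather than noted impressionistically. The class and field names below are hypothetical and only paraphrase the form's categories; the example values reuse dataset and task names from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ExtractionRecord:
    # One row of the extraction form; None or an empty list means "not reported".
    security_task: Optional[str] = None
    method_family: Optional[str] = None
    dataset_or_testbed: Optional[str] = None
    metrics: list[str] = field(default_factory=list)
    deployment_evidence: Optional[str] = None       # e.g. latency, memory, on-device runs
    preprocessing_described: Optional[bool] = None  # feature extraction, class balancing, hyperparameters
    limitations: Optional[str] = None

    def missing(self) -> list[str]:
        """Return the names of fields that the study left unreported."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) is None or getattr(self, f.name) == []]


record = ExtractionRecord(security_task="intrusion detection",
                          method_family="deep learning",
                          dataset_or_testbed="CICIoT2023",
                          metrics=["accuracy", "F1"])
print(record.missing())  # ['deployment_evidence', 'preprocessing_described', 'limitations']
```

Note that `preprocessing_described=False` counts as reported (the study was checked and found not to describe it), which keeps "not described" distinct from "not yet coded".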

3.3. Study Selection and Coding Protocol

To reduce impressionistic narrative synthesis, this review uses a conservative coding layer over 26 representative empirical studies [28,29,30,39,40,42,43,44,45,46,48,49,50,52,53,54,55,56,57,58,59,60,61,62,63,64]. The selection of this coded empirical subset follows a structured purposive sampling protocol rather than ad hoc selection. Broad surveys, background papers, and purely conceptual references were first removed from the candidate pool. The remaining candidates were then filtered using three rules: (i) they had to be empirical or benchmark-oriented; (ii) with few exceptions, they had to fall within the 2023–2026 publication window; and (iii) they had to contain extractable evidence on at least methods, datasets, and one evaluation- or deployment-related signal. Coverage balancing was then applied to avoid overconcentration in a single method stream and to ensure representation across traditional ML, deep learning, federated learning, graph-based learning, and explainable or trustworthy AI. Redundancy reduction was also applied by avoiding multiple highly similar studies that reused the same dataset-model pipeline unless they contributed distinct evaluation, deployment, or reproducibility insight. The final subset is therefore designed for gap mapping rather than for statistical generalization or meta-analysis. For this reason, numerical summaries derived from this 26-paper layer are treated as recoverable reporting signals and cross-paper comparability indicators, not as prevalence estimates for the entire IoT-edge cybersecurity field. The subset's paper-level matrix is provided as Supplementary Table S1, where Panel A reports descriptive study fields and Panel B reports the coded signals that aggregate into the gap-coding snapshot reported later in this section.
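The three eligibility rules, redundancy reduction, and coverage balancing can be expressed as explicit decision rules. The sketch below is an illustration of those rules, not a reconstruction of how the 26 studies were actually chosen: it applies the year window strictly (the protocol allows a few exceptions), and the per-family cap and the processing order are hypothetical parameters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    ref: int
    family: str                  # e.g. "traditional ML", "deep learning", "federated", "graph", "XAI"
    pipeline: tuple[str, str]    # (dataset, model) combination
    empirical: bool = True
    year: int = 2024
    has_methods: bool = True
    has_datasets: bool = True
    has_eval_or_deploy_signal: bool = True
    distinct_insight: bool = False  # adds evaluation, deployment, or reproducibility insight


def eligible(s: Study) -> bool:
    """Rules (i)-(iii): empirical, inside the 2023-2026 window, extractable evidence."""
    return (s.empirical and 2023 <= s.year <= 2026 and s.has_methods
            and s.has_datasets and s.has_eval_or_deploy_signal)


def select_subset(candidates: list[Study], per_family_cap: int) -> list[Study]:
    chosen: list[Study] = []
    seen_pipelines: set[tuple[str, str]] = set()
    family_counts: dict[str, int] = {}
    for s in candidates:
        if not eligible(s):
            continue
        # Redundancy reduction: skip a repeated dataset-model pipeline
        # unless the study contributes distinct insight.
        if s.pipeline in seen_pipelines and not s.distinct_insight:
            continue
        # Coverage balancing: cap the contribution of any one method family.
        if family_counts.get(s.family, 0) >= per_family_cap:
            continue
        chosen.append(s)
        seen_pipelines.add(s.pipeline)
        family_counts[s.family] = family_counts.get(s.family, 0) + 1
    return chosen
```

Because the outcome depends on candidate order and on the cap, a protocol of this kind would need to report both for the selection to be reproducible.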

3.4. Workspace-Verifiable Corpus Reconstruction

Because raw database export files are unavailable in the current workspace, the transparent count layer is necessarily limited to the verified manuscript corpus reconstructed here rather than to the full historical retrieval process. Table 2 therefore summarizes the corpus composition and coding layers that can be verified from the present workspace. It should be read as a workspace-verifiable reconstruction of the structured-synthesis evidence base, not as a retrieval-stage flow or a PRISMA-style identification and screening count.

Together, the contextual and coded layers define the 96-paper working corpus used in this structured review. The methodological boundary is therefore explicit: this article should be interpreted as a structured evidence synthesis with transparent scope definition, workspace-verifiable corpus reconstruction, and paper-level coding, rather than as a fully logged PRISMA-style retrieval record.

Within the 70-paper contextual layer, the review also retains a benchmark-and-deployment anchor layer that is itemized explicitly in Supplementary Table S2. These anchor papers include dataset-originating studies, benchmark-critique papers, and domain-specific benchmark analyses [36,37,38,51,52,61,65,66,67,68,69], as well as edge-deployment framing papers [5,34,41,70,71,72,73,74,75,76,77]. They are kept outside the conservative coding layer because they define benchmark provenance, domain realism, or systems constraints rather than providing directly comparable model-evaluation records. Making this anchor layer explicit strengthens the evidence base for Sections 5–8 without weakening the conservative interpretation of the coded snapshot. Additional contextual papers cited in Sections 5–9 broaden the synthesis of domain-sensitive deployment, federated evaluation, robustness, explanation utility, and cross-domain transfer without being folded into the coded snapshot [78,79,80,81,82,83,84,85,86,87,88,89,90,91].

The methodology also recognizes a practical limitation common to this topic. The literature is heavily concentrated on intrusion detection, and papers frequently reuse the same public datasets. Emerging decentralized benchmarks for federated settings make this issue even more visible because they expose how quickly conclusions change when the data-collection topology changes [61]. Therefore, a review that merely counts model types would overstate progress. This manuscript instead treats dataset realism, reproducibility detail, and deployment-aware evaluation as explicit evidence dimensions, not as peripheral concerns. In other words, the review protocol is not used only to assemble papers; it is used to ask whether the field is converging toward deployable IoT-edge cyber defense.

4. Two-Dimensional Taxonomy of AI Methods and Deployment Objectives

Most existing surveys classify the field by model family alone. For the present topic, that one-dimensional view is too coarse. This review therefore uses a two-dimensional taxonomy that crosses AI method families with deployment objectives. The second dimension matters because IoT-edge cybersecurity studies are often motivated not only by predictive performance, but also by lightweight execution, privacy preservation, communication efficiency, interpretability, or robustness. Under this framing, the literature reads as a deployment-oriented design space rather than as a flat model inventory. Figure 2 presents the visual matrix, and Table 3 provides the corresponding textual synopsis.

The first method family is traditional machine learning, including decision trees, random forests, support vector machines, k-nearest neighbors, XGBoost, logistic regression, and related ensemble methods [3,11,16,19,92]. These models remain important because they are often more computationally affordable than deeper architectures, easier to train on tabular flow features, and easier to deploy at gateways or constrained edge servers. Their dominant deployment objective is usually lightweight execution. Recent optimization-oriented studies comparing feature selection, feature extraction, and simple statistical filtering show that careful preprocessing can substantially improve lightweight ML pipelines without requiring deep architectures [28,62,64]. In many benchmark studies they still provide strong baselines and, in some cases, competitive performance under binary or small multi-class intrusion detection tasks. Their limitations are equally clear: they depend heavily on hand-engineered features, can be brittle under domain shift, and often struggle to model complex temporal or relational behavior.
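The "simple statistical filtering" mentioned above can be as small as a two-step, label-free filter: drop near-constant columns, then drop columns that are highly correlated with one already kept. The sketch below uses only the Python standard library; the thresholds and feature names are hypothetical, and the greedy pass depends on column order. In a real pipeline the filter would be fitted on the training split only.

```python
from __future__ import annotations

import math
from statistics import pvariance


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_features(columns: dict[str, list[float]],
                    var_threshold: float = 1e-8,
                    corr_threshold: float = 0.95) -> list[str]:
    """Two-step statistical filter for tabular flow features."""
    # Step 1: drop (near-)constant columns, which cannot separate classes.
    varied = [name for name, col in columns.items() if pvariance(col) > var_threshold]
    # Step 2: keep a column only if it is not highly correlated with one already kept.
    selected: list[str] = []
    for name in varied:
        if all(abs(pearson(columns[name], columns[s])) < corr_threshold for s in selected):
            selected.append(name)
    return selected


# Hypothetical flow features: byte_count duplicates pkt_count, proto_flag is constant.
flows = {
    "pkt_count": [10, 20, 30, 40],
    "byte_count": [100, 200, 300, 400],
    "proto_flag": [1, 1, 1, 1],
    "duration": [0.5, 0.1, 0.9, 0.3],
}
print(filter_features(flows))  # ['pkt_count', 'duration']
```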

The second family is deep learning. Convolutional networks, recurrent networks, LSTMs, GRUs, autoencoders, hybrid CNN-RNN pipelines, and more recent transformer-like models dominate the literature when the goal is to capture sequential traffic dependencies or higher-dimensional patterns [3,14,16,17,18,19,20,21,39,40]. Autoencoders are particularly common in anomaly and botnet detection because they can be trained on benign traffic and then detect deviations. CNN-LSTM and related hybrids are frequently used to exploit local and temporal structure simultaneously. Recent methods also include explanation-aware attack-detection pipelines, edge-oriented ensembles, lightweight DDoS-oriented deep models, privacy-preserving BERT-style architectures, and explainable transformer designs for IoMT traffic [29,30,42,57,59]. In this family the deployment objective varies more strongly: some papers remain primarily accuracy-driven, while others explicitly target lightweight execution or explainability. The recent appearance of vision-transformer-inspired IDS models illustrates the field’s tendency to import high-capacity architectures from mainstream AI. However, the resource cost of these models raises direct deployment questions, especially when evaluations do not report inference latency, FLOPs, model size, or memory footprint [3,5,6,40].
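The paragraph notes that evaluations often omit inference latency, FLOPs, model size, or memory footprint. The following minimal sketch shows how such numbers can be derived for a fully connected model, assuming dense layers only. Convolutional, recurrent, and transformer layers need their own counting rules, activation memory and runtime buffers are ignored, and the 46-input network is hypothetical. Latency measured this way is only meaningful when run on the target edge hardware.

```python
from __future__ import annotations

import statistics
import time


def dense_footprint(layer_widths: list[int], bytes_per_param: int = 4) -> dict[str, int]:
    """Parameters, approximate FLOPs per inference, and weight storage for a
    fully connected network (activations and runtime buffers are ignored)."""
    params = flops = 0
    for n_in, n_out in zip(layer_widths[:-1], layer_widths[1:]):
        params += n_in * n_out + n_out   # weights plus biases
        flops += 2 * n_in * n_out        # one multiply and one add per weight; bias adds ignored
    return {"params": params, "flops": flops, "weight_bytes": params * bytes_per_param}


def median_latency_ms(fn, x, repeats: int = 50, warmup: int = 5) -> float:
    """Median wall-clock time of fn(x) in milliseconds after a short warm-up."""
    for _ in range(warmup):
        fn(x)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


# Hypothetical 46-feature flow classifier with two hidden layers and two outputs.
print(dense_footprint([46, 64, 32, 2]))
# {'params': 5154, 'flops': 10112, 'weight_bytes': 20616}
```

Passing `bytes_per_param=1` approximates int8-quantized weights (5154 bytes for the same network), which is the kind of figure a device-tier deployment claim would need to state.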

The third family is federated and distributed learning. These approaches are motivated by a core IoT-edge tension: security analytics benefits from data diversity, but shipping raw device data to a central server can violate privacy, overload the network, or be administratively infeasible. Federated learning addresses this by keeping raw data on local clients and aggregating only model updates [14,27,43,44,45,46,48,49,53,93]. Recent work extends this line toward privacy-preserving aggregation, blockchain-assisted coordination, and personalized or communication-efficient FL. In deployment-objective terms, this family is primarily privacy-driven and communication-sensitive. Personalized distillation-based FL and newly introduced decentralized crowdsensing benchmarks suggest that future federated IDS evaluation will have to account more explicitly for heterogeneity, personalization, and decentralized data topology [60,61]. For IoT-edge cybersecurity, federated approaches are attractive not only because of privacy claims, but also because they align structurally with distributed sensing infrastructures. Their weakness is that they add communication overhead, are sensitive to non-IID client distributions, and can be difficult to evaluate convincingly without realistic multi-client edge experiments.
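The aggregation step and its communication cost can be illustrated with a sample-weighted average of client parameter vectors in the style of FedAvg. This is a minimal sketch: secure aggregation, update compression, client sampling, and local training are not modeled, and the helper names are hypothetical. The weighting also shows why non-IID data matters, since clients with more samples dominate the global model.

```python
from __future__ import annotations


def fedavg(client_updates: list[tuple[int, list[float]]]) -> list[float]:
    """Sample-weighted average of client parameter vectors (FedAvg-style).

    client_updates: (number of local training samples, flattened parameters).
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    global_params = [0.0] * dim
    for n, params in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must send vectors of the same length")
        weight = n / total
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params


def round_traffic_bytes(num_clients: int, num_params: int, bytes_per_param: int = 4) -> int:
    """Uplink plus downlink payload for one synchronous round, without compression."""
    return 2 * num_clients * num_params * bytes_per_param


print(fedavg([(100, [1.0, 0.0]), (300, [0.0, 1.0])]))      # [0.25, 0.75]
print(round_traffic_bytes(num_clients=10, num_params=5154))  # 412320
```

Even for a small model, per-round traffic scales linearly with both client count and parameter count, which is why communication overhead belongs in federated IDS reporting.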

The fourth family comprises graph-based and representation-learning methods. Compared with classical tabular IDS pipelines, graph neural networks and traffic-graph representations model relationships among nodes, flows, and temporal snapshots more explicitly [56]. This is conceptually appealing in IoT-edge environments, where attacks often propagate through communication structure rather than appearing in isolated feature vectors. Their main deployment objective is usually structural fidelity rather than lightweight execution. Although graph-based IoT security research is still smaller than the DL and FL streams, it is important because it addresses a known weakness of many benchmark-driven classifiers: they treat flows as independent examples even when attack behavior is inherently relational. The current limitation is that graph construction choices, temporal windowing, and scalability under large dynamic traffic graphs are not yet standardized.
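The construction choices listed above (window length, windowing scheme, edge aggregation) become concrete when a traffic graph is built from flow records. The sketch below assumes non-overlapping (tumbling) windows, directed host-to-host edges aggregated by flow count and byte volume, and hypothetical node identifiers; it only builds the graph and one relational feature, and trains no GNN.

```python
from __future__ import annotations

from collections import defaultdict


def window_graphs(flows, window_s: float) -> dict[int, dict[tuple[str, str], dict[str, int]]]:
    """Group flow records into per-window directed communication graphs.

    flows: iterable of (timestamp_s, src, dst, num_bytes).
    Returns {window_index: {(src, dst): {"flows": count, "bytes": total}}}.
    """
    graphs = defaultdict(dict)
    for ts, src, dst, num_bytes in flows:
        window = int(ts // window_s)  # tumbling (non-overlapping) windows
        edge = graphs[window].setdefault((src, dst), {"flows": 0, "bytes": 0})
        edge["flows"] += 1
        edge["bytes"] += num_bytes
    return dict(sorted(graphs.items()))


def fan_out(graph) -> dict[str, int]:
    """Distinct destinations per source; unusually high values may indicate scanning."""
    counts = defaultdict(int)
    for src, _dst in graph:  # keys are unique (src, dst) pairs
        counts[src] += 1
    return dict(counts)


flows = [(0.5, "cam1", "gw", 100), (1.2, "cam1", "gw", 200),
         (1.9, "cam1", "10.0.0.5", 60), (2.1, "cam2", "gw", 50)]
g = window_graphs(flows, window_s=2.0)
print(fan_out(g[0]))  # {'cam1': 2}
```

Changing `window_s`, or switching to overlapping windows, changes both the graphs and the derived features, which is one reason results from different graph-based studies are hard to compare.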

The fifth family covers explainable, trustworthy, privacy-preserving, and hybrid AI. These methods are not a single model class but a cross-cutting design orientation. Explainable AI is used to interpret alert generation with tools such as SHAP and LIME or by designing intrinsically interpretable detection logic [25,26,54,55]. Privacy-preserving AI appears mainly through federated learning, secure aggregation, differential privacy, homomorphic encryption, or blockchain-assisted coordination [40,43,44,45,46]. Trustworthy AI is increasingly discussed at the level of system architecture as well, including digital-twin cybersecurity analytics and edge-IoT conceptual frameworks that integrate explanation, governance, and operational trust [35,94]. In deployment-objective terms, this family is interpretability-driven or robustness-driven rather than purely accuracy-driven. At the method level, trustworthy AI is reflected in calls for robustness testing, reproducibility, fairness, and operational interpretability [3,18,19,21]. This family is strategically important because the field is gradually shifting from “Can a model classify attacks?” to “Can it be trusted, audited, and maintained in a real IoT-edge system?”
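SHAP and LIME are the tools named above; the sketch below swaps in a much simpler, model-agnostic technique, occlusion-style feature ablation, to show what a per-alert attribution computes and what it costs. It is not SHAP or LIME. For a linear scorer each attribution equals w_i(x_i − b_i); for nonlinear models with interacting features the values generally differ from Shapley values. The scoring function and baseline are hypothetical.

```python
from __future__ import annotations


def ablation_attribution(score_fn, x: list[float], baseline: list[float]) -> list[float]:
    """Attribution of feature i = score(x) - score(x with feature i set to baseline[i]).

    Requires len(x) + 1 model evaluations per explained alert.
    """
    full_score = score_fn(x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]
        attributions.append(full_score - score_fn(perturbed))
    return attributions


# Hypothetical linear alert score over three flow features.
def score(v):
    return 0.5 * v[0] + 2.0 * v[1] - 1.0 * v[2]


print(ablation_attribution(score, x=[2.0, 1.0, 3.0], baseline=[0.0, 0.0, 0.0]))
# [1.0, 2.0, -3.0]
```

The `len(x) + 1` evaluations per alert make the latency cost of explanation explicit: it grows with feature count and multiplies the detector's own inference time.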

5. Cybersecurity Tasks and Application Scenarios

The most mature body of work still concerns intrusion detection and anomaly detection. This is unsurprising because flow-level classification maps naturally to benchmark datasets and to supervised or semi-supervised learning pipelines. IDS reviews define this pattern at the survey level [3,11,12,13,14,16,17,18,19,20,21], benchmark-originating papers provide recurring evaluation substrates [36,37,38,51], and recent empirical IDS studies instantiate it in centralized, federated, edge, and domain-specific settings [39,40,48,49,50,53,54,55]. In most cases, the model observes packet- or flow-derived features and predicts whether an instance is benign or malicious, sometimes with a finer attack-family label. More recent studies show this space fragmenting into anomaly-first pipelines, explanation-aware detection, and edge-oriented ensembles, rather than a single homogeneous IDS design pattern [29,57,58]. The literature increasingly distinguishes between binary detection, attack-family classification, and fine-grained multi-class classification because high binary accuracy can obscure poor minority-class performance or unreliable attack attribution [3,38]. Recent reviews argue that the field should move away from purely algorithmic comparison on static traces and toward real-time, adaptive, and lightweight IDS architectures [3,4,21].
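The point that high accuracy can obscure poor minority-class performance can be checked with a few lines of arithmetic. In the hypothetical test split below, a rare attack class is always missed, yet accuracy stays high; macro-F1, the unweighted mean of per-class F1, exposes the failure. Class names and counts are illustrative only.

```python
from __future__ import annotations

from collections import Counter


def accuracy(y_true, y_pred) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred) -> dict[str, float]:
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in support.items()}


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1, so rare attack classes count equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced test split: the rare "backdoor" class is always missed.
y_true = ["benign"] * 90 + ["ddos"] * 8 + ["backdoor"] * 2
y_pred = ["benign"] * 90 + ["ddos"] * 8 + ["benign"] * 2
print(accuracy(y_true, y_pred))                      # 0.98
print(per_class_recall(y_true, y_pred)["backdoor"])  # 0.0
print(round(macro_f1(y_true, y_pred), 3))            # 0.663
```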

At a higher level of synthesis, three recurrent task-method configurations now stand out. First, public-benchmark intrusion detection remains dominated by tabular ML and compact DL pipelines that optimize predictive performance over flow or packet features [28,29,39,40,62,63,64]. Second, privacy-sensitive and multi-site scenarios, especially in transportation, IIoT, and IoMT, increasingly favor federated or distributed learning because data locality and administrative fragmentation are built into the application setting [43,44,45,46,48,49,53,60]. Third, regulated or semantically rich environments increasingly attract explanation-aware, graph-based, or hybrid designs that try to preserve analyst interpretability or relational structure rather than only improving a headline classifier score [42,54,55,56,57]. This clustering is one reason the field can no longer be described adequately as a flat catalogue of models.

Botnet and malware detection form the second major application cluster. Legacy but still influential datasets such as N-BaIoT, together with more recent large-scale traffic datasets, keep this task highly visible in the literature [15,36,37]. Botnet detection is appealing to researchers because it provides a clear malicious-behavior target and because botnets remain operationally important in IoT-driven DDoS campaigns. GAN-oriented intrusion-detection surveys and lightweight DDoS studies suggest that data augmentation, synthetic sample generation, and compact detection pipelines are becoming more visible in this subfield, especially when attack rarity or fast mitigation matter [59,95]. However, many studies focus on the late-stage traffic manifestations of compromise rather than the earlier phases of infection, propagation, or command-and-control adaptation. This creates a gap between benchmark success and complete lifecycle defense.

DDoS and malicious traffic detection are sometimes treated as separate tasks and sometimes folded into intrusion detection. In practical terms, the literature is heavily skewed toward traffic-based recognition of DDoS, scanning, spoofing, and reconnaissance attacks, especially in IIoT and smart-city-like environments [2,20,38,39]. Studies of lightweight DDoS mitigation for 5G-connected IoT and of other streamlined deep models illustrate why this task remains operationally central: latency and packet-volume stress can collapse edge service quality before a heavier IDS pipeline produces a response [59]. These tasks are particularly relevant to edge environments because volumetric traffic and attack bursts can quickly saturate bandwidth, overwhelm gateways, and violate latency budgets. Yet many studies still report only accuracy-like outcomes and leave unresolved how the model would behave under sustained load, streaming drift, or partial connectivity.

Detection is therefore only the first stage of an IoT-edge cyber-defense loop. For cyber–physical systems, the operational question is what happens after an alert is produced at a sensor, gateway, or edge server. A deployment-ready system should specify whether the model output triggers local rate limiting, flow blocking, device quarantine, access-token revocation, re-authentication, route change, model-update throttling, or escalation to a cloud or human analyst. The mitigation action also changes the evaluation target: a detector used only for offline forensics can tolerate higher delay, whereas a DDoS filter at a gateway or an IoMT alarm must report end-to-end response time, false-positive cost, and rollback behavior. This review therefore treats mitigation readiness as a distinct dimension from detection accuracy. The coded corpus shows that mitigation is usually discussed at the level of motivation, while formal response protocols, safety fallback rules, and edge-local actuation constraints remain weakly specified.
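The response-protocol gap can be made concrete with a small sketch that maps a detector output to a response class and writes the fallback rules down explicitly. It assumes hypothetical labels (`ddos`, `scan`, `botnet`), a hypothetical label-to-action table, and an arbitrary 0.9 confidence threshold; none of these values come from the coded studies.

```python
from dataclasses import dataclass

# Hypothetical label-to-action table; a real deployment would derive this
# from its own response policy and false-positive cost analysis.
ACTIONS = {
    "ddos": "rate_limit",
    "scan": "flow_block",
    "botnet": "device_quarantine",
}


@dataclass
class Decision:
    action: str
    reason: str


def choose_response(label: str, confidence: float, connected: bool,
                    min_confidence: float = 0.9) -> Decision:
    """Map one detector output to a response, with explicit fallback rules."""
    if label == "benign":
        return Decision("none", "benign prediction")
    fallback = "escalate_to_analyst" if connected else "log_and_monitor"
    if confidence < min_confidence:
        # Low confidence: avoid a disruptive local action. Escalate if the
        # uplink is available, otherwise keep evidence locally.
        return Decision(fallback, "low confidence")
    action = ACTIONS.get(label)
    if action is None:
        # Unmapped attack family: do not guess a local actuation.
        return Decision(fallback, "unmapped label")
    return Decision(action, f"{label} at confidence {confidence:.2f}")


print(choose_response("ddos", 0.97, connected=False))
print(choose_response("ddos", 0.55, connected=True))
```

The value of writing the policy as code is that each branch becomes testable and reportable, which is the kind of evidence (response class, fallback behavior under low confidence or degraded connectivity) that the coded studies rarely specify.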

Authentication, access control, and trust management are less prominent in the AI-driven literature, but they remain part of the cybersecurity problem space. Several reviews note that IoT-edge deployments suffer from weak device identity, decentralized enrollment, and limited trust establishment across heterogeneous administrative domains [2,35]. AI is sometimes introduced to support behavioral trust scoring or adaptive access decisions, but this stream is much smaller than intrusion detection research. This imbalance matters because a system that detects malicious traffic well but authenticates devices poorly can still fail operationally. Accordingly, the cybersecurity scope of this review should be read as evidence-weighted rather than uniformly comprehensive: the strongest recoverable empirical layer concerns AI-driven intrusion and anomaly detection, while authentication, access control, and trust management mainly set boundaries and are reserved for future coding; they constrain how far IDS-heavy findings can be generalized.

Application scenarios reveal how deployment constraints shape method choice. Industrial IoT and smart grid contexts emphasize cyber–physical safety, low latency, and operational continuity, pushing the literature toward lightweight or distributed models [2,4]. Healthcare IoT adds privacy sensitivity and regulatory pressure, which helps explain the growth of privacy-preserving federated approaches and explainability-oriented designs [41,42,43,46,47]. Smart agriculture and transportation studies are useful because they expose edge intelligence to environmental instability, mobility, and intermittent connectivity rather than idealized lab networks [49,50]. Taken together, these scenarios show that the field is not moving toward a single universal IoT security model; it is moving toward domain-sensitive AI pipelines whose usefulness depends on deployment constraints as much as on detection accuracy.

Recent domain-focused studies make this scenario dependence more explicit. Medical IoT work increasingly treats adaptive cyber–physical response, workflow sensitivity, and device-level clinical context as part of the threat-detection problem rather than as downstream deployment detail [89]. Industrial IoT studies are pushing in a similar direction by evaluating federated IDS pipelines against latency, privacy, and coordination constraints that arise from distributed industrial control rather than from generic IoT assumptions [90,91]. Domain-specific benchmark resources such as IoMT-TrafficData further reinforce that realistic task design depends on the application environment, not just on the attack label set [76].

6. Datasets, Benchmarks, and Experimental Settings

Recent literature repeatedly centers experimentation on a small set of public benchmarks. TON_IoT, Edge-IIoTset, CICIoT2023, and N-BaIoT recur across both surveys and original studies [3,20,36,37,38,51]. Their prominence is understandable: they lower data-collection cost, support numerical comparison, and span complementary parts of the problem space. TON_IoT combines telemetry, operating-system, and network-level observations; Edge-IIoTset is explicitly framed for both centralized and federated learning in IoT and IIoT contexts; CICIoT2023 offers a large, real-time-oriented benchmark with binary, grouped, and fine-grained attack labels; and N-BaIoT remains a reference point for botnet-oriented anomaly detection. Beyond this dominant quartet, recent work has given greater visibility to domain-specific IoMT datasets, decentralized crowdsensing benchmarks for federated settings, and explainable medical intrusion-detection evaluation contexts [42,52,61].

This concentration, however, creates a methodological bottleneck. Many papers evaluate on one or two public datasets and then treat higher benchmark accuracy as evidence of operational readiness. Recent reviews increasingly question this pattern because it can align the research conversation with benchmark idiosyncrasies rather than with deployable security requirements [2,3,4,18,19,20,21]. The issue is not dataset reuse by itself; the sharper problem is that preprocessing choices, class regroupings, and train-test splits are often underdescribed, which makes fair reproduction and cross-paper comparison difficult. Benchmark-heavy recent studies using salp-swarm optimization, generic IoT security frameworks, or simple statistical feature selection illustrate how strongly reported outcomes can depend on feature handling and experimental setup details [62,63,64]. In many cases, authors report aggregate performance without clearly describing feature standardization, sampling strategy, label collapsing, or the treatment of rare attack classes.

Class imbalance is another recurring issue. Datasets such as CICIoT2023 explicitly expose severe imbalance across benign and malicious classes or across fine-grained attack types [38]. Under such conditions, accuracy can remain high even when minority-class detection is poor, which is especially dangerous for security analytics where rare attacks may be the most operationally important. The literature increasingly responds by emphasizing F1-score, recall, class-wise performance, and false positive behavior, but the reporting is still inconsistent [3,20]. Several papers continue to optimize for easily improved headline metrics rather than for stable minority-class detection under realistic priors.
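The way accuracy hides minority-class failure can be shown with a toy calculation. The sketch below uses synthetic labels (95 benign flows, 5 instances of a rare attack, not drawn from any benchmark) and a degenerate classifier that always predicts benign, then computes accuracy, per-class F1, and macro-averaged F1 from first principles.

```python
def per_class_f1(y_true, y_pred):
    """F1-score for every label that appears in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic toy labels: a classifier that never flags the rare attack.
y_true = ["benign"] * 95 + ["rare_attack"] * 5
y_pred = ["benign"] * 100

f1 = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1.values()) / len(f1)
print(f"accuracy={accuracy(y_true, y_pred):.2f}")
print(f"macro-F1={macro_f1:.2f}")
print(f"rare_attack F1={f1['rare_attack']:.2f}")
```

Accuracy stays at 0.95 while the rare-attack F1 is zero and macro-F1 falls below 0.5, which is why class-wise and macro-averaged reporting matters on imbalanced benchmarks.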

The dominance of CICIoT2023 and Edge-IIoTset also affects the direction of the coded evidence. These benchmarks are useful because they provide public attack labels and repeatable experimental substrates, but they can overweight traffic patterns, device mixes, class distributions, and preprocessing conventions that are specific to their collection environments. CICIoT2023, for example, is valuable for large-scale and real-time-oriented IoT attack evaluation, yet its fine-grained class distribution makes macro-averaged and minority-class reporting essential. Edge-IIoTset is valuable because it explicitly supports centralized and federated learning settings, yet repeated reuse can make edge-readiness appear stronger than it is when studies do not vary client topology, hardware tier, communication budget, or class regrouping. Thus, the benchmark concentration observed in the 26 coded papers should be interpreted in two ways: it reflects the current literature’s dependence on a small benchmark core, and it exposes the limits of drawing general IoT-edge deployment conclusions from benchmark-centered evidence alone.

Experimental setting is part of the evidence base as well. Many studies invoke edge relevance while still training and testing under conventional offline conditions, with little evidence of actual on-device or gateway execution. Reviews and deployment frameworks establish this as a recurring systems-level gap [2,3,4,5,6,16,17,18,19,20,21]. Recent empirical work illustrates it across deep or federated IDS studies [39,40,43,44,45,46], edge and domain testbeds [48,49,50], and explainability, graph, or optimization-oriented studies [53,54,55,56]. Edge or federated experiments are often simulated with a limited number of clients, homogeneous hardware assumptions, or simplified communication settings. Newly introduced decentralized crowdsensing datasets and personalized federated IDS proposals make this limitation easier to see because they treat client diversity and network topology as benchmark properties rather than as background assumptions [60,61]. This does not invalidate such studies, but it does narrow the strength of the deployment inferences that can reasonably be drawn. A federated IDS evaluated across a few synchronized clients on a laboratory network remains quite different from geographically dispersed IoT nodes operating under packet loss, client churn, non-IID data, and energy constraints.
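To show why participation assumptions matter, the sketch below implements one round of sample-size-weighted federated averaging (FedAvg) in which each client reports back only with some probability, a crude stand-in for churn or packet loss. It is a minimal illustration, assuming toy list-of-float weights and hypothetical client update functions; it does not model stragglers, non-IID drift, or communication cost.

```python
import random


def fedavg_round(global_weights, clients, participation=0.6, rng=random):
    """One FedAvg round where only a random subset of clients reports back.

    clients: list of (num_samples, local_update_fn) pairs; each function maps
    the current global weights to that client's locally trained weights.
    """
    active = [c for c in clients if rng.random() < participation]
    if not active:
        # Every client dropped out this round: keep the previous model.
        return list(global_weights), 0
    total = sum(n for n, _ in active)
    new = [0.0] * len(global_weights)
    for n, update in active:
        local = update(global_weights)
        for i, w in enumerate(local):
            new[i] += (n / total) * w  # weight each client by its sample count
    return new, len(active)


# Hypothetical clients whose local training just moves toward a local target.
def make_client(target, step=0.5):
    return lambda w: [wi + step * (ti - wi) for wi, ti in zip(w, target)]


clients = [(100, make_client([1.0, 0.0])), (300, make_client([0.0, 1.0])),
           (50, make_client([2.0, 2.0]))]
rng = random.Random(7)
weights = [0.0, 0.0]
for rnd in range(5):
    weights, k = fedavg_round(weights, clients, participation=0.6, rng=rng)
    print(rnd, k, [round(w, 3) for w in weights])
```

Because the aggregate is re-weighted over whichever clients happen to respond, the trajectory depends on who drops out in each round, which a few always-available laboratory clients never exercise.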

The literature also contains a substantial number of synthetic, private, or custom datasets. These are often necessary because collecting field data from industrial or medical IoT systems is difficult and sensitive. However, private datasets reduce comparability, while synthetic traffic may fail to reflect realistic attacker adaptation, protocol diversity, and normal operational noise [2,3,47]. Domain-specific IoMT dataset analyses reinforce the point that medical, industrial, and decentralized edge settings often need specialized data collection rather than blind reuse of generic network traces [52,61]. The right conclusion is not that private data are unusable, but that benchmark quality in IoT-edge cybersecurity should be judged by representativeness, transparency, and reproducibility rather than by public availability alone.

Recent additions to the benchmark literature make this point more concrete. A large-scale survey of network intrusion datasets shows that dataset popularity, documentation quality, labeling errors, and domain specificity are now first-order concerns rather than secondary caveats [66]. Newer benchmark-producing papers also reveal where the field is trying to move next. TriHID is explicitly designed for heterogeneous domain-adaptation evaluation in IoT intrusion detection, ASEADOS-SDN-IoT targets realistic SDN-IoT traffic and control-plane observability, and a recent smart-home dataset adds multi-stage attack structure that is largely absent from older single-stage traces [67,68,69]. Domain-focused IoMT security reviews likewise argue that dataset suitability cannot be judged only by attack labels; it must also reflect device class, care workflow, architecture, and deployment context [74]. Together, these studies strengthen the conclusion that benchmark evidence in IoT-edge cybersecurity is no longer just a matter of which dataset is popular, but of whether benchmark design actually matches the deployment claim being made.

Supplementary Table S2 makes the benchmark layer behind this section more explicit. Read together, those anchor papers show that benchmark evidence enters the field through three distinct roles: dataset-originating papers that define the training and testing substrate [36,37,38,51], domain-sensitive benchmark critiques that question whether generic traces represent medical or decentralized edge environments [52,61], and deployment-framing studies that explain why hardware constraints, locality, and model compression should influence benchmark interpretation in the first place [5,34,41,65]. This separation matters because benchmark creation, benchmark reuse, and benchmark criticism contribute different kinds of evidence even when they are all cited in the same review.

Recent comparative and dataset-centric studies strengthen this point by showing that benchmark choice is not a background variable. Cross-dataset comparisons of deep learning IDS models across BoT-IoT, CICIoT, TON_IoT, and WUSTL-IIoT-2021 already show that ranking stability depends strongly on data balance and benchmark composition [88]. Domain-specific benchmark work in IoMT extends the same argument by providing healthcare-centered traffic and tooling rather than forcing evaluation through generic network traces [76]. A recent federated dataset-centric evaluation goes further by harmonizing labels across Edge-IIoTset, CICIoT2023, and TII-SSRC-23 and then showing how strongly cross-environment performance can degrade once models leave their home benchmark [77]. Together, these papers sharpen the review’s central point: benchmark evidence is not only about coverage, but about whether the evaluation design meaningfully supports transfer and deployment claims. Table 4 summarizes the dataset and benchmark issues discussed in this section.
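Label harmonization is the step that makes cross-dataset evaluation possible, and also the step most often left undocumented. The sketch below maps dataset-specific labels onto a shared coarse taxonomy and returns a count of labels that had no explicit mapping, so the regrouping can be reported rather than hidden. All dataset keys and label strings here are hypothetical placeholders; real mapping tables must be built from each benchmark's own documentation, and this is not the procedure used in the cited study.

```python
from collections import Counter

# Hypothetical shared taxonomy and per-dataset label maps (placeholders).
SHARED_TAXONOMY = {"benign", "ddos", "recon", "other_attack"}

LABEL_MAPS = {
    "dataset_a": {"Normal": "benign", "DDoS_UDP": "ddos", "DDoS_TCP": "ddos",
                  "Port_Scan": "recon"},
    "dataset_b": {"BENIGN": "benign", "ddos-flood": "ddos",
                  "recon-ping": "recon"},
}


def harmonize(dataset, labels, unknown="other_attack"):
    """Map dataset-specific labels onto the shared taxonomy.

    Returns (harmonized_labels, unmapped_counts) so that any label collapsed
    into the catch-all class is visible in the experimental report.
    """
    mapping = LABEL_MAPS[dataset]
    unmapped = Counter(lab for lab in labels if lab not in mapping)
    harmonized = [mapping.get(lab, unknown) for lab in labels]
    if not set(harmonized) <= SHARED_TAXONOMY:
        raise ValueError("mapping produced a label outside the taxonomy")
    return harmonized, unmapped


labels, unmapped = harmonize("dataset_b", ["BENIGN", "ddos-flood", "mirai-like"])
print(labels)    # harmonized labels
print(unmapped)  # labels that fell into the catch-all class
```

Reporting the unmapped counts alongside cross-dataset results makes it possible to tell whether a performance drop reflects genuine transfer failure or an aggressive regrouping of rare classes.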

To connect the dataset discussion to reporting quality, Table 5 summarizes a conservative coding snapshot over 26 representative empirical studies already included in the present corpus [28,29,30,39,40,42,43,44,45,46,48,49,50,52,53,54,55,56,57,58,59,60,61,62,63,64]. The coding captures only signals that are explicit in the paper-level extraction layer prepared for this review; when a deployment or reproducibility detail could not be recovered consistently, it was recorded as not reported rather than inferred. Under this lens, only five of 26 studies provide explicit edge or deployment framing, seven more are only partially edge-relevant, only four provide even partial real-device or gateway grounding, five report cross-dataset validation, and only one exposes a reusable artifact signal. Supplementary Table S1 provides the audit trail behind these counts: Panel A records the descriptive fields for each coded study, and Panel B records the coded signals that aggregate into Table 5. The coded snapshot therefore supports the dataset discussion by showing how quickly comparability weakens once deployment and reproducibility fields are inconsistently reported.
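The conservative coding rule (record a field as not reported when it cannot be recovered, rather than inferring it) can be expressed as a default in the aggregation code. The sketch below is a hypothetical illustration of that rule only; the field names and the three-level signal scale are assumptions and do not reproduce the actual Supplementary Table S1 schema.

```python
from collections import Counter
from enum import Enum


class Signal(Enum):
    EXPLICIT = "explicit"
    PARTIAL = "partial"
    NOT_REPORTED = "not reported"


# Hypothetical coded fields (not the review's actual schema).
FIELDS = ["edge_framing", "device_grounding", "cross_dataset", "artifact"]


def code_study(extracted: dict) -> dict:
    """Code one study; any field absent from the extraction is NOT_REPORTED.

    Unrecognized values raise ValueError instead of being silently coerced.
    """
    return {f: Signal(extracted.get(f, Signal.NOT_REPORTED.value))
            for f in FIELDS}


def tally(studies):
    """Aggregate coded signals per field, as in a summary table."""
    counts = {f: Counter() for f in FIELDS}
    for study in studies:
        for field, sig in code_study(study).items():
            counts[field][sig] += 1
    return counts


studies = [{"edge_framing": "explicit"},
           {"edge_framing": "partial", "artifact": "explicit"},
           {}]
for field, c in tally(studies).items():
    print(field, {s.value: n for s, n in c.items()})
```

Making the default explicit in code keeps the audit trail honest: a missing extraction can only ever lower a count, never inflate it.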

Read alongside Supplementary Table S2, Table 5 also reveals an asymmetry that is easy to miss in thinner reviews: reproducibility-supporting assets are concentrated more in benchmark-producing papers than in model-evaluation papers. The field therefore has some infrastructure for shared datasets, but far less standardized reporting on what happens when those datasets are turned into deployment claims. The deployment checklist and hardware decision matrix developed below are derived from this gap pattern: because Table 5 and Table 6 show absent or inconsistent latency, memory, energy, communication, device-grounding, robustness, and artifact signals, the later checklist and decision matrix translate missing reporting fields into minimum comparable evidence requirements and hardware-tier decision rules. They are normative reporting aids grounded in the coded evidence gaps, not claims that the coded studies already satisfy those criteria.

7. Evaluation Metrics

The evaluation landscape of AI-driven IoT-edge cybersecurity remains more mature for detection quality than for deployment quality. Earlier reviews and recent IDS trend papers identify accuracy, precision, recall, F1-score, ROC-AUC, confusion matrices, and false positive rate as the dominant metric vocabulary [3,11,12,13,14,15,20,22,23]. The same pattern is visible across IDS and Edge-AI surveys [16,17,18,19,20,21,24], benchmark papers [36,37,38,39,40,51], federated studies [43,44,45,46], domain and edge studies [48,49,50,52], and explanation, graph, or optimization studies [53,54,55,56]. These metrics are useful and necessary, but their interpretation depends strongly on class balance and on the operational cost of mistakes. Performance-evaluation surveys and anomaly-detection reviews increasingly caution against interpreting such metrics in isolation from class structure, thresholding, and operating context [22,23]. In edge environments, a false positive can consume scarce compute, trigger unnecessary mitigation, or degrade service availability, while a false negative may expose vulnerable actuators or local control loops to real attack traffic. This is why the strongest recent reviews recommend more deliberate reporting of class-wise metrics and minority-class performance, especially on imbalanced datasets [3,20].

The asymmetry becomes clearer in the coded evidence summarized in Table 5. Within the representative empirical subset, explicit paper-level signals for latency, memory, energy, and communication overhead are absent from the current extractable evidence layer, even though these variables are often invoked to justify federated, lightweight, or edge-oriented methods. The problem is therefore not only sparse reporting, but also weak reporting standardization across papers.

Table 7 organizes the evaluation landscape into two complementary layers: detection-quality metrics and deployment-quality indicators. The distinction is essential because detection metrics alone do not establish whether a model can operate where it is needed. In IoT-edge systems, deployment-oriented indicators such as inference latency, end-to-end response time, memory footprint, CPU utilization, model size, energy consumption, communication overhead, and the feasibility of continuous or near-real-time execution on edge hardware are equally important [2,3,4,5,6,40,43,44,45,46,48,49]. Recent edge AI, TinyML, and AIoT implementation surveys reinforce this point by showing that model compression, quantization, hardware-software co-design, and low-precision inference are not optional engineering details but core determinants of deployability on constrained devices [5,6,10,34,65].

Federated learning papers illustrate the same gap from another angle: they often emphasize privacy preservation and collaborative training but only partially quantify the cost of local training, synchronization, update transmission, or performance degradation under non-IID client behavior [27,40,43,44,45,46,53,93]. Recent personalization-oriented FL and decentralized crowdsensing work further suggests that communication cost and client variability should be treated as first-class evaluation outputs rather than side remarks [60,61]. Newer federated IDS studies make this pressure more concrete. FedIoV evaluates real-time intrusion detection in vehicular settings where latency, communication, and energy become part of the security argument, Fed-FeRe reduces transmission cost through feature reduction, FD-IDS treats knowledge distillation as a way to stabilize performance under non-IID client partitions, and G-PFL-ID evaluates personalized graph-based intrusion detection under both Dirichlet and device-level heterogeneity [78,79,80,81]. These papers do not yet establish a shared reporting standard, but they show that communication cost, client partitioning, and personalization overhead can be treated as evaluation variables rather than implementation footnotes.

The implication for this review is straightforward: IoT-edge cybersecurity evaluation must remain two-dimensional, with models judged by both detection performance and operational plausibility under edge constraints.
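A latency figure is only comparable if the measurement procedure is stated. The sketch below shows one way to produce a per-inference latency profile (median, 95th and 99th percentiles, maximum) after a warm-up phase, using only the Python standard library. It is a minimal illustration under stated assumptions: the threshold-rule `predict` function and the `pkt_rate` feature are hypothetical, and timings measured on a development host say nothing about a target gateway unless the same code runs on that device.

```python
import statistics
import time


def latency_profile(predict, samples, warmup=10, repeats=200):
    """Measure per-inference latency in milliseconds on the current host."""
    for x in samples[:warmup]:
        predict(x)  # warm caches and any lazy initialization
    timings = []
    for i in range(repeats):
        x = samples[i % len(samples)]
        start = time.perf_counter()
        predict(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(timings, n=100)  # 99 percentile cut points
    return {"p50_ms": statistics.median(timings), "p95_ms": cuts[94],
            "p99_ms": cuts[98], "max_ms": max(timings), "n": len(timings)}


# Hypothetical detector: a single threshold rule on packet rate.
samples = [{"pkt_rate": float(r)} for r in range(0, 2000, 10)]
profile = latency_profile(lambda x: x["pkt_rate"] > 1000.0, samples)
print({k: (round(v, 4) if isinstance(v, float) else v)
       for k, v in profile.items()})
```

Tail percentiles are reported alongside the median because a gateway filter that is usually fast but occasionally stalls can still miss a latency budget; the hardware target, batch size, and preprocessing included in the timed call should be stated with the numbers.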

Table 8 summarizes the minimum deployment-reporting fields proposed for comparable IoT-edge AI cybersecurity studies.
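A reporting checklist is easiest to enforce when it is machine-checkable. The sketch below encodes a hypothetical subset of deployment-reporting fields drawn from the surrounding prose (hardware target, evaluation mode, latency, memory, energy, and communication cost for federated settings) and lists which fields a study leaves empty. The field names and units are assumptions for illustration; they do not reproduce the contents of Table 8.

```python
from dataclasses import dataclass, fields
from typing import Optional

EVAL_MODES = {"simulation", "emulation", "field"}


@dataclass
class DeploymentReport:
    # Hypothetical field set; units are stated in the names.
    hardware_target: Optional[str] = None        # named device or gateway
    evaluation_mode: Optional[str] = None        # simulation / emulation / field
    latency_p95_ms: Optional[float] = None
    peak_memory_mb: Optional[float] = None
    energy_mj_per_inference: Optional[float] = None
    comm_bytes_per_round: Optional[float] = None  # federated settings only


def missing_fields(report: DeploymentReport, federated: bool = False):
    """Return the names of required fields that the report leaves empty."""
    if (report.evaluation_mode is not None
            and report.evaluation_mode not in EVAL_MODES):
        raise ValueError(f"unknown evaluation mode: {report.evaluation_mode}")
    missing = []
    for f in fields(report):
        if f.name == "comm_bytes_per_round" and not federated:
            continue  # communication cost only required for federated setups
        if getattr(report, f.name) is None:
            missing.append(f.name)
    return missing


report = DeploymentReport(hardware_target="gateway-x",
                          evaluation_mode="emulation", latency_p95_ms=4.2)
print(missing_fields(report))
print(missing_fields(report, federated=True))
```

Distinguishing simulation, emulation, and field runs as an explicit field prevents an emulated latency number from being read as a field measurement.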

Table 9 then links edge-deployment tiers to suitable AI-family choices and minimum evidence expectations.

8. Deployment Challenges in IoT-Edge Environments

Deployment difficulty begins with resource scarcity. The coded evidence in Table 5 shows little hardware-grounded validation: only four of 26 coded studies provide even partial real-device or gateway support. These counts should not be read as proof that deployment measurements never exist in full texts; rather, they indicate how limited and non-comparable deployment grounding remains in the current evidence layer. Against that backdrop, edge gateways and embedded devices often operate with limited memory, modest CPUs, restricted energy budgets, and no accelerator support. A model that performs well on a workstation using rich flow features may be impractical once the full preprocessing chain and runtime memory requirement are considered. This helps explain why lightweight traditional ML remains competitive in parts of the literature and why edge AI, TinyML, and AIoT implementation surveys devote sustained attention to quantization, pruning, compact architectures, and hardware-aware optimization [5,6,10,34,65].
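Quantization is one of the techniques these surveys emphasize, and its basic trade-off fits in a few lines. The sketch below applies symmetric per-tensor int8 quantization (each weight stored as an integer q in [-127, 127] with one shared scale) to a hypothetical weight vector. It shows the roughly fourfold storage reduction relative to float32 and the bounded rounding error; it is an illustration of the general technique, not of any toolchain used in the cited studies.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [scale * v for v in q]


# Hypothetical weights of a small detector layer.
weights = [0.82, -1.27, 0.003, 0.5, -0.049]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

# float32 needs 4 bytes per weight; int8 needs 1 byte plus one shared scale.
print("int8 values:", q)
print(f"scale={scale:.5f}  max_abs_error={max_err:.5f}")
print("bytes float32:", 4 * len(weights), " bytes int8:", len(q) + 4)
```

Note that the smallest weight (0.003) rounds to zero: the error is bounded by half the scale, but weights that matter for rare classes can be erased entirely, which is one reason compression should be re-evaluated on minority-class metrics.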

Recent deployment-oriented work sharpens this argument by moving from general edge-AI discussion to concrete resource-fit questions. Studies on FL-TinyML convergence, edge-native ML enablement, TinyML attestation, and TinyML for IIoT increasingly foreground memory ceilings, power budgets, secure update paths, and low-overhead on-device execution as explicit design constraints rather than afterthoughts [70,71,72,73]. Domain-specific architectures for IoMT and trusted smart-building intrusion detection extend the same logic by tying edge placement, privacy-preserving collaboration, and trustworthy execution to the security argument itself [74,75]. In other words, the most useful recent deployment literature does not ask only whether a detector can run near the edge; it asks under what hardware, trust, and lifecycle conditions that claim remains credible.

Real-time operation is a second barrier. Many IoT-edge security tasks matter only if detection is fast enough to support mitigation. Traffic classification performed seconds or minutes after the fact may still inform forensic analysis, but it does not satisfy the needs of industrial control, healthcare alarms, vehicle systems, or time-sensitive cyber–physical loops [3,4]. Lightweight DDoS models, edge-targeted ensemble pipelines, and privacy-preserving compact transformer-style detectors illustrate one response to this pressure [29,30,59]. Even so, runtime-oriented claims still rarely enter the literature in a form that enables direct comparison across studies. Real-time requirements also complicate explainability: analysts may want a clear rationale for an alert, yet per-event explanation can impose non-trivial cost. The relevant question is therefore not whether explanation is useful, but how explanation fidelity and efficiency can be balanced under operational constraints [25,26,54,55,94].

Heterogeneity is a third challenge. IoT-edge systems rarely consist of identical clients. Devices differ in protocol mix, traffic profile, computational power, and attack exposure. This heterogeneity undermines the assumption of identically distributed training data and makes transferability difficult. Federated learning is attractive because it embraces distributed data ownership, but it also exposes problems such as non-IID drift, client imbalance, unstable participation, and heterogeneous local objectives [27,40,43,44,45,46,48,53,93]. Recent resilience studies, personalized FL methods, and decentralized crowdsensing benchmarks all push in the same direction: heterogeneity is not noise around the problem, but part of the problem definition itself [9,60,61]. Recent non-IID intrusion-detection studies sharpen this point further: FD-IDS couples federated training with knowledge distillation under heterogeneous partitions, G-PFL-ID uses both Dirichlet and natural device partitions, and pFedCross treats personalization as necessary for stable intrusion detection under local skew [80,81,82]. Personalized or clustered FL, adaptive aggregation, and communication-efficient designs are therefore becoming more relevant than naive global averaging.
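The Dirichlet partitions mentioned above are a common way to simulate label skew across federated clients. The sketch below draws, for each class, client proportions from a symmetric Dirichlet(alpha) distribution (sampled as normalized Gamma(alpha, 1) draws) and splits that class's sample indices accordingly; smaller alpha values produce more strongly non-IID clients. It is a generic illustration of the technique using hypothetical labels, not the exact partitioning code of any cited study.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with label skew ~ Dirichlet(alpha).

    Smaller alpha gives more skewed (more strongly non-IID) client label mixes.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    clients = [[] for _ in range(num_clients)]
    for lab, idxs in by_class.items():
        rng.shuffle(idxs)
        # Dirichlet sample via normalized Gamma(alpha, 1) draws.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        props = [x / total for x in g] if total > 0 else [1.0 / num_clients] * num_clients
        # Turn cumulative proportions into cut points over this class.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(idxs), int(round(acc * len(idxs)))))
        start = 0
        for c, end in enumerate(cuts + [len(idxs)]):
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Hypothetical imbalanced label vector.
labels = ["benign"] * 60 + ["ddos"] * 30 + ["scan"] * 10
for alpha in (100.0, 0.3):
    parts = dirichlet_partition(labels, num_clients=4, alpha=alpha, seed=1)
    mixes = [{lab: sum(labels[i] == lab for i in p) for lab in ("benign", "ddos", "scan")}
             for p in parts]
    print(f"alpha={alpha}:", mixes)
```

Reporting alpha (or the natural device partition used instead) is what allows two federated IDS results to be compared, since the same model can look stable under near-IID splits and degrade sharply under small alpha.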

Intermittent connectivity and model drift further complicate maintenance. Edge devices can disconnect, change workload, or operate in environments whose benign baseline evolves over time. A model trained once on a public dataset may become unreliable after deployment because normal behavior changes, sensor configurations shift, or attackers adapt. Yet long-horizon maintenance is poorly represented in published evaluations, which are typically offline and static [3,21]. Resilient-IoT and AIoT implementation reviews suggest that lifecycle adaptation, not just pointwise detection, should be treated as part of the deployment problem [9,10]. This is a serious gap because the edge setting makes continual adaptation both more necessary and more difficult.
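Lifecycle adaptation starts with noticing that the benign baseline has moved. The sketch below implements the Page-Hinkley test, a classical sequential detector for an upward shift in the mean of a stream, and applies it to a hypothetical per-window statistic (for example, packets per second on a gateway) that jumps from 10 to 20. The delta and threshold values are illustrative assumptions that would need tuning per deployment; this is a generic example, not a method proposed in the cited reviews.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # tolerated drift magnitude per step
        self.threshold = threshold  # alarm level (lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.cum_min = 0.0

    def update(self, x):
        """Feed one observation; return True if a shift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n       # running mean
        self.cum += x - self.mean - self.delta      # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


# Hypothetical stream: a stable benign baseline, then a level shift.
stream = [10.0] * 200 + [20.0] * 100
detector = PageHinkley(delta=0.005, threshold=50.0)
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
print("first alarm at index", alarms[0] if alarms else None)
```

On an edge device, a detector like this could trigger re-baselining or a model-update request; the trade-off between detection delay and false alarms is governed by the threshold and should be reported with the maintenance strategy.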

Privacy and regulatory requirements constitute another deployment constraint. In healthcare, smart homes, transportation, and industry, raw telemetry can be operationally sensitive or legally protected. Federated learning and privacy-preserving training techniques respond to this need, but they do not eliminate governance challenges around update leakage, auditability, data retention, or trust between participants [40,43,44,45,46]. IoMT-specific reviews and edge-IoT cybersecurity frameworks make clear that these constraints are domain- and governance-specific rather than generic privacy labels [35,41]. In short, privacy-preserving edge AI is not only a modeling problem; it is an orchestration and systems-design problem.

Finally, adversarial robustness remains underdeveloped relative to its importance. IoT-edge cybersecurity studies frequently assume honest training data and benign client behavior even when the deployment scenario clearly includes active attackers. Poisoning, evasion, adversarial perturbation, and explanation gaming are particularly important in collaborative or online learning systems [24,25,26,27]. Targeted defenses are beginning to appear, but robustness testing is not yet a standard evaluation layer. WeiDetect evaluates server-side poisoning defense for federated NIDS, while pFedCross frames robustness under personalization and non-IID data as part of model design rather than as a post hoc stress test [82,83]. Recent trustworthy-cybersecurity and GAN-based IDS surveys suggest that the field is beginning to treat synthetic data generation, attack realism, and model resilience as connected issues rather than isolated topics [94,95]. This is one reason why trustworthy AI has become a useful umbrella concept for the field: it captures the need to move beyond static benchmark accuracy toward resilience, transparency, maintainability, and secure operation.
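The simplest poisoning stress test compares a naive aggregation rule against a robust one when a single client misbehaves. The sketch below contrasts coordinate-wise mean and coordinate-wise median aggregation of hypothetical client updates, one of which is extreme. Coordinate-wise median is a generic robust-aggregation baseline chosen here for illustration; it is not the WeiDetect method, it only tolerates a minority of malicious clients, and subtler poisoning can still evade it.

```python
import statistics


def aggregate_mean(updates):
    """Coordinate-wise mean of client updates (plain averaging)."""
    return [statistics.fmean(col) for col in zip(*updates)]


def aggregate_median(updates):
    """Coordinate-wise median: a simple robust aggregation baseline."""
    return [statistics.median(col) for col in zip(*updates)]


# Hypothetical two-parameter updates from four honest clients.
honest = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.21], [0.11, -0.19]]
poisoned = [[50.0, 50.0]]  # one malicious client sending an extreme update
updates = honest + poisoned

print("mean:  ", [round(v, 3) for v in aggregate_mean(updates)])
print("median:", aggregate_median(updates))
```

A robustness layer in evaluation would report both rules under the same attacker fraction, since a single extreme client shifts the mean by two orders of magnitude here while leaving the median within the range of honest updates.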

The AI model itself should also be treated as part of the IoT-edge attack surface. Traditional ML pipelines can be manipulated through feature spoofing, threshold gaming, or preprocessing mismatch. Deep models can be vulnerable to adversarial perturbations, distribution shift, and overconfident behavior on unseen traffic. Federated learning reduces raw-data centralization but creates new risks through poisoned client updates, model inversion, update leakage, free-riding, and client-selection bias. Graph-based methods can be affected by topology poisoning, Sybil nodes, and unstable graph-window construction. Explainable AI can improve analyst trust, but explanation outputs may leak sensitive feature patterns or provide feedback to adaptive attackers. Compression, pruning, and quantization do not have a uniform security effect: they may remove brittle parameters and reduce attack surface in some cases, but they may also reduce margin, amplify discretization artifacts, or make rare-class behavior less stable. For this reason, an “edge-optimized” model should not be assumed to be more robust than a cloud-scale model unless robustness is explicitly tested under the same deployment tier. Table 10 summarizes this cross-cutting mapping across AI method families and IoT-edge threat types.

Figure 3 summarizes the transition from benchmark-efficient study patterns to the evidence layers required for deployment-ready IoT-edge defense.

9. Research Gaps and Future Directions

The evidence reviewed in Section 6, Section 7 and Section 8 supports eight sharper judgments about the state of the field.

Judgment 1 is that the literature is benchmark-rich in appearance but benchmark-concentrated in practice. The field uses many dataset names, but actual experimentation is concentrated on a narrow subset of public benchmarks. This creates the illusion of comparability while hiding weak cross-dataset generalization. Recent IoMT and decentralized benchmark papers reinforce the need for domain-specific and topology-aware evaluation, while benchmark-heavy optimization studies show how strongly outcomes depend on preprocessing detail [52,61,62,63,64]. Future work should therefore evaluate models across heterogeneous datasets with harmonized feature semantics and clearly documented preprocessing rather than optimizing repeatedly on a single benchmark [3,4,20,21,36,37,38,51,56,96].

Judgment 2 is that most claimed edge-ready defenses remain benchmark-efficient rather than deployment-ready. A large share of published papers invoke edge suitability without reporting hardware-level evidence, communication cost, sustained runtime behavior, or failure under client churn and intermittent connectivity [3,4,5,6,18]. Recent edge AI, resilient-IoT, TinyML, and AIoT implementation surveys all show that hardware-software co-design and operational context matter much earlier in the pipeline than most cyber-defense papers acknowledge [9,10,34,35,65]. Future studies should report concrete device or gateway targets, runtime profiles, energy cost, and system load behavior, and should distinguish clearly between simulation, emulation, and field deployment.

Judgment 3 is that preprocessing and experimental design are still underreported enough to distort comparison. IoT-edge cybersecurity papers often describe model architecture in detail but provide only partial information about class balancing, feature engineering, split strategy, threshold selection, or reproducibility assets. Recent feature-selection, swarm-optimization, and benchmark-centric studies make this problem easy to see because ostensibly small preprocessing decisions can shift both accuracy and ranking among methods [28,62,63,64]. Standardized reporting checklists for dataset handling, feature pipelines, and deployment assumptions would significantly improve the field’s methodological quality.
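Min-max scaling offers a small, concrete case of how a preprocessing choice can distort comparison. The sketch below contrasts fitting the scaling statistics on the training split only with fitting them on the full dataset before splitting. It assumes that rows are already in chronological order, so the split is a simple time-ordered cut. All function names and values are hypothetical.

```python
"""Sketch: leakage-free versus leaky min-max scaling (rows assumed time-ordered)."""


def fit_minmax(rows):
    """Per-column minimum and maximum."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def transform(rows, lo, hi):
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def split_then_scale(rows, train_frac=0.75):
    """Correct order: scaling statistics come from the training split only."""
    cut = int(len(rows) * train_frac)
    train, test = rows[:cut], rows[cut:]
    lo, hi = fit_minmax(train)
    return transform(train, lo, hi), transform(test, lo, hi)


def scale_then_split(rows, train_frac=0.75):
    """Leaky order: scaling statistics also see the test rows."""
    lo, hi = fit_minmax(rows)
    scaled = transform(rows, lo, hi)
    cut = int(len(rows) * train_frac)
    return scaled[:cut], scaled[cut:]


if __name__ == "__main__":
    rows = [[1.0], [2.0], [3.0], [9.0]]  # last row: a traffic burst unseen in training
    print(split_then_scale(rows)[1])  # test value falls outside [0, 1]
    print(scale_then_split(rows)[1])  # leak squeezes it back into [0, 1]
```

In the leakage-free version, the test value falls well outside the training range, which exposes the distribution shift. The leaky version silently rescales that value into the familiar range. Both versions may still yield a publishable score, which is why the split strategy and the order of preprocessing steps need to be disclosed.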

Judgment 4 is that edge-aware evaluation is still missing as a shared norm. Accuracy, precision, recall, and F1-score remain essential, but they are not sufficient for systems deployed in resource-constrained environments. Future benchmarking should require at least a minimal deployment profile containing latency, memory footprint, communication overhead, and energy or compute estimates. Without this, the literature will continue to reward models that are numerically strong but operationally implausible [2,3,4,5,6].
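A minimal deployment profile of the kind described above could be assembled roughly as follows. This is a sketch under several stated assumptions. Latency comes from wall-clock timing on the host. Memory is the peak of Python-level allocations reported by `tracemalloc`, not device RSS. Communication overhead assumes that each client uploads and downloads one full update per round. Energy is an estimate from an assumed average power draw, not a measurement. The field names and the `profile` helper are hypothetical.

```python
"""Sketch of a minimal deployment profile; field names are hypothetical."""
import statistics
import time
import tracemalloc
from dataclasses import dataclass


@dataclass
class DeploymentProfile:
    p50_latency_ms: float
    p95_latency_ms: float
    peak_python_alloc_kib: float  # Python allocations only, not device RSS
    comm_bytes_per_round: int
    est_energy_mj: float  # assumed power x median latency, not measured


def profile(predict, inputs, update_bytes, n_clients, assumed_power_w):
    latencies_ms = []
    for x in inputs:  # timing pass, run without memory tracing overhead
        t0 = time.perf_counter()
        predict(x)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    tracemalloc.start()  # separate memory pass
    for x in inputs:
        predict(x)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    latencies_ms.sort()
    p50 = statistics.median(latencies_ms)
    # Nearest-rank approximation of the 95th percentile.
    p95 = latencies_ms[min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))]
    return DeploymentProfile(
        p50_latency_ms=p50,
        p95_latency_ms=p95,
        peak_python_alloc_kib=peak_bytes / 1024.0,
        # Assumes one full-size upload and one download per client per round.
        comm_bytes_per_round=2 * update_bytes * n_clients,
        # Watts multiplied by milliseconds gives millijoules.
        est_energy_mj=assumed_power_w * p50,
    )


def toy_detector(x, weights=(0.2, -0.1, 0.4)):
    """Hypothetical linear detector standing in for a real model."""
    return sum(w * v for w, v in zip(weights, x)) > 0.0


if __name__ == "__main__":
    print(profile(toy_detector, [[1.0, 2.0, 3.0]] * 200,
                  update_bytes=12_000, n_clients=20, assumed_power_w=1.5))
```

On real hardware, the same fields should be filled from on-device measurements, such as a power meter or vendor profiling tools. The value of fixing the fields in advance is that every study then reports the same minimum set.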

Judgment 5 is that response and mitigation are not yet integrated deeply enough into AI-for-IoT-edge cybersecurity evaluation. Many studies correctly motivate intrusion detection as a way to protect cyber–physical loops, but they stop at the alert-generation stage. Future work should specify the response class triggered by a model output, the delay between observation and action, the false-positive cost of the action, and the fallback behavior when confidence is low or connectivity is degraded. This is particularly important for DDoS filtering, IoMT alarms, industrial control, and transportation systems, where a correct decision that arrives too late may not be operationally useful.
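The four response elements named above can be made explicit in code: a response class, a deadline, a false-positive cost, and a fallback. The sketch below uses a standard expected-cost rule. Blocking is preferred when p·C_FN > (1 − p)·C_FP, which is equivalent to p > C_FP / (C_FP + C_FN). The rule degrades to a local action when the controller is unreachable and escalates to an analyst when confidence is only moderate. The action names, costs, deadline, and thresholds are all hypothetical.

```python
"""Sketch: linking a detector output to a response; all names are hypothetical."""
from dataclasses import dataclass


@dataclass
class Response:
    action: str
    reason: str


def cost_threshold(fp_cost, fn_cost):
    """Blocking has lower expected cost when p * fn_cost > (1 - p) * fp_cost,
    i.e. when p > fp_cost / (fp_cost + fn_cost)."""
    return fp_cost / (fp_cost + fn_cost)


def choose_response(p_attack, controller_reachable, elapsed_ms, fp_cost, fn_cost,
                    deadline_ms=100.0, alert_threshold=0.5):
    if elapsed_ms > deadline_ms:
        # A late decision is kept for forensics but does not trigger enforcement.
        return Response("log_only", "missed response deadline")
    if p_attack > cost_threshold(fp_cost, fn_cost):
        if controller_reachable:
            return Response("block_flow", "expected cost favors blocking")
        # Fallback when the controller or cloud tier cannot be reached.
        return Response("local_rate_limit", "blocking favored, connectivity degraded")
    if p_attack >= alert_threshold:
        return Response("alert_analyst", "uncertain: escalate instead of acting")
    return Response("allow", "low attack probability")


if __name__ == "__main__":
    # Hypothetical costs: a false block (e.g. cutting a medical device) is 9x
    # as costly as a missed alert, so blocking needs p_attack > 0.9.
    for args in [(0.95, True, 20), (0.95, False, 20), (0.95, True, 250), (0.7, True, 20)]:
        print(args, choose_response(*args, fp_cost=9.0, fn_cost=1.0))
```

Writing the policy down in this form makes the reported evaluation checkable. Each response branch can be exercised separately, and the false-positive cost becomes an explicit parameter rather than an implicit assumption.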

A related scope limitation is that authentication, access control, and trust management remain under-coded relative to IDS and anomaly-detection tasks. Future reviews should therefore build dedicated extraction fields for device identity, enrollment, authorization policy, trust scoring, revocation, and cross-domain trust negotiation, rather than treating these tasks as background security context. This limitation narrows the cybersecurity scope implied by the title of the present synthesis. Its strongest evidence supports claims about AI-driven detection and deployment readiness, whereas evidence on non-IDS security tasks forms important but less recoverable layers.

Judgment 6 is that robustness appears to be discussed more often than it is tested. Within the coded subset summarized in Table 5 and Supplementary Table S1, no study yields an explicit robustness-evaluation signal recoverable under the present protocol, and only one yields a partial robustness-centered signal [64]. Adversarial robustness, poisoning resistance, update integrity in federated learning, and explanation stability are therefore still weakly operationalized even though the threat model clearly justifies them [24,25,26,27,40,43,44,45,46]. Direct robustness-oriented studies remain sparse. pFedCross and WeiDetect are notable because they operationalize non-IID robustness and poisoning defense in federated intrusion detection, but they remain exceptions rather than a shared evaluation norm [82,83]. Recent trustworthy-cybersecurity, personalized federated, and GAN-oriented reviews suggest that robustness should be measured not only against perturbed inputs, but also against update corruption, synthetic-data artifacts, and deployment-induced instability [60,94,95]. This is especially important for safety-critical IoT-edge systems where false assurance can be more dangerous than openly limited performance.
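A poisoning stress test for federated aggregation can be quite small. The sketch below measures how far an aggregate drifts from the honest-only mean as identical malicious updates are added. It compares plain averaging with a coordinate-wise median. The coordinate-wise median is a generic robust-aggregation rule swapped in here for illustration. It is neither the Weibull-based WeiDetect defense nor the pFedCross method cited above. All update values are hypothetical.

```python
"""Sketch: poisoning stress test comparing mean and coordinate-wise median aggregation."""
from math import dist
from statistics import median


def fedavg(updates):
    """Equal-weight coordinate-wise mean of client updates."""
    return [sum(col) / len(col) for col in zip(*updates)]


def coordinate_median(updates):
    """Coordinate-wise median, a simple poisoning-resistant aggregation rule."""
    return [median(col) for col in zip(*updates)]


def poisoning_stress_test(honest, malicious_update, max_attackers):
    """For k = 0..max_attackers, return the distance of each aggregate from the honest mean."""
    reference = fedavg(honest)
    rows = []
    for k in range(max_attackers + 1):
        updates = honest + [malicious_update] * k
        rows.append((k, dist(fedavg(updates), reference),
                     dist(coordinate_median(updates), reference)))
    return rows


if __name__ == "__main__":
    honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0]]  # hypothetical updates
    for k, d_mean, d_median in poisoning_stress_test(honest, [100.0, -100.0], 3):
        print(f"attackers={k}  mean drift={d_mean:.2f}  median drift={d_median:.2f}")
```

A single poisoned client is enough to pull the mean far from the honest aggregate, while the median stays close. Once attackers approach half of the clients, however, the median also breaks down. Reporting the full drift curve is therefore more informative than reporting one attack setting.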

Judgment 7 is that explainability appears to be emerging as a design requirement but not yet as an evaluated property. As reflected in Table 5 and Supplementary Table S1, four coded papers provide partial explanation-centered evaluation signals [42,54,55,57]. None of them, however, treats explanation utility beyond visualization as a standardized quantitative endpoint recoverable under the present protocol. Recent work increasingly includes SHAP, LIME, or other XAI-oriented design, but explanation usefulness is still seldom evaluated against criteria such as fidelity, stability, cognitive usefulness, or runtime overhead [25,26,54,55]. Recent IoT, IoMT, and connected-vehicle studies integrate SHAP/LIME into real-time smart-home, multi-class cross-dataset, medical, and federated vehicular IDS pipelines, yet they still emphasize interpretive display over formal utility measurement [84,85,86,87]. Digital-twin cybersecurity work and explainable transformer studies in IoMT environments show that explanation is increasingly expected to support operational and domain-specific interpretation, not just generic feature attribution [42,94]. The field therefore needs shared, edge-specific definitions and measurement procedures for each of these criteria.
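As one example of turning explanation stability into a quantitative endpoint, the sketch below computes the mean pairwise Jaccard overlap of the top-k features across several attribution vectors. These could come from repeated runs, different seeds, or slightly perturbed inputs. The attribution values and feature names are hypothetical, and the metric is one common stability proxy rather than a standard prescribed by the reviewed papers. Fidelity, cognitive usefulness, and runtime overhead would each need separate measurements.

```python
"""Sketch: top-k Jaccard stability of feature attributions (values hypothetical)."""
from itertools import combinations


def top_k(attribution, k):
    """Names of the k features with the largest absolute attribution."""
    ranked = sorted(attribution, key=lambda f: abs(attribution[f]), reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def topk_stability(attributions, k=3):
    """Mean pairwise Jaccard overlap of top-k sets; 1.0 means perfectly stable."""
    if len(attributions) < 2:
        raise ValueError("need at least two attribution vectors")
    sets = [top_k(a, k) for a in attributions]
    pairs = list(combinations(sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    runs = [  # e.g. SHAP values for one alert under three perturbed inputs
        {"pkt_rate": 0.50, "syn_ratio": 0.30, "dst_port": 0.10, "ttl": 0.05},
        {"pkt_rate": 0.40, "syn_ratio": 0.35, "dst_port": 0.12, "ttl": 0.01},
        {"pkt_rate": 0.50, "ttl": 0.30, "syn_ratio": 0.20, "dst_port": 0.00},
    ]
    print(round(topk_stability(runs, k=2), 3))
```

For the three runs shown, two runs agree on the top-2 features and the third swaps one of them. The resulting score is therefore 5/9, not 1.0. A single number of this kind can be reported next to F1-score, whereas a SHAP plot cannot.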

Judgment 8 is that cross-domain transfer is now a more important bottleneck than yet another classifier variant. A deployable IoT-edge cyber defense system should not depend on the peculiarities of one dataset, one lab topology, or one application domain. Domain-specific IoMT datasets, decentralized crowdsensing benchmarks, and personalized federated IDS methods all indicate that cross-domain transfer will remain weak unless data topology and application semantics are made explicit parts of the evaluation design [52,60,61]. Direct non-IID studies reinforce this conclusion: FD-IDS evaluates knowledge distillation under heterogeneous client partitions, G-PFL-ID tests both Dirichlet and natural device-level skew, and pFedCross explicitly treats local distribution shift as a central design constraint [80,81,82]. Dataset-centric federated benchmarking now points in the same direction by showing that transfer can collapse when models are moved across nominally related IoT environments without harmonized labels, feature spaces, or attack coverage [77]. The next stage of the field is therefore unlikely to be won by a single better classifier; it will be driven by trustworthy AI, privacy-preserving collaboration, lightweight explainable defense, and reproducible benchmarking infrastructure.
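Dirichlet label skew, one of the non-IID settings mentioned above, is commonly simulated by drawing per-class client proportions from a symmetric Dirichlet distribution. The sketch below implements that common recipe using only the standard library. It samples the Dirichlet distribution via normalized Gamma draws. It is not necessarily the exact partitioning protocol used by G-PFL-ID or the other cited studies, and the labels in the example are hypothetical.

```python
"""Sketch: Dirichlet label-skew partitioning for non-IID federated experiments."""
import random
from collections import Counter


def sample_dirichlet(alpha, n, rng):
    """Symmetric Dirichlet sample via normalized Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / n] * n
    return [d / total for d in draws]


def dirichlet_label_partition(labels, n_clients, alpha, seed=0):
    """Assign every sample index to exactly one client with per-class Dirichlet
    proportions. Smaller alpha gives stronger label skew; large alpha approaches IID."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients = [[] for _ in range(n_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        props = sample_dirichlet(alpha, n_clients, rng)
        bounds, acc = [0], 0.0
        for p in props[:-1]:  # cumulative proportions become cut points
            acc += p
            bounds.append(min(len(idxs), int(round(acc * len(idxs)))))
        bounds.append(len(idxs))
        for c in range(n_clients):
            clients[c].extend(idxs[bounds[c]:bounds[c + 1]])
    return clients


if __name__ == "__main__":
    labels = ["benign"] * 60 + ["ddos"] * 30 + ["scan"] * 10  # hypothetical labels
    for c, part in enumerate(dirichlet_label_partition(labels, n_clients=4, alpha=0.3, seed=7)):
        print(c, dict(Counter(labels[i] for i in part)))
```

Reporting alpha and the seed, together with the resulting per-client label histograms, makes a non-IID claim reproducible. Natural device-level skew, the other setting mentioned above, instead requires partitioning by real device identifiers.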

Future benchmarking in IoT-edge cybersecurity should adopt the minimum reporting protocol summarized in Table 8: (i) transparent preprocessing and feature-pipeline disclosure, (ii) cross-dataset validation wherever feasible, (iii) latency and memory reporting, (iv) communication-overhead reporting for federated or distributed settings, (v) energy or compute estimates where direct power measurement is unavailable, (vi) mitigation-action linkage for real-time claims, and (vii) code or artifact release where institutional constraints permit. Such a protocol would help distinguish benchmark-efficient systems from deployment-ready ones and would make edge-aware evaluation more comparable across studies.
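The seven-item protocol can also be written as a machine-checkable checklist, for example to screen submissions or coded papers. In the sketch below, the field names are hypothetical paraphrases of items (i) to (vii). Items (iv) and (vi) are treated as conditional on the study being distributed or making real-time claims, following the wording above. Items (ii) and (vii), which carry "wherever feasible" and "where permitted" qualifiers, are flagged but left to the reviewer's judgment.

```python
"""Sketch: the seven-item minimum reporting protocol as a checklist (field names hypothetical)."""
from dataclasses import dataclass, fields


@dataclass
class ReportingChecklist:
    preprocessing_disclosed: bool          # (i)
    cross_dataset_validation: bool         # (ii) wherever feasible
    latency_and_memory_reported: bool      # (iii)
    communication_overhead_reported: bool  # (iv) federated or distributed settings
    energy_or_compute_estimate: bool       # (v)
    mitigation_linkage: bool               # (vi) real-time claims
    artifacts_released: bool               # (vii) where institutional constraints permit


def missing_items(report, distributed, real_time_claim):
    """Names of unmet items, skipping those that do not apply to this study."""
    exempt = set()
    if not distributed:
        exempt.add("communication_overhead_reported")
    if not real_time_claim:
        exempt.add("mitigation_linkage")
    return [f.name for f in fields(report)
            if f.name not in exempt and not getattr(report, f.name)]


if __name__ == "__main__":
    paper = ReportingChecklist(True, False, True, False, True, False, True)
    print(missing_items(paper, distributed=False, real_time_claim=True))
```

A structured record like this could also serve as an extraction template in future structured reviews, so that not-reported signals are coded consistently across papers.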

This structured review also has explicit methodological boundaries. It reconstructs a workspace-verifiable corpus and uses paper-level coding, but it does not provide a formal retrieval-stage log because raw database export records were not preserved. The 26-paper coded subset is a conservative gap-mapping layer rather than a statistical sample for meta-analysis or prevalence estimation. Similarly, not-reported signals in Table 5 and Supplementary Table S1 should be interpreted as limits of consistently recoverable paper-level evidence under the present extraction protocol, not as absolute claims that such measurements are absent from all full-text discussions. Accordingly, conclusions about deployment readiness should be read as claims about what is consistently recoverable and comparable from the present corpus, not as estimates of field-wide prevalence. These boundaries narrow the interpretation of the synthesis while preserving its value as a transparent structured evidence review.

10. Conclusions

The recent literature shows that AI is now firmly embedded in the cybersecurity agenda for IoT-edge systems. Traditional machine learning remains useful for lightweight detection, deep learning expands pattern-extraction capacity, and federated, explainable, graph-based, and privacy-preserving approaches continue to widen the design space. At the same time, experimentation remains concentrated around a small set of public benchmarks, and evaluation practice still gives stronger weight to detection performance than to sustained deployment realism. Because the coded empirical subset is purposively sampled and the raw retrieval log is unavailable, these conclusions should not be read as statistical prevalence estimates for the whole field. They summarize recoverable reporting signals from a transparent corpus and use those signals to identify where cross-study comparison is currently strongest or weakest.

The main conclusion of this review is that the field is no longer limited primarily by classifier design. Its more persistent constraints are benchmark concentration, weak deployment reporting, insufficient response-and-mitigation specification, and the absence of reproducible edge-aware evaluation. Most claimed edge-ready defenses therefore remain benchmark-efficient rather than deployment-ready. The revised evidence synthesis indicates that credible deployment claims should include not only detection scores but also the hardware tier, latency, memory, energy or compute burden, communication overhead, the triggered mitigation action, robustness stress tests, and reproducibility assets. It also clarifies that the current evidence base remains IDS-dominated. Authentication, access control, and trust management therefore bound the cybersecurity scope of this review and should be treated as dedicated future evidence-mapping tasks, rather than assumed to be covered by intrusion-detection results. The most credible path forward is the convergence of trustworthy AI and edge-aware cyber defense through privacy-preserving edge AI, lightweight and interpretable models, stronger robustness testing, cross-dataset validation, and more consistent reporting of runtime feasibility. Progress on these fronts would move the field beyond paper-level accuracy gains toward security systems that are genuinely deployable in the heterogeneous, resource-constrained, and safety-sensitive environments that define IoT-edge computing.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/electronics15112409/s1. Supplementary Table S1, Included-Study Matrix for the Coded Empirical Subset, in which Panel A reports the descriptive study fields for the 26-paper coded subset and Panel B reports the coded signals used to derive the aggregate counts summarized in Table 5 and discussed in Judgment 5 and Judgment 6; and Supplementary Table S2, Benchmark and Deployment Anchor Studies Retained in the Contextual Non-Coded Layer, which itemizes the dataset-originating, domain-sensitive benchmark, and deployment-framing papers used to thicken the evidence base of Section 5, Section 6, Section 7 and Section 8 without entering the conservative coding layer.

Author Contributions

Conceptualization, Q.X. and P.X.; methodology, Q.X. and P.X.; investigation, P.X.; formal analysis, P.X.; writing—original draft preparation, P.X.; writing—review and editing, Q.X., Z.W. and H.M.; visualization, P.X.; supervision, Q.X., Z.W. and H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new dataset was created in this structured review. Supplementary Table S1 provides the paper-level coding matrix used in the analysis, including the descriptive fields in Panel A and the coded evidence signals in Panel B. Supplementary Table S2 documents the benchmark and deployment anchor papers retained in the contextual non-coded layer. All bibliographic sources are listed in the reference section.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Al-Fuqaha, A.; Guizani, M.; Mohammadi, M.; Aledhari, M.; Ayyash, M. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications. IEEE Commun. Surv. Tutor. 2015, 17, 2347–2376.
2. Zhukabayeva, T.; Zholshiyeva, L.; Karabayev, N.; Khan, S.; Alnazzawi, N. Cybersecurity Solutions for Industrial Internet of Things–Edge Computing Integration: Challenges, Threats, and Future Directions. Sensors 2025, 25, 213.
3. Sallam, S.; El Barachi, M.; Li, N. Intrusion Detection on the Internet of Things: A Comprehensive Review and Gap Analysis Toward Real-Time, Lightweight, Adaptive, and Autonomous Security. IoT 2026, 7, 16.
4. Batista, E.P.; Santos, A.; Peixoto, M.; Figueiredo, G.; Prazeres, C. Edge AI for SD-IoT: A Systematic Review on Scalability and Latency. IoT 2026, 7, 23.
5. Cordova-Cardenas, R.; Amor, D.; Gutiérrez, Á. Edge AI in Practice: A Survey and Deployment Framework for Neural Networks on Embedded Systems. Electronics 2025, 14, 4877.
6. Pazmiño Ortiz, L.A.; Maldonado Soliz, I.F.; Guevara Balarezo, V.K. Advancing TinyML in IoT: A Holistic System-Level Perspective for Resource-Constrained AI. Future Internet 2025, 17, 257.
7. Tariq, U.; Ahmed, I.; Bashir, A.K.; Shaukat, K. A Critical Cybersecurity Analysis and Future Research Directions for the Internet of Things: A Comprehensive Review. Sensors 2023, 23, 4117.
8. Aouedi, O.; Vu, T.-H.; Sacco, A.; Nguyen, D.C.; Piamrat, K.; Marchetto, G.; Pham, Q.-V. A Survey on Intelligent Internet of Things: Applications, Security, Privacy, and Future Directions. IEEE Commun. Surv. Tutor. 2025, 27, 1238–1292.
9. Alotaibi, B. A Review of Resilient IoT Systems: Trends, Challenges, and Future Directions. Appl. Sci. 2026, 16, 2079.
10. Hou, K.M.; Diao, X.; Shi, H.; Ding, H.; Zhou, H.; de Vaulx, C. Trends and Challenges in AIoT/IIoT/IoT Implementation. Sensors 2023, 23, 5074.
11. Abdullahi, M.; Baashar, Y.; Alhussian, H.; Alwadain, A.; Aziz, N.; Capretz, L.F.; Abdulkadir, S.J. Detecting Cybersecurity Attacks in Internet of Things Using Artificial Intelligence Methods: A Systematic Literature Review. Electronics 2022, 11, 198.
12. Arisdakessian, S.; Wahab, O.A.; Mourad, A.; Otrok, H.; Guizani, M. A Survey on IoT Intrusion Detection: Federated Learning, Game Theory, Social Psychology, and Explainable AI as Future Directions. IEEE Internet Things J. 2023, 10, 4059–4092.
13. Gyamfi, E.; Jurcut, A. Intrusion Detection in Internet of Things Systems: A Review on Design Approaches Leveraging Multi-Access Edge Computing, Machine Learning, and Datasets. Sensors 2022, 22, 3744.
14. Ferrag, M.A.; Friha, O.; Maglaras, L.; Janicke, H.; Shu, L. Federated Deep Learning for Cyber Security in the Internet of Things: Concepts, Applications, and Experimental Analysis. IEEE Access 2021, 9, 138509–138542.
15. Negera, W.G.; Schwenker, F.; Debelee, T.G.; Melaku, H.M.; Ayano, Y.M. Review of Botnet Attack Detection in SDN-Enabled IoT Using Machine Learning. Sensors 2022, 22, 9837.
16. Manivannan, D. Recent endeavors in machine learning-powered intrusion detection systems for the Internet of Things. J. Netw. Comput. Appl. 2024, 229, 103925.
17. Ghaffari, A.; Jelodari, N.; Pouralish, S.; Derakhshanfard, N.; Arasteh, B. Securing internet of things using machine and deep learning methods: A survey. Clust. Comput. 2024, 27, 9065–9089.
18. Mallidi, S.K.R.; Ramisetty, R.R. Advancements in training and deployment strategies for AI-based intrusion detection systems in IoT: A systematic literature review. Discov. Internet Things 2025, 5, 8.
19. Alfahaid, A.; Alalwany, E.; Almars, A.M.; Alharbi, F.; Atlam, E.; Mahgoub, I. Machine Learning-Based Security Solutions for IoT Networks: A Comprehensive Survey. Sensors 2025, 25, 3341.
20. Bankó, M.B.; Dyszewski, S.; Králová, M.; Limpek, M.B.; Papaioannou, M.; Choudhary, G.; Dragoni, N. Advancements in Machine Learning-Based Intrusion Detection in IoT: Research Trends and Challenges. Algorithms 2025, 18, 209.
21. Villafranca, A.; Thant, K.M.; Tasic, I.; Cano, M.-D. AI-Enabled IoT Intrusion Detection: Unified Conceptual Framework and Research Roadmap. Mach. Learn. Knowl. Extr. 2025, 7, 115.
22. Meziane, H.; Ouerdi, N. A survey on performance evaluation of artificial intelligence algorithms for improving IoT security systems. Sci. Rep. 2023, 13, 21255.
23. Fährmann, D.; Martín, L.; Sánchez, L.; Damer, N. Anomaly Detection in Smart Environments: A Comprehensive Survey. IEEE Access 2024, 12, 64006–64049.
24. Ferrag, M.A.; Friha, O.; Kantarci, B.; Tihanyi, N.; Cordeiro, L.; Debbah, M.; Hamouda, D.; Al-Hawawreh, M.; Choo, K.-K.R. Edge Learning for 6G-Enabled Internet of Things: A Comprehensive Survey of Vulnerabilities, Datasets, and Defenses. IEEE Commun. Surv. Tutor. 2023, 25, 2654–2713.
25. Zhang, Z.; Hamadi, H.A.; Damiani, E.; Yeun, C.Y.; Taher, F. Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research. IEEE Access 2022, 10, 93104–93139.
26. Moustafa, N.; Koroniotis, N.; Keshk, M.; Zomaya, A.Y.; Tari, Z. Explainable Intrusion Detection for Cyber Defences in the Internet of Things: Opportunities and Solutions. IEEE Commun. Surv. Tutor. 2023, 25, 1775–1807.
27. Agrawal, S.; Sarkar, S.; Aouedi, O.; Yenduri, G.; Piamrat, K.; Alazab, M.; Bhattacharya, S.; Maddikunta, P.K.R.; Gadekallu, T.R. Federated Learning for intrusion detection system: Concepts, challenges and future directions. Comput. Commun. 2022, 195, 346–361.
28. Li, J.; Othman, M.S.; Chen, H.; Yusuf, L.M. Optimizing IoT intrusion detection system: Feature selection versus feature extraction in machine learning. J. Big Data 2024, 11, 36.
29. Aldaej, A.; Ullah, I.; Ahanger, T.A.; Atiquzzaman, M. Ensemble technique of intrusion detection for IoT-edge platform. Sci. Rep. 2024, 14, 11703.
30. Ferrag, M.A.; Ndhlovu, M.; Tihanyi, N.; Cordeiro, L.C.; Debbah, M.; Lestable, T.; Thandi, N.S. Revolutionizing Cyber Threat Detection with Large Language Models: A Privacy-Preserving BERT-Based Lightweight Model for IoT/IIoT Devices. IEEE Access 2024, 12, 23733–23750.
31. Mishra, N.; Pandya, S. Internet of Things Applications, Security Challenges, Attacks, Intrusion Detection, and Future Visions: A Systematic Review. IEEE Access 2021, 9, 59353–59377.
32. U, V.M.; Babu Kumaravelu, V.; C, V.K.; A, R.; Chinnadurai, S.; Venkatesan, R.; Hai, H.; Selvaprabhu, P. AI-Powered IoT: A Survey on Integrating Artificial Intelligence with IoT for Enhanced Security, Efficiency, and Smart Applications. IEEE Access 2025, 13, 50296–50339.
33. Kuzlu, M.; Fair, C.; Guler, O. Role of Artificial Intelligence in the Internet of Things (IoT) cybersecurity. Discov. Internet Things 2021, 1, 7.
34. Singh, R.; Gill, S.S. Edge AI: A survey. Internet Things Cyber-Phys. Syst. 2023, 3, 71–92.
35. Reyes-Acosta, R.E.; Mendoza-González, R.; Oswaldo Diaz, E.; Vargas Martin, M.; Luna Rosas, F.J.; Martínez Romo, J.C.; Mendoza-González, A. Cybersecurity Conceptual Framework Applied to Edge Computing and Internet of Things Environments. Electronics 2025, 14, 2109.
36. Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning. IEEE Access 2022, 10, 40281–40306.
37. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-BaIoT—Network-Based Detection of IoT Botnet Attacks Using Deep Autoencoders. IEEE Pervasive Comput. 2018, 17, 12–22.
38. Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment. Sensors 2023, 23, 5941.
39. Cherfi, S.; Boulaiche, A.; Lemouari, A. Exploring the ALNS method for improved cybersecurity: A deep learning approach for attack detection in IoT and IIoT environments. Internet Things 2024, 28, 101421.
40. Zhou, H.; Zou, H.; Li, W.; Li, D.; Kuang, Y. HiViT-IDS: An Efficient Network Intrusion Detection Method Based on Vision Transformer. Sensors 2025, 25, 1752.
41. Ahmed, S.F.; Alam, M.S.B.; Afrin, S.; Rafa, S.J.; Rafa, N.; Gandomi, A.H. Insights into Internet of Medical Things (IoMT): Data fusion, security issues and potential solutions. Inf. Fusion 2024, 102, 102060.
42. Kalakoti, R.; Nõmm, S.; Bahsi, H. Explainable Transformer-based Intrusion Detection in Internet of Medical Things (IoMT) Networks. In Proceedings of the 2024 International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 18–20 December 2024; pp. 1164–1169.
43. Begum, K.; Mozumder, M.A.I.; Joo, M.-I.; Kim, H.-C. BFLIDS: Blockchain-Driven Federated Learning for Intrusion Detection in IoMT Networks. Sensors 2024, 24, 4591.
44. Deshmukh, A.; de la Rosa, P.E.; Rodriguez, R.V.; Dasari, S. Enhancing Privacy in IoT-Enabled Digital Infrastructure: Evaluating Federated Learning for Intrusion and Fraud Detection. Sensors 2025, 25, 3043.
45. Mankotia, S.; Conte de Leon, D.; Rimal, B.P. FedPrIDS: Privacy-Preserving Federated Learning for Collaborative Network Intrusion Detection in IoT. J. Cybersecur. Priv. 2026, 6, 10.
46. Khraisat, A.; Alazab, A.; Alazab, M.; Obeidat, A.; Singh, S.; Jan, T. Federated learning for intrusion detection in IoT environments: A privacy-preserving strategy. Discov. Internet Things 2025, 5, 72.
47. Naghib, A.; Gharehchopogh, F.S.; Zamanifar, A. A comprehensive and systematic literature review on intrusion detection systems in the internet of medical things: Current status, challenges, and opportunities. Artif. Intell. Rev. 2025, 58, 157727–157760.
48. Fenanir, S.; Semchedine, F. Smart Intrusion Detection in IoT Edge Computing Using Federated Learning. Rev. D’Intell. Artif. 2023, 37, 1133–1145.
49. Bhavsar, M.H.; Bekele, Y.B.; Roy, K.; Kelly, J.C.; Limbrick, D. FL-IDS: Federated Learning-Based Intrusion Detection System Using Edge Devices for Transportation IoT. IEEE Access 2024, 12, 52215–52226.
50. Javeed, D.; Gao, T.; Saeed, M.S.; Kumar, P. An Intrusion Detection System for Edge-Envisioned Smart Agriculture in Extreme Environment. IEEE Internet Things J. 2024, 11, 26866–26876.
51. Alsaedi, A.; Moustafa, N.; Tari, Z.; Mahmood, A.; Anwar, A. TON_IoT Telemetry Dataset: A New Generation Dataset of IoT and IIoT for Data-Driven Intrusion Detection Systems. IEEE Access 2020, 8, 165130–165150.
52. Doménech, J.; León, O.; Siddiqui, M.S.; Pegueroles, J. Evaluating and enhancing intrusion detection systems in IoMT: The importance of domain-specific datasets. Internet Things 2025, 32, 101631.
53. Chahal, A.; Gulia, P.; Gill, N.S.; Rani, D. Design of a federated ensemble model for intrusion detection in distributed IIoT networks for enhancing cybersecurity. J. Ind. Inf. Integr. 2025, 44, 100800.
54. Gaspar, D.; Silva, P.; Silva, C. Explainable AI for Intrusion Detection Systems: LIME and SHAP Applicability on Multi-Layer Perceptron. IEEE Access 2024, 12, 30164–30175.
55. Alabbadi, A.; Bajaber, F. An Intrusion Detection System over the IoT Data Streams Using eXplainable Artificial Intelligence (XAI). Sensors 2025, 25, 847.
56. Capuano, N.; Carletti, V.; Foggia, P.; Rosa, F.; Vento, M. Graph neural networks for IoT security: A comparative study. Internet Things 2026, 36, 101863.
57. Le, T.-T.-H.; Wardhani, R.W.; Putranto, D.S.C.; Jo, U.; Kim, H. Toward Enhanced Attack Detection and Explanation in Intrusion Detection System-Based IoT Environment Data. IEEE Access 2023, 11, 131661–131676.
58. Bhavsar, M.; Roy, K.; Kelly, J.; Olusola, O. Anomaly-based intrusion detection system for IoT application. Discov. Internet Things 2023, 3, 5.
59. Javid, I.; Khara, S.; Frnda, J.; Khanday, S.A.; Wani, N.A.; Bedi, J.; Anwar, M.S. NIDD-enabled lightweight intrusion detection for effective DDoS mitigation in 5G and beyond. Sci. Rep. 2025, 15, 42207.
60. Singh, G.; Sood, K.; Rajalakshmi, P.; Xiang, Y. Sentinel: Dynamic Knowledge Distillation for Personalized Federated Intrusion Detection in Heterogeneous IoT Networks. IEEE Internet Things J. 2026, 13, 14682–14694.
61. Feng, C.; Huertas Celdrán, A.; Han, J.; Ren, H.; Cheng, X.; Zeng, Z.; Krauter, L.; Bovet, G.; Stiller, B. A crowdsensing intrusion detection dataset for decentralized federated learning models. Sci. Data 2026, 13, 796.
62. Alzubi, O.A.; Alzubi, J.A.; Qiqieh, I.; Al-Zoubi, A.M. An IoT Intrusion Detection Approach Based on Salp Swarm and Artificial Neural Network. Int. J. Netw. Manag. 2025, 35, e2296.
63. Qaddos, A.; Yaseen, M.U.; Al-Shamayleh, A.S.; Imran, M.; Akhunzada, A.; Alharthi, S.Z. A novel intrusion detection framework for optimizing IoT security. Sci. Rep. 2024, 14, 21789.
64. Kaushik, S.; Bhardwaj, A.; Almogren, A.; Bharany, S.; Altameem, A.; Rehman, A.U.; Hussen, S.; Hamam, H. Robust machine learning based Intrusion detection system using simple statistical techniques in feature selection. Sci. Rep. 2025, 15, 3970.
65. Heydari, S.; Mahmoud, Q.H. Tiny Machine Learning and On-Device Inference: A Survey of Applications, Challenges, and Future Directions. Sensors 2025, 25, 3191.
66. Goldschmidt, P.; Chudá, D. Network intrusion datasets: A survey, limitations, and recommendations. Comput. Secur. 2025, 156, 104510.
67. Wu, J.; Wang, Y. TriHID: Towards verifiable domain adaptation-based IoT intrusion detection in heterogeneous environment. Expert Syst. Appl. 2026, 298, 129543.
68. Lakshan Yasarathna, T.; Le-Khac, N.-A. ASEADOS-SDN-IoT: A novel SDN-IoT network intrusion detection dataset and framework. Internet Things 2026, 36, 101891.
69. Das, V.; Nair, B.B. A novel multi-stage attack dataset for smart home intrusion detection. Data Brief 2026, 66, 112770.
70. Ramadan, M.N.; Ali, M.A.; Khoo, S.Y.; Alkhedher, M. Federated learning and TinyML on IoT edge devices: Challenges, advances, and future directions. ICT Express 2025, 11, 754–768.
71. Kumari, S.; Tulshyan, V.; Tewari, H. Cyber Security on the Edge: Efficient Enabling of Machine Learning on IoT Devices. Information 2024, 15, 126.
72. Baciu, V.-E.; Braeken, A.; Segers, L.; Silva, B.d. Secure Tiny Machine Learning on Edge Devices: A Lightweight Dual Attestation Mechanism for Machine Learning. Future Internet 2025, 17, 85.
73. Alharthi, S.; Rashid, M.; Aljabri, M. TinyML in Industrial IoT: A Systematic Review of Applications, System Components, and Methodologies. Sensors 2026, 26, 2550.
74. Hernandez-Jaimes, M.L.; Martinez-Cruz, A.; Ramírez-Gutiérrez, K.A.; Feregrino-Uribe, C. Artificial intelligence for IoMT security: A review of intrusion detection systems, attacks, datasets and Cloud–Fog–Edge architectures. Internet Things 2023, 23, 100887.
75. Garroppo, R.G.; Giardina, P.G.; Landi, G.; Ruta, M. Trustworthy AI and Federated Learning for Intrusion Detection in 6G-Connected Smart Buildings. Future Internet 2025, 17, 191.
76. Areia, J.; Bispo, I.A.; Santos, L.; Costa, R.L.d.C. IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things. IEEE Access 2024, 12, 115370–115385.
77. Bilal, M.A.; Ul Islam, I.; Idrees, S.; Qasim, M.; Khan, M.J.; Khan, J. Dataset-centric evaluation of federated intrusion detection models in IoT networks. Sci. Rep. 2026, 16, 2683.
78. Heidari, A.; Rastegar, S.H.; Khonsari, A. FedIoV: A secure and adaptive federated framework for real-time intrusion detection in vehicular networks. Future Gener. Comput. Syst. 2026, 181, 108448.
79. Nguyen, T.D.; Alazab, A.; Khraisat, A.; Jan, T. Feature reduction in federated learning for intrusion detection in IoT networks. Cybersecurity 2026, 9, 102.
80. Peng, H.; Wu, C.; Xiao, Y. FD-IDS: Federated Learning with Knowledge Distillation for Intrusion Detection in Non-IID IoT Environments. Sensors 2025, 25, 4309.
81. Oladele, D.A.; Ige, A.; Agbo-Ajala, O.; Ekundayo, O.; Thottempudi, S.G.; Sibiya, M.; Mnkandla, E. G-PFL-ID: Graph-Driven Personalized Federated Learning for Unsupervised Intrusion Detection in Non-IID IoT Systems. IoT 2026, 7, 13.
82. Sun, S.; Zhou, L.; Wang, Z.; Han, L. Robust intrusion detection based on personalized federated learning for IoT environment. Comput. Secur. 2025, 154, 104442.
83. Sameera, K.M.; Vinod, P.; Rocha, A.; Rafidha Rehiman, K.A.; Conti, M. WeiDetect: Weibull distribution-based defense against poisoning attacks in federated learning for network intrusion detection systems. J. Inf. Secur. Appl. 2025, 95, 104275.
84. Hulayyil, S.B.; Li, S.; Saxena, N. Explainable AI-based intrusion detection in IoT systems. Internet Things 2025, 31, 101589.
85. Sadhwani, S.; Navare, A.; Mohan, A.; Muthalagu, R.; Pawar, P.M. IoT-based intrusion detection system using explainable multi-class deep learning approaches. Comput. Electr. Eng. 2025, 123, 110256.
86. Turgut, Z.; Başarslan, M.S. XBiDeep: A novel explainable artificial intelligence based intrusion detection system for Internet of Medical Things environment. Internet Things 2025, 33, 101675.
87. Taheri, R.; Jafari, R.; Gegov, A.; Arabikhan, F.; Ichtev, A. Explainable AI for Federated Learning-Based Intrusion Detection Systems in Connected Vehicles. Electronics 2025, 14, 4508.
88. Waqas, A.; Khan, S.D.; Ullah, Z.; Ullah, M.; Ullah, H. Comparative Analysis of Deep Learning Models for Intrusion Detection in IoT Networks. Computers 2025, 14, 283.
89. Alserhani, F. Intrusion Detection and Real-Time Adaptive Security in Medical IoT Using a Cyber-Physical System Design. Sensors 2025, 25, 4720.
90. Pecherle, G.D.; Győrödi, R.Ș.; Győrödi, C.A. Federated Learning-Based Intrusion Detection in Industrial IoT Networks. Future Internet 2025, 18, 2.
91. Anwer, R.W.; Abrar, M.; Ullah, M.; Salam, A.; Ullah, F. Advanced intrusion detection in the industrial Internet of Things using federated learning and LSTM models. Ad Hoc Netw. 2025, 178, 103991.
92. Tahsien, S.M.; Karimipour, H.; Spachos, P. Machine learning based solutions for security of Internet of Things (IoT): A survey. J. Netw. Comput. Appl. 2020, 161, 102630.
93. Khraisat, A.; Alazab, A.; Singh, S.; Jan, T., Jr.; Gomez, A. Survey on Federated Learning for Intrusion Detection System: Concept, Architectures, Aggregation Strategies, Challenges, and Future Directions. ACM Comput. Surv. 2025, 57, 1–38.
94. Sarker, I.H.; Janicke, H.; Mohsin, A.; Gill, A.; Maglaras, L. Explainable AI for cybersecurity automation, intelligence and trustworthiness in digital twin: Methods, taxonomy, challenges and prospects. ICT Express 2024, 10, 935–958.
95. Alauthman, M.; Aslam, N.; Al-Qerem, A.; Aldweesh, A.; Sureephong, P. Generative Adversarial Networks for Intrusion Detection Systems: A Comprehensive Survey of Applications, Challenges, and Research Directions. Arab. J. Sci. Eng. 2026, 51, 179–203.
96. Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J. Survey of intrusion detection systems: Techniques, datasets and challenges. Cybersecurity 2019, 2, 20.

Figure 1. Review framework linking threats, method families, datasets and settings, evaluation regimes, deployment reality, and evidence-based gaps. Dashed feedback arrows indicate that gap analysis should inform method selection and benchmark design, rather than remaining only as end-of-pipeline observations.

Figure 2. Two-dimensional taxonomy of IoT-edge cybersecurity studies, crossing AI method families with deployment objectives. Cell shading and in-cell labels mark dominant (D), frequent (F), or occasional (O) alignments between method families and deployment priorities.

Figure 3. Conceptual bottleneck map from benchmark-efficient studies to deployment-ready IoT-edge defense. The illustration synthesizes the recurring reporting and deployment barriers identified in this review and is intended as a visual summary rather than as a quantitative result figure.

Table 1. Study-by-study comparison with representative prior reviews.

Review	Main Research Boundary	Retrieval or Corpus Orientation	Dataset Analysis	Deployment Metrics	Mitigation/AI Vulnerability	Distinction of This Review
Abdullahi et al. [11]	AI methods for IoT cyberattack detection	Systematic IoT security review	Dataset summary	Limited resource operationalization	Mainly detection	This review narrows the boundary to IoT-edge deployment evidence.
Gyamfi and Jurcut [13]	IoT IDS using MEC, ML, and datasets	IDS design review	Strong IDS dataset attention	MEC discussed, but not hardware-tiered	Detection centered	This review treats deployment feasibility as a coded evidence layer.
Mallidi and Ramisetty [18]	Training and deployment strategies for AI-based IoT IDS	Deployment-aware IDS review	Dataset discussion tied to IDS training	Deployment discussed without a minimum checklist	IDS deployment	This review formalizes minimum reportable deployment fields.
Banko et al. [20]	ML-based IoT IDS trends and challenges	Trend-oriented IDS review	Benchmark and imbalance awareness	Partial deployment critique	Detection/evaluation	This review separates benchmark concentration, deployment realism, and mitigation.
Villafranca et al. [21]	AI-enabled IoT IDS framework and roadmap	Conceptual roadmap	Moderate dataset treatment	Roadmap-level rather than coded evidence	IDS roadmap	This review adds paper-level evidence coding and a decision matrix.
Ferrag et al. [24]	Edge learning vulnerabilities, datasets, and defenses	Broad edge-learning survey	Strong vulnerability/dataset framing	Strong systems framing	Explicit model vulnerabilities	This review maps AI families to IoT-edge threats within cybersecurity deployment evidence.
Singh and Gill [34]	Edge AI deployment broadly	Edge-AI systems survey	Not cybersecurity-dataset centered	Strong resource orientation	Not cyber-defense centered	This review applies edge-AI deployment realism to cybersecurity evaluation.
This review	AI for cybersecurity in IoT-edge systems	96-reference corpus plus 26-paper coded subset	Benchmark concentration, imbalance, domain datasets, transfer	Checklist, hardware tiers, latency, memory, energy, communication	Response/mitigation and AI-method vulnerability	Novelty lies in evidence-weighted deployment synthesis, not another model catalogue.

Table 2. Minimal workspace-verifiable corpus and coding counts.

Corpus Layer	Count	Interpretation
Bibliographically verified working corpus reconstructed in the current workspace	96	Verified references supporting the present manuscript
Contextual non-coded layer retained for structured synthesis	70	Review, benchmark, framework, and deployment papers used for framing, taxonomy, and synthesis without entering the conservative coding layer
Conservative coded empirical subset	26	Representative empirical studies used for the later gap-coding snapshot and fully itemized in Supplementary Table S1

Table 3. Two-dimensional taxonomy of method families and deployment objectives.

Method Family	Typical Models	Typical Security Tasks	Dominant Deployment Objective	Common Tradeoff	Edge Maturity
Traditional ML	SVM, RF, XGBoost, KNN, LR	Intrusion detection, anomaly detection, malware and botnet classification	Lightweight execution	Lower model complexity but stronger dependence on feature engineering	Mature for gateway and edge-server use
Deep learning	CNN, RNN, LSTM, GRU, autoencoder, transformer-like	IDS, anomaly detection, multi-class traffic classification	Accuracy or representation quality	Better pattern extraction at the cost of compute and memory	Mixed; often needs compression or careful placement
Federated learning	FedAvg variants, personalized FL, secure aggregation	Privacy-preserving IDS, collaborative detection across sites	Privacy preservation and communication-aware collaboration	Reduces raw-data sharing but introduces orchestration and update cost	Emerging; strongest at gateway or multi-edge level
Graph-based learning	GCN, GAT, temporal GNN	Relational attack detection, node or flow classification	Structural fidelity	Captures topology better but raises graph-construction and scaling issues	Early-stage for practical edge use
Explainable or trustworthy AI	SHAP, LIME, interpretable pipelines, privacy-preserving hybrid models	Analyst-facing IDS, regulated domains, privacy-sensitive collaboration	Interpretability or robustness	Improves trust but may add runtime, implementation, or evaluation burden	Moderate when explanation is selective rather than continuous

Table 4. Frequently used and emerging datasets and benchmark issues.

Dataset	Domain	Typical Task	Distinguishing Feature	Public or Custom	Common Issue
TON_IoT [51]	IoT and IIoT telemetry	Intrusion and anomaly detection	Multi-source telemetry and logs	Public	Requires careful preprocessing; details are not always reported
Edge-IIoTset [36]	IoT and IIoT traffic	Multi-class IDS in centralized and FL settings	Explicit centralized and federated framing	Public	Overused in comparative studies; generalization rarely checked
CICIoT2023 [38]	Large-scale IoT traffic	Binary and fine-grained attack classification	Real-time-oriented design and many classes	Public	Large size and class imbalance complicate fair comparison
N-BaIoT [37]	IoT botnet traffic	Botnet detection and anomaly detection	Device-specific botnet traces	Public	Narrow attack family coverage and older traffic conditions
TriHID [67]	Heterogeneous IoT intrusion detection	Domain-adaptation-aware IDS evaluation	Designed for heterogeneous transfer evaluation across environments	Public	Too new for broad reuse; transfer protocols need careful replication
IoMT domain-specific datasets [52]	Healthcare and medical IoT	IDS and anomaly detection	Better alignment to medical workflows and device heterogeneity	Mixed public and curated	Limited transferability to general IoT settings
IoMT-TrafficData [76]	Healthcare and medical IoT	Domain-specific IDS benchmarking	Provides healthcare-centered traffic and benchmarking tools	Public	Narrower domain scope than general IoT benchmarks
Crowdsensing FL dataset [61]	Decentralized edge and mobile sensing	Federated intrusion detection	Built to expose topology and client-partition effects	Public	Too new for broad comparative reuse
Dataset-centric FL benchmarking [77]	Federated IoT and IIoT intrusion detection	Cross-environment federated IDS evaluation	Harmonizes multiple modern datasets for transfer- and communication-aware benchmarking	Public benchmark suite	Comparison depends on label harmonization and feature-space alignment
Custom edge testbeds [39,48,49,50,53]	Transport, agriculture, IIoT	Federated or edge IDS	Higher contextual realism	Mostly custom	Small scale, inconsistent release of code or raw data
Benchmark-heavy optimization studies [62,63,64]	Mixed IoT datasets	Feature optimization and framework comparison	Show how preprocessing changes ranking	Mixed	Hard to compare when preprocessing is underspecified
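Table 4 notes that generalization is rarely checked and that transfer comparisons depend on feature-space alignment. A minimal Python sketch of a cross-dataset protocol, assuming a user-supplied evaluator, is shown below: it restricts comparison to features shared by all schemas and scores every ordered train/test pair of distinct datasets. The function names, the evaluator signature, and the toy column names are hypothetical; the real schemas of these benchmarks differ and usually need an explicit mapping rather than a plain name intersection.

```python
from itertools import permutations
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical evaluator: fit on `train_name`, score on `test_name`, and
# return one number such as macro-F1. Label harmonization and feature
# alignment are assumed to happen inside this callable.
Evaluator = Callable[[str, str], float]


def shared_features(schemas: Dict[str, Set[str]]) -> List[str]:
    """Return the feature names present in every dataset schema."""
    return sorted(set.intersection(*schemas.values()))


def cross_dataset_grid(
    datasets: List[str], evaluate: Evaluator
) -> Dict[Tuple[str, str], float]:
    """Score every ordered (train, test) pair of distinct datasets."""
    return {
        (train_name, test_name): evaluate(train_name, test_name)
        for train_name, test_name in permutations(datasets, 2)
    }


if __name__ == "__main__":
    # Toy schemas with hypothetical column names, not the real dataset schemas.
    schemas = {
        "TON_IoT": {"duration", "src_bytes", "dst_bytes", "proto"},
        "CICIoT2023": {"duration", "src_bytes", "proto", "rate"},
    }
    print(shared_features(schemas))  # ['duration', 'proto', 'src_bytes']

    def placeholder_evaluate(train_name: str, test_name: str) -> float:
        # Stand-in only; replace with a real train/test routine.
        return 0.0

    names = ["TON_IoT", "Edge-IIoTset", "CICIoT2023", "N-BaIoT"]
    grid = cross_dataset_grid(names, placeholder_evaluate)
    print(len(grid))  # 12 ordered pairs for 4 datasets
```

Ordered pairs are used deliberately: training on one benchmark and testing on another is a different transfer task from the reverse direction, so both should be reported.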

Table 5. Conservative gap-coding snapshot over representative empirical studies.

Signal	Count (n = 26)	Interpretation
Explicit edge or deployment evidence	5	Clear edge anchoring is a minority pattern
Explicit or partial edge relevance	12	More than half of the coded studies still do not make edge evidence concrete
No explicit edge evidence	14	Edge framing is often rhetorical rather than operational
Partial or full real-device or gateway grounding	4	Hardware-level validation is uncommon
Cross-dataset validation	5	Generalization is tested far less often than single-benchmark accuracy
Reusable code or artifact signal	1	Reproducibility assets are rare in the current corpus
Explicit latency reporting in the extractable evidence layer	0	Runtime feasibility remains opaque
Explicit memory reporting in the extractable evidence layer	0	Resource-fit claims are difficult to verify
Explicit energy reporting in the extractable evidence layer	0	Battery and remote-node feasibility are largely unevidenced
Explicit communication-overhead reporting in the extractable evidence layer	0	Distributed and federated cost is usually underspecified
Explicit robustness-evaluation signal	0	Robustness is rarely operationalized as a measured endpoint
Partial robustness-centered signal	1	Robustness is more often claimed than systematically tested
Explicit explanation-utility evaluation beyond visualization	0	XAI is seldom treated as a standardized outcome
Partial explanation-centered evaluation signal	4	Explanation appears in the literature, but rarely with formal utility criteria

Table 6. Evidence-stratified profile of the 26-paper coded empirical subset.

Dimension	Dominant Coded Categories	Recoverable Count	Interpretation for Evidence Weighting
Publication year	2023; 2024; 2025; 2026	3; 10; 9; 4	The coded subset is intentionally recent, with most empirical evidence concentrated in 2024–2025 and a smaller number of 2026 papers available by the review cutoff.
Security task	Intrusion/anomaly detection; botnet or malware; DDoS/malicious traffic; privacy-preserving collaborative IDS	20; 3; 3; 9	The evidence base is IDS-heavy. Mitigation, authentication, access control, and trust management remain underrepresented relative to the breadth of the cybersecurity title.
Model family	Traditional ML/optimization; deep or transformer-style learning; federated/distributed learning; explainable/trustworthy AI; graph-based learning	5; 8; 9; 4; 1	The field is methodologically diverse, but most families are still evaluated through detection benchmarks rather than deployment protocols. Counts are non-exclusive because several studies combine families.
Dataset layer	CICIoT2023; Edge-IIoTset; TON_IoT; N-BaIoT; domain-specific or custom datasets; cross-dataset validation	6; 6; 5; 3; 6; 5	Benchmark concentration remains visible even after including recent domain-specific and federated studies. Cross-dataset evidence is present but still a minority pattern.
Application scenario	Generic IoT IDS; IIoT/industrial; IoMT/healthcare; transportation; smart agriculture; decentralized/federated sensing	15; 5; 4; 1; 1; 3	The corpus contains domain signals, but generic traffic-based IDS remains dominant. This weakens direct transfer from benchmark results to safety-critical verticals.
Metric family	Accuracy/precision/recall/F1; class-wise or imbalance-aware reporting; ROC-AUC; latency; memory/model size; energy; communication overhead	26; 9; 8; 0; 0; 0; 0	Detection metrics are mature, whereas deployment-quality metrics are not consistently recoverable. This justifies treating deployment reporting as a separate review contribution rather than a minor evaluation detail.
Methodological quality signal	Preprocessing detail; cross-dataset validation; real-device/gateway grounding; reusable artifact/code	12 partial or explicit; 5; 4; 1	Reproducibility and hardware grounding remain sparse. The strongest evidence comes from papers that expose preprocessing, benchmark transfer, or deployment substrate, not only final accuracy.

Note: Counts refer to explicit paper-level signals recoverable under a consistent extraction protocol. They should be interpreted as indicators of reporting recoverability and cross-paper comparability, not as absolute claims that no such measurements exist anywhere in full-text discussions. This distinction matters because non-standardized reporting directly limits reproducibility and cross-study evaluation.
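Because several rows of Table 6 are non-exclusive (the model-family counts, for example, sum to 27 across 26 papers), a tally has to count each paper once per category it was coded under, not once per row. The Python sketch below illustrates that counting rule under stated assumptions: the record layout and the three toy papers are hypothetical and do not reproduce the review's coding data, which is itemized in Supplementary Table S1.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical coded records (NOT the review's data): each paper maps a
# coding dimension to the set of categories assigned to it.
CodedPaper = Dict[str, Set[str]]

papers: List[CodedPaper] = [
    {"model_family": {"federated", "deep"}},  # e.g., FL with a deep client model
    {"model_family": {"traditional_ml"}},
    {"model_family": {"federated", "explainable"}},
]


def tally(records: List[CodedPaper], dimension: str) -> Counter:
    """Count papers per category; one paper may add to several categories."""
    counts: Counter = Counter()
    for record in records:
        # A set guarantees each category is counted at most once per paper.
        counts.update(record.get(dimension, set()))
    return counts


counts = tally(papers, "model_family")
print(counts["federated"])  # 2
print(sum(counts.values()), "category hits across", len(papers), "papers")  # 5 category hits across 3 papers
```

The total exceeding the number of papers is expected, which is why non-exclusive rows read as per-category coverage rather than as a partition of the coded subset.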

Table 7. Evaluation metrics and deployment indicators.

Metric	Category	Meaning	Why It Matters in IoT-Edge Settings
Accuracy	Detection quality	Overall fraction of correct predictions	Easy to report but can hide imbalance problems
Precision	Detection quality	Share of predicted attacks that are correct	High precision limits false alarms on constrained systems
Recall	Detection quality	Share of actual attacks that are detected	Critical for safety-sensitive edge scenarios
F1-score	Detection quality	Harmonic mean of precision and recall	Less misleading than accuracy under class imbalance
ROC-AUC	Detection quality	Ranking quality across thresholds	Useful for threshold analysis but not sufficient alone
False positive rate	Detection quality	Frequency of benign traffic flagged as attack	High FPR is costly in bandwidth- and compute-limited systems
Latency	Deployment quality	Time from observation to decision	Determines real-time usefulness
Memory footprint	Deployment quality	RAM or storage required by the model	Key for gateways, MCUs, and low-cost devices
Energy consumption	Deployment quality	Power cost of inference or communication	Important for battery-powered and remote nodes
Communication overhead	Deployment quality	Bytes, rounds, or bandwidth consumed	Central to federated or distributed learning
Inference cost	Deployment quality	Aggregate compute burden at runtime	Links model choice to edge feasibility
Edge-device feasibility	Deployment quality	Whether the method fits target hardware	The most direct bridge between lab results and deployment
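To illustrate why Table 7 warns that accuracy can hide imbalance, the Python sketch below computes the threshold-based detection metrics from a single confusion matrix. The split is hypothetical (10,000 flows with 100 attacks). A detector that never raises an alarm reaches 0.99 accuracy with zero recall, whereas a detector that catches 80 attacks scores lower accuracy (0.978) but far higher recall and F1. ROC-AUC is omitted because it requires per-sample scores across thresholds rather than one confusion matrix.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # attacks correctly flagged
    fp: int  # benign samples flagged as attacks (false alarms)
    tn: int  # benign samples correctly passed
    fn: int  # attacks missed


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def detection_metrics(c: Confusion) -> Dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "fpr": _ratio(c.fp, c.fp + c.tn),
    }


# Hypothetical test split: 10,000 flows, of which 100 are attacks.
silent = detection_metrics(Confusion(tp=0, fp=0, tn=9900, fn=100))
useful = detection_metrics(Confusion(tp=80, fp=200, tn=9700, fn=20))
print({k: round(v, 3) for k, v in silent.items()})  # accuracy 0.99, recall 0.0, f1 0.0
print({k: round(v, 3) for k, v in useful.items()})  # accuracy 0.978, recall 0.8, f1 0.421
```

The second detector also shows the false-alarm cost that Table 7 associates with constrained systems: 200 benign flows flagged for 80 true detections gives a precision of only about 0.29.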

Table 8. Minimum deployment reporting checklist for comparable IoT-edge AI cybersecurity studies.

Reporting Item	Minimum Required Detail	Comparable Unit	Why This Is Necessary
Target placement	Sensor/MCU, gateway, edge server, cloud-assisted edge, or multi-edge FL	Named hardware tier and execution location	Prevents treating a workstation experiment as an edge deployment claim.
Hardware profile	CPU/MCU type, RAM, storage, accelerator availability, operating system	Device model or resource range	Allows readers to judge whether the model fits constrained or gateway-class hardware.
Latency	Inference latency and end-to-end observation-to-action latency	ms/sample, ms/flow, or ms/window; batch size stated	Separates offline detection from real-time mitigation capability.
Memory and model footprint	Model size, peak RAM, feature-buffer size, and preprocessing memory	MB or KB	Captures the full runtime pipeline, not only stored model weights.
Energy or compute burden	Power draw, energy per inference, CPU utilization, or FLOPs/MACs when direct power is unavailable	mJ/inference, W, %, FLOPs/MACs	Supports battery and remote-node feasibility claims.
Communication overhead	Update size, number of rounds, synchronization frequency, or bandwidth per decision/training cycle	bytes, MB/round, rounds, or bandwidth	Essential for FL, distributed IDS, and intermittent connectivity settings.
Robustness and drift	Cross-dataset validation, non-IID split, drift scenario, adversarial or poisoning test, or client-churn test	Reported scenario and stress-test metric	Shows whether performance survives realistic IoT-edge variation.
Mitigation linkage	Whether model output triggers blocking, throttling, quarantine, re-authentication, escalation, or only logging	Action class and response delay	Connects detection results to cyber–physical response requirements.
Reproducibility asset	Code, feature extraction script, configuration, seed, data split, or trained model	Artifact availability and URL/DOI if public	Makes benchmark comparison auditable and repeatable.
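One way to make the Table 8 checklist auditable is to record it as a structured object and list the items a study leaves empty. The Python sketch below assumes hypothetical field names that abbreviate the table rows (for instance, the memory row is reduced to peak RAM and model size); it is an illustration, not a schema proposed by the review. Items that genuinely do not apply, such as communication overhead for a single-node detector, would be recorded explicitly (for example as "n/a") so that non-applicability stays distinguishable from non-reporting.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DeploymentReport:
    """Hypothetical record mirroring the rows of Table 8 (abbreviated)."""
    target_placement: Optional[str] = None        # e.g., "gateway" or "multi-edge FL"
    hardware_profile: Optional[str] = None        # CPU/MCU, RAM, accelerator, OS
    latency_ms_per_sample: Optional[float] = None
    batch_size: Optional[int] = None              # needed to interpret latency
    peak_ram_mb: Optional[float] = None
    model_size_mb: Optional[float] = None
    energy_or_compute: Optional[str] = None       # e.g., mJ/inference or FLOPs
    communication_overhead: Optional[str] = None  # "n/a" if single-node
    robustness_scenario: Optional[str] = None     # e.g., drift, non-IID split, poisoning test
    mitigation_action: Optional[str] = None       # e.g., blocking, throttling, logging only
    artifact_url: Optional[str] = None            # code, config, splits, or model


def missing_items(report: DeploymentReport) -> List[str]:
    """Names of checklist fields that are still None (i.e., unreported)."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


# Hypothetical partial report with illustrative values only.
report = DeploymentReport(
    target_placement="gateway", latency_ms_per_sample=2.5, batch_size=1
)
missing = missing_items(report)
print(len(missing), "of", len(fields(report)), "items unreported")  # 8 of 11 items unreported
print(missing[:3])  # ['hardware_profile', 'peak_ram_mb', 'model_size_mb']
```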

Table 9. Deployment decision matrix linking edge constraints to AI-family choices.

Deployment Tier	Typical Constraint	Preferred AI Family	Avoid or Offload	Minimum Evidence Expected
Sensor or MCU-level TinyML	Very small RAM/storage, tight energy budget, local sampling, intermittent connectivity	Feature-sparse traditional ML, tiny autoencoders, quantized or pruned compact networks	Full transformers, large ensembles, continuous XAI, full local FL training	On-device memory, energy or compute estimate, preprocessing footprint, and latency under realistic sampling.
Constrained gateway	Memory below roughly 512 MB, latency target below roughly 10–50 ms for filtering or local alarms	Traditional ML, shallow ensembles, compact CNN/AE models, selective explanation	Large transformer-like IDS and expensive per-event SHAP unless batched or offloaded	ms/flow or ms/window, peak RAM, model size, false-positive cost, and mitigation action.
Edge server or industrial gateway	GB-level memory, stable power, local aggregation, multi-protocol traffic	Tree ensembles, CNN-LSTM, compact transformers, GNNs for topology-aware monitoring	Claims of real-time actuation without end-to-end timing	Hardware profile, throughput, batch size, queueing delay, cross-dataset or domain-transfer evidence.
Multi-edge or federated deployment	Data locality, non-IID clients, communication budget, client churn	Federated learning, personalized FL, distillation-assisted FL, secure aggregation	Single global FedAvg claims without client heterogeneity or communication accounting	Client partition rule, rounds, update size, communication cost, non-IID split, poisoning or churn robustness.
Cloud-assisted edge analytics	Local detection with heavier retraining or global correlation in the cloud	Hybrid edge-cloud pipelines, selective offloading, periodic retraining	Opaque offloading that hides latency or bandwidth cost	Placement diagram, offload frequency, bandwidth, fallback behavior, and privacy boundary.
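The decision matrix can also be read as a simple selection rule. The Python sketch below encodes four of the five tiers in Table 9 (the cloud-assisted tier is omitted) and selects one from available RAM and a data-locality flag. The 512 MB gateway bound comes from the table; the 1 MB microcontroller cutoff, the tier keys, and the `Site` fields are hypothetical. A rule based on RAM alone ignores latency, energy, and connectivity, so the "Minimum Evidence Expected" column still applies to whichever tier it returns.

```python
from dataclasses import dataclass
from typing import Dict, List

# Preferred AI families per tier, abbreviated from Table 9.
PREFERRED: Dict[str, List[str]] = {
    "sensor_mcu_tinyml": ["feature-sparse traditional ML", "tiny autoencoders",
                          "quantized or pruned compact networks"],
    "constrained_gateway": ["traditional ML", "shallow ensembles",
                            "compact CNN/AE models", "selective explanation"],
    "edge_server": ["tree ensembles", "CNN-LSTM", "compact transformers", "GNNs"],
    "multi_edge_federated": ["federated learning", "personalized FL",
                             "distillation-assisted FL", "secure aggregation"],
}

MCU_RAM_LIMIT_MB = 1.0        # hypothetical cutoff; Table 9 only says "very small RAM"
GATEWAY_RAM_LIMIT_MB = 512.0  # "memory below roughly 512 MB" in Table 9


@dataclass
class Site:
    ram_mb: float
    data_must_stay_local: bool = False  # raw traffic cannot be pooled centrally


def choose_tier(site: Site) -> str:
    """Pick a deployment tier from RAM and data locality only (simplified)."""
    if site.ram_mb < MCU_RAM_LIMIT_MB:
        # Table 9 advises against full local FL training at this tier.
        return "sensor_mcu_tinyml"
    if site.data_must_stay_local:
        return "multi_edge_federated"
    if site.ram_mb < GATEWAY_RAM_LIMIT_MB:
        return "constrained_gateway"
    return "edge_server"


for site in [Site(0.25), Site(256), Site(4096), Site(4096, data_must_stay_local=True)]:
    tier = choose_tier(site)
    print(site.ram_mb, tier, PREFERRED[tier][0])
```

The microcontroller check runs before the data-locality check because the table lists full local FL training among the options to avoid or offload at that tier.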

Table 10. Cross-cutting matrix of AI method families and IoT-edge threat types.

AI Family	DDoS/Traffic Flood	Botnet/Mirai Variants	False Data Injection	Sybil/Spoofing	Poisoning/Evasion	Practical Interpretation
Traditional ML	Suitable for fast gateway filtering	Suitable when features are stable	Limited without domain features	Limited unless identity features exist	Sensitive to feature manipulation	Best for low-latency baselines, but feature pipeline and threshold behavior must be reported.
Deep learning	Suitable for temporal or high-dimensional traces	Strong when trained on diverse variants	Potentially useful with sensor/time-series context	Limited without graph or identity modeling	Sensitive to adversarial and distribution-shift effects	Useful for representation learning, but requires latency, memory, and robustness evidence.
Federated learning	Useful across sites or fleets	Useful when device data cannot be pooled	Useful for privacy-sensitive domains	Vulnerable if clients are malicious	High exposure to poisoning and update leakage	Appropriate when data locality matters, but communication and malicious-client assumptions must be explicit.
Graph-based learning	Useful for propagation and topology patterns	Useful for command-and-control structure	Useful where physical or network topology is meaningful	Promising but vulnerable to topology poisoning	Sensitive to graph construction and node injection	Best when relational structure is central; graph-window and scalability choices require reporting.
XAI/trustworthy AI	Helps explain alert drivers	Helps analyst validation	Supports safety-critical interpretation	Can expose identity or feature cues	Explanations can be gamed or leak signals	Useful as an audit layer, but explanation fidelity, runtime overhead, and leakage risk should be evaluated.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xue, Q.; Xue, P.; Wang, Z.; Ma, H. Artificial Intelligence for Cybersecurity in IoT-Edge Systems: A Structured Review of Methods, Datasets, Evaluation, and Deployment Challenges. Electronics 2026, 15, 2409. https://doi.org/10.3390/electronics15112409

AMA Style

Xue Q, Xue P, Wang Z, Ma H. Artificial Intelligence for Cybersecurity in IoT-Edge Systems: A Structured Review of Methods, Datasets, Evaluation, and Deployment Challenges. Electronics. 2026; 15(11):2409. https://doi.org/10.3390/electronics15112409

Chicago/Turabian Style

Xue, Qingshui, Pandong Xue, Zhimin Wang, and Haifeng Ma. 2026. "Artificial Intelligence for Cybersecurity in IoT-Edge Systems: A Structured Review of Methods, Datasets, Evaluation, and Deployment Challenges" Electronics 15, no. 11: 2409. https://doi.org/10.3390/electronics15112409

APA Style

Xue, Q., Xue, P., Wang, Z., & Ma, H. (2026). Artificial Intelligence for Cybersecurity in IoT-Edge Systems: A Structured Review of Methods, Datasets, Evaluation, and Deployment Challenges. Electronics, 15(11), 2409. https://doi.org/10.3390/electronics15112409

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
