Article

Enhancing IoT Security with Generative AI: Threat Detection and Countermeasure Design

1 Department of Computer Science and Media Technology, Malmö University, 205 06 Malmö, Sweden
2 Sustainable Digitalisation Research Centre, Malmö University, 205 06 Malmö, Sweden
* Author to whom correspondence should be addressed.
Electronics 2026, 15(1), 92; https://doi.org/10.3390/electronics15010092
Submission received: 18 November 2025 / Revised: 17 December 2025 / Accepted: 22 December 2025 / Published: 24 December 2025

Abstract

The rapid proliferation of Internet of Things (IoT) devices has increased the attack surface for cyber threats. Traditional intrusion detection systems often struggle to keep pace with novel or evolving threats. This study proposes an end-to-end generative AI-based intrusion detection and response pipeline designed for automated threat mitigation in smart home IoT environments. It leverages a Variational Autoencoder (VAE) trained on benign traffic to flag anomalies, a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model to classify anomalies into five categories (C&C, DDoS, Okiru, PortScan, and benign), and Grok 3, a large language model, to generate tailored countermeasure recommendations. Using the Aposemat IoT-23 dataset, the VAE model achieves a recall of 0.999 and a precision of 0.961 for anomaly detection. The BERT model achieves an overall accuracy of 99.90%, with per-class F1 scores exceeding 0.99. An end-to-end prototype simulation involving 10,000 network traffic samples demonstrates 98% accuracy in identifying cyber attacks and generating countermeasures to mitigate them. The pipeline integrates generative models for improved detection and automated security policy formulation in IoT settings, enabling quicker, actionable security responses to cyber threats targeting smart home environments.

1. Introduction

The Internet of Things (IoT) is a network of interconnected physical devices, ranging from smart home appliances and wearable gadgets to industrial machines and vehicles, embedded with sensors, software, and connectivity. These smart objects collect, share, and analyze data, enabling automation and intelligent decision-making across various domains. Smart homes, as one of the domains of IoT, refer to residential environments equipped with interconnected IoT devices for automation [1]. Smart home ecosystems encompass the broader network of devices and services and have transformed daily living through seamless automation, remote control, and monitoring [2]. However, this convenience comes at the cost of adding numerous endpoints to smart home networks, the underlying infrastructure facilitating communication among devices. The increased complexity can lead to a lack of robust security safeguards [1,3]. Cybercriminals can exploit these devices to compromise privacy and safety [1,4,5]. Recent reports estimate that over 60% of smart home devices harbor exploitable firmware flaws or default credentials, making them prime targets for botnets, data exfiltration, and lateral movement attacks [6]. Traditional intrusion detection systems (IDS) often rely on static signature databases or behavioral rules, which rapidly become obsolete against polymorphic or zero-day threats [2]. They are frequently inadequate due to the evolving nature of cyber threats and the diverse range of IoT devices, necessitating the need for effective cyber threat detection and mitigation strategies [2,7].
Generative AI models offer a powerful alternative by learning compact representations of normal system behavior and generating human-readable insights for mitigation [8]. Variational Autoencoders (VAEs) can compress high-dimensional network telemetry into low-dimensional latent spaces, enabling real-time anomaly scoring without explicit attack signatures [7,9]. Meanwhile, transformer-based large language models (LLMs), such as Bidirectional Encoder Representations from Transformers (BERT), excel at classification tasks once fine-tuned on labeled security data [10]. Models like Grok 3 can translate data on detected anomalies into contextualized, actionable countermeasures [11].
This study builds upon the research framework and initial findings presented by Oacheșu [12], and presents a three-stage AI pipeline for smart home security: (1) a VAE for unsupervised detection of anomalies in IoT traffic; (2) a BERT classifier that confirms and categorizes these anomalies with higher accuracy; and (3) an LLM that generates tailored mitigation strategies for each threat. Using the Aposemat IoT-23 dataset, the research shows promising detection performance, high classification fidelity, and policy-ready recommendations for IoT-based smart home network defense.
The structure of the paper is as follows: Section 2 reviews IoT security and related work on intrusion detection systems and AI approaches to identify research gaps in this domain. Section 3 outlines the methodology, including the Aposemat IoT-23 dataset preprocessing, VAE for anomaly detection, BERT for threat classification, and Grok 3 for countermeasure generation, alongside a simulation setup. Section 4 presents the results, discusses limitations, and addresses validity threats. Section 5 concludes with key findings and future research directions. To the authors’ knowledge, this is the first paper proposing a pipeline that integrates VAE, BERT, and a generative LLM to both detect and respond to IoT smart home-specific threats.

2. Related Work

2.1. Security Challenges in Smart Home IoT

The proliferation of IoT devices in smart homes has increased their vulnerability to attacks, including botnets, Distributed Denial of Service (DDoS) attacks, and data breaches [5]. These devices, often resource-constrained, employ diverse communication protocols, which complicates security efforts [13]. Traditional IDSs, which rely on predefined attack signatures, perform poorly in detecting zero-day exploits and adapting to the dynamic IoT threat landscape [2]. The inadequacy of static signature-based systems in IoT contexts is highlighted by numerous existing papers, which note high false-negative rates against novel attacks [14,15]. Additionally, the scalability and adaptability required for smart home security motivate a shift toward learning-based approaches [3,16].
In real-world applications, smart home ecosystems function as Cyber–Physical Systems (CPS), where digital computation interacts with physical processes. Security in this domain often intersects with the field of discrete event systems (DES), which provides a formal abstraction for modeling system behaviors under threat. Recent work in this field has focused on the synthesis of resilient supervisors capable of maintaining system safety even when attackers modify sensor information or manipulate actuators to force the system into unsafe states [17]. Understanding these theoretical foundations is critical for developing robust defense mechanisms that prevent physical hazards in smart home environments.

2.2. Intrusion Detection Systems (IDS) and Anomaly Detection

Traditional IDSs fall into three categories: signature-based, anomaly-based, and specification-based. A signature-based IDS detects suspicious network traffic by comparing it against a database of attack signatures, effectively handling known threats but struggling with zero-day and polymorphic attacks [5,14,18]. An anomaly-based IDS, also known as a behavioral-based IDS, learns only the normal behavior from data and flags behaviors that deviate from the norm. While this approach can detect novel zero-day attacks, it suffers from high false-positive rates and requires large amounts of data to define what constitutes normal behavior [2]. Furthermore, anomaly detection approaches cannot provide information about the type of anomaly detected [19]. A specification-based IDS, also known as a behavioral-rule-based IDS, uses predefined normal behavior profiles for devices. Clear rules help reduce false positives [20], but reliance on predefined behaviors limits adaptability to new threats [2]. Conversely, hybrid approaches blend signature and anomaly detection to balance accuracy and false alarms but often need significant labeled data and computational resources, which can be challenging in resource-limited IoT settings [21].

2.3. Attack-Specific Countermeasures

This study focuses on four common cyber attacks in IoT smart homes: Command and Control (C&C), Distributed Denial of Service (DDoS), Okiru, and Horizontal Port Scan. Referencing the Aposemat IoT-23 dataset, these attacks exploit vulnerabilities in interconnected devices, threatening security, functionality, and user privacy [22,23]. For instance, C&C attacks enable compromised devices to enroll in botnets; detection relies on identifying persistent C2 channels and blocking known malicious IPs [24]. A DDoS attack lets an attacker flood an IoT network with massive amounts of malicious traffic from multiple compromised devices; typical mitigation includes rate limiting, traffic filtering, and network segmentation [25,26]. Okiru malware is a Trojan backdoor targeting IoT devices, including routers and IP web cameras; countermeasures involve firmware hardening, disabling unused services, and anomaly-based triggers [27,28]. A horizontal port scan probes a single port across multiple IP addresses on a network, rather than multiple ports on a single host; defenses include dynamic firewall rules, IDS rules, and geo-IP filtering [4,23].
While these countermeasures reflect best practices, they are typically reactive and static, lacking adaptability [27,29].

2.4. AI in IoT Security

While attackers can use artificial intelligence (AI) to exploit IoT ecosystems, such as through adversarial machine learning (ML) to evade detection [7,30], AI—particularly ML—has emerged as a promising solution for securing IoT ecosystems [2,5]. Supervised ML models, such as Random Forests and Support Vector Machines, have been widely applied to classify malicious activities in network traffic [2,31]. However, their dependence on labeled datasets, often unavailable in IoT settings, limits their practicality [13]. To overcome this, unsupervised learning techniques like clustering and autoencoders have gained traction [32].
Unsupervised learning uses normal traffic patterns to detect anomalies. Generative AI models like VAEs and GANs improve cybersecurity by modeling complex data distributions [33], aiding in anomaly detection, dimensionality reduction, and generating synthetic data to tackle data imbalance or scarcity [7,34]. VAEs, in particular, excel at anomaly detection by learning to reconstruct normal data and flagging deviations with high reconstruction errors [35]. According to Vadisetty and Polamarasetti [8], a generative AI-enhanced IDS outperformed traditional methods, achieving a 15% improvement in malware detection accuracy and a 6% reduction in false positive rate in their study.
Beyond anomaly detection, LLMs such as BERT have been adapted for security tasks [10,36]. They can be used for analyzing security logs and threat classification, although their full usability in cybersecurity has not been fully explored [37]. Generative AI can not only detect but also interpret and categorize threats, provide context, and develop understandable countermeasures. However, there are currently few IoT-specific implementations available.
Integrating advanced AI, such as Generative AI, into existing cybersecurity frameworks presents significant challenges due to high computing resource demands [8]. Centralized control techniques face scaling challenges as the number of nodes increases, leading to a decrease in overall efficiency [21]. These studies highlight AI’s potential in IoT security while also revealing its limitations in scalability, adaptability, and resource efficiency.

2.5. Countermeasure Generation

Generating effective countermeasures is as critical as detecting the anomalies or classifying the threats [20,21]. Traditional countermeasure systems rely on rule-based or expert-defined responses, which lack flexibility in dynamic environments [2,3,14]. For instance, a rule-based framework for IoT security that mitigates known threats can fail against new attack vectors [8,20]. To address this, adaptive approaches have been investigated, such as a reinforcement learning model to generate countermeasures for network security, dynamically adjusting responses based on attack patterns. However, this requires significant computational resources [2,7].
The use of generative AI for countermeasure design remains largely unexplored, particularly in smart homes where real-time, lightweight responses are essential. Current research lacks adaptive, automated countermeasures.

2.6. Research Gap and Motivation

AI-based IDS solutions currently face challenges with resource demands and with adapting to IoT environments. While using VAEs for anomaly detection shows promise, such approaches often lack classification and mitigation capabilities. Although LLMs could provide adaptive responses, few systems successfully integrate these elements.
These gaps are addressed with a generative AI approach in this study:
  • VAE-based Anomaly Detection: Learns benign traffic and flags novel patterns without labeled attack data.
  • BERT-based Threat Classification: Refines anomaly flags into specific threats, reducing false positives and enabling contextual interpretation.
  • LLM-driven Mitigation (Grok): Produces actionable, tailored countermeasures to address a specific cyber threat.
This work proposes an end-to-end pipeline for IoT environments with specific focus on smart home security using unsupervised learning, transformer-based classification, and generative AI-based mitigation strategies.

3. Methodology

3.1. Overview of the Proposed Pipeline

The proposed system is a generative AI-based pipeline designed to enhance smart home IoT security by enabling near real-time threat detection and automated countermeasure generation. It comprises three core components: a VAE model for unsupervised anomaly detection, a BERT model for threat classification, and Grok for generating adaptive responses [38]. This architecture addresses the limitations of traditional IDSs by leveraging the generative and contextual capabilities of generative AI to handle the dynamic and heterogeneous nature of IoT environments.
The data flow through the pipeline is sequential and can be summarized as follows (see Figure 1):
  • Input Processing: Raw network traffic from smart home IoT devices (from the Aposemat IoT-23 dataset) is preprocessed to extract relevant features (e.g., resp_bytes, duration, proto) and to handle missing or inconsistent data.
  • Anomaly Detection: The VAE is trained exclusively on benign samples to learn normal traffic patterns. During inference, it computes reconstruction errors and flags samples exceeding a threshold as anomalous.
  • Threat Classification: Detected anomalies are passed to the BERT model, which classifies them into five categories—Benign, Command and Control, DDoS, Okiru, or Horizontal Port Scan—along with associated confidence scores.
  • Countermeasure Generation: Each classified threat is processed by Grok, which generates context-specific countermeasures (e.g., IP blocking, rate limiting) based on traffic features and threat type.
The pipeline, implemented in Python 3.10.13, processes CSV-formatted Aposemat IoT-23 data, ensuring interoperability for uniform data handling across VAE, BERT, and Grok 3. The system’s architecture leverages the strengths of each model: VAE’s ability to detect novel anomalies without labeled data, BERT’s contextual understanding for accurate threat classification, and Grok’s generative capabilities for producing actionable and adaptive countermeasures. The system aims to process network traffic dynamically, ensuring scalability and adaptability for smart home environments. By combining unsupervised anomaly detection with supervised classification and generative countermeasure design, the pipeline provides a solution that not only identifies but also responds to cyber threats.
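The sequential flow described above can be sketched in plain Python. The stage functions below are illustrative stubs, not the paper's implementation: in the real pipeline they would wrap the trained VAE, the fine-tuned BERT model, and the Grok 3 API, respectively.

```python
# Minimal sketch of the three-stage pipeline flow; stage internals are stubbed.
VAE_THRESHOLD = 0.26  # reconstruction-error cutoff reported in Section 3.4.1

def vae_stage(sample):
    """Return a reconstruction-error score (stubbed here for illustration)."""
    return sample.get("mse", 0.0)

def bert_stage(sample):
    """Return (label, confidence); stubbed lookup for illustration."""
    return sample.get("label", "Benign"), sample.get("conf", 1.0)

def grok_stage(sample, mse, label, conf):
    """Return a placeholder countermeasure string (real code calls Grok 3)."""
    return f"Mitigate {label} (mse={mse:.3f}, confidence={conf:.2f})"

def run_pipeline(samples):
    responses = []
    for s in samples:
        mse = vae_stage(s)
        if mse <= VAE_THRESHOLD:      # benign by VAE: bypasses BERT entirely
            continue
        label, conf = bert_stage(s)
        if label == "Benign":         # BERT overrides the VAE flag:
            continue                  # treated as a VAE false positive
        responses.append(grok_stage(s, mse, label, conf))
    return responses

traffic = [
    {"mse": 0.05, "label": "Benign"},
    {"mse": 0.40, "label": "Benign"},              # VAE false positive, filtered
    {"mse": 0.90, "label": "DDoS", "conf": 0.99},
]
print(run_pipeline(traffic))
```

Note how the two filtering rules from Section 3.4.1 appear explicitly: VAE-benign samples skip BERT, and BERT-benign samples are dropped before countermeasure generation.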

3.2. Dataset Description and Preprocessing

The Aposemat IoT-23 dataset [22] is used to train and evaluate the proposed generative AI pipeline for smart home IoT security. This dataset, curated by Stratosphere Labs, contains simulated network traffic from IoT devices, capturing both benign and malicious activities. It comprises 23 scenarios, including various attack types, making it representative of smart home IoT environments. For this study, five classes of smart home threats are selected to align with common threats: Benign, C&C, DDoS, Okiru, and Horizontal Port Scan. These classes cover a range of attack vectors, from botnet communications to service disruptions, ensuring a comprehensive evaluation of the pipeline.
Preprocessing transforms raw network captures into a format suitable for the VAE, BERT, and Grok models. The dataset includes features such as duration, orig_bytes, resp_bytes, proto, service, and history, among others. The preprocessing steps are detailed below.

3.2.1. Handling Missing Values via Scenario-Specific Imputation

Missing values in the dataset, particularly in features like service and duration, were addressed using a scenario-specific imputation strategy. The Aposemat IoT-23 dataset comprises independent simulation scenarios. Training a single imputation model on the aggregated dataset would fail to capture the unique network behaviors and device characteristics inherent to each specific capture file.
Therefore, imputation models were instantiated and trained locally for each capture file. For categorical features (e.g., proto, service), a LightGBM classifier was fitted strictly on the subset of complete rows within that specific scenario. For numerical features (e.g., duration, orig_bytes), a LightGBM regressor paired with an autoencoder was used to refine imputation (see Algorithm 1 and Table 1). This process ensures that the imputation learns only from the local context (e.g., a specific Mirai botnet run) and does not leak distributional information from the global dataset.
Algorithm 1: Scenario-Specific Missing Value Imputation
Electronics 15 00092 i001
The aggregation of data and subsequent splitting into Training (VAE/BERT training) and Testing (Pipeline evaluation) sets was performed after this localized cleaning process. This enforces data separation, ensuring that the generative models were evaluated on patterns they had not implicitly learned during preprocessing.
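The locality of this imputation can be sketched as follows. As a deliberate simplification, a per-scenario median (numerical) and mode (categorical) stand in for the paper's per-scenario LightGBM models; the point illustrated is that each scenario is imputed only from its own rows, never from the global dataset.

```python
import pandas as pd

# Toy data: two independent capture scenarios with missing values.
df = pd.DataFrame({
    "scenario": ["a", "a", "a", "b", "b", "b"],
    "duration": [1.0, None, 3.0, 100.0, None, 300.0],
    "service":  ["http", None, "http", "dns", "dns", None],
})

def impute_scenario(group: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps using statistics computed from this scenario only."""
    group = group.copy()
    group["duration"] = group["duration"].fillna(group["duration"].median())
    group["service"] = group["service"].fillna(group["service"].mode().iloc[0])
    return group

# Fit/apply per capture file, then aggregate afterwards (as in Section 3.2.1).
parts = [impute_scenario(g) for _, g in df.groupby("scenario")]
df = pd.concat(parts).sort_index()
print(df)
```

Scenario "a" rows are filled from "a" statistics only (duration 2.0), while "b" rows use "b" statistics (duration 200.0), mirroring the leakage-avoidance argument above.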

3.2.2. Labeling and Filtering

The dataset includes five classes: Benign, C&C, DDoS, Okiru, and Horizontal Port Scan. Feature selection is performed using Chi-square tests and Principal Component Analysis (PCA) to identify important features and eliminate non-informative ones (e.g., uid, id_orig_h), reducing dimensionality and improving efficiency.

3.2.3. Normalization/Encoding

Categorical features (e.g., proto, service) are converted to numerical representations using one-hot encoding to ensure compatibility with the VAE and BERT models. Numerical features (e.g., duration, resp_bytes) are normalized using StandardScaler to standardize their range, mitigating the impact of varying scales on model performance.
The preprocessed dataset is stored in CSV format, containing selected features, labels, and metadata, ensuring interoperability across pipeline components. This preprocessing approach ensures the data is clean, representative, and optimized for anomaly detection, threat classification, and countermeasure generation.
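The encoding and normalization step can be illustrated with a small sketch: one-hot encoding for categorical columns and z-score standardization for numerical ones. Column names follow the dataset; the values are invented, and the manual z-score (with population standard deviation, ddof=0) reproduces what scikit-learn's StandardScaler computes.

```python
import pandas as pd

df = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp", "icmp"],
    "duration": [1.0, 2.0, 3.0, 4.0],
    "resp_bytes": [0.0, 10.0, 20.0, 30.0],
})

# One-hot encode categorical features (proto -> proto_tcp, proto_udp, ...).
df = pd.get_dummies(df, columns=["proto"])

# Standardize numerical features: (x - mean) / std, population std (ddof=0)
# to match StandardScaler's behavior.
for col in ("duration", "resp_bytes"):
    df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)

print(df.round(3))
```

After this step each numerical column has zero mean and unit variance, so no single feature dominates the VAE's reconstruction error by scale alone.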

3.3. Variational Autoencoder for Anomaly Detection

The VAE serves as the initial stage in the generative AI pipeline, conducting unsupervised anomaly detection on smart home IoT traffic. It learns a probabilistic representation of normal traffic patterns and identifies deviations that signal potential threats, which is particularly useful for zero-day attack detection.

3.3.1. Encoder–Latent–Decoder Structure

The VAE architecture comprises an encoder, a 6-dimensional latent space, and a decoder, implemented as a neural network (see Figure 2). The encoder maps input features (e.g., duration, resp_bytes, proto) to the latent space through dense layers with dimensions [128, 64, 32]. The latent space is parameterized by mean and variance vectors, which enforce a Gaussian distribution through sampling. The decoder reconstructs the input from the latent representation through mirrored layers [32, 64, 128]. This structure balances model complexity and efficiency.

3.3.2. Activation Functions and Loss Function

Hidden layers use Leaky ReLU activations for non-linearity, with batch normalization to stabilize training and a dropout rate of 0.2 to prevent overfitting. The loss function combines reconstruction and regularization terms:
L = MSE(x, x̂) + β · KL(q(z|x) ‖ p(z)),
where x is the input, x̂ is the reconstructed output, q(z|x) is the encoder’s distribution, p(z) is a standard normal prior, and β = 2.0 weights the KL-divergence for latent space regularization.
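The loss above can be computed numerically. For a diagonal Gaussian encoder and a standard normal prior, the KL term has the closed form −½ Σ(1 + log σ² − μ² − σ²); the shapes and values below are illustrative only.

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, beta=2.0):
    """Reconstruction MSE plus beta-weighted closed-form Gaussian KL term."""
    mse = np.mean((x - x_hat) ** 2)
    # KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, batch-averaged.
    kl = -0.5 * np.mean(np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1))
    return mse + beta * kl

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))       # batch of 4 inputs, 8 features
x_hat = x + 0.1                    # imperfect reconstruction
mu = np.zeros((4, 6))              # 6-dimensional latent space (Section 3.3.1)
log_var = np.zeros((4, 6))         # sigma = 1, so the KL term vanishes

print(vae_loss(x, x_hat, mu, log_var))  # ≈ 0.01: reconstruction term only
```

With μ = 0 and σ = 1 the KL term is exactly zero, isolating the reconstruction error; any shift of the posterior away from the prior adds a positive β-weighted penalty.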

3.3.3. Training on Benign Data Only

The VAE is trained on 1,000,000 benign samples from the Aposemat IoT-23 dataset to capture normal traffic patterns. Training employs an optimizer with a learning rate of 1 × 10−4, a weight decay of 1 × 10−6, and a batch size of 256 over 100 epochs, as shown in Algorithm 2 and Table 2. A validation set of 100,000 benign samples ensures convergence. The contamination parameter is set to 0.01, reflecting the expected proportion of anomalies, and a capacity parameter of 0.1 controls model complexity.
Algorithm 2: Variational Autoencoder (VAE) Training Procedure
Electronics 15 00092 i002

3.3.4. Threshold Selection for Anomaly Scoring

During inference, the VAE computes the Mean Squared Error (MSE) for each sample. A threshold is selected to classify samples as anomalous, determined using a validation set to prioritize high recall and minimize the number of missed threats. Anomalous samples are forwarded to the BERT module for classification.
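A recall-oriented threshold search can be sketched as follows. Given validation MSE scores with known labels, the sketch picks the largest threshold that still meets a target recall; the score distributions and the target value are invented for illustration.

```python
import numpy as np

def pick_threshold(scores, labels, target_recall=0.999):
    """Largest threshold t such that recall of 'score > t' meets the target.
    labels: 1 for attack, 0 for benign."""
    attacks = scores[labels == 1]
    for t in np.unique(scores)[::-1]:         # candidates, high to low
        if np.mean(attacks > t) >= target_recall:
            return t
    return None

rng = np.random.default_rng(1)
benign = rng.normal(0.10, 0.05, 1000)         # low reconstruction error
attack = rng.normal(0.60, 0.15, 1000)         # high reconstruction error
scores = np.concatenate([benign, attack])
labels = np.concatenate([np.zeros(1000), np.ones(1000)]).astype(int)

t = pick_threshold(scores, labels, target_recall=0.99)
flagged = scores > t
recall = np.mean(flagged[labels == 1])
precision = flagged[labels == 1].sum() / flagged.sum()
print(f"threshold={t:.3f} recall={recall:.3f} precision={precision:.3f}")
```

Because the threshold is the largest value still satisfying the recall constraint, precision is maximized subject to that constraint, matching the paper's "high recall first" prioritization.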

3.4. BERT-Based Threat Classification

The BERT model serves as the second stage of the generative AI pipeline, classifying anomalies flagged by the VAE into specific threat categories.

3.4.1. Role of the Benign Class and Hierarchical Conflict Resolution

The VAE is intentionally tuned for high sensitivity, utilizing a low reconstruction error threshold (MSE > 0.26) based on Recall prioritization, resulting in a negligible missed-detection rate during evaluation. This configuration prioritizes the capture of almost all potential threats (minimizing False Negatives) at the cost of admitting a higher rate of benign traffic as potential anomalies (False Positives).
The BERT model serves two distinct purposes: (1) to classify the specific attack type of true anomalies, and (2) to act as a precision filter for the False Positives generated by the VAE. In cases of contradictory predictions, where the VAE flags a sample as anomalous but BERT classifies it as “Benign”, the system prioritizes the supervised BERT model’s prediction. The sample is treated as a False Positive from the unsupervised stage and is discarded from the countermeasure generation queue. This hierarchical approach leverages the VAE’s ability to catch novel deviations and BERT’s contextual understanding to reduce false alarms. Conversely, samples classified as benign by the VAE bypass the BERT module entirely.

3.4.2. Input Format for BERT Classification

The BERT model processes preprocessed network traffic features from the Aposemat IoT-23 dataset, including categorical features (e.g., proto, service) and numerical features (e.g., duration, orig_bytes, resp_bytes). These features are transformed into tokenized sequences using the bert-base-uncased tokenizer from the Hugging Face transformers library [39]. Each sample is converted into a sequence of token IDs, attention masks, and token type IDs, suitable for BERT’s sequence classification task.
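The paper does not spell out the exact serialization of a traffic record into tokenizer input, so the following is an assumed key:value rendering into a single string; in the real pipeline the resulting text would be passed to the bert-base-uncased tokenizer (e.g. `tokenizer(text, truncation=True, padding="max_length")`) to obtain token IDs, attention masks, and token type IDs.

```python
# Hypothetical serialization of selected traffic features into one string.
def serialize_sample(sample: dict) -> str:
    """Render selected features as a space-separated 'key: value' string."""
    fields = ("proto", "service", "duration", "orig_bytes", "resp_bytes")
    return " ".join(f"{k}: {sample.get(k, 'missing')}" for k in fields)

sample = {"proto": "tcp", "service": "http", "duration": 3976,
          "orig_bytes": 42439, "resp_bytes": 0}
print(serialize_sample(sample))
# -> proto: tcp service: http duration: 3976 orig_bytes: 42439 resp_bytes: 0
```

A fixed field order keeps the token positions stable across samples, which helps the fine-tuned classifier attend to the same feature slots consistently.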

3.4.3. Number of Classes

The pre-trained BERT model is fine-tuned for a 5-class classification task, corresponding to the threat categories: Benign (label 0), C&C (label 1), DDoS (label 2), Okiru (label 3), and Horizontal Port Scan (label 4), as illustrated in Algorithm 3 and Table 3.
Algorithm 3: BERT Fine-Tuning for Threat Classification
Electronics 15 00092 i003

3.4.4. Training, Validation, and Testing Split

The dataset is divided into training, validation, and testing sets for effective fine-tuning and evaluation. The training set has 150,000 samples (30,000 per class) for balanced representation across Benign, C&C, DDoS, Okiru, and Horizontal Port Scan. The validation set consists of 25,000 samples (5000 per class) for hyperparameter tuning and monitoring. The testing set includes 25,000 samples (5000 per class) to assess performance on unseen data.

3.4.5. Fine-Tuning and Evaluation Setup

Fine-tuning adapts the pre-trained bert-base-uncased model by adding a classification head for the 5-class task, with a learning rate of 3 × 10−5, a batch size of 32, and early stopping after three epochs without improvement in validation loss or when accuracy passes 99% (see Table 3 and Table 4, and Algorithm 4). The model is trained using the Hugging Face transformers library, with evaluation loss (eval_loss) monitored to select the best model checkpoint. The setup is designed to leverage BERT’s contextual understanding for accurate threat classification in IoT networks.
Algorithm 4: Model Training with Dual Early Stopping Strategy
Electronics 15 00092 i004

3.4.6. Classification Process

The BERT model transforms the tokenized input sequence into a probability distribution over the five threat classes. Given an input feature sequence X, comprising tokenized network traffic features, BERT processes X through its transformer layers to produce a contextualized embedding. A linear classification head, followed by a softmax function, maps this embedding to class probabilities:
P(y|X) = softmax(W · h + b),
where W is the weight matrix, h is the contextualized embedding, b is the bias vector, and P(y|X) is the probability distribution over the classes y ∈ {0, 1, 2, 3, 4} (Benign, C&C, DDoS, Okiru, Horizontal Port Scan). This process ensures that BERT assigns each anomaly a threat label with an associated confidence score, facilitating precise classification for downstream countermeasure generation.
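The classification head can be sketched numerically. The embedding dimension matches bert-base-uncased's 768-dimensional pooled output; the weights here are random placeholders, so the predicted label is arbitrary and only the mechanics (logits, softmax, argmax, confidence) are meaningful.

```python
import numpy as np

CLASSES = ["Benign", "C&C", "DDoS", "Okiru", "Horizontal Port Scan"]

def softmax(z):
    z = z - z.max()               # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(42)
h = rng.normal(size=768)          # contextualized embedding from BERT
W = rng.normal(scale=0.02, size=(5, 768))  # placeholder classification head
b = np.zeros(5)

p = softmax(W @ h + b)            # P(y|X) over the five classes
label = CLASSES[int(np.argmax(p))]
confidence = float(p.max())
print(label, round(confidence, 3))
```

The maximum of the softmax output is exactly the per-sample confidence score that accompanies each threat label in the pipeline's downstream prompts.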

3.5. LLM-Based Countermeasure Generation

The final stage of the generative AI pipeline employs Grok 3, a transformer-based large language model developed by xAI, to generate context-specific countermeasures for cyber threats identified by the VAE and classified by BERT. Grok 3 leverages advanced natural language processing to produce actionable, human-readable mitigation strategies, addressing the limitations of static, rule-based countermeasures.

3.5.1. Prompt Engineering: Input Structure and Context

Grok 3 is driven by structured prompts that encapsulate comprehensive threat information, ensuring precise countermeasure generation. Each prompt includes:
  • A SYSTEM_PROMPT defining the requested behavior: “You are a cybersecurity expert tasked with generating specific and actionable countermeasures for network attacks based on the provided IoT smart home network traffic data.”
  • The VAE’s Mean Squared Error (MSE) reconstruction error, indicating anomaly severity.
  • The BERT classification label (e.g., Benign, C&C, DDoS, Okiru, Horizontal Port Scan) and confidence score, providing threat type and certainty.
  • Original sample features (e.g., duration, orig_bytes, resp_bytes, orig_ip_bytes, resp_ip_bytes, proto, service).
An example prompt is:
“Analyze the following network traffic sample: Duration: 3976 s, Original Bytes: 42,439, Original IP Bytes: 25,752.22, Response IP Bytes: 0.0, Service: http. It is classified as an Okiru attack with a VAE MSE reconstruction error of 0.222 and a BERT confidence of 1.0. Suggest specific countermeasures to mitigate this threat attack.”
Prompts are created using Python libraries (pandas, requests) and processed via xAI’s API with a temperature setting of 0 to prioritize determinism. Samples are processed in batches of five with a 1-s delay to manage API rate limits.
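The prompt construction and batched submission can be sketched as below. The SYSTEM_PROMPT is quoted from the paper; the prompt template condenses the example above, and the API call is replaced with an injectable `send` stub, since the real code would POST to xAI's API with temperature 0.

```python
import time

SYSTEM_PROMPT = ("You are a cybersecurity expert tasked with generating "
                 "specific and actionable countermeasures for network attacks "
                 "based on the provided IoT smart home network traffic data.")

def build_prompt(sample):
    """Assumed template condensing the paper's example prompt."""
    return (f"Analyze the following network traffic sample: "
            f"Duration: {sample['duration']} s, "
            f"Original Bytes: {sample['orig_bytes']}, "
            f"Service: {sample['service']}. "
            f"It is classified as a {sample['label']} attack with a VAE MSE "
            f"reconstruction error of {sample['mse']} and a BERT confidence "
            f"of {sample['conf']}. Suggest specific countermeasures to "
            f"mitigate this threat.")

def submit_in_batches(samples, batch_size=5, delay_s=1.0, send=print):
    """Process samples in batches of five with a delay, as in Section 3.5.1."""
    for i in range(0, len(samples), batch_size):
        for s in samples[i:i + batch_size]:
            send(build_prompt(s))   # real code: POST prompt + SYSTEM_PROMPT
        time.sleep(delay_s)         # respect API rate limits
```

Injecting `send` keeps the sketch testable offline while preserving the batch-of-five, one-second-delay pacing described above.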

3.5.2. Output Structure

Grok 3 generates countermeasures in a human-readable format, presenting them as lists of actionable recommendations tailored to the specific identified threats. For instance, in the case of an Okiru attack, the output includes suggestions such as blocking infected IP addresses, implementing rate limiting, and enabling intrusion detection systems. These recommendations address the unique characteristics of the attacks, such as botnet communications for Okiru and flooding for DDoS attacks.

3.5.3. Post-Processing or Formatting

Post-processing standardizes Grok 3’s outputs for usability:
  • Parsing: Text output is parsed to extract individual countermeasures, ensuring clarity and consistency.
  • Categorization: Countermeasures are grouped by threat type (e.g., Okiru, DDoS) based on similarity to create aggregated mitigation strategies.
  • Severity Assignment: Severity levels (Low, Medium, High) are assigned using the VAE MSE score (considering the deviation from the threshold), prioritizing actions for high-severity threats (e.g., immediate IP blocking) over monitoring for lower-severity ones.
Processed countermeasures are stored in CSV format with metadata (e.g., sample_id, prompt, attack_label) for integration into the pipeline and real-time deployment.
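The severity-assignment rule can be sketched as a simple banding of the MSE deviation from the anomaly threshold. The paper specifies only the general principle (severity grows with deviation), so the band edges below are assumptions for illustration.

```python
VAE_THRESHOLD = 0.26   # anomaly cutoff from Section 3.4.1

def severity(mse, threshold=VAE_THRESHOLD):
    """Map MSE deviation above the threshold to a severity band.
    Band edges (0.1, 0.5) are assumed, not taken from the paper."""
    deviation = mse - threshold
    if deviation <= 0.1:
        return "Low"
    if deviation <= 0.5:
        return "Medium"
    return "High"

print(severity(0.30), severity(0.60), severity(1.00))  # Low Medium High
```

High-severity outputs would then be routed to immediate actions (e.g. IP blocking), while low-severity ones default to monitoring, as described above.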

3.6. System Integration and Simulation Setup

3.6.1. Pipeline Execution Flow: Anomaly → Classification → Response

The pipeline processes network traffic data sequentially, as illustrated in Algorithm 5.
  • VAE Anomaly Detection: Preprocessed network traffic samples, including features like duration, orig_bytes, and proto, are fed into the VAE. The VAE, trained on benign samples, computes reconstruction errors (Mean Squared Error, MSE) and flags samples exceeding a predefined threshold as anomalous. These anomalous samples, along with their MSE scores, are passed to the BERT module.
  • BERT Threat Classification: The BERT model, fine-tuned for a 5-class classification task (Benign, C&C, DDoS, Okiru, Horizontal Port Scan), processes anomalous samples. It tokenizes input features and assigns threat labels with confidence scores, filtering out false positives. Classified malicious samples and their metadata are forwarded to Grok 3.
  • Grok 3 Countermeasure Generation: Grok 3 receives structured prompts containing the VAE’s MSE score, BERT’s threat label, confidence score, and original sample features. It generates context-specific countermeasures, stored in CSV format with metadata (e.g., sample_id, attack_label). The pipeline is implemented in Python, using libraries such as pandas for data handling and requests for API interactions with Grok 3, ensuring seamless data flow across components.
The data is stored as CSV files for interoperability and processed sequentially to ensure each stage builds on the previous one’s output.
Algorithm 5: Generative AI Threat Detection and Response Pipeline
Electronics 15 00092 i005

3.6.2. Simulation Setup

The pipeline’s performance is evaluated through a simulation using 10,000 network traffic sessions from the Aposemat IoT-23 dataset, which includes a mix of benign and malicious traffic (C&C, DDoS, Okiru, Horizontal Port Scan). The simulation processes samples sequentially through the VAE, BERT, and Grok 3 to assess the functionality of each component. Data handling is optimized with intermediate outputs stored in CSV files for traceability.

3.6.3. Evaluation Strategy for End-to-End Performance

The end-to-end performance is assessed using quantitative and qualitative metrics to evaluate detection, classification, and response effectiveness:
  • Quantitative Metrics: Detection rate (proportion of malicious samples correctly identified and classified), false positive rate (benign samples incorrectly flagged or misclassified), and classification accuracy.
  • Qualitative Metrics: Countermeasure quality, assessed for relevance (specificity to the threat), alignment (with IoT security best practices) [23,24,25,26,27,28], and actionability (feasibility in smart home settings).
The evaluation utilizes a test set of 10,000 samples to ensure a robust assessment across diverse threat scenarios, with results validated through both statistical and qualitative analysis.
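Assuming per-sample ground-truth and predicted labels, the quantitative metrics above can be computed as in this sketch (the function name and the convention that any non-"Benign" label counts as malicious are illustrative):

```python
def pipeline_metrics(y_true, y_pred):
    """Detection rate, false positive rate, and accuracy for the pipeline.

    y_true / y_pred are per-sample class labels; 'Benign' marks benign
    traffic and every other label is treated as malicious.
    """
    malicious = [(t, p) for t, p in zip(y_true, y_pred) if t != "Benign"]
    benign = [(t, p) for t, p in zip(y_true, y_pred) if t == "Benign"]
    # Detection rate: malicious samples correctly identified and classified.
    detection_rate = sum(t == p for t, p in malicious) / len(malicious)
    # False positive rate: benign samples incorrectly flagged as malicious.
    fpr = sum(p != "Benign" for _, p in benign) / len(benign)
    # Overall classification accuracy across all five classes.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return detection_rate, fpr, accuracy
```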

4. Results Analysis and Discussion

4.1. Dataset Preparation Results and Analysis

4.1.1. Preprocessing Results

For unsupervised feature selection, PCA was applied to benign samples, identifying features that contribute to 92% of the variance in normal traffic, which is critical for VAE anomaly detection. Key contributors included resp_bytes, missed_bytes, service_encoded, and duration, as shown in Table 5. The selected features (resp_bytes, missed_bytes, service_encoded, duration, proto_encoded, conn_state_encoded, resp_pkts, resp_ip_bytes, history_encoded) capture essential patterns in benign traffic, supporting the VAE’s ability to model normal behavior.
For supervised feature selection, Chi-Square tests were applied to all samples (benign and malicious), identifying features with a significant statistical association with the five threat classes (Benign, C&C, DDoS, Okiru, and Horizontal Port Scan). The selected features (orig_bytes, resp_bytes, orig_ip_bytes, duration, conn_state_encoded, history_encoded, orig_pkts, proto_encoded, service_encoded) were used for BERT classification.
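The PCA-based variance-retention criterion used for the unsupervised feature selection can be sketched with plain numpy (a simplified stand-in for the study's preprocessing; eigendecomposition of the covariance matrix, keeping enough components to reach the 92% target):

```python
import numpy as np

def components_for_variance(X: np.ndarray, target: float = 0.92) -> int:
    """Number of principal components needed to retain `target` of the
    total variance (the study retains 92% on benign traffic)."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, descending
    ratios = eigvals / eigvals.sum()         # explained-variance ratios
    return int(np.searchsorted(np.cumsum(ratios), target) + 1)
```

Features loading strongly on the retained components (here resp_bytes, missed_bytes, service_encoded, duration, etc.) are then carried forward for the VAE.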

4.1.2. Preprocessing Analysis

The 92% variance retention from PCA indicates that the selected features effectively reduce dimensionality while preserving the representativeness of benign traffic, enabling the VAE to achieve a recall of 99.9% in anomaly detection. The Chi-Square tests' high scores confirm the relevance of the selected features for BERT's classification, contributing to its 99.90% accuracy across the five threat classes. The large set of benign samples used for VAE training ensured comprehensive learning of normal traffic patterns, while balanced splits for BERT prevented class imbalance, enhancing classification fairness.
However, although PCA efficiently captures 92% of the variance, it may overlook subtle features relevant to rare or novel attacks, as noted in [14]. The Chi-Square tests evaluate the correlation between each feature and the target label in the Aposemat IoT-23 dataset; the NaN scores for missed_bytes, resp_pkts, and resp_ip_bytes indicate that no correlation could be quantified for these features.

4.1.3. Implications

The preprocessed dataset allowed the generative AI pipeline to function effectively, retaining 92% of the variance and ensuring access to key features. This facilitated high accuracy in anomaly detection (VAE) and threat classification (BERT), as outlined in later sections. The dataset preparation underscores the crucial role of robust preprocessing in machine learning, where data quality directly affects model performance and security results.

4.2. Anomaly Detection Performance and Analysis

The VAE effectively detected anomalies in smart home IoT network traffic, leveraging its unsupervised learning approach. This section presents the VAE’s performance metrics, analyzes their significance, and discusses implications and limitations for IoT security.

4.2.1. Anomaly Detection Performance Metrics

The VAE achieved a recall of 99.9% with a Mean Squared Error (MSE) threshold of 0.26, optimized using a validation set, as shown in Figure 3. This threshold enabled the VAE to flag 99.9% of anomalous samples in the Aposemat IoT-23 test set, ensuring minimal missed threats.

4.2.2. Threshold Selection and Sensitivity Analysis

To determine the optimal MSE threshold for anomaly detection, a sensitivity analysis was conducted on the validation set. Figure 4 illustrates the trade-offs between Precision, Recall, and F1-Score across varying MSE thresholds ranging from 0.0 to 0.32.
The primary objective of the VAE within this pipeline is to act as a high-sensitivity filter to minimize False Negatives (missed attacks). Consequently, we prioritized Recall over Precision during threshold selection. As shown in Table 6, a threshold of 0.26 was selected. At this operating point, the model achieves a Recall of approximately 0.999, ensuring that nearly all malicious samples are forwarded to the subsequent BERT classification stage, while maintaining a Precision of 0.961. Although lower thresholds (e.g., 0.21) maintain high recall, they do not yield significant improvements in F1-score. In contrast, higher thresholds (e.g., 0.30) cause a sharp drop in Recall, leading to unacceptable missed-detection rates.
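The sensitivity analysis behind Figure 4 can be sketched as a threshold sweep over per-sample reconstruction errors (a simplified illustration; the array names and example values are not from the study):

```python
import numpy as np

def sweep_thresholds(mse, labels, thresholds):
    """Precision/Recall/F1 at each candidate MSE threshold.

    mse: per-sample reconstruction errors; labels: 1 = malicious, 0 = benign.
    A sample is flagged anomalous when its MSE exceeds the threshold.
    """
    results = []
    for t in thresholds:
        pred = mse > t
        tp = np.sum(pred & (labels == 1))    # malicious, correctly flagged
        fp = np.sum(pred & (labels == 0))    # benign, wrongly flagged
        fn = np.sum(~pred & (labels == 1))   # malicious, missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((t, precision, recall, f1))
    return results
```

Selecting the operating point then amounts to choosing the threshold whose recall stays near 1.0 while precision remains acceptable, as done for 0.26 above.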

4.2.3. Anomaly Detection Results Analysis

The 99.9% recall demonstrates the VAE’s ability to detect novel and zero-day attacks by modeling normal traffic patterns, a critical advantage in IoT environments where labeled attack data is scarce. The 6-dimensional latent space and Leaky ReLU activations enabled robust feature learning, while the MSE threshold of 0.26 balanced sensitivity with precision. The high detection accuracy aligns with findings in related work [8], which highlight the superiority of generative AI over signature-based systems.
The VAE’s performance underscores its value in unsupervised anomaly detection, providing a strong foundation for the pipeline’s threat classification and response stages.

4.3. Threat Classification Performance and Analysis

The BERT model effectively classified anomalies into specific threat categories, demonstrating high accuracy and balanced performance across all classes. This section presents the BERT’s performance metrics, analyzes their significance, and discusses implications and limitations for IoT security.

4.3.1. Threat Classification Performance Metrics

The BERT model achieved an overall accuracy of 99.9%, with per-class precision, recall, and F1-scores as shown in Table 7. The confusion matrix in Figure 5 indicates that only 10 benign and 15 malicious samples were misclassified in a 25,000-sample test set, demonstrating the model’s robustness.

4.3.2. Threat Classification Results Analysis

The 99.90% accuracy highlights BERT’s effectiveness in classifying diverse threat types in smart home IoT networks, with minimal misclassification errors. The balanced per-class metrics ensure that the model performs well across all threat categories. The high accuracy is attributed to the balanced dataset splits and the pre-trained BERT’s ability to capture contextual patterns, aligning with advancements in LLM-based security solutions [36].

4.4. Countermeasure Generation Performance and Analysis

Grok 3 effectively generated context-specific countermeasures for cyber threats in smart home IoT networks, enhancing the pipeline’s ability to respond dynamically.

4.4.1. Countermeasures Performance Metrics

Grok 3 generated countermeasures for four attack types: Okiru, DDoS, C&C, and Horizontal Port Scan. The evaluation focused on relevance, alignment with IoT security best practices [23,24,25,26,27,28], and actionability in smart home settings, as shown in Table 8. Okiru yielded 16 countermeasures (all relevant, 15 aligned, 14 actionable); DDoS yielded 11 (all relevant, 10 aligned, 9 actionable); C&C yielded 12 (all relevant and aligned, 10 actionable); and Horizontal Port Scan yielded 16 (all relevant, 15 aligned, 14 actionable). Aggregated countermeasures by attack type and severity level are detailed in Table 9.

4.4.2. Countermeasures Performance Analysis

The high relevance (100% for all attack types) and strong alignment (e.g., 15/16 for Okiru, 12/12 for C&C) demonstrate Grok 3’s ability to generate countermeasures tailored to specific threats, surpassing static rule-based approaches. The actionability rates (e.g., 14/16 for Okiru, 9/11 for DDoS) indicate practical applicability. The aggregation of countermeasures across attack types and severity levels enhances their usability in diverse IoT scenarios.
The generated countermeasures were manually reviewed to ensure alignment with the recommended actions for each attack type, as outlined in Section 2.3. Each countermeasure listed in Table 9 corresponded with the endorsed mitigation steps for the relevant attack classes (C&C, DDoS, Okiru, PortScan). Grok 3’s countermeasure generation enhances the pipeline’s ability to provide dynamic, actionable responses.

4.5. End-to-End Pipeline Performance and Analysis

This section presents the end-to-end performance metrics, leveraging results from the 10,000-sample experiment (see Figure 6), VAE unsupervised malicious traffic filtering (Figure 7), and BERT classification (Figure 8), followed by analysis, discussion, and limitations.

4.5.1. Pipeline Performance Metrics

The pipeline processed a 10,000-sample test set, comprising 2000 benign and 8000 malicious samples (2000 each for C&C, DDoS, Okiru, and Horizontal Port Scan), as shown in Figure 6. The pipeline achieved a 99.95% end-to-end detection rate, calculated as the proportion of malicious samples correctly identified by the VAE and classified by BERT, as reported in Table 10. The VAE flagged 9054 samples as potentially malicious, including 1054 incorrectly flagged benign samples, yielding a false positive rate (FPR) of 10.54% (Figure 7). BERT refined these results, achieving a 99.90% classification accuracy with only 4 benign samples misclassified as malicious and 9 malicious samples misclassified as benign, reducing the FPR to 0.04% (Figure 8). Grok 3 generated countermeasures for confirmed malicious samples, with high quality (e.g., 14/16 actionable countermeasures for Okiru; see Table 8).
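The reported rates follow directly from the sample counts; as a quick arithmetic check (counts taken from the experiment described above; note that, as reported, the FPRs are computed over the full 10,000-sample set rather than over the 2000 benign samples alone):

```python
total_samples = 10_000          # test-set size (2000 benign + 8000 malicious)
vae_false_positives = 1_054     # benign sessions the VAE flagged as anomalous
post_bert_false_positives = 4   # benign sessions BERT still labeled malicious

# FPRs as percentages of the full test set, matching the reported figures.
vae_fpr = 100 * vae_false_positives / total_samples
post_bert_fpr = 100 * post_bert_false_positives / total_samples
```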

4.5.2. Pipeline Results Analysis

The 99.95% detection rate, derived from the pipeline's 10,000-sample experiment (Figure 6), demonstrates near-perfect identification of malicious traffic, driven by the VAE's 99.9% recall (Figure 7). The VAE's FPR of 10.54% (1054 of 10,000 samples being misclassified benign traffic) reflects its sensitivity to anomalies: it flagged 9054 samples, including nearly all of the 8000 malicious samples. BERT's 99.90% accuracy, with only 13 misclassifications (4 benign and 9 malicious) out of 10,000 samples (Figure 8), significantly reduced the FPR to 0.04% (4/10,000), showcasing effective filtering of false positives. Grok 3's countermeasures, evaluated for relevance, alignment, and actionability, provided practical responses, including 14 out of 16 actionable countermeasures for Okiru.
The pipeline’s high performance, validated through the 10,000-sample experiment (Figure 6), VAE filtering (Figure 7), and BERT classification (Figure 8), underscores its potential to enhance smart home IoT security.

4.6. Discussion and Threats to Validity

4.6.1. Performance and Comparative Analysis

The generative AI pipeline, integrating VAE, BERT, and Grok 3, addresses the critical need for adaptive, real-time security solutions in smart home IoT environments. The VAE's high recall (99.9%) ensures robust anomaly detection, capturing nearly all malicious traffic, while BERT refines the results with 99.90% classification accuracy, minimizing false positives to 0.04%. Grok 3's countermeasures, with high relevance (e.g., 16/16 for Okiru), alignment (15/16), and actionability (14/16), enable dynamic, threat-specific responses, addressing research gaps in IoT security [2]. The pipeline's sequential design ensures each component builds on the previous, creating a cohesive framework for threat detection and mitigation.
Compared with the work of Alani et al., who reported 99.61% accuracy on Aposemat IoT-23 using ensemble models [40], the pipeline improves accuracy by 0.25–0.28 percentage points while additionally offering automated countermeasures. This enables rapid, tailored mitigation and reduces manual intervention.

4.6.2. Methodological Design Choices

While clustering within the VAE’s latent feature space is a recognized method for anomaly detection [41], this study utilizes the MSE of reconstruction as the primary metric. This design choice was driven by the specific integration requirements of the generative AI pipeline. The MSE provides a direct scalar measure of anomaly severity (i.e., a higher MSE indicates a greater deviation from the norm). This scalar value is used to prompt-engineer the subsequent Grok 3 module (as detailed in Section 3.5.1), enabling the LLM to interpret threat intensity in context without complex preprocessing. Additionally, computing the element-wise difference between the input and the reconstructed output is more computationally efficient for real-time edge processing than computing distance metrics against multiple centroids in a high-dimensional latent space. Considering these factors, reconstruction error offers the necessary balance of explainability for the LLM and computational efficiency for the IoT environment.
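The per-sample reconstruction error is a single element-wise pass over the feature vector, as this sketch shows (variable names are illustrative):

```python
import numpy as np

def reconstruction_mse(x: np.ndarray, x_hat: np.ndarray) -> np.ndarray:
    """Per-sample reconstruction error: mean squared element-wise difference
    between the input batch x and the VAE output x_hat. One O(d) pass per
    sample, cheaper than distances to multiple centroids in latent space."""
    return np.mean((x - x_hat) ** 2, axis=1)
```

The resulting scalar per sample is exactly the value compared against the 0.26 threshold and embedded in the Grok 3 prompt.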

4.6.3. Generative Model Selection and Practicality

While this study utilizes Grok 3 to demonstrate the efficacy of generative countermeasures, we acknowledge that other LLMs such as ChatGPT, Gemini, or Deepseek possess comparable capabilities. Grok 3 was selected for this proof of concept to leverage its API availability, but the pipeline architecture is model-agnostic. A comparative analysis of these models remains a direction for future research.
Regarding deployment feasibility, the current reliance on an external API introduces latency and operational costs that may not suit all smart home hubs. To address these costs and ensure real-time performance in production environments, future iterations of this pipeline should prioritize migrating from cloud-based APIs to locally deployed, quantized open-source models running on edge hardware. Another approach is to perform Grok 3 countermeasure generation before deployment by building a countermeasure lookup table that is consulted at runtime and refreshed regularly (e.g., daily or weekly). With this approach, calling the Grok 3 API during deployment is no longer necessary, as the lookup table serves this purpose, thereby reducing latency and improving the system's response time.
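The lookup-table alternative sketched above could look as follows; the table contents and schema are hypothetical examples, not countermeasures taken from the study:

```python
import csv
import io

# Hypothetical pre-generated countermeasure table, refreshed offline
# (e.g., daily) via the Grok 3 API; at runtime only a dict lookup occurs.
LOOKUP_CSV = """attack_label,severity,countermeasure
DDoS,High,Rate-limit traffic and block the offending source IP at the gateway
Okiru,High,Isolate the infected device and force a firmware update
C&C,Medium,Block the C&C domain and reset device credentials
PortScan,Low,Log the scan and close unused ports at the firewall
"""

def load_lookup(csv_text: str) -> dict:
    """Parse the offline-generated CSV into a (label, severity) -> action map."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(r["attack_label"], r["severity"]): r["countermeasure"]
            for r in reader}

def respond(table: dict, attack_label: str, severity: str) -> str:
    # No API round-trip at deployment time: constant-time dictionary access,
    # with a conservative fallback for unseen (label, severity) pairs.
    return table.get((attack_label, severity), "Escalate to operator review")
```

Because lookups are local, the response path keeps working even when external connectivity is degraded, which also addresses the DDoS failure mode discussed in Section 4.6.4.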
To ensure the overall accuracy and safety of the AI tools, this study employed a rigorous qualitative evaluation framework (assessing Relevance, Actionability, and Alignment) to validate that the generated countermeasures adhere to established IoT security best practices before implementation.

4.6.4. Operational Considerations and Resilience

While the proposed pipeline demonstrates high detection and classification accuracy, this study focused on architectural efficacy rather than operational latency. Specific inference time metrics and computational resource consumption (CPU/RAM usage) for the VAE and BERT models were not quantified in this proof-of-concept phase. The current reliance on the external Grok 3 API introduces a dependency on external internet connectivity. A critical failure mode arises during volumetric attacks, such as DDoS, which may saturate network bandwidth and prevent the system from reaching the API to generate countermeasures. To mitigate this risk and ensure resilience, future iterations must prioritize the migration from cloud-based APIs to locally deployed, quantized open-source models running directly on edge hardware. Local inference eliminates the latency bottleneck of API round-trips and ensures that defense mechanisms remain effective even when external connectivity is suboptimal.

4.6.5. Generalization and Validation Constraints

The distinct flow characteristics of the synthetic Aposemat IoT-23 dataset partially influence the reported high detection rates. To mitigate the risk of overfitting to these synthetic patterns, the VAE training incorporated a dropout rate of 0.2 (as detailed in Table 2). This regularization prevents the model from memorizing specific noise patterns in the training data by forcing it to learn robust, distributed latent features. Despite this measure, real-world traffic exhibits higher entropy than the dataset, which may limit how directly these results generalize.
A validity threat regarding generalization is the handling of novel threats, such as a C&C pattern from a botnet not present in the training set. In such cases, while the supervised BERT classifier might struggle, the pipeline’s reliance on the unsupervised VAE acts as a failsafe, flagging the novel pattern as an anomaly based on reconstruction error. This study focuses on generating actionable text-based countermeasures rather than enforcing them. There is no validation of the generated rules on a live firewall (e.g., iptables) in this iteration. The practical mitigation efficacy remains to be verified in a hardware-in-the-loop testbed.
To address the long-term generalization capability against novel threat patterns not present in the IoT-23 training set (e.g., new C&C signatures), the proposed architecture supports the integration of online learning mechanisms. As noted in the future work directions, implementing a sliding-window approach would enable the VAE to continuously update its definition of ‘normal’ traffic based on real-time data streams. This adaptive thresholding ensures the system remains resilient to concept drift in networks whose behavior evolves over time. To prevent data poisoning—where an attacker gradually trains the model to accept malicious traffic—such online updates would operate in a ‘Human-in-the-Loop’ configuration, in which model updates are triggered only after high-confidence anomalies are validated by the pipeline’s explainability outputs.

5. Conclusions and Future Work

This study presents a novel generative AI pipeline for enhancing smart home IoT security, including a VAE model for unsupervised anomaly detection, a fine-tuned BERT model for threat classification, and Grok 3 for automated countermeasure generation. Evaluated on the Aposemat IoT-23 dataset, the pipeline achieves a 99.95% detection rate, a 10.54% VAE false positive rate (FPR), and a 0.04% post-BERT FPR, with BERT classification accuracy of 99.90%. The VAE’s high recall (99.9%) ensures robust anomaly detection, while BERT’s precise classification minimizes false positives and Grok 3’s countermeasures provide adaptive, threat-specific responses. These results outperform traditional static intrusion detection systems [40], addressing critical gaps in IoT security by enabling real-time, scalable threat detection and mitigation.
The pipeline’s preprocessing, leveraging LightGBM imputation and PCA (92% variance retention), ensures data quality, supporting the models’ high performance. The sequential architecture (VAE → BERT → Grok 3) creates a cohesive framework, validated through comprehensive simulations and metrics.
Future work should focus on model compression for scalability, local deployment of Grok 3 to reduce API latency, and testing on real-world IoT datasets for improved generalizability. Addressing operational impacts for countermeasures will strengthen evaluations. The proposed solution has broader applications, including industrial IoT and protecting critical infrastructure.

Author Contributions

Conceptualization, A.O., K.S.A. and A.J.; methodology, A.O. and K.S.A. and A.J.; software, A.O.; validation, A.O., K.S.A., A.J. and P.D.; formal analysis, A.O., K.S.A. and A.J.; investigation, A.O. and K.S.A.; resources, K.S.A., A.J. and P.D.; data curation, A.O. and K.S.A.; writing—original draft preparation, A.O. and K.S.A.; writing—review and editing, A.O., K.S.A., A.J. and P.D.; visualization, A.O.; supervision, K.S.A., A.J. and P.D.; project administration, K.S.A., A.J. and P.D.; funding acquisition, K.S.A., A.J. and P.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially funded by the Knowledge Foundation (Stiftelsen för kunskaps-och kompetensutveckling—KK-stiftelsen) via the Synergy project Intelligent and Trustworthy IoT Systems (Grant number 20220087).

Data Availability Statement

The data presented in this study are openly available in the Aposemat IoT-23 dataset at https://www.stratosphereips.org/blog/2020/1/22/aposemat-iot-23-a-labeled-dataset-with-malicious-and-benign-iot-network-traffic (accessed on 12 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alghayadh, F.; Debnath, D. A Hybrid Intrusion Detection System for Smart Home Security. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA, 31 July–1 August 2020; pp. 319–323. [Google Scholar] [CrossRef]
  2. Farea, A.H.; Alhazmi, O.H.; Samet, R.; Guzel, M.S. AI-Powered Integrated with Encoding Mechanism Enhancing Privacy, Security, and Performance for IoT Ecosystem. IEEE Access 2024, 12, 121368–121386. [Google Scholar] [CrossRef]
  3. Dixit, M.; Siby, S.M.; J, J.; Vetriveeran, D.; Sambandam, R.K.; D, V. Theoretical Framework for Integrating IoT and Explainable AI in a Smart Home Intrusion Detection System. In Proceedings of the 2024 IEEE International Conference on Contemporary Computing and Communications (InC4), Bangalore, India, 15–16 March 2024; Volume 1, pp. 1–5. [Google Scholar] [CrossRef]
  4. Kumar, M.S.; Ben-Othman, J.; Srinivasagan, K.; Krishnan, G.U. Artificial Intelligence Managed Network Defense System against Port Scanning Outbreaks. In Proceedings of the 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), Vellore, India, 30–31 March 2019; pp. 1–5. [Google Scholar] [CrossRef]
  5. Gazdar, T.; Alqarni, H.; Bakhsh, A.; Aljidaani, M.; Alzahrani, M. A 2-Layers Deep learning Based Intrusion Detection System for Smart Home. In Proceedings of the 2022 Fifth National Conference of Saudi Computers Colleges (NCCC), Makkah, Saudi Arabia, 17–18 December 2022; pp. 35–40. [Google Scholar] [CrossRef]
  6. Liang, P.; Yang, L.; Xiong, Z.; Zhang, X.; Liu, G. Multilevel Intrusion Detection Based on Transformer and Wavelet Transform for IoT Data Security. IEEE Internet Things J. 2024, 11, 25613–25624. [Google Scholar] [CrossRef]
  7. Sabeel, U.; Heydari, S.S.; El-Khatib, K.; Elgazzar, K. Incremental Adversarial Learning for Polymorphic Attack Detection. IEEE Trans. Mach. Learn. Commun. Netw. 2024, 2, 869–887. [Google Scholar] [CrossRef]
  8. Vadisetty, R.; Polamarasetti, A. Generative AI for Cyber Threat Simulation and Defense. In Proceedings of the 2024 12th International Conference on Control, Mechatronics and Automation (ICCMA), London, UK, 11–13 November 2024; pp. 272–279. [Google Scholar] [CrossRef]
  9. Kołpa, P.; Adewole, K.S.; Persson, J.A.; Karlsson, F. Unsupervised Transformer-Based Anomaly Detection for IoT Networks. In Proceedings of the 2025 12th International Conference on Future Internet of Things and Cloud (FiCloud), Istanbul, Turkiye, 11–13 August 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 177–184. [Google Scholar]
  10. Alkhatib, N.; Mushtaq, M.; Ghauch, H.; Danger, J.L. CAN-BERT do it? Controller Area Network Intrusion Detection System based on BERT Language Model. In Proceedings of the 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates, 5–8 December 2022; pp. 1–8. [Google Scholar] [CrossRef]
  11. Easttom, C. Malicious Use of Artificial Intelligence. In Proceedings of the 2025 IEEE 15th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 6–8 January 2025; pp. 00499–00507. [Google Scholar] [CrossRef]
  12. Oacheşu, A. Enhancing Smart Home Security with Generative AI: Threat Detection and Countermeasure Design. Master’s Thesis, Malmö University, Malmö, Sweden, 2025. [Google Scholar]
  13. Yang, T.; Wang, J.; Deng, H.; Li, M. A Data Enhancement Model for Intrusion Detection In Smart Home. In Proceedings of the 2023 4th International Conference on Computer Vision, Image and Deep Learning (CVIDL), Zhuhai, China, 12–14 May 2023; pp. 314–317. [Google Scholar] [CrossRef]
  14. Khan, F.B.; Durad, M.H.; Khan, A.; Khan, F.A.; Rizwan, M.; Ali, A. Design and Performance Analysis of an Anti-Malware System Based on Generative Adversarial Network Framework. IEEE Access 2024, 12, 27683–27708. [Google Scholar] [CrossRef]
  15. Changala, R.; Kayalvili, S.; Farooq, M.; Rao, L.M.; Rao, V.S.; Muthuperumal, S. Using Generative Adversarial Networks for Anomaly Detection in Network Traffic: Advancements in AI Cybersecurity. In Proceedings of the 2024 International Conference on Data Science and Network Security (ICDSNS), Tiptur, India, 26–27 July 2024; pp. 1–6. [Google Scholar] [CrossRef]
  16. Park, C.; Lee, J.; Kim, Y.; Park, J.G.; Kim, H.; Hong, D. An Enhanced AI-Based Network Intrusion Detection System Using Generative Adversarial Networks. IEEE Internet Things J. 2023, 10, 2330–2345. [Google Scholar] [CrossRef]
  17. Cong, X.; Duan, X.; Zhu, H.; Yu, Z.; Fang, Y. Design of resilient supervisory control by labeled Petri nets under external attacks. Trans. Inst. Meas. Control 2025, 1, 01423312251344708. [Google Scholar] [CrossRef]
  18. Adewole, K.S.; Jacobsson, A.; Davidsson, P. Intrusion detection framework for Internet of Things with rule induction for model explanation. Sensors 2025, 25, 1845. [Google Scholar] [CrossRef]
  19. Usmanbayev, D. Improving and Evaluating Methods Network Attack Anomaly Detection. In Proceedings of the 2021 International Conference on Information Science and Communications Technologies (ICISCT), Tashkent, Uzbekistan, 3–5 November 2021; pp. 1–5. [Google Scholar] [CrossRef]
  20. Yun, K.; Astillo, P.V.; Lee, S.; Kim, J.; Kim, B.; You, I. Behavior-Rule Specification-based IDS for Safety-Related Embedded Devices in Smart Home. In Proceedings of the 2021 World Automation Congress (WAC), Taipei, Taiwan, 1–5 August 2021; pp. 65–70. [Google Scholar] [CrossRef]
  21. Kumar, R.; Sharma, D. HyINT: Signature-Anomaly Intrusion Detection System. In Proceedings of the 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Bengaluru, India, 10–12 July 2018; pp. 1–7. [Google Scholar] [CrossRef]
  22. IoT-23 Dataset. 2019. Available online: https://www.stratosphereips.org/datasets-iot23 (accessed on 12 December 2024).
  23. Jin, H.; Jeon, G.; Choi, H.W.A.; Jeon, S.; Seo, J.T. A threat modeling framework for IoT-Based botnet attacks. Heliyon 2024, 10, e39192. [Google Scholar] [CrossRef] [PubMed]
  24. K, V.; Kannimoola, J.M. Guarding Against Command and Control (C2) Agents Utilizing Real-World Applications for Communication Channels. In Proceedings of the 2024 5th International Conference for Emerging Technology (INCET), Belgaum, India, 24–26 May 2024; pp. 1–5. [Google Scholar] [CrossRef]
  25. Huang, H.; Chu, J.; Cheng, X. Trend Analysis and Countermeasure Research of DDoS Attack Under 5G Network. In Proceedings of the 2021 IEEE 5th International Conference on Cryptography, Security and Privacy (CSP), Zhuhai, China, 8–10 January 2021; pp. 153–160. [Google Scholar] [CrossRef]
  26. Nikolskaya, K.Y.; Ivanov, S.A.; Golodov, V.A.; Minbaleev, A.V.; Asyaev, G.D. Review of modern DDoS-attacks, methods and means of counteraction. In Proceedings of the 2017 International Conference “Quality Management, Transport and Information Security, Information Technologies” (IT&QM&IS), St. Petersburg, Russia, 24–30 September 2017; pp. 87–89. [Google Scholar] [CrossRef]
  27. Khan, A.; Dutta, M. Combatting Okiru Attacks in IoT using AI-Powered Detection and Mitigation Tactics. In Proceedings of the 2024 4th International Conference on Technological Advancements in Computational Sciences (ICTACS), Tashkent, Uzbekistan, 13–15 November 2024; pp. 139–144. [Google Scholar] [CrossRef]
  28. Khan, A.; Sharma, I. Tackling Okiru Attacks in IoT with AI-Driven Detection and Mitigation Strategies. In Proceedings of the 2023 International Conference on Power Energy, Environment & Intelligent Control (PEEIC), Greater Noida, India, 19–23 December 2023; pp. 336–341. [Google Scholar] [CrossRef]
  29. Hajimaghsoodi, M.; Jalili, R. RAD: A Statistical Mechanism Based on Behavioral Analysis for DDoS Attack Countermeasure. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2732–2745. [Google Scholar] [CrossRef]
  30. A, J.; Ebenezer, V.; Isaac, A.J.; Marshell, J.; Pradeepa, P.; Naveen, V. Adversarial Attacks on Generative AI Anomaly Detection in the Quantum Era. In Proceedings of the 2023 7th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 22–24 November 2023; pp. 1833–1840. [Google Scholar] [CrossRef]
  31. Jiang, T.; Fu, X.; Wang, M. BBO-CFAT: Network Intrusion Detection Model Based on BBO Algorithm and Hierarchical Transformer. IEEE Access 2024, 12, 54191–54201. [Google Scholar] [CrossRef]
  32. Golestani, S.; Makaroff, D. Device-Specific Anomaly Detection Models for IoT Systems. In Proceedings of the 2024 IEEE Conference on Communications and Network Security (CNS), Taipei, Taiwan, 30 September–3 October 2024; pp. 1–6. [Google Scholar] [CrossRef]
  33. Gatla, R.K.; Gatla, A.; Sridhar, P.; Kumar, D.G.; Rao, D.S.N.M. Advancements in Generative AI: Exploring Fundamentals and Evolution. In Proceedings of the 2024 International Conference on Electronics, Computing, Communication and Control Technology (ICECCC), Bengaluru, India, 2–3 May 2024; pp. 1–5. [Google Scholar] [CrossRef]
  34. Shahriar, M.H.; Haque, N.I.; Rahman, M.A.; Alonso, M. G-IDS: Generative Adversarial Networks Assisted Intrusion Detection System. In Proceedings of the 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), Madrid, Spain, 13–17 July 2020; pp. 376–385. [Google Scholar] [CrossRef]
  35. Shen, H.; Chen, J.; Wang, R.; Zhang, J. Counterfeit Anomaly Using Generative Adversarial Network for Anomaly Detection. IEEE Access 2020, 8, 133051–133062. [Google Scholar] [CrossRef]
  36. Mavikumbure, H.S.; Cobilean, V.; Wickramasinghe, C.S.; Drake, D.; Manic, M. Generative AI in Cyber Security of Cyber Physical Systems: Benefits and Threats. In Proceedings of the 2024 16th International Conference on Human System Interaction (HSI), Paris, France, 8–11 July 2024; pp. 1–8. [Google Scholar] [CrossRef]
  37. Saddi, V.R.; Gopal, S.K.; Mohammed, A.S.; Dhanasekaran, S.; Naruka, M.S. Examine the Role of Generative AI in Enhancing Threat Intelligence and Cyber Security Measures. In Proceedings of the 2024 2nd International Conference on Disruptive Technologies (ICDT), Greater Noida, India, 15–16 March 2024; pp. 537–542. [Google Scholar] [CrossRef]
  38. xAI. xAI Grok 3 Documentation. 2025. Available online: https://docs.x.ai/docs/overview (accessed on 28 April 2025).
  39. Hugging Face. Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. 2025. Available online: https://huggingface.co/docs/transformers (accessed on 14 July 2025).
  40. Alani, M.M.; Miri, A. Towards an Explainable Universal Feature Set for IoT Intrusion Detection. Sensors 2022, 22, 5690. [Google Scholar] [CrossRef] [PubMed]
  41. Jiang, Z.; Zheng, Y.; Tan, H.; Tang, B.; Zhou, H. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. arXiv 2017, arXiv:1611.05148. [Google Scholar] [CrossRef]
Figure 1. Architecture of the generative AI pipeline for smart home IoT security, showing data flow from raw input to countermeasure generation.
Figure 2. Simplified VAE model architecture diagram.
Figure 3. Precision–Recall curve for VAE threshold selection.
Figure 4. Impact of MSE threshold variations on VAE detection performance (Precision, Recall, and F1-Score).
Figure 5. Confusion matrix for BERT threat classification (validation set).
Figure 6. Pipeline performance on a 10,000-sample test set, including 2000 benign and 8000 malicious samples.
Figure 7. VAE unsupervised malicious traffic filtering results, showing 10.54% FPR.
Figure 8. BERT threat classification. Only four benign samples are incorrectly flagged as malicious.
Table 1. Hyperparameter Configuration for Data Imputation Models.

Model                               | Configuration / Hyperparameters
LightGBM (Classifier and Regressor) | Boosting Type: GBDT; n_estimators: 100; min_data_in_leaf: 20; n_jobs: −1; force_row_wise: True
Autoencoder (Refinement)            | Architecture (Dense Layers): [Input, 64, 32, 64, Output]; Hidden Activation: ReLU; Output Activation: Linear
Table 2. VAE Model Architecture and Training Hyperparameters.

Parameter                 | Value
Architecture              |
Encoder Layers (Neurons)  | [128, 64, 32]
Decoder Layers (Neurons)  | [32, 64, 128]
Latent Dimension          | 6
Hidden Activation         | Leaky ReLU
Batch Normalization       | True
Dropout Rate              | 0.2
Training                  |
Optimizer                 | Adam (Weight Decay: 1 × 10⁻⁶)
Learning Rate             | 1 × 10⁻⁴
Batch Size                | 256
Epochs                    | 100
Beta (β)                  | 2.0
Capacity                  | 0.1
Contamination             | 0.01
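The β and capacity entries above only summarize the training objective; the paper's exact formulation is not reproduced here. As a minimal sketch, assuming the standard capacity-constrained β-VAE objective (reconstruction MSE plus β·|KL − C|, with the KL term taken against a unit Gaussian prior), the per-sample loss could be computed as:

```python
import math

def kl_gaussian(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and N(0, 1), summed over latent dims."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, logvar))

def beta_vae_loss(recon_mse, mu, logvar, beta=2.0, capacity=0.1):
    """Capacity-constrained beta-VAE objective: MSE + beta * |KL - C|.

    beta=2.0 and capacity=0.1 match Table 2; the |KL - C| form is an
    assumption (one common capacity-constrained variant), not a quote
    of the authors' implementation.
    """
    kl = kl_gaussian(mu, logvar)
    return recon_mse + beta * abs(kl - capacity)
```

With a 6-dimensional latent space (Table 2), `mu` and `logvar` would each be length-6 vectors produced by the encoder.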
Table 3. BERT Model Architecture and Fine-Tuning Hyperparameters.

Parameter             | Value
Architecture          |
Base Model            | bert-base-uncased
Task                  | Sequence Classification
Number of Labels      | 5 (Benign, DDoS, Okiru, C&C, Port Scan)
Hidden Dropout Rate   | 0.1 (Default)
Fine-Tuning           |
Optimizer             | AdamW
Learning Rate         | 3 × 10⁻⁵
Batch Size            | 32
Epochs                | 50
Max Sequence Length   | 128 tokens
Table 4. Training Callbacks and Termination Criteria.

Callback Component       | Value       | Description
Loss-Based Stopping      |             |
Metric                   | eval_loss   | Monitors validation loss minimization
Patience                 | 3 Epochs    | Tolerance for non-improving epochs
Restore Best Weights     | True        | Reverts to model with lowest loss
Accuracy-Based Stopping  |             |
Metric                   | accuracy    | Monitors classification accuracy
Threshold                | 0.99 (99%)  | Stops if model exceeds this accuracy
General Training         |             |
Evaluation Strategy      | Epoch       | Validate at end of every epoch
Save Strategy            | Epoch       | Checkpoint at end of every epoch
Table 5. Feature selection results using Chi-Square tests (supervised, all samples) and PCA (unsupervised, benign samples).

Feature             | Chi-Sq. Score | PCA Contrib. | Selected
orig_bytes          | 954,153.526   | 0.054        | C/P
resp_bytes          | 907,718.578   | 0.121        | C/P
orig_ip_bytes       | 885,225.754   | 0.068        | C/P
duration            | 267,156.129   | 0.107        | C/P
conn_state_encoded  | 140,919.326   | 0.087        | C/P
history_encoded     | 15,217.465    | 0.068        | C/P
orig_pkts           | 13,683.891    | 0.067        | C/P
proto_encoded       | 8046.675      | 0.096        | C/P
service_encoded     | 1866.065      | 0.111        | C/P
missed_bytes        | NaN           | 0.120        | P
resp_pkts           | NaN           | 0.075        | P
resp_ip_bytes       | NaN           | 0.075        | P

NaN: insufficient variance; Selected: C = Chi-Square, P = PCA, C/P = both.
Table 6. Performance trade-offs at different MSE Thresholds.

Threshold (MSE)  | Precision | Recall | F1-Score
0.32             | 0.9398    | 0.6240 | 0.7500
0.30             | 0.9398    | 0.6240 | 0.7500
0.26 (Selected)  | 0.9616    | 0.9999 | 0.9804
0.25             | 0.9297    | 1.0000 | 0.9636
0.23             | 0.9297    | 1.0000 | 0.9636
0.21             | 0.9297    | 1.0000 | 0.9636
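A sweep like the one above amounts to flagging every sample whose reconstruction MSE exceeds the candidate cutoff and scoring the flags against ground truth. A minimal sketch (the function name `threshold_metrics` is illustrative):

```python
def threshold_metrics(errors, labels, threshold):
    """Flag samples with reconstruction MSE above `threshold` as anomalous
    (positive class = 1) and compute precision, recall, and F1."""
    tp = fp = fn = 0
    for err, y in zip(errors, labels):
        pred = 1 if err > threshold else 0
        if pred and y:
            tp += 1          # true anomaly caught
        elif pred and not y:
            fp += 1          # benign traffic misflagged
        elif not pred and y:
            fn += 1          # anomaly missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1
```

Evaluating this over a grid of thresholds and keeping the F1-maximizing cutoff reproduces the selection of 0.26 in the table.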
Table 7. Per-class metrics for BERT threat classification.

Class                 | Precision | Recall | F1-Score | Samples
Benign                | 0.9970    | 0.9980 | 0.9975   | 5000
C&C                   | 0.9992    | 0.9996 | 0.9994   | 5000
DDoS                  | 1.0000    | 0.9982 | 0.9991   | 5000
Okiru                 | 1.0000    | 0.9992 | 0.9996   | 5000
Horizontal Port Scan  | 0.9988    | 1.0000 | 0.9994   | 5000
Table 8. Summary of countermeasure evaluation metrics.

Attack Type   | Total | Alignment | Relevance | Actionability
Okiru         | 16    | 15        | 16        | 14
DDoS          | 11    | 10        | 11        | 9
C&C           | 12    | 12        | 12        | 10
H. Port Scan  | 16    | 15        | 16        | 14
Table 9. Aggregated countermeasures with attack type codes and severity levels.

Countermeasure                   | Attack Type(s) and Severity Level(s)
Block Malicious IPs and Traffic  | O: L, DD: H & M, CC: H, HPS: H & M
Intr. Detection and Prevention   | O: L, DD: H & M, CC: H, HPS: H & M
Harden Systems and Services      | O: L, DD: H, CC: H, HPS: H & M
Network Segmentation             | O: L, CC: H, HPS: H & M
Monitor and Analyze Traffic      | O: L, DD: H, CC: H, HPS: H & M
Deception Techniques             | O: L, HPS: H & M
Geo-Restriction                  | O: L, DD: H, HPS: H & M
Educate and Enforce Policies     | O: L, CC: H, HPS: H & M
Incident Response/Analysis       | O: L, CC: H
SYN Flood Protection             | O: L, DD: H & M, HPS: H & M
Endpoint Protection              | O: L, CC: H
Behavioral Analysis              | O: L, CC: H, HPS: M
Enable Logging                   | O: L, CC: H, HPS: H
Update Systems                   | O: L, CC: H, HPS: H
Disable Unused Services          | O: L, HPS: H & M
Threat Intelligence              | CC: H, HPS: H & M
Increase Capacity                | DD: H
Content Delivery Network         | DD: H
Collaborate with ISP             | DD: H
Security Audits                  | DD: H
Access Control                   | HPS: H & M
Stealth Mode                     | HPS: M

Attacks: O—Okiru, DD—DDoS, CC—C&C, HPS—Horizontal Port Scan; Severity: L—Low, M—Medium, H—High.
Table 10. End-to-End Pipeline Performance Metrics.

Metric                           | Value
Anomaly Detection Rate           | 99.95%
False Positive Rate (VAE)        | 10.54%
False Positive Rate (Post-BERT)  | 0.04%
Accuracy (BERT Classification)   | 99.90%
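As a back-of-envelope check on the post-BERT rate above: Figure 8 reports four benign samples remaining misflagged after BERT classification. The denominator is an assumption here; dividing by the full 10,000-sample test set reproduces the table's 0.04%:

```python
def false_positive_rate(false_positives, negatives_total):
    """FPR = FP / (FP + TN), i.e. misflagged benign samples over the denominator."""
    return false_positives / negatives_total

# 4 benign samples misflagged post-BERT (Figure 8); dividing by the
# 10,000-sample test set (an assumed denominator) yields the reported 0.04%.
post_bert_fpr = false_positive_rate(4, 10_000)
```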

Share and Cite

MDPI and ACS Style

Oacheșu, A.; Adewole, K.S.; Jacobsson, A.; Davidsson, P. Enhancing IoT Security with Generative AI: Threat Detection and Countermeasure Design. Electronics 2026, 15, 92. https://doi.org/10.3390/electronics15010092