Article

G-PFL-ID: Graph-Driven Personalized Federated Learning for Unsupervised Intrusion Detection in Non-IID IoT Systems

Department of Computer Science, Florida Campus, University of South Africa, Johannesburg 1709, South Africa
*
Author to whom correspondence should be addressed.
Submission received: 15 December 2025 / Revised: 14 January 2026 / Accepted: 23 January 2026 / Published: 29 January 2026

Abstract

Intrusion detection in IoT networks is challenged by data heterogeneity, label scarcity, and privacy constraints. Traditional federated learning (FL) methods often assume IID data or require supervised labels, limiting their practicality. We propose G-PFL-ID, a graph-driven personalized federated learning framework for unsupervised intrusion detection in non-IID IoT systems. Our method trains a global graph encoder (GCN or GAE) with a DeepSVDD objective under a federated regularizer (FedReg) that combines proximal and variance penalties, then personalizes local models via a lightweight fine-tuning head. We evaluate G-PFL-ID on the IoT-23 (Mirai-based captures) and N-BaIoT (device-level) datasets under realistic heterogeneity: Dirichlet-based partitioning with concentration parameters α ∈ {0.1, 0.5, …} and client counts K ∈ {10, 15, 20} for IoT-23, and natural device-based partitioning for N-BaIoT. G-PFL-ID outperforms global FL baselines and recent graph-based federated anomaly detectors, achieving up to 99.46% AUROC on IoT-23 and 97.74% AUROC on N-BaIoT. Ablation studies confirm that the proximal and variance penalties reduce inter-round drift and representation collapse, and that lightweight personalization recovers local sensitivity, especially for clients with limited data. Our work bridges graph-based anomaly detection with personalized FL for scalable, privacy-preserving IoT security.

Graphical Abstract

1. Introduction

The Internet of Things (IoT) has shown many benefits in various fields, including healthcare [1], logistics, and smart infrastructure [2]. The IoT is an interconnected network of physical objects that can include everyday consumer electronics, smart home devices, enterprise edge gateways, and industrial infrastructure [3]. These devices use networked sensors and actuators to collect and analyze sensor data, improving operational efficiency, revealing system behavior, supporting predictive maintenance, and enabling new services [4,5,6]. However, the growth of interconnected devices increases the attack surface, opening up more possible attack points in the network.
Notable examples include botnets such as Mirai, Emotet, and Srizbi, which have exploited weaknesses in devices on a large scale [7,8,9]. Once compromised, these devices are often used for attacks such as denial of service (DoS), distributed denial of service (DDoS), and spoofing campaigns. These attacks exploit protocol or software vulnerabilities, overwhelm network or computing resources, impersonate legitimate traffic, and facilitate spamming, advertising fraud, intellectual property theft, ransomware deployment, and other fraudulent activities [10,11,12].
To reduce risks in IoT systems, operators typically deploy both preventive and detective controls. These techniques include network segmentation, hardened firewall rules, platform hardening, secure update mechanisms, and packet filtering appliances [13,14,15,16,17]. Although these techniques often serve as the first line of defense, blocking many known exploits, intrusion detection systems (IDS) remain crucial to monitoring network traffic and identifying attacks that evade these defenses [18].
Classical IDS approaches can be categorized as signature-based, anomaly-based, or hybrid-based. A signature-based IDS compares traffic with known threat patterns. Studies show that these techniques are effective at detecting familiar exploits with relatively few false alarms, yet they struggle with novel or zero-day attacks and require frequent updates [18,19,20]. Anomaly detectors (AD) learn patterns of expected behavior, such as normal traffic volume, protocol usage, and communication timing. They raise alerts when behavior diverges [21]. Once expected behaviors have been established, ADs are effective at identifying new threats; however, false alarm rates tend to increase unless baselines are carefully adjusted, particularly in dynamic IoT environments [18,20,21]. Hybrid systems combine methods to achieve better coverage and accuracy. However, in IoT deployments, this combination results in increased complexity, greater resource demands, and heavier maintenance burdens [18,20].
Machine learning (ML) expands the capabilities of IDS beyond handcrafted rules, enabling the detection of unfamiliar threats by learning directly from device logs [22]. However, supervised ML-based IDS requires labeled datasets to develop effective models. This process involves gathering and labeling large datasets that cover current attack types across many different IoT devices, which can be very expensive and is often impractical, as discussed in [23,24,25]. A commonly proposed alternative in the literature is training on positive-only (benign) data to model expected behavior and flag deviations [26,27].
Recent literature shows that graph-aware models, such as graph autoencoders (GAEs) and graph convolutional networks (GCNs), effectively detect anomalies in centralized settings when the data contains relational or structural dependencies. Network sensor data naturally records relationships between hosts and flows, such as sessions, services, ports, and device roles, as well as how these elements communicate with one another. Representing this data as a graph reveals multi-hop dependencies and structural outliers that pointwise methods often overlook [28,29,30]. However, deploying centralized ML-IDS in real IoT environments often poses practical challenges. For instance, collecting raw traffic to a central server increases communication overhead and latency. It also requires more computing power and memory than many edge devices can provide, creating a single point of failure [31,32].
Federated learning (FL) offers a promising solution to these challenges. In this approach, each client trains on its own local data and shares only model updates with a coordinating server [33,34,35,36,37], which significantly reduces the transfer of raw data and provides stronger privacy protections. However, deploying FL in IoT environments presents unique challenges. Many IoT FL-based intrusion-detection studies assume only mild non-independent and identically distributed (non-IID) variation across clients. In practice, however, clients often differ substantially in the quantity and distribution of their data. Consequently, simple aggregation methods, such as FedAvg [38], can produce global models that perform poorly for clients with highly heterogeneous data distributions [36,39,40,41,42,43].
Additionally, embeddings may collapse during unsupervised or self-supervised graph training in FL if clients only observe very small local batches or experience intermittent connectivity. Model aggregation becomes unstable when data is partitioned in a highly skewed manner, either by label or by volume. This instability is especially likely when many clients have only benign examples [44,45,46,47].
To address these issues, we propose a graph-based, personalized, unsupervised IDS trained in a federated manner across heterogeneous IoT clients. The approach follows a two-stage pipeline that begins with joint training of a shared graph encoder (a GCN or a GAE backbone) under a combined reconstruction and one-class objective [27]. Training is augmented with a proximal regularizer (FedProx) [48] and an embedding-variance penalty intended to stabilize updates when local data are scarce. The encoder is implemented in two complementary instantiations. The first is a graph-level GCN trained with a DeepSVDD objective that yields a compact hypersphere decision rule. The second is a hybrid GAE–DeepSVDD model that enforces reconstruction fidelity while also encouraging embedding compactness.
We adopt the hybrid variant as the canonical G-PFL-ID encoder because it combines the interpretability of reconstruction-based scoring with the explicit one-class boundary that operational detection benefits from. The GCN–DeepSVDD configuration is evaluated as a principled baseline and is used in ablations to probe how different inductive biases interact with the proposed federated regularizers and personalization choices.
After the global stage converges, clients perform light personalization by fine-tuning a small local head on benign examples and computing client-specific calibration statistics. To simulate realistic heterogeneity, we induce quantity skew via Dirichlet partitioning with concentration parameters α ∈ {0.1, 0.5, …} and vary the number of clients in the federation, K ∈ {10, 15, 20}. Empirical evaluation on IoT-23 under these controlled splits demonstrates consistent gains from the proposed regularizers and from local personalization.
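To make the Dirichlet partitioning protocol concrete, the sketch below shows one common way to induce label and quantity skew across clients; smaller concentration values produce more extreme non-IID splits. The function name, seed handling, and exact split rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet-distributed
    class proportions; smaller alpha -> more skewed (non-IID) splits."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Proportion of class c assigned to each client.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_idx[k].extend(part.tolist())
    return [np.array(ix) for ix in client_idx]
```

With alpha = 0.1 most of each class concentrates on a few clients, mimicking the severe heterogeneity studied in the experiments; alpha = 0.5 yields milder skew.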
The main contributions of this paper are as follows:
  • We implement and evaluate two federated one-class graph encoders that are compatible with federated training and comparable under identical conditions: a graph-level GCN trained with a DeepSVDD objective, and a hybrid GAE–DeepSVDD that jointly optimizes reconstruction and embedding compactness. 
  • We introduce two lightweight regularizers for federated unsupervised learning. A proximal aggregation term reduces weight divergence when client sample sizes vary. An embedding variance penalty mitigates representation collapse on small-data clients and preserves discriminative capacity. 
  • We design a compact personalization head that each client fine-tunes on benign data. The head, together with selective encoder fine-tuning, restores local sensitivity while keeping communication overhead low. 
  • We provide an extensive empirical study on IoT-23 using Dirichlet partitions with α ∈ {0.1, 0.5, …} and K ∈ {10, 15, 20}, and further validate G-PFL-ID on the N-BaIoT dataset (treating each of the 9 IoT devices as a separate client, K = 9) to demonstrate generalizability under real-world device heterogeneity. We report both global and personalized metrics (AUROC, AUPR, detection rates at fixed false-positive rates) and present ablation studies that isolate the contribution of each proposed component.
The rest of the paper is organized as follows. Section 2 situates our work relative to graph-based IDS and federated personalization. Section 3 formalizes the federated one-class problem and the partitioning protocols (Section 3.1.1) and presents the model architecture, objectives, and federated algorithm. Section 4 details the experimental design (Section 4.1) and reports the results and ablations (Section 4.2). We conclude in Section 5 with limitations and directions for future work.

2. Related Works

In this section, we survey state-of-the-art work related to our proposal. We start with FL techniques applied to IoT intrusion or anomaly detection, then move to unsupervised anomaly detection methods, and finally discuss personalized federated learning approaches.

2.1. Federated Learning for Intrusion Detection

Existing studies have explored the use of federated learning for intrusion detection in various IoT and networked settings. Bhavsar et al. [49] proposed an edge-deployed federated learning intrusion detection system (FL-IDS) that runs lightweight classifiers on embedded devices and reported supervised training evaluated under an independent and identically distributed (IID) assumption on the NSL-KDD and Car Hacking datasets. Nguyen et al. [50] presented FedNIDS, a supervised, packet-level federated framework designed to disseminate incremental knowledge about new attacks. The paper reported strong empirical F1 scores, but the evaluation did not focus on extreme non-IID splits. Althunayyan et al. [51] designed a hierarchical federated learning (FL) system for in-vehicle Controller Area Network (CAN) bus monitoring. The system uses a two-stage detector with an LSTM autoencoder in the second stage. The proposed supervised model was evaluated on CAN datasets, but it did not explicitly stress-test severe client heterogeneity.
Huang et al. [52] introduced FED IoV, which converts vehicular traffic into images and trains MobileNet Tiny in a supervised, federated setting using the CAN Intrusion and CICIDS2017 datasets. The reported experiments assume IID clients and do not explore adversarial robustness. Shao et al. [53] used evolutionary neural architecture search to discover compact federated CNN models for industrial control systems. The supervised method was evaluated on the Gas Pipeline, SWaT, and WADI datasets, but non-IID behavior was not the main focus. Praharaj et al. [54] applied federated transfer learning to anomaly detection in cooperative smart farming, reporting supervised results on two smart-farm testbeds together with communication compression; the authors acknowledge heterogeneity among farms but do not parameterize it systematically (e.g., via Dirichlet splits) or measure its degree. Singh and Popli [55] developed and validated a supervised federated IDS for underwater drones using a combination of in-house and public datasets, including CIC IDS 2017. However, the work did not deeply examine extreme non-IID participation.
Another group of studies looked at class imbalance and continual learning, which are practical problems for supervised FL in IDS. Jin et al. [56] proposed FL-IDS, which augments local training with a class gradient balance loss and a relay client that fuses reconstructed samples. The evaluation was carried out using UNSW-NB15 and CIC-IDS17, showing improved resilience to catastrophic forgetting. However, the experiments consider moderate rather than extreme non-IID conditions. Mazid et al. [37] proposed FL-IDPP, which uses bidirectional RNNs locally and a voting ensemble at the server to improve decision quality. The paper reported reduced convergence time but did not analyze performance under strong client heterogeneity. Singh et al. [57] explicitly addressed class imbalance in federated IDS. The authors evaluated their class balancing strategies under both IID and moderate non-IID settings using the WUSTL IIoT 2021 dataset.
A common drawback of these studies is that they rely on supervised labels and therefore require access to representative attack annotations locally. Additionally, evaluations often assume mild heterogeneity or fail to systematically vary the degree of non-IID partitioning.

2.2. Unsupervised Anomaly Detection

Unsupervised techniques for anomaly detection in IoT networks do not require labeled attack examples and are therefore widely studied [21]. Classical methods include reconstruction-based models, such as autoencoders [58] and variational autoencoders [59]. These models learn to reconstruct benign behavior and flag inputs with large reconstruction errors [26,28]. Other common families of methods include one-class and density estimators, such as Isolation Forest (IF) and Local Outlier Factor (LOF). These methods identify points as anomalous if they fall in low-density regions or are isolated from the main data mass [60]. Clustering and density-based approaches group normal instances and treat points belonging to very small or sparse clusters as anomalies [21]. These algorithms perform well in many settings, but they face practical difficulties when applied to heterogeneous networked IoT data.
There are two relevant challenges in federated and constrained IoT deployments. First, reconstruction-based models typically rely on large, representative, benign datasets to learn reliable generative mappings. When clients have limited local data, the quality of reconstruction can degrade, raising both false negatives and false positives [61,62]. Second, unsupervised feature learners are prone to representation collapse under small sample sizes or small batches. Representation collapse occurs when encoders map distinct normal instances to overly similar latent vectors. This behavior has a negative effect on one-class detectors. For instance, DeepSVDD and related hypersphere methods rely on a compact but informative normality region. When embeddings collapse, the hypersphere becomes trivial, fails to separate anomalies from normal examples, and detection performance degrades [27,63].
Recent federated work proposes complementary solutions. Federated denoising autoencoders, such as L-MDAE, demonstrate that collaborative training across noisy, multimodal clients can enhance robustness [64]. Temporal hybrids combining LSTM encoders and autoencoders have been applied to grid and telemetry data in a federated manner, yielding promising results [65]. Other approaches share compressed distributional summaries or local density functions, using those summaries to guide contrastive or density-based learning at the client level [61,66]. Contrastive learning can increase embedding diversity, but it is conceptually at odds with one-class learning. The contrastive objective pushes many pairs apart, whereas the one-class objective prefers normal samples to remain close together. In federated non-IID settings, naive contrastive losses can amplify cross-client divergence unless coupled with careful regularization [61,67].
To address the limitations of naive contrastive losses, a family of density-based methods aims to preserve the geometry of high-density inlier subspaces. RIFIFI modifies the isolation forest paradigm to isolate outliers while preserving dense subspace structure. This improves interpretability and reduces false alarms [60]. However, these methods still neglect relational context, such as host-to-host relations or flow sequences, which are often essential for network intrusion detection [68,69].
Graph-based methods address this gap by using structural cues that single-point detectors cannot perceive. Graph autoencoders and their variational counterparts learn representations by reconstructing edges and node attributes. Anomalies are revealed via either large reconstruction error or embeddings that deviate from the normal inlier region [28,70,71]. Accordingly, federated adaptations of graph learning have been proposed. For instance, FedGCN reduces communication per round by exchanging neighbor summaries beforehand and then updating local subgraphs [47].
Some frameworks apply contrastive self-supervision to graphs, maintaining global negative pairs or pseudo-labels to improve the separation between anomalous and normal nodes [72,73]. Methods such as LG-FGAD combine adversarial anomaly generation with local-global knowledge distillation to increase anomaly awareness in non-IID federated settings [74]. While these graph-aware federated methods often outperform tabular baselines on graph anomaly benchmarks, they introduce new system-level trade-offs. For example, cross-client edge handling becomes necessary, per-round communication increases, and small local graphs become more sensitive to noise [68,75].
Deep one-class models, such as DeepSVDD, provide an explicit decision rule by learning a compact hypersphere that encloses normal data [27,76]. Combining reconstruction objectives with a one-class boundary (e.g., a VAE linked to SVDD) increases robustness because the model preserves data fidelity while enforcing a clearer decision surface [77,78]. These hybrids and pure one-class approaches are now being adapted to federated settings for both anomaly detection and defending against malicious updates [63]. However, applying naive federated averaging to DeepSVDD-style models can be fragile. When client embeddings drift inconsistently under severe non-IID sampling, the global hypersphere can be poorly defined, resulting in poor detection performance [79].

2.3. Personalized Federated Learning (PFL)

Personalization has emerged as a practical solution because a single global model rarely fits every client in a heterogeneous federation. In intrusion detection, for example, each site has its own mix of devices, traffic patterns, and operational constraints. A model tuned to these particularities often yields fewer false alarms, better local utility, and preserves the privacy benefits of federated training [80,81]. The empirical literature supports this intuition, and several distinct paradigms of personalization techniques have been developed.
One paradigm splits the model into a shared backbone and small client-specific modules, which could be a decoder or scoring head. The backbone is trained collaboratively across clients while the head remains private and is adapted locally. This pattern is easy to implement and keeps communication costs low because only the backbone parameters are exchanged during federation. Representative work includes FedPer [80] and FedRep [81], which demonstrate that separating shared representations from local heads can substantially improve performance for each client under heterogeneity. Ref. [82] also employed this paradigm to address the vulnerability of FL-based IDS systems to poisoned clients in non-IID IoT data environments. The proposed supervised PFL method for IDS (pFL-IDS) successfully defended against approximately 30 poisoned clients while maintaining a low attack success rate, outperforming prior approaches. However, detection is limited to the attacks on which it was trained.
Another family of methods uses proximal regularization or related penalties to limit how far local updates may deviate from the global solution. FedProx [48] is the canonical example, and Ditto [83], builds on this concept by framing personalization as a two-level problem. In this problem, each client learns a model that remains close to a global reference while optimizing its own local objective. Regularizers of this type are appealing in IDS because they explicitly balance global consensus with local specialization and allow for straightforward analysis and tuning via a single regularization coefficient.
Methods such as Per-FedAvg treat federated learning as a form of meta-training that produces an initialization that can be adapted with a few local gradient steps [84]. Meta-based personalization is attractive when clients have very few labeled examples or when fast local adaptation is required. In practice, this approach has been applied more often to supervised tasks. However, the underlying idea can be applied to one-class or unsupervised settings, where a shared encoder can be rapidly adapted to local normal behavior [84,85,86].
Another family of methods groups clients into clusters or learns a client similarity graph so that only similar participants share information with one another. Clustering reduces negative transfer from dissimilar peers and can be implemented in multiple ways. For example, it can be done by grouping gradient signatures [46], estimating pairwise affinities [87], or learning a client network from marginal parameter statistics [88]. This family is relevant in graph-based federated anomaly detection because clients that host similar subgraphs or run similar services are natural sharing partners.
A notable trend regarding the benefits of these paradigms to FL is that they often improve worst-case performance per client rather than just average performance. This is important because a single poorly performing client can pose a significant risk [83]. However, two gaps are directly relevant to graph-based, unsupervised intrusion detection. First, many practical FL-IDSs operate on tabular features or time series and do not exploit the relational signals captured by graph models. Second, most personalization work has been developed independently of unsupervised, one-class objectives and the regularizer that stabilizes unsupervised, federated training. A summary of the existing related literature is presented in Table 1.
In this article, we combine these threads by federating a shared GNN encoder under a combined reconstruction and DeepSVDD one-class objective. We also augment local training with proximal and variance-preserving regularizers and allow lightweight local heads to personalize detection while minimizing backbone communication.

3. Methodology

3.1. Preliminaries

We present the preliminaries, which include the federated one-class problem for graph-structured IoT network traffic, and the definition of the main notation used for this study.

3.1.1. Problem Formulation

Given K clients with local datasets $\{D_1, D_2, \ldots, D_K\}$ that contain primarily benign network traffic data, our goal is to learn a global model that can detect anomalous behavior while preserving data privacy. The objective is to minimize:
$$\min_{\theta} \; \mathcal{L}_{\mathrm{global}}(\theta) = \sum_{k=1}^{K} \frac{n_k}{N}\,\mathcal{L}_k(\theta) + \Lambda(\theta),$$
where $\mathcal{L}_k(\theta)$ is the local unsupervised loss of client k and $\Lambda(\theta)$ denotes server-side regularization.
We transform network traffic into graph structures where anomalies manifest as deviations from learned normal patterns in the graph embedding space.
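The weighted objective above combines per-client losses in proportion to client size. The tiny helper below is a didactic illustration of that weighting; the function name is ours, not part of the framework's codebase.

```python
def global_loss(local_losses, client_sizes, server_reg=0.0):
    """Weighted federated objective: sum_k (n_k / N) * L_k + Lambda.
    `local_losses` are the L_k values, `client_sizes` the n_k,
    and `server_reg` stands in for the server-side term Lambda."""
    N = sum(client_sizes)
    return sum((n / N) * L for L, n in zip(local_losses, client_sizes)) + server_reg
```

Clients with more nodes contribute proportionally more to the global objective, matching the n_k / N weights in the formulation.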

3.1.2. Notations and Definitions

We consider a federation of K clients (edge devices or gateways) indexed by $\mathcal{C} = \{1, \ldots, K\}$. Each client $k \in \mathcal{C}$ has a private dataset $D_k$ derived from network telemetry (Zeek conn logs in our experiments). From $D_k$ we construct a client graph
$$G_k = (V_k, E_k, X_k),$$
where $V_k$ is the set of nodes (hosts), $E_k$ the observed flow edges, and $X_k \in \mathbb{R}^{|V_k| \times d}$ the node feature matrix with d features per node. We denote by $A_k$ the adjacency matrix of $G_k$ and by $n_k = |V_k|$ the number of nodes on client k. The total number of nodes in the federation is $N = \sum_{k=1}^{K} n_k$.
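To illustrate this construction, the sketch below builds a toy (A_k, X_k) pair from a list of flow records. The feature choice (degree and byte counts) and the function name are simplified stand-ins for the Zeek-derived features used in practice, not the actual preprocessing pipeline.

```python
import numpy as np

def build_client_graph(flows, feature_dim=4):
    """Build (A_k, X_k) for one client from host-to-host flow records.
    `flows` is a list of (src, dst, bytes) tuples; hosts become nodes."""
    hosts = sorted({h for s, d, _ in flows for h in (s, d)})
    index = {h: i for i, h in enumerate(hosts)}
    n = len(hosts)
    A = np.zeros((n, n))
    X = np.zeros((n, feature_dim))
    for s, d, nbytes in flows:
        i, j = index[s], index[d]
        A[i, j] = A[j, i] = 1.0               # undirected flow edge
        X[i, 0] += 1; X[j, 1] += 1            # out-/in-degree counts
        X[i, 2] += nbytes; X[j, 3] += nbytes  # bytes sent / received
    return A, X, index
```

Each client runs this construction locally, so raw telemetry never leaves the device; only model parameters derived from (A_k, X_k) are shared.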
A shared graph encoder $f_\theta$ parameterized by $\theta$ maps a client graph to node embeddings:
$$Z_k = f_\theta(X_k, A_k) \in \mathbb{R}^{n_k \times m},$$
where each row $z_{k,i} \in \mathbb{R}^m$ is the embedding of node i and m is the embedding dimension.
Our pipeline supports three one-class instantiations: (i) GCN–DeepSVDD that directly optimizes a DeepSVDD hypersphere on graph-level GCN embeddings, (ii) GAE that uses reconstruction error on structure/features as an anomaly score, and (iii) a hybrid GAE–DeepSVDD that jointly optimizes reconstruction and hypersphere compactness. When using the GAE variant, the full model is f θ (encoder) and g ψ (decoder) and both sets of parameters can be updated during federated rounds to minimize reconstruction objectives. For DeepSVDD variants, the training focuses on f θ and a one-class objective. The personalization parameters ϕ k are introduced after the global backbone stabilizes. Clients initialize h ϕ k from the stabilized backbone provided by the server, then perform local fine-tuning on benign samples while keeping most of θ fixed or allowing limited encoder updates according to the selected personalization budget.
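A minimal NumPy sketch of a GCN-style encoder f_θ mapping (X_k, A_k) to embeddings Z_k is shown below, using the standard symmetric normalization with self-loops. This is a didactic stand-in for the trained backbone; the function name and two-layer shape are assumptions for illustration.

```python
import numpy as np

def gcn_encode(X, A, weights):
    """Minimal two-layer GCN encoder f_theta: (X, A) -> Z.
    Uses the symmetrically normalized adjacency with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    H = np.maximum(A_norm @ X @ weights[0], 0.0)   # ReLU hidden layer
    Z = A_norm @ H @ weights[1]                    # linear output embeddings
    return Z
```

In the GAE variant, a decoder g_ψ scores edges from these embeddings (e.g., via an inner product), while the DeepSVDD variants operate on Z directly.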
Table 2 summarizes the main symbols used throughout the paper.

3.2. G-PFL-ID Overview

The proposed Graph-based Personalized Federated Learning for Intrusion Detection (G-PFL-ID) framework addresses the fundamental challenges of IoT security in distributed environments through a comprehensive two-stage architecture. As illustrated in Figure 1, our approach combines global model federation with client-specific personalization to achieve robust generalization while maintaining individualized adaptation to local data distributions. The G-PFL-ID framework operates through the following systematic process:
  • Global Model Initialization: The process begins with the central server initializing a shared global intrusion detection model $f_\theta$. This model employs graph-based architectures, with the option of GCN encoders or GAE encoders with decoders $g_\psi$ for reconstruction tasks [28,30,68]. The initialized model is broadcast to all participating clients at the beginning of each federation round t.
  • Local Training with Multi-Objective Optimization: Each client k computes local updates using the shared global model f θ on its per-host aggregated graph dataset D k . The local optimization incorporates three key components:
    Task-Specific Objective: Clients employ one of three graph-based architectures: GAE (Graph Autoencoder), GAE-DeepSVDD, or GCN-DeepSVDD, depending on the detection paradigm (reconstruction-based or hypersphere-based anomaly detection).
    FedProx Regularization: The proximal term $\frac{\mu}{2}\|\theta - \theta^t\|^2$ [48] stabilizes training under non-IID data distributions by constraining local updates to remain close to the global model.
    Embedding Variance Penalty: A variance regularization term $\lambda_{\mathrm{var}} \cdot \mathcal{L}_{\mathrm{var}}(z)$ guards against degenerate (collapsed) feature representations, enhancing anomaly detection performance in one-class settings.
  • Secure Server Aggregation with Intrusion Detection: The global server employs an intrusion detection mechanism to identify and filter out potentially malicious client updates Δ k . Clean updates are aggregated using federated averaging [33]:
    $$\theta^{t+1} \leftarrow \theta^{t} + \frac{1}{|S_{\mathrm{clean}}|} \sum_{k \in S_{\mathrm{clean}}} \Delta_k^{t},$$
    where $S_{\mathrm{clean}}$ is the set of verified benign clients, $\theta^t$ and $\theta^{t+1}$ are the global model parameters at rounds t and t + 1, respectively, and $\Delta_k^t = \theta_k^t - \theta^t$ is the model update from client k at round t.
  • Client-Specific Personalization: In the final stage, each client adapts the global model through personalized heads ϕ k trained exclusively on local data D k . This personalization phase optimizes ϕ k * ensuring optimal adaptation to local traffic patterns while maintaining stability [81,84].
    $$\phi_k^{*} = \arg\min_{\phi_k} \; \mathbb{E}_{(x,y) \sim D_k}\!\left[\ell\big(f_{\theta^*}(x) \circ \phi_k,\; y\big)\right] + \lambda_p \|\phi_k - \phi_{\mathrm{init}}\|^2,$$
    where $\phi_k^{*}$ is the optimal personalized head for client k, $\arg\min_{\phi_k}$ finds the parameters that minimize the objective, $\mathbb{E}_{(x,y) \sim D_k}$ is the expectation over client k's data distribution, $\ell(\cdot, y)$ is the loss function (e.g., reconstruction error for the GAE), $f_{\theta^*}(x)$ is the feature representation from the frozen global backbone, $\circ$ denotes function composition (backbone followed by personal head), $\lambda_p$ is the personalization regularization strength, and $\|\phi_k - \phi_{\mathrm{init}}\|^2$ is the L2 regularization toward the initial head.
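The per-round interplay of the local one-class objective, the FedProx proximal pull, and the variance penalty, followed by server-side averaging of clean deltas, can be sketched as follows. The linear stand-in encoder, hyperparameter values, and function names are illustrative assumptions rather than the paper's implementation; the variance term is written so that low embedding variance is penalized, matching its anti-collapse role.

```python
import numpy as np

def local_update(theta_global, X, c, mu=0.1, lam_var=0.01, lr=0.05, steps=20):
    """One client's local training: DeepSVDD pull toward centre c,
    FedProx proximal pull toward the global weights, and a variance
    penalty whose gradient pushes embedding variance UP (anti-collapse)."""
    theta = theta_global.copy()
    for _ in range(steps):
        Z = X @ theta                                  # linear stand-in encoder
        grad_svdd = 2 * X.T @ (Z - c) / len(X)         # d/dtheta mean ||z - c||^2
        grad_prox = mu * (theta - theta_global)        # FedProx proximal term
        Zc = Z - Z.mean(axis=0)
        grad_var = -lam_var * 2 * X.T @ Zc / len(X)    # rewards embedding variance
        theta -= lr * (grad_svdd + grad_prox + grad_var)
    return theta

def aggregate(theta_global, client_thetas):
    """FedAvg-style aggregation over (already filtered) clean client deltas."""
    deltas = [t - theta_global for t in client_thetas]
    return theta_global + np.mean(deltas, axis=0)
```

A full round broadcasts theta_global, runs local_update on every participating client, filters suspicious deltas, and calls aggregate on the remainder.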

3.3. Global Federated Training

The first stage of G-PFL-ID trains a shared graph encoder across the federation without centralizing raw network traffic. For a federation of K clients with local graph datasets $\{D_k\}_{k=1}^{K}$, each client k provides a graph $G_k = (V_k, E_k, X_k)$ with $n_k = |V_k|$ nodes and node features $X_k \in \mathbb{R}^{n_k \times d}$. Our global backbone is a parameterized encoder $f_\theta$ that maps $(X_k, A_k) \mapsto Z_k \in \mathbb{R}^{n_k \times m}$.
We evaluate two principal one-class instantiations of f θ : (i) GCN–DeepSVDD (encoder + one-class hypersphere objective) and (ii) GAE–DeepSVDD (encoder + decoder with joint reconstruction and one-class compactness). Figure 2 shows a schematic of the two alternatives, Table 3 presents the GCN–DeepSVDD architecture, and Table 4 presents the GAE-DeepSVDD architecture used in our experiments.

3.3.1. GCN-DeepSVDD Implementation

In the GCN-DeepSVDD implementation, the hypersphere center $c \in \mathbb{R}^m$ is initialized using a data-driven procedure [27] and remains fixed during training to prevent trivial solutions. The model optimizes the standard DeepSVDD objective:
$$\mathcal{L}^{(k)}_{\mathrm{GCN\text{-}SVDD}}(\theta; D_k) = \frac{1}{n_k} \sum_{i \in V_k} \|z_{k,i} - c\|^2 + \frac{\lambda}{2}\|\theta\|_F^2,$$
where $z_{k,i} \in \mathbb{R}^m$ is the embedding of node i on client k (also written $\phi_\theta(x_i)$ in the literature [27]), $\lambda$ is the weight decay coefficient, and $\|\cdot\|_F$ denotes the Frobenius norm. At inference, the anomaly score for node i is $s_{k,i} = \|z_{k,i} - c\|^2$.
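A direct transcription of this objective and the induced anomaly score is shown below; the function name is an illustrative helper, and the Frobenius term is passed in precomputed for simplicity.

```python
import numpy as np

def svdd_loss_and_scores(Z, c, theta_frobenius_sq=0.0, lam=1e-3):
    """DeepSVDD objective on node embeddings Z (n x m): mean squared
    distance to the fixed centre c plus weight decay. The per-node
    anomaly scores are the squared distances themselves."""
    scores = np.sum((Z - c) ** 2, axis=1)        # s_i = ||z_i - c||^2
    loss = scores.mean() + 0.5 * lam * theta_frobenius_sq
    return loss, scores
```

Because c is fixed, the encoder cannot trivially minimize the loss by shifting the centre; it must map normal nodes into a tight region around c, leaving anomalies with large scores.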

3.3.2. GAE-DeepSVDD Implementation

The hybrid GAE-DeepSVDD model produces complementary anomaly signals through reconstruction error and embedding compactness. The reconstruction error detects local inconsistencies in graph structure, while embedding compactness enforces a tight one-class boundary, following principles established in graph autoencoder anomaly detection [89,90].
In our GAE-DeepSVDD implementation, the node features $x_{k,i}$ are not reconstructed (i.e., $\hat{x}_{k,i} = 0$) for client $k$; only the adjacency matrix $\hat{A}$ is. The total hybrid loss $\mathcal{L}^{(k)}_{\text{GAE-SVDD}}(\theta, \psi)$ for client $k$ is therefore:
$\mathcal{L}^{(k)}_{\text{GAE-SVDD}}(\theta, \psi) = \alpha \mathcal{L}^{(k)}_{\text{recon}}(\theta, \psi) + \beta \mathcal{L}^{(k)}_{\text{SVDD}}(\theta) + \frac{\lambda}{2} \lVert \theta \rVert_F^2,$
$\mathcal{L}^{(k)}_{\text{recon}}(\theta, \psi) = \frac{1}{|E_k|} \sum_{(i,j) \in E_k} \mathrm{BCE}(A_{ij}, \hat{A}_{ij}),$
$\mathcal{L}^{(k)}_{\text{SVDD}}(\theta) = \frac{1}{n_k} \sum_{i \in V_k} \lVert z_{k,i} - c \rVert_2^2, \quad z_{k,i} = f_\theta(X_k, A_k)[i].$
where:
  • $\mathcal{L}^{(k)}_{\text{recon}}(\theta, \psi)$ and $\mathcal{L}^{(k)}_{\text{SVDD}}(\theta)$ are the graph reconstruction loss and the DeepSVDD compactness loss for client $k$,
  • $\alpha$ and $\beta$ are the reconstruction and DeepSVDD loss weights, with $\alpha \in [0, 1]$ and $\beta \in [0, 1]$,
  • $\lambda$ is the weight decay coefficient for $L_2$ regularization,
  • $\lVert \theta \rVert_F^2$ is the squared Frobenius norm of the encoder parameters (sum of squared weights), and $\lVert \cdot \rVert_2^2$ is the squared $L_2$ (Euclidean) norm,
  • $\mathrm{BCE}(A_{ij}, \hat{A}_{ij})$ is the binary cross-entropy between the actual, $A_{ij}$, and reconstructed, $\hat{A}_{ij}$, adjacency matrix elements; $A_{ij} = 1$ if the edge $(i, j)$ exists and 0 otherwise, and $\hat{A} = \sigma(Z Z^\top)$, where $\sigma$ is the sigmoid function,
  • $(i, j) \in E_k$ denotes an edge between nodes $i$ and $j$ in client $k$'s edge set $E_k$,
  • $x_{k,i}$, $\hat{x}_{k,i}$, and $z_{k,i}$ are the original node features, reconstructed node features, and latent embedding for node $i$ in client $k$, where $z_{k,i} \in \mathbb{R}^m$.
Following the standard federated learning principles for autoencoders [33], where the complete model benefits from distributed learning of both encoding and decoding patterns across the federation, both the encoder parameters θ and decoder parameters ψ are updated locally and aggregated globally.
Since we only reconstruct the adjacency matrix (not the node features $x_{k,i}$), the reconstruction score is based solely on edge reconstruction errors; for node-level scoring, we aggregate the errors of each node's incident edges. The final combined anomaly score $s_{k,i}$ for node $i$ in client $k$ is:
$s_{k,i} = \gamma\, s^{\mathrm{SVDD}}_{k,i} + (1 - \gamma)\, s^{\mathrm{rec}}_{k,i},$
where $s^{\mathrm{rec}}_{k,i} = \frac{1}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} \mathrm{BCE}(A_{ij}, \hat{A}_{ij})$ is the average reconstruction error of the edges connected to node $i$ (with $\mathcal{N}(i)$ the neighbors of $i$), providing a node-centric reconstruction score; $s^{\mathrm{SVDD}}_{k,i} = \lVert z_{k,i} - c \rVert^2$ is the DeepSVDD anomaly score; and $\gamma \in [0, 1]$ is the score fusion parameter, tuned on validation data.
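This node-centric scoring can be sketched in a few lines (our illustration; the helper names are hypothetical): per-edge BCE values are averaged over each node's incident edges and then fused with the SVDD distance via γ.

```python
import numpy as np

def node_recon_scores(edges, edge_bce, n_nodes):
    """Average per-edge BCE over the edges incident to each node,
    giving a node-centric reconstruction score s^rec."""
    tot, deg = np.zeros(n_nodes), np.zeros(n_nodes)
    for (i, j), e in zip(edges, edge_bce):
        tot[i] += e; tot[j] += e
        deg[i] += 1; deg[j] += 1
    return tot / np.maximum(deg, 1)        # isolated nodes score zero

def fused_scores(s_svdd, s_rec, gamma):
    """Final score: s = gamma * s_SVDD + (1 - gamma) * s_rec."""
    return gamma * np.asarray(s_svdd) + (1 - gamma) * np.asarray(s_rec)

# Tiny chain graph: node 1 touches both edges, nodes 0 and 2 one each.
s_rec = node_recon_scores([(0, 1), (1, 2)], [0.2, 0.4], n_nodes=3)
s = fused_scores([1.0, 1.0, 1.0], s_rec, gamma=0.5)
```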

3.3.3. Federated Optimization with Multi-Objective Regularization (FedReg)

To address the challenges posed by non-IID data distributions and unstable convergence in federated one-class learning, we propose FedReg, a regularization framework that integrates multiple complementary objectives during local client training. For each client k, the local training objective, L local ( k ) combines three theoretically grounded regularization terms, as shown in Equations (14) and (15):
  • Embedding Compactness, L compact ( k ) : The variance regularization, Var ( Z k ) , prevents feature collapse in unsupervised learning by encouraging diverse but structured representations [67,91]. This addresses the trivial solution problem where all embeddings converge to the hypersphere center c, which is crucial for effective anomaly detection in high-dimensional spaces [92].
  • FedProx, L prox ( k ) : Provides convergence guarantees for non-convex objectives under heterogeneous data distributions, with the proximal term bounding client drift by constraining local updates to remain close to the global model θ G [48]. The hyperparameter μ controls the regularization strength.
  • Adaptive Center Update, c: Ensures stable DeepSVDD optimization by gradually refining the hypersphere center, c while avoiding abrupt changes (model collapse) that could destabilize training [27].
$\mathcal{L}^{(k)}_{\text{local}} = \mathcal{L}^{(k)}_{\text{task}} + \mathcal{L}^{(k)}_{\text{compact}} + \mathcal{L}^{(k)}_{\text{prox}}$
$\mathcal{L}^{(k)}_{\text{task}} = \begin{cases} \mathcal{L}^{(k)}_{\text{SVDD}} & \text{for GCN-DeepSVDD} \\ \alpha \mathcal{L}^{(k)}_{\text{recon}} + \beta \mathcal{L}^{(k)}_{\text{SVDD}} & \text{for GAE-DeepSVDD} \end{cases}$
$\mathcal{L}^{(k)}_{\text{compact}} = \lambda_{\text{compact}} \cdot \mathrm{Var}(Z_k) = \lambda_{\text{compact}} \cdot \mathbb{E}\big[\lVert z - \mu_z \rVert^2\big]$
$\mathcal{L}^{(k)}_{\text{prox}} = \frac{\mu}{2} \lVert \theta_k - \theta_G \rVert^2$
where $\mathcal{L}^{(k)}_{\text{task}}$ is the task-specific loss (the core detection objective, which varies by model architecture as defined in Equations (6) and (7)), $\mathcal{L}^{(k)}_{\text{compact}}$ is the embedding compactness penalty, and $\mathcal{L}^{(k)}_{\text{prox}}$ is the FedProx regularization.
$c \leftarrow (1 - \alpha) \cdot c + \alpha \cdot \frac{1}{n_k} \sum_{i=1}^{n_k} z_{k,i}$
where α is the update rate, and the center is initialized using a single forward pass on the first training batch of benign data. This approach follows established practices for stable DeepSVDD training [27]. Algorithm 1 provides the pseudo-code for the G-PFL-ID global federation training. During training, c is updated gradually using an exponential moving average (Equation (15)) to avoid sudden shifts that could destabilize the one-class objective. The update rate α is set to 0.1 in our experiments.
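The exponential-moving-average center update can be sketched as follows (a minimal illustration under the paper's stated rate of 0.1; the embeddings are assumed to come from the current batch):

```python
import numpy as np

def update_center(c, Z, rate=0.1):
    """EMA center update: c <- (1 - rate) * c + rate * mean(Z).
    A small rate avoids abrupt shifts that could destabilize the
    one-class objective."""
    return (1.0 - rate) * c + rate * Z.mean(axis=0)

c = np.zeros(2)
Z = np.array([[1.0, 1.0], [3.0, 3.0]])  # batch mean is (2, 2)
c = update_center(c, Z)                 # c moves 10% toward the mean
```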
Algorithm 1 G-PFL-ID Federated Training with FedReg Regularization
Require: Client datasets $\{\mathcal{D}_k\}_{k=1}^K$, global model $\theta_G^0$, FedReg hyperparameters
Ensure: Trained global model $\theta_G^T$, personalized models $\{\theta_k^*\}$
  1: Initialize global parameters $\theta_G^0$, set round $t \leftarrow 0$
  2: while $t < T$ and not converged do
  3:     Sample client subset $S_t \subseteq \{1, \dots, K\}$ with fraction $C$
  4:     for each client $k \in S_t$ in parallel do
  5:         $\theta_k^{t,0} \leftarrow \theta_G^t$                ▹ Initialize with global model
  6:         Initialize optimizer with learning rate $\eta$, weight decay $\lambda$
  7:         for local epoch $e = 1$ to $E$ do
  8:             for each batch $b$ in $\mathcal{D}_k$ do
  9:                 Compute embeddings: $Z_k = f_{\theta_k}(X_b, A_b)$
 10:                 Compute task loss $\mathcal{L}^{(k)}_{\text{task}}$ via Equation (12)
 11:                 Compute compactness: $\mathcal{L}^{(k)}_{\text{compact}} = \lambda_{\text{compact}} \cdot \mathrm{Var}(Z_k)$
 12:                 Compute FedProx: $\mathcal{L}^{(k)}_{\text{prox}} = \frac{\mu}{2} \lVert \theta_k - \theta_G^t \rVert^2$
 13:                 Total loss: $\mathcal{L}^{(k)}_{\text{local}} = \mathcal{L}^{(k)}_{\text{task}} + \mathcal{L}^{(k)}_{\text{compact}} + \mathcal{L}^{(k)}_{\text{prox}}$
 14:                 Update: $\theta_k \leftarrow \theta_k - \eta \nabla \mathcal{L}^{(k)}_{\text{local}}$
 15:                 if DeepSVDD model then
 16:                     Update center $c$ via Equation (15) with rate $\alpha$
 17:                 end if
 18:             end for
 19:         end for
 20:         Upload local update: $\Delta_k^t = \theta_k - \theta_G^t$
 21:     end for
 22:     Server aggregates: $\theta_G^{t+1} \leftarrow \theta_G^t + \frac{1}{|S_t|} \sum_{k \in S_t} \Delta_k^t$
 23:     $t \leftarrow t + 1$
 24: end while
 25: return $\theta_G^T$, $\{\theta_k\}_{k=1}^K$ for personalization stage
Our global federated training operates under a strict one-class setting, where each client uses only its local benign data to train the shared graph encoder. This avoids the need for labeled attacks but assumes that the available benign data is representative of future normal traffic. The DeepSVDD objective enforces a compact hypersphere around these benign embeddings, which provides a clear decision boundary for anomaly detection.

3.4. Personalization Strategy

The second stage of G-PFL-ID performs client-specific adaptation to address statistical heterogeneity in federated IoT environments. Building on established personalized FL frameworks [93], it applies client-specific classification heads h ϕ k to tailor the globally trained model to each client’s data distribution. Global model components remain frozen and the personalized heads are regularized, following [81,84], which helps maintain federation stability and preserve local data privacy.
For each client k, we optimize a personalized classification head, h ϕ k while keeping all global parameters fixed. Each client fine-tunes its head ϕ k using only its local benign data. The loss is the same unsupervised objective used in global training plus a regularization term that discourages large deviations from the initial head:
$\phi_k^* = \arg\min_{\phi_k} \mathcal{L}^{(k)}_{\text{personal}}(\phi_k; \theta_G, \psi_G, c)$
$\mathcal{L}^{(k)}_{\text{personal}} = \underbrace{\mathbb{E}_{x \sim \mathcal{D}_k^{\text{benign}}}\big[\ell\big((h_{\phi_k} \circ f_{\theta_G})(x)\big)\big]}_{\text{Unsupervised local loss}} + \underbrace{\lambda_p \lVert \phi_k - \phi_{\text{init}} \rVert^2}_{\text{Stability regularization}}$
Here $\theta_G$ are the global encoder parameters (frozen), $\psi_G$ are the global decoder parameters for GAE-DeepSVDD (frozen), $c$ is the fixed DeepSVDD hypersphere center, $\lambda_p$ is the personalization regularization strength, $f_{\theta_G}$ is the frozen global encoder backbone, $\phi_{\text{init}}$ is the initialization of the personal head, and $\phi_k$ are the trainable client-specific head parameters.
During inference, we combine global and personalized components:
For GCN-DeepSVDD: $s_{k,i} = \gamma \underbrace{\lVert f_{\theta_G}(x_i) - c \rVert^2}_{\text{Global SVDD}} + (1 - \gamma) \underbrace{h_{\phi_k}\big(f_{\theta_G}(x_i)\big)}_{\text{Personal head}}$
For GAE-DeepSVDD: $s_{k,i} = \gamma_1 \underbrace{\lVert f_{\theta_G}(x_i) - c \rVert^2}_{\text{Global SVDD}} + \gamma_2 \underbrace{\lVert \hat{x}_i - x_i \rVert^2}_{\text{Global reconstruction}} + \gamma_3 \underbrace{h_{\phi_k}\big(f_{\theta_G}(x_i)\big)}_{\text{Personal head}}$
where $\sum_i \gamma_i = 1$, and the weights $\gamma$ (for GCN-DeepSVDD) and $\gamma_1, \gamma_2, \gamma_3$ (for GAE-DeepSVDD) are tuned on a held-out validation set of benign data from each client to balance the contribution of each anomaly signal.
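For the GCN variant, the fusion of the two inference signals might look as follows (our sketch; `personal_head` is a hypothetical stand-in for the trained client head, here modeled as a distance to a client-adapted center):

```python
import numpy as np

def gcn_fused_score(z, c, personal_head, gamma):
    """Fuse the global SVDD distance with the personal-head score for
    one embedding z produced by the frozen global encoder."""
    s_global = float(np.sum((z - c) ** 2))   # ||f(x) - c||^2
    return gamma * s_global + (1.0 - gamma) * personal_head(z)

# Hypothetical personal head: distance to a client-adapted center.
head = lambda z: float(np.sum((z - np.array([0.5, 0.0])) ** 2))
s = gcn_fused_score(np.array([1.0, 0.0]), np.zeros(2), head, gamma=0.5)
```

With γ = 0.5 the global distance (1.0) and the head score (0.25) contribute equally, so s = 0.625.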
Table 5 outlines the personalization strategy, including which parameters are frozen and which are adapted, to preserve global knowledge while allowing local client adaptation across both model variants. Algorithm 2 provides the pseudo-code for client personalization in G-PFL-ID.
Algorithm 2 Client Personalization with Frozen Backbone
Require: Global encoder $f_{\theta^*}$, client data $\mathcal{D}_k$, personalization epochs $E_{\text{personal}}$, personal head $\phi_k^0$
Ensure: Personalized model $\phi_k^*$
  1: Initialize personal head $\phi_k^0$ with shared initialization
  2: for epoch $e = 1$ to $E_{\text{personal}}$ do
  3:     Sample batch $B \subseteq \mathcal{D}_k$
  4:     Compute embeddings: $Z = f_{\theta^*}(B)$              ▹ Frozen backbone
  5:     Compute anomaly score (e.g., $s = \lVert h_{\phi_k}(Z) - c \rVert^2$ for GCN-DeepSVDD)    ▹ Trainable head
  6:     Compute loss: $\mathcal{L} = \frac{1}{|B|} \sum_{i \in B} s_i + \lambda_p \lVert \phi_k - \phi_k^0 \rVert^2$
  7:     Update: $\phi_k \leftarrow \phi_k - \eta \nabla_{\phi_k} \mathcal{L}$
  8: end for
  9: return $\phi_k^*$

4. Evaluation

This section describes the experimental setup used to evaluate the proposed G-PFL-ID framework, including dataset selection, preprocessing and graph construction, partitioning (non-IID) and analysis, baseline comparisons, and the evaluation protocol.

4.1. Experimental Setup

Table 6 shows the comprehensive hyperparameter specifications used in the implementation of the global federated training and personalized strategy stages.
We evaluated the proposed framework from both client and server perspectives using the Flower federated learning framework [94] and PyTorch [95] with PyTorch Geometric [96] for graph neural network operations. All experiments were conducted on a system with AMD Ryzen 9 5900X CPU (12 cores, 24 threads @ 3.70 GHz), 32 GB DDR4/3600 RAM, and an NVIDIA RTX 4090 GPU with 24GB VRAM. The software environment included Python 3.9, PyTorch 1.13.1, PyTorch Geometric 2.3.0, and Flower 1.4.0.

4.1.1. Dataset and Preprocessing

For computational efficiency, we selected malware-capture-35-1 (Mirai botnet traffic) sub-capture containing 10,447,787 (8,262,389 benign and 2,185,398 malicious) connection records, each with 20 original Zeek connection log fields. This capture provides a realistic IoT attack scenario with sufficient data volume for meaningful federated learning experiments. The dataset supports both binary classification (benign vs. malicious) and one-class anomaly detection paradigms.
Zeek “conn” records were parsed using the header fields to extract connection attributes (timestamps, IPs, bytes, pkts, proto, service, conn_state, history, etc.). From the raw fields we derived a richer set of 48 per-flow features, as shown in Table 7. These features include: temporal (hour_of_day, day_of_week, is_weekend, log_duration, time_since_first), volumetric (orig/resp bytes & pkts, IP-level bytes, missed bytes, bytes/packet ratios), connection state indicators (established, half-open, reset, etc.), protocol flags (SYN, ACK, FIN, RST, ICMP), and service one-hots (HTTP, DNS, DHCP, SSL, unknown). Categorical fields (e.g., proto, service, conn_state) were one-hot encoded, and a $\log(1 + x)$ transform was applied to stabilize long-tailed numeric feature distributions.
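The log-stabilization and one-hot encoding steps can be sketched as follows (an illustrative fragment, not the authors' pipeline; the small `vocab` mapping is hypothetical, with unseen categories routed to a trailing "unknown" column as described for the service one-hots):

```python
import numpy as np

def preprocess(numeric, categorical, vocab):
    """log(1+x) for long-tailed numeric fields; one-hot encoding for a
    categorical field, with unseen values mapped to an 'unknown' column."""
    num = np.log1p(np.asarray(numeric, dtype=float))
    onehot = np.zeros((len(categorical), len(vocab) + 1))
    for row, value in enumerate(categorical):
        onehot[row, vocab.get(value, len(vocab))] = 1.0
    return num, onehot

# "icmp" is not in the vocab, so it lands in the unknown column.
num, oh = preprocess([0.0, 9.0], ["tcp", "icmp"], {"tcp": 0, "udp": 1})
```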
We build per-host feature vectors by aggregating flow statistics for each unique IP address (using id.orig_h for source hosts and id.resp_h for destination hosts). For each host h, we compute separate source and destination feature aggregates:
  • Source features: mean values of numeric features when host $h$ acts as source:
    $\text{src\_mean}_h(x) = \frac{1}{N_{h,\text{src}}} \sum_{i :\, \text{src}_i = h} x_i$
  • Destination features: mean values of numeric features when host $h$ acts as destination:
    $\text{dst\_mean}_h(x) = \frac{1}{N_{h,\text{dst}}} \sum_{i :\, \text{dst}_i = h} x_i$
These source and destination feature sets are combined using an outer join, with missing values filled with zeros. The host label y h is determined by logical OR aggregation across all flows involving host h:
$y_h = \mathbb{1}\big[\exists\, \text{flow } i : (\text{src}_i = h \lor \text{dst}_i = h) \land \text{label}_i = \text{malicious}\big]$
In one-class mode, only benign flows were used for training, resulting in all host labels y h = 0 .
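The per-host aggregation described above can be sketched as follows (our simplified illustration with hypothetical helper names; the real pipeline operates on Zeek fields and performs an outer join of source/destination aggregates, which here corresponds to zero-filling the missing role):

```python
import numpy as np
from collections import defaultdict

def aggregate_hosts(flows):
    """Build per-host vectors: mean of flow features when the host is
    the source, concatenated with the mean when it is the destination
    (zeros for a missing role), plus an OR-aggregated label.

    flows: list of (src, dst, feature_vector, is_malicious)."""
    src, dst = defaultdict(list), defaultdict(list)
    label = defaultdict(bool)
    for s, d, x, mal in flows:
        src[s].append(x); dst[d].append(x)
        label[s] |= mal; label[d] |= mal
    d_feat = len(flows[0][2])
    out = {}
    for h in set(src) | set(dst):
        sm = np.mean(src[h], axis=0) if src[h] else np.zeros(d_feat)
        dm = np.mean(dst[h], axis=0) if dst[h] else np.zeros(d_feat)
        out[h] = (np.concatenate([sm, dm]), int(label[h]))
    return out

# Two flows from host "a" to host "b"; the second flow is malicious,
# so both endpoints inherit the malicious label via logical OR.
hosts = aggregate_hosts([("a", "b", [2.0, 4.0], False),
                         ("a", "b", [4.0, 6.0], True)])
```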
We construct an undirected graph G = ( V , E , X ) where nodes V represent unique host IPs, and edges E connect hosts that communicate (at least one flow between them). Self-loops are removed. The final per-client graph is G k = ( V k , E k , X k ) where X k R | V k | × d is the per-host feature matrix. To ensure consistent encoder input sizes, we pad or truncate feature vectors to a fixed dimension d = 128 .
To evaluate the generalizability of G-PFL-ID, we also use the N-BaIoT dataset [97], which contains traffic from 9 real IoT devices infected by Mirai and BASHLITE botnets. Each device is treated as a separate client ( K = 9 ), reflecting a natural (non-IID) partitioning because each device type has distinct traffic patterns. The dataset provides 115 statistical features extracted from traffic windows (e.g., packet counts, byte rates, jitter). We use only benign traffic for training and include all attack types in testing.
Since N-BaIoT does not contain explicit source/destination IPs, we construct a temporal graph for each device as follows. We split the traffic of each device into non-overlapping windows of 100 consecutive packets. For each window, we compute the mean of the 115 features, resulting in a node feature vector x i R 115 . We then build an undirected chain graph connecting each window to its immediate predecessor and successor, capturing the temporal dependency of normal traffic. Self-loops are removed. The resulting graph for device k is G k = ( V k , E k , X k ) where | V k | is the number of windows, E k are the temporal edges, and  X k R | V k | × 115 is the node feature matrix. We standardize features per client and pad/truncate to a fixed dimension d = 128 for consistency with the encoder.
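The temporal chain-graph construction can be sketched as follows (our illustration of the windowing described above; any trailing packets that do not fill a window are dropped):

```python
import numpy as np

def chain_graph(packet_features, window=100):
    """Average each non-overlapping window of `window` packets into one
    node, then connect consecutive windows with undirected edges."""
    n_nodes = packet_features.shape[0] // window
    X = packet_features[: n_nodes * window]
    X = X.reshape(n_nodes, window, -1).mean(axis=1)
    edges = [(i, i + 1) for i in range(n_nodes - 1)]   # temporal chain
    return X, edges

X, edges = chain_graph(np.ones((250, 115)))   # 250 packets -> 2 nodes
```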
We split each device’s data into 70% benign for training, 10% benign for validation, and 20% (benign + all attacks) for testing. This construction allows us to test G-PFL-ID on a fundamentally different graph topology (temporal chains) compared to the host-based graphs of IoT-23, thereby assessing its adaptability.

4.1.2. Non-IID Partitioning and Client Splits

To simulate client heterogeneity, we partition the dataset across $K$ clients using a label-dependent Dirichlet distribution, following established methodologies for federated learning benchmarking [98,99]. The partitioning process ensures that each client $k$ receives a subset of data $\mathcal{D}_k$ with label distribution $p_k \sim \mathrm{Dir}_K(\alpha \cdot \mathbf{1}_K)$, where $\alpha > 0$ controls the degree of heterogeneity.
We run experiments for $K \in \{10, 15, 20\}$ and $\alpha \in \{0.1, 0.5, \infty\}$, with smaller $\alpha$ (e.g., 0.01 to 0.1) yielding highly skewed allocations, $\alpha = 0.5$ yielding medium skewness, and large $\alpha$ (e.g., $\alpha \gg 1$) yielding near-uniform, i.e., IID, allocations. After obtaining the sample indices assigned to each client, we construct each client's local graph $G_k$ using only its assigned flows (per-host aggregation). This yields client graphs that vary in size and connectivity, emulating realistic IoT edge heterogeneity.
Algorithm 3 summarizes the procedure used in our experiments. To avoid tiny clients that preclude local training, we enforce a minimum sample constraint min_samples, $m$ (default $m = 100$). If the initial allocation yields some clients with fewer than $m$ samples, we retry the sampling with a higher effective $\alpha$ (e.g., set $\alpha \leftarrow \max(\alpha, 1.0)$). For high heterogeneity, $\alpha$ was set between 0.01 and 0.1; for medium heterogeneity, $\alpha$ was set between 0.3 and 0.6; and for IID partitioning this was not an issue because $\alpha$ was set to $10^6$. Additionally, we may increment the RNG seed to introduce more randomness into the retry process. The best settings for $\alpha$ were 0.1, 0.5, and $10^6$. Finally, the integer allocations are computed and adjusted to match the total remaining samples.
Once hosts are assigned to clients, each client's host set is partitioned into train/validation/test splits (e.g., 70/10/20). All experiments are controlled by a single experiment seed (rng = 42).
Algorithm 3 Dirichlet-based Non-IID Partitioning with Minimum Sample Guarantees
Require: Labels $y \in \mathbb{R}^N$, client count $K$, concentration $\alpha$, minimum samples $m$, random seed
Ensure: Client indices $\{I_1, \dots, I_K\}$
  1: Initialize $I_k \leftarrow \emptyset$ for $k = 1, \dots, K$
  2: $y \leftarrow \text{np.array}(y)$, $rng \leftarrow \text{RandomState}(seed)$
  3: $classes, counts \leftarrow \text{unique}(y, \text{return\_counts=True})$
  4: if $|classes| = 1$ then                    ▹ Single-class scenario
  5:     $I \leftarrow \text{arange}(N)$; $rng.\text{shuffle}(I)$
  6:     assert $N \ge K \cdot m$                  ▹ Minimum sample requirement
  7:     Assign first $K \cdot m$ samples: $I_k \leftarrow I[(k-1)m : km]$ for $k = 1, \dots, K$
  8:     $I_{\text{rem}} \leftarrow I[K \cdot m :]$                      ▹ Remaining samples
  9:     $p \leftarrow rng.\text{dirichlet}([\alpha] \times K)$                 ▹ Skew proportions
 10:     $c \leftarrow \lfloor p \cdot |I_{\text{rem}}| \rfloor$                        ▹ Integer allocations
 11:     Adjust $c$ to satisfy $\sum_k c_k = |I_{\text{rem}}|$
 12:     Assign $I_{\text{rem}}$ proportionally to $c$
 13: end if
 14: if $\min_k |I_k| < m$ then                   ▹ Minimum sample validation
 15:     Recursively retry with increased $\alpha \leftarrow \max(\alpha, 1.0)$
 16: end if
 17: return $\{I_1, \dots, I_K\}$
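For the single-class case that dominates our one-class training data, the allocation step of Algorithm 3 can be sketched as follows (an illustrative NumPy version, not the exact experiment code; the retry-on-violation branch is omitted):

```python
import numpy as np

def dirichlet_partition(n, K, alpha, m=100, seed=42):
    """Give every client m samples first, then spread the remainder
    with Dirichlet-skewed proportions (single-class branch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    assert n >= K * m, "minimum sample requirement violated"
    parts = [list(idx[k * m:(k + 1) * m]) for k in range(K)]
    rem = idx[K * m:]
    p = rng.dirichlet([alpha] * K)                 # skew proportions
    counts = np.floor(p * len(rem)).astype(int)
    counts[-1] += len(rem) - counts.sum()          # adjust to match total
    start = 0
    for k, ck in enumerate(counts):
        parts[k].extend(rem[start:start + ck])
        start += ck
    return parts

parts = dirichlet_partition(1000, 5, alpha=0.1)
```

Every client receives at least the guaranteed minimum, while the remainder is allocated with heavy skew for small α.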
To quantify the degree of non-IIDness across client partitions, we employ multiple complementary metrics that capture different aspects of distributional heterogeneity. We utilize Jensen-Shannon (JS) divergence as our primary metric for quantifying distributional differences between clients and the global dataset. The JS divergence provides a symmetric and bounded measure of distribution similarity, making it ideal for federated learning scenarios [100]. It ranges from 0 (identical distributions) to 1 (completely dissimilar distributions), with values below 0.1 typically indicating good distributional alignment. For a client distribution P k and global distribution P global , the JS divergence is defined as:
$\mathrm{JS}(P_k \parallel P_{\text{global}}) = \frac{1}{2} D_{\mathrm{KL}}(P_k \parallel M) + \frac{1}{2} D_{\mathrm{KL}}(P_{\text{global}} \parallel M)$
where $M = \frac{1}{2}(P_k + P_{\text{global}})$ is the mixture distribution, and $D_{\mathrm{KL}}$ denotes the Kullback-Leibler divergence:
$D_{\mathrm{KL}}(P \parallel Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}$
where $P(x)$ represents the probability distribution of client $k$, and $Q(x)$ is the global probability distribution across all clients.
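The JS divergence defined above can be computed directly; in this sketch we use base-2 logarithms so the value lies in [0, 1], matching the stated range (a small epsilon guards the logarithm against zero-probability bins):

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions,
    using base-2 logs so that 0 <= JS <= 1."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        return float(np.sum(a * np.log2((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Identical distributions give 0, while completely disjoint supports give the maximum value of 1.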
Complementary to JS divergence, we measure the spread of feature values within each client, i.e., the per-client feature variances. High variation in feature variance across clients indicates heteroscedasticity.
Figure 3 presents the comprehensive non-IID analysis across the three Dirichlet concentration parameters. The high non-IID setting ($\alpha = 0.1$) exhibited extreme heterogeneity, with JS divergences ranging from 0.011 to 0.405; the moderate non-IID setting ($\alpha = 0.5$) showed significantly reduced heterogeneity, with JS divergences tightly clustered around $8.2 \times 10^{-5}$, and the same applied to the feature variances. The near-IID setting ($\alpha \to \infty$) approached a homogeneous distribution, with JS divergences nearly identical across clients ($\approx 1.2 \times 10^{-5}$) and feature variances showing minimal variation.
The N-BaIoT dataset provides a natural non-IID partitioning where each IoT device represents a distinct client with inherently different traffic characteristics. We quantify this heterogeneity using three complementary metrics computed on benign samples (Table 8):
  • Quantity skew. Data volume varies substantially across devices. In our benign subset the largest device (Device 4: Philips baby monitor) contributes 175,240 samples while the smallest device (Device 2: Ecobee thermostat) contributes 13,113 samples, a max/min ratio of approximately 13.4:1. This degree of skew reflects real-world deployments where device types and usage patterns differ.
  • Gini coefficient. The Gini coefficient for the nine-device benign split is $G \approx 0.37$, indicating a moderate level of inequality in sample counts (with a small number of devices contributing a large fraction of the data). We compute $G$ as
    $G = \frac{\sum_{i=1}^n \sum_{j=1}^n |x_i - x_j|}{2 n \sum_{i=1}^n x_i} \approx 0.37,$
    where $x_i$ denotes the benign-sample count of device $i$ and $n = 9$.
  • Coefficient of variation (CV). The mean number of benign samples per device is x ¯ = 61,770 and the standard deviation is σ 46,385, yielding CV = σ / x ¯ 0.75 . This high relative variability further establishes the imbalance that federated algorithms must tolerate during aggregation and personalization.
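The Gini coefficient and CV above can be reproduced with a few lines (our sketch of the stated formulas; note that NumPy's `std` is the population standard deviation):

```python
import numpy as np

def gini(x):
    """Gini coefficient via the mean-absolute-difference formula:
    G = sum_ij |x_i - x_j| / (2 n sum_i x_i)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mad = np.abs(x[:, None] - x[None, :]).sum()
    return mad / (2 * n * x.sum())

def coeff_of_variation(x):
    """CV = sigma / mean (population standard deviation)."""
    x = np.asarray(x, dtype=float)
    return x.std() / x.mean()
```

Equal counts give G = 0 and CV = 0; concentrating all samples on one of n holders drives G toward (n - 1)/n.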

4.2. Discussion and Results

To demonstrate the strengths of the proposed G-PFL-ID framework, we conducted and report experiments under three settings: (1) core anomaly detection capability under the moderate non-IID setting ($\alpha = 0.5$); (2) the contribution of the regularizers and personalization under the highest non-IID setting; and (3) an ablation of the G-PFL-ID framework's performance across all clients and $\alpha$ settings.

4.2.1. Experimental Setting 1: Global Anomaly Detection Performance

Table 9 presents the primary detection results under representative heterogeneity settings for both IoT-23 ( α = 0.5 , K = 10 ) and N-BaIoT ( K = 9 ) datasets. We compare classical one-class algorithms (Isolation Forest, One-Class SVM), the recent graph-based federated anomaly detection method LG-FGAD [74], federated MLP baselines (FedAvg and FedProx), and our proposed federated graph methods (GCN-DeepSVDD and GAE-DeepSVDD). The table reports area under the ROC curve (AUROC), area under the precision-recall curve (AUPR), true positive rates at false positive rates of 1%, 5%, and 10%, and inference time per sample.
The hybrid GAE-DeepSVDD consistently achieves the best detection performance across both datasets. On IoT-23, it yields 99.46% AUROC and 99.94% AUPR, substantially outperforming all baselines including LG-FGAD (96.10% AUROC). This performance advantage is pronounced at low false positive rates: at 1% FPR, GAE-DeepSVDD achieves 80.16% TPR compared to LG-FGAD's 42.50% TPR, nearly double the detection rate. These results suggest that, under moderate non-IID conditions, the hybrid objective (combining reconstruction and compactness) produces more discriminative embeddings than LG-FGAD's adversarial augmentation approach: anomalous nodes are well separated from normal ones, enabling high detection sensitivity while keeping false alarms very low.
The results also show that graph-based methods consistently outperform traditional feature-based approaches, confirming that graph encoders capture relational patterns (host homophily, service dependencies, temporal correlations) that flat feature representations miss. For instance, the pure GCN-DeepSVDD (97.25% AUROC on IoT-23) exceeds all non-graph baselines, including federated MLP variants. Even on N-BaIoT, where we construct simple temporal chain graphs, GCN-DeepSVDD achieves 96.48% AUROC, demonstrating that minimal graph structure provides valuable inductive bias.
Inference times per sample remain practical for real-time deployment. GCN-DeepSVDD is fastest at 1.55 ms per inference on IoT-23, while the hybrid GAE-DeepSVDD requires 1.99 ms—both suitable for near-real-time detection on IoT gateways. LG-FGAD requires 2.19 ms, slightly slower despite comparable graph processing. The modest computational overhead of our hybrid model is justified by its significantly better detection performance, particularly at stringent operating points.
Despite the extreme natural heterogeneity of N-BaIoT (9.0:1 quantity skew, Gini coefficient 0.37), our methods maintain strong performance: GAE-DeepSVDD achieves 97.74% AUROC and 98.28% AUPR. This demonstrates that G-PFL-ID generalizes beyond host-based graphs to temporal constructions and handles real-world device heterogeneity effectively. The performance gap between GAE-DeepSVDD and GCN-DeepSVDD narrows slightly on N-BaIoT (1.26 percentage points versus 2.21 points on IoT-23), suggesting that the reconstruction objective provides less advantage when graph structure is simpler. Nonetheless, both variants significantly outperform classical and federated MLP baselines.
These findings validate our core design principles. The hybrid reconstruction and compactness objective produces more discriminative embeddings than pure one-class or adversarial approaches. Graph encoders, even with simple constructions, capture relational patterns that boost detection accuracy. And the framework scales effectively from artificially partitioned datasets (IoT-23) to naturally heterogeneous deployments (N-BaIoT), maintaining strong performance under diverse operational conditions.

4.2.2. Experimental Setting 2: Component Contribution Analysis

Table 10 reports per-client AUROC under three configurations: (i) the base model alone (no federated regularizer, no personalization), (ii) with federated regularizers (FedProx proximal term plus an embedding variance penalty), and (iii) with personalization (a lightweight local head fine-tuned on each client plus center recalibration). Results are shown for three degrees of heterogeneity: high non-IID ( α = 0.1 ), medium non-IID ( α = 0.5 ), and near-IID ( α ).
Across all heterogeneity regimes, adding FedReg yields a clear and consistent improvement in average per-client AUROC while reducing variance. For instance, with GCN DeepSVDD at α = 0.1 the mean client AUROC rises from about 88.3% for the base model to 92.6% with FedReg, and for GAE DeepSVDD it rises from 91.0% to 96.1%. These gains are most pronounced in the highly non-IID setting, which matches our design intuition. The FedProx term limits large weight drift under quantity and distribution skew, and the variance penalty prevents each client’s representations from collapsing when its data are sparse. This is reflected in the reduced standard deviations across clients after FedReg, indicating more consistent performance.
The natural heterogeneity of N-BaIoT presents a distinct challenge with a 9.0:1 quantity skew across devices (Table 8). Despite this extreme imbalance, FedReg provides substantial improvements, raising GAE-DeepSVDD performance from 93.48% to 95.55% on average. Notably, low-data devices (e.g., Device 2 with 13,113 samples) benefit the most, with AUROC improvements of 3-4 percentage points. This demonstrates FedReg’s effectiveness in real-world scenarios where device heterogeneity is intrinsic rather than artificially induced.
Local personalization produces a further and often substantial uplift in per-client AUROC. Continuing the previous example (GCN DeepSVDD at α = 0.1 ), personalization increases the mean AUROC from 92.6% (with FedReg) to 97.0%. Crucially, personalization brings up the weaker clients that suffer the most from distribution mismatch. After personalization, clients that were previously in the low 80s rise into the mid 90s. In other words, adding a small local head and recalibrating the center effectively recovers local sensitivity without retraining the entire encoder or heavy communication.
On N-BaIoT, personalization yields an additional 2.19 percentage point improvement for GAE-DeepSVDD (from 95.55% to 97.74%), with low-data devices showing the largest gains. Device 2, for instance, improves from 90.33% to 93.10%, demonstrating that even with extreme quantity skew, lightweight personalization can significantly enhance local detection performance. This is particularly valuable in IoT deployments where devices cannot share their raw data but can afford to fine-tune a small local module.
Over all α settings, the hybrid GAE model achieves slightly higher average AUROC and smaller variance after FedReg and personalization compared to the pure GCN. For example, in the high non-IID case the GAE DeepSVDD reaches 97.21% average after personalization versus 97.00% for the GCN variant. We attribute this to the complementary learning objectives: the reconstruction term in the GAE offers a strong local constraint that is less affected by moderate distribution shifts, while the SVDD term aligns representations globally. Together they stabilize learning under heterogeneity.
A key concern in federated learning is poor worst-case performance on some clients. The per-client results in Table 10 show that the worst-performing clients under the base model improve markedly with FedReg and personalization. For instance, at α = 0.1 the lowest GCN DeepSVDD client AUROC moves from the low 80s in the base model to the mid-90s after personalization. This shows that our approach not only raises the average performance but also reduces client-level risk, addressing an important gap emphasized in personalized FL literature [83].
These findings illustrate several important points. First, relational inductive bias matters: graph encoders exploit patterns such as host homophily and repeated service patterns that simple feature models miss, yielding higher overall AUROC and especially much better detection at low FPR (which is critical in IDS). Second, the hybrid reconstruction-plus-SVDD objective works particularly well: the autoencoder forces the model to preserve local graph topology, while the SVDD term carves out a tight hypersphere of normal instances. Together they produce embeddings where anomalies fall clearly outside the sphere, in line with findings in [77]. Third, FedReg (FedProx plus variance penalty) effectively curbs model drift and collapse on clients with few samples, as discussed in [48,67]. Finally, personalization through a local head and center calibration leverages each client’s own benign data distribution to fine-tune thresholds without heavy overhead, significantly improving per-client sensitivity. This confirms that modest personalization can yield large practical gains in unsupervised one-class federated learning.

4.2.3. Experimental Setting 3: Ablation Study

To comprehensively evaluate G-PFL-ID’s robustness across different operational configurations, we conduct an ablation study that systematically varies two key parameters: (1) the degree of non-IIDness through the Dirichlet concentration parameter α { 0.1 , 0.5 , } , and (2) the federation size K { 10 , 15 , 20 } . Table 11 reports the per-client AUC-ROC for the complete G-PFL-ID framework (with personalization heads) across these configurations for both GCN-DeepSVDD and GAE-DeepSVDD backbones on the IoT-23 dataset.
The ablation study shows that performance improves as data distributions become more IID (i.e., increasing $\alpha$) for a fixed federation size. With $K = 10$, GCN-DeepSVDD average AUC-ROC increases from 97.00% at $\alpha = 0.1$ to 99.04% at $\alpha = \infty$, while GAE-DeepSVDD shows a similar progression from 97.21% to 99.99%. This pattern holds across all $K$ values, confirming that IID data facilitates global model convergence. However, even under extreme non-IID conditions ($\alpha = 0.1$), both variants maintain strong performance (above 94% for K 15), demonstrating the framework's resilience to distributional skew.
The study also shows that increasing the federation size (more clients, each with less data) moderately degrades performance, particularly under high non-IIDness. For α = 0.1, the average AUC-ROC of GAE-DeepSVDD decreases slightly as the number of clients grows from K = 10 to K = 20, with the largest drop between K = 10 and K = 15 and a partial recovery at K = 20. This indicates that as the federation grows, the personalization stage becomes more critical: clients with very limited data benefit less from global aggregation but can still achieve reasonable performance through local adaptation.
The performance advantage of the hybrid GAE-DeepSVDD over the pure GCN variant narrows as the data becomes more IID across all configurations, suggesting that the reconstruction objective in GAE provides valuable regularization precisely when global consensus is difficult to achieve. The hybrid architecture also exhibits lower performance variance, with standard deviations consistently 15–30% smaller than those of GCN-DeepSVDD.
These findings validate several design choices in G-PFL-ID. The FedReg regularizer (FedProx + variance penalty) proves essential for maintaining stability across diverse non-IID levels, as evidenced by the consistent performance improvements across α values. Personalization heads effectively mitigate the impact of increased federation size, allowing clients to adapt the global model to their specific data distributions. The GAE backbone’s dual reconstruction-compactness objective provides more robust representations than the pure one-class objective of GCN-DeepSVDD, especially when clients have limited or highly skewed data.
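As a rough sketch of the FedReg regularizer, the local objective can be augmented with a FedProx-style proximal term plus a penalty on low embedding variance. The weights `mu` and `lam` and the hinge form of the variance penalty are illustrative assumptions, not our exact formulation.

```python
import numpy as np

def fedreg_penalty(local_params, global_params, embeddings, mu=0.01, lam=0.1):
    """Illustrative FedReg penalty added to the local one-class loss.

    local_params/global_params : lists of weight arrays of matching shapes
    embeddings                 : (N, d) batch of encoder outputs
    """
    # Proximal term (FedProx): keep local weights near the last global model,
    # limiting inter-round drift under non-IID data.
    prox = 0.5 * mu * sum(float(np.sum((wl - wg) ** 2))
                          for wl, wg in zip(local_params, global_params))
    # Variance penalty: hinge on per-dimension variance below a floor of 1,
    # discouraging the trivial collapsed solution where embeddings coincide.
    var = embeddings.var(axis=0)
    collapse = lam * float(np.mean(np.maximum(0.0, 1.0 - var)))
    return prox + collapse
```

The proximal term targets the drift failure mode and the variance hinge targets representation collapse, matching the two ablation findings above.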

5. Conclusions

In this paper, we proposed G-PFL-ID, a graph-driven personalized federated learning framework for unsupervised intrusion detection in non-IID IoT systems. The framework leverages graph neural networks (GCN and GAE) combined with a DeepSVDD objective to learn compact representations of normal network behavior. To address the challenges of non-IID data, we introduced FedReg, a regularization method that includes a proximal term and an embedding variance penalty. Furthermore, we enabled personalization through client-specific heads to adapt the global model to local data distributions.
We evaluated G-PFL-ID on two IoT intrusion datasets under diverse heterogeneity scenarios. On IoT-23, we used Dirichlet-based partitioning to simulate varying degrees of non-IIDness (α ∈ {0.1, 0.5, ∞}) and federation sizes (K ∈ {10, 15, 20}), while N-BaIoT provided natural device-based partitioning with inherent heterogeneity (9.0:1 quantity skew, Gini coefficient 0.37). The results demonstrate that our approach significantly outperforms baseline methods, including classical one-class algorithms, federated MLP models, and recent graph-based federated anomaly detectors (e.g., LG-FGAD), in terms of AUROC, AUPR, and detection rates at low false positive rates. The hybrid GAE-DeepSVDD instantiation delivered the strongest global detection performance (99.46% AUROC on IoT-23, 97.74% on N-BaIoT) and the best per-client robustness across both artificial and natural partitioning schemes, while the GCN-DeepSVDD variant served as a principled baseline to assess the role of inductive bias.
Ablation studies confirm that (i) FedProx reduces parameter divergence under quantity skew, (ii) the variance penalty prevents embedding collapse on low-data clients and improves downstream AUROC, and (iii) lightweight personalization recovers client-level thresholds, especially for low-data clients under extreme non-IID settings. On N-BaIoT, despite extreme natural heterogeneity, G-PFL-ID achieved 97.74% average AUROC for GAE-DeepSVDD after personalization, with low-data devices showing the largest relative improvements (up to 6 percentage points).
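Per-client threshold recovery can be as simple as a quantile rule on local benign anomaly scores. The sketch below assumes a held-out benign set and a target false-positive rate; this is one plausible reading of the calibration step rather than the exact procedure.

```python
import numpy as np

def calibrate_threshold(benign_scores, target_fpr=0.01):
    """Choose a per-client anomaly-score threshold from benign data only:
    roughly target_fpr of benign samples will exceed it."""
    return float(np.quantile(benign_scores, 1.0 - target_fpr))

def detect(scores, threshold):
    # Flag as intrusion any sample whose score exceeds the local threshold.
    return scores > threshold
```

Because only local benign scores are used, each client can adapt its operating point without sharing data, which is why low-data clients benefit most from this step.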
We highlight three core practical insights for deploying federated graph anomaly detectors in IoT networks. First, combining reconstruction and one-class compactness provides complementary signals that improve resilience to heterogeneity across both artificial and natural partitioning. Second, simple, aggregation-aware regularizers (FedReg) are effective and computationally inexpensive, which is important for constrained edge devices. Third, personalization via small heads is an efficient way to produce strong per-client performance while keeping the backbone shared, with particular value in real-world deployments where device heterogeneity is intrinsic rather than artificially induced.
Limitations include reliance on reasonably representative benign data for one-class training, sensitivity to extreme concept drift, and the need to evaluate privacy/robustness under adversarial updates (e.g., targeted poisoning). Future work will extend G-PFL-ID to (a) adversarially robust aggregation and certified defenses, (b) continual calibration for concept drift, and (c) evaluation over wider operational datasets and live deployments. We release our code, model definitions, and partitioning scripts to enable reproducible follow-up research.

Author Contributions

Conceptualization, D.A.O.; methodology, D.A.O.; software, D.A.O.; validation, D.A.O.; formal analysis, D.A.O.; investigation, D.A.O.; resources, D.A.O., M.S. and E.M.; data curation, D.A.O.; writing—original draft preparation, D.A.O.; writing—review and editing, D.A.O., A.I., O.A.-A., O.E., S.G.T., M.S. and E.M.; visualization, D.A.O.; supervision, M.S. and E.M.; project administration, M.S. and E.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Code implementation can be found at G-PFL-ID: Graph Personalized Federated Learning Intrusion Detection (https://github.com/danieloladele7/g-pfl-ids, accessed on 23 November 2025). The IoT-23 dataset is available at Aposemat IoT-23 (https://www.stratosphereips.org/datasets-iot23, accessed on 23 November 2025). The N-BaIoT dataset is available at UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/detection_of_IoT_botnet_attacks_N_BaIoT, accessed on 3 January 2026).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
G-PFL-ID: Graph-Driven Personalized Federated Learning for Intrusion Detection
non-IID: Non-Independent and Identically Distributed
GAE/GCN: Graph Autoencoder/Graph Convolutional Network
DoS/DDoS: Denial of Service/Distributed DoS
IoT: Internet of Things
IDS/AD: Intrusion Detection System/Anomaly Detection
ML/FL: Machine Learning/Federated Learning
AUROC: Area Under the Receiver Operating Characteristic Curve
AUPR: Area Under the Precision–Recall Curve
FPR/TPR: False Positive Rate/True Positive Rate

References

  1. Oladele, D.A.; Didam Markus, E.; Abu-Mahfouz, A.M. Adaptability of Assistive Mobility Devices and the Role of the Internet of Medical Things: Comprehensive Review. JMIR Rehabil. Assist. Technol. 2021, 8, e29610. [Google Scholar] [CrossRef] [PubMed]
  2. Shehu Yalli, J.; Hilmi Hasan, M.; Abubakar Badawi, A. Internet of Things (IoT): Origins, Embedded Technologies, Smart Applications, and Its Growth in the Last Decade. IEEE Access 2024, 12, 91357–91382. [Google Scholar] [CrossRef]
  3. Ahmed, S.F.; Alam, M.S.B.; Hoque, M.; Lameesa, A.; Afrin, S.; Farah, T.; Kabir, M.; Shafiullah, G.M.; Muyeen, S.M. Industrial Internet of Things enabled technologies, challenges, and future directions. Comput. Electr. Eng. 2023, 110, 108847. [Google Scholar] [CrossRef]
  4. Oladele, D.A.; Didam Markus, E.; Abu-Mahfouz, A.M. BEV-CAM3D: A Unified Bird’s-Eye View Architecture for Autonomous Driving with Monocular Cameras and 3D Point Clouds. AI 2025, 6, 82. [Google Scholar] [CrossRef]
  5. Oladele, D.A.; Didam Markus, E.; Abu-Mahfouz, A.M. FASTSeg3D: A Fast, Efficient, and Adaptive Ground Filtering Algorithm for 3D Point Clouds in Mobile Sensing Applications. AI 2025, 6, 97. [Google Scholar] [CrossRef]
  6. Pech, M.; Vrchota, J.; Bednář, J. Predictive Maintenance and Intelligent Sensors in Smart Factory: Review. Sensors 2021, 21, 1470. [Google Scholar] [CrossRef]
  7. Antonakakis, M.; April, T.; Bailey, M.; Bernhard, M.; Bursztein, E.; Cochran, J.; Durumeric, Z.; Halderman, J.A.; Invernizzi, L.; Kallitsis, M.; et al. Understanding the Mirai botnet. In Proceedings of the 26th USENIX Conference on Security Symposium, Vancouver, BC, Canada, 16–18 August 2017; USENIX Association: Berkeley, CA, USA, 2017; pp. 1093–1110. [Google Scholar]
  8. Boyarchuk, O.; Mariani, S.; Ortolani, S.; Vigna, G. Keeping Up with the Emotets: Tracking a Multi-infrastructure Botnet. Digit. Threats 2023, 4, 41. [Google Scholar] [CrossRef]
  9. Tanwar, G.S.; Goar, V. Tools, Techniques & Analysis of Botnet. In Proceedings of the International Conference on Information and Communication Technology for Competitive Strategies; Association for Computing Machinery: New York, NY, USA, 2014; pp. 92–96. [Google Scholar] [CrossRef]
  10. Rizvi, S.; Orr, R.J.; Cox, A.; Ashokkumar, P.; Rizvi, M.R. Identifying the attack surface for IoT network. Internet Things 2020, 9, 100162. [Google Scholar] [CrossRef]
  11. Hnamte, V.; Hussain, J. Enhancing security in Software-Defined Networks: An approach to efficient ARP spoofing attacks detection and mitigation. Telemat. Inform. Rep. 2024, 14, 100129. [Google Scholar] [CrossRef]
  12. Kambourakis, G.; Kolias, C.; Stavrou, A. The Mirai botnet and the IoT Zombie Armies. In Proceedings of the MILCOM 2017 IEEE Military Communications Conference (MILCOM), Baltimore, MD, USA, 23–25 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 267–272. [Google Scholar] [CrossRef]
  13. Paquet, C. Implementing Cisco IOS Network Security (IINS). In PART II: Protecting the Network Infrastructure; Cisco Press: Indianapolis, IN, USA, 2009; Chapter 3–6; pp. 111–300. [Google Scholar]
  14. Paquet, C. Implementing Cisco IOS Network Security (IINS). In PART III: Threat Control and Containment; Cisco Press: Indianapolis, IN, USA, 2009; Chapter 7–11; pp. 305–520. [Google Scholar]
  15. Lekkala, S.; Gurijala, P. Designing Secure Network Architectures. In Security and Privacy for Modern Networks: Strategies and Insights for Safeguarding Digital Infrastructures; Apress: Berkeley, CA, USA, 2024; Chapter 8; pp. 75–86. [Google Scholar] [CrossRef]
  16. Lekkala, S.; Gurijala, P. Securing Networks with SDN and SD-WAN. In Security and Privacy for Modern Networks: Strategies and Insights for Safeguarding Digital Infrastructures; Apress: Berkeley, CA, USA, 2024; Chapter 12; pp. 121–131. [Google Scholar] [CrossRef]
  17. Lekkala, S.; Gurijala, P. Proactive Intrusion Detection and Network Surveillance. In Security and Privacy for Modern Networks: Strategies and Insights for Safeguarding Digital Infrastructures; Apress: Berkeley, CA, USA, 2024; Chapter 10; pp. 99–108. [Google Scholar] [CrossRef]
  18. Abdulganiyu, O.H.; Tchakoucht, T.A.; Saheed, Y.K. A systematic literature review for network intrusion detection system (IDS). Int. J. Inf. Secur. 2023, 22, 1125–1162. [Google Scholar] [CrossRef]
  19. Roesch, M. Snort–Lightweight Intrusion Detection for Networks. In Proceedings of the 13th USENIX Conference on System Administration (LISA ’99), Seattle, WA, USA, 7–12 November 1999; USENIX Association: Berkeley, CA, USA, 1999; pp. 229–238. [Google Scholar]
  20. Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J. Survey of intrusion detection systems: Techniques, datasets and challenges. Cybersecurity 2019, 2, 20. [Google Scholar] [CrossRef]
  21. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 15. [Google Scholar] [CrossRef]
  22. Ferrag, M.A.; Maglaras, L.; Moschoyiannis, S.; Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. 2020, 50, 102419. [Google Scholar] [CrossRef]
  23. Guerra, J.L.; Catania, C.; Veas, E. Datasets are not enough: Challenges in labeling network traffic. Comput. Secur. 2022, 120, 102810. [Google Scholar] [CrossRef]
  24. Alotaibi, B. A Survey on Industrial Internet of Things Security: Requirements, Attacks, AI-Based Solutions, and Edge Computing Opportunities. Sensors 2023, 23, 7470. [Google Scholar] [CrossRef]
  25. Kikissagbe, B.R.; Adda, M. Machine Learning-Based Intrusion Detection Methods in IoT Systems: A Comprehensive Review. Electronics 2024, 13, 3601. [Google Scholar] [CrossRef]
  26. Sakurada, M.; Yairi, T. Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, Gold Coast, QLD, Australia, 2 December 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 4–11. [Google Scholar] [CrossRef]
  27. Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep One-Class Classification. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Dy, J., Krause, A., Eds.; PMLR: Cambridge, MA, USA, 2018; Volume 80, pp. 4393–4402. Available online: https://proceedings.mlr.press/v80/ruff18a.html (accessed on 5 December 2025).
  28. Kipf, T.N.; Welling, M. Variational Graph Auto-Encoders. In Proceedings of the Bayesian Deep Learning Workshop, NeurIPS 2016; NeurIPS Foundation: Barcelona, Spain, 2016; pp. 1–3. Available online: https://bayesiandeeplearning.org/2016/ (accessed on 5 December 2025).
  29. Jiang, B.; Zhang, Z.; Lin, D.; Tang, J.; Luo, B. Semi-Supervised Learning With Graph Learning-Convolutional Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 11305–11312. [Google Scholar] [CrossRef]
  30. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar] [CrossRef]
  31. Rahman, S.A.; Tout, H.; Talhi, C.; Mourad, A. Internet of Things Intrusion Detection: Centralized, On-Device, or Federated Learning? IEEE Netw. 2020, 34, 310–317. [Google Scholar] [CrossRef]
  32. Sultana, N.; Chilamkurti, N.; Peng, W.; Alhadad, R. Survey on SDN based network intrusion detection system using machine learning approaches. Peer-to-Peer Netw. Appl. 2019, 12, 493–501. [Google Scholar] [CrossRef]
  33. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 20–22 April 2017; Singh, A., Zhu, J., Eds.; PMLR: Cambridge, MA, USA, 2017; Volume 54, pp. 1273–1282. Available online: https://proceedings.mlr.press/v54/mcmahan17a.html (accessed on 5 December 2025).
  34. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Nitin Bhagoji, A.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and Open Problems in Federated Learning. arXiv 2019, arXiv:1912.04977. [Google Scholar] [CrossRef]
  35. Tang, Z.; Hu, H.; Xu, C. A federated learning method for network intrusion detection. Concurr. Comput. Pract. Exp. 2022, 34, e6812. [Google Scholar] [CrossRef]
  36. Friha, O.; Ferrag, M.A.; Shu, L.; Maglaras, L.; Choo, K.R.; Nafaa, M. FELIDS: Federated learning-based intrusion detection system for agricultural Internet of Things. J. Parallel Distrib. Comput. 2022, 165, 17–31. [Google Scholar] [CrossRef]
  37. Mazid, A.; Kirmani, S.; Manaullah; Yadav, M. FL-IDPP: A Federated Learning Based Intrusion Detection Approach with Privacy Preservation. Trans. Emerg. Telecommun. Technol. 2025, 36, e70039. [Google Scholar] [CrossRef]
  38. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv 2023, arXiv:1602.05629. [Google Scholar] [CrossRef]
  39. Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated Learning with Non-IID Data. arXiv 2018. [Google Scholar] [CrossRef]
  40. Campos, E.M.; Saura, P.F.; González-Vidal, A.; Hernández-Ramos, J.L.; Bernabé, J.B.; Baldini, G.; Skarmeta, A. Evaluating Federated Learning for intrusion detection in Internet of Things: Review and challenges. Comput. Netw. 2022, 203, 108661. [Google Scholar] [CrossRef]
  41. Agrawal, S.; Sarkar, S.; Aouedi, O.; Yenduri, G.; Piamrat, K.; Alazab, M.; Bhattacharya, S.; Maddikunta, P.K.R.; Gadekallu, T.R. Federated Learning for intrusion detection system: Concepts, challenges and future directions. Comput. Commun. 2022, 195, 346–361. [Google Scholar] [CrossRef]
  42. Khraisat, A.; Alazab, A.; Singh, S.; Jan, T.; Gomez, A., Jr. Survey on Federated Learning for Intrusion Detection System: Concept, Architectures, Aggregation Strategies, Challenges, and Future Directions. ACM Comput. Surv. 2024, 57, 7. [Google Scholar] [CrossRef]
  43. Hernandez-Ramos, J.L.; Karopoulos, G.; Chatzoglou, E.; Kouliaridis, V.; Marmol, E.; Gonzalez-Vidal, A.; Kambourakis, G. Intrusion Detection Based on Federated Learning: A Systematic Review. ACM Comput. Surv. 2025, 57, 309. [Google Scholar] [CrossRef]
  44. Lu, Z.; Pan, H.; Dai, Y.; Si, X.; Zhang, Y. Federated Learning with Non-IID Data: A Survey. IEEE Internet Things J. 2024, 11, 19188–19209. [Google Scholar] [CrossRef]
  45. Ma, X.; Zhu, J.; Lin, Z.; Chen, S.; Qin, Y. A State-of-the-Art Survey on Solving Non-IID Data in Federated Learning. Future Gener. Comput. Syst. 2022, 135, 244–258. [Google Scholar] [CrossRef]
  46. Xie, H.; Ma, J.; Xiong, L.; Yang, C. Federated Graph Classification over Non-IID Graphs. Adv. Neural Inf. Process. Syst. 2021, 34, 18839–18852. [Google Scholar]
  47. Yao, Y.; Jin, W.; Ravi, S.; Joe-Wong, C. FedGCN: Convergence-Communication Tradeoffs in Federated Training of Graph Convolutional Networks. Adv. Neural Inf. Process. Syst. 2023, 36, 79748–79760. [Google Scholar]
  48. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
  49. Bhavsar, M.H.; Bekele, Y.B.; Roy, K.; Kelly, J.C.; Limbrick, D. FL-IDS: Federated Learning-Based Intrusion Detection System Using Edge Devices for Transportation IoT. IEEE Access 2024, 12, 52215–52226. [Google Scholar] [CrossRef]
  50. Nguyen, Q.H.; Hore, S.; Shah, A.; Le, T.; Bastian, N.D. FedNIDS: A Federated Learning Framework for Packet-Based Network Intrusion Detection System. Digit. Threats 2025, 6, 4. [Google Scholar] [CrossRef]
  51. Althunayyan, M.; Javed, A.; Rana, O. A robust multi-stage intrusion detection system for in-vehicle network security using hierarchical federated learning. Veh. Commun. 2024, 49, 100837. [Google Scholar] [CrossRef]
  52. Huang, K.; Xian, R.; Xian, M.; Wang, H.; Ni, L. A comprehensive intrusion detection method for the Internet of Vehicles based on federated learning architecture. Comput. Secur. 2024, 147, 104067. [Google Scholar] [CrossRef]
  53. Shao, J.; Zeng, G.; Lu, K.; Geng, G.; Weng, J. Automated federated learning for intrusion detection of industrial control systems based on evolutionary neural architecture search. Comput. Secur. 2024, 143, 103910. [Google Scholar] [CrossRef]
  54. Praharaj, L.; Gupta, D.; Gupta, M. Efficient federated transfer learning-based network anomaly detection for cooperative smart farming infrastructure. Smart Agric. Technol. 2025, 10, 100727. [Google Scholar] [CrossRef]
  55. Popli, M.S.; Singh, R.P.; Popli, N.K.; Mamun, M. A Federated Learning Framework for Enhanced Data Security and Cyber Intrusion Detection in Distributed Network of Underwater Drones. IEEE Access 2025, 13, 12634–12646. [Google Scholar] [CrossRef]
  56. Jin, Z.; Zhou, J.; Li, B.; Wu, X.; Duan, C. FL-IIDS: A novel federated learning-based incremental intrusion detection system. Future Gener. Comput. Syst. 2024, 151, 57–70. [Google Scholar] [CrossRef]
  57. Singh, G.; Sood, K.; Rajalakshmi, P.; Nguyen, D.D.N.; Xiang, Y. Evaluating Federated Learning-Based Intrusion Detection Scheme for Next Generation Networks. IEEE Trans. Netw. Serv. Manag. 2024, 21, 4816–4829. [Google Scholar] [CrossRef]
  58. Bank, D.; Koenigstein, N.; Giryes, R. Autoencoders. arXiv 2021. [Google Scholar] [CrossRef]
  59. Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. Found. Trends Mach. Learn. 2019, 12, 307–392. [Google Scholar] [CrossRef]
  60. Yepmo, V.; Smits, G.; Lesot, M.J.; Pivert, O. Leveraging an Isolation Forest to Anomaly Detection and Data Clustering. Data Knowl. Eng. 2024, 151, 102302. [Google Scholar] [CrossRef]
  61. Dong, B.; Chen, D.; Wu, Y.; Tang, S.; Zhuang, Y. FADngs: Federated Learning for Anomaly Detection. IEEE Trans. Neural Networks Learn. Syst. 2025, 36, 2578–2592. [Google Scholar] [CrossRef]
  62. Nguyen, V.T.; Beuran, R. FedMSE: Semi-supervised federated learning approach for IoT network intrusion detection. Comput. Secur. 2025, 151, 104337. [Google Scholar] [CrossRef]
  63. Zhang, A.; Zhao, P.; Lu, W.; Zhou, Y.; Zhang, W.; Zhang, G. Mitigating Poisoning Attacks in Federated Learning Through Deep One-Class Classification. IEEE Trans. Cogn. Commun. Netw. 2025, 12, 545–558. [Google Scholar] [CrossRef]
  64. Lu, Y.; Yang, T.; Zhao, C.; Chen, W.; Zeng, R. A swarm anomaly detection model for IoT UAVs based on a multi-modal denoising autoencoder and federated learning. Comput. Ind. Eng. 2024, 196, 110454. [Google Scholar] [CrossRef]
  65. Shrestha, R.; Mohammadi, M.; Sinaei, S.; Salcines, A.; Pampliega, D.; Clemente, R.; Sanz, A.L.; Nowroozi, E.; Lindgren, A. Anomaly detection based on LSTM and autoencoders using federated learning in smart electric grid. J. Parallel Distrib. Comput. 2024, 193, 104951. [Google Scholar] [CrossRef]
  66. He, Y.; Ding, X.; Tang, Y.; Guan, J.; Zhou, S. Unsupervised Multivariate Time Series Anomaly Detection by Feature Decoupling in Federated Learning Scenarios. IEEE Trans. Artif. Intell. 2025, 6, 2013–2026. [Google Scholar] [CrossRef]
  67. Liao, X.; Liu, W.; Chen, C.; Zhou, P.; Yu, F.; Zhu, H.; Yao, B.; Wang, T.; Zheng, X.; Tan, Y. Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 22841–22850. [Google Scholar] [CrossRef]
  68. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef] [PubMed]
  69. Yuan, H.; Yu, H.; Gui, S.; Ji, S. Explainability in Graph Neural Networks: A Taxonomic Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 5782–5799. [Google Scholar] [CrossRef]
  70. Wang, J.; Liang, J.; Yao, K.; Liang, J.; Wang, D. Graph convolutional autoencoders with co-learning of graph structure and node attributes. Pattern Recognit. 2022, 121, 108215. [Google Scholar] [CrossRef]
  71. Roy, A.; Shu, J.; Li, J.; Yang, C.; Elshocht, O.; Smeets, J.; Li, P. GAD-NR: Graph Anomaly Detection via Neighborhood Reconstruction. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, WSDM ’24, Merida, Mexico, 4–8 March 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 576–585. [Google Scholar] [CrossRef]
  72. Wu, N.; Zhao, Y.; Dong, H.; Xi, K.; Yu, W.; Wang, W. Federated Graph Anomaly Detection Through Contrastive Learning with Global Negative Pairs. Proc. AAAI Conf. Artif. Intell. 2025, 39, 21554–21562. [Google Scholar] [CrossRef]
  73. Kong, X.; Zhang, W.; Wang, H.; Hou, M.; Chen, X.; Yan, X.; Das, S.K. Federated Graph Anomaly Detection via Contrastive Self-Supervised Learning. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 7931–7944. [Google Scholar] [CrossRef]
  74. Cai, J.; Zhang, Y.; Fan, J.; Ng, S.K. LG-FGAD: An Effective Federated Graph Anomaly Detection Framework. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, Jeju, Republic of Korea, 3–9 August 2024; Larson, K., Ed.; Main Track; International Joint Conferences on Artificial Intelligence Organization: Bremen, Germany, 2024; Volume 8, pp. 3760–3769. [Google Scholar] [CrossRef]
  75. Xia, F.; Sun, K.; Yu, S.; Aziz, A.; Wan, L.; Pan, S.; Liu, H. Graph Learning: A Survey. IEEE Trans. Artif. Intell. 2021, 2, 109–127. [Google Scholar] [CrossRef]
  76. Tax, D.M.J.; Duin, R.P.W. Support Vector Data Description. Mach. Learn. 2004, 54, 45–66. [Google Scholar] [CrossRef]
  77. Zhou, Y.; Liang, X.; Zhang, W.; Zhang, L.; Song, X. VAE-based Deep SVDD for anomaly detection. Neurocomputing 2021, 453, 131–140. [Google Scholar] [CrossRef]
  78. Yi, J.; Yoon, S. Patch SVDD: Patch-Level SVDD for Anomaly Detection and Segmentation. In Proceedings of the Computer Vision—ACCV 2020, Kyoto, Japan, 30 November–4 December 2020; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 375–390. [Google Scholar]
  79. Zhu, H.; Xu, J.; Liu, S.; Jin, Y. Federated learning on non-IID data: A survey. Neurocomputing 2021, 465, 371–390. [Google Scholar] [CrossRef]
  80. Arivazhagan, M.G.; Aggarwal, V.; Singh, A.K.; Choudhary, S. Federated Learning with Personalization Layers. arXiv 2019. [Google Scholar] [CrossRef]
  81. Collins, L.; Hassani, H.; Mokhtari, A.; Shakkottai, S. Exploiting Shared Representations for Personalized Federated Learning. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Meila, M., Zhang, T., Eds.; PMLR: Cambridge, MA, USA, 2021; Volume 139, pp. 2089–2099. Available online: https://proceedings.mlr.press/v139/collins21a.html (accessed on 5 December 2025).
  82. Thein, T.T.; Shiraishi, Y.; Morii, M. Personalized federated learning-based intrusion detection system: Poisoning attack and defense. Future Gener. Comput. Syst. 2024, 153, 182–192. [Google Scholar] [CrossRef]
  83. Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and Robust Federated Learning Through Personalization. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Meila, M., Zhang, T., Eds.; PMLR: Cambridge, MA, USA, 2021; Volume 139, pp. 6357–6368. Available online: https://proceedings.mlr.press/v139/li21h.html (accessed on 5 December 2025).
  84. Fallah, A.; Mokhtari, A.; Ozdaglar, A. Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 3557–3568. [Google Scholar]
  85. Yang, L.; Huang, J.; Lin, W.; Cao, J. Personalized Federated Learning on Non-IID Data via Group-based Meta-learning. ACM Trans. Knowl. Discov. Data 2023, 17, 49. [Google Scholar] [CrossRef]
  86. Jiang, Y.; Konečný, J.; Rush, K.; Kannan, S. Improving Federated Learning Personalization via Model Agnostic Meta Learning. arXiv 2023. [Google Scholar] [CrossRef]
  87. de Cámara, X.S.; Flores, J.L.; Arellano, C.; Urbieta, A.; Zurutuza, U. Clustered federated learning architecture for network anomaly detection in large scale heterogeneous IoT networks. Comput. Secur. 2023, 131, 103299. [Google Scholar] [CrossRef]
  88. Zhou, J.; Xie, H.; Yang, C. Graph Personalized Federated Learning via Client Network Learning. Trans. Mach. Learn. Res. 2025. Available online: https://openreview.net/forum?id=pyTTR4pxkU (accessed on 5 December 2025).
  89. Zhu, D.; Ma, Y.; Liu, Y. Anomaly Detection with Deep Graph Autoencoders on Attributed Networks. In Proceedings of the 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, 7–10 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar] [CrossRef]
  90. Liu, R.; Xing, P.; Deng, Z.; Li, A.; Guan, C.; Yu, H. Federated Graph Neural Networks: Overview, Techniques, and Challenges. IEEE Trans. Neural Networks Learn. Syst. 2025, 36, 4279–4295. [Google Scholar] [CrossRef] [PubMed]
  91. Ruff, L.; Vandermeulen, R.A.; Görnitz, N.; Binder, A.; Müller, E.; Müller, K.R.; Kloft, M. Deep Semi-Supervised Anomaly Detection. arXiv 2020, arXiv:1906.02694. [Google Scholar] [CrossRef]
  92. Nagarajan, V.; Andreassen, A.; Neyshabur, B. Understanding the failure modes of out-of-distribution generalization. arXiv 2021, arXiv:2010.15775. [Google Scholar] [CrossRef]
  93. Tan, A.Z.; Yu, H.; Cui, L.; Yang, Q. Towards Personalized Federated Learning. IEEE Trans. Neural Networks Learn. Syst. 2023, 34, 9587–9603. [Google Scholar] [CrossRef] [PubMed]
  94. Beutel, D.J.; Topal, T.; Mathur, A.; Qiu, X.; Fernandez-Marques, J.; Gao, Y.; Sani, L.; Kwing, H.L.; Parcollet, T.; Gusmão, P.P.d.; et al. Flower: A Friendly Federated Learning Research Framework. arXiv 2020, arXiv:2007.14390. [Google Scholar]
  95. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar] [CrossRef]
  96. Fey, M.; Lenssen, J.E. Fast Graph Representation Learning with PyTorch Geometric. arXiv 2019. [Google Scholar] [CrossRef]
  97. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders. IEEE Pervasive Comput. 2018, 17, 12–22. [Google Scholar] [CrossRef]
  98. Hsu, T.M.H.; Qi, H.; Brown, M. Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification. arXiv 2019. [Google Scholar] [CrossRef]
  99. Yurochkin, M.; Agarwal, M.; Ghosh, S.; Greenewald, K.; Hoang, T.N.; Khazaeni, Y. Bayesian Nonparametric Federated Learning of Neural Networks. arXiv 2019. [Google Scholar] [CrossRef]
  100. Fuglede, B.; Topsoe, F. Jensen-Shannon divergence and Hilbert space embedding. In Proceedings of the International Symposium on Information Theory, 2004, ISIT 2004, Chicago, IL, USA, 27 June–2 July 2004; IEEE: Piscataway, NJ, USA, 2004; p. 31. [Google Scholar] [CrossRef]
Figure 1. G-PFL-ID Framework Overview: Two-stage architecture combining global federated training with client-specific personalization. Stage 1 learns a shared global parameter θ from distributed client data. Clients run local one-class training with proximal and variance regularizers. Stage 2 adapts personalized parameter ϕ for individual clients.
Figure 2. Architecture comparison: (Left) GCN-DeepSVDD maps input graphs to embeddings optimized to minimize distance from learned hypersphere center c. (Right) GAE-DeepSVDD encoder f θ compresses graphs to latent representations while decoder g ψ reconstructs adjacency matrix, with DeepSVDD simultaneously minimizing hypersphere distance.
Figure 3. Comprehensive non-IID analysis across heterogeneity levels. In each panel, the first plot shows the feature variance distributions across clients and the second the Jensen-Shannon divergence between client and global distributions. Panel (a) shows the non-IID analysis for α = 0.1, (b) for α = 0.5, and (c) for α = ∞. Results demonstrate progressive distributional alignment from high non-IID (α = 0.1) to near-IID (α = ∞) conditions, with significant heterogeneity in both label distributions and feature characteristics under realistic IoT scenarios.
Figure 3. Comprehensive non-IID analysis across heterogeneity levels. The first graph shows the Feature variance distributions across clients, and the second Jensen-Shannon divergence between client and global distributions. Figure (a) is the non-IID analysis for α = 0.1 (b) non-IID analysis for α = 0.5 , and (c) non-IID analysis for α = . Results demonstrate progressive distributional alignment from high non-IID ( α = 0.1 ) to near-IID ( α = ) conditions, with significant heterogeneity in both label distributions and feature characteristics under realistic IoT scenarios.
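The Jensen-Shannon divergence used in Figure 3 to quantify client-to-global distributional drift can be computed directly from label histograms. The sketch below is purely illustrative (the class count and both distributions are made up); it uses the standard definition JSD(P, Q) = ½·KL(P‖M) + ½·KL(Q‖M) with M = (P + Q)/2, which is bounded in [0, ln 2] with natural logarithms.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: 0 means the client matches the global distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical label distributions over 4 traffic classes: a skewed client
# (as under a small Dirichlet alpha) versus the global mix.
client_dist = [0.85, 0.10, 0.05, 0.00]
global_dist = [0.40, 0.30, 0.20, 0.10]
print(round(jsd(client_dist, global_dist), 4))
print(jsd(global_dist, global_dist))  # identical distributions -> 0.0
```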
Table 1. Comparison of selected federated learning-based IDS works and the present paper.
| Ref | Year | FL Algorithm | Local Model (Client) | Server Model/Aggregation | Learning Type | Non-IID Reported | Personalized |
|---|---|---|---|---|---|---|---|
| Jin et al. [56] | 2024 | FedAvg with relay | CNN/CNN-GRU variants | Relay client fuses reconstructed samples | S | No | No |
| Mazid et al. [37] | 2025 | FedAvg + voting | Bi-RNN/RNN | Voting ensemble at server | S | No | No |
| Singh et al. [57] | 2024 | FedAvg variant (class-balanced) | DNN (local) | Aggregation with class-balancing adjustments | S | No | No |
| Nguyen et al. [50] | 2025 | FedAvg | Packet-level DNN | Aggregated DNN | S | No | FT |
| Xie et al. [46] | 2021 | Clustered FL | Local GNNs | Cluster-specific aggregations (gradient-based clustering) | S | Yes/JSD | C |
| Yao et al. [47] | 2023 | FedGCN (pre-exchange) | GCN variants (local) | Server aggregates precomputed neighbour summaries | SS | DD | No |
| Cai et al. [74] | 2024 | Custom federated GAD | GNN encoder + local discriminator | Dual knowledge distillation (local ↔ global) | U | Yes/KL-D | Dist |
| Zhang et al. [63] | 2025 | Deep one-class in FL (defence) | DeepSVDD embeddings on local updates | Server filters suspicious updates using one-class scores | U | Yes/DPA | No |
| Sáez-de-Cámara et al. [87] | 2023 | Clustered FL + fingerprinting | Local unsupervised anomaly detectors | Server clusters devices by model fingerprints; cluster-wise training | U | Yes/Infer | C |
| Li et al. [83] | 2021 | Regularised FedAvg (bi-level optimisation) | Generic DNN models | Aggregates global model; each client solves personalised subproblem with proximal regulariser λ‖θ_i − θ_g‖² | S | Yes/DP | RegP |
| Fallah et al. [84] | 2020 | Meta-learning based FL | Task-agnostic networks (MAML-style) | Global model provides initialisation optimised for fast client adaptation | S | Yes/TVD & 1-WD | MetL |
| Thein et al. [82] | 2024 | pFL-IDS (personalized) | Local supervised classifier | Aggregated backbone + local head | S | Yes/DP | FT |
| Ours | 2025 | FedProx variant + var-penalty | GCN/GAE backbone; small local one-class head | Aggregated backbone; local head personalisation & fine-tuning | U | Yes/(DD + KL-D) | FT & RegP |
Columns report the following: the federated learning (FL) algorithm used, the local (client) model, and the server aggregation method. The learning type is supervised (S), semi-supervised (SS), or unsupervised (U). The Non-IID Reported column specifies whether the model was tested under non-IID settings; the possible entries are 'Not reported' (No), 'Mild' (mentioned but not experimentally reported), or 'Yes' (experimentally reported), together with the metrics used for measurement. These metrics include: Jensen-Shannon Distance (JSD), Dirichlet Distribution (DD), Generic Data Partitioning (DP), Kullback–Leibler Divergence (KL-D), Direct Probabilistic Assignment (DPA), Total Variation Distance (TVD), 1-Wasserstein Distance (1-WD), and Device Characteristics (Infer). Finally, the Personalized column indicates whether personalization is provided and of which type: Fine-Tuning (FT), Proximal Regularization (RegP), Meta-Learning (MetL), Clustering (C), or Knowledge Distillation (Dist).
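The proximal regulariser λ‖θ_i − θ_g‖² listed for Li et al. [83] is the same FedProx-style term that our FedReg combines with a variance penalty against representation collapse. Below is a minimal sketch of both terms; the hinge-on-standard-deviation form of the variance penalty is an assumption (VICReg-style), not necessarily the paper's exact formulation, and all numeric values are hypothetical.

```python
import math

def proximal_penalty(theta_local, theta_global, mu):
    """FedProx-style term (mu/2) * ||theta_local - theta_global||^2,
    pulling each client's parameters toward the current global model."""
    sq = sum((a - b) ** 2 for a, b in zip(theta_local, theta_global))
    return 0.5 * mu * sq

def variance_penalty(embeddings, lam):
    """Penalize collapse of an embedding batch: dimensions whose standard
    deviation falls below 1 are pushed to spread out (assumed hinge form)."""
    n, m = len(embeddings), len(embeddings[0])
    means = [sum(z[j] for z in embeddings) / n for j in range(m)]
    var = [sum((z[j] - means[j]) ** 2 for z in embeddings) / n for j in range(m)]
    return lam * sum(max(0.0, 1.0 - math.sqrt(v)) for v in var) / m

theta_l, theta_g = [0.5, -1.0, 2.0], [0.4, -0.8, 1.9]
print(proximal_penalty(theta_l, theta_g, mu=0.01))
# A nearly collapsed batch of two 2-d embeddings incurs a large hinge penalty.
print(variance_penalty([[0.0, 0.0], [0.1, -0.1]], lam=0.01))
```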
Table 2. Notation summary.
| Symbol | Meaning |
|---|---|
| K | Number of federated clients |
| D_k | Local dataset at client k (flows, per-host aggregates) |
| G_k = (V_k, E_k, X_k) | Local graph at client k with nodes V_k, edges E_k, features X_k ∈ ℝ^{n_k×d} |
| d | Input feature dimension (typically 128 for IoT-23 features) |
| n_k | Number of nodes at client k, n_k = \|V_k\| |
| N | Total nodes across all clients, N = Σ_k n_k |
| f_θ | Shared graph encoder (GCN/GAE encoder) with encoder parameter θ shared globally via federation |
| g_ψ | Optional decoder for GAE reconstruction with decoder parameter ψ shared globally via federation |
| h_{ϕ_k} | Client k's lightweight personalization head |
| Z_k | Embeddings at client k, Z_k = f_θ(X_k, A_k) ∈ ℝ^{n_k×m} |
| m | Embedding dimension (typically 16 in our experiments) |
| z_{k,i} | Embedding for node i at client k (row of Z_k) |
| c / c_k | Global or client DeepSVDD center; c_k is client k's hypersphere center |
| μ | FedProx proximal coefficient used in local updates |
| λ | Weight decay regularization coefficient |
| α | Dirichlet concentration parameter for non-IID partitioning |
Table 3. GCN–DeepSVDD architecture specifications.
| Layer | Operation | Output Shape |
|---|---|---|
| Input | Node features X ∈ ℝ^{n×d} | n × d |
| GCN block 1 | GCNConv(d → 64) → BatchNorm → ReLU | n × 64 |
| Dropout | p = 0.3 | n × 64 |
| GCN block 2 | GCNConv(64 → 32) → BatchNorm → ReLU | n × 32 |
| GCN block 3 | GCNConv(32 → m) → Linear | n × m |
| Embedding | Node representations Z = f_θ(X, A) | n × m |
| DeepSVDD | Euclidean distance to center c ∈ ℝ^m | n × 1 |
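Table 3's pipeline can be sketched in a few lines of NumPy on a toy graph. This is an inference-mode illustration only (random weights, no BatchNorm or dropout, tiny dimensions), not the trained model: each GCN layer propagates features over the symmetrically normalized adjacency and then applies a linear transform.

```python
import numpy as np

rng = np.random.default_rng(42)

def normalize_adj(a):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by GCN layers."""
    a_tilde = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_hat, x, w):
    """One graph convolution: propagate over the graph, transform, ReLU."""
    return np.maximum(a_hat @ x @ w, 0.0)

# Toy graph: n=4 nodes on a path, d=8 features, embedding dim m=2
# (the paper uses d=128 and m=16).
n, d, m = 4, 8, 2
a = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(n, d))
a_hat = normalize_adj(a)

h = gcn_layer(a_hat, x, rng.normal(size=(d, 64)) * 0.1)   # GCN block 1: d -> 64
h = gcn_layer(a_hat, h, rng.normal(size=(64, 32)) * 0.1)  # GCN block 2: 64 -> 32
z = a_hat @ h @ (rng.normal(size=(32, m)) * 0.1)          # GCN block 3 (linear): 32 -> m

c = z.mean(axis=0)                      # DeepSVDD center (often the mean embedding)
scores = np.linalg.norm(z - c, axis=1)  # anomaly score: distance to center
print(z.shape, scores.shape)            # (4, 2) (4,)
```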
Table 4. GAE–DeepSVDD architecture specifications.
| Layer/Component | Operation | Output Shape |
|---|---|---|
| Input | Node features X ∈ ℝ^{n×d}, adjacency A | n × d, n × n |
| Encoder block 1 | GCNConv(d → 64) → BatchNorm → ReLU | n × 64 |
| Encoder block 2 | GCNConv(64 → 32) → BatchNorm → ReLU | n × 32 |
| Encoder block 3 | GCNConv(32 → m) → Linear | n × m |
| Latent representation | Z = f_θ(X, A) | n × m |
| Decoder block 1 | GCNConv(m → 32) → BatchNorm → ReLU | n × 32 |
| Decoder block 2 | GCNConv(32 → 64) → BatchNorm → ReLU | n × 64 |
| Reconstruction | Â = σ(ZZᵀ) | n × n |
| DeepSVDD | Euclidean distance to center c ∈ ℝ^m | n × 1 |
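The reconstruction row Â = σ(ZZᵀ) is the standard inner-product decoder: the probability of an edge (i, j) is the sigmoid of the dot product of the two node embeddings. A small sketch with made-up embeddings (the reconstruction loss shown is a plain binary cross-entropy over all entries; the paper's exact loss may weight edges differently):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inner_product_decoder(z):
    """GAE reconstruction A_hat = sigma(Z Z^T)."""
    return sigmoid(z @ z.T)

def recon_loss(a, a_hat, eps=1e-9):
    """Binary cross-entropy between the true adjacency and its reconstruction."""
    return float(-(a * np.log(a_hat + eps) + (1 - a) * np.log(1 - a_hat + eps)).mean())

# Hypothetical latent embeddings for n=3 nodes, m=2 dims: nodes 0 and 1 are
# close (likely edge), node 2 points away (unlikely edge).
z = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]])
a = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
a_hat = inner_product_decoder(z)
print(a_hat.shape)  # (3, 3)
print(round(recon_loss(a, a_hat), 3))
```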
Table 5. Personalization Strategy by Model Type.
| Model Type | Frozen Components | Adapted Components |
|---|---|---|
| GCN-DeepSVDD | Early encoder layers (convs.0, projector.0–3) | Head layers, final encoder (convs.1, projector.4) |
| GAE-DeepSVDD | Early encoder layers | Decoder layers, final encoder layer |
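The freeze/adapt split in Table 5 can be expressed as a partition of parameter names by prefix: frozen parameters keep the global values, the rest are fine-tuned during the personalization stage. The parameter names below mirror the table's GCN-DeepSVDD entries but are otherwise illustrative.

```python
def split_for_personalization(param_names, frozen_prefixes):
    """Partition parameters into frozen (kept at the global values) and
    adapted (fine-tuned locally during Stage 2 personalization)."""
    frozen = [n for n in param_names if any(n.startswith(p) for p in frozen_prefixes)]
    adapted = [n for n in param_names if n not in frozen]
    return frozen, adapted

# Parameter names loosely following Table 5's layout (illustrative only).
names = ["convs.0.weight", "convs.1.weight",
         "projector.0.weight", "projector.3.weight", "projector.4.weight"]
frozen, adapted = split_for_personalization(
    names, frozen_prefixes=("convs.0", "projector.0", "projector.3"))
print(frozen)   # early encoder layers stay at the global parameters
print(adapted)  # ['convs.1.weight', 'projector.4.weight'] are fine-tuned
```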
Table 6. Comprehensive Hyperparameter Specifications.
| Hyperparameter | Description | Typical Value |
|---|---|---|
| λ_compact | Embedding compactness weight | 0.01 |
| μ | FedProx regularization strength | 0.01 |
| K | Number of federated clients | 10, 15, 20 |
| α_c | Center update rate (DeepSVDD) | 0.1 |
| β | SVDD weight | 1.0 |
| α | Dirichlet parameter for non-IID partitioning | 0.1, 0.5, ∞ |
| m | Embedding dimension | 16 |
| – | Batch size | 64 |
| ν | DeepSVDD hypersphere boundary parameter | 0.1 |
| η | Learning rate (Adam optimizer) | 0.001 |
| λ | Weight decay | 1 × 10⁻⁴ |
| E | Local epochs | 30 |
| E_p | Personalization epochs | 5 |
| T | Number of server rounds (global rounds) | 20 |
| rng | Random seed | 42 |
For N-BaIoT, we use the same hyperparameters except window size = 100 packets, graph type = temporal chain, and no Dirichlet partitioning ( α not applicable).
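Dirichlet-based non-IID partitioning with concentration parameter α (Table 6) is commonly implemented by drawing, for each class, a vector of client proportions from Dirichlet(α, …, α): a small α concentrates each class on a few clients, while α → ∞ approaches an IID split. The helper below is a sketch under that standard assumption, not the paper's exact splitter; the labels are synthetic.

```python
import numpy as np

def dirichlet_partition(labels, k, alpha, seed=42):
    """Split sample indices across k clients, drawing each class's client
    proportions from Dirichlet(alpha). Small alpha -> highly non-IID."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        props = rng.dirichlet([alpha] * k)
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            clients[cid].extend(part.tolist())
    return clients

labels = np.repeat([0, 1, 2], 200)   # toy data: 3 classes, 600 samples
for alpha in (0.1, 0.5, 1e6):        # 1e6 stands in for alpha = infinity
    sizes = [len(c) for c in dirichlet_partition(labels, k=10, alpha=alpha)]
    print(alpha, sizes)
```

Client sizes are highly imbalanced at α = 0.1 and nearly uniform at the near-IID extreme.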
Table 7. Representative engineered per-flow/per-host features used to build node vectors (directional features are expanded to src/dst).
| Feature Name | Description | Category |
|---|---|---|
| hour_of_day, day_of_week, is_weekend | Temporal time features to capture periodic patterns | Temporal |
| log_duration, time_since_first | Connection timing statistics | Temporal |
| orig_bytes, resp_bytes, orig_pkts, resp_pkts, orig_ip_bytes, resp_ip_bytes | Volumetric counts to identify traffic volume patterns from originator and responder | Volumetric |
| bytes_ratio (b_r), packets_ratio (p_r), bytes_per_packet_orig (b_por), bytes_per_packet_resp (b_pres) | Ratios: b_r = orig_bytes/(resp_bytes + ϵ), p_r = orig_pkts/(resp_pkts + ϵ), b_por = orig_bytes/(orig_pkts + ϵ), b_pres = resp_bytes/(resp_pkts + ϵ) | Volumetric |
| conn_state_* | One-hot connection states (established/reset/half-open/no-response/normal/other/rejected) | Connection State |
| proto_* | Protocol interactions (tcp/udp/icmp/other) | Protocol |
| syn, ack, fin, rst | TCP flags set | Protocol |
| init, data, timeout | Packet types and events | Protocol |
| syn_reply, ack_reply | Protocol response patterns | Protocol |
| handshake, fin_ack | Connection establishment/termination | Protocol |
| data_* | Data flow characteristics (direction/timeout) | Protocol |
| service_* | Service one-hots (http/dns/dhcp/ssl/unknown) | Service |
* denotes categorical attributes that are transformed into one-hot encoded vectors prior to feature extraction.
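The volumetric ratio features in Table 7 guard their denominators with a small ϵ so that scan-like flows with no response still yield finite values. A direct transcription for a single flow follows; the ϵ value and the example numbers are assumptions for illustration.

```python
EPS = 1e-6  # epsilon guard against division by zero (value is an assumption)

def ratio_features(orig_bytes, resp_bytes, orig_pkts, resp_pkts):
    """Volumetric ratio features from Table 7 for a single flow."""
    return {
        "bytes_ratio": orig_bytes / (resp_bytes + EPS),
        "packets_ratio": orig_pkts / (resp_pkts + EPS),
        "bytes_per_packet_orig": orig_bytes / (orig_pkts + EPS),
        "bytes_per_packet_resp": resp_bytes / (resp_pkts + EPS),
    }

# A benign-looking flow: 1500 bytes out over 10 packets, 3000 bytes back over 8.
feats = ratio_features(orig_bytes=1500, resp_bytes=3000, orig_pkts=10, resp_pkts=8)
print({k: round(v, 3) for k, v in feats.items()})
# A scan-like flow with no response stays finite thanks to the epsilon guard.
print(ratio_features(orig_bytes=60, resp_bytes=0, orig_pkts=1, resp_pkts=0))
```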
Table 8. Benign-sample counts and heterogeneity statistics for N-BaIoT devices (used as clients).
| Device ID | Device Type | Manufacturer | City | Country | Benign Samples |
|---|---|---|---|---|---|
| 1 | Danmini Doorbell | Unknown/OEM | N/A | N/A | 49,548 |
| 2 | Ecobee Thermostat | ecobee Inc. | Toronto | Canada | 13,113 |
| 3 | Ennio Doorbell | Unknown/OEM | N/A | N/A | 39,100 |
| 4 | Philips Baby Monitor | Koninklijke Philips N.V. | Amsterdam | Netherlands | 175,240 |
| 5 | Provision PT-737E Camera | Provision-ISR | Kfar Saba | Israel | 62,154 |
| 6 | Provision PT-838 Camera | Provision-ISR | Kfar Saba | Israel | 98,514 |
| 7 | Samsung Webcam | Samsung Electronics | Suwon | South Korea | 52,150 |
| 8 | SimpleHome XCS7-1002 Camera | Unknown/OEM | N/A | N/A | 46,585 |
| 9 | SimpleHome XCS7-1003 Camera | Unknown/OEM | N/A | N/A | 19,528 |
| Mean | | | | | 61,770 |
| Std. dev. | | | | | 46,385 |
| CV | | | | | 0.75 |
| Gini | | | | | 0.37 |
| Max/Min ratio | | | | | 13.4:1 |
Notes: statistics computed on benign sample counts shown above. Manufacturer/city/country information is included where publicly available; some entries correspond to OEM or brand names without documented corporate headquarters. Mean and standard deviation are rounded to the nearest integer; CV and Gini are shown with two significant digits.
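The heterogeneity statistics in Table 8 can be reproduced directly from the benign-sample counts. The sketch below uses the population standard deviation and the standard sorted-index form of the Gini coefficient; it recovers the table's values.

```python
def heterogeneity_stats(counts):
    """Mean, population std, CV, Gini coefficient, and max/min ratio of
    per-client sample counts, as reported for N-BaIoT devices in Table 8."""
    n = len(counts)
    mean = sum(counts) / n
    std = (sum((c - mean) ** 2 for c in counts) / n) ** 0.5  # population std
    xs = sorted(counts)
    # Gini via the sorted-index identity: (2 * sum_i i*x_(i)) / (n * sum x) - (n+1)/n
    gini = 2 * sum((i + 1) * x for i, x in enumerate(xs)) / (n * sum(xs)) - (n + 1) / n
    return mean, std, std / mean, gini, max(counts) / min(counts)

benign = [49548, 13113, 39100, 175240, 62154, 98514, 52150, 46585, 19528]
mean, std, cv, gini, ratio = heterogeneity_stats(benign)
print(round(mean), round(std))                        # 61770 46385
print(round(cv, 2), round(gini, 2), round(ratio, 1))  # 0.75 0.37 13.4
```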
Table 9. Global anomaly detection performance on IoT-23 (host-based graph, α = 0.5, K = 10) and N-BaIoT (9 device-clients). Baselines (rows 1–5) are evaluated on IoT-23; our graph methods (rows 6–9) are evaluated on both datasets. Best mean per column is bold.
| Method | AUC-ROC | AUPR | TPR@1%FPR | TPR@5%FPR | TPR@10%FPR | Infer. Time (ms) |
|---|---|---|---|---|---|---|
| Isolation Forest | 91.25 ± 1.59 | 94.47 ± 0.45 | 0.62 ± 0.44 | 24.43 ± 1.39 | 53.47 ± 5.76 | 2.586 ± 0.39 |
| One-Class SVM | 78.56 ± 1.03 | 91.16 ± 0.79 | 2.47 ± 1.66 | 16.81 ± 6.25 | 38.92 ± 4.93 | 4.571 ± 0.68 |
| LG-FGAD [74] | 96.10 ± 1.10 | 97.10 ± 0.70 | 42.50 ± 5.23 | 66.30 ± 3.64 | 92.80 ± 2.11 | 2.187 ± 0.20 |
| FedAvg + MLP | 93.92 ± 1.26 | 96.13 ± 0.11 | 10.02 ± 4.84 | 50.02 ± 1.52 | 91.19 ± 0.25 | 3.899 ± 0.32 |
| FedProx + MLP | 94.35 ± 2.01 | 98.27 ± 0.47 | 15.33 ± 2.03 | 53.94 ± 1.30 | 93.19 ± 0.56 | 3.133 ± 0.03 |
| N-BaIoT dataset (K = 9) | | | | | | |
| GCN–DeepSVDD | 96.48 ± 0.89 | 96.89 ± 0.67 | 45.67 ± 4.12 | 68.90 ± 2.45 | 93.67 ± 1.23 | 1.234 ± 0.12 |
| GAE–DeepSVDD | 97.74 ± 0.45 | 98.28 ± 0.23 | 65.89 ± 3.78 | 86.34 ± 1.89 | 97.92 ± 0.25 | 1.567 ± 0.15 |
| IoT-23 dataset (α = 0.5, K = 10) | | | | | | |
| GCN-DeepSVDD | 97.25 ± 1.18 | 98.82 ± 0.76 | 23.27 ± 4.13 | 61.59 ± 3.05 | 94.62 ± 0.89 | 1.553 ± 0.17 |
| GAE-DeepSVDD | 99.46 ± 0.27 | 99.94 ± 0.02 | 80.16 ± 5.87 | 93.83 ± 0.47 | 100.00 ± 0.00 | 1.989 ± 0.23 |
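The TPR@k%FPR columns in Table 9 fix a false-positive budget on benign traffic and report attack recall at that operating point: the threshold is the (100 − k)-th percentile of benign anomaly scores. A sketch with synthetic scores (all values hypothetical):

```python
import numpy as np

def tpr_at_fpr(benign_scores, attack_scores, fpr_pct):
    """Detection rate when the threshold is set so that roughly fpr_pct percent
    of benign traffic is (falsely) flagged, as in Table 9's TPR@k%FPR columns."""
    threshold = np.percentile(benign_scores, 100.0 - fpr_pct)
    return float((np.asarray(attack_scores) > threshold).mean())

# Hypothetical anomaly scores (e.g., DeepSVDD distances to the center c).
benign = np.arange(100, dtype=float)           # benign scores 0..99
attacks = np.array([50.0, 95.0, 120.0, 150.0])
print(tpr_at_fpr(benign, attacks, fpr_pct=1))   # 0.5
print(tpr_at_fpr(benign, attacks, fpr_pct=10))  # 0.75: looser budget, higher recall
```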
Table 10. G-PFL-ID per-client AUC-ROC (%) results under different non-IID settings.
High non-IIDness (α = 0.1)

| Client ID | GCN: Base-Model Only | GCN: w/FedReg | GCN: w/Personalization | GAE: Base-Model Only | GAE: w/FedReg | GAE: w/Personalization |
|---|---|---|---|---|---|---|
| Client 0 | 85.66 ± 3.25 | 92.00 ± 0.96 | 95.97 ± 1.05 | 90.89 ± 3.52 | 95.78 ± 0.87 | 96.58 ± 0.58 |
| Client 1 | 87.57 ± 2.90 | 91.91 ± 1.03 | 96.68 ± 0.47 | 89.80 ± 2.81 | 95.09 ± 0.94 | 96.49 ± 0.46 |
| Client 2 | 84.59 ± 6.15 | 89.93 ± 4.84 | 94.30 ± 1.20 | 88.32 ± 4.84 | 93.41 ± 1.62 | 94.51 ± 0.98 |
| Client 3 | 89.72 ± 1.33 | 93.06 ± 0.49 | 97.43 ± 0.59 | 91.45 ± 2.14 | 96.54 ± 0.71 | 97.64 ± 0.42 |
| Client 4 | 91.37 ± 1.19 | 94.71 ± 0.37 | 99.08 ± 0.33 | 93.10 ± 0.79 | 98.19 ± 0.62 | 99.29 ± 0.09 |
| Client 5 | 89.81 ± 4.90 | 94.15 ± 0.17 | 98.52 ± 0.34 | 92.54 ± 1.12 | 97.63 ± 0.50 | 98.73 ± 0.25 |
| Client 6 | 92.77 ± 0.13 | 95.11 ± 0.28 | 99.48 ± 0.03 | 93.50 ± 0.09 | 98.59 ± 0.10 | 99.69 ± 0.07 |
| Client 7 | 86.07 ± 3.20 | 90.41 ± 1.92 | 95.78 ± 0.91 | 88.80 ± 5.76 | 93.89 ± 1.30 | 94.99 ± 1.19 |
| Client 8 | 84.51 ± 5.87 | 89.85 ± 2.48 | 93.22 ± 2.03 | 88.24 ± 5.14 | 93.33 ± 1.66 | 94.43 ± 0.79 |
| Client 9 | 90.85 ± 0.09 | 95.19 ± 0.08 | 99.56 ± 0.07 | 93.58 ± 0.03 | 98.67 ± 0.09 | 99.77 ± 0.05 |
| Average | 88.29 ± 2.90 | 92.63 ± 1.26 | 97.00 ± 0.70 | 91.02 ± 2.62 | 96.11 ± 0.84 | 97.21 ± 0.49 |

Medium non-IIDness (α = 0.5)

| Client ID | GCN: Base-Model Only | GCN: w/FedReg | GCN: w/Personalization | GAE: Base-Model Only | GAE: w/FedReg | GAE: w/Personalization |
|---|---|---|---|---|---|---|
| Client 0 | 91.98 ± 1.30 | 94.53 ± 0.64 | 96.28 ± 0.62 | 93.89 ± 2.92 | 95.88 ± 0.62 | 99.06 ± 0.79 |
| Client 1 | 94.90 ± 0.42 | 97.09 ± 0.49 | 97.45 ± 0.73 | 96.94 ± 0.10 | 97.57 ± 0.08 | 99.66 ± 0.17 |
| Client 2 | 96.24 ± 0.50 | 95.06 ± 0.54 | 96.02 ± 0.56 | 94.48 ± 0.51 | 97.14 ± 0.09 | 99.49 ± 0.11 |
| Client 3 | 97.96 ± 0.29 | 97.81 ± 0.57 | 98.26 ± 0.38 | 96.38 ± 0.09 | 99.09 ± 0.11 | 100.00 ± 0.00 |
| Client 4 | 89.74 ± 2.84 | 90.71 ± 1.15 | 94.82 ± 0.46 | 90.91 ± 0.12 | 95.19 ± 0.86 | 98.93 ± 1.06 |
| Client 5 | 91.38 ± 0.35 | 93.93 ± 0.64 | 95.78 ± 0.39 | 91.12 ± 0.21 | 95.41 ± 0.70 | 99.14 ± 0.33 |
| Client 6 | 98.02 ± 0.18 | 98.25 ± 0.08 | 98.96 ± 0.07 | 97.06 ± 0.01 | 99.53 ± 0.12 | 100.00 ± 0.00 |
| Client 7 | 94.00 ± 1.04 | 97.49 ± 0.25 | 97.49 ± 0.40 | 94.94 ± 2.10 | 97.87 ± 0.68 | 99.86 ± 0.09 |
| Client 8 | 94.88 ± 0.83 | 96.78 ± 0.31 | 98.05 ± 0.32 | 94.92 ± 0.10 | 97.16 ± 0.10 | 99.46 ± 0.34 |
| Client 9 | 99.04 ± 0.07 | 99.16 ± 0.00 | 99.38 ± 0.07 | 98.58 ± 0.06 | 99.94 ± 0.13 | 100.00 ± 0.00 |
| Average | 94.88 ± 0.78 | 96.08 ± 0.47 | 97.25 ± 0.40 | 94.92 ± 0.62 | 97.48 ± 0.35 | 99.56 ± 0.29 |

Low non-IIDness (α = ∞)

| Client ID | GCN: Base-Model Only | GCN: w/FedReg | GCN: w/Personalization | GAE: Base-Model Only | GAE: w/FedReg | GAE: w/Personalization |
|---|---|---|---|---|---|---|
| Client 0 | 96.46 ± 1.05 | 97.22 ± 0.12 | 99.95 ± 0.01 | 97.08 ± 0.18 | 100.00 ± 0.00 | 100.00 ± 0.00 |
| Client 1 | 96.79 ± 1.20 | 97.95 ± 0.10 | 98.19 ± 0.09 | 97.73 ± 0.15 | 97.74 ± 0.91 | 99.98 ± 0.01 |
| Client 2 | 95.77 ± 3.45 | 99.10 ± 0.25 | 99.49 ± 0.15 | 96.95 ± 0.35 | 98.53 ± 0.41 | 100.00 ± 0.00 |
| Client 3 | 96.56 ± 0.30 | 98.03 ± 0.15 | 98.36 ± 0.46 | 97.27 ± 0.22 | 98.60 ± 0.29 | 99.97 ± 0.01 |
| Client 4 | 97.14 ± 0.95 | 96.49 ± 2.08 | 100.00 ± 0.00 | 98.43 ± 0.12 | 100.00 ± 0.00 | 100.00 ± 0.00 |
| Client 5 | 96.56 ± 0.30 | 99.00 ± 0.15 | 99.06 ± 0.06 | 97.27 ± 0.22 | 99.39 ± 0.03 | 100.00 ± 0.00 |
| Client 6 | 97.14 ± 1.10 | 97.49 ± 1.68 | 99.96 ± 0.03 | 98.44 ± 0.22 | 100.00 ± 0.00 | 100.00 ± 0.00 |
| Client 7 | 95.20 ± 1.43 | 97.71 ± 1.35 | 98.73 ± 0.12 | 94.55 ± 1.48 | 99.31 ± 0.09 | 100.00 ± 0.00 |
| Client 8 | 96.40 ± 0.35 | 98.19 ± 0.18 | 98.35 ± 0.07 | 96.95 ± 0.28 | 99.52 ± 0.11 | 100.00 ± 0.00 |
| Client 9 | 97.17 ± 0.14 | 98.00 ± 0.07 | 98.32 ± 0.63 | 98.49 ± 0.11 | 97.61 ± 1.02 | 99.93 ± 0.05 |
| Average | 96.52 ± 1.03 | 97.92 ± 0.61 | 99.04 ± 0.16 | 97.32 ± 0.33 | 99.07 ± 0.29 | 99.99 ± 0.01 |

N-BaIoT for the 9 devices (K = 9)

| Client ID | GCN: Base-Model Only | GCN: w/FedReg | GCN: w/Personalization | GAE: Base-Model Only | GAE: w/FedReg | GAE: w/Personalization |
|---|---|---|---|---|---|---|
| Client 1 | 92.23 ± 1.48 | 94.41 ± 0.94 | 96.28 ± 0.74 | 93.15 ± 1.98 | 95.78 ± 0.84 | 97.58 ± 0.54 |
| Client 2 | 86.12 ± 2.50 | 89.79 ± 1.75 | 92.40 ± 2.25 | 87.80 ± 2.91 | 90.33 ± 1.75 | 93.10 ± 1.10 |
| Client 3 | 92.11 ± 1.66 | 94.30 ± 1.16 | 96.55 ± 0.83 | 92.20 ± 1.66 | 94.97 ± 1.22 | 97.60 ± 0.33 |
| Client 4 | 97.89 ± 0.50 | 98.61 ± 0.31 | 99.35 ± 0.19 | 97.95 ± 0.83 | 98.90 ± 0.15 | 99.95 ± 0.05 |
| Client 5 | 94.64 ± 1.30 | 95.59 ± 0.91 | 96.70 ± 0.65 | 94.75 ± 1.30 | 96.95 ± 0.91 | 98.80 ± 0.35 |
| Client 6 | 96.27 ± 0.95 | 97.12 ± 0.67 | 98.33 ± 0.48 | 96.77 ± 0.95 | 98.35 ± 0.67 | 99.75 ± 0.08 |
| Client 7 | 94.16 ± 1.44 | 96.92 ± 1.01 | 98.10 ± 0.72 | 94.60 ± 1.44 | 96.35 ± 1.01 | 97.90 ± 0.42 |
| Client 8 | 93.88 ± 1.52 | 95.55 ± 0.98 | 96.80 ± 0.76 | 93.95 ± 1.52 | 95.05 ± 0.93 | 98.90 ± 0.36 |
| Client 9 | 89.90 ± 2.19 | 91.10 ± 1.53 | 93.85 ± 1.40 | 90.15 ± 2.19 | 93.30 ± 1.53 | 96.05 ± 0.85 |
| Average | 93.02 ± 1.55 | 94.82 ± 1.03 | 96.48 ± 0.89 | 93.48 ± 1.64 | 95.55 ± 1.00 | 97.74 ± 0.45 |
Table 11. G-PFL-ID per-client AUC-ROC (%) results under different non-IID and number of clients settings.
10 Clients (K = 10)

| Client ID | GCN: α = 0.1 | GCN: α = 0.5 | GCN: α = ∞ | GAE: α = 0.1 | GAE: α = 0.5 | GAE: α = ∞ |
|---|---|---|---|---|---|---|
| Client 0 | 95.97 ± 1.05 | 96.28 ± 0.62 | 99.95 ± 0.01 | 96.58 ± 0.58 | 99.06 ± 0.79 | 100.00 ± 0.00 |
| Client 1 | 96.68 ± 0.47 | 97.45 ± 0.73 | 98.19 ± 0.09 | 96.49 ± 0.46 | 99.66 ± 0.17 | 99.98 ± 0.01 |
| Client 2 | 94.30 ± 1.20 | 96.02 ± 0.56 | 99.49 ± 0.15 | 94.51 ± 0.98 | 99.49 ± 0.11 | 100.00 ± 0.00 |
| Client 3 | 97.43 ± 0.59 | 98.26 ± 0.38 | 98.36 ± 0.46 | 97.64 ± 0.42 | 100.00 ± 0.00 | 99.97 ± 0.01 |
| Client 4 | 99.08 ± 0.33 | 94.82 ± 0.46 | 100.00 ± 0.00 | 99.29 ± 0.09 | 98.93 ± 1.06 | 100.00 ± 0.00 |
| Client 5 | 98.52 ± 0.34 | 95.78 ± 0.39 | 99.06 ± 0.06 | 98.73 ± 0.25 | 99.14 ± 0.33 | 100.00 ± 0.00 |
| Client 6 | 99.48 ± 0.03 | 98.96 ± 0.07 | 99.96 ± 0.03 | 99.69 ± 0.07 | 100.00 ± 0.00 | 100.00 ± 0.00 |
| Client 7 | 95.78 ± 0.91 | 97.49 ± 0.40 | 98.73 ± 0.12 | 94.99 ± 1.19 | 99.86 ± 0.09 | 100.00 ± 0.00 |
| Client 8 | 93.22 ± 2.03 | 98.05 ± 0.32 | 98.35 ± 0.07 | 94.43 ± 0.79 | 99.46 ± 0.34 | 100.00 ± 0.00 |
| Client 9 | 99.56 ± 0.07 | 99.38 ± 0.07 | 98.32 ± 0.63 | 99.77 ± 0.05 | 100.00 ± 0.00 | 99.93 ± 0.05 |
| Average | 97.00 ± 0.70 | 97.25 ± 0.40 | 99.04 ± 0.16 | 97.21 ± 0.49 | 99.56 ± 0.29 | 99.99 ± 0.01 |

15 Clients (K = 15)

| Client ID | GCN: α = 0.1 | GCN: α = 0.5 | GCN: α = ∞ | GAE: α = 0.1 | GAE: α = 0.5 | GAE: α = ∞ |
|---|---|---|---|---|---|---|
| Client 0 | 95.15 ± 0.62 | 96.89 ± 0.98 | 99.19 ± 0.38 | 96.13 ± 1.25 | 98.76 ± 0.45 | 100.00 ± 0.00 |
| Client 1 | 94.82 ± 1.05 | 95.74 ± 1.24 | 99.04 ± 0.76 | 94.87 ± 1.68 | 97.92 ± 0.89 | 99.79 ± 0.09 |
| Client 2 | 98.47 ± 0.34 | 97.28 ± 0.76 | 99.21 ± 0.24 | 96.42 ± 0.92 | 99.13 ± 0.32 | 100.00 ± 0.00 |
| Client 3 | 91.19 ± 1.98 | 94.82 ± 2.85 | 96.89 ± 0.85 | 93.56 ± 2.14 | 97.14 ± 1.12 | 99.34 ± 0.12 |
| Client 4 | 96.95 ± 0.62 | 98.63 ± 0.61 | 100.00 ± 0.00 | 97.88 ± 0.74 | 99.41 ± 0.25 | 100.00 ± 0.00 |
| Client 5 | 91.72 ± 3.22 | 94.15 ± 2.03 | 96.45 ± 0.81 | 92.94 ± 2.41 | 96.57 ± 1.34 | 98.97 ± 0.34 |
| Client 6 | 97.68 ± 1.09 | 98.52 ± 0.91 | 99.18 ± 0.41 | 95.67 ± 1.09 | 98.49 ± 0.52 | 99.89 ± 0.02 |
| Client 7 | 97.02 ± 0.42 | 98.09 ± 0.81 | 99.93 ± 0.03 | 97.21 ± 0.97 | 98.98 ± 0.38 | 100.00 ± 0.00 |
| Client 8 | 91.24 ± 1.41 | 95.37 ± 0.47 | 97.07 ± 0.47 | 94.52 ± 1.87 | 97.68 ± 0.97 | 99.68 ± 0.27 |
| Client 9 | 97.98 ± 0.34 | 98.02 ± 0.51 | 100.00 ± 0.00 | 98.35 ± 0.61 | 99.72 ± 0.18 | 100.00 ± 0.00 |
| Client 10 | 90.56 ± 2.78 | 94.96 ± 2.52 | 97.90 ± 0.47 | 93.87 ± 2.03 | 97.35 ± 1.05 | 99.85 ± 0.05 |
| Client 11 | 96.63 ± 0.54 | 97.45 ± 0.68 | 99.95 ± 0.02 | 97.67 ± 0.82 | 99.26 ± 0.29 | 100.00 ± 0.00 |
| Client 12 | 91.96 ± 2.12 | 96.24 ± 0.35 | 99.37 ± 0.36 | 95.28 ± 1.55 | 98.21 ± 0.72 | 99.81 ± 0.00 |
| Client 13 | 98.21 ± 0.13 | 99.16 ± 0.56 | 100.00 ± 0.00 | 99.12 ± 0.68 | 99.58 ± 0.21 | 100.00 ± 0.00 |
| Client 14 | 92.88 ± 2.63 | 95.03 ± 1.79 | 98.00 ± 0.79 | 93.79 ± 2.23 | 97.02 ± 1.18 | 99.41 ± 0.18 |
| Average | 94.83 ± 1.29 | 96.69 ± 1.14 | 98.81 ± 0.37 | 95.82 ± 1.40 | 98.35 ± 0.66 | 99.78 ± 0.07 |

20 Clients (K = 20)

| Client ID | GCN: α = 0.1 | GCN: α = 0.5 | GCN: α = ∞ | GAE: α = 0.1 | GAE: α = 0.5 | GAE: α = ∞ |
|---|---|---|---|---|---|---|
| Client 0 | 97.47 ± 0.45 | 98.24 ± 0.57 | 99.20 ± 0.37 | 96.95 ± 0.88 | 99.62 ± 0.12 | 100.00 ± 0.00 |
| Client 1 | 94.18 ± 3.38 | 95.78 ± 2.21 | 98.78 ± 0.11 | 97.43 ± 0.67 | 99.34 ± 0.18 | 99.98 ± 0.00 |
| Client 2 | 96.84 ± 0.58 | 98.05 ± 0.18 | 99.00 ± 0.52 | 97.26 ± 1.42 | 98.47 ± 0.68 | 99.49 ± 0.16 |
| Client 3 | 93.37 ± 4.82 | 95.91 ± 2.58 | 97.11 ± 0.58 | 95.58 ± 3.12 | 97.73 ± 1.24 | 99.13 ± 0.42 |
| Client 4 | 98.76 ± 0.62 | 99.82 ± 0.09 | 100.00 ± 0.00 | 98.84 ± 0.15 | 99.12 ± 0.28 | 100.00 ± 0.00 |
| Client 5 | 95.56 ± 0.32 | 96.13 ± 1.89 | 97.88 ± 0.29 | 96.72 ± 3.02 | 98.02 ± 1.08 | 99.92 ± 0.00 |
| Client 6 | 97.25 ± 1.25 | 98.67 ± 0.39 | 99.17 ± 0.09 | 98.63 ± 0.73 | 99.09 ± 0.02 | 100.00 ± 0.00 |
| Client 7 | 97.39 ± 1.08 | 98.48 ± 0.72 | 99.08 ± 0.12 | 97.81 ± 1.28 | 98.86 ± 0.35 | 99.98 ± 0.00 |
| Client 8 | 92.94 ± 2.82 | 94.52 ± 0.85 | 98.15 ± 0.15 | 93.27 ± 2.24 | 96.98 ± 1.18 | 98.98 ± 0.16 |
| Client 9 | 98.27 ± 0.48 | 99.15 ± 0.18 | 100.00 ± 0.00 | 98.59 ± 0.98 | 99.45 ± 0.38 | 100.00 ± 0.00 |
| Client 10 | 96.79 ± 0.62 | 98.26 ± 0.42 | 99.72 ± 0.14 | 97.08 ± 0.92 | 98.89 ± 0.48 | 99.89 ± 0.06 |
| Client 11 | 93.58 ± 0.66 | 96.71 ± 0.86 | 99.01 ± 0.16 | 95.97 ± 1.12 | 99.03 ± 0.44 | 100.00 ± 0.00 |
| Client 12 | 97.36 ± 1.68 | 98.89 ± 0.52 | 99.90 ± 0.01 | 96.74 ± 2.11 | 99.25 ± 0.05 | 100.00 ± 0.00 |
| Client 13 | 98.15 ± 0.52 | 99.03 ± 0.02 | 100.00 ± 0.00 | 96.42 ± 1.02 | 99.31 ± 0.40 | 100.00 ± 0.00 |
| Client 14 | 95.52 ± 3.25 | 97.94 ± 1.18 | 99.86 ± 0.06 | 97.81 ± 0.58 | 99.17 ± 0.12 | 100.00 ± 0.00 |
| Client 15 | 97.07 ± 0.05 | 99.19 ± 0.28 | 99.96 ± 0.02 | 96.98 ± 1.55 | 99.28 ± 0.05 | 100.00 ± 0.00 |
| Client 16 | 95.57 ± 2.38 | 98.42 ± 0.51 | 99.22 ± 0.01 | 94.93 ± 1.85 | 97.84 ± 0.89 | 99.68 ± 0.10 |
| Client 17 | 92.68 ± 1.72 | 96.59 ± 0.96 | 98.89 ± 0.16 | 96.05 ± 1.21 | 98.94 ± 0.52 | 99.91 ± 0.03 |
| Client 18 | 98.21 ± 0.08 | 99.75 ± 0.22 | 100.00 ± 0.00 | 97.52 ± 1.38 | 99.86 ± 0.14 | 100.00 ± 0.00 |
| Client 19 | 96.94 ± 0.58 | 97.87 ± 0.84 | 98.67 ± 0.04 | 96.28 ± 1.06 | 99.18 ± 0.42 | 100.00 ± 0.00 |
| Average | 96.20 ± 1.37 | 97.87 ± 0.77 | 99.18 ± 0.14 | 96.84 ± 1.36 | 98.87 ± 0.45 | 99.85 ± 0.05 |
Share and Cite

MDPI and ACS Style

Oladele, D.A.; Ige, A.; Agbo-Ajala, O.; Ekundayo, O.; Thottempudi, S.G.; Sibiya, M.; Mnkandla, E. G-PFL-ID: Graph-Driven Personalized Federated Learning for Unsupervised Intrusion Detection in Non-IID IoT Systems. IoT 2026, 7, 13. https://doi.org/10.3390/iot7010013
