A Comparative Analysis of Self-Aware Reinforcement Learning Models for Real-Time Intrusion Detection in Fog Networks

Nyashadzashe Tamuka; Topside Ehleketani Mathonsi; Thomas Otieno Olwal; Solly Maswikaneng; Tonderai Muchenje; Tshimangadzo Mavin Tshilongamulenzhe

doi:10.3390/fi18020100

Abstract

Fog computing extends cloud services to the network edge, enabling low-latency processing for Internet of Things (IoT) applications. However, this distributed approach is vulnerable to a wide range of attacks, necessitating advanced intrusion detection systems (IDSs) that operate under resource constraints. This study proposes integrating self-awareness (online learning and concept drift adaptation) into a lightweight reinforcement learning (RL)-based IDS for fog networks and quantitatively comparing it with non-RL static thresholds and bandit-based approaches in real time. Two novel self-aware RL models, the Hierarchical Adaptive Thompson Sampling–Reinforcement Learning (HATS-RL) model and the Federated Hierarchical Adaptive Thompson Sampling–Reinforcement Learning (F-HATS-RL) model, were proposed for real-time intrusion detection in a fog network. These self-aware RL policies integrated online uncertainty estimation and concept-drift detection to adapt to evolving attacks. The RL models were benchmarked against the static threshold (ST) model and a widely adopted linear bandit (Linear Upper Confidence Bound/LinUCB). A realistic fog network simulator with heterogeneous nodes and streaming traffic, including multi-type attack bursts and gradual concept drift, was established. The models’ detection performance was compared using latency, energy consumption, detection accuracy, the area under the precision–recall curve (AUPR), and the area under the receiver operating characteristic curve (AUROC). Notably, the federated self-aware agent (F-HATS-RL) achieved the best AUROC (0.933) and AUPR (0.857), with a latency of 0.27 ms and the lowest energy consumption of 0.0137 mJ, indicating its ability to detect intrusions in fog networks with minimal energy. The findings suggest that self-aware RL agents can detect dynamic attack patterns in traffic and adapt accordingly, resulting in more stable long-term performance. By contrast, a static model’s accuracy degrades under drift.

Keywords:

fog nodes; self-aware; concept drift; F-HATS-RL; HATS-RL; LinUCB; federated learning; intrusion detection

1. Introduction

Fog computing extends cloud services to Internet of Things (IoT) devices, reducing latency and bandwidth usage. Still, it also increases the attack surface because fog nodes and gateways operate in exposed, heterogeneous environments [1]. In this setting, intrusion detection must remain accurate while operating within strict resource budgets at the fog layer (memory and energy).

Reinforcement learning (RL) offers an attractive alternative to static intrusion detection system (IDS) models because it can learn online from feedback and adapt decisions as traffic conditions evolve [2,3,4,5]. However, fog traffic is non-stationary (dynamic): benign profiles and attack strategies change over time (concept drift), which can degrade policies that do not explicitly track uncertainty or detect drift.

1.1. Research Gaps

Despite recent progress in reinforcement learning-based intrusion detection for fog and edge environments, the gaps that remain are as follows: (i) limited formal specification of self-awareness for lightweight RL agents, including explicit links between drift detection and posterior/policy updates; (ii) insufficient description of privacy-preserving collaboration mechanisms that remain effective under heterogeneous (non-independent and identically distributed/non-IID) node traffic.

1.2. Research Questions

Can integrating online uncertainty estimation and concept drift detection improve the stability of an intrusion detection system (IDS) under non-stationary (dynamic) fog traffic?
Can decentralized, privacy-preserving gossip sharing of lightweight posterior summaries improve intrusion detection performance across heterogeneous fog nodes?
How can the detection performance of the proposed RL-based IDS models be evaluated?

1.3. Contributions

The gaps were addressed by proposing two lightweight RL-based IDS models for real-time intrusion detection in fog networks:

(i) Hierarchical Adaptive Thompson Sampling–Reinforcement Learning (HATS-RL), which introduces self-awareness via online uncertainty estimation, Page–Hinkley drift alarms, and an explicit posterior adaptation rule (adaptive exploration and forgetting);
(ii) Federated HATS-RL (F-HATS-RL), which extends the HATS-RL with privacy-preserving gossip sharing of compact posterior summaries across nodes.

These models were evaluated against the static threshold and a contextual bandit baseline (LinUCB) under bursty attacks and drift.

The HATS-RL is a self-aware reinforcement learning technique that incorporates internal drift detection and uncertainty awareness to adjust its learning strategy. The F-HATS-RL extends this by enabling multiple fog nodes to share learned model parameters (a form of gossip-based federated learning), thereby improving detection of distributed attacks and mitigating concept drift. Concept-drift adaptation and federated online learning were integrated into a reinforcement learning intrusion detection system (IDS) for the real-time detection of various attacks in fog computing, since local drift alone lacks global potential for distributed attacks, and federated learning without self-awareness is liable to concept drift [5,6,7,8]. These models were evaluated in a simulation that mirrored real-world fog networks, including a variety of attack types, such as service emulation, botnets, man-in-the-middle attacks, and fog node identity falsification. By comparing these with a static threshold (ST) and a classical contextual bandit (LinUCB), the benefits of the self-aware and collaborative features were revealed.

The remainder of this paper is organized as follows: Section 2 reviews the related studies; Section 3 presents the materials and methods, including the simulator, policies, and evaluation metrics; Section 4 reports the results; and Section 5 concludes and outlines future work.

1.4. Background

1.4.1. Fog Computing and Security Challenges

Fog computing extends cloud capabilities to the network edge by placing fog nodes between IoT devices and centralized data centers, reducing bandwidth and latency for real-time analytics and smart infrastructure. Fog nodes operate in vulnerable environments and lack continuous security monitoring [9]. Threats include “man-in-the-middle” attacks on in-transit data, DDoS attacks, malware targeting the node’s operating system, and botnet-driven malicious traffic [1]. A compromised fog node can enable attacks to move across the network and to disrupt the services it supports. Adequate security requires both a network-based IDS to monitor passing traffic and a host-based IDS to monitor the node [10]. The fog-layer IDS must meet tight memory, energy, latency, and CPU constraints [2]. Securing fog nodes therefore requires real-time, resource-aware intrusion detection techniques that are compatible with federated learning.

1.4.2. Intrusion Detection and Reinforcement Learning

Intrusion detection techniques are classified as signature-based and anomaly-based [11]. Signature-driven techniques match traffic against known rules and signatures, achieving precise detection of known attacks but missing novel threats, and being expensive to keep synchronized across many fog nodes. Anomaly-driven techniques flag anomalies in network packets at the expense of heavier computational requirements [11].

By contrast, a reinforcement learning intrusion detection system (IDS) uses sequential decision-making: an agent monitors traffic features and selects actions such as allow, rate-limit, retransmit, and block to maximize a long-term reward that balances accurate detections and operational costs.

Because policies are updated online, reinforcement learning (RL) naturally adapts to multiple attacks and concept drift, improving responsiveness in streaming environments [12]. For real-time fog deployments, contextual multi-armed bandits, such as a LinUCB and Thompson sampling, provide a lightweight RL variant that learns per-flow decisions from feedback without modeling future state transitions and updates incrementally to reduce benign blocks under low CPU and energy constraints [13]. Rewards can be maximized to increase true positives, while false positives and latency/energy overheads can be penalized. Using drift detection methods, such as adaptive windowing and the Page–Hinkley test, the agent can increase exploration or reset specific learning parameters when traffic changes are detected [14]. This allows it to adapt more quickly to new conditions rather than persisting with a suboptimal policy.

In a fog/edge IDS, multiple distributed fog nodes monitor different traffic streams, making collaboration essential. Sending all data to the cloud is not ideal due to privacy and bandwidth concerns [15]. Federated learning (FL) addresses this by enabling local training and sharing only model updates (e.g., gradients or weights), thereby protecting data privacy and improving the global model [15]. However, FL in an IDS still faces challenges, such as model poisoning and concept drift, which lead to periodic aggregation failures and missed detections. To mitigate this, the proposed method employs a specialized form of federated learning for streaming data, in which fog nodes exchange lightweight summaries of their learned parameters, known as “gossip”.

2. Related Studies

Related works have improved intrusion detection in fog and IoT networks, as shown in Table 1. Nevertheless, they have some gaps that the proposed self-aware RL models (HATS-RL and F-HATS-RL) address. For instance, the hybrid IDS for smart homes based on CSE-CIC-IDS2018 was accurate for intrusion detection with classical machine learning classifiers such as RF and XGBoost. Still, its reliance on static datasets restricted its generalizability in dynamic fog networks [16]. The Auto-IF (autoencoder and isolation forest) achieved 95.4% on NSL-KDD, yet the binary intrusion classification on the old dataset left out the drift and energy requirements for fog networks [17].

Table 1. Comparison of Related Intrusion Detection Approaches for Fog Networks.

The drift-adaptive online DDoS frameworks, such as AUWPAE, effectively addressed concept drift and zero-day attacks but remained attack-specific (DDoS only) and lacked feasibility across heterogeneous threats [18]. The federated fog learning IDS (2FIDS) preserved privacy and scaled across datasets such as Bot-IoT and TON-IoT, but degraded on the MQTTset (86%), lacking sequential decision-making, and did not incorporate cost-aware responses at the fog nodes [15]. The hybrid CNN-LSTM model, using the CICDDoS2019 dataset, reported 99.5% accuracy with minimal latency; however, its computational complexity limited real-time deployment on resource-constrained fog nodes [19]. A DNN-KDQ IDS for edge devices reported 99.43% accuracy on the CICIDS2017 dataset and shrunk the model from 197 KB to 20 KB, showing a detection potential at low edge resources [20]. However, the IDS was evaluated on a fixed dataset that lacked concept drift, and it was not subjected to dynamic multi-attack scenarios. A fog-edge federated SVM for smart grids, sharing parameters rather than raw data, outperformed the NSL-KDD/CICIDS2017 baselines [21], yet drift behavior and latency at scale were not considered. A Bi-GRU autoencoder ensemble [22] was developed to reduce parameter requirements and to detect unknown attacks, thereby supporting constrained IoT systems. The work was anomaly-centric and lightweight, but was not evaluated under internode heterogeneity (drift) and there was no cross-node information sharing. Adaptive collaboration was also explored in vehicular fog networks [23], which emphasized scalable, low-latency collaborative detection, yet overlooked dynamic network attacks and cross-node federated learning for intrusion detection. A 94% accurate perceptron-based IDS [24] aimed to minimize edge computing but relied on static datasets and did not account for federation or concept drift.

2.1. Synthesis of Related Studies

Recent fog and edge intrusion detection research can be categorized into lightweight supervised IDSs built on fixed datasets, drift-aware online IDSs focused on evolving traffic, resource-aware deep or compressed IDSs targeting edge/fog feasibility, and collaborative/federated IDSs that preserve privacy across distributed nodes. Surveys of fog IDS models show that fog deployments must balance detection quality with latency and privacy constraints, making centralized models difficult to operationalize at scale [8,9,10,11,12]. Within lightweight supervised approaches, significant findings were reported using classical machine learning models. Still, the reliance on static, offline datasets limits generalization to dynamic fog networks, where traffic and attacks evolve [17]. Drift-aware online frameworks have addressed this non-stationarity, but many have remained attack-specific, such as DDoS-only, or have not provided a general decision mechanism that could adapt responses across heterogeneous threats [18]. Various studies have investigated deep learning and compressed architectures, such as CNN-LSTM and distilled/quantized DNNs, which achieve high accuracy on benchmark datasets. Yet, they assume a stationary evaluation and can impose non-trivial training/inference burdens that are challenging for resource-limited fog or edge nodes [20,22]. Collaborative and federated IDS models address privacy and distributed observability by sharing model updates rather than raw data; this approach has been identified as promising but remains open to heterogeneity (non-IID traffic) and trust/poisoning risks [24]. Existing fog federated IDS studies have demonstrated privacy-preserving benefits but have reported degradation under specific protocols/datasets, and they have omitted the treatment of sequential decision-making and drift-adaptive response policies. 
Prior collaborative IDS studies have further highlighted the importance of coordinating against distributed attacks in multi-node environments, motivating mechanisms that share cross-node evidence while avoiding centralized raw-data collection.

A consistent gap across these categories has been that drift handling, sequential decision-making, and federation have often been treated separately: static, lightweight IDSs did not address drift; drift-aware IDSs were usually attack-specific or relied on retraining-style adaptation; and federated IDSs frequently prioritized privacy without explicitly modeling drift-aware decision policies under fog constraints. This gap motivated the proposed HATS-RL and F-HATS-RL agents, which were designed to treat fog IDSs as a streaming, contextually informed decision problem with explicit online adaptation under drift, and to extend this capability through privacy-preserving collaboration via parameter sharing rather than raw traffic, aligning with fog security requirements for heterogeneity and resource constraints.

2.2. Problem Formulation: Streaming Fog IDS as a Contextual Bandit

Intrusion detection at each fog node was modeled as an online contextual bandit operating on a stream of network flows. At each discrete time step $t$, a flow arrived with a feature vector, $x_t \in \mathbb{R}^{D}$. The agent selected an action, $a_t \in \mathcal{A}$, where $\mathcal{A} = \{\text{allow}, \text{rate\_limit}, \text{block}\}$ [25]. After action execution, the environment revealed a feedback signal in the form of a scalar reward $r_t$ derived from the ground-truth label $y_t \in \{0, 1\}$ (benign/malicious). The objective was to maximize the cumulative reward $\sum_{t=1}^{T} r_t$ over time. The agent learns continuously from a data stream without prior batch training (online learning).

For each action $a$, the expected reward was modeled linearly according to Equation (1) [26]:

$$E[r_t \mid x_t, a_t = a] = x_t^{\top} w_a \tag{1}$$

where $w_a \in \mathbb{R}^{D}$ is an unknown parameter vector learned online. The objective was to maximize the cumulative reward, $\sum_{t=1}^{T} r_t$, while adapting to non-stationary traffic, where the distribution of $x_t$ and the reward mapping drift over time (concept drift).

3. Materials and Methods

To evaluate the performance of the proposed self-aware RL models for intrusion detection in fog networks, a simulation environment was developed, and a series of experiments was designed. The methodology involved the following stages: (1) simulating a fog network comprising multiple nodes handling network traffic; (2) generating benign traffic interspersed with cyberattacks; (3) implementing four IDS policies (static and RL-based) on the fog nodes; and (4) measuring a wide range of metrics to compare their effectiveness and efficiency. The simulation, which allows for control over attack scenarios and ensures repeatability, was used to reflect a real-world fog network. Figure 1 presents the simulation stages.

Figure 1. The study’s simulation stages.

3.1. Simulation of the Fog Network

A fog network consisting of 8 nodes was simulated. Each fog node represented a compute device that processes network flows from IoT devices or edge clients. Each node operated on a portion of the incoming traffic independently, akin to each node serving a particular region or subset of edge clients or IoT devices. This was appropriate for evaluating the IDS on each node, whereas the proposed federated learning will enhance privacy across the nodes.

An 8-node fog network was applied to model a micro-fog deployment in which a small number of fog gateways process traffic from many fog clients. This was sufficient to capture the core distributed IDS challenges studied, namely heterogeneous (non-IID) local traffic, bursty attack injection, decentralized knowledge propagation, and gossip mixing, while keeping the simulator tractable for controlled-drift analysis and repeatable runs. Notably, the proposed learning and federation steps were local and lightweight. The updates were considered independent of the number of nodes, and communication overhead scaled linearly with the number of gossip exchanges. Therefore, the methodology was considered to generalize to larger node counts.

3.2. Traffic and Attack Injection

A traffic generator that produced streaming network flows at each time tick was developed using Python 3.11 modules. A “tick” can be thought of as a short time window (such as 100 ms or 1 s) during which a batch of network flows arrives at the fog node. At each tick t, the simulator injected a random number of flows (drawn from the range [flows_min, flows_max]) to simulate varying load. An 8-element feature vector characterized each flow. These features were inspired by standard network flow metrics, including bytes per flow, packets per flow, average interarrival time, TCP SYN flag ratio, destination port entropy, flow duration, error rate, and new connection rate. A multivariate normal distribution was used to generate benign traffic feature vectors, with the distribution gradually shifting over time to model benign concept drift. Every 100 ticks, a small amount of random drift was added to the mean of the benign distribution. This represented fluctuations in standard traffic patterns, for instance, increased traffic during peak hours or new IoT devices slightly altering traffic characteristics.
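The benign-traffic generator described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' simulator: the class name, the default values of `flows_min`, `flows_max`, and `drift_scale`, the feature labels, and the diagonal (independent-feature) covariance are all assumptions made for the example.

```python
import random

# Hypothetical labels for the eight flow features listed in the text.
FEATURES = [
    "bytes_per_flow", "packets_per_flow", "mean_interarrival", "syn_ratio",
    "dst_port_entropy", "duration", "error_rate", "new_conn_rate",
]

class BenignTrafficGenerator:
    """Streams benign flows whose mean drifts slightly every `drift_every` ticks."""

    def __init__(self, flows_min=20, flows_max=60, drift_every=100,
                 drift_scale=0.05, seed=0):
        self.rng = random.Random(seed)
        self.flows_min, self.flows_max = flows_min, flows_max
        self.drift_every, self.drift_scale = drift_every, drift_scale
        self.mean = [0.0] * len(FEATURES)  # assumed standardized feature space
        self.std = [1.0] * len(FEATURES)   # assumed diagonal covariance
        self.tick = 0

    def step(self):
        """Return the benign feature vectors arriving during one tick."""
        self.tick += 1
        if self.tick % self.drift_every == 0:
            # Gradual benign concept drift: nudge the mean by a small random amount.
            self.mean = [m + self.rng.gauss(0.0, self.drift_scale) for m in self.mean]
        n_flows = self.rng.randint(self.flows_min, self.flows_max)
        return [[self.rng.gauss(m, s) for m, s in zip(self.mean, self.std)]
                for _ in range(n_flows)]

gen = BenignTrafficGenerator()
batches = [gen.step() for _ in range(200)]
```

Each call to `step()` yields one tick's batch, so a policy under test can consume the flows one at a time in arrival order.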

The simulator injected multiple concurrent cyberattacks in bursts. The attack library consisted of 15 attack types relevant to fog/edge networks, which included port scans, SYN floods, UDP/ICMP floods, DNS amplification, DDoS, data exfiltration, botnet Command-and-Control traffic, TLS abuse, slow HTTP DoS, ARP spoofing, and fog-specific threats like rogue fog node or man-in-the-middle. Each attack type was defined by a mean shift in the feature space; that is, how that attack’s network traffic deviated from regular network traffic and from noise. When an attack was active, a fraction of flows at a fog node had their features altered by the corresponding noise and shift. These flows were identified as malicious. The probability p_start per tick was set for a new attack to begin if fewer than the maximum concurrent attacks are active. Each attack, once started, lasted for a random duration (on_durticks) and, after ending, had a cooldown period (off_durticks) before it could restart. This represented intermittent attack patterns. Overlapping attacks were included; that is, if multiple attacks were active, some flows were affected by more than one attack, indicating flows exhibiting characteristics of various threats. The simulation produced a continuous sequence of data per node, a stream of flows with labels (0 for benign, 1 for malicious). The IDS running on each node observed the features and output a decision (action) for each flow in real-time.
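A rough sketch of the burst scheduling described above is given below. It is illustrative only: the duration ranges, the values of `p_start` and `max_concurrent`, and the choice to apply `p_start` independently to every idle attack type are assumptions, and the attack names are a hypothetical subset of the library.

```python
import random

class AttackBurst:
    """One attack type that alternates between active bursts and cooldowns."""

    def __init__(self, name, on_range=(20, 60), off_range=(50, 150)):
        self.name = name
        self.on_range, self.off_range = on_range, off_range
        self.active_left = 0    # ticks remaining in the current burst
        self.cooldown_left = 0  # ticks remaining before the attack may restart

def step_attacks(attacks, rng, p_start=0.02, max_concurrent=3):
    """Advance every attack by one tick and return the names active this tick."""
    for atk in attacks:
        if atk.active_left > 0:
            atk.active_left -= 1
            if atk.active_left == 0:
                # The burst just ended, so begin the cooldown period.
                atk.cooldown_left = rng.randint(*atk.off_range)
        elif atk.cooldown_left > 0:
            atk.cooldown_left -= 1
    n_active = sum(a.active_left > 0 for a in attacks)
    for atk in attacks:
        idle = atk.active_left == 0 and atk.cooldown_left == 0
        # A new burst may start only while the concurrency cap is not reached.
        if idle and n_active < max_concurrent and rng.random() < p_start:
            atk.active_left = rng.randint(*atk.on_range)
            n_active += 1
    return [a.name for a in attacks if a.active_left > 0]

rng = random.Random(1)
attacks = [AttackBurst(n) for n in ("port_scan", "syn_flood", "dns_amplification",
                                    "botnet_c2", "rogue_fog_node")]
history = [step_attacks(attacks, rng) for _ in range(2000)]
```

When the returned list contains more than one name, a flow can receive several mean shifts at once, which corresponds to the overlapping-attack case.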

3.3. Implementation of the IDS Policies/Models

The IDS policies compared included static threshold, LinUCB, HATS-RL, and F-HATS-RL. Each policy was implemented in Python 3.11 as an agent that processed flows sequentially and made a decision for each flow. The experiments were run on a Dell Latitude computer (Core i7, 1 TB SSD) on 1 August 2025.

The evaluation of the models followed a structured baseline ladder that isolated the contribution of each significant component without requiring separate ablation runs: the static threshold represented non-learning thresholding under drift; the LinUCB added lightweight online contextual-bandit learning without explicit drift-aware posterior adaptation; the HATS-RL added self-awareness (uncertainty tracking and Page–Hinkley drift alarms) together with explicit drift-aware exploration and forgetting; and the F-HATS-RL added privacy-preserving gossip federation of compact posterior summaries on top of the HATS-RL. Accordingly, differences between consecutive baselines can be interpreted as the incremental benefit of online learning, drift-aware adaptation, and federation.

3.3.1. Static Threshold Algorithm Development

Feature Statistics: The feature statistics module maintained the mean ($\mu$) and variance ($\sigma^2$) of each feature using an exponential window with a decay factor $\alpha$ of 0.01 (the decay factor could be altered). These statistics were updated upon the arrival of new flows.

Decision step: For an input flow $x$, the decision step computed the standardized values $z = (x - \mu)/\sigma$ and the anomaly score $S = \|z\|$. This scalar score was compared to two thresholds: if $S$ > thr_high, the traffic was blocked; otherwise, if $S$ > thr_low, it was rate-limited; otherwise, it was allowed.

Output step: The output included the chosen action and, if the flow was rate-limited, a rate percentage; a logistic function was applied to map the score to (0,1) as the rate strength. The policy did not learn from dynamic/drifting network traffic. Because it relied on fixed thresholds, it could not reduce false positives or false negatives beyond what those thresholds allowed. This baseline model was expected to be quick and lightweight, but to produce more false alarms under concept drift, since it treated drift as an anomaly.
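The three steps of the static threshold policy can be sketched as below. The threshold values, the small variance floor, the centring of the logistic function at `thr_low`, and the particular exponentially weighted variance recursion are assumptions chosen for illustration; the text specifies only the decay factor of 0.01 and the overall decision logic.

```python
import math

class StaticThresholdPolicy:
    """Non-learning baseline: z-score norm compared against two fixed thresholds."""

    def __init__(self, dim=8, alpha=0.01, thr_low=4.0, thr_high=6.0):
        self.alpha = alpha
        self.thr_low, self.thr_high = thr_low, thr_high  # assumed values
        self.mean = [0.0] * dim
        self.var = [1.0] * dim

    def decide(self, x):
        # Standardize with the current running statistics, then score by the 2-norm.
        z = [(xi - m) / math.sqrt(v + 1e-8)
             for xi, m, v in zip(x, self.mean, self.var)]
        score = math.sqrt(sum(zi * zi for zi in z))
        if score > self.thr_high:
            action = "block"
        elif score > self.thr_low:
            action = "rate_limit"
        else:
            action = "allow"
        # Logistic map of the score to (0, 1); centring at thr_low is an assumption.
        rate_strength = 1.0 / (1.0 + math.exp(-(score - self.thr_low)))
        self._update_stats(x)
        return action, score, rate_strength

    def _update_stats(self, x):
        # Exponentially weighted mean and variance with decay factor alpha.
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += self.alpha * delta
            self.var[i] = (1.0 - self.alpha) * (self.var[i] + self.alpha * delta * delta)

policy = StaticThresholdPolicy()
benign = policy.decide([0.1] * 8)   # small deviation from the running mean
attack = policy.decide([5.0] * 8)   # large deviation on every feature
```

Because the running statistics are updated on every flow regardless of the outcome, a slow benign drift is partly absorbed into the mean, while any shift that outpaces the decay factor is scored as an anomaly.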

3.3.2. LinUCB Linear Upper Confidence Bound (Contextual Bandit) Algorithm Development

This is a lightweight contextual bandit algorithm from reinforcement learning [27]. The LinUCB considered each decision as a multi-armed bandit with context (features). The policy has actions for attacks: allow = 0, rate_limit = 1, and block = 2.

Parameter Matrices: For each action $a \in \{0, 1, 2\}$, the policy kept a matrix $A_a$ ($D \times D$) and a vector $b_a$ ($D \times 1$). Initially, $A_a = \lambda I$ (regularization $\lambda = 1$) and $b_a = 0$. Together, these defined the action’s reward model.

Decision Step: For each action $a$, the weight vector of that action’s reward model was computed as follows:

$$\theta_a = A_a^{-1} b_a \tag{2}$$

Upon arrival of a new flow with feature vector $x$, the LinUCB computed an upper confidence bound [26] for each action:

$$\mathrm{UCB}_a = \theta_a^{\top} x + \alpha \sqrt{x^{\top} A_a^{-1} x} \tag{3}$$

where $A_a$ is the $D \times D$ matrix that accumulated feature covariance for action $a$, and $\alpha$ is a tunable parameter controlling the width of the confidence bonus (exploration). The policy selected the action with the highest UCB.

Learning Step: After receiving a reward $r$ for the chosen action $a$, the statistics were updated as follows:

$$A_a \leftarrow A_a + x x^{\top} \tag{4}$$

$$b_a \leftarrow b_a + r x \tag{5}$$

The weights $\theta_a$ for each action thus came to reflect which feature patterns in the feature vector $x$ yielded higher rewards for that action. For instance, if high bytes-per-flow correlated with attacks and blocking, then $\theta_{\text{block}}$ could develop a significant weight on that feature, making block more likely for similar future flows. This policy was extended to intrusion detection in edge/fog networks by imposing a slight latency penalty on resource-intensive actions.
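Equations (2) to (5) can be sketched in plain Python as follows. This is not the authors' implementation: to avoid inverting $A_a$ at every step, the sketch stores $A_a^{-1}$ directly and applies the Sherman–Morrison rank-one update, which is a substituted but algebraically equivalent way of applying Equation (4). The class name, the default `alpha`, and the toy rewards are assumptions, and the latency penalty mentioned in the text is omitted.

```python
import math

class LinUCB:
    """Disjoint LinUCB over actions 0 = allow, 1 = rate_limit, 2 = block."""

    def __init__(self, dim=8, n_actions=3, alpha=1.0, lam=1.0):
        self.dim, self.alpha = dim, alpha
        # A_a starts at lam * I, so its inverse starts at I / lam.
        self.A_inv = [[[(1.0 / lam if i == j else 0.0) for j in range(dim)]
                       for i in range(dim)] for _ in range(n_actions)]
        self.b = [[0.0] * dim for _ in range(n_actions)]

    @staticmethod
    def _matvec(M, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

    def ucb(self, a, x):
        theta = self._matvec(self.A_inv[a], self.b[a])            # Equation (2)
        A_inv_x = self._matvec(self.A_inv[a], x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        bonus = math.sqrt(max(0.0, sum(xi * v for xi, v in zip(x, A_inv_x))))
        return mean + self.alpha * bonus                           # Equation (3)

    def select(self, x):
        scores = [self.ucb(a, x) for a in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, a, x, r):
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - u u^T / (1 + x^T u), u = A^{-1} x.
        u = self._matvec(self.A_inv[a], x)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[a][i][j] -= u[i] * u[j] / denom         # Equation (4)
        self.b[a] = [bi + r * xi for bi, xi in zip(self.b[a], x)]   # Equation (5)

agent = LinUCB()
x = [1.0] + [0.0] * 7              # toy flow with a single active feature
for _ in range(50):
    agent.update(2, x, 1.0)        # blocking this pattern was rewarded
    agent.update(0, x, -1.0)       # allowing it was penalized
chosen = agent.select(x)
```

In this toy run the untried rate_limit action still carries its full exploration bonus, so block wins only because its learned mean has grown large enough to exceed that bonus.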

3.3.3. Self-Aware Hierarchical Adaptive Thompson Sampling (HATS-RL)

This is the proposed self-aware/self-learning reinforcement learning model. “Hierarchical” refers to the notion that the model made decisions in a two-step hierarchy: first, selecting an action using Thompson sampling [28]; and second, potentially adjusting meta-parameters, like the exploration rate, at a higher level based on performance. It also maintained $A_a$ and $b_a$ for each action (like the LinUCB) to represent a Gaussian posterior over $\theta_a$. This implies that the algorithm’s belief about the unknown weight vector $\theta_a$, the parameters that map the feature vector $x$ to the expected reward for action $a$, was a multivariate normal distribution rather than a point estimate. The reward noise variance, $\sigma^2 = 0.8$, was used for Thompson sampling updates. Classical contextual bandits, such as the LinUCB, provide a strong, lightweight baseline, but their fixed confidence schedule can be miscalibrated under concept drift [28]. The developed HATS-RL incorporated hierarchical self-awareness that monitored exponential moving averages of uncertainty and regret, triggering Page–Hinkley drift alarms that scaled the Thompson posterior covariance (adaptive exploration) and applied lightweight forgetting to outdated posteriors. This yielded faster recovery after distribution shifts while maintaining the same action set and reward structure as the LinUCB.

Self-Awareness and Hierarchical Adaptation

Self-awareness is defined in this study as an online capability that enabled an IDS agent to do the following: (i) quantify uncertainty by measuring how unusual the current flow was relative to the node’s recent operating regime; (ii) detect persistent distributional changes (concept drift) from streaming observations; and (iii) initiate a principled adjustment of the exploration–exploitation process to maintain reliable long-term performance under drift. The approach is considered hierarchical because it combines a policy that selects actions via Thompson sampling with a meta-adaptation controller that updates exploration and forgetting behavior upon drift detection.

A composite drift monitoring signal, $m_t$, was defined by combining a performance/regret proxy $g_t$ derived from reward feedback and an uncertainty signal $u_t$ computed from standardized streaming feature deviations [29,30]:

$$m_t = \eta g_t + (1 - \eta) u_t \tag{6}$$

where $\eta \in [0, 1]$ is the weighting hyperparameter.

A streaming change detector, the Page–Hinkley test, was applied to the stream $\{m_t\}$ to identify sustained shifts in its mean [31].

Drift-Aware Thompson Sampling Posterior Update Rule

For each action $a$, the Bayesian linear regression sufficient statistics $(A_a, b_a)$ were maintained as follows [32]:

$$A_a = \lambda I + \sum_{t : a_t = a} x_t x_t^{\top} \tag{7}$$

$$b_a = \sum_{t : a_t = a} x_t r_t \tag{8}$$

where $\lambda > 0$ is a regularizer. $A_a$ summarized the evidence that had been observed for action $a$, and $b_a$ summarized the reward-weighted evidence.

The posterior mean, $A_a^{-1} b_a$, and the covariance, $\sigma^2 A_a^{-1}$, for $w_a$ were as follows:

$$\mu_a = A_a^{-1} b_a, \quad \Sigma_a = \sigma^2 A_a^{-1} \tag{9}$$

At decision time, Thompson sampling [28] draws a weight vector as follows:

$$\tilde{w}_a \sim \mathcal{N}(\mu_a, \gamma_t \Sigma_a) \tag{10}$$

where $\gamma_t$ is a drift-adaptive exploration multiplier [33]. When a Page–Hinkley drift alarm occurs, two lightweight adaptations are applied: covariance inflation by increasing $\gamma_t$ to encourage adaptive exploration [34], and forgetting of old evidence, which down-weights stale statistics:

$$A_a \leftarrow (1 - \rho) A_a + \rho \lambda I \tag{11}$$

$$b_a \leftarrow (1 - \rho) b_a, \quad \rho \in (0, 1) \tag{12}$$

This explicit update rule directly links drift detection to policy adaptation. Drift alarms trigger increased exploration and systematic forgetting, enabling the agent to rapidly relearn an effective response policy in the new traffic. The proposed Self-Aware Hierarchical Adaptive Thompson Sampling (HATS-RL) comprises the decision step, learning step, self-awareness, and drift detection.
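The two drift-triggered adaptations can be illustrated with a short sketch. The forgetting function follows Equations (11) and (12) directly; the function names, the value of $\rho$, and the toy statistics are assumptions, and the inflation step of 0.4 with a cap of 2.0 is borrowed from the exploration-multiplier rule given later in this section.

```python
def forget(A, b, rho=0.2, lam=1.0):
    """Shrink A toward lam * I and b toward zero (Equations (11) and (12))."""
    dim = len(b)
    A_new = [[(1.0 - rho) * A[i][j] + (rho * lam if i == j else 0.0)
              for j in range(dim)] for i in range(dim)]
    b_new = [(1.0 - rho) * bi for bi in b]
    return A_new, b_new

def inflate(gamma, step=0.4, gamma_max=2.0):
    """Raise the exploration multiplier after a drift alarm, capped at gamma_max."""
    return min(gamma_max, gamma + step)

# Toy two-feature statistics for one action before a drift alarm.
A = [[11.0, 2.0], [2.0, 5.0]]
b = [4.0, -1.0]

# On a drift alarm: discount stale evidence and widen the posterior.
A2, b2 = forget(A, b, rho=0.5)
gamma = inflate(1.0)
```

Shrinking $A_a$ toward $\lambda I$ reduces the accumulated evidence, which enlarges $A_a^{-1}$ and therefore the posterior covariance, so forgetting and covariance inflation both push the agent toward renewed exploration.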

Decision Step: Instead of computing an upper confidence bound (UCB), the agent used Thompson sampling. For each action $a$, it sampled model parameters from the Gaussian posterior $\theta_a \sim \mathcal{N}(A_a^{-1} b_a, \sigma^2 A_a^{-1})$. Subsequently, the score $\theta_a^{\top} x$ was computed, and the action $a_{\text{selected}} = \arg\max_a \theta_a^{\top} x$ was selected. This means an action with a slightly lower posterior mean could still be chosen if its sampled value was high due to uncertainty. This is referred to as Thompson sampling [35]. Context uncertainty, uncert = $\|x\|$, which quantified the deviation of input traffic from the mean, was measured. Instances that deviated substantially from the mean were identified as anomalies.
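The sampling step can be sketched without any numerical library by using a Cholesky factor of $A_a$. This is an illustrative construction, not the authors' code: if $A_a = L L^{\top}$ and $\varepsilon \sim \mathcal{N}(0, I)$, then solving $L^{\top} v = \varepsilon$ gives a vector $v$ with covariance $A_a^{-1}$, so $\mu_a + \sigma v$ has the required posterior distribution. All function names and the toy statistics are assumptions.

```python
import math
import random

def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, y):
    """Solve L v = y by forward substitution."""
    v = []
    for i, row in enumerate(L):
        v.append((y[i] - sum(row[k] * v[k] for k in range(i))) / row[i])
    return v

def solve_lower_transposed(L, y):
    """Solve L^T v = y by back substitution."""
    n = len(L)
    v = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * v[k] for k in range(i + 1, n))
        v[i] = (y[i] - s) / L[i][i]
    return v

def sample_theta(A, b, sigma2, rng):
    """Draw theta ~ N(A^{-1} b, sigma2 * A^{-1}) without forming A^{-1}."""
    L = cholesky(A)
    mu = solve_lower_transposed(L, solve_lower(L, b))   # solves A mu = b
    eps = [rng.gauss(0.0, 1.0) for _ in b]
    noise = solve_lower_transposed(L, eps)              # covariance is A^{-1}
    return [m + math.sqrt(sigma2) * e for m, e in zip(mu, noise)]

def thompson_select(A_list, b_list, x, sigma2=0.8, rng=random):
    """Sample one weight vector per action and pick the highest sampled score."""
    scores = [sum(t * xi for t, xi in zip(sample_theta(A, b, sigma2, rng), x))
              for A, b in zip(A_list, b_list)]
    return max(range(len(scores)), key=scores.__getitem__), scores

rng = random.Random(0)
D = 3
A_list = [[[100.0 if i == j else 0.0 for j in range(D)] for i in range(D)]
          for _ in range(3)]
b_list = [[0.0] * D, [0.0] * D, [300.0, 0.0, 0.0]]  # only block has positive evidence
action, scores = thompson_select(A_list, b_list, [1.0, 0.0, 0.0], rng=rng)
```

With this much accumulated evidence the sampled scores stay close to their posterior means, so block is selected; with a weaker $A_a$ the same code would occasionally pick another action, which is the exploration behavior described above.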

Learning Step: Using the reward $r$ for action $a$, the statistics were updated in the same way as in the LinUCB policy:

$$A_a \leftarrow A_a + x x^{\top} \tag{13}$$

$$b_a \leftarrow b_a + r x \tag{14}$$

Self-awareness: Exponential moving averages (with decay $\alpha = 0.05$) of two signals were maintained. The first, $U_{mean}$, tracked uncertainty and was updated with the latest uncertainty value at each time step. The second, $R_{mean}$, tracked the latest regret or error. If the reward was negative, for instance, a significant negative value for mistakes, the policy considered $-r$ as regret. Accordingly, $R_{mean}$ was updated with $\max(0, -r)$.
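The two running averages can be sketched as a small tracker. The class and attribute names are hypothetical, and initializing both averages at zero is an assumption; the decay of 0.05 and the regret definition $\max(0, -r)$ come from the text.

```python
class SelfAwarenessTracker:
    """Exponential moving averages of uncertainty (U_mean) and regret (R_mean)."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.u_mean = 0.0  # assumed initial value
        self.r_mean = 0.0  # assumed initial value

    def update(self, uncertainty, reward):
        regret = max(0.0, -reward)  # only negative rewards count as regret
        self.u_mean += self.alpha * (uncertainty - self.u_mean)
        self.r_mean += self.alpha * (regret - self.r_mean)
        return self.u_mean, self.r_mean

tracker = SelfAwarenessTracker()
tracker.update(uncertainty=2.0, reward=1.0)          # correct decision, no regret
u, r = tracker.update(uncertainty=2.0, reward=-4.0)  # costly mistake
```

A positive reward leaves the regret average decaying toward zero, whereas a single large negative reward raises it immediately, which is what allows the drift signal to react to a run of mistakes.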

Drift Detection: The Page–Hinkley test [36], with $m_t = R_{mean} + c\,U_{mean}$, was adopted for drift detection in fog nodes. A weighting of $c$ = 0.2 to 0.4 was defined to incorporate regret and uncertainty. The Page–Hinkley function was used to keep a cumulative sum of $(m_t - m_{avg} - \delta)$ and to check whether it exceeded a threshold, indicating a sudden increase in $m_t$ [36]. The signal $m_t$ increased significantly when errors increased while uncertainty remained consistently high. The HATS-RL policy increased its exploration parameter $\beta$ (the Thompson sampling exploration multiplier) if it detected drift. By default, $\beta = 1.0$, and in the proposed HATS-RL it varied between 0.6 and 2.0. Upon drift, $\beta \leftarrow \min(2.0, \beta + 0.4)$, which increased exploration by up to double. The policy maintained $\beta > 0.6$ to sustain dynamic exploration and exploitation.
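A minimal sketch of the detector and the resulting exploration update is shown below. It uses the one-sided cumulative form clipped at zero, with $\delta = 0.01$ and a threshold of 5 as example values. The class name, the weighting `c = 0.3`, the reset of the detector after an alarm, and the synthetic two-regime stream are assumptions made for illustration.

```python
class PageHinkley:
    """One-sided Page-Hinkley detector for upward shifts in a monitored signal."""

    def __init__(self, delta=0.01, threshold=5.0):
        self.delta, self.threshold = delta, threshold
        self.n = 0
        self.mean = 0.0  # running mean of the monitored signal
        self.stat = 0.0  # cumulative statistic, clipped at zero

    def update(self, m):
        self.n += 1
        self.mean += (m - self.mean) / self.n
        self.stat = max(0.0, self.stat + (m - self.mean - self.delta))
        if self.stat > self.threshold:
            # Restart the detector after an alarm (an assumed design choice).
            self.n, self.mean, self.stat = 0, 0.0, 0.0
            return True
        return False

def drift_signal(r_mean, u_mean, c=0.3):
    """Composite monitoring signal m_t = R_mean + c * U_mean."""
    return r_mean + c * u_mean

detector, beta, alarms = PageHinkley(), 1.0, []
# Synthetic stream: a calm regime followed by a regime with more regret and uncertainty.
stream = [drift_signal(0.1, 1.0)] * 200 + [drift_signal(2.0, 3.0)] * 200
for t, m in enumerate(stream):
    if detector.update(m):
        alarms.append(t)
        beta = min(2.0, beta + 0.4)  # a drift alarm widens exploration
```

On this synthetic stream the alarm fires a few ticks after the regime change, and the detector then settles on the new level without raising further alarms.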

Model’s output: The HATS-RL output one of the actions (allow/throttle/block). The model could also produce a rate strength as a sigmoid of its score, akin to the static threshold policy. For example, if it chose action = 1 (rate limit), it computed the Q-value for that action and applied a logistic function to obtain a value between 0 and 1 indicating how strongly to throttle. A Q-value (action-value) is the expected reward for taking an action. If the model was highly confident (the score was very high, but it still chose to throttle rather than block), the rate_strength was close to 1, which approximated a block.

HATS-RL (Self-Aware RL) Algorithm

This is a lightweight contextual bandit algorithm in reinforcement learning. As in the LinUCB, each decision was treated as a multi-armed bandit with context (features). Each flow at time $t$ arrives with a feature vector $x_{t} \in R^{D}$ ($D = 8$). The policy has three actions for handling traffic, which form the action set $A = \{allow = 0, rate\_limit = 1, block = 2\}$. For each action $a \in A$, a matrix $A_{a} \in R^{D \times D}$ and a vector $b_{a} \in R^{D}$ were maintained with priors $A_{a} \leftarrow λ I$ ($λ = 1$) and $b_{a} \leftarrow 0$. Whenever the agent made a decision and observed its reward, the $A_{a}$ and $b_{a}$ of the selected action were updated.

Posterior and Thompson sampling (exploration scaled by drift): The HATS-RL models the action-value weights with a Gaussian posterior with noise variance $σ^{2} = 0.8$:

θ_{a} \sim N (A_{a}^{-1} b_{a}, σ^{2} A_{a}^{-1})

(15)

To regulate exploration under drift, the posterior covariance was scaled by $β_{t} \in [0.6, 2.0]$, and the weights were sampled as follows:

{\tilde{θ}}_{a} \sim N (A_{a}^{-1} b_{a}, (β_{t} σ^{2}) A_{a}^{-1})

(16)
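As an illustration of Equations (15) and (16), the following minimal sketch draws one weight vector from the β-scaled posterior with NumPy. The function name `sample_theta`, the random seed, and the symmetrization step are assumptions introduced here for illustration; they are not taken from the authors' simulator.

```python
import numpy as np

D = 8          # feature dimension (from the text)
LAMBDA = 1.0   # prior scale, A_a = lambda * I
SIGMA2 = 0.8   # noise variance sigma^2


def sample_theta(A, b, beta, rng):
    """Draw theta ~ N(A^{-1} b, (beta * sigma^2) A^{-1}), as in Eq. (16)."""
    A_inv = np.linalg.inv(A)
    mean = A_inv @ b
    cov = beta * SIGMA2 * A_inv
    # Symmetrize to guard against round-off before sampling (an assumption).
    cov = 0.5 * (cov + cov.T)
    return rng.multivariate_normal(mean, cov)


rng = np.random.default_rng(0)   # hypothetical seed
A = LAMBDA * np.eye(D)           # prior precision-like matrix
b = np.zeros(D)                  # prior response vector
theta_default = sample_theta(A, b, beta=1.0, rng=rng)
```

With β = 1.0, the draw reduces to Equation (15); larger β widens the sampled spread and therefore increases exploration.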

Uncertainty and self-awareness: The features were standardized using exponential moving average (EMA) estimates of their mean $μ_{t}$ and standard deviation $σ_{t}$ (decay $α = 0.01$) as follows:

z_{t} = (x_{t} - μ_{t}) / σ_{t}

(17)

The uncertainty $U_{t} = {∥z_{t}∥}_{2}$ is the Euclidean length (2-norm) of the z-score vector. The regret proxy is defined as follows:

ρ_{t} = \max (0, - r_{t})

(18)

where $r_{t}$ is the reward. The EMAs $U_{mean}$ and $R_{mean}$ of $U_{t}$ and $ρ_{t}$ were tracked with decay 0.05.
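The standardization, uncertainty, and regret signals of Equations (17) and (18) can be sketched in a few lines of standard-library Python. The text does not state how the EMA variance is tracked, so the variance update, the initial variance of 1.0, and the small `eps` term below are assumptions; the class and function names are hypothetical.

```python
import math


class EmaStandardizer:
    """EMA estimates of per-feature mean and variance (decay alpha = 0.01)."""

    def __init__(self, dim, alpha=0.01, eps=1e-8):
        self.alpha = alpha
        self.eps = eps              # avoids division by zero (assumption)
        self.mean = [0.0] * dim
        self.var = [1.0] * dim      # assumed initial variance

    def update(self, x):
        """Update the EMAs with x and return the z-score vector (Eq. 17)."""
        a = self.alpha
        z = []
        for i, xi in enumerate(x):
            self.mean[i] = (1 - a) * self.mean[i] + a * xi
            diff = xi - self.mean[i]
            self.var[i] = (1 - a) * self.var[i] + a * diff * diff
            z.append(diff / math.sqrt(self.var[i] + self.eps))
        return z


def uncertainty(z):
    """U_t: Euclidean norm of the z-score vector."""
    return math.sqrt(sum(v * v for v in z))


def regret_proxy(r):
    """rho_t = max(0, -r_t) (Eq. 18)."""
    return max(0.0, -r)


def ema(prev, value, decay=0.05):
    """One EMA step, as used for U_mean and R_mean."""
    return (1 - decay) * prev + decay * value


standardizer = EmaStandardizer(dim=3)
z0 = standardizer.update([0.0, 0.0, 0.0])
r_mean = ema(0.0, regret_proxy(-5.0))   # a false negative raises R_mean
```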

Drift detector (Page–Hinkley): The monitored signal was computed as follows:

m_{t} = R_{mean} + c U_{mean}, c \in [0.2, 0.4]

(19)

The Page–Hinkley cumulative statistic was then maintained on the signal from Equation (19) as follows:

P H_{t} = \max \{0, P H_{t - 1} + (m_{t} - {\overline{m}}_{t} - δ)\},

(20)

where ${\overline{m}}_{t}$ is the running mean of $m_{t}$, and $δ$ is a small tolerance, such as 0.01. Drift was flagged as follows:

P H_{t} > τ (for example, τ = 5).

(21)

If drift was flagged, $β_{t} \leftarrow \min (2.0, β_{t} + 0.4)$ was applied, followed by forgetting on the posteriors as follows:

A_{a} \leftarrow (1 - γ) A_{a}, b_{a} \leftarrow (1 - γ) b_{a}, γ = 0.02

(22)
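The drift detector and the drift response of Equations (19) to (22) can be sketched as follows. This is an illustration under stated assumptions: the running mean includes the current sample, the statistic is not reset after an alarm, and `c = 0.3` is simply a mid-range value; the text does not specify these details. All names are hypothetical.

```python
class PageHinkley:
    """One-sided Page-Hinkley detector for increases in a signal m_t."""

    def __init__(self, delta=0.01, tau=5.0):
        self.delta = delta   # tolerance (delta in Eq. 20)
        self.tau = tau       # alarm threshold (tau in Eq. 21)
        self.ph = 0.0        # cumulative statistic PH_t
        self.m_bar = 0.0     # running mean of m_t
        self.n = 0           # number of samples seen

    def update(self, m_t):
        """Feed one value of m_t; return True if drift is flagged."""
        self.n += 1
        self.m_bar += (m_t - self.m_bar) / self.n
        self.ph = max(0.0, self.ph + (m_t - self.m_bar - self.delta))
        return self.ph > self.tau


def drift_signal(r_mean, u_mean, c=0.3):
    """m_t = R_mean + c * U_mean (Eq. 19); c = 0.3 is an assumed value."""
    return r_mean + c * u_mean


def boost_beta(beta, step=0.4, beta_max=2.0):
    """Exploration boost applied when drift is flagged."""
    return min(beta_max, beta + step)


def forget(A, b, gamma=0.02):
    """Shrink the statistics of one action by (1 - gamma) (Eq. 22)."""
    keep = 1.0 - gamma
    return [[keep * v for v in row] for row in A], [keep * v for v in b]


detector = PageHinkley()
stream = [0.1] * 200 + [1.0] * 50   # hypothetical signal with a jump at index 200
alarms = [i for i, m in enumerate(stream) if detector.update(m)]
first_alarm = alarms[0] if alarms else None
```

In this synthetic stream, the statistic stays at zero during the stable phase and crosses the threshold a few samples after the jump, which is the behavior the drift alarm relies on.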

Action selection and learning: Each action was scored as follows:

s_{a} = {\tilde{θ}}_{a}^{⊤} x_{t},

(23)

and the highest-scoring action was selected:

a_{selected} = \arg\max_{a} s_{a} .

(24)

For the rate-limit action, a strength was produced with a logistic map $σ (\cdot)$ and $κ = 1$ as follows:

ρ_{t} = σ (κ s_{a_{selected}}) \in [0, 1]

(25)

Upon observing the reward $r_{t}$, the statistics of the selected action were updated as follows:

A_{a_{selected}} \leftarrow A_{a_{selected}} + x_{t} x_{t}^{⊤}

(26)

b_{a_{selected}} \leftarrow b_{a_{selected}} + r_{t} x_{t}

(27)
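The selection and update steps of Equations (23) to (27) can be sketched with NumPy as below. The sampled weights, the feature vector, and the reward value are hypothetical inputs chosen only to exercise the code; the function names are not the authors'.

```python
import numpy as np

ACTIONS = ("allow", "rate_limit", "block")   # action indices 0, 1, 2
RATE_LIMIT = 1


def select_action(thetas, x, kappa=1.0):
    """Score each action (Eq. 23), take the argmax (Eq. 24), and, for
    rate_limit, map the score to a strength in [0, 1] (Eq. 25)."""
    scores = np.array([theta @ x for theta in thetas])
    a = int(np.argmax(scores))
    strength = None
    if a == RATE_LIMIT:
        strength = float(1.0 / (1.0 + np.exp(-kappa * scores[a])))
    return a, strength


def update_statistics(A, b, a, x, r):
    """Rank-one update of the selected action only (Eqs. 26-27)."""
    A[a] += np.outer(x, x)
    b[a] += r * x


D = 8
A = [np.eye(D) for _ in ACTIONS]      # priors A_a = I (lambda = 1)
b = [np.zeros(D) for _ in ACTIONS]    # priors b_a = 0
x = np.full(D, 0.5)                   # hypothetical feature vector
thetas = [np.zeros(D), np.ones(D), -np.ones(D)]   # hypothetical sampled weights
a_selected, strength = select_action(thetas, x)
update_statistics(A, b, a_selected, x, r=1.5)     # hypothetical reward
```

Only the statistics of the chosen action change, so each decision costs one outer product and one vector addition.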

The HATS-RL Algorithm

Initialize $A_{a} = λ I, b_{a} = 0, β = 1.0, P H = 0, \overline{m} = 0$ for all $a$ .
For each incoming flow $x_{t}$ :
Standardize $x_{t} \to z_{t}$ ; update the exponential moving averages (EMAs) of $μ, σ$ .
For each $a$ : sample ${\tilde{θ}}_{a} \sim N (A_{a}^{- 1} b_{a}, (β σ^{2}) A_{a}^{- 1})$ .
Compute $s_{a} = {\tilde{θ}}_{a}^{⊤} x_{t}$ .
Choose $a_{selected} = \arg\max_{a} s_{a}$ (and $ρ_{t}$ if rate-limit).
Execute $a_{selected}$ , observe $r_{t}$ , update $A_{a_{selected}}$ and $b_{a_{selected}}$ .
Update $U_{mean}, R_{mean}$ ; update Page–Hinkley.
If drift: increase $β$ and apply forgetting.

Compared to the LinUCB, this HATS-RL sampled from a dynamically scaled posterior (β scaled) when drift was suspected and tied exploration to measured uncertainty and regret, rather than a fixed confidence bound.

3.3.4. Federated Hierarchical Adaptive Thompson Sampling (F-HATS-RL)

This is the HATS-RL enhanced with a privacy-preserving mechanism that shares statistics without revealing the nodes’ data. Each node ran the HATS-RL locally. The nodes gossiped ($A_{a}, b_{a}$) without sharing their data: each node randomly selected a peer, and the two exchanged their model parameters (the matrix $A_{a}$ and the vector $b_{a}$ for each action). The receiving node then merged the peer’s parameters with its own by weighted averaging. The merge weight was adaptive: if the two nodes’ observed traffic characteristics (features) differed, the policy assigned the peer’s information a lower weight; if they were similar, it assigned a higher weight. This avoided overweighting dissimilar nodes. Knowledge of identified attacks, as reflected in the updated $A_{a}, b_{a}$, which provide evidence that certain features are malicious, was thereby propagated across the network. Related federated IDS work has improved privacy and scalability, but often lacks drift-aware sequential decision-making or degrades under non-IID (non-independent and identically distributed) traffic [18,20]. The federated self-aware agents were developed to periodically gossip statistics and to merge them with similarity weights so that evidence of new attacks spreads quickly to peers observing related traffic, while dissimilar nodes downweight mismatched updates.

Gossip Sharing Protocol and Exchanged Information

To preserve privacy and to reduce communication costs, the F-HATS-RL did not transmit raw traffic records or labels. Instead, it followed the decentralized federated learning principle of exchanging model/learning summaries rather than data [37]. Each node periodically exchanged only lightweight posterior summaries for each action $a$: the matrix $A_{a} \in R^{D \times D}$ and the vector $b_{a} \in R^{D}$. These are sufficient statistics for updating the posterior $(μ_{a}, Σ_{a})$ without exposing flows.

Communication frequency and peer selection

Every $G$ ticks, each node selected one peer uniformly at random and exchanged ${\{A_{a}, b_{a}\}}_{a \in A}$. This was consistent with the “one-peer-to-one-peer” randomized gossip pattern used in decentralized federated learning and decentralized consensus-style averaging under bandwidth constraints [37].

Similarity-weighted merge to handle non-independent and identically distributed (non-IID) traffic

Because fog nodes often observe heterogeneous (non-IID) traffic, naively averaging peer updates could reduce local specialization and worsen performance [38]. To minimize negative transfer, peer similarity $d_{i j}$ was mapped to a merge weight $α_{i j} \in [0,1]$ (larger $α_{i j}$ for more similar peers), and each node applied a simple convex mixing step [39], as follows:

A_{a}^{(i)} \leftarrow (1 - α_{i j}) A_{a}^{(i)} + α_{i j} A_{a}^{(j)}

(28)

b_{a}^{(i)} \leftarrow (1 - α_{i j}) b_{a}^{(i)} + α_{i j} b_{a}^{(j)}

(29)

This rule was easy to interpret: $α_{i j} = 0$ kept the local model unchanged, $α_{i j} = 1$ adopted the peer summary, and intermediate values combined the two. Similarity-aware aggregation and clustering ideas have been widely used to address non-IID effects in modern federated learning by prioritizing updates from more compatible clients [39]. Repeated convex mixing over randomized peer exchanges was consistent with decentralized learning practice, where gossip-style communication is commonly used to disseminate information and approach consensus.
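The convex mixing step of Equations (28) and (29) can be sketched with NumPy as follows. The dictionary layout (action index mapped to an `(A_a, b_a)` pair) and the function names are assumptions made for this illustration.

```python
import numpy as np


def convex_merge(local, peer, alpha):
    """(1 - alpha) * local + alpha * peer, with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * local + alpha * peer


def merge_summaries(local_stats, peer_stats, alpha):
    """Merge per-action summaries; each dict maps action -> (A_a, b_a)."""
    return {
        a: (convex_merge(A_i, peer_stats[a][0], alpha),
            convex_merge(b_i, peer_stats[a][1], alpha))
        for a, (A_i, b_i) in local_stats.items()
    }


D = 8
local_stats = {a: (np.eye(D), np.zeros(D)) for a in range(3)}
peer_stats = {a: (3.0 * np.eye(D), np.ones(D)) for a in range(3)}   # hypothetical peer
merged = merge_summaries(local_stats, peer_stats, alpha=0.25)
```

A convex combination of symmetric positive-definite matrices is itself symmetric positive-definite, so the merged $A_{a}$ remains invertible and can still be used for posterior sampling.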

Analytical communication overhead (lightweight)

With $D = 8$ and $| A | = 3$, each node sent three $8 \times 8$ matrices and three $8 \times 1$ vectors per gossip round, which is $3 (64 + 8) = 216$ floating-point values. Using 32-bit floats, this amounted to 864 bytes per gossip exchange (excluding headers). This demonstrates feasibility under constrained fog bandwidth.
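The payload arithmetic above can be reproduced with a short helper. The function names are hypothetical, and the symmetric-packing variant is an observation added here rather than something the authors report: because each $A_{a}$ is built from $λ I$ plus outer products, it is symmetric, so only its upper triangle would strictly need to be sent.

```python
def gossip_payload(dim=8, n_actions=3, bytes_per_value=4):
    """Values and bytes per gossip exchange when full matrices are sent."""
    values = n_actions * (dim * dim + dim)
    return values, values * bytes_per_value


def gossip_payload_symmetric(dim=8, n_actions=3, bytes_per_value=4):
    """Same payload if only the upper triangle of each A_a is sent (assumption)."""
    values = n_actions * (dim * (dim + 1) // 2 + dim)
    return values, values * bytes_per_value


full_values, full_bytes = gossip_payload()
packed_values, packed_bytes = gossip_payload_symmetric()
```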

F-HATS-RL (Federated HATS-RL privacy-preserving gossip) Algorithm

Each node $i$ tracked a running feature mean ${\overline{x}}_{i}$. Every $T_{g}$ ticks:
Compute the group mean ${\overline{x}}_{G}$ and the distance $d_{i} = ∥{\overline{x}}_{i} - {\overline{x}}_{G}∥$ .
Map the distance to a merge weight $w_{i} \in [w_{min}, w_{max}]$ (closer $\Rightarrow$ larger) [40]:
$w_{i} = w_{min} + (w_{max} - w_{min}) e^{- d_{i} / τ_{w}}, 0 < w_{min} \leq w_{max} \leq 1$ [41].
For a random peer $j$, merge the summaries of each action $a$:

A_{a}^{(i)} \leftarrow (1 - w_{i}) A_{a}^{(i)} + w_{i} A_{a}^{(j)}

(30)

b_{a}^{(i)} \leftarrow (1 - w_{i}) b_{a}^{(i)} + w_{i} b_{a}^{(j)}

(31)

where $A_{a}$ is a $D \times D$ matrix and $b_{a}$ is a $D \times 1$ vector [42].

Learn with the updated posteriors.
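The distance-to-weight mapping used in the algorithm can be sketched in standard-library Python. The default values follow the hyperparameter list of this section; the function names and the example feature means are hypothetical.

```python
import math


def group_mean(means):
    """Element-wise mean of the nodes' running feature means."""
    n = len(means)
    return [sum(col) / n for col in zip(*means)]


def l2_distance(u, v):
    """Euclidean distance between two feature-mean vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def merge_weight(d, w_min=0.1, w_max=0.6, tau_w=0.5):
    """w = w_min + (w_max - w_min) * exp(-d / tau_w); closer means larger."""
    if not 0.0 < w_min <= w_max <= 1.0:
        raise ValueError("require 0 < w_min <= w_max <= 1")
    return w_min + (w_max - w_min) * math.exp(-d / tau_w)


node_means = [[0.0, 0.0], [0.2, 0.0], [2.0, 2.0]]   # hypothetical running means
x_bar_g = group_mean(node_means)
weights = [merge_weight(l2_distance(m, x_bar_g)) for m in node_means]
```

A node sitting exactly at the group mean receives $w_{max}$, and the weight decays toward $w_{min}$ as its traffic profile moves away from the group.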

Hyperparameters

λ = 1.0

σ^{2} = 0.8

β \in [0.6,2.0] (init 1.0)

α_{EMA} = 0.01

c \in [0.2, 0.4]

δ = 0.01, τ = 5 (Page–Hinkley)

γ = 0.02 (forgetting)

T_{g} = 10–50 ticks

w_{min} = 0.1

w_{max} = 0.6

τ_{w} = 0.5

Reward

The agents received a reward based on the selected action and the true label of the network traffic. The per-flow reward $r_{t}$ was defined from the binary label $y_{t}$ and the selected action $a_{t}$ as $r_{t} = + 1.5$ when malicious traffic was correctly blocked or rate-limited (true positive), $r_{t} = + 0.5$ when benign traffic was correctly allowed (true negative), $r_{t} = - 5.0$ when malicious traffic was allowed (false negative), and $r_{t} = - 1.0$ when benign traffic was blocked or rate-limited (false positive). When rate-limiting was selected, enforcement was applied between a minimum rate and a maximum rate to emulate partial containment rather than complete blocking. For the rate-limiting action, the reward was scaled by the throttle strength: $1.5 \times$ rate_strength if the flow was malicious and $- 1 \times$ rate_strength if the flow was benign, where rate_strength ∈ [0,1] was computed by a sigmoid of the model’s confidence/anomaly scores. Latency and energy were treated as evaluation metrics rather than embedded in $r_{t}$, keeping the learning signal focused on detection outcomes while enabling a transparent, multi-metric trade-off analysis across accuracy and resource use.
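The reward scheme can be written as a small function. This sketch reflects one reading of the description, in which the rate-limit reward is the block reward scaled by rate_strength; the constant and function names are hypothetical.

```python
ALLOW, RATE_LIMIT, BLOCK = 0, 1, 2


def reward(is_malicious, action, rate_strength=1.0):
    """Per-flow reward r_t from the binary label and the selected action."""
    if action == ALLOW:
        # False negative is penalized most heavily; true negative is mildly rewarded.
        return -5.0 if is_malicious else 0.5
    base = 1.5 if is_malicious else -1.0   # true positive / false positive
    if action == BLOCK:
        return base
    if action == RATE_LIMIT:
        # Scaled by throttle strength in [0, 1] (one reading of the text).
        return base * rate_strength
    raise ValueError("unknown action")


r_missed_attack = reward(True, ALLOW)
r_soft_throttle = reward(True, RATE_LIMIT, rate_strength=0.5)
```

The asymmetry between −5.0 and −1.0 makes a missed attack five times as costly as a false alarm, which pushes the policy toward caution on uncertain flows.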

3.4. Experimental Procedure and Evaluation Metrics

A lightweight Python 3.11 simulator that emits streaming network flows per tick, injects gradual benign drift, and overlays bursty, overlapping attacks was adopted to represent a real-world fog network. The simulator was used to regulate drift intensity and attack concurrency, to provide per-flow ground truth for PR and ROC, to run policies in lockstep on identical traffic for fair comparison, and to log per-decision latency and energy. Each run covered 600 ticks (20,000 flows/node) with one policy per node. Unlike other simulators, such as iFogSim/iFogSim2 and YAFS (Yet Another Fog Simulator) [43], which focus on application placement and resource management, the Python 3.11 simulator was well-suited for per-flow intrusion labeling and drift handling. Furthermore, the simulator was chosen over EdgeCloudSim because the latter focuses on edge offloading and mobility, rather than on a packet- or flow-level IDS with concept drift controls. OMNeT++ can model packets in detail, but reproducing the requirements of this study, namely drift/attack superposition and federated summary gossip, would demand substantial model development. Several metrics were applied to compare the proposed self-aware policies/models (HATS-RL and F-HATS-RL) with the ST and LinUCB policies/models for real-time intrusion detection in fog/edge networks.

For reproducibility, the simulator was designed to provide controlled and repeatable drift and attack injection while maintaining lightweight fog constraints. To enable replication, key configuration parameters, such as feature-generation ranges, drift schedule and strength, attack-burst parameters, and policy hyperparameters, were included in the Python 3.11 simulator. Where possible, configuration files and the random seeds used to generate the reported results can be provided upon request to facilitate verification.

Metrics

Table 2 presents the evaluation metrics for assessing intrusion detection effectiveness, including classification performance and operational feasibility, as well as reward, latency, and energy in streaming fog environments. The malicious class (attacks) is considered the positive class.

Table 2. Summary of Evaluation Metrics.
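As a point of reference for the classification metrics, the sketch below computes precision, recall, F1, and a pairwise AUROC with the malicious class as the positive class. It assumes the standard textbook definitions and the mapping implied by the reward scheme (allow counts as a benign prediction; rate_limit and block count as malicious predictions); it is not the authors' evaluation code, and the example data are hypothetical.

```python
def actions_to_predictions(actions):
    """allow (0) -> benign (0); rate_limit (1) or block (2) -> malicious (1)."""
    return [0 if a == 0 else 1 for a in actions]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (malicious) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count one half); quadratic time, intended for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]            # hypothetical labels
actions = [2, 1, 0, 1, 0, 0, 0, 0]           # hypothetical decisions
precision, recall, f1 = precision_recall_f1(y_true, actions_to_predictions(actions))
auc = auroc([1, 1, 1, 0, 0], [0.9, 0.8, 0.3, 0.4, 0.2])
```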

4. Results

The performance of the proposed self-aware/self-learning HATS-RL and F-HATS-RL policies, the LinUCB, and the ST model was compared for real-time intrusion detection in fog/edge networks. This section presents this study’s findings using various evaluation metrics.

4.1. Receiver Operating Characteristic (ROC) Curves

To indicate the models’ detection capability, Figure 2 shows the ROC curves for the adopted policies. The static threshold (ST) yielded the lowest AUROC (0.696), indicating degraded intrusion-detection performance in dynamic fog networks. The AUROC values were 0.918 for both the LinUCB and the HATS-RL, and 0.933 for the F-HATS-RL. The F-HATS-RL thus achieved the highest AUROC, suggesting it was better at separating drifting malicious traffic from benign traffic than the other models. Furthermore, its higher AUROC suggested that the proposed node-gossiping approach enhanced intrusion detection through collaborative learning.

Figure 2. The ROC curves for the adopted IDS policies/models. The dashed diagonal line indicates the no-discrimination (random-guessing) baseline where TPR = FPR (AUROC = 0.5).

4.2. Precision–Recall (PR) Curves

Figure 3 shows the policies’ PR (precision–recall) curves, which compare the policies’ performance in detecting the positive (attack) class. Notably, the F-HATS-RL and the HATS-RL had the highest AUPR values. The F-HATS-RL yielded the highest AUPR of 0.857, the HATS-RL was the runner-up with 0.816, followed by the LinUCB with 0.797, and the static threshold was the lowest with 0.356. This suggested that the self-aware RL policies (F-HATS-RL and HATS-RL) achieved higher precision at a given recall than the ST and the LinUCB policies. The F-HATS-RL had fewer false alarms than the other models, indicating superior intrusion detection capability in drifting fog/edge networks.

Figure 3. Precision–Recall curves for the adopted IDS policies/models. The dashed line marks the no-skill baseline (random precision).

4.3. Learning Curves

The learning curves shown in Figure 4 indicate the policies’ adaptability to dynamic attacks over the run. The curves show a short warm-up of about 50 ticks followed by rapid F1 gains for the RL policies, whereas the static threshold increased gradually and remained far lower. The LinUCB attained the highest peak (F1 = 0.57 to 0.58 around tick 160) and yielded the best average F1 over the run. The HATS-RL and F-HATS-RL peaks were slightly lower (0.52–0.54) and tracked each other closely, indicating smoother adaptation despite a slightly lower F1 than the LinUCB. The policies experienced a significant dip near ticks 200–250, consistent with concept drift/attack bursts. This was followed by partial recovery from tick 430 onward (F1 = 0.41 for the LinUCB, 0.39 for the HATS-RL, 0.38 for the F-HATS-RL, and 0.27 for the static threshold). This implies that the self-aware RL policies markedly outperformed the static threshold in drift adaptation. The LinUCB achieved the highest F1, while the HATS variants traded a minor F1 deficit for stable, drift-aware behavior, which is suitable for real-time fog settings.

Figure 4. The learning curves for the adopted IDS policies/models.

4.4. Reward Curves

Figure 5 shows the policy rewards over time. The mean reward curve reflects rewards for correct decisions minus penalties for false decisions, as defined in the reward function; latency and energy were evaluated separately. The LinUCB, HATS-RL, and F-HATS-RL rapidly improved from a negative reward at tick 0 to positive values by tick 150, while the static threshold remained negative throughout, indicating degraded performance. A dip in RL policy performance (ticks 200–400) indicates concept drift/non-stationarity. Among the RL methods, the LinUCB reached the highest early peak. By contrast, the F-HATS-RL yielded a small, consistent, relatively late gain over the HATS-RL owing to knowledge transfer via federated gossip. The proposed self-aware RL policies maintained positive utility and decisively outperformed the static threshold, supporting their feasibility for real-time intrusion detection in fog networks.

Figure 5. The adopted policies’ average reward over time.

4.5. Loss Curves

The loss curves shown in Figure 6 present the policies’ negative mean reward over ticks, so values below 0 correspond to a positive mean reward. The drop in loss was pronounced for all reinforcement-learning policies by tick 150, from 0.45 to a slightly negative minimum (−0.10), indicating that learning quickly yielded a desirable reward. The static threshold yielded the worst performance; after its initial drop, it stabilized at a loss between 0.40 and 0.50, never achieving a positive reward. The LinUCB attained the lowest loss among the methods. The HATS-RL and F-HATS-RL tracked each other closely, with marginal differences. Two transient degradations were visible around ticks 230–320 and 380–420, consistent with the drift (attack bursts), after which the RL policies reconverged near zero loss. By the end of the run (tick 600), the HATS-RL and F-HATS-RL losses hovered slightly below zero (−0.02 to −0.04), which confirmed sustained positive reward, whereas the static threshold loss remained high, revealing the strength of self-learning policies for a real-time fog IDS.

Figure 6. The loss curves for the adopted IDS policies/models.

4.6. Latency

Figure 7 presents the latency of the policies (milliseconds/ms) over time. The static threshold was the fastest, at 0.23 ms per flow, due to fewer computations. The HATS-RL achieved 0.28 ms, and the F-HATS-RL yielded 0.27 ms per flow, while the LinUCB achieved 0.34 ms. The extra cost of the LinUCB was due to linear algebra: it computed an inverse and solved a linear system for each decision’s UCB. By contrast, the proposed HATS-RL policies sampled from a posterior (an updated belief about their model’s parameters) and could reuse the existing structure. The slight latency differences between the static threshold and the proposed self-aware policies are negligible for fog workloads. The sub-millisecond decision times indicate that all policies comfortably handled fog traffic loads (500 flows/s) and remained feasible at several thousand flows/s.
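The throughput claim can be checked with simple arithmetic. The sketch assumes a single decision thread that handles flows one after another, so the reported per-flow latency is the full service time; queueing and I/O are ignored, and the function names are hypothetical.

```python
LATENCY_MS = {            # per-flow decision latency reported in the text
    "ST": 0.23,
    "F-HATS-RL": 0.27,
    "HATS-RL": 0.28,
    "LinUCB": 0.34,
}


def max_flows_per_second(latency_ms):
    """Upper bound on sequential throughput for a given per-flow latency."""
    return 1000.0 / latency_ms


def utilization(flows_per_second, latency_ms):
    """Fraction of one second spent deciding at a given arrival rate."""
    return flows_per_second * latency_ms / 1000.0


capacity = {name: max_flows_per_second(ms) for name, ms in LATENCY_MS.items()}
load_at_500 = {name: utilization(500, ms) for name, ms in LATENCY_MS.items()}
```

Under these assumptions, even the slowest policy (the LinUCB at 0.34 ms) would sustain roughly 2900 flows/s and would be busy about 17% of the time at 500 flows/s.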

Figure 7. The policies’ latency (ms) over time.

4.7. Energy Consumption

Using a baseline unit cost of 0.01 mJ/decision, the mean per-decision energy was 0.0156 millijoules (mJ) for the static threshold, 0.0155 mJ for the LinUCB, 0.0138 mJ for the HATS-RL, and 0.0137 mJ for the F-HATS-RL, as shown in Figure 8, which indicates minimal inter-policy variation in energy consumption. The proposed self-aware RL variants exhibited slightly lower consumption due to lighter per-tick updates and selective blocking, which reduced processing complexity. This implies that the F-HATS-RL and HATS-RL are compatible with constrained-edge hardware, such as a 5 W router or a Raspberry Pi.

Figure 8. The energy consumption for the adopted policies/models.

4.8. Discussion

The findings revealed that the LinUCB can achieve a higher peak F1 during stable periods because it aggressively exploits under the near-stationary assumption. However, in non-stationary fog traffic, accumulated statistics can become outdated after concept drift, delaying recovery. The HATS-RL and F-HATS-RL monitor uncertainty and a regret proxy and use Page–Hinkley drift alarms to trigger adaptive exploration (posterior covariance inflation) and lightweight forgetting. This design prioritizes faster recovery and more stable long-term behavior under drift, at the cost of temporary additional exploration. Accordingly, the contribution of the HATS-RL variants should be interpreted as improving robustness and stability in dynamic fog environments, rather than solely maximizing peak F1.

Previous research has developed advanced intrusion detection systems for fog and edge networks, but these often face limitations in accuracy and adaptability. For instance, a hybrid smart-home IDS on the CSE-CIC-IDS2018 dataset, using Random Forest/XGBoost, achieved high accuracy [16]. Yet, its reliance on a static dataset limits its ability to generalize in changing fog network environments. An autoencoder–isolation forest in [17] reached 95.4% accuracy on the NSL-KDD dataset. Still, its binary attack classification and use of outdated data failed to detect multi-attack scenarios or address drift at edge nodes. The drift-adaptive online DDoS schemes introduced by [18] addressed concept drift and specific zero-day threats, but they were attack-specific and were not tested against a diverse set of threats. Federated fog IDS systems, like 2FIDS [14], maintained privacy and scaled across Bot-IoT/TON-IoT, yet their performance declined on the MQTTset, and they lacked sequential, self-aware decision policies. The deep hybrid CNN-LSTM in [19] achieved 99.5% accuracy on CICDDoS2019, but its high computational demands limited real-time deployment on fog/edge devices. The new self-aware RL agents (HATS-RL) fill these gaps by providing broad attack recognition and adapting to network traffic changes (drift). Additionally, the F-HATS-RL achieved over 91% in real-time attack detection through decentralized gossip-based coordination. These results suggest that this approach is resource-efficient and can adapt to drift, making it suitable for fog network intrusion detection systems.

5. Conclusions and Future Work

This study proposed a self-aware reinforcement-learning intrusion detection system for fog networks (HATS-RL) and its federated extension (F-HATS-RL). Across real-time, multi-attack streaming scenarios, the proposed agents achieved over 90% accuracy and outperformed the static threshold baseline and the LinUCB contextual bandit in terms of AUROC and AUPR, indicating improved robustness under evolving traffic conditions. In particular, the HATS-RL maintained reliable performance after distribution shifts by explicitly coupling drift awareness to the exploration–exploitation dynamics, enabling faster recovery when traffic changed. The federated variant, F-HATS-RL, achieved the best overall performance (AUROC = 0.933 and AUPR = 0.857), demonstrating that decentralized collaboration can improve detection quality without sharing raw traffic.

This study’s novelty lies not only in combining concept-drift-aware reinforcement learning with federated learning, but in providing a principled approach to maintaining the reliability of IDS decision policies in real-time fog traffic by linking drift/uncertainty signals to policy adaptation. This study advances intrusion detection theory by reframing a fog IDS from a static classifier into a streaming, context-based decision task in which the agent must continuously adapt its action policy as benign behavior and attacks evolve. The federated (gossip) extension contributes to fog security research by enabling privacy-preserving and bandwidth-aware collaboration: nodes share learning summaries rather than raw traffic, thereby improving detection when each node observes only partial and heterogeneous data. Together, drift-aware adaptation and decentralized sharing addressed drift and distributed observability under resource constraints.

Limitations and Future Directions

Simulation-only evaluation: The evaluation was conducted using a controlled Python 3.11 simulator rather than an established real-traffic benchmark. While simulation is standard in fog research for repeatability and controlled drift/attack injection, future work should validate the HATS-RL and F-HATS-RL on real IoT/fog datasets to quantify energy under realistic constraints.
Hardware and energy modeling: Communication overhead was defined, and the algorithms were designed to remain lightweight, but they were not benchmarked on actual fog gateways (such as Raspberry Pi or edge networks). Future work will implement the agents on fog or edge hardware and measure latency/energy for allow/rate-limit/block actions under varying load.
Federated adversaries and poisoning: Poisoning or Byzantine peers in gossip federations have not been experimentally studied. A systematic adversarial evaluation is planned as future work, consistent with security considerations in federated learning for IDSs.
Attack labeling granularity: The simulator injected multiple attack types and overlapping bursts, yet the attack labels were binary (benign/malicious). Future work will extend the evaluation to multi-class settings, reporting per-attack performance and robustness curves as a function of drift severity.
Topology scale: A micro-fog deployment of eight nodes was adopted to reflect a realistic fog network and to enable drift analysis. Scalable multi-domain topologies and hierarchical federation, such as fog-to-cloud or cluster-to-cluster, are extensions for future work.

Author Contributions

N.T.: analysis and research write-up. T.E.M.: technical review, idea enhancement, and guidance. T.O.O.: technical review, idea improvement, and guidance. S.M.: technical review, idea improvement, and guidance. T.M.: technical review, idea improvement, and guidance. T.M.T.: technical review, idea improvement, and guidance. All authors have read and agreed to the published version of the manuscript.

Funding

Tshwane University of Technology.

Data Availability Statement

Available upon request from the authors and the funder.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of this study, in the collection, analysis, or interpretation of the data, in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AUPR	Area Under the Precision–Recall curve
AUROC	Area Under the Receiver Operating Characteristic Curve
CNN	Convolutional Neural Network
DDoS	Distributed Denial of Service
DNN	Deep Neural Network
EMA	Exponential Moving Average
FL	Federated Learning
F-HATS-RL	Federated Hierarchical Adaptive Thompson Sampling–Reinforcement Learning
HATS-RL	Hierarchical Adaptive Thompson Sampling–Reinforcement Learning
IDS	Intrusion Detection System
IoT	Internet of Things
LinUCB	Linear Upper Confidence Bound (Contextual Bandit)
LSTM	Long Short-Term Memory
ML	Machine Learning
non-IID	Non-Independent and Identically Distributed (Heterogeneous Client/Node Data)
RL	Reinforcement Learning
ROC	Receiver Operating Characteristic
ST	Static Threshold

References

Mohamed, D.; Ismael, O. Enhancement of an IoT hybrid intrusion detection system based on fog-to-cloud computing. J. Cloud Comput. 2023, 12, 41.
Mahjoub, C.; Hamdi, M.; Alkanhel, R.I.; Mohamed, S.; Ejbali, R. An adversarial-environment reinforcement-learning-driven intrusion-detection algorithm for the Internet of Things. EURASIP J. Wirel. Commun. Netw. 2024, 2024, 21.
Saeed, I.A.; Selamat, A.; Rohani, F.; Krejcar, O. Performance Evaluation of Reinforcement Learning-Based Intrusion Detection Systems. IEEE Access 2025.
Rehman, T.; Tariq, N.; Khan, F.A.; Rehman, S.U. FFL-IDS: A FOG-Enabled Federated Learning-Based Intrusion Detection System to counter jamming and spoofing attacks for the industrial Internet of Things. Sensors 2024, 25, 10.
Zhou, C.V.; Leckie, C.; Karunasekera, S. A survey of coordinated attacks and collaborative intrusion detection. Comput. Secur. 2010, 29, 124–140.
Tawfik, M. Optimized intrusion detection in IoT and fog computing using ensemble learning and advanced feature selection. PLoS ONE 2024, 19, e0304082.
Han, W.; Peng, J.; Yu, J.; Kang, J.; Lu, J.; Niyato, D. Heterogeneous data-aware federated learning for intrusion detection systems via meta-sampling in artificial intelligence of things. IEEE Internet Things J. 2023, 11, 13340–13354.
Fedorchenko, E.; Novikova, E.; Shulepov, A. Comparative review of the intrusion detection systems based on federated learning: Advantages and open challenges. Algorithms 2022, 15, 247.
Chang, V.; Golightly, L.; Modesti, P.; Xu, Q.A.; Doan, L.M.T.; Hall, K.; Boddu, S.; Kobusińska, A. A survey on intrusion detection systems for fog and cloud computing. Future Internet 2022, 14, 89.
Chou, D.; Jiang, M. A survey on data-driven network intrusion detection. ACM Comput. Surv. CSUR 2021, 54, 182.
Otoum, Y.; Nayak, A. As-ids: Anomaly and signature-based IDs for the Internet of Things. J. Netw. Syst. Manag. 2021, 29, 23.
Banaamah, A.M.; Ahmad, I. Intrusion detection in IoT using deep learning. Sensors 2022, 22, 8417.
Ouyang, T.; Chen, X.; Zhou, Z.; Li, R.; Tang, X. Adaptive user-managed service placement for mobile edge computing via contextual multi-armed bandit learning. IEEE Trans. Mob. Comput. 2021, 22, 1313–1326.
Wassermann, S.; Cuvelier, T.; Mulinka, P.; Casas, P. Adaptive and reinforcement learning approaches for online network monitoring and analysis. IEEE Trans. Netw. Serv. Manag. 2020, 18, 1832–1849.
Bensaid, R.; Labraoui, N.; Saidi, H.; Bany Salameh, H. Securing fog-assisted IoT smart homes: A federated learning-based intrusion detection approach. Clust. Comput. 2025, 28, 50.
Alghayadh, F.; Debnath, D. A hybrid intrusion detection system for smart home security. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA, 31 July–1 August 2020; pp. 319–323.
Sadaf, K.; Sultana, J. Intrusion detection based on an autoencoder and an isolation forest in fog computing. IEEE Access 2020, 8, 167059–167068.
Beshah, Y.K.; Abebe, S.L.; Melaku, H.M. Drift adaptive online DDoS attack detection framework for IoT system. Electronics 2024, 13, 1004. [Google Scholar] [CrossRef]
Zainudin, A.; Ahakonye, L.A.C.; Akter, R.; Kim, D.S.; Lee, J.M. An efficient hybrid-DNN for DDoS detection and classification in software-defined IoT networks. IEEE Internet Things J. 2022, 10, 8491–8504. [Google Scholar] [CrossRef]
Umar, H.G.A.; Yasmeen, I.; Aoun, M.; Mazhar, T.; Khan, M.A.; Jaghdam, I.H.; Hamam, H. Energy-efficient deep learning-based intrusion detection system for edge computing: A novel DNN-KDQ model. J. Cloud Comput. 2025, 14, 32. [Google Scholar] [CrossRef]
Tariq, N.; Alsirhani, A.; Humayun, M.; Alserhani, F.; Shaheen, M. A fog-edge-enabled intrusion detection system for smart grids. J. Cloud Comput. 2024, 13, 43. [Google Scholar] [CrossRef]
Yao, W.; Hu, L.; Hou, Y.; Li, X. A lightweight intelligent network intrusion detection system using one-class autoencoder and ensemble learning for IoT. Sensors 2023, 23, 4141. [Google Scholar] [CrossRef]
Frimpong, S.A.; Han, M.; Zheng, W.; Chowdhury, I.J. An adaptive collaborative intrusion detection system for vehicular fog computing networks. Eng. Appl. Artif. Intell. 2025, 158, 111563. [Google Scholar] [CrossRef]
Sudqi Khater, B.; Abdul Wahab, A.W.B.; Idris, M.Y.I.B.; Abdulla Hussain, M.; Ahmed Ibrahim, A. A lightweight perceptron-based intrusion detection system for fog computing. Appl. Sci. 2019, 9, 178. [Google Scholar] [CrossRef]
Patil, S.; Varadarajan, V.; Mazhar, S.M.; Sahibzada, A.; Ahmed, N.; Sinha, O.; Kumar, S.; Shaw, K.; Kotecha, K. Explainable artificial intelligence for an intrusion detection system. Electronics 2022, 11, 3079. [Google Scholar] [CrossRef]
Shi, Q.; Xiao, F.; Pickard, D.; Chen, I.; Chen, L. Deep neural network with LinUCB: A contextual bandit approach for personalized recommendation. In Proceedings of the Companion Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 778–782.
Albulayhi, K.; Sheldon, F.T. An adaptive deep-ensemble anomaly-based intrusion detection system for the Internet of Things. In Proceedings of the 2021 IEEE World AI IoT Congress (AIIoT), Online, 10–13 May 2021; pp. 187–196.
Li, C.; Wang, H. Asynchronous upper confidence bound algorithms for federated linear bandits. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Online, 28–30 March 2022; pp. 6529–6553.
Agrahari, S.; Singh, A.K. Concept drift detection in data stream mining: A literature review. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 9523–9540.
Palli, A.S.; Jaafar, J.; Gomes, H.M.; Hashmani, M.A.; Gilal, A.R. An experimental analysis of drift detection methods on multi-class imbalanced data streams. Appl. Sci. 2022, 12, 11688.
Shang, D.; Zhang, G.; Lu, J. Novelty-aware concept drift detection for neural networks. Neurocomputing 2025, 617, 128933.
Fangwei, K.F. Contextual Bandit Analysis of the LinUCB Disjoint Algorithm with a Dataset. 2020. Available online: https://kfoofw.github.io/contextual-bandits-linear-ucb-disjoint/ (accessed on 30 January 2026).
Xu, R.; Min, Y.; Wang, T. Noise-adaptive Thompson sampling for linear contextual bandits. Adv. Neural Inf. Process. Syst. 2023, 36, 23630–23657.
Qi, H.; Guo, F.; Zhu, L. Thompson Sampling for Non-Stationary Bandit Problems. Entropy 2025, 27, 51.
Dai, Z.; Low, B.K.H.; Jaillet, P. Federated Bayesian optimization via Thompson sampling. Adv. Neural Inf. Process. Syst. 2020, 33, 9687–9699.
Seth, S.; Chahal, K.K.; Singh, G. Concept drift–based intrusion detection for evolving data stream classification in IDS: Approaches and comparative study. Comput. J. 2024, 67, 2529–2547.
Yuan, L.; Wang, Z.; Sun, L.; Yu, P.S.; Brinton, C.G. Decentralized federated learning: A survey and perspective. IEEE Internet Things J. 2024, 11, 34617–34638.
Chung, W.C.; Lo, C.A.; Lin, Y.H.; Chen, Z.H.; Hung, C.L. Decentralized Federated Learning with Non-IID Data: Challenges, Trends, and Future Opportunities. ACM Comput. Surv. 2026, 58, 192.
Yan, Y.; Tong, X.; Wang, S. Clustered federated learning in heterogeneous environment. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 12796–12809.
Yang, S.; Zheng, X.; Li, J.; Xu, J.; Zhang, X.; Ngai, E.C. Self-Supervised Adaptation Method to Concept Drift for Network Intrusion Detection. IEEE Trans. Dependable Secur. Comput. 2025, 22, 7632–7646.
McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
Bénézit, F.; Blondel, V.; Thiran, P.; Tsitsiklis, J.; Vetterli, M. Weighted gossip: Distributed averaging using non-doubly stochastic matrices. In Proceedings of the 2010 IEEE International Symposium on Information Theory, Austin, TX, USA, 13–18 June 2010; pp. 1753–1757.
Fahimullah, M.; Philippe, G.; Ahvar, S.; Trocan, M. Simulation tools for fog computing: A comparative analysis. Sensors 2023, 23, 3492.

Figure 1. The study’s simulation stages.

Figure 2. The ROC curves for the adopted IDS policies/models. The dashed diagonal line indicates the no-discrimination (random-guessing) baseline where TPR = FPR (AUROC = 0.5).

Figure 3. Precision–Recall curves for the adopted IDS policies/models. The dashed line marks the no-skill baseline (random precision).

Figure 4. The learning curves for the adopted IDS policies/models.

Figure 5. The adopted policies’ average reward over time.

Figure 6. The loss curves for the adopted IDS policies/models.

Figure 7. The policies’ latency (ms) over time.

Figure 8. The energy consumption for the adopted policies/models.

Table 1. Comparison of Related Intrusion Detection Approaches for Fog Networks.

Study	Dataset	Techniques	Findings	Limitations
Autoencoder–Isolation Forest (Auto-IF) [17]	NSL-KDD	Autoencoder and Isolation Forest	Reached 95.4% accuracy; effective anomaly detection.	Binary classification oversimplifies the diversity of real attacks; the dataset is outdated and ignores drift and energy requirements in fog environments.
Drift-Adaptive Online DDoS Framework [18]	AUWPAE (online traffic)	Drift detection and online adaptive learning	Handled concept drift and zero-day DDoS with robustness.	Attack-specific (DDoS only), not generalized to heterogeneous threats.
Federated Fog Learning IDS (2FIDS) [14]	Bot-IoT, TON-IoT	Federated learning across fog nodes	Preserved data privacy, scaled across IoT datasets. Accurate on Bot-IoT/TON-IoT.	Degraded accuracy on the MQTTset (86%). Lacked sequential decision-making and cost-aware response.
Hybrid CNN-LSTM IDS [19]	CICDDoS2019	Deep hybrid CNN + LSTM	Reported 99.5% accuracy; low-latency intrusion detection.	Computationally expensive. Unsuitable for real-time deployment on resource-limited fog nodes.
DNN-KDQ IDS (Edge) [20]	CICIDS2017	Knowledge Distillation and Quantization (lightweight DNN)	Reported 99.43% accuracy; model size reduced from 197 KB to 20 KB; edge-suitable.	Evaluated on a fixed, stationary dataset. No concept drift and no dynamic multi-attack consideration.
Fog-Edge Federated SVM (Smart Grids) [21]	NSL-KDD and CICIDS2017	Federated SVM with parameter sharing	Outperformed baselines on NSL-KDD and CICIDS2017. Preserved privacy via federation.	No analysis of drift behavior or non-IID nodes. Latency and energy consumption at scale are not reported.
Bi-GRU Autoencoder Ensemble (IoT) [22]	KDD CUP99, UNSW-NB15, and WSN-DS	One-class Bi-GRU autoencoder ensemble (lightweight)	Reduced parameters and can flag unknown attacks. Useful for constrained IoT devices.	No evaluation under internode heterogeneity or drift. No cross-node information sharing.
A-CIDS (Vehicular Fog) [23]	ToN-IoT	Adaptive, collaborative IDS for vehicular/fog networks	Emphasized scalable, low-latency collaborative detection.	Overlooked dynamic multi-attack scenarios, federated learning for IDS, and concept drift.
Perceptron-based Fog IDS [24]	ADFA-WD and ADFA-LD	Simple perceptron classifier (ultra-lightweight)	Reported 94% accuracy. Minimal compute suitable for edge devices.	Relied on static datasets; no federation and no concept drift evaluation.

Table 2. Summary of Evaluation Metrics.

Metric	Formula	Description	Interpretation
TP (True Positives)	count	Malicious flows correctly detected as malicious.	Higher TP is better.
TN (True Negatives)	count	Benign flows correctly detected as benign.	Higher TN is better.
FP (False Positives)	count	Benign flows misclassified as malicious (false alarms).	Lower FP is better.
FN (False Negatives)	count	Malicious flows misclassified as benign (missed attacks).	Lower FN is better.
Accuracy	(TP + TN)/(TP + TN + FP + FN)	Overall fraction of correctly classified flows.	Higher accuracy is better.
Precision	TP/(TP + FP)	Fraction of flagged flows that were actually attacks.	Higher precision is better; it penalizes false alarms.
Recall (True Positive Rate)	TP/(TP + FN)	Fraction of attacks detected.	Higher recall is better; sensitive to missed attacks.
Specificity (True Negative Rate)	TN/(TN + FP)	Fraction of benign traffic correctly allowed.	Higher specificity is better.
FPR (False Positive Rate)	FP/(FP + TN)	Fraction of benign flows incorrectly flagged as malicious (false alarm rate).	Lower FPR is better.
FNR (False Negative Rate)	FN/(FN + TP)	Fraction of malicious flows incorrectly passed as benign (miss rate).	Lower FNR is better.
F1-score	2 × (Precision × Recall)/(Precision + Recall)	Harmonic mean of precision and recall.	Higher F1-score is better; robust under imbalance.
AUROC	Area Under the Receiver Operating Characteristic (ROC) curve	Threshold-free discrimination (TPR vs. FPR across thresholds).	Higher is better; 0.5 is random.
AUPR	Area Under Precision–Recall Curve	Threshold-free precision–recall performance; informative when attacks are rare.	Higher is better; preferred under imbalance.
Mean reward	$\frac{1}{T}\sum_{t=1}^{T} r_{t}$, where $T$ is the number of processed flows	Average reward per tick/flow for RL policies.	Higher indicates better decision outcomes.
Latency	ms per flow	CPU time per decision (detection) in milliseconds.	Lower values are preferable; they support real-time operation.
Energy	mJ per flow	Estimated energy consumed per decision in millijoules.	Lower is better; supports resource constraints.
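
As an illustration of how the count-based and threshold-free metrics in Table 2 relate, the following minimal Python sketch computes them from binary labels and anomaly scores. It is not the authors' evaluation code. The function names (`confusion`, `summary`, `auroc`, `mean_reward`), the label convention (1 = malicious, 0 = benign), and the choice to return 0.0 when a denominator is zero are assumptions made only for this example. AUROC is computed with the pairwise rank formulation, which is simple but quadratic in the number of flows.

```python
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels, assuming 1 = malicious, 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(num: float, den: float) -> float:
    """Divide, returning 0.0 for an empty denominator (an assumption of this sketch)."""
    return num / den if den else 0.0


def summary(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute the count-based metrics of Table 2."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "fpr": _ratio(fp, fp + tn),
        "fnr": _ratio(fn, fn + tp),
        # Harmonic mean: 2 * (precision * recall) / (precision + recall)
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: the probability that a randomly chosen malicious flow
    receives a higher score than a randomly chosen benign flow (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one malicious and one benign flow")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_reward(rewards: Sequence[float]) -> float:
    """Average reward over T processed flows: (1/T) * sum of r_t."""
    return sum(rewards) / len(rewards)


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    decisions = [1, 1, 0, 0, 0, 1, 0, 1]
    print(summary(labels, decisions))
    print(auroc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

In the toy data above, one attack is missed and one benign flow is flagged, so precision, recall, and F1 coincide. A score vector that ranks every malicious flow above every benign flow would yield an AUROC of 1.0, whereas uninformative scores would hover around the 0.5 baseline noted in the Figure 2 caption.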

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
