Dual-Factor Adaptive Robust Aggregation for Secure Federated Learning in IoT Networks

Song, Zuan; Tan, Wuzheng; Wang, Hailong; Zhang, Guilong; Weng, Jian

doi:10.3390/fi18040201

Open Access Article


by Zuan Song, Wuzheng Tan *, Hailong Wang, Guilong Zhang and Jian Weng

College of Cyber Security, Jinan University, Guangzhou 511436, China

* Author to whom correspondence should be addressed.

Future Internet 2026, 18(4), 201; https://doi.org/10.3390/fi18040201

Submission received: 11 March 2026 / Revised: 9 April 2026 / Accepted: 9 April 2026 / Published: 10 April 2026

(This article belongs to the Special Issue Federated Learning: Challenges, Methods, and Future Directions)


Abstract

Federated Learning (FL) has been widely adopted in privacy-sensitive and distributed environments. However, training stability is severely challenged when differential privacy (DP) noise and Byzantine client behaviors coexist, as these heterogeneous perturbations jointly introduce time-varying distortions to model updates. Existing approaches typically address privacy and robustness in isolation. Under DP constraints, noise injection increases gradient variance and obscures the distinction between benign and adversarial updates, causing many robust aggregation methods to misclassify normal clients or fail to detect malicious ones. As a result, their effectiveness degrades substantially in practical IoT environments where noise and attacks interact. In this work, we propose a dual-factor adaptive and robust aggregation framework (DARA) to improve the stability of FL under such combined disturbances. DARA adjusts the differential privacy noise scale by jointly considering local update magnitudes and training-round dynamics, aiming to mitigate noise-induced bias under a fixed privacy budget. Meanwhile, a direction-aware weighted aggregation scheme assigns continuous trust weights based on the cosine similarity between each client update and the mean update direction, thereby suppressing the influence of potentially anomalous or adversarial clients. We conduct extensive experiments on multiple benchmark datasets to evaluate DARA under differential privacy constraints and Byzantine attack scenarios. The results indicate that DARA achieves favorable robustness and convergence behavior compared with representative aggregation baselines, while maintaining competitive model accuracy.

Keywords:

federated learning; differential privacy; robust aggregation; Byzantine attacks

1. Introduction

Federated Learning (FL) has emerged as a promising paradigm for collaborative model training in Internet of Things (IoT) systems, where massive edge devices jointly learn a global model without sharing raw data [1,2,3,4]. By keeping data local and exchanging only model updates, FL naturally aligns with the privacy, ownership, and regulatory requirements of IoT applications such as smart healthcare, industrial sensing, intelligent transportation, and large-scale edge vision systems. However, deploying FL in open IoT environments introduces challenges that go far beyond the assumptions of centralized or cross-silo learning [5,6,7]. In this context, federated learning for IoT must simultaneously address privacy protection and robustness against adversarial behaviors, giving rise to a growing line of research on noise-aware and secure aggregation mechanisms.

Unlike well-controlled data centers, IoT-enabled FL systems are characterized by resource-constrained devices, highly heterogeneous data distributions, unreliable connectivity, and a fundamentally untrusted operating environment. First, limited computation and communication capabilities make IoT devices particularly sensitive to excessive noise injection and slow convergence. Second, data collected by distributed IoT sensors are inherently non-IID due to diverse sensing conditions and deployment locations, leading to significant gradient diversity. More critically, IoT devices are widely exposed to physical compromise, malware infection, and misconfiguration, rendering Byzantine behaviors a realistic and persistent threat rather than an extreme assumption [8,9,10,11]. For example, in smart healthcare IoT systems, wearable devices continuously collect sensitive physiological signals, where both privacy leakage and compromised devices can severely impact system reliability.

To protect sensitive local data against inference attacks, differential privacy (DP) has been widely adopted as a practical privacy-preserving mechanism in FL [12,13,14,15,16]. By injecting noise into local updates or aggregated results, DP provides formal privacy guarantees. However, this protection introduces a fundamental trade-off between privacy and utility. Strong privacy requirements often necessitate large noise magnitudes, which increase gradient variance, slow convergence, and raise communication overhead by requiring more training rounds; these effects are particularly detrimental in resource-limited IoT environments.

The presence of Byzantine clients further exacerbates this challenge. Malicious or compromised IoT devices may upload arbitrarily corrupted updates to disrupt or manipulate the global model. Although numerous robust aggregation methods have been proposed to mitigate Byzantine attacks, most of them implicitly assume unbiased or light-tailed noise distributions [17,18,19,20]. In differentially private IoT FL systems, DP noise not only increases gradient variance but also distorts directional information, which is critical for robust aggregation. As a result, adversarial updates can be concealed within the noise distribution, making them statistically indistinguishable from benign updates. This effect fundamentally weakens the reliability of existing robust aggregation rules, leading both to the false suppression of benign clients and to failures to filter malicious updates.

Existing approaches typically treat privacy preservation and Byzantine robustness separately, without explicitly modeling their interaction. In particular, static or globally scheduled DP mechanisms fail to adapt to dynamic training states and heterogeneous client behaviors, while magnitude-based robust aggregation schemes often become unreliable when gradient directions are distorted by privacy noise and non-IID data [21,22,23]. This limitation significantly restricts the effectiveness of existing methods in realistic IoT environments.

In this work, we argue that effective federated learning in IoT systems requires a joint and adaptive treatment of privacy and robustness, explicitly accounting for their coupled effects on optimization dynamics. Motivated by this observation, we propose DARA (dual-factor adaptive and robust aggregation), a unified federated learning framework tailored for privacy-sensitive and adversarial IoT environments.

On the client side, DARA introduces a dual-factor adaptive differential privacy mechanism that dynamically adjusts the noise scale based on both training progress and local update magnitude, mitigating excessive noise in later training stages while preserving strong privacy protection when updates are most sensitive. On the server side, DARA employs a direction-aware robust aggregation strategy that leverages gradient directional consistency rather than raw magnitude, enabling effective suppression of Byzantine updates even in the presence of DP-induced randomness and non-IID heterogeneity. Through this coordinated design, DARA achieves stable convergence and strong robustness under realistic IoT threat models.

Extensive experiments on multiple benchmark datasets under differential privacy constraints and representative Byzantine attack scenarios demonstrate that DARA consistently outperforms state-of-the-art aggregation baselines in terms of robustness, convergence stability, and communication efficiency, while maintaining competitive model accuracy under equivalent privacy budgets. These results highlight the importance of jointly designing privacy and robustness mechanisms for practical federated learning in IoT systems.

The main contributions of this work are summarized as follows:

We propose DARA, a unified framework that jointly addresses privacy preservation and adversarial robustness under IoT-specific constraints.
We design a dual-factor adaptive differential privacy mechanism that dynamically balances privacy protection and convergence stability.
We develop a direction-aware robust aggregation strategy that effectively suppresses Byzantine updates while preserving benign non-IID variations.
Extensive experimental results demonstrate that DARA achieves superior robustness and stable convergence compared with representative baselines under realistic IoT attack and privacy settings.

2. Related Work

Existing research on federated learning (FL) security can be broadly categorized into two largely independent directions: robust aggregation against Byzantine attacks and privacy preservation via differential privacy (DP). While both lines of work are well-studied, their interaction remains insufficiently explored, particularly in IoT settings where data heterogeneity, resource constraints, and adversarial behaviors coexist.

To mitigate the influence of malicious clients, numerous robust aggregation methods have been proposed. Classical approaches such as Krum [9] select updates closest to the majority in Euclidean space, while coordinate-wise rules such as Median and Trimmed Mean [10] remove extreme values. These methods are effective under IID assumptions but often degrade in non-IID scenarios, where benign updates may naturally deviate due to data heterogeneity. More advanced techniques, including Bulyan [21] and FoolsGold [22], incorporate additional statistical or historical information to improve robustness. However, they still rely on the implicit assumption that benign updates form a compact cluster, which may not hold in highly heterogeneous IoT environments. When data distributions are highly diverse, distinguishing benign heterogeneity from adversarial manipulation becomes particularly challenging. Other approaches, such as FLTrust [23], introduce a trusted reference dataset to guide aggregation, while RSA [24] enforces robustness via regularization. These methods improve robustness to some extent but may suffer from distribution mismatch, increased computational overhead, or sensitivity to hyperparameters.

Differential privacy (DP) has become a standard tool for protecting sensitive local data in FL [12,13,14,15,25,26]. Methods such as DP-FedAvg [27] and FedDPSGD [28] inject Gaussian noise into local updates or aggregated results to achieve formal privacy guarantees. To alleviate the performance degradation caused by noise, adaptive mechanisms such as AdaClip [29] and Adaptive-DP-FL [30] dynamically adjust clipping bounds or noise scales based on training dynamics. Despite these improvements, most DP-based approaches assume honest participants and do not explicitly consider adversarial behaviors. In the presence of Byzantine clients, DP noise introduces additional uncertainty that can obscure malicious updates, making them difficult to distinguish from benign noisy updates. Consequently, the effectiveness of existing detection and filtering mechanisms may be significantly reduced.

Recent studies attempt to integrate DP and Byzantine robustness within a unified framework. For example, APRA-DP [18] combines adaptive noise tuning with robust aggregation in a two-stage manner. However, the interaction between noise adaptation and aggregation is not explicitly modeled, which may lead to suboptimal coordination. DPBFL [31] leverages Shuffle DP with sign-based aggregation to achieve end-to-end privacy and robustness guarantees. While theoretically appealing, its reliance on sign compression introduces bias and limits its applicability in non-convex optimization and heterogeneous settings. Similarly, Robust-DPFL [32] augments existing robust aggregators with fixed DP noise, but lacks adaptivity to training dynamics and attack intensity. In addition, several noise-aware or pre-training-based approaches, such as FLNAP [33] and MarkFL [34], aim to mitigate the impact of DP noise by improving initialization or training dynamics. However, these methods primarily focus on noise adaptation and do not explicitly address Byzantine robustness.

In contrast to existing approaches, DARA explicitly models the interaction between adaptive differential privacy noise and Byzantine perturbations. By jointly designing a dual-factor adaptive noise mechanism and a direction-aware aggregation strategy, DARA leverages gradient directional consistency to achieve robust and stable optimization under heterogeneous and adversarial IoT environments.

3. Background and Challenges

We consider a federated learning (FL) system deployed in an open Internet of Things (IoT) environment, where a central server coordinates $N$ heterogeneous edge devices to collaboratively minimize a global objective without accessing raw local data, as illustrated in Figure 1. This architecture serves as the foundation for the system, perturbation, and threat models introduced below. The global optimization objective is formulated as

$$\min_{w \in \mathbb{R}^{d}} F(w) = \sum_{i=1}^{N} p_{i} F_{i}(w) \qquad (1)$$

where $F_{i}(w)$ denotes the local objective function of client $i$, and $p_{i}$ is the weight associated with client $i$, typically proportional to the size of its local dataset.

In practical IoT-enabled FL systems, the training process is jointly affected by multiple sources of perturbation that significantly impact both optimization stability and system security. These perturbations primarily arise from data heterogeneity, privacy-preserving noise injection, and adversarial client behaviors, which coexist and interact in open IoT environments.

Data heterogeneity (Non-IID)

Due to diverse sensing modalities, deployment locations, and user behaviors, local data distributions across IoT devices are inherently non-identically distributed. As a result, client gradients may deviate from the true global optimization direction, leading to slower convergence and unstable training dynamics.

Differential privacy noise

To prevent information leakage from model updates, differential privacy (DP) mechanisms inject random noise during local training or before aggregation. While providing formal privacy guarantees, DP noise inevitably increases gradient variance and aggregation uncertainty, which can significantly degrade optimization efficiency in resource-constrained IoT devices.

Byzantine attacks

In untrusted IoT environments, a subset of devices may be compromised through physical capture, malware infection, or firmware misconfiguration. Such Byzantine clients can upload arbitrarily manipulated or targeted updates, thereby biasing or disrupting the global model training process.

To formally characterize the joint effect of these factors, we introduce a unified system and perturbation model in the following.

3.1. System Model

We consider an FL system consisting of $N$ IoT clients $\{C_{1}, C_{2}, \dots, C_{N}\}$ and a central server. Each client $C_{i}$ holds a private dataset $D_{i} = \{(x_{i,j}, y_{i,j})\}_{j=1}^{n_{i}}$ sampled from an underlying distribution $P_{i}$. Owing to heterogeneous sensing environments and deployment conditions, the local data distributions are generally non-identically distributed, i.e., $P_{i} \neq P_{k}$ for $i \neq k$.

At communication round $t$, due to intermittent connectivity, energy constraints, and device heterogeneity, only a subset of clients $S_{t} \subseteq \{1, \dots, N\}$ is available to participate in training. Each participating client $C_{i}$, $i \in S_{t}$, computes its local gradient as

$$g_{i,t} = \nabla F_{i}(w_{t}) = \nabla F(w_{t}) + \xi_{i,t} \qquad (2)$$

where $\nabla F(w_{t})$ denotes the true global gradient and $\xi_{i,t}$ represents the statistical bias induced by non-IID data distributions across heterogeneous IoT devices.

3.2. Perturbation Model

The server does not directly observe the true local gradients $g_{i,t}$, but instead receives perturbed updates uploaded by the clients. Specifically, the update received from client $i$ at round $t$ can be expressed as

$$\tilde{g}_{i,t} = \nabla F(w_{t}) + \xi_{i,t} + z_{i,t} + b_{i,t} \qquad (3)$$

where:

$\xi_{i,t}$ denotes the statistical offset caused by non-IID data heterogeneity;
$z_{i,t} \sim \mathcal{N}(0, \sigma_{i,t}^{2} I)$ represents the Gaussian noise injected by client $i$ for differential privacy;
$b_{i,t}$ denotes an arbitrary perturbation introduced by a Byzantine client ($b_{i,t} = 0$ for benign clients).

The noise scale $\sigma_{i,t}$ is allowed to vary across clients and communication rounds to accommodate heterogeneous device behaviors and dynamic training states in IoT environments. Equation (3) highlights the superposition of benign statistical noise and adversarial perturbations at the server side, which fundamentally complicates robust aggregation under differential privacy constraints and motivates the need for joint robustness–privacy modeling.
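To illustrate how Eq. (3) blurs directional information, the following NumPy sketch simulates a single round under hypothetical settings: the model dimension, the client counts, a Gaussian stand-in for the non-IID offset $\xi_{i,t}$, and a sign-flip attack that reverses the honest update are all illustrative assumptions, not the paper's experimental configuration. It reports the mean cosine similarity of benign and Byzantine updates to the true gradient $\nabla F(w_t)$ for several noise scales $\sigma$.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def simulate_eq3(sigma, d=1000, n_benign=8, n_byz=2, hetero_std=0.5, seed=0):
    """Mean cosine to the true gradient for benign and Byzantine updates under Eq. (3).

    All parameters are hypothetical illustration values.
    """
    rng = np.random.default_rng(seed)
    grad = rng.normal(size=d)
    grad /= np.linalg.norm(grad)                                # unit-norm stand-in for grad F(w_t)
    benign_cos, byz_cos = [], []
    for i in range(n_benign + n_byz):
        xi = rng.normal(scale=hetero_std / np.sqrt(d), size=d)  # non-IID offset xi_{i,t}
        z = rng.normal(scale=sigma, size=d)                     # DP noise z_{i,t} ~ N(0, sigma^2 I)
        honest = grad + xi
        is_byz = i >= n_benign
        b = -2.0 * honest if is_byz else np.zeros(d)            # sign-flip: honest + b = -honest
        received = honest + z + b                               # Eq. (3)
        (byz_cos if is_byz else benign_cos).append(cosine(received, grad))
    return float(np.mean(benign_cos)), float(np.mean(byz_cos))

for sigma in (0.0, 0.1, 1.0):
    b, m = simulate_eq3(sigma)
    print(f"sigma={sigma:<4} benign={b:+.3f} byzantine={m:+.3f} gap={b - m:.3f}")
```

Without noise, the two groups have clearly opposite cosine scores. As $\sigma$ grows, both scores shrink toward zero because $\|z_{i,t}\|$ dominates the update norm, so the gap that a direction-based rule relies on narrows.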

3.3. Threat Model

We adopt a partial adversarial setting, where a subset of clients $A \subseteq \{1, \dots, N\}$ is controlled by an attacker, with ratio $\rho = |A|/N$. Such adversarial behaviors naturally arise in IoT systems due to device capture, malware infection, or firmware compromise.

The attacker is assumed to have the following capabilities:

The ability to upload arbitrary perturbations $b_{i, t}$ in any training round;
Access to the current global model $w_{t}$, enabling the construction of directional or targeted updates;
The capability to perform common Byzantine attacks, such as sign-flipping and random-noise injection.

The attack objective is to maximize the degradation of the global model performance:

$$\max_{\{b_{i,t}\}_{i \in A}} \mathbb{E}\left[L(w_{T}; D_{test})\right] \qquad (4)$$

Meanwhile, attackers are subject to realistic constraints: they cannot access other clients’ private data, cannot tamper with the communication protocol, and their identities remain unknown to the server.

3.4. Research Objectives

Driven by the practical constraints of IoT systems, including limited device resources, heterogeneous data distributions, stringent privacy requirements, and persistent security threats, the server aims to design an aggregation function $\mathcal{A}(\cdot)$ such that the global update

$$w_{t+1} = w_{t} - \eta_{t} \mathcal{A}\left(\{\tilde{g}_{i,t}\}_{i \in S_{t}}\right) \qquad (5)$$

simultaneously satisfies the following objectives:

Robustness against Byzantine attacks;
Formal $(\varepsilon, \delta)$-differential privacy guarantees;
Aggregation directions consistent with the true global gradient;
Adaptivity to training dynamics, privacy noise intensity, and attack strength.

Based on the above formulation, we propose DARA, a dual-factor adaptive robust aggregation framework that jointly addresses privacy preservation and Byzantine robustness at both the client and server levels in IoT-enabled federated learning systems.

4. The DARA Framework

In this section, we present the proposed DARA framework in detail (the architecture is shown in Figure 2). DARA aims to balance privacy protection, robustness, and convergence in federated learning through two mechanisms: “dual-factor adaptive noise scheduling” and “directional robust aggregation”.

4.1. Framework Overview

DARA performs the following three stages in each round of federated training:

1. Local Update with Adaptive Differential Privacy: Each client computes the gradient on its local data and adds Gaussian noise with the locally adaptive standard deviation $\sigma_{i,t}$.
2. Directional Robust Aggregation: The server assigns confidence weights to client updates according to the consistency of their gradient directions, down-weighting abnormal updates to achieve robust aggregation.
3. Global Model Synchronization: The aggregated global model parameters are broadcast to each client, and training proceeds to the next round.

These three stages are designed to jointly address the privacy–robustness–convergence tension in IoT federated learning, where adaptive noise protects sensitive local updates, while direction-aware aggregation suppresses Byzantine perturbations amplified by DP noise.

4.2. Client-Side Update with Dual-Factor Adaptive Differential Privacy

Traditional differentially private federated learning methods typically employ a fixed noise intensity or simple decay strategies that depend solely on the training round, making it difficult to balance privacy protection and convergence performance. Fixed noise may be insufficient to protect highly sensitive gradients in the early stages of training, whereas in later stages it may significantly hinder the model’s approach to the optimal solution. To alleviate this tension, we propose a dual-factor adaptive differential privacy noise scheduling mechanism on the client side, based on both the training progress and the current local update magnitude.

In training round $t$, client $i$ computes the model update on its local dataset, $g_{i,t} = \nabla L_{i}(w_{t-1})$. The corresponding differential privacy noise scale is defined as

$$\sigma_{i,t} = \sigma_{0} \exp(-kt)\left(1 + \alpha \|g_{i,t}\|^{\beta}\right) \qquad (6)$$

where $\sigma_{0}$ represents the initial noise amplitude, $k$ controls the exponential decay rate of the noise across training rounds, and $\alpha$ and $\beta$ are hyperparameters that adjust the effect of the current local update magnitude on the noise intensity.

Subsequently, the client clips the model update locally and injects Gaussian noise, so that the perturbed update uploaded to the server is

$$\tilde{g}_{i,t} = \mathrm{clip}(g_{i,t}) + \mathcal{N}(0, \sigma_{i,t}^{2} I) \qquad (7)$$

The noise scheduling mechanism above combines two factors:

Training progress factor $\exp(-kt)$: To enhance privacy, a higher noise level is maintained in the early stages of training. As training progresses, the noise level is gradually reduced to mitigate interference with model convergence.
Update amplitude adjustment factor $(1 + \alpha \|g_{i,t}\|^{\beta})$: When a client’s update is large, stronger noise is applied to reduce the risk of privacy leakage; when the update is small, the noise intensity is automatically reduced to retain more of the useful learning signal.

The randomization process described above is entirely performed locally on the client, constituting client-level differential privacy (DP). Based on the post-processing invariance of differential privacy, subsequent aggregation and weighting operations on the server side will not weaken this privacy guarantee. The algorithm flow is shown in Algorithm 1.

Algorithm 1: Client-side update with dual-factor adaptive differential privacy.

Input: Global model $w_{t-1}$, learning rate $\eta$, base noise scale $\sigma_{0}$, round index $t$, adaptive parameters $(k, \alpha, \beta)$

Output: Noisy local update $\tilde{g}_{i,t}$

Client $i$ executes the following steps:

1. Local gradient computation: Compute the gradient of the local loss function: $g_{i,t} = \nabla L_{i}(w_{t-1})$.
2. Adaptive noise scaling: Adjust the noise scale according to the training round and gradient magnitude: $\sigma_{i,t} = \sigma_{0} \exp(-kt)(1 + \alpha \|g_{i,t}\|^{\beta})$.
3. Gradient clipping and perturbation: $\bar{g}_{i,t} = \mathrm{clip}(g_{i,t})$, $z_{i,t} \sim \mathcal{N}(0, \sigma_{i,t}^{2} I)$, $\tilde{g}_{i,t} = \bar{g}_{i,t} + z_{i,t}$.
4. Update: Send the privatized update $\tilde{g}_{i,t}$ to the server.
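Steps 2 to 4 of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: clip(·) is taken to be L2-norm clipping with a threshold `clip_norm`, which the text does not specify; the local gradient from step 1 is passed in rather than computed; and the function names and parameter values in the usage lines are hypothetical. Following Eq. (6) as written, the noise scale uses the norm of the unclipped gradient.

```python
import numpy as np

def dual_factor_sigma(t, grad, sigma0, k, alpha, beta):
    """Eq. (6): sigma0 * exp(-k t) * (1 + alpha * ||g||^beta)."""
    progress = np.exp(-k * t)                               # training-progress factor
    magnitude = 1.0 + alpha * np.linalg.norm(grad) ** beta  # update-amplitude factor (unclipped g, as in Eq. (6))
    return float(sigma0 * progress * magnitude)

def clip_l2(grad, clip_norm):
    """Assumed form of clip(): rescale g so that ||g||_2 <= clip_norm."""
    norm = np.linalg.norm(grad)
    return grad * min(1.0, clip_norm / max(norm, 1e-12))

def client_update(grad, t, sigma0, k, alpha, beta, clip_norm, rng):
    """Steps 2-4 of Algorithm 1; returns the privatized update and the noise scale used."""
    sigma = dual_factor_sigma(t, grad, sigma0, k, alpha, beta)  # step 2: adaptive noise scale
    g_bar = clip_l2(grad, clip_norm)                             # step 3: clipping
    z = rng.normal(0.0, sigma, size=grad.shape)                  # step 3: z ~ N(0, sigma^2 I)
    return g_bar + z, sigma                                      # step 4: update sent to the server

# Hypothetical usage with a random stand-in gradient
rng = np.random.default_rng(0)
g = rng.normal(size=100)
g_tilde, sigma = client_update(g, t=10, sigma0=1.0, k=0.01, alpha=0.5, beta=1.0,
                               clip_norm=1.0, rng=rng)
```

Returning `sigma` alongside the update makes it easy to log the per-round noise scales, which a privacy accountant needs.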

4.3. Server-Side Direction-Aware Robust Aggregation

After receiving the privatized updates uploaded by the clients, the server applies a robust aggregation strategy based on directional consistency. This method rests on the observation that in non-independent and identically distributed (non-IID) scenarios, the update magnitudes of different clients may differ significantly, but their optimization directions usually remain broadly consistent; Byzantine or anomalous clients, on the other hand, often exhibit significant deviations in their update directions.

Let $\tilde{g}_{i,t}$ represent the privatized update uploaded by client $i$ in round $t$. The server first calculates the global average update direction for the current round:

$$\bar{g}_{t} = \frac{1}{|S_{t}|} \sum_{i \in S_{t}} \tilde{g}_{i,t} \qquad (8)$$

Subsequently, cosine similarity is used to characterize the consistency between each client update and the reference direction:

$$s_{i,t} = \frac{\tilde{g}_{i,t}^{\top} \bar{g}_{t}}{\|\tilde{g}_{i,t}\| \, \|\bar{g}_{t}\|} \qquad (9)$$

Based on directional consistency, the server assigns continuous soft aggregation weights to the clients:

$$\gamma_{i,t} = \frac{\exp(\lambda s_{i,t})}{\sum_{j \in S_{t}} \exp(\lambda s_{j,t})} \qquad (10)$$

The parameter $\lambda$ controls the sensitivity of the aggregation weights to directional deviations. This softmax structure avoids hard-threshold elimination, achieving a good balance between robustness and optimization stability.

In addition, the server can consider the deviation of each client update from the group mean:

$$r_{i,t} = \|\tilde{g}_{i,t} - \bar{g}_{t}\| \qquad (11)$$

This quantity captures updates that deviate significantly in magnitude from the normal client distribution. The magnitude deviation $r_{i,t}$ can optionally be incorporated to further regularize the aggregation weights and bound the energy of adversarial updates, thereby strengthening the defense against Byzantine attacks. In this work, we deliberately emphasize directional consistency as the primary robustness signal, which aligns with our theoretical analysis, where Byzantine residuals are progressively attenuated through adaptive weighting.

Finally, the global model parameters are updated according to the weighted rule

$$w_{t+1} = w_{t} - \eta_{t} \sum_{i \in S_{t}} \gamma_{i,t} \tilde{g}_{i,t} \qquad (12)$$

where $\eta_{t}$ represents the learning rate. The complete server-side aggregation process is shown in Algorithm 2.
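Eqs. (8) to (12) can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' Algorithm 2; the function names, the default $\lambda$, and the synthetic scenario in the usage lines (eight noisy benign updates and two amplified sign-flip updates) are hypothetical. The magnitude deviation of Eq. (11) is returned only as a diagnostic, matching its optional role in the text.

```python
import numpy as np

def dara_aggregate(updates, lam=10.0, eps=1e-12):
    """Direction-aware aggregation of privatized updates (rows of `updates`, shape (n, d))."""
    g_bar = updates.mean(axis=0)                              # Eq. (8): reference direction
    denom = np.linalg.norm(updates, axis=1) * np.linalg.norm(g_bar)
    s = (updates @ g_bar) / np.maximum(denom, eps)            # Eq. (9): cosine similarity
    logits = lam * s
    logits -= logits.max()                                    # softmax is shift-invariant; avoids overflow
    gamma = np.exp(logits) / np.exp(logits).sum()             # Eq. (10): softmax weights
    r = np.linalg.norm(updates - g_bar, axis=1)               # Eq. (11): diagnostic only
    return gamma @ updates, gamma, s, r

def server_step(w, updates, lr, lam=10.0):
    """Eq. (12): w_{t+1} = w_t - eta_t * sum_i gamma_i * g_tilde_i."""
    aggregated, _, _, _ = dara_aggregate(updates, lam)
    return w - lr * aggregated

# Hypothetical scenario: 8 noisy benign updates and 2 amplified sign-flip updates
rng = np.random.default_rng(0)
true_dir = np.ones(50)
benign = true_dir + rng.normal(scale=0.5, size=(8, 50))
byzantine = np.tile(-3.0 * true_dir, (2, 1))
updates = np.vstack([benign, byzantine])
aggregated, gamma, s, r = dara_aggregate(updates)
w_next = server_step(np.zeros(50), updates, lr=0.01)
```

Each step is a matrix-vector product or a row-wise norm over the $n \times d$ update matrix, which is consistent with the $O(nd)$ server cost stated in the text.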

The proposed DARA framework introduces no additional communication overhead compared with standard FedAvg, since each client uploads a single $d$-dimensional model update per round. The client-side adaptive noise scheduling incurs $O(d)$ computational complexity, while the server-side direction-aware aggregation requires $O(nd)$ operations for $n$ participating clients with model dimension $d$. Therefore, the overall computational complexity remains linear in both the number of clients and the number of model parameters, making DARA scalable to large-scale IoT federated settings.

Algorithm 2: Server-side direction-aware robust aggregation (DARA).

5. Security and Privacy Analysis

This section provides an intuitive analysis of the robustness and stability of DARA when differential privacy noise, data heterogeneity, and Byzantine attacks coexist in IoT-enabled federated learning systems.

5.1. Robustness Against Byzantine Attacks

Under the unified perturbation model introduced in Section 3, the updates received by the server contain benign statistical deviations, differential privacy noise, and potential Byzantine perturbations. In the absence of attacks, benign client updates remain aligned in expectation with the true gradient $\nabla F(w_{t})$.

Byzantine clients, however, may upload arbitrary updates whose directions typically deviate from, or even oppose, the global gradient. DARA mitigates such attacks through a direction-aware weighting mechanism based on cosine similarity. Updates that are consistent with the majority direction receive larger aggregation weights, while abnormal updates are naturally suppressed.

Consequently, the expected aggregation weights satisfy

$$\mathbb{E}[\gamma_{i,t} \mid i \in H] \gg \mathbb{E}[\gamma_{j,t} \mid j \in A],$$

where $H$ denotes the set of benign (non-Byzantine) clients, indicating that benign clients dominate the aggregation process and that the influence of adversarial perturbations is significantly reduced.

5.2. Upper Bound on Byzantine Aggregation Weight

We further derive an upper bound on the aggregation weight assigned to Byzantine clients under the proposed softmax weighting mechanism.

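Let $S_{t}$ denote the set of participating clients at round $t$, where $|S_{t}| = n$ and $f$ of these clients are Byzantine. The aggregation weight is defined as

$$\gamma_{i,t} = \frac{\exp(\lambda s_{i,t})}{\sum_{j \in S_{t}} \exp(\lambda s_{j,t})},$$

where $s_{i,t}$ denotes the cosine similarity between the update of client $i$ and the reference direction.

Assume that every benign client satisfies $s_{i,t} \geq s_{b}$ and every Byzantine client satisfies $s_{i,t} \leq s_{m}$, where $s_{b} > s_{m}$. Each of the $n - f$ benign terms in the denominator is at least $\exp(\lambda s_{b})$, all terms are positive, and $x/(B + x)$ is increasing in $x$ for $B > 0$. Hence, the aggregation weight of any Byzantine client $i$ is bounded by

$$\gamma_{i,t} \leq \frac{\exp(\lambda s_{m})}{(n - f)\exp(\lambda s_{b}) + \exp(\lambda s_{m})} \leq \frac{\exp(-\lambda (s_{b} - s_{m}))}{n - f},$$

and, by the same argument, the total weight of the $f$ Byzantine clients satisfies

$$\sum_{i \in S_{t} \cap A} \gamma_{i,t} \leq \frac{f \exp(\lambda s_{m})}{(n - f)\exp(\lambda s_{b}) + f \exp(\lambda s_{m})}.$$

These bounds show that the influence of Byzantine clients decreases exponentially as $\lambda (s_{b} - s_{m})$ increases, ensuring that benign updates dominate the aggregation process.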
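The following pure-Python sketch checks two weight bounds numerically for hypothetical values of $n$, $f$, $\lambda$, $s_b$, and $s_m$: the per-client bound $\exp(\lambda s_m)/((n-f)\exp(\lambda s_b) + \exp(\lambda s_m))$ and the total Byzantine weight bound $f\exp(\lambda s_m)/((n-f)\exp(\lambda s_b) + f\exp(\lambda s_m))$. The sampling scheme (benign similarities drawn uniformly from $[s_b, 1]$, Byzantine similarities from $[-1, s_m]$) is an illustrative assumption.

```python
import math
import random

def softmax_weights(scores, lam):
    """Eq. (10); subtracting the max score leaves the weights unchanged."""
    top = max(scores)
    exps = [math.exp(lam * (s - top)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def per_client_bound(n, f, lam, s_b, s_m):
    return math.exp(lam * s_m) / ((n - f) * math.exp(lam * s_b) + math.exp(lam * s_m))

def total_bound(n, f, lam, s_b, s_m):
    return f * math.exp(lam * s_m) / ((n - f) * math.exp(lam * s_b) + f * math.exp(lam * s_m))

def worst_byzantine_weights(n, f, lam, s_b, s_m, trials=2000, seed=0):
    """Largest single and total Byzantine weight over random similarity draws."""
    rng = random.Random(seed)
    worst_single = worst_total = 0.0
    for _ in range(trials):
        benign = [rng.uniform(s_b, 1.0) for _ in range(n - f)]  # s_{i,t} >= s_b
        byz = [rng.uniform(-1.0, s_m) for _ in range(f)]        # s_{i,t} <= s_m
        weights = softmax_weights(benign + byz, lam)[n - f:]    # Byzantine entries are last
        worst_single = max(worst_single, max(weights))
        worst_total = max(worst_total, sum(weights))
    return worst_single, worst_total

for lam in (1.0, 5.0, 10.0):
    single, total = worst_byzantine_weights(n=40, f=8, lam=lam, s_b=0.3, s_m=0.0)
    print(f"lambda={lam:>4}: max single {single:.2e} <= {per_client_bound(40, 8, lam, 0.3, 0.0):.2e}, "
          f"max total {total:.2e} <= {total_bound(40, 8, lam, 0.3, 0.0):.2e}")
```

Increasing $\lambda$ with a fixed similarity gap $s_b - s_m$ shrinks both bounds exponentially, which is the behavior the analysis describes.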

5.3. Impact of Adaptive Differential Privacy Noise

On the client side, DARA adopts a dual-factor adaptive noise scheduling mechanism, where the noise scale varies according to the training stage and the magnitude of local gradients. This design enables a dynamic privacy–utility trade-off.

In early training stages, relatively larger noise levels provide stronger privacy protection. As training progresses and gradients stabilize, the noise magnitude decreases in expectation, preventing excessive randomness from dominating the optimization process. After aggregation, the influence of privacy noise on the global update is bounded by the weighted average noise energy across clients: since the weights $\gamma_{i,t}$ are nonnegative and sum to one, convexity gives $\|\sum_{i} \gamma_{i,t} z_{i,t}\|^{2} \leq \sum_{i} \gamma_{i,t} \|z_{i,t}\|^{2}$, and the same inequality holds in expectation.

5.4. Stability Under Mixed Perturbations

Combining the bounded influence of Byzantine updates with the controlled energy of adaptive privacy noise, the aggregated update direction remains aligned with the true gradient in expectation under standard smoothness assumptions.

Therefore, as long as the malicious client ratio and privacy noise scale remain within reasonable ranges, DARA preserves a descent direction during training. This property enables stable convergence toward a stationary point even when data heterogeneity, differential privacy noise, and Byzantine attacks coexist.

5.5. Privacy Analysis

We analyze the privacy guarantees provided by DARA under the proposed adaptive differential privacy mechanism.

5.5.1. Differential Privacy Mechanism

At communication round $t$, each participating client uploads a privatized update

$$\tilde{g}_{i,t} = \bar{g}_{i,t} + \mathcal{N}(0, \sigma_{i,t}^{2} I),$$

where $\bar{g}_{i,t} = \mathrm{clip}(g_{i,t})$ denotes the clipped local update and $\sigma_{i,t}$ is the noise scale determined by the proposed dual-factor adaptive scheduling strategy.

For any fixed $\sigma_{i,t}$, this mechanism corresponds to the standard Gaussian mechanism, with the L2 sensitivity of the update bounded by clipping, and satisfies $(\varepsilon_{i,t}, \delta)$-differential privacy with respect to the local dataset of client $i$.

5.5.2. Privacy Composition

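Although the noise scale $\sigma_{i,t}$ varies across clients and communication rounds, the overall privacy guarantee remains valid because the adaptive rule depends only on local training statistics (e.g., gradient norms and the training round).

By sequential composition, the total privacy loss of client $i$ after $T$ rounds is bounded by

$$\varepsilon_{i} \leq \sum_{t=1}^{T} \varepsilon_{i,t},$$

with a total failure probability of at most $T\delta$, where $\varepsilon_{i,t}$ is determined by the corresponding noise scale $\sigma_{i,t}$. Rényi differential privacy (RDP) accounting typically yields a tighter bound: the per-round RDP guarantees add up at each order and are then converted into an $(\varepsilon_{i}, \delta)$ guarantee for a given $\delta$.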
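The RDP accounting mentioned in this subsection can be sketched as follows, under explicit assumptions: each round's release is a Gaussian mechanism whose L2 sensitivity equals a clipping norm (set to a hypothetical value of 1 below), no subsampling amplification is claimed, and the sequence of noise scales is treated as given. The sketch uses the standard RDP bound of the Gaussian mechanism, $\alpha\Delta^2/(2\sigma^2)$ at order $\alpha$ (unrelated to the $\alpha$ of Eq. (6)), adds these bounds across rounds, and converts to $(\varepsilon, \delta)$-DP via $\varepsilon = \varepsilon_{\mathrm{RDP}}(\alpha) + \log(1/\delta)/(\alpha - 1)$, minimized over a grid of orders. The schedule in the usage lines is hypothetical.

```python
import math

DEFAULT_ORDERS = (1.25, 1.5, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 32, 64, 128, 256)

def gaussian_rdp(order, sigma, sensitivity):
    """RDP of the Gaussian mechanism at order alpha: alpha * Delta^2 / (2 sigma^2)."""
    return order * sensitivity ** 2 / (2.0 * sigma ** 2)

def composed_epsilon(sigmas, sensitivity, delta, orders=DEFAULT_ORDERS):
    """(epsilon, delta)-DP after one Gaussian mechanism per round with the given sigmas."""
    best = math.inf
    for order in orders:
        rdp = sum(gaussian_rdp(order, s, sensitivity) for s in sigmas)  # RDP adds up across rounds
        eps = rdp + math.log(1.0 / delta) / (order - 1.0)              # standard RDP-to-DP conversion
        best = min(best, eps)
    return best

# Hypothetical schedule: sigma_t = sigma0 * exp(-k t) * (1 + alpha * norm^beta) with a fixed norm of 1
sigmas = [5.0 * math.exp(-0.005 * t) * (1 + 0.5 * 1.0) for t in range(1, 101)]
print(composed_epsilon(sigmas, sensitivity=1.0, delta=1e-5))
```

Because the noise decays over rounds, later rounds contribute more privacy loss per round than earlier ones; the accountant makes this cost visible when tuning $k$.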

5.5.3. Privacy–Utility Trade-Off

The adaptive noise scheduling in DARA provides a flexible privacy–utility trade-off. Larger noise scales can offer stronger privacy protection during early training stages, while smaller noise levels in later stages help reduce optimization instability caused by excessive noise.

6. Experiments

This section presents a comprehensive experimental evaluation of the proposed DARA framework. The experiments examine the empirical robustness and stability of DARA under Byzantine client behaviors and differential privacy noise by addressing three questions: (i) Can DARA effectively defend against Byzantine attacks? (ii) How does DARA behave in terms of convergence stability under strong data heterogeneity and DP noise? (iii) How does DARA balance robustness, convergence speed, and privacy? All experiments were conducted using Python 3.10 and PyTorch 2.1 on a server equipped with an Intel i7-14700K CPU and an NVIDIA RTX 4090 GPU.

6.1. Experimental Setup

6.1.1. Datasets and Data Heterogeneity

We evaluate all methods on the MNIST, CIFAR-10, and CIFAR-100 datasets using the official training and test splits. To simulate different degrees of data heterogeneity, the training data are partitioned among clients using a Dirichlet distribution with concentration parameter $α \in \{0.5, 1, 5, \infty\}$, where a smaller $α$ indicates stronger non-IID behavior and $α = \infty$ corresponds to an IID partition (this Dirichlet $α$ is distinct from the noise hyperparameters $α$ and $β$ discussed in Section 6.1.4). Unless otherwise specified, $α = 0.5$ is used. We consider a network of $N = 50$ clients, of which 80% are randomly sampled to participate in each communication round; the total number of communication rounds is $T = 100/200/500$.
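The partitioning and client sampling can be sketched as follows. This assumes the common per-class Dirichlet label-skew scheme (for each class, a proportion vector drawn from $\mathrm{Dir}(α)$ decides how that class is spread over clients) and uniform sampling without replacement each round; the paper does not spell out its exact procedure, and `dirichlet_partition` and `sample_round` are hypothetical helpers. The case $α = \infty$ is handled as an equal split.

```python
import numpy as np


def dirichlet_partition(labels: np.ndarray, num_clients: int, alpha: float,
                        rng: np.random.Generator) -> list:
    """Split sample indices across clients with per-class Dirichlet proportions.

    For each class c, p_c ~ Dir(alpha, ..., alpha) sets the share of class c given
    to each client; smaller alpha gives more skewed (non-IID) splits, and
    alpha = inf is treated as an equal (IID-like) split.
    """
    shards = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        if np.isinf(alpha):
            p = np.full(num_clients, 1.0 / num_clients)
        else:
            p = rng.dirichlet(np.full(num_clients, alpha))
        # Cumulative proportions become cut points into this class's index list.
        cuts = (np.cumsum(p)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            shards[k].extend(part.tolist())
    return [np.array(s, dtype=int) for s in shards]


def sample_round(num_clients: int, fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Sample round(fraction * num_clients) distinct clients uniformly for one round."""
    m = max(1, int(round(fraction * num_clients)))
    return rng.choice(num_clients, size=m, replace=False)


# Usage with synthetic 10-class labels standing in for MNIST/CIFAR-10.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=5000)
parts = dirichlet_partition(labels, num_clients=50, alpha=0.5, rng=rng)
participants = sample_round(50, 0.8, rng)  # 40 of the 50 clients
sizes = np.array([len(p) for p in parts])
print("samples per client: min", sizes.min(), "max", sizes.max())
```

With $α = 0.5$ the client dataset sizes and label mixes vary widely, which is the heterogeneity the direction-aware aggregator must tolerate.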

6.1.2. Models and Training Protocol

For MNIST, a two-layer convolutional neural network is adopted, while a ResNet-18 model is used for CIFAR-10 and CIFAR-100. All methods employ SGD with momentum 0.9. The local batch size is set to 64, and each client performs two local epochs per round. The initial learning rate is set to 0.01 and follows a decaying schedule shared by all methods.

We compare DARA with the following representative aggregation methods: FedAvg, Multi-Krum, Median, Bulyan, and Robust-DPFL. All experiments are repeated three times with different random seeds, and we report the mean and standard deviation.

6.1.3. Attack Models

To evaluate the behavior of different aggregation methods under Byzantine client behaviors, we consider the following attack scenarios:

Sign-flip attack: Malicious clients upload updates with reversed gradient directions, deliberately driving the global model toward increasing loss.
Random attack: Malicious clients submit randomly generated updates, injecting strong noise into the aggregation process.
No attack: All clients behave benignly, serving as a control setting.

Unless otherwise stated, 20% of clients are Byzantine, a proportion commonly adopted in prior federated learning robustness studies.
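The attack scenarios above can be simulated on stacked client updates. This is a minimal sketch assuming each client's update is a flattened vector (one row per client), that Byzantine clients are drawn uniformly at random on each call (the paper may instead fix them for the whole run), and that the random attack draws i.i.d. Gaussian values with an assumed standard deviation `std`. All names are hypothetical.

```python
import numpy as np

ATTACKS = ("none", "sign_flip", "random")


def sign_flip(update: np.ndarray) -> np.ndarray:
    """Sign-flip attack: upload the negated benign update."""
    return -update


def random_update(update: np.ndarray, std: float, rng: np.random.Generator) -> np.ndarray:
    """Random attack: replace the update with i.i.d. Gaussian noise of the same shape."""
    return rng.normal(0.0, std, size=update.shape)


def apply_attack(updates: np.ndarray, byz_fraction: float, attack: str,
                 rng: np.random.Generator, std: float = 1.0):
    """Corrupt a random subset of rows (one row per client) according to `attack`.

    Returns the modified updates and a boolean mask marking Byzantine clients.
    """
    if attack not in ATTACKS:
        raise ValueError(f"unknown attack: {attack}")
    n = updates.shape[0]
    num_byz = 0 if attack == "none" else int(round(byz_fraction * n))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=num_byz, replace=False)] = True
    out = updates.copy()
    for i in np.flatnonzero(mask):
        if attack == "sign_flip":
            out[i] = sign_flip(updates[i])
        else:
            out[i] = random_update(updates[i], std, rng)
    return out, mask


# Usage: 40 participating clients with 8-dimensional toy updates, 20% Byzantine.
rng = np.random.default_rng(1)
benign = rng.normal(size=(40, 8))
attacked, is_byz = apply_attack(benign, 0.2, "sign_flip", rng)
print("Byzantine clients:", np.flatnonzero(is_byz))
```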

6.1.4. Discussion on Hyperparameters

The performance of DARA is influenced by several key hyperparameters, including $λ$ in the server-side aggregation and $α, β$ in the adaptive noise mechanism. The parameter $λ$ controls the sensitivity of the softmax weighting to directional consistency. A larger $λ$ increases the discrimination between aligned and misaligned updates, thereby improving robustness against Byzantine attacks; however, it may also suppress benign updates under highly non-IID settings. The parameters $α$ and $β$ jointly regulate the dependence of the noise scale on the local gradient magnitude. Larger values result in stronger noise injection for large updates, enhancing privacy protection but potentially slowing down convergence.
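The role of $λ$ can be illustrated with a softmax over cosine similarities. This is a sketch, not DARA's exact rule. It assumes trust weights of the form $w_i \propto \exp(λ \cos(u_i, r))$, consistent with the "softmax weighting" and "continuous trust weights based on cosine similarity" described in the text, and it takes the reference direction $r$ to be the coordinate-wise median of the updates, which is a hypothetical choice; DARA's own reference direction is defined in Section 4.3. Function names are hypothetical.

```python
import numpy as np


def direction_aware_weights(updates: np.ndarray, lam: float) -> np.ndarray:
    """Softmax trust weights w_i proportional to exp(lam * cos(u_i, ref)).

    ASSUMPTION: ref is the coordinate-wise median of the updates; DARA's actual
    reference direction may differ.
    """
    ref = np.median(updates, axis=0)
    denom = np.linalg.norm(updates, axis=1) * np.linalg.norm(ref) + 1e-12
    cos = (updates @ ref) / denom          # cosine similarity in [-1, 1]
    logits = lam * cos
    w = np.exp(logits - logits.max())      # subtract max for numerical stability
    return w / w.sum()


def aggregate(updates: np.ndarray, lam: float) -> np.ndarray:
    """Trust-weighted average of the client updates (soft, no hard removal)."""
    return direction_aware_weights(updates, lam) @ updates


# Toy round: 16 noisy-but-aligned benign updates plus 4 sign-flipped ones.
rng = np.random.default_rng(0)
true_dir = rng.normal(size=32)
benign = true_dir + 0.5 * rng.normal(size=(16, 32))
updates = np.vstack([benign, -benign[:4]])
for lam in (1.0, 5.0, 20.0):
    w = direction_aware_weights(updates, lam)
    print(f"lambda={lam:4.1f}  weight on Byzantine clients = {w[16:].sum():.4f}")
```

With $λ = 0$ the weights reduce to a uniform average (FedAvg-style); increasing $λ$ shifts weight toward the most aligned updates, which also means benign but atypical clients under strong non-IID data lose weight, the trade-off described above.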

6.1.5. Privacy Configuration

Client-level differential privacy is enforced via gradient clipping with norm bound $C = 1.0$, followed by Gaussian noise injection with scale $σ = 0.5$. The privacy loss is tracked using Rényi differential privacy (RDP) accounting. Unless otherwise specified, the target privacy budget is set to $(ε = 5, δ = 10^{-5})$. Under the adaptive noise schedule, the cumulative privacy loss is estimated based on the averaged per-round noise scale, resulting in $ε \in [4.91, 5.07]$. All DP-enabled baselines are calibrated to achieve a comparable final privacy budget.
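The client-side clip-then-noise step with this configuration can be sketched as below. It assumes $σ$ is a noise multiplier relative to $C$ (noise standard deviation $σC$, the usual DP-SGD convention) and that clipping is applied to the whole flattened update; if the paper's $σ$ is an absolute standard deviation, the `clip_norm` factor in the noise should be dropped. `clip_update` and `privatize` are hypothetical names.

```python
import numpy as np


def clip_update(update: np.ndarray, clip_norm: float) -> np.ndarray:
    """Rescale the update so that its L2 norm is at most clip_norm (C)."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))


def privatize(update: np.ndarray, clip_norm: float, sigma: float,
              rng: np.random.Generator) -> np.ndarray:
    """Clip to norm C, then add isotropic Gaussian noise with std sigma * C.

    ASSUMPTION: sigma is a noise multiplier relative to C; if sigma denotes an
    absolute std, drop the clip_norm factor.
    """
    clipped = clip_update(update, clip_norm)
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)


# Usage with C = 1.0 and sigma = 0.5 on a toy 1000-dimensional update.
rng = np.random.default_rng(0)
u = 0.1 * rng.normal(size=1000)            # norm of roughly 3.2 before clipping
noisy = privatize(u, clip_norm=1.0, sigma=0.5, rng=rng)
noise_norm = np.linalg.norm(noisy - clip_update(u, 1.0))
print(f"clipped norm = {np.linalg.norm(clip_update(u, 1.0)):.3f}, noise norm = {noise_norm:.2f}")
```

With $d = 1000$ parameters the noise vector has norm close to $σC\sqrt{d} \approx 15.8$, far larger than the clipped update, which illustrates how DP noise can mask the directional information that cosine-based aggregation relies on.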

6.2. Experimental Results

Figure 3, Figure 4 and Figure 5 present the test accuracy and loss curves on MNIST, CIFAR-10 and CIFAR-100 under different attack scenarios. In the absence of attacks, all methods achieve comparable convergence performance, indicating that DARA does not sacrifice accuracy in benign settings. Under random and sign-flip attacks, however, FedAvg, Median, Robust-DPFL and Multi-Krum exhibit significant oscillations or even divergence, which we attribute to their sensitivity to noisy or adversarial updates. In contrast, DARA converges more stably and achieves competitive or higher final accuracy in all evaluated settings. These observations suggest that direction-aware aggregation suppresses updates with large directional deviations, while the dual-factor adaptive noise scheduling reduces the adverse impact of DP noise in later training stages. As a result, DARA achieves a favorable balance among privacy protection, robustness, and convergence stability under mixed perturbations.

6.3. Analysis of Internal Dynamics

To further examine the theoretical insights of Section 5, we monitor several internal quantities during training: the aggregated noise energy $a_t$, the Byzantine residual proxy $r_t$, and the global gradient bias $ξ_t$.

As shown in Figure 6, $a_t$ follows an overall decreasing trend while exhibiting persistent oscillations over training rounds. This behavior likely reflects the dynamic interaction between adaptive noise control and adversarial perturbations in a non-convex federated optimization setting.

The residual proxy $r_t$ shows pronounced spikes during attack phases, but these are rapidly suppressed by the direction-aware consistency filtering mechanism. In contrast, the behavior of $ξ_t$ suggests that inherent non-IID data heterogeneity is not overly amplified by the aggregation mechanism in our experimental setting. Overall, these empirical observations are consistent with the theoretical analysis and provide additional insight into the convergence behavior and adaptive properties of the proposed framework.

6.4. Ablation Study

6.4.1. Ablation Study on Client-Side DP and Server-Side Aggregation

To isolate the respective contributions of client-side differential privacy and server-side robust aggregation, we conduct a systematic ablation study by selectively enabling or disabling each component. The results are reported in Table 1.

When neither differential privacy nor robust aggregation is applied (None + FedAvg), the model converges rapidly under benign conditions but becomes highly unstable in the presence of Byzantine attacks, highlighting the lack of robustness in standard aggregation. Introducing fixed client-side DP noise without robustness (Fixed DP + FedAvg) leads to a substantial performance degradation, as the server-side aggregator is unable to distinguish DP-induced stochastic noise from adversarial updates.

When adaptive DP is applied alone with standard FedAvg (Adaptive DP + FedAvg), the training process becomes noticeably more stable compared to the fixed DP setting, demonstrating the benefit of dynamically adjusting the noise scale. Nevertheless, the model remains vulnerable to strong Byzantine behaviors, indicating that privacy mechanisms alone are insufficient to guarantee robustness.

In contrast, combining fixed DP with the proposed direction-aware robust aggregation (Fixed DP + DARA) significantly improves robustness against Byzantine attacks. However, convergence is still impeded by excessive noise in the later training stages, which limits the final model accuracy. The full method (Adaptive DP + DARA) achieves the highest accuracy among the DP-enabled configurations (78.24%), indicating that client-side adaptive DP and server-side direction-aware aggregation are complementary and should be considered jointly when aiming for stable training under adversarial and privacy constraints. For the fixed DP setting, the noise scale $σ$ is fixed to the initial value used by the adaptive scheme and remains constant throughout training, so that all methods operate under a comparable privacy budget.

6.4.2. Ablation Study on Adaptive DP Noise Scheduling

We further investigate the impact of different client-side DP noise scheduling strategies, as summarized in Table 2. This ablation focuses exclusively on the privacy mechanism, while keeping the server-side aggregation method fixed. Here, rounds denote the number of communication rounds required to reach a predefined target accuracy and serve as a proxy for the convergence speed. In the absence of any DP noise, the model achieves the highest accuracy but offers no formal privacy guarantee. Applying a fixed noise scale throughout training introduces persistent and excessive perturbations, resulting in slow convergence and degraded final performance. A simple round-based annealing strategy partially mitigates this issue by gradually reducing the noise magnitude; however, it fails to account for client-level update dynamics and therefore provides limited improvement.

In contrast, the proposed dual-factor adaptive noise scheduling strategy outperforms the other DP variants, reaching 78.24% accuracy in 134 rounds, compared with 77.85% in 152 rounds for round-based annealing and 76.85% in 165 rounds for a fixed noise scale. By jointly considering the training progress and the magnitude of local updates, the adaptive mechanism is designed to suppress unstable updates while preserving informative gradients in later stages. These results suggest that adaptive DP noise scheduling can play an important role in alleviating the performance degradation caused by fixed or globally decaying noise schemes.
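Comparing noise schedules under a comparable privacy budget can be illustrated with a simplifying assumption: if each round is a plain Gaussian mechanism with sensitivity normalized to 1 and no subsampling, the composed RDP at every order depends on a schedule only through $S = \sum_t 1/σ_t^2$. The sketch below uses this to calibrate $σ_0$ of the round-based annealing baseline $σ_t = σ_0 e^{-kt}$ from Table 2 against a constant schedule. The helper names and the values of $k$ and $T$ are hypothetical, and this does not implement the dual-factor rule of Section 4.2, whose per-client schedule $σ_{i,t}$ would enter the same sum client by client.

```python
import math
from typing import Sequence


def inverse_variance_sum(sigmas: Sequence[float]) -> float:
    """S = sum_t 1/sigma_t^2; for plain Gaussian mechanisms (sensitivity 1,
    no subsampling) the composed RDP curve is alpha * S / 2 at every order."""
    return sum(1.0 / s ** 2 for s in sigmas)


def effective_sigma(sigmas: Sequence[float]) -> float:
    """Constant noise multiplier with the same total privacy cost as `sigmas`."""
    return math.sqrt(len(sigmas) / inverse_variance_sum(sigmas))


def annealed_schedule(sigma0: float, k: float, rounds: int) -> list:
    """Round-based annealing sigma_t = sigma0 * exp(-k t) for t = 0..rounds-1."""
    return [sigma0 * math.exp(-k * t) for t in range(rounds)]


def calibrate_annealing(sigma_fixed: float, k: float, rounds: int) -> float:
    """Choose sigma0 so the annealed schedule has the same S (hence the same
    RDP curve) as a constant sigma_fixed over the same number of rounds."""
    growth = sum(math.exp(2.0 * k * t) for t in range(rounds))
    return sigma_fixed * math.sqrt(growth / rounds)


# Usage with illustrative values (not the paper's settings).
T, k, sigma_fixed = 100, 0.01, 1.0
sigma0 = calibrate_annealing(sigma_fixed, k, T)
sched = annealed_schedule(sigma0, k, T)
print(f"sigma0 = {sigma0:.3f}, final sigma = {sched[-1]:.3f}")
print(f"effective sigma = {effective_sigma(sched):.3f}, mean sigma = {sum(sched) / T:.3f}")
```

Under these assumptions the arithmetic mean of a varying schedule always exceeds its effective noise level (power-mean inequality), so a privacy estimate computed from the mean noise scale understates the loss incurred by the low-noise late rounds.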

6.5. Limitations

Despite the promising performance of DARA, several limitations should be noted. First, the direction-aware aggregation mechanism assumes that benign client updates exhibit relatively consistent directional patterns; this assumption may be weakened when adversarial clients craft updates that closely mimic benign directions, reducing the discriminative power of cosine similarity. Second, although the threat model allows adversarial behaviors to occur at arbitrary communication rounds, the current evaluation focuses on commonly adopted, static Byzantine attack models in which malicious clients participate from the beginning of training. These attacks, although standard in the literature, do not fully capture dynamic, adaptive, or colluding adversaries that strategically change their behavior over time or explicitly exploit the aggregation rule, and such adversaries may further challenge the stability of direction estimation and aggregation. Third, when the proportion of malicious clients becomes sufficiently large, the estimated reference direction may itself be biased, potentially affecting aggregation stability. Finally, while the adaptive differential privacy mechanism improves the privacy–utility trade-off, strong noise levels under strict privacy budgets may still distort gradient directions and degrade robustness. These limitations highlight the challenges of jointly addressing privacy and robustness in adversarial federated learning, and exploring defenses against dynamic attacks and extreme adversarial conditions remains an important direction for future work.

7. Conclusions

This paper studied robust and privacy-preserving federated learning in open and heterogeneous environments, where non-IID bias, differential privacy noise, and Byzantine perturbations coexist. We established a unified perturbation model to characterize the coupled impact of statistical heterogeneity, stochastic noise injection, and adversarial behaviors, highlighting the intrinsic interaction between robustness and privacy. Based on this formulation, we proposed DARA, a dual-factor adaptive robust aggregation framework that integrates client-side adaptive differential privacy with server-side direction-aware soft aggregation. The adaptive noise scheduling mechanism dynamically balances privacy protection and optimization stability, while the directional consistency weighting suppresses abnormal updates without relying on hard elimination rules. Experiments on MNIST, CIFAR-10, and CIFAR-100 showed that DARA achieves improved robustness against Byzantine attacks and stable convergence under strong heterogeneity and privacy constraints. The framework targets large-scale IoT environments, where devices operate with distributed data, privacy requirements, and limited trust assumptions. Future work will consider asynchronous federated settings and communication-efficient extensions in large-scale edge networks.

Author Contributions

Conceptualization, Z.S. and H.W.; methodology, Z.S. and G.Z.; software, Z.S. and G.Z.; validation, Z.S. and H.W.; formal analysis, Z.S.; investigation, Z.S. and G.Z.; writing—original draft preparation, Z.S.; writing—review and editing, Z.S. and H.W.; supervision, W.T. and J.W.; funding acquisition, W.T. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported by the National Natural Science Foundation of China under Grant 62272199.

Data Availability Statement

The data used in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.y. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; Volume 54, pp. 1273–1282.
Kairouz, P.; McMahan, H.B.; Avent, B. Advances and Open Problems in Federated Learning. arXiv 2019, arXiv:1912.04977.
Wang, S.; Tuor, T.; Salonidis, T.; Leung, K.K.; Makaya, C.; He, T.; Chan, K. Adaptive Federated Learning in Resource Constrained Edge Computing Systems. IEEE J. Sel. Areas Commun. 2019, 37, 1205–1221.
Fang, M.; Cao, X.; Jia, J.; Gong, N. Local Model Poisoning Attacks to Byzantine-Robust Federated Learning. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20); USENIX Association: Berkeley, CA, USA, 2020; pp. 1605–1622.
Bhagoji, A.N.; Chakraborty, S.; Mittal, P.; Calo, S. Analyzing Federated Learning through an Adversarial Lens. In Proceedings of the 36th International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 634–643.
Lyu, L.; Yu, H.; Ma, X. Privacy and Robustness in Federated Learning: Attacks and Defenses. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 8726–8746.
Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How To Backdoor Federated Learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Online, 26–28 August 2020; Volume 108, pp. 2938–2948.
Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Dallas, TX, USA, 30 October–3 November 2017; pp. 1175–1191.
Blanchard, P.; El Mhamdi, E.M.; Guerraoui, R.; Stainer, J. Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30.
Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. In Proceedings of the 35th International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 5650–5659.
Mao, Y.; Ye, Z.; Yuan, X.; Zhong, S. Secure Model Aggregation Against Poisoning Attacks for Cross-Silo Federated Learning With Robustness and Fairness. IEEE Trans. Inf. Forensics Secur. 2024, 19, 6321–6336.
Wang, Q.; Li, Z.; Zou, Q.; Zhao, L.; Wang, S. Deep Domain Adaptation with Differential Privacy. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3093–3106.
Liu, H.; Li, C.; Liu, B.; Wang, P.; Ge, S.; Wang, W. Differentially Private Learning with Grouped Gradient Clipping. In Proceedings of the 3rd ACM International Conference on Multimedia in Asia; Association for Computing Machinery: New York, NY, USA, 2022.
Jiang, H.; Pei, J.; Yu, D.; Yu, J.; Gong, B.; Cheng, X. Applications of Differential Privacy in Social Network Analysis: A Survey. IEEE Trans. Knowl. Data Eng. 2023, 35, 108–127.
Yang, X.; Huang, W.; Ye, M. Dynamic Personalized Federated Learning with Adaptive Differential Privacy. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 72181–72192.
Xu, G.; Li, H.; Zhang, Y.; Xu, S.; Ning, J.; Deng, R.H. Privacy-preserving federated deep learning with irregular users. IEEE Trans. Dependable Secur. Comput. 2020, 19, 1364–1381.
Zhang, M.; Jin, Z.; Hou, J.; Luo, R. Resilient Mechanism Against Byzantine Failure for Distributed Deep Reinforcement Learning. In 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE); IEEE: Piscataway, NJ, USA, 2022; pp. 378–389.
Xia, G.; Chen, J.; Huang, X.; Wu, J.; Huang, H.; Yu, C.; Zhang, Y.; Cai, Z. APRA-DP: Differential Privacy based Adaptive Privacy Preserving Robust Aggregation Method for Federated Learning. In Proceedings of the 2025 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN), Barcelona, Spain, 26–29 May 2025; pp. 1–7.
Sun, P.; Che, H.; Wang, Z.; Wang, Y.; Wang, T.; Wu, L.; Shao, H. Pain-FL: Personalized Privacy-Preserving Incentive for Federated Learning. IEEE J. Sel. Areas Commun. 2021, 39, 3805–3820.
So, J.; Güler, B.; Avestimehr, A.S. Byzantine-resilient secure federated learning. IEEE J. Sel. Areas Commun. 2020, 39, 2168–2181.
El Mhamdi, E.M.; Guerraoui, R.; Rouault, S. The Hidden Vulnerability of Distributed Learning in Byzantium. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 3521–3530.
Fung, C.; Yoon, C.J.M.; Beschastnikh, I. The Limitations of Federated Learning in Sybil Settings. In Proceedings of the Symposium on Research in Attacks, Intrusion, and Defenses (RAID), Virtually, 14–16 October 2020.
Cao, X.; Fang, M.; Liu, J.; Gong, N.Z. FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping. In Proceedings of the 28th Annual Network and Distributed System Security Symposium, NDSS 2021, Virtually, 21–25 February 2021; The Internet Society: Fredericksburg, VA, USA, 2021.
Li, L.; Xu, W.; Chen, T.; Giannakis, G.B.; Ling, Q. RSA: Byzantine-robust stochastic aggregation methods for distributed learning from heterogeneous datasets. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1544–1551.
Niu, B.; Chen, Y.; Wang, B.; Wang, Z.; Li, F.; Cao, J. AdaPDP: Adaptive Personalized Differential Privacy. In Proceedings of the IEEE INFOCOM 2021–IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
Truex, S.; Liu, L.; Chow, K.H.; Gursoy, M.E.; Wei, W. LDP-Fed: Federated Learning with Local Differential Privacy. In Proceedings of the ACM Int. Workshop on Edge Systems, Analytics and Networking (EdgeSys), Heraklion, Greece, 27 April 2020; pp. 61–66.
Fang, X.; Ye, M. Robust Federated Learning with Noisy and Heterogeneous Clients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–20 June 2022; pp. 10072–10081.
Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318.
Hu, Z.; Ye, A.N.; Hosseini Khorasgani, S.; Mohomed, I. MM ’23: AdaCLIP: Towards Pragmatic Multimodal Video Retrieval; Association for Computing Machinery: New York, NY, USA, 2023; pp. 5623–5633.
Xue, R.; Xue, K.; Zhu, B.; Luo, X.; Zhang, T.; Sun, Q.; Lu, J. Differentially Private Federated Learning With an Adaptive Noise Mechanism. IEEE Trans. Inf. Forensics Secur. 2024, 19, 74–87.
Ma, X.; Sun, X.; Wu, Y.; Liu, Z.; Chen, X.; Dong, C. Differentially Private Byzantine-Robust Federated Learning. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 3690–3701.
Qi, T.; Wang, H.; Huang, Y. Towards the robustness of differentially private federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; pp. 19911–19919.
Lian, Z.; Wang, W.; Zhang, C.; Su, C.; Sakurai, K. MarkFL: Efficient Watermarking in Federated Learning via Parallel Training and Weighted Averaging. IEEE Trans. Comput. Soc. Syst. 2025, 1–8.
Zeng, Q.; Lian, Z.; Niu, H.; Song, J.; Zhao, H.; Su, C. FLNAP: Mitigating LDP Impact in Federated Learning with Noise-Aware Pre-Training. In Proceedings of the 2025 17th International Conference on Advanced Infocomm Technology (ICAIT), Liaocheng, China, 24–27 October 2025; pp. 128–134.

Figure 1. Illustration of a federated learning architecture deployed in an open IoT environment, where heterogeneous edge devices perform local training and a central server aggregates uploaded updates.

Figure 2. Overall architecture of the proposed DARA framework. Clients perform local training with gradient clipping and adaptive differential privacy noise. The server aggregates updates using direction-aware weighting based on cosine similarity and broadcasts the updated global model for the next round.

Figure 3. Test accuracy of DARA and baselines with MNIST dataset under different attack settings.

Figure 4. Test accuracy and convergence of DARA and baselines with CIFAR-10 dataset under different attack settings.

Figure 5. Test accuracy of DARA and baselines with CIFAR-100 dataset under different attack settings.

Figure 6. Evolution of monitoring indicators during training.

Table 1. Ablation study of client-side DP and server-side aggregation.

Client DP	Server Aggregation	Description	Acc (%)
None	FedAvg	No privacy, no robustness	$80.85 \pm 0.24$
Fixed DP	FedAvg	Fixed noise without robustness	$72.34 \pm 0.32$
Adaptive DP	FedAvg	Adaptive DP only	$75.48 \pm 0.43$
Fixed DP	DARA	Robust aggregation only	$75.12 \pm 0.27$
Adaptive DP	DARA	Full proposed method	$78.24 \pm 0.29$

Table 2. Ablation study of client-side DP noise scheduling.

Noise Schedule	Description	Attack	Acc (%)	Rounds to Target Acc.
None	No differential privacy	None	$80.65 \pm 0.32$	$125 \pm 3$
Fixed $σ$	Constant Gaussian noise	sign-flip	$76.85 \pm 0.25$	$165 \pm 2$
$σ_{t} = σ_{0} e^{- k t}$	Round-based annealing	sign-flip	$77.85 \pm 0.38$	$152 \pm 4$
Adaptive $σ_{i, t}$	Dual-factor adaptive DP (Ours)	sign-flip	$78.24 \pm 0.29$	$134 \pm 2$


Share and Cite

MDPI and ACS Style

Song, Z.; Tan, W.; Wang, H.; Zhang, G.; Weng, J. Dual-Factor Adaptive Robust Aggregation for Secure Federated Learning in IoT Networks. Future Internet 2026, 18, 201. https://doi.org/10.3390/fi18040201

