Appendix A.2. Hyperparameter Selection
In this section, we supplement the hyperparameter selection strategy. We mainly analyze the decay rate and initial weight in Formula (7), the privacy budget ε, and the relaxation parameter δ.
The selection of the decay rate requires balancing the speed of privacy budget decay against the stability of model convergence. A value that is too large may deplete the budget rapidly and introduce excessive noise; a value that is too small may make the decay too slow, failing to meet later-stage privacy requirements. The initial weight ensures that the gradient weights of clients participating for the first time are not affected by decay. The decay logic is implemented as exponential decay, realizing the principle that “the longer the delay, the lower the weight.”
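The exponential decay rule above can be sketched in a few lines. This is a minimal illustration under assumed names (`staleness_weight`, `lam`), not the exact form of Formula (7):

```python
import math

def staleness_weight(delay_rounds: int, lam: float = 0.1) -> float:
    """Exponential staleness decay: the longer a client's gradient is
    delayed, the lower its aggregation weight. A client participating
    for the first time (delay_rounds == 0) keeps the full initial
    weight of 1, so it is unaffected by decay."""
    return math.exp(-lam * delay_rounds)
```

A larger `lam` down-weights stale gradients more aggressively, mirroring the trade-off between budget decay speed and convergence stability described above.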
The choice of ε must match the dataset complexity and privacy requirements. For example, simple datasets (e.g., MNIST) are robust to noise, so a smaller ε is sufficient for convergence; in contrast, complex datasets (e.g., CIFAR100) require a larger ε to avoid noise interference. The choice of δ follows the “low probability of failure” principle, so that no individual record can be identified separately. For example, CIFAR100 contains 60,000 samples, giving 1/n ≈ 1.7 × 10−5, so setting δ = 1 × 10−5 meets the safety constraint.
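The δ constraint can be checked mechanically. The helper name `delta_upper_bound` is ours, but the rule δ < 1/n is the standard “low probability of failure” criterion used above:

```python
def delta_upper_bound(n_samples: int) -> float:
    """delta should stay below 1/n so that no single record is
    exposed with non-negligible probability."""
    return 1.0 / n_samples

# CIFAR100: 1/60000 ≈ 1.7e-5, so delta = 1e-5 satisfies the constraint.
assert 1e-5 < delta_upper_bound(60_000)
```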
Based on the above selection strategy, Table A2 shows the hyperparameter tuning process, value basis, and effect comparison for CIFAR100 (Transformer model), covering different scenario requirements.
Table A2.
Hyperparameter tuning process and comparative analysis for CIFAR100 (Transformer).
| Scenario Requirement | Decay Rate | Initial Weight | ε | δ | Accuracy (%) | Convergence Rounds |
|---|---|---|---|---|---|---|
| Unoptimized Baseline | 0.1 | 1 | 10 | 1 × 10−5 | 58.23 | 60 |
| High-Privacy | 0.15 | 1 | 5 | 1 × 10−6 | 54.55 | 75 |
| Balanced | 0.1 | 1 | 10 | 1 × 10−5 | 62.34 | 55 |
| High-Utility | 0.08 | 1 | 20 | 1 × 10−5 | 65.82 | 45 |
| Unstable Network | 0.1 | 1 | 10 | 1 × 10−5 | 61.55 | 58 |
Based on the data in the table, we can summarize the following:
(1) In the baseline scenario, default values are selected based on theoretical constraints, resulting in moderate model accuracy; however, privacy risks and convergence speed are not aligned with the scenario requirements.
(2) In the high-privacy scenario, strict control of privacy leakage is required (e.g., medical image data), so the decay rate is increased (from 0.1 to 0.15), and ε and δ are decreased to ensure “privacy first”;
(3) In the balanced scenario, the default values (decay rate 0.1, ε = 10, δ = 1 × 10−5) make the algorithm more general;
(4) In the high-utility scenario, non-sensitive data can relax privacy constraints, so reducing the decay rate (to 0.08) and increasing ε (to 20) reduces noise interference, ensuring controllable risk and maximum utility;
(5) In the unstable network scenario, where network latency is high, the corresponding threshold is adjusted from 0.1 to 0.9 to avoid frequently discarding valid gradients.
Appendix A.3. Actual Deployment Analysis
We can analyze the adaptability of ADP-FL on real devices from the computational complexity results in Table 5:
1. Client computing power requirements: In simulation experiments, the training time for a single client ranged from 22 to 295 min (depending on the dataset), but actual edge devices (such as smartphones and IoT sensors) typically have less computing power than the experimental servers. Further optimization can be achieved through model lightweighting (e.g., pruning, quantization). For example, after compressing the VGG9 model parameters by half, we observed an approximately 40% reduction in simulation time on CIFAR10. We speculate that ADP-FL can meet real-time requirements on mainstream chips in mobile devices.
2. Server load: In simulation experiments, the aggregation time for 50 clients was less than 1 s. Based on linear scaling, supporting 1000 clients would result in an aggregation time of approximately 20 s, which aligns with the concurrent processing capabilities of cloud servers.
3. Weak network adaptation: If a client fails to upload gradients due to network interruption, the lag-based sparse reset mechanism enables it to reconnect without requiring retraining from scratch.
4. Communication Volume: Taking VGG9 as an example, its gradient is approximately 1.2 MB per client, which is acceptable for real-world usage scenarios. In actual deployment, gradient sparsification can be employed to reduce the communication volume further.
Based on the simulation results analyzing computational time and communication volume, our algorithm is feasible for practical deployment.
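The gradient sparsification mentioned in item 4 can be sketched as a top-k filter. This is a generic illustration (the function name and the 10% ratio are our assumptions), not the deployed implementation:

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, ratio: float = 0.1):
    """Keep only the largest-magnitude fraction of gradient entries;
    the client then uploads (indices, values) instead of the dense
    ~1.2 MB gradient, cutting communication volume roughly by (1 - ratio)."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]
```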
Appendix A.4. Multi-Metric Performance Analysis
In this section, we analyze the algorithm’s performance using additional metrics: precision, recall, and F1-score.
Table A3, Table A4, and Table A5 compare precision, recall, and F1-score under different privacy budgets.
Table A6 and Table A7 compare these metrics under the Laplace mechanism and the Gaussian mechanism.
Under both the Gaussian mechanism and the Laplace mechanism, the ADP-FL algorithm outperforms the Baseline algorithm in most scenarios, particularly demonstrating consistent advantages in precision, recall, and F1 scores, thereby validating the effectiveness of its “dynamic adjustment of privacy budget + weighted aggregation” strategy.
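For reference, the macro-averaged metrics reported in the tables can be computed as follows (a minimal sketch; the actual evaluation code may use a library implementation):

```python
import numpy as np

def macro_prf(y_true, y_pred, n_classes):
    """Macro-averaged precision, recall, and F1: compute each metric
    per class from TP/FP/FN counts, then average over classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ps, rs, fs = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        ps.append(p)
        rs.append(r)
        fs.append(f)
    return float(np.mean(ps)), float(np.mean(rs)), float(np.mean(fs))
```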
Table A3.
Precision of the baseline and ADP-FL algorithms under different privacy budgets ε.
| Algorithm | MNIST | CIFAR10 | CIFAR100 | EMNIST |
|---|---|---|---|---|
| Non | 86.92 | 55.76 | 53.62 | 66.89 |
| Baseline | 50.87 | NaN | NaN | NaN |
| Baseline | 85.89 | 56.43 | 52.83 | 65.01 |
| Baseline | 85.93 | 61.78 | 61.02 | 63.78 |
| ADP-FL | 65.98 | NaN | NaN | NaN |
| ADP-FL | 87.05 | 53.92 | 52.31 | 64.23 |
| ADP-FL | 91.98 | 63.42 | 60.65 | 69.12 |
Table A4.
Recall of the baseline and ADP-FL algorithms under different privacy budgets ε.
| Algorithm | MNIST | CIFAR10 | CIFAR100 | EMNIST |
|---|---|---|---|---|
| Non | 87.15 | 55.92 | 53.85 | 67.12 |
| Baseline | 51.22 | NaN | NaN | NaN |
| Baseline | 86.11 | 56.78 | 53.05 | 65.32 |
| Baseline | 86.18 | 62.03 | 61.25 | 64.05 |
| ADP-FL | 66.51 | NaN | NaN | NaN |
| ADP-FL | 87.32 | 54.21 | 52.54 | 64.51 |
| ADP-FL | 92.27 | 63.71 | 60.88 | 69.43 |
Table A5.
F1-score of the baseline and ADP-FL algorithms under different privacy budgets ε.
| Algorithm | MNIST | CIFAR10 | CIFAR100 | EMNIST |
|---|---|---|---|---|
| Non | 87.03 | 55.84 | 53.73 | 67.00 |
| Baseline | 51.04 | NaN | NaN | NaN |
| Baseline | 85.99 | 56.60 | 52.94 | 65.16 |
| Baseline | 86.05 | 61.90 | 61.13 | 63.91 |
| ADP-FL | 66.24 | NaN | NaN | NaN |
| ADP-FL | 87.18 | 54.06 | 52.42 | 64.37 |
| ADP-FL | 92.12 | 63.56 | 60.76 | 69.27 |
Table A6.
Performance metrics of the baseline and ADP-FL algorithms under the Gaussian mechanism.
| Dataset | Algorithm | ε | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| MNIST | Baseline | 10 | 72.98 | 73.25 | 73.11 |
| | ADP-FL | 10 | 74.56 | 74.89 | 74.72 |
| | Baseline | 20 | 85.37 | 85.68 | 85.52 |
| | ADP-FL | 20 | 86.45 | 86.78 | 86.61 |
| | Baseline | 30 | 81.82 | 82.15 | 81.98 |
| | ADP-FL | 30 | 81.76 | 82.09 | 81.92 |
| | Baseline | 40 | 82.80 | 83.13 | 82.96 |
| | ADP-FL | 40 | 84.41 | 84.74 | 84.57 |
| | Baseline | 50 | 79.78 | 80.11 | 79.94 |
| | ADP-FL | 50 | 80.10 | 80.43 | 80.26 |
| CIFAR10 | Baseline | 10 | 27.25 | 27.58 | 27.41 |
| | ADP-FL | 10 | 27.51 | 27.84 | 27.67 |
| | Baseline | 20 | 40.65 | 40.98 | 40.81 |
| | ADP-FL | 20 | 41.91 | 42.24 | 42.07 |
| | Baseline | 30 | 48.10 | 48.43 | 48.26 |
| | ADP-FL | 30 | 49.61 | 49.94 | 49.77 |
| | Baseline | 40 | 47.18 | 47.51 | 47.34 |
| | ADP-FL | 40 | 45.84 | 46.17 | 46.00 |
| | Baseline | 50 | 50.90 | 51.23 | 51.06 |
| | ADP-FL | 50 | 53.06 | 53.39 | 53.22 |
| CIFAR100 | Baseline | 10 | 29.61 | 29.94 | 29.77 |
| | ADP-FL | 10 | 32.51 | 32.84 | 32.67 |
| | Baseline | 20 | 42.65 | 42.98 | 42.81 |
| | ADP-FL | 20 | 46.08 | 46.41 | 46.24 |
| | Baseline | 30 | 47.95 | 48.28 | 48.11 |
| | ADP-FL | 30 | 45.61 | 45.94 | 45.77 |
| | Baseline | 40 | 46.57 | 46.90 | 46.73 |
| | ADP-FL | 40 | 47.84 | 48.17 | 48.00 |
| | Baseline | 50 | 50.93 | 51.26 | 51.09 |
| | ADP-FL | 50 | 53.06 | 53.39 | 53.22 |
| EMNIST | Baseline | 10 | 88.01 | 88.34 | 88.17 |
| | ADP-FL | 10 | 89.98 | 90.31 | 90.14 |
| | Baseline | 20 | 96.65 | 96.98 | 96.81 |
| | ADP-FL | 20 | 97.23 | 97.56 | 97.39 |
| | Baseline | 30 | 97.55 | 97.88 | 97.71 |
| | ADP-FL | 30 | 96.76 | 97.09 | 96.92 |
| | Baseline | 40 | 97.35 | 97.68 | 97.51 |
| | ADP-FL | 40 | 97.50 | 97.83 | 97.66 |
| | Baseline | 50 | 97.90 | 98.23 | 98.06 |
| | ADP-FL | 50 | 98.41 | 98.74 | 98.57 |
Under the Gaussian mechanism, the advantage of ADP-FL is more stable, outperforming the Baseline in most scenarios (e.g., an F1-score of 53.22 for CIFAR10 with ε = 50, compared to 51.06 for the Baseline), and the optimization effect on complex datasets is more pronounced under high privacy budgets. This is because the continuity of Gaussian noise suits ADP-FL’s dynamic budget adjustment strategy. Under the Laplace mechanism, performance fluctuates significantly: in some scenarios ADP-FL performs worse than the Baseline (e.g., EMNIST at ε = 20, with an F1-score of 95.24 versus the Baseline’s 98.04), but it shows significant optimization for CIFAR100 in the ε = 30–40 range (an F1 improvement of 6.4 percentage points). This is related to the heavier tails of Laplace noise, which can introduce local biases during dynamic budget allocation.
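The two mechanisms differ only in how the noise is drawn. The following is a minimal sketch (the sensitivity argument, the fixed seed, and the standard σ calibration formula are textbook choices, not the paper's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_noise(grad, sensitivity, eps):
    """Laplace mechanism for eps-DP: i.i.d. noise with scale b = sensitivity / eps."""
    return grad + rng.laplace(0.0, sensitivity / eps, size=grad.shape)

def gaussian_noise(grad, sensitivity, eps, delta):
    """Gaussian mechanism for (eps, delta)-DP:
    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / eps."""
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return grad + rng.normal(0.0, sigma, size=grad.shape)
```

At the same budget, Laplace noise has heavier tails than Gaussian noise, which is consistent with the larger fluctuations observed under the Laplace mechanism above.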
Table A7.
Performance metrics of the baseline and ADP-FL algorithms under the Laplace mechanism.
| Dataset | Algorithm | ε | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| MNIST | Baseline | 10 | 80.48 | 80.81 | 80.64 |
| | ADP-FL | 10 | 82.28 | 82.61 | 82.44 |
| | Baseline | 20 | 81.81 | 82.14 | 81.97 |
| | ADP-FL | 20 | 80.42 | 80.75 | 80.58 |
| | Baseline | 30 | 82.36 | 82.69 | 82.52 |
| | ADP-FL | 30 | 83.17 | 83.50 | 83.33 |
| | Baseline | 40 | 84.24 | 84.57 | 84.40 |
| | ADP-FL | 40 | 86.09 | 86.42 | 86.25 |
| | Baseline | 50 | 86.87 | 87.20 | 87.03 |
| | ADP-FL | 50 | 86.74 | 87.07 | 86.90 |
| CIFAR10 | Baseline | 10 | 49.76 | 50.09 | 49.92 |
| | ADP-FL | 10 | 35.52 | 35.85 | 35.68 |
| | Baseline | 20 | 28.41 | 28.74 | 28.57 |
| | ADP-FL | 20 | 46.14 | 46.47 | 46.30 |
| | Baseline | 30 | 42.80 | 43.13 | 42.96 |
| | ADP-FL | 30 | 46.14 | 46.47 | 46.30 |
| | Baseline | 40 | 51.03 | 51.36 | 51.19 |
| | ADP-FL | 40 | 51.41 | 51.74 | 51.57 |
| | Baseline | 50 | 48.89 | 49.22 | 49.05 |
| | ADP-FL | 50 | 48.91 | 49.24 | 49.07 |
| CIFAR100 | Baseline | 10 | 43.29 | 43.62 | 43.45 |
| | ADP-FL | 10 | 36.20 | 36.53 | 36.36 |
| | Baseline | 20 | 30.81 | 31.14 | 30.97 |
| | ADP-FL | 20 | 42.70 | 43.03 | 42.86 |
| | Baseline | 30 | 42.17 | 42.50 | 42.33 |
| | ADP-FL | 30 | 48.57 | 48.90 | 48.73 |
| | Baseline | 40 | 50.06 | 50.39 | 50.22 |
| | ADP-FL | 40 | 52.44 | 52.77 | 52.60 |
| | Baseline | 50 | 46.49 | 46.82 | 46.65 |
| | ADP-FL | 50 | 50.33 | 50.66 | 50.49 |
| EMNIST | Baseline | 10 | 97.30 | 97.63 | 97.46 |
| | ADP-FL | 10 | 97.33 | 97.66 | 97.49 |
| | Baseline | 20 | 97.88 | 98.21 | 98.04 |
| | ADP-FL | 20 | 95.08 | 95.41 | 95.24 |
| | Baseline | 30 | 97.80 | 98.13 | 97.96 |
| | ADP-FL | 30 | 98.06 | 98.39 | 98.22 |
| | Baseline | 40 | 97.83 | 98.16 | 97.99 |
| | ADP-FL | 40 | 97.91 | 98.24 | 98.07 |
| | Baseline | 50 | 97.74 | 98.07 | 97.90 |
| | ADP-FL | 50 | 97.40 | 97.73 | 97.56 |
Appendix A.5. Comparisons of Performance Under Different Noise Mechanisms
Intuitive experimental results are shown in Figure A1, Figure A2, Figure A3 and Figure A4. The red line represents the accuracy of the baseline algorithm, and the green line represents the accuracy of the ADP-FL algorithm.
Figure A1.
On the MNIST dataset, adaptive differential privacy algorithms based on Laplace and Gaussian mechanisms were tested against the baseline algorithm, with privacy budgets set to 10, 20, 30, 40, and 50 from top to bottom.
Figure A2.
On the CIFAR10 dataset, adaptive differential privacy algorithms based on Laplace and Gaussian mechanisms were tested against the baseline algorithm, with privacy budgets set to 10, 20, 30, 40, and 50 from top to bottom.
Figure A3.
On the CIFAR100 dataset, adaptive differential privacy algorithms based on Laplace and Gaussian mechanisms were tested against the baseline algorithm, with privacy budgets set to 10, 20, 30, 40, and 50 from top to bottom.
Figure A4.
On the EMNIST dataset, adaptive differential privacy algorithms based on Laplace and Gaussian mechanisms were tested against the baseline algorithm, with privacy budgets set to 10, 20, 30, 40, and 50 from top to bottom.