Article

Secure Hierarchical Asynchronous Federated Learning with Shuffle Model and Mask–DP

1
School of Computer Science, Hubei University of Technology, No. 28 Nanli Road, Hongshan District, Wuhan 430068, China
2
Hubei Provincial Engineering Research Center for Digital & Intelligent Manufacturing Technologies and Applications, No. 28 Nanli Road, Hongshan District, Wuhan 430068, China
3
Hubei Provincial Key Laboratory of Green Intelligent Computing Power Network, No. 28 Nanli Road, Hongshan District, Wuhan 430068, China
*
Author to whom correspondence should be addressed.
Sensors 2026, 26(2), 617; https://doi.org/10.3390/s26020617
Submission received: 24 November 2025 / Revised: 22 December 2025 / Accepted: 10 January 2026 / Published: 16 January 2026

Abstract

Hierarchical asynchronous federated learning (HAFL) accommodates realistic network topologies and supports practical communication and efficient aggregation. However, existing HAFL schemes still struggle to balance privacy preservation and robustness. Malicious training nodes may infer the private data of other training nodes or poison the global model, thereby damaging the system’s robustness. To address these issues, we propose a secure hierarchical asynchronous federated learning (SHAFL) framework. SHAFL organizes training nodes into multiple groups based on their respective gateways. Within each group, the training nodes prevent inference attacks from the gateways and committee nodes via a mask–DP exchange protocol and employ homomorphic encryption (HE) to prevent collusion attacks from other training nodes. Compared with conventional solutions, SHAFL uses eliminable noise to reduce the impact of noise on the global model’s performance, while employing a shuffle model and subsampling to strengthen the privacy protection of local models. During global model aggregation, SHAFL considers both model accuracy and communication delay, effectively reducing the impact of malicious and stale models on system performance. Theoretical analysis and experimental evaluations demonstrate that SHAFL outperforms state-of-the-art solutions in terms of convergence, security, robustness, and privacy preservation.

1. Introduction

Hierarchical asynchronous federated learning (HAFL) has been widely applied and studied across various academic and industrial scenarios [1,2,3,4,5]. HAFL can adapt to more realistic networking systems with hierarchical structures and be compatible with heterogeneous training nodes through an asynchronous update mechanism. Typical applications include the Internet of Vehicles (IoV) [6,7,8] and the Internet of Things (IoT) [9,10,11,12]. However, HAFL still faces FL-specific security issues, including single-point failure, data privacy, and Byzantine fault tolerance. Attackers may conduct inference attacks to reconstruct the training nodes’ datasets from their updated models [13,14]. Malicious nodes may launch Byzantine attacks to compromise system robustness by poisoning the model [15,16].
Centralized federated learning (FL) approaches suffer from a single point of failure and untrusted aggregation [17,18]. Owing to features such as decentralization, immutability, traceability, and consensus mechanisms, Blockchain-based technologies offer effective solutions [6,18,19]. They use Blockchain to store the global model, computational metadata, and other relevant data generated during the training process, ensuring transparency, traceability, and tamper resistance. However, Blockchain-based FL still faces privacy threats, e.g., membership inference attacks [13,20] and model inversion attacks [14], as well as Byzantine attacks, e.g., model poisoning [15,16] and label flipping [21,22].
Differential privacy (DP) has been widely used for privacy in FL [23,24,25]. Compared to homomorphic encryption (HE) [26,27,28] and secure multi-party computation (SMC) [29,30,31], DP has low computational overhead and is more suitable for multiple iterations of computation [32]. Central differential privacy (CDP) [33] injects calibrated noise into the global model at the central server that aggregates it. Local differential privacy (LDP) [34] eliminates the dependence on a trusted central server and allows each training node to add noise to the uploaded model. However, the accumulated noise may degrade the performance of the global model. Yuan et al. [35] proposed an adaptive perturbation scheme that adjusts the variance of the perturbation online to reduce the performance degradation. Sun et al. [24] combined LDP with a shuffle model to reduce noise variance and enlarge the privacy budget. However, these schemes can only reduce, but not eliminate, the impact of noise.
To suppress Byzantine attacks, e.g., additive noise (AN) [36,37], A Little Is Enough (ALIE) [22], inner product manipulation (IPM) [38], sign flipping (SF) [39,40], and label flipping (LF) [21,41], numerous robust aggregation algorithms have been proposed, e.g., Euclidean distance-based methods [42,43,44,45], cosine similarity-based approaches [46,47], and median/mean-based statistical techniques [48]. These algorithms distinguish between honest and malicious nodes by leveraging geometric distances or statistical features in high-dimensional spaces.
However, in LDP-based HAFL, the geometric distances and statistical characteristics are disturbed by noise, making it hard to distinguish malicious and stale models. Designing an HAFL system that simultaneously ensures privacy preservation and Byzantine robustness remains hard.
This study proposes a secure hierarchical asynchronous federated learning (SHAFL) framework that ensures both privacy preservation and Byzantine robustness. Our contributions are summarized as follows:
  • SHAFL proposes a decentralized mask exchange protocol that uses eliminable noise to prevent the gateway from compromising the privacy of the training nodes and to reduce the impact of noise on global model performance. Based on HE, it prevents $N-1$ collusion attacks among training nodes.
  • The SHAFL scheme introduces a novel mechanism for continuous layer subsampling and dummy-layer padding. Combining continuous-layer subsampling, dummy-layer padding, and a shuffle model, SHAFL enhances the privacy-preserving capability of local models during the server aggregation phase.
  • SHAFL designs a secure aggregation scheme that leverages the upload model’s test accuracy to mitigate the impact of malicious nodes on system robustness.
  • With eliminable noise, SHAFL reduces the damage to system robustness caused by nodes going offline before model shuffling in groups.
  • Experiments on the MNIST, CIFAR-10, and Heart Disease datasets validate the privacy, convergence, and robustness of the proposed SHAFL.
The remainder of this study is organized as follows: Section 2 analyzes the related work; Section 3 discusses the system model; Section 4 presents the proposed SHAFL framework; Section 5 and Section 6 discuss the convergence and security of the proposed SHAFL; Section 7 presents an experimental analysis of the proposed SHAFL; and Section 8 is the conclusion.

2. Related Work

Xie et al. [49] proposed an asynchronous federated optimization algorithm (FedAsync) addressing the straggler issue. Miao et al. [50] proposed a time-weighted asynchronous PPFL that integrates stale models. Wu et al. [51] designed an aggregation method to control asynchronous aggregation errors. Chen et al. [52] proposed an adaptive semi-asynchronous federated learning (ASAFL) approach to balance learning latency and accuracy. However, the distributed architecture of FL makes it susceptible to privacy-preserving issues [13,14] and Byzantine attacks [22,36,37].
There are three typical privacy protection methods in FL: HE [26,27], DP [18,32,53,54], and SMC [29,30,31]. Compared with DP and SMC, HE-based methods exhibit higher computational complexity and rely on overly conservative security assumptions. For example, Yang et al. [26] proposed a secure FL scheme that prevents privacy attacks from external attackers and semi-honest servers without requiring a shared homomorphic key. However, it cannot defend against internal attacks from training nodes that share homomorphic keys. Miao et al. [27] proposed a privacy-preserving and Byzantine-robust FL framework based on the fully homomorphic encryption (FHE) algorithm CKKS, assuming a trusted verifier. DP is widely used to preserve privacy in FL due to its quantifiable privacy loss and low computational overhead [18,32,53,54]. Wei et al. [53] proposed a Gaussian–DP-based privacy-preserving FL scheme. Jiang et al. [54] proposed a Laplace–DP-based algorithm to improve performance. Yan et al. [18] proposed a Laplace–DP-based asynchronous FL scheme for an IoT system, while analyzing the dropout tolerance of DP-based FL. However, the noise introduced by DP inherently degrades the model’s accuracy and utility and requires a larger privacy budget. In theory, mask-based SMC schemes can eliminate the effects of noise. However, security concerns arise in the generation and aggregation of the mask/noise. For example, Feng et al. [29] proposed a Blockchain-enabled, horizontally decentralized FL scheme in which the mask may be generated by a malicious node. Hiroki et al. [30] proposed a mask-based decentralized FL scheme; however, it cannot protect against collusion attacks. Shen et al. [31] proposed the LiPFed scheme, in which each training node generates its own masks, thereby eliminating reliance on intermediate nodes for security. However, the divided model may result in insecure aggregation. Moreover, if the aggregation node cannot obtain all the noisy models, the mask/noise cannot be eliminated.
In our proposed scheme, we introduce a mask-DP exchange protocol that, in theory, eliminates noise and improves performance when used with PBFT.
To address Byzantine attacks, it is necessary to distinguish between honest and malicious nodes using updated models [55,56]. With a consortium Blockchain, Yan et al. [18] adopted a Practical Byzantine Fault Tolerance (PBFT) protocol to ensure the credibility of aggregated results. Furthermore, Xu et al. [57] proposed a semi-asynchronous aggregation scheme that resists poisoning attacks, backdoor attacks, and Distributed Denial of Service (DDoS) attacks. Zhang et al. [56] proposed a robust and secure framework for FL with verifiable DP noise. However, their work is discussed in the context of synchronous FL and ignores the impact of asynchronous FL, particularly the effect of noise on the model accuracy of PBFT.
In addition, privacy amplification mechanisms, e.g., shuffler [58,59], subsampling [60], and dummy points [24], are introduced into FL to increase the privacy budget while reducing noise. The shuffling mechanism disrupts the correlation between the uploaded local models and the training nodes to enhance the LDP with anonymity [58,59,61]. Using the subsampling and dummy point algorithms, Sun et al. [24] proposed a privacy-enhancing DP-based FL, which amplified the privacy-preserving level of LDP at the aggregation stage. These methods can reduce the impact of DP noise. In our proposed scheme, we introduce a shuffling mechanism for asynchronous environments, reducing the impact of mask noise leakage on PBFT.

3. System Model

This section introduces the Blockchain-based hierarchical asynchronous federated learning, threat model, and privacy-preserving mechanism adopted by the SHAFL framework.

3.1. Blockchain-Based Hierarchical Asynchronous Federated Learning

As shown in Figure 1, the proposed SHAFL framework considers a Blockchain-based scenario that consists of two layers. In the first layer, the training nodes are organized into $K$ groups $\{G_o\}$; each group has a header node called the gateway $y_o$, and the size of group $G_o$ is $s_o$. In group $G_o$, each training node $c_i^o \in G_o$ holds a dataset $D_{c_i^o} \subseteq D$. The basic FL objective [62] is
$$F(w) = \sum_{c_i^o \in C} \rho_{c_i^o} F_{c_i^o}(w) \tag{1}$$
$$F_{c_i^o}(w) = \sum_{\xi \in D_{c_i^o}} \rho_{\xi} f(w, \xi) \tag{2}$$
where $F(w)$ is the task objective function, $F_{c_i^o}$ is the objective function of $c_i^o$, $\rho_{c_i^o}$ and $\rho_{\xi}$ are the aggregation weights, and $f(w, \xi)$ is the loss function at $c_i^o$.
In the first layer of the SHAFL framework, in turn $t \in T$, each training node $c_i^o$ first receives the global model $w^{t-1}$ from the Blockchain, then trains $w^{t-1}$ locally and iteratively on $D_{c_i^o}$, and outputs $w_{c_i^o}^{t-\tau_o} \leftarrow w_{c_i^o}^{t-\tau_o, H}$ after $H$ iterations. In iteration $h \in H$, the local update is
$$w_{c_i^o}^{t-\tau_o, h} = w_{c_i^o}^{t-\tau_o, h-1} - \gamma_{c_i^o}\, g_{c_i^o}^{t-\tau_o, h-1}\big(w_{c_i^o}^{t-\tau_o, h-1}, D_{c_i^o}\big) \tag{3}$$
$$g_{c_i^o}^{t-\tau_o, h}\big(w_{c_i^o}^{t-\tau_o, h}, D_{c_i^o}\big) = \frac{1}{|D_{c_i^o}|} \sum_{\xi \in D_{c_i^o}} \nabla f\big(w_{c_i^o}^{t-\tau_o, h}, \xi\big) \tag{4}$$
where $w_{c_i^o}^{t-\tau_o}$ is the local update, $\tau_o$ is the delay of group $G_o$ and gateway $y_o$, $t-\tau_o$ is the start time of local training, which replaces the synchronous time index $t$, and $\gamma_{c_i^o}$ is the learning rate.
We assume that all local training within a group is synchronous, meaning that all delays $\tau$ in a group are the same and are denoted by $\tau_o$. After collecting all local updates, the gateway $y_o$ obtains $w_{y_o}^{t-\tau_o}$:
$$w_{y_o}^{t-\tau_o} = \frac{\sum_{c_i^o \in G_o} w_{c_i^o}^{t-\tau_o}}{s_o} \tag{5}$$
In the second layer of the SHAFL framework, all gateways can upload their updates to the Blockchain asynchronously, which means the primary committee node allows the gateways to have different delays $\tau_o$. After a period, the primary committee node downloads the updates from the Blockchain and aggregates the global model as [1]
$$w^{t} = (1-\alpha)\, w^{t-1} + \alpha\, \frac{\sum_{o=1}^{\eta_t} \rho_{y_o} w_{y_o}^{t-\tau_o}}{\sum_{o=1}^{\eta_t} \rho_{y_o}} \tag{6}$$
where $\alpha \in (0, 1)$ is the hyperparameter weight of the global update, $\rho_{y_o}$ is the weight of local update $w_{y_o}^{t-\tau_o}$, and $\eta_t$ is the number of local updates uploaded in turn $t$. Figure 2 shows the asynchronous time workflow of the SHAFL framework.
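The two aggregation levels described above can be illustrated with a minimal Python sketch. It assumes that models are flattened into plain lists of floats; the function names `gateway_average` and `async_global_update` are hypothetical and do not come from the paper. The first function follows the group mean of Equation (5), and the second follows the weighted asynchronous mix of Equation (6).

```python
def gateway_average(local_updates):
    """Group-level mean of the local updates (Equation (5))."""
    s_o = len(local_updates)
    return [sum(column) / s_o for column in zip(*local_updates)]


def async_global_update(w_prev, gateway_updates, weights, alpha):
    """Mix the previous global model with the weighted mean of the
    gateway updates that arrived in this turn (Equation (6))."""
    total = sum(weights)
    mixed = [
        sum(rho * w[j] for rho, w in zip(weights, gateway_updates)) / total
        for j in range(len(w_prev))
    ]
    return [(1 - alpha) * p + alpha * m for p, m in zip(w_prev, mixed)]


# Toy example: two groups, each with two training nodes.
group_a = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
group_b = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
w_a = gateway_average(group_a)
w_b = gateway_average(group_b)
w_new = async_global_update([0.0, 0.0, 0.0], [w_a, w_b], [1.0, 1.0], alpha=0.5)
```

In this toy run the two gateway models are mixed with equal weights, so only a fraction `alpha` of their mean enters the new global model.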

3.2. Threat Model

In this study, we assume that a gateway can be honest but curious, and a training/committee node might be potentially malicious. The potential threats caused by training nodes, gateways, and committee nodes are shown as follows.
  • Training nodes: They try to extract as much of other training nodes’ local data as possible from local updates by launching inference attacks [13,14] and data reconstruction attacks [63,64,65]. Malicious training nodes may engage in data poisoning or upload maliciously crafted local updates [66], which can degrade the global model’s accuracy.
  • Gateways: They follow predefined protocols and submit correct intermediate results. However, they are curious about the sensitive information contained in training nodes and may attempt to infer the training nodes’ private data, resulting in data leakage.
  • Committee nodes: Malicious committee nodes may discard local updates from gateways or release a malicious global model, thus compromising the robustness of the system.
  • Collusion attacks: Malicious training nodes may collude to obtain the private model of the target node, for example by attempting to remove the noise added to the target model. Furthermore, malicious training nodes could collude with gateways, or gateways could collude with malicious committee nodes, to attack the training nodes’ privacy.

3.3. Privacy Preserving Mechanism

To tackle the privacy-preserving issues, the SHAFL framework introduces an LDP-based shuffle model, a mask–DP exchange protocol, and Paillier homomorphic encryption.

3.3.1. LDP Mechanism

Unlike CDP, LDP-based FL lets each training node add noise to its model locally, which achieves decentralized privacy preservation [32] without relying on a trusted server.
Definition 1 ($(\epsilon_0, \delta_0)$-LDP [32]). A randomized algorithm $M: D \to R$ satisfies $(\epsilon_0, \delta_0)$-LDP if for any two adjacent datasets $d, d' \in D$ and for any subset of outputs $S \subseteq R$, it holds that
$$\Pr[M(d) \in S] \le e^{\epsilon_0} \Pr[M(d') \in S] + \delta_0 \tag{7}$$
The Gaussian mechanism draws random noise from a Gaussian distribution and adds it to the output of the query function to satisfy $(\epsilon, \delta)$-DP.
Definition 2 (Gaussian Mechanism [32]). For a given query function $f$ with $l_2$ sensitivity $\Delta_2 f$, the randomized algorithm $M = f(D) + N(0, \sigma^2)$ satisfies $(\epsilon, \delta)$-DP if
$$\sigma \ge \frac{\Delta_2 f}{\epsilon} \sqrt{2 \ln(1.25/\delta)} \tag{8}$$
where $N(0, \sigma^2)$ is a Gaussian distribution with mean 0 and variance $\sigma^2$, and $\Delta_2 f$ is the $l_2$ sensitivity of query function $f$.
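As a small illustration of how a training node could calibrate its noise, the following Python sketch computes a noise scale and perturbs a vector. It assumes the classical Gaussian mechanism bound from the DP literature, $\sigma = \Delta_2 f \sqrt{2\ln(1.25/\delta)}/\epsilon$; the function names `gaussian_sigma` and `perturb` and the parameter values are hypothetical.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Smallest noise scale allowed by the classical Gaussian mechanism bound
    (assumed form: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon)."""
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturb(vector, sigma, rng):
    """Add independent N(0, sigma^2) noise to every coordinate."""
    return [v + rng.gauss(0.0, sigma) for v in vector]


# Illustrative parameters only; they are not taken from the paper.
sigma = gaussian_sigma(l2_sensitivity=1.0, epsilon=0.5, delta=1e-5)
noisy = perturb([0.1, -0.2, 0.3], sigma, random.Random(0))
```

A smaller privacy budget or a smaller failure probability both increase the noise scale, which is the accuracy cost that the eliminable masks of SHAFL aim to avoid.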

3.3.2. LDP-Based Shuffle Model

The shuffle model disrupts the correlation between the local models and the training nodes through a confusion mechanism, which provides anonymity to the local models [59,61]. An LDP-based shuffle model further enhances privacy preservation and anonymity [58,67]. The LDP shuffle model is shown in Figure 3 and defined as follows.
Definition 3 (LDP-based shuffle model [24]). A randomized mechanism $M$ is an LDP shuffle model if it includes three components: encoder $R$, shuffler $S$, and analyzer (aggregator) $A$ [24]. Consider a shuffler (gateway) that takes the uploads of $n$ training nodes in group $G_o$:
  • Encoder $R: X \to Y^d$ is a randomized algorithm that runs on the training nodes’ side and converts local data $x_i \in X$ into $d$ messages.
  • Shuffler $S: (Y^d)^n \to \hat{Y}^{dn}$ collects the messages uploaded by the $n$ training nodes and outputs them in a random permutation.
  • Aggregator $A: \hat{Y}^{dn} \to Z$ aggregates the randomly permuted messages uploaded by the training nodes to generate a model.
In summary, the shuffle DP can be denoted as
$$\begin{aligned}
M_{y_o} \triangleq A \circ S \circ R(X) &= A\big(S(R(x_1), \dots, R(x_n))\big) \\
&= A\big(S(y_{1,1}, \dots, y_{1,d}, \dots, y_{n,1}, \dots, y_{n,d})\big) \\
&= A\big(y_{1,1}, \dots, y_{z_1,1}, y_{1,2}, \dots, y_{z_2,2}, \dots, y_{1,d}, \dots, y_{z_3,d}\big) = Z
\end{aligned} \tag{9}$$
where $M_{y_o}$ is the privacy-preserving mechanism, $d$ is the number of messages, $z_1, z_2, z_3$ are random numbers, and $Z$ is the uploaded model of the shuffler (gateway). Encoder $R$ satisfies $(\epsilon_0, \delta_0)$-LDP.
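The encoder, shuffler, and aggregator roles can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each of the $d$ messages is a single float rather than a layer tensor, the encoder adds plain Gaussian noise, and the names `encode`, `shuffle`, and `analyze` are hypothetical.

```python
import random


def encode(layers, sigma, rng):
    """Encoder R: turn one node's model into d noisy messages (one per layer)."""
    return [value + rng.gauss(0.0, sigma) for value in layers]


def shuffle(reports, rng):
    """Shuffler S: for every layer position, permute the messages across nodes
    so that a message can no longer be linked to its sender."""
    d = len(reports[0])
    columns = []
    for z in range(d):
        column = [report[z] for report in reports]
        rng.shuffle(column)
        columns.append(column)
    return columns


def analyze(columns):
    """Aggregator A: average each layer over the anonymous messages."""
    return [sum(column) / len(column) for column in columns]


rng = random.Random(0)
local_models = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
# sigma=0.0 switches the noise off so the effect of shuffling alone is visible.
reports = [encode(model, sigma=0.0, rng=rng) for model in local_models]
z_out = analyze(shuffle(reports, rng))
```

Because the aggregator only needs per-layer sums, the permutation applied by the shuffler does not change the aggregated model; it only hides which node produced which message.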

3.3.3. Mask–DP Exchange Protocol

An eliminable noise [68], the mask $\pi$, is generated through the Gaussian mechanism. In turn $t$, the mask exchange protocol is defined as detailed in [30]. The process is as follows:
  • Input the number of training nodes $n$, the number of exchanged noises $m$ with $1 < m < n$, and a set of privacy budgets $\{\epsilon_{c_i}\}$.
  • Each $c_i$ generates a mask $\pi_{c_i}^t$ based on $\{\epsilon_{c_i}\}$ and receives masks $\{\pi_{c_j}^t\}$ from $J = \{c_j\}$, $j = i+1, \dots, i+m \bmod n$.
  • After the exchange step, each $c_i$ aggregates the received masks as $\pi_i^t = \sum_{j \in J} \pi_{c_j}^t$.
  • Each training node $c_i$ generates multi-masks as follows:
    $$M_{i,k}^t = \begin{cases} w_{c_i}^t - \pi_i^t, & \text{if } k = 0 \\ \pi_{c_i}^t, & \text{if } k = 1, 2, 3, \dots, m \end{cases} \tag{10}$$
  • Each $c_i$ sends its multi-masks to the gateway.
The local update $w_{c_i}^t - \pi_i^t$ satisfies $(\epsilon, \delta)$-LDP. In a group $G_o$, $\sum_{i=1}^{n} \sum_{k=0}^{m} M_{i,k}^t = \sum_{i=1}^{n} w_{c_i}^t$, so the server can aggregate the global model without adding perturbation. As shown in Figure 4, consider a group $G_o$ with $n = 4$ training nodes in which the number of noises to be exchanged is $m = n - 1 = 3$. Each training node generates noise $\{\pi_{c_1}^t, \pi_{c_2}^t, \pi_{c_3}^t, \pi_{c_4}^t\}$ based on its privacy budget $\{\epsilon_{c_1}, \epsilon_{c_2}, \epsilon_{c_3}, \epsilon_{c_4}\}$. Following the mask exchange protocol, the multi-mask messages $\{M_{1,0}^t, \dots, M_{4,3}^t\}$ are generated and transmitted to the gateway. The gateway then performs pre-aggregation as follows:
$$\begin{aligned}
\sum_{i=1}^{4} \sum_{k=0}^{3} M_{i,k}^t &= M_{1,0}^t + M_{1,1}^t + M_{1,2}^t + M_{1,3}^t + \dots + M_{4,0}^t + M_{4,1}^t + M_{4,2}^t + M_{4,3}^t \\
&= w_{c_1}^t - \pi_1^t + \pi_{c_1}^t + \pi_{c_1}^t + \pi_{c_1}^t + \dots + w_{c_4}^t - \pi_4^t + \pi_{c_4}^t + \pi_{c_4}^t + \pi_{c_4}^t \\
&= w_{c_1}^t - \pi_{c_2}^t - \pi_{c_3}^t - \pi_{c_4}^t + 3\pi_{c_1}^t + \dots + w_{c_4}^t - \pi_{c_1}^t - \pi_{c_2}^t - \pi_{c_3}^t + 3\pi_{c_4}^t \\
&= w_{c_1}^t + w_{c_2}^t + w_{c_3}^t + w_{c_4}^t = \sum_{i=1}^{4} w_{c_i}^t
\end{aligned} \tag{11}$$
where $w_{c_i}^t$ denotes the local update of training node $c_i$. After pre-aggregation, the noise $\{\pi_{c_1}^t, \pi_{c_2}^t, \pi_{c_3}^t, \pi_{c_4}^t\}$ is eliminated. Therefore, the server can aggregate a global model without perturbation.
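The cancellation of the masks can be checked numerically with a short Python sketch. It assumes zero-based node indices arranged on a ring, flattened models as lists of floats, and a single noise scale `sigma` shared by all nodes; the helper names `mask_exchange` and `pre_aggregate` are hypothetical.

```python
import random


def mask_exchange(updates, m, sigma, rng):
    """Build the m + 1 multi-mask messages of every node.
    Node i receives the masks of its m successors on a ring (an assumption
    about indexing) and subtracts their sum from its own update."""
    n = len(updates)
    dim = len(updates[0])
    masks = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n)]
    messages = []
    for i in range(n):
        received = [masks[(i + step) % n] for step in range(1, m + 1)]
        pi_i = [sum(column) for column in zip(*received)]
        masked_update = [w - p for w, p in zip(updates[i], pi_i)]   # k = 0
        own_masks = [list(masks[i]) for _ in range(m)]              # k = 1..m
        messages.append([masked_update] + own_masks)
    return messages


def pre_aggregate(messages):
    """Gateway side: sum every message of every node, coordinate by coordinate."""
    flat = [message for node in messages for message in node]
    return [sum(column) for column in zip(*flat)]


updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
messages = mask_exchange(updates, m=3, sigma=5.0, rng=random.Random(1))
total = pre_aggregate(messages)
```

Each mask is subtracted by exactly `m` other nodes and uploaded `m` times by its owner, so `total` matches the plain sum of the updates up to floating-point rounding.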

3.3.4. Paillier Homomorphic Encryption

Our scheme is based on Paillier homomorphic encryption (PHE) [69], which is an additive homomorphic encryption scheme. It consists of three algorithms.
  • Key Generation: Select two large prime numbers, $p$ and $q$. Calculate $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, where $\mathrm{lcm}(\cdot)$ denotes the least common multiple. Randomly select $g \in \mathbb{Z}_{n^2}^*$ satisfying $\gcd(L(g^{\lambda} \bmod n^2), n) = 1$, where $\gcd(\cdot)$ denotes the greatest common divisor and $L(x) = \frac{x-1}{n}$. Calculate $\mu = (L(g^{\lambda} \bmod n^2))^{-1} \bmod n$. Output the public key $PK = (n, g)$ and keep the private key $SK = (\lambda, \mu)$.
  • Encryption: Input a plain text $m \in \mathbb{Z}_n$ and select a random number $r \in \mathbb{Z}_n^*$. Output the cipher text $c = Enc(m) = g^m \cdot r^n \bmod n^2$.
  • Decryption: Input a cipher text $c \in \mathbb{Z}_{n^2}^*$. Output the plain text $m = Dec(c) = L(c^{\lambda} \bmod n^2) \cdot \mu \bmod n$.
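The three algorithms can be tried out with a toy Python sketch. It is illustrative only: the primes are far too small for real use, the choice $g = n + 1$ is a common simplification that is assumed here rather than taken from the paper, and the function names are hypothetical. The modular inverse via `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import random


def keygen(p, q):
    """Key generation with the assumed simplification g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    l_value = (pow(g, lam, n * n) - 1) // n            # L(g^lambda mod n^2)
    mu = pow(l_value, -1, n)                           # modular inverse mod n
    return (n, g), (lam, mu)


def encrypt(pk, m, rng):
    """c = g^m * r^n mod n^2 with a random r coprime to n."""
    n, g = pk
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pk, sk, c):
    """m = L(c^lambda mod n^2) * mu mod n."""
    n, _ = pk
    lam, mu = sk
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


rng = random.Random(42)
pk, sk = keygen(61, 53)            # toy primes, insecure by design
c1 = encrypt(pk, 12, rng)
c2 = encrypt(pk, 30, rng)
c_sum = (c1 * c2) % (pk[0] ** 2)   # multiplying ciphertexts adds plaintexts
plain_sum = decrypt(pk, sk, c_sum)
```

The last lines show the additive homomorphism that the gateway relies on: the product of two ciphertexts decrypts to the sum of the two plaintexts, here 12 + 30.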

4. Proposed Framework

This section introduces our proposed secure hierarchical asynchronous federated learning (SHAFL) scheme, including design goals, the SHAFL framework, the shuffle model, and the committee consensus mechanism. Table 1 outlines the notation definitions in this study.

4.1. Design Goals

The design objectives of SHAFL are as follows:
  • Prevent malicious training nodes, gateways, and committee nodes from compromising the local data privacy of training nodes.
  • Solve the problem of $n-1$ collusion attacks among training nodes.
  • Eliminate the impact of noise on global model performance.
  • Prevent malicious training and committee nodes from compromising system robustness and global model performance.

4.2. Framework

The workflow of the SHAFL framework is shown in Figure 5. The SHAFL framework comprises four types of entities: task publishers, committee nodes $U$, training nodes $C$, and gateways $Y$. The task publisher initializes the global model $w^0$ (and rewards) in $block_0$. The committee nodes share the same Paillier homomorphic key pair $upk/usk$ and act as aggregators. They receive messages from gateways, analyze and aggregate them, and then publish the global model $w^t$ and the hyperparameters. Gateways act as shufflers that receive the multi-masks from training nodes and upload the output $Z / M_{y_o}$ of the shuffle model to the Blockchain. All training nodes share the same Paillier homomorphic key pair $cpk/csk$. The training nodes under the same gateway are called a group $G_o$, and the size of the group $G_o$ is $s_o$. The SHAFL framework is presented in Algorithm 1 with six steps:
Algorithm 1 Algorithm of SHAFL
Input: $w^{t-1}, H, \{D_{c_i^o}\}, \epsilon_{c_i^o}, \delta_{c_i^o}, T, O, Y, C, U$
Output: $w^T$
1: Task publisher initializes the global model $w^0$ (and rewards) in $block_0$
2: for $t \in T$ do
3:    for each $c_i \in C$, $y_o \in Y$ do
4:       $G_o$ = Nodes shuffling($c_i$, $y_o$)
5:    end for
6:    $u_q$ sends the signed messages to Blockchain
7:    for each $c_i^o \in G_o$ do
8:       According to $\epsilon_{c_i^o}$, $\delta_{c_i^o}$ and the Gaussian mechanism, $c_i^o$ calculates noise scale $\sigma_{c_i^o}$
9:       $\pi_{c_i}^{t-\tau_o}$ = Mask generating $N(0, \sigma_{c_i^o}^2)$
10:      $c_i^o$ receives $\{\pi_{c_j}^{t-\tau_o}\}$ from other trainers
11:      $c_i^o$ downloads and decrypts signed messages from Blockchain
12:      $w_{c_i^o}^{t-\tau_o}$ = Local training($w^{t-1}$, $D_{c_i^o}$, $H$, $\gamma$)
13:      $\{M_{i,k}^{t-\tau_o}\}$ = Model masking($w_{c_i^o}^{t-\tau_o}$, $\pi_{c_i}^{t-\tau_o}$, $\{\pi_{c_j}^{t-\tau_o}\}$)
14:      $c_i^o$ divides and encrypts $\{M_{i,k}^{t-\tau_o}\}$ to $Enc_{upk}(\{x_{i,k}^{t-\tau_o,z}\})$
15:   end for
16:   for each $y_o \in G_o \in O$ do
17:      $w_{y_o}^{t-\tau_o}$ = Model shuffling($Enc_{upk}(\{x_{i,k}^{t-\tau_o,z}\})$)
18:      $y_o$ signs and saves $w_{y_o}^{t-\tau_o}$ in Blockchain
19:   end for
20:   for each $u_i \in U$ do
21:      Select $u_q$ by $q = hash(block_{t-1}) \bmod M$
22:      $w^t, \{R_{t,o}^{u_q}\}$ = Committee consensus($U$, $w^{t-1}$, $\{w_{y_o}^{t-\tau_o}\}$)
23:   end for
24:   $u_q$ signs and saves $w^t, \{R_{t,o}^{u_q}\}$ in $block_t$
25: end for
26: $t = t + 1$
27: return Outputs
  • Node shuffling: In turn $t$, each training node $c_i \in C$ randomly selects a gateway as its shuffler. Training nodes under the same gateway $y_o$ form a group $G_o \in O$.
  • Mask generating: Training nodes run the mask–DP exchange protocol. According to the differential privacy parameters $\epsilon_{c_i^o}, \delta_{c_i^o}$ and the Gaussian mechanism, $c_i^o$ calculates the noise scale $\sigma_{c_i^o}$ based on Equation (8) and generates masks $\pi_{c_i}^{t-\tau_o}$ from the Gaussian distribution $N(0, \sigma_{c_i^o}^2)$. Then, $c_i^o$ exchanges masks $\pi_{c_i}^{t-\tau_o}$ with other training nodes within its group $G_o$.
  • Local training: All training nodes $\{c_i^o\}$ receive the signed and encrypted message from the gateway, decrypt $Dec_{csk}(Enc_{cpk}(w^{t-1}, \gamma, \tau_{max}, H))$ with private key $csk$, obtain the global model $w^{t-1}$, set their learning rate $\gamma$, and train the global model $w^{t-1}$ with $D_{c_i^o}$ locally using Equations (3) and (4).
  • Model masking: Training node $c_i^o$ subsamples its local update $\{w_{c_i^o}^{t-\tau_o}\}$ and fills the sampled model with the $z$-th dummy layers $\{p_{i,0}^{t-\tau_o,z}\}$ to restore the original model shape. Using the filled model $w_{c_i^o}^{*,t-\tau_o}$ and the masks $\pi_{c_i}^{t-\tau_o}$ and $\{\pi_{c_j}^{t-\tau_o}\}$, the training node generates the multi-mask messages according to Equation (10). The multi-masks $\{M_{i,k}^{t-\tau_o}\}$ are further divided into $d$ layer vectors $\{x_{i,k}^{t-\tau_o,z}\}$, $z \in [1, d]$, according to the shape of the global model. Then, training node $c_i^o$ encrypts these messages with the primary committee node’s public key $upk$ and sends the encrypted messages $Enc_{upk}\{x_{i,k}^{t-\tau_o,z}\}$ to the gateway $y_o$. The subsample, dummy-layer filling, and model masking are given in Algorithm 2. It is worth noting that the masks are additive Gaussian noises; the encrypted model has the same shape and location information as the global model.
  • Model shuffling: After the gateway receives all messages $Enc_{upk}\{x_{i,k}^{t-\tau_o,z}\}$ from the training nodes, it shuffles the encrypted messages $Enc_{upk}(x_{i,k}^{t-\tau_o,z})$ using Equation (9) and retains the location information of the layers. Then, the gateway generates a new model $w_{y_o}^{t-\tau_o}$ and sends it to the Blockchain asynchronously with a delay $\tau_o$. If $\tau_o > \tau_{max}$, the stale model is discarded.
  • Committee consensus: Committee nodes $U$ select a primary node $u_q$. Primary committee node $u_q$ downloads $\eta_t$ local updates $\{w_{y_o}^{t-\tau_o}\}$ from the Blockchain and decrypts them to obtain $\{w_{y_o}^{t-\tau_o,*}\}$. Then, $u_q$ scores the models $\{w_{y_o}^{t-\tau_o,*}\}$ and signs and broadcasts the scores $\{R_{t,o}^{u_q}\}$ to the other committee nodes. The other committee nodes then re-score the models and reach a consensus on the scores $\{R_{t,o}^{u_q}\}$. Once a consensus is reached, primary $u_q$ aggregates the local updates $\{w_{y_o}^{t-\tau_o,*}\}$ as
    $$w^{t} = (1-\alpha)\, w^{t-1} + \alpha\, \frac{\sum_{o=1}^{\eta_t} R_{t,o}^{u_q}\, w_{y_o}^{t-\tau_o,*}}{\sum_{o=1}^{\eta_t} R_{t,o}^{u_q}} \tag{12}$$
    where $\alpha$ denotes the hyperparameter of secure aggregation, and $\eta_t$ is the number of local updates uploaded by gateways in turn $t$. Primary committee node $u_q$ encrypts and uploads the new global model $Enc_{cpk}(w^t)$ to the Blockchain for the next turn $t+1$.
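The primary selection rule $q = hash(block_{t-1}) \bmod M$ from Algorithm 1 can be sketched as follows. The paper does not name the hash function at this point, so SHA-256 is an assumption, as are the interpretation of $M$ as the number of committee nodes and the hypothetical function name `select_primary`.

```python
import hashlib


def select_primary(prev_block_bytes, committee_size):
    """Derive the index of the primary committee node from the previous block.
    SHA-256 and the byte serialization of the block are assumptions."""
    digest = hashlib.sha256(prev_block_bytes).digest()
    return int.from_bytes(digest, "big") % committee_size


# Hypothetical serialized block content, used only for illustration.
q = select_primary(b"block_{t-1} serialized contents", committee_size=7)
```

Because every committee node can recompute the hash of the previous block, all honest nodes derive the same primary without extra communication.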
Algorithm 2 Model masking
Input: $w_{c_i^o}^{t-\tau_o}, \pi_{c_i}^{t}, \{\pi_{c_j}^{t}\}, \epsilon, \varphi$
Output: $Enc_{upk}(x_{i,k}^{t-\tau_o,z})$
1: for each $G_o \in O$ do
2:    for each $c_i^o \in G_o$ do
3:       Calculate mask $\pi_i^{t-\tau_o} = \sum_{c_j \in G_o} \pi_{c_j,i}^{t-\tau_o}$
4:       for $z \le d$ do
5:          if $z \bmod \frac{d}{s_o} = i$ then
6:             Drop $x_{i,0}^{t-\tau_o,z}$ and evaluate $\sigma$ by Equation (8)
7:             Generate dummy layer $p_{i,0}^{t-\tau_o,z} = N(0, \sigma_{c_i}^2)$
8:             $x_{i,0}^{t-\tau_o,z} \leftarrow p_{i,0}^{t-\tau_o,z}$
9:          end if
10:         $z = z + 1$
11:      end for
12:      Generate masks $\{M_{i,k}^{t-\tau_o}\}$ with $w_{c_i^o}^{*,t-\tau_o}$ by Equation (10)
13:      Divide masks as $\{x_{i,k}^{t-\tau_o,z}\} \leftarrow \{M_{i,k}^{t-\tau_o,z}\}$
14:      for $z \le d$ do
15:         $x_{i,k}^{t-\tau_o,z} \leftarrow \varphi \cdot x_{i,k}^{t-\tau_o,z}$
16:      end for
17:      Encrypt $\{x_{i,k}^{t-\tau_o,z}\}$ to $Enc_{upk}(x_{i,k}^{t-\tau_o,z})$ using the primary committee node’s public key $upk$
18:   end for
19:   $c_i^o \in G_o$ send the encrypted messages $Enc_{upk}(x_{i,k}^{t-\tau_o,z})$ to gateway $y_o$ synchronously
20: end for
21: return Outputs
Once $t$ reaches the set parameter $T$ or the global model $w^t$ converges, the FL ends.

4.3. Multi-Shuffle with Subsample and Dummy Layers

To enhance the LDP guarantee, the SHAFL framework combines the shuffle model with subsampling and dummy-layer filling. These privacy-enhancing mechanisms reduce the noise level required in local updates while preserving the performance of the models uploaded by training nodes [67]. However, subsampling leaves some model layers missing, which makes it difficult for the committee node to combine and aggregate the local updates into a usable model [24]. The SHAFL framework therefore introduces dummy layers to keep the model valid.
The subsample, dummy-layer filling, and model masking are shown in Figure 6 and Algorithm 2. After local training, each training node first performs continuous layer subsampling on its local update $w_{c_i^o}^{t-\tau_o}$ and drops some model layers. It then evaluates the noise scale $\sigma$ of the Gaussian noise using Equation (8) and fills each dropped layer $x_{i,0}^{t-\tau_o,z}$ with a dummy layer $p_{i,0}^{t-\tau_o,z}$ generated from Gaussian noise, $p_{i,0}^{t-\tau_o,z} = N(0, \sigma^2)$, where
$$x_{i,0}^{t-\tau_o,z} \leftarrow p_{i,0}^{t-\tau_o,z}, \quad z = i + r\,\frac{d}{s_o}, \quad r = 0, 1, 2, \dots$$
The filled model is denoted as $w_{c_i^o}^{*,t-\tau_o}$. According to the mask–DP exchange protocol, the training node generates the multi-masks $\{M_{i,k}^{t-\tau_o}\}$ using masks $\pi_{c_i}^{t-\tau_o}$ and $\{\pi_{c_j}^{t-\tau_o}\}$. It is worth noting that $k \in [0, m]$, so each training node generates $m + 1$ multi-masks $\{M_{i,k}^{t-\tau_o}\}$.
Before uploading the masks $\{M_{i,k}^{t-\tau_o}\}$ to the gateway, the training node divides them layer by layer:
$$x_{i,k}^{t-\tau_o,z} = M_{i,k}^{t-\tau_o,z}$$
where $x_{i,k}^{t-\tau_o,z}$ denotes the $z$-th layer vector of $M_{i,k}^{t-\tau_o}$, with $i \in [1, s_o]$, $k \in [0, m]$, and $z \in [1, d]$. The entries of $x_{i,k}^{t-\tau_o,z}$ are floating-point numbers, which cannot be encrypted directly with PHE. Therefore, the layer vector $x_{i,k}^{t-\tau_o,z}$ is scaled to integers through quantization, $x_{i,k}^{t-\tau_o,z} \leftarrow \varphi \cdot x_{i,k}^{t-\tau_o,z}$, where $\varphi$ denotes the quantization scale. Then, using primary committee node $u_q$’s homomorphic public key $upk$, the training node encrypts the layer vectors $x_{i,k}^{t-\tau_o,z}$ to $Enc_{upk}\{x_{i,k}^{t-\tau_o,z}\}$ layer by layer and uploads them to gateway $y_o$, which receives $d\, s_o (m + 1)$ messages in total from its group.
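A minimal Python sketch of the quantization step is given below. The scaling by the quantization factor follows the text; rounding to the nearest integer and representing negative values as residues modulo the Paillier modulus $n$ are assumptions, since the paper does not spell out these details here. The names `quantize` and `dequantize` and the stand-in modulus are hypothetical.

```python
def quantize(x, phi, n):
    """Scale a float by phi, round to an integer, and wrap it into Z_n.
    Negative values become residues above n / 2 (an assumed encoding)."""
    return round(x * phi) % n


def dequantize(v, phi, n):
    """Undo the wrap-around and the scaling."""
    if v > n // 2:
        v -= n
    return v / phi


n = 2**61 - 1          # stand-in modulus, not a real Paillier modulus
phi = 10**6            # illustrative quantization scale
a, b = 0.125, -0.5
# Adding quantized values modulo n mirrors what additive HE does to plaintexts.
summed = (quantize(a, phi, n) + quantize(b, phi, n)) % n
restored = dequantize(summed, phi, n)
```

The scale must be chosen so that the sum of all quantized messages stays below half of the modulus; otherwise positive and negative results can no longer be told apart after decryption.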
The gateway (shuffler) receives messages from training nodes and shuffles them by layer. Specifically, the gateway collectively shuffles the order of the layer vectors $Enc_{upk}\{x_{i,k}^{t-\tau_o,z}\}$ from all clients at the same layer. After shuffling, according to the hierarchical relationship, the encrypted layer vectors are stored in order to form a new local update $w_{y_o}^{t-\tau_o}$. Then, the gateway uploads it to the Blockchain. The local update $w_{y_o}^{t-\tau_o}$ before aggregating is denoted as
$$\begin{aligned}
M_{y_o} = w_{y_o}^{t-\tau_o} &= A \circ S \circ R^{d s_o (m+1)}(X) \\
&= A\big(S_{y_o}\big(R_1^{d(m+1)}(X), \dots, R_{s_o}^{d(m+1)}(X)\big)\big) \\
&= A\big(S_{y_o}\big(R_1^{d(m+1)}\big(Enc_{upk}(M_{1,0}^{t-\tau_o}, M_{1,1}^{t-\tau_o}, \dots, M_{1,m}^{t-\tau_o})\big), \dots, R_{s_o}^{d(m+1)}\big(Enc_{upk}(M_{s_o,0}^{t-\tau_o}, \dots, M_{s_o,m}^{t-\tau_o})\big)\big)\big) \\
&= A\big(S_{y_o}\big(Enc_{upk}(x_{1,0}^{t-\tau_o,1}, \dots, x_{1,0}^{t-\tau_o,d}, \dots, x_{1,m}^{t-\tau_o,1}, \dots, x_{1,m}^{t-\tau_o,d}, \dots, x_{s_o,0}^{t-\tau_o,1}, \dots, x_{s_o,0}^{t-\tau_o,d}, \dots, x_{s_o,m}^{t-\tau_o,1}, \dots, x_{s_o,m}^{t-\tau_o,d})\big)\big) = Z
\end{aligned}$$
where m is the number of multi-masks, and x s o , m t τ o , z is the z t h layer vector of mask M s o , m t τ o in turn t before encryption. The number of training node messages is d ( m + 1 ) . The z t h layer of the new local update w y o t τ o is
$$\theta^{t-\tau_o,z} = \sum_{i=1}^{s_o} \sum_{k=0}^{m} Enc_{u_{pk}}(x_{i,k}^{t-\tau_o,z})$$
Due to homomorphism, $Dec(\theta^{t-\tau_o,z}) = \sum_{i=1}^{s_o} \sum_{k=0}^{m} x_{i,k}^{t-\tau_o,z}$.
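The additive homomorphism used here can be checked with a toy Paillier implementation. The sketch below is an illustration under stated assumptions, not the paper's implementation: the key is built from two small Mersenne primes (insecure, chosen only so the example runs instantly), the generator is fixed to n + 1, and the integer values stand in for quantized layer entries. In Paillier, the homomorphic sum is realized by multiplying ciphertexts modulo n².

```python
import math
import random

# Two Mersenne primes: convenient for a demo, far too small for real security.
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p: int = P, q: int = Q):
    """Return (n, (lam, mu)) for textbook Paillier with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Enc(m) = (1 + m*n) * r^n mod n^2; negative m is encoded modulo n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n: int, priv, c: int) -> int:
    """Recover m and map residues above n/2 back to negative integers."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    m = (u - 1) // n * mu % n
    return m - n if m > n // 2 else m


def add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiply the two ciphertexts modulo n^2."""
    return c1 * c2 % (n * n)


rng = random.Random(42)
n, priv = keygen()
values = [125000, -333333, 42, 7]  # hypothetical quantized entries of one layer
theta = encrypt(n, 0, rng)
for v in values:
    theta = add(n, theta, encrypt(n, v, rng))
decrypted = decrypt(n, priv, theta)
print(decrypted, sum(values))
```

The aggregate decrypts to the plain sum only while that sum stays below n/2 in absolute value, which is one reason the quantization scale and the key size have to be chosen together.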
The SHAFL framework uses the gateways as shufflers. The shuffling and pre-aggregation of model $w_{y_o}^{t-\tau_o}$ are shown in Figure 6 and Algorithm 3.
Algorithm 3 Model shuffling
Input: $Y$, $Enc_{u_{pk}}(x_{i,k}^{t-\tau_o,z})$
Output: $w_{y_o}^{t-\tau_o}$
1: for each $y_o \in Y$ do
2:    Receive the encrypted messages $\{Enc_{u_{pk}}(x_{i,k}^{t-\tau_o,z})\}$
3:    for $z \in [1, d]$ do
4:      Shuffle $\{Enc_{u_{pk}}(x_{i,k}^{t-\tau_o,z})\}$ by Equation (9)
5:      $\theta^{t-\tau_o,z} = \sum_{i=1}^{s_o} \sum_{k=0}^{m} Enc_{u_{pk}}(x_{i,k}^{t-\tau_o,z})$
6:      $w_{y_o}^{t-\tau_o,z} \leftarrow \theta^{t-\tau_o,z}$
7:    end for
8:    $y_o$ uploads local update $w_{y_o}^{t-\tau_o}$ to the Blockchain asynchronously
9: end for
10: return $w_{y_o}^{t-\tau_o}$
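The gateway-side loop of Algorithm 3 can be sketched as follows. This is a simplified illustration under several assumptions: plain integers stand in for ciphertexts, a plain sum stands in for the homomorphic sum, each layer is a single number rather than a vector, and the function name `shuffle_and_preaggregate` is hypothetical.

```python
import random
from typing import List

Message = List[int]  # one message: d quantized layer values (ciphertext stand-ins)


def shuffle_and_preaggregate(messages: List[Message], rng: random.Random) -> List[int]:
    """Shuffle the entries of each layer across all messages, then sum them."""
    d = len(messages[0])
    update = []
    for z in range(d):
        layer_z = [msg[z] for msg in messages]  # layer z from every message
        rng.shuffle(layer_z)                    # hide which sender held which slot
        update.append(sum(layer_z))             # stand-in for the homomorphic sum
    return update


# Two training nodes, m = 1 (two messages per node), d = 3 layers.
messages = [
    [3, -1, 4], [1, 5, -9],   # node 1
    [2, 6, 5], [-3, 5, 8],    # node 2
]
update_a = shuffle_and_preaggregate(messages, random.Random(1))
update_b = shuffle_and_preaggregate(messages, random.Random(2))
print(update_a, update_b)
```

Because the sum is order-independent, the two runs return the same pre-aggregated update even though the per-layer permutations differ, which is what lets the shuffle hide the sender of each message without changing the aggregate.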

4.4. Committee Consensus

The committee consensus is shown in Algorithm 4. In the $t$-th round of the SHAFL framework, the primary committee node $u_q$ first downloads $\eta_t$ encrypted local updates $\{w_{y_o}^{t-\tau_o}\}$ from the Blockchain and decrypts them layer by layer using the homomorphic private key $u_{sk}$. Since the layer vectors $\{x_{i,k}^{t-\tau_o,z}\}$ were quantized before encryption, mapping floating-point numbers to integers, the decrypted layer vectors $\{\theta^{t-\tau_o,*,z}\}$ must be dequantized to restore their original floating-point format. According to the hierarchical relationship, the layer vectors $\{\theta^{t-\tau_o,*,z}\}$ are then reassembled into a decrypted local update $w_{y_o}^{t-\tau_o,*}$. Next, the primary committee node $u_q$ tests each local update $w_{y_o}^{t-\tau_o,*}$ and the global model $w_{t-1}$ on the committee nodes' local dataset $D_{test}$ to obtain the accuracies $A_L$ and $A_G$, respectively. Using $A_L$, $A_G$, and the delay $\tau_o$, $u_q$ calculates the score $R_{t,o}^{u_q}$ for each local update $w_{y_o}^{t-\tau_o,*}$ as
R t , o u q = ( A L ( A L + A G ) τ )
After scoring, the primary committee node $u_q$ sends the scores of the local updates $\{R_{t,o}^{u_q}\}$ to the other committee nodes $\{u_i\}$. Each $u_i$ downloads $w_{y_o}^{t-\tau_o,*}$, re-scores it, and takes part in a PBFT consensus on the scores $\{R_{t,o}^{u_q}\}$. Once consensus is reached, $u_q$ aggregates the local updates $\{w_{y_o}^{t-\tau_o,*}\}$ into a new global model $w_t$ through secure aggregation using Equation (12). Then, $u_q$ encrypts the global model and uploads $Enc_{c_{pk}}(w_t)$ to $block_t$.
Algorithm 4 Committee consensus
Input: $U$, $\{w_{y_o}^{t-\tau_o}\}$, $w_{t-1}$, $D_{test}$, $s_o$, $\varphi$
Output: $w_t$, $\{R_{t,o}^{u_q}\}$
1: Primary committee node $u_q$ downloads $\eta_t$ local updates $\{w_{y_o}^{t-\tau_o}\}$ from the Blockchain and decrypts them
2: for $o \le \eta_t$ do
3:    for $z \le d$ do
4:      $\theta^{t-\tau_o,*,z} = Dec_{u_{sk}}(\theta^{t-\tau_o,z}) = \sum_{i=1}^{s_o} \sum_{k=0}^{m} x_{i,k}^{t-\tau_o,z}$
5:      Dequantization:
6:      $\theta^{t-\tau_o,*,z} = \frac{\theta^{t-\tau_o,*,z}}{s_o \cdot \varphi}$
7:      $w_{y_o}^{t-\tau_o,*,z} \leftarrow \theta^{t-\tau_o,*,z}$
8:      $z = z + 1$
9:    end for
10:   Obtain the decrypted local update $w_{y_o}^{t-\tau_o,*}$ of gateway $y_o$
11:   $o = o + 1$
12: end for
13: Test the accuracy of the global model $w_{t-1}$ on $D_{test}$ to obtain $A_G$
14: for $o \le \eta_t$ do
15:    $u_q$ tests the accuracy of $w_{y_o}^{t-\tau_o,*}$ on $D_{test}$ to obtain $A_L$
16:    Calculate the score $R_{t,o}^{u_q}$ of model $w_{y_o}^{t-\tau_o,*}$ by Equation (17)
17:    $o = o + 1$
18: end for
19: Send the scores to all committee nodes $u_i$
20: for $u_i \in U$ do
21:    Re-score each local update $w_{y_o}^{t-\tau_o,*}$ on $D_{test}$
22:    Send the scores $\{R_{t,o}^{u_i}\}$ to the other committee nodes
23: end for
24: All $u_i \in U$ reach a consensus on the scores $\{R_{t,o}^{u_q}\}$
25: Primary node $u_q$ performs secure aggregation by Equation (12)
26: $u_q$ encrypts the global model to $Enc_{c_{pk}}(w_t)$ with the training nodes' public key
27: Upload $Enc_{c_{pk}}(w_t)$ to $block_t$
28: return $w_t$, $\{R_{t,o}^{u_q}\}$
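The aggregation in step 25 can be sketched once the scores are agreed. Equation (12) is defined earlier in the paper and is not restated here, so the mixing rule below is an assumption taken from the first line of the convergence bound in Section 5: the new global model is (1 − α) times the previous global model plus α times the score-weighted average of the local updates. The function name `aggregate`, the flat-list model representation, and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def aggregate(prev_global: Sequence[float],
              updates: Sequence[Sequence[float]],
              scores: Sequence[float],
              alpha: float) -> List[float]:
    """Mix the previous global model with the score-weighted mean of the updates."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one update needs a positive score")
    weighted_mean = [
        sum(r * w[j] for r, w in zip(scores, updates)) / total
        for j in range(len(prev_global))
    ]
    return [(1 - alpha) * g + alpha * u for g, u in zip(prev_global, weighted_mean)]


prev_global = [0.0, 0.0]
updates = [[1.0, 2.0], [3.0, 4.0]]   # two decrypted gateway updates
scores = [3.0, 1.0]                  # agreed scores; the first update is trusted more
new_global = aggregate(prev_global, updates, scores, alpha=0.5)
print(new_global)
```

An update with a low score, whether because of poor test accuracy or a long delay, contributes proportionally less to the weighted mean, which is the mechanism the robustness argument in Section 6.2 relies on.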

5. Convergence Analysis

In this section, we present the theorem and proof for the convergence analysis of the SHAFL framework.
Definition 4 
(L-smooth [1]). A function $f$ is $L$-smooth if, for all $x, y \in \mathbb{R}^N$ with $x \ne y$,
$$f(y) \le f(x) + (y - x)^{\top} \nabla f(x) + \frac{L}{2} \|y - x\|^2$$
Definition 5 
($\mu$-strongly convex [1]). A function $f$ is $\mu$-strongly convex if, for all $x, y \in \mathbb{R}^N$ with $x \ne y$,
$$f(y) \ge f(x) + (y - x)^{\top} \nabla f(x) + \frac{\mu}{2} \|y - x\|^2$$
Theorem 1. 
Assume the global loss function $F$ is $L$-smooth and $\mu$-strongly convex. For the group $G_o$, let the learning rate be $\gamma < \frac{1}{L}$ and the number of local iterations be $H \in [H_{min}, H_{max}]$. For all $w \in \mathbb{R}^d$ and $\xi \sim D$, the expected squared norm of the gradients is bounded:
$$E_{\xi \sim D} \|\nabla f(w, \xi)\|^2 \le Q$$
For the initial global model w 0 and optimization model w *
F ( w 0 ) F ( w * ) ( 1 α ) T γ 2 H Q
After $T$ turns, the convergence bound of the global loss function is
E [ F ( w T ) F ( w * ) ] F ( w 0 ) F ( w * ) + [ 1 ( 1 α ) T ] γ 2 H Q
Proof of Theorem 1. 
Since prior studies [1,70] have established the convergence analysis for hierarchical asynchronous federated learning frameworks, we focus here on the components that are specific to this study. For a training node $c_i^o$ in an arbitrary group $G_o$, after performing $H$ local updates, the convergence bound is
$$E[F(w_{c_i^o}^{t-\tau_o,H}) - F(w^*)] \le F(w_{c_i^o}^{t-\tau_o,0}) - F(w^*) - \frac{\gamma}{2} H Q$$
where $w_{c_i^o}^{t-\tau_o,H}$ is derived from $w_{c_i^o}^{t-\tau_o,0}$ by $H$ iterations. Then, the committee nodes aggregate $\eta_t$ local updates $\{w_{y_o}^{t-\tau_o}\}$ from the gateways to obtain a new global model $w_t$. Thus, the convergence bound of the SHAFL framework after $t$ turns is
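$$
\begin{aligned}
E[F(w_t) - F(w^*)] &\le (1-\alpha) F(w_{t-1}) + \alpha E\, F\!\left(\frac{\sum_{o=1}^{\eta_t} R_{t,o}^{u_i} w_{y_o}^{t-\tau_o}}{\sum_{o=1}^{\eta_t} R_{t,o}^{u_i}}\right) - F(w^*) \\
&\le (1-\alpha) F(w_{t-1}) + \alpha \frac{\sum_{o=1}^{\eta_t} R_{t,o}^{u_i} F(w_{y_o}^{t-\tau_o})}{\sum_{o=1}^{\eta_t} R_{t,o}^{u_i}} - F(w^*) \\
&\le (1-\alpha) F(w_{t-1}) + \alpha E\, F(w_{y_o}^{t-\tau_o}) - F(w^*) \\
&\le (1-\alpha) F(w_{t-1}) + \alpha E\, F\!\left(\frac{\sum_{i=1}^{s_o} |D_{c_i^o}| \sum_{k=1}^{m} M_{i,k}^{t-\tau_o}}{\sum_{i=1}^{s_o} |D_{c_i^o}|}\right) - F(w^*) \\
&\le (1-\alpha) F(w_{t-1}) + \alpha E\, F\!\left(\frac{\sum_{i=1}^{s_o} |D_{c_i^o}| w_{c_i^o}^{t-\tau_o}}{\sum_{i=1}^{s_o} |D_{c_i^o}|}\right) - F(w^*) \\
&\le (1-\alpha) F(w_{t-1}) + \alpha E\, F(w_{c_i^o}^{t-\tau_o}) - F(w^*) \\
&\le (1-\alpha) [F(w_{t-1}) - F(w^*)] + \alpha E[F(w_{c_i^o}^{t-\tau_o}) - F(w^*)]
\end{aligned}
$$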
Using Equations (23) and (24), after performing T global turns, the convergence bound of the SHAFL framework is
E [ F ( w T ) F ( w * ) ] ( 1 α ) [ F ( w T 1 ) F ( w * ) ] + α E [ F ( w c i o T τ o ) F ( w * ) ] ( 1 α ) ( 1 α ) [ F ( w T 2 ) F ( w * ) ] + α E [ F ( w c i o T 1 τ o ) F ( w * ) ]         + α E [ F ( w c i o T τ o ) F ( w * ) ] ( 1 α ) ( 1 α ) [ F ( w T 2 ) F ( w * ) ] + ( 1 α ) α E [ F ( w c i o T 1 τ o ) F ( w * ) ]         + α E [ F ( w c i o T τ o ) F ( w * ) ] ( 1 α ) T [ F ( w 0 ) F ( w * ) ] + [ α + ( 1 α ) α + + ( T 1 ) ( 1 α ) α ] E [ F ( w c i o t τ o , H )         F ( w * ) ] ( 1 α ) T [ F ( w 0 ) F ( w * ) ] + [ 1 ( 1 α ) T ] [ F ( w c i o t τ o , 0 ) F ( w * ) γ 2 H Q ] ( 1 α ) T [ F ( w 0 ) F ( w * ) ] + [ 1 ( 1 α ) T ] [ F ( w 0 ) F ( w * ) γ 2 H Q ] F ( w 0 ) F ( w * ) + [ 1 ( 1 α ) T ] γ 2 H Q ]
Thus, Theorem 1 derives the convergence bound after T turns. □
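The per-node bound in the proof subtracts a term that scales with the learning rate and the number of local iterations. A term of this shape commonly arises from the descent property of $L$-smooth functions. As a sketch only, assume one exact (non-stochastic) gradient step $w^{+} = w - \gamma \nabla F(w)$ with $\gamma \le 1/L$; this is a simplification of the stochastic local updates analyzed in [1,70], and $w^{+}$ is notation introduced here for illustration. Definition 4 then gives:

```latex
\begin{aligned}
F(w^{+}) &\le F(w) + (w^{+} - w)^{\top} \nabla F(w) + \frac{L}{2} \lVert w^{+} - w \rVert^{2} \\
         &= F(w) - \gamma \lVert \nabla F(w) \rVert^{2} + \frac{L \gamma^{2}}{2} \lVert \nabla F(w) \rVert^{2} \\
         &\le F(w) - \frac{\gamma}{2} \lVert \nabla F(w) \rVert^{2},
         \qquad \text{since } \frac{L \gamma^{2}}{2} \le \frac{\gamma}{2} \text{ when } \gamma \le \frac{1}{L}.
\end{aligned}
```

Each step therefore lowers the loss by at least $\frac{\gamma}{2}$ times the squared gradient norm, and summing such decreases over $H$ local iterations yields a total that grows linearly in $H$.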

6. Security Analysis

This section describes the security analysis of the SHAFL framework as follows: privacy-preserving analysis, system robustness analysis, and model security analysis.

6.1. Privacy-Preserving Analysis

Lemma 1 
(Amplification by shuffling [58]). Let $R$ be an $\epsilon_0$-LDP mechanism. Then, the shuffle model $M(x_1, \ldots, x_n) := A \circ S(R(x_1), \ldots, R(x_n))$ satisfies $(\epsilon, \delta)$-DP, where, if
$$\epsilon_0 \le \frac{\log(n / \log(2/\delta))}{2}$$
for any $\delta > 0$, then
$$\epsilon = O\left(\min\{\epsilon_0, 1\}\, e^{\epsilon_0} \sqrt{\frac{\log(1/\delta)}{n}}\right).$$
Lemma 2 
(Amplification by subsampling [60]). If $M: X^n \to Y$ satisfies $(\epsilon, \delta)$-DP with respect to the relationship on sets of size $n$, then $M': X^m \to Y$ satisfies $(\log(1 + (n/m)(e^{\epsilon} - 1)), (n/m)\delta)$-DP.
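The two lemmas can be turned into a small calculator to see how the parameters move. This is a hedged sketch: `rate` denotes the sampling ratio that Lemma 2 writes as n/m, the shuffle function returns only the expression inside the O(·) of Lemma 1 with the hidden constant omitted, and the message count of 10,000 is a hypothetical value. The inputs ϵ = 1, δ = 10⁻³, and a rate of 0.9 are the settings reported in Section 7.2.1.

```python
import math
from typing import Tuple


def subsample_amplify(eps: float, delta: float, rate: float) -> Tuple[float, float]:
    """Lemma 2: privacy parameters after subsampling at the given rate."""
    return math.log(1.0 + rate * (math.exp(eps) - 1.0)), rate * delta


def shuffle_amplify_order(eps0: float, delta: float, n: int) -> float:
    """Lemma 1: the expression inside O(.), without the hidden constant."""
    limit = math.log(n / math.log(2.0 / delta)) / 2.0
    if eps0 > limit:
        raise ValueError("eps0 exceeds the range in which the lemma applies")
    return min(eps0, 1.0) * math.exp(eps0) * math.sqrt(math.log(1.0 / delta) / n)


eps_sub, delta_sub = subsample_amplify(eps=1.0, delta=1e-3, rate=0.9)
eps_order = shuffle_amplify_order(eps_sub, delta_sub, n=10_000)  # n is hypothetical
print(eps_sub, delta_sub, eps_order)
```

With these inputs the subsampled ϵ falls slightly below 1, and the shuffle expression is smaller again by roughly the square root of the number of shuffled messages, which is the qualitative behavior Theorem 2 formalizes.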
Theorem 2. 
In the SHAFL framework, the training node employs the Gaussian mechanism-based $(\epsilon_l, \delta_l)$-LDP to preserve data privacy. Through the integration of the shuffle model and subsampling, the privacy parameters of the local model satisfy
ϵ c = O ( m i n { log ( 1 + ( d s o 1 ) ( e ϵ l 1 ) ) , 1 } [ 1 + ( d s o 1 ) ( e ϵ l 1 ) ] log ( 1 / ( d s o 1 ) δ l ) ( m + 1 ) d )
δ c = ( d s o 1 ) δ l
Equations (28) and (29) give the conversion between the local privacy parameters $(\epsilon_l, \delta_l)$ and the central differential privacy parameters $(\epsilon_c, \delta_c)$.
Proof of Theorem 2. 
In Algorithm 2, d / d s o layers of the model are dropped and replaced with dummy layers { p i , 0 t τ o , z } . Therefore, the SHAFL framework samples d d / d s o layers from the model parameter space. According to Lemma 2, the local model satisfies
ϵ s u b = log ( 1 + d d / d s o d ( e ϵ l 1 ) ) = log ( 1 + ( d s o 1 ) ( e ϵ l 1 ) )
δ s u b = d d / d s o d δ l = ( d s o 1 ) δ l
where d d / d s o d is the subsampling rate. After subsampling, the local model satisfies ( ϵ s u b , δ s u b ) -DP. Since the training nodes send the subsampled model to the gateway, the gateway performs a random permutation on the subsampled model. According to Lemma 1, the local model processed with subsampling and shuffling satisfies
ϵ c = O ( m i n { ϵ s u b , 1 } e ϵ s u b log ( 1 / δ s u b ) ( m + 1 ) d ) = O ( m i n { ϵ s u b , 1 } e ϵ s u b log ( 1 / ( d s o 1 ) δ l ) ( m + 1 ) d ) = O ( m i n { log ( 1 + ( d s o 1 ) ( e ϵ l 1 ) ) , 1 } [ 1 + ( d s o 1 ) ( e ϵ l 1 ) ] log ( 1 / ( d s o 1 ) δ l ) ( m + 1 ) d )
δ c = δ s u b = ( d s o 1 ) δ l
as shown in Theorem 2. □

6.2. System Robustness Analysis

The SHAFL framework introduces a novel secure aggregation algorithm. Before the committee nodes aggregate a new global model $w_t$, the primary committee node $u_q$ evaluates the test accuracy of each local update $\{w_{y_o}^{t-\tau_o}\}$ using a globally shared test dataset $D_{test}$. The algorithm then calculates a score for each model based on its accuracy and delay $\tau_o$, which serves as the aggregation weight of $\{w_{y_o}^{t-\tau_o}\}$. A suboptimal model uploaded by a malicious training node achieves low test accuracy and therefore receives a low aggregation weight. In this way, the secure aggregation algorithm mitigates the impact of malicious training nodes on the global model's performance. In the later rounds of training, the accuracies of the global model and the local models become high and similar, so the score is then dominated by the delay: the aggregation weight of a model uploaded by $y_o$ with a high delay is significantly lower than that of timely models, which mitigates the detrimental impact of stale models on the global model.
In the event of node disconnections after the mask–DP exchange protocol, the mask–DP introduced by the SHAFL framework is equivalent to a Gaussian noise-based DP. When training node $c_i^o$ is offline, each training node generates noise with a variance of
σ = m σ c i o s o 1 + n = i i + m σ c n m + 1
where σ is the variance of noises π , and m is the number of exchange masks.

6.3. Model Security Analysis

The proposed SHAFL framework combines consortium Blockchain technology, HE, and DP-based masks to ensure the privacy and security of the training nodes' local data. The consortium Blockchain, as a private chain, restricts data access to authorized nodes only, thereby mitigating privacy threats from external nodes. Within the SHAFL framework, the committee nodes and the training nodes possess the homomorphic key pairs $\langle u_{pk}, u_{sk} \rangle$ and $\langle c_{pk}, c_{sk} \rangle$, respectively. When committee nodes distribute the global model to training nodes via an intermediate gateway node, they encrypt the global model with the training nodes' homomorphic public key $c_{pk}$, so no party other than the training nodes can access the global model. Before sending local updates to the gateway, all training nodes encrypt their messages using the committee nodes' homomorphic public key $u_{pk}$, preventing the gateway and other training nodes from extracting any original model information. The gateway shuffles the messages received from training nodes, which disrupts the mapping between messages and training nodes. The committee nodes therefore receive only the shuffled model from the gateway, not the original model from any training node. If the gateway and committee nodes collude, they can use the committee nodes' private key $u_{sk}$ to decrypt the local updates uploaded by the training nodes. However, since these local updates are masked, the colluding parties still cannot obtain the training nodes' original local updates.

7. Experiments

7.1. Experimental Setting

7.1.1. Benchmarks

The baseline algorithms used in the experiments are introduced as follows.
  • FedAvg [62], as the canonical synchronous federated learning framework, was adopted as the baseline comparative scheme in our experiments. This implementation deliberately excludes privacy-preserving mechanisms and Byzantine fault tolerance capabilities.
  • DP–FedAvg [71] is a privacy-preserving federated learning framework based on LDP. By injecting noise into their local models, training nodes ensure that the uploaded local models satisfy LDP requirements, thereby defending against inference attacks from the server.
  • FedSDP [24] is a synchronous privacy-preserving federated learning framework designed for the Internet of Vehicles (IoV), which enhances privacy and improves data utility through a tripartite mechanism that combines Top-k gradient subsampling, virtual point padding, and shuffle-based anonymization.
  • MSFL [61] is a privacy-preserving federated learning framework that synergistically integrates multi-stage shuffling mechanisms and Byzantine-resilient consensus algorithms. It enhances privacy by shuffling training nodes and local updates.
  • PBFL [27] is a synchronous, centralized privacy-preserving federated learning framework that achieves privacy-preserving through HE and ensures Byzantine fault tolerance via cosine similarity-based gradient validation.
  • PPAFL [18] is an asynchronous privacy-preserving federated learning framework that implements LDP via the Laplace mechanism.
  • RAFLS [34] is an RDP-based adaptive FL scheme. It uses the sensitivity of different layers’ weights to determine the amount of noise injected into the model, adopts a model-parameter shuffling mechanism to achieve local model anonymity, and proposes a fine-grained model-weight aggregation scheme.
Table 2 compares the computational complexity of the evaluated schemes from three aspects: local training, aggregation, and privacy preservation. The marginally higher local training cost of the FedSDP scheme is a consequence of the additional Top-k sparsification operation performed locally. Regarding aggregation and privacy preservation, PBFL and SHAFL exhibit significantly higher computational complexity than the other schemes because they use homomorphic encryption.

7.1.2. Datasets and Models

Three benchmark datasets were used in our experiments: MNIST [72], CIFAR-10 [73], and a Heart Disease dataset [74]. The MNIST dataset is a classic handwritten-digit image dataset, comprising a training set of 60,000 grayscale images and a test set of 10,000 grayscale images, each standardized to a resolution of 28 × 28 pixels. The test set of 10,000 grayscale images is used to form $D_{test}$. The committee nodes use $D_{test}$ to evaluate the accuracy of local updates uploaded by the gateways and assign aggregation weights to each gateway's local updates based on their accuracy. The training set comprising 50,000 images is evenly distributed across the training nodes. The training nodes then conduct training using their allocated subsets of the training data. The model used on the MNIST dataset is a two-layer CNN. The CIFAR-10 dataset includes 60,000 labeled RGB images (32 × 32 pixels) across 10 object classes, which are divided into 50,000 training images and 10,000 test images. For the CIFAR-10 dataset, the partitioning method for $D_{test}$ and the training dataset is the same as that for the MNIST dataset. The model architecture employed on the CIFAR-10 dataset is ResNet-18. The Heart Disease dataset is a real-world IoMT dataset. The dataset contains approximately 37,000 heart activity samples, each with a 50-dimensional feature vector including heart rate, body mass index, glucose levels, and a label indicating coronary heart disease. There are 1,500 heart health samples across these sample nodes. For the Heart Disease dataset, the model and dataset partitioning scheme are adopted from Reference [74].
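The even distribution of training samples across training nodes can be sketched as an index partition. This is an illustration under assumptions rather than the authors' data loader: the node count of 20 and the function name `partition_evenly` are hypothetical, while the seed 42 matches the random seed reported in Section 7.1.3.

```python
import random
from typing import List


def partition_evenly(num_samples: int, num_nodes: int, seed: int = 42) -> List[List[int]]:
    """Shuffle sample indices once, then deal them round-robin to the nodes."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_nodes] for i in range(num_nodes)]


shards = partition_evenly(num_samples=50_000, num_nodes=20)  # 20 nodes is hypothetical
print(len(shards), len(shards[0]))
```

Shuffling before dealing gives every node a random, equally sized subset, which corresponds to the IID setting the conclusion mentions; a non-IID split would instead group indices by label before assigning them.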

7.1.3. Experimental Parameters

The experiment was implemented with Python 3.9 and PyTorch 2.1.0 on a computer equipped with an Intel CPU i5-12400F (Santa Clara, CA, USA) and an NVIDIA GPU 3060Ti (Santa Clara, CA, USA). The random seed was 42, and the key size was 2048-bit, as referenced in [75]. Different experimental parameters were adopted for the three datasets, as shown in Tables 3–5, where $N$ denotes the number of training nodes, $M$ denotes the number of committee nodes, $H$ denotes the number of local iterations, $\chi$ denotes the proportion of malicious nodes within the training node set, $\lambda$ denotes the aggregation hyperparameter of FedAvg [62], $T$ denotes the number of global iterations, $\gamma$ denotes the learning rate, $\alpha$ denotes the aggregation hyperparameter of SHAFL, $\epsilon$ and $\delta$ denote the differential privacy parameters, and $\tau_{max}$ denotes the maximum aggregation delay.

7.2. Experimental Result

7.2.1. Performance Analysis

In the absence of malicious nodes, Table 6 presents the model accuracy of each scheme across three datasets. Except for the non-privacy-preserving baseline scheme FedAvg, all other schemes employ a privacy budget of $\epsilon = 1$ and $\delta = 1 \times 10^{-3}$, coupled with a subsampling rate of 0.9. As evidenced by Table 6, under identical privacy budget conditions, the proposed scheme achieved superior model accuracy across all three datasets compared to other schemes, with the exception of the non-privacy-preserving baseline FedAvg.
Figure 7 illustrates the global model performance of five schemes across three datasets. As observed in Figure 7a on the MNIST dataset, the proposed SHAFL framework achieved comparable accuracy to the non-privacy-preserving baseline FedAvg, with a marginal difference of merely 0.26%. Furthermore, after the 40th training iteration, SHAFL, MSFL, and FedAvg all showed convergence in model accuracy. This demonstrates that the SHAFL framework maintains strong model utility and convergence properties under identical privacy budget constraints. Similarly, as depicted in Figure 7b,c, the SHAFL framework demonstrates robust performance on both the CIFAR-10 dataset and the Heart Disease dataset. Notably, on CIFAR-10, the model accuracy of the SHAFL framework surpasses FedAvg by a narrow margin of 0.06%, which can be attributed to the enhanced generalization capability enabled by the minimal noise injection. Additionally, SHAFL, FedAvg, and DP-FedAvg all converged around the 40th training iteration, collectively demonstrating stable optimization trajectories. After the 10th round, both MSFL and RAFLS exhibited persistent oscillations. This occurs because, as the model approaches convergence, excessive noise injection causes the model parameters to fluctuate around the optimum. In contrast, the SHAFL scheme employs eliminable noise, thereby effectively mitigating the occurrence of oscillations. On the Heart Disease dataset, the SHAFL framework exhibited 0.18% lower model accuracy than the non-privacy-preserving baseline FedAvg, yet outperformed all other comparative schemes. Additionally, the SHAFL framework demonstrated a marginally faster convergence rate than the remaining approaches. In conclusion, the SHAFL framework achieved model accuracy comparable to that of the baseline FedAvg while providing enhanced privacy protection for local data on training nodes.
Compared with other privacy-preserving schemes under the same privacy budget, the SHAFL framework achieved higher model accuracy and superior convergence properties.

7.2.2. Impact of Sampling Strategies on Model Accuracy

We evaluated the impact of three subsampling strategies on model accuracy. In the experiments, we fixed the local noise variance and adjusted the scheme's privacy budget to control the sampling rate. Three privacy budget values, $\epsilon = 0.5$, $0.8$, and $1$, were selected, corresponding to sampling rates of 63%, 70%, and 90%, respectively. Figure 8 and Figure 9 illustrate the impact of different sampling strategies on model performance across datasets. As shown in Figure 8 and Figure 9, under privacy budgets of $\epsilon = 0.5$ and $0.8$, the layer sampling proposed by the SHAFL framework achieved a significant improvement in model accuracy compared to other schemes, while exhibiting smaller oscillation amplitudes. At $\epsilon = 1$, the accuracy of layer sampling outperforms sequential sampling and matches the baseline FedAvg scheme without sampling.

7.2.3. Analysis of Byzantine Attack Resistance

To evaluate the Byzantine attack resistance of the models, we compared model accuracy across several schemes at varying proportions of malicious nodes. FedAvg served as the baseline to reflect the Byzantine robustness of other schemes in asynchronous environments. In the experiments, we set the maximum number of delay rounds $\tau_{max} = 3$ and the privacy budget $\epsilon = 1$. As shown in Figure 10 and Figure 11, all schemes experienced a significant drop in model accuracy at Turn 4. For instance, FedAvg in Figure 10 achieved an accuracy of 72.73% at Turn 6, but this plummeted to 49.18% at Turn 7, marking a 48.18% decline. These results indicate that the participation of stale models in aggregation during early training stages degrades accuracy more severely than the impact of a limited number of malicious nodes. From Figure 10 on the MNIST dataset, when $\chi = 0$, the accuracy of the SHAFL framework is 1.37% lower than FedAvg but 1.42% higher than PBFL. At $\chi = 0.2$, the SHAFL framework outperforms FedAvg, PBFL, and PPAFL by 4.59%, 8.35%, and 0.52%, respectively. When $\chi = 0.4$, the accuracy of the SHAFL framework surpasses FedAvg, PBFL, PPAFL, and RAFLS by 12.75%, 34.26%, 16.62%, and 5.07%, respectively. Similarly, as shown in Figure 11, when $\chi = 0.2$, the accuracy of the SHAFL framework surpasses FedAvg, PBFL, PPAFL, and RAFLS by 71.81%, 67.36%, 24.54%, and 67.09%, respectively. Figure 11 demonstrates that the SHAFL framework achieves higher accuracy than other schemes in environments with malicious nodes. Notably, when $\chi = 0.4$, the accuracy of the SHAFL framework exceeds FedAvg and PBFL by 43.37% and 39.19%, respectively. This superiority stems from the SHAFL framework's mechanism: it evaluates each gateway-uploaded model's accuracy on a test dataset before computing aggregation weights. This approach assigns lower aggregation weights to models from gateways that contain malicious nodes, thereby minimizing their influence on the global model.

7.2.4. Privacy Enhancement Analysis

As demonstrated in Figure 12, the correlation between local and central privacy across varying sampling rates shows that the local privacy budget of training nodes is significantly reduced by the subsampling mechanism and the shuffling model. This substantiates SHAFL’s inherent privacy-enhancing capability. Furthermore, the experimental results show that SHAFL’s privacy amplification effect strengthens as the subsampling rate decreases. This phenomenon occurs because lower sampling rates inherently retain fewer model parameters during aggregation, thereby containing a correspondingly lower amount of sensitive information susceptible to privacy leakage.

8. Conclusions

This study proposes a secure hierarchical asynchronous federated learning (SHAFL) framework. In the first layer, the framework introduces a decentralized mask–DP exchange protocol. Under a gateway, training nodes generate masks using the Gaussian mechanism and exchange them according to the protocol. Each training node then constructs a set of messages using its locally generated mask and those received from other nodes, such that their aggregation recovers the original local model without noise perturbation. To prevent gateways and training nodes from inferring private information from uploaded messages, the framework employs homomorphic encryption. At the gateway, a shuffling mechanism is applied to disrupt the order of uploaded messages, further enhancing the privacy-preserving level for the local models. In the second layer, the framework implements an accuracy-based, committee-consensus scoring mechanism, where the primary committee node uses a global test dataset to evaluate and score models uploaded by gateways, thereby determining their aggregation weights. This reduces the impact of malicious nodes on the global model. Theoretical analysis and experimental results demonstrate that the proposed SHAFL achieves superior performance in privacy preservation and Byzantine robustness. However, as our scheme employs the Paillier homomorphic encryption algorithm to resist collusion attacks, it incurs relatively high computational overhead. Additionally, our experimental results are obtained using an IID dataset, without considering the impact of non-IID datasets on the convergence of model aggregation. In future work, we plan to explore ways to reduce computational cost under the existing security assumptions, while also accounting for the effects of non-IID datasets when designing the aggregation scheme.

Author Contributions

Conceptualization, D.A. and L.Y.; methodology, D.A. and L.Y.; software, L.Y.; validation, Y.C. and D.A.; formal analysis, Y.C. and D.A.; investigation, D.A. and L.Y.; resources, Y.C.; data curation, D.A. and Y.C.; writing—original draft preparation, D.A. and L.Y.; writing—review and editing, Y.C. and D.A.; visualization, D.A. and Y.C.; supervision, Y.C.; project administration, Y.C.; funding acquisition, Y.C. All authors have read and agreed to the published version of this manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China under Grant 62002106 and Grant 62002105, and in part by Hubei University of Technology Green Industry Technology Leading Program Project under Grant XJ2021000901.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Z.; Xu, H.; Liu, J.; Huang, H.; Qiao, C.; Zhao, Y. Resource-efficient federated learning with hierarchical aggregation in edge computing. In Proceedings of the IEEE INFOCOM 2021-IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–10. [Google Scholar]
  2. Xu, C.; Qu, Y.; Xiang, Y.; Gao, L. Asynchronous federated learning on heterogeneous devices: A survey. Comput. Sci. Rev. 2023, 50, 100595. [Google Scholar] [CrossRef]
  3. Jiang, X.; Sun, A.; Sun, Y.; Luo, H.; Guizani, M. A Trust-Based Hierarchical Consensus Mechanism for Consortium Blockchain in Smart Grid. Tsinghua Sci. Technol. 2023, 28, 69–81. [Google Scholar] [CrossRef]
  4. Zhou, H.; Zheng, Y.; Huang, H.; Shu, J.; Jia, X. Toward Robust Hierarchical Federated Learning in Internet of Vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 24, 5600–5614. [Google Scholar] [CrossRef]
  5. Huang, X.; Wu, Y.; Liang, C.; Chen, Q.; Zhang, J. Distance-aware hierarchical federated learning in blockchain-enabled edge computing network. IEEE Internet Things J. 2023, 10, 19163–19176. [Google Scholar] [CrossRef]
  6. Tan, H.; Wang, M.; Shen, J.; Vijayakumar, P.; Moh, S.; Wu, Q. Blockchain-Assisted Conditional Anonymous Authentication and Adaptive Tree-Based Group Key Agreement for VANETs. IEEE Trans. Dependable Secur. Comput 2025, 1–16. [Google Scholar] [CrossRef]
  7. Wang, B.; Tian, Z.; Tang, F.; Pan, H.; She, W.; Liu, W. Blockchain-empowered asynchronous federated reinforcement learning for IoT-based traffic trajectory prediction. IEEE Internet Things J. 2025, 12, 17095–17109. [Google Scholar] [CrossRef]
  8. Pan, Y.; Su, Z.; Wang, Y.; Zhou, J.; Mahmoud, M. Privacy-Preserving Byzantine-Robust Federated Learning via Deep Reinforcement Learning in Vehicular Networks. IEEE Trans. Veh. Technol. 2025, 74, 9461–9475. [Google Scholar] [CrossRef]
  9. Jin, X.; Wei, Y.; Han, Z. A Collaborative Sharding Consensus Mechanism for Blockchain-Based Federated Learning in IoT. IEEE Internet Things J. 2025, 12, 36422–36435. [Google Scholar] [CrossRef]
  10. Khan, M.S.; Hongsong, C. Hybrid transformer deep neural architectures for enhanced misinformation detection on social media. Expert Syst. Appl. 2025, 300, 130470. [Google Scholar] [CrossRef]
  11. Ma, H.; Yang, K.; Jiao, Y. Cellular Traffic Prediction via Byzantine-robust Asynchronous Federated Learning. IEEE Trans. Netw. Sci. Eng. 2025, 12, 2402–2414. [Google Scholar] [CrossRef]
  12. Liu, J.; Wu, Y.; Du, W.; Sun, R.; Xu, G.; Liu, L.; Wu, C. Byzantine-Robust Hierarchical Aggregation for Cross-Device Federated Learning in Consumer IoT. IEEE Trans. Consum. Electron. 2024, 71, 6359–6370. [Google Scholar] [CrossRef]
  13. Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3–18. [Google Scholar]
  14. Fredrikson, M.; Jha, S.; Ristenpart, T. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 1322–1333. [Google Scholar]
  15. Sun, G.; Cong, Y.; Dong, J.; Wang, Q.; Lyu, L.; Liu, J. Data poisoning attacks on federated machine learning. IEEE Internet Things J. 2021, 9, 11365–11375. [Google Scholar] [CrossRef]
  16. Tolpegin, V.; Truex, S.; Gursoy, M.E.; Liu, L. Data poisoning attacks against federated learning systems. In Proceedings of the Computer Security–ESORICs 2020: 25th European Symposium on Research in Computer Security, ESORICs 2020, Guildford, UK, 14–18 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 480–501. [Google Scholar]
  17. Qu, Y.; Gao, L.; Luan, T.H.; Xiang, Y.; Yu, S.; Li, B.; Zheng, G. Decentralized privacy using blockchain-enabled federated learning in fog computing. IEEE Internet Things J. 2020, 7, 5171–5183. [Google Scholar] [CrossRef]
  18. Yan, X.; Miao, Y.; Li, X.; Choo, K.K.R.; Meng, X.; Deng, R.H. Privacy-Preserving Asynchronous Federated Learning Framework in Distributed IoT. IEEE Internet Things J. 2023, 10, 13281–13291. [Google Scholar] [CrossRef]
  19. Tian, L.; Lin, F.; Gan, J.; Jia, R.; Zheng, Z.; Li, M. Pefl: Privacy-preserved and efficient federated learning with blockchain. IEEE Internet Things J. 2024, 12, 3305–3317. [Google Scholar] [CrossRef]
  20. Xie, Y.; Chen, B.; Zhang, J.; Li, W. ALGANs: Enhancing membership inference attacks in federated learning with GANs and active learning. In Proceedings of the 2022 IEEE International Symposium on Product Compliance Engineering-Asia (ISPCE-ASIA), Guangzhou, China, 4–6 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
  21. Fang, M.; Cao, X.; Jia, J.; Gong, N. Local model poisoning attacks to Byzantine-Robust federated learning. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20), Virtual, 12–14 August 2020; pp. 1605–1622. [Google Scholar]
  22. Baruch, G.; Baruch, M.; Goldberg, Y. A little is enough: Circumventing defenses for distributed learning. In Proceedings of the NIPS’19: 33rd International Conference on Neural Information, Vancouver, BC, Canada, 8–14 December 2019; Volume 32, pp. 8635–8645. [Google Scholar] [CrossRef]
  23. Zhang, T.; Xu, D.; Hu, Y.; Vijayakumar, P.; Zhu, Y.; Tolba, A. Deep Fingerprinting Data Learning Based on Federated Differential Privacy for Resource-Constrained Intelligent IoT Systems. IEEE Internet Things J. 2024, 11, 25744–25756.
  24. Sun, K.; Xu, H.; Hua, K.; Lin, X.; Li, G.; Jiang, T.; Li, J. Joint Top-K Sparsification and Shuffle Model for Communication-Privacy-Accuracy Tradeoffs in Federated-Learning-Based IoV. IEEE Internet Things J. 2024, 11, 19721–19735.
  25. Liu, X.; Zhou, Y.; Wu, D.; Hu, M.; Hui Wang, J.; Guizani, M. FedDP-SA: Boosting Differentially Private Federated Learning via Local Data Set Splitting. IEEE Internet Things J. 2024, 11, 31687–31698.
  26. Yang, R.; Zhao, T.; Yu, F.R.; Li, M.; Zhang, D.; Zhao, X. Blockchain-based federated learning with enhanced privacy and security using homomorphic encryption and reputation. IEEE Internet Things J. 2024, 11, 21674–21688.
  27. Miao, Y.; Liu, Z.; Li, H.; Choo, K.K.R.; Deng, R.H. Privacy-Preserving Byzantine-Robust Federated Learning via Blockchain Systems. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2848–2861.
  28. Hijazi, N.M.; Aloqaily, M.; Guizani, M.; Ouni, B.; Karray, F. Secure Federated Learning with Fully Homomorphic Encryption for IoT Communications. IEEE Internet Things J. 2024, 11, 4289–4300.
  29. Feng, C.; Liu, B.; Yu, K.; Goudos, S.K.; Wan, S. Blockchain-empowered decentralized horizontal federated learning for 5G-enabled UAVs. IEEE Trans. Ind. Inform. 2021, 18, 3582–3592.
  30. Masuda, H.; Kita, K.; Koizumi, Y.; Takemasa, J.; Hasegawa, T. Model Fragmentation, Shuffle and Aggregation to Mitigate Model Inversion in Federated Learning. In Proceedings of the 2021 IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN), Boston, MA, USA, 12–14 July 2021; pp. 1–6.
  31. Shen, M.; Wang, J.; Zhang, J.; Zhao, Q.; Peng, B.; Wu, T.; Zhu, L.; Xu, K. Secure decentralized aggregation to prevent membership privacy leakage in edge-based federated learning. IEEE Trans. Netw. Sci. Eng. 2024, 11, 3105–3119.
  32. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 2014, 9, 211–407.
  33. Wang, T.; Chen, J.Q.; Zhang, Z.; Su, D.; Cheng, Y.; Li, Z.; Li, N.; Jha, S. Continuous release of data streams under both centralized and local differential privacy. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual, 15–19 November 2021; pp. 1237–1253.
  34. Wang, S.; Gai, K.; Yu, J.; Zhu, L.; Wu, H.; Wei, C.; Yan, Y.; Zhang, H.; Choo, K.K.R. Rafls: Rdp-based adaptive federated learning with shuffle model. IEEE Trans. Dependable Secur. Comput. 2024, 22, 1181–1194.
  35. Yuan, X.; Ni, W.; Ding, M.; Wei, K.; Li, J.; Poor, H.V. Amplitude-varying perturbation for balancing privacy and utility in federated learning. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1884–1897.
  36. Li, S.; Ngai, E.; Voigt, T. Byzantine-robust aggregation in federated learning empowered industrial iot. IEEE Trans. Ind. Inform. 2021, 19, 1165–1175.
  37. Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and robust federated learning through personalization. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 6357–6368.
  38. Xie, C.; Koyejo, O.; Gupta, I. Fall of empires: Breaking byzantine-tolerant sgd by inner product manipulation. In Proceedings of the Uncertainty in Artificial Intelligence, Tel Aviv, Israel, 22–25 July 2020; pp. 261–270.
  39. Li, L.; Xu, W.; Chen, T.; Giannakis, G.B.; Ling, Q. RSA: Byzantine-robust stochastic aggregation methods for distributed learning from heterogeneous datasets. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 7 January–1 February 2019; Volume 33, pp. 1544–1551.
  40. Karimireddy, S.P.; He, L.; Jaggi, M. Learning from history for byzantine robust optimization. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 5311–5319.
  41. Jere, M.S.; Farnan, T.; Koushanfar, F. A taxonomy of attacks on federated learning. IEEE Secur. Priv. 2020, 19, 20–28.
  42. Blanchard, P.; El Mhamdi, E.M.; Guerraoui, R.; Stainer, J. Machine learning with adversaries: Byzantine tolerant gradient descent. In Proceedings of the NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 118–128.
  43. Chen, Y.; Su, L.; Xu, J. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proc. ACM Meas. Anal. Comput. Syst. 2017, 1, 1–25.
  44. Pillutla, K.; Kakade, S.M.; Harchaoui, Z. Robust aggregation for federated learning. IEEE Trans. Signal Process. 2022, 70, 1142–1154.
  45. Li, S.; Ngai, E.C.H.; Voigt, T. An experimental study of byzantine-robust aggregation schemes in federated learning. IEEE Trans. Big Data 2023, 10, 975–988.
  46. Sattler, F.; Müller, K.R.; Wiegand, T.; Samek, W. On the Byzantine Robustness of Clustered Federated Learning. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 8861–8865.
  47. Li, Z.; Liu, L.; Zhang, J.; Liu, J. Byzantine-robust federated learning through spatial-temporal analysis of local model updates. In Proceedings of the 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS), Beijing, China, 14–16 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 372–379.
  48. Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-robust distributed learning: Towards optimal statistical rates. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5650–5659.
  49. Xie, C.; Koyejo, S.; Gupta, I. Asynchronous federated optimization. arXiv 2019, arXiv:1903.03934.
  50. Miao, Y.; Liu, Z.; Li, X.; Li, M.; Li, H.; Choo, K.K.R.; Deng, R.H. Robust asynchronous federated learning with time-weighted and stale model aggregation. IEEE Trans. Dependable Secur. Comput. 2023, 21, 2361–2375.
  51. Wu, M.; Boban, M.; Dressler, F. Flexible training and uploading strategy for asynchronous federated learning in dynamic environments. IEEE Trans. Mob. Comput. 2024, 23, 12907–12921.
  52. Chen, Z.; Yi, W.; Shin, H.; Nallanathan, A. Adaptive semi-asynchronous federated learning over wireless networks. IEEE Trans. Commun. 2024, 73, 394–409.
  53. Wei, K.; Li, J.; Ding, M.; Ma, C.; Yang, H.H.; Farokhi, F.; Jin, S.; Quek, T.Q.; Poor, H.V. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3454–3469.
  54. Jiang, B.; Li, J.; Wang, H.; Song, H. Privacy-preserving federated learning for industrial edge computing via hybrid differential privacy and adaptive compression. IEEE Trans. Ind. Inform. 2021, 19, 1136–1144.
  55. Guo, H.; Wang, H.; Song, T.; Hua, Y.; Ma, R.; Jin, X.; Xue, Z.; Guan, H. Siren+: Robust Federated Learning With Proactive Alarming and Differential Privacy. IEEE Trans. Dependable Secur. Comput. 2024, 21, 4843–4860.
  56. Zhang, C.; Weng, J.; Weng, J.; Zhong, Y.; Liu, J.N.; Deng, C. Robust and Secure Federated Learning with Verifiable Differential Privacy. IEEE Trans. Dependable Secur. Comput. 2025, 22, 5713–5729.
  57. Xu, C.; Ge, J.; Deng, Y.; Gao, L.; Zhang, M.; Li, Y.; Zhou, W.; Zheng, X. BASS: A blockchain-based asynchronous SignSGD architecture for efficient and secure federated learning. IEEE Trans. Dependable Secur. Comput. 2024, 21, 5388–5402.
  58. Balle, B.; Bell, J.; Gascón, A.; Nissim, K. The privacy blanket of the shuffle model. In Proceedings of the Advances in Cryptology–CRYPTO 2019: 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 638–667.
  59. Girgis, A.; Data, D.; Diggavi, S.; Kairouz, P.; Suresh, A.T. Shuffled model of differential privacy in federated learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 2521–2529.
  60. Balle, B.; Barthe, G.; Gaboardi, M. Privacy amplification by subsampling: Tight analyses via couplings and divergences. In Proceedings of the NIPS’18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 3–8 December 2018; Volume 31, pp. 6280–6290.
  61. Zhou, Z.; Xu, C.; Wang, M.; Kuang, X.; Zhuang, Y.; Yu, S. A Multi-Shuffler Framework to Establish Mutual Confidence for Secure Federated Learning. IEEE Trans. Dependable Secur. Comput. 2023, 20, 4230–4244.
  62. McMahan, H.B.; Moore, E.; Ramage, D.; y Arcas, B.A. Federated learning of deep networks using model averaging. arXiv 2016, arXiv:1602.05629.
  63. Zhao, Z.; Luo, M.; Ding, W. Deep leakage from model in federated learning. arXiv 2022, arXiv:2206.04887.
  64. Yin, H.; Mallya, A.; Vahdat, A.; Alvarez, J.M.; Kautz, J.; Molchanov, P. See through gradients: Image batch recovery via gradinversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16337–16346.
  65. Zhao, J.C.; Sharma, A.; Elkordy, A.R.; Ezzeldin, Y.H.; Avestimehr, S.; Bagchi, S. Loki: Large-scale data reconstruction attack against federated learning through model manipulation. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1287–1305.
  66. Lyu, L.; Yu, H.; Ma, X.; Chen, C.; Sun, L.; Zhao, J.; Yang, Q.; Yu, P.S. Privacy and Robustness in Federated Learning: Attacks and Defenses. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 8726–8746.
  67. Wang, S.; Luo, X.; Qian, Y.; Zhu, Y.; Chen, K.; Chen, Q.; Xin, B.; Yang, W. Shuffle differential private data aggregation for random population. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 1667–1681.
  68. Zeng, Z.; Liu, Y.; Chang, L. A robust and optional privacy data aggregation scheme for fog-enhanced IoT network. IEEE Syst. J. 2022, 17, 1110–1120.
  69. Paillier, P. Public-key cryptosystems based on composite degree residuosity classes. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques; Springer: Berlin/Heidelberg, Germany, 1999; pp. 223–238.
  70. Chen, Y.; Yan, L.; Ai, D. An Robust Secure Blockchain-based Hierarchical Asynchronous Federated Learning Scheme for Internet of Things. IEEE Access 2024, 12, 165280–165297.
  71. Geyer, R.C.; Klein, T.; Nabi, M. Differentially private federated learning: A client level perspective. arXiv 2017, arXiv:1712.07557.
  72. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  73. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.cs.toronto.edu/~kriz/index.html (accessed on 9 January 2026).
  74. Dutta, A.; Batabyal, T.; Basu, M.; Acton, S.T. An efficient convolutional neural network for coronary heart disease prediction. Expert Syst. Appl. 2020, 159, 113408.
  75. Wang, L.; Polato, M.; Brighente, A.; Conti, M.; Zhang, L.; Xu, L. PriVeriFL: Privacy-preserving and aggregation-verifiable federated learning. IEEE Trans. Serv. Comput. 2024, 18, 998–1011.
Figure 1. The structure of HAFL.
Figure 2. Time workflow of asynchronous update mechanism in SHAFL framework.
Figure 3. Shuffle model.
Figure 4. Mask–DP exchange protocol when $n = 4$ and $m = 3$.
Figure 5. The workflow of iterative SHAFL framework.
Figure 6. The subsample and dummy-layer filling of the SHAFL framework.
Figure 7. Comparison of the model testing accuracy of each method under three datasets, where $\epsilon = 1$, $\chi = 0$, (a) corresponds to the MNIST dataset, (b) corresponds to the CIFAR-10 dataset, and (c) corresponds to the Heart Disease dataset.
Figure 8. Model accuracy of different sampling strategies on the MNIST dataset, where (a) corresponds to $\epsilon = 0.5$, (b) corresponds to $\epsilon = 0.8$, and (c) corresponds to $\epsilon = 1$.
Figure 9. Model accuracy of different sampling strategies on the Heart Disease dataset, where (a) corresponds to $\epsilon = 0.5$, (b) corresponds to $\epsilon = 0.8$, and (c) corresponds to $\epsilon = 1$.
Figure 10. Model accuracy of different $\chi$ on MNIST dataset, where (a) corresponds to $\chi = 0$, (b) corresponds to $\chi = 0.2$, and (c) corresponds to $\chi = 0.4$.
Figure 11. Model accuracy of different $\chi$ on Heart Disease dataset, where (a) corresponds to $\chi = 0$, (b) corresponds to $\chi = 0.2$, and (c) corresponds to $\chi = 0.4$.
Figure 12. Comparison of privacy enhancement with different subsampling rates.
Table 1. Nomenclature.

| Notation | Description | Notation | Description |
| --- | --- | --- | --- |
| $F$ | global loss | $c_i$, $c_i^o$ | $i$-th trainer; $i$-th trainer in $G_o$ |
| $\rho$ | weights | $y_o$ | gateway of $G_o$ |
| $f$ | loss function | $G_o$ | $o$-th group |
| $w$ | model & model parameters | $h$ | $h$-th iteration |
| $w_t$ | global model for $t$-th turn | $t$ | $t$-th turn |
| $w_{c_i^o}^{t-\tau_o,h}$ | local update for $c$ in $h$-th iteration $t$-th turn | $\xi$ | sample of dataset |
| $w_{y_o}^{t-\tau_o}$ | local update for group $G_o$ | $\tau_o$ | delay of group $G_o$ |
| $M_{i,k}^{t-\tau_o}$ | $m$-multi-masks | $R_{t,o}^{u_i}$ | score of model $w_{y_o}^{t-\tau_o}$ generated by $u_i$ |
| $x_{i,k}^{t-\tau_o,z}$ | vector of $z$-th layer of mask $M_{i,k}^{t-\tau_o}$ | $v_z$ | number of $x_{i,r}^{t-\tau_o,z}$ |
| $u_q$ | primary committee node | $u_i$ | committee nodes |
| $D_{test}$ | global datasets | $s_o$ | size of group $G_o$ |
| $D_{c_i^o}$ | local dataset for $c_i^o$ | $C$ | set of training nodes |
| $\alpha$ | hyperparameters of aggregation | $U$ | set of committees |
| $\gamma_{c_i^o}$ | learning rate of training node $c_i^o$ | $N$ | number of training nodes |
| $\eta_t$ | number of $w_{y_o}^{t-\tau_o}$ collected in $t$-th turn | $M$ | number of committee nodes |
| $\sigma$ | variance of noise | $m$ | number of exchange noises |
| $\pi_{c_i}^{t-\tau_o}$ | mask generated by $c_i^o$ | $K$ | number of groups |
| $O$ | set of groups | $Y$ | set of gateways |
| $\theta^{t-\tau_o,z}$ | $z$-th layer of model $w_{y_0}^{t-\tau_o}$ | $p_{i,0}^{t-\tau_o,z}$ | $z$-th dummy layer of model $w_{c_i^o}^{t-\tau_o}$ |
| $H_{min}$ | minimum training iteration | $H_{max}$ | maximum training iteration |

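Table 2. Computational complexity.

| Scheme | Local Train | Aggregation | Privacy Preserving |
| --- | --- | --- | --- |
| FedAvg [62] | $O(\lvert w \rvert \cdot \lvert D \rvert \cdot N \cdot H \cdot T)$ | $O(\lvert w \rvert \cdot N \cdot T)$ | - |
| DP-FedAvg [71] | $O(\lvert w \rvert \cdot \lvert D \rvert \cdot N \cdot H \cdot T)$ | $O(\lvert w \rvert \cdot N \cdot T)$ | $O(\lvert w \rvert \cdot N \cdot T)$ |
| FedSDP [24] | $O((\lvert w \rvert + d) \cdot \lvert D \rvert \cdot N \cdot H \cdot T)$ | $O(\lvert w \rvert \cdot N \cdot T)$ | $O((d \cdot N \cdot H + N) \cdot T)$ |
| MSFL [61] | $O(\lvert w \rvert \cdot \lvert D \rvert \cdot N \cdot H \cdot T)$ | $O(\lvert w \rvert \cdot N \cdot T)$ | $O((N + d) \cdot N \cdot T)$ |
| PBFL [27] | $O(\lvert w \rvert \cdot \lvert D \rvert \cdot N \cdot H \cdot T)$ | $O(n d \log(n)^2)$ | $O(n d \log(n)^3)$ |
| PPAFL [18] | $O(\lvert w \rvert \cdot \lvert D \rvert \cdot N \cdot H \cdot T)$ | $O(\lvert w \rvert \cdot N \cdot T \cdot M)$ | $O(\lvert w \rvert \cdot N \cdot T)$ |
| RAFLS [34] | $O(\lvert w \rvert \cdot \lvert D \rvert \cdot N \cdot H \cdot T)$ | $O((\lvert w \rvert + d) \cdot N \cdot T)$ | $O((\lvert w \rvert + d) \cdot N \cdot T)$ |
| Proposed | $O(\lvert w \rvert \cdot \lvert D \rvert \cdot N \cdot H \cdot T)$ | $O(\lvert w \rvert \cdot N \cdot T \cdot \log n)$ | $O((\lvert w \rvert + \lvert w \rvert \cdot \log(n) + d) \cdot m \cdot N \cdot T)$ |
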
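Table 3. Hyperparameter notations on MNIST.

| Param | Value | Param | Value |
| --- | --- | --- | --- |
| $N$ | 20 | $M$ | 10 |
| $H$ | 20 | batchsize | 64 |
| $\chi$ | {0, 0.2, 0.4} | $\lambda$ | 0.1 |
| $T$ | 50 | $\gamma$ | 0.08 |
| $\alpha$ | 0.3 | $\epsilon$ | {0.5, 0.8, 1} |
| $\tau_{max}$ | 3 | $\delta$ | $1 \times 10^{-3}$ |
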
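Table 4. Hyperparameter notations on CIFAR-10.

| Param | Value | Param | Value |
| --- | --- | --- | --- |
| $N$ | 20 | $M$ | 10 |
| $H$ | 20 | batchsize | 32 |
| $\chi$ | {0, 0.2, 0.4} | $\lambda$ | 0.03 |
| $T$ | 50 | $\gamma$ | 0.001 |
| $\alpha$ | 0.85 | $\epsilon$ | {0.5, 0.8, 1} |
| $\tau_{max}$ | 3 | $\delta$ | $1 \times 10^{-3}$ |
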
Table 5. Hyperparameter notations on Heart Disease dataset.

| Param | Value | Param | Value |
| --- | --- | --- | --- |
| $N$ | 20 | $M$ | 10 |
| $H$ | 20 | batchsize | 500 |
| $\chi$ | {0, 0.2, 0.4} | $\lambda$ | 0.03 |
| $T$ | 50 | $\gamma$ | 0.01 |
| $\alpha$ | 0.85 | $\epsilon$ | {0.5, 0.8, 1} |
| $\tau_{max}$ | 3 | $\delta$ | $1 \times 10^{-3}$ |

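For readers who want to reproduce the setup, the hyperparameters in Tables 3 to 5 could be collected into a single configuration object. The sketch below is only an illustration: the class name `ShaflConfig` and all field names are hypothetical, $H$ and $T$ are read as iteration and turn counts following the nomenclature, $\chi$ and $\lambda$ keep their symbol names because this section does not define them, and $\delta$ is assumed to be $1 \times 10^{-3}$.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ShaflConfig:
    """Hypothetical container for the values in Tables 3-5 (names are illustrative)."""
    # Values shared by all three datasets.
    n_training_nodes: int = 20      # N
    n_committee_nodes: int = 10     # M
    iterations: int = 20            # H (assumed: training iterations)
    turns: int = 50                 # T (assumed: number of turns)
    tau_max: int = 3                # maximum delay
    chi_values: Tuple[float, ...] = (0.0, 0.2, 0.4)      # chi, not defined in this section
    epsilon_values: Tuple[float, ...] = (0.5, 0.8, 1.0)  # privacy budgets
    delta: float = 1e-3             # assumed exponent of -3
    # Dataset-specific values; the defaults are the MNIST column (Table 3).
    batch_size: int = 64
    lam: float = 0.1                # lambda, not defined in this section
    gamma: float = 0.08             # learning rate
    alpha: float = 0.3              # aggregation hyperparameter


CONFIGS: Dict[str, ShaflConfig] = {
    "mnist": ShaflConfig(),
    "cifar10": ShaflConfig(batch_size=32, lam=0.03, gamma=0.001, alpha=0.85),
    "heart_disease": ShaflConfig(batch_size=500, lam=0.03, gamma=0.01, alpha=0.85),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg.batch_size, cfg.lam, cfg.gamma, cfg.alpha)
```

Keeping the shared values as defaults makes the dataset-specific differences (batch size, $\lambda$, $\gamma$, $\alpha$) visible at a glance.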
Table 6. Model accuracy.

$\chi = 0$, $\epsilon = 1$, $\delta = 1 \times 10^{-3}$, subsampling rate = 90%

| Dataset | FedAvg | DP-FedAvg | FedSDP | MSFL | RAFLS | Proposed |
| --- | --- | --- | --- | --- | --- | --- |
| MNIST | 93.01 | 92.46 | 87.66 | 92.46 | 90.07 | 92.77 |
| CIFAR-10 | 77.61 | 77.70 | 55.98 | 75.51 | 75.25 | 77.75 |
| Heart Disease Dataset | 100 | 99.63 | 90.00 | 99.45 | 97.99 | 99.82 |

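To make the comparison in Table 6 easier to read, the gap between any method and the FedAvg baseline can be computed directly from the reported numbers. The snippet below is a small sketch, not part of the original evaluation: the dictionary `ACCURACY` and the helper `gap_to_baseline` are hypothetical names, and the values are assumed to be percentages.

```python
# Accuracy values copied from Table 6 (assumed to be percentages).
ACCURACY = {
    "MNIST": {"FedAvg": 93.01, "DP-FedAvg": 92.46, "FedSDP": 87.66,
              "MSFL": 92.46, "RAFLS": 90.07, "Proposed": 92.77},
    "CIFAR-10": {"FedAvg": 77.61, "DP-FedAvg": 77.70, "FedSDP": 55.98,
                 "MSFL": 75.51, "RAFLS": 75.25, "Proposed": 77.75},
    "Heart Disease": {"FedAvg": 100.0, "DP-FedAvg": 99.63, "FedSDP": 90.00,
                      "MSFL": 99.45, "RAFLS": 97.99, "Proposed": 99.82},
}


def gap_to_baseline(dataset: str, method: str = "Proposed",
                    baseline: str = "FedAvg") -> float:
    """Return accuracy(method) - accuracy(baseline), rounded to two decimals."""
    row = ACCURACY[dataset]
    return round(row[method] - row[baseline], 2)


if __name__ == "__main__":
    for dataset in ACCURACY:
        print(dataset, gap_to_baseline(dataset))
```

A negative value means the method trails the baseline on that dataset; a positive value means it exceeds it.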
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Chen, Y.; Ai, D.; Yan, L. Secure Hierarchical Asynchronous Federated Learning with Shuffle Model and Mask–DP. Sensors 2026, 26, 617. https://doi.org/10.3390/s26020617
