Review

A Review of Research on Secure Aggregation for Federated Learning

1 School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
2 School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(7), 308; https://doi.org/10.3390/fi17070308
Submission received: 17 June 2025 / Revised: 11 July 2025 / Accepted: 15 July 2025 / Published: 17 July 2025

Abstract

Federated learning (FL) is an advanced distributed machine learning method that effectively solves the data silo problem. With the increasing popularity of federated learning and the growing importance of privacy protection, federated learning methods that can securely aggregate models have received widespread attention. Federated learning enables clients to train models locally and share their model updates with the server. While this approach allows collaborative model training without exposing raw data, it still risks leaking sensitive information. To enhance privacy protection in federated learning, secure aggregation is considered a key enabling technology that requires further in-depth investigation. This paper summarizes the definition, classification, and applications of federated learning; reviews secure aggregation protocols proposed to address privacy and security issues in federated learning; extensively analyzes the selected protocols; and concludes by highlighting the significant challenges and future research directions in applying secure aggregation to federated learning. By reviewing prior research and evaluating the advantages and disadvantages of various secure aggregation schemes, this paper aims to serve as a valuable reference for researchers studying secure aggregation in federated learning.

1. Introduction

With the explosive growth of data, neural networks have achieved important results in computer vision, speech recognition, agricultural science, and other fields [1,2,3,4,5]. However, building joint datasets across organizations and individuals faces the dual challenges of data silos and privacy protection. Traditional data processing methods usually adopt centralized collection and unified processing, but they are prone to data leakage, especially in scenarios where multiple users share data [6]. In order to protect user privacy, countries have introduced relevant laws and regulations; for example, the Cyber Security Law of the People’s Republic of China clearly stipulates that network operators shall not disclose or provide personal information without authorization. Although these regulations effectively protect privacy, they also make data collection and model training more difficult.
In order to protect privacy and solve the data silo problem, federated learning has emerged. In 2016, Google first proposed federated learning [7,8], the core idea of which is to train models through local devices and send model parameters to a central server for global aggregation, thus realizing localized training of data and privacy protection. Compared with traditional centralized machine learning methods, federated learning avoids the risk of privacy leakage caused by centralized data storage and, at the same time, reduces communication overhead and improves computational efficiency.
However, federated learning transmits model data during training, which exposes system vulnerabilities that can be exploited for attacks. For instance, a malicious central server may reconstruct a user's private data, and a malicious participant may negatively impact the training of the global model. Therefore, designing secure and efficient aggregation algorithms is crucial to safeguarding the privacy and performance of federated learning. Current research focuses on combining privacy-preserving mechanisms with aggregation algorithms to address more complex threat models. In addition, since the participants are usually computationally resource-constrained devices, such as mobile terminals, aggregation schemes also need to offer high communication efficiency and fault tolerance. Given the growing demand for privacy protection, it is crucial to systematically analyze the strengths and limitations of existing secure aggregation techniques. In this paper, we categorize and discuss mainstream schemes from the perspectives of security and efficiency, and evaluate their applicable scenarios.
Current research reviews on federated learning mainly focus on its definition, application, classification, and system security, and seldom focus on the main aspects of secure aggregation. In summarizing the existing literature, we identify the following four deficiencies:
  • There is no detailed introduction of privacy-preserving mechanisms applied to federated learning, and the discussion of security is limited to describing the development process of integrating privacy-preserving mechanisms with federated learning [9,10,11].
  • In the discussion of security, only the means of attack and the defense methods that can be used are analyzed, and there is no systematic categorization of defense methods based on privacy-preserving mechanisms [12].
  • The literature only analyzes the possibility of combining individual privacy-preserving mechanisms with federated learning, lacking a side-by-side comparison with other secure aggregation schemes [13,14].
  • Existing work compares secure aggregation algorithms with other aggregation algorithms, focusing on their differences, but lacks a longitudinal comparison among secure aggregation algorithms themselves [15,16].
Therefore, this paper primarily introduces the classification of federated learning from the perspective of secure aggregation, details the privacy protection mechanisms currently employed in federated learning, and categorizes the secure aggregation schemes based on privacy protection mechanisms. The main contributions of this work are as follows:
  • This work classifies federated learning from the perspective of secure aggregation, provides a detailed overview of privacy-preserving mechanisms currently applied in federated learning, and categorizes existing secure aggregation schemes based on these mechanisms.
  • This work evaluates the resource consumption, protected models, accuracy, and network structures of different schemes, with vertical comparisons of secure aggregation algorithms under the same privacy-preserving mechanism and horizontal comparisons across distinct mechanisms.
  • This work examines the future research directions of secure aggregation and the associated challenges.
This paper is organized as follows: Section 2 introduces the definition, classification, and application scenarios of federated learning; Section 3 introduces the privacy-preserving mechanisms used in federated learning; Section 4 introduces the original aggregation algorithms for federated learning and analyzes and summarizes existing research on secure aggregation algorithms; Section 5 analyzes the challenges faced by secure aggregation in federated learning and the future research directions, and finally concludes the paper.

2. Federated Learning

2.1. Definition of Federated Learning

Federated learning is the process of distributed machine learning training deployed across multiple clients. To ensure data privacy, federated learning only allows clients to exchange model gradients with a central server. In this process, each client trains its own model using local data and then uploads the local model to the central server. After aggregating all received models, the server returns the new global model to each client, as shown in Figure 1.
In practice, it is assumed that each of the $n$ clients $C_1, C_2, \ldots, C_n$ holds a private dataset $D_1, D_2, \ldots, D_n$, respectively, and has no direct access to the data of other clients. A typical training flow consists of the following three steps [8]:
  • Initialization: At the $t$-th round of communication, each client downloads the latest global model $W^t$ from the server for initialization.
  • Local training: Each client $C_k$ iteratively trains on its own local dataset $D_k$ with learning rate $\eta$. The local model is updated according to $W_k^{t+1} \leftarrow W_k^t - \eta \nabla F(W_k^t, D_k)$, where $F$ denotes the local training objective, and the result is sent to the server.
  • Model aggregation: The server aggregates the collected local models $W_1, W_2, \ldots, W_n$ into a global model for the global model update (a minimal sketch of this round follows the list).
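To make the three steps concrete, the following is a minimal, self-contained sketch of one FedAvg-style communication round on a toy least-squares task. The function names (local_update, fedavg_round) and the toy data are illustrative assumptions, not part of any specific framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on a least-squares loss."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5*||Xw - y||^2 / n
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    """One communication round: broadcast, local training, weighted aggregation."""
    n_total = sum(len(y) for _, y in clients)
    updates = [local_update(w_global.copy(), X, y) for X, y in clients]
    # FedAvg: weight each client's model by its share of the total data volume
    return sum(len(y) / n_total * w for w, (_, y) in zip(updates, clients))

# Toy data: three clients whose samples follow the same linear model w_true
w_true = np.array([2.0, -1.0])
clients = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=n)))

w = np.zeros(2)
for t in range(20):          # 20 communication rounds
    w = fedavg_round(w, clients)
print(w)                      # approaches w_true without any raw data being shared
```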

2.2. Classification of Federated Learning

2.2.1. Data Partition

According to the difference between feature space and sample space [17], federated learning can be categorized into three types, which are horizontal federated learning, vertical federated learning, and federated transfer learning. Horizontal federated learning is suitable for scenarios that have the same feature space but different sample spaces. Vertical federated learning applies to scenarios that have the same sample space but with different feature spaces. Federated transfer learning applies to scenarios where the data comes from different participants that are very different in terms of features and samples.
Horizontal federated learning partitions the same feature space across participants’ data, brings together multiple data holders, and trains on data with the same features. Horizontal federated learning improves the accuracy and generalization ability of the model and is widely used in healthcare and finance.
As shown in Figure 2, samples with the same features from different healthcare organizations are collected through federated learning, and the shared features are used to train the model. Using horizontal federated learning to train the model not only increases the total number of training samples but also improves the accuracy of the model.
Vertical federated learning partitions participants' data over the same sample space, aligning the overlapping samples held by different participants to increase the feature dimension of the sample data. It is mainly applied between non-competing companies or organizations.
As shown in Figure 3, financial institutions cooperate with e-commerce platforms to use different feature space data of the same users for model training to improve the feature dimension of the samples, better predict user consumption, and facilitate the rating of their service levels afterward.
Federated transfer learning derives common representations from different feature spaces and a limited set of common samples. Transfer learning is applied to scenarios where the relevant dataset is small, but more data is needed to optimize the model performance [18,19,20].
As shown in Figure 4, federated transfer learning can be used to transfer treatment and diagnosis information from different healthcare organizations to make a more comprehensive diagnosis of a disease. It can make training more flexible under different data structures, but the current research in federated transfer learning is not mature enough, so there is still much room for development.

2.2.2. Client Size

Based on the number of clients, federated learning can be categorized into two types: cross-device federated learning and cross-silo federated learning. Cross-device federated learning refers to federated learning involving a large number of widely dispersed devices. Cross-silo federated learning involves a small number of organizations that store large amounts of sensitive data.
Cross-device federated learning is a federated learning approach trained using a large number of clients that have similar domains to the global model, such as IoT-based systems [21], where each device independently collects its usage data. The advantage of this method is that it maximizes the use of decentralized data sources while protecting user privacy.
Cross-silo federated learning typically involves large organizations or companies as clients, where different companies collaborate to build a federated system, usually ranging from as few as 2 to 100 devices, where each company maintains local data and performs model training based on its respective database. Cross-silo federated learning is more flexible in its implementation and can be trained on the data as needed by the organization.

2.2.3. Network Structure

Distinguished by network architecture, federated learning consists of centralized federated learning and decentralized federated learning. Centralized federated learning uses a central server to guide aggregation and synchronization. Decentralized federated learning allows clients to communicate directly with each other in a peer-to-peer (P2P) manner.
In centralized federated learning, each client trains a local model with its data, and the central server is responsible for model aggregation. Centralized federated learning is mainly for multi-user federated learning scenarios, where the enterprise acts as a server and coordinates the global model. The network topology of this model is shown in Figure 5.
In decentralized federated learning, each client trains a local model based on its private data. After training, the client may choose to communicate with other clients, and the clients exchange or fuse local models. The network topology diagram adopted for decentralized federated learning is shown in Figure 6. Decentralized federated learning eliminates the dependence on a central server for model aggregation, and replaces it with algorithms that build trust and reliability, thus reducing the presence of untrusted servers and increasing resilience to network failures.

2.3. Applications

Federated learning is currently exhibiting its great potential in various fields, and applying federated learning to real-life applications can protect user data, improve the accuracy of models, and provide higher-quality services.
The main application areas of federated learning demand robust privacy protection, the ability to handle data heterogeneity, suitability for real-time scenarios, and efficient management of widely distributed data.
In the field of mobile devices, mobile applications are typically latency-sensitive. Federated learning supports model updates directly on local devices, reducing communication with central servers and enabling low-latency responses. Given the heterogeneity of mobile applications due to diverse user behaviors, federated learning can aggregate individualized device models to deliver personalized services.
In agriculture, data is widely distributed across various sources such as sensors and drones. Federated learning minimizes the communication overhead associated with centralized data transmission, leverages data diversity, and enhances model generalization. Agricultural applications often require real-time predictive analytics, such as yield forecasting and disease classification. Federated learning supports edge device-based model updates, improving real-time performance and alleviating cloud computing burdens.
In healthcare, sensitive data (e.g., medical records, imaging data, genetic information) poses significant privacy risks if exposed. Federated learning provides robust privacy protection by enabling multi-institutional collaborative modeling without sharing raw data. It addresses the challenges of data ownership and privacy in clinical research, drug development, and other fields by facilitating the integration of multi-center data while safeguarding patient confidentiality.
In the field of renewable energy, it is challenging for a single model to adapt to complex environments due to significant regional variations in meteorological conditions, such as wind speed and solar irradiance. Therefore, improving model generalization capability is essential. Federated learning has been widely adopted as it enables the sharing of modeling experience and enhances both the generalization and prediction accuracy of models across diverse scenarios.
Federated learning is more frequently applied in these fields due to its inherent advantages in privacy protection, real-time processing, support for training on distributed data, and the maturity of its technologies and ecosystem.

2.3.1. Mobile Device

Federated learning is applied to smartphones to predict human behavioral trajectories while preserving user privacy. Google built federated learning to train language models among Android mobile users [22] to further improve keystroke prediction, improving keyboard prediction accuracy while protecting privacy compared with its predecessor. Federated training of word- and character-level recurrent neural network models for predicting keystrokes and emoticons [23] offers an alternative to server-based data collection and training in a commercial environment.
Mobile devices also include devices in IoT environments, and federated learning has also been applied to IoT. In [24], federated learning is combined with smart home IoT to build a federated multi-task learning framework that learns customized, context-aware policies from multiple smart homes while preserving privacy.
Applying federated learning to wireless communications, such as edge computing and 5G networks, can solve the energy, bandwidth, latency, and data privacy problems in wireless communications. In [25], federated learning is applied to wireless networks and edge mobile device computing, where a distributed stochastic gradient descent scheme over a shared noisy wireless channel enables faster model convergence, higher accuracy, and more efficient utilization of limited channel bandwidth resources. In [26], federated learning is applied to heterogeneous mobile edge devices to address three key challenges: communication efficiency, statistical heterogeneity, and system heterogeneity.

2.3.2. Agriculture

Currently, in agriculture, there is a growing demand for data decentralization and privacy protection, such as farm sensor data [27,28] and yield monitoring data [29,30]. Federated learning enables smart agriculture to efficiently collect specific environmental variables, optimize planting strategies, and predict the impact of environmental changes on crop yield and quality, thereby outperforming traditional data collection methods [31,32,33].
In the agricultural supply chain, federated learning facilitates the integration of data across various stages, enhancing transparency and efficiency [34]. For instance, an intelligent agricultural machinery collaboration model based on federated learning [35,36] can coordinate agricultural machinery systems with external supply chains, storage, and transportation networks. This integration significantly improves operational efficiency, task quality, and land utilization.
In precision agriculture, federated learning enables efficient management of agricultural resources, ensuring sustainable agricultural development. For example, it supports precise irrigation and water-saving systems [37], accurate crop growth predictions [38], and effective crop monitoring [39].
Using federated learning for smart agriculture enables disease classification and detection. In [40], federated learning combined with convolutional neural networks was utilized to detect watermelon leaf diseases. The scheme was evaluated across five different clients and five disease categories, as follows: healthy, anthracnose, downy mildew, powdery mildew, and bacterial blight mosaic. The results demonstrated a detection accuracy of 97% for these specific diseases.

2.3.3. Healthcare

Federated learning has a promising application in the healthcare domain because of its data privacy-preserving properties. For individual healthcare organizations, the amount of patient data may not be sufficient to train predictive models that can be put into use [41]; therefore, the use of federated learning techniques can build federated networks for cross-regional hospitals with similar medical information. The emergence of federated learning breaks down the barriers that prevent data from being shared between different healthcare organizations for disease analysis.
In [42], federated learning is applied to the collection and analysis of electronic health records, enabling the development of secure data harmonization and federated computational procedures to transform extensive electronic health records into meaningful phenotypic clinical concepts for analysis.
In [43], federated learning is applied to address the problem of searching for similar patient matches. The similarity between patients is efficiently computed using their hash codes, enabling healthcare organizations to summarize the general characteristics of similar patients.
Federated learning can also automatically extract medical information from unstructured clinical text. Liu et al. [44] combined federated learning with natural language processing for the first time, making full use of clinical records from different hospitals to train a machine learning model specific to a medical task and improve the quality of a particular clinical task.

2.3.4. Renewable Energy

In the field of renewable energy, wind and solar power generation heavily rely on sensitive data such as geographical and climatic conditions. Traditional centralized model training often suffers from privacy leakage and data silo issues. Thus, federated learning has emerged as a key approach to preserving data privacy while enabling collaborative model development across multiple parties.
In [45], deep reinforcement learning is integrated with federated learning to develop an ultrashort-term wind power forecasting method, which effectively protects data privacy while maintaining forecasting accuracy compared to traditional approaches.
In [46], a hybrid prediction model based on federated learning is proposed to address the limitations of traditional photovoltaic (PV) power prediction methods, particularly their inability to share data and their poor generalization performance. Experimental results show that the prediction accuracy of the proposed model improves by more than 20%.

3. Privacy-Preserving Mechanisms

In this section, privacy-preserving mechanisms such as homomorphic encryption, secure multi-party computation, differential privacy, and blockchain will be introduced, along with the techniques from each mechanism that are applied to federated learning.

3.1. Homomorphic Encryption

Homomorphic encryption (HE) is a form of encryption that allows computations to be performed on encrypted data, and the results of these computations remain valid after decryption. Encryption and decryption in homomorphic encryption satisfy $m = Dec(Enc(m, pk), sk)$. When the ciphertext equals $c = f(c_1, c_2, \ldots, c_n)$ with $c_i = Enc(m_i, pk)$, it follows that $Dec(c, sk) = f(m_1, m_2, \ldots, m_n)$. Here, $m_i$ denotes the data to be encrypted by user $i$, $Dec(\cdot, \cdot)$ is the decryption function, $Enc(\cdot, \cdot)$ is the encryption function, $pk$ and $sk$ refer to the public key used for encryption and the private key used for decryption, respectively, $c$ is the ciphertext obtained by executing the function on encrypted data, and $f$ is the function to be executed; in federated learning, $f$ is generally the aggregation algorithm.
Homomorphic encryption can be classified according to the number of arithmetic operations allowed on the encrypted data. The three classes of homomorphic encryption are as follows:
  • Partially homomorphic encryption (PHE): Permits an unlimited number of operations, but restricts them to a single type, such as addition or multiplication.
  • Somewhat homomorphic encryption (SWHE): Permits certain types of operations but limits the number of uses, such as allowing only one multiplication.
  • Fully homomorphic encryption (FHE): Permits an unlimited number of operations of arbitrary type, i.e., both addition and multiplication may be applied without restriction.
The homomorphic encryption algorithms applied in federated learning are described as follows:
  • Paillier partially homomorphic encryption: The Paillier encryption algorithm is an asymmetric encryption algorithm with additive homomorphism and consists of three main steps (a toy implementation is sketched at the end of this subsection):
    1. Key generation: (1) Randomly select two large primes $p, q$ with $p \neq q$. (2) Compute $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, where $\mathrm{lcm}$ denotes the least common multiple. (3) Randomly select $g \in \mathbb{Z}_{n^2}^*$ and compute the auxiliary value $\mu = (L(g^{\lambda} \bmod n^2))^{-1} \bmod n$, where $L(x) = \frac{x-1}{n}$ and $\mathbb{Z}_{n^2}^*$ is the multiplicative group modulo $n^2$. (4) Use $(n, g)$ as the public key and $(\lambda, \mu)$ as the private key.
    2. Encryption: For plaintext $m \in \mathbb{Z}_n$, randomly select $r \in \mathbb{Z}_n^*$ and compute the ciphertext $c = g^m r^n \bmod n^2$.
    3. Decryption: For ciphertext $c \in \mathbb{Z}_{n^2}^*$, compute the intermediate value $u = c^{\lambda} \bmod n^2$ and recover the plaintext as $m = L(u) \cdot \mu \bmod n$.
  • ElGamal partially homomorphic encryption: The ElGamal encryption algorithm is a public key encryption scheme with multiplicative homomorphism, comprising three main steps:
    1. Key generation: (1) Choose a large prime $p$ and a generator $g$ of the multiplicative group $\mathbb{Z}_p^*$. (2) Select a private key $x \in \{1, 2, \ldots, p-2\}$. (3) Compute $h = g^x \bmod p$ and use $(p, g, h)$ as the public key.
    2. Encryption: For plaintext $m \in \{1, 2, \ldots, p-1\}$, randomly select an ephemeral key $k \in \{1, 2, \ldots, p-2\}$, compute $c_1 = g^k \bmod p$ and $c_2 = m \cdot h^k \bmod p$, and represent the ciphertext as the pair $(c_1, c_2)$.
    3. Decryption: Using the private key $x$, compute the shared secret $s = c_1^x \bmod p$ and then recover the plaintext $m = c_2 \cdot s^{-1} \bmod p$, where $s^{-1}$ is the modular inverse of $s$ modulo $p$.
  • CKKS fully homomorphic encryption: The CKKS encryption algorithm is a fully homomorphic encryption algorithm designed for approximate computation, consisting of the following five main steps:
    1. Key generation: (1) Parameter selection: Define the polynomial ring $R = \mathbb{Z}[X]/(X^N + 1)$, which serves as the polynomial modulus in the CKKS encryption scheme, where $N$ is a power of 2 and $X$ is a formal variable. Choose a set of prime moduli $q_0, q_1, \ldots, q_L$ to control noise growth. Define the scaling factor $\Delta$ to map floating-point numbers to integer polynomials. Select a noise distribution, such as a discrete Gaussian distribution, to generate small noise polynomials. (2) Generating keys: Generate the private key $s \in R_q$, where $s$ is a small polynomial randomly selected from the ring. Generate the public key $(a, b)$ with $b = -(a \cdot s + e) \bmod q$, where $a$ is a random polynomial and $e$ is noise.
    2. Data encoding: (1) Data vectorization: Encode the plaintext data $z = (z_0, z_1, \ldots, z_{N/2-1})$ to be encrypted by the user as a polynomial $m(X) = z_0 + z_1 X + \cdots + z_{N/2-1} X^{N/2-1}$. (2) Scaling polynomials: Scale the polynomial by the factor $\Delta$: $m'(X) = \mathrm{round}(\Delta \cdot m(X))$.
    3. Encryption: Generate the ciphertext $c = (c_0, c_1)$ with $c_0 = b \cdot r + m'(X) + e_0 \bmod q$ and $c_1 = a \cdot r + e_1 \bmod q$, where $r$ is a random polynomial and $e_0, e_1$ are noise polynomials.
    4. Decryption: Decrypt the ciphertext $c = (c_0, c_1)$ using the private key $s$: $m'(X) = c_0 + c_1 \cdot s \bmod q$.
    5. Plaintext recovery: Compute $m(X) = m'(X) / \Delta$, after which the polynomial is decoded back to the plaintext data $z = (z_0, z_1, \ldots, z_{N/2-1})$.
The above three homomorphic encryption schemes are relatively mature and have distinct application scenarios. Paillier and ElGamal are PHE schemes, and they are generally more efficient than the CKKS scheme. Paillier supports additive homomorphism properties, offering higher precision in additive aggregation and lower algorithmic complexity. ElGamal supports multiplicative homomorphism properties, making it suitable for weighted multiplications and key exchange scenarios. Unlike Paillier and ElGamal, CKKS supports the direct encryption of real numbers, making it well-suited for high-precision model training tasks.
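To illustrate the additive homomorphism that secure aggregation relies on, the following is a toy Paillier implementation with deliberately small primes. It is a pedagogical sketch only; at this key size it is completely insecure, and the parameter choices are illustrative assumptions.

```python
from math import gcd
import random

# Toy Paillier over small primes -- illustration only, never secure at this size.
p, q = 293, 433                       # two distinct small primes (assumed)
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lambda = lcm(p-1, q-1)
g = n + 1                             # a standard valid choice of g in Z*_{n^2}
L = lambda x: (x - 1) // n            # L(x) = (x - 1) / n
mu = pow(L(pow(g, lam, n2)), -1, n)   # mu = (L(g^lambda mod n^2))^{-1} mod n

def enc(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:             # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2   # c = g^m * r^n mod n^2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n          # m = L(c^lambda mod n^2) * mu mod n

c1, c2 = enc(20), enc(22)
assert dec((c1 * c2) % n2) == 42      # Enc(a) * Enc(b) decrypts to a + b
```

Multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts, which is exactly the property a server needs in order to aggregate encrypted model updates without decrypting them.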

3.2. Secure Multi-Party Computation

Secure multi-party computation (SMPC) enables multiple participants to collaboratively perform a computational task while preserving the confidentiality of their individual private data. That is, there are $n$ participants $P_1, P_2, \ldots, P_n$ in a mutually untrusting network. These participants aim to keep their data $x_i$ private from others while collaboratively computing a function to obtain the result $Y = (y_1, y_2, \ldots, y_n) = f(x_1, x_2, \ldots, x_n)$.
SMPC encompasses cryptographic techniques with privacy-preserving properties for multiple parties, including but not limited to secret sharing, homomorphic encryption, bit commitment, zero-knowledge proofs [47], mix networks [48], and oblivious transfer [49]. Current applications of SMPC in federated learning primarily involve integrating fundamental secret-sharing schemes into federated learning systems. Secret sharing is a cryptographic technique in which a secret holder $S$ divides a secret $m$ into $n$ parts and distributes them to a group of participants $P = \{P_1, P_2, \ldots, P_n\}$; each participant $P_i$ receives a corresponding share $m_i$. To reconstruct the secret, only an authorized subset of participants $A \subseteq P$ can collaboratively recover $m$, while unauthorized participants cannot gain any information about the secret. The secret-sharing schemes applied to federated learning are described as follows:
  • Shamir's secret sharing: In 1979, Adi Shamir proposed a secret sharing scheme rooted in the Lagrange interpolation theorem. The scheme leverages polynomial operations over finite fields to securely distribute a secret among multiple parties. Specifically, the $(t, n)$ Shamir secret sharing scheme works as follows: To share the secret $s$, a trusted dealer first selects a large prime number $p$ such that $s < p$. Then, $t-1$ random elements $a_1, a_2, \ldots, a_{t-1}$ are chosen from the finite field $\mathbb{F}_p$ to construct the polynomial $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{t-1} x^{t-1} \bmod p$, where $a_0 = s$. Then, the trusted dealer selects $n$ distinct non-zero elements $x_1, x_2, \ldots, x_n$ from $\mathbb{F}_p$, computes $y_i = f(x_i) \bmod p$, and assigns the share $(x_i, y_i)$ to $P_i$. In the secret reconstruction phase, at least $t$ shares from the participants are required to reconstruct the secret. The reconstruction formula is as follows:
    $$s = a_0 = f(0) = \sum_{j=0}^{t-1} y_j \prod_{m=0,\, m \neq j}^{t-1} \frac{x_m}{x_m - x_j}.$$
  • Verifiable secret sharing: A verifiable secret-sharing scheme is an enhancement of the traditional secret-sharing scheme, addressing the issue of malicious or deceptive behavior that may occur among participants in the traditional approach. This improvement makes the scheme more suitable for federated learning environments, where malicious clients and servers may exist. In scenarios where not all participants are necessarily honest, verifiable secret sharing allows honest participants to use cryptographic tools, such as commitment schemes or zero-knowledge proofs, to validate the consistency of secret shares provided by others. When dishonest participants are present, a verifiable secret-sharing scheme with fault-tolerance mechanisms enables the remaining participants to recover the secret value by adhering to the protocol rules.
  • Additive secret sharing: In additive secret sharing, all secret sharing processes are implemented using additive methods, where the secret $s$ is represented as $s = m + d$. The secret distributor shares $m$ and $d$ among the participants. Each participant $P_i$ receives the respective secret shares as follows (illustrated in the sketch at the end of this subsection):
    $$m_i = f_1(x_i) = m + a_1 x_i + a_2 x_i^2 + \cdots + a_{t-1} x_i^{t-1},$$
    $$d_i = f_2(x_i) = d + b_1 x_i + b_2 x_i^2 + \cdots + b_{t-1} x_i^{t-1}.$$
    Each participant $P_i$ then calculates $s_i$ as follows:
    $$s_i = m_i + d_i = f_1(x_i) + f_2(x_i) = (m + d) + (a_1 + b_1) x_i + \cdots + (a_{t-1} + b_{t-1}) x_i^{t-1}.$$
    The secret $s = m + d$ can be recovered by applying the reconstruction formula with any $t$ participants.
The above secret sharing schemes each present specific advantages and limitations when applied in federated learning. Verifiable secret sharing schemes can detect malicious clients, whereas Shamir’s secret sharing and additive secret sharing lack this capability. Shamir’s secret sharing supports both addition and multiplication operations and generally provides acceptable computational efficiency. Additive secret sharing is generally more efficient but typically supports only addition; implementing multiplication requires additional mechanisms. Verifiable secret sharing offers strong security against malicious clients, but it typically incurs significantly higher computational and communication overhead.
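The sketch below illustrates $(t, n)$ Shamir sharing and the share-wise additivity that secret-sharing-based aggregation protocols exploit. The prime modulus and parameter values are illustrative assumptions.

```python
import random

P = 2_147_483_647                     # a Mersenne prime used as the field modulus

def share(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    f = lambda x: sum(a * pow(x, i, P) for i, a in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the field F_P."""
    s = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * xm % P
                den = den * (xm - xj) % P
        s = (s + yj * num * pow(den, -1, P)) % P
    return s

shares = share(123456, t=3, n=5)
assert reconstruct(shares[:3]) == 123456    # any 3 of the 5 shares suffice
assert reconstruct(shares[1:4]) == 123456

# Share-wise addition yields shares of the sum -- the property secure aggregation uses
a, b = share(10, 3, 5), share(32, 3, 5)
summed = [(x1, (y1 + y2) % P) for (x1, y1), (_, y2) in zip(a, b)]
assert reconstruct(summed[:3]) == 42
```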

3.3. Differential Privacy

Differential privacy (DP) [50], introduced by Dwork, is a mathematical framework for quantifying privacy, specifically designed to address the issue of information leakage in statistical databases. DP assumes a randomized algorithm $\mathcal{M}$ and two neighboring datasets $D$ and $D'$ differing by only one record. Algorithm $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-DP if it adheres to the following equation:
$$\Pr[\mathcal{M}(D) \in S] \leq \exp(\varepsilon) \cdot \Pr[\mathcal{M}(D') \in S] + \delta,$$
where $\varepsilon$ is the privacy budget, $\Pr[\cdot]$ denotes probability, $S$ represents any subset of the possible outputs of the algorithm $\mathcal{M}$, and $\delta$ is a relaxation term used to bound the probability that the privacy guarantee fails. The smaller $\varepsilon$ is, the more indistinguishable the outputs of $\mathcal{M}$ on $D$ and $D'$ are, resulting in a higher privacy level but lower utility.
The primary method for achieving DP involves adding noise, which is randomized based on the input or output to obscure the actual data and ensure privacy. Sensitivity plays an important role in determining the appropriate noise level to add. It is formally defined as follows:
$$\Delta f = \max_{D, D'} \| f(D) - f(D') \|_p,$$
where $f: D \to \mathbb{R}^d$ is the query function and $p$ denotes the norm type of the vector. The noise generation mechanisms applied in federated learning are elaborated as follows:
  • Laplace mechanism [51]: DP is achieved by adding noise drawn from the Laplace distribution to the input or output. The probability density of the Laplace distribution $\mathrm{Lap}(\mu, b)$ is as follows:
    $$\mathrm{Lap}(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right).$$
    The sensitivity of the query function in the Laplace mechanism is defined as follows:
    $$\Delta f = \max_{D, D'} \| f(D) - f(D') \|_1.$$
    Based on the above, Laplace DP can be stated as follows: given a query function $f: D \to \mathbb{R}^d$, let $\mathcal{M}(D) = f(D) + Y$, where $Y \sim \mathrm{Lap}\left(0, \frac{\Delta f}{\varepsilon}\right)$ is Laplace random noise. The randomized algorithm $\mathcal{M}$ then satisfies $(\varepsilon, 0)$-differential privacy.
  • Gaussian mechanism [52]: The Gaussian mechanism achieves DP by adding Gaussian noise to the input or output. The Gaussian density is defined as follows:
    $$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
    The sensitivity of the query function in the Gaussian mechanism is defined as follows:
    $$\Delta f = \max_{D, D'} \| f(D) - f(D') \|_2.$$
    Based on the above, Gaussian differential privacy can be stated as follows: given a query function $f: D \to \mathbb{R}^d$, let $\mathcal{M}(D) = f(D) + Y$, where for any $\delta \in (0, 1)$, $\sigma > \frac{\sqrt{2 \ln(1.25/\delta)}\, \Delta f}{\varepsilon}$ and $Y \sim \mathcal{N}(0, \sigma^2)$. Then, the randomized algorithm $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-differential privacy (see the sketch at the end of this subsection).
  • Exponential mechanism [53]: The exponential mechanism is designed for non-numerical DP, and its sensitivity $\Delta(q)$ is defined as follows:
    $$\Delta(q) = \max_{D, D', r} \| q(D, r) - q(D', r) \|_1,$$
    where $D$ is the dataset, $q$ is the quality function that evaluates the quality score of each output result on the dataset, and $r$ is an output result. When the randomized algorithm $\mathcal{M}$ outputs $r$ with probability proportional to $\exp\left(\frac{\varepsilon q(D, r)}{2 \Delta(q)}\right)$, then $\mathcal{M}$ satisfies $(\varepsilon, 0)$-differential privacy.
Each of the aforementioned noise mechanisms applied in federated learning has a distinct focus. The Laplace mechanism allows for direct noise control and is suitable for parameter updates and gradient uploads; however, it introduces relatively high noise, which can significantly degrade model accuracy. The Gaussian mechanism produces smoother noise compared to the Laplace mechanism, but requires careful calibration of the sensitivity parameter. The exponential mechanism is suitable for discrete selection problems and is often applied in scenarios involving categorical outputs or limited candidate sets. However, it requires the design of a well-defined quality function to guide the selection process.
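As an illustration of the Gaussian mechanism in a federated setting, the sketch below clips a client's update to bound its L2 sensitivity and then adds noise calibrated as in the definition above. The function name and parameter values are illustrative assumptions, and a real deployment must also track the privacy budget consumed across training rounds.

```python
import numpy as np

def gaussian_mechanism(update, clip_norm, epsilon, delta, rng):
    """Clip the update so its L2 sensitivity is bounded by clip_norm, then add
    Gaussian noise with sigma > sqrt(2*ln(1.25/delta)) * clip_norm / epsilon."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * clip_norm / epsilon
    return clipped + rng.normal(0.0, sigma, size=update.shape)

rng = np.random.default_rng(0)
raw = np.array([0.8, -1.5, 2.2])                 # a client's local model update
noisy = gaussian_mechanism(raw, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=rng)
print(noisy)   # what the client would upload instead of the raw update
```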

3.4. Blockchain

A blockchain is an append-only distributed ledger designed to record transactions among participants, ensuring immutability and resistance to tampering. The blockchain structure is viewed as a linear chain, with transactions appended in batches of blocks. Each block contains transaction data and block header information, and is linked to its immediate predecessor to form the chain. The block header comprises the previous block hash, the Merkle tree root hash, and a timestamp. The structure of the block is illustrated in Figure 7, and the Merkle tree structure is illustrated in Figure 8, using four transactions as examples.
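For concreteness, the following sketch computes a Merkle root over four transactions in the style described above, hashing pairs of nodes upward and duplicating the last node on odd levels as Bitcoin does. It is a minimal illustration, not a production blockchain component.

```python
import hashlib

def merkle_root(transactions):
    """Pairwise-hash the leaves upward until a single root remains."""
    level = [hashlib.sha256(tx.encode()).digest() for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:              # odd level: duplicate the last node
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()

# Four transactions, matching the example in Figure 8
print(merkle_root(["tx1", "tx2", "tx3", "tx4"]))   # root stored in the block header
```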
Blockchain provides a unified quantitative benchmark for federated learning, including evaluation metrics for model contribution (accuracy gain), data quality (label accuracy), and training behavior (convergence curve). These standards are enforced through smart contracts, ensuring that all participants collaborate on a fair and transparent basis, thereby enhancing both system credibility and model performance.
The consensus mechanism of the blockchain is the core technology that ensures data consistency among network nodes. It enables agreement on the blockchain state through a set of protocols. The consensus algorithm, as the core mathematical method of the mechanism, ensures that all nodes agree on each new block added to the blockchain. The consensus algorithm used in federated learning is as follows:
  • Proof-of-work (PoW): Originally implemented in Bitcoin, the PoW algorithm addresses the issue of consensus through a computational puzzle that is difficult to solve but easy for others to verify. The core idea is to demonstrate validity through computational effort. PoW uses the process of solving puzzles to determine member selection. While PoW enables blockchain membership selection, it demands significant computation, leading to high resource consumption and low transaction throughput [54,55,56].
  • Proof-of-stake (PoS): The core idea of PoS [57] is that a node demonstrates ownership of virtual resources by staking assets. This ownership determines its eligibility to participate in the consensus process and its level of influence. However, the security of PoS depends on the distribution of assets and places nodes with more assets in a more influential position in system operations.
  • Delegated proof-of-stake (DPoS): DPoS [58] is a variant of proof-of-stake that introduces a democratic, voting-based membership selection process. Nodes holding stakes vote to select a delegate node, which enables the delegate node to verify and produce blocks.
  • Practical Byzantine fault tolerance (PBFT): PBFT [59] is a mechanism that remains operational even in the presence of faulty nodes. The PBFT algorithm consists of three primary phases: the pre-prepare phase, the prepare phase, and the commit phase. The pre-prepare and prepare phases establish a total order of messages, while the prepare and commit phases guarantee consistent request ordering across all nodes.
The above consensus algorithms are applied to federated learning primarily to enhance decentralization, auditability, participant management, and fault tolerance against malicious clients. The proof-of-work consensus algorithm provides strong tamper-resistance and ensures the integrity of model uploads, offering high security. However, its high latency and computational energy consumption make it unsuitable for real-time federated learning tasks. The proof-of-stake and delegated proof-of-stake algorithms improve training and consensus efficiency compared to proof-of-work, making them more suitable for large-scale federated learning scenarios. However, they offer relatively limited decentralization. The practical Byzantine fault tolerance algorithm can resist Byzantine attacks and is suitable for scenarios involving a small number of trusted nodes. However, it incurs high communication overhead, which limits its scalability.

4. Aggregation Techniques for Federated Learning

Federated learning is a distributed machine learning approach that generates a global model by aggregating local models trained by multiple clients while protecting client data privacy.
In centralized federated learning, model aggregation is performed by a server. In this process, each client trains the model on its dataset and uploads the training results to the server. The exchanged information may include model parameters, gradients, or the model itself. Finally, the server integrates all models to generate the global model, and this information is used for subsequent model updates.
In decentralized federated learning, model aggregation occurs through client-to-client communication, where one client initiates cooperation with its neighbors to aggregate models. The process of merging locally generated models into a global model is referred to as the aggregation technique.
The aggregation technique in federated learning involves more than simply merging models for updates; other metrics, such as dataset size, model performance, and loss function, are also considered in aggregation algorithms. During federated learning, the aggregation algorithm plays a crucial role in integrating results and enhancing training efficiency.

4.1. Fundamental Aggregation Algorithms

Traditional aggregation algorithms have been proposed to address communication efficiency and privacy preservation issues in federated learning. These traditional aggregation algorithms do not optimize the aggregation mechanism itself; instead, they enhance the efficiency and accuracy of model training by employing mathematical techniques and tuning various parameters. These algorithms are usually integrated into the federated learning framework as foundational algorithms. The commonly used traditional aggregation algorithms are as follows:
  • FedAvg: The FedAvg [60] algorithm is a classic federated learning aggregation algorithm, where a subset of clients is randomly selected for aggregation in each training round. During aggregation, client parameters are weighted and averaged, with weights determined by the client's data volume relative to the total data volume. FedAvg has efficient communication, supports non-independent and non-identically distributed (non-IID) data, and achieves high accuracy. However, when the degree of non-IID data distribution is too high, model convergence slows down. Furthermore, security considerations are lacking, failing to ensure participant trustworthiness. FedAvg updates the global model using the following equation:
    $$\omega_{glob}^{t+1} \leftarrow \sum_{k \in S_t} \frac{n_k}{n} \omega_k^{t+1},$$
    where $\omega_{glob}^{t+1}$ is the aggregated global model, $S_t$ represents the set of selected clients, $\omega_k^{t+1}$ denotes the locally updated model of client $k$ after local training, and $\frac{n_k}{n}$ is the weight factor.
  • FedProx: FedProx [61] generalizes and reparameterizes FedAvg to address local optimization challenges in stochastic gradient descent (SGD)-based algorithms. It introduces a correction term into the client-side loss function, improving model performance and enhancing convergence speed. The correction term is defined as $\frac{\mu}{2} \| \omega_k^t - \omega_{glob}^t \|^2$, the squared L2 norm of the difference between the local model and the global model, where $\mu$ is the penalty constant for the regularization term, designed to penalize clients that deviate far from the global model and thereby constrain the behavior of clients participating in training (a sketch of this local objective appears after this list).
  • Scaffold: Scaffold [62] corrects both the global and local model updates by calculating the differences between the server-side and client-side control variables. This method effectively addresses the client drift problem caused by data heterogeneity in FedAvg. The model update in Scaffold consists of the following four steps:
    • The client-side local model is updated as follows:
      $$y_i \leftarrow y_i - \eta_l \left( g_i(y_i) - c_i + c \right).$$
    • The client-side control variable is updated as follows:
      $$c_i^+ \leftarrow c_i - c + \frac{1}{K \eta_l} (x - y_i).$$
    • The server global model is updated as follows:
      $$x \leftarrow x + \eta_g \Delta x.$$
    • The server control variable is updated as follows:
      $$c \leftarrow c + \frac{1}{N} \sum_{i \in S} (c_i^+ - c_i),$$
      where $\eta_l$ and $\eta_g$ denote the local and global learning rates, $K$ denotes the number of local update steps, $g_i(\cdot)$ is the gradient computation function of client $i$, $S$ stands for the selected client set, and $N$ is the total number of clients.
  • FedBN: FedBN [63] trains a local model on each client and incorporates batch normalization (BN) layers for normalization. FedBN is designed to address the heterogeneity of federated learning data, particularly the feature shift issue: BN-layer parameters are kept local to each client, while the remaining layers are aggregated as follows:
    $$\omega_{t+1}^{(l)} \leftarrow \frac{1}{|P|} \sum_{k=1}^{|P|} \omega_{t+1,k}^{(l)},$$
    where $t$ is the training round, $k$ is the client index, $l$ indexes the (non-BN) neural network layers, and $P$ is the set of clients.
  • FedDF: FedDF [64] applies the concept of knowledge distillation to model fusion. Each client trains a local model, and the local models are subsequently fused through distillation. FedDF is designed to address model heterogeneity and data heterogeneity with a certain level of robustness. The FedDF distillation update is as follows:
    $$x_{t,j} = x_{t,j-1} - \eta \, \frac{\partial\, \mathrm{KL}\!\left( \sigma\!\left( \frac{1}{|S_t|} \sum_{k \in S_t} f(\hat{x}_t^k, d) \right),\ \sigma\!\left( f(x_{t,j-1}, d) \right) \right)}{\partial x_{t,j-1}},$$
    where $\mathrm{KL}$ represents the Kullback-Leibler divergence, $\sigma$ is the softmax function, $\eta$ is the step size, and $f(\hat{x}_t^k, d)$ denotes the output of the $k$-th client model in the $t$-th round on the $d$-th batch.
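As a concrete example of how these algorithms differ only in small modifications to the local or global step, the sketch below implements the FedProx-style local update referenced above: a task-loss gradient plus the gradient of the proximal correction term. The least-squares task and all names are illustrative assumptions.

```python
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.1, epochs=5):
    """Local objective = task loss + (mu/2) * ||w - w_global||^2; the proximal
    term pulls the client's model back toward the current global model."""
    w = w_global.copy()
    for _ in range(epochs):
        grad_task = X.T @ (X @ w - y) / len(y)   # least-squares loss gradient
        grad_prox = mu * (w - w_global)          # gradient of the correction term
        w -= lr * (grad_task + grad_prox)
    return w

# Toy usage on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0])
w_new = fedprox_local_update(np.zeros(2), X, y)   # stays near the global model
```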

4.2. Secure Aggregation Algorithms

Although federated learning does not disclose local data, sensitive information may still leak during model exchanges. Secure aggregation algorithms integrate privacy-preserving mechanisms with aggregation to ensure the confidentiality of model information. This section introduces some of the commonly used secure aggregation algorithms.

4.2.1. Secure Aggregation Based on Homomorphic Encryption

In federated learning, honest-but-curious servers may attempt to infer private data from the model information uploaded by users. Homomorphic encryption allows servers to perform aggregation without decryption, preventing access to the original data. Secure aggregation based on homomorphic encryption typically involves a four-step process:
  • Each user locally encrypts the trained model using homomorphic encryption to obtain $Enc(x_i)$, where $x_i$ represents the trained model data. The encrypted data is then uploaded to the server.
  • The server performs the aggregation operation directly on the encrypted data to obtain the global model.
  • The user downloads the global model.
  • The user decrypts the global model locally using their private key.
The schematic diagram of the aggregation process is shown in Figure 9.
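As a sketch of this four-step workflow, the example below uses the third-party python-paillier package (phe): the server aggregates ciphertexts without ever decrypting them. For readability both keys appear in one script; in a real deployment the keys would be distributed so that the server never holds the private key, and each model parameter vector would be encrypted element-wise.

```python
from phe import paillier   # third-party python-paillier package (pip install phe)

# One shared keypair held by the clients; the server only ever sees ciphertexts.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Step 1: each client encrypts its locally trained parameter (one scalar here).
local_params = [0.12, -0.43, 0.27]
ciphertexts = [public_key.encrypt(p) for p in local_params]

# Step 2: the server aggregates directly on ciphertexts -- it never decrypts.
enc_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    enc_sum = enc_sum + c              # ciphertext addition (additive homomorphism)
enc_avg = enc_sum * (1.0 / len(ciphertexts))   # scalar multiplication is also allowed

# Steps 3-4: clients download the aggregate and decrypt it locally.
print(private_key.decrypt(enc_avg))    # ~ (0.12 - 0.43 + 0.27) / 3
```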
Secure aggregation based on homomorphic encryption in federated learning is realized through underlying cryptosystems. Currently, there are three main types of encryption schemes based on cryptosystems, as follows:
Partially homomorphic encryption. Partially homomorphic encryption is now widely used in secure aggregation for federated learning [65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82]. Among these, Paillier cryptosystems are employed in studies [65,66,67,68,69,70,71,72], where various schemes process data differently to improve model performance and reduce communication overhead. For instance, in [71], gradients are selected and quantized using batch cropping to reduce communication overhead. Experiments on the MNIST and CIFAR-10 datasets with 50 edge nodes demonstrate that the proposed scheme achieves higher accuracy than plaintext models, with the single aggregation time on the MNIST dataset ranging from 0.01 s to 0.10 s, indicating a significant improvement in communication efficiency. In [72], a federated learning scheme that integrates gradient compression and homomorphic encryption techniques is proposed to reduce the communication overhead on the client side through gradient compression, while preventing gradient privacy leakage. Security is enhanced by introducing noise, encrypting, and signing the aggregated model before uploading it to the aggregation server. Experimental results on the CIFAR-10 dataset show that gradient compression reduces data transmission and helps preserve privacy, albeit at the cost of slightly reduced accuracy. When the compression rate reaches 50%, the model achieves an accuracy of 89.12%, which is comparable to the uncompressed model’s accuracy of 89.15%.
A lightweight secure aggregation scheme based on ElGamal homomorphic encryption is proposed in [73]. This scheme integrates federated learning into the deep learning of medical models for IoT-based healthcare systems. Employing cryptographic primitives such as masking and homomorphic encryption enhances local model protection, preventing attackers from inferring private medical data through model reconstruction or inversion attacks. Experimental results show that this approach improves communication efficiency by over twofold compared to baseline algorithms and achieves 99.1% accuracy on MNIST after 506 training rounds.
The scheme presented in [76] employs the Joye–Libert partially homomorphic encryption system to implement an additive homomorphic encryption scheme. Its core involves secure multi-party global gradient computation based on active user data, demonstrating low communication overhead and robustness against user dropouts.
To mitigate risks posed by malicious participants colluding with the server to compromise other participants, a federated learning framework combining the Rivest–Shamir–Adleman (RSA) algorithm with Paillier encryption is proposed in [77]. In terms of accuracy, this scheme achieves performance equivalent to unencrypted models after 30 training rounds on the MNIST dataset. Regarding computational overhead, it introduces only a linear increase proportional to the number of parameters.
Fully homomorphic encryption. In [83,84,85,86,87], fully homomorphic encryption is applied to the global model as well, protecting it in addition to the local models. In this case, the global model remains encrypted for the user, who must train on the ciphertext; this requires multiplicative homomorphic operations in addition to additive ones and therefore necessitates fully homomorphic encryption systems capable of supporting both.
Hybrid schemes. Refs. [81,82] propose combining homomorphic encryption with other privacy-preserving mechanisms to make federated learning more adaptable to various scenarios. For example, Ref. [81] combines differential privacy with homomorphic encryption, proposing a new multi-party learning framework for privacy preservation in vertically partitioned environments. The core idea leverages the functional mechanism, addressing the challenge of applying differential privacy in vertically partitioned scenarios. This is achieved by adding noise to the objective function. Experiments demonstrate that for linear regression, the scheme achieves a mean square error of 0.007, which is very close to the 0.0044 of privacy-preserving linear regression. For logistic regression, the scheme's utility deteriorates as the privacy budget decreases, achieving an accuracy of 0.8132 ± 0.0231 when the privacy budget is 10, approximating privacy-preserving logistic regression.
In [82], secure multiparty computation is combined with homomorphic encryption to implement the scheme in a cloud-based federated learning service scenario. However, the appropriate number of clients must be carefully determined, as the communication overhead increases when the server and clients use encrypted model vectors to protect model parameters. Experiments show that the computational overhead of the baseline algorithm in the same scenario is 2.3 times higher than this scheme. The key size of homomorphic encryption significantly impacts runtime. Combining it with other privacy-preserving mechanisms can result in high computational overhead, highlighting the need to balance security and computational efficiency.
Partially homomorphic encryption offers faster encryption and decryption, low communication overhead, and minimal impact on model accuracy. For example, Refs. [73,76] demonstrate lower communication costs and computational overhead compared to [83], while achieving higher model accuracy. However, Ref. [76] does not support multiplication operations and is, therefore, unsuitable for complex model updates.
Generally, fully homomorphic encryption incurs longer training times per round and higher communication costs, making it unsuitable for edge deployments. Nevertheless, it provides strong security guarantees, supports complex computations, and ensures controllable accuracy degradation. For example, Ref. [83] defends against N-1 attacker inference, reconstruction attacks, and membership inference, whereas Ref. [73] fails to defend against malicious clients tampering with model updates.
Hybrid encryption schemes can mitigate accuracy loss through mechanisms such as differentially private balancing. They typically incur moderate communication costs, which can be further reduced using compression or pruning techniques. For example, Ref. [81] incurs lower communication costs and computational overhead than [76], but achieves lower model accuracy. However, these schemes tend to be more complex to implement and less practical in real-world deployment scenarios.
In this paper, secure aggregation schemes based on homomorphic encryption are categorized into partially and fully homomorphic encryption, as summarized in Table 1.

4.2.2. Secure Aggregation Based on Secure Multi-Party Computation

Secure multi-party computation addresses the collaborative computation challenge by splitting each local model into multiple secret shares and distributing them among participants. The aggregated model is then computed using these secret shares, ensuring privacy throughout the process. A schematic diagram of secure aggregation based on secure multi-party computation is shown in Figure 10.
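A minimal sketch of this split-and-aggregate idea follows, using plain additive secret sharing over a prime field; the modulus and the integer-encoded parameters are illustrative assumptions.

```python
import random

P = 2**61 - 1   # prime field modulus (a Mersenne prime, chosen for illustration)

def split(x, n):
    """Split x into n additive shares that sum to x mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    return shares + [(x - sum(shares)) % P]

# Three clients, each holding one integer-encoded model parameter.
models = [11, 25, 6]
n = len(models)

# Each client sends one share to every peer; nobody sees another client's value.
share_matrix = [split(m, n) for m in models]

# Each participant sums the shares it received ...
partial_sums = [sum(share_matrix[i][j] for i in range(n)) % P for j in range(n)]

# ... and combining the partial sums reveals only the aggregate.
assert sum(partial_sums) % P == sum(models)
```

Each partial sum is uniformly random on its own; only the combination of all of them reveals the aggregate, never an individual client's model.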
Some previous works have investigated the privacy preservation of secure multi-party computation for machine learning. Bonawitz et al. [88] first introduced the concept of secure aggregation into federated learning by analyzing the security of prior methods and constructing SecAgg, a secure aggregation protocol based on Shamir’s secret sharing. Currently, secure aggregation for multi-party computation can be categorized into three types depending on the secret sharing scheme:
Shamir's secret sharing. Since Shamir's secret sharing was first applied to federated learning, several studies [89,90,91,92,93,94] have focused on improving aggregation schemes based on it. In [89], a variant of the SecAgg secure aggregation protocol was proposed. The core idea is to model node sharing as a sparse random graph instead of the complete graph used in SecAgg, selectively performing secret sharing on only a subset of clients. Experiments show that this protocol achieves comparable reliability in learning algorithms and data confidentiality while requiring only 20-30% of the resources needed by SecAgg.
To address user dropout and mitigate attacks by semi-honest and active malicious adversaries, Ref. [90] proposed a scalable privacy-preserving aggregation scheme. Theoretical and experimental analyses demonstrate that the scheme tolerates participant dropout at any time and defends against such attacks through appropriate system parameter configurations. Moreover, the runtime improves by a factor of 6.37 compared to existing schemes.
In [91], a scheme was introduced for dynamically designing the content and objects of secret sharing. This approach enables participants to incorporate secret gradient sharing within their respective groups and significantly reduces communication costs by replacing high-dimensional random vectors with pseudo-random seeds. Experiments indicate that the scheme ensures robust privacy preservation even with honest-but-curious servers and participant collusion. Regarding communication costs, training a LeNet network with 136,886 parameters on the MNIST dataset incurs a communication cost on the order of 10^1, compared to 10^2 for the baseline algorithm.
To protect decentralized learning models from inference attacks and privacy leakage, Ref. [92] proposed a secure aggregation scheme based on secret sharing for decentralized learning. This work provides valuable insights for designing more efficient decentralized learning architectures. Experimental results reveal that this scheme requires only 0.21 s of computation time per iteration on the MNIST dataset, compared to 59.54 s for traditional schemes.
For long-term privacy preservation in federated learning, Ref. [93] proposed a privacy-preserving aggregation protocol designed for multiple training rounds. Experimental results show that the protocol enhances privacy preservation while increasing the communication cost by only 4% in a system with 100,000 users compared to previous schemes.
In [94], federated learning is applied to the healthcare domain to predict patient severity. Shamir’s secret sharing is employed to ensure secure communication between clients and the edge aggregator. A dynamic edge thresholding mechanism is integrated, enabling an adaptive update strategy that can accept or reject model updates in real time. Experiments conducted on real-world ICU datasets (MIMIC-III) show that encryption of model updates increases computation time by approximately 50–60% per round and memory consumption by 35–40%. However, these overheads remain within a feasible range for real-world deployment. The accuracy degradation compared to the baseline scenario without secure multi-party computation is limited to 1–2 percentage points.
Verifiable secret sharing. Most aggregation schemes based on secure multi-party computation focus on protecting local models, without considering the security of the global model. This limitation allows malicious servers to tamper with the global model, potentially reducing training efficiency and increasing the risk of exposing users’ sensitive information. To verify the correctness of aggregation results returned by servers, several studies [95,96,97,98,99] have proposed methods based on verifiable secret sharing schemes.
The first privacy-preserving and verifiable federated learning framework, VerifyNet, was introduced in [95]. It ensures the confidentiality of users’ local gradients during the federated learning process through a double-masking protocol. Additionally, it allows the cloud server to provide proof of the correctness of aggregation results, ensuring they are not tampered with. Experimental results show that training a convolutional neural network on the MNIST dataset, with 500 users and a total of 500,000 gradients, requires 70 MB per user for a single parameter update iteration.
In [96], a non-interactive, verifiable, and decentralized federated learning scheme was proposed. This scheme decentralizes the process by splitting users’ secret inputs and distributing them to multiple servers. A subset of these servers collaborates to correctly reconstruct the outputs, ensuring the results are not tampered with. Experiments demonstrate that this approach enables distributed aggregation of secret inputs across multiple untrusted servers and outperforms prior methods.
To enhance the efficiency of federated learning while reducing communication, computational, and storage costs for low-resource devices, Refs. [97,98] proposed optimizations that maintain the confidentiality of users’ local gradients and verify aggregation results. Experimental results indicate that the schemes reduce device power consumption due to lightweight computation.
In [99], a verifiable federated learning scheme was designed to protect sensitive training data in industrial IoT environments and prevent malicious servers from returning falsified aggregated gradients. This scheme employs Lagrangian interpolation, carefully setting interpolation points to verify the correctness of aggregated gradients. By combining Lagrangian interpolation with blinding techniques, it achieves secure gradient aggregation. Notably, the scheme maintains a constant validation overhead compared to existing methods, regardless of the number of participants.
Additive secret sharing. Traditional schemes rely on Shamir’s secret sharing, which splits a secret into multiple shares via polynomial interpolation and recovers it through Lagrange interpolation. However, this approach incurs high computational and communication costs, so some research efforts have focused on lightweight improvements [100,101,102,103,104,105]. Among them, Refs. [100,101] focus on preventing information leakage in centralized federated learning systems and adopt encryption methods that differ from traditional schemes.
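To illustrate the interpolation cost that motivates these lightweight variants, the following toy Python sketch shows Shamir’s (t, n) sharing and Lagrange reconstruction over a small prime field. All values are illustrative assumptions, and production systems rely on vetted implementations.

```python
# A toy sketch of Shamir's (t, n) secret sharing: shares are points on a
# random degree-(t-1) polynomial with f(0) = secret, and any t shares
# recover the secret via Lagrange interpolation at x = 0.
import random

P = 2**31 - 1  # small Mersenne prime field, for illustration only

def shamir_share(secret, t, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def shamir_reconstruct(points):
    """Lagrange interpolation at x = 0 over any t distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = shamir_share(12345, t=3, n=5)
assert shamir_reconstruct(shares[:3]) == 12345  # any 3 of 5 shares suffice
```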
In [100], a partial encryption method is used, encrypting only the first layer of the local model with additive secret sharing, while the rest is sent directly to the central server. Experiments show that the secret share generation time decreases from 2.7081 s to 0.6951 s and the secret share aggregation time decreases from 0.0019 s to 0.0008 s compared to applying secret sharing to the entire local model.
Ref. [101] focuses on training split models in a vertical federated learning scenario. To address performance degradation caused by straggling clients and privacy leakage from client embeddings, the FedVS framework for synchronous split vertical federated learning is proposed; it reconstructs the aggregate of client embeddings losslessly by designing an additive secret sharing scheme. Experiments show that this framework guarantees user privacy and improves performance in the presence of straggling clients compared to baseline methods.
Refs. [102,103] study decentralized federated learning systems based on secure multi-party computation, exploring optimizations from different perspectives to achieve high accuracy, low communication costs, and scalability without compromising privacy.
In [102], the optimization of the additive secret sharing algorithm is considered by replacing the secure summation building block from [106] with the secure summation algorithm from [107], simplifying the protocol while eliminating the need for public key encryption.
Considering client selection prior to aggregation, Ref. [103] proposes a hierarchical mechanism based on secure multi-party computation that restricts model aggregation to a small number of aggregation committee members. Experiments demonstrate a 90% reduction in exchanged messages on the MNIST dataset and a 22% accuracy improvement compared to locally trained models.
In [104], to reduce the computational and communication costs of existing secure aggregation protocols and improve robustness to client dropout, a secure aggregation protocol is proposed with a novel multi-secret sharing scheme based on the Fast Fourier Transform. Experimental results show that this protocol achieves significantly lower computational costs than existing schemes while maintaining comparable communication costs.
In [105], a scheme combining secure multi-party computation and differential privacy techniques is proposed to prevent local and global models from leaking user data information, protecting the privacy of both the computation process and the results. Experimental results show that the scheme is scalable and robust to client dropouts, and when the number of clients reaches 1000, its efficiency is comparable to that of FedAvg.
In theory, Shamir’s secret sharing, verifiable secret sharing, and additive secret sharing affect neither the model training process nor the numerical representation of gradients. For example, Refs. [92,97,100] report model accuracies consistent with FedAvg.
In terms of security, verifiable secret sharing offers the highest level of protection, as it guards against malicious clients. Shamir’s secret sharing provides a moderate level of security, while additive secret sharing offers the most basic protection. For example, Ref. [97] can verify aggregation integrity and protect against server tampering, while Ref. [92] provides drop tolerance for clients.
However, compared to [92,97], Ref. [100] offers only basic protection against server–client collusion.
Regarding communication and computational overhead, the order from highest to lowest is as follows: verifiable secret sharing, Shamir’s secret sharing, and additive secret sharing.
In this paper, secure aggregation schemes based on secure multi-party computation are categorized according to the secret sharing schemes they employ, as summarized in Table 2.

4.2.3. Secure Aggregation Based on Differential Privacy

Differential privacy protects individual records from inferential attacks by introducing random noise to the model data. Typically, the added noise ensures that the risk of privacy leakage remains minimal and acceptable. Unlike cryptographic approaches, differential privacy-based methods generally incur low computational overhead. However, the introduction of noise during the learning process can lead to a reduction in the accuracy of the trained model.
Differential privacy frameworks in federated learning can be classified into three categories, as follows:
  • Local differential privacy (LDP): Local differential privacy is designed for scenarios with untrusted servers, where each client adds perturbation noise to its data before sending it to the central server. By ensuring that the added noise satisfies the client’s differential privacy requirements, the client’s privacy remains protected regardless of the actions or behaviors of other clients or the server.
  • Global differential privacy (GDP): Global differential privacy is typically employed in scenarios involving a trusted server, where the server adds differential privacy-compliant noise during the aggregation process. This approach protects user privacy while enabling the construction of a more practical model by introducing controlled noise into the global model maintained on the server.
  • Distributed differential privacy (DDP): Distributed differential privacy does not require a trusted server and aims to achieve strong privacy guarantees with minimal noise addition. The system must pre-determine the noise budget required for each training round to ensure sufficient privacy protection. In each round, the noise addition task is allocated evenly among clients, each of which adds the minimum necessary noise to perturb its model update. Subsequently, the client’s update is masked before transmission to ensure that the server learns only the aggregated result. Distributed differential privacy typically incurs substantial communication overhead. A minimal sketch of the distributed-noise idea appears after this list.
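As a minimal sketch of the distributed-noise idea (assuming the total noise scale has already been calibrated from the round’s privacy budget), each of n clients can add 1/n of the required Gaussian variance so that only the aggregate carries the fully calibrated noise; all values below are illustrative.

```python
# Distributed Gaussian noise: per-client noise has variance
# sigma_total**2 / n_clients, and since independent Gaussians add,
# the summed noise has variance sigma_total**2, exactly what a
# trusted aggregator would have added centrally.
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 10, 4
sigma_total = 1.0  # std. dev. required on the aggregate (assumed given)

updates = [rng.normal(size=dim) for _ in range(n_clients)]
noisy = [u + rng.normal(scale=sigma_total / np.sqrt(n_clients), size=dim)
         for u in updates]
aggregate = np.sum(noisy, axis=0)  # the only quantity the server should see
```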
Secure aggregation schemes based on differential privacy can be categorized according to their noise generation mechanisms. Existing studies are typically divided into the following three categories:
Laplace mechanism. Data models for federated learning are protected by differential privacy based on the Laplace mechanism, as presented in [108,109,110].
In [108], to prevent attackers from partially reconstructing the original training samples using model parameters, a federated learning scheme that combines model segmentation techniques with differential privacy methods is proposed for mobile edge computing. Experimental results demonstrate that the scheme provides strong privacy guarantees while maintaining high model accuracy. Specifically, training a convolutional neural network on the MNIST dataset with 300 users, 50 communication rounds, and a high perturbation strength still achieves over 85% accuracy.
In [109], recognizing that existing protection methods cannot fully safeguard user privacy during iterative global model updates in federated learning, a scheme combining differential privacy with lightweight encryption is proposed. This scheme perturbs the privacy-sensitive parameters and transmits them in ciphertext form. Trained on the MNIST dataset with a convolutional neural network, experiments show that the scheme ensures strong security in client–server interactions and achieves a model accuracy of 97.8% when the privacy budget and number of users are set to 10 and 100, respectively.
In [110], federated learning is applied in an IoT environment to implement an adaptive gradient Laplace noise addition mechanism. The noise level in each round is determined by a dynamically allocated privacy budget. Experiments are conducted on the Tiny ImageNet and DeepFashion datasets, comparing the proposed method with both an equal privacy budget allocation strategy and a baseline model without privacy protection. The proposed method consistently outperforms the baseline, with the accuracy gap reaching its maximum at 50 iterations.
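As a minimal sketch of how Laplace noise is typically calibrated (the clipping bound and privacy budget below are illustrative assumptions, not settings from the cited works):

```python
# Laplace-mechanism calibration: after clipping each update to L1 norm
# `clip`, the L1 sensitivity is `clip` (add/remove neighbor convention),
# and noise is drawn from Lap(clip / epsilon).
import numpy as np

rng = np.random.default_rng(0)
epsilon, clip = 1.0, 1.0

grad = rng.normal(size=8)
grad = grad / max(1.0, np.abs(grad).sum() / clip)  # L1 clipping
sensitivity = clip

noisy_grad = grad + rng.laplace(scale=sensitivity / epsilon, size=grad.shape)
```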
Gaussian mechanism. The commonly used mechanisms in differential privacy are the Laplace mechanism and the Gaussian mechanism. However, the mean squared error of the Laplace mechanism is significantly higher than that of other mechanisms, making the Gaussian mechanism the most commonly used in federated learning based on differential privacy, e.g., [111,112,113,114].
In [111], the privacy budget and communication rounds under convergence constraints are analyzed using an adaptive algorithm. Theoretical analysis is conducted to determine the optimal number of local differential privacy stochastic gradient descent iterations between any two consecutive global updates. The scheme dynamically searches for the optimal performance of differential privacy federated learning. Experiments on the MNIST, FMNIST, and CIFAR-10 datasets show that, with a privacy budget of 5.25, accuracies of 98.82%, 84.58%, and 57.25% are achieved, respectively.
In [112], the local dataset is pre-partitioned into multiple subsets for parameter updates. The issue of increased parameter sensitivity is addressed through data splitting and parameter averaging operations. Experiments on the FEMNIST dataset, with a training duration of 180,000 s, demonstrate that this approach improves accuracy by 8% compared to the baseline scheme.
In [113], the heterogeneous privacy requirements of different clients are explicitly modeled and exploited. The work investigates optimizing the utility of the federated model while minimizing communication costs. The proposed scheme significantly improves model utility, achieving better noise reduction and performance compared to FedAvg.
In [114], to address privacy leakage caused by model parameters encoding sensitive information during training, a training scheme using global differential privacy is proposed. A dynamic privacy budget allocator is implemented to enhance model accuracy. Experimental results demonstrate that the scheme effectively improves training efficiency and model quality for a given privacy budget.
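As a minimal sketch of Gaussian-mechanism calibration under the classic analytic bound (valid for epsilon < 1; all values below are illustrative assumptions):

```python
# Gaussian mechanism for (epsilon, delta)-DP: with L2-clipped updates,
# the classic bound sets sigma = clip * sqrt(2 * ln(1.25 / delta)) / epsilon.
import numpy as np

rng = np.random.default_rng(0)
epsilon, delta, clip = 0.5, 1e-5, 1.0

grad = rng.normal(size=8)
grad = grad / max(1.0, np.linalg.norm(grad) / clip)  # L2 clipping

sigma = clip * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
noisy_grad = grad + rng.normal(scale=sigma, size=grad.shape)
```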
Exponential mechanism. To select the optimal term among multiple candidate output terms and handle high-dimensional feature parameters, the application of exponential mechanisms in federated learning has been proposed, as demonstrated in [115,116,117].
In [115], a federated learning framework based on joint differential privacy is introduced to incentivize client participation in model training and prevent malicious participants from poisoning the model. The framework employs two game-theoretic mechanisms, formulating client selection as an auction game. Clients report their costs as bids, and servers adopt payment strategies to maximize returns. Experimental results indicate that, under dirty label attacks, the traditional scheme suffers a 4–5% accuracy degradation, whereas this framework limits the degradation to 1–2%.
In [116], the authors focus on model parameter updates and observe that these updates in federated learning involve high-dimensional, continuous values with high precision, rendering existing local differential privacy protocols unsuitable. To address this, the LDP-Fed algorithm is proposed, leveraging the exponential mechanism to handle high-dimensional, continuous, and large-scale parameter updates. Experiments on the FashionMNIST dataset demonstrate that this scheme achieves a final accuracy of 86.85%, outperforming the baseline algorithm.
In [117], the impact of parameter dimensions on privacy budgets was analyzed. It was found that increasing parameter dimensions significantly inflates the privacy budget, and the larger variance induced by the perturbation mechanism degrades model performance. To mitigate this, a filtering and screening mechanism based on the exponential mechanism is proposed. This mechanism selects better parameters by evaluating their contribution to the neural network. Experiments on the MNIST dataset, with five edge nodes under a non-independent and identically distributed (non-IID) setting, show that each participant reduces communication costs by at least 60%, while achieving an accuracy of up to 94.63%.
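As a minimal sketch of exponential-mechanism selection (the candidates and quality scores are illustrative assumptions), a candidate r is sampled with probability proportional to exp(epsilon * q(r) / (2 * sensitivity of q)), so higher-quality candidates are exponentially more likely:

```python
# Exponential mechanism over a small candidate set.
import numpy as np

rng = np.random.default_rng(0)
epsilon, sens_q = 1.0, 1.0  # privacy budget and sensitivity of the quality fn

candidates = ["param_a", "param_b", "param_c", "param_d"]
quality = np.array([0.9, 0.7, 0.2, 0.1])  # e.g., contribution to the network

weights = np.exp(epsilon * quality / (2 * sens_q))
chosen = rng.choice(candidates, p=weights / weights.sum())
```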
To balance privacy guarantees with model accuracy and practicality in differential privacy-based federated learning systems, distributed differential privacy mechanisms have been proposed in studies such as [118,119,120]. For instance, the work of [119] introduces a distributed differential privacy federated learning framework named Dordis, which implements a novel “add-delete” scheme to precisely enforce the desired noise level in each training round, even when some sampled clients drop out. Experimental results indicate that this framework achieves an optimal balance between privacy protection and model utility, while enhancing training speed by a factor of 2.4 compared to existing methods addressing client dropouts.
The differential privacy schemes described above are commonly used in federated learning. In terms of accuracy, the exponential mechanism has minimal impact when its quality function is well-designed. The introduction of the Laplace mechanism may reduce accuracy by approximately 3–5%, while the Gaussian mechanism typically leads to a smaller accuracy drop of around 1–2%.
Regarding communication cost, the exponential mechanism incurs extremely low overhead. The Gaussian mechanism introduces only mild noise, which allows for the use of pruning, quantization, and compression techniques, keeping communication cost controllable. In contrast, the Laplace mechanism introduces higher noise amplitude, making it unsuitable for compression; maintaining sufficient precision during transmission leads to increased communication cost. For example, Ref. [117], which employs the exponential mechanism, has the same communication cost as FedAvg. In contrast, Ref. [108], which adopts the Laplace mechanism, incurs a slight increase in communication cost, while [111], which applies the Gaussian mechanism, achieves a reduction of approximately 30%.
As for computational overhead, both the Laplace and Gaussian mechanisms are relatively lightweight, while the exponential mechanism imposes a moderate computational burden.
Secure aggregation schemes based on differential privacy are categorized in this paper according to their noise generation mechanisms, as shown in Table 3.

4.2.4. Secure Aggregation Based on Blockchain

In terms of security, blockchain is characterized by decentralization, minimizing reliance on a central server. This architecture distributes trust from a single entity to selected aggregators, thereby reducing the risk of a single point of failure. In the context of federated learning, this property is particularly beneficial for secure aggregation, as it prevents any single aggregator from becoming a privacy bottleneck or an attack target. Blockchain’s authentication and traceability features ensure that data is both tamper-proof and auditable, which can be directly leveraged to verify the integrity of model updates during the aggregation process. Additionally, the open ledger functionality allows all transactions to be validated by other nodes, thereby enhancing system transparency and providing verifiable guarantees of aggregation correctness.
Every model update uploaded to the blockchain must satisfy consensus conditions, and the combination of consensus mechanisms and cryptographic techniques (e.g., digital signatures, zero-knowledge proofs) provides strong resistance against Byzantine attacks, thereby improving the robustness of secure aggregation against malicious participants. Furthermore, interactions between users in a blockchain system are conducted anonymously, with no central server maintaining user information, thereby preserving user privacy during aggregation—a key requirement in federated learning.
In terms of efficiency, blockchain can offer incentive mechanisms to ensure that high-performance edge devices are adequately motivated to contribute to the federated learning system. These incentives can also promote honest participation in secure aggregation, encouraging nodes to follow protocols that uphold data privacy. Moreover, participants in the blockchain network can dynamically join or leave while maintaining high availability, which is essential to ensure the continuity of secure aggregation in real-world federated settings.
The inherent characteristics of blockchain make its integration with federated learning a promising research direction, and numerous studies have begun to explore this synergy [121,122]. Unlike traditional federated learning systems, this approach enables fully decentralized federated learning, offering greater flexibility. It can be tailored to diverse scenarios using dynamic aggregation schemes governed by smart contracts, thus enabling secure, automated aggregation without relying on a trusted third party. These smart contracts can enforce privacy-preserving rules and detect anomalies during the aggregation phase, providing an effective solution for handling non-IID data while ensuring security and trust.
Figure 11 shows a schematic diagram of secure aggregation based on blockchain. The specific process is as follows, with a schematic code sketch after the list:
  • Initialization: The task publisher initializes the parameters of the global model and publishes them to the blockchain.
  • Model download: Each client obtains the global model and determines whether to proceed to the next training round.
  • Model training: Each client trains local models on its local data.
  • Data transmission: All trained local models are uploaded and recorded in a blockchain block.
  • Model aggregation: The selected consensus node uses the consensus algorithm to aggregate the local models and generate the global model.
  • Global model update: The resulting global model is updated and recorded in a block, which is then appended to the blockchain.
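The following schematic Python sketch walks through one such round. The block structure, random aggregator selection, and FedAvg-style averaging are illustrative stand-ins, not the API of any particular blockchain platform.

```python
# One blockchain-assisted training round following the steps listed above.
import hashlib
import json
import random

def fedavg(models):
    """Coordinate-wise average of equally weighted local models."""
    return [sum(m[k] for m in models) / len(models)
            for k in range(len(models[0]))]

def make_block(payload, prev_hash):
    """Append-only block: the hash chains each block to its predecessor."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return {"body": body, "hash": hashlib.sha256(body.encode()).hexdigest()}

# Initialization: the task publisher records the initial global model.
chain = [make_block({"global_model": [0.0, 0.0]}, prev_hash="genesis")]

# Model training (simulated) and data transmission: clients upload local models.
local_models = [[random.random(), random.random()] for _ in range(3)]
chain.append(make_block({"local_models": local_models}, chain[-1]["hash"]))

# Model aggregation by a consensus node (here chosen uniformly at random),
# followed by the global model update being appended to the chain.
aggregator = random.randrange(3)
chain.append(make_block({"global_model": fedavg(local_models),
                         "aggregator": aggregator}, chain[-1]["hash"]))
```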
The framework for applying blockchain technology in federated learning can be classified into two categories:
  • Full decentralization: In fully decentralized federated learning, each node can act as a consensus node, playing the role of a central server to lead a training round, with a probability proportional to its available resources.
  • Partial decentralization: In partially decentralized federated learning, the system balances computational cost and security by algorithmically selecting candidate blocks or designating nodes responsible for selecting candidate blocks to achieve an optimal trade-off.
Secure aggregation based on blockchain can be categorized according to the consensus algorithms used, with existing approaches divided into the following four categories:
PoW. The PoW consensus algorithm is used to construct federated learning systems in [123,124,125]. In [123], to solve the single-point-of-failure problem, a blockchain-assisted decentralized federated learning framework is proposed. In this framework, the model aggregation process is fully decentralized, integrating model training and blockchain mining tasks for each participant. This design effectively prevents malicious clients from poisoning the learning process and provides clients with a self-motivated and reliable learning environment. Experimental results show that when 30% of clients are inert, the scheme improves accuracy by 18% and 13.8% on the Fashion-MNIST and CIFAR-10 datasets, respectively, compared to the baseline.
Ref. [124] proposes a blockchain-based asynchronous federated learning scheme to ensure security and efficiency. The scheme introduces a new entropy weight method to evaluate the participation level and proportion of local models trained on devices. Experiments demonstrate that the scheme reduces latency by 66% compared to synchronous federated learning and improves accuracy by 12.1% over FedAvg.
In [125], a fully coupled blockchain-assisted federated learning system is proposed to eliminate the single point of failure by decentralizing training and aggregation tasks among all participants. The system offers high flexibility, allowing participants to choose shared models and customize aggregation according to local needs, optimizing system performance and yielding accurate inference results. Experimental results indicate that this scheme can train large and complex models with an accuracy loss of less than 0.5%.
PoS or DPoS. To improve communication efficiency, several works construct federated learning systems using the PoS or DPoS consensus algorithms, e.g., [126,127,128].
In [126], to enhance edge node caching efficiency and communication efficiency in federated learning, an algorithm called CREAT is proposed. This algorithm predicts the popularity of different files to cache them, thereby speeding up responses to IoT device requests. Combined with blockchain technology, it ensures the security of data transmitted by IoT devices and gradients uploaded by edge nodes. Experiments using the MovieLens dataset show that this algorithm significantly optimizes data upload time compared to baseline algorithms.
In [127], a blockchain-based hierarchical crowdsourced federated learning system is proposed to assist manufacturers in developing smart home systems. This system uses a reputation mechanism to help appliance manufacturers train models with customer data while protecting extracted features with differential privacy. Experimental results indicate that its communication cost is acceptable. Assuming a local model size of 617.8 KB and an upload bandwidth of 1 MB/s, the communication time is approximately 0.6178 s.
Ref. [128] considers potential scenarios where federated learning is applied to 5G and beyond. It proposes a framework combining federated learning with blockchain to improve model quality, communication efficiency, and privacy security. The framework formulates resource sharing as a combinatorial optimization problem, considering resource consumption and learning quality. It designs a deep reinforcement learning-based algorithm to find the optimal solution. Experimental results demonstrate high accuracy, good convergence, and improved system security.
Committee Consensus Algorithm. Research employs committee-based consensus algorithms for partially decentralized federated learning, where selected committees reach consensus to decide candidate blocks, as in [129,130,131].
In [129,130], new federated learning frameworks based on committee consensus algorithms are proposed to resist Byzantine attacks while maintaining both accuracy and security.
In [131], to prevent Byzantine committee members from compromising the correctness of the global model by tampering with aggregation results, a federated learning scheme called VDFChain is proposed. VDFChain uses lossless masking techniques and trusted committee mechanisms to provide secure aggregation effectively. Security analysis and experimental results demonstrate that the scheme is secure and exhibits excellent computational and communication performance.
PBFT. PBFT is used in [132,133,134] to defend against Byzantine attacks. PBFT consensus is final and deterministic, without chain forks or rollbacks, which guarantees the trustworthiness of federated learning model updates and data integrity. Additionally, PBFT does not rely on high computational power or resources. When applied with blockchain in federated learning, the PBFT consensus algorithm verifies the correctness of global model updates while achieving high efficiency and low energy consumption.
In [134], Shamir’s secret sharing is combined with blockchain to enhance poisoning robustness and data privacy, leveraging techniques such as quantization, median aggregation, and Hamming distance. Experiments are conducted on the MNIST and CIFAR-10 datasets using a setup with 100 clients and 7 aggregators. Compared to the baseline scheme, the proposed method reduces communication cost by a factor of 76.26 and improves processing speed by a factor of 54.16. Moreover, it maintains a high level of accuracy even when up to 40% of the clients are malicious.
In blockchain-based secure aggregation for federated learning, the use of the PoW consensus algorithm incurs extremely high computational costs and energy consumption, making it unsuitable for frequently updated federated learning tasks. However, PoW provides strong security, is tamper-resistant, and is often used as a baseline reference in related studies. In terms of accuracy, Ref. [125], which uses PoW, achieves approximately 85%, close to that of FedAvg; Ref. [126], employing PoS, reaches about 92%; and [131], using the committee consensus algorithm, achieves 90%.
The PoS or DPoS consensus algorithms offer lower computational and energy costs compared to PoW, making them more suitable for medium-frequency federated learning tasks. However, their security relies on the distribution of stake, and they generally provide lower decentralization. For example, Ref. [126] reduces the communication cost per upload by 60%, whereas [125] increases it by 10–20%.
Committee consensus algorithms provide fast consensus, high throughput, and low energy consumption. However, the committee or leader election mechanisms may introduce risks of centralization. The computational overhead of [131], which uses the committee consensus algorithm, is roughly the same as that of [126], which uses PoS, with both increasing by about 20%.
In high-poisoning environments, Ref. [133], which uses the PBFT consensus algorithm, achieves an accuracy of up to 80%. Its communication cost increases by approximately 15%, and its computational overhead by about 25%. The overall computational cost and energy consumption lie between those of PoW and PoS. Nevertheless, PBFT-based approaches suffer from poor scalability and are better suited to small-scale scenarios that require high consistency in model updates.
In this paper, blockchain-based secure aggregation is classified by the consensus algorithms employed, as summarized in Table 4.

4.3. Discussions

Generally, homomorphic encryption and SMPC-based secure aggregation are more resource-intensive but provide robust privacy protection and high model accuracy.
Among homomorphic encryption-based secure aggregation schemes, additive homomorphic encryption is well-suited for federated learning aggregation, offering lower resource requirements than fully homomorphic encryption, which makes it a commonly adopted approach. Fully homomorphic encryption is employed when the entire federated learning workflow requires protection; however, its extensive key generation and encryption/decryption processes often lead to substantial computational and communication overhead. Homomorphic encryption-based secure aggregation is primarily applied in domains with stringent privacy requirements and sensitive data, such as banking risk control and medical diagnosis. In [69], homomorphic encryption is employed to encrypt model parameters for diabetic retinopathy (DR) diagnosis. Experimental results demonstrate that this approach outperforms conventional methods in terms of model accuracy, computational efficiency, and communication overhead.
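As a minimal sketch of additively homomorphic aggregation (assuming the third-party python-paillier package `phe` is installed; key management is simplified here, since in practice the decryption key would not reside with the aggregating server):

```python
# Paillier ciphertexts can be added without decryption, so the server
# aggregates encrypted updates and only the sum is ever decrypted.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its (scalar) model update under the public key.
client_updates = [0.12, -0.05, 0.31]
ciphertexts = [public_key.encrypt(u) for u in client_updates]

# The server adds ciphertexts without ever seeing a plaintext update...
encrypted_sum = sum(ciphertexts[1:], ciphertexts[0])

# ...and only the aggregate (here, the average) is decrypted.
print(private_key.decrypt(encrypted_sum) / len(client_updates))
```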
SMPC-based secure aggregation is more computationally efficient than homomorphic encryption but incurs higher communication costs. To address this, most SMPC research integrates lightweight encryption techniques, delivering cost-effective solutions for resource-constrained federated learning environments. SMPC-based secure aggregation is commonly used in scenarios involving numerous participants and end-to-end data security, such as industrial IoT, medical data modeling, and retail enterprises [135,136]. For example, factory equipment and sensor nodes collect large amounts of data (e.g., temperature, voltage, and capacity), but different vendors and shops are reluctant to share raw data. Therefore, prediction models can be trained locally on each device, and the model updates are aggregated using SMPC to ensure that specific working-condition parameters are not exposed.
In federated learning, differential privacy-based secure aggregation is resource-efficient. However, the added noise inevitably degrades model accuracy, making the current research focus on balancing accuracy and privacy protection. This approach is primarily used in resource-constrained and time-sensitive scenarios, such as edge computing on mobile devices, the Internet of Vehicles (IoV), and personalized recommendation systems. If federated learning is applied to the IoV for vehicle trajectory prediction, individual vehicles will generate a large amount of sensitive driving data. To protect privacy while improving the accuracy of path planning, the trajectory data can be perturbed with noise before being uploaded.
Blockchain-based secure aggregation leverages blockchain’s authentication and traceability features to ensure data integrity and enhance training transparency. Its decentralized nature addresses the single-point-of-failure problem, while consensus mechanisms prevent Byzantine attacks and ensure data validity. Additionally, incentive mechanisms boost the motivation of participants. However, traditional blockchain technology in federated learning lacks robust privacy guarantees for model information, as client-uploaded models remain in plaintext. Thus, it typically needs to be integrated with additional privacy protection mechanisms. For example, methods like homomorphic encryption and differential privacy can be applied, requiring clients to encrypt trained models before uploading them to the blockchain to prevent information leakage.
Integrating blockchain with federated learning inevitably introduces additional communication and computational overhead, largely influenced by the blockchain’s consensus algorithm. The computational overhead varies significantly with the consensus algorithm. PoW-based blockchains demand substantial computational resources, whereas PoS-based systems reduce overhead at the cost of decreased decentralization. Thus, research primarily focuses on enhancing model accuracy and minimizing resource consumption in untrusted environments.
In practical applications, high-energy consensus mechanisms are generally avoided. Instead, techniques such as model compression, gradient pruning, and asynchronous updating are employed to reduce communication frequency and on-chain interactions, thereby lowering overall energy consumption.
To address scalability challenges, a collaborative on-chain/off-chain storage strategy is adopted. In this approach, model parameters and large-scale transaction data are stored off-chain, while only hash values or metadata are recorded on-chain, ensuring data verifiability while enhancing scalability.
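A minimal sketch of this split follows, with plain dictionaries standing in for the off-chain store and the on-chain ledger.

```python
# On-chain/off-chain split: the serialized model stays in off-chain storage,
# while only its SHA-256 digest is recorded on-chain for later verification.
import hashlib
import json

model_params = {"layer1": [0.12, -0.05], "layer2": [0.31]}
blob = json.dumps(model_params, sort_keys=True).encode()

off_chain_store = {"model_v1": blob}                              # bulky data
on_chain_record = {"model_v1": hashlib.sha256(blob).hexdigest()}  # digest only

# Anyone can later verify that the off-chain copy was not tampered with.
digest = hashlib.sha256(off_chain_store["model_v1"]).hexdigest()
assert digest == on_chain_record["model_v1"]
```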
Furthermore, scalability can be improved through asynchronous federated learning and local model fusion. This method eliminates the need for all participants to upload their models simultaneously. Instead, partial model fusion is first conducted within partitions, followed by global aggregation, effectively reducing synchronization overhead.
Blockchain-based secure aggregation is commonly applied in fields requiring incentives, transparency, and decentralization, making it suitable for untrusted multi-party scenarios such as smart homes, smart grids, and supply chain management [137,138].

5. Challenges and Future Directions

As federated learning secure aggregation schemes are deployed and further developed, numerous new challenges emerge that warrant deeper investigation. Future research should focus on improving the model quality, enhancing the security, and increasing the adaptability of secure aggregation schemes.

5.1. Global Model Quality

In federated learning, each client possesses a varying amount of data. When secure aggregation schemes are applied, the computational or communication overhead increases, and clients may not always complete local training within a given time frame, which can degrade the global model’s accuracy. To address these challenges, cryptographic secure aggregation requires lightweight encryption schemes tailored for federated learning, as demonstrated in works like [73,76]. Another way to reduce communication overhead is Top-K gradient selection, which compresses updates by transmitting only the largest-magnitude gradient components, as shown in [71]. Differential privacy-based secure aggregation schemes, on the other hand, emphasize balancing trade-offs between model quality and performance, as explored in [111].
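As a minimal sketch of Top-K sparsification (the gradient values and K are illustrative assumptions), each client transmits only the K largest-magnitude components together with their indices:

```python
# Top-K gradient sparsification: send only the K largest-magnitude entries.
import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=1000)
k = 10  # e.g., transmit only the top 1% of components

idx = np.argsort(np.abs(grad))[-k:]          # indices of the K largest entries
sparse_update = {"indices": idx, "values": grad[idx]}

# The server scatters the received values back into a dense buffer.
recovered = np.zeros_like(grad)
recovered[sparse_update["indices"]] = sparse_update["values"]
```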
Striking an optimal balance between performance and model quality is complex, highlighting the urgent need for further research to effectively integrate different techniques and secure aggregation protocols. Future work should focus on enhancing aggregation algorithms to handle client heterogeneity, improve generalization, and pre-process client data [139,140]. These improvements aim to enhance the quality of locally trained models and achieve acceptable global model performance.

5.2. Security

In federated learning, three primary areas are susceptible to attacks, namely, input data, the aggregation process, and the aggregated model. While federated learning does not collect raw user data, other types of attacks can still hinder its deployment. For instance, category inference attacks [141] and poisoning attacks [142,143] target data categories by either inferring original data through model access or degrading model performance for specific categories. Consequently, improving the security of secure aggregation algorithms remains an important area of research.
Poisoning attacks degrade the performance of learned models by introducing noise into the federated learning system. To counter malicious client-side poisoning, model detection algorithms are employed to identify and filter out compromised models. For server-side poisoning, clients must verify the integrity of the aggregated model. In this context, integrating blockchain technology and verifiable secret sharing algorithms into secure aggregation methods represents a promising research direction.
Inference attacks compromise privacy by recovering user data through the inference of model updates. While existing secure aggregation techniques protect local models to prevent such attacks, the global model remains exposed and susceptible. Protecting the global model, however, incurs significant computational and communication costs. Balancing secure aggregation with model utility thus poses a critical challenge.
Although robust solutions for poisoning attacks have been developed [132,133], and differential privacy techniques have been applied to mitigate inference attacks, federated learning systems still face unknown security vulnerabilities. Therefore, combining secure aggregation techniques with other privacy-preserving methods to enhance security continues to be a valuable research topic.

5.3. Adaptability

Adaptability is one of the key directions for the continuous development of federated learning across diverse application scenarios. Research on adaptation needs to focus on constructing more efficient and personalized models in heterogeneous data environments. Current methods addressing the challenges of non-IID data, such as data sharing and data distillation, carry risks of increasing privacy leakage, with the extent of potential harm remaining uncertain. Therefore, defining and quantifying the degree of privacy leakage in heterogeneous data environments remains a crucial research direction.
Federated learning demonstrates broad application prospects across various fields. Although studies on local optimization algorithms [144], personalized modeling [145], and data preprocessing [146] contribute to reducing communication costs and improving model generalization in specific scenarios, research on federated learning tailored to distinct application contexts is still limited and often overlooked. Designing systems suited to specific application scenarios, optimizing model deployment, and enhancing both training performance and the generalization ability of global models in these contexts are critical challenges requiring further investigation.
In practical applications, federated learning clients may encounter constraints in computational power, bandwidth, and storage resources. For instance, in mobile device scenarios, low-power devices demand lightweight models and optimized communication schemes [147]. In resource-constrained environments, studying efficient model compression, pruning techniques, and communication protocol optimization is a key research priority.
Federated learning must also adapt to the demands of multi-domain collaboration and heterogeneous platform integration. For example, in cross-domain data collaboration, achieving efficient and effective collaborative modeling of data across different domains [148] and enabling seamless deployment on diverse hardware platforms are essential directions for adaptability research.

6. Conclusions

This paper provides a concise overview of secure aggregation for federated learning. It first analyzes the background and significance of studying secure aggregation for federated learning and introduces the definition and classification of federated learning. Then, it presents the fundamental aggregation algorithms and several commonly used privacy-preserving mechanisms, categorizing and discussing existing research based on these mechanisms. Finally, this paper outlines the current challenges in federated learning and suggests directions for future research.

Author Contributions

Conceptualization, X.Z. and Y.L.; methodology, X.Z.; validation, T.L.; writing—original draft preparation, X.Z.; writing—review and editing, Y.L. and T.L.; supervision, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under grant nos. 61902156 and 62072217.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

References

  1. Deng, L.; Yu, D. Deep learning: Methods and applications. Found. Trends Signal Process. 2014, 7, 197–387. [Google Scholar] [CrossRef]
  2. Khan, S.; Yairi, T. A review on the application of deep learning in system health management. Mech. Syst. Signal Process. 2018, 107, 241–265. [Google Scholar] [CrossRef]
  3. Chen, C.; Zhu, W.; Steibel, J.; Siegford, J.; Han, J.; Norton, T. Classification of drinking and drinker-playing in pigs by a video-based deep learning method. Biosyst. Eng. 2020, 196, 1–14. [Google Scholar] [CrossRef]
  4. Liu, J.; Abbas, I.; Noor, R.S. Development of deep learning-based variable rate agrochemical spraying system for targeted weeds control in strawberry crop. Agronomy 2021, 11, 1480. [Google Scholar] [CrossRef]
  5. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938. [Google Scholar] [CrossRef] [PubMed]
  6. Stoica, I.; Song, D.; Popa, R.A.; Patterson, D.; Mahoney, M.W.; Katz, R.; Joseph, A.D.; Jordan, M.; Hellerstein, J.M.; Gonzalez, J.E.; et al. A berkeley view of systems challenges for ai. arXiv 2017, arXiv:1712.05855. [Google Scholar] [CrossRef]
  7. Konečnỳ, J.; McMahan, H.B.; Ramage, D.; Richtárik, P. Federated optimization: Distributed machine learning for on-device intelligence. arXiv 2016, arXiv:1610.02527. [Google Scholar] [CrossRef]
  8. McMahan, H.B.; Moore, E.; Ramage, D.; y Arcas, B.A. Federated learning of deep networks using model averaging. arXiv 2016, arXiv:1602.05629. [Google Scholar]
  9. Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A survey on federated learning. Knowl.-Based Syst. 2021, 216, 106775. [Google Scholar] [CrossRef]
  10. Wen, J.; Zhang, Z.; Lan, Y.; Cui, Z.; Cai, J.; Zhang, W. A survey on federated learning: Challenges and applications. Int. J. Mach. Learn. Cybern. 2023, 14, 513–535. [Google Scholar] [CrossRef]
  11. Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Trans. Knowl. Data Eng. 2021, 35, 3347–3366. [Google Scholar] [CrossRef]
  12. Lyu, L.; Yu, H.; Ma, X.; Chen, C.; Sun, L.; Zhao, J.; Yang, Q.; Yu, P.S. Privacy and robustness in federated learning: Attacks and defenses. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 8726–8746. [Google Scholar] [CrossRef]
  13. Qu, Y.; Uddin, M.P.; Gan, C.; Xiang, Y.; Gao, L.; Yearwood, J. Blockchain-enabled federated learning: A survey. ACM Comput. Surv. 2022, 55, 1–35. [Google Scholar] [CrossRef]
  14. Fu, J.; Hong, Y.; Ling, X.; Wang, L.; Ran, X.; Sun, Z.; Wang, W.H.; Chen, Z.; Cao, Y. Differentially private federated learning: A systematic review. arXiv 2024, arXiv:2405.08299. [Google Scholar] [CrossRef]
  15. Qi, P.; Chiaro, D.; Guzzo, A.; Ianni, M.; Fortino, G.; Piccialli, F. Model aggregation techniques in federated learning: A comprehensive survey. Future Gener. Comput. Syst. 2024, 150, 272–293. [Google Scholar] [CrossRef]
  16. Moshawrab, M.; Adda, M.; Bouzouane, A.; Ibrahim, H.; Raad, A. Reviewing federated learning aggregation algorithms; strategies, contributions, limitations and future perspectives. Electronics 2023, 12, 2287. [Google Scholar] [CrossRef]
  17. Yang, W.; Zhang, Y.; Ye, K.; Li, L.; Xu, C.Z. FFD: A federated learning based method for credit card fraud detection. In Big Data–BigData 2019; Chen, K., Seshadri, S., Zhang, L.J., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 18–32. [Google Scholar]
  18. Yang, F.; Sun, J.; Cheng, J.; Fu, L.; Wang, S.; Xu, M. Detection of starch in minced chicken meat based on hyperspectral imaging technique and transfer learning. J. Food Process Eng. 2023, 46, e14304. [Google Scholar] [CrossRef]
  19. Zhang, Z.; Lu, Y.; Yang, M.; Wang, G.; Zhao, Y.; Hu, Y. Optimal training strategy for high-performance detection model of multi-cultivar tea shoots based on deep learning methods. Sci. Hortic. 2024, 328, 112949. [Google Scholar] [CrossRef]
  20. Zhu, H.; Wang, D.; Wei, Y.; Zhang, X.; Li, L. Combining Transfer Learning and Ensemble Algorithms for Improved Citrus Leaf Disease Classification. Agriculture 2024, 14, 1549. [Google Scholar] [CrossRef]
  21. Ahmed, S.; Qiu, B.; Ahmad, F.; Kong, C.W.; Xin, H. A state-of-the-art analysis of obstacle avoidance methods from the perspective of an agricultural sprayer UAV’s operation scenario. Agronomy 2021, 11, 1069. [Google Scholar] [CrossRef]
  22. Mansour, Y.; Mohri, M.; Ro, J.; Suresh, A.T. Three approaches for personalization with applications to federated learning. arXiv 2020, arXiv:2002.10619. [Google Scholar] [CrossRef]
  23. Hard, A.; Rao, K.; Mathews, R.; Ramaswamy, S.; Beaufays, F.; Augenstein, S.; Eichner, H.; Kiddon, C.; Ramage, D. Federated learning for mobile keyboard prediction. arXiv 2018, arXiv:1811.03604. [Google Scholar]
  24. Yu, T.; Li, T.; Sun, Y.; Nanda, S.; Smith, V.; Sekar, V.; Seshan, S. Learning context-aware policies from multiple smart homes via federated multi-task learning. In Proceedings of the 2020 IEEE/ACM Fifth International Conference on Internet-of-Things Design and Implementation (IoTDI), Sydney, NSW, Australia, 21–24 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 104–115. [Google Scholar]
  25. Amiri, M.M.; Gündüz, D. Federated learning over wireless fading channels. IEEE Trans. Wirel. Commun. 2020, 19, 3546–3557. [Google Scholar] [CrossRef]
  26. Yi, L.; Shi, X.; Wang, N.; Zhang, J.; Wang, G.; Liu, X. Fedpe: Adaptive model pruning-expanding for federated learning on mobile devices. IEEE Trans. Mob. Comput. 2024, 23, 10475–10493. [Google Scholar] [CrossRef]
  27. Zhu, W.; Sun, J.; Wang, S.; Shen, J.; Yang, K.; Zhou, X. Identifying field crop diseases using transformer-embedded convolutional neural network. Agriculture 2022, 12, 1083. [Google Scholar] [CrossRef]
  28. Ren, Y.; Huang, X.; Aheto, J.H.; Wang, C.; Ernest, B.; Tian, X.; He, P.; Chang, X.; Wang, C. Application of volatile and spectral profiling together with multimode data fusion strategy for the discrimination of preserved eggs. Food Chem. 2021, 343, 128515. [Google Scholar] [CrossRef]
  29. Yang, N.; Chang, K.; Dong, S.; Tang, J.; Wang, A.; Huang, R.; Jia, Y. Rapid image detection and recognition of rice false smut based on mobile smart devices with anti-light features from cloud database. Biosyst. Eng. 2022, 218, 229–244. [Google Scholar] [CrossRef]
  30. Awais, M.; Li, W.; Hussain, S.; Cheema, M.J.M.; Li, W.; Song, R.; Liu, C. Comparative evaluation of land surface temperature images from unmanned aerial vehicle and satellite observation for agricultural areas using in situ data. Agriculture 2022, 12, 184. [Google Scholar] [CrossRef]
  31. Tang, L.; Syed, A.U.A.; Otho, A.R.; Junejo, A.R.; Tunio, M.H.; Hao, L.; Asghar Ali, M.N.H.; Brohi, S.A.; Otho, S.A.; Channa, J.A. Intelligent Rapid Asexual Propagation Technology—A Novel Aeroponics Propagation Approach. Agronomy 2024, 14, 2289. [Google Scholar] [CrossRef]
  32. Guo, Y.; Gao, J.; Tunio, M.H.; Wang, L. Study on the identification of mildew disease of cuttings at the base of mulberry cuttings by aeroponics rapid propagation based on a BP neural network. Agronomy 2022, 13, 106. [Google Scholar] [CrossRef]
  33. Peng, Y.; Wang, A.; Liu, J.; Faheem, M. A comparative study of semantic segmentation models for identification of grape with different varieties. Agriculture 2021, 11, 997. [Google Scholar] [CrossRef]
  34. El-Mesery, H.S.; Qenawy, M.; Ali, M.; Hu, Z.; Adelusi, O.A.; Njobeh, P.B. Artificial intelligence as a tool for predicting the quality attributes of garlic (Allium sativum L.) slices during continuous infrared-assisted hot air drying. J. Food Sci. 2024, 89, 7693–7712. [Google Scholar] [CrossRef] [PubMed]
  35. Wang, B.; Du, X.; Wang, Y.; Mao, H. Multi-machine collaboration realization conditions and precise and efficient production mode of intelligent agricultural machinery. Int. J. Agric. Biol. Eng. 2024, 17, 27–36. [Google Scholar] [CrossRef]
  36. Zhu, S.; Wang, B.; Pan, S.; Ye, Y.; Wang, E.; Mao, H. Task allocation of multi-machine collaborative operation for agricultural machinery based on the improved fireworks algorithm. Agronomy 2024, 14, 710. [Google Scholar] [CrossRef]
  37. Lakhiar, I.A.; Yan, H.; Zhang, C.; Wang, G.; He, B.; Hao, B.; Han, Y.; Wang, B.; Bao, R.; Syed, T.N.; et al. A review of precision irrigation water-saving technology under changing climate for enhancing water use efficiency, crop yield, and environmental footprints. Agriculture 2024, 14, 1141. [Google Scholar] [CrossRef]
  38. Zhang, L.; Wang, X.; Zhang, H.; Zhang, B.; Zhang, J.; Hu, X.; Du, X.; Cai, J.; Jia, W.; Wu, C. UAV-Based Multispectral Winter Wheat Growth Monitoring with Adaptive Weight Allocation. Agriculture 2024, 14, 1900. [Google Scholar] [CrossRef]
  39. Wang, B.; Deng, J.; Jiang, H. Markov transition field combined with convolutional neural network improved the predictive performance of near-infrared spectroscopy models for determination of aflatoxin B1 in maize. Foods 2022, 11, 2210. [Google Scholar] [CrossRef]
  40. Suryavanshi, A.; Mehta, S.; Gupta, A.; Aeri, M.; Jain, V. Agriculture Farming Evolution: Federated Learning CNNs in Combatting Watermelon Leaf Diseases. In Proceedings of the 2024 Asia Pacific Conference on Innovation in Technology (APCIT), Mysore, India, 26–27 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
  41. Szegedi, G.; Kiss, P.; Horváth, T. Evolutionary federated learning on EEG-data. In Proceedings of the ITAT 2019-Information Technologies—Applications and Theory, Donovaly, Slovakia, 20–24 September 2019; pp. 71–78. [Google Scholar]
  42. Kim, Y.; Sun, J.; Yu, H.; Jiang, X. Federated tensor factorization for computational phenotyping. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 887–895. [Google Scholar]
  43. Lee, J.; Sun, J.; Wang, F.; Wang, S.; Jun, C.H.; Jiang, X. Privacy-preserving patient similarity learning in a federated environment: Development and analysis. JMIR Med. Inform. 2018, 6, e7744. [Google Scholar] [CrossRef]
  44. Liu, D.; Dligach, D.; Miller, T. Two-stage federated phenotyping and patient representation learning. Proc. Conf. Comput. Linguist. Meet. 2019, 2019, 283–291. [Google Scholar]
  45. Li, Y.; Wang, R.; Li, Y.; Zhang, M.; Long, C. Wind power forecasting considering data privacy protection: A federated deep reinforcement learning approach. Appl. Energy 2023, 329, 120291. [Google Scholar] [CrossRef]
  46. Wang, H.; Shen, H.; Li, F.; Wu, Y.; Li, M.; Shi, Z.; Deng, F. Novel PV power hybrid prediction model based on FL Co-Training method. Electronics 2023, 12, 730. [Google Scholar] [CrossRef]
  47. Goldwasser, S.; Micali, S.; Rackoff, C. The knowledge complexity of interactive proof-systems. In Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali; Association for Computing Machinery: New York, NY, USA, 2019; pp. 203–225. [Google Scholar]
  48. Chaum, D.L. Untraceable electronic mail, return addresses, and digital pseudonyms. Commun. ACM 1981, 24, 84–90. [Google Scholar] [CrossRef]
  49. Rabin, M.O. Fingerprinting by Random Polynomials; Technical Report; Harvard University: Cambridge, MA, USA, 1981. [Google Scholar]
  50. Dwork, C. Differential privacy. In International Colloquium on Automata, Languages, and Programming; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–12. [Google Scholar]
  51. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, 4–7 March 2006; Proceedings 3. Springer: Berlin/Heidelberg, Germany, 2006; pp. 265–284. [Google Scholar]
  52. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
  53. McSherry, F.; Talwar, K. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), Providence, RI, USA, 21–23 October 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 94–103. [Google Scholar]
  54. Cachin, C.; Vukolić, M. Blockchain consensus protocols in the wild. arXiv 2017, arXiv:1707.01873. [Google Scholar] [CrossRef]
  55. Zhang, Y.; Chen, L.; Battino, M.; Farag, M.A.; Xiao, J.; Simal-Gandara, J.; Gao, H.; Jiang, W. Blockchain: An emerging novel technology to upgrade the current fresh fruit supply chain. Trends Food Sci. Technol. 2022, 124, 1–12. [Google Scholar] [CrossRef]
  56. Gervais, A.; Karame, G.O.; Wüst, K.; Glykantzis, V.; Ritzdorf, H.; Capkun, S. On the security and performance of proof of work blockchains. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 3–16. [Google Scholar]
  57. Saad, S.M.S.; Radzi, R.Z.R.M. Comparative review of the blockchain consensus algorithm between proof of stake (pos) and delegated proof of stake (dpos). Int. J. Innov. Comput. 2020, 10, 27–32. [Google Scholar] [CrossRef]
  58. Pîrlea, G.; Sergey, I. Mechanising blockchain consensus. In Proceedings of the 7th ACM SIGPLAN International Conference on Certified Programs and Proofs, Los Angeles, CA, USA, 8–9 January 2018; pp. 78–90. [Google Scholar]
59. Castro, M.; Liskov, B. Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. (TOCS) 2002, 20, 398–461. [Google Scholar] [CrossRef]
  60. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics; PMLR: Birmingham, UK, 2017; pp. 1273–1282. [Google Scholar]
  61. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450. [Google Scholar]
  62. Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. Scaffold: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning; PMLR: Birmingham, UK, 2020; pp. 5132–5143. [Google Scholar]
63. Li, X.; Jiang, M.; Zhang, X.; Kamp, M.; Dou, Q. FedBN: Federated learning on non-IID features via local batch normalization. arXiv 2021, arXiv:2102.07623. [Google Scholar]
  64. Lin, T.; Kong, L.; Stich, S.U.; Jaggi, M. Ensemble distillation for robust model fusion in federated learning. Adv. Neural Inf. Process. Syst. 2020, 33, 2351–2363. [Google Scholar]
  65. Fang, H.; Qian, Q. Privacy preserving machine learning with homomorphic encryption and federated learning. Future Internet 2021, 13, 94. [Google Scholar] [CrossRef]
  66. Liu, Y.; Ma, Z.; Liu, X.; Ma, S.; Nepal, S.; Deng, R.H.; Ren, K. Boosting privately: Federated extreme gradient boosting for mobile crowdsensing. In Proceedings of the 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), Singapore, 29 November–1 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–11. [Google Scholar]
  67. Tang, F.; Wu, W.; Liu, J.; Wang, H.; Xian, M. Privacy-preserving distributed deep learning via homomorphic re-encryption. Electronics 2019, 8, 411. [Google Scholar] [CrossRef]
  68. Zhang, X.; Chen, X.; Liu, J.K.; Xiang, Y. DeepPAR and DeepDPA: Privacy preserving and asynchronous deep learning for industrial IoT. IEEE Trans. Ind. Inform. 2019, 16, 2081–2090. [Google Scholar] [CrossRef]
  69. Wang, B.; Li, H.; Guo, Y.; Wang, J. PPFLHE: A privacy-preserving federated learning scheme with homomorphic encryption for healthcare data. Appl. Soft Comput. 2023, 146, 110677. [Google Scholar] [CrossRef]
  70. Aono, Y.; Hayashi, T.; Wang, L.; Moriai, S. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 2017, 13, 1333–1345. [Google Scholar] [CrossRef]
  71. Yu, S.X.; Chen, Z. Efficient secure federated learning aggregation framework based on homomorphic encryption. J. Commun. 2023, 44, 14. [Google Scholar]
  72. Li, Q.; Cai, R.; Zhu, Y. GHPPFL: A Privacy Preserving Federated Learning Based On Gradient Compression and Homomorphic Encryption in Consumer App Security. IEEE Trans. Consum. Electron. 2025. early access. [Google Scholar] [CrossRef]
  73. Fang, C.; Guo, Y.; Hu, Y.; Ma, B.; Feng, L.; Yin, A. Privacy-preserving and communication-efficient federated learning in Internet of Things. Comput. Secur. 2021, 103, 102199. [Google Scholar] [CrossRef]
  74. Li, Y.; Li, H.; Xu, G.; Huang, X.; Lu, R. Efficient privacy-preserving federated learning with unreliable users. IEEE Internet Things J. 2021, 9, 11590–11603. [Google Scholar] [CrossRef]
  75. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, London, UK, 15 November 2019; pp. 1–11. [Google Scholar]
  76. Mandal, K.; Gong, G. PrivFL: Practical privacy-preserving federated regressions on high-dimensional data over mobile networks. In Proceedings of the 2019 ACM SIGSAC Conference on Cloud Computing Security Workshop, London, UK, 11 November 2019; pp. 57–68. [Google Scholar]
  77. Yang, W.; Liu, B.; Lu, C.; Yu, N. Privacy preserving on updated parameters in federated learning. In Proceedings of the ACM Turing Celebration Conference-China, Hefei, China, 22–24 May 2020; pp. 27–31. [Google Scholar]
  78. Zhang, L.; Xu, J.; Vijayakumar, P.; Sharma, P.K.; Ghosh, U. Homomorphic encryption-based privacy-preserving federated learning in IoT-enabled healthcare system. IEEE Trans. Netw. Sci. Eng. 2022, 10, 2864–2880. [Google Scholar] [CrossRef]
  79. Shen, C.; Zhang, W.; Zhou, T.; Zhang, L. A Security-Enhanced Federated Learning Scheme Based on Homomorphic Encryption and Secret Sharing. Mathematics 2024, 12, 1993. [Google Scholar] [CrossRef]
80. Lai, C.; Zhao, Y.; Zheng, D. A Privacy Preserving and Verifiable Federated Learning Scheme Based on Homomorphic Encryption. Netinfo Secur. 2024, 24, 93–105. [Google Scholar]
  81. Xu, D.; Yuan, S.; Wu, X. Achieving differential privacy in vertically partitioned multiparty learning. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 5474–5483. [Google Scholar]
  82. Park, J.; Lim, H. Privacy-preserving federated learning using homomorphic encryption. Appl. Sci. 2022, 12, 734. [Google Scholar] [CrossRef]
  83. Froelicher, D.; Troncoso-Pastoriza, J.R.; Pyrgelis, A.; Sav, S.; Sousa, J.S.; Bossuat, J.P.; Hubaux, J.P. Scalable privacy-preserving distributed learning. arXiv 2020, arXiv:2005.09532. [Google Scholar] [CrossRef]
  84. Stripelis, D.; Saleem, H.; Ghai, T.; Dhinagar, N.; Gupta, U.; Anastasiou, C.; Ver Steeg, G.; Ravi, S.; Naveed, M.; Thompson, P.M.; et al. Secure neuroimaging analysis using federated learning with homomorphic encryption. In Proceedings of the 17th International Symposium on Medical Information Processing and Analysis, Campinas, Brazil, 17–19 November 2021; SPIE: Bellingham, WA, USA, 2021; Volume 12088, pp. 351–359. [Google Scholar]
  85. Ma, J.; Naas, S.A.; Sigg, S.; Lyu, X. Privacy-preserving federated learning based on multi-key homomorphic encryption. Int. J. Intell. Syst. 2022, 37, 5880–5901. [Google Scholar] [CrossRef]
  86. Hijazi, N.M.; Aloqaily, M.; Guizani, M.; Ouni, B.; Karray, F. Secure federated learning with fully homomorphic encryption for IoT communications. IEEE Internet Things J. 2023, 11, 4289–4300. [Google Scholar] [CrossRef]
  87. Fereidooni, H.; Marchal, S.; Miettinen, M.; Mirhoseini, A.; Möllering, H.; Nguyen, T.D.; Rieger, P.; Sadeghi, A.R.; Schneider, T.; Yalame, H.; et al. SAFELearn: Secure aggregation for private federated learning. In Proceedings of the 2021 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 27 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 56–62. [Google Scholar]
88. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1175–1191. [Google Scholar]
  89. Choi, B.; Sohn, J.Y.; Han, D.J.; Moon, J. Communication-computation efficient secure aggregation for federated learning. arXiv 2020, arXiv:2012.05433. [Google Scholar]
  90. Liu, Z.; Guo, J.; Lam, K.Y.; Zhao, J. Efficient dropout-resilient aggregation for privacy-preserving machine learning. IEEE Trans. Inf. Forensics Secur. 2022, 18, 1839–1854. [Google Scholar] [CrossRef]
  91. Jin, X.; Yao, Y.; Yu, N. Efficient secure aggregation for privacy-preserving federated learning based on secret sharing. JUSTC 2024, 54, 0104-1–0104-16. [Google Scholar] [CrossRef]
  92. Ghavamipour, A.R.; Zhao, B.Z.H.; Turkmen, F. Privacy-preserving, dropout-resilient aggregation in decentralized learning. arXiv 2024, arXiv:2404.17984. [Google Scholar]
  93. Liu, Z.; Lin, H.Y.; Liu, Y. Long-term privacy-preserving aggregation with user-dynamics for federated learning. IEEE Trans. Inf. Forensics Secur. 2023, 18, 2398–2412. [Google Scholar] [CrossRef]
  94. Maurya, A.; Haripriya, R.; Pandey, M.; Choudhary, J.; Singh, D.P.; Solanki, S.; Sharma, D. Federated Learning for Privacy-Preserving Severity Classification in Healthcare: A Secure Edge-Aggregated Approach. IEEE Access 2025, 13, 102339–102358. [Google Scholar] [CrossRef]
  95. Xu, G.; Li, H.; Liu, S.; Yang, K.; Lin, X. VerifyNet: Secure and verifiable federated learning. IEEE Trans. Inf. Forensics Secur. 2019, 15, 911–926. [Google Scholar] [CrossRef]
  96. Brunetta, C.; Tsaloli, G.; Liang, B.; Banegas, G.; Mitrokotsa, A. Non-interactive, secure verifiable aggregation for decentralized, privacy-preserving learning. In Australasian Conference on Information Security and Privacy; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 510–528. [Google Scholar]
  97. Eltaras, T.; Sabry, F.; Labda, W.; Alzoubi, K.; Ahmedeltaras, Q. Efficient verifiable protocol for privacy-preserving aggregation in federated learning. IEEE Trans. Inf. Forensics Secur. 2023, 18, 2977–2990. [Google Scholar] [CrossRef]
  98. Guo, X.; Liu, Z.; Li, J.; Gao, J.; Hou, B.; Dong, C.; Baker, T. VeriFL: Communication-efficient and fast verifiable aggregation for federated learning. IEEE Trans. Inf. Forensics Secur. 2020, 16, 1736–1751. [Google Scholar] [CrossRef]
  99. Fu, A.; Zhang, X.; Xiong, N.; Gao, Y.; Wang, H.; Zhang, J. VFL: A verifiable federated learning with privacy-preserving for big data in industrial IoT. IEEE Trans. Ind. Inform. 2020, 18, 3316–3326. [Google Scholar] [CrossRef]
  100. Sotthiwat, E.; Zhen, L.; Li, Z.; Zhang, C. Partially encrypted multi-party computation for federated learning. In Proceedings of the 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Melbourne, Australia, 10–13 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 828–835. [Google Scholar]
101. Li, S.; Yao, D.; Liu, J. FedVS: Straggler-resilient and privacy-preserving vertical federated learning for split models. In International Conference on Machine Learning; PMLR: Birmingham, UK, 2023; pp. 20296–20311. [Google Scholar]
  102. Boer, D.; Kramer, S. Secure sum outperforms homomorphic encryption in (current) collaborative deep learning. arXiv 2020, arXiv:2006.02894. [Google Scholar]
  103. Kanagavelu, R.; Wei, Q.; Li, Z.; Zhang, H.; Samsudin, J.; Yang, Y.; Goh, R.S.M.; Wang, S. CE-Fed: Communication efficient multi-party computation enabled federated learning. Array 2022, 15, 100207. [Google Scholar] [CrossRef]
104. Kadhe, S.; Rajaraman, N.; Koyluoglu, O.O.; Ramchandran, K. FastSecAgg: Scalable secure aggregation for privacy-preserving federated learning. arXiv 2020, arXiv:2009.11248. [Google Scholar]
  105. Wang, D.; Zhang, L. Federated learning scheme based on secure multi-party computation and differential privacy. Comput. Sci. 2022, 49, 297–305. [Google Scholar]
106. Schlitter, N. A protocol for privacy preserving neural network learning on horizontal partitioned data. In Privacy in Statistical Databases (PSD), 2008. [Google Scholar]
  107. Urabe, S.; Wang, J.; Kodama, E.; Takata, T. A high collusion-resistant approach to distributed privacy-preserving data mining. Inf. Media Technol. 2007, 2, 821–834. [Google Scholar] [CrossRef]
  108. Zhang, J.; Wang, J.; Zhao, Y.; Chen, B. An efficient federated learning scheme with differential privacy in mobile edge computing. In Proceedings of the Machine Learning and Intelligent Communications: 4th International Conference, MLICOM 2019, Nanjing, China, 24–25 August 2019; Proceedings 4. Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 538–550. [Google Scholar]
  109. Wang, C.; Ma, C.; Li, M.; Gao, N.; Zhang, Y.; Shen, Z. Protecting data privacy in federated learning combining differential privacy and weak encryption. In Proceedings of the Science of Cyber Security: Third International Conference, SciSec 2021, Virtual Event, 13–15 August 2021; Revised Selected Papers 4. Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 95–109. [Google Scholar]
  110. Kasula, V.K.; Yenugula, M.; Konda, B.; Yadulla, A.R.; Tumma, C.; Rakki, S.B. Federated Learning with Secure Aggregation for Privacy-Preserving Deep Learning in IoT Environments. In Proceedings of the 2025 IEEE Conference on Computer Applications (ICCA), Yangon, Myanmar, 18 March 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–7. [Google Scholar]
111. Ling, X.; Fu, J.; Wang, K.; Liu, H.; Chen, Z. ALI-DPFL: Differentially private federated learning with adaptive local iterations. In Proceedings of the 2024 IEEE 25th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), Perth, Australia, 4–7 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 349–358. [Google Scholar]
  112. Liu, X.; Zhou, Y.; Wu, D.; Hu, M.; Wang, J.H.; Guizani, M. FedDP-SA: Boosting Differentially Private Federated Learning via Local Dataset Splitting. IEEE Internet Things J. 2024, 11, 31687–31698. [Google Scholar] [CrossRef]
  113. Liu, J.; Lou, J.; Xiong, L.; Liu, J.; Meng, X. Projected federated averaging with heterogeneous differential privacy. Proc. VLDB Endow. 2021, 15, 828–840. [Google Scholar] [CrossRef]
  114. Yu, L.; Liu, L.; Pu, C.; Gursoy, M.E.; Truex, S. Differentially private model publishing for deep learning. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 332–349. [Google Scholar]
  115. Zhang, L.; Zhu, T.; Xiong, P.; Zhou, W.; Yu, P.S. A robust game-theoretical federated learning framework with joint differential privacy. IEEE Trans. Knowl. Data Eng. 2022, 35, 3333–3346. [Google Scholar] [CrossRef]
  116. Truex, S.; Liu, L.; Chow, K.H.; Gursoy, M.E.; Wei, W. LDP-Fed: Federated learning with local differential privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking, Heraklion, Greece, 27 April 2020; pp. 61–66. [Google Scholar]
  117. Wang, B.; Chen, Y.; Jiang, H.; Zhao, Z. Ppefl: Privacy-preserving edge federated learning with local differential privacy. IEEE Internet Things J. 2023, 10, 15488–15500. [Google Scholar] [CrossRef]
  118. Cheu, A.; Smith, A.; Ullman, J.; Zeber, D.; Zhilyaev, M. Distributed differential privacy via shuffling. In Proceedings of the Advances in Cryptology–EUROCRYPT 2019: 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Darmstadt, Germany, 19–23 May 2019; Proceedings, Part I. Springer International Publishing: Berlin/Heidelberg, Germany, 2019; Volume 38, pp. 375–403. [Google Scholar]
  119. Jiang, Z.; Wang, W.; Chen, R. Dordis: Efficient Federated Learning with Dropout-Resilient Differential Privacy. In Proceedings of the Nineteenth European Conference on Computer Systems, Athens, Greece, 22–25 April 2024; pp. 472–488. [Google Scholar]
  120. Scott, M.; Cormode, G.; Maple, C. Aggregation and transformation of vector-valued messages in the shuffle model of differential privacy. IEEE Trans. Inf. Forensics Secur. 2022, 17, 612–627. [Google Scholar] [CrossRef]
  121. Hamouda, D.; Ferrag, M.A.; Benhamida, N.; Seridi, H. PPSS: A privacy-preserving secure framework using blockchain-enabled federated deep learning for industrial IoTs. Pervasive Mob. Comput. 2023, 88, 101738. [Google Scholar] [CrossRef]
  122. Dillenberger, D.N.; Novotny, P.; Zhang, Q.; Jayachandran, P.; Gupta, H.; Hans, S.; Verma, D.; Chakraborty, S.; Thomas, J.J.; Walli, M.M.; et al. Blockchain analytics and artificial intelligence. IBM J. Res. Dev. 2019, 63, 5:1–5:14. [Google Scholar] [CrossRef]
  123. Ma, C.; Li, J.; Shi, L.; Ding, M.; Wang, T.; Han, Z.; Poor, H.V. When federated learning meets blockchain: A new distributed learning paradigm. IEEE Comput. Intell. Mag. 2022, 17, 26–33. [Google Scholar] [CrossRef]
  124. Feng, L.; Zhao, Y.; Guo, S.; Qiu, X.; Li, W.; Yu, P. BAFL: A blockchain-based asynchronous federated learning framework. IEEE Trans. Comput. 2021, 71, 1092–1103. [Google Scholar] [CrossRef]
  125. Nguyen, H.; Nguyen, T.; Lovén, L.; Pirttikangas, S. Wait or Not to Wait: Evaluating Trade-Offs between Speed and Precision in Blockchain-based Federated Aggregation. arXiv 2024, arXiv:2406.00181. [Google Scholar] [CrossRef]
  126. Cui, L.; Su, X.; Ming, Z.; Chen, Z.; Yang, S.; Zhou, Y.; Xiao, W. CREAT: Blockchain-assisted compression algorithm of federated learning for content caching in edge computing. IEEE Internet Things J. 2020, 9, 14151–14161. [Google Scholar] [CrossRef]
  127. Zhao, Y.; Zhao, J.; Jiang, L.; Tan, R.; Niyato, D.; Li, Z.; Lyu, L.; Liu, Y. Privacy-preserving blockchain-based federated learning for IoT devices. IEEE Internet Things J. 2020, 8, 1817–1829. [Google Scholar] [CrossRef]
  128. Lu, Y.; Huang, X.; Zhang, K.; Maharjan, S.; Zhang, Y. Blockchain and federated learning for 5G beyond. IEEE Netw. 2020, 35, 219–225. [Google Scholar] [CrossRef]
  129. Zhou, S.; Huang, H.; Chen, W.; Zhou, P.; Zheng, Z.; Guo, S. Pirate: A blockchain-based secure framework of distributed machine learning in 5G networks. IEEE Netw. 2020, 34, 84–91. [Google Scholar] [CrossRef]
  130. Li, Y.; Chen, C.; Liu, N.; Huang, H.; Zheng, Z.; Yan, Q. A blockchain-based decentralized federated learning framework with committee consensus. IEEE Netw. 2020, 35, 234–241. [Google Scholar] [CrossRef]
  131. Zhou, M.; Yang, Z.; Yu, H.; Yu, S. VDFChain: Secure and verifiable decentralized federated learning via committee-based blockchain. J. Netw. Comput. Appl. 2024, 223, 103814. [Google Scholar] [CrossRef]
  132. Yang, Z.; Shi, Y.; Zhou, Y.; Wang, Z.; Yang, K. Trustworthy federated learning via blockchain. IEEE Internet Things J. 2022, 10, 92–109. [Google Scholar] [CrossRef]
  133. Qin, Z.; Yan, X.; Zhou, M.; Deng, S. BlockDFL: A Blockchain-based Fully Decentralized Peer-to-Peer Federated Learning Framework. In Proceedings of the ACM on Web Conference 2024, Singapore, 13–17 May 2024; pp. 2914–2925. [Google Scholar]
  134. Chen, R.; Dong, Y.; Liu, Y.; Fan, T.; Li, D.; Guan, Z.; Liu, J.; Zhou, J. FLock: Robust and Privacy-Preserving Federated Learning based on Practical Blockchain State Channels. In Proceedings of the ACM on Web Conference 2025, Sydney, NSW, Australia, 28 April–2 May 2025; pp. 884–895. [Google Scholar]
  135. Elsherbiny, O.; Gao, J.; Ma, M.; Guo, Y.; Tunio, M.H.; Mosha, A.H. Advancing lettuce physiological state recognition in IoT aeroponic systems: A meta-learning-driven data fusion approach. Eur. J. Agron. 2024, 161, 127387. [Google Scholar] [CrossRef]
  136. Mohamed, T.M.K.; Gao, J.; Tunio, M. Development and experiment of the intelligent control system for rhizosphere temperature of aeroponic lettuce via the Internet of Things. Int. J. Agric. Biol. Eng. 2022, 15, 225–233. [Google Scholar] [CrossRef]
  137. Ding, C.; Wang, L.; Chen, X.; Yang, H.; Huang, L.; Song, X. A blockchain-based wide-area agricultural machinery resource scheduling system. Appl. Eng. Agric. 2023, 39, 1–12. [Google Scholar] [CrossRef]
  138. Adade, S.Y.S.S.; Lin, H.; Johnson, N.A.N.; Nunekpeku, X.; Aheto, J.H.; Ekumah, J.N.; Kwadzokpui, B.A.; Teye, E.; Ahmad, W.; Chen, Q. Advanced Food Contaminant Detection through Multi-Source Data Fusion: Strategies, Applications, and Future Perspectives. Trends Food Sci. Technol. 2024, 156, 104851. [Google Scholar] [CrossRef]
  139. Li, Y.; Xu, L.; Lv, L.; Shi, Y.; Yu, X. Study on modeling method of a multi-parameter control system for threshing and cleaning devices in the grain combine harvester. Agriculture 2022, 12, 1483. [Google Scholar] [CrossRef]
  140. Zhou, X.; Zhao, C.; Sun, J.; Cao, Y.; Yao, K.; Xu, M. A deep learning method for predicting lead content in oilseed rape leaves using fluorescence hyperspectral imaging. Food Chem. 2023, 409, 135251. [Google Scholar] [CrossRef] [PubMed]
  141. Gao, J.; Hou, B.; Guo, X.; Liu, Z.; Zhang, Y.; Chen, K.; Li, J. Secure aggregation is insecure: Category inference attack on federated learning. IEEE Trans. Dependable Secur. Comput. 2021, 20, 147–160. [Google Scholar] [CrossRef]
  142. Wang, Z.; Huang, Y.; Song, M.; Wu, L.; Xue, F.; Ren, K. Poisoning-assisted property inference attack against federated learning. IEEE Trans. Dependable Secur. Comput. 2022, 20, 3328–3340. [Google Scholar] [CrossRef]
  143. Lyu, L.; Yu, H.; Yang, Q. Threats to federated learning: A survey. arXiv 2020, arXiv:2003.02133. [Google Scholar] [CrossRef]
  144. Zhou, X.; Sun, J.; Tian, Y.; Lu, B.; Hang, Y.; Chen, Q. Hyperspectral technique combined with deep learning algorithm for detection of compound heavy metals in lettuce. Food Chem. 2020, 321, 126503. [Google Scholar] [CrossRef]
145. Raza, A.; Saber, K.; Hu, Y.; Ray, R.L.; Ziya Kaya, Y.; Dehghanisanij, H.; Kisi, O.; Elbeltagi, A. Modelling reference evapotranspiration using principal component analysis and machine learning methods under different climatic environments. Irrig. Drain. 2023, 72, 945–970. [Google Scholar] [CrossRef]
  146. Lin, H.; Pan, T.; Li, Y.; Chen, S.; Li, G. Development of analytical method associating near-infrared spectroscopy with one-dimensional convolution neural network: A case study. J. Food Meas. Charact. 2021, 15, 2963–2973. [Google Scholar] [CrossRef]
  147. Zhang, D.; Lin, Z.; Xuan, L.; Lu, M.; Shi, B.; Shi, J.; He, F.; Battino, M.; Zhao, L.; Zou, X. Rapid determination of geographical authenticity and pungency intensity of the red Sichuan pepper (Zanthoxylum bungeanum) using differential pulse voltammetry and machine learning algorithms. Food Chem. 2024, 439, 137978. [Google Scholar] [CrossRef]
  148. Chen, J.; Zhang, M.; Xu, B.; Sun, J.; Mujumdar, A.S. Artificial intelligence assisted technologies for controlling the drying of fruits and vegetables using physical fields: A review. Trends Food Sci. Technol. 2020, 105, 251–260. [Google Scholar] [CrossRef]
Figure 1. A schematic diagram of federated learning.
Figure 2. A schematic diagram of horizontal federated learning.
Figure 3. A schematic diagram of vertical federated learning.
Figure 4. A schematic diagram of federated transfer learning.
Figure 5. The network structure of centralized federated learning.
Figure 6. The network structure of decentralized federated learning.
Figure 7. The structure of a block.
Figure 8. The structure of a Merkle tree, illustrated using four transaction records as an example (see the code sketch after this figure list).
Figure 9. A schematic diagram of the homomorphic encryption-based secure aggregation process.
Figure 10. A schematic diagram of secure aggregation based on secure multi-party computation.
Figure 11. A schematic diagram of secure aggregation based on blockchain.
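Because several of the surveyed blockchain-based schemes commit transactions (or model updates) via the Merkle-tree structure of Figure 8, a minimal sketch may help make it concrete. The following Python code is our own illustration, not taken from any cited work; the helper names h and merkle_root are invented, and real chains add details (e.g., Bitcoin's double SHA-256) that are omitted here.

```python
import hashlib

def h(data: bytes) -> bytes:
    # one SHA-256 application; real systems may differ (e.g., double SHA-256)
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(tx) for tx in leaves]          # hash each transaction record
    while len(level) > 1:
        if len(level) % 2:                    # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1])   # hash each adjacent pair upward
                 for i in range(0, len(level), 2)]
    return level[0]

txs = [b"tx1", b"tx2", b"tx3", b"tx4"]        # four transaction records, as in Figure 8
print(merkle_root(txs).hex())
```

Changing any single leaf changes the root, which is why a block header need only store the root to commit to all of its transactions.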
Table 1. Secure aggregation schemes based on homomorphic encryption.

| Type | Scheme | Algorithm | Protected Model | Network Structure | Resource Requirement |
|---|---|---|---|---|---|
| Partially Homomorphic Encryption | [65,66,67,68,69,70,71,72] | Paillier | Local | Centralization | Acceptable |
| | [73] | ElGamal | Local | Centralization | Lightweight |
| | [74,75] | Threshold Paillier | Local | Centralization | High cost |
| | [76] | Joye-Libert | Local and Global | Centralization | Lightweight |
| | [77] | RSA/Paillier | Local | Trusted party and server | Acceptable |
| | [78,79,80] | Threshold ElGamal | Local | Trusted party and server | Acceptable |
| | [81,82] | Others | Local and Global | Centralization | High cost |
| Fully Homomorphic Encryption | [83,84,85] | CKKS | Local and Global | Centralization | High cost |
| | [86,87] | Others | Local | Centralization | High cost |
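To make the Paillier rows of Table 1 concrete, the sketch below shows the additive homomorphism these schemes exploit: the server multiplies ciphertexts and thereby sums plaintext updates without ever decrypting them. This is a deliberately toy rendering of textbook Paillier written by us for illustration (tiny hard-coded primes, scalar "updates", no encoding of real-valued gradients); it is not the implementation of any cited scheme, which would use large keys and vector packing.

```python
import math
import random

def keygen(p: int = 61, q: int = 53):
    # toy primes only; real deployments use >= 2048-bit moduli
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                 # modular inverse (Python 3.8+)
    return (n, n * n), (lam, mu)

def encrypt(pk, m: int) -> int:
    n, n2 = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                           # r must be a unit mod n
        r = random.randrange(1, n)
    # c = (1 + n)^m * r^n mod n^2, using the common choice g = n + 1
    return pow(1 + n, m, n2) * pow(r, n, n2) % n2

def add(pk, c1: int, c2: int) -> int:
    # homomorphic addition: multiplying ciphertexts sums the plaintexts
    return c1 * c2 % pk[1]

def decrypt(pk, sk, c: int) -> int:
    (n, n2), (lam, mu) = pk, sk
    return (pow(c, lam, n2) - 1) // n * mu % n

pk, sk = keygen()
updates = [7, 11, 5]                    # toy scalar model updates from three clients
agg = encrypt(pk, 0)
for u in updates:
    agg = add(pk, agg, encrypt(pk, u))  # the aggregator never sees 7, 11, or 5
print(decrypt(pk, sk, agg))             # 23, the aggregated update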
Table 2. Secure aggregation schemes based on secure multi-party computation.

| Type | Scheme | Support for User Disconnection | Protected Model | Network Structure | Resource Requirement |
|---|---|---|---|---|---|
| Shamir | [89,90,91] | Support | Local | Centralization | Lightweight |
| | [92] | Support | Local | Decentralization | High cost |
| | [93,94] | Support | Local and Global | Centralization | Acceptable |
| Verifiable Secret Sharing | [95] | Support | Local and Global | Trusted party and server | High cost |
| | [96] | Support | Local and Global | Decentralization | High cost |
| | [97,98] | Support | Local and Global | Centralization | Lightweight |
| | [99] | Not supported | Local and Global | Centralization | Lightweight |
| Additive Secret Sharing | [100] | Not supported | Local | Centralization | Acceptable |
| | [102,103] | Not supported | Local | Decentralization | Acceptable |
| | [101,104] | Support | Local | Centralization | Lightweight |
| | [105] | Support | Local and Global | Centralization | Acceptable |
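The Shamir-type rows of Table 2 all rest on the same t-of-n reconstruction property, which also explains the "Support for User Disconnection" column: any t surviving parties suffice, so up to n − t dropouts are tolerated. The following self-contained sketch is our own illustration (scalar secrets over a small prime field; all names are invented), not the protocol of any cited paper.

```python
import random

P = 2_147_483_647  # prime field modulus (2^31 - 1), ample for toy sums

def share(secret: int, t: int, n: int):
    # random degree-(t-1) polynomial with constant term = secret
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(points) -> int:
    # Lagrange interpolation at x = 0 recovers the constant term
    total = 0
    for xj, yj in points:
        num, den = 1, 1
        for xm, _ in points:
            if xm != xj:
                num = num * -xm % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

updates = [7, 11, 5]            # toy scalar updates from three clients
t, n = 3, 5
shares = [share(u, t, n) for u in updates]
# each of the n shareholders sums, pointwise, the shares it received
summed = [(x + 1, sum(s[x][1] for s in shares) % P) for x in range(n)]
print(reconstruct(summed[:t]))  # 23: any t of the n summed points suffice
```

Because interpolation is only ever run on the summed shares, the aggregate is revealed while each individual update stays hidden.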
Table 3. Secure aggregation schemes based on differential privacy.

| Type | Scheme | Framework | Protected Model | Network Structure | Resource Requirement |
|---|---|---|---|---|---|
| Laplace Mechanism | [108] | LDP | Local | Decentralization | Lightweight |
| | [109] | LDP | Local and Global | Centralization | Lightweight |
| | [110] | LDP | Local | Centralization | Lightweight |
| Gaussian Mechanism | [111,112] | LDP | Local | Centralization | Lightweight |
| | [113] | LDP | Local and Global | Centralization | Lightweight |
| | [114] | GDP | Global | Centralization | Acceptable |
| Exponential Mechanism | [115] | GDP | Global | Centralization | Lightweight |
| | [116,117] | LDP | Local | Centralization | Lightweight |
| Others | [118,119,120] | DDP | Local and Global | Centralization | Acceptable |
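The Gaussian-mechanism entries in Table 3 follow the familiar clip-then-perturb recipe. The sketch below is our own generic illustration, not the exact mechanism of any one cited scheme; the calibration sigma = clip · sqrt(2 ln(1.25/delta)) / epsilon is the standard bound from Dwork and Roth [52], valid for epsilon < 1.

```python
import numpy as np

def gaussian_mechanism(update: np.ndarray, clip: float,
                       epsilon: float, delta: float) -> np.ndarray:
    # clip to bound the L2 sensitivity of one client's contribution
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / (norm + 1e-12))
    # standard (epsilon, delta)-DP calibration for L2 sensitivity = clip
    sigma = clip * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + np.random.normal(0.0, sigma, size=update.shape)

update = np.array([0.5, -1.2, 3.0])   # a toy local model update
noisy = gaussian_mechanism(update, clip=1.0, epsilon=0.5, delta=1e-5)
print(noisy)
```

In the LDP rows each client runs this locally before uploading; in the GDP rows a trusted aggregator adds the noise once to the global sum.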
Table 4. Secure aggregation schemes based on blockchain.

| Type | Scheme | Model Accuracy | Protected Model | Network Structure | Resource Requirement |
|---|---|---|---|---|---|
| PoW | [123] | Increase | Local and Global | Full Decentralization | High cost |
| | [124] | Increase of 12.1% compared to FedAvg | Local and Global | Full Decentralization | High cost |
| | [125] | Increase | Local and Global | Full Decentralization | High cost |
| PoS or DPoS | [126,127,128] | Increase | Local and Global | Partial Decentralization | Acceptable |
| Committee Consensus Algorithm | [129,130] | Increase | Local and Global | Partial Decentralization | Acceptable |
| | [131] | No decline | Local and Global | Partial Decentralization | Acceptable |
| PBFT | [132,133,134] | Increase | Local and Global | Full Decentralization | Acceptable |
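Whatever the consensus algorithm in Table 4, these schemes share one mechanism: each aggregation round is committed to a hash-chained block (cf. Figure 7), so tampering with an earlier round invalidates every later link. The sketch below is our own minimal illustration, with invented field names and no consensus, mining, or networking logic.

```python
import hashlib
import json
import time

def block_hash(block: dict) -> str:
    # deterministic digest over the block's contents
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def new_block(prev_hash: str, payload: str) -> dict:
    return {"timestamp": time.time(), "prev_hash": prev_hash, "payload": payload}

chain = [new_block("0" * 64, "genesis")]
for rnd in range(1, 4):
    # payload would be, e.g., the digest of round rnd's aggregated model
    chain.append(new_block(block_hash(chain[-1]), f"aggregate-round-{rnd}"))

def verify(chain) -> bool:
    # every block must reference the hash of its predecessor
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print(verify(chain))            # True
chain[1]["payload"] = "forged"  # tamper with an early round
print(verify(chain))            # False: all later links are invalidated
```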