Article

A Blockchain-Based Fairness Guarantee Approach for Privacy-Preserving Collaborative Training in Computing Force Network

1 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
2 School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
3 China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215163, China
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(5), 718; https://doi.org/10.3390/math12050718
Submission received: 29 January 2024 / Revised: 23 February 2024 / Accepted: 26 February 2024 / Published: 28 February 2024
(This article belongs to the Special Issue Applications of Big Data Analysis and Modeling)

Abstract

The advent of the big data era has brought unprecedented data demands. The integration of computing resources with network resources in the computing force network enables the possibility of distributed collaborative training. However, unencrypted collaborative training is vulnerable to threats such as gradient inversion attacks and model theft. To address this issue, the data in collaborative training are usually protected by cryptographic methods. However, the semantic meaninglessness of encrypted data makes it difficult to prevent potential data poisoning attacks and free-riding attacks. In this paper, we propose a fairness guarantee approach for privacy-preserving collaborative training, employing blockchain technology to enable participants to share data and to separate potential violators from normal users. We utilize a cryptography-based secure aggregation method to prevent data leakage during blockchain transactions and employ a contribution evaluation method for encrypted data to prevent data poisoning and free-riding attacks. Additionally, Shamir's secret sharing is used for secret key negotiation within each group, and the negotiated key is introduced directly into the model as noise, keeping the encryption process computationally lightweight. Decryption is achieved through the aggregation of encrypted models within the group without incurring additional computational costs, thereby enhancing the efficiency of the encryption and decryption processes. Finally, the experimental results demonstrate the effectiveness and efficiency of our proposed approach.

1. Introduction

The advent of the big data era and the emergence of various data-driven technologies have brought unprecedented demands for data processing and analysis. With the integration of computing resources and network resources, the concept of the Computing Force Network (CFN) has emerged and offered possibilities for distributed collaborative training. Researchers have designed many attractive candidate technologies that can be implemented in a CFN, including the following:
  • Network security: Researchers believe that active defense mechanisms can be implemented in a CFN through deep neural networks. This involves periodically adjusting the deployment locations of control nodes and mapping relationships between switch nodes and controller nodes to enhance the security objectives of the network system.
  • Computational awareness network: With the significant improvement in the accessibility and scheduling techniques of a CFN, dynamic allocation of computing resources becomes feasible. This allows idle resources to be allocated to various tasks while resources are reallocated to the tasks best suited for them.
  • Enhanced transmission technology: In a CFN, data are offloaded to the network edge, compressed, and then transmitted. This makes full use of computational resources across the network to meet the demands of massive data transmission. This involves coordinating the computing and spectral resources throughout the network to minimize system latency and deliver high-speed data services.
As illustrated in Figure 1, these technological characteristics establish the foundation for collaborative training, a novel application paradigm, by facilitating network and computational resource allocation, as well as ensuring secure data transmission. Specifically, users in a CFN can generate or process task-oriented data and subsequently provide the results to service providers for mutual benefit, necessitating that users possess valuable data and substantial data-processing capabilities [1]. This innovative model of data transaction has the potential to supplant the traditional method, where service providers directly collect data from users. Participants in this model may include both individual users and corporate entities in possession of data. In a CFN, identifying an authoritative aggregation center to facilitate the interaction among these participants poses a significant challenge. Consequently, this type of data transaction may adopt a decentralized approach.
Presently, numerous researchers are exploring the application of blockchain technology to facilitate decentralized and trusted transactions, akin to those in Bitcoin. However, the application of blockchain technology to data transactions presents several challenges [2,3,4]. To deliver and ensure the integrity of data, blockchain-based data sharing often involves storing users’ data on blocks [5], potentially leading to privacy breaches. While encryption methods safeguard data privacy, they also give rise to new challenges [6]. Owing to the semantic meaninglessness of encrypted data, malicious and dishonest users may perpetrate data poisoning and free-riding attacks [7] without detection and tracking. This situation undermines the core principles of equitable data transactions.
To achieve practical data transactions and guarantee fairness, three issues must be addressed. The first issue concerns the privacy of user data. Users must retain adequate privacy when participating in data transactions in the CFN; if data privacy is violated in transactions, participants' willingness to take part will inevitably suffer in the long term. The second issue involves transaction fairness. If participants do not receive fair payment from the transaction, the degree of participation will decrease over the long-term operation of the system. The third issue pertains to feasibility. If user data do not contribute sufficiently to the collaborative training [8], or if the performance overhead of the data transaction system exceeds the CFN's affordability [9], this new transaction mode will be difficult to implement.
In this paper, we propose a blockchain-based privacy-preserving collaborative training approach that guarantees fairness, enabling collaborative training and fair payment within the CFN. We first discuss how blockchain and secure aggregation methods prevent attackers from breaking the confidentiality of data and deny malicious attackers the ability to repudiate their behavior. Building on this, a grouped aggregation evaluation method accurately identifies malicious attackers without breaking the confidentiality of normal users' data.
The main contributions of the paper are as follows:
  • We propose a blockchain-based approach for ensuring fairness, ingeniously integrating the anti-denial mechanism of distributed ledgers with the confidentiality assurance of secure aggregation and potential violator detection. This approach can effectively mitigate data poisoning and free-riding attacks in encrypted states.
  • Our proposed method can identify poisoning attackers and free-riding attackers during collaborative training and exclude them from the process. This reduces unfair behaviors, such as malicious users reaping rewards without contributing data or computing resources, or degrading the effectiveness of collaborative training to the detriment of all participants, thereby safeguarding the benefits of legitimate users.
  • We reduce the computational overhead of cryptographic methods by limiting the scale of negotiation during group formation using Shamir’s secret sharing. This approach circumvents high-computational multiplication calculations during the negotiation process, thereby accommodating the issue of limited computational capabilities of users in the CFN.
  • We have developed a fine-grained potential violator detection algorithm that dynamically adjusts user groups, enabling the identification of potential violators in scenarios of dynamic user dropout during collaborative training.
In the remaining sections of this paper, we review existing research on data privacy, fairness guarantees, and usability optimization in collaborative training and explain how our work improves upon it. Subsequently, we introduce our proposed approach and provide a comprehensive description of the fairness guarantee mechanism. Following this, a security analysis and experimental evaluation demonstrate the effectiveness and efficiency of the fairness guarantee approach. The paper concludes with a summary of the findings and implications.

2. Related Work

2.1. Data Privacy

In task-oriented data trading enabled by a CFN, addressing the privacy leakage issue is paramount [10,11,12,13]. Otherwise, this may lead to users’ reluctance to participate in collaborative training. Privacy protection in collaborative training is classified into two categories: data-level protection and information-level protection. Data-level privacy threats involve attackers directly accessing participants’ model parameters during collaborative training, with typical attack methods including model reconstruction attacks, among others. Information-level privacy threats entail attackers utilizing background knowledge and models to infer private information in the training data, with typical methods encompassing membership inference attacks, attribute inference attacks, and more.
To mitigate the aforementioned privacy threats at both the data and information levels in collaborative training, researchers have developed a variety of privacy-preserving techniques. To counter privacy threats at the data level, predominant approaches include homomorphic encryption, functional encryption, authenticated encryption, and secure multi-party computation. Bonawitz et al. [14] introduced a secure aggregation mechanism for collaborative training, employing an authenticated encryption scheme to encrypt client model parameters and utilizing a secret sharing method to deter the server from accessing specific clients' model parameters. Subsequently, So et al. [15] introduced a novel metric designed to measure privacy protection across multiple training rounds and developed a structured user selection strategy, aiming to ensure long-term privacy for individual participants across various training iterations. This strategy considers both the fairness of the selection process and the average number of participants per training round. The framework presents a trade-off between user selection and privacy preservation, thereby facilitating the optimization of model performance while ensuring an appropriate level of privacy; it effectively mitigates multi-round privacy leakage in federated learning and guarantees secure aggregation, thus ensuring the privacy of model parameters. To safeguard privacy at the information level, researchers typically introduce perturbations to the data or generalize the data, thereby preventing various types of inference attacks during collaborative training. Among these, Abadi et al. [16] proposed a differential privacy approach utilizing the stochastic gradient descent algorithm, which partitions the gradient and integrates noise in accordance with the differential privacy mechanism during the training process, thus thwarting various inference attacks. Yang et al. [17] proposed an innovative defense mechanism aimed at bolstering resilience against membership inference attacks in federated learning. This method involves personalizing the client's local data distribution during the training and inference stages, thereby generating an offset model. It strikes an effective balance between privacy protection and the preservation of model utility, effectively countering membership inference attacks while minimizing their impact on model accuracy.
However, data-centric privacy protection methods render traditional data quality verification approaches ineffective, as malicious actors can conceal poisoned models within encrypted data, evading quality checks targeted at individual participants. Some lazy participants may simply upload the initial models back to the server without any modification, fraudulently claiming the rewards of collaborative training. These attacks significantly impede the fairness of collaborative training and have prompted significant concern among researchers.

2.2. Fairness Guarantee

To address the previously mentioned challenges that undermine transaction fairness, researchers have devised various methods employing blockchain technology to ensure fairness. Initially, to guarantee data accuracy and ownership, Weng et al. [18] developed a data audit platform that utilizes the homomorphic Paillier encryption algorithm, complemented with a zero-knowledge proof method, certifying the legitimacy and integrity of each dataset. All audit information is recorded on the blockchain, forming a publicly searchable audit ledger, serving as a safeguard against malicious activities that undermine the system. Tian et al. [19] proposed a blockchain-based data auditing scheme aimed at establishing an effective data audit mechanism. This scheme introduces non-interactive designated decryptor function encryption and smart contract technology, achieving secure data auditing and providing indisputable arbitration mechanisms. Utilizing blockchain as a self-recording channel, it offers reliable evidence for resolving disputes between users and cloud storage service providers, thereby minimizing conflicts and ensuring undisputed arbitration. Concurrently, this scheme incorporates a set of reward and penalty measures to mitigate the malicious behavior of untrusted users and cloud storage service providers, thus ensuring the efficiency and security of the audit process. To address data quality disparities, Lu et al. [20] devised a training quality consensus scheme necessitating that participants honestly declare their data quality. An aggregator then assesses whether the quality of the aggregated result aligns with the participants’ claims, aiding in defending against malicious low-quality data submissions.
Researchers have proposed various fairness mechanisms to dynamically adjust the data sharing system in order to maintain fairness, including reputation mechanisms and contract theory, ensuring that participants with good data and credit consistently receive more rewards. Zhou et al. [21] implemented a reputation recording mechanism on the blockchain, enabling miners to evaluate participants' data and assign greater rewards to higher-quality data. Furthermore, blockchain nodes are designed to prioritize the collection of these data. Jiao et al. [22] introduced a blockchain-based reputation mechanism that assesses participants' historical contributions and behaviors. The mechanism assigns a reputation score to each participant, with those with higher scores being accorded more weight in the model training process, thereby incentivizing the contribution of high-quality data while restricting participants with lower scores, thus preventing the impact of malicious behavior. This approach effectively balances the need for user privacy protection with the imperative of improving data quality.
To ensure the long-term functionality of the system, guaranteeing the contribution evaluation and fair payment during collaboration is essential. Ma et al. [23] introduced a blockchain-based method to transparently assess participants’ contributions in federated learning. They utilized a novel group-based Shapley value calculation framework, compatible with secure aggregation, which enables a fair evaluation of each participant’s contribution while protecting data owners’ privacy. Implemented on the blockchain, this method ensures both the transparency and verifiability of the assessment process, as well as the effective evaluation of contributions. Chen et al. [24] developed a blockchain-based federated learning framework specifically designed to ensure fair transactions between task publishers and participants. The non-interactive designated decryptor function encryption scheme they adopted secures the privacy of training data, while the design of the blockchain ensures that participants receive corresponding rewards only after completing tasks and fulfilling requirements. Consequently, the framework strikes a balance between privacy protection, participant motivation, and the establishment of a fair payment mechanism. To address the contradiction between data auditing and the perceived irrelevance of encrypted data in the current blockchain-based framework, further research into the quality evaluation of encrypted data is imperative. Du et al. [25] proposed a blockchain-based method for assessing data quality. This method employs the Kullback–Leibler divergence to evaluate information loss between non-IID and IID data samples and uses the inverse of data quantity to simulate their marginal utility. Thus, the blockchain facilitates a transparent evaluation of data quality. Sun et al. [6] employed BCP homomorphic encryption and group aggregation auditing to evaluate the quality of encrypted data. This approach enables the identification of malicious users through iterative partitioning and group aggregation auditing without the need to decrypt individual user data.
However, existing fairness guarantee approaches often treat all participants as potential malicious attackers when auditing, resulting in significant wastage of auditing resources. Our proposed approach segregates ordinary users from those with potential malicious intent, thereby effectively enhancing the utilization of auditing resources.

2.3. Utility Optimization

To improve the utility of collaborative training in the CFN, the existing work can be categorized into two primary areas: enhancing user data availability, and optimizing system efficiency. Data availability is adjusted by balancing user privacy and data quality, with researchers incorporating game theory from economics and various game-theory-based incentive mechanisms to encourage high-quality data provision by users. Xiong et al. [26] proposed a game-theory-based incentive mechanism to achieve equilibrium between user privacy and data quality. They initially calculated users’ privacy levels with a personalized privacy metric algorithm, and then built a game model focusing on service quality and users’ personalized privacy. They analyzed Bayesian equilibrium to encourage users to submit high-quality data. Wang et al. [27] proposed a game-theory-based joint resource allocation and incentive mechanism. This mechanism is designed to address the resource allocation challenge within a blockchain-based federated learning context, wherein the model owner and the client are required to allocate computational resources in accordance with the provided rewards. It ensures fairness in reward distribution by applying the Shapley value and calculating rewards based on client contributions. This mechanism permits model owners and clients to independently determine the distribution of rewards and the allocation of computing resources, following their respective optimal strategies. Thereby, it incentivizes clients to submit high-quality data while ensuring system stability and sustainability. On the other hand, researchers have also been working on improving the efficiency of blockchain in collaborative data analysis. Qu et al. [28] proposed a federated learning scheme that combines blockchain technology with a distributed hash table to enable a data storage and exchange framework, thereby enhancing collaborative training efficiency.
However, existing utility optimization solutions for collaborative training predominantly focus on the training process itself, paying less attention to balanced solutions concerning privacy, fairness, and utility. We propose an encrypted data fairness guarantee method based on multi-party secure encryption. While leveraging multi-party secure computation to safeguard data privacy, our method also considers the use of noise addition as an alternative to homomorphic encryption, effectively reducing the computational and communication overhead incurred due to privacy protection and fairness assurance during the collaborative training process.

3. Overview of Our Proposed Approach

In this section, we provide an overview of our blockchain-based fairness guarantee approach for privacy-preserving collaborative training, illustrated in Figure 2. Our proposed approach encompasses three roles: the service provider, the users, and the blockchain.
  • Service provider: The service application builders that release collaborative training tasks to the public categorize and rank the task participants (i.e., users) based on their situations. After each round of collaborative training, they examine the outcomes of groups and use this as a basis to determine if there are any issues with the users.
  • Users: Physical devices with the ability to access data, or organizations that collect and store data. On the premise of ensuring data privacy, they trade data with the service provider and benefit from it.
  • Blockchain: A decentralized distributed ledger, i.e., a series of interlinked data blocks generated using cryptography, used to record the local models uploaded by users and to offer upload and download services to service providers and users.
The service provider and users transfer models via the blockchain. Users are divided into general users and key users: general users are considered normal users by the service provider, while key users are considered to have a high probability of being potential violators. The service provider conducts two rounds of basic grouping over the entire user population without distinguishing between user types. For each key user, two additional general users are randomly selected to form a focus group. Each group consists of three users. The basic groups serve a basic audit of all users, determining whether they are key users; the focus groups implement a fine-grained audit of key users, determining whether they are potential violators. In Figure 2, users with the same background color in a single grouping belong to the same group.
Secure aggregation: A privacy concern in collaborative data analysis is the association between users and their respective data. Should an attacker discern that a specific data feature or model parameter is attributable to an identified participant, they could reconstruct training data or other attributes via a gradient inversion attack. To mitigate this issue, a prevalent strategy involves encrypting users' data with cryptographic aggregation methods such as homomorphic encryption, secure multi-party computation protocols, and functional encryption. With these methods, service providers can process encrypted data, access only aggregated results, and remain unaware of the specific details contributed by individual users. However, while encryption obstructs the association between users and their data, it also conceals malicious behaviors such as data poisoning and free-riding attacks within the encrypted data, making it more challenging for service providers to assess the quality of user data and identify potential violators. How service providers evaluate users' contributions to learning tasks and how they detect potential violators thus become significant challenges.
User detection: In collaborative training, users may engage in various malicious behaviors such as data poisoning and free-riding, while encryption hinders the association of these behaviors with their data and makes it harder for service providers to detect such malicious behaviors in an encrypted state. To facilitate the detection of users engaged in malicious behaviors like data poisoning and free-riding in an encrypted state, we employ a group detection method. This involves multiple unduplicated groupings of the entire user population and key users subjected to additional random groupings. We assess the contribution scores of users within each group by comparing the accuracy of the aggregated model for the groups with a threshold determined by the overall user situation. For users participating in a certain number of groups, we determine the presence of malicious behavior by obtaining a fine-grained assessment of their contributions through confidence intervals, thus achieving user detection in an encrypted state.
Efficiency optimization: Although CFNs integrate computing power with network resources, devices with limited computing capabilities still exist. Blockchain technology necessitates high-performance computing nodes. Although methods based on homomorphic encryption and secure multi-party computation (SMPC) can encrypt local models, the high computational overhead is burdensome for devices with limited computing capabilities. To reduce the computational demands of the encryption process and enhance the efficiency of model encryption and ciphertext aggregation, we have designed a group-based encryption method. This method involves grouping the user population, within which a secret key is negotiated using SMPC and then added to the local models for encryption. The service provider aggregates the encrypted models from each group and decrypts them to obtain the aggregated model of the group. The secret key negotiation process using SMPC is confined to within the groups, where the scale of users is not large, and neither the secret key negotiation nor the encryption process involves multiplication. Moreover, the normal model aggregation process can achieve decryption. This significantly reduces the computational overhead during the secret key negotiation and encryption/decryption processes, thereby improving efficiency.

4. Fine-Grained Potential Violator Detection Algorithm Based on Blockchain

In this section, we focus on the implementation of a blockchain-based fine-grained potential violator detection algorithm using cryptographic methods. This includes user grouping based on user types, intra-group encryption secret key negotiation based on Shamir's secret sharing, encryption and aggregated decryption of local models within groups, and fine-grained detection of potential violators. The overall process of the fine-grained potential violator detection algorithm is presented in Algorithm 1.
Algorithm 1: Fine-Grained Potential Violator Detection Algorithm
  • Input: user set $I$, key user set $KI$, number of training rounds $epochs$
  • Output: global model $M_{global}$, global potential violator set $PI_{global}$
 1  Set the potential violator set $PI$ and the new key user set $NKI$, initializing them to $\emptyset$ and $KI$, respectively
 2  Set the initial model
 3  Run the user grouping algorithm based on user types (Algorithm 2)
 4  For $epoch$ in $epochs$:
 5    If $KI \ne NKI$ or $PI \ne \emptyset$:
 6      Rerun the user grouping algorithm based on user types (Algorithm 2)
 7      $KI \leftarrow NKI$
 8    End
 9    For $i \in I$:
10      Distribute the initial model
11      Train a local model $m_i$
12    End
13    For $g \in G$:
14      Intra-group encryption secret key negotiation based on Shamir's secret sharing (Algorithm 3)
15    End
16    For $i \in I$:
17      For $g \in G_a^i \cup G_p^i$:
18        Encrypt the local model of user $i$ within group $g$ (Algorithm 4)
19        Upload the encrypted model
20      End
21    End
22    For $g \in G$:
23      Aggregate and decrypt the local models within group $g$ (Algorithm 5)
24    End
25    $PI, NKI \leftarrow$ fine-grained detection of potential violators (Algorithm 6)
26    If $PI \ne \emptyset$:
27      $I \leftarrow I \setminus PI$
28      $PI_{global} \leftarrow PI_{global} \cup PI$
29    End
30    $M_{global} \leftarrow$ global model aggregation
31  End
32  Return $M_{global}$, $PI_{global}$
In Algorithm 1, Fine-Grained Potential Violator Detection Algorithm, the entire collaborative training process undergoes multiple rounds of training. The service provider conducts potential violator detection in each round and adjusts the key user set. Potential violators identified during the detection are removed from the collaborative training process. When changes occur in the user set or the key user set, the entire user population is regrouped and collaborative training continues.
In each round of collaborative training, the service provider groups the users with the user grouping algorithm based on user types and distributes the grouping information and the global model to be trained to each user. Each user then trains their local model and negotiates encryption secret keys within their group using the intra-group secret key negotiation based on Shamir's secret sharing. Subsequently, they encrypt their local models using the group-specific encryption algorithm and upload them to the consortium blockchain. The service provider downloads the encrypted local models from the consortium blockchain, applies the aggregated decryption algorithm for local models within groups to obtain each group's aggregated model, and conducts fine-grained detection of potential violators based on the aggregated models. If potential violators are identified, they are excluded from the entire collaborative training process. The service provider then repeats the group-wise global aggregation until the model converges.

4.1. User Grouping Based on User Types

The service provider categorizes users into three types: general users, key users, and potential violators. Potential violators are identified by the service provider as malicious participants based on the information the service provider holds and are denied participation in collaborative training. Key users are those whom the service provider needs to closely monitor in collaborative training to determine if they are potential violators. Depending on whether a user is considered a key user, the service provider adopts different grouping strategies for users participating in collaborative training, as detailed in Algorithm 2.
Algorithm 2: User Grouping Based on User Types
  • Input: all user set $I$, key user set $KI$, number of key user repetitions $KIN$
  • Output: basic grouping set $G_b$, focus group set $G_k$, set of active groups $G_a^i$ of each user $i \in I$, set of passive groups $G_p^i$ of each user $i \in I$
 1  Initialize $G_b$, $G_k$, $G_a^i$ ($i \in I$), and $G_p^i$ ($i \in I$) to the empty set
 2  For $n$ in $[2]$:  // basic grouping for all users
 3    For $(i_1, i_2, i_3) \leftarrow$ unduplicated selection of a group of users from $I$:
 4      $g \leftarrow \{i_1, i_2, i_3\}$
 5      $G_b \leftarrow G_b \cup \{g\}$
 6      $G_a^{i_1} \leftarrow G_a^{i_1} \cup \{g\}$, $G_a^{i_2} \leftarrow G_a^{i_2} \cup \{g\}$, $G_a^{i_3} \leftarrow G_a^{i_3} \cup \{g\}$
 7    End
 8  End
 9  For each $K_i \in KI$:  // focused grouping, targeting key users
10    For $n$ in $KIN$:
11      $(K_i, i_2, i_3) \leftarrow$ $K_i$ and two general users randomly selected from $I$
12      $g \leftarrow \{K_i, i_2, i_3\}$
13      $G_k \leftarrow G_k \cup \{g\}$
14      $G_a^{K_i} \leftarrow G_a^{K_i} \cup \{g\}$, $G_p^{i_2} \leftarrow G_p^{i_2} \cup \{g\}$, $G_p^{i_3} \leftarrow G_p^{i_3} \cup \{g\}$
15    End
16  End
17  Return $G_b$, $G_k$, $G_a^i$ ($i \in I$), $G_p^i$ ($i \in I$)
Algorithm 2, based on whether users are general or key users, employs different grouping strategies, with each group consisting of three users. To meet the basic contribution assessment needs, the service provider conducts two unduplicated randomized groupings of general and key users together. These groups are marked as basic groups by the service provider and are represented by $G_b$. Each user within these groups marks that group as an active group, and $G_a^i$ represents the set of groups marked as active by user $i$.
For the fine-grained assessment of key users, the service provider randomly selects two general users to form a group with a key user and repeats this $KIN$ times for each key user. These groups are marked as focus groups by the service provider and are represented by $G_k$. Each key user marks these groups as active groups, while the selected general users mark them as passive groups, and $G_p^i$ represents the set of groups marked as passive by user $i$.
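The following minimal Python sketch illustrates the grouping strategy of Algorithm 2. The function name, the shuffle-based partitioning, and the data structures are illustrative assumptions for this example rather than the implementation used in our experiments.

import random

def group_users(users, key_users, kin, group_size=3):
    # Sketch of Algorithm 2: two rounds of basic grouping over all users,
    # plus kin focus groups around each key user (illustrative only).
    basic_groups, focus_groups = [], []
    active = {u: [] for u in users}   # G_a^i: groups user i marks as active
    passive = {u: [] for u in users}  # G_p^i: groups user i marks as passive

    # Two unduplicated random groupings of the whole population.
    for _ in range(2):
        shuffled = random.sample(users, len(users))
        for k in range(0, len(shuffled) - group_size + 1, group_size):
            g = tuple(shuffled[k:k + group_size])
            basic_groups.append(g)
            for u in g:
                active[u].append(g)

    # For each key user, form kin focus groups with two random general users.
    general = [u for u in users if u not in key_users]
    for ku in key_users:
        for _ in range(kin):
            i2, i3 = random.sample(general, 2)
            g = (ku, i2, i3)
            focus_groups.append(g)
            active[ku].append(g)   # the key user marks the group as active
            passive[i2].append(g)  # selected general users mark it as passive
            passive[i3].append(g)

    return basic_groups, focus_groups, active, passive

For example, group_users(list(range(9)), {0}, kin=2) yields six basic groups (two partitions of nine users into triples) and two focus groups around user 0.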

4.2. Intra-Group Secret Key Negotiation Based on Shamir’s Secret Sharing

Within each group, users utilize the Shamir’s secret sharing algorithm for intra-group secret key negotiation. By leveraging the features of Shamir’s secret sharing, the process can be further categorized into generation of secret and secret shares, homomorphic addition, secret reconstruction and secret key computation, as specified in Algorithm 3.
Algorithm 3: Intra-Group Secret Key Negotiation Based on Shamir's Secret Sharing
  • Input: user set $I_g$ of group $g$
  • Output: secret key $sk_g^i$ of each user $i$ within group $g$, $i \in I_g$
 1  For $i \in I_g$:  // secret and secret share generation
 2    Generate a random number $rand_g^i$
 3    $s_g^{ij} \leftarrow sharing(rand_g^i, |I_g|, |I_g|)$, $j \in I_g$
 4    Directly transmit secret share $s_g^{ij}$ to user $j$ within the group
 5  End
 6  For $i \in I_g$:  // homomorphic addition
 7    $s_g^i \leftarrow add(s_g^{ji},\ j \in I_g)$
 8    Directly distribute secret share $s_g^i$ to the other users within the group
 9  End
10  For $i \in I_g$:  // recovering the secret and calculating the key
11    $s_g \leftarrow reconstruction(s_g^j,\ j \in I_g)$
12    $sk_g^i \leftarrow rand_g^i - s_g / |I_g|$
13  End
14  Return $sk_g^i$, $i \in I_g$
Algorithm 3 is realized using secure multi-party computation based on Shamir's secret sharing and ensures the security of the secret key. In Algorithm 3, the $sharing()$ function denotes the share generation function of Shamir's secret sharing, which divides a secret $s$ into $n$ secret shares such that $s$ can be reconstructed from no fewer than $t$ of them. The $add()$ function denotes the addition of secret shares; owing to the additive homomorphism of Shamir's secret sharing, its output is equivalent to a share obtained by applying $sharing()$ to the sum of two secrets $s_1$ and $s_2$. The $reconstruction()$ function denotes the secret reconstruction function, which recovers the secret $s$ when given no fewer than $t$ shares.
Within group $g$, each user independently generates a random number $rand_g^i$, and the sum $s_g$ of the random numbers within group $g$ is computed through secure multi-party computation based on Shamir's secret sharing. After obtaining $s_g$, each user in the group divides it equally and subtracts their portion from their own random number to derive their local model's encryption secret key $sk_g^i$ for group $g$.
This method uses the additive homomorphism of Shamir's secret sharing to perform the summation. Each user in group $g$ knows only their own random number $rand_g^i$ and the sum $s_g$ of all random numbers within the group, without knowledge of the other users' random numbers. Since each user's encryption secret key is determined by both their own random number $rand_g^i$ and the group sum $s_g$, the secret key $sk_g^i$ remains unknown to the other users within the group.
Based on the numerical computation process of the secret key, we can derive the following property:
$$\sum_{i \in I_g} sk_g^i = \sum_{i \in I_g} \left( rand_g^i - \frac{s_g}{|I_g|} \right) = \sum_{i \in I_g} rand_g^i - s_g = 0. \qquad (1)$$
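To make the negotiation concrete, below is a minimal Python sketch of Algorithm 3 using an $(n, n)$ Shamir scheme over a prime field. The field modulus, the single-process simulation of message passing, and the variable names are assumptions made for the example, not part of our implementation.

import random

P = 2**61 - 1  # illustrative prime modulus for the finite field (assumption)

def sharing(secret, n, t):
    # Split `secret` into n Shamir shares; any t of them (here t = n)
    # reconstruct it. The share for user x is f(x), where f is a random
    # degree-(t - 1) polynomial with f(0) = secret.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return {x: sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
            for x in range(1, n + 1)}

def reconstruction(shares):
    # Lagrange interpolation at x = 0 recovers the shared secret.
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

# Intra-group negotiation for a three-user group, simulated in one process.
n = 3
rands = [random.randrange(P) for _ in range(n)]  # rand_g^i, kept private
shares = [sharing(r, n, n) for r in rands]       # s_g^{ij}, sent to user j
# Each user j sums the shares it received (additive homomorphism) ...
summed = {x: sum(sh[x] for sh in shares) % P for x in range(1, n + 1)}
# ... and the group reconstructs the sum of all random numbers.
s_g = reconstruction(summed)
assert s_g == sum(rands) % P
# sk_g^i = rand_g^i - s_g / |I_g|; over the field, the keys sum to zero.
inv_n = pow(n, -1, P)
sks = [(r - s_g * inv_n) % P for r in rands]
assert sum(sks) % P == 0                         # Equation (1)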

4.3. Encryption and Aggregated Decryption of Local Models within Groups

The secret key negotiated within the group is utilized as noise added to the users’ local models to derive encrypted models. According to the property of the secret key as demonstrated by Equation (1), decryption is achieved through the aggregation of encrypted models within the group, as detailed in Algorithms 4 and 5.
Algorithm 4: Encryption of Each User's Local Model within the Group
  • Input: local model $m_i$ of user $i$, secret key $sk_g^i$ of user $i$ within group $g$
  • Output: encrypted model $m_g^i$ of user $i$ within group $g$
 1  $m_g^i \leftarrow m_i + sk_g^i$
 2  Return $m_g^i$
Algorithm 4 implements the encryption of each user's local model within the group. During the encryption process within group $g$, each user adds the previously negotiated encryption secret key $sk_g^i$ as noise to their local model $m_i$, obtaining the encrypted model $m_g^i$ of user $i$ within group $g$. After Algorithm 4 completes, each user uploads their encrypted model to the consortium blockchain.
Algorithm 5: Aggregated Decryption of Local Models within Groups
  • Input: encrypted model $m_g^i$ of each user within group $g$, $i \in I_g$
  • Output: aggregated decrypted model $m_g$ of group $g$
 1  $m_g \leftarrow agg(m_g^i,\ i \in I_g)$
 2  Return $m_g$
Algorithm 5 implements the aggregated decryption of local models within the group. $agg()$ denotes the aggregation function for local models and employs average aggregation, i.e., $agg(m_i,\ i \in I) = \frac{\sum_{i \in I} m_i}{|I|}$. During the aggregated decryption process within group $g$, the encrypted models $m_g^i$ of all users in group $g$ are aggregated according to $agg()$ to obtain the aggregated encrypted model $m_g$. Based on the property derived in Equation (1), we can deduce the following:
$$m_g = \frac{\sum_{i \in I_g} m_g^i}{|I_g|} = \frac{\sum_{i \in I_g} m_i}{|I_g|}, \qquad (2)$$
thus, through the aggregation of encrypted models within the group, the aggregation of unencrypted local models within the group can be obtained synchronously, achieving the decryption of the encrypted models.
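A minimal sketch of Algorithms 4 and 5, assuming model parameters are NumPy arrays and simulating one group in a single process (the variable names are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 4                                  # group size |I_g|, model dimension
models = [rng.normal(size=d) for _ in range(n)]  # local models m_i

# Keys as produced by Algorithm 3: sk_g^i = rand_g^i - s_g / |I_g|,
# so the keys of a group sum to zero (Equation (1)).
rands = rng.normal(size=(n, d))
sks = rands - rands.sum(axis=0) / n
assert np.allclose(sks.sum(axis=0), 0.0)

# Algorithm 4: each user adds their key to their local model as noise.
encrypted = [m + sk for m, sk in zip(models, sks)]

# Algorithm 5: average aggregation of the encrypted models within the
# group cancels the masks and yields the plain aggregate (Equation (2)).
m_g = np.mean(encrypted, axis=0)
assert np.allclose(m_g, np.mean(models, axis=0))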

4.4. Fine-Grained Detection of Potential Violators

The service provider utilizes the recording property of the blockchain and the multiple groupings of users in each round of collaborative training to conduct multiple group detections. This approach reduces the impact of randomness during the model training and grouping processes in indirect user detection, enabling fine-grained detection of potential violators, as detailed in Algorithm 6.
Algorithm 6: Fine-Grained Detection of Potential Violators
  • Input: all user set $I$, key user set $KI$, basic grouping set $G_b$, focus group set $G_k$, set of active groups $G_a^i$ of each user $i \in I$, set of passive groups $G_p^i$ of each user $i \in I$, aggregated model $m_g$ of each group $g \in G$, key user score bound $bound_{score}$, potential violator confidence bound $bound_{conf}$
  • Output: set of potential violators $PI$, new key user set $NKI$
 1  Initialize the set of potential violators $PI$ and the new key user set $NKI$ to the empty set
 2  Set the average accuracy of the basic group aggregated models to $\tau$ and initialize it to 0
 3  Set the contribution score $score_i$ of each user $i$ and initialize it to 0, $i \in I$
 4  Set the confidence interval bounds $conf_{up}^i$ and $conf_{lo}^i$, $i \in KI$
 5  For $g \in G_b$:  // calculate the mean group accuracy
 6    $\tau \leftarrow \tau + acc(m_g)$
 7  End
 8  $\tau \leftarrow \tau / |G_b|$
 9  For $i \in I$:  // calculate the contribution score of each user
10    For $g \in G_a^i$:
11      If $acc(m_g) > \tau$:
12        $score_i \leftarrow score_i + 1$
13    End
14    $score_i \leftarrow score_i / |G_a^i|$
15  End
16  For $i \in KI$:  // calculate confidence intervals for key users
17    $score_{mean} \leftarrow \frac{\sum_{g \in G_a^i} score_g}{|G_a^i|}$
18    $conf_{lo}^i \leftarrow score_{mean} - \frac{S}{\sqrt{|G_a^i|}} t_{0.025}(|G_a^i| - 1)$
19    $conf_{up}^i \leftarrow score_{mean} + \frac{S}{\sqrt{|G_a^i|}} t_{0.025}(|G_a^i| - 1)$
20  End
21  For $i \in I$:  // determine whether a user is a key user
22    If $score_i < \tau - bound_{score}$:
23      $NKI \leftarrow NKI \cup \{i\}$
24  End
25  For $i \in NKI$:  // determine whether a user is a potential violator
26    If $i \in KI$ and $conf_{up}^i - conf_{lo}^i < bound_{conf}$:
27      $PI \leftarrow PI \cup \{i\}$
28      $NKI \leftarrow NKI \setminus \{i\}$
29  End
30  Return $PI$, $NKI$
Algorithm 6 is based on the user grouping information and the accuracy of each group's aggregated model. The service provider calculates the mean accuracy $\tau$ of the basic groups as the standard for subsequent scoring. Each user's contribution score $score_i$ is calculated from their active groups $G_a^i$: if the accuracy of a group exceeds $\tau$, that group scores 1 point; otherwise, it scores 0 points. The average of these group scores is then taken as the contribution score $score_i$ of user $i$. For key users, the upper bound $conf_{up}^i$ and lower bound $conf_{lo}^i$ of a confidence interval are additionally calculated from the scores of their active groups $G_a^i$ at a 95% confidence level for a more detailed contribution assessment, where $S = \sqrt{\frac{\sum_{g \in G_a^i} (score_g - score_{mean})^2}{|G_a^i|}}$ and $score_g$ denotes the score of group $g$.
After the contribution assessment, the service provider decides whether to add a user $i$ to the new key user set $NKI$ according to whether the user's contribution score $score_i$ falls below the threshold $\tau - bound_{score}$. Within the new key user set $NKI$, users $i$ who are also in the key user set $KI$ are further examined: a user is moved to the potential violator set $PI$ if their fine-grained contribution score confidence interval $(conf_{lo}^i, conf_{up}^i)$ has narrowed to a width below $bound_{conf}$.
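The scoring and confidence-interval logic of Algorithm 6 can be sketched in Python as follows. Here acc is assumed to be a caller-supplied oracle returning a group's aggregated model accuracy, SciPy's Student-t quantile stands in for $t_{0.025}(\cdot)$, and the names are illustrative rather than those of our implementation.

import math
from scipy.stats import t as student_t

def detect_violators(users, key_users, basic_groups, active, acc,
                     bound_score, bound_conf):
    # Mean accuracy over the basic groups is the scoring threshold tau.
    tau = sum(acc(g) for g in basic_groups) / len(basic_groups)

    def group_score(g):
        # A group scores 1 if its aggregated model beats tau, else 0.
        return 1.0 if acc(g) > tau else 0.0

    # A user's contribution score is the mean score over G_a^i.
    score = {i: sum(group_score(g) for g in active[i]) / len(active[i])
             for i in users}

    # 95% Student-t confidence interval over a key user's group scores.
    conf = {}
    for i in key_users:
        xs = [group_score(g) for g in active[i]]
        m = sum(xs) / len(xs)
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
        half = s / math.sqrt(len(xs)) * student_t.ppf(0.975, len(xs) - 1)
        conf[i] = (m - half, m + half)

    # Low scorers become new key users; existing key users whose interval
    # has narrowed below bound_conf are declared potential violators.
    nki = {i for i in users if score[i] < tau - bound_score}
    pi = {i for i in nki & set(key_users)
          if conf[i][1] - conf[i][0] < bound_conf}
    return pi, nki - pi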

5. Security Analysis and Experiment

5.1. Security and Privacy Analysis

(1) Privacy Preservation for the Local Model
Users encrypt their local models using the encryption keys of their respective groups to obtain encrypted models, which are then uploaded to the blockchain. The service provider downloads the encrypted models from the blockchain for potential violator detection and model aggregation. Throughout this process, all other members of the blockchain have access to these encrypted models. Therefore, ensuring that users’ encrypted models do not leak sensitive information after being obtained by the service provider and other users becomes the focal point of our research.
We employ secure multi-party computation based on Shamir's secret sharing [29] to negotiate keys under the honest-but-curious assumption. In a group comprising three users, assume user $I_i$ ($i = 1, 2, 3$) holds a secret $s_i$. The secret sharing scheme divides the secret $s_i$ into three shares $s_i^j$ ($j = 1, 2, 3$), all three of which are required to reconstruct $s_i$. Throughout the negotiation process, user $I_i$ knows only the secret $s_i$ they hold, one share $s_j^i$ ($j \ne i$) sent by each of the other two users, and the negotiation result $\sum_{i=1,2,3} s_i$. They cannot deduce the secrets $s_j$ ($j \ne i$) held by the other two users from the secure multi-party computation process. Therefore, under the honest-but-curious assumption, users cannot infer the secrets held by other users through the secure multi-party computation process.
In the secure multi-party computation process, the secret $s_i$ is a random number generated by user $I_i$, denoted $rand_i$. If other users cannot learn the secret $s_i$, then the key $sk_i = rand_i - \frac{1}{3}\sum_{j=1,2,3} rand_j = s_i - \frac{1}{3}\sum_{j=1,2,3} s_j$ also remains unknown. Therefore, the process of encrypting local models is secure in terms of key confidentiality.
(2) Potential Violator Detection
The encryption of local models by users not only protects their own privacy, guarding against malicious attackers attempting to infer sensitive information, but also introduces a challenge: it deprives the service provider of the ability to directly assess the quality of the local models provided by users. This opens up more covert channels for violators to engage in malicious behavior, such as data poisoning and free-riding attacks.
To address this issue, we leverage the non-repudiation property of blockchain to facilitate the detection of user behavior. By recording the outputs of each user on the blockchain and synchronizing this information across the entire peer-to-peer (P2P) network, we enable the service provider to indirectly assess the quality of users' local models. This is achieved through multiple rounds of group aggregation of the user outputs recorded in the blockchain. Such an approach allows the service provider to evaluate the presence of data poisoning attacks and free-riding attacks, thereby implementing a system for the detection of potential violators.

5.2. Experiment

In this section, we will conduct experimental evaluations on the blockchain-based fine-grained potential violator detection algorithm to demonstrate its effectiveness and performance in a blockchain network. This entails ensuring that (i) the algorithm can execute within the constraints of limited computing resources, and (ii) it can effectively detect potential violators.
  • Experimental setup: We employed a total of eight CPU servers, each equipped with a Hygon C86 7159 16-core processor (Hygon Information Technology Co., Ltd., Beijing, China), and two GPU servers, each outfitted with eight NVIDIA TESLA T4 GPUs (NVIDIA, Santa Clara, CA, USA) and 16 GB of memory. These servers were used to conduct training for the data owners.
  • Dataset: We conducted our tests using the MNIST and CIFAR-10 datasets. The MNIST dataset consists of binary images of handwritten digits and was curated by the National Institute of Standards and Technology in the United States; it is widely employed in machine learning for training and testing and serves as a benchmark in the field. CIFAR-10, in contrast, is a color image dataset depicting objects closer to everyday entities. Unlike MNIST, CIFAR-10 introduces significant noise and variation in object proportions and features. This realistic complexity poses substantial challenges for recognition tasks and makes CIFAR-10 a valuable resource for assessing model performance under more demanding conditions.
Q1: Can our approach accurately detect data poisoning and free-riding attacks? Furthermore, can it pinpoint the identity of malicious users?
To validate the feasibility of our potential violator detection algorithm, we employ a Convolutional Neural Network (CNN) model for image classification tasks on the MNIST and CIFAR-10 datasets, subsequently assessing its accuracy. Specifically, the CNN architecture for the MNIST dataset comprised two convolutional layers, two pooling layers, and two fully connected layers. On the CIFAR-10 dataset, the CNN model featured eight convolutional layers, four pooling layers, and three fully connected layers. The aggregation algorithm for collaborative training is FedAvg, with a learning rate of 0.01 for each round. The batch size is set as B = 64, and the maximum communication round (epochs) is set as E = 50. In this experiment, potential violators will not be excluded from the collaborative training process.
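For reference, the following is a minimal sketch of the FedAvg aggregation step used as the collaborative training backbone. The toy layer shapes and client weights are placeholders we assume for the example, and the listed hyperparameters are the ones stated above (they are not consumed by the aggregation function itself).

import numpy as np

# Hyperparameters stated in the experiment (used by local training, not
# by the aggregation step sketched here).
LR, BATCH_SIZE, EPOCHS = 0.01, 64, 50

def fedavg(client_weights):
    # FedAvg aggregation: element-wise average of the clients' model
    # parameters, layer by layer.
    return [np.mean(np.stack(layer), axis=0) for layer in zip(*client_weights)]

# Toy example: nine clients, each holding a two-layer model.
rng = np.random.default_rng(1)
clients = [[rng.normal(size=(8, 4)), rng.normal(size=4)] for _ in range(9)]
global_model = fedavg(clients)
assert global_model[0].shape == (8, 4) and global_model[1].shape == (4,)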
In each learning round, nine client machines perform local training, and the resulting models are aggregated by the service provider. We evaluated the collaborative training accuracy in the absence of malicious users, covering the accuracy of the global model and of each group model on the MNIST and CIFAR-10 datasets. As shown in Figure 3, the results indicate that, with sufficient training, the accuracy of the group models closely approaches that of the global model, which serves as the baseline when no malicious user is present. After 50 iterations, the global model achieved an accuracy of 98.63% on the MNIST dataset and 80.93% on the CIFAR-10 dataset.
In our simulated experiments on data poisoning attacks, we randomly designate a device as the poisoning attacker (PAC). This device uploads encrypted local models containing negative gradient information acquired during the training process, simulating a data poisoning attack. We assess the impact of poisoning attacks through group aggregation detection, recording the influence of PAC attacks on the accuracy of the global model and the group models.
As shown in Figure 4a,b, the simulated data poisoning actions by the PAC produce fluctuations in the accuracy of the global model on both the MNIST and CIFAR-10 datasets. These fluctuations hinder convergence and result in a global model accuracy significantly lower than the baseline. The impact of the PAC is further evident in the aggregated model accuracies of the two groups it influences: these group models not only have lower accuracies than the global model but also show no signs of convergence. Consequently, the PAC receives lower scores in our user detection and is identified as a potential violator, leading to its exclusion from the collaborative training process. These results substantiate the effectiveness of our user detection algorithm in resisting data poisoning attacks and accurately pinpointing the attacker within the collaborative training framework.
In our simulated experiments on free-riding attacks, we randomly designate a device as the free-riding attacker (FAC). This device does not actively participate in regular local training and only uploads zero-gradient parameters to the blockchain in each iteration to simulate a free-riding attack. Group aggregation detection is employed to assess free-riding behavior, and we record the impact of the FAC on the accuracies of the global model and the group models.
As shown in Figure 4c,d, on the MNIST dataset, the accuracies of the two groups influenced by the FAC are lower than those of the global model and the other groups during the early iterations, with significant accuracy fluctuations. Although the accuracies of the FAC-influenced groups gradually increase and approach that of the global model in later iterations, their stability is comparatively poorer, and the magnitude of their accuracy fluctuations is greater than that observed in other groups. On the CIFAR-10 dataset, during the early iterations, the model accuracies of the FAC-influenced groups are not significantly different from those of the global model and the other groups; however, with continued iterations, it becomes evident that the accuracies of these two groups are consistently lower. Free-riding attacks initiated by malicious users may converge as the iteration count increases, indicating that detecting free-riding attacks is more challenging than detecting data poisoning attacks. Nevertheless, through extended observation, the abnormal behavior and the underlying FAC can still be identified. This result demonstrates the effectiveness of our user detection algorithm in resisting users' free-riding attacks and accurately pinpointing attackers.
To validate the scalability of our proposed method and examine its ability to identify potential violators in the presence of multiple malicious users, we conducted an assessment with nine participants and without excluding potential violators. We performed four basic groupings per round with epoch = 50 and analyzed the changes in user scores and confidence intervals, as depicted in Figure 5.
In Figure 5, we observe that as the number of malicious users increases, the difference in scores between general users and poisoning attackers, as well as free-riding attackers, decreases, and the overlapping area of the confidence intervals for scores becomes larger. However, even with the presence of three poisoning attackers and three free-riding attackers, i.e., when two-thirds of the participants are malicious users, the scores and confidence intervals for general users remain significantly higher than those for malicious users. This indicates that in the collaborative training process, malicious users can be identified as potential violators and excluded from the collaborative training process, demonstrating good scalability.
Q2: Is our proposed method resistant to malicious attacks?
To validate the impact of our user detection algorithm on the accuracy of collaborative training, we compared the accuracy variations among three categories of collaborative training scenarios: collaborative training with benign users (no malicious behavior), collaborative training with malicious attacks and no user detection, and collaborative training with malicious attacks and user detection.
From Figure 6, it is evident that on the MNIST dataset, the user detection algorithm enhances the stability of collaborative training processes that include malicious users engaging in poisoning attacks. This improvement allows the global model to reach a more stable state with higher accuracy. For collaborative training involving free-riding users, the user detection algorithm also contributes to an increased model accuracy, aligning it closely with the accuracy of collaborative training without malicious users. On the CIFAR-10 dataset, our user detection algorithm improves the accuracy and convergence speed of the global model in collaborative training scenarios with malicious users. This indicates that our proposed user detection algorithm is effective in identifying potential violators during the collaborative training process, enhancing both the convergence speed and the final model accuracy of the entire collaborative training process.
Q3: Does our proposed approach effectively improve performance under ciphertext potential violator detection?
In our user detection algorithm, secure multi-party computation based on Shamir's secret sharing and multi-round group localization play a crucial role. In this section, we assess their feasibility by examining the encryption and decryption overhead, as well as the communication overhead for users under different numbers of groups. We employ the BCP algorithm for encrypting and decrypting model parameters [6] as a baseline.
To validate the effectiveness of secure multi-party computation based on Shamir’s secret sharing, we evaluated its overhead. Since the model decryption process aligns with the normal model aggregation process, we have omitted the decryption process and only calculated the overhead incurred by key negotiation and computation during encryption, along with the necessary communication.
The overhead is evaluated based on the number of encryption and decryption iterations. As shown in Table 1, the experimental results demonstrate that the BCP algorithm requires 96.35 s for encryption and 175.19 s for decryption in 20 iterations. In contrast, our proposed method incurs no additional overhead during the decryption process, and the overhead for 20 encryption iterations is only 5.06 s. This is significantly lower than the overhead associated with the BCP algorithm.
This is because the only complex computations in our method occur in the secret key negotiation phase based on Shamir's secret sharing; the computation in the model encryption phase is straightforward, resulting in an overall lower computational load. Additionally, the user detection algorithm constrains the scale of key negotiation within each group, which keeps the communication overhead of the negotiation process low. As a result, our proposed method incurs a significantly lower overhead than the BCP algorithm.
To validate the effectiveness of multi-round group localization, we evaluated its communication overhead by calculating the cost of uploading group models via the blockchain and the necessary communication during the encryption process.
The communication overhead was assessed based on the number of groups. The experimental results, shown in Table 2, indicate that the BCP-based encryption algorithm requires the transmission of 170.23 MB of data over 20 rounds of group model uploads, whereas our proposed method requires only 41.94 MB. This is because our grouping encryption method does not expand the messages during encryption, so the plaintext and ciphertext have the same size. However, whereas the BCP algorithm uploads only one group model per round of training, our proposed method uploads multiple groups. In practice, therefore, too many groups in one training round can lead to significant communication overhead.
Furthermore, our proposed method requires only one blockchain, whereas the BCP-based scheme requires two. All content uploaded to our blockchain is encrypted, and no key-related material is ever uploaded. This avoids a risk inherent in the two-blockchain setting: if blockchain contents are readable by all blockchain nodes, a single node participating in both chains could obtain both the keys and the ciphertexts, and thus decrypt the original local models.

6. Conclusions

Over the past decade, breakthroughs in communication technologies have driven the expansion of large-scale data applications, and the advent of CFNs has made multi-party collaborative training an irreversible trend. However, traditional approaches to multi-party collaborative analysis require a trusted entity for data fusion. Blockchain-based multi-peer endorsement offers a novel solution to this challenge, but once data on the blockchain are encrypted, data poisoning and free-riding attacks become undetectable, and this issue still requires resolution. Consequently, we propose a blockchain-based fairness guarantee approach for privacy-preserving collaborative training in a CFN. Our approach employs cryptography-based secure aggregation to prevent malicious attackers from extracting private information from the training data through gradient inversion attacks, and it identifies potential violators through group-wise aggregation and accuracy evaluation, effectively mitigating poisoning and free-riding attacks on encrypted data. Additionally, in using Shamir's secret sharing for secret key negotiation within each group, we limit the scale of the negotiating group and avoid computationally expensive multiplication operations, thereby improving the efficiency of encryption and decryption. Finally, we validate the proposed approach on two publicly available datasets, and the experimental results affirm its effectiveness and efficiency.
In future work, we aim to extend fairness guarantees for encrypted data in collaborative training beyond poisoning and free-riding attacks, comprehensively exploring other deep neural network threats such as backdoor attacks. We are also committed to improving the precision and sensitivity of malicious-behavior detection: in addition to conventional metrics such as model accuracy, we will investigate multi-dimensional assessment techniques. This comprehensive approach will enable us to identify potential violators more effectively in complex and diverse environments, thereby strengthening the security and resilience of collaborative training systems.

Author Contributions

Conceptualization, Z.S. and L.Y.; Data Curation, J.L.; Formal Analysis, W.L.; Investigation, C.L. and H.W.; Software, J.L. and J.Z.; Supervision, L.Y. and H.W.; Writing—Original Draft, Z.S.; Writing—Review and Editing, Z.S., C.L., N.W. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Major Research Plan of the National Natural Science Foundation of China, grant number 92167203; in part by the National Natural Science Foundation of China, grant number 62002077; in part by the Guangdong Basic and Applied Basic Research Foundation, grant number 2020A1515110385; in part by the Guangzhou Science and Technology Plan Project, grant numbers 202201020216 and 2023A03J0119; and in part by the Guangxi Key Laboratory of Trusted Software, grant number KX202313.

Data Availability Statement

Publicly available datasets were analyzed in this study. The MNIST dataset is available at http://yann.lecun.com/exdb/mnist/ (accessed on 29 January 2024). The CIFAR-10 dataset is available at http://www.cs.toronto.edu/~kriz/cifar.html (accessed on 29 January 2024).

Conflicts of Interest

Author Hanyi Wang was employed by the company China Mobile (Suzhou) Software Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Tang, K.; Ma, Y.; Miao, D.; Song, P.; Gu, Z.; Tian, Z.; Wang, W. Decision Fusion Networks for Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–14.
  2. Han, P.; Yan, Z.; Ding, W.; Fei, S.; Wan, Z. A Survey on Cross-chain Technologies. Distrib. Ledger Technol. Res. Pract. 2023, 2, 1–30.
  3. Dasgupta, D.; Shrein, J.M.; Gupta, K.D. A survey of blockchain from security perspective. J. Bank. Financ. Technol. 2019, 3, 1–17.
  4. Li, X.; Jiang, P.; Chen, T.; Luo, X.; Wen, Q. A survey on the security of blockchain systems. Future Gener. Comput. Syst. 2020, 107, 841–853.
  5. Tian, Z.; Li, M.; Qiu, M.; Sun, Y.; Su, S. Block-DEF: A secure digital evidence framework using blockchain. Inf. Sci. 2019, 491, 151–165.
  6. Sun, Z.; Wan, J.; Yin, L.; Cao, Z.; Luo, T.; Wang, B. A Blockchain-based Audit Approach for Encrypted Data in Federated Learning. Digit. Commun. Netw. 2022, 8, 614–624.
  7. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60.
  8. Li, A.; Duan, Y.; Yang, H.; Chen, Y.; Yang, J. TIPRDC: Task-independent privacy-respecting data crowdsourcing framework for deep learning with anonymized intermediate representations. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 6–10 July 2020; pp. 824–832.
  9. Yang, H.H.; Liu, Z.; Quek, T.Q.S.; Poor, H.V. Scheduling policies for federated learning in wireless networks. IEEE Trans. Commun. 2019, 68, 317–333.
  10. Zhu, P.; Hong, J.; Li, X.; Tang, K.; Wang, Z. SGMA: A novel adversarial attack approach with improved transferability. Complex Intell. Syst. 2023, 9, 6051–6063.
  11. Tang, K.; Shi, Y.; Lou, T.; Peng, W.; He, X.; Zhu, P.; Gu, Z.; Tian, Z. Rethinking Perturbation Directions for Imperceptible Adversarial Attacks on Point Clouds. IEEE Internet Things J. 2022, 10, 5158–5169.
  12. Zhu, P.; Fan, Z.; Guo, S.; Tang, K.; Li, X. Improving Adversarial Transferability through Hybrid Augmentation. Comput. Secur. 2024, 139, 103674.
  13. Neziri, V.; Shabani, I.; Dervishi, R.; Rexha, B. Assuring Anonymity and Privacy in Electronic Voting with Distributed Technologies Based on Blockchain. Appl. Sci. 2022, 12, 5477.
  14. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1175–1191.
  15. So, J.; Ali, R.E.; Güler, B.; Jiao, J.; Avestimehr, A.S. Securing Secure Aggregation: Mitigating Multi-Round Privacy Leakage in Federated Learning. Proc. AAAI Conf. Artif. Intell. 2023, 37, 9864–9873.
  16. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318.
  17. Yang, Y.; Yuan, H.; Hui, B.; Gong, N.; Fendley, N.; Burlina, P.; Cao, Y. Fortifying Federated Learning against Membership Inference Attacks via Client-level Input Perturbation. In Proceedings of the 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Porto, Portugal, 27–30 June 2023; pp. 288–301.
  18. Weng, J.; Weng, J.; Zhang, J.; Li, M.; Zhang, Y.; Luo, W. Deepchain: Auditable and privacy-preserving deep learning with blockchain-based incentive. IEEE Trans. Dependable Secur. Comput. 2019, 18, 2438–2455.
  19. Tian, J.; Song, Q.; Wang, H. Blockchain-Based Incentive and Arbitrable Data Auditing Scheme. In Proceedings of the 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Baltimore, MD, USA, 27–30 June 2022; pp. 170–177.
  20. Lu, Y.; Huang, X.; Dai, Y.; Maharjan, S.; Zhang, Y. Blockchain and federated learning for privacy-preserved data sharing in industrial IoT. IEEE Trans. Ind. Inform. 2019, 16, 4177–4186.
  21. Zhou, Z. Computation resource allocation and task assignment optimization in vehicular fog computing: A contract-matching approach. IEEE Trans. Veh. Technol. 2019, 68, 3113–3125.
  22. Jiao, W.; Zhao, H.; Feng, P.; Chen, Q. A Blockchain Federated Learning Scheme Based on Personalized Differential Privacy and Reputation Mechanisms. In Proceedings of the 2023 4th International Conference on Information Science, Parallel and Distributed Systems (ISPDS), Guangzhou, China, 14–16 July 2023; pp. 630–635.
  23. Ma, S.; Cao, Y.; Xiong, L. Transparent Contribution Evaluation for Secure Federated Learning on Blockchain. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering Workshops (ICDEW), Chania, Greece, 19–22 April 2021; pp. 88–91.
  24. Chen, B.; Zeng, H.; Xiang, T.; Guo, S.; Zhang, T.; Liu, Y. ESB-FL: Efficient and Secure Blockchain-Based Federated Learning with Fair Payment. IEEE Trans. Big Data 2022, 1.
  25. Du, Y.; Wang, Z.; Leung, C.; Leung, V.C.M. Blockchain-based Data Quality Assessment to Improve Distributed Machine Learning. In Proceedings of the 2023 International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA, 20–22 February 2023; pp. 170–175.
  26. Xiong, J.; Ma, R.; Chen, L.; Tian, Y.; Li, Q.; Liu, X.; Yao, Z. A personalized privacy protection framework for mobile crowdsensing in IIoT. IEEE Trans. Ind. Inform. 2019, 16, 4231–4241.
  27. Wang, Z.; Hu, Q.; Li, R.; Xu, M.; Xiong, Z. Incentive Mechanism Design for Joint Resource Allocation in Blockchain-Based Federated Learning. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 1536–1547.
  28. Qu, Y.; Gao, L.; Luan, T.H.; Xiang, Y.; Yu, S.; Li, B.; Zheng, G. Decentralized privacy using blockchain-enabled federated learning in fog computing. IEEE Internet Things J. 2020, 7, 5171–5183.
  29. Fălămaş, D.-E.; Marton, K.; Suciu, A. Assessment of Two Privacy Preserving Authentication Methods Using Secure Multiparty Computation Based on Secret Sharing. Symmetry 2021, 13, 894.
Figure 1. The collaborative training in CFN.
Figure 2. The data-flow perspective of blockchain-based fairness guarantee approach.
Figure 3. Model accuracy without malicious users. (a) MNIST; (b) CIFAR-10.
Figure 4. Model accuracy with poison and free-riding attack. (a) MNIST with poison attack; (b) CIFAR-10 with poison attack; (c) MNIST with free-riding attack; (d) CIFAR-10 with free-riding attack.
Figure 5. The scores and confidence intervals for each user under different numbers of malicious users: (a) 1 PAC and 1 FAC; (b) 2 PAC and 2 FAC; (c) 3 PAC and 3 FAC; (d) 4 PAC and 4 FAC.
Figure 6. Model accuracy with and without audit: (a) MNIST; (b) CIFAR-10.
Table 1. The overhead of encryption and decryption time (s).

Iterations        5        10       15       20
BCP_Enc_avg [6]   24.30    43.59    72.62    96.35
BCP_Dec [6]       45.95    89.03    132.11   175.19
Our_Enc_avg       1.265    2.535    3.796    5.060
Our_Dec           -        -        -        -
Table 2. The overhead of communication in uploading models (MB).

Uploads    5        10       15        20
BCP [6]    42.55    85.11    127.67    170.23
Our        10.48    20.97    31.45     41.94
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
