Article

A Data Attack Detection Framework for Cryptography-Based Secure Aggregation Methods in 6G Intelligent Applications

1 Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
2 School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
3 China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215163, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(11), 1999; https://doi.org/10.3390/electronics13111999
Submission received: 19 April 2024 / Revised: 13 May 2024 / Accepted: 14 May 2024 / Published: 21 May 2024
(This article belongs to the Special Issue Recent Advances in Reliability and Security in 5G/6G Mobile Networks)

Abstract: Researchers attribute a wide range of anticipated characteristics to 6G networks. A pivotal characteristic of 6G networks is the deep integration of sensing and networking, along with intelligent network applications operating on top of this infrastructure. To optimally harness the data collected by sensors distributed across various locations, the training paradigm of the new generation of 6G intelligent applications aligns naturally with the federated-learning paradigm. The exposure of gradients in federated learning to inversion attacks is a critical concern. To address this, cryptography-based secure aggregation methods are commonly implemented to protect the privacy and confidentiality of gradients. However, the semantic meaninglessness of encrypted data makes it difficult to assess the correctness, availability, and source legitimacy of participants’ data. In this paper, we propose a data attack detection framework for cryptography-based secure aggregation methods in 6G intelligent applications that addresses the security vulnerabilities arising from the obscurity of encrypted data. We employ a suite of encrypted-data-auditing techniques to prevent data-aggregation errors, data poisoning, and illegal data sources. Additionally, we compare a series of promising security methods, analyze them, and provide recommendations on the most suitable security approaches for specific 6G scenarios.

1. Introduction

With the widespread and large-scale deployment of 5G networks globally, people are enjoying the convenience of high-speed networks and the array of resulting applications. Representing the next generation of communication networks, 6G networks are highly anticipated [1]. Researchers have designed many attractive candidate technologies (White Paper on 6G Vision and Candidate Technologies. http://www.caict.ac.cn/english/news/202106/P020210608349616163475.pdf (accessed on 13 May 2024)) that can be implemented in future 6G networks, for example:
  • New network with native AI. Researchers believe that AI can be integrated into mobile communication systems, resulting in a new intelligent network technology. Consequently, 6G-network users will not only generate data but also undertake data processing and analysis tasks;
  • Enhanced wireless-transmission technologies. As heterogeneous networks become widely interconnected through terahertz communications, users in any geographic location can interact with application service providers in real time, reducing the spatial limitations of applications;
  • Native network security. As cross-domain data sharing becomes more frequent, basic security functions, such as secure multi-party computation and homomorphic-encryption-based secure aggregation algorithms [2], will be embedded in the network architecture, providing a fundamental trust guarantee for 6G intelligent applications.
Building upon these compelling technologies, a suite of promising 6G intelligent applications has emerged, such as vehicle to everything (V2X), digital twin, etc. These applications harness the integration of a variety of sensors within the 6G network to achieve perception of the physical world. The perceived data are initially processed by local artificial intelligence models and then interact with other intelligent entities through terahertz communications. All communication and computation processes are safeguarded by efficient built-in security mechanisms that ensure the confidentiality of the data. Take the example of advanced intelligent driving in V2X, as depicted in Figure 1. During the advanced intelligent-driving training process, a vast amount of data are collected through sensors, such as onboard cameras and radars integrated into the vehicle. Terahertz communications provide transmission speeds far exceeding those of 5G networks, enabling the transfer of more comprehensive training data. Inherent artificial intelligence can provide some initial analysis results, while the inherent security mechanisms provide a secure foundation for data transmission and computation for the entire 6G intelligent application.
Cryptography-based secure aggregation methods, despite their performance overhead, appear to be an almost perfect solution for secure data sharing in theory. As secure aggregation algorithms become increasingly lightweight, and as the terahertz communications and edge-computing capabilities of 6G networks gradually alleviate the heavy transmission and computation overhead of traditional secure aggregation algorithms, these methods are becoming more feasible. However, with the popularization of secure aggregation methods, derivative security issues of encrypted data are emerging. A series of attacks on encrypted data, such as tampering attacks, resale attacks, poisoning attacks, and free-riding attacks, have become one of the factors hindering the application of secure aggregation methods. Compared with traditional plaintext data threat-detection methods, there are two difficulties in ciphertext data threat-detection methods in 6G intelligent applications. (a) The coexistence of trusted, semi-trusted, and untrusted participants in large-scale 6G intelligent model training, with inconsistent interests among participants, makes it possible for malicious attackers or semi-trusted participants to undermine the security and fairness of collaborative training. (b) The encryption of data increases the concealment of malicious attacks, making it more difficult to detect malicious attacks in the circulation of data elements. While ciphertext computation protects the confidentiality of the data submitted by participants, it also prevents service providers from directly detecting the quality of the user data submitted by participants, providing convenience for malicious attackers to submit harmful or non-contributory data.
This paper proposes a data attack detection framework for cryptography-based secure aggregation methods to defend against security issues such as data tampering, data resale, data poisoning, and free-riding attacks by malicious participants while protecting the confidentiality of 6G user data. We first discuss the various attacks commonly encountered in encrypted-data scenarios. Then, we propose a general detection technology framework for different attacks. By integrating our proposed framework with technologies such as blockchain and zero-knowledge proofs, we prevent aggregation errors, data poisoning, and illegal data sources in secure aggregation.
The main innovative points of this paper are as follows.
  • We present a comprehensive data attack detection framework tailored for secure aggregation methods, which effectively identifies malicious activities such as data tampering, data resale, data poisoning, and free-riding attacks, all while preserving the confidentiality of 6G user data;
  • We provide a general design pattern for ensuring data integrity, verifying ownership, and evaluating contribution levels in encrypted states. Additionally, we analyze typical algorithmic cases and offer recommended strategies;
  • We demonstrate the viability of our proposed framework within next-generation 6G networks through a comprehensive security analysis and evaluate the limitations of these approaches.
In the subsequent sections of this paper, we examine existing research concerning correctness assurance, ownership security, and contribution assessment in collaborative training. Next, we introduce our proposed framework and discuss how it addresses security vulnerabilities inherent in the encryption of data. Finally, this paper evaluates various promising security methods, analyzes the optimal security approach for specific 6G scenarios, and provides recommendations.

2. Related Work

2.1. Secure Aggregation Algorithms in the Context of 6G

In the upcoming era of 6G networks, communication technology will undergo revolutionary changes; 6G will not only achieve unprecedented high speed and low latency but also introduce three key features: terahertz communication, inherent AI, and inherent security. These features will enable 6G networks to support more intelligent and secure application scenarios. In 6G networks, terahertz communication will provide extremely high data-transmission speeds, meeting the needs of future large-scale data communication. Inherent AI will enable the network to achieve self-learning and self-optimization, enhancing the intelligence level of the network. Inherent security will ensure that the network has strong security protection capabilities from the beginning of its design, providing comprehensive security for users and data. In such a highly secure and intelligent network environment, federated learning, as a distributed machine-learning method that protects data privacy, will play an important role.
To ensure data security in the federated-learning process, secure aggregation algorithms have become a core technology. These algorithms ensure that participants can jointly train models in an encrypted manner without sharing raw data, ensuring that only model update information that has been securely aggregated is transmitted between participants, thereby effectively preventing data leakage and abuse. Zhao et al. [3] proposed the SEAR secure and efficient aggregation framework, which uses trusted execution environments to protect clients’ private models, achieving support for Byzantine fault tolerance and a significant improvement in aggregation efficiency. Pillutla et al. [4] proposed the RFA method, which uses a geometric median for aggregation updates, enhancing the robustness of the aggregation process in the presence of potential data or model parameter contamination. Yang et al. [5] proposed an efficient and secure federated-learning scheme with verifiable weighted average aggregation, which uses masking techniques to encrypt weighted gradients and data sizes, achieving privacy protection for uploaded gradients and verification of aggregation correctness. Elkordy et al. [6] proposed the HeteroSAg scheme, which allows for secure model aggregation using heterogeneous quantization, achieving a better tradeoff between training accuracy and communication time, while adapting to the communication resources available to different users. Wang et al. [7] proposed weighted-clustering federated learning, which uses adaptive clustering based on cosine similarity and weighted-cluster model aggregation to effectively manage data imbalance issues in a 6G environment. Liu et al. [8] proposed FedCPF, which uses customized local training strategies, partial client participation rules, and flexible aggregation strategies to achieve efficient communication for federated learning for vehicular edge computing in 6G. The design and implementation of secure aggregation algorithms are key to the successful application of federated learning in 6G networks. It not only needs to ensure the security of computation but also consider the efficiency and scalability of the algorithm to adapt to the growing amount of data and computational demands. Therefore, researching and developing efficient and reliable secure aggregation algorithms is an important task for implementing federated learning in 6G networks. However, to achieve all this, we still need to address a series of challenges, including ensuring the correctness of ciphertext data, verifying the ownership of data, and evaluating the contributions of each participant. The solutions to these problems will directly affect the feasibility and efficiency of federated learning in 6G networks.

2.2. Data Security Issues Arising in 6G Networks

Regarding the correctness of encrypted data, since data owners are generally unwilling to fully relinquish control over their data, it is difficult to establish complete mutual trust among participants during multi-party data sharing. Furthermore, system errors or disconnections that may occur during encrypted computation, as well as potential malicious behavior, can affect the security and stability of multi-party data collaborative analysis. Therefore, it is crucial to ensure the correctness of data transmission, data aggregation, and data encryption/decryption during the encrypted computation process. In existing research, Peng et al. [9] designed a data verification and audit tracing method based on hash functions to ensure that user-submitted data are not tampered with before aggregation, but did not consider the security of private information in the submitted data. For data-aggregation correctness, Fu et al. [10] designed a correctness verification method based on blind encryption, using Lagrange interpolation to verify the correctness of aggregated encrypted data. Xu et al. [11] designed a dual-concealment technique to protect model parameters and used bilinear mapping functions to verify whether the service provider correctly aggregated all models, but this method is not applicable to federated learning based on homomorphic encryption. For data encryption/decryption correctness, Weng et al. [12] designed an audit scheme based on the Σ protocol to verify whether the encryption/decryption results of data are correct. Shin et al. [13] proposed a privacy-preserving federated averaging (PP-FedAvg) protocol, which uses homomorphic encryption technology to protect the size of datasets and aggregated local-update parameters, ensuring that the server does not access dataset sizes and local-update parameters when updating the global model. Zheng et al. [14] provided effective protection for individual model updates during the learning process, allowing clients to provide only obfuscated model updates, while the cloud server can still perform aggregation. The system, for the first time, enhanced the security and efficiency of federated learning by supporting lightweight encryption and aggregation, as well as resilient handling of dropped clients. These methods have addressed the issue to some extent, but further verification and optimization are needed in practical applications.
The issue of verifying ownership of encrypted data cannot be ignored either. Since encrypted models cannot be decrypted before aggregation, service providers can only identify the identity of the model sender through technologies such as digital signatures but cannot determine whether the sender is the actual trainer of the model. This could lead to the resale of encrypted data, thus undermining the fairness and security of multi-party data sharing. To prevent this, researchers have proposed a series of methods, such as Kim et al. [15] and Li et al. [16], who suggest using blockchain to store encrypted model updates in multi-party data sharing, verifying encrypted data based on the consensus mechanism on the chain, and implementing incentive mechanisms to reward users who provide encrypted data. Although such work achieves transparency of data and processes in multi-party data sharing, it also allows any participant to obtain the encrypted model parameters of other participants through the blockchain. Participants may exploit the homomorphic property of encrypted data to modify model parameters, disguising them as their own data to participate in multi-party data collaborative analysis. Zhao et al. [17] designed a blockchain caching mechanism based on IPFS (InterPlanetary File System), storing encrypted data in off-chain storage areas to prevent direct access to the data by participants. However, this solution still cannot prevent malicious attackers from accessing and obtaining encrypted model parameters through the data hash values on the chain. To avoid leakage of the encrypted model parameters, Lu et al. [18] proposed storing model parameters in the cache area of blockchain nodes, with new blocks only recording data-sharing events. These methods all attempt to solve the issue of verifying the ownership of encrypted data through technical means.
Evaluating the contribution of encrypted data is equally important for incentivizing data holders to provide high-quality data and preventing malicious behavior. In multiple rounds of training, there may be malicious attackers who launch data-poisoning attacks [19,20,21] on multi-party data sharing, upload malicious data, and damage the global model. Additionally, data holders may provide low-quality data or carry out free-riding attacks due to concerns about privacy protection and computational resource consumption. To ensure that multi-party data sharing ultimately achieves good model-training results, it is necessary to defend against various data-poisoning attacks from malicious attackers and detect low-quality data uploads or free-riding behavior [22] by passive data holders. Existing defense work can be roughly divided into two categories: data feature detection and data quality assessment. Data feature detection refers to the feature analysis of training data or intermediate data used in federated learning, such as clustering data [23] or calculating the distance [24] and similarity [25] between model parameters, to determine whether there are malicious data in the training set. Since data in ciphertext computation are in an encrypted state invisible before aggregation, data feature detection methods are not applicable to cryptography-based secure aggregation scenarios. Data quality assessment refers to evaluating the classification performance of the model-training results, such as assessing whether the performance of the aggregated model meets expectations [18] or designing homomorphic computation methods based on the structure of the encrypted model to test the model’s accuracy [26]. The former assessment can only determine whether malicious data exists, without being able to locate it; the latter requires inputting each test sample into the model and performing a large number of homomorphic calculations, which is not highly feasible in the evaluation of complex models. Currently, there is no work that can accurately locate encrypted malicious data without relying on a large amount of homomorphic computation. Ensuring the long-term functionality of the system and guaranteeing data quality and fair payment during cooperation are crucial. Jiang et al. [27] proposed a malicious client-detection federated-learning mechanism (MCDFL) against label-flipping attacks, which, in a 6G environment, more efficiently detects the data quality of each client by recovering the distribution of latent feature spaces. Li et al. [28] proposed an incentive mechanism based on contract theory. By maximizing the utility of data holders, an incentive mechanism was established, proving that the optimal strategy set of data holders reaches Nash equilibrium. Lin et al. [29] proposed a new social federated-learning framework (SFEL) that is realized by recruiting trustworthy social friends as learning partners. In a 6G environment, like-minded friends can be quickly found by establishing a social graph model, taking into account mutual trust and similarity of learning tasks. Furthermore, an incentive mechanism based on social effects was proposed to promote better individual federated-learning behavior through complete and incomplete information. Qi et al. [30] proposed a blockchain-based federated-learning model that uses a reputation mechanism to incentivize data owners to participate and contribute high-quality data.
In a 6G network, by designing smart contracts, this mechanism can evaluate participants’ contributions and distribute rewards accordingly without disclosing personal data, thus promoting the contribution of high-quality data. T. Ranathunga et al. [31] proposed a blockchain-based decentralized federated-learning framework that uses an aggregator of hierarchical networks to reward or penalize organizations based on the quality of local models.
Secure aggregation methods based on encryption have broad application value in 6G multi-party secure sharing scenarios, but existing methods are insufficient for addressing the derived security issues, with challenges such as repetitive verification of encrypted data correctness, difficulty in verifying ownership, and difficulty in locating malicious attackers, greatly harming the participation enthusiasm of trusted users. There is an urgent need to design a data attack detection framework for secure aggregation methods based on encryption in 6G intelligent applications.

3. Threat Model

The deployment of 6G inherent security mechanisms ensures the confidentiality of 6G user data. However, the protection of data confidentiality makes data invisible, which is a double-edged sword, as it can lead to new security risks when multiple parties collaborate in analysis. As a new open-application network, a 6G network contains trusted, semi-trusted, and malicious users, and the latter are likely to exploit the invisibility of encrypted data to attack the collaborative analysis process. The following are four typical encrypted-data attacks that this framework aims to defend against, as illustrated in Figure 2.

3.1. Tampering Attacks on Encrypted Data

Malicious participants can cause errors in the ciphertext aggregation algorithm on the central server by uploading tampered ciphertexts of model parameters. The central server can manipulate the aggregation results of the model by tampering with ciphertexts or aggregation weights during the model aggregation process. Errors in data transmission or encryption/decryption operations can prevent the central server from obtaining the correct aggregation results.

3.2. Resale Attacks on Encrypted Data

During the data-sharing process, unscrupulous actors might access and modify other data holders’ encrypted models and verification information, using the homomorphic properties of these models to create new encrypted models for federated-learning participation. Based on the encrypted state of the model parameters, the central server is unable to determine if a data holder has engaged in the resale of models. Without utilizing computational resources for training, malicious data holders can still participate in federated learning, compromising its fairness.

3.3. Poisoning Attacks on Encrypted Data

Malicious data provided to the central server by rogue actors may result in anomalies within the global model. With model parameters encrypted prior to aggregation, the central server is unable to ascertain the specific contributions of each parameter to the performance of the aggregated model, nor can it identify any instances of data poisoning.

3.4. Free-Riding Attacks on Encrypted Data

Rogue participants may submit untrained models to the central server, attempting to deceitfully acquire the optimized aggregated model. Since model parameters are encrypted prior to aggregation, the central server is unable to determine the impact of each parameter on the performance of the aggregated model, nor can it identify any freeloading malicious activities within the data.

4. Details of Our Framework

In this section, we focus on how to ensure that encryption-based secure aggregation methods can withstand the four typical encrypted-data attacks mentioned above while protecting the confidentiality of 6G user data, ensuring data integrity, verifying ownership, and assessing the contribution levels under encrypted conditions.

4.1. The Overview of Our Proposed Framework

As illustrated in Figure 3, we conducted a comprehensive examination of the encrypted data from three key perspectives: the correctness of the data, the ownership of the data, and the assessment of contributions.
In the cycle of federated learning, participants initially train and encrypt their models locally before uploading them to an aggregation server. Once the server has verified the ownership and integrity of these models, it securely aggregates them to produce an updated global model. This model is then decrypted and optimized before being redistributed to the participants for further training or deployment. The cycle includes several crucial security measures: verification of the correctness of the encrypted data, verification of data ownership to prevent resale, and the assessment of contributions and reward distribution, which together effectively counter four typical types of encrypted-data attacks. These measures collectively ensure the security and fairness of the federated-learning project, thereby better serving all participants. The notations of the data attack detection framework are denoted in Table 1.
Correctness verification uses encryption and sophisticated aggregation algorithms to achieve end-to-end verification of the aggregation results over encrypted data, ensuring the integrity and authenticity of data throughout its transmission and encryption/decryption processes.
Source and data-ownership verification methods involve creating a tamper-proof cryptographic token that ensures any attempt to modify data after submission is detected, effectively preventing the risks of data tampering or illegal resale.
Contribution assessment and reward distribution evaluate each participant’s actual influence and contribution to ensure the fairness of reward distribution. This process aims to motivate meaningful contributions and prevent participants from submitting minimal or ineffective data to gain undue benefits.

4.2. Correctness Verification for Ciphertext-Oriented Tampering Attacks

In the federated-learning framework based on homomorphic encryption, the introduction of 6G networks allows participants to exchange data in an environment with extremely low latency and high reliability, yet there still remains a risk of failures caused by computational errors or malicious actions. In this advanced network context, all model parameters are processed in ciphertext before aggregation, making it difficult for participants to identify whether the data contain computational errors or have been maliciously tampered with; this raises the issue of model-parameter correctness. This study combines the advantages of 6G communications with a secure aggregation encoding scheme, utilizing the algorithm’s features to ensure that the service provider can decrypt the data only after they have been correctly aggregated. This not only ensures the confidentiality of the data providers’ data but also significantly enhances the efficiency and security of data processing. Furthermore, this research introduces a rapid verification method for encrypted data among multiple participants based on a homomorphic hashing algorithm [32], allowing participants to use the rapid-response features of 6G networks to quickly check whether data have been tampered with or calculations are erroneous without revealing the plaintext, further enhancing the reliability and real-time capabilities of the federated-learning model.
The primary characteristics of 6G networks include extremely high data-transmission rates and ultra-low latency, significantly enhancing the speed and reliability of data exchange and providing a technical foundation for implementing effective security measures. For this advanced network environment, we propose the following design principles for correctness verification:
For the integrity-verification principle, in a 6G communication environment characterized by ultra-high speed and ultra-low latency, each participant should be able to verify in real time the correctness and completeness of the data processing and computational results of others, without exposing their own original data. This ensures data integrity during rapid transmission and processing, meeting the security requirements of high-speed data handling and low-latency communication.
For example, the correctness verification scheme used in Ref. [32] employs a homomorphic hashing algorithm, using hash values to verify the correctness of aggregated ciphertexts by validating whether H1(∏ m_i) = ∏ H1(m_i) holds true, thereby determining if the correct aggregated model has been obtained. This step adheres to the integrity-verification principle, allowing each participant to clearly see whether the data processing maintains consistency and correctness.
For the verifiability and transparency principle, throughout all data sharing and aggregation processes, the real-time correctness and transparency of data are ensured through public verification and encryption/decryption auditing mechanisms, meeting the audit and compliance requirements in a high-speed data-processing environment.
The correctness verification scheme mentioned in Ref. [32], through enc_p(G‖H(acc)‖Sign_p(G‖H(acc))), demonstrates the integrity and authenticity of the data sent by the participants. This ensures both the verifiability and transparency of data transmission.
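To make the integrity-verification principle concrete, the following minimal Python sketch assumes a discrete-log additive homomorphic hash of the form H1(x) = g^x mod p (the concrete construction in Ref. [32] may differ); it illustrates why the hash of the aggregated plaintext must equal the product of the individually submitted hashes, mirroring the check H1(∏ m_i) = ∏ H1(m_i) described above.

# Toy parameters; the experiments in Section 6 use 1024-bit moduli.
p = (1 << 127) - 1      # prime modulus
g = 5                   # public generator

def H1(x: int) -> int:
    """Additive homomorphic hash: H1(a + b) == H1(a) * H1(b) mod p."""
    return pow(g, x, p)

# Each participant submits an integer-encoded gradient m_i and its hash.
gradients = [123456, 789012, 345678]
hashes = [H1(m) for m in gradients]

# The server obtains the aggregate (in the framework, by decrypting the
# Paillier aggregate); here it is computed in the clear for illustration.
aggregate = sum(gradients)

# Verification: the hash of the aggregate must equal the product of hashes.
product_of_hashes = 1
for h in hashes:
    product_of_hashes = (product_of_hashes * h) % p

assert H1(aggregate) == product_of_hashes, "aggregation was tampered with"
print("aggregation verified")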

4.3. User Data-Ownership Authentication against Ciphertext Resale Attacks

In a 6G environment, the central server in federated learning cannot directly verify whether the sender of model parameters is their actual trainer, as all models are encrypted before parameter aggregation. In such environments, malicious parties may participate in the learning process without using their local computational resources, by stealing and reselling other participants’ encrypted models, thus demotivating the honest participants from training. To address this challenge, this paper introduces a user data-ownership authentication mechanism based on Pedersen commitments. Through this mechanism, the service provider initiates a challenge that requires data providers to verify that they possess the plaintext corresponding to the ciphertext, and only the true data owners can successfully respond to the challenge. Additionally, by combining the aggregate commitment authentication algorithm based on the Σ-protocol, it is possible to verify whether each piece of data has been illegally resold or tampered with without decrypting individual model parameters [33]. This method leverages the low latency and high reliability of 6G to enhance the efficiency and security of data exchange, ensuring data integrity and participant integrity during the federated-learning process.
The introduction of 6G networks offers several key technological advantages. The ultra-low latency feature supports real-time verification, allowing for the immediate verification of data authenticity and integrity at the time of upload and various processing stages. The high bandwidth and large capacity are suitable for dynamic trust assessments. In this advanced network environment, we propose the following design principles for data-ownership authentication:
For the encryption and commitment verification principle, each data owner should encrypt their data and generate a commitment to prove the data’s origin and ownership. This ensures that data owners can verify the legality and integrity of their data without disclosing the original information. Using the ultra-low latency and high bandwidth characteristics of 6G, the immediate verification of encrypted data and commitments can be realized.
As used in the data-ownership authentication scheme in Ref. [33], through the formula C = g^r · h^m, data owners provide a commitment C to a central server, demonstrating control and ownership of the data without directly displaying the data itself. The implementation of encryption and commitment technologies ensures the verification of data ownership while protecting data privacy.
For the error detection and traceability principle, in the data-ownership audit, if the preliminary verification fails, the audit process can be further refined, and certain data owners can be selected for a focused audit. Utilizing the high-speed data-transmission capabilities of 6G, audit information can be quickly collected and processed, and errors can be detected and traced in real time.
As used in the data-ownership authentication scheme in Ref. [33], through the formula g^u · h^v = R · C^e, the central server can confirm the correctness of each data owner’s commitments. If mismatches or errors are found at this stage, the central server can conduct a more granular grouped aggregation audit to locate the problem, selecting some data owners for re-audit. This not only detects specific errors but also traces them back to specific data owners, thereby achieving efficient error detection and traceability.
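As an illustration of the two principles above, the following Python sketch instantiates the commitment C = g^r · h^m and the Σ-protocol check g^u · h^v = R · C^e from Ref. [33] over a toy group; the group parameters and message encoding are placeholders rather than the authors’ production settings.

import secrets

p = (1 << 127) - 1          # toy prime modulus; 1024-bit in the experiments
g, h = 5, 7                 # public, independently chosen generators

def commit(m: int, r: int) -> int:
    """Pedersen commitment C = g^r * h^m mod p."""
    return (pow(g, r, p) * pow(h, m, p)) % p

# Data owner: commits to its integer-encoded model update m.
m = 987654321
r = secrets.randbelow(p)
C = commit(m, r)

# Challenge phase: the prover sends R, the server replies with a random e.
k, l = secrets.randbelow(p), secrets.randbelow(p)
R = (pow(g, k, p) * pow(h, l, p)) % p
e = secrets.randbelow(p)

# Response: only the true owner, who knows both m and r, can compute u and v.
u = k + e * r
v = l + e * m

# Server-side verification of g^u * h^v == R * C^e (mod p).
lhs = (pow(g, u, p) * pow(h, v, p)) % p
rhs = (R * pow(C, e, p)) % p
assert lhs == rhs, "prover does not own the plaintext behind the commitment"
print("ownership verified")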

4.4. Ciphertext Data-Contribution Assessment for Poisoning and Free-Riding Attacks

In federated learning, since encrypted model parameters are in ciphertext before aggregation, it is difficult to directly assess data quality or accurately locate malicious data. To counter attacks potentially launched by passive data holders, this paper utilizes the high-speed data-transmission capabilities of 6G communication technology and adopts a ciphertext data-contribution assessment method based on dual-trapdoor encryption. This method incorporates enhanced encrypted noise-cancellation technology from the BCP algorithm, introducing noise into the model parameters to prevent attackers from inferring user data. Although noise is introduced, thanks to the dual-trapdoor homomorphic properties, the noise terms cancel each other out during the aggregation process and thus do not affect the true contribution of the data. Additionally, the paper employs a technique for the precise localization of malicious users based on group aggregation [34]. Leveraging the high-speed processing capabilities of 6G, data are rapidly aggregated, decrypted, and evaluated across multiple groups. By comparing the model accuracy in each group, if a participant consistently performs poorly across multiple groups, they can be considered a potential low-contributing user. This technique can effectively identify malicious attackers, enhancing data security and the quality of model training during the federated-learning process.
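The noise-cancellation idea can be illustrated with a deliberately simplified Python sketch: if the per-participant noise terms are constructed to sum to zero, each individual update is hidden while the aggregate remains exact. This is only a schematic stand-in for the BCP dual-trapdoor mechanism, in which the cancellation is achieved homomorphically over ciphertexts rather than in the clear.

import random

true_updates = [1.0, 2.0, 3.0, 4.0]          # plaintext gradients (toy values)

# Zero-sum noise: the last share is the negative sum of the others.
noise = [random.uniform(-10.0, 10.0) for _ in range(len(true_updates) - 1)]
noise.append(-sum(noise))

masked = [u + s for u, s in zip(true_updates, noise)]   # what leaves each device
aggregate = sum(masked)                                  # noise cancels here

assert abs(aggregate - sum(true_updates)) < 1e-9
print("aggregate is unaffected by the masking noise:", aggregate)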
The verifiability principle states that all data-contribution assessments must be verifiable. Any results generated during the assessment process should be independently verifiable to ensure their accuracy and fairness. Using the ultra-low latency and high-speed data-transmission capabilities of 6G, real-time verification of data-contribution assessments can be achieved. With the extensive connectivity and enhanced bandwidth of 6G networks, broader data synchronization and sharing can be realized, enhancing system transparency.
As used in the data-contribution assessment scheme in Ref. [34], through the joint audit algorithm, encrypted gradient data C_ij from each data owner are aggregated and decrypted, obtaining the actual gradient contributions m_ri and m_cj, where the steps are verifiable, and the assessment results are publicly transparent, accessible, and verifiable by any third party or auditor, reflecting the verifiability principle.
The fairness principle states that the encrypted-data-contribution assessment should fairly reflect the actual contributions of each participant, regardless of the size or quality of their data. The assessment algorithm must accurately distinguish and quantify the value of different contributions, ensuring the fairness of resource distribution and incentive mechanisms. The flexibility and high speed of 6G support rapid adjustments to assessment algorithms and standards, ensuring that the fairness of assessments timely reflects the latest data and participant contributions, optimizing resource allocation and ensuring that all participants benefit fairly.
As used in the data-contribution assessment scheme in Ref. [34], through the condition L_ri − L_ij > ϵ_3 and L_cj − L_ij > ϵ_3, the actual data contribution of devices is tested against their claimed contribution, and devices exceeding the threshold ϵ_3 are identified as free riders. This step ensures that all participants are allocated resources and incentives according to their actual data contributions, thus maintaining fairness.
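The grouped audit can be summarized by the following Python sketch, in which devices are arranged on a grid, the audit role measures the test-set loss of every row and column aggregate, and a device whose claimed loss beats both of its group losses by more than the threshold is flagged. Group sizes, loss values, and the threshold below are illustrative placeholders rather than the settings of Ref. [34].

def flag_free_riders(L_row, L_col, L_claimed, eps3=0.05):
    """L_row[i] / L_col[j]: audited losses of the row-i / column-j aggregates.
    L_claimed[(i, j)]: loss claimed by the device placed at grid cell (i, j)."""
    suspects = []
    for (i, j), L_ij in L_claimed.items():
        # Condition L_ri - L_ij > eps3 and L_cj - L_ij > eps3 from the text.
        if L_row[i] - L_ij > eps3 and L_col[j] - L_ij > eps3:
            suspects.append((i, j))
    return suspects

# Toy usage: the device at cell (1, 0) claims an unrealistically low loss.
L_row = {0: 0.21, 1: 0.35}
L_col = {0: 0.33, 1: 0.22}
L_claimed = {(0, 0): 0.20, (0, 1): 0.21, (1, 0): 0.05, (1, 1): 0.23}
print(flag_free_riders(L_row, L_col, L_claimed))   # -> [(1, 0)]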

5. Security Analysis

5.1. Verification of the Correctness of Transmission, Encryption/Decryption, and Aggregation

Homomorphic hashing allows for operations on encrypted data while maintaining their encrypted state, meaning data can be verified without decryption.
In this paper, the additive homomorphic hash function (H1(x)) supports the validation of addition operations on ciphertexts. When data owners submit hash values of model parameters to the central server, the server can use these hashes to verify the integrity of the aggregated model parameters. If H1(∏ m_i) = ∏ H1(m_i) holds true, it indicates that all model parameters have not been tampered with during aggregation, ensuring the integrity of addition operations.
The multiplicative homomorphic hash function (H2(x)) allows the central server to verify the correctness of multiplication operations. By checking whether H2(∏ c_i) = ∏ H2(c_i) holds, it is confirmed that the encrypted model parameters remain unaltered during aggregation.
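A matching Python sketch for the multiplicative case is given below; it assumes an RSA-style power map H2(x) = x^e mod n, which is one common way to obtain a multiplicatively homomorphic map and is used here only as an illustrative stand-in, since the paper does not fix a specific construction for H2.

n = 126869 * 104729     # toy modulus; 1024-bit in the experiments
e = 65537               # public exponent

def H2(x: int) -> int:
    """Multiplicative homomorphic hash: H2(a * b) == H2(a) * H2(b) mod n."""
    return pow(x, e, n)

c1, c2 = 12345, 67890   # two ciphertexts submitted for aggregation
assert H2((c1 * c2) % n) == (H2(c1) * H2(c2)) % n
print("multiplicative aggregation verified")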
Paillier encryption is a public-key encryption system with additive homomorphic properties, allowing arithmetic operations on encrypted data without decryption:
For the encryption process (Enc()) and decryption process (Dec()), when data owners use Paillier encryption for their model parameters, these parameters are converted into ciphertext, ensuring the confidentiality of the data. Only entities possessing the correct private key (such as the central server or encryption service providers) can decrypt the aggregated ciphertext back to the original aggregated model.
Additive homomorphic property: The additive homomorphic property of the Paillier encryption system means that encrypted data can directly undergo arithmetic addition. This allows the central server to aggregate encrypted models directly without accessing the raw data, further protecting data confidentiality and preventing data tampering during aggregation.
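The additive homomorphism can be reproduced with the compact toy Paillier implementation below (tiny primes are used purely for readability; the experiments in Section 6 use 1024-bit parameters): multiplying two ciphertexts modulo n^2 decrypts to the sum of the underlying plaintexts.

import math, secrets

p, q = 293, 433                       # toy primes; never use such sizes in practice
n = p * q
n2 = n * n
g = n + 1                             # standard choice g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)          # L(g^lam mod n^2)^-1 mod n

def enc(m: int) -> int:
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:        # r must lie in Z*_n
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: the server multiplies ciphertexts to add plaintexts.
c1, c2 = enc(20), enc(22)
assert dec((c1 * c2) % n2) == 42
print("homomorphic aggregation OK")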
Combining homomorphic hashing and Paillier encryption, this approach not only protects data confidentiality and integrity but also allows effective data processing and verification while maintaining data encryption. This method enhances the system’s resistance to tampering attacks, ensuring the security and transparency of the data-processing process.

5.2. Legitimacy Verification of Encrypted-Data Sources

The Pedersen commitment is a cryptographic commitment scheme that allows a data holder to commit to a value without revealing it to anyone. It features two key properties: concealment and binding. Concealment ensures the confidentiality of the original value, while binding prevents the data holder from changing the committed value in the future.
The Σ-protocol is an interactive proof system that allows one party (the prover) to demonstrate knowledge of certain information (such as the original value of a cryptographic commitment) to another party (the verifier) without revealing the information itself. This protocol is completed through a series of challenges and responses, ensuring the integrity of the proof.
Commitment sharing and challenge response: each data owner DO_i generates a Pedersen commitment C_i for its data m_i and uploads it along with the encrypted data c_i. The central server initiates a challenge, requiring data holders to respond to a random challenge involving their commitments, to verify that they indeed possess the original data.
Aggregation and verification: the central server aggregates all commitment values and responses and verifies the equation g^u · h^v = R · C^e, ensuring that both the data aggregation and the responses are correct. This step verifies whether each data holder possesses the plaintext data m_i corresponding to their commitment.
Random number disclosure and final verification: after verification, data holders must disclose the random number r_i used in calculating the commitment. The central server uses these random numbers for the final consistency check, ensuring that each commitment matches the corresponding data item. Through this mechanism, the system ensures that all participants are the legitimate owners of the data they submit and that the data have not been tampered with or illegally resold during the upload process.
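How the disclosed randomness enables the final consistency check can be seen from the additive homomorphism of Pedersen commitments: the product of all commitments must open to the decrypted aggregate Σ m_i under the disclosed randomness Σ r_i. The Python sketch below reuses the toy group from Section 4.3 and illustrates this property only; it is not the full protocol of Ref. [33].

import secrets

p = (1 << 127) - 1
g, h = 5, 7

def commit(m: int, r: int) -> int:
    return (pow(g, r, p) * pow(h, m, p)) % p

msgs  = [11, 22, 33]                        # integer-encoded model updates
rands = [secrets.randbelow(p) for _ in msgs]
commitments = [commit(m, r) for m, r in zip(msgs, rands)]

# Server side: aggregate plaintext (from Paillier decryption) and disclosed r_i.
aggregate_m = sum(msgs)
aggregate_r = sum(rands)

prod_C = 1
for C in commitments:
    prod_C = (prod_C * C) % p

assert prod_C == commit(aggregate_m, aggregate_r), "commitments are inconsistent"
print("final consistency check passed")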
Moreover, if errors are detected during audits, the system can implement more granular auditing measures, such as re-auditing the commitments of specific data holders, which helps precisely locate and identify any malicious actors attempting to resell encrypted data.

5.3. Validation of Participants’ Contribution Assessment

This approach achieves continuous monitoring and auditing of data quality through the combination of behavior roles and audit roles.
The behavior role is responsible for receiving and recording each data owner’s encrypted model parameters and related audit information. Key functions of the behavior role include the following.
Adding random noise: introducing random noise into model parameters prevents attackers from deducing other participants’ data or learning strategies from the encrypted model parameters, enhancing data confidentiality and providing an additional verification layer for the audit role to check data authenticity.
Recording and forwarding data: by storing encrypted model parameters and audit information in a new block and forwarding it to the audit role, the behavior role ensures the tamper-proof nature and integrity of the data, providing a reliable data source for subsequent quality audits.
The audit role processes models received from the central server and is responsible for grouping, aggregating, and decrypting the encrypted gradients sent by the behavior role, including the following:
For the quality audit, after decrypting the aggregated model, the audit role conducts a quality audit using a predefined test set. This is crucial for detecting poisoning attacks, as it allows the system to assess the actual performance of the aggregated model and compare it with the data quality claimed by participants, thereby identifying potential malicious or erroneous data inputs.
Recording and updating aggregated results: By recording audit results on the blockchain, the audit role ensures the transparency and tamper-proof nature of the audit process, enhancing the entire system’s resistance to free-riding behavior.
Through the collaboration of the behavior and audit roles, this scheme provides a comprehensive security defense system for the federated-learning environment. By auditing the quality of each submitted data, the system can effectively identify and eliminate malicious or erroneous data that may compromise model performance; through continuous quality checks and participation assessments, the system ensures all participants contribute valuable input, preventing those who attempt to benefit without actual effort. This effectively defends against data poisoning and free-riding security threats, ensuring data quality and integrity in the federated learning environment.

6. Experiments Evaluation

6.1. Secure Aggregation Data Correctness Verification

This section trains deep-learning models for image recognition using the MNIST dataset and the CelebA dataset. For model training based on the MNIST dataset, 10 CPU servers (Hygon C867 16-core processors) are set up to simulate one central server and nine data holders, respectively. For the CelebA dataset model training, one GPU server (eight NVIDIA TESLA T4 GPUs, 16 GB) is used to perform the training for data holders in batches. The deep-learning training is based on the Python (3.8.3) and PyTorch (1.6.0) libraries, while data encoding and encryption are based on the Charm-crypto (0.5.0) library and Numpy (1.18.5) library. The unit of measurement for experimental time is seconds.
In the 6G environment, the speed of data, the volume of data, and the number of connected devices will far exceed existing technologies. Therefore, the techniques for verifying the correctness of aggregated data must not only be highly efficient and secure but also adapt to the characteristics of high-speed networks and large-scale device networks. To further compare and analyze the technological methods for verifying the correctness of aggregated data in federated learning, we have selected six verification methods: zero-knowledge proofs, Paillier encryption, secure multiparty computation, multiplicative homomorphic encryption, additive homomorphic encryption, and blockchain technology.
For zero-knowledge proofs [35], the prover can demonstrate the truthfulness of a statement to the verifier without revealing any information beyond the validity of the statement itself.
For Paillier Encryption [36], given two ciphertexts, anyone can compute the ciphertext of their sum without access to the original plaintexts.
Secure multiparty computation [37] enables multiple parties to jointly compute a function over their inputs while ensuring that each party’s inputs remain confidential.
Multiplicative homomorphic encryption [38] allows for multiplication operations on encrypted data without the need for decryption.
Additive homomorphic encryption [39] allows for addition operations on encrypted data without the need for decryption.
Blockchain technology [40] is a distributed ledger technology that utilizes cryptographic hashing, linking structures, and consensus mechanisms to ensure the immutability and transparency of data.
The performance comparison of the various encryption methods is shown in Table 2.
ZKP [35] and SMPC [37] emphasize the correctness of data processing and privacy protection, and Paillier [36] also performs well in these aspects, with advantages in scalability. MTH [38] and ATH [39] excel in efficiency but may require compromises in other areas. Blockchain technology [40] offers high levels of data immutability and auditability, but it has lower privacy and efficiency. When choosing a specific method, it is necessary to balance these factors based on the application scenario and specific requirements.
We use the homomorphic Paillier encryption algorithm [36] to protect data privacy, implement data integrity auditing based on a homomorphic hash algorithm, and demonstrate its feasibility. The security parameters of the Paillier encryption algorithm and the homomorphic hash algorithm are set to 1024, where the modulus n in the Paillier encryption algorithm and the multiplication homomorphic hash function H2(x) is a 1024-bit large prime number, and the modulus p in the addition homomorphic hash function H1(x) is a 1024-bit large prime number. Performance evaluation of the Paillier encryption algorithm [36] and the homomorphic hash algorithm shows that the time consumption for data integrity verification remains within a relatively low range.
In this experiment, the model precision is first set to six so that the length of each parameter-encoded message is 10. Using the encoded parameters, the encryption and decryption times of the Paillier encryption algorithm [36] are tested with 1000 plaintext messages of lengths 30, 50, 100, 200, and 300, as shown in Table 3.
When the message length doubles, the encryption time increases by only about 20%, and the decryption time remains almost unchanged. This is because Paillier encryption is mainly composed of exponentiation operations; as the message length increases, the cost of these exponentiation operations also increases accordingly. However, the size of the exponent in the decryption process is fixed, so the decryption time is almost independent of the message length.
For evaluating the computational cost of the homomorphic hash function and comparing it with the Paillier encryption algorithm, this experiment calculates the hash values for 1000 plaintext messages of lengths 30, 50, 100, 200, and 300, as shown in Table 4.
When the message length doubles, the computation time for the addition homomorphic hash algorithm H1(x) shows a doubling growth, while the computation time for the multiplication homomorphic hash algorithm H2(x) remains almost unchanged. The performance cost of both types of homomorphic hash algorithms is much lower than that of the Paillier encryption algorithm, indicating that the time consumption of the homomorphic hash auditing algorithm is acceptable.
To test the verification consumption, the encrypted message length is set to 300, and 1000 rounds of Paillier aggregation, decryption, and homomorphic hash verification are performed for 5, 10, 15, 20, 25, and 30 encrypted messages, as shown in Table 5.
As the number of messages increases, the time for Paillier aggregation and decryption, H1(x) aggregation verification, and H2(x) aggregation verification all show slow growth. The time consumption for H1(x) and H2(x) aggregation verification is much lower than that for Paillier aggregation and decryption, indicating that the homomorphic hash auditing algorithm supports aggregation verification for more data situations, and the running consumption is acceptable.
When choosing a data-aggregation correctness verification technology suitable for the 6G environment, it is necessary to comprehensively consider factors such as the sensitivity of the data, processing speed, system scalability, and network environment. Each method has its advantages and limitations, and the best choice depends on the specific application requirements and environmental conditions.
Secure multiparty computation (SMPC) [37] enables collaborative processing of data without revealing the data of each party. In a 6G environment, the fast data-transmission speed makes the multi-round interaction in SMPC more feasible, even maintaining efficiency in large-scale systems. Additionally, due to the wider device interconnection supported by 6G, SMPC is suitable for secure data processing across different devices and organizations.
The Paillier algorithm [36] provides additive homomorphic properties, allowing mathematical addition operations to be performed directly on encrypted data, which is very useful for applications that require aggregation under the premise of data confidentiality. In the 6G environment, the ability to process large-scale data is particularly critical, and the Paillier algorithm enables fast aggregation without decrypting the data, making it ideal for cloud services and big-data analysis.
Zero-knowledge proofs (ZKPs) [35] allow a prover to prove the correctness of a statement without revealing the information itself, which is particularly important for protecting user privacy. With the advancement of data-processing capabilities and complex applications driven by 6G, ZKPs can be used to ensure that data processing in areas such as secure transactions and user authentication does not reveal sensitive information, such as privacy-protecting transactions in financial services and identity verification in government services.
Additive homomorphic hash functions (ATH) [39] allow addition operations to be performed on encrypted data, which is useful for real-time statistics on sensitive data. In a 6G environment, ATH can support distributed, large-scale data sources for real-time data aggregation.
Multiplicative homomorphic hash functions (MTH) [38] provide the homomorphic property of multiplication, suitable for applications that require computing the product of encrypted data. In the 6G environment, for example, in insurance, finance, or health data analysis, MTH allows for the calculation of precise statistical models and risk assessments while keeping the data encrypted, supporting more complex data-processing requirements.
Blockchain technology [40] provides decentralized data management, enhancing data transparency and immutability, and supports the automatic execution of smart contracts. In a 6G environment, blockchain is particularly suitable for applications requiring highly secure and transparent recording and transaction verification. For example, blockchain can be used for Internet of Things (IoT) device management, ensuring secure and reliable data exchange between devices, and automating the execution of device updates and maintenance protocols.

6.2. Ciphertext Data-Ownership Authentication

In the 6G environment, the goal of data-ownership auditing is to ensure data security and privacy while supporting high-speed data processing and low-latency communication. Here is a performance comparison of several data-ownership auditing methods, as shown in Table 6.
ZKP [35] is the most general and efficient across different parameters. The Pedersen commitment [33] is suitable for privacy and efficient communication but may not scale well for large-scale applications. HE [38] provides privacy and computational efficiency, which is ideal for secure data computation, but may not be the most efficient in terms of communication. VC [41] can handle large amounts of data and compute efficiently but may compromise privacy and require more communication overhead.
In this experiment, the security parameters of the Pedersen commitment algorithm are set to 1024, where the modulus p is a large prime number of 1024 bits. The values of the message m, the random number r used during commitment, the challenge value e, and the random numbers k and l used during the response are random integers between 0 and p. The feasibility of the Pedersen commitment algorithm is verified in the experiment, and its computational efficiency is evaluated by a comparison with the Paillier encryption algorithm and the addition homomorphic hash algorithm.
The experiment first sets the model precision to six, so that the length of each parameter-encoded message is 10. Using the encoded parameters, 1000 plaintext messages of lengths 30, 50, 100, 200, and 300 are, respectively, tested for the algorithmic time consumption of Pedersen commitment computation (PCC), Pedersen commitment response (PCR), and Pedersen commitment verification (PCV), as shown in Table 7.
When the length of the commitment message doubles, the time consumption for Pedersen commitment computation and verification shows slow growth. This is because the Pedersen commitment verification process includes the commitment computation process of aggregating the message m. The response time of the Pedersen commitment is independent of the length of the commitment message and remains within a fixed range. As the length of the commitment message increases, the time consumption for commitment computation and verification does not show the same growth trend, indicating that increasing the length of the commitment message can reduce the total commitment time and total verification time of the model parameters for a fixed-size model. The time consumption for aggregate verification with different numbers of messages when the commitment length changes is shown in Table 8.
Increasing the number of commitment messages does not significantly increase the aggregate verification time. When the length of the commitment message doubles, the total aggregate verification time is less than twice the previous time. Therefore, increasing the length of the commitment message can reduce the total verification time of the model.
For the Paillier encryption algorithm and the multiplication homomorphic hash function H2(x), this experiment sets the message length to 300, trains 10 different model parameters, and performs aggregation, decryption, and verification. For the addition homomorphic hash function H1(x) and the Pedersen commitment, the message length is set to 2400 in this section. The time consumption for the aggregation, decryption, and verification of the models is evaluated. As shown in the experiment, increasing the message length can significantly reduce the verification consumption of the addition homomorphic hash function H1(x) and the Pedersen commitment. Therefore, the performance overhead of this scheme is acceptable. A performance comparison of the Pedersen commitment algorithm with the Paillier encryption algorithm and the addition homomorphic hash algorithm is shown in Table 9.
When these methods are applied in a 6G environment, their suitability varies with their performance characteristics: 6G is expected to bring unprecedented data rates and ultra-low latency, which place higher demands on privacy protection and data-processing efficiency.
The Pedersen commitment [33] supports homomorphic operations, allowing some computations to be performed while keeping the underlying data hidden, which reduces the risk of exposure during data transmission. It is suitable for 6G scenarios where data need to be collected and analyzed from a large number of devices without exposing the content of individual data points.
Zero-knowledge proofs (ZKPs) [35] allow users to prove possession of certain attributes or permissions without revealing the underlying information, ensuring the privacy and security of the verification process. They are suitable for 6G scenarios where user identity or permissions must be verified, such as secure network access and service access control.
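As a deliberately simplified example of this idea, the sketch below implements a Schnorr-style proof of knowledge of a secret exponent. The group setup, variable names, and parameter sizes are assumptions for illustration and are not the scheme of [35].

```python
import secrets
from sympy import randprime  # assumed available for the toy group setup

p = int(randprime(2**255, 2**256))   # toy prime modulus (illustrative only)
g = 2                                 # assumed generator for illustration

x = secrets.randbelow(p - 1) + 1      # prover's secret (e.g., a credential)
y = pow(g, x, p)                      # public value registered with the verifier

k = secrets.randbelow(p - 1) + 1      # prover: random nonce and commitment
t = pow(g, k, p)
e = secrets.randbelow(p - 1) + 1      # verifier: random challenge

s = k + e * x                         # prover: response derived from the secret

# Verifier accepts iff g^s == t * y^e (mod p), without ever seeing x itself.
assert pow(g, s, p) == (t * pow(y, e, p)) % p
```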
Homomorphic encryption (HE) [38] allows arithmetic operations to be performed directly on encrypted data and is suitable for handling highly sensitive data, such as personal health and financial information. It is applicable in 6G cloud and edge computing environments where collected data must be analyzed and processed in real time while remaining confidential.
Verifiable computation (VC) [41] allows devices to easily verify the results of external computations, ensuring the accuracy and trustworthiness of the computations, which is particularly important for decentralized 6G networks and services. It is suitable for scenarios where computing tasks need to be outsourced to cloud services or edge devices while verifying the correctness of the computation results.
In a 6G environment, the choice of appropriate auditing methods should be based on the specific requirements of the application, considering factors such as privacy protection, computational efficiency, and communication overhead. For example, Pedersen commitment and homomorphic encryption are suitable for scenarios requiring data confidentiality, while zero-knowledge proofs and verifiable computation are suitable for identity verification and outsourced computing scenarios. Each method has its advantages and limitations, and selecting the most suitable technology based on the specific requirements and resource conditions of the 6G network is crucial.

6.3. Evaluation of the Contribution of Ciphertext Data

As it develops, 6G technology is expected to support higher data rates, lower latency, and broader connectivity of IoT devices. In this environment, the selection of encryption algorithms must consider not only security but also the ability to pinpoint malicious data, computational efficiency, and communication efficiency. This section compares various encryption methods and discusses the 6G scenarios suited to each, as shown in Table 10.
The method from the literature [42] and PL-FedIPEC [44] meet basic security requirements and perform well in terms of efficiency but are less secure than the BCP and Paillier methods. BCP performs well in protecting model parameters and identifying malicious data but may be less efficient than the other methods. Paillier focuses on data protection and may require additional measures in other areas. OPHE [43] performs well in all aspects other than protecting model parameters. Choosing the right method based on the specific needs of the application is crucial.
In the experiment evaluating the contribution of encrypted data, the functions of the behavior role and the audit role are deployed on six CPU servers. This section tests the performance of different encryption algorithm choices in different 6G environments, focusing on the efficiency of protecting model quality as measured by encryption and decryption time, to demonstrate their feasibility. We use the BCP algorithm to encrypt and decrypt the model parameters and compare it with the Paillier encryption method and the method from the literature [42]. We set the encoded length of each parameter to 10, composing plaintext messages of lengths 30, 50, 100, 200, and 300. We use the BCP encryption algorithm, the Paillier encryption algorithm, and the method from the literature [42] to encrypt and decrypt 1000 plaintexts each; the results are shown in Table 11.
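The following sketch shows one plausible way to encode model parameters into fixed-length plaintext blocks before encryption, matching the description above (a precision of six, ten digits per parameter, several parameters packed per message). The encoding details and helper names are assumptions for illustration, not the paper's exact encoder.

```python
PRECISION = 6          # decimal places kept per parameter (assumed)
FIELD_LEN = 10         # digits per encoded parameter (assumed)

def encode_params(params, per_block):
    """Scale floats to integers and concatenate them into plaintext blocks."""
    digits = [str(int(round(param * 10**PRECISION))).rjust(FIELD_LEN, "0")
              for param in params]
    blocks = []
    for i in range(0, len(digits), per_block):
        blocks.append(int("".join(digits[i:i + per_block])))
    return blocks

# Packing three parameters per block gives 30-digit plaintexts, matching the
# shortest message length in Table 11; larger blocks give 50-300 digits.
blocks = encode_params([0.123456, 1.5, 0.000042], per_block=3)
print(blocks)  # one 30-digit integer ready for BCP/Paillier encryption
```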
As the length of the encrypted message increases, the encryption time for Paillier grows roughly linearly, while the encryption times for BCP and the method from the literature [42] remain nearly constant, with the method from the literature [42] taking the least time. When the message length is 300, the encryption times for BCP and Paillier are almost identical. Thus, choosing an encrypted message length of 300 does not incur higher encryption time for either BCP or the method from the literature [42].
The decryption time consumption of the BCP encryption algorithm was also tested. In BCP encryption, different ciphertexts can be encrypted under different public keys, while the aggregated ciphertext can only be decrypted using the master key; each data holder generates its private key locally, preventing private-key leakage. The decryption times for BCP master-key decryption and private-key decryption were compared with the Paillier encryption algorithm. Since the decryption process of the method from the literature [42] is consistent with the normal model aggregation process, its decryption is omitted. The results are shown in Table 12.
When the encrypted message length increases, the decryption time for each ciphertext remains unchanged. However, the decryption time for BCP is longer than that of Paillier, especially for master-key decryption, because BCP decryption supports aggregating ciphertexts encrypted under different public keys and therefore involves more exponential calculations.
The experiment used 5, 10, 15, 20, and 25 encrypted messages for 1000 aggregation-decryption tests; the resulting decryption times are shown in Table 13. As the number of participants increases, the decryption time increases significantly.
6G communications will drive smarter and faster network services, and the choice of encryption technology should consider how well it supports specific needs in a 6G environment.
Paillier encryption, due to its additive homomorphic properties, is suitable for privacy-protected data aggregation and analysis, which is especially important in a 6G environment, where sensitive data from various devices and services are processed. Given that 6G will support highly dynamic network environments and services, the flexibility and homomorphic properties of Paillier encryption may be more suitable for these rapidly changing and highly distributed application needs.
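To make the additive homomorphism concrete, here is a minimal from-scratch Paillier sketch (toy key size; the helper names and structure are illustrative assumptions, not a production implementation): multiplying two ciphertexts yields a ciphertext of the sum, which is exactly the property privacy-preserving aggregation relies on.

```python
# Requires Python 3.9+ for math.lcm and modular inverse via pow(x, -1, n).
import math, secrets
from sympy import randprime  # assumed available for the toy key generation

def keygen(bits=512):
    p = int(randprime(2**(bits // 2 - 1), 2**(bits // 2)))
    q = int(randprime(2**(bits // 2 - 1), 2**(bits // 2)))
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                       # standard simplification
    mu = pow(lam, -1, n)            # valid because g = n + 1
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

pub, priv = keygen()
c1, c2 = encrypt(pub, 120), encrypt(pub, 345)
# Multiplying ciphertexts aggregates the underlying plaintexts additively.
assert decrypt(priv, (c1 * c2) % (pub[0] ** 2)) == 465
```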
Traditional double-trapdoor encryption algorithms such as BCP, whose security rests on hard mathematical problems, provide highly secure data encryption and decryption and are suitable for data transmission requiring strong security assurance.
The method from the literature [42] does not require complex iterative processes or large-number computations; even as the number of shares increases, its computational complexity remains relatively low, making the entire encryption and decryption process more efficient. It is therefore worth considering for applications that require rapid data responses.
The OPHE [43] encryption method is particularly suited for applications that need to handle sensitive data; in a 6G environment, it reduces communication overhead and can be used for real-time data processing and analysis, especially in communication systems where personal privacy is particularly important.
PL-FedIPEC [44] maintains encryption strength while reducing the time required for the encryption stage and improving computational efficiency; with the support of 6G, this method helps increase the speed and efficiency of data processing, supporting real-time dynamic decision-making and response. Suitable for edge computing scenarios, it can play a role in intelligent transportation systems and urban monitoring.
These encryption technologies and methods provide varying levels of security and efficiency; the choice of the appropriate method should be based on the needs and characteristics of specific application scenarios. In a 6G environment, these technologies can support high-speed, large-scale data processing and machine learning while ensuring data privacy and overall system efficiency.

7. Conclusions

This paper conducts an in-depth study of the upcoming 6G network and its core features, with particular emphasis on the deep integration of sensing and networking and the intelligent applications built on this integration. The paper points out that federated learning, as a key training paradigm for the new generation of 6G intelligent applications, faces inversion attacks when dealing with distributed data. To address this, cryptography-based secure aggregation methods are typically introduced to protect the privacy and confidentiality of gradients. However, the semantic ambiguity of encrypted data poses new challenges for evaluating the correctness, availability, and legitimacy of data sources. To solve these issues, this paper proposes a data attack detection framework for cryptography-based secure aggregation methods, which prevents data-aggregation errors, data poisoning, and illegal data sources through encrypted-data auditing techniques, thereby enhancing the security and correctness of secure aggregation. We also compared various security technologies and analyzed their applicability and effectiveness in 6G environments. Through a comprehensive security analysis, we demonstrated that the proposed framework can effectively prevent security vulnerabilities while protecting data privacy, providing a secure and efficient data-processing solution for 6G intelligent applications and offering a theoretical foundation and practical guidance for the security strategies and technological implementation of future 6G networks.
As the deployment and development of 6G networks progress, there will be a continuous optimization of the security and efficiency of federated learning, especially targeting the high-speed and low-latency characteristics of 6G environments. Exploration of more lightweight encryption technologies and attack detection mechanisms will meet the heightened demands for data-processing speed and security in 6G environments. It is anticipated that 6G will support more efficient secure multiparty computation and encrypted-data auditing technologies, reducing computational complexity and latency and making it suitable for large-scale real-time data processing in 6G networks. Through the innovation and application of these technologies, more secure and efficient intelligent application solutions will be provided for 6G networks.

Author Contributions

Conceptualization, Z.S. and L.Y.; Data curation, P.X.; Formal analysis, J.L.; Funding acquisition, C.L.; Investigation, C.L. and H.W.; Methodology, J.W.; Resources, H.W.; Supervision, L.Y.; Writing—original draft, Z.S.; Writing—review and editing, Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Major Research Plan of the National Natural Science Foundation of China, grant number 92167203; in part by the National Natural Science Foundation of China, grant number 62002077; in part by the Guangdong Basic and Applied Basic Research Foundation, grant number 2024A1515011492; in part by the Guangzhou Science and Technology Plan Project, grant number 2023A03J0119; and in part by the Guangxi Key Laboratory of Trusted Software, grant number KX202313.

Data Availability Statement

Publicly available datasets were analyzed in this study. The MNIST dataset can be found here: http://yann.lecun.com/exdb/mnist/ (accessed on 13 May 2024). The CelebA dataset can be found here: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html (accessed on 13 May 2024).

Conflicts of Interest

Author Hanyi Wang was employed by the company China Mobile (Suzhou) Software Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Hüseyin, A.; Dogan-Tusha, S.; Yazar, A. 6G vision: An ultra-flexible perspective. ITU J. Future Evol. Technol. 2020, 1, 121–140.
2. Das, D. Secure cloud computing algorithm using homomorphic encryption and multi-party computation. In Proceedings of the 2018 International Conference on Information Networking (ICOIN), Chiang Mai, Thailand, 10–12 January 2018; pp. 391–396.
3. Zhao, L.; Jiang, J.; Feng, B.; Wang, Q.; Shen, C.; Li, Q. SEAR: Secure and Efficient Aggregation for Byzantine-Robust Federated Learning. IEEE Trans. Dependable Secur. Comput. 2022, 19, 3329–3342.
4. Pillutla, K.; Kakade, S.; Harchaoui, Z. Robust Aggregation for Federated Learning. IEEE Trans. Signal Process. 2019, 70, 1142–1154.
5. Yang, Z.; Zhou, M.; Yu, H.; Sinnott, R.; Liu, H. Efficient and Secure Federated Learning With Verifiable Weighted Average Aggregation. IEEE Trans. Netw. Sci. Eng. 2023, 10, 205–222.
6. Elkordy, A.; Avestimehr, A. HeteroSAg: Secure Aggregation With Heterogeneous Quantization in Federated Learning. IEEE Trans. Commun. 2020, 70, 2372–2386.
7. Wang, D.; Zhang, N.; Tao, M. Clustered federated learning with weighted model aggregation for imbalanced data. China Commun. 2022, 19, 41–56.
8. Liu, S.; Yu, J.; Deng, X.; Wan, S. FedCPF: An Efficient-Communication Federated Learning Approach for Vehicular Edge Computing in 6G Communication Networks. IEEE Trans. Intell. Transp. Syst. 2021, 23, 1616–1629.
9. Peng, Z.; Xu, J.; Chu, X.; Gao, S.; Yao, Y.; Gu, R.; Tang, Y. VFChain: Enabling verifiable and auditable federated learning via blockchain systems. IEEE Trans. Netw. Sci. Eng. 2021, 9, 173–186.
10. Fu, A.; Zhang, X.; Xiong, N.; Gao, Y.; Wang, H.; Zhang, J. VFL: A verifiable federated learning with privacy-preserving for big data in industrial IoT. IEEE Trans. Ind. Inform. 2020, 18, 3316–3326.
11. Xu, G.; Li, H.; Liu, S.; Yang, K.; Lin, X. VerifyNet: Secure and verifiable federated learning. IEEE Trans. Inf. Forensics Secur. 2019, 15, 911–926.
12. Weng, J.; Weng, J.; Zhang, J.; Li, M.; Zhang, Y.; Luo, W. DeepChain: Auditable and privacy-preserving deep learning with blockchain-based incentive. IEEE Trans. Dependable Secur. Comput. 2019, 18, 2438–2455.
13. Shin, Y.; Noh, G.; Jeong, I.; Chun, J. Securing a Local Training Dataset Size in Federated Learning. IEEE Access 2022, 10, 104135–104143.
14. Zheng, Y.; Lai, S.; Liu, Y.; Yuan, X.; Yi, X.; Wang, C. Aggregation Service for Federated Learning: An Efficient, Secure, and More Resilient Realization. IEEE Trans. Dependable Secur. Comput. 2022, 20, 988–1001.
15. Kim, H.; Park, J.; Bennis, M.; Kim, S.L. Blockchained on-device federated learning. IEEE Commun. Lett. 2019, 24, 1279–1283.
16. Li, Y.; Chen, C.; Liu, N.; Huang, H.; Zheng, Z.; Yan, Q. A blockchain-based decentralized federated learning framework with committee consensus. IEEE Netw. 2020, 35, 234–241.
17. Zhao, Y.; Zhao, J.; Jiang, L.; Tan, R.; Niyato, D.; Li, Z.; Lyu, L.; Liu, Y. Privacy-preserving blockchain-based federated learning for IoT devices. IEEE Internet Things J. 2020, 8, 1817–1829.
18. Lu, Y.; Huang, X.; Dai, Y.; Maharjan, S.; Zhang, Y. Blockchain and federated learning for privacy-preserved data sharing in industrial IoT. IEEE Trans. Ind. Inform. 2019, 16, 4177–4186.
19. Shafahi, A.; Huang, W.R.; Najibi, M.; Suciu, O.; Studer, C.; Dumitras, T.; Goldstein, T. Poison frogs! Targeted clean-label poisoning attacks on neural networks. Adv. Neural Inf. Process. Syst. 2018, 31.
20. Nelson, B.; Barreno, M.; Chi, F.J.; Joseph, A.D.; Rubinstein, B.I.; Saini, U.; Sutton, C.; Tygar, J.D.; Xia, K. Exploiting machine learning to subvert your spam filter. LEET 2008, 8, 16–17.
21. Bhagoji, A.N.; Chakraborty, S.; Mittal, P.; Calo, S. Analyzing federated learning through an adversarial lens. In Proceedings of the International Conference on Machine Learning; PMLR: Long Beach, CA, USA, 2019; pp. 634–643.
22. Lim, W.Y.B.; Luong, N.C.; Hoang, D.T.; Jiao, Y.; Liang, Y.C.; Yang, Q.; Niyato, D.; Miao, C. Federated learning in mobile edge networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2020, 22, 2031–2063.
23. Shen, S.; Tople, S.; Saxena, P. Auror: Defending against poisoning attacks in collaborative deep learning systems. In Proceedings of the 32nd Annual Conference on Computer Security Applications, Los Angeles, CA, USA, 5–9 December 2016; pp. 508–519.
24. Blanchard, P.; El Mhamdi, E.M.; Guerraoui, R.; Stainer, J. Machine learning with adversaries: Byzantine tolerant gradient descent. Adv. Neural Inf. Process. Syst. 2017, 30.
25. Fung, C.; Yoon, C.J.M.; Beschastnikh, I. Mitigating sybils in federated learning poisoning. arXiv 2018, arXiv:1808.04866.
26. Qu, X.; Wang, S.; Hu, Q.; Cheng, X. Proof of federated learning: A novel energy-recycling consensus algorithm. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 2074–2085.
27. Jiang, Y.; Zhang, W.; Chen, Y. Data Quality Detection Mechanism Against Label Flipping Attacks in Federated Learning. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1625–1637.
28. Li, L.; Yu, X.; Cai, X.; He, X.; Liu, Y. Contract-Theory-Based Incentive Mechanism for Federated Learning in Health CrowdSensing. IEEE Internet Things J. 2023, 10, 4475–4489.
29. Lin, X.; Wu, J.; Li, J.; Zheng, X.; Li, G. Friend-as-Learner: Socially-Driven Trustworthy and Efficient Wireless Federated Edge Learning. IEEE Trans. Mob. Comput. 2023, 22, 269–283.
30. Qi, J.; Lin, F.; Chen, Z.; Tang, C.; Jia, R.; Li, M. High-Quality Model Aggregation for Blockchain-Based Federated Learning via Reputation-Motivated Task Participation. IEEE Internet Things J. 2022, 9, 18378–18391.
31. Ranathunga, T.; Mcgibney, A.; Rea, S.; Bharti, S. Blockchain-Based Decentralized Model Aggregation for Cross-Silo Federated Learning in Industry 4.0. IEEE Internet Things J. 2023, 10, 4449–4461.
32. Wan, J.; Xun, H.; Zhang, X.; Feng, J.; Sun, Z. A privacy-preserving and correctness audit method in multi-party data sharing. In Proceedings of the 2020 International Conference on Cyberspace Innovation of Advanced Technologies, Guangzhou, China, 4–6 December 2020.
33. Sun, Z.; Wan, J.; Wang, B.; Cao, Z.; Li, R.; He, Y. An Ownership Verification Mechanism Against Encrypted Forwarding Attacks in Data-Driven Social Computing. Front. Phys. 2021, 9, 739259.
34. Sun, Z.; Wan, J.; Yin, L.; Cao, Z.; Luo, T.; Wang, B. A blockchain-based audit approach for encrypted data in federated learning. Digit. Commun. Netw. 2022, 8, 614–624.
35. Zhou, C.; Fu, A.; Yu, S.; Yang, W.; Wang, H.; Zhang, Y. Privacy-Preserving Federated Learning in Fog Computing. IEEE Internet Things J. 2020, 7, 10782–10793.
36. Fan, H.; Huang, C.; Liu, Y. Federated Learning-Based Privacy-Preserving Data Aggregation Scheme for IIoT. IEEE Access 2023, 11, 6700–6707.
37. Kalapaaking, A.; Stephanie, V.; Khalil, I.; Atiquzzaman, M.; Yi, X.; Almashor, M. SMPC-Based Federated Learning for 6G-Enabled Internet of Medical Things. IEEE Netw. 2022, 36, 182–189.
38. Ma, J.; Naas, S.; Sigg, S.; Lyu, X. Privacy-preserving federated learning based on multi-key homomorphic encryption. Int. J. Intell. Syst. 2021, 37, 5880–5901.
39. Zhu, H.; Wang, R.; Jin, Y.; Liang, K.; Ning, J. Distributed Additive Encryption and Quantization for Privacy Preserving Federated Deep Learning. Neurocomputing 2020, 463, 309–327.
40. Kalapaaking, A.; Khalil, I.; Rahman, M.; Atiquzzaman, M.; Yi, X.; Almashor, M. Blockchain-Based Federated Learning with Secure Aggregation in Trusted Execution Environment for Internet-of-Things. IEEE Trans. Ind. Inform. 2023, 19, 1703–1714.
41. Zhou, H.; Yang, G.; Huang, Y.; Dai, H.; Xiang, Y. Privacy-Preserving and Verifiable Federated Learning Framework for Edge Computing. IEEE Trans. Inf. Forensics Secur. 2023, 18, 565–580.
42. Sun, Z.; Li, W.; Liang, J.; Yin, L.; Li, C.; Wei, N.; Zhang, J.; Wang, H. A Blockchain-Based Fairness Guarantee Approach for Privacy-Preserving Collaborative Training in Computing Force Network. Mathematics 2024, 12, 718.
43. Mohammadi, S.; Sinaei, S.; Balador, A.; Flammini, F. Optimized Paillier Homomorphic Encryption in Federated Learning for Speech Emotion Recognition. In Proceedings of the 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), Torino, Italy, 26–30 June 2023; pp. 1021–1022.
44. He, C.; Liu, G.; Guo, S.; Yang, Y. Privacy-Preserving and Low-Latency Federated Learning in Edge Computing. IEEE Internet Things J. 2022, 9, 20149–20159.
Figure 1. The collaborative training of 6G networks.
Figure 2. Derived security threats faced by encrypted data in secure aggregation.
Figure 3. The examination of encrypted data.
Table 1. Notations in the data attack detection framework.
H1(x): additive homomorphic hash
H2(x): multiplicative homomorphic hash
Enc(): Paillier encryption process
Dec(): Paillier decryption process
DOi: data owner i
mi: original data item
ci: encrypted data
Ci: Pedersen commitment value
ri: random factor for commitment generation
g^u: u-th power of the generator g
h^v: v-th power of the generator h
R: a challenge response
C^e: e-th power of the ciphertext C
Lri: local loss of device i
Lcj: local loss of device j
Lij: quality assertion of device i and device j
ϵ3: threshold for free-rider device identification
mri: decrypted aggregate gradient of device i
mcj: decrypted aggregate gradient of device j
Table 2. Performance comparison of encryption methods ("√" indicates that the corresponding feature is implemented). Criteria: correctness, privacy, efficiency, and scalability; methods compared: ZKP [35], Paillier [36], SMPC [37], MTH [38], ATH [39], and Blockchain [40].
Table 3. Time consumption for Paillier encryption and decryption.
Encrypted Message Length (bit) | 30 | 50 | 100 | 200 | 300
Paillier_Enc | 2.79 | 2.92 | 3.40 | 4.12 | 4.86
Paillier_Dec | 2.86 | 2.87 | 2.87 | 2.89 | 2.89
Table 4. Time consumption for homomorphic hash functions and Paillier encryption algorithm.
Encrypted Message Length (bit) | 30 | 50 | 100 | 200 | 300
Paillier_Enc | 2.79 | 2.92 | 3.40 | 4.12 | 4.86
H1(x) | 0.09 | 0.12 | 0.23 | 0.40 | 0.58
H2(x) | 0.29 | 0.29 | 0.29 | 0.29 | 0.29
Table 5. Time consumption for homomorphic hash functions and Paillier encryption algorithm aggregation and decryption.
Encrypted Message Count | 5 | 10 | 15 | 20 | 25 | 30
H1(x) | 0.59 | 0.06 | 0.62 | 0.64 | 0.66 | 0.70
H2(x) | 0.59 | 0.60 | 0.61 | 0.62 | 0.63 | 0.93
Paillier_Dec | 2.90 | 2.94 | 2.98 | 3.10 | 3.19 | 3.31
Table 6. Performance metrics comparison of data-ownership auditing methods ("√" indicates that the corresponding feature is implemented). Criteria: privacy, scalability, high communication efficiency, and high computational efficiency; methods compared: Pedersen [33], ZKP [35], HE [38], and VC [41].
Table 7. Comparison of the Pedersen commitment algorithm's computation, response, and verification times.
Commit Message Length (bit) | 30 | 50 | 100 | 200 | 300 | 600 | 1200
PCC | 2.38 | 2.42 | 2.52 | 2.70 | 2.88 | 3.44 | 4.54
PCR | 1.17 | 1.17 | 1.17 | 1.17 | 1.17 | 1.17 | 1.17
PCV | 0.70 | 0.73 | 0.83 | 1.00 | 1.20 | 1.77 | 2.86
Table 8. Time consumption for aggregate verification.
Commit Message Length (bit) | 300 | 600 | 1200 | 2400
10 messages | 4.19 | 5.34 | 7.52 | 11.99
20 messages | 4.26 | 5.44 | 7.60 | 12.01
30 messages | 4.36 | 5.49 | 7.69 | 12.21
Table 9. Time consumption for aggregation, decryption, and verification.
Aggregate Decryption | H1(x) | H2(x) | Pedersen Commitment
5.99 | 0.16 | 0.87 | 2.17
Table 10. Performance comparison of encryption algorithms for data-contribution assessment ("√" indicates that the corresponding feature is implemented). Criteria: protection of model parameters, identification of malicious data, high communication efficiency, and high computational efficiency; methods compared: BCP, Paillier, the method from the literature [42], OPHE [43], and PL-FedIPEC [44].
Table 11. Encryption time consumption of each encryption algorithm.
Encrypted Message Length (bit) | 30 | 50 | 100 | 200 | 300
BCP_Enc | 4.86 | 4.87 | 4.87 | 4.87 | 4.88
Paillier_Enc | 2.79 | 2.92 | 3.40 | 4.12 | 4.88
Method from literature [42]_Enc | 0.25 | 0.25 | 0.25 | 0.25 | 0.25
Table 12. Decryption time consumption for each encryption algorithm.
Encrypted Message Length (bit) | 30 | 50 | 100 | 200 | 300
BCP_MK_Dec | 9.77 | 9.78 | 9.78 | 9.78 | 9.78
Paillier_Dec | 7.34 | 7.34 | 7.34 | 7.34 | 7.34
BCP_PK_Dec | 4.87 | 4.87 | 4.87 | 4.87 | 4.87
Table 13. Aggregate decryption time consumption for the BCP master key encryption algorithm.
Number of Participants | 5 | 10 | 15 | 20 | 25
BCP_MK_Dec | 45.95 | 89.03 | 132.11 | 175.19 | 229.75