1. Introduction
The rapid advancement of AI technologies has profoundly transformed various aspects of our lives [1,2,3]. Federated learning (FL) has emerged as a distributed machine learning framework designed to safeguard data privacy [4,5]. By leveraging decentralized data, FL enables the collaborative construction of a shared model while ensuring data independence and privacy protection. Participants train a global model by exchanging parameter updates instead of sharing raw data, effectively addressing the issue of data silos in model training. FL has found widespread applications in smart cities [6,7,8], the Internet of Things (IoT) [9,10,11], drones [12,13], and mobile edge computing [14,15], providing innovative solutions to challenges in these fields.
Although FL provides basic data privacy protection for participants’ localized training, attackers can still exploit the model update parameters exchanged during the training process to launch attacks on FL systems [16]. Membership inference attacks (MIAs) [17], as a form of privacy leakage attack, aim to determine whether a target sample was involved in the training of the target model, posing a significant threat to the data privacy and security of the model.
However, current FL frameworks have widely adopted privacy-preserving techniques [18,19] such as differential privacy (DP), secure multi-party computation (MPC), and homomorphic encryption (HE), which prevent participants from accessing model parameters or gradient information. Meanwhile, the deployment of Trusted Execution Environments (TEEs) further protects the training process from being compromised by attackers. More generally, after the training phase is completed, participants are typically limited to accessing the model through a label prediction interface, rendering attacks based on output prediction vectors infeasible. Against this backdrop, traditional MIAs, which rely on output confidence, gradient updates, or prediction loss, become ineffective. Consequently, designing effective MIAs in the constrained and encrypted FL setting has emerged as a pressing challenge.
Although existing studies have further considered encrypted scenarios in the context of privacy-preserving techniques and explored attack methods applicable to label-only prediction interfaces, these methods typically rely on a single adversarial perturbation distance as the membership feature. However, the inherent diversity of non-training samples in practice often renders this feature insufficient for effectively distinguishing between member and non-member samples. As shown in Figure 1, the minimum perturbation distance distributions of member and non-member samples are highly similar.
To address this challenge, this paper proposes a novel Membership Inference Attack algorithm based on multiple adversarial perturbation distances (MAPD_MIA), aiming to achieve an efficient MIA under the privacy protection constraints in encrypted federated learning scenarios.
This paper first proposes a novel attack scenario specifically tailored for encrypted FL environments. In this scenario, the attacker acts as an honest participant in the training process, with no access to additional information about the target model or the ability to manipulate the training process. The attack is conducted post-training, utilizing only the prediction label interface of the global model from the final training round to perform label-only MIAs.
This paper further observes significant differences in adversarial perturbation distances near the decision boundary between member and non-member samples. As illustrated in Figure 1, member samples exhibit greater variations in perturbation magnitude near the decision boundary, whereas non-member samples display higher consistency. This discrepancy arises because non-training data typically reside in smoother decision boundary regions, while the decision boundary around member samples is more complex due to model overfitting, leading to pronounced variations in perturbation magnitudes across different locations. Therefore, this paper proposes the following hypothesis: an honest participant can leverage multiple adversarial perturbations near the decision boundary of a sample to perform membership inference attacks in federated learning scenarios.
Based on the above hypothesis, this paper introduces MAPD_MIA. Specifically, this study employs the HopSkipJumpAttack algorithm [20] to generate the minimum adversarial perturbation for the target sample and utilizes binary search to generate additional adversarial samples near the decision boundary surface. The perturbation distances between the target sample and these adversarial samples are calculated and used as membership features to train the attack model, enabling the inference of membership status. These multiple perturbation distances capture the complexity of the decision boundary around the sample and reveal its adversarial robustness. Experimental results demonstrate that the proposed algorithm effectively performs membership inference attacks in encrypted FL environments while also highlighting the relationship between decision boundary complexity and membership privacy leakage risks.
The main contributions of this paper are as follows:
- An innovative Membership Inference Attack algorithm based on multiple adversarial perturbation distances (MAPD_MIA) is proposed. This algorithm effectively distinguishes between member and non-member samples by analyzing the differences in multiple adversarial perturbations near the decision boundary. 
- The proposed method is comprehensively evaluated on three benchmark datasets: CIFAR10, CIFAR100, and MNIST. Experimental results show that it outperforms other mainstream attack methods in terms of accuracy, precision, and F1 score. 
- The relationship between decision boundary complexity and membership privacy risks is revealed through experiments, showing that overfitting makes the decision boundary of training samples more complex, thereby increasing the risk of privacy leakage. 
- The proposed method is validated against common defense mechanisms, including MemGuard and DP-SGD, and is shown to maintain strong attack capabilities even under these defensive settings. 
The remainder of this paper is organized as follows. Section 2 reviews related work on MIAs. Section 3 presents the threat model. Section 4 details the specific workflow of the proposed algorithm, MAPD_MIA. Section 5 provides experimental evaluations. Finally, Section 6 concludes the paper.
  2. Related Work
This section provides a categorized overview of current Membership Inference Attack (MIA) studies, beginning with early research in centralized machine learning and extending to federated learning (FL) scenarios.
Early MIAs primarily focused on centralized machine learning, with Shokri et al. [21] introducing shadow models to infer membership and exposing privacy risks in “Machine Learning as a Service” (MLaaS). With the advent of federated learning, studies shifted to MIAs in this domain. In FL-specific studies, attacks are categorized based on membership inference features constructed using various prior knowledge by attackers. Table 1 outlines the membership features used in each study, along with the attacker configurations and specific attack types.
Confidence-based MIAs assume that training samples exhibit higher confidence scores than non-training samples in local or global models. Zari et al. [22] leveraged multiple global model snapshots stored by participants to analyze the temporal evolution of confidence scores, significantly improving attack efficiency. Gu et al. [23] extended this approach with the CS-MIA attack, enhancing confidence representation and addressing honest-but-curious server scenarios to improve reliability. He et al. [24] used data poisoning to create label-flipped samples, which altered the decision boundaries during training, making prediction confidence more indicative of membership status. To address limited shadow data, Zhang et al. [25] utilized Generative Adversarial Networks (GANs) to generate synthetic shadow datasets for training shadow models, using the confidence scores of target samples to infer membership. Subsequently, Xie et al. [26] proposed the ALGANs algorithm, which employs active learning to label generated data, resolving label ambiguity issues when using target models for labeling.
Gradients shared by participants with the central server during FL training can also expose training data. Nasr et al. [27] proposed using gradient updates to train attack models, achieving membership inference accuracies of 72.2% for honest participants and 79.2% for honest-but-curious servers. However, this approach imposes significant computational and storage requirements. Melis et al. [28] exploited sparsity differences in embedding layer gradients in text classification models for membership inference. Pasquini et al. [29] demonstrated that a malicious server could easily bypass the Secure Aggregation (SA) algorithm in federated learning by forging malicious parameters, posing a significant threat to the privacy of member data. Nguyen et al. [30] proposed an attack that circumvents Local Differential Privacy (LDP) by modifying a small number of parameters in the first two layers of the model, making gradient variations a reliable feature for inferring the membership status of the target sample. This attack can be completed within a single global training round.
Loss-based MIAs assume that training samples, due to overfitting, exhibit lower prediction losses compared to non-training samples, making loss values a useful membership feature. Hu et al. [31] introduced Source Inference Attacks (SIAs), leveraging Bayesian analysis on the prediction loss of target samples generated by local models. They demonstrated that the smaller the loss value, the higher the likelihood that the target sample originates from the corresponding participant’s training data. Suri et al. [32] proposed subject-level MIAs with three loss-based algorithms, examining global model losses, loss trends during training, and the robustness of losses for specific subject distributions. Wang et al. [33] extended loss-based MIAs to federated contrastive learning, utilizing cosine similarity loss to train a linear classifier and combining it with the model encoder’s highest prediction output as input features, achieving high attack accuracy.
In certain specialized scenarios, researchers have proposed MIAs based on other distinctive features. Chen et al. [34] proposed an MIA tailored for FL scenarios with non-overlapping labels. They utilized GANs to generate samples approximating the true data distribution and trained a convolutional classification model. By comparing the model’s predicted labels with the labels declared by participants, they determined the participant to whom the target sample belonged. Yuan et al. [35] introduced interaction-level MIAs for federated recommendation systems (FedRec). They trained three embedding models to produce sample embedding vectors and leveraged distance inequalities to infer whether a target sample was part of a specific user’s interaction dataset. In encrypted black-box scenarios, Liu et al. [36] proposed an MIA based on adversarial robustness. This method analyzes the temporal differences in adversarial robustness between member and non-member samples and quantifies these variations using adversarial perturbation distances to facilitate the attack. Furthermore, in black-box scenarios where only the prediction labels are accessible, some existing methods rely on a single minimum adversarial perturbation distance. These methods assume that training samples, due to their frequent involvement in weight updates, exhibit larger minimum perturbation distances near the decision boundary compared to non-training samples [37,38]. However, Liu et al. [39] reported that the loss distributions of training and non-training samples are often very similar, while Xu and Tan [40] observed that using a single minimum decision boundary distance as the membership feature fails to achieve satisfactory attack performance. Although some studies have improved classification performance through sample calibration [41,42,43], significant limitations persist in decision boundary-based attacks. Notably, studies have also shown that the complexity of decision boundaries near training samples differs from that near non-training samples [44,45], which can significantly influence the effectiveness of membership inference attacks.
In summary, existing MIAs often rely on prediction confidence, gradient updates, or loss values as features, which are inaccessible in encrypted FL environments, and single-feature-based methods struggle with complex decision boundaries. To address these limitations, this paper introduces a novel attack leveraging multiple adversarial perturbation distances for more robust membership inference in constrained FL settings.
  3. Threat Model
This section introduces the threat model underlying the proposed MAPD_MIA algorithm in a constrained federated learning (FL) setting. It assumes an honest participant acting as the attacker, aiming to determine whether a specific data sample was part of the FL model’s training dataset. The attacker’s access is restricted to the prediction labels of the global model in the final training round. This section outlines the attacker’s prior knowledge, objectives, and operational constraints, establishing the foundation for the algorithmic workflow discussed in the following section.
In this paper, we assume that an honest participant in a federated learning (FL) scenario acts as the attacker. The attacker’s task is to perform a Membership Inference Attack (MIA) on the global model distributed by the central server at the end of the global training phase. This section outlines the prior knowledge available to the attacker in this scenario, as well as the attacker’s objectives.
  3.1. Attacker’s Prior Information
This paper assumes a constrained FL scenario. To enhance privacy protection, most current FL frameworks have incorporated technologies such as differential privacy and homomorphic encryption, which strengthen the system’s resilience to attacks. Therefore, to simulate a realistic FL environment, an honest participant will be unable to access the following information:
- Model structure and parameter information: The model structure is designed by the central coordinator based on the task and data distribution and is protected through encryption. Participants cannot access the model’s structure or pretrained parameters. 
- Gradients and parameters during training: To prevent the leakage of user data, transmitted gradients and model parameters are typically noise-added or homomorphically encrypted, meaning participants cannot access these details, intermediate results, or prediction confidence scores. 
- Model training configurations: Specific settings, such as the learning rate, optimizer, and training method, are not visible to participants, and the training process is secured to prevent any form of tampering or unauthorized alteration. 
- Information about other participants: Participants are unable to know the data distribution or data types used by other participants. 
In this constrained scenario, an honest participant can only access the following information:
- The local training and testing data that they possess. 
- A label query interface provided after the completion of global model training. The participant can use this interface to obtain the predicted label of an input sample from the global model. 
  3.2. Attacker’s Objective
As an honest participant in FL training, the attacker will behave legitimately under the security protocol specifications but will also attempt to extract information beyond the output from the received data.
This paper assumes that the attacker will perform the following attack: the attacker treats the fully trained global model $M$ as the target model. By continuously querying the model’s label prediction interface, the attacker determines whether the target sample $x$ belongs to the training data of model $M$, thereby conducting a Membership Inference Attack. This process can be formally expressed as follows:
$$\mathcal{A}(x, M) \rightarrow \{0, 1\}$$
Here, $\mathcal{A}$ represents the attack model constructed by the attacker. A label of 1 indicates that the target sample $x$ participated in the training process of model $M$, while a label of 0 indicates that $x$ did not participate in the training process of model $M$.
  4. Workflow of MAPD_MIA
This section details the Membership Inference Attack algorithm based on multiple adversarial perturbation distances (MAPD_MIA). Figure 2 illustrates the overall workflow of the proposed algorithm.
The algorithm comprises three key stages: generating multiple adversarial perturbations near the decision boundary, constructing the attack model, and performing membership inference. The first stage, generating multiple adversarial perturbations, involves three essential steps: generating the minimum adversarial sample, adding random noise followed by a binary search, and calculating adversarial perturbation distances. Table 2 summarizes the symbols and their definitions for easy reference in the following subsections.
  4.1. Generating Multiple Adversarial Perturbations Near the Decision Boundary
This process consists of three main steps: generating the minimum adversarial sample, adding random noise followed by a binary search, and calculating the adversarial perturbation distance. The overall workflow is illustrated in Figure 3.
  4.1.1. Generating the Minimum Adversarial Sample
The process begins with generating the minimum adversarial sample for the target sample using the HopSkipJumpAttack algorithm. Since only the label prediction interface is accessible, the attacker can only rely on predicted labels to generate the minimum adversarial sample. HopSkipJumpAttack is a label-only black-box attack algorithm that leverages the geometric properties of the decision boundary. It iteratively generates adversarial samples closest to the target sample by approximating the gradients of the boundary [20].
For the target sample $x$, the attacker identifies initial samples for each adversarial class from the local dataset, denoted as $\{x_{init}^{c}\}$. Then, using the Targeted HopSkipJumpAttack algorithm, the attacker obtains adversarial samples $\{x_{adv}^{c}\}$ on the decision boundary of the target model $M$. The adversarial sample with the smallest perturbation distance is selected as the minimum adversarial sample $x_{min}$. The decision boundary it resides on is regarded as the neighboring classification decision boundary of the target sample. The detailed algorithm is provided in Algorithm 1.
          
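| Algorithm 1 Generating the minimum adversarial sample |
| Input: Target sample $x$, initial adversarial samples $\{x_{init}^{c}\}$, target model $M$, maximum number of iterations $T$, gradient query count $B$, binary search threshold $\theta$ |
| Output: The minimum adversarial sample $x_{min}$ for $x$ on the classification decision boundary of model $M$ |
| 1: $x_{adv}^{c} \leftarrow \mathrm{HopSkipJumpAttack}(x, x_{init}^{c}, M, T, B, \theta)$ for each adversarial class $c$ |
| 2: $d^{c} \leftarrow \lVert x - x_{adv}^{c} \rVert_2$ for each adversarial class $c$ |
| 3: $x_{min} \leftarrow x_{adv}^{c^{*}}$, where $c^{*} = \arg\min_{c} d^{c}$ |
| 4: return $x_{min}$ |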
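The selection step of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the HopSkipJumpAttack algorithm: as a simplifying assumption, each candidate is found by a label-only binary search along the straight segment between the target sample and an initial sample, which yields a boundary point but not necessarily the closest one. The names `predict`, `boundary_point`, and `minimum_adversarial_sample` are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean (L2) distance between two equally sized vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def boundary_point(predict, x, x_init, tol=1e-6):
    """Label-only binary search on the segment x -> x_init.

    Simplified stand-in for Targeted HopSkipJumpAttack (assumption):
    returns the point closest to x on this segment that still receives
    the label of x_init.
    """
    adv_label = predict(x_init)
    lo, hi = 0.0, 1.0  # fractions along the segment; hi always stays adversarial
    while (hi - lo) * l2(x, x_init) > tol:
        mid = (lo + hi) / 2
        point = [xi + mid * (ii - xi) for xi, ii in zip(x, x_init)]
        if predict(point) == adv_label:
            hi = mid
        else:
            lo = mid
    return [xi + hi * (ii - xi) for xi, ii in zip(x, x_init)]


def minimum_adversarial_sample(predict, x, init_samples):
    """Keep the boundary candidate with the smallest perturbation distance."""
    candidates = [boundary_point(predict, x, s) for s in init_samples]
    return min(candidates, key=lambda c: l2(x, c))
```

Here `predict` stands for the label query interface, and `init_samples` holds one locally owned sample per adversarial class.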
  4.1.2. Adding Random Noise and Performing Binary Search
After obtaining the minimum adversarial sample $x_{min}$ for the target sample in the previous step, the second step involves adding random noise to $x_{min}$ and employing a binary search to generate other adversarial samples near the decision boundary.
Specifically, the attacker adds multidimensional random noise $\eta$ with magnitude $\delta$ to the minimum adversarial sample $x_{min}$, generating a new adversarial sample $x'$. However, $x'$ may not lie on the decision boundary surface, especially if the added noise is too large. To ensure that the resulting adversarial sample lies on the surface of the decision boundary and that the Euclidean distance between the obtained boundary adversarial sample $x_b$ and the minimum adversarial sample $x_{min}$ matches the magnitude of the added noise, this paper employs a binary search algorithm to derive the boundary adversarial sample $x_b$.
The binary search algorithm follows the steps illustrated in Figure 4. It is worth noting that while Figure 4 provides a simplified 2D visualization, the perturbations are applied to the entire input in its original high-dimensional space. For example, if the input is an image, the perturbations are distributed across all pixels of the image. The process of the binary search algorithm is as follows: (1) Determine the vector $a$ from the minimum adversarial sample $x_{min}$ to the target sample $x$, and select points $p_1$ and $p_2$ in the positive and negative directions of $a$, such that their Euclidean distance from $x_{min}$ equals the noise magnitude $\delta$. (2) Use the class of the adversarial sample $x'$ to determine the next binary search endpoints $p_1$ and $p_2$. If $x'$ belongs to the adversarial class, update $p_2$ to $x'$; otherwise, keep $p_2$ unchanged and update $p_1$ to $x'$. (3) Take the midpoint of the updated $p_1$ and $p_2$ as $p_{mid}$, and extend the line from $x_{min}$ to $p_{mid}$ to intersect a circle centered at $x_{min}$ with radius $\delta$ at point $x''$. Use the class of $x''$ to determine the next binary search endpoints $p_1$ and $p_2$. (4) Repeat step (3) until the Euclidean distance between $p_1$ and $p_2$ is smaller than a certain threshold. The final $x''$ is the desired boundary adversarial sample $x_b$. The detailed algorithm is provided in Algorithm 2.
          
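| Algorithm 2 Adding random noise and performing binary search |
| Input: Target sample $x$, minimum adversarial sample $x_{min}$, target model $M$, noise $\eta$, threshold $\epsilon$ |
| Output: Adversarial sample $x_b$ on the decision boundary of model $M$ |
| 1: $x' \leftarrow x_{min} + \eta$, $\delta \leftarrow \lVert \eta \rVert_2$ |
| 2: $a \leftarrow (x - x_{min}) / \lVert x - x_{min} \rVert_2$ |
| 3: $p_1 \leftarrow x_{min} + \delta a$, $p_2 \leftarrow x_{min} - \delta a$ |
| 4: if $M(x') = M(x_{min})$ then |
| 5:     $p_2 \leftarrow x'$ |
| 6: else |
| 7:     $p_1 \leftarrow x'$ |
| 8: end if |
| 9: while $\lVert p_1 - p_2 \rVert_2 > \epsilon$ do |
| 10:     $p_{mid} \leftarrow (p_1 + p_2) / 2$ |
| 11:     $x'' \leftarrow x_{min} + \delta (p_{mid} - x_{min}) / \lVert p_{mid} - x_{min} \rVert_2$ |
| 12:     if $M(x'') = M(x_{min})$ then |
| 13:         $p_2 \leftarrow x''$ |
| 14:     else |
| 15:         $p_1 \leftarrow x''$ |
| 16:     end if |
| 17: end while |
| 18: $x_b \leftarrow x''$ |
| 19: return $x_b$ |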
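The four binary search steps described in this subsection can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the authors’ implementation: samples are flat lists of floats, `predict` stands for the label query interface, and the function names are hypothetical. The search keeps one endpoint in the original class and one in the adversarial class, and every candidate is projected back onto the sphere of radius equal to the noise magnitude around the minimum adversarial sample.

```python
import math


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))


def on_sphere(center, direction, radius):
    """Point at distance `radius` from `center` along `direction`."""
    n = norm(direction)
    return [c + radius * d / n for c, d in zip(center, direction)]


def boundary_sample(predict, x, x_min, noise, eps=1e-6):
    """Find a boundary point at distance ||noise|| from x_min (label-only)."""
    delta = norm(noise)
    adv_label = predict(x_min)
    # Step (1): endpoints along +a and -a, where a points from x_min to x.
    a = [xi - mi for xi, mi in zip(x, x_min)]
    p1 = on_sphere(x_min, a, delta)                # original-class side
    p2 = on_sphere(x_min, [-c for c in a], delta)  # adversarial side
    # Step (2): the noisy sample replaces the endpoint that shares its class.
    x_cur = [mi + ni for mi, ni in zip(x_min, noise)]
    if predict(x_cur) == adv_label:
        p2 = x_cur
    else:
        p1 = x_cur
    # Steps (3)-(4): bisect the chord and project the midpoint onto the sphere.
    while norm([u - v for u, v in zip(p1, p2)]) > eps:
        mid = [(u + v) / 2 for u, v in zip(p1, p2)]
        x_cur = on_sphere(x_min, [m - c for m, c in zip(mid, x_min)], delta)
        if predict(x_cur) == adv_label:
            p2 = x_cur
        else:
            p1 = x_cur
    return x_cur
```

Each loop iteration costs one label query, so the query budget of this step grows only logarithmically as the threshold `eps` shrinks.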
  4.1.3. Calculating Adversarial Perturbation Distance
Finally, the adversarial perturbation distance $d$ is obtained by calculating the Euclidean distance (L2 distance) between the target sample $x$ and the adversarial sample $x_b$, that is, $d = \lVert x - x_b \rVert_2$.
By varying the magnitude and direction of the noise, the attacker can generate multiple groups of adversarial perturbation distances, which serve as member features for training the attack model. Specifically, the member features can be represented as follows:
$$D = \{d_1, d_2, \ldots, d_M\}, \quad d_m = \{d_m^{1}, d_m^{2}, \ldots, d_m^{K}\}$$
Here, $d_m$ represents the $m$-th adversarial perturbation group. Each perturbation group consists of $K$ adversarial perturbations, where the subscript $m$ denotes the index of the adversarial perturbation group, and different groups are generated with varying levels of noise. In total, there are $M$ groups of perturbations. The superscript indicates the index of the adversarial perturbation within the same group, with $K$ noise points selected from noises of the same magnitude to form a single perturbation group.
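As a small illustration of how the grouped distances could be organized, the sketch below computes one L2 distance per boundary adversarial sample and keeps the group structure (one inner list per noise magnitude). The function names and the nested-list layout are assumptions made for this example.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equally sized vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def perturbation_groups(x, boundary_samples_by_group):
    """Return the grouped perturbation distances for target sample x.

    boundary_samples_by_group[m][k] is the k-th boundary adversarial
    sample obtained with the m-th noise magnitude.
    """
    return [
        [l2_distance(x, x_b) for x_b in group]
        for group in boundary_samples_by_group
    ]
```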
  4.2. Construction of the Attack Model
Unlike most MIAs targeting centralized machine learning, in FL systems, an honest participant inherently has access to a portion of the global model’s training and testing data, which allows them to observe the true distribution of all training data in the model. Thus, the participant can utilize its local data as a shadow dataset to construct the attack model. Since the local dataset already contains samples that participated in the training of the global model, the attacker does not need to build an additional shadow model. From another perspective, the target model itself can be considered the attacker’s shadow model.
The attacker’s shadow dataset $D_{shadow}$ consists of the local training set $D_{train}$ and testing set $D_{test}$. For each sample in the shadow dataset, the attacker extracts the membership features by generating multiple adversarial perturbations near the decision boundary. To enrich the attack features, the attacker generates multiple groups of adversarial perturbations across various decision boundaries. Specifically, for the decision boundary corresponding to class $c$, the attacker uses $l$ groups of noise with different magnitudes to generate multiple groups of adversarial perturbations for that class, represented as $D_c = \{d_{c,1}, d_{c,2}, \ldots, d_{c,l}\}$. By varying the adversarial classes, the attacker obtains the membership features $F$ in the following form:
$$F = \{D_1, D_2, \ldots, D_N\}$$
Here, $N$ represents the number of adversarial classes for the sample. The attacker computes the aforementioned features for each sample in the local training set $D_{train}$, thereby obtaining the corresponding membership features $F_{train}$, where the membership label is marked as “1”, indicating that the sample participated in the training of the target model. Similarly, the attacker computes the same features for each sample in the local testing set $D_{test}$, obtaining the membership features $F_{test}$ and labeling them as “0”, meaning the sample did not participate in the training process of the target model.
To address the binary classification problem in MIAs, selecting an appropriate attack model is critical. Since the membership features used in this paper include multiple adversarial perturbation distances, some of these features may not be useful for inferring membership. Thus, the model needs the ability to filter out irrelevant information. Additionally, the large volume of adversarial perturbations results in a high-dimensional feature space, necessitating a model that remains computationally efficient. Given that XGBoost’s efficient parallel algorithm significantly enhances training efficiency, and research by Grinsztajn et al. [46] has shown that tree-based models outperform deep learning models in handling data with irrelevant information, this paper selects XGBoost, a representative ensemble learning algorithm, as the attack model to achieve both efficiency and accuracy in membership inference.
To train a well-performing attack model, the attacker extracts the membership features from $D_{train}$ and $D_{test}$, using the resulting pairs ($F_{train}$, “1”) and ($F_{test}$, “0”) as training samples for the attack model, thereby constructing the attack model’s training dataset. Through supervised learning, the attacker can build an XGBoost attack model for membership inference.
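The construction of the attack model’s training data can be sketched as follows. The example only assembles the labeled feature matrix; fitting the classifier is left out, and the nested layout (adversarial class, then noise group, then perturbation) as well as the function names are assumptions for illustration.

```python
def flatten_features(features):
    """Flatten nested [class][noise group][perturbation] distances into one vector."""
    return [d for per_class in features for group in per_class for d in group]


def build_attack_dataset(member_features, non_member_features):
    """Pair flattened features with membership labels (1 = member, 0 = non-member)."""
    rows = [flatten_features(f) for f in member_features + non_member_features]
    labels = [1] * len(member_features) + [0] * len(non_member_features)
    return rows, labels
```

The resulting `rows` and `labels` have the row-per-sample shape that tabular learners such as gradient-boosted trees expect as input.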
  4.3. Membership Inference
After training the attack model, the membership inference phase is equivalent to testing the attack model. At this stage, the participating attacker needs to infer the membership status of an unknown target sample $x$, determining whether it participated in the global model’s training process.
The procedure is as follows: the attacker uses the prediction label interface of the target model $M$ and the local dataset $D_{shadow}$. First, the attacker selects an initial adversarial sample from $D_{shadow}$ that has a different predicted class from the target sample $x$. Then, using this initial adversarial sample and $x$, the attacker applies the HopSkipJumpAttack algorithm to obtain the corresponding minimum adversarial sample. Finally, by adding different levels of noise to the minimum adversarial sample and performing a binary search, multiple adversarial samples near different decision boundaries are generated. This results in a group of adversarial perturbation distances $F$ for $x$ across different decision boundaries, which are then used as the membership features input into the attack model.
Before inputting into the XGBoost model, the attacker needs to flatten the multi-set adversarial perturbation features $F$ of $x$ into a multidimensional vector, where each value represents a specific perturbation distance for the target sample $x$. These membership features $F$ not only reflect the complexity of the sample’s neighboring decision boundaries but also its adversarial robustness. Thus, inputting these features into the attack model enables effective membership inference for the sample.
Additionally, the attacker can concatenate membership feature vectors from multiple target samples into a matrix and input it into the attack model. The XGBoost attack model will then output prediction scores for each sample, where the highest score corresponds to the sample’s predicted membership label, either “1” (member) or “0” (non-member), thereby completing the membership inference for the target sample.
  5. Experiment
To validate the effectiveness of the proposed Membership Inference Attack algorithm based on multiple adversarial perturbation distances (MAPD_MIA) and to compare its performance with existing methods, this section presents a series of experiments. These experiments are designed to evaluate the performance of the attack methods under various conditions across multiple datasets. The specific experimental setup and evaluation results are detailed below.
  5.1. Experimental Setup
  5.1.1. Datasets
This paper utilizes three image classification datasets—CIFAR10, CIFAR100, and MNIST—to evaluate the performance of the proposed MIA. The details of each dataset are as follows:
- CIFAR10: A widely used dataset in computer vision, consisting of 60,000 32 × 32 color images across 10 categories. The dataset is split into 50,000 training images and 10,000 test images. 
- CIFAR100: An extension of CIFAR10, containing 60,000 images in total but spanning 100 categories. Similar to CIFAR10, it is divided into 50,000 training images and 10,000 test images. 
- MNIST: A classic dataset for handwritten digit recognition, composed of 60,000 28 × 28 grayscale images. The dataset is divided into 50,000 training images and 10,000 test images. 
  5.1.2. Target Model
To assess the risk of member privacy leakage in real-world federated learning (FL) scenarios, this paper configures 5 participants and 1 central coordinator to train a global model using the federated averaging algorithm. Both local and global training processes are set to run for 10 iterations. The training data for each dataset is randomly divided into 5 parts and distributed to the participants.
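For readers unfamiliar with federated averaging, the aggregation step performed by the central coordinator can be sketched as below. The flat parameter lists and the function name are assumptions for this example; a real model would be averaged tensor by tensor.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameters, weighted by local dataset size.

    client_weights: one flat list of floats per participant.
    client_sizes:   number of local training samples per participant.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

With five participants holding equally sized shards, as in the setup above, the weighted average reduces to a plain mean of the five local models.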
For each dataset, different FL models are constructed. The specific model configurations are provided in Table 3. All models are trained using the Adam optimizer, with the learning rates set to 0.001 for CIFAR10 and CIFAR100 and 0.01 for MNIST.
  5.1.3. Attack Parameters
To generate multiple adversarial perturbations near the decision boundary of a sample, this paper employs the Targeted HopSkipJumpAttack algorithm from the open-source library Foolbox to generate the minimum adversarial samples. The number of iterations is set to 64, with an initial gradient query count of 100 and a maximum query count of 10,000. When adding random noise to the minimum adversarial sample, the noise magnitude starts at 1.5, and five groups of random noise are configured with an interval step of 1.5. Each group contains 100 randomly selected noise values.
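The noise configuration above (five magnitudes from 1.5 in steps of 1.5, with 100 noise vectors each) could be generated as in the following sketch. The text does not state how noise directions are drawn, so sampling a Gaussian direction and rescaling it to the required L2 magnitude is an assumption, as are the function name and the `seed` parameter.

```python
import math
import random


def noise_groups(dim, start=1.5, step=1.5, n_groups=5, per_group=100, seed=0):
    """Generate groups of random noise vectors with fixed L2 magnitudes."""
    rng = random.Random(seed)
    groups = []
    for g in range(n_groups):
        magnitude = start + g * step  # 1.5, 3.0, 4.5, 6.0, 7.5 by default
        group = []
        for _ in range(per_group):
            # Assumption: isotropic Gaussian direction, rescaled to `magnitude`.
            direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            length = math.sqrt(sum(c * c for c in direction))
            group.append([magnitude * c / length for c in direction])
        groups.append(group)
    return groups
```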
From the five participants, one is randomly selected as the attacker. The attacker uses the multiple adversarial perturbations of the target sample as input features for the attack model, which outputs the membership predictions. The attack model is implemented using the XGBoost open-source library, with training data randomly selected from the attacker’s local dataset. In this paper, 1000 training samples and 1000 test samples from the attacker’s local data are used to generate the corresponding adversarial perturbation groups, which are then used as training data to construct the attack model.
  5.1.4. Evaluation Metrics
Most existing MIAs typically employ a balanced evaluation set with an equal number of member and non-member samples. However, Xu and Tan [40] discovered that traditional balanced evaluation sets may include misclassified samples, leading to a mix of attack performance and baseline classification performance, which fails to accurately reflect the effectiveness of the attack algorithm. To avoid this issue, this paper selects an equal number of correctly classified samples from the target model’s training and testing sets, ensuring that the evaluation set does not contain any samples used in training the attack model.
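The evaluation-set construction described above can be sketched as follows. The record layout (dictionaries with `id`, `x`, and `y` keys), the function name, and the `seed` parameter are assumptions for this example.

```python
import random


def build_evaluation_set(predict, train_set, test_set, excluded_ids, n_per_class, seed=0):
    """Balanced evaluation set of correctly classified members and non-members.

    Samples whose id is in `excluded_ids` (used to train the attack model)
    are left out. Returns (sample, membership_label) pairs.
    """
    def eligible(samples):
        return [
            s for s in samples
            if s["id"] not in excluded_ids and predict(s["x"]) == s["y"]
        ]

    rng = random.Random(seed)
    members = rng.sample(eligible(train_set), n_per_class)
    non_members = rng.sample(eligible(test_set), n_per_class)
    return [(s, 1) for s in members] + [(s, 0) for s in non_members]
```

`random.Random.sample` raises `ValueError` when fewer than `n_per_class` eligible samples exist, which makes an undersized pool visible instead of silently unbalancing the set.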
This paper adopts Accuracy, Precision, Recall, and F1-score as evaluation metrics to comprehensively assess the predictive performance of the attack model. Accuracy evaluates the overall effectiveness of the attack, representing the proportion of samples whose membership status is correctly inferred by the attack model. Precision assesses the accuracy of the attack when predicting a sample as a member, indicating the proportion of predicted members that are indeed actual members. Recall evaluates the ability of the attack to identify member samples, representing the proportion of actual members that are correctly identified as members. F1-score is the harmonic mean of precision and recall, ranging from 0 to 1, where a higher value indicates better performance. The formulas for these metrics are as follows:

$$\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Here, $N_{\text{correct}}$ represents the number of samples correctly classified by the attack model, while $N_{\text{total}}$ denotes the total number of samples in the evaluation. $TP$ (True Positive) refers to the number of actual members correctly identified as members, $FP$ (False Positive) represents the number of non-members incorrectly classified as members, and $FN$ (False Negative) indicates the number of actual members misclassified as non-members.
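As a small worked illustration, the four metrics can be computed from confusion-matrix counts. The function name `attack_metrics` and the explicit true-negative count `tn` are assumptions made here; the text only names TP, FP, and FN, and the number of correctly classified samples is taken to be TP + TN.

```python
def attack_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score with members as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total                      # correctly inferred / all samples
    precision = tp / (tp + fp) if tp + fp else 0.0    # predicted members that are members
    recall = tp / (tp + fn) if tp + fn else 0.0       # actual members that are found
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts on a balanced set of 100 members and 100 non-members.
example = attack_metrics(tp=60, fp=40, fn=40, tn=60)
```

The zero-denominator guards return 0.0 by convention, which matters for attacks that never predict membership.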
  5.1.5. Comparison Methods
The following three MIA methods are selected to compare with the performance of the proposed attack algorithm:
- Gap Attack: This is a basic MIA method [47], which labels samples correctly predicted by the target model as members and incorrectly predicted samples as non-members. Because the balanced evaluation set used in this paper contains only correctly classified samples, this attack labels every sample as a member, so its accuracy is constrained to 0.5. It is used as the baseline attack method for evaluation.
- Boundary Attack: Proposed by Choquette-Choo et al. [38], this method assumes that training samples are farther from the decision boundary. The attack generates minimum adversarial samples using the untargeted HopSkipJumpAttack and computes the perturbation distance between the target sample and the adversarial sample as the membership feature. To improve stability and attack performance [40], this paper replaces the adversarial sample generation with the Targeted HopSkipJumpAttack algorithm and sets the maximum query budget to 10,000 in the experiments.
- TEAR: Proposed by Liu et al. [36], this attack is designed for restricted FL systems. It infers membership by analyzing the changes in the adversarial robustness of the target sample across training rounds. The attacker uses a black-box adversarial sample generation algorithm to obtain decision boundary samples and trains XGBoost with multiple rounds of adversarial perturbation evolution matrices to perform the attack. To align with the FL attack scenario of this paper, only the global model from the final training round is used as the target model for TEAR, while other experimental settings remain unchanged.
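The behavior of the Gap Attack baseline on this kind of evaluation set can be checked with a toy example. The tuples below are made-up data, and the helper names are illustrative only.

```python
def gap_attack(true_label, predicted_label):
    """Label a sample as member (1) iff the target model classifies it correctly."""
    return 1 if predicted_label == true_label else 0

def attack_accuracy(predictions, memberships):
    correct = sum(p == m for p, m in zip(predictions, memberships))
    return correct / len(memberships)

# Toy balanced evaluation set: (true_label, predicted_label, is_member).
# Every sample is correctly classified, as in the evaluation protocol above.
evaluation_set = [
    (3, 3, 1), (7, 7, 1), (1, 1, 1),   # members
    (3, 3, 0), (7, 7, 0), (1, 1, 0),   # non-members
]
predictions = [gap_attack(y, y_hat) for y, y_hat, _ in evaluation_set]
memberships = [m for _, _, m in evaluation_set]
gap_accuracy = attack_accuracy(predictions, memberships)
```

Since every sample is predicted to be a member, exactly half of the predictions are correct and `gap_accuracy` equals 0.5.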
  5.2. Evaluation Results
Through the experimental setup and evaluation methods described above, this section aims to validate the effectiveness of the proposed attack method. Next, we will present the experimental results under different datasets and parameter settings.
In this paper, member samples are treated as positive examples, and the proposed attack method based on multiple adversarial perturbation distances was evaluated alongside three comparative attack methods across three datasets. The evaluation results are shown in 
Table 4, while 
Table 5 presents the training and test accuracies of the target models on each dataset to reflect the degree of overfitting.
As observed in 
Table 4, the membership inference attack based on multiple adversarial perturbation distances outperformed the other attack methods in terms of accuracy, precision, and F1-score across all three datasets. Specifically, it achieved accuracies of 0.595, 0.630, and 0.687 on the MNIST, CIFAR10, and CIFAR100 datasets, respectively, with corresponding precision scores of 0.558, 0.590, and 0.659. These results highlight the advantages of the multiple perturbation approach compared to attacks relying on a single perturbation distance. 
Section 5.3.2 and 
Section 5.3.3 will further discuss the impact of the number of adversarial perturbations on the attack performance.
However, the recall rate of this method is slightly lower than that of Liu et al.’s [
36] TEAR algorithm. This may be because the multiple adversarial perturbation method is conservative in predicting membership, so some member samples are classified as non-members, which lowers recall. Additionally, nearly all attack methods performed better on the CIFAR100 dataset than on CIFAR10. This discrepancy can be attributed to two factors. First, the target model’s accuracy gap on CIFAR100 is larger than that on CIFAR10 (as shown in 
Table 5), indicating a higher degree of overfitting in the CIFAR100 model, which increases the risk of training data leakage. Second, CIFAR100’s larger number of classes and more fine-grained classification task result in greater decision boundary complexity, making membership information easier to leak.
It is also worth noting that the Boundary Attack proposed by Choquette-Choo et al. [
38] performed worse in terms of accuracy compared to TEAR and the multiple perturbation distance attack, with its accuracy on CIFAR10 even falling below that of the Gap Attack. This suggests that relying on a single adversarial perturbation distance is ineffective in distinguishing member from non-member samples in a balanced evaluation set that contains only correctly classified samples. Although TEAR improves the adversarial sample generation method, its reliance on a single perturbation distance limits performance gains. In contrast, the attack based on multiple adversarial perturbation distances provides a more comprehensive reflection of the complexity of decision boundaries and adversarial robustness, leading to superior performance in distinguishing member from non-member samples.
Overall, the experimental results validate the effectiveness and superiority of the proposed method in handling complex FL scenarios.
  5.3. Ablation Study
To further evaluate the effectiveness of the Membership Inference Attack algorithm based on multiple adversarial perturbation distances (MAPD_MIA), this section conducts ablation experiments to explore the impact of random noise amplitude and the number of adversarial perturbations on attack performance. Additionally, it investigates the risks of member privacy leakage across different global training rounds and reveals the connection between decision boundary complexity and the risk of member privacy leakage.
  5.3.1. Impact of Random Noise Magnitude
In attacks based on multiple adversarial perturbation distances, the magnitude of random noise determines the distance between the generated boundary adversarial sample and the minimum adversarial sample. This section evaluates the impact of random noise amplitude on attack performance using the MNIST and CIFAR10 datasets. To ensure that the results are influenced solely by noise amplitude, only one group of random noise with the same magnitude was used to generate adversarial perturbations, with each group containing 100 noise points. The experimental results are shown in 
Figure 5.
As demonstrated in 
Figure 5, varying random noise amplitudes lead to different attack performances on the MNIST and CIFAR10 datasets. Specifically, when the noise magnitude is small, the attack performance is relatively weak. For instance, when the noise amplitude is set to 1, the attack accuracy on MNIST and CIFAR10 is 0.51 and 0.56, respectively. As the noise amplitude increases, although some fluctuation is observed, the overall attack performance improves. Notably, when the noise amplitude is 10, the attack accuracy on MNIST reaches 0.61, while on CIFAR10, the accuracy is 0.615 when the noise amplitude is 6.
To further explain the variation in attack performance, the variance of each adversarial perturbation group was calculated to analyze the complexity of the decision boundary near the samples (see 
Figure 6). The results indicate that with smaller noise amplitudes, the corresponding perturbation variance for both training and testing samples remains low, with training samples showing a larger variance. This suggests that when boundary adversarial samples are closer to the minimum adversarial sample, the decision boundary tends to be smoother, failing to fully capture the complexity of member samples. As the noise amplitude increases, the perturbation variance grows, and the difference between training and testing samples becomes more pronounced, indicating that when the boundary adversarial samples are farther from the minimum adversarial sample, the decision boundary becomes more complex, reflecting more member-specific information. This finding further reveals the close connection between the complexity of member samples’ decision boundaries and the risk of privacy leakage. Notably, the adversarial perturbation variance is generally higher for MNIST samples compared to CIFAR10, which may be attributed to the simplicity of MNIST samples. Under the same number of training iterations, the model tends to fit MNIST samples more effectively, making the decision boundaries around MNIST training samples more complex.
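The variance analysis above can be sketched as follows. The use of the population variance (`statistics.pvariance`) is an assumption, since the text does not state which estimator was used, and the distances in the example are invented numbers, not experimental values.

```python
import statistics

def group_variance(perturbation_distances):
    """Variance of the perturbation distances in one adversarial perturbation group."""
    return statistics.pvariance(perturbation_distances)

def mean_group_variance(groups):
    """Average the per-sample group variance over a set of samples."""
    return statistics.fmean([group_variance(g) for g in groups])

# Hypothetical perturbation groups (one list of distances per target sample).
member_groups = [[1.0, 3.0], [2.0, 6.0]]
non_member_groups = [[2.0, 2.0], [4.0, 4.0]]

member_variance = mean_group_variance(member_groups)          # (1.0 + 4.0) / 2
non_member_variance = mean_group_variance(non_member_groups)  # all distances equal
variance_gap = member_variance - non_member_variance
```

A larger `variance_gap` corresponds to the more pronounced difference between training and testing samples that the text reports at larger noise amplitudes.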
  5.3.2. Impact of Perturbation Quantity Within a Perturbation Group
In the previous section, we discussed the impact of random noise magnitude on attack performance. In this section, we will further explore the effect of the number of perturbations within a single adversarial perturbation group on attack performance under a fixed noise magnitude. The experiments were conducted on the MNIST and CIFAR10 datasets, where the same random noise with a magnitude of 6 was used to generate the perturbations within the same adversarial group. By adjusting the number of perturbations in the group, we evaluated its impact on the effectiveness of the attack. The experimental results are shown in 
Figure 7.
The results indicate that within an adversarial perturbation group, attack performance generally improves as the number of perturbations increases. For example, on the CIFAR10 dataset, the accuracy is 0.555 with 20 perturbations and rises to 0.615 with 100 perturbations, with the trend still pointing upward. Similarly, on the MNIST dataset, the accuracy is 0.525 with 20 perturbations and increases to 0.575 with 80 or 100 perturbations, showing some minor fluctuations but an overall upward trend. These findings suggest that increasing the number of perturbations enhances attack performance by capturing more detailed decision boundary information, thereby improving the inference capability of the attack model.
  5.3.3. Impact of the Number of Perturbation Groups
The previous experiments only analyzed the relationship between a single group of adversarial perturbations and attack performance. In this section, we conduct experiments using multiple adversarial perturbation groups to evaluate how the number of perturbation groups affects attack performance. For each group, different noise magnitudes are used to generate the corresponding adversarial perturbations. Specifically, five groups of perturbations were generated with noise magnitudes of 1.5, 3.0, 4.5, 6.0, and 7.5, respectively, and each group contained 100 perturbation values. By varying the number of adversarial perturbation groups, we explore their impact on the proposed attack’s performance. The experimental results are shown in 
Figure 8.
The results indicate that increasing the number of adversarial perturbation groups significantly improves attack accuracy. For example, on the CIFAR10 dataset, the accuracy achieved with a single perturbation group is 0.615, and when the number of groups increases to three, the accuracy improves to 0.655. On the MNIST dataset, the attack accuracy with a single group is 0.56, and it increases to 0.595 when five perturbation groups are used, outperforming the single-group attack. However, it is worth noting that for the CIFAR10 dataset, when a fourth group of adversarial perturbations is added, the attack accuracy decreases slightly. This could be due to the additional group containing a significant amount of irrelevant information, which interferes with the attack model’s performance.
In summary, increasing the number of adversarial perturbation groups enhances the attack by providing more detailed and extensive coverage of the decision boundary region, thus improving membership inference performance. This also demonstrates the importance of decision boundary complexity in member privacy leakage. However, adding too many perturbation groups with redundant information may negatively impact the attack model’s effectiveness.
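One way to picture the multi-group setting is as a concatenation of per-group distance lists into a single feature vector for the attack model. The sketch below assumes that groups are added in the order of the listed noise magnitudes and that the features are simply concatenated; both are illustrative assumptions, and `build_feature_vector` is a hypothetical helper.

```python
# Noise magnitudes and group size as reported in the text.
NOISE_MAGNITUDES = (1.5, 3.0, 4.5, 6.0, 7.5)
PERTURBATIONS_PER_GROUP = 100

def build_feature_vector(distance_groups, n_groups):
    """Concatenate the first n_groups perturbation groups of one target sample.

    distance_groups maps a noise magnitude to the list of perturbation
    distances measured at that magnitude.
    """
    if not 1 <= n_groups <= len(NOISE_MAGNITUDES):
        raise ValueError("n_groups must be between 1 and 5")
    features = []
    for magnitude in NOISE_MAGNITUDES[:n_groups]:
        group = distance_groups[magnitude]
        if len(group) != PERTURBATIONS_PER_GROUP:
            raise ValueError(f"group {magnitude} must hold 100 distances")
        features.extend(group)
    return features
```

With three groups the attack model would see 300 features per sample, and with all five groups 500, which is where redundant features could start to outweigh new boundary information.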
  5.3.4. Membership Privacy Leakage Risk Across Different Training Rounds
The proposed attack primarily targets the global model at the final round of FL training. However, in practical scenarios, attackers may initiate membership inference attacks during the training phase across different global training rounds. Therefore, this experiment evaluates the performance of the proposed MIA on global models at various training rounds. Specifically, we assess the attack performance on the CIFAR10 and MNIST datasets after the 1st, 5th, and 10th (final) rounds of global training to determine the risk of member privacy leakage at each stage. The experimental results are presented in 
Table 6.
As shown in 
Table 6, the attack performance generally improves as training progresses. For example, on the CIFAR10 dataset, the attack accuracy is 0.56 after the 1st round, 0.595 after the 5th round, and reaches 0.630 after the 10th round. On the MNIST dataset, the attack accuracy is 0.585 and 0.575 after the 1st and 5th rounds, respectively, and reaches 0.595 after the 10th round, so the increase is not strictly monotonic. These results suggest that as the number of training rounds increases, the model becomes progressively more overfitted, thereby increasing the privacy leakage risk for member samples.
Overall, the proposed Membership Inference Attack based on multiple adversarial perturbation distances (MAPD_MIA) stays above the 0.5 baseline at every evaluated training round. This further validates the effectiveness of the attack method in inferring member samples throughout the training process and shows that the risk of member privacy leakage is present from the first round and tends to grow as training proceeds.
  5.4. Evaluation of Existing Defense Mechanisms
To assess the threat posed by the proposed MIA to member data privacy in an FL context, we evaluated the attack against two common defense mechanisms: MemGuard [
48] and DP-SGD [
49]. MemGuard follows the function design from Choquette-Choo et al. [
38], while DP-SGD was implemented through the open-source library Opacus, with a target privacy budget of 40, target probability limit of 
, and a gradient clipping value of 1.2. The defense evaluation experiments were conducted on the CIFAR10 and MNIST datasets, with the model setup and attack parameters consistent with 
Section 4.1.2 and 
Section 4.1.3. The target model used was the final global model after the last training round, with the HopSkipJumpAttack algorithm generating the minimum adversarial samples and the XGBoost model selected as the attack model. The evaluation results of the attack under both defenses are presented in 
Table 7.
As shown in 
Table 7, MemGuard failed to reduce the performance of the attack based on multiple adversarial perturbation distances, indicating that merely adding noise to the model’s confidence output is insufficient to protect member privacy in an FL environment. MemGuard’s limited defense effectiveness stems from its inability to alter the original model’s classification decision boundary, allowing label-only MIAs to bypass its defense easily.
On the other hand, DP-SGD exhibited some defensive effectiveness on the CIFAR10 and MNIST datasets, with reductions in attack accuracy and precision. However, even under the defense provided by DP-SGD, the attack still achieved accuracies of 0.560 and 0.580 on CIFAR10 and MNIST, respectively, outperforming the baseline attack performance of 0.5. Furthermore, DP-SGD reduced model test accuracy, from 0.6051 to 0.4946 on CIFAR10 and from 0.9891 to 0.9618 on MNIST. Thus, even at this cost in model utility, the proposed attack remains effective against models protected by differential privacy frameworks, suggesting a significant privacy leakage risk for member data in federated settings.
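For readers unfamiliar with DP-SGD, the core update can be sketched in plain Python: each per-sample gradient is clipped to a maximum L2 norm (1.2 in the experiments above), Gaussian noise is added to the sum, and the result is averaged. This is a generic illustration of the mechanism, not the Opacus implementation. The `noise_multiplier` parameter is a hypothetical stand-in, since in practice the noise scale is derived from the target privacy budget.

```python
import math
import random

def clip_gradient(grad, max_norm):
    """Scale a per-sample gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_gradient(per_sample_grads, max_norm=1.2, noise_multiplier=1.0, rng=None):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    rng = rng or random.Random(0)
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm   # noise std is proportional to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]
```

Clipping bounds the influence of any single training sample on the update, and the added noise masks what remains of it. The same noise perturbs every update, which is consistent with the drop in test accuracy reported above.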
  6. Conclusions
This paper presents MAPD_MIA, a novel Membership Inference Attack tailored for encrypted federated learning scenarios, leveraging multiple adversarial perturbation distances. Building on the observation that training samples exhibit significantly greater variation and complexity in adversarial perturbations near decision boundaries than non-training samples, this approach generates multiple adversarial perturbation distances between adversarial samples and the target sample as membership features. These features are subsequently used to train an attack model that performs membership inference. Extensive experiments conducted on the CIFAR10, CIFAR100, and MNIST datasets demonstrate that the proposed attack outperforms three existing algorithms in terms of accuracy, precision, and F1-score and remains effective against common defense mechanisms. This study provides a novel perspective on safeguarding member data privacy in federated learning environments, highlighting the critical role of decision boundary complexity in privacy leakage and advocating for the development of more effective privacy-preserving mechanisms.
Despite its effectiveness, certain limitations of the proposed method warrant further exploration. One notable challenge is the reliance on a substantial number of adversarial perturbations to achieve optimal performance, which introduces considerable computational overhead, particularly in resource-constrained environments. Our future work will focus on optimizing the algorithm’s computational efficiency to reduce resource demands and improve its applicability in such environments. Furthermore, we plan to extend MAPD_MIA to other data types, such as text and tabular data, and validate its performance across a broader range of federated learning scenarios to enhance its versatility and generalizability.