1. Introduction
Federated learning (FL) is a distributed deep learning framework where end users do not share their private data and only send the model updates to a central server [
1]. Unlike traditional machine learning, FL can build a global model without requiring end-user devices to upload their data. This approach both prevents data leakage and, to a great extent, ensures the data security of each user. In FL, end users train the model, downloaded from the server, on their own datasets. After all model updates are sent to the server, they are aggregated to create an updated global model. This cycle continues until the round at which the global model converges to a low loss value. Thus, end users’ identity information and their own data are protected in the FL distributed data system. FL also increases scalability and accuracy by distributing the learning load across end-user devices [
2].
Despite these remarkable advances, the distributed nature of FL exposes it to significant security issues concerning privacy protection and data security. The attacks that exploit these vulnerabilities and receive the most attention in the literature are poisoning attacks and communication bottleneck attacks [3]. In this study, we consider a distributed network subject to poisoning attacks, which are the most common in the literature and for which numerous defense algorithms have been developed.
Figure 1 shows poisoning attacks and their types. The figure illustrates the taxonomy of poisoning attacks in FL environments. It presents a hierarchical breakdown of two key aspects: attack methods and attack goals. Under attack methods, the diagram identifies two main categories: data poisoning and model poisoning. The attack goals are similarly divided into two types: target attacks, which are associated with backdoor mechanisms, and untargeted attacks, which are related to Byzantine approaches. This classification helps organize and understand the different strategies malicious actors might employ to compromise federated learning systems. The related works under these categories are discussed further in
Section 2.
This paper introduces an NC-FLD that fundamentally differs from existing approaches by dynamically identifying and analyzing only the most significant neurons across model layers rather than examining entire gradient spaces. Unlike conventional methods that treat all parameters equally, our approach selectively focuses on neurons exhibiting the highest training impact, creating targeted feature vectors that more effectively capture attack signatures while reducing computational overhead. This neuron-centric perspective represents a paradigm shift from traditional defenses, enabling robust performance even with high malicious participation rates and under non-IID data distributions that typically challenge existing systems.
Poisoning attacks can be divided into two sub-branches according to their goal and method. In terms of attack goals, backdoor attacks are targeted attacks because they focus on a specific sub-task, and Byzantine attacks are non-targeted attacks because they can be carried out randomly and at a certain rate.
This paper specifically addresses poisoning attacks in federated learning, which represent one of the most significant security threats to this distributed paradigm. Unlike other security concerns such as gradient leakage or model inversion that primarily compromise privacy, poisoning attacks directly undermine model utility and performance. These attacks occur in two primary forms: data poisoning, where malicious users intentionally mislabel their training data (e.g., through label flipping), and model poisoning, where attackers manipulate model updates before sending them to the server (e.g., through gradient sign flipping or adding targeted noise). The distributed nature of FL makes these attacks particularly challenging to detect, as the server cannot directly inspect client data or training processes. Our work focuses on developing robust defenses against both attack types, even when malicious participants constitute a substantial portion of the network.
This paper has three important research contributions:
A new algorithm called NC-FLD is proposed that detects malicious users and removes them from the training cycle of FL. The FL training cycle has been made resistant to data poisoning and model poisoning attacks. By examining the scenarios in which the rate of malicious user participation is up to 40% in the network, it has been proven that the success of the model is maintained with the proposed defense algorithm.
It has been proven that dynamically selecting the most effective layers and neurons while training with the NC-FLD algorithm significantly contributes to the defense. The performance of our proposed security approach in various attack scenarios on three different datasets has been demonstrated quantitatively with the latest literature studies.
The implementation of a dynamic neuron selection strategy has demonstrated enhanced robustness in FL defense mechanisms across diverse data distributions, encompassing both independent and identically distributed (IID) and non-IID scenarios, while simultaneously mitigating the impacts of data and model poisoning attacks. The distribution characteristics of datasets significantly influence training performance. Consequently, models trained on non-IID datasets may be falsely classified as malicious, potentially leading to false positive identifications in attack detection protocols.
This paper is structured as follows:
Section 2 introduces a comprehensive literature review,
Section 3 includes the details of the proposed method,
Section 4 includes attack scenarios and the results of the NC-FLD method based on extensive experiments, and finally,
Section 5 includes the discussion section.
2. Related Work
The security challenges in federated learning have attracted significant research attention, with various defense mechanisms proposed to counter poisoning attacks. This section reviews key developments in this domain, highlighting the limitations that our neuron-centric approach aims to address.
Existing FL studies cover a variety of attacks and scenarios. However, most studies involving defense algorithms either ignore the effect of high rates of malicious participants on the defense mechanism or fail to maintain model accuracy under such rates [4]. In addition, only a few studies in the literature have evaluated model performance in scenarios that jointly involve multiple attacks, such as data poisoning and model poisoning, within the FL framework.
Data poisoning, a sub-branch of poisoning attacks classified by method, is a type of attack in which malicious end users aim to change the labels of samples from one class to another. Because this type of attack is carried out at the very beginning of the training phase, it exploits the fact that FL grants end users full authority over their local data. Model updates obtained from mislabeled data can, with high probability, cause samples from the correct class to be classified under the incorrect class label during the testing phase.
2.1. Defenses Against Data Poisoning Attacks
One of the first studies in the literature against data poisoning was by Tolpegin et al. [
5]. By applying Principal Component Analysis (PCA) to the multidimensional gradients uploaded to the main server, they aimed to detect the label-flipping attack by projecting the attackers away from benign users. They showed that, after the PCA operation, attackers can be separated from benign users as long as their proportion does not exceed a certain rate. However, no defense was proposed to eliminate the attackers from the network.
Jebreel et al. [
6] aimed to detect label-flipping attacks by examining the angular similarities of the gradients that users send to the server. In the first step of their defense algorithm, the most relevant neurons connected to the output layer are statically selected. In the second step, dimension reduction and classification are performed with PCA to remove redundant information and achieve a more effective discrimination. Isik-Polat et al. [4] approached the problem from a different perspective and, instead of using dimensionality reduction methods, proposed a new approach called ARFED to distinguish between malicious and benign users. In their defense algorithm, they apply the Interquartile Range (IQR) method to the Euclidean distances between local users’ model updates and the aggregated gradients of the global model to detect attackers. However, they assume that the majority of end users are benign, so that the average of the gradients converges toward the benign users. Therefore, as the number of malicious users increases, attackers become more difficult to detect, as the average of all gradients shifts toward the malicious region.
Another study in the literature belongs to Shen et al. [
7]. They proposed the LFR-PPFL algorithm. In the first step of the algorithm, a temporal analysis of cosine similarity is performed to detect malicious users. In the second step, this detection method is combined with a new protocol based on homomorphic encryption to protect client privacy. However, since the scheme requires extra encryption, it raises concerns about the time complexity of the algorithm. Mouri et al. [8] investigated the effects of data poisoning attacks on four different datasets under different aggregation scenarios in FL. They compared defenses based on methods existing in the literature, namely k-Nearest Neighbor (k-NN) [
9], Histogram-Based Outlier Detection [
10], and Copula-Based Outlier Detection [
11] defense methods. However, the experimental results have not been compared with other state-of-the-art defense methods.
2.2. Defenses Against Model Poisoning Attacks
In the model poisoning attack scenario, the second type of poisoning attack classified by method, end users perform training on genuine data. However, attackers subsequently turn these legitimate model updates into poisonous ones by applying manipulations such as flipping the sign of the gradients, adding noise or a multiple of the standard deviation, and diverting the direction of the gradient.
One of the important defense algorithms implemented in the literature is Shi et al.’s [
12]. They proposed a method to detect malicious users against model poisoning attacks by performing dimensionality reduction to an F space. The aim here is to maximize the distance of gradient projections between malicious users and benign users. To protect the privacy of users, in addition to the defense system, the functional encryption method constitutes the second step of the defense algorithm. Among the model poisoning attacks, Gaussian, sign-flipping, and ALIE attacks were performed. Another exemplary study in the literature is the study by Fang et al. [
13]. MKrum [14], Bulyan [15], Trimmed Mean [16], and Median [16] algorithms, which are resistant to model poisoning, have been proposed in FL. Another defense method, TRIM [
17], selects the training data in a way that minimizes the loss function. Data other than the selected training data are considered harmful training data and are eliminated from the system. Yang et al. [
18] proposed an algorithm called DeMAC, which is based on the idea that malicious users’ model updates reduce learning loss. Since there will be significant increases in the gradient norm of malicious model updates, the GradScore metric has been defined to measure this norm. They stated that the GradScore of bad and benign users differs from each other at all training stages. In the system they designed dynamically, they created a defense against model poisoning by taking advantage of historical records of global model updates. Yan et al. [
19] proposed a defense system based on the aim of detecting the difference between model gradients, similar to the work in [
18]. They test their proposed method under several model poisoning attacks such as LIE and min–max. Considering the studies in the literature, gradient selection that carries valuable information and feature vectors created from selected gradients play an important role in detecting malicious users. Finally, Xu et al. [
20] brought a different perspective to the literature. In their proposed defense algorithm, they include malicious users in the FL aggregation rounds instead of directly removing them from the network. However, they assign different weights at aggregation time to the model gradients that they mark as malicious among the local updates. In this way, the impact of malicious users on the federated learning network is reduced. The defense component of the proposed study was stated to be effective on various datasets and different distributions. For this reason, it is expected to adapt to different data types and features.
2.3. Neuron-Level Defenses in Distributed Learning
Recent research has increasingly focused on examining neural network behavior at the neuron level to detect poisoning attacks. Jebreel et al. [21] introduced FL-Defender, a method to combat targeted poisoning attacks in FL. The authors analyze the behavior of label-flipping and backdoor attacks, finding that attack-related neurons in the last layer of deep learning models exhibit distinctly different behavior from unrelated neurons. Based on this insight, they engineer robust discriminative features by calculating worker-wise angle similarity for last-layer gradients and compressing the resulting similarity vectors using PCA. FL-Defender then re-weights workers’ updates based on their deviation from the centroid of these compressed similarity vectors. The method is designed to be effective regardless of data distribution or model size, addressing the limitations of existing defenses. Experiments on three datasets with different deep learning models and data distributions demonstrate FL-Defender’s effectiveness against label-flipping and backdoor attacks, achieving the lowest attack success rates while maintaining global model performance on the main task and incurring minimal computational overhead on the server compared to several state-of-the-art defenses.
Similarly, Sameera et al. [
22] developed LFGuard, which evaluates each received model on an auxiliary dataset, extracts the activations of the last layer, and feeds them into a Multi-Class Support Vector Machine (MCSVM) to classify the model as malicious or legitimate, showing particular effectiveness in vehicular network settings. The method’s effectiveness was evaluated against targeted label-flipping attacks with various numbers of malicious participants, considering both single-label and multi-label-flipping scenarios under different poisoning intensities. The authors assessed LFGuard’s efficacy against different state-of-the-art techniques using three benchmark datasets (Fashion-MNIST, EMNIST, and GTSR) to showcase its performance in mitigating poisoning attacks in both IID and non-IID environments. The findings indicate that LFGuard outperforms prior studies in thwarting targeted label-flipping attacks, demonstrating more than a 5% improvement in global model accuracy, a 12% improvement in source class recall, and a 6% reduction in attack success rate while maintaining high model utility.
While these approaches demonstrate the value of neuron-level analysis, they typically focus on specific layers (particularly output layers) rather than examining neurons across the entire network architecture. Jiang et al. [
23] explored model pruning in federated learning contexts, demonstrating that selecting the most impactful neurons can maintain model performance while reducing computational requirements. Though their work focused on efficiency rather than security, it provides important insights into the relationship between neuron importance and model functionality.
In light of these findings, we propose a novel mechanism for the strategic selection of key neurons to be utilized in decision-making processes for defensive measures in FL environments.
3. Methods
Defense mechanisms against poisoning attacks typically employ the approach of detecting differences in model updates between benign and malicious users. Most of the studies in the literature create a defense mechanism by detecting malicious users and removing them from the training cycle, instead of repairing the model parameters coming from malicious users. By adhering to this defense strategy that exists in the literature, our aim is to exclude malicious users from the training cycle. By examining model updates from participants, we select impactful and meaningful neurons. The NC-FLD approach includes model classification in low-dimensional space to classify users using selected neurons.
3.1. Theoretical Foundation for Neuron-Centric Defense
The rationale for selecting neurons as the unit of defense, rather than entire layers or arbitrary parameters, is supported by both theoretical considerations and empirical observations. Neural networks naturally develop specialized neurons that respond to specific input patterns during training [
21]. Our preliminary experiments revealed that poisoning attacks do not affect all neurons equally but rather create distinctive patterns in specific neurons’ gradient behavior.
When examining gradient changes during poisoning attacks, we observed that approximately 15–20% of neurons exhibit significantly higher variance compared to benign training scenarios. These “informative neurons” provide a more distinctive projection for detecting malicious behavior than examining entire layers or random parameter subsets.
Furthermore, using neurons as the unit of analysis provides an appropriate granularity level—finer than layer-level analysis (which can reduce attack effects by averaging across many neurons) but more structured than arbitrary parameter selection (which may miss important architectural relationships). This targeted approach allows defense classifiers to capture subtle attack signatures while maintaining computational efficiency by focusing only on the most relevant parts of the model.
Our dynamic neuron selection strategy adapts to different attack patterns, as different attacks may affect different parts of the network. This adaptability represents a key advantage over static defense mechanisms that may become ineffective as attack strategies evolve.
The proposed defense method first selects neurons that have the most impact on distinguishing between malicious and benign users. Then, with the dimensionality reduction, the information entropy from the selected neurons is improved. In the final step, a classification method is used to detect malicious users and remove their contribution from the model aggregation process.
For the neuron selection phase, our mechanism identifies neurons exhibiting the highest variance in gradient updates. The theoretical justification for this approach stems from the observation that poisoning attacks necessarily alter gradient trajectories to achieve their objective. Let $\theta$ represent the model parameters, and $\Delta\theta_k^t$ denote the gradient update from participant $k$ at round $t$. Under benign conditions, the expected gradient behavior is as given in Equation (1).

$$\Delta\theta_k^t = g^t + \epsilon_k^t \tag{1}$$

where $g^t$ represents the “true” gradient direction toward convergence, and $\epsilon_k^t$ represents natural noise due to data sampling and stochastic optimisation. In contrast, malicious updates introduce adversarial perturbations as given in Equation (2).

$$\Delta\tilde{\theta}_k^t = g^t + \epsilon_k^t + \delta_k^t \tag{2}$$

where $\delta_k^t$ represents the adversarial manipulation. Crucially, these manipulations $\delta_k^t$ cannot be uniformly distributed across all parameters without significantly reducing the attack effectiveness, as this would dilute the poisoning impact. Instead, attacks typically concentrate their effect on specific neurons critical to the target task, creating regions of heightened gradient variance.
Our dynamic neuron selection identifies precisely these regions of abnormal variance, providing a theoretical foundation for its effectiveness. The selection threshold adapts to maintain a consistent proportion of neurons regardless of model scale or attack intensity.
Our method’s effectiveness can be understood through the lens of statistical anomaly detection. The OC-SVM classification establishes a decision boundary that separates the normal region of gradient behavior from anomalous updates. Under benign conditions, as the number of benign participants k → ∞, the probability of false positives approaches zero, as the classification boundary converges to the true distribution of legitimate updates. Even under attack conditions, as long as benign participants outnumber malicious ones, statistical identification of anomalous behavior remains theoretically sound.
The summarized version of the algorithm is shown in Algorithm 1. In this section, the proposed method will be examined in detail.
Algorithm 1: NC-FLD Algorithm
Require: FLCycles, θ_global, K, E
for t in FLCycles do
    // K users begin individual training of their local models.
    for k in K do
        θ_k ← θ_global
        for e in E do
            y′ ← Model(θ_k,e)(x)
            Loss ← MSE(y′, y)
            θ_k,e+1 ← SGD(θ_k,e, Loss)
        end
        // After training is completed, all users upload their model updates to the server.
        Server ← θ_k
    end
    // Defence mechanism initiates after each FL round.
    for k in K do
        // Model gradients of user k are calculated.
        Δθ_k^t ← θ_k^t − θ_k^(t−1)
        // The most impactful neurons are chosen by using WP.
        Ψ_k^t ← WP(Δθ_k^t)
        // Dimension reduction of the impactful neurons.
        θ_k^t′ ← PCA(Ψ_k^t)
    end
    // Distinguishing malicious and honest users by using their low-dimensional model updates.
    // Classification ∈ {OC-SVM, k-NN, Agg-C}
    (θ_{1→m}^t′, θ_{m+1→K}^t′) ← Classification(θ_{1→K}^t′, args)
    // Models are aggregated excluding maliciously labelled users.
    θ_global ← FedAvg(θ_{k=m+1→K})
end
The process of Algorithm 1 is decomposed in the following subsections for clarity.
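To make the server-side flow of Algorithm 1 concrete, the sketch below strings the WP-style neuron selection, PCA, and OC-SVM classification steps together with FedAvg aggregation. It is a minimal illustration under stated assumptions, not the authors’ implementation: client updates are treated as flattened NumPy vectors, and keep_ratio and nu are placeholder hyperparameters.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

def nc_fld_round(global_params, client_params, keep_ratio=0.2, nu=0.3):
    """One server-side NC-FLD round (illustrative sketch, not the authors' code).

    global_params : 1-D array with the current global model parameters.
    client_params : list of 1-D arrays, one per client, after local training.
    """
    # Model gradients of each user k: difference to the previous global model.
    deltas = np.stack([p - global_params for p in client_params])

    # Weight-pruning-style selection: keep the neurons (parameters) whose
    # average gradient magnitude across clients is largest.
    importance = np.abs(deltas).mean(axis=0)
    n_keep = max(2, int(keep_ratio * deltas.shape[1]))
    selected = np.argsort(importance)[-n_keep:]

    # Dimensionality reduction of the selected neurons to 2-D.
    embeddings = PCA(n_components=2).fit_transform(deltas[:, selected])

    # One-class classification: points outside the learned boundary are
    # flagged as malicious (label -1 in scikit-learn's convention).
    labels = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit_predict(embeddings)
    benign = [i for i, lab in enumerate(labels) if lab == 1]
    if not benign:  # fallback: keep all clients if everything is flagged
        benign = list(range(len(client_params)))

    # FedAvg over the clients that were not flagged.
    aggregated = np.mean([client_params[i] for i in benign], axis=0)
    return aggregated, benign
```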
3.2. Selective Neural Pruning in NC-FLD
The NC-FLD method uses the model gradients $\Delta\theta_k^t$, where $k$ is the index of users and $t$ is the FL round, which end users obtain by training on their own data and then send to the server. The entire defense mechanism is designed to be implemented on the server side. Weight Pruning (WP) is an algorithm originally used to eliminate neurons that have little effect on the artificial neural network and to lighten the models [23]. Although the primary purpose of WP is to reduce processing complexity, in our proposed method it is used to select the most impactful neurons.
In principle, WP identifies ineffective neurons via a second-order Taylor expansion of the model gradients. However, since calculating the Hessian matrix for every user’s model would be very costly, neuron importance can instead be estimated from the magnitudes of the model gradients. The selected neurons carry more information about the learning process because they are the neurons most affected by training. After examining the change in model parameters, neurons exceeding the determined threshold value are marked as carrying high information.
We consider a neural network model with parameters $\theta$ distributed across $n$ participating nodes. For each local model at node $i$, we define a binary mask $M_i \in \{0,1\}^{|\theta|}$, where $|\theta|$ represents the total number of parameters. The pruning operation $P(\theta_i, p)$ zeros out a fraction $p$ of the smallest-magnitude weights, creating a sparse model:

$$\tilde{\theta}_i = \theta_i \odot M_i$$

where $\odot$ represents element-wise multiplication, and $M_i$ is determined by

$$M_i[j] = \begin{cases} 1, & |\theta_i[j]| \geq \tau_p \\ 0, & \text{otherwise} \end{cases}$$

Here, $\tau_p$ represents the magnitude threshold corresponding to the desired pruning ratio $p$.
When considering security implications, if an adversary attempts to inject malicious updates $\delta_i$ into the global model, the pruned model update $\Delta\tilde{\theta}_i$ from client $i$ to the server follows:

$$\Delta\tilde{\theta}_i = (\Delta\theta_i + \delta_i) \odot M_i$$

This pruning operation constrains the adversary’s degrees of freedom to $(1-p)\,|\theta|$ parameters. For the global aggregation operation $\mathcal{A}$, the secure aggregation of pruned updates can be expressed as follows:

$$\theta^{t+1} = \theta^{t} + \eta \, \mathcal{A}\big(\{\Delta\tilde{\theta}_i\}_{i=1}^{n}\big)$$

where $\eta$ represents the learning rate and $\mathcal{A}$ typically computes a FedAvg aggregation. The pruning threshold $\tau_p$ can be dynamically adjusted based on the weight distribution:

$$\tau_p = \mathrm{percentile}\big(|\theta_i|,\; p \times 100\big)$$

After the pruning operations are completed, the most impactful neurons are selected and stored as $\Psi_k^t$. Please note that this process is carried out only to select neurons; no model pruning that would change the model architecture takes place.
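A minimal sketch of this magnitude-based selection step, assuming the gradient change Δθ_k^t of one client is available as a flat NumPy array; the percentile-based threshold mirrors the dynamically adjusted pruning ratio described above, and the function name and default ratio are illustrative choices rather than the authors’ API.

```python
import numpy as np

def select_impactful_neurons(delta_theta, prune_ratio=0.8):
    """Return indices and values of the highest-magnitude gradient entries.

    delta_theta : 1-D array holding the gradient change of one client.
    prune_ratio : fraction of low-magnitude entries to discard, so the
                  remaining (1 - prune_ratio) fraction is kept as "impactful".
    """
    magnitudes = np.abs(delta_theta)
    # Dynamic threshold taken from the weight-magnitude distribution.
    tau = np.percentile(magnitudes, prune_ratio * 100)
    mask = magnitudes >= tau                      # binary mask M_i
    return np.nonzero(mask)[0], delta_theta[mask]

# Example: keep the top 20% of entries of a toy gradient vector.
rng = np.random.default_rng(0)
idx, values = select_impactful_neurons(rng.normal(size=1000), prune_ratio=0.8)
```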
3.3. Parameter Dimensionality Reduction Stage
In the second stage of the NC-FLD method, Principal Component Analysis (PCA) is performed [
5] on the selected neurons. The PCA method aims to highlight the most distinctive features of the model parameters. After WP, the neurons that pass the elimination are saved together with the model to be analyzed by the server. At this point, the model has not yet participated in FL aggregation and is awaiting approval to contribute to the cycle. Because the selection is dynamic, the number of recorded neurons may differ between users. Therefore, PCA [24] is used to reduce the dimensionality of the data to two dimensions. With this reduction, the data are brought to the same size for all users. In addition, it also provides concise information about the model parameters.
Given a set of model updates from $k$ users, a matrix $X \in \mathbb{R}^{k \times d}$ is constructed from the selected neurons, where $d$ is the original dimension of the selected model updates. To ensure a consistent starting point, the data are centered by subtracting the mean vector $\mu$ from each row:

$$\tilde{X} = X - \mathbf{1}\mu^{\top}$$

Next, the covariance matrix $\Sigma$ is computed to capture the relationships between different features in the dataset:

$$\Sigma = \frac{1}{k-1}\,\tilde{X}^{\top}\tilde{X}$$

The eigendecomposition of $\Sigma$ yields eigenvectors $V$ and their corresponding eigenvalues $\Lambda$:

$$\Sigma = V \Lambda V^{\top}$$

The eigenvalues indicate the magnitude of variance captured by each eigenvector, with larger eigenvalues corresponding to more significant variance. To reduce dimensionality, the top $n$ eigenvectors associated with the largest eigenvalues are selected to form the projection matrix $W \in \mathbb{R}^{d \times n}$. The data are then transformed into an $n$-dimensional latent space:

$$Z = \tilde{X} W$$

where $Z \in \mathbb{R}^{k \times n}$ represents the low-dimensional embeddings of model updates. This transformation preserves the most significant variance in the data while reducing dimensionality.
Thus, concise information about the model update behaviors is collected. Then, these low-dimensional embeddings are fed to a classification algorithm to distinguish between benign and malicious users.
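As an illustration of this stage, the selected-neuron vectors of all clients can be projected to two dimensions with scikit-learn’s PCA. Zero-padding to a common length is one simple way to reconcile clients whose dynamically selected neuron sets differ in size; this padding choice is our assumption, not prescribed by the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_updates(selected_updates, n_components=2):
    """Project per-client selected-neuron vectors into a 2-D latent space.

    selected_updates : list of 1-D arrays, possibly of different lengths,
                       because neuron selection is dynamic per client.
    """
    # Zero-pad to a common length so PCA sees a rectangular matrix X (k x d).
    d = max(len(u) for u in selected_updates)
    X = np.stack([np.pad(u, (0, d - len(u))) for u in selected_updates])

    # PCA centres the data, computes the covariance eigenvectors and keeps
    # the top components, i.e. Z = (X - mu) W as in Section 3.3.
    return PCA(n_components=n_components).fit_transform(X)
```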
3.4. Classification of Dimensionally Reduced Impactful Neurons
The objective of previous operations is to increase the divergence between malicious users and benign users’ model updates. Thus, the classification becomes more accurate.
As a result of the classification, the models of users determined to be reliable are combined with FedAvg aggregation and a new global model $\theta_{\text{global}}$ is created. The new global model is sent to the $K$ users, starting a new federated learning round; this cycle continues for the specified number of FL rounds.
The One-Class Support Vector Machine (OC-SVM) [
25] detects anomalies by creating a boundary around normal data points, labeling those outside this boundary as outliers. In our case, we classify only benign users, and the rest are marked as malicious users. Given a training dataset comprising instances $\{z_1, \dots, z_K\}$, where $z_i$ represents the selected neurons of user $i$ after dimensionality reduction, the OC-SVM optimization problem can be formally expressed as follows:

$$\min_{w,\, \xi,\, \rho} \; \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu K}\sum_{i=1}^{K}\xi_i - \rho$$

This optimization is subject to the following constraints:

$$\langle w, \phi(z_i) \rangle \geq \rho - \xi_i, \qquad \xi_i \geq 0, \qquad i = 1, \dots, K$$

where $w$ represents the normal vector to the hyperplane, $\rho$ denotes the bias term, and $\nu$ serves as a regularization parameter that controls the trade-off between training errors and model complexity. The feature mapping function $\phi(\cdot)$ projects the input data into a higher-dimensional space, while $\xi_i$ represents slack variables that accommodate soft margins. The resulting decision function is defined as $f(z) = \operatorname{sgn}\big(\langle w, \phi(z)\rangle - \rho\big)$, where positive values indicate benign model updates and negative values denote malicious ones.
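A brief sketch of this classification step with scikit-learn’s OneClassSVM; the ν value is a placeholder, as the exact hyperparameters are not reported here.

```python
from sklearn.svm import OneClassSVM

def classify_ocsvm(embeddings, nu=0.3):
    """Separate benign from malicious clients in the reduced space.

    embeddings : array of shape (num_clients, 2) from the PCA stage.
    Returns (benign_mask, scores): positive scores indicate benign updates
    and negative scores malicious ones, mirroring f(z) above.
    """
    clf = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(embeddings)
    scores = clf.decision_function(embeddings)  # signed distance to boundary
    return scores > 0, scores
```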
Agglomerative Clustering (Agg-C) [
26] is a hierarchical technique that begins with each data point as an individual cluster, merging the nearest clusters step by step until one cluster or a specified number of clusters is achieved. This method produces a dendrogram, which shows the hierarchical relationships within the model updates. Consider a set $Z = \{z_1, \dots, z_K\}$ comprising $K$ data points. The algorithm initializes with a cluster set $C = \{C_1, \dots, C_K\}$, where $C_i = \{z_i\}$, indicating that each data point initially constitutes its own cluster. The clustering process proceeds iteratively by merging the two most similar clusters based on a specified linkage criterion $L(C_a, C_b)$. Three predominant linkage criteria are employed as follows:

Single linkage, which considers the minimum distance between clusters:

$$L_{\min}(C_a, C_b) = \min_{z_i \in C_a,\, z_j \in C_b} d(z_i, z_j)$$

Complete linkage, which evaluates the maximum distance:

$$L_{\max}(C_a, C_b) = \max_{z_i \in C_a,\, z_j \in C_b} d(z_i, z_j)$$

Average linkage, which computes the mean distance between all point pairs:

$$L_{\mathrm{avg}}(C_a, C_b) = \frac{1}{|C_a|\,|C_b|}\sum_{z_i \in C_a}\sum_{z_j \in C_b} d(z_i, z_j)$$

where $d(\cdot,\cdot)$ represents a suitable distance metric between points $z_i$ and $z_j$. The iterative merging process continues until a termination criterion is met, such as achieving a predetermined number of clusters $k^{*}$, reaching a distance threshold $\tau$ between clusters, or consolidating all points into a single cluster.
The k-Nearest Neighbors (k-NN) [
27] algorithm classifies data points based on the majority class of their $k$ closest neighbors. For regression, it predicts values by averaging the outputs of the $k$ nearest neighbors. For a query point $z_q$, the classification is determined by the following:

$$\hat{y}_q = \arg\max_{c} \sum_{z_i \in \mathcal{N}_k(z_q)} \mathbb{1}\,(y_i = c)$$

where $\mathcal{N}_k(z_q)$ denotes the $k$ nearest neighbors of $z_q$ and $\mathbb{1}(\cdot)$ is the indicator function. The selection of the $k$ value and the distance metric is of significant importance for an effective defense. We chose $k = 4$ and the Manhattan distance among the model updates.
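Because the server holds no labelled examples of malicious updates, one possible reading of the k-NN variant is distance-based: each client is scored by the mean Manhattan distance to its k = 4 nearest neighbours in the reduced space, and the most isolated clients are flagged. This interpretation and the quantile threshold below are assumptions for illustration, not the exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def classify_knn(embeddings, k=4, contamination=0.4):
    """Flag clients whose embeddings are far from their k nearest neighbours.

    Uses the Manhattan (L1) metric and k = 4 as in the text; clients with the
    largest mean neighbour distance (up to `contamination` of the population)
    are marked as malicious.
    """
    nn = NearestNeighbors(n_neighbors=k + 1, metric="manhattan").fit(embeddings)
    dist, _ = nn.kneighbors(embeddings)   # first column is the point itself
    scores = dist[:, 1:].mean(axis=1)
    threshold = np.quantile(scores, 1.0 - contamination)
    return scores <= threshold            # True marks benign clients
```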
Figure 2a–c show the decision plane for the NC-FLD method. The figures on the left-hand side show the output data right after the WP and PCA algorithms. Each dot in the figure corresponds to a processed user’s model update. The aim of NC-FLD is to mark users correctly according to their behavior. We applied three classification methods to distinguish between malicious and benign users. The OC-SVM, Agg-C, and k-NN classification illustrations are given in the figures, respectively. The blue dots show benign users, orange ones are malicious users, and red dots correspond to incorrectly labeled users, i.e., false positive or false negative.
The classification methods used in this study are compared in
Table 1.
3.5. Edge Cases and Limitations
During our experimental evaluation, we identified several edge cases where NC-FLD’s performance might deteriorate without proper consideration. These cases and our mitigations are detailed below.
3.5.1. Highly Imbalanced Participation Rates
When malicious participants significantly outnumber honest ones (>50%), traditional outlier detection approaches fail as the malicious updates become the “majority” pattern. We evaluated NC-FLD under malicious participation rates up to 60% and observed a sharp performance decline beyond 50%. The main reason behind this deterioration is the assumption that most of the studies share: the majority of participants are always benign users.
3.5.2. Extremely Non-IID Distributions
Highly heterogeneous data distributions, where some participants have access to only one or two classes, present challenges for our approach. In such cases, legitimate gradient updates may appear statistically similar to poisoning attacks due to their focused nature on specific classes.
3.5.3. Noise Tolerance
To quantify robustness to gradient noise, we injected Gaussian noise (σ = 0.1–0.5) into client updates before applying our defense mechanism. NC-FLD maintained a detection accuracy above 92% with noise levels up to σ = 0.3, with performance degrading by only 8% at σ = 0.5. Competing methods showed 15–22% accuracy drops under equivalent conditions.
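A minimal sketch of the noise-injection protocol used in this test, assuming client updates are NumPy vectors; σ and the seed handling are placeholders.

```python
import numpy as np

def add_gradient_noise(update, sigma=0.3, seed=None):
    """Inject zero-mean Gaussian noise (as in the robustness test) into a
    client update before it reaches the defense mechanism."""
    rng = np.random.default_rng(seed)
    return update + rng.normal(0.0, sigma, size=update.shape)
```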
All measurements involved 10 repetitions with different random seeds to ensure statistical validity. These results confirm that NC-FLD’s robustness extends beyond standard attack scenarios to various data quality challenges that may arise in real-world federated deployments.
3.6. Dynamic Participation and Byzantine Fault Tolerance
Federated learning environments frequently experience dynamic participation, where clients may join or leave the training process across rounds. NC-FLD naturally accommodates this variability since our detection mechanism operates independently at each federated round, requiring no persistent client history or cross-round tracking. When new clients join, their updates are evaluated using the same neuron selection and classification process applied to existing participants. Client dropouts are inherently handled by our round-based evaluation approach, as detection decisions are made solely based on currently available updates.
Regarding Byzantine fault tolerance (BFT), NC-FLD effectively implements a BFT-compliant aggregation mechanism by identifying and excluding malicious updates before the aggregation phase. Unlike traditional BFT protocols that typically require consensus across distributed nodes, our server-centric approach establishes a classification boundary that filters out malicious contributions with high accuracy.
The server’s detection mechanism also addresses adaptive adversaries through its dynamic neuron selection strategy. Since the specific neurons examined may vary between rounds based on gradient impact patterns, adversaries cannot easily determine which parameters to manipulate to avoid detection. Additionally, our dimensionality reduction phase creates a transformed feature space that conceals the decision boundary from the perspective of individual clients, further enhancing resistance against adaptive Byzantine attacks.
4. Attack Scenarios and Results
Our experimental evaluation consists of three main components. Firstly, we present the carefully selected datasets in
Section 4.1. that represent varying levels of classification complexity. We then outline our simulation framework, detailing the federated learning environment configuration including participant numbers and data distribution strategies in
Section 4.2. Finally, we conduct a comprehensive performance analysis to assess our method’s effectiveness across multiple metrics and attack scenarios in
Section 4.3.
4.1. Datasets
The CIFAR-10 [
28], F-MNIST [
29], MNIST [
30], and German Traffic Sign Recognition (GTSR) [
31] datasets are used in this study. As model architecture, ResNet18 is used in the CIFAR-10 dataset, and CNN is used in the F-MNIST and MNIST datasets.
The CIFAR-10 dataset consists of 32 × 32-sized color images of airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 60,000 images in total. A total of 50,000 of these images are used for training and 10,000 for testing.
The F-MNIST dataset consists of 28 × 28-sized gray images of t-shirts, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, and ankle boots. There are 70,000 images in total. A total of 60,000 of these images are used for training and 10,000 for testing.
The MNIST dataset consists of 28 × 28 pixel-sized gray images of digits from 0 to 9. There are 70,000 handwritten digit images in total. A total of 60,000 of these images are used for training and 10,000 for testing.
The GTSR dataset contains a collection of over 50,000 traffic sign images, categorized into 43 distinct classes. This dataset is characterized by a significant class imbalance, with individual class sizes ranging from 200 to 2000 samples. To facilitate our experimental analysis, we focused on a subset of 10 classes that exhibit higher sample counts, mitigating the impact of the dataset’s inherent imbalance. This approach aligns with the methodology employed by Shen et al. [
7]. The selected classes encompass a diverse range of traffic signs, including prohibitive signs such as “No passing” and “No passing for vehicles exceeding 3.5 metric tonnes”, regulatory signs like “Priority” and “Yield”, cautionary signs such as “Roadworks”, directional signs like “Keep right”, and various speed limit indicators (30 km/h, 50 km/h, 70 km/h, and 80 km/h).
4.2. Simulation Parameters and Setup
Our methodology’s robustness was validated using multiple state-of-the-art datasets, specifically chosen to eliminate potential dataset-specific biases in our evaluation. We intentionally selected datasets with varying levels of complexity to ensure comprehensive testing of our defense mechanism. The MNIST dataset, while widely used, presents relatively straightforward classification challenges that could potentially emphasize the effectiveness of defense methods if used in isolation. In contrast, the CIFAR-10 dataset introduces substantially more complex classification challenges, requiring sophisticated architectures like ResNet18 for effective learning. The consistent superior performance of our method across these diverse datasets substantiates its robust and unbiased nature.
To ensure statistical reliability of our results, all experiments were repeated 10 times with different random seeds controlling malicious participant selection and data distribution. All accuracy values reported from Tables 3–13 represent the mean values across these runs. Standard deviations ranged from ±0.3% to ±1.8%, with higher variance observed in scenarios with greater malicious participation rates. For clarity in table presentation, we report the mean values only.
The experimental results revealed an anticipated pattern in classification accuracy, with CIFAR-10 yielding lower accuracy compared to both the MNIST and F-MNIST datasets. This performance differential can be attributed to CIFAR-10’s inherent structural complexity, particularly its inclusion of visually similar images across different classification categories. A notable example of this complexity is that the visual similarities between Class 3 (cats) and Class 5 (dogs) create natural classification challenges due to shared physical characteristics such as a four-legged structure, ear morphology, and the presence of a tail. In comparison, the MNIST and F-MNIST datasets present more distinctly differentiated image classes, facilitating clearer classification boundaries. Detailed specifications of all datasets employed in this study are presented in
Table 2.
The simulation environment was designed to test federated learning under diverse participation scenarios and data distributions. For experiments involving the MNIST and F-MNIST datasets, we simulated a network of 100 participating nodes, while the more computationally intensive CIFAR-10 and GTSR datasets were evaluated using a smaller network of 20 participants. To comprehensively assess our system’s robustness, we implemented two distinct data distribution strategies: a uniform distribution where data were evenly allocated across participants, and a heterogeneous (non-IID) distribution that better reflects real-world scenarios where data distribution may be unbalanced. To simulate security threats, we randomly designated a varying proportion of participants as adversarial, with the percentage ranging from 0% to 40% of total participants. Each training iteration consisted of a single epoch of local training at each participant node, followed by a security phase where the central server employed detection mechanisms to identify and exclude malicious updates before aggregating the remaining contributions. The refined global model was then redistributed to all participants, with this entire cycle repeating 100 times to ensure convergence and reliable evaluation of the system’s performance.
The performance of the defense mechanism is measured by the overall global model accuracy metric. This metric is calculated as the ratio of accurately classified samples to the total number of predictions made:

$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\,(\hat{y}_i = y_i)$$

where $N$ is the total number of data samples, $y_i$ is the correct label, and $\hat{y}_i$ is the predicted label of sample $i$.
4.3. Performance Analysis
Various simulations have been carried out to evaluate the defense performance of NC-FLD. The first set covers a label-flipping attack as the data poisoning scenario; the second covers sign-flipping and LIE attacks as model poisoning scenarios. The following subsections show the robustness of the NC-FLD algorithm against data poisoning and model poisoning attacks, respectively. For each scenario, we run 100 rounds before calculating the test accuracy of the global model. These simulations are run 10 times, and the average performance over all runs is given in the tables in this section.
Our evaluation covers multiple attack types across various datasets. We examine two primary attack categories:
Data Poisoning: Implemented as label-flipping attacks where malicious users deliberately mislabel training samples. For CIFAR-10, we flip dog (label 5) to cat (label 3); for F-MNIST, we change the labels for shirt and t-shirt; and for MNIST, we change the labels for digits 7 and 1.
Model Poisoning: implemented through (1) sign-flipping attacks, where malicious users invert the signs of their gradients (i.e., Δθ → −Δθ), and (2) LIE attacks, where updates are manipulated to evade detection while maximizing the negative impact on the global model. A minimal sketch of both attack simulations is given below.
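The following sketch illustrates how these two attack families can be simulated on a malicious client, assuming PyTorch tensors for labels and a flat gradient vector; it illustrates the threat model rather than reproducing the authors’ attack code.

```python
import torch

def label_flip(labels, src=5, dst=3):
    """Data poisoning: relabel every source-class sample as the target class
    (e.g. CIFAR-10 dog -> cat as used in the experiments)."""
    flipped = labels.clone()
    flipped[labels == src] = dst
    return flipped

def sign_flip(gradient):
    """Model poisoning: invert the sign of the local update before upload."""
    return -gradient

# Example usage on toy data.
y = torch.tensor([5, 1, 5, 3])
poisoned_y = label_flip(y)                 # tensor([3, 1, 3, 3])
poisoned_g = sign_flip(torch.randn(10))
```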
We evaluate the performance across both IID and non-IID data distributions with malicious participation rates ranging from 0% to 50%. For each scenario, we run 100 FL rounds and measure the overall accuracy on the global model.
4.3.1. Data Poisoning Scenario and Results (IID)
Table 3 presents the performance comparison for label-flipping attacks on CIFAR-10, where dog (label 5) is flipped to cat (label 3). Even with 40% malicious participants, our OC-SVM variant maintains 72.24% accuracy, significantly outperforming baseline methods.
Even when the attacker rate is increased to 40%, as seen in
Table 3, the proposed NC-FLD method effectively maintains model accuracy compared to other studies. However, within our method, the k-NN classifier struggles to detect label-flipping attacks as the attacker rate increases. This is because the false positive rate increases during classification, and model accuracy decreases as malicious users are treated as benign.
Table 4 shows the results for label-flipping attacks on F-MNIST (shirt/t-shirt swap) and MNIST (digits 7/1 swap). Our approach demonstrates superior robustness particularly at higher attack rates.
Table 4 shows the state-of-the-art method performance comparison with the proposed NC-FLD method. Especially at a higher rate of malicious participants, the OC-SVM and Agg-C classification algorithms can distinguish benign and malicious users from each other more effectively.
4.3.2. Model Poisoning Scenario and Results (IID)
In this study, sign-flipping and LIE attacks were performed in the model poisoning attack scenario. The sign-flipping attack is as mentioned in
Section 4.3 [
12].
Table 5 shows the results of the sign-flipping attack performed on the CIFAR-10 dataset.
This model poisoning attack affects more gradient values than data poisoning attacks. However, the proposed NC-FLD method is also resistant to sign-flipping attacks because it selects neurons that change significantly.
Table 6 illustrates the sign-flipping attack results on F-MNIST and MNIST. NC-FLD with OC-SVM classification consistently outperforms baseline methods, preserving over 80% accuracy even under severe attack conditions.
As
Table 5 and
Table 6 show, the Krum algorithm is highly vulnerable to sign-flipping attacks compared to the other methods.
LIE attack: the attacker first empirically estimates the mean and standard deviation of all gradients and then perturbs the model parameters only within these bounds, so that the corrupted values transmitted to the server remain statistically inconspicuous [32]. The aim of this attack is to pull the global model away from convergence with malicious gradient updates whose mean and standard deviation are indistinguishable from those of benign updates. The LIE attack results are given in
Table 7 for CIFAR-10, and
Table 8 for the F-MNIST and MNIST datasets. The LIE attack is one of the hardest attack types to detect and counter because the gradient change after the attack remains within the standard deviation of benign user updates. Therefore, the performance of our method decreased the most while defending against the LIE attack.
Table 8 shows the LIE attack results for the F-MNIST and MNIST datasets. The OC-SVM and Agg-C variants demonstrate resilience against this sophisticated attack type.
The superior performance of NC-FLD against LIE attacks, particularly with the OC-SVM classifier, can be attributed to several key factors. Firstly, LIE attacks are designed to maintain statistical similarity to benign updates while maximizing negative impact. This makes them particularly difficult to detect using statistical methods that rely on simple metrics like the Euclidean distance or angular similarity.
Our neuron-centric approach counteracts this by focusing on the neurons most affected during training rather than examining the entire parameter space. Since LIE attacks attempt to manipulate parameters without affecting overall statistical distributions, they still create distinctive patterns in specific neurons. The OC-SVM classifier, with its ability to define a tight decision boundary around benign behavior in the dimensionally reduced feature space, proves especially effective at identifying these subtle neuron-level anomalies.
4.3.3. Data Poisoning Scenario and Results (Non-IID)
In this scenario, the distribution characteristics of users’ local datasets play a crucial role in system performance and training dynamics. These distributions can be categorized into two fundamental patterns: IID and non-IID data. In the IID scenario, each user’s local dataset serves as a representative sample of the complete data distribution, containing samples that mirror the overall pattern found in the global dataset. However, real-world applications often present more complex non-IID scenarios, where users’ local datasets may differ significantly from one another in two key dimensions: the representation of different classes within their data and the relative frequency of samples per class. This heterogeneity means that while one user might have a balanced representation of all classes in their local dataset (resembling an IID distribution), another user’s dataset might exhibit significant class imbalances or even completely lack certain classes. This creates challenges for the federated learning process and requires robust approaches to handle such diversity in data distributions. We use the Dirichlet distribution to generate non-IID partitions. For
$K$ classes and $N$ clients, the probability density function of the Dirichlet distribution is the following:

$$f(p_1, \dots, p_K; \alpha_1, \dots, \alpha_K) = \frac{1}{B(\alpha)} \prod_{i=1}^{K} p_i^{\alpha_i - 1}$$

where $p_i$ represents the proportion of data for class $i$; $\alpha_i$ is the concentration parameter for class $i$ (adjusted to 1 to avoid extreme non-IID scenarios); and $B(\alpha)$ is the multivariate beta function.
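A short sketch of how such a Dirichlet-based non-IID partition can be generated with NumPy, using α = 1 for every class as stated above; the helper below is illustrative, and details such as index rounding differ across implementations.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha=1.0, seed=0):
    """Split sample indices across clients with class proportions drawn
    from Dirichlet(alpha) per class, producing a non-IID partition."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Proportion of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, splits)):
            client_indices[client].extend(part.tolist())
    return client_indices
```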
Our federated learning analysis confronts a key challenge in real-world implementations: the intrinsic variation in how data naturally cluster across participating nodes.
Figure 3 employs heat mapping to showcase this distribution variance, examining patterns across a hundred distinct nodes. The visualization maps the distribution intensity relative to the baseline uniformity (denoted by 100%), revealing compelling variations in data concentration. Notable clustering emerges in certain regions, with some nodes exhibiting nearly double the baseline concentration (reaching 180% intensity) depending on the parameter $\alpha_i$ for specific categories. These pronounced variations manifest in distinctive columns throughout the heat map, highlighting systematic deviations from uniform distribution. Such inherent data asymmetry exemplifies a fundamental consideration in practical federated systems, where natural data clustering significantly influences system dynamics. The observed patterns underscore how real-world data naturally gravitate toward non-uniform distributions across participating nodes, creating inherent variations that must be addressed in system design.
Table 9 presents the performance against label flipping in non-IID CIFAR-10 environments. NC-FLD maintains consistent accuracy above 74% even with increasing attack intensity.
Examining defense effectiveness across dataset architectures yields illuminating insights into framework resilience. While the F-MNIST evaluations demonstrated strong baseline performance approaching 86%, the MNIST implementation revealed substantially enhanced robustness, with accuracy metrics consistently exceeding 98% under standard conditions, as given in
Table 10. The framework stability remained notable even under intensified attack scenarios, with performance maintaining above 97% under maximum malicious participant attendance. These findings suggest meaningful advancements in preserving model integrity across varied data distributions and attack intensities for data poisoning cases.
Another experiment was carried out on the GTSR dataset. In the simulation, the data among the participants are set to non-IID. The comparative results presented in
Table 11 demonstrate the performance of FedAvg, Krum [
14], LFGuard [
22], and FoolsGold [
33]. Through evaluations conducted with three variants of the proposed method (OC-SVM, Agg-C, and k-NN), it has been observed that the OC-SVM-based approach exhibits superior performance even under high malicious ratios (up to 50%). Notably, at a 30% adversarial ratio, the OC-SVM variant achieved an accuracy rate of 96.03%, surpassing its nearest competitor, LFGuard (95.75%). These findings suggest a marked improvement in robustness against malicious participants whilst maintaining model accuracy under increasingly challenging scenarios.
4.3.4. Model Poisoning Scenario and Results (Non-IID)
The evaluation of defense mechanisms against sign-flipping attacks in non-IID CIFAR settings reveals distinctive performance characteristics. While baseline performance remained comparable across frameworks, varying resilience emerged under attack conditions. The OC-SVM implementation maintained stability, achieving 74.12% accuracy under moderate attack conditions and 73.54% under 20% malicious participant scenarios. This performance trajectory notably surpassed state-of-the-art approaches, particularly Krum, which exhibited significant vulnerability with accuracy declining to 42.36% under maximum attack conditions. The results for this scenario are shown in
Table 12.
We further extended our experiments across dataset architectures to demonstrate the defense capabilities against sign-flipping attacks. In the F-MNIST evaluations, the advanced frameworks maintained robust performance, with OC-SVM achieving 85.75% accuracy under moderate attack conditions. The MNIST implementations revealed enhanced resilience, consistently exceeding 98% accuracy under standard conditions and maintaining above 97% effectiveness even under maximum attack scenarios. These findings demonstrate meaningful advancement in preserving model integrity across varying data distributions and malicious user attendance. The results mentioned are given in
Table 13.
4.3.5. Defense Runtime Performance
The experiments ran for 100 communication rounds for all datasets on a workstation with an AMD Ryzen 7 9800X3D 16-core CPU, an NVIDIA RTX 3070 Ti GPU with 8 GB of memory, and 64 GB of DDR5 RAM. Depending on the dataset, the average running time of an experiment on the workstation is as follows: 0.6 h for MNIST, 1.4 h for GTSR, 2.1 h for F-MNIST, and 4.9 h for CIFAR-10. We used a 2-layer CNN with 2 fully connected layers for the MNIST dataset, a 3-layer CNN with 3 fully connected layers for the F-MNIST and GTSR datasets, and the 18-layer ResNet18 model for the CIFAR-10 dataset.
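As a concrete reference for the MNIST configuration, a PyTorch sketch of a network with 2 convolutional and 2 fully connected layers is given below; the channel widths and kernel sizes are our assumptions, since only the layer counts are specified.

```python
import torch
import torch.nn as nn

class MnistCNN(nn.Module):
    """2 conv + 2 fully connected layers, matching the stated depth for MNIST
    (widths and kernel sizes are illustrative choices)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# 28x28 grayscale input -> logits for 10 digit classes.
logits = MnistCNN()(torch.randn(1, 1, 28, 28))
```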
The experimental runtime performance analysis was conducted across diverse federated learning methodologies utilizing three distinct datasets: F-MNIST, MNIST, and CIFAR-10. The runtime performance figure is illustrated in
Figure 4. In the simulations, a significant heterogeneity in computational efficiency was observed, with particularly pronounced disparities being manifested in the CIFAR-10 dataset implementation. Within this context, while the Mkrum approach was characterized by the highest computational demands, necessitating approximately 1.6 s of runtime, methodologies such as FedAvg and FL-Defender were distinguished by their substantially reduced computational overhead, requiring merely 0.2–0.3 s. A more equilibrated distribution of the runtime metrics was evidenced in the F-MNIST dataset evaluations, wherein the majority of methodologies were concentrated within the 0.2–0.8 s range, with our proposed methodology demonstrating performance parity with established approaches. It is particularly noteworthy that the MNIST dataset implementations consistently exhibited minimal computational requirements across all evaluated methodologies, with execution times predominantly remaining below 0.2 s, thereby suggesting inherent computational simplicity relative to alternative datasets. This comprehensive comparative analysis effectively presents the computational efficiency in FL implementations across datasets of varying complexity.
The computational cost of NC-FLD must be evaluated in context with its performance benefits. While the OC-SVM variant of NC-FLD shows a 5–15% accuracy improvement over TMeans in high-adversarial scenarios, it requires approximately 25% more computation time per federated round (0.8 s versus 0.6 s for CIFAR-10). This represents a favorable trade-off in security-critical applications where model integrity is paramount. The Agglomerative Clustering variant offers a middle ground, with runtime comparable to TMeans but still maintaining significant performance advantages under attack. The k-NN variant provides the lowest computational overhead with only 10% increased computation compared to FedAvg, but shows the most degradation under high-adversarial rates. This presents a spectrum of options depending on deployment constraints: OC-SVM for maximum security, Agglomerative Clustering for balanced performance, and k-NN for resource-constrained environments where moderate protection is still required.
It is important to note that additional computational considerations become relevant when scaling NC-FLD to extremely large federated systems. While our neuron selection approach reduces the dimensionality of the analyzed parameters, the OC-SVM classification step has a worst-case computational complexity of O(n2) in the number of participants, which could become challenging with thousands of clients. Additionally, the memory requirements for storing selected neuron data from all participants scale linearly with the participant count. Our current implementation performs all detection operations at the server, which may create a potential bottleneck for massive-scale applications. For deployments involving hundreds or thousands of clients, a hierarchical approach with intermediate aggregation points performing preliminary filtering could distribute this computational load. We have validated our approach with up to 200 clients but note that further optimization would be required for internet-scale deployments with substantially more participants.
4.3.6. Scalability Analysis
While our experiments with the CIFAR-10 and GTSR datasets were conducted with 20 clients due to computational constraints, we performed additional scaling tests to project performance in larger federated systems. For the MNIST dataset, we ran simulations with varying numbers of clients (20, 50, 100, 200) to analyze how system scale affects accuracy.
Our findings indicate that NC-FLD’s detection accuracy remains stable as the number of clients increases, with less than 1% variation in model performance when scaling from 20 to 200 clients. This stability can be attributed to our dimensionality reduction approach, which projects client updates into a consistent feature space regardless of client count.
4.4. Performance Trade-Offs Analysis
While NC-FLD demonstrates superior robustness against poisoning attacks, it is important to consider potential trade-offs. Our computational overhead analysis reveals that NC-FLD introduces approximately 12–15% additional processing time compared to FedAvg due to the neuron selection and dimensionality reduction steps. However, this overhead remains constant regardless of the attack intensity and is significantly lower than the 25–40% overhead reported for some alternative defense mechanisms like Mkrum.
Under benign conditions (0% malicious participants), we observe a minor accuracy reduction of 0.5–1.2% compared to standard FedAvg across datasets. This slight performance impact stems from the inherent caution of our defense mechanism, which occasionally misclassifies benign updates with unusual gradient patterns as potentially malicious. However, this trade-off becomes negligible when even a small percentage of participants are malicious (5% or more), where NC-FLD significantly outperforms undefended systems.
The memory requirements for NC-FLD scale linearly with the number of selected neurons rather than the total model size, making it feasible for deployment even in resource-constrained federated environments. For instance, in our CIFAR-10 experiments with ResNet18 (11 M parameters), the memory footprint for gradient analysis was reduced by approximately 85% compared to methods that analyze full gradients.
4.5. Generalization Capabilities and Limitations
The NC-FLD approach demonstrates promising generalization capabilities beyond the specific examples tested in our experiments. The fundamental principle of identifying high-impact neurons through gradient variance analysis is architecture-agnostic, making it applicable to various neural network structures beyond CNNs and ResNet architectures. This adaptability stems from our approach’s focus on neuron-level behavior rather than specific architectural features.
Our experiments across diverse datasets (CIFAR-10, F-MNIST, MNIST, and GTSR) with varying complexity, image characteristics, and class distributions demonstrate NC-FLD’s robustness to different data domains. Furthermore, the defense mechanism’s effectiveness against multiple attack types (label-flipping, sign-flipping, and LIE attacks) indicates generalization across threat models.
However, several theoretical limitations could potentially constrain generalization in certain scenarios. Firstly, extremely deep architectures with hundreds of layers may exhibit more complex interactions between neurons, potentially requiring hierarchical neuron selection strategies beyond our current implementation. Secondly, while our approach performs well on moderate non-IID distributions, extremely skewed distributions where clients have access to only a single class may challenge our detection mechanisms, as gradient patterns from legitimate training on such restricted data could resemble poisoning patterns. Finally, the effectiveness may be limited in scenarios where malicious participants significantly outnumber honest ones (>50%), as our approach assumes majority-honest participation.
Despite these limitations, the neuron-centric paradigm offers a flexible foundation that can be extended to additional learning architectures, federated optimization algorithms, and emerging attack variants through minor adjustments to the neuron selection criteria and classification boundaries.
5. Discussion and Conclusions
As a result of the experimental evaluations, it has been proven that the performance of our proposed NC-FLD method reaches a high success rate. The superior defense of our neuron-centric approach can be explained through the trajectory of the learning, where neurons with high impact represent critical pathways in the model’s learning process and present significantly higher variance under attack conditions. By dynamically focusing on these high-impact neurons, NC-FLD essentially concentrates on the most vulnerable points susceptible to poisoning, providing more sensitive detection than approaches analyzing the entire parameter space.
Although exposed to different attacks on three different datasets, the proposed defense algorithm generally exhibited minimal differences in model accuracy. The computational overhead varies by classifier choice, with OC-SVM introducing approximately 25% additional processing time compared to standard FedAvg, while k-NN offers a more lightweight alternative with only 10% increased computation but reduced detection accuracy under high-adversarial rates. These trade-offs present deployment options across the security-efficiency spectrum based on specific threat models and resource constraints.
Despite the promising results, several limitations of the NC-FLD approach warrant discussion. Firstly, the scalability to very large federated systems with thousands of participants remains to be thoroughly validated. While our theoretical analysis suggests computational complexity scales linearly with participant count, practical deployment challenges may emerge in vast, distributed environments. Moreover, the effectiveness on extremely deep neural architectures (e.g., transformers with hundreds of layers) requires further investigation, as the relationship between neuron behavior and poisoning attacks may become more complex in such settings. Our current experiments focused on CNN and ResNet architectures, and the patterns observed might differ in other architectural paradigms.
In future studies, we will investigate reinforcement learning techniques for adaptive neuron selection that can automatically adjust selection criteria based on observed attack patterns. We also aim to develop a hybrid defense mechanism combining our neuron-centric approach with differential privacy guarantees, providing comprehensive protection against both poisoning and privacy attacks. Additionally, we plan to extend our framework to handle sequential attack scenarios by incorporating temporal analysis of neuron behavior across multiple federated rounds. These advances should further increase the rate of accurate gradient detection and have a positive impact on model success, particularly in non-IID data distributions.
The aim is to develop an effective defense algorithm against poisoning and new attacks in non-IID data distribution by using other classification algorithms in the classification of gradients.