Abstract
Federated learning (FL) enables collaborative model training across distributed clients while preserving data privacy by sharing only local parameters. However, this decentralized setup also introduces new vulnerabilities, particularly to backdoor attacks, in which compromised clients inject poisoned data or gradients to manipulate the global model. Existing defenses rely on the global server to inspect model parameters, while mitigating backdoor effects locally remains underexplored. To address this, we propose a decoupled contrastive learning–based defense. We first train a backdoor model using poisoned data, then extract intermediate features from both the local and backdoor models, and apply a contrastive objective to reduce their similarity, encouraging the local model to focus on clean patterns and suppress backdoor behaviors. Crucially, we leverage an implicit symmetry between clean and poisoned representations, which are structurally similar but semantically different; disrupting this symmetry helps disentangle benign and malicious components. Our approach requires no prior attack knowledge or clean validation data, making it suitable for practical FL deployments.
1. Introduction
By enabling decentralized model optimization without sharing raw data, federated learning (FL) effectively preserves the privacy of participating clients and enhances the capability for collaborative learning across devices [1]. However, this architecture also creates vulnerabilities that can be exploited by backdoor attacks, in which compromised clients introduce malicious modifications to manipulate the behavior of the global model and undermine the trustworthiness of the entire system. Existing studies indicate that backdoor attacks in FL mainly arise from two types of strategies: the first involves manipulating local training data by injecting samples embedded with specific triggers, thereby implanting hidden malicious behaviors without affecting the model’s performance on the primary task [2,3,4]; the second involves directly tampering with model updates after local training, such as forging or modifying gradients or model parameters to achieve malicious objectives [5,6,7,8]. Both strategies can stealthily affect the behavior of the aggregated global model, posing severe security threats to federated learning systems.
Existing defenses against backdoor threats in federated learning can be broadly categorized into two main approaches: one focuses on enhancing the aggregation mechanism to resist malicious updates [9,10,11], while the other aims to identify and filter out suspicious client models [12,13,14,15], primarily relying on the analysis of plaintext model parameters. These methods attempt to detect potential backdoor attacks by identifying abnormal changes in model weights or gradients. Typically, such approaches assume that the aggregation server is honest and trustworthy [16,17,18,19], or they rely on access to a large amount of clean data on the server side [13,20]. However, these assumptions pose significant risks and contradict the core principle of data localization in federated learning. Prior studies [21,22] have shown that adversaries can exploit model parameters or gradient updates to infer local data of clients, thereby compromising data privacy. Although homomorphic encryption can enhance privacy protection by concealing client data, it introduces new challenges—most notably, the server loses the ability to verify whether received model updates have been maliciously tampered with. This highlights a fundamental trade-off between user privacy and system-level security. Consequently, there is an urgent need to develop robust defense strategies tailored to the unique characteristics of federated learning environments.
We propose a contrastive learning–based defense for federated learning that disentangles clean and backdoor feature representations. In contrast to many existing defense approaches, the proposed method operates without relying on prior knowledge of the attack or the availability of clean validation data, thereby enhancing its practicality and generalizability across diverse federated learning scenarios. Guided by symmetry theory—which suggests clean and poisoned features share structural similarities despite distinct semantics—we design contrastive objectives to break this symmetry and achieve effective separation. Potentially compromised clients are modeled with specialized backdoor detectors, while benign models use contrastive loss, sample reweighting, and the information bottleneck principle to reinforce clean feature learning. This symmetry-informed strategy improves robustness, preserves user privacy by avoiding server-side weight inspection, and provides a theoretically grounded, interpretable defense. Figure 1 illustrates the workflow. All colors are used for visual distinction only and have no practical meaning.
Figure 1.
Overview of DCL. The bottom right section (green box) represents the training process of benign clients. The red box illustrates the main steps of our local defense mechanism against backdoor attacks initiated by malicious clients, after which the malicious clients submit the locally defended updates to the global server. The blue box illustrates how the central server integrates model parameters collected from distributed clients to refresh the global model collaboratively.
Our Contributions. To summarize, our key theoretical insights and experimental findings include the following:
- We explore how backdoor-related and benign representations interact by analyzing their separability in the feature space. Building on this insight, we propose a novel defense strategy named Decoupled Contrastive Learning (DCL), specifically crafted to counter backdoor threats in federated learning systems. By designing targeted augmentation methods and crafting informative positive–negative sample pairs, DCL encourages the separation of malicious and benign representations. This method enables local models to focus on extracting clean features while minimizing the influence of backdoor signals in the learned embedding space.
- We conducted evaluations on the MNIST, Fashion-MNIST, and CIFAR-10 datasets to validate the performance of our method. Results demonstrate that it reliably suppresses the attack success rate (ASR) below 16% across both transient and sustained backdoor threat settings, all while preserving the primary task accuracy (ACC) with negligible impact.
- We develop a defense-aware adaptive attack strategy and demonstrate that DCL remains robust under such challenging conditions.
- By comparing the computational overhead per training round with and without DCL, we verify that the proposed scheme enhances security without introducing a significant computational burden.
- Unlike many existing defenses, our method does not require prior knowledge of the attack or access to clean validation data, and it avoids inspecting client model weights on the server side, which helps better preserve user privacy, making it broadly applicable.
The organization of the remainder of this paper is as follows: Section 2 introduces the background and related works. Section 3 presents the threat model. Section 4 details the design and implementation of the proposed method. Section 5 provides the experimental setup and results. Finally, Section 6 concludes the paper and outlines future work.
2. Related Work
2.1. Backdoor Attacks
Data-Poisoning Attacks. Data-poisoning attacks are a prevalent form of backdoor threat in federated learning, where adversaries intentionally tamper with local training data to cause the global model to behave incorrectly. For example, Shejwalkar [4] introduced an attack based on flipping labels. Expanding on this, Tolpegin [3] examined how variables such as the fraction of malicious participants, the specific class changes, and the timing of their involvement influence the attack’s success. Additionally, Xie [2] broke down the trigger into several localized patterns distributed across different compromised clients, enhancing the attack’s stealthiness and making detection more challenging.
Model-Poisoning Attacks. Adversarial participants may intentionally manipulate local training to inject hidden triggers into the global model, thereby corrupting its behavior without raising suspicion. Bagdasaryan [5] introduced an approach where adversaries locally train a model with embedded backdoors that subtly align with the global model’s parameters, allowing for seamless replacement without triggering anomalies. Fang [6] constructed deceptive gradient updates that can evade existing defense mechanisms, enabling backdoor injection under the guise of legitimate learning. Bhagoji [8] utilized an alternating minimization method to stealthily optimize attack objectives while maintaining plausible update behavior. Sun [7] presented the Distance-Aware Attack, which strategically adjusts feature representations to enhance the effectiveness of targeted poisoning without raising suspicion.
2.2. Backdoor Defense
Robust Aggregation. Robust aggregation techniques are designed to reduce the influence of adversarial updates on the federated global model. Instead of relying on standard averaging, these methods apply aggregation strategies that are resilient to manipulated contributions. For instance, Yin's coordinate-wise trimmed mean [9] enhances resilience by discarding extreme values during aggregation. Pillutla [10] adopted the geometric median as a robust central tendency measure to limit the effect of poisoned updates. To defend against corrupted updates, Blanchard [11] introduced Krum, an aggregation strategy that selects a client model whose parameters exhibit minimal discrepancy from the majority, thus reducing the influence of outliers. Zhang et al. [18] designed a fine-grained removal strategy to eliminate malicious weights. Huang et al. [16] calibrated anomalous parameters using Fisher information to enhance robustness.
Anomalous Model Detection. Anomalous Model Detection, often known as clustering-based detection, is a widely used strategy for defending against backdoor threats in federated learning. For example, Ding et al. [17] proposed suppressing backdoor behavior by adjusting the weight updates of low-activation neurons. Lin et al. [19] identified potential backdoored models by analyzing layer-wise discrepancies in a fine-grained manner. Shen [12] explored grouping the model updates into clusters based on similarity, allowing suspicious updates that deviate from the main distribution to be identified and flagged as potential threats. Fung [13] introduced FoolsGold, which mitigates adversarial influence by evaluating the consistency patterns in update behaviors across clients. Muñoz-González [14] proposed an adaptive aggregation framework that incorporates statistical consistency checks based on gradient directions to exclude suspicious updates. In a similar vein, Wang [15] designed FLARE, which inspects latent feature distributions to pinpoint and suppress potential poisoning attacks.
2.3. Contrastive Learning
Given the concealed characteristics of backdoor triggers, the model often fails to effectively separate malicious cues from benign representations, thereby increasing the likelihood of misclassification. In contrast, humans are less affected by such perturbations because they tend to focus on the direct associations of objects rather than irrelevant factors [23]. Inspired by disentangled representation learning [24,25,26], we aim to enable the model to learn representations of only the critical features while discarding backdoor perturbations, thereby enhancing the robustness of federated learning. Contrastive learning has shown substantial success in learning discriminative features by guiding the model to distinguish between semantically similar and dissimilar inputs [27,28,29]. In computer vision, approaches like SimCLR [30] and MoCo [31] generate associated input pairs by applying varying transformations to a single data point, treating them as aligned samples. In contrast, samples originating from unrelated instances are treated as divergent, helping the model build a more structured feature representation. This paradigm has proven effective in enhancing visual understanding tasks such as classification and detection. Similarly, in the NLP domain, SimCSE [32] simulates semantically connected text pairs by perturbing a sentence through strategies like token masking or reordering, which improves sentence-level embedding quality. CCL [33] effectively distinguishes causal features from spurious correlations by incorporating causality-guided contrastive learning, thereby improving the model’s generalization and robustness in complex semantic tasks. In the time-series domain, TS-TCC [34] addresses the issues of false negatives and class imbalance in time-series data by leveraging contrastive learning methods to effectively enhance the model’s representation learning capability under imbalanced and noisy conditions.
Moreover, contrastive learning has also shown distinct strengths in achieving feature disentanglement. For example, Xuan [35] introduced a disentangled contrastive learning framework to tackle the issue of class imbalance in long-tailed distributions. Their method formed positive and negative pairs in a way that guided the model to distinguish inter-class features while enhancing the representation of underrepresented classes. Inspired by this capability, we extend contrastive learning to the realm of federated learning for mitigating backdoor threats. By carefully designing positive and negative feature pairs, the proposed strategy promotes the separation of malicious features from clean representations in the latent space, thus reducing the risk of backdoor manipulation.
3. Threat Model
In this study, we consider a federated learning system consisting of 100 clients. In the single-round attack scenario, 10% of the clients are compromised as malicious participants, while in the continuous attack scenario, 40% of the clients persistently participate in multiple training rounds, injecting malicious updates. These malicious clients attempt to manipulate the global model by poisoning their local training data with backdoor triggers. Specifically, in the single-round attack, malicious clients directly replace their local updates in one training round to rapidly implant the backdoor; in the continuous attack, they continuously inject backdoor-related updates over multiple rounds to progressively strengthen the backdoor effect. Throughout the training process, the server neither accesses the clients’ raw data nor inspects the specific model parameters or gradients, relying solely on aggregating local updates to train the global model. This setup adheres to the privacy-preserving principles of federated learning while also increasing the challenges in designing effective defense strategies.
4. Methodology
In this part, we provide a theoretical foundation for our method and assess its effectiveness within a standard federated learning backdoor scenario. Prior studies [15] have shown that the second-to-last layer of a model captures informative, high-level representations that reflect the model's focus. Building on this understanding, we incorporate an auxiliary network to assist local training. If the client is benign, this additional model enhances the extraction of relevant features, thereby strengthening the quality of local representations. If the client is malicious, the auxiliary model is intentionally trained as a backdoored model. Then, for the sample pairs held by the malicious client, we extract the penultimate-layer outputs of the backdoor model and the local model as R (backdoor features) and Z (clean features), respectively. During training, contrastive learning is employed to decouple R from Z, followed by a sample-reweighting strategy that guides the local model toward learning clean features, thereby mitigating the backdoor effect. Upon completion, the purified local model is used in global aggregation. The subsequent sections elaborate on each implementation step in malicious clients.
4.1. Backdoor Model
Building on previous research [36], this paper recognizes that backdoor samples are learned more easily by the model. To further reinforce the backdoor effect, the training of the backdoor model is terminated immediately after it converges on the backdoor samples, and it is then switched to evaluation mode during the training of the local model. At this stage, the backdoor model has exclusively learned the backdoor features and has yet to converge on the clean samples.
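A minimal sketch of this training schedule, assuming a PyTorch-style loop over the trigger-stamped samples and an illustrative convergence threshold (the loader, threshold, and epoch cap are assumptions, not the authors' exact settings):

```python
def train_backdoor_model(backdoor_model, backdoor_loader, optimizer, loss_fn,
                         max_epochs=100, threshold=0.05):
    """Train only until the backdoor samples are fit, then freeze the model in eval mode."""
    backdoor_model.train()
    for _ in range(max_epochs):
        total = 0.0
        for x, y in backdoor_loader:          # trigger-stamped samples with the target label
            optimizer.zero_grad()
            loss = loss_fn(backdoor_model(x), y)
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / len(backdoor_loader) < threshold:
            break                             # converged on backdoor samples; stop before fitting clean data
    backdoor_model.eval()                     # kept in evaluation mode while the local model trains
    return backdoor_model
```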
4.2. Local Model
Building on the decoupling concept introduced by Huang [37], this work formulates the loss function by integrating a variational information bottleneck approach alongside sample weighting. The loss function is expressed as three parts:
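A sketch of one plausible formulation, writing λ₁ and λ₂ for assumed trade-off weights; the three mutual-information terms correspond to ➀, ➁, and ➂ described next:

\[
\mathcal{L} \;=\; I(X;Z) \;-\; \lambda_1\, I(Z;Y) \;+\; \lambda_2\, I(Z;R)
\]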
Here, I(·;·) denotes mutual information. The terms labeled ➀ and ➁ together form the information bottleneck loss. Term ➀ limits irrelevant information in the input that does not contribute to the label, helping to filter out noise from unrelated features. Term ➁ encourages the latent representation Z to retain the key information needed for accurate label prediction. Term ➂ measures the dependence between the backdoor feature R and the clean feature Z. Reducing the mutual information between Z and R decreases their dependency, allowing Z to focus on extracting features critical for the task. The detailed calculation process is presented below.
Term ➀. To constrain the irrelevant and redundant information in the input that does not affect the label, thus aiding in filtering noise from unrelated features, I(X;Z) can be rewritten based on the definition of mutual information as follows:
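One standard way to write this is:

\[
I(X;Z) \;=\; \iint p(x,z)\,\log\frac{p(z\mid x)}{p(z)}\,\mathrm{d}x\,\mathrm{d}z
\]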
In real-world federated learning settings, calculating the marginal distribution p(z) is challenging. Prior studies [38] suggest approximating p(z) with a variational distribution q(z). The Kullback–Leibler divergence quantifies the difference between p(z) and q(z), calculated as follows:
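In its standard integral form:

\[
\mathrm{KL}\big(p(z)\,\|\,q(z)\big) \;=\; \int p(z)\,\log\frac{p(z)}{q(z)}\,\mathrm{d}z
\]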
Since the Kullback–Leibler divergence is non-negative, the following inequality holds:
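Spelled out, with p(z) and q(z) as above:

\[
\int p(z)\,\log p(z)\,\mathrm{d}z \;\ge\; \int p(z)\,\log q(z)\,\mathrm{d}z
\]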
From the above inequality, the relationship between p(z) and q(z) is established. Consequently, an upper bound on Equation (2) is obtained:
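Combining the definition of I(X;Z) with the inequality above gives:

\[
I(X;Z) \;\le\; \iint p(x,z)\,\log\frac{p(z\mid x)}{q(z)}\,\mathrm{d}x\,\mathrm{d}z \;=\; \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\big(p(z\mid x)\,\|\,q(z)\big)\right]
\]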
Assuming the posterior p(z|x) follows a Gaussian distribution with mean μ and diagonal covariance matrix Σ, and the prior q(z) is a standard normal distribution with zero mean and identity covariance matrix I, the KL divergence between the two distributions can be formulated as follows:
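Under these assumptions, the standard closed form is:

\[
\mathrm{KL}\big(\mathcal{N}(\mu,\Sigma)\,\|\,\mathcal{N}(0,I)\big) \;=\; \frac{1}{2}\Big(\lVert\mu\rVert_2^2 + \mathrm{tr}(\Sigma) - \log\det\Sigma - D\Big)
\]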
Here, ‖μ‖₂² represents the squared ℓ2 norm of the mean vector, tr(Σ) refers to the sum of the diagonal elements of the covariance matrix, and log det Σ denotes the logarithm of the determinant of the covariance matrix. The symbol D corresponds to the dimensionality of the feature vector.
When the covariance matrix is set to zero, the posterior distribution becomes deterministic, causing z to equal μ and yielding a fixed embedding. In this scenario, reducing the mutual information between Z and X is equivalent to applying an ℓ2-norm regularization on z.
Term ➁. The goal of maximizing I(Z;Y) is to ensure that the latent representation Z effectively encodes information relevant to predicting the label Y. This helps the model focus on extracting useful, task-relevant clean features, thereby improving classification performance and defense effectiveness, while reducing the interference of backdoor features on the model's decisions. Since calculating I(Z;Y) directly is challenging, we approximate it by minimizing the cross-entropy loss. To achieve this, a sample-weighted cross-entropy loss is applied to train the local model, where features from the intermediate layers of the backdoor model are used only for weighting purposes, without backpropagation through the backdoor model. The weight calculation is expressed as follows:
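One plausible form, consistent with the behavior described next, where ℓ_b(x_i) denotes the backdoor model's per-sample loss (the exact expression is an assumption):

\[
w_i \;=\; \frac{\ell_b(x_i)}{\max_j \ell_b(x_j) + \epsilon}
\]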
For training samples, a low loss value on the backdoor model results in a weight near 0, while a high loss drives the weight toward 1. A small constant ε is included to prevent division errors caused by zero denominators. This weighting mechanism directs the local model toward emphasizing clean feature learning and diminishing the influence of backdoor feature extraction. It is important to note that the features of the auxiliary model are computed only during the forward pass and do not participate in backward propagation.
Term ➂. Using the connection between mutual information and entropy, I(Z;R) can be represented as follows:
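That is:

\[
I(Z;R) \;=\; H(Z) - H(Z\mid R) \;=\; H(R) - H(R\mid Z)
\]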
Minimizing the mutual information aims to separate the clean features Z from the backdoor features R, helping the model focus on extracting task-relevant clean key information while suppressing interference from backdoor features, thereby enhancing the model's defense capability against backdoor attacks. We utilize contrastive learning to improve the model's feature space by contrasting positive and negative pairs. Positive pairs consist of z and z⁺, where z⁺ is obtained by applying data augmentation to the input x to form x̃ and then passing x̃ through the local model to extract its feature. Negative pairs are constructed from the local model's feature z and the backdoor model's feature r. Figure 2 illustrates the overall contrastive learning framework. All colors are used for visual distinction only and have no practical meaning.
Figure 2.
The framework of contrastive learning. By constructing a contrastive loss function, the distance between the positive sample pair z and z⁺ in the feature space is minimized, where z⁺ is obtained by applying data augmentation to the input sample x to get x̃ and then extracting its representation through the local model. At the same time, the negative sample pair z and r is pushed apart, thereby achieving the goal of decoupling clean features from backdoor features.
Within this framework, the local model and the backdoor model produce intermediate representations z and r, respectively. These outputs are fed into the contrastive learning component to compute the contrastive loss. The loss function employed is based on the InfoNCE formulation, defined as follows:
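A sketch in standard InfoNCE form, with z⁺ denoting the augmented-view feature and r_j the backdoor features serving as negatives (notation as introduced above):

\[
\mathcal{L}_{\mathrm{InfoNCE}} \;=\; -\log\frac{\exp\!\big(\mathrm{sim}(z, z^{+})/\tau\big)}{\exp\!\big(\mathrm{sim}(z, z^{+})/\tau\big) + \sum_{j=1}^{n}\exp\!\big(\mathrm{sim}(z, r_{j})/\tau\big)}
\]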
Here, sim(·,·) refers to the similarity measure between sample pairs, commonly calculated using cosine similarity; sim(z, z⁺) denotes the similarity score for the positive pair, whereas sim(z, r_j) corresponds to that of the negative pairs; the temperature parameter τ regulates the sharpness of the similarity distribution, while n represents the total count of samples.
It is worth noting that the InfoNCE loss serves as a lower bound on the mutual information between positive pairs [39,40]; thus, maximizing the similarity of positive pairs implicitly maximizes this mutual information. Meanwhile, by contrasting with negative samples , the method encourages the dissimilarity between Z and R, effectively reducing the mutual information and promoting disentanglement. Although InfoNCE does not theoretically guarantee complete independence between Z and R, prior works [41] demonstrate that with sufficient negative samples and appropriate temperature settings, it effectively enforces feature separation in practice. Therefore, we consider the use of the InfoNCE-based contrastive loss as a practical and efficient approach to approximate the desired feature disentanglement in our framework.
Through optimizing this objective, the model increases similarity between positive pairs to strengthen the consistency of clean features and reduces similarity among negative pairs to lower the mutual information between Z and R. Finally, integrating contrastive learning with sample weighting, the loss function for the local model is formulated as follows:
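A sketch of one plausible composition, writing the compression regularizer in its deterministic ℓ2 form from above and β for the regularization coefficient discussed next (the exact weighting is an assumption):

\[
\mathcal{L}_{\mathrm{local}} \;=\; \mathcal{L}_{\mathrm{wce}} \;+\; \mathcal{L}_{\mathrm{InfoNCE}} \;+\; \beta\,\lVert z\rVert_2^2
\]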
The regularization coefficient controls the strength of the compression term, balancing feature compression against model performance: too small a value weakens the defense, while too large a value harms accuracy. We set it to 0.1 in our experiments.
Based on the described approach, stochastic gradient descent is used to optimize the loss functions of both the backdoor model and the local model. Algorithm 1 presents the entire procedure, and Algorithm 2 describes the contrastive learning process.
| Algorithm 1 Decoupled contrastive learning for federated backdoor defense |
| Algorithm 2 Contrastive loss calculation |
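As a concrete illustration of the contrastive-loss calculation in Algorithm 2, the following PyTorch-style sketch computes the InfoNCE loss over a batch. The feature-extraction accessor (`features`), the pre-augmented view `x_aug`, and the default temperature are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(local_model, backdoor_model, x, x_aug, tau=0.05):
    """InfoNCE-style loss: pull (z, z_pos) together, push z away from backdoor features r."""
    z = local_model.features(x)            # clean features Z (penultimate layer, assumed accessor)
    z_pos = local_model.features(x_aug)    # features of the augmented view (positive pair)
    with torch.no_grad():                  # backdoor model is forward-only; no gradients flow through it
        r = backdoor_model.features(x)     # backdoor features R (negatives)

    sim_pos = F.cosine_similarity(z, z_pos, dim=1) / tau                        # shape (B,)
    sim_neg = F.cosine_similarity(z.unsqueeze(1), r.unsqueeze(0), dim=2) / tau  # shape (B, B)

    # Softmax cross-entropy with the positive logit in column 0 implements InfoNCE.
    logits = torch.cat([sim_pos.unsqueeze(1), sim_neg], dim=1)
    targets = torch.zeros(z.size(0), dtype=torch.long, device=z.device)
    return F.cross_entropy(logits, targets)
```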
5. Results
This section assesses the effectiveness of DCL against data-poisoning attacks. We compare DCL’s performance with seven leading defense techniques: Krum [11], RFA [10], Median [9], FLTrust [20], DnC [42], SDFC [16], and FLGuardian [43]. Additionally, we provide results for a baseline scenario without any defense mechanisms, denoted as “No Defense”. In addition to adopting the experimental configurations from [2,5], we also design experiments based on the setup used in our theoretical analysis to support its validity. Furthermore, we assess the performance of DCL under adaptive attack scenarios.
5.1. Experimental Settings
Our experiments follow the protocol outlined by Bagdasaryan et al. [5], guided by parameters informed by theoretical insights. The system consists of 100 clients, with 10 clients participating in each training round—6 are benign, and 4 are adversarial. Benign clients use a learning rate of 0.1, while malicious clients adopt 0.05. Training is performed with a batch size of 64 samples per client.
Datasets and DNNs: We assess our method using three widely recognized image classification benchmarks: MNIST [44], Fashion-MNIST (F-MNIST) [45], and CIFAR-10 [46]. An overview of these datasets is presented in Table 1. For the MNIST and Fashion-MNIST datasets, the SimpNet architecture [47] is utilized, while ResNet-34 [48] serves as the backbone model for CIFAR-10. To reflect the typical Non-IID distribution in federated learning, we generate data heterogeneity by sampling from a Dirichlet distribution [49] with concentration parameter α = 0.5 unless otherwise stated.
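A minimal sketch of Dirichlet-based non-IID partitioning as described above, assuming NumPy and an integer label array; the function name and seed handling are illustrative, not the exact script used:

```python
import numpy as np

def dirichlet_partition(labels, num_clients=100, alpha=0.5, seed=0):
    """Split sample indices across clients with class proportions drawn from Dir(alpha)."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))    # per-client share of class c
        cut_points = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, shard in enumerate(np.split(idx, cut_points)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices
```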
Table 1.
Dataset specifications.
Attack Setups: We adopted the BadNets [50] attack methodology as the adversarial framework in our experiments. Model optimization was performed using stochastic gradient descent (SGD) on deliberately poisoned training datasets. The proportion of poisoned samples per training batch was strictly set to 20 out of 64 for the MNIST and Fashion-MNIST datasets, and 5 out of 64 for the CIFAR-10 dataset. During training, all backdoor attacks consistently used class 2 as the target label. To provide a more intuitive illustration of the impact of backdoor attacks and the defense effectiveness of our method, we present representative poisoned samples from the MNIST dataset. As shown in Figure 3, the vanilla model is easily influenced by the trigger (located at the top-left corner), tends to prioritize learning the backdoor features, and misclassifies the poisoned samples into the target class.
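For illustration, a hedged sketch of BadNets-style batch poisoning under the configuration above; the top-left placement and target label 2 follow the setup, while the patch size and intensity are assumptions:

```python
import torch

def poison_batch(images, labels, num_poison, target_label=2, patch_size=3):
    """Stamp a square trigger in the top-left corner of the first num_poison samples and relabel them."""
    images, labels = images.clone(), labels.clone()     # expects images of shape (B, C, H, W)
    images[:num_poison, :, :patch_size, :patch_size] = images.max()   # bright top-left trigger
    labels[:num_poison] = target_label
    return images, labels

# Usage per batch: num_poison=20 of 64 samples on MNIST/F-MNIST, num_poison=5 of 64 on CIFAR-10.
```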
Figure 3.
Examples of backdoor samples in the MNIST dataset. The yellow square in the top-left corner indicates the backdoor trigger.
Evaluation Criteria: Our evaluation focuses on two key metrics: the classification accuracy (ACC), which reflects the model's performance on the main task, and the attack success rate (ASR), which measures the effectiveness of backdoor intrusions. ASR indicates how susceptible the model is to backdoor triggers, with attackers attempting to increase it while defenders strive to reduce it. Meanwhile, ACC evaluates the model's effectiveness on its primary task, which both attackers and defenders aim to preserve to avoid compromising normal operation.
5.2. Evaluation on Backdoor Mitigation
It is crucial to emphasize that backdoor attacks are executed only after the global model has reached convergence, as previous studies [2] have demonstrated that initiating poisoning from the first training round significantly hinders the model’s convergence on the primary task. In the experimental setup, two distinct attack strategies were examined: single-round and continuous attacks. In a single-round attack, the attacker participates in a single training iteration. Conversely, a continuous attack entails the attacker participating in all training rounds, which is more challenging to counteract.
Single attack: To ensure an unbiased comparison, we report the attack success rate and accuracy of our method and baseline approaches at identical training rounds. The single-step attack results are detailed in Table 2. As shown, without any defense, the ASR surpasses 80% on all three datasets, while ACC remains above 78%. Under single-step attacks, traditional defense techniques also achieve notable effectiveness since model replacement leads to abnormal gradient updates that these methods can mitigate. Our proposed DCL method lowers the ASR to under 5% across all datasets, with the drop in ACC kept within 3%. Although DCL’s performance on CIFAR-10 is marginally behind some baselines, its local defense strategy offers the advantage of minimizing privacy risks.
Table 2.
Comparison of DCL and leading defenses in terms of robustness under single-attack scenarios on the MNIST, FMNIST, and CIFAR-10 datasets with a non-IID degree of 0.5.
Continuous attack: Table 3 shows the results for continuous attack scenarios. It is clear from the table that most defense methods struggle under continuous attacks. While Krum reduces the ASR on MNIST and Fashion-MNIST to below 21%, its effectiveness on CIFAR-10 remains limited. In contrast, DCL significantly decreases the ASR to very low levels, with the corresponding drop in ACC kept within acceptable bounds. Specifically, the ASR on MNIST falls below 1%, with ACC decreasing by less than 1%. DCL reduces the attack success rate to under 16% on Fashion-MNIST and below 11% on CIFAR-10, with accuracy decreasing by less than 6%. Unlike single-round attacks, continuous attacks introduce backdoors progressively through repeated malicious updates, making them harder to detect and defend. DCL addresses this by isolating backdoor features from clean representations during training, minimizing the impact of poisoned data on local models. This enhances the global model’s resilience, making it more resistant to both one-time and ongoing attacks.
Table 3.
Comparison of DCL and leading defenses in terms of robustness under continuous attack scenarios on the MNIST, FMNIST, and CIFAR-10 datasets with a non-IID degree of 0.5.
5.3. Hidden-Feature Visualization
In Figure 4, we leverage t-SNE [51] to visualize the latent space, providing a comprehensive understanding of the proposed method. A data-poisoning attack is conducted on the CIFAR-10 dataset. Figure 4a shows the disentanglement capability of the latest baseline method FLGuardian, where the backdoor and clean features largely overlap, indicating poor disentanglement. In Figure 4b, the distributions of R and Z after training by our proposed method are depicted, revealing a clear separation between the two. This observation confirms the successful disentanglement of features achieved by our approach. Figure 4c,d present the t-SNE visualizations of R and Z, respectively, where samples with different labels are represented by distinct colors. Notably, backdoor samples form distinct clusters within R, indicating that the backdoor features have been effectively captured by the model . Conversely, in Z, backdoor samples are closely aligned with clean samples, demonstrating the effectiveness of DCL in mitigating backdoor attacks and improving the robustness of federated learning systems.
Figure 4.
Hidden-feature disentanglement using t-SNE: (a) FLGuardian feature space; (b) feature space after training with DCL; (c) backdoor model features; (d) clean model features.
5.4. Consequences of Different Non-IID Degrees
To simulate varying degrees of Non-IID data in federated learning, we employ the Dirichlet distribution with hyperparameter α. We assess the effect of a single attack on the CIFAR-10 dataset under both “No Defense” and DCL defense settings. As depicted in Figure 5, without any defense, an increase in α corresponds to stronger backdoor attack performance. In contrast, our DCL method consistently maintains the ASR below 6% across different Non-IID levels, while ensuring stable classification accuracy on benign data.
Figure 5.
Effect of varying Non-IID levels on DCL performance using CIFAR-10 with 10% malicious clients. Left: main task accuracy (ACC); Right: attack success rate (ASR).
5.5. Results of Adaptive Attacks
To evaluate DCL’s robustness against potential adaptive attacks, we design targeted countermeasures and assess its performance in this scenario. The adaptive attack specifically targets DCL’s two defense mechanisms: (1) for feature decoupling, attackers employ dynamic feature obfuscation by minimizing the Wasserstein distance (beyond cosine similarity) between backdoor and clean features in latent space; (2) for contrastive learning, they implement adversarial sample weighting to disguise poisoned samples as high-confidence clean samples during local training. As shown in Table 4, DCL maintains robust defense performance under continuous adaptive attacks across all datasets.
Table 4.
Adaptive attack evaluation.
5.6. Comparison of Computational Overhead
During the local training phase, the proposed method introduces an auxiliary model combined with a contrastive learning mechanism, which inevitably incurs some computational overhead. To evaluate this overhead, experiments were conducted on the MNIST dataset with a non-IID data partition following a Dirichlet distribution parameterized by 0.5. The proposed method was compared against the standard federated averaging algorithm (FedAvg) without defense, focusing on the local training time per communication round. As shown in Table 5, under the single-attack scenario, the average local training time increases by only approximately 7.6% compared to FedAvg. In the more challenging continuous attack scenario, the additional overhead remains controlled at around 9%, without significant growth. These results indicate that the proposed method enhances security without introducing noticeable computational burden. However, we also acknowledge that in large-scale deployments or resource-constrained federated learning environments, requiring each client to train both a contrastive learning model and a variational inference model simultaneously may limit the feasibility of the proposed method. In the future, the algorithm design could be further improved by sharing backdoor model parameters or caching intermediate computation results, thereby enhancing its scalability and practicality in complex real-world scenarios.
Table 5.
Computational overhead comparison under different attack scenarios on the MNIST dataset with a non-IID degree of 0.5.
5.7. Impact of Temperature Parameter on DCL
To investigate the effect of the temperature parameter τ in contrastive learning, we conducted experiments on CIFAR-10 under a single-shot attack scenario with τ set to 0.05, 0.1, and 0.5. The results in Table 6 show that smaller τ values sharpen the contrastive loss, enhancing feature separation between clean and backdoor samples and thus improving defense. Larger τ values smooth the loss and reduce discriminability, weakening defense. Accordingly, we select τ = 0.05 for subsequent experiments.
Table 6.
Comparison of defense effectiveness under different temperature parameters in a single-shot attack scenario on the CIFAR-10 dataset with a non-IID degree of 0.5.
5.8. Ablation Study of Contrastive Learning and Information Bottleneck
To evaluate the contributions of each component in our defense framework, we conducted an ablation study on CIFAR-10 under a single-shot attack. We compared three settings: contrastive learning only (CL-only), information bottleneck only (IB-only), and the combined method (DCL). As shown in Table 7, both CL-only and IB-only significantly reduced the attack success rate (ASR) compared to the baseline, with CL-only achieving a lower ASR of 15.86%. This is because contrastive learning explicitly disentangles features, helping to identify backdoor features even without IB. In contrast, IB alone cannot fully separate clean and backdoor features, resulting in higher ASR. The combined method further reduced ASR to 4.28% while maintaining high clean accuracy, demonstrating that CL and IB complement each other by compressing redundant information and explicitly disentangling features.
Table 7.
Ablation study of CL and IB components in the proposed defense framework.
We also compared the feature disentanglement capabilities of IB-only and CL-only in more detail. Figure 6a illustrates the disentanglement performance of IB-only, where backdoor and clean features largely overlap. This is mainly because the Information Bottleneck strategy focuses on compressing irrelevant information and extracting key predictive information, resulting in weaker disentanglement ability in the feature space. Figure 6b shows the disentanglement effect of CL-only, where clean and backdoor features are more effectively separated, though some overlap still exists. This is because contrastive learning helps improve the discriminability and separability of features by pulling together samples of the same class and pushing apart samples of different classes.
Figure 6.
(a) Feature space learned using only the information bottleneck; (b) feature space learned using only contrastive learning.
6. Conclusions
We proposed a federated backdoor defense strategy called decoupled contrastive learning (DCL). Unlike traditional methods that rely on inspecting model updates to detect backdoor attacks, which often require access to clients' local data or gradients and risk privacy breaches, DCL implements defense locally, safeguarding data privacy while significantly lowering backdoor attack success rates. Our approach outperforms existing defense techniques in both single-round and continuous attack scenarios. From a theoretical standpoint, DCL utilizes feature disentanglement combined with contrastive learning to strengthen the model's capacity to extract clean features and suppress backdoor-related ones, thereby improving resilience against diverse attacks. This framework also has potential for extension to other fields, such as natural language processing, supporting its applicability in multimodal tasks. However, the current study validates the proposed method primarily on two architectures, SimpNet and ResNet-34. While these models are representative to some extent, the limited coverage of model types remains a constraint. Moreover, our method is designed for static backdoor attacks; its generalizability to dynamic and more sophisticated attacks requires further investigation. In future work, we plan to extend DCL to a wider range of mainstream and structurally diverse architectures, further optimize computational efficiency to enhance the scalability of the method (for example, by sharing the parameters of the backdoor model or caching intermediate computation results), and strengthen its robustness against more sophisticated backdoor strategies (e.g., dynamic attacks). Additionally, we will focus on addressing technical and resource constraints, integrating advanced feature disentanglement defenses, expanding experimental baselines, and enhancing the effectiveness and scope of our research. These efforts aim to improve the practicality and generalizability of our approach and better adapt it to diverse federated learning application scenarios.
Author Contributions
All authors contributed to the conceptualization and methodology of the study. J.C. and T.Z. were responsible for data collection and experimental analysis; W.W. and J.W. conducted model training and results validation; J.C. and M.L. jointly drafted the initial manuscript; Y.Z. was responsible for literature review and visualization. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Open Foundation of Key Laboratory of Cyberspace Security, Ministry of Education [KLCS20240210] and the Fundamental Research Funds for the Central Universities [3282023012, 3282025041].
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.
Conflicts of Interest
The authors declare that they have no competing interests.
References
- Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
- Xie, C.; Huang, K.; Chen, P.Y.; Li, B. DBA: Distributed backdoor attacks against federated learning. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Tolpegin, V.; Truex, S.; Gursoy, M.E.; Liu, L. Data poisoning attacks against federated learning systems. In Proceedings of the European Symposium on Research in Computer Security (ESORICS), Guildford, UK, 14–18 September 2020; Springer: Cham, Switzerland, 2020; pp. 480–501. [Google Scholar]
- Shejwalkar, V.; Houmansadr, A.; Kairouz, P.; Ramage, D. Back to the drawing board: A critical evaluation of poisoning attacks on production federated learning. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1354–1371. [Google Scholar]
- Bagdasaryan, E.; Veit, A.; Hua, Y.; Estrin, D.; Shmatikov, V. How to backdoor federated learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2020), Palermo, Italy, 3–5 June 2020; PMLR: Cambridge, MA, USA, 2020; pp. 2938–2948. [Google Scholar]
- Fang, M.; Cao, X.; Jia, J.; Gong, N. Local model poisoning attacks to Byzantine-robust federated learning. In Proceedings of the 29th USENIX Security Symposium (USENIX Security 20), Boston, MA, USA, 12–14 August 2020; pp. 1605–1622. [Google Scholar]
- Sun, Y.; Ochiai, H.; Sakuma, J. Semi-targeted model poisoning attack on federated learning via backward error analysis. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–8. [Google Scholar]
- Bhagoji, A.N.; Chakraborty, S.; Mittal, P.; Calo, S.B. Analyzing federated learning through an adversarial lens. In Proceedings of the International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; PMLR: Cambridge, MA, USA, 2019; pp. 634–643. [Google Scholar]
- Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; PMLR: Cambridge, MA, USA, 2018; pp. 5650–5659. [Google Scholar]
- Pillutla, K.; Kakade, S.M.; Harchaoui, Z. Robust aggregation for federated learning. IEEE Trans. Signal Process. 2022, 70, 1142–1154. [Google Scholar] [CrossRef]
- Blanchard, P.; El Mhamdi, E.M.; Guerraoui, R.; Stainer, J. Machine learning with adversaries: Byzantine tolerant gradient descent. Adv. Neural Inf. Process. Syst. 2017, 30, 118–128. [Google Scholar]
- Shen, S.; Tople, S.; Saxena, P. Auror: Defending against poisoning attacks in collaborative deep learning systems. In Proceedings of the 32nd Annual Conference on Computer Security Applications (ACSAC 2016), Los Angeles, CA, USA, 5–9 December 2016; pp. 508–519. [Google Scholar]
- Fung, C.; Yoon, C.J.M.; Beschastnikh, I. The limitations of federated learning in sybil settings. In Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), San Sebastián, Spain, 14–18 October 2020; pp. 301–316. [Google Scholar]
- Muñoz-González, L.; Co, K.T.; Lupu, E.C. Byzantine-robust federated machine learning through adaptive model averaging. arXiv 2019, arXiv:1909.05125. [Google Scholar] [CrossRef]
- Wang, N.; Xiao, Y.; Chen, Y.; Zheng, Z.; Zhang, Q. FLARE: Defending federated learning against model poisoning attacks via latent space representations. In Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security (ASIA CCS 2022), Taipei, Taiwan, 17–21 May 2022; pp. 946–958. [Google Scholar]
- Huang, W.; Ye, M.; Shi, Z.; Chen, W.; Zhang, Y. Fisher calibration for backdoor-robust heterogeneous federated learning. In Proceedings of the European Conference on Computer Vision (ECCV 2024), Zurich, Switzerland, 23–27 September 2024; Springer Nature: Cham, Switzerland, 2024; pp. 247–265. [Google Scholar]
- Ding, B.; Yang, P.; Huang, S.J. FLAIN: Mitigating backdoor attacks in federated learning via flipping weight updates of low-activation input neurons. In Proceedings of the 2025 International Conference on Multimedia Retrieval (ICMR 2025), Tokyo, Japan, 15–19 April 2025; pp. 219–227. [Google Scholar]
- Zhang, H.; Li, X.; Xu, M.; Li, J.; Wang, Y. BADFL: Backdoor attack defense in federated learning from local model perspective. IEEE Trans. Knowl. Data Eng. 2024, 36, 5661–5674. [Google Scholar] [CrossRef]
- Lin, Y.; Liao, Y.; Wu, Z.; Chen, Y. Mitigating backdoors in federated learning with FLD. In Proceedings of the 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), Shanghai, China, 12–14 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 530–535. [Google Scholar]
- Cao, X.; Fang, M.; Liu, J.; Chen, T.; Yu, J. FLTrust: Byzantine-robust federated learning via trust bootstrapping. arXiv 2020, arXiv:2012.13995. [Google Scholar]
- Zhu, L.; Liu, Z.; Han, S. Deep leakage from gradients. Adv. Neural Inf. Process. Syst. 2019, 32, 14747–14756. [Google Scholar]
- Li, Z.; Zhang, J.; Liu, L.; Yang, Z.; Chen, H. Auditing privacy defenses in federated learning via generative gradient leakage. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), New Orleans, LA, USA, 18–24 June 2022; pp. 10132–10142. [Google Scholar]
- Lake, B.M.; Ullman, T.D.; Tenenbaum, J.B.; Gershman, S.J. Building machines that learn and think like people. Behav. Brain Sci. 2017, 40, e253. [Google Scholar] [CrossRef]
- Hamaguchi, R.; Sakurada, K.; Nakamura, R. Rare event detection using disentangled representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 15–20 June 2019; pp. 9327–9335. [Google Scholar]
- Liu, D.; Cheng, P.; Zhu, H.; Zhao, J.; Liu, X. Mitigating confounding bias in recommendation via information bottleneck. In Proceedings of the 15th ACM Conference on Recommender Systems (RecSys 2021), Amsterdam, The Netherlands, 27 September–1 October 2021; pp. 351–360. [Google Scholar]
- Wang, G.; Han, H.; Shan, S.; Chen, X. Cross-domain face presentation attack detection via multi-domain disentangled representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020; pp. 6678–6687. [Google Scholar]
- Oord, A.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.; Azar, M.G.; et al. Bootstrap your own latent—A new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284. [Google Scholar]
- Wang, D.; Ding, N.; Li, P.; Zheng, H.T. Cline: Contrastive learning with semantic negative examples for natural language understanding. arXiv 2021, arXiv:2107.00440. [Google Scholar] [CrossRef]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning (ICML 2020), Vienna, Austria, 13–18 July 2020; PMLR: Cambridge, MA, USA, 2020; pp. 1597–1607. [Google Scholar]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738. [Google Scholar]
- Gao, T.; Yao, X.; Chen, D. SimCSE: Simple contrastive learning of sentence embeddings. arXiv 2021, arXiv:2104.08821. [Google Scholar]
- Jiang, T. Learn from failure: Causality-guided contrastive learning for generalizable implicit hate speech detection. In Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025), Tokyo, Japan, 20–26 August 2025; pp. 8858–8867. [Google Scholar]
- Jin, X.; Wang, J.; Ou, X.; Li, Y.; Chen, Z. Time-series contrastive learning against false negatives and class imbalance. IEEE Trans. Neural Netw. Learn. Syst. 2025, in press. [Google Scholar] [CrossRef]
- Xuan, S.; Zhang, S. Decoupled contrastive learning for long-tailed recognition. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2024), Vancouver, BC, Canada, 2–9 February 2024; Volume 38, pp. 6396–6403. [Google Scholar]
- Li, Y.; Lyu, X.; Koren, N.; Li, Y.; Yang, J. Anti-backdoor learning: Training clean models on poisoned data. Adv. Neural Inf. Process. Syst. 2021, 34, 14900–14912. [Google Scholar]
- Huang, Z.; Lin, X.; Wang, H.; Chen, Y.; Li, M. DisenQNet: Disentangled representation learning for educational questions. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD 2021), Virtual Event, Singapore, 14–18 August 2021; pp. 696–704. [Google Scholar]
- Alemi, A.A.; Fischer, I.; Dillon, J.V.; Murphy, K. Deep variational information bottleneck. arXiv 2016, arXiv:1612.00410. [Google Scholar]
- Poole, B.; Ozair, S.; Van Den Oord, A.; Alemi, A.; Tucker, G. On variational bounds of mutual information. In Proceedings of the International Conference on Machine Learning (ICML 2019), Long Beach, CA, USA, 9–15 June 2019; PMLR: Cambridge, MA, USA, 2019; pp. 5171–5180. [Google Scholar]
- Tschannen, M.; Djolonga, J.; Rubenstein, P.K.; Hofmann, T. On mutual information maximization for representation learning. arXiv 2019, arXiv:1907.13625. [Google Scholar]
- Wang, T.; Isola, P. Understanding contrastive representation learning through alignment and uniformity on the hypersphere. In Proceedings of the International Conference on Machine Learning (ICML 2020), Vienna, Austria, 13–18 July 2020; PMLR: Cambridge, MA, USA, 2020; pp. 9929–9939. [Google Scholar]
- Shejwalkar, V.; Houmansadr, A. Manipulating the Byzantine: Optimizing model poisoning attacks and defenses for federated learning. In Proceedings of the Network and Distributed System Security Symposium (NDSS 2021), San Diego, CA, USA, 21–24 February 2021. [Google Scholar]
- Zhou, X.; Chen, X.; Liu, S.; Li, J.; Wang, Y. FLGuardian: Defending against model poisoning attacks via fine-grained detection in federated learning. IEEE Trans. Inf. Forensics Secur. 2025, in press. [Google Scholar] [CrossRef]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
- Hasanpour, S.H.; Rouhani, M.; Fayyaz, M.; Sabokrou, M.; Fathy, M. Let’s keep it simple, using simple architectures to outperform deeper and more complex architectures. arXiv 2016, arXiv:1608.06037. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Minka, T. Estimating a Dirichlet Distribution. Technical Report. 15 November 2000. Available online: https://tminka.github.io/papers/dirichlet/minka-dirichlet.pdf (accessed on 17 August 2025).
- Gu, T.; Dolan-Gavitt, B.; Garg, S. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv 2017, arXiv:1708.06733. [Google Scholar]
- Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]