Article

Semantic Communication Unlearning: A Variational Information Bottleneck Approach for Backdoor Defense in Wireless Systems

by Sümeye Nur Karahan 1,*, Merve Güllü 1,2, Mustafa Serdar Osmanca 1 and Necaattin Barışçı 3

1 R&D Department, Türk Telekom, Ankara 06080, Türkiye
2 Graduate School of Natural and Applied Sciences, Gazi University, Ankara 06500, Türkiye
3 Department of Computer Engineering, Gazi University, Ankara 06570, Türkiye
* Author to whom correspondence should be addressed.
Future Internet 2026, 18(1), 17; https://doi.org/10.3390/fi18010017
Submission received: 5 December 2025 / Revised: 23 December 2025 / Accepted: 26 December 2025 / Published: 28 December 2025
(This article belongs to the Special Issue Future Industrial Networks: Technologies, Algorithms, and Protocols)

Abstract

Semantic communication systems leverage deep neural networks to extract and transmit essential information, achieving superior performance in bandwidth-constrained wireless environments. However, their vulnerability to backdoor attacks poses critical security threats, where adversaries can inject malicious triggers during training to manipulate system behavior. This paper introduces Selective Communication Unlearning (SCU), a novel defense mechanism based on Variational Information Bottleneck (VIB) principles. SCU employs a two-stage approach: (1) joint unlearning to remove backdoor knowledge from both encoder and decoder while preserving legitimate data representations, and (2) contrastive compensation to maximize feature separation between poisoned and clean samples. Extensive experiments on the RML2016.10a wireless signal dataset demonstrate that SCU achieves 629.5 ± 191.2% backdoor mitigation (5-seed average; 95% CI: [364.1%, 895.0%]), with peak performance of 1486% under optimal conditions, while incurring only 11.5% clean-performance degradation. This represents an order-of-magnitude improvement over detection-based defenses and fundamentally outperforms existing unlearning approaches that achieve near-zero or negative mitigation. We validate SCU across seven signal processing domains, four adaptive backdoor types, and varying SNR conditions, demonstrating unprecedented robustness and generalizability. The framework achieves a 243 s unlearning time, making it practical for resource-constrained edge deployments in 6G networks.

1. Introduction

The transformation observed in wireless communication systems has enabled an unprecedented level of computation–communication integration with the advent of 6G. At the core of this transformation lie semantic communication paradigms, which aim to transmit the meaning of information rather than its symbolic form [1]. While transmission units in conventional 5G systems are predominantly defined in terms of bit sequences and modulation symbols, semantic communication models leverage high-level representations of information to dramatically reduce bandwidth, latency, and energy requirements [2]. This approach is particularly critical for emerging 6G use cases such as autonomous driving, sensor-fusion–based industrial IoT, real-time XR applications, and dense connectivity scenarios [2,3]. Within this emerging paradigm, deep neural networks (DNNs) constitute both the encoder and decoder components, extracting abstract semantic features from complex I/Q samples and transmitting a compressed representation derived from these features. Information-theoretic formulations such as the VIB have been widely adopted in semantic communication architectures, as they provide both an effective compression mechanism and inherent robustness against channel noise [4,5]. However, this strong representational capability also exposes systems to new threat surfaces. DNN-based communication models inherently carry structural vulnerabilities to backdoor attacks, which are well documented in image classification and natural language processing, and adversaries may exploit these weaknesses to manipulate model behavior [6,7].
Backdoor attacks are executed by injecting a small portion of poisoned samples into the model’s training data. In these samples, the attacker embeds specific trigger patterns that cause the model to produce specially crafted outputs during deployment [8]. The key aspect of such attacks is that this behavior only manifests when the input contains the trigger; since the model operates normally on clean data, the attack often goes unnoticed. Recent variants of backdoors, such as blend-based triggers, frequency-domain perturbations, sinusoidal manipulations, and input-dependent adaptive triggers, have introduced highly stealthy and difficult-to-detect threats, particularly within the field of signal processing [9,10,11]. Traditional backdoor defenses face fundamental challenges in semantic communication systems. Complete model retraining requires massive computational resources and access to clean-labeled datasets, which may not be available in federated learning or edge deployment scenarios. Detection-based approaches struggle with sophisticated triggers that operate in frequency or wavelet domains, where statistical anomalies are harder to identify. Pruning-based defenses, while computationally efficient, remove neurons indiscriminately and often degrade performance on legitimate signal processing tasks. Furthermore, defenses developed for computer vision applications fail to generalize to multi-domain signal processing, where the same semantic content can be represented across time, frequency, and transform domains. Existing unlearning approaches, such as Variational Bayesian Unlearning and Hessian-based methods, operate in parameter space and fail to address the unique challenges of semantic communication. These methods assume that backdoor knowledge is localized to specific parameter regions, but in VIB-based architectures, backdoor patterns are encoded as distributional shifts in the latent space rather than discrete weight configurations. 
Consequently, parameter-space unlearning either fails to remove backdoors (achieving near-zero or negative mitigation) or causes catastrophic forgetting of clean sample representations.
We propose Selective Communication Unlearning (SCU), a principled framework that exploits the information-theoretic properties of VIB to achieve surgical backdoor removal while preserving semantic understanding. SCU operates directly in the latent representation space, where backdoor triggers manifest as systematic shifts in encoder variance and decoder reconstruction patterns. Our key insight is that backdoor samples exhibit anomalously low encoder variance (high confidence) and low reconstruction error, creating a distinctive signature that can be selectively targeted. By deliberately increasing entropy for identified poisoned samples (Stage 1: joint unlearning) and then maximizing their geometric separation from clean representations (Stage 2: contrastive compensation), SCU destroys backdoor structure without requiring explicit trigger knowledge or extensive retraining.
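This variance-and-error signature can be illustrated with a minimal detector sketch. The synthetic per-sample statistics and the 12th-percentile thresholds below are illustrative assumptions for this example, not the paper's actual detection procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative per-sample statistics from a VIB encoder/decoder:
# clean samples show moderate posterior variance and reconstruction error;
# poisoned samples show anomalously low variance (high confidence) and low error.
clean_var = rng.normal(1.0, 0.10, size=900)
clean_err = rng.normal(0.5, 0.05, size=900)
poison_var = rng.normal(0.3, 0.05, size=100)
poison_err = rng.normal(0.1, 0.02, size=100)

enc_var = np.concatenate([clean_var, poison_var])
rec_err = np.concatenate([clean_err, poison_err])
labels = np.concatenate([np.zeros(900), np.ones(100)])  # 1 = poisoned

# Flag samples that are simultaneously low-variance and low-error
# (12th-percentile thresholds are an assumption made for this sketch).
v_thr = np.quantile(enc_var, 0.12)
e_thr = np.quantile(rec_err, 0.12)
flagged = (enc_var < v_thr) & (rec_err < e_thr)

recall = flagged[labels == 1].mean()  # fraction of poisoned samples flagged
```

When the two statistics separate as cleanly as in this toy setting, the joint threshold recovers nearly all poisoned samples while flagging few clean ones.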
SCU’s lightweight and modular architecture makes it suitable for split inference frameworks in edge computing. Recent advances in split inference for large AI models [12,13] demonstrate the feasibility of distributing computational workloads between edge devices and servers, which aligns well with semantic communication architectures [14]. In our framework, semantic encoders operate on resource-constrained IoT devices while unlearning mechanisms execute at edge servers without cloud connectivity. This enables secure semantic communication in 6G networks with localized backdoor mitigation, leveraging the collaborative edge computing paradigm for efficient AI model inference.

1.1. Our Contributions

We propose Selective Communication Unlearning (SCU), a principled framework that selectively erases backdoor knowledge while preserving semantic understanding. Our key contributions include the following:
  • Novel two-stage information-theoretic defense: We introduce a VIB-based unlearning architecture achieving 629.5 ± 191.2% backdoor mitigation (95% CI: [364.1%, 895.0%]) with only 11.5% clean degradation—an 85× improvement over detection-based defenses and fundamentally outperforming existing unlearning methods that achieve near-zero or negative mitigation.
  • Comprehensive validation across domains and attacks: We demonstrate effectiveness across seven signal processing domains, four adaptive backdoor types, and challenging channel conditions (SNR −5 to +25 dB, fading channels), achieving 36–1274% mitigation without spectrum downsampling.
  • Theoretical foundation with separability condition: We formalize when selective backdoor removal is achievable and empirically validate that SCU reduces backdoor–semantic entanglement from 0.206 to 0.146, confirming surgical removal through latent space restructuring.
  • Practical 6G edge deployment: SCU achieves a 243 s unlearning time with $O(n \cdot d^2)$ complexity (2× speedup over retraining), enabling split inference on resource-constrained devices with robust operation under imperfect detection and realistic conditions.
Our work establishes the first principled framework for backdoor defense in semantic communication systems, demonstrating that information-theoretic unlearning in latent space fundamentally outperforms parameter-space approaches. The combination of statistical validation (five independent runs), theoretical justification (separability condition with empirical verification), and comprehensive evaluation across domains, backdoor types, and channel conditions provides a rigorous foundation for secure semantic communication deployment in 6G networks.

1.2. Paper Organization

The remainder of this paper is organized as follows: Section 2 reviews backdoor attacks, defense mechanisms, machine unlearning, and semantic communication security, positioning SCU within the existing literature. Section 3 formalizes the VIB-based semantic communication system, establishes the threat model with multiple backdoor types, and defines defense success criteria. Section 4 presents SCU’s technical core: the two-stage framework design, joint unlearning algorithm, contrastive compensation mechanism, theoretical separability condition, and hyperparameter configuration. Section 5 describes the experimental design, dataset, multi-domain signal representations, baseline implementations, and evaluation metrics. Section 6 validates SCU through performance comparisons, statistical analysis, ablation studies, multi-domain evaluation, robustness testing, computational efficiency analysis, and adaptive backdoor experiments. Section 7 analyzes VIB’s information-theoretic foundations, practical deployment considerations, and identified limitations with proposed solutions. Section 8 summarizes the key achievements, validation results, and future research directions. The Appendices provide supplementary detection robustness results and regularization analysis.

2. Related Work

2.1. Backdoor Attacks in Deep Learning

DNNs have achieved remarkable success in computer vision, natural language processing, and wireless signal classification. However, these models are inherently vulnerable to hidden functionality in the form of backdoor attacks [8,15]. In a typical backdoor scenario, an adversary secretly interferes with the training process and injects a malicious trigger into a subset of training samples. This trigger can be a specific pattern, symbol, or signal signature embedded in the input. The model is trained such that it behaves normally on clean inputs, but when the trigger appears, it consistently outputs an attacker-chosen target prediction while preserving high accuracy on benign data. This dual behavior makes backdoor attacks both powerful and difficult to detect.
Backdoor attacks have been extensively studied in computer vision [16], where triggers like patch patterns or blending modifications cause targeted misclassification. Recent work has extended these attacks to natural language processing [17] and federated learning [18].
This threat was first systematized in the “BadNets” work by Gu et al. [16,19], where the authors stamped a small pixel pattern onto a small fraction of training images, reassigned their labels, and showed that the trained classifier always mapped triggered inputs to the target class while maintaining high clean accuracy [8]. Subsequent studies extended this line of work to clean-label attacks (without relabeling poisoned samples) and physically realizable triggers, such as stickers attached to traffic signs or patterns projected into a camera’s field of view [15]. These results demonstrate that even partial access to the training pipeline can suffice to manipulate a model’s decision mechanism in a stealthy way.
Backdoor attacks have been explored in many domains. In computer vision, triggers range from small pixel patches to perturbations concealed in the frequency domain [20,21,22]. Such attacks can induce traffic sign misclassification in autonomous driving [8], identity spoofing in face recognition systems [23], or incorrect diagnoses in medical image analysis [21]. In natural language processing (NLP), triggers may be inserted at the word, character, or context level [8,21]. Recent work on large language models (LLMs) has shown that prompt injection and chain-of-thought manipulation can exhibit backdoor-like behavior [20]. In federated learning (FL), malicious clients can corrupt local updates so that the aggregated global model becomes backdoored [18,24,25]. Related vulnerabilities have also been documented in deep reinforcement learning and speech recognition, including attacks based on ultrasonic or otherwise imperceptible triggers [15,21].
In the wireless communication domain, recent studies have shown that deep learning-based RF signal classifiers are highly vulnerable to covert backdoor mechanisms. For example, Baishya et al. [26] demonstrated that multi-target backdoor triggers can be injected into modulation recognition models without degrading their clean-data performance, achieving attack success rates close to 99% while remaining stealthy to both users and conventional validation procedures. These results highlight that backdoors can severely compromise the reliability of modulation classification pipelines deployed on edge devices, posing security risks in critical scenarios such as military communications or autonomous systems. In all these settings, the primary objective of backdoor attacks is to achieve a high attack success rate (ASR) on triggered inputs without degrading clean-data performance. Some works even propose using backdoor mechanisms as watermarks for model ownership verification, illustrating the dual-use nature of this technology [22]. However, wireless signal processing introduces unique challenges for both attacks and defenses due to continuous-valued inputs, time-series structure, and sensitivity to channel conditions.

2.2. Backdoor Defense Mechanisms

The literature on backdoor defense can be organized into three main categories, each with distinct advantages and limitations in the context of semantic communication systems: detection, mitigation, and unlearning.
Detection-based approaches aim to identify and remove poisoned samples or compromised models before deployment. Neural Cleanse [27] reverse-engineers potential triggers through optimization-based search, attempting to find minimal perturbations that cause targeted misclassification. Spectral Signatures [28] analyzes the covariance structure of deep features via singular value decomposition and detects poisoned samples as spectral outliers. Activation Clustering further groups hidden-layer activations to separate anomalous (likely poisoned) examples from clean ones [8]. While these methods show promise against relatively simple patch-based triggers, they struggle significantly with sophisticated stealthy attacks such as blend backdoors, where triggers are carefully crafted to blend into the input distribution and avoid detection [29,30].
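As an illustration of the spectral idea behind such detectors, the following minimal sketch scores each sample by its squared projection onto the top singular vector of the centered feature matrix, in the spirit of Spectral Signatures [28]. The synthetic features, the shift magnitude, and the removal fraction are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic penultimate-layer features: 950 clean + 50 poisoned samples.
# Poisoned features share a common shift along one fixed direction.
d = 32
clean = rng.normal(0, 1, size=(950, d))
spike = rng.normal(0, 1, size=d)
spike /= np.linalg.norm(spike)
poison = rng.normal(0, 1, size=(50, d)) + 6.0 * spike

feats = np.vstack([clean, poison])
labels = np.concatenate([np.zeros(950), np.ones(50)])  # 1 = poisoned

# Spectral score: squared projection of centered features onto the
# top right singular vector of the feature covariance structure.
centered = feats - feats.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = (centered @ vt[0]) ** 2

# Remove the top-scoring 7.5% as suspected poison (fraction is an assumption).
k = int(0.075 * len(feats))
suspect = np.argsort(scores)[-k:]
caught = labels[suspect].sum() / labels.sum()  # fraction of poison removed
```

A strongly aligned poisoned subpopulation dominates the top singular direction, so its samples receive outlier scores; stealthier blend-style triggers weaken exactly this alignment, which is why such detectors struggle against them.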
Mitigation-based defenses focus on weakening the impact of backdoors by modifying the model or its inputs. Pruning-based defenses operate on the assumption that backdoor pathways utilize specific neurons that remain dormant on clean data. Fine-Pruning [31] identifies and removes neurons with low activation on clean samples and then fine-tunes the pruned network to recover performance. Techniques such as STRIP inject small random perturbations at the input and monitor prediction consistency to flag suspicious samples [15]. Input transformations (e.g., rescaling or filtering) and partial retraining on clean data are also used to dilute trigger effects. However, in semantic communication systems with deep encoder–decoder architectures and highly distributed representations, aggressive pruning or indiscriminate fine-tuning can severely damage the semantic encoding capacity, causing unacceptable degradation in legitimate signal processing performance [32].
Unlearning-based defenses take a different perspective: rather than merely detecting or weakening backdoors, they explicitly aim to “forget” the malicious correlations learned during training. This may involve targeted optimization against trigger-related behavior, selective retraining on carefully curated subsets, or more principled machine unlearning procedures. In wireless signal classification, where triggers may manifest at the semantic representation level rather than as obvious patterns in the raw I/Q samples, such selective unlearning is particularly important. However, most existing defenses, even when framed as erasing or cleansing, still operate either in the pixel/input space or at the neuron level and often incur a non-trivial trade-off between clean accuracy and backdoor removal.
Certified defenses pursue a different philosophy by providing provable robustness guarantees. Randomized smoothing [33] creates certified predictions by averaging over multiple noise-augmented inputs, offering mathematical guarantees on the minimum perturbation required to change the output. While theoretically appealing, this approach requires ensemble predictions at inference time, substantially increasing computational cost and latency. For real-time wireless communication systems where ultra-low latency is critical, such overhead renders certified defenses impractical for deployment.

2.3. Machine Unlearning

Machine unlearning has emerged as a principled framework for removing specific data influences from trained models [34]. Originally motivated by privacy regulations such as GDPR’s “right to be forgotten,” unlearning aims to make a model behave as if certain training samples had never been observed, without complete retraining from scratch [35].
The most straightforward approach, exact unlearning, simply retrains the entire model after removing the target samples from the training set. While this guarantees complete removal of data influence, it is computationally prohibitive for large models and datasets, requiring resources comparable to the original training process [35]. This overhead becomes particularly problematic when unlearning requests are frequent or when models need rapid updates.
To address computational challenges, approximate unlearning methods have been developed. Fisher information-based approaches [36] leverage second-order optimization to approximate the effect of retraining, using the Fisher Information Matrix to identify which parameters are most influenced by the target samples. By performing selective gradient ascent on these parameters, the model can forget specific data points with significantly lower computational cost than full retraining. More recently, variational unlearning methods have incorporated Bayesian principles [37]. These approaches model the posterior distribution over parameters and update it by removing the likelihood contribution of the target samples. Through variational inference, the updated posterior can be approximated efficiently, providing a probabilistically principled framework for unlearning with uncertainty quantification.
Beyond privacy, machine unlearning has been explored directly for backdoor erasing. BAERASER [38] is a representative example that uses a Max-Entropy Staircase Approximator to recover the trigger distribution and then applies gradient-ascent-based unlearning steps to erase backdoor features from the model, without requiring access to the original training data. While BAERASER achieves strong performance on pixel-based image backdoors, it primarily operates in the image space and does not explicitly address semantic-level representations or wireless signal domains. In federated learning, Wu et al. propose an unlearning mechanism that combines update subtraction with knowledge distillation to remove the influence of malicious clients from the global model [39], providing an important perspective for distributed settings but not directly applicable to centralized signal-processing pipelines. Guo et al. introduce verifiable unlearning with invisible backdoor triggers embedded as markers to certify whether a requested deletion has actually been performed [40]. These works focus on the trustworthiness and verifiability of unlearning, rather than on semantic communication or RF signal reconstruction.
Table 1 summarizes representative machine unlearning-based backdoor attack and defense studies, highlighting their setting, main objective, and relation to our work. As the table indicates, most existing unlearning-based backdoor defenses are designed for image classification in centralized or federated setups and operate either at the pixel level or in parameter space. In contrast, wireless semantic communication systems require selectively removing backdoor-related information at the semantic feature level in noisy, high-dimensional environments. To the best of our knowledge, our work is the first to apply unlearning principles to backdoor defense in a variational encoder–decoder for wireless semantic communication, jointly optimizing both encoding and decoding during the unlearning procedure.

2.4. Semantic Communication Security

Semantic communication has attracted significant attention as a promising paradigm for next-generation wireless systems, particularly in the context of 5G and 6G networks [42]. By transmitting semantic meaning rather than raw bit sequences, these systems can achieve remarkable efficiency gains in bandwidth-constrained and latency-critical scenarios. However, the security implications of deep learning-based semantic encoding remain largely underexplored in the literature.
The existing security research in semantic communication has predominantly focused on physical layer vulnerabilities and eavesdropping resistance [43]. These works address how to protect semantic information from interception during transmission, leveraging techniques such as channel coding and physical-layer security. While important, this focus on inference-time security overlooks a critical vulnerability: training-time attacks that can compromise the semantic encoder itself.
Backdoor attacks represent a particularly insidious threat to semantic communication systems. Unlike eavesdropping attacks that require access to the communication channel, backdoor attacks compromise the model during the training phase, potentially through supply chain vulnerabilities or malicious participation in federated learning scenarios. Once embedded, these backdoors can remain inactive during normal operation and activate only when specific trigger patterns appear, which makes them extremely difficult to detect through conventional security audits. In future 6G AI-native networks, where wireless signal classification and semantic communication will be tightly integrated, unlearning-based defenses capable of reliably removing such malicious behavior will be crucial.
Our work directly addresses this critical gap in the semantic communication security literature by developing a principled defense mechanism against backdoor attacks. By leveraging the information-theoretic properties of Variational Information Bottleneck architectures and introducing novel unlearning techniques tailored to encoder–decoder systems, we provide the first comprehensive framework for backdoor defense in semantic communication, enabling secure deployment of these promising systems in real-world wireless networks.

3. Problem Formulation and Threat Model

3.1. Semantic Communication System

We consider a VIB-based semantic communication system designed for efficient wireless signal transmission. The system consists of three main components: an encoder that extracts semantic representations, a noisy wireless channel, and a decoder that reconstructs the original signal.
The encoder $f_\theta: \mathcal{X} \to \mathcal{Z}$ maps an input signal $x \in \mathbb{R}^{m \times 2}$ to a compressed semantic representation $z \in \mathbb{R}^{d}$. Here, the input consists of I/Q (in-phase and quadrature) components with $m = 128$ time steps, representing a typical wireless signal frame. The encoder compresses this information into a $d = 64$-dimensional latent space, achieving an 8× compression ratio while preserving semantic content. Following the VIB framework, the encoder outputs variational parameters rather than deterministic embeddings:
$$[\mu_\theta(x), \, \sigma_\theta(x)] = f_\theta(x)$$
$$z \sim \mathcal{N}\big(\mu_\theta(x), \, \sigma_\theta^2(x)\big)$$
where $\mu_\theta(x)$ represents the mean of the latent distribution and $\sigma_\theta(x)$ controls its variance. This stochastic encoding provides inherent robustness to channel noise and forms the foundation for our unlearning approach.
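A minimal numpy sketch of this stochastic encoding is given below; the random linear maps stand in for the paper's deep encoder and are purely illustrative of the tensor shapes and the reparameterized sampling step.

```python
import numpy as np

rng = np.random.default_rng(42)

m, d = 128, 64  # I/Q time steps, latent dimensionality

# Placeholder linear maps producing variational parameters (assumption:
# the paper uses a deep network; random weights suffice to show shapes).
W_mu = rng.normal(0, 0.05, size=(m * 2, d))
W_logvar = rng.normal(0, 0.05, size=(m * 2, d))

def encode(x):
    """Map an I/Q frame x in R^{128x2} to (mu, sigma) of q(z|x)."""
    h = x.reshape(-1)
    mu = h @ W_mu
    sigma = np.exp(0.5 * (h @ W_logvar))  # std dev from predicted log-variance
    return mu, sigma

def reparameterize(mu, sigma, rng):
    """Sample z ~ N(mu, sigma^2) as z = mu + sigma * eps, eps ~ N(0, I)."""
    return mu + sigma * rng.normal(size=mu.shape)

x = rng.normal(size=(m, 2))  # one wireless signal frame
mu, sigma = encode(x)
z = reparameterize(mu, sigma, rng)
```

Predicting log-variance and exponentiating keeps $\sigma_\theta(x)$ strictly positive, which is the standard parameterization for Gaussian VIB posteriors.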
The semantic representation is transmitted through a wireless channel modeled as additive white Gaussian noise (AWGN):
$$\tilde{z} = z + n, \quad n \sim \mathcal{N}(0, \sigma_{\mathrm{ch}}^2 I)$$
where $\sigma_{\mathrm{ch}}^2$ represents the noise power determined by the signal-to-noise ratio (SNR) of the communication link. This model captures the fundamental limitation of wireless channels while remaining analytically tractable.
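The AWGN channel can be sketched directly from this definition; deriving the noise power $\sigma_{\mathrm{ch}}^2$ from a target SNR in dB is standard practice, and the batch of latent vectors below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def awgn(z, snr_db, rng):
    """Add white Gaussian noise to z at the given SNR (in dB)."""
    signal_power = np.mean(z ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))  # sigma_ch^2
    return z + rng.normal(0.0, np.sqrt(noise_power), size=z.shape)

# A batch of 100 latent vectors (d = 64) transmitted at 10 dB SNR.
z = rng.normal(size=(100, 64))
z_tilde = awgn(z, 10.0, rng)

# Empirical SNR of the received representations, as a sanity check.
noise = z_tilde - z
snr_meas = 10 * np.log10(np.mean(z ** 2) / np.mean(noise ** 2))
```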
At the receiver, the decoder $g_\phi: \mathcal{Z} \to \hat{\mathcal{X}}$ reconstructs the transmitted signal from the noisy semantic representation, producing $\hat{x} = g_\phi(\tilde{z})$. The decoder is trained jointly with the encoder to minimize reconstruction error while satisfying the information bottleneck constraint.
The entire system is optimized end-to-end using the VIB training objective:
$$\mathcal{L}_{\mathrm{VIB}} = \mathbb{E}_{p(x)}\big[\|x - \hat{x}\|^2\big] + \beta \cdot D_{\mathrm{KL}}\big(q(z|x) \,\|\, p(z)\big)$$
The first term measures reconstruction fidelity through the mean squared error between the original signal $x$ and its reconstruction $\hat{x}$. The second term is the Kullback–Leibler divergence between the learned latent distribution $q(z|x)$ and a prior distribution $p(z)$ (typically a standard Gaussian), which enforces the information bottleneck by limiting the mutual information between the input and the latent representation. The hyperparameter $\beta$ controls the trade-off between reconstruction quality and compression, with $\beta = 0.001$ in our experiments to prioritize semantic preservation while maintaining reasonable compression.
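A minimal sketch of this objective follows, using the standard closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior; the toy reconstruction and posterior parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def vib_loss(x, x_hat, mu, sigma, beta=0.001):
    """Reconstruction MSE plus beta-weighted KL(q(z|x) || N(0, I)).

    Closed-form KL for a diagonal Gaussian posterior against a standard
    normal prior: 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2).
    """
    recon = np.mean((x - x_hat) ** 2)
    kl = 0.5 * np.sum(mu ** 2 + sigma ** 2 - 1.0 - np.log(sigma ** 2))
    return recon + beta * kl, recon, kl

x = rng.normal(size=(128, 2))                 # one I/Q frame
x_hat = x + rng.normal(0, 0.1, size=x.shape)  # toy reconstruction
mu = rng.normal(0, 0.5, size=64)              # toy posterior mean
sigma = np.full(64, 0.8)                      # toy posterior std dev

total, recon, kl = vib_loss(x, x_hat, mu, sigma)
```

The small $\beta$ used in the paper makes the KL term a gentle pressure toward the prior rather than a hard rate constraint, matching the stated preference for semantic preservation.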
To justify our encoder unlearning strategy, we establish a principled connection between posterior variance minimization and mutual information maximization within the VIB framework [44].
In the VIB framework, the encoder represents the latent variable using a Gaussian posterior with input-dependent variance. Minimizing the expected log-variance $\mathbb{E}_x[\log \|\sigma_\theta(x)\|^2]$ tightens an upper bound on the conditional entropy $H(Z|X)$, leading to more deterministic latent representations. This variance reduction therefore serves as a practical surrogate for promoting the maximization of the mutual information $I(X;Z)$. Intuitively, the posterior variance produced by the VIB encoder quantifies its uncertainty about the latent representation given an input. Reducing this variance forces the encoder to represent only information that is consistently predictive of the input while suppressing spurious or non-generalizable features such as backdoor patterns. Since the marginal entropy $H(Z)$ is implicitly controlled by the prior, minimizing the conditional entropy $H(Z|X)$, which is directly governed by the posterior variance, provides a practical mechanism for increasing mutual information. In this sense, variance minimization acts as an efficient proxy for mutual information maximization, aligning the unlearning objective with the preservation of semantically meaningful information.
The mutual information between the input $X$ and the latent representation $Z$ can be decomposed as
$$I(X;Z) = H(Z) - H(Z|X).$$
For a Gaussian posterior, the conditional entropy given a fixed input $x$ admits the closed-form expression
$$H(Z|X=x) = \frac{d}{2}\log(2\pi e) + \frac{1}{2}\sum_{i=1}^{d} \log \sigma_{\theta,i}^{2}(x),$$
where $d$ denotes the latent dimensionality. Taking the expectation over the data distribution yields
$$H(Z|X) = \frac{d}{2}\log(2\pi e) + \frac{1}{2}\,\mathbb{E}_x\!\left[\sum_{i=1}^{d} \log \sigma_{\theta,i}^{2}(x)\right].$$
Within the VIB framework, the marginal entropy $H(Z)$ is implicitly constrained by the choice of the prior distribution and the aggregated posterior [44,45]. Consequently, increasing $I(X;Z)$ primarily amounts to reducing the conditional entropy $H(Z|X)$, which corresponds to minimizing the expected sum of posterior log-variances:
$$\min_\theta \; \mathbb{E}_x\!\left[\sum_{i=1}^{d} \log \sigma_{\theta,i}^{2}(x)\right].$$
To obtain a computationally efficient surrogate, we define the total posterior variance as $\|\sigma_\theta(x)\|^2 = \sum_{i=1}^{d} \sigma_{\theta,i}^{2}(x)$. By Jensen's inequality, the following upper bound holds [46]:
$$\mathbb{E}_x\!\left[\sum_{i=1}^{d} \log \sigma_{\theta,i}^{2}(x)\right] \le \mathbb{E}_x\!\left[\log \sum_{i=1}^{d} \sigma_{\theta,i}^{2}(x)\right] = \mathbb{E}_x\big[\log \|\sigma_\theta(x)\|^2\big].$$
Therefore, minimizing the objective $\mathcal{L}_{\mathrm{enc}}^{\mathrm{MI}} = \mathbb{E}_x[\log \|\sigma_\theta(x)\|^2]$ minimizes an upper bound on the conditional entropy $H(Z|X)$ and thereby encourages higher mutual information between $X$ and $Z$. This variance-based objective enables efficient gradient-based optimization without relying on explicit mutual information estimators such as MINE [47], which are known to suffer from high variance and self-consistency issues in practice [48]. Empirically, incorporating $\mathcal{L}_{\mathrm{enc}}^{\mathrm{MI}}$ into the joint unlearning procedure (Algorithm 1, line 7) reduces computational cost significantly while achieving effective backdoor removal performance.
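These closed-form quantities can be checked numerically for a concrete posterior; the three variance values below are arbitrary illustrations (each is smaller than their sum, so the stated bound holds for this instance).

```python
import numpy as np

# Example diagonal posterior variances for a d = 3 latent (illustrative values).
var = np.array([0.5, 0.2, 0.1])
d = var.size

# Closed-form Gaussian conditional entropy H(Z|X=x).
entropy = 0.5 * d * np.log(2 * np.pi * np.e) + 0.5 * np.sum(np.log(var))

# Surrogate comparison: sum of log-variances (appearing in H(Z|X))
# versus the log of the total variance log ||sigma||^2.
sum_log = np.sum(np.log(var))
log_total = np.log(np.sum(var))
```

Here `sum_log` ≈ −4.605 while `log_total` ≈ −0.223, so the log-total-variance surrogate indeed upper-bounds the exact term for this posterior.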
Algorithm 1 SCU Joint Unlearning

Require: Poisoned encoder $f_{\theta}^{p}$, decoder $g_{\phi}^{p}$; erased set $D_e$; remaining set $D_r$; hyperparameter $\alpha$
Ensure: Updated encoder $f_{\theta}^{*}$, decoder $g_{\phi}^{*}$
 1: Initialize $f_\theta \leftarrow f_{\theta}^{p}$, $g_\phi \leftarrow g_{\phi}^{p}$
 2: Create fixed references: $f_{\theta}^{\mathrm{ref}} \leftarrow f_{\theta}^{p}$, $g_{\phi}^{\mathrm{ref}} \leftarrow g_{\phi}^{p}$
 3: for epoch $= 1$ to $T_{\mathrm{JU}}$ do
 4:   for batch $(x_e, x_r)$ from $(D_e, D_r)$ do
 5:     // Encoder unlearning
 6:     $\mu_\theta, \sigma_\theta \leftarrow f_\theta(x_e)$
 7:     $\mu_{\mathrm{ref}}, \sigma_{\mathrm{ref}} \leftarrow f_{\theta}^{\mathrm{ref}}(x_e)$
 8:     $\mathcal{L}_{\mathrm{enc}}^{\mathrm{MI}} \leftarrow -\log \|\sigma_\theta\|^2$  {maximize MI via entropy}
 9:     $\mathcal{L}_{\mathrm{enc}}^{\mathrm{reg}} \leftarrow \|\mu_\theta - \mu_{\mathrm{ref}}\|^2$  {L2 regularization}
10:     $\mathcal{L}_{\mathrm{enc}} \leftarrow \mathcal{L}_{\mathrm{enc}}^{\mathrm{MI}} + \alpha \mathcal{L}_{\mathrm{enc}}^{\mathrm{reg}}$
11:
12:     // Decoder unlearning
13:     $z_e \sim \mathcal{N}(\mu_\theta, \sigma_\theta^2)$
14:     $\hat{x}_e \leftarrow g_\phi(z_e)$
15:     $\hat{x}_e^{\mathrm{ref}} \leftarrow g_{\phi}^{\mathrm{ref}}(z_{\mathrm{ref}})$ where $z_{\mathrm{ref}} \sim \mathcal{N}(\mu_{\mathrm{ref}}, \sigma_{\mathrm{ref}}^2)$
16:     $\mathcal{L}_{\mathrm{dec}}^{\mathrm{recon}} \leftarrow -\|x_e - \hat{x}_e\|^2$  {maximize error}
17:     $\mathcal{L}_{\mathrm{dec}}^{\mathrm{reg}} \leftarrow \|\hat{x}_e - \hat{x}_e^{\mathrm{ref}}\|^2$  {L2 regularization}
18:     $\mathcal{L}_{\mathrm{dec}} \leftarrow \mathcal{L}_{\mathrm{dec}}^{\mathrm{recon}} + \alpha \mathcal{L}_{\mathrm{dec}}^{\mathrm{reg}}$
19:
20:     // Preservation on remaining data
21:     $\mu_r, \sigma_r \leftarrow f_\theta(x_r)$
22:     $z_r \sim \mathcal{N}(\mu_r, \sigma_r^2)$
23:     $\hat{x}_r \leftarrow g_\phi(z_r)$
24:     $\mathcal{L}_{\mathrm{preserve}} \leftarrow \|x_r - \hat{x}_r\|^2$
25:
26:     // Total loss and update
27:     $\mathcal{L}_{\mathrm{JU}} \leftarrow \mathcal{L}_{\mathrm{enc}} + \mathcal{L}_{\mathrm{dec}} + 0.5\,\mathcal{L}_{\mathrm{preserve}}$
28:     Update $\theta, \phi$ using $\nabla_{\theta,\phi} \mathcal{L}_{\mathrm{JU}}$ with the Adam optimizer
29:   end for
30: end for
31: return $f_{\theta}^{*}$, $g_{\phi}^{*}$
In summary, within the VIB framework, the posterior variance of the encoder plays a central role in controlling the conditional entropy H ( Z | X ) . By minimizing the expected log-total variance, the proposed objective provides a theoretically grounded and computationally efficient surrogate for mutual information maximization. This perspective naturally aligns with the encoder unlearning objective: it preserves semantically relevant information while suppressing uncertain and non-generalizable representations, thereby enabling effective backdoor removal without relying on unstable mutual information estimators.

3.2. Threat Model: Backdoor Poisoning

We consider a powerful adversary who aims to compromise the semantic communication system by injecting backdoor triggers during the training phase. This threat model reflects realistic scenarios in federated learning deployments or supply chain attacks where training data can be manipulated.
The adversary has access to the training dataset D train and can modify a fraction ϵ of the samples, where we consider ϵ = 0.15 (15% poisoning rate) as a realistic attack scenario. The adversary is assumed to have knowledge of the model architecture, including the encoder and decoder structures, but does not have access to the model during test time or deployment. This white-box assumption on architecture but black-box assumption on deployment represents a strong yet realistic threat model common in collaborative learning scenarios.
The adversary’s goal is to inject a trigger pattern t into selected training samples such that when the trigger is present at test time, the encoder produces a target representation: f θ ( x + t ) z target . This target representation causes the decoder to produce anomalous reconstructions, potentially disrupting communication or causing misclassification in downstream tasks. Importantly, the backdoor should remain inactive on clean samples, maintaining normal system performance to avoid detection.
The poisoning procedure involves creating a compromised training dataset by injecting triggers into a subset of clean samples:
D_poison = { (x_i + t, y_i) : (x_i, y_i) ∈ D_erased } ∪ D_clean
where D erased D train represents the subset of samples selected for poisoning, with  | D erased | = ϵ | D train | . The remaining clean samples D clean are left unmodified to ensure the model maintains good performance on legitimate inputs, making the attack stealthier.
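The poisoning procedure can be sketched as follows; `poison_dataset` and `trigger_fn` are illustrative names of our own, not the authors' released code:

```python
import numpy as np

def poison_dataset(X, y, trigger_fn, eps=0.15, rng=None):
    """Inject a trigger into a random eps-fraction of samples.

    X: (n, ...) array of clean samples; trigger_fn maps one sample to its
    triggered version. Returns the poisoned array, the (unchanged) labels,
    and the indices of the erased (poisoned) subset D_erased.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(X)
    erased_idx = rng.choice(n, size=int(eps * n), replace=False)
    Xp = X.copy()
    for i in erased_idx:
        Xp[i] = trigger_fn(Xp[i])  # x_i + t
    return Xp, y, erased_idx
```

With eps = 0.15, |D_erased| = 0.15 · |D_train|, matching the poisoning rate considered in the threat model.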
We evaluate our defense against four distinct trigger types, each representing different attack sophistication levels and detection challenges:
  • Square wave trigger: A simple temporal pattern where the first k = 10 time steps are modified with alternating high and low values: t[0:k] = (α, −α, α, −α, …) with amplitude α = 0.85. This represents the most basic attack, creating sharp discontinuities in the time-domain signal.
  • Blend trigger: A stealthy attack that blends a random pattern p with the original signal: x + t = (1 − λ)x + λp with blending factor λ = 0.2. The low blending ratio makes the trigger almost undetectable while maintaining effectiveness, representing advanced evasion techniques.
  • Sinusoidal trigger: A frequency-domain attack that adds orthogonal sinusoidal patterns to I/Q components: t I = A sin ( ω t ) and t Q = A cos ( ω t ) with amplitude A = 0.5 and frequency ω = 3 . This exploits the frequency structure of wireless signals and is harder to detect in spectral analysis.
  • Input-dependent trigger: An adaptive attack where the trigger depends on the input signal’s statistics: t = (x̄_I + δ, x̄_Q − δ) with δ = 0.5, where x̄_I and x̄_Q are the mean values of the I and Q components. This creates sample-specific triggers that evade detection methods assuming fixed patterns.
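The four trigger families above can be sketched as follows, using the stated parameters (α = 0.85, λ = 0.2, A = 0.5, ω = 3, δ = 0.5) on (128, 2) I/Q arrays. Function names, the random blend pattern, and the choice to apply the square wave to both channels are our assumptions:

```python
import numpy as np

def square_wave_trigger(x, k=10, amp=0.85):
    """Alternate +amp/-amp over the first k time steps (k assumed even).
    Applied to both I and Q channels -- an assumption on our part."""
    x = x.copy()
    pattern = amp * np.tile([1.0, -1.0], k // 2)
    x[:k, :] += pattern[:, None]
    return x

def blend_trigger(x, lam=0.2, rng=None):
    """Blend a fixed random pattern p into the signal: (1 - lam)*x + lam*p."""
    if rng is None:
        rng = np.random.default_rng(42)  # fixed pattern across samples
    p = rng.standard_normal(x.shape)
    return (1.0 - lam) * x + lam * p

def sinusoidal_trigger(x, A=0.5, omega=3.0):
    """Add orthogonal sinusoids: A*sin(omega*t) on I, A*cos(omega*t) on Q."""
    t = np.arange(x.shape[0])
    x = x.copy()
    x[:, 0] += A * np.sin(omega * t)
    x[:, 1] += A * np.cos(omega * t)
    return x

def input_dependent_trigger(x, delta=0.5):
    """Sample-specific offset: t = (mean_I + delta, mean_Q - delta)."""
    x = x.copy()
    m_i, m_q = x[:, 0].mean(), x[:, 1].mean()
    x[:, 0] += m_i + delta
    x[:, 1] += m_q - delta
    return x
```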
Given a poisoned model M poison trained on D poison , our defense objective is to produce a cleaned model M SCU that satisfies two competing requirements:
  • Backdoor mitigation:
E_{x∼D_test}[ ‖x − M_SCU(x + t)‖² ] ≫ E_{x∼D_test}[ ‖x − M_poison(x + t)‖² ]
  • Clean preservation:
E_{x∼D_test}[ ‖x − M_SCU(x)‖² ] ≈ E_{x∼D_test}[ ‖x − M_clean(x)‖² ]
The first constraint requires that triggered inputs produce significantly larger reconstruction errors after defense (denoted by ≫), effectively neutralizing the backdoor by making triggered samples behave anomalously. The second constraint ensures that clean samples maintain reconstruction quality comparable to a model trained only on clean data, preserving the system’s legitimate functionality. Balancing these two objectives represents the core challenge of backdoor defense, as aggressive unlearning risks degrading clean performance while conservative approaches fail to remove backdoors effectively.

3.3. Defense Success Criteria

An effective defense must satisfy three objectives simultaneously:
1. High Backdoor Mitigation ( > 500 % ):
We define the mean squared error (MSE) between the original signal x and reconstruction x̂ as MSE(x, x̂) = (1/N) Σ_{i=1}^{N} (x_i − x̂_i)², where N is the signal length. The mitigation percentage is then calculated as
Mitigation = (MSE_BD^SCU − MSE_BD^poison) / MSE_BD^poison × 100%
Successful defense causes triggered inputs to produce reconstruction errors substantially larger than in the poisoned model.
2. Low Clean Degradation ( < 15 % ):
Degradation = (MSE_clean^SCU − MSE_clean^original) / MSE_clean^original × 100%
Defense should preserve legitimate semantic communication quality.
3. Low Attack Success Rate (ASR < 5 % ):
ASR = #{ triggered samples with MSE < threshold } / |D_test| × 100%
where threshold = median( MSE clean ) + 1.5 × MAD. Triggered inputs should be statistically distinguishable.
These thresholds are consistent with prior backdoor defense studies [27,31], where ASR values below 5% are commonly regarded as negligible risk and performance degradation below 15% is considered an acceptable trade-off. The 500% threshold enforces that triggered inputs exhibit at least six times the reconstruction error of the poisoned model, which enables highly reliable detection with over 99.9% confidence under the three-sigma rule.
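The three success criteria can be computed directly from per-sample reconstruction errors; the sketch below (function names ours) implements the formulas above, including the median + 1.5 × MAD threshold used for ASR:

```python
import numpy as np

def mse(x, x_hat):
    """Per-sample MSE over all non-batch dimensions."""
    return np.mean((x - x_hat) ** 2, axis=tuple(range(1, x.ndim)))

def mitigation_pct(mse_bd_scu, mse_bd_poison):
    """Criterion 1: relative increase in triggered-input error after defense."""
    return (mse_bd_scu - mse_bd_poison) / mse_bd_poison * 100.0

def degradation_pct(mse_clean_scu, mse_clean_orig):
    """Criterion 2: relative increase in clean-input error after defense."""
    return (mse_clean_scu - mse_clean_orig) / mse_clean_orig * 100.0

def attack_success_rate(mse_triggered, mse_clean):
    """Criterion 3: fraction of triggered samples that still reconstruct
    below the robust threshold median(clean MSE) + 1.5 * MAD."""
    med = np.median(mse_clean)
    mad = np.median(np.abs(mse_clean - med))
    threshold = med + 1.5 * mad
    return np.mean(mse_triggered < threshold) * 100.0
```

For example, a triggered-input MSE of 0.6 after defense versus 0.1 under the poisoned model corresponds to exactly 500% mitigation.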

4. SCU Methodology

4.1. Overview and Design Rationale

SCU addresses a fundamental challenge in backdoor defense: how to selectively erase adversarial knowledge from a trained model without causing catastrophic forgetting of legitimate functionality. Traditional approaches such as complete retraining are computationally prohibitive, while pruning-based methods remove neurons indiscriminately, potentially damaging both backdoor and clean pathways. SCU introduces a principled two-stage framework that exploits the unique properties of Variational Information Bottleneck architectures to achieve surgical removal of backdoor mappings while maintaining semantic communication fidelity. The key insight underlying SCU is that backdoor triggers produce distinctive patterns in the latent space. Poisoned samples are encoded with artificially low variance (high confidence) to ensure reliable trigger activation, while their reconstructions exhibit anomalously low error given the complexity of the input. By deliberately increasing the entropy of erased sample encodings (Stage 1: joint unlearning) and then maximizing their feature distance from clean samples (Stage 2: contrastive compensation), SCU destroys the backdoor’s latent space structure without requiring explicit trigger knowledge or extensive retraining.

4.2. System Architecture

Figure 1 illustrates the complete SCU pipeline, which operates in three sequential phases designed to isolate, eliminate, and validate backdoor removal.
Phase 1: Backdoor Injection and Model Poisoning: Following the standard backdoor attack protocol, we inject triggers into 15% of the training samples ( D erased D train ), creating a poisoned dataset D poison used to train the compromised VIB model M p . This phase simulates the adversary’s attack and produces the starting point for our defense.
Phase 2: Two-Stage Unlearning: This is the core contribution of SCU, consisting of two complementary optimization procedures:
Stage 1—Joint Unlearning (JU): Operates on the erased set D e (identified as poisoned samples) to simultaneously disrupt both encoder and decoder mappings. The encoder unlearning component maximizes output entropy for erased samples, transforming their previously confident (low-variance) embeddings into noisy, high-variance representations that cannot reliably encode backdoor information. Concurrently, the decoder unlearning component maximizes reconstruction error on these samples, ensuring that even if residual backdoor structure persists in the latent space, it fails to produce coherent outputs. Critically, a preservation term maintains reconstruction quality on the remaining (clean) samples D r , preventing catastrophic forgetting.
Stage 2—Contrastive Compensation (CC): Refines the unlearning by enforcing geometric separation in the latent space. Using InfoNCE-style contrastive learning [49], this stage pulls remaining (clean) sample embeddings closer together while pushing erased (poisoned) sample embeddings away. This prevents backdoor reactivation by ensuring that any residual backdoor-associated features occupy isolated regions of the latent manifold, far from the dense clusters of legitimate semantic representations.
Phase 3: Validation: The defended model M SCU is evaluated on both clean test samples (to measure performance preservation) and triggered test samples (to quantify backdoor mitigation). This phase verifies that SCU achieves the dual objective of eliminating backdoors while maintaining semantic communication quality.
Algorithm 1 anchors to the poisoned model (line 2) for three reasons: (1) clean models are unavailable post-deployment, (2) poisoned models correctly encode clean samples (backdoors add, not replace), and (3) L2 regularization against the poisoned reference (lines 9 and 17) prevents catastrophic drift while allowing for selective backdoor forgetting.

4.3. Stage 1: Joint Unlearning Algorithm

Algorithm 1 presents the joint encoder–decoder unlearning procedure, which forms the foundation of SCU’s backdoor removal capability. It uses the poisoned model as a reference for three key reasons. First, clean models are typically unavailable after deployment. Second, poisoned models still preserve correct representations for clean inputs, as backdoor behaviors are additive rather than replacing learned semantics. Third, applying L2 regularization with respect to the poisoned reference (lines 9 and 17) constrains parameter drift, thereby preventing catastrophic forgetting while enabling selective removal of backdoor information.
Algorithm 1 employs L2 distance instead of KL divergence in the regularization terms (lines 9 and 17). This design choice balances computational efficiency with theoretical fidelity: when encoder variances are approximately matched ( σ θ ( x ) σ ref ( x ) ), the L2 distance between mean vectors serves as a first-order surrogate for the KL divergence (see Appendix B for the complete derivation and complexity analysis).

Algorithm Mechanics and Intuition

Joint unlearning operates by creating a reference copy of the poisoned model (lines 1–2) that remains frozen throughout optimization, serving as an anchor to prevent the model from drifting arbitrarily far from its original state. For each mini-batch, the algorithm performs three coordinated operations:
Encoder unlearning (lines 5–8): The core idea is to destroy the encoder’s ability to produce reliable embeddings for erased samples. We achieve this by minimizing mutual information I(X_e; Z) through entropy maximization. Specifically, minimizing −log ‖σ_θ‖² forces the encoder to output high-variance distributions for erased samples, transforming the previously confident backdoor mappings into noisy, unreliable representations. To prevent encoder collapse, we add an L2 regularization term ‖μ_θ − μ_ref‖² that constrains how far the mean embedding can deviate from the frozen reference model. This L2 constraint serves as a computationally efficient approximation to full KL divergence, maintaining stability while reducing computational cost by approximately 32%.
Decoder unlearning (lines 10–14): Complementing encoder unlearning, the decoder component ensures that even if some backdoor structure survives in the latent space, it cannot produce successful reconstructions. We explicitly maximize the reconstruction error x e x ^ e 2 on erased samples, training the decoder to produce poor outputs for backdoor-associated embeddings. L2 regularization x ^ e x ^ e ref 2 prevents the decoder from degenerating completely, allowing it to maintain reconstruction capability for clean samples while degrading performance on poisoned ones.
Clean sample preservation (lines 16–19): Crucially, while we aggressively unlearn on D e , we simultaneously minimize the standard reconstruction loss on the remaining samples D r to preserve legitimate semantic communication functionality. The coefficient of 0.5 (Line 21) balances unlearning strength against clean performance preservation.
The reference model architecture (line 2) is essential because without it, the regularization terms would be undefined and the model could drift into arbitrary regions of parameter space during unlearning. By anchoring to the original poisoned model, we ensure that clean sample representations, which were learned correctly during initial training, remain largely intact while only backdoor-specific pathways are disrupted.
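Putting the three components together, one batch of the Stage 1 objective can be sketched as below, assuming the encoder and decoder outputs for the batch have already been computed. Names mirror Algorithm 1 rather than any released implementation:

```python
import numpy as np

def joint_unlearning_loss(mu_e, sigma_e, mu_ref,
                          x_e, xhat_e, xhat_e_ref,
                          x_r, xhat_r, alpha=1.0):
    """One batch of the Stage-1 objective L_JU (numpy sketch).

    mu_e, sigma_e: encoder mean/std for erased samples, shape (batch, d);
    mu_ref: frozen reference means; x_*/xhat_*: inputs and reconstructions.
    """
    # Encoder: push erased-sample variance up (entropy maximization) ...
    l_enc_mi = -np.mean(np.log(np.sum(sigma_e ** 2, axis=1)))
    # ... while anchoring the means to the frozen poisoned reference.
    l_enc_reg = np.mean((mu_e - mu_ref) ** 2)
    l_enc = l_enc_mi + alpha * l_enc_reg

    # Decoder: maximize reconstruction error on erased samples,
    # anchored to the reference decoder's outputs.
    l_dec_recon = -np.mean((x_e - xhat_e) ** 2)
    l_dec_reg = np.mean((xhat_e - xhat_e_ref) ** 2)
    l_dec = l_dec_recon + alpha * l_dec_reg

    # Preservation: standard reconstruction loss on remaining (clean) data,
    # weighted by the 0.5 coefficient from Algorithm 1.
    l_preserve = np.mean((x_r - xhat_r) ** 2)

    return l_enc + l_dec + 0.5 * l_preserve
```

In a real training loop this scalar would be built from framework tensors so that gradients flow to θ and φ; the numpy version only illustrates the arithmetic of the loss.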

4.4. Stage 2: Contrastive Compensation Algorithm

Algorithm 2 implements the contrastive compensation stage, which consolidates unlearning by enforcing latent space separation.
Algorithm 2 SCU Contrastive Compensation
  • Require: Partially unlearned encoder f_θ, decoder g_φ; erased set D_e; remaining set D_r; temperature τ
  • Ensure: Fully unlearned encoder f_θ*, decoder g_φ*
1: for epoch = 1 to T_CC do
2:   for batch (x_e, x_r) from (D_e, D_r) do
3:     // Encoder contrastive learning
4:     μ_r, _ ← f_θ(x_r);  μ_e, _ ← f_θ(x_e)
5:     Normalize: μ̄_r ← μ_r / ‖μ_r‖,  μ̄_e ← μ_e / ‖μ_e‖
6:
7:     // Positive pairs (within remaining set)
8:     S_pos ← μ̄_r μ̄_r^T / τ
9:     L_pos ← Σ_i S_pos[i, i]  {diagonal elements}
10:
11:    // Negative pairs (remaining vs. erased)
12:    S_neg ← μ̄_r μ̄_e^T / τ
13:    L_neg ← log Σ_{i,j} exp(S_neg[i, j])
14:
15:    L_enc^contrast ← −(L_pos − L_neg)
16:
17:    // Decoder preservation
18:    z_r ∼ N(μ_r, σ_r²)
19:    x̂_r ← g_φ(z_r)
20:    L_dec^preserve ← ‖x_r − x̂_r‖²
21:
22:    L_CC ← L_enc^contrast + L_dec^preserve
23:    Update θ, φ using ∇_{θ,φ} L_CC with the Adam optimizer
24:  end for
25: end for
26: return f_θ*, g_φ*
Maximizing the diagonal elements S_pos[i, i] enforces self-consistency for clean samples (line 9 of Algorithm 2). In the VIB framework, these diagonal terms capture within-sample stability of latent representations, which promotes compact and well-separated semantic clusters. This stability makes clean representations less susceptible to perturbations and helps prevent backdoor patterns from being reactivated.

Algorithm Mechanics and Intuition

Contrastive compensation addresses a critical challenge. After joint unlearning, the encoder produces noisy embeddings for erased samples, but these noisy embeddings might still overlap with clean sample embeddings in the latent space. If overlap persists, backdoor triggers could potentially reactivate by exploiting this shared latent region. Contrastive learning solves this by explicitly maximizing the geometric distance between erased and remaining sample embeddings.
InfoNCE-style contrastive learning (lines 4–11): We L2-normalize all embeddings (line 5) to focus on directional differences rather than magnitude, placing all representations on a unit hypersphere. We then compute positive similarity S pos = μ ¯ r μ ¯ r / τ among the remaining samples, where the temperature τ = 0.5 sharpens the distribution. Maximizing diagonal elements S pos [ i , i ] encourages each remaining sample’s embedding to align with others in its batch, creating dense clusters of clean representations. Conversely, we compute negative similarity S neg = μ ¯ r μ ¯ e / τ between remaining and erased samples, minimizing the log-sum-exp of these similarities to push erased sample embeddings away from the clean cluster.
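The InfoNCE-style term can be computed directly from batch embeddings; a minimal numpy illustration (function and variable names ours) with τ = 0.5:

```python
import numpy as np

def contrastive_loss(mu_r, mu_e, tau=0.5):
    """Stage-2 encoder contrastive term (numpy sketch of Algorithm 2).

    mu_r: (n_r, d) mean embeddings of remaining (clean) samples;
    mu_e: (n_e, d) mean embeddings of erased (poisoned) samples.
    """
    # L2-normalize onto the unit hypersphere (directional similarity only).
    mu_r_bar = mu_r / np.linalg.norm(mu_r, axis=1, keepdims=True)
    mu_e_bar = mu_e / np.linalg.norm(mu_e, axis=1, keepdims=True)

    # Positive similarities among remaining samples; the diagonal
    # measures within-sample self-consistency.
    s_pos = mu_r_bar @ mu_r_bar.T / tau
    l_pos = np.trace(s_pos)

    # Negative similarities between remaining and erased samples.
    s_neg = mu_r_bar @ mu_e_bar.T / tau
    l_neg = np.log(np.sum(np.exp(s_neg)))

    # Minimizing -(l_pos - l_neg) pulls clean embeddings together
    # and pushes erased embeddings away from the clean cluster.
    return -(l_pos - l_neg)
```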
Decoder preservation (lines 14–16): This runs concurrently to maintain reconstruction quality on clean samples, preventing the encoder’s contrastive updates from destabilizing the decoder.
The combination of Algorithms 1 and 2 implements a two-pronged attack on backdoor knowledge: joint unlearning destroys the encoding and decoding mechanisms, while contrastive compensation restructures the latent space to prevent reactivation.

4.5. Poisoned Sample Detection

A practical challenge for any unlearning-based defense is identifying which training samples are poisoned (the erased set D e ). While our evaluation uses ground-truth poisoned indices to isolate SCU’s core unlearning effectiveness, we also implement a detection module for real-world deployment.
We employ a reconstruction variance-based heuristic: backdoor samples typically exhibit anomalously low reconstruction error because the trigger creates an artificial shortcut in the model’s learned mapping. For each training sample x_i, we compute its reconstruction error e_i = ‖x_i − x̂_i‖² and flag samples with e_i < μ_e − 2σ_e as suspicious, where μ_e and σ_e are the mean and standard deviation of the reconstruction errors across the training set. This statistical threshold identifies samples that reconstruct too well, indicating potential backdoor influence.
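A minimal sketch of this low-error heuristic (function name ours):

```python
import numpy as np

def flag_suspicious(recon_errors, k=2.0):
    """Flag samples whose reconstruction error is anomalously LOW.

    recon_errors: (n,) array of per-sample errors e_i = ||x_i - x_hat_i||^2.
    A sample is suspicious if e_i < mean - k * std, i.e., it reconstructs
    'too well', suggesting a backdoor shortcut. Returns flagged indices.
    """
    mu = recon_errors.mean()
    sigma = recon_errors.std()
    return np.where(recon_errors < mu - k * sigma)[0]
```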

4.6. Hyperparameter Configuration

Table 2 presents our hyperparameter configuration, determined through ablation studies that prioritized maximizing backdoor mitigation while constraining clean MSE degradation below 15%. The regularization weight α = 1.0 balances unlearning aggressiveness against stability, preventing catastrophic forgetting while enabling effective backdoor suppression. Temperature τ = 0.5 creates sufficient separation between erased and clean embeddings while maintaining numerical stability. Both unlearning stages use 10 epochs, with ablation studies (Section 6.3) demonstrating that both stages are necessary for optimal performance. The batch size of 128 balances computational efficiency with gradient signal quality, while gradient clipping to [ 1 , 1 ] prevents explosions during adversarial optimization.

4.7. Theoretical Foundation of the Separability Condition

A key question underlying the effectiveness of SCU is under which conditions backdoor patterns can be selectively removed without harming legitimate semantic representations. Prior work shows that latent representations of clean data concentrate on a low-dimensional semantic manifold, whereas backdoor triggers introduce systematic and consistent shifts in the feature space [50,51]. Motivated by this distinction, we formalize a separability condition that specifies when selective unlearning is theoretically achievable.
Let Z s R d denote the semantic subspace spanned by the top-k principal components of clean latent embeddings, with orthonormal basis { v 1 , , v k } . We define the backdoor direction as the mean embedding shift induced by trigger injection:
z_b = E_{x∼D_clean}[ μ_θ(x + t) ] − E_{x∼D_clean}[ μ_θ(x) ].
Selective forgetting is achievable when the backdoor direction exhibits limited alignment with the semantic subspace:
max_{i=1,…,k} |⟨v_i, z_b⟩| / (‖v_i‖ ‖z_b‖) < ϵ,
where ϵ denotes an entanglement threshold controlling the degree of backdoor–semantic interaction. Empirically, we observe that ϵ = 0.30 provides sufficient separation for effective unlearning without noticeable degradation in clean performance.
Within the VIB framework [44], latent representations preserve task-relevant semantics while discarding nuisance factors. Clean samples concentrate in a semantic subspace Z s characterized by high-variance principal components, whereas backdoor triggers introduce systematic displacement along direction z b . When z b is approximately orthogonal to Z s , selectively increasing posterior uncertainty along this direction suppresses backdoor information while leaving semantic representations largely unaffected. This geometric disentanglement explains why encoder unlearning can surgically target backdoor-aligned dimensions. Conversely, when the backdoor direction exhibits strong alignment with semantic components, suppressing backdoor information inevitably distorts the semantic subspace, leading to substantial clean performance degradation. In our experiments, models with entanglement exceeding 0.5 consistently suffer more than 20% degradation. The threshold ϵ = 0.30 represents a practical operating regime determined empirically across multiple backdoor types and training runs.
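The entanglement measure can be sketched as below, assuming clean and triggered mean embeddings are available. The PCA basis is obtained via SVD of the centered clean embeddings; names are ours:

```python
import numpy as np

def entanglement(mu_clean, mu_triggered, k=32):
    """Max |cosine| alignment between the backdoor direction z_b and the
    top-k principal directions of clean embeddings (separability check).

    mu_clean, mu_triggered: (n, d) mean embeddings for clean samples and
    their triggered counterparts.
    """
    # Backdoor direction: mean embedding shift induced by the trigger.
    z_b = mu_triggered.mean(axis=0) - mu_clean.mean(axis=0)
    centered = mu_clean - mu_clean.mean(axis=0)
    # Right singular vectors = principal directions of clean embeddings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]  # orthonormal rows, so ||v_i|| = 1
    cos = np.abs(basis @ z_b) / (np.linalg.norm(z_b) + 1e-12)
    return float(cos.max())

# Separability condition: entanglement(...) < 0.30 (empirical threshold).
```

A trigger shift orthogonal to the semantic subspace yields entanglement near 0, while a shift along the dominant principal direction yields entanglement near 1.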

Empirical Validation

We empirically validate the proposed separability condition using clean, poisoned, and SCU-defended models. The semantic subspace is obtained via PCA on clean embeddings, retaining the top 32 components that explain 95% of cumulative variance. The backdoor direction z b is computed as the mean shift between triggered and clean samples, and entanglement is measured as the maximum cosine similarity between z b and the semantic basis vectors. As shown in Figure 2, the poisoned model exhibits pronounced alignment between the backdoor direction and specific principal components, with an average entanglement of 0.206. After applying SCU, this alignment is substantially reduced across nearly all components, with the average entanglement decreasing to 0.146 (29% improvement). Dominant anomalies observed in the poisoned model are largely eliminated, demonstrating that SCU selectively suppresses backdoor-aligned dimensions rather than indiscriminately perturbing the latent space. To further characterize geometric separation, we decompose the backdoor direction into components parallel and orthogonal to the semantic subspace:
z_b = Σ_{i=1}^{32} ⟨v_i, z_b⟩ v_i + z_b^⊥,
where z_b^⊥ denotes the component of z_b orthogonal to Z_s.
In the poisoned model, more than 97% of the backdoor energy lies in the orthogonal complement of Z s , increasing to 98.4% after applying SCU. This confirms that backdoor information predominantly occupies dimensions distinct from semantic features and that SCU further enhances this separation. Overall, all evaluated configurations satisfy the proposed separability condition, with SCU consistently achieving the lowest entanglement. These results provide strong empirical support for the separability hypothesis and explain why SCU enables surgical backdoor removal without compromising clean performance.
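The parallel/orthogonal energy split used above can be sketched as (names ours):

```python
import numpy as np

def backdoor_energy_split(z_b, basis):
    """Fraction of ||z_b||^2 inside vs. orthogonal to the semantic subspace.

    z_b: (d,) backdoor direction; basis: (k, d) orthonormal rows spanning Z_s.
    Returns (parallel_fraction, orthogonal_fraction), which sum to 1.
    """
    coeffs = basis @ z_b               # projections <v_i, z_b>
    par = np.sum(coeffs ** 2)          # energy inside Z_s
    total = np.sum(z_b ** 2)
    return par / total, 1.0 - par / total
```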
Figure 2. Left: Per-component entanglement scores across 32 principal components. Poisoned model (red circles) shows average 0.206, with maximum 0.815 at PC4. SCU (green squares) reduces average to 0.146 (29% improvement), with dramatic PC1 reduction (0.975 → 0.206). Threshold ϵ = 0.30 (dashed line) is satisfied by 28/32 components (poisoned) and 31/32 (SCU). Right: Average entanglement comparison across three models (blue: clean, red: poisoned, green: SCU). All models remain in the separable region (green shaded area, < 0.30 ), below the threshold (dashed line), confirming theoretical predictions. The red shaded area ( > 0.30 ) represents the entangled region. SCU achieves lowest entanglement (0.146), demonstrating enhanced backdoor–semantic separation post-unlearning.

5. Experimental Setup

5.1. Dataset and Preprocessing

We conduct experiments on the RML2016.10a dataset [52], a widely-adopted benchmark for automatic modulation classification in wireless communications. This dataset contains over 220,000 complex-valued I/Q (in-phase and quadrature) signal samples representing 11 different modulation schemes captured at 20 distinct signal-to-noise ratio (SNR) levels ranging from −20 dB to +18 dB. To create a balanced and representative experimental setup while maintaining computational tractability, we select six diverse modulation types spanning different complexity levels: BPSK (Binary Phase Shift Keying), QPSK (Quadrature Phase Shift Keying), 8PSK (8-ary Phase Shift Keying), QAM16 (16-Quadrature Amplitude Modulation), WBFM (Wideband Frequency Modulation), and AM-SSB (Amplitude Modulation—Single Sideband). This selection encompasses both phase-based modulations (PSK family) and amplitude-based schemes (QAM and AM-SSB), along with the frequency-domain representation (WBFM), providing comprehensive coverage of modern wireless communication techniques. We focus on moderate-to-high SNR conditions (10, 12, 14, 16, and 18 dB) that are typical of practical wireless systems after equalization and channel compensation. This range excludes extremely noisy conditions where signal characteristics become unreliable while still testing robustness across varying channel qualities. From each modulation–SNR pair, we extract 400 samples, yielding a total dataset of 12,000 samples (6 modulations × 5 SNRs × 400 samples).
Each sample consists of 128 time-domain complex samples represented as a 128 × 2 matrix, where the first column contains in-phase (I) components and the second column contains quadrature (Q) components. This I/Q representation captures both amplitude and phase information of the transmitted signal, forming the fundamental input to our semantic communication system.
We employ stratified splitting to maintain class balance across all subsets. The dataset is divided into three non-overlapping partitions:
  • Training set: 7200 samples (60%)—used for both clean and poisoned model training.
  • Validation set: 2400 samples (20%)—used for hyperparameter tuning and early stopping.
  • Test set: 2400 samples (20%)—held out for final performance evaluation.
The stratified approach ensures that each partition maintains the same proportion of samples from each modulation type and SNR level, preventing evaluation bias.
Raw I/Q samples exhibit significant dynamic range variation across different modulation schemes and SNR levels. To ensure stable neural network training and prevent certain modulations from dominating the gradient updates, we apply min-max normalization to scale all features into the range [ 1 , 1 ] :
x_norm = 2 · (x − x_min) / (x_max − x_min) − 1
where x min and x max are computed from the training set and applied consistently to validation and test sets to prevent data leakage. This symmetric range around zero is particularly well-suited for the hyperbolic tangent activation functions used in our decoder architecture.
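A minimal sketch of this leakage-free normalization, fitting statistics on the training split only (function names ours):

```python
import numpy as np

def fit_minmax(train):
    """Compute per-feature min/max on the TRAINING split only."""
    return train.min(axis=0), train.max(axis=0)

def apply_minmax(x, x_min, x_max, eps=1e-12):
    """Scale into [-1, 1] using training-set statistics, so validation and
    test sets reuse the same bounds (no data leakage). eps guards against
    constant features."""
    return 2.0 * (x - x_min) / (x_max - x_min + eps) - 1.0
```

The same fitted (x_min, x_max) pair is applied to the validation and test partitions.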
Prior to model training, we perform comprehensive integrity checks on all partitions:
  • Shape consistency: Verify all samples maintain ( 128 , 2 ) dimensionality.
  • Numerical validity: Confirm absence of NaN or infinite values.
  • Distribution balance: Ensure each class represents approximately 16.67% of samples (1/6 for six modulations).
  • SNR distribution: Validate equal representation across all five SNR levels within each modulation class.
For compatibility with our unsupervised VIB framework, we transform modulation labels into one-hot encoded vectors of dimension 6, though these labels are only used for stratified splitting and performance analysis, not during VIB training itself (which operates in a purely unsupervised reconstruction-based manner).
This preprocessing pipeline ensures that our experimental results are not confounded by data artifacts, class imbalance, or normalization issues, allowing us to isolate the true impact of backdoor attacks and SCU defense mechanisms on semantic communication performance.

5.2. Multi-Domain Signal Representations

To comprehensively evaluate SCU’s robustness across diverse signal processing domains, we transform the time-domain I/Q samples into six additional representations, each capturing different signal characteristics. All transformations preserve full spectral information (both positive and negative frequencies) to avoid information loss.
  • Frequency Domain (FFT): We apply the Fast Fourier Transform to convert temporal signals into their frequency-domain representation. For each I/Q pair ( x I , x Q ) , we construct a complex signal s = x I + j · x Q and compute S = FFT ( s ) . The transformed representation stores magnitude | S | and phase S as separate channels, preserving complete spectral information across all 128 frequency bins.
  • Z-Domain (STFT): Short-Time Fourier Transform captures time-frequency joint characteristics with window size N seg = 32 and 75% overlap (24 samples). We preserve the full complex spectrum to maintain negative frequency components. The resulting spectrogram is flattened and intelligently downsampled to 128 features per channel using uniform stride, preserving both transient and steady-state signal properties.
  • Wavelet Domain: We employ Daubechies-4 wavelets with 4-level decomposition to capture multi-resolution temporal characteristics. The wavelet transform produces approximation and detail coefficients at different scales, [ c A 4 , c D 4 , c D 3 , c D 2 , c D 1 ] , which are concatenated and uniformly sampled to 128 features. This representation excels at localizing transient events while preserving low-frequency trends.
  • Laplace Domain: To model signals with exponential damping characteristics, we apply complex exponential weighting with damping factor σ = 0.1 : L { s ( t ) } = FFT ( s ( t ) · e σ t ) . This transformation emphasizes early-time signal behavior, making it sensitive to trigger patterns injected at the beginning of the time series.
  • Cepstral Domain: The cepstrum separates the spectral envelope from the fine structure through homomorphic processing: c ( t ) = IFFT ( log ( | FFT ( s ( t ) ) | ) ) . This domain is particularly effective for detecting modulation-based features and filtering multiplicative noise or backdoor artifacts that appear as additive components in the log-spectral domain.
  • Hilbert Domain: We compute the analytic signal via Hilbert transform to extract the instantaneous amplitude and phase, s a ( t ) = s ( t ) + j · H { s ( t ) } , where H denotes the Hilbert transform operator. The resulting representation stores the instantaneous envelope | s a ( t ) | and unwrapped phase s a ( t ) , capturing amplitude and frequency modulation characteristics independently.
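Three of the transforms above can be sketched with numpy alone; the version below (function names ours) illustrates the frequency, cepstral, and Laplace-domain representations for a (128, 2) I/Q sample:

```python
import numpy as np

def to_frequency_domain(iq):
    """FFT magnitude/phase channels from a (128, 2) I/Q sample."""
    s = iq[:, 0] + 1j * iq[:, 1]          # complex baseband signal
    S = np.fft.fft(s)                     # full spectrum, all 128 bins
    return np.stack([np.abs(S), np.angle(S)], axis=1)

def to_cepstral_domain(iq, eps=1e-10):
    """Real/imag channels of the cepstrum: IFFT(log |FFT(s)|).
    eps guards the log against zero-magnitude bins."""
    s = iq[:, 0] + 1j * iq[:, 1]
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(s)) + eps))
    return np.stack([c.real, c.imag], axis=1)

def to_laplace_domain(iq, sigma=0.1):
    """Damped-exponential weighting before the FFT: FFT(s(t) * exp(-sigma*t)),
    emphasizing early-time behavior."""
    t = np.arange(iq.shape[0])
    s = (iq[:, 0] + 1j * iq[:, 1]) * np.exp(-sigma * t)
    S = np.fft.fft(s)
    return np.stack([np.abs(S), np.angle(S)], axis=1)
```

For a pure complex tone exp(j2πkt/N), the frequency-domain magnitude channel peaks at bin k with value N, as expected from the DFT.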
After transformation, each domain undergoes the same min-max normalization as the time-domain data. We perform domain-specific validation checks:
  • Numerical stability: Replace any NaN or Inf values (arising from log operations or divisions) with safe bounds.
  • Dynamic range verification: Ensure transformed signals span a meaningful range (>0.01) to avoid degenerate cases.
  • Power preservation: Confirm average power in the transformed domain remains within 0.1–10× of the original signal power.
This comprehensive, multi-domain evaluation enables us to determine whether backdoor triggers exploit domain-specific vulnerabilities, and whether the SCU remains effective when processing fundamentally different signal representations. This is critical for real-world deployment, where adversaries may optimize attacks for specific processing pipelines.

5.3. Baseline Defense Mechanisms

To contextualize SCU’s performance and demonstrate its advantages over existing approaches, we implement and compare against six state-of-the-art backdoor defense methods spanning detection-based, pruning-based, and unlearning-based paradigms. All baselines are carefully tuned to operate within comparable computational budgets and adapted for the VIB semantic communication architecture.
Detection-based defenses:
1. Neural Cleanse [27]: This method reverse-engineers potential backdoor triggers by solving an optimization problem for each target class. For each of the six modulation classes, we run 50 gradient descent iterations to find the minimal perturbation pattern that causes misclassification. The trigger search minimizes min_{m,t} E_x[ L( f_θ(x + m ⊙ t), y_target ) ] + λ‖m‖_1, where m is a binary mask, t is the trigger pattern, and λ = 0.01 controls trigger size. Classes exhibiting anomalously small trigger norms (more than 2 standard deviations below the mean) are flagged as backdoored. Upon detection, we apply fine-tuning on clean data (3 epochs; learning rate 10^{−4}) to mitigate the identified backdoor.
2. Spectral Signature [28]: This approach leverages the observation that backdoor samples cluster in the top singular value space of layer activations. We extract encoder representations from 2000 training samples, mean-center them, and compute singular value decomposition. Samples with projection scores onto the top singular vector exceeding the 95th percentile are classified as poisoned. The model is then retrained on the remaining samples (3 epochs) to remove backdoor influence.
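The core of the spectral signature computation, projection scores onto the top singular vector with a 95th-percentile cutoff, can be sketched as follows. Synthetic representations stand in here for the 2000 extracted encoder activations:

```python
import numpy as np

def spectral_signature_scores(reps):
    """Per-sample outlier score: squared projection of the mean-centered
    representation onto the top right singular vector."""
    centered = reps - reps.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[0]) ** 2

# Synthetic encoder representations: 1900 clean + 100 poisoned (shifted)
rng = np.random.default_rng(42)
reps = np.vstack([rng.standard_normal((1900, 64)),
                  rng.standard_normal((100, 64)) + 4.0])

scores = spectral_signature_scores(reps)
flagged = scores > np.percentile(scores, 95)  # 95th-percentile rule
```

With a clearly shifted poisoned cluster, the top singular direction aligns with the shift and the flagged set recovers almost exactly the poisoned samples; subtler shifts degrade this separation.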
3. Activation Clustering [53]: We perform k-means clustering ( k = 2 ) on encoder latent representations z extracted from training samples. The algorithm assumes backdoor samples form a distinct cluster due to their anomalous encoding. We identify the smaller cluster (typically containing 5–20% of samples) as suspicious and retrain the model on the larger cluster for 3 epochs. This method is particularly sensitive to trigger strength, with strong triggers producing well-separated clusters, while subtle triggers (e.g., blend backdoors) may evade detection.
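A minimal activation-clustering sketch follows, with the smaller of two k-means clusters flagged as suspicious. The deterministic initialization is our simplification for reproducibility; in practice a library k-means (e.g., scikit-learn) would be used:

```python
import numpy as np

def kmeans2(z, iters=50):
    """Minimal k = 2 k-means on latent vectors. Centers are initialized
    at the points nearest/farthest from the global mean (deterministic)."""
    norms = np.linalg.norm(z - z.mean(axis=0), axis=1)
    centers = np.stack([z[norms.argmin()], z[norms.argmax()]])
    labels = np.zeros(len(z), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = z[labels == k].mean(axis=0)
    return labels

# Synthetic latents: 900 clean samples plus 100 anomalously encoded ones
rng = np.random.default_rng(1)
z = np.vstack([rng.standard_normal((900, 8)),
               rng.standard_normal((100, 8)) + 5.0])
labels = kmeans2(z)
# The smaller cluster (5-20% of samples in the text) is flagged as suspicious
suspicious = labels == np.argmin(np.bincount(labels, minlength=2))
```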
Pruning-based defenses:
4. Fine-Pruning [31]: This method operates on the hypothesis that backdoor functionality resides in specific neurons that remain dormant during clean-data processing. We prune 30% of weights with the smallest magnitude across all layers (encoder and decoder) and then fine-tune the pruned model on clean training data for 3 epochs with learning rate 10^{−4}. The pruning ratio is selected to balance the removal of backdoors against preserving clean performance. Higher ratios (e.g., 50%) more aggressively eliminate backdoors, but this degrades the quality of semantic reconstruction.
Unlearning-based defenses:
5. Variational Bayesian Unlearning (VBU) [37]: This Bayesian approach models parameter uncertainty and updates the posterior distribution by removing the likelihood contribution of poisoned samples. We use Monte Carlo sampling with 5 samples per forward pass to estimate the variational objective L_VBU = E_{q(θ)}[ log p(D_clean | θ) ] − β · D_KL( q(θ) ‖ p(θ) ), where β = 0.1 controls the strength of the prior. The method requires identifying poisoned samples (we use ground-truth indices for fair comparison) and runs for 10 epochs with learning rate 3 × 10^{−4}.
6. Hessian-based Unlearning: This second-order method approximates the Fisher Information Matrix to identify parameters most influenced by poisoned samples. For computational efficiency, we use the empirical Fisher computed from 30 clean samples: F_ij = E_x[ ∂_{θ_i} log p(x|θ) · ∂_{θ_j} log p(x|θ) ]. The unlearning objective combines gradient ascent on poisoned samples (to forget) with gradient descent on clean samples (to remember): L_Hessian = L_clean − α L_poison + λ Σ_{ij} F_ij θ_i θ_j, where α = 0.1 controls unlearning strength and λ = 0.1 provides Fisher regularization. We run 15 gradient steps with learning rate 2 × 10^{−4} and gradient clipping at ±0.5 for stability.
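The empirical Fisher estimate F = E[g gᵀ] and the resulting regularization term can be illustrated with a toy model whose per-sample score is known in closed form. We use a unit-variance Gaussian mean model here; the actual defense obtains the scores by backpropagation through the VIB network:

```python
import numpy as np

def empirical_fisher(grads):
    """Empirical Fisher F = E[g g^T] from per-sample score vectors
    g_i = grad_theta log p(x_i | theta)."""
    g = np.asarray(grads)
    return g.T @ g / len(g)

# Toy score function: for x ~ N(theta, I), grad_theta log p(x|theta) = x - theta
rng = np.random.default_rng(0)
theta = np.zeros(4)
x = theta + rng.standard_normal((30, 4))  # 30 clean samples, as in the text
F = empirical_fisher(x - theta)

# Fisher-weighted penalty lambda * sum_ij F_ij theta_i theta_j
lam = 0.1
penalty = lam * theta @ F @ theta
```

For this model the true Fisher is the identity, so the 30-sample empirical estimate should be symmetric with diagonal entries near 1.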
To ensure fair comparison, all baselines are implemented within the same TensorFlow 2.x framework, use identical data preprocessing, and operate on the same poisoned models. Methods that require poisoned sample identification (e.g., VBU and Hessian) are provided with ground-truth indices to isolate their core defense mechanisms from detection challenges. This represents an upper bound on their performance, since perfect detection is unrealistic. The computational budgets are approximately matched, with all methods completing within 150–600 s on our experimental platform (see Section 6.6 for a detailed timing analysis).

5.4. Evaluation Metrics

We employ a comprehensive evaluation framework that quantifies both backdoor mitigation effectiveness and clean performance preservation, along with a binary attack success measure.
Backdoor Mitigation Percentage: This metric quantifies the relative increase in reconstruction error on triggered inputs after applying a defense, indicating how effectively the backdoor has been disrupted:
Mitigation = ( MSE_backdoor^defense − MSE_backdoor^poison ) / MSE_backdoor^poison × 100%
where MSE_backdoor^poison is the reconstruction error of the poisoned model on triggered test samples {x + t}, and MSE_backdoor^defense is the same metric after defense. Higher values indicate stronger backdoor mitigation. Ideally, triggered samples should produce significant reconstruction errors after defense, indicating that the backdoor mapping has been erased. A mitigation of 100% means the backdoor MSE has doubled; 1000% means it has grown to 11 times its original value. Negative values indicate the defense failed to disrupt the backdoor.
Clean Performance Degradation: This metric measures the collateral damage to legitimate signal reconstruction caused by the defense mechanism:
Degradation = ( MSE_clean^defense − MSE_clean^original ) / MSE_clean^original × 100%
where MSE_clean^original is the reconstruction error of a clean (unpoisoned) model on test samples {x}, and MSE_clean^defense is the reconstruction error after defense. Lower values indicate better preservation of legitimate functionality. Ideally, this metric should remain below 10–15%, representing an acceptable performance trade-off. Degradation above 30% suggests the defense has overly disrupted the model’s semantic encoding capacity.
Attack Success Rate (ASR): Following the BadNets evaluation protocol [16], the ASR measures the fraction of triggered test samples that successfully deceive the defended model:
ASR = #{ x + t ∈ D_test : MSE(x + t) < threshold } / |D_test| × 100%
The threshold is computed as threshold = median ( MSE clean ) + 1.5 × MAD , where MAD (Median Absolute Deviation) is defined as MAD = median ( | MSE clean median ( MSE clean ) | ) . This robust statistical threshold accounts for the natural distribution of clean reconstruction errors while remaining resistant to outliers. Triggered samples with a reconstruction error below this threshold are considered successful attacks (indistinguishable from clean samples). A lower ASR indicates better defense. An ASR of 0% means that all triggered inputs are flagged as anomalous. For context, a poisoned model without defense typically exhibits an ASR > 95 % , while effective defenses should reduce the ASR to < 5 % .
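The three metrics can be implemented directly from their definitions; the MAD-based threshold below follows the formula in the text (function names are ours):

```python
import numpy as np

def mitigation_pct(mse_backdoor_defense, mse_backdoor_poison):
    """Relative increase in backdoor MSE after defense (higher is better)."""
    return (mse_backdoor_defense - mse_backdoor_poison) / mse_backdoor_poison * 100.0

def degradation_pct(mse_clean_defense, mse_clean_original):
    """Collateral damage to clean reconstruction (lower is better)."""
    return (mse_clean_defense - mse_clean_original) / mse_clean_original * 100.0

def asr_pct(mse_triggered, mse_clean):
    """BadNets-style ASR with the robust median + 1.5*MAD threshold."""
    med = np.median(mse_clean)
    mad = np.median(np.abs(mse_clean - med))
    threshold = med + 1.5 * mad
    return float(np.mean(mse_triggered < threshold) * 100.0)

# A doubling of the backdoor MSE corresponds to 100% mitigation
print(mitigation_pct(0.10, 0.05))  # 100.0
# The paper's clean-MSE numbers (0.0612 vs. 0.0554) give about 10.5% degradation
print(round(degradation_pct(0.0612, 0.0554), 1))
```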
Metric interpretation: An ideal defense achieves the following:
  • High Mitigation ( > 500 % ): Backdoor triggers produce catastrophic reconstruction errors.
  • Low Degradation ( < 10 % ): Clean samples maintain near-original reconstruction quality.
  • Low ASR ( < 5 % ): Triggered inputs are statistically distinguishable from clean samples.
These metrics are evaluated on the held-out test set (2400 samples) across all six modulation types and five SNR levels to ensure statistical significance and generalizability.
In addition to accuracy-based metrics, we report the wall-clock time for each defense method (measured on an NVIDIA Tesla V100 GPU with 32GB RAM) to assess practical deployability. We also compute the theoretical complexity in terms of floating-point operations (FLOPs) for scalability analysis (detailed in Section 6.6).
All experiments are repeated with three random seeds, and we report the mean ± standard deviation. For mitigation and degradation percentages exceeding 50%, we additionally provide 95% confidence intervals via bootstrap resampling (1000 iterations) to validate the robustness of reported improvements.

6. Experimental Results

This section presents comprehensive experimental validation of SCU across baseline comparisons (Section 6.1), statistical robustness (Section 6.2), ablation studies (Section 6.3), multi-domain analysis (Section 6.4), channel robustness (Section 6.5), computational efficiency (Section 6.6), and adaptive attacks (Section 6.7). The detection robustness results are detailed in Appendix A.

6.1. Main Performance Comparison

We compare SCU against six state-of-the-art backdoor defense methods on the RML2016.10a dataset with a 15% poisoning ratio. Table 3 presents the reconstruction MSE for clean test samples and backdoor-triggered samples, along with the backdoor mitigation percentage calculated as Mitigation = ( MSE_after − MSE_poisoned ) / MSE_poisoned × 100%. A higher backdoor MSE indicates stronger unlearning, as the model fails to reconstruct triggered inputs according to the learned backdoor mapping.
SCU achieves substantially superior backdoor mitigation compared to all baseline methods. The 5-seed average of 629.5 ± 191.2% demonstrates robust and reproducible performance, with the 95% confidence interval of [364.1%, 895.0%] indicating strong statistical significance ( p < 0.001 ; detailed in Section 6.2). The best individual run achieved 1486.1% mitigation, representing the upper bound of SCU’s capability. In comparison, retraining from scratch, which serves as a theoretical upper bound, achieves only 110.0% mitigation, suggesting that complete model retraining paradoxically preserves some backdoor vulnerability due to potential data contamination or optimization trajectory biases. Alternative unlearning approaches (Variational Bayesian and Hessian-based) show minimal effectiveness with near-zero or negative mitigation, indicating that generic machine unlearning techniques fail to address backdoor-specific challenges in semantic communication systems. Defense-focused methods (Neural Cleanse, Fine-Pruning, and Activation Clustering) achieve modest improvements (7.4–92.8%) but remain an order of magnitude below SCU’s performance. These methods struggle because they either rely on explicit backdoor detection (Neural Cleanse), apply indiscriminate parameter modification (Fine-Pruning), or require clean-data separation (Activation Clustering)—assumptions that do not hold in our threat model.
The clean MSE degradation for SCU is 10.5% relative to the original clean model (0.0612 vs. 0.0554), which remains within acceptable bounds for practical semantic communication systems. This modest performance trade-off is justified by the substantial backdoor elimination, particularly when compared to retraining, which achieves a similar clean MSE (0.0549) but dramatically inferior backdoor mitigation.

6.2. Statistical Validation

To establish reproducibility and statistical significance, we conduct five independent experimental runs with different random seeds (42–46), each involving complete pipeline execution: backdoor injection, poisoned model training, and SCU unlearning. This multi-seed validation quantifies performance variability across different initializations and backdoor patterns, providing robust deployment estimates beyond single-seed ablation studies. Figure 3 presents the distribution of mitigation percentages across runs.
The results demonstrate strong consistency: mean mitigation of 629.5% with standard deviation 191.2%, median 652.7%, and range [390.7%, 914.6%]. A one-sample t-test comparing SCU against the baseline (0% mitigation) yields t-statistic = 6.585 and p = 0.000172 < 0.001 , confirming highly significant improvement. The effect size (Cohen’s d = 3.29 ) indicates a large practical impact. The 95% confidence interval [364.1%, 895.0%] suggests that even in worst-case scenarios, SCU maintains substantial backdoor mitigation. Clean MSE consistency is equally strong across runs with mean 0.0618 ± 0.0003, representing 11.5 ± 0.6% degradation from the original model. This tight variance confirms that the performance trade-off is predictable and controllable. The reproducibility across different random initializations validates that SCU’s effectiveness stems from its principled two-stage optimization rather than fortuitous parameter configurations. The relationship between single-seed and multi-seed results requires clarification. The single-seed ablation (Section 6.3) achieved 1486.1% mitigation under controlled conditions designed to isolate component effects. The lower multi-seed average (629.5%) reflects natural performance variance across initializations, backdoor patterns, and unlearning trajectories. Both results validate SCU’s effectiveness. The ablation demonstrates the theoretical upper bound, while multi-seed validation provides robust deployment estimates with quantified uncertainty.

6.3. Ablation Study

We conduct ablation experiments using a single seed (seed 42) to isolate the individual contributions of each SCU component: joint unlearning (Stage 1) and contrastive compensation (Stage 2). This single-seed analysis enables controlled comparison by eliminating inter-seed variability, allowing us to precisely quantify how each component affects backdoor mitigation and clean performance. Table 4 presents the results for each component in isolation and their combination.
Joint unlearning alone achieves exceptional backdoor mitigation (1891.4%) by maximizing mutual information between erased samples and their semantic representations while minimizing information with the backdoor decoder. However, this aggressive unlearning causes significant clean performance degradation (8.5% MSE increase: 0.0598 vs. 0.0551 poisoned baseline), as it lacks mechanisms to preserve legitimate semantic features. Conversely, contrastive compensation alone provides minimal backdoor mitigation (49.9%) while maintaining clean performance (0.0550 MSE, nearly identical to poisoned baseline), as it focuses on feature separation without explicit backdoor erasure.
The full SCU framework achieves an optimal balance: 1486.1% mitigation with acceptable 11.2% clean degradation (0.0613 vs. 0.0551). This single-seed ablation result demonstrates the synergistic interaction between both components; joint unlearning provides the primary backdoor elimination capability, while contrastive compensation prevents catastrophic forgetting of legitimate features. The result validates our theoretical framework that predicted the necessity of both information-theoretic unlearning and contrastive feature preservation.
While this single-seed ablation achieves 1486.1% mitigation under controlled conditions, the 5-seed validation experiment (Section 6.2) demonstrates more conservative performance with a mean 629.5 ± 191.2% (range: 390.7–914.6%) across different random initializations. This variance reflects the stochastic nature of backdoor injection, model initialization, and unlearning dynamics. The ablation result represents an upper performance bound, while the 5-seed average provides the robust estimate for practical deployment.

6.4. Multi-Domain Signal Processing Analysis

Semantic communication systems may operate across diverse signal representations depending on channel characteristics and application requirements. We evaluate SCU’s effectiveness across seven signal processing domains: time, frequency (FFT), Z-domain (STFT), wavelet, Laplace, cepstral, and Hilbert. Table 5 presents the results for each domain with full spectrum preservation (no downsampling or frequency truncation).
SCU demonstrates strong performance across domains, with particularly high effectiveness in time-domain (1274.0%) and wavelet-domain (1246.2%) representations. Time-domain superiority likely stems from the fact that backdoor triggers are injected as temporal perturbations, making them more directly observable in the original signal space. Wavelet decomposition preserves this temporal localization while adding multi-resolution analysis, enabling effective backdoor feature isolation. The cepstral domain achieves 551.5% mitigation, benefiting from homomorphic signal processing that naturally separates source (semantic) and filter (backdoor) components.
Transform domains with higher baseline backdoor MSE (frequency: 0.1711; Z-domain: 0.1759) show relatively lower mitigation percentages, suggesting that these representations inherently obscure backdoor patterns through spectral spreading. However, even the lowest-performing domain (Z-domain: 36.2%) still achieves meaningful backdoor degradation. The consistency of training time across domains (148.7–153.0 s) confirms that SCU’s computational complexity is domain-agnostic and scales primarily with data dimensionality rather than representation choice.
These results validate that SCU is not limited to time-domain semantic communication but generalizes across diverse signal processing paradigms. Practitioners can select the most appropriate domain based on their specific requirements—prioritizing time or wavelet domains for maximum backdoor resilience, or frequency/Z-domains for applications where clean performance preservation is critical.
The mitigation performance varies across transform domains. Time-domain (1274%) and wavelet-domain (1246%) representations achieve the strongest suppression, since the trigger energy remains temporally localized. In contrast, frequency-domain (140%) and Z-domain (36%) representations show reduced effectiveness due to spectral spreading across multiple bins. These results indicate that SCU is most effective when the applied transformation preserves the L 0 -norm structure of the trigger, thereby maintaining its separability from clean semantic features.

6.5. Robustness Analysis

Real-world semantic communication systems must operate under variable channel conditions. We evaluate SCU robustness across SNR levels and fading channel models.

6.5.1. SNR Robustness

We test SCU performance under additive white Gaussian noise (AWGN) with the SNR ranging from −5 dB to 25 dB. Figure 4 presents clean MSE and backdoor mitigation across SNR levels.
SCU maintains exceptional backdoor mitigation across the SNR spectrum: 280.1% at −5 dB, increasing monotonically to 1436.7% at 25 dB. At practical operating SNRs (10–20 dB), mitigation consistently exceeds 1200%, confirming that SCU effectiveness is not limited to high-quality channel conditions. The clean MSE degradation follows expected patterns, increasing from 0.0562 (25 dB) to 0.3254 (−5 dB) as noise levels rise, but remaining proportional to the baseline model’s degradation. Interestingly, backdoor mitigation improves with the SNR, suggesting that cleaner signals enable more precise semantic feature separation during contrastive compensation. However, even at severe noise levels (−5 dB), SCU achieves nearly 300% mitigation, indicating robust operation across the full practical SNR range for wireless semantic communication systems.
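A sketch of the AWGN injection used in the SNR sweep, scaling the noise power to hit a target SNR in dB (the helper name is ours):

```python
import numpy as np

def add_awgn(s, snr_db, rng):
    """Add white Gaussian noise at a target SNR (dB), with noise power
    scaled relative to the measured signal power."""
    sig_power = np.mean(np.abs(s) ** 2)
    noise_power = sig_power / 10 ** (snr_db / 10)
    return s + rng.standard_normal(s.shape) * np.sqrt(noise_power)

# Verify the realized SNR on a long test tone at 10 dB
rng = np.random.default_rng(0)
tone = np.cos(2 * np.pi * 0.05 * np.arange(100_000))
rx = add_awgn(tone, snr_db=10.0, rng=rng)
est_snr_db = 10 * np.log10(np.mean(tone ** 2) / np.mean((rx - tone) ** 2))
```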

6.5.2. Fading Channel Performance

We evaluate three fading models: AWGN (baseline), Rayleigh (non-line of sight), and Rician with K-factors of 6 dB and 10 dB (partial line of sight). Results at SNR = 10 dB are presented in Table 6.
Fading channels introduce amplitude fluctuations that slightly degrade both clean MSE and mitigation effectiveness. Rayleigh fading, representing the worst-case scenario with no direct path, shows a 4.6% clean MSE increase and 8.3% mitigation reduction compared to AWGN. Rician channels with partial line of sight (K = 10 dB) demonstrate better resilience with only 1.2% and 2.2% degradation, respectively. Critically, all fading scenarios maintain mitigation above 1000%, confirming that SCU’s core mechanism, semantic–backdoor feature separation, remains effective despite multiplicative channel distortions. The graceful degradation under fading validates SCU’s practical applicability in mobile and vehicular semantic communication scenarios.
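Unit-power Rayleigh and Rician channel gains, with the Rician K-factor specified in dB as in Table 6, can be generated with the standard construction below (a sketch, not our exact channel simulator):

```python
import numpy as np

def rayleigh_gains(n, rng):
    """Unit-average-power Rayleigh fading gains (no line-of-sight path)."""
    return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

def rician_gains(n, k_db, rng):
    """Unit-average-power Rician gains; the K-factor (dB) sets the
    ratio of line-of-sight to scattered power."""
    k = 10 ** (k_db / 10)
    los = np.sqrt(k / (k + 1))
    return los + np.sqrt(1 / (k + 1)) * rayleigh_gains(n, rng)

rng = np.random.default_rng(0)
h_ray = rayleigh_gains(200_000, rng)
h_ric = rician_gains(200_000, 10.0, rng)  # K = 10 dB, as in Table 6
```

The larger the K-factor, the more the gain concentrates around the deterministic line-of-sight component, which is why the K = 10 dB Rician channel degrades SCU less than Rayleigh fading.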

6.5.3. Imperfect Detection Robustness

In practice, backdoor sample detection may be imperfect, characterized by a false positive rate (FPR) and false negative rate (FNR). We simulate detection errors and apply SCU with the identified sample set. Table 7 presents the results for key error configurations.
The results reveal asymmetric sensitivity to detection errors. False positives (incorrectly flagged clean samples) have minimal impact: even at FPR = 0.10, mitigation remains at 480.6%, only 25% below perfect detection. This robustness stems from SCU’s information-theoretic formulation: incorrectly erasing clean samples causes them to be relearned during contrastive compensation, effectively self-correcting the error. Conversely, false negatives (missed poisoned samples) severely degrade performance, with FNR = 0.05 reducing mitigation to 81.5%. Undetected backdoor samples remain in the preservation set, and their backdoor features are then actively reinforced during Stage 2, counteracting the unlearning effort. This asymmetry suggests a practical defense strategy: detection thresholds should prioritize recall over precision, tolerating a higher FPR to minimize the FNR. The results confirm that SCU is robust to realistic detection imperfections, but proper integration with backdoor detection methods remains critical for optimal performance.

6.6. Computational Efficiency

We analyze SCU’s computational requirements and compare against the retraining baseline. Table 8 presents timing results from experiments conducted on an NVIDIA T4 GPU with batch size 128.
SCU achieves a 2× speedup over complete retraining while delivering nearly 6× the backdoor mitigation (629.5% vs. 110.0%). The unlearning phase (243.1 s) dominates the total execution time but remains significantly faster than training a new model from scratch.
Theoretical complexity analysis confirms O ( T · m · d ) scaling, where T is the iteration count, m is the input dimension (128), and d is the latent dimension (64). This linear scaling with dimensionality makes SCU practical for high-dimensional semantic communication systems. The two-stage optimization converges faster than joint optimization due to the separation of unlearning and preservation objectives, which reduces gradient interference and accelerates convergence (empirically verified in Section 6.3 where joint-only requires 72.3 s vs. full SCU’s 60.7 s per stage).

6.7. Adaptive Backdoor Analysis

We evaluate SCU against adaptive adversaries who design stealthy backdoor triggers to evade detection. Table 9 presents the results for four trigger types across three signal domains.
Blend attacks, which subtly combine trigger patterns with benign inputs at a low blending ratio ( α = 0.2 ), achieve near-perfect stealthiness across all domains (1.1–3.1% mitigation). This resistance occurs because blend triggers distribute their energy across the entire signal, making them indistinguishable from natural semantic variations. SCU’s information-theoretic approach, while effective against localized triggers, struggles with such diffuse perturbations that overlap with the semantic subspace.
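The blend trigger described above is simply a convex combination of the input and the trigger pattern at ratio α = 0.2 (a minimal sketch):

```python
import numpy as np

def blend_trigger(x, trigger, alpha=0.2):
    """Blend backdoor: convex combination of input and trigger pattern,
    distributing the trigger energy across the whole signal."""
    return (1.0 - alpha) * x + alpha * trigger

# At alpha = 0.2 the perturbation is small and diffuse, with no
# temporally localized component for the defense to isolate
x = np.ones(8)
poisoned = blend_trigger(x, np.zeros(8))
```

Because the perturbation is spread uniformly rather than concentrated in a few samples, it overlaps with natural semantic variation, which is exactly why this attack is the hardest case for SCU in Table 9.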
Sinusoidal triggers, injected as frequency-domain perturbations, show moderate vulnerability (6.8–12.3%), with the cepstral domain providing the best mitigation due to its source–filter separation properties. Input-dependent attacks, which adapt triggers based on sample-specific features, exhibit high variance: minimal mitigation in time/wavelet domains (11.0%, 14.6%) but exceptional vulnerability in the cepstral domain (61.1%). This domain-specific weakness suggests that homomorphic processing naturally separates adaptive perturbations from semantic content.
These results highlight a fundamental trade-off: stealthier backdoors (blend) are harder to unlearn but also more difficult to exploit operationally, while more effective backdoors (input-dependent) create stronger semantic–backdoor separation that facilitates SCU’s detection and mitigation. Future work should explore ensemble defenses that combine SCU with spectral analysis specifically targeting blend-style attacks.

6.8. Summary of Key Findings

Our extensive experimental evaluation demonstrates that SCU achieves the following: (1) superior backdoor mitigation with 629.5 ± 191.2% average improvement, significantly outperforming six baseline methods; (2) statistical robustness confirmed across five independent runs with p < 0.001 ; (3) domain generalization with >500% mitigation in the time, wavelet, and cepstral domains; (4) channel resilience maintaining >1000% mitigation across the SNR levels and fading conditions; (5) computational efficiency with 2× speedup over retraining; and (6) smooth degradation under imperfect detection and adaptive attacks. These results validate SCU as a practical and theoretically grounded solution for backdoor defense in semantic communication systems.

7. Discussion

The empirical results in Figure 5 demonstrate monotonic convergence of the training loss over 20 epochs, with the unlearning objective decreasing consistently during both the joint unlearning and contrastive compensation stages. While formal convergence guarantees typically require convexity assumptions, our empirical trajectory confirms practical stability under the non-convex VIB landscape. Future work could establish relaxed convergence rates using Polyak–Łojasiewicz inequalities [54] or restricted strong convexity conditions.
Variational Information Bottleneck provides three critical properties that enable SCU’s unprecedented backdoor mitigation: (1) stochastic embeddings q ( z | x ) = N ( μ ( x ) , σ 2 ( x ) ) enable soft forgetting by increasing variance on poisoned samples, diluting backdoor information without catastrophic weight changes; (2) information-theoretic control through the KL term D KL ( q ( z | x ) p ( z ) ) explicitly constrains mutual information I ( X ; Z ) , providing principled backdoor removal unavailable in deterministic networks; and (3) regularization against catastrophic forgetting via D KL ( q new q original ) maintains clean sample representations throughout unlearning. These properties distinguish SCU from heuristic defenses (e.g., pruning and adversarial training) that lack information-theoretic grounding.
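Property (2) relies on the closed-form KL divergence between the diagonal Gaussian posterior q(z|x) and the standard normal prior, a standard VIB identity sketched here in NumPy (variable names are ours):

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """Closed-form D_KL( N(mu, diag(sigma^2)) || N(0, I) ), the VIB
    term that bounds the information carried by the embedding."""
    return 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * np.log(sigma))

# An informative embedding (confident mean, small variance) pays a
# large KL cost; soft forgetting drives mu toward 0 and sigma toward 1,
# where the posterior matches the prior and carries zero nats
informative = kl_to_standard_normal(np.ones(3), 0.1 * np.ones(3))
forgotten = kl_to_standard_normal(np.zeros(3), np.ones(3))
```

Raising the posterior variance on poisoned samples therefore has a direct, measurable information-theoretic effect, which deterministic encoders cannot express.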
SCU’s modular encoder–decoder architecture naturally supports split inference frameworks for edge deployment, where semantic encoders operate on resource-constrained IoT devices while unlearning executes at edge servers without cloud connectivity. This enables localized backdoor mitigation in distributed 6G networks.
In frequency-selective fading environments, SCU can be seamlessly integrated with OFDM systems, since the unlearning process operates in the latent space and remains agnostic to the physical-domain representation. As shown in Section 6.5, the performance degradation remains limited under Rayleigh fading, with an 8.3% loss, and shows similar robustness under Rician fading. As a direction for future work, channel-aware detection thresholds that adapt to instantaneous SNR conditions could further improve reliability. The 35-fold performance gap across domains, from 36% in the Z-domain to 1274% in the time domain, indicates that SCU is inherently dependent on the trigger representation rather than being universally effective. SCU performs best when the chosen defense domain preserves the L 0 -norm structure of the trigger, such as in time and wavelet domains, and degrades when transformations disperse trigger energy, as in frequency and Z-domain representations. These findings motivate a domain–attack co-design principle, where defense representations are selected to maximize trigger concentration while preserving essential semantic information.
Domain selection should prioritize time or wavelet representations for maximum backdoor detectability, while frequency-domain approaches may suffice for applications with relaxed security requirements. The substantial improvement over existing unlearning methods, whereby SCU achieves 629.5% mitigation compared to near-zero performance for Variational Bayesian Unlearning (−2.0%) and Hessian-based approaches (−2.4%), validates that operating in latent representation space, rather than parameter space, is essential for semantic communication backdoor removal.

Three key limitations warrant further investigation. First, blend backdoors remain challenging, with only a 1.8% average mitigation rate across domains (range: 1.1–3.1%). This represents a fundamental detection–unlearning trade-off where stealthy triggers evade both identification and removal. Potential solutions include generative likelihood-based detection or ensemble approaches combining SCU with trigger inversion methods. Second, computational cost (243 s unlearning time) may limit deployment in latency-critical applications, though our convergence analysis shows approximately 80% of the final mitigation is achievable by epoch 5, enabling 120 s rapid response modes for time-sensitive scenarios. Third, adaptive adversaries may exploit SCU’s contrastive mechanism to design unlearning-resistant backdoors. Adversarial training with meta-learned triggers could improve robustness against such attacks.

Limitations and Future Directions

Our evaluation reveals several important limitations that warrant further investigation. First, SCU struggles against blend backdoors due to their stealthy, low-amplitude nature, achieving only 1.8% average mitigation with performance ranging from 1.1% to 3.1% across domains (Table 9). This represents a fundamental detection–unlearning trade-off where triggers that evade detection also resist removal. Ensemble approaches combining SCU with generative likelihood-based detection such as VAE anomaly scoring or multi-domain voting could enhance trigger identification before unlearning.
Additionally, SCU exhibits sensitivity to detection accuracy, as demonstrated in Appendix A. The framework requires a false negative rate (FNR) below 5% for effective mitigation. At FNR = 0%, SCU achieves 640.5% mitigation, but performance degrades to 81.5% at FNR = 5%. Real-world detectors may not consistently achieve this threshold, particularly against adaptive adversaries. To address this limitation, iterative refinement can be employed: apply SCU, evaluate the residual attack success rate, re-detect the remaining backdoor patterns, and repeat. Preliminary analysis indicates that two to three iterations achieve robust backdoor elimination even when the initial false negative rate reaches 15%.
The 243 s unlearning time is acceptable for offline security updates but limits real-time adaptation during active communication sessions. Several practical strategies can mitigate this constraint. Early stopping at epoch 5 reduces the runtime to approximately 120 s while retaining around 80% of final mitigation performance. Pre-computed SCU solutions for commonly observed backdoor patterns identified through threat intelligence enable a sub-second response via direct lookup. Amortized unlearning with differential parameter updates allows for incremental mitigation within 10 to 20 s. In practice, lightweight backdoor detectors with millisecond-level latency can flag suspicious samples and trigger offline SCU procedures when necessary.
Performance variability across runs also merits consideration. The 2.3-fold variance between worst-case (390.7%) and best-case (914.6%) single-run mitigation (Figure 3) reflects stochastic training dynamics and backdoor pattern variability. While all runs substantially outperform baselines, practitioners should budget for this uncertainty in deployment planning. For security-critical applications, we recommend reporting the 95% confidence interval lower bound (364.1%) as a conservative performance estimate.
Finally, the substantial domain-specific performance gaps, spanning 35-fold from 36% in the Z-domain to 1274% in the time domain, make careful domain selection necessary for specific applications (Table 5). Time and wavelet domains offer superior backdoor detectability due to their temporal localization properties, while frequency-domain representations may be insufficient for high-security deployments. Domain–attack co-design provides a principled approach in which the defense domain is selected to match expected trigger characteristics, based on threat modeling, to optimize SCU effectiveness for specific deployment scenarios.

8. Conclusions

This paper introduces SCU, the first information-theoretic backdoor defense framework for semantic communication systems, achieving 629.5 ± 191.2% backdoor mitigation across five independent runs (95% CI: [364.1%, 895.0%]) with only 11.5% clean performance degradation. Single-seed ablation studies demonstrate up to 1486% mitigation under controlled conditions, validating the synergistic interaction between the joint unlearning and contrastive compensation stages. This represents an order-of-magnitude improvement over detection-based methods (Neural Cleanse: 7.4%, 85× improvement) and fundamentally outperforms existing unlearning approaches that achieve near-zero or negative mitigation (Variational Bayesian: −2.0%; Hessian: −2.4%).
SCU combines joint encoder–decoder unlearning with entropy maximization and contrastive compensation to selectively erase backdoor knowledge while preserving semantic encoding capability. Extensive validation on RML2016.10a wireless signals demonstrates effectiveness across seven signal processing domains (time, frequency, Z-domain, wavelet, Laplace, cepstral, and Hilbert), four adaptive backdoor types (standard, blend, sinusoidal, and input-dependent), and challenging channel conditions (SNR from −5 dB to +25 dB; Rayleigh and Rician fading). The framework maintains computational feasibility with a 243 s unlearning time, making it suitable for resource-constrained edge deployment. Theoretical analysis provides convergence guarantees via Lyapunov stability and information-theoretic justification for SCU’s two-stage mechanism. Ablation studies confirm that both unlearning stages are necessary; joint unlearning alone achieves 1891% mitigation but with 8.5% clean degradation, while contrastive compensation alone provides only 49.9% mitigation. The synergistic combination balances backdoor elimination with semantic feature preservation.
Future work will address three critical challenges: (1) developing ensemble detection–unlearning frameworks to handle blend backdoors that currently resist SCU (1.8% average mitigation); (2) extending SCU to multi-modal semantic systems combining text, image, and sensor data; and (3) investigating meta-learning defenses against adaptive adversaries who may design triggers specifically to evade information-theoretic unlearning mechanisms. Additionally, exploring hardware acceleration and distributed SCU implementations could reduce the unlearning time below 100 s, enabling real-time backdoor response in operational 6G networks.

Author Contributions

S.N.K. provided the vast majority of the content for this work, taking the lead in the conceptualization, methodology, software development, formal analysis, investigation, data curation, and writing of the initial draft of the manuscript. M.G. contributed to the methodology, validation, visualization, and manuscript review and editing, providing substantial improvements in clarity and presentation. M.S.O. and N.B. supervised the work and provided critical oversight and guidance throughout. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The study is based on an existing dataset, on which experimental modifications and adjustments were performed. No new datasets were generated. The underlying dataset is available from the original source, while the modified experimental data are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank The Scientific and Technological Research Council of Türkiye (TÜBİTAK) and Türk Telekom 6G R&D Lab for their support.

Conflicts of Interest

Authors Sümeye Nur Karahan, Merve Güllü, and Mustafa Serdar Osmanca were employed by the company Türk Telekom. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Full Detection Robustness Results

Table A1 presents the complete 16-scenario evaluation of SCU under varying detection errors (FPR and FNR).
Table A1. Complete detection error robustness results showing SCU performance degradation under imperfect backdoor sample identification.
FPR | FNR | Detected | Clean MSE | BD MSE | Mitigation (%)
0.00 | 0.00 | 1080 | 0.0592 | 0.3841 | 640.5
0.00 | 0.05 | 1026 | 0.0588 | 0.0942 | 81.5
0.00 | 0.10 | 972 | 0.0585 | 0.0718 | 38.5
0.00 | 0.20 | 864 | 0.0580 | 0.0598 | 15.0
0.01 | 0.00 | 1141 | 0.0571 | 0.3975 | 666.4
0.01 | 0.05 | 1087 | 0.0568 | 0.0925 | 78.3
0.01 | 0.10 | 1033 | 0.0565 | 0.0701 | 36.1
0.01 | 0.20 | 925 | 0.0560 | 0.0582 | 12.4
0.05 | 0.00 | 1386 | 0.0566 | 0.2950 | 468.8
0.05 | 0.05 | 1332 | 0.0563 | 0.1342 | 159.0
0.05 | 0.10 | 1278 | 0.0560 | 0.0886 | 71.0
0.05 | 0.20 | 1170 | 0.0555 | 0.0639 | 23.3
0.10 | 0.00 | 1692 | 0.0567 | 0.3011 | 480.6
0.10 | 0.05 | 1638 | 0.0564 | 0.1389 | 167.9
0.10 | 0.10 | 1584 | 0.0561 | 0.0919 | 77.1
0.10 | 0.20 | 1476 | 0.0556 | 0.0653 | 25.9
Key observations:
  • Diagonal pattern: Performance degrades monotonically as both the FPR and FNR increase from top-left (perfect detection) to bottom-right (20% error rates).
  • FNR dominates: Holding the FPR constant at 0%, increasing the FNR from 0% to 20% causes a roughly 98% mitigation loss (640.5% → 15.0%). Undetected backdoor samples remain in the preservation set, where Stage 2 actively reinforces their features.
  • FPR tolerance: Holding the FNR constant at 0%, the FPR can increase to 10% with only 25% mitigation loss (640.5% → 480.6%). False positives (clean samples misidentified as poisoned) undergo unlearning but are subsequently recovered during contrastive compensation, demonstrating SCU’s self-correcting property.
  • Realistic threshold: FPR = 1% and FNR = 5% represents achievable detection performance with state-of-the-art methods, yielding 78.3% mitigation, acceptable for moderate-security deployments where some residual backdoor vulnerability is tolerable.
  • Robustness score: 25% of the scenarios (4/16) maintain mitigation above 300%, confirming graceful degradation under imperfect detection.
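For reference, the mitigation percentages in Table A1 are consistent with measuring the relative increase in backdoor MSE over the poisoned baseline (approximately 0.0519 under perfect detection). This formula is our inference from the reported numbers, not a definition quoted from the main text; small discrepancies arise from table rounding.

```python
def mitigation_pct(bd_mse_after, bd_mse_poisoned):
    # Relative increase in backdoor-trigger reconstruction MSE after the
    # defense; larger values mean the trigger mapping is more thoroughly
    # destroyed.
    return 100.0 * (bd_mse_after - bd_mse_poisoned) / bd_mse_poisoned

# Cross-checks against published rows (inputs rounded to table precision):
time_domain = mitigation_pct(0.6979, 0.0508)  # Table 5, time domain: ~1274%
perfect_det = mitigation_pct(0.3841, 0.0519)  # Table A1, FPR=FNR=0: ~640%
```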

Appendix B. L2 vs. KL Divergence Regularization

Algorithm 1 employs L2 distance rather than full KL divergence in regularization terms (lines 9 and 17) for three reasons: computational efficiency, numerical stability, and first-order equivalence when variances are matched.

Appendix B.1. Theoretical Justification

When the encoder variance stabilizes during early training epochs such that σ_θ(x) ≈ σ_ref(x), the KL divergence between two Gaussian distributions admits a first-order approximation:
D_KL( N(μ_θ, σ_θ²) ‖ N(μ_ref, σ_ref²) ) ≈ (1 / (2σ²)) ‖μ_θ − μ_ref‖² + O( ‖σ_θ − σ_ref‖² )
Thus, when variances are approximately matched—enforced by our fixed reference encoder—the L2 distance on mean vectors serves as a valid surrogate for KL divergence, capturing the dominant term while discarding higher-order corrections. This approximation maintains the information-theoretic principle of minimizing distributional divergence while enabling faster computation.
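A quick numerical check of this approximation for scalar Gaussians (the parameter values below are chosen arbitrarily for illustration): with matched variances the KL divergence collapses exactly to the scaled L2 term, and a small variance mismatch perturbs it only by a higher-order correction.

```python
import math

def kl_gauss(mu1, sigma1, mu2, sigma2):
    # KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) for scalar Gaussians.
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)

mu_theta, mu_ref, sigma = 0.30, 0.10, 0.50
l2_term = (mu_theta - mu_ref) ** 2 / (2 * sigma ** 2)

# Matched variances: KL reduces exactly to the scaled L2 term.
kl_matched = kl_gauss(mu_theta, sigma, mu_ref, sigma)

# Slightly mismatched variances (0.55 vs 0.50): the L2 surrogate is off
# only by a small higher-order correction in the variance gap.
kl_mismatched = kl_gauss(mu_theta, 0.55, mu_ref, sigma)
```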

Appendix B.2. Computational Efficiency

Computing the exact KL divergence for full-covariance multivariate Gaussians requires O(d³) operations due to covariance matrix inversion and determinant computation, where d = 64 is the latent dimension. In contrast, the L2 distance on mean vectors requires only O(d) operations. For our training set of 7200 samples with batch size 128, this reduces the per-batch computation from approximately O(10⁶) to O(10⁴) operations.
Our implementation achieves a 243 s unlearning time (20 epochs: 10 for joint unlearning + 10 for contrastive compensation) using L2 regularization. Based on preliminary profiling, full KL divergence computation would increase this to approximately 320 s (32% overhead), as KL computation would dominate the backward pass for high-dimensional latent representations.

Appendix B.3. Numerical Stability

L2 regularization avoids numerical issues that can arise with KL divergence when variances approach zero or become mismatched during optimization. The squared distance formulation ensures smooth gradients throughout training, whereas KL divergence can produce unstable gradients when σ_θ ≪ σ_ref or vice versa. This stability is particularly important during the unlearning phase, where encoder parameters undergo significant updates. For standard backdoor defense scenarios, L2 regularization provides an optimal efficiency–effectiveness trade-off while maintaining the core information-theoretic principle of SCU. For applications requiring strict probabilistic alignment (e.g., Bayesian model ensembles or when significant variance mismatch is expected), replacing the L2 terms with full KL divergence in Algorithm 1 lines 9 and 17 is straightforward but increases computational cost by approximately 30%.

References

  1. Shao, Y.; Cao, Q.; Gündüz, D. A Theory of Semantic Communication. IEEE Trans. Mob. Comput. 2024, 23, 12211–12228. [Google Scholar] [CrossRef]
  2. Luo, X.; Chen, H.H.; Guo, Q. Semantic Communications: Overview, Open Issues, and Future Research Directions. IEEE Wirel. Commun. 2022, 29, 210–219. [Google Scholar] [CrossRef]
  3. Jain, K.; Krishnan, P.; Pachiyannan, P.; Jaganathan, L.; Khan, M.A.; Li, Y. Toward Smart 5G and 6G: Standardization of AI-Native Network Architectures and Semantic Communication Protocols. IEEE Commun. Stand. Mag. 2025. early access. [Google Scholar] [CrossRef]
  4. Zhao, T.; Li, F.; Du, H.; Sun, L. Deep Reinforcement Learning- and Information Bottleneck-Enabled Task-Oriented Semantic Communication. IEEE J. Sel. Areas Commun. 2025. early access. [Google Scholar] [CrossRef]
  5. Barbarossa, S.; Comminiello, D.; Grassucci, E.; Pezone, F.; Sardellitti, S.; Di Lorenzo, P. Semantic Communications Based on Adaptive Generative Models and Information Bottleneck. IEEE Commun. Mag. 2023, 61, 36–41. [Google Scholar] [CrossRef]
  6. Jin, L.X.; Jiang, W.; Wen, X.Y.; Lin, M.Y.; Zhan, J.Y.; Zhou, X.Z.; Habtie, M.A.; Werghi, N. A survey of backdoor attacks and defences: From deep neural networks to large language models. J. Electron. Sci. Technol. 2025, 23, 100326. [Google Scholar] [CrossRef]
  7. Hanif, M.A.; Chattopadhyay, N.; Ouni, B.; Shafique, M. Survey on Backdoor Attacks on Deep Learning: Current Trends, Categorization, Applications, Research Challenges, and Future Prospects. IEEE Access 2025, 13, 93190–93221. [Google Scholar] [CrossRef]
  8. Bai, Y.; Xing, G.; Wu, H.; Rao, Z.; Ma, C.; Wang, S.; Liu, X.; Zhou, Y.; Tang, J.; Huang, K.; et al. Backdoor Attack and Defense on Deep Learning: A Survey. IEEE Trans. Comput. Soc. Syst. 2025, 12, 404–434. [Google Scholar] [CrossRef]
  9. Liu, A.; Liu, X.; Zhang, X.; Xiao, Y.; Zhou, Y.; Liang, S.; Wang, J.; Cao, X.; Tao, D. Pre-trained Trojan Attacks for Visual Recognition. Int. J. Comput. Vis. 2025, 133, 3568–3585. [Google Scholar] [CrossRef]
  10. Chen, Z.; Liu, S.; Niu, Q. Black-box backdoor attack with everyday physical object in mobile crowdsourcing. Expert Syst. Appl. 2025, 265, 125892. [Google Scholar] [CrossRef]
  11. Li, Z.; Lan, J.; Yan, Z.; Gelenbe, E. Backdoor attacks and defense mechanisms in federated learning: A survey. Inf. Fusion 2025, 123, 103248. [Google Scholar] [CrossRef]
  12. Zhang, M.; Shen, X.; Cao, J.; Cui, Z.; Jiang, S. EdgeShard: Efficient LLM Inference via Collaborative Edge Computing. IEEE Internet Things J. 2025, 12, 13119–13131. [Google Scholar] [CrossRef]
  13. Lyu, Z.; Xiao, M.; Xu, J.; Skoglund, M.; Renzo, M.D. The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks. IEEE J. Sel. Areas Commun. 2025. early access. [Google Scholar] [CrossRef]
  14. Zhou, Z.; Chen, X.; Li, E.; Zeng, L.; Luo, K.; Zhang, J. Edge Intelligence: Paving the Last Mile of Artificial Intelligence with Edge Computing. Proc. IEEE 2019, 107, 1738–1762. [Google Scholar] [CrossRef]
  15. Li, Y.; Zhang, S.; Wang, W.; Song, H. Backdoor Attacks to Deep Learning Models and Countermeasures: A Survey. IEEE Open J. Comput. Soc. 2023, 4, 134–146. [Google Scholar] [CrossRef]
  16. Gu, T.; Dolan-Gavitt, B.; Garg, S. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. arXiv 2019, arXiv:1708.06733. [Google Scholar] [CrossRef]
  17. Chen, X.; Salem, A.; Chen, D.; Backes, M.; Ma, S.; Shen, Q.; Wu, Z.; Zhang, Y. BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements. In Proceedings of the 37th Annual Computer Security Applications Conference, New York, NY, USA, 6–10 December 2021; pp. 554–569. [Google Scholar] [CrossRef]
  18. Gu, Z.; Shi, J.; Yang, Y. ANODYNE: Mitigating backdoor attacks in federated learning. Expert Syst. Appl. 2025, 259, 125359. [Google Scholar] [CrossRef]
  19. Gu, T.; Liu, K.; Dolan-Gavitt, B.; Garg, S. BadNets: Evaluating Backdooring Attacks on Deep Neural Networks. IEEE Access 2019, 7, 47230–47244. [Google Scholar] [CrossRef]
  20. Peng, H.; Qiu, H.; Ma, H.; Wang, S.; Fu, A.; Al-Sarawi, S.F.; Abbott, D.; Gao, Y. On Model Outsourcing Adaptive Attacks to Deep Learning Backdoor Defenses. IEEE Trans. Inf. Forensics Secur. 2024, 19, 2356–2369. [Google Scholar] [CrossRef]
  21. Mengara, O.; Avila, A.; Falk, T.H. Backdoor Attacks to Deep Neural Networks: A Survey of the Literature, Challenges, and Future Research Directions. IEEE Access 2024, 12, 29004–29023. [Google Scholar] [CrossRef]
  22. Guo, W.; Tondi, B.; Barni, M. An Overview of Backdoor Attacks Against Deep Neural Networks and Possible Defences. IEEE Open J. Signal Process. 2022, 3, 261–287. [Google Scholar] [CrossRef]
  23. Roux, Q.L.; Bourbao, E.; Teglia, Y.; Kallas, K. A Comprehensive Survey on Backdoor Attacks and Their Defenses in Face Recognition Systems. IEEE Access 2024, 12, 47433–47468. [Google Scholar] [CrossRef]
  24. Huang, S.; Li, Y.; Chen, C.; Shi, L.; Cai, W.; Gao, Y. FedCleanse: Cleanse the backdoor attacks in federated learning system. Knowl.-Based Syst. 2025, 330, 114494. [Google Scholar] [CrossRef]
  25. Wan, Y.; Qu, Y.; Ni, W.; Xiang, Y.; Gao, L.; Hossain, E. Data and Model Poisoning Backdoor Attacks on Wireless Federated Learning, and the Defense Mechanisms: A Comprehensive Survey. IEEE Commun. Surv. Tutorials 2024, 26, 1861–1897. [Google Scholar] [CrossRef]
  26. Baishya, N.M.; Manoj, B.R.; Karmakar, S. A Novel and Efficient Multi-Target Backdoor Attack for Deep Learning-Based Wireless Signal Classifiers. IEEE Access 2025, 13, 65863–65883. [Google Scholar] [CrossRef]
  27. Wang, B.; Yao, Y.; Shan, S.; Li, H.; Viswanath, B.; Zheng, H.; Zhao, B.Y. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 707–723. [Google Scholar] [CrossRef]
  28. Tran, B.; Li, J.; Madry, A. Spectral Signatures in Backdoor Attacks. In Proceedings of the Advances in Neural Information Processing Systems; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar] [CrossRef]
  29. Zhang, F.; Li, J.; Huang, W.; Chen, X. BMAIU: Backdoor Mitigation in Self-Supervised Learning Through Active Implantation and Unlearning. Electronics 2025, 14, 1587. [Google Scholar] [CrossRef]
  30. Li, Y.; He, J.; Huang, H.; Sun, J.; Ma, X.; Jiang, Y.G. Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks. IEEE Trans. Dependable Secur. Comput. 2025. early access. [Google Scholar] [CrossRef]
  31. Liu, K.; Dolan-Gavitt, B.; Garg, S. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. In Proceedings of the Research in Attacks, Intrusions, and Defenses; Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 11050, pp. 273–294. [Google Scholar] [CrossRef]
  32. Gan, X.; Wang, H.; Li, X.; Liu, Z.; Jiang, H.; Wang, J. A Multitarget Backdoor Attack Against Automatic Modulation Recognition for IoT Wireless Signals. IEEE Internet Things J. 2025, 12, 27588–27605. [Google Scholar] [CrossRef]
  33. Yang, G.; Duan, T.; Hu, J.E.; Salman, H.; Razenshteyn, I.; Li, J. Randomized Smoothing of All Shapes and Sizes. In Proceedings of the 37th International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; Volume 119, pp. 10693–10705. Available online: http://proceedings.mlr.press/v119/yang20a.html (accessed on 7 December 2025).
  34. Bourtoule, L.; Chandrasekaran, V.; Choquette-Choo, C.A.; Jia, H.; Travers, A.; Zhang, B.; Lie, D.; Papernot, N. Machine Unlearning. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; pp. 141–159. [Google Scholar] [CrossRef]
  35. Cevallos, I.D.; Benalcázar, M.E.; Valdivieso Caraguay, Á.L.; Zea, J.A.; Barona-López, L.I. A Systematic Literature Review of Machine Unlearning Techniques in Neural Networks. Computers 2025, 14, 150. [Google Scholar] [CrossRef]
  36. Golatkar, A.; Achille, A.; Soatto, S. Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
  37. Nguyen, Q.P.; Low, B.K.H.; Jaillet, P. Variational Bayesian Unlearning. In Proceedings of the Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 16025–16036. Available online: https://proceedings.neurips.cc/paper/2020/file/b8a6550662b363eb34145965d64d0cfb-Paper.pdf (accessed on 4 December 2025).
  38. Liu, Y.; Fan, M.; Chen, C.; Liu, X.; Ma, Z.; Wang, L.; Ma, J. Backdoor Defense with Machine Unlearning. In Proceedings of the IEEE INFOCOM 2022—IEEE Conference on Computer Communications, London, UK, 2–5 May 2022; pp. 280–289. [Google Scholar] [CrossRef]
  39. Wu, C.; Zhu, S.; Mitra, P.; Wang, W. Unlearning Backdoor Attacks in Federated Learning. In Proceedings of the 2024 IEEE Conference on Communications and Network Security (CNS), Taipei, Taiwan, 30 September–3 October 2024; pp. 1–9. [Google Scholar] [CrossRef]
  40. Guo, Y.; Zhao, Y.; Hou, S.; Wang, C.; Jia, X. Verifying in the Dark: Verifiable Machine Unlearning by Using Invisible Backdoor Triggers. IEEE Trans. Inf. Forensics Secur. 2024, 19, 708–721. [Google Scholar] [CrossRef]
  41. Liu, Z.; Wang, T.; Huai, M.; Miao, C. Backdoor Attacks via Machine Unlearning. Proc. AAAI Conf. Artif. Intell. 2024, 38, 14115–14123. [Google Scholar] [CrossRef]
  42. Xie, H.; Qin, Z.; Li, G.Y.; Juang, B.H. Deep Learning Enabled Semantic Communication Systems. IEEE Trans. Signal Process. 2021, 69, 2663–2675. [Google Scholar] [CrossRef]
  43. Li, Y.; Shi, Z.; Hu, H.; Fu, Y.; Wang, H.; Lei, H. Secure Semantic Communications: From Perspective of Physical Layer Security. IEEE Commun. Lett. 2024, 28, 2243–2247. [Google Scholar] [CrossRef]
  44. Alemi, A.A.; Fischer, I.; Dillon, J.V.; Murphy, K. Deep variational information bottleneck. arXiv 2016, arXiv:1612.00410. [Google Scholar] [CrossRef]
  45. Schmitt, M.S.; Koch-Janusz, M.; Fruchart, M.; Seara, D.S.; Rust, M.; Vitelli, V. Information theory for data-driven model reduction in physics and biology. arXiv 2023, arXiv:2312.06608. [Google Scholar] [CrossRef]
  46. Futami, F.; Iwata, T.; Ueda, N.; Sato, I.; Sugiyama, M. Excess risk analysis for epistemic uncertainty with application to variational inference. arXiv 2022, arXiv:2206.01606. [Google Scholar] [CrossRef]
  47. Belghazi, M.I.; Baratin, A.; Rajeshwar, S.; Ozair, S.; Bengio, Y.; Courville, A.; Hjelm, D. Mutual Information Neural Estimation. In Proceedings of the 35th International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 531–540. Available online: http://proceedings.mlr.press/v80/belghazi18a.html (accessed on 4 December 2025).
  48. Song, J.; Ermon, S. Understanding the limitations of variational mutual information estimators. arXiv 2019, arXiv:1910.06222. [Google Scholar] [CrossRef]
  49. Parulekar, A.; Collins, L.; Shanmugam, K.; Mokhtari, A.; Shakkottai, S. InfoNCE Loss Provably Learns Cluster-Preserving Representations. In Proceedings of the Thirty-Sixth Conference on Learning Theory, PMLR, Bangalore, India, 12–15 July 2023; Volume 195, pp. 1914–1961. Available online: https://proceedings.mlr.press/v195/parulekar23a.html (accessed on 4 December 2025).
  50. Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef]
  51. Yao, Y.; Li, H.; Zheng, H.; Zhao, B.Y. Latent Backdoor Attacks on Deep Neural Networks. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 2041–2055. [Google Scholar] [CrossRef]
  52. O’Shea, T.J.; Corgan, J.; Clancy, T.C. Convolutional Radio Modulation Recognition Networks. In Proceedings of the Engineering Applications of Neural Networks; Jayne, C., Iliadis, L., Eds.; Communications in Computer and Information Science; Springer: Cham, Switzerland, 2016; Volume 629, pp. 213–226. [Google Scholar] [CrossRef]
  53. Chen, B.; Carvalho, W.; Baracaldo, N.; Ludwig, H.; Edwards, B.; Lee, T.; Molloy, I.; Srivastava, B. Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering. arXiv 2018, arXiv:1811.03728. [Google Scholar] [CrossRef]
  54. Karimi, H.; Nutini, J.; Schmidt, M. Linear convergence of gradient and proximal-gradient methods under the polyak-łojasiewicz condition. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2016; pp. 795–811. [Google Scholar] [CrossRef]
Figure 1. SCU system architecture. The framework consists of three phases: (1) adversarial training with backdoor injection, (2) two-stage unlearning with joint encoder–decoder optimization and contrastive compensation, and (3) evaluation on clean and triggered test sets.
Figure 3. Statistical validation across 5 independent runs with different random seeds. Left: Histogram shows mitigation distribution with mean 629.5% (red dashed line) and 95% confidence interval [364.1%, 895.0%] (orange dashed lines). Middle: Run-by-run consistency plot demonstrates stable performance across sequential experiments, with individual run values (green line) fluctuating around the mean (red dashed line) within the 95% confidence band (green shaded area). Right: Box plot summarizes central tendency (median: 652.7%) and variance (IQR: 390.7–914.6%), with statistical significance confirmed at p < 0.001 (one-sample t-test, t = 6.585 , Cohen’s d = 3.29 ).
Figure 4. SCU robustness under varying SNR conditions. Left: Clean MSE increases with noise but remains bounded. Right: Backdoor mitigation exceeds 1000% for SNR ≥ 5 dB, demonstrating strong noise resilience.
Figure 5. Convergence analysis.
Table 1. Comparison of machine unlearning-based backdoor attack and defense studies.
Paper | Setting/Domain | Purpose | Method Summary | Relation to Our Work
Backdoor Defense with Machine Unlearning (BAERASER) [38] | Centralized CNNs | Defense | Trigger pattern recovery via entropy maximization + gradient-ascent unlearning to erase backdoor features | Closest baseline; however, operates in pixel space, not semantic feature space. Our method targets semantic representations in wireless signal reconstruction.
Unlearning Backdoor Attacks in Federated Learning [39] | Federated Learning | Defense | Historical update subtraction + teacher–student distillation to remove attacker contributions without client participation | Defense for distributed systems; shows unlearning can remove backdoors. Different training architecture; complements our centralized semantic approach.
Verifiable Machine Unlearning with Invisible Backdoor Triggers [40] | MLaaS/Cloud ML | Verification | Invisible LSB-based triggers embedded to validate whether deletion actually happened; verification-focused unlearning | Related to unlearning security, not direct defense. Supports motivation for trustworthy unlearning mechanisms.
Backdoor Attacks via Machine Unlearning [41] | Centralized ML | Attack | Malicious unlearning requests used to inject backdoor triggers without poisoning; optimization-based trigger construction | Motivates threat model; shows unlearning itself can become an attack vector. Our work instead focuses on semantic-level unlearning for defense.
Our Work (SCU) | Centralized Semantic Comm. | Defense | VIB-based joint encoder–decoder unlearning with contrastive compensation | First unlearning defense for wireless semantic communication; operates in latent representation space (not pixel/parameter space); representing 85× improvement over detection-based defenses and fundamentally outperforming existing unlearning methods that achieve near-zero or negative mitigation.
Table 2. SCU hyperparameters determined through ablation studies to maximize backdoor mitigation while constraining clean performance degradation below 15%.
Parameter | Value | Rationale
α (regularization) | 1.0 | Balance unlearning vs. stability
τ (temperature) | 0.5 | Sharp contrastive separation
T_JU (JU epochs) | 10 | Empirically sufficient convergence
T_CC (CC epochs) | 10 | Sufficient contrastive refinement
Learning rate | 5 × 10⁻⁴ | Stable gradient descent
Batch size | 128 | Computational efficiency
Gradient clipping | [−1, 1] | Prevent exploding gradients
β (VIB) | 1 × 10⁻³ | Original VIB training setting
Table 3. Performance comparison of SCU against baseline defense methods. Primary metric is 5-seed average for reproducibility (95% CI: [364.1%, 895.0%]; range: [390.7%, 914.6%]). Single-seed ablation study (Section 6.3) achieves 1486.1% under controlled conditions.
Method | Clean MSE | Backdoor MSE | Mitigation (%)
Original (clean baseline) | 0.0554 | – | –
Poisoned (attack baseline) | 0.0561 | 0.0519 | 0.0
SCU (5-seed avg.) | 0.0618 ± 0.0003 | – | 629.5 ± 191.2
Retraining from scratch | 0.0549 | 0.1089 | 110.0
Variational Bayesian | 0.0555 | 0.0521 | 0.5
Hessian-based | 0.0555 | 0.0519 | −0.4
Neural Cleanse + FT | 0.0549 | 0.0557 | 7.4
Fine-Pruning (30%) | 0.0549 | 0.0575 | 10.9
Activation Clustering | 0.0645 | 0.1000 | 92.8
Table 4. Ablation study results demonstrating the necessity of both SCU components. Single-seed experiment (seed 42) conducted to isolate component effects. Clean MSE degradation measured relative to poisoned baseline (0.0551).
Configuration | Clean MSE | BD MSE | Mitigation (%)
Poisoned baseline | 0.0551 | 0.0532 | 0.0
Joint unlearning only | 0.0598 | 1.0593 | 1891.4
Contrastive only | 0.0550 | 0.0797 | 49.9
Full SCU | 0.0613 | 0.8224 | 1486.1
Table 5. SCU performance across multiple signal processing domains with full spectrum preservation. All models trained for 3 epochs with identical hyperparameters.
Domain | Poisoned BD | SCU BD MSE | Mitigation (%) | Time (s)
Time | 0.0508 | 0.6979 | 1274.0 | 151.7
Wavelet (db4) | 0.0704 | 0.9474 | 1246.2 | 149.3
Cepstral | 0.0647 | 0.4215 | 551.5 | 151.1
Laplace (σ = 0.1) | 0.1089 | 0.3650 | 235.1 | 152.5
Hilbert | 0.1719 | 0.5007 | 191.3 | 153.0
Frequency (FFT) | 0.1711 | 0.4103 | 139.7 | 148.7
Z-Domain (STFT) | 0.1759 | 0.2396 | 36.2 | 151.8
Table 6. SCU performance under different fading channel models at SNR = 10 dB. Degradation measured relative to AWGN baseline.
Channel Model | Clean MSE | Mitigation (%) | Degradation (Clean MSE / Mitigation)
AWGN (baseline) | 0.0693 | 1268.2 | –
Rayleigh | 0.0725 | 1163.0 | +4.6% / −8.3%
Rician (K = 10 dB) | 0.0701 | 1240.5 | +1.2% / −2.2%
Table 7. SCU mitigation under imperfect backdoor detection with varying FPR and FNR. Perfect detection (FPR = 0; FNR = 0) achieves 640.5% mitigation.
FPR | FNR | Detected Samples | Mitigation (%)
0.00 | 0.00 | 1080 | 640.5
0.00 | 0.05 | 1026 | 81.5
0.01 | 0.00 | 1141 | 666.4
0.05 | 0.00 | 1386 | 468.8
0.10 | 0.00 | 1692 | 480.6
Table 8. Computational complexity comparison. SCU requires 10 epochs each for joint unlearning and contrastive compensation stages.
Method | Epochs | Time (s) | Speedup
Poisoned training | 5 | 15.8 | –
SCU unlearning | 20 (10 + 10) | 243.1 | –
Total SCU | 25 | 258.9 | 2.0×
Retraining (full) | 10 | 516.0 | 1.0×
Table 9. SCU effectiveness against adaptive backdoor attacks across multiple signal domains. Blend attacks are most stealthy but least effective; input-dependent attacks show highest variance.
Backdoor Type | Domain | Poisoned BD | SCU BD | Mit. (%)
Blend (α = 0.2) | Time | 0.0542 | 0.0549 | 1.3
 | Wavelet | 0.0327 | 0.0330 | 1.1
 | Cepstral | 0.0478 | 0.0493 | 3.1
Sinusoidal | Time | 0.0579 | 0.0619 | 6.8
 | Wavelet | 0.0359 | 0.0387 | 7.6
 | Cepstral | 0.0576 | 0.0647 | 12.3
Input-dependent | Time | 0.0653 | 0.0725 | 11.0
 | Wavelet | 0.0429 | 0.0492 | 14.6
 | Cepstral | 0.0627 | 0.1010 | 61.1