ResDiff: Hardware-Aware Physical-Layer Covert Communication via Diffusion-Based Residual Perturbation

Feng, Qi; Zhang, Junyi; Li, Qiang; Li, Mingdi; Chen, Li

doi:10.3390/electronics15030635


by Qi Feng ¹,², Junyi Zhang ¹,², Qiang Li ³,⁴,*, Mingdi Li ¹,² and Li Chen ¹,²

¹ The 54th Research Institute of China Electronics Technology Group Corporation (CETC), Shijiazhuang 050081, China

² The Hebei Key Laboratory of Electromagnetic Spectrum Cognition and Control, Shijiazhuang 050081, China

³ School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

⁴ Tianfu Jiangxi Laboratory, Chengdu 641419, China

\* Author to whom correspondence should be addressed.

Electronics 2026, 15(3), 635; https://doi.org/10.3390/electronics15030635

Submission received: 24 December 2025 / Revised: 25 January 2026 / Accepted: 27 January 2026 / Published: 2 February 2026

(This article belongs to the Special Issue AI-Driven Signal Processing in Communications)


Abstract

Physical-layer covert communication is increasingly challenged by powerful detectors that exploit the fine-grained statistical structure of received signals. In realistic Radio Frequency (RF) front ends, signal-dependent impairments such as power amplifier (PA) nonlinearity and In-phase and Quadrature (I/Q) imbalance induce transmitter-specific, non-Gaussian emission statistics under which conventional Gaussian embedding rules cause detectable distribution drift. We propose ResDiff, a two-stage learn-then-embed framework that first trains a symbol-conditional diffusion prior to capture a hardware-consistent emission manifold, then embeds covert information through bounded, variance-adaptive residuals spread over a K-symbol block with coherent block decoding at the legitimate receiver. Simulations under a severe impairment profile in an Additive White Gaussian Noise (AWGN) channel show that ResDiff improves stealthiness while maintaining reliable covert recovery and that increasing K reduces detectability by lowering the per symbol embedding pressure. Overall, the results indicate that hardware-aware generative priors, combined with rate-controlled block embedding, provide a practical path to covert-in-cover-traffic communication under modern detection capabilities.

Keywords:

covert communication; cover traffic; physical-layer steganography; hardware impairments; diffusion models; deep learning-based detection; residual embedding; block coding

1. Introduction and Related Work

Covert communication in the physical layer aims to conceal not only the content of a transmission but also its very existence [1,2]. In this paper, we study the covert-in-cover-traffic regime, where a transmitter continuously sends legitimate public symbols while embedding an additional covert payload. In this setting, the warden’s test is not silence versus transmission but legitimate traffic versus steganographic traffic: the received waveform under embedding must remain statistically indistinguishable from the transmitter’s legitimate emission process under the same operating conditions. Throughout, we use the term hardware-aware to mean that the transmitter-side model explicitly learns and reproduces the non-ideal, transmitter-specific emission statistics induced by its RF chain rather than relying on idealized constellation-centered Gaussian assumptions.

Achieving such indistinguishability is challenging because covert embedding must satisfy three competing requirements. It must carry enough structure for the intended receiver to decode the covert payload at a non-trivial rate while preserving public-link fidelity so that a legacy receiver can still demodulate the public symbols under standard quality constraints. Most importantly, embedding must avoid introducing distributional drift that accumulates over many observations and becomes exploitable by hypothesis testing and data-driven detection [3]. These considerations motivate embedding mechanisms with explicit per symbol budgets and designs that shift information from single-symbol separability to block-level structure.

Prior research spans several complementary directions. Information-theoretic studies characterize detectability limits and achievable covert rates, with the Additive White Gaussian Noise (AWGN) square-root law as a canonical benchmark [4,5]. A second line develops coding and keying strategies that approach these limits and quantify the required shared secret resources [6,7]. A third line targets practical wireless deployments by introducing uncertainty through physical-layer mechanisms, such as power randomization [8], fading uncertainty [9], randomized signaling [10], and cooperative or artificial noise assistance [11], and by leveraging enabling technologies, including multi-antenna transmission [12], intelligent reflecting surfaces [13], and unmanned aerial vehicle-assisted networks [14], as well as grant-free access architectures such as semi-grant-free nonorthogonal multiple access for physical-layer security [15]. More broadly, information hiding has also been studied in multimedia domains, including coverless image-synthesized video steganography [16]; we cite this line for context, while our focus remains on wireless cover-traffic embedding. Warden models are likewise diversified, ranging from a single passive observer to collaborative and even active or mobile wardens [17,18]. In this work, we focus on a passive warden: Willie only observes ongoing transmissions and does not inject probes. Active or mobile wardens require the specification of an interaction model and additional side information and are therefore treated as complementary to our setting and beyond the scope of this paper.

Our focus is the cover-traffic embedding regime, where covert information is conveyed through structured modifications over short blocks. A common simplification in the literature involves treating legitimate emissions as locally Gaussian around ideal constellation points, implying that small additive perturbations appear statistically natural. This assumption is fragile under realistic RF front ends. Signal-dependent impairments such as In-phase and Quadrature (I/Q) imbalance and Power Amplifier (PA) nonlinearity induce transmitter-specific, non-Gaussian emission statistics (hardware fingerprints) [19]. Consequently, embedding rules that appear benign under idealized models can leave subtle but systematic artifacts, which become separable once the warden is allowed to learn block-level features from waveform-level baseband observations [20].

Motivated by this gap, we propose ResDiff, a two-stage learn-then-embed framework for hardware-consistent covert-in-cover-traffic signaling. We adopt a conditional diffusion model as the generative prior because diffusion models can stably learn complex, non-Gaussian, symbol-conditional emission distributions induced by RF impairments, providing high-fidelity anchors for subsequent constrained embedding [21,22]; diffusion models have also been enhanced with adversarial mechanisms in other synthesis tasks, further motivating diffusion-based priors in challenging distribution-matching problems [23]. Compared with GAN-based priors, diffusion training relies on a direct denoising/score-matching objective rather than an adversarial min–max game, which mitigates training instability and mode collapse—both of which are particularly undesirable here because a warden can exploit rare but structured artifacts. Compared with VAE-based priors, diffusion better preserves fine-grained, multimodal conditional I/Q statistics that constitute transmitter-specific hardware fingerprints, producing higher-fidelity anchors that reduce distributional drift under learned detectors. Stage I trains this conditional diffusion prior to approximate the legitimate hardware-impaired emission distribution conditioned on the public symbol. Stage II freezes the prior and embeds covert information via bounded, variance-adaptive residuals spread over a K-symbol block, enabling processing gain at the intended receiver while strictly constraining per symbol drift. A key engineering consideration is single-pass compliance: ResDiff avoids re-injecting synthesized waveforms into the same non-ideal RF chain, which would otherwise compound distortions and cause drift away from the legitimate manifold. 
We generate anchors directly in the post-hardware complex baseband domain and apply the residual embedder in the same domain so that the transmitter does not cascade the hardware mapping on already synthesized anchors. The precise signal-space definition and practical implementation considerations are discussed in Section 3.2.

We evaluate ResDiff against two complementary learned wardens. Our primary adversary is a block-/sequence-level detector (TM-2) operating on K-symbol windows, which can exploit correlation features induced by block-based embedding. We also report a symbol-level detector (TM-1) as a diagnostic for marginal distribution mismatch.

This work makes three contributions. First, we formulate hardware-aware covert-in-cover-traffic embedding as a distribution-consistency problem with respect to a symbol-conditional legitimate emission manifold induced by RF impairments and specify a two-layer threat model that includes both symbol-level and block-/sequence-level learned detectors. Second, we propose ResDiff, a two-stage learn-then-embed framework that learns a hardware-consistent prior using conditional diffusion, then embeds covert information via bounded, variance-adaptive residuals with explicit single-pass compliance. Third, we conduct extensive simulations under severe impairment profiles, reporting detectability, covert reliability, and public-link impact across varying signal-to-noise ratios (SNRs) and block lengths and characterizing the resulting stealth–reliability operating frontier.

The remainder of this paper is organized as follows. Section 2 presents the system and threat models. Section 3 details ResDiff and its training procedures. Section 4 reports experimental results and analysis.

2. System Model and Threat Model

We consider a covert-in-cover-traffic system with a transmitter (Alice), a legitimate receiver (Bob), and a passive warden (Willie), as illustrated in Figure 1. The transmitter continuously sends public symbols while embedding an additional covert payload, and the warden attempts to distinguish legitimate traffic from embedded traffic based on received baseband observations.

All baseband signals are complex-valued. The time index $k \in \{1, 2, \dots\}$ denotes the symbol interval. The public symbol is $s[k] \in \mathcal{S}$, where $\mathcal{S}$ is a standard modulation alphabet. In our experiments, the public link uses Quadrature Phase-Shift Keying (QPSK) so that $\mathcal{S}$ contains four unit-energy constellation points; the proposed framework is not tied to a specific $\mathcal{S}$ and applies to other modulation formats without changing the formulation. We use K to denote the covert block length: one covert bit is embedded over a block of K public symbols, yielding an effective covert rate of $R_{c} = 1/K$ bit/symbol. This block-coded construction intentionally targets the low-rate covert regime, where payloads are short and sporadic and covertness is prioritized over throughput. Typical use cases include control messages, authentication, and lightweight signaling metadata. Increasing K reduces the per-symbol embedding pressure and can improve stealthiness, but it also lowers the covert rate; thus, K provides an explicit rate–covertness trade-off knob. Higher-rate variants are possible by embedding multiple bits per block, coding across blocks, or embedding in parallel over additional degrees of freedom such as subcarriers or spatial streams. These extensions increase the effective embedding pressure and may therefore raise detectability.

2.1. Signal Model and Channel

Alice transmits a complex baseband sample $x[k]$. Bob and Willie receive

$$y_{B}[k] = h_{B}\, x[k] + n_{B}[k], \tag{1}$$

$$y_{W}[k] = h_{W}\, x[k] + n_{W}[k], \tag{2}$$

where $h_{B}$ and $h_{W}$ denote the channel coefficients of Bob and Willie, respectively, and $n_{B}[k] \sim \mathcal{CN}(0, \sigma_{B}^{2})$ and $n_{W}[k] \sim \mathcal{CN}(0, \sigma_{W}^{2})$ are circularly symmetric complex Gaussian noise terms.
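As a minimal sketch of the channel in (1) and (2), the snippet below adds circularly symmetric complex Gaussian noise to a baseband sequence. The function name `awgn_channel` and the mapping from SNR to noise variance (unit signal power, so that $\sigma^{2} = 10^{-\mathrm{SNR}/10}$) are assumptions made for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def awgn_channel(x, snr_db, h=1.0, rng=None):
    """Return y = h * x + n with n ~ CN(0, sigma^2), following (1)-(2).

    Assumption: the signal has unit average power, so the noise
    variance is sigma^2 = 10 ** (-snr_db / 10).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=complex)
    sigma2 = 10.0 ** (-snr_db / 10.0)
    # Split the variance equally between the real and imaginary parts.
    noise = np.sqrt(sigma2 / 2.0) * (
        rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    )
    return h * x + noise
```

Calling `awgn_channel(x, 10.0)` with a unit-power sequence yields an empirical noise power close to 0.1, which is a quick sanity check on the variance split.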

Unlike idealized transmitter models where $x[k]$ is an undistorted constellation point plus independent additive noise, practical RF front ends introduce signal-dependent and often non-Gaussian distortions. We model Alice’s legitimate emission as the output of a hardware-impaired mapping

$$x_{\mathrm{legit}}[k] = f_{\mathrm{hw}}(s[k]; \psi), \tag{3}$$

where $\psi$ summarizes impairment parameters such as I/Q imbalance, PA nonlinearity, and oscillator impairments. A typical impairment cascade can be expressed as

$$x_{\mathrm{IQ}}[k] = \mu\, s[k] + \nu\, s^{*}[k], \tag{4}$$

$$x_{\mathrm{PA}}[k] = \sum_{n=0}^{N_{p}} c_{2n+1}\, |x_{\mathrm{IQ}}[k]|^{2n}\, x_{\mathrm{IQ}}[k], \tag{5}$$

$$x_{\mathrm{legit}}[k] = x_{\mathrm{PA}}[k]\, e^{j\theta[k]}, \tag{6}$$

where $(\cdot)^{*}$ denotes complex conjugation, $(\mu, \nu)$ captures I/Q amplitude and phase imbalance, $\{c_{2n+1}\}$ parameterizes a memoryless polynomial PA, and $\theta[k]$ captures oscillator phase jitter and any residual carrier-frequency offset.
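The cascade in (4)–(6) can be sketched in a few lines of NumPy. The function name `hw_impair`, the argument layout (`pa_coeffs` holding $c_{1}, c_{3}, c_{5}, \dots$ in order), and any numerical values used with it are hypothetical and serve only to illustrate the structure of $f_{\mathrm{hw}}$; they are not the impairment profile used in the experiments.

```python
import numpy as np

def hw_impair(s, mu, nu, pa_coeffs, theta):
    """Apply the I/Q imbalance, polynomial PA, and phase rotation of (4)-(6).

    s         : complex public symbols
    mu, nu    : I/Q imbalance coefficients in (4)
    pa_coeffs : odd-order PA coefficients [c_1, c_3, c_5, ...] in (5)
    theta     : phase term (scalar or per-symbol array) in (6)
    """
    s = np.asarray(s, dtype=complex)
    x_iq = mu * s + nu * np.conj(s)                      # Eq. (4)
    power = np.abs(x_iq) ** 2
    gain = sum(c * power ** n for n, c in enumerate(pa_coeffs))
    x_pa = gain * x_iq                                   # Eq. (5)
    return x_pa * np.exp(1j * np.asarray(theta))         # Eq. (6)

# Illustrative use on unit-energy QPSK with placeholder parameters.
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
x_legit = hw_impair(qpsk, mu=1.0, nu=0.05, pa_coeffs=[1.0, -0.1], theta=0.02)
```

With `mu=1`, `nu=0`, `pa_coeffs=[1.0]`, and `theta=0`, the mapping reduces to the identity, which recovers the idealized transmitter as a special case.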

These impairments induce a transmitter-specific conditional distribution for legitimate emissions, i.e.,

$$x_{\mathrm{legit}} \sim p_{\mathrm{legit}}(x \mid s; \psi), \tag{7}$$

whose I/Q density concentrates on a structured, transmitter-specific, non-Gaussian geometry. We refer to this geometry as the hardware-impaired signal manifold. Maintaining covertness against learned detectors requires covert embedding to remain statistically consistent with $p_{\mathrm{legit}}(x \mid s; \psi)$.

2.2. Covert Embedding over K-Symbol Blocks and Bob’s Decoding

Let $\{m_{i}\}_{i=1}^{N_{b}}$ denote the covert bitstream with $m_{i} \in \{0, 1\}$. The i-th covert bit is embedded over the K symbol intervals in the index set

$$\mathcal{B}_{i} \triangleq \{(i-1)K + 1, \dots, iK\}. \tag{8}$$

For $k \in \mathcal{B}_{i}$, Alice transmits a stego sample of the following form:

$$x[k] = x_{\mathrm{anc}}[k] + \delta[k], \tag{9}$$

where $x_{\mathrm{anc}}[k]$ denotes an anchor sample that is statistically consistent with the legitimate conditional distribution $p_{\mathrm{legit}}(x \mid s; \psi)$ and $\delta[k]$ is the residual perturbation that embeds the covert bit $m_{i}$ over the block $\mathcal{B}_{i}$, generated from the anchor sample $x_{\mathrm{anc}}[k]$ and the block message $m_{i}$. In conventional transmission, one may take $x_{\mathrm{anc}}[k] = x_{\mathrm{legit}}[k] = f_{\mathrm{hw}}(s[k]; \psi)$. In ResDiff, we instead draw $x_{\mathrm{anc}}[k] = x_{\mathrm{gen}}[k] \sim p_{\theta}(x \mid s[k]) \approx p_{\mathrm{legit}}(x \mid s[k]; \psi)$ so that the anchor itself follows the hardware-impaired manifold and the residual $\delta[k]$ only needs to carry covert information within a bounded budget. We impose a per-symbol magnitude constraint to preserve the public link, i.e.,

$$|\delta[k]| \leq \alpha_{\mathrm{total}}, \quad \forall k, \tag{10}$$

where $\alpha_{\mathrm{total}}$ controls the embedding budget. In ResDiff, $\delta[k]$ is generated by a bounded residual module; here, we keep the mapping abstract to define the threat model and objectives. When working with the 2D real representation introduced in Section 3, this constraint is equivalently $\|\boldsymbol{\delta}[k]\|_{2} \leq \alpha_{\mathrm{total}}$.

We encode each covert bit across the block rather than forcing separability at the single-symbol level. This induces a message-dependent block-level correlation that Bob can exploit for processing gain while reducing marginal artifacts that are most accessible to symbol-level detectors.

Bob decodes the public symbols using a standard receiver chain, yielding $\hat{s}[k]$ from $y_{B}[k]$. Covert decoding is performed on a block basis. For each block $\mathcal{B}_{i}$, Bob observes the received sequence $\{y_{B}[k]\}_{k \in \mathcal{B}_{i}}$, together with the available public-side information, such as $\{\hat{s}[k]\}_{k \in \mathcal{B}_{i}}$, and the known block structure. He then computes a block-level decision metric and outputs an estimate $\hat{m}_{i}$. In particular, Bob uses the demodulated public symbols $\hat{s}[k]$ as side information during inference. During training, we adopt teacher forcing and feed the ground-truth public symbols $s[k]$ to the decoder to stabilize optimization; at test time, $s[k]$ is replaced by $\hat{s}[k]$. The specific coherent and neural decoding architecture is detailed in Section 3; here, we emphasize that Bob’s reliability gain arises from the aggregation of information across K symbols under the bounded-embedding constraint.

2.3. Willie’s Hypothesis Test, Two-Tier Wardens, and Metrics

Willie aims to detect the presence of covert embedding from his received samples. Given $\mathbf{y}_{W} = [y_{W}[1], \dots, y_{W}[N]]^{\top}$, Willie performs the binary hypothesis test:

$$H_{0}: \; x[k] = x_{\mathrm{legit}}[k], \tag{11}$$

$$H_{1}: \; x[k] = x_{\mathrm{stego}}[k] = x_{\mathrm{anc}}[k] + \delta[k]. \tag{12}$$

In our primary stealthiness evaluation, $H_{0}$ samples correspond to the true legitimate emission process $x_{\mathrm{legit}} = f_{\mathrm{hw}}(s; \psi)$, while $H_{1}$ samples correspond to ResDiff transmissions $x_{\mathrm{stego}} = x_{\mathrm{anc}} + \delta$ with diffusion anchors $x_{\mathrm{anc}}[k] = x_{\mathrm{gen}}[k] \sim p_{\theta}(x \mid s[k])$. Willie applies a decision rule $D(\mathbf{y}_{W}) \in \{0, 1\}$, and we define

$$P_{\mathrm{FA}} = \Pr(D(\mathbf{y}_{W}) = 1 \mid H_{0}), \quad P_{\mathrm{MD}} = \Pr(D(\mathbf{y}_{W}) = 0 \mid H_{1}), \tag{13}$$

with a total error probability of $P_{E} = \frac{1}{2}(P_{\mathrm{FA}} + P_{\mathrm{MD}})$.
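The error probabilities in (13) can be estimated empirically from detector outputs collected under each hypothesis. The sketch below assumes that Willie's detector emits a suspicion score in $[0, 1]$ and that the decision is obtained by thresholding; the function name `detection_errors` and the default threshold of 0.5 are illustrative choices rather than details from the paper.

```python
import numpy as np

def detection_errors(scores_h0, scores_h1, threshold=0.5):
    """Estimate (P_FA, P_MD, P_E) from suspicion scores under H0 and H1.

    A score at or above the threshold is treated as the decision D = 1.
    """
    scores_h0 = np.asarray(scores_h0, dtype=float)
    scores_h1 = np.asarray(scores_h1, dtype=float)
    p_fa = float(np.mean(scores_h0 >= threshold))  # D = 1 although H0 holds
    p_md = float(np.mean(scores_h1 < threshold))   # D = 0 although H1 holds
    p_e = 0.5 * (p_fa + p_md)                      # equal-prior total error
    return p_fa, p_md, p_e
```

Under this definition, a detector that cannot do better than guessing yields $P_{E}$ near 0.5, whereas a perfect detector yields $P_{E} = 0$.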

We consider a strong warden model. Willie knows the public modulation alphabet $\mathcal{S}$ and the noise statistics. Our primary evaluation assumes Willie knows K, which provides a worst-case benchmark for detectability under fixed operating conditions. If K is unknown, a practical warden may scan multiple candidate window sizes or adopt a multi-scale detector; however, this introduces a multiple-testing burden and reduces the amount of effective labeled data per candidate K, so it does not necessarily yield stronger detectability than the known-K case under the same operating conditions. We leave a systematic study of unknown-K wardens and multi-scale countermeasures for future work. Unless otherwise stated, Willie is passive: he only observes ongoing cover traffic and does not inject probes or otherwise manipulate the channel. We mention active or mobile wardens to distinguish our scope from interaction-driven threat models; analyzing such adversaries requires additional assumptions and side information that are not needed in our waveform-level setting. He is assumed to be capable of training a deep detector using labeled samples that are representative of the evaluation conditions, including the same SNR range and impairment profile. This supervised-training assumption is pessimistic for stealthiness evaluation. In practice, training–deployment mismatch can arise and may change detectability; therefore, we discuss mismatch sensitivity and its expected impact in Section 4.8. Willie does not know the covert bits $\{m_{i}\}$ and does not observe Alice’s internal random seeds or any secret keys used for randomization.

In TM-1, Willie applies a symbol-wise classifier to individual received I/Q samples. We denote $\mathbf{y}_{W}[k] = [\Re\{y_{W}[k]\}, \Im\{y_{W}[k]\}]^{\top} \in \mathbb{R}^{2}$, and

$$D_{\mathrm{TM\text{-}1}}: \mathbf{y}_{W}[k] \mapsto \hat{\ell}[k] \in [0, 1], \tag{14}$$

where $\hat{\ell}[k]$ is a suspicion score for $H_{1}$. This model captures wardens that exploit symbol-level artifacts such as constellation shifts, variance inflation, or weak multimodality. In evaluation, we report the area under the receiver operating characteristic (ROC) curve (AUC) by treating each received symbol as an independent test sample under $H_{0}$ or $H_{1}$.

In TM-2, Willie processes K-symbol windows and may exploit higher-order and correlation features, i.e.,

$$D_{\mathrm{TM\text{-}2}}: \mathbf{y}_{W}[\mathcal{B}_{i}] \mapsto \hat{\ell}_{i} \in [0, 1], \tag{15}$$

where $\mathbf{y}_{W}[\mathcal{B}_{i}] \in \mathbb{R}^{2K}$ stacks the I/Q vectors over the K-symbol window. TM-1 is a special case of TM-2 that restricts the receptive field to a single symbol.

Our design targets a three-way trade-off. Public-link fidelity requires that Bob can still decode the public symbols, which we quantify using the public symbol error rate (SER). Covert reliability requires that Bob decode the covert bits with a low block bit error rate. Stealthiness requires that Willie be unable to reliably distinguish $H_{0}$ from $H_{1}$. For empirical stealthiness, we report the AUC of Willie’s detector. An AUC of 0.5 corresponds to random guessing, while a larger AUC indicates greater detectability.
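For reference, the AUC can be computed directly from detector scores as the probability that a randomly chosen $H_{1}$ sample receives a higher suspicion score than a randomly chosen $H_{0}$ sample, with ties counted as one half. The following sketch uses this pairwise (Mann–Whitney) form; the function name `auc_from_scores` is an assumption, and the quadratic-memory comparison is chosen for clarity rather than efficiency.

```python
import numpy as np

def auc_from_scores(scores_h0, scores_h1):
    """AUC as P(score_H1 > score_H0) + 0.5 * P(score_H1 == score_H0)."""
    s0 = np.asarray(scores_h0, dtype=float)
    s1 = np.asarray(scores_h1, dtype=float)
    # Compare every H1 score against every H0 score.
    diff = s1[:, None] - s0[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
```

If the two score populations are identical, this estimate is 0.5, which matches the random-guessing reference used throughout the evaluation.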

3. Method: ResDiff

3.1. Overview and Design Principle

Recall from Section 2 that legitimate emissions follow a hardware-impaired conditional distribution $x_{\mathrm{legit}} \sim p_{\mathrm{legit}}(x \mid s; \psi)$. Alice embeds covert information via a bounded residual perturbation $\delta[k]$ that satisfies $|\delta[k]| \leq \alpha_{\mathrm{total}}$. A naive additive embedder tends to create out-of-manifold artifacts that are readily exploited by learned wardens.

ResDiff follows a “learn-then-embed” principle: we first learn a high-fidelity generative prior for the legitimate, hardware-impaired signal manifold, then embed covert bits using bounded residuals that remain within the learned statistical envelope. Reliability is achieved through block-level processing gain rather than single-symbol separability. The system model is expressed in complex baseband scalars. For the learning modules in this section, including CDP and VARE, we apply the 2D real I/Q mapping $\mathbf{x}[k] \triangleq [\Re\{x[k]\}, \Im\{x[k]\}]^{\top} \in \mathbb{R}^{2}$ and $\boldsymbol{\delta}[k] \triangleq [\Re\{\delta[k]\}, \Im\{\delta[k]\}]^{\top}$ and write the corresponding conditional distributions in terms of $\mathbf{x}$. Figure 2 provides an overview of the ResDiff framework, including the training and inference/evaluation pipelines.

ResDiff consists of the following stages:

Stage I, Conditional Diffusion Prior (CDP): We learn a conditional generative prior that matches the legitimate hardware-impaired distribution:

$$p_{\theta}(\mathbf{x} \mid s) \approx p_{\mathrm{legit}}(\mathbf{x} \mid s; \psi). \tag{16}$$

The learned conditional prior produces anchor samples on the legitimate manifold by sampling

$$\mathbf{x}_{\mathrm{anc}}[k] = \mathbf{x}_{\mathrm{gen}}[k] \sim p_{\theta}(\mathbf{x} \mid s[k]). \tag{17}$$

Stage II, Variance-Adaptive Residual Embedder (VARE): The prior parameters $\theta$ are frozen, and a lightweight residual module is trained to map anchors and the block code to a bounded perturbation $\boldsymbol{\delta}[k]$. Bob then applies a coherent block decoder to recover covert bits from K received symbols.

3.2. Stage I: Conditional Diffusion Prior (CDP)

Stage I is trained using legitimate transmitter outputs, namely pairs $\{(s[k], \mathbf{x}_{0}[k])\}$, where $\mathbf{x}_{0}[k]$ is a hardware-impaired emission corresponding to public symbol $s[k]$. Such data can be produced by an impairment simulator or collected from a device under test.

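Given a legitimate sample $\mathbf{x}_{0} \sim p_{\mathrm{legit}}(\mathbf{x} \mid s)$, the forward diffusion adds Gaussian noise over T steps:

$$q(\mathbf{x}_{t} \mid \mathbf{x}_{t-1}) = \mathcal{N}\big(\mathbf{x}_{t}; \sqrt{1 - \beta_{t}}\, \mathbf{x}_{t-1}, \beta_{t} \mathbf{I}\big), \tag{18}$$

where $\{\beta_{t}\}_{t=1}^{T}$ is a fixed schedule, $\mathbf{I}$ is the $2 \times 2$ identity, $\alpha_{t} = 1 - \beta_{t}$, and $\bar{\alpha}_{t} = \prod_{i=1}^{t} \alpha_{i}$. Equivalently,

$$\mathbf{x}_{t} = \sqrt{\bar{\alpha}_{t}}\, \mathbf{x}_{0} + \sqrt{1 - \bar{\alpha}_{t}}\, \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}). \tag{19}$$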
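The closed-form noising step in (19) can be sketched as follows. The text specifies only that $\{\beta_{t}\}$ is a fixed schedule, so the linear schedule, its endpoints, and the value of T below are assumptions; the function names `make_schedule` and `q_sample` are likewise illustrative.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Assumed linear beta schedule; returns betas and cumulative alpha_bar."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)   # alpha_bar_t = prod_{i<=t} (1 - beta_i)
    return betas, alpha_bar

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from x_0 in one step via (19); t is 1-indexed."""
    x0 = np.asarray(x0, dtype=float)      # shape (..., 2): stacked I/Q vectors
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps                       # eps is the regression target in training
```

Returning `eps` alongside `x_t` is convenient because a noise-prediction denoiser is trained to recover exactly this draw from the noised sample.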

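We train a denoiser $\boldsymbol{\epsilon}_{\theta}(\mathbf{x}_{t}, t, s)$ conditioned on the public symbol s and diffusion step t via the noise-prediction objective:

$$\mathcal{L}_{\mathrm{CDP}}(\theta) = \mathbb{E}_{(\mathbf{x}_{0}, s),\, t,\, \boldsymbol{\epsilon}} \Big[ \big\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_{\theta}\big(\sqrt{\bar{\alpha}_{t}}\, \mathbf{x}_{0} + \sqrt{1 - \bar{\alpha}_{t}}\, \boldsymbol{\epsilon},\, t,\, s\big) \big\|_{2}^{2} \Big]. \tag{20}$$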

After training, anchors $\mathbf{x}_{\mathrm{gen}} \sim p_{\theta}(\mathbf{x} \mid s)$ are generated by running the reverse chain from $\mathbf{x}_{T} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ to $\mathbf{x}_{0}$. Here, $\mathbf{x}_{\mathrm{gen}}$ is defined in the same sample space as the legitimate emitted sample $\mathbf{x}_{0}$, i.e., the 2D I/Q representation of the post-hardware complex baseband waveform $x_{\mathrm{legit}}$ in (3). This corresponds to the emitted complex baseband sample after effective RF front-end distortions and before the wireless channel. Accordingly, “single-pass compliance” means that the non-ideal hardware mapping is applied only once and is not cascaded on already synthesized anchors. In practice, the production of a desired emitted baseband waveform can be supported by standard offline calibration/predistortion or by direct waveform injection through a software-defined radio (SDR); thus, our formulation does not assume any iterative online predistortion loop during embedding.

To ensure that covert perturbations remain statistically indistinguishable from the inherent hardware-induced variations, we require a quantitative measure of the local dispersion of legitimate emissions around each constellation point. This dispersion defines the natural statistical envelope within which covert perturbations should be confined.

We therefore estimate a symbol-conditional covariance matrix for each $s \in \mathcal{S}$ using either collected legitimate samples $\mathbf{x}_{0}$ or synthesized anchors $\mathbf{x}_{\mathrm{gen}}$ from the trained diffusion prior, i.e.,

$$\boldsymbol{\Sigma}(s) = \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu}(s))(\mathbf{x} - \boldsymbol{\mu}(s))^{\top} \mid s\big], \quad \boldsymbol{\mu}(s) = \mathbb{E}[\mathbf{x} \mid s], \tag{21}$$

and derive a scalar spread metric:

$$v(s) = \sqrt{\tfrac{1}{2} \operatorname{tr}(\boldsymbol{\Sigma}(s))}. \tag{22}$$

During operation, we assign $v[k] = v(s[k])$ for each symbol interval.
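A sample-based estimate of (21) and (22) might look as follows. The sketch assumes that the I/Q samples are stored as an N-by-2 real array and that public symbols are encoded as integer labels; the function name `spread_per_symbol` and this data layout are illustrative assumptions.

```python
import numpy as np

def spread_per_symbol(x, labels, num_symbols):
    """Estimate v(s) = sqrt(0.5 * tr(Sigma(s))) for each symbol label.

    x      : (N, 2) real array of I/Q samples (legitimate or generated)
    labels : (N,) integer symbol index in {0, ..., num_symbols - 1}
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    v = np.zeros(num_symbols)
    for s in range(num_symbols):
        xs = x[labels == s]
        centered = xs - xs.mean(axis=0)              # subtract mu(s)
        sigma = centered.T @ centered / len(xs)      # Eq. (21), sample version
        v[s] = np.sqrt(0.5 * np.trace(sigma))        # Eq. (22)
    return v
```

For an isotropic cloud with per-axis standard deviation $\sigma$, this metric returns $\sigma$, so $v(s)$ can be read as an effective per-dimension spread.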

The computed $v(s)$ quantifies the local spread of the hardware-impaired signal cloud and is used to scale the residual perturbation adaptively, as formalized in Equations (24) and (25). This ensures the covert signal energy is proportional to the local impairment-induced variability, thereby embedding covert information within the transmitter’s natural statistical footprint and minimizing detectable distributional drift.

3.3. Stage II: Variance-Adaptive Residual Embedder (VARE)

Stage II embeds covert bits while preserving public-link fidelity and statistical consistency with the learned manifold. Reliability is achieved through block-level processing gain. Let $m_{i} \in \{0, 1\}$ denote the covert bit carried by block $\mathcal{B}_{i} = \{(i-1)K + 1, \dots, iK\}$. We map $m_{i}$ to a bipolar value $b_{i} = 2m_{i} - 1 \in \{-1, +1\}$ and use a spreading sequence $\mathbf{c} = [c_{1}, \dots, c_{K}]^{\top} \in \{\pm 1\}^{K}$ with $\|\mathbf{c}\|_{2}^{2} = K$. For $k \in \mathcal{B}_{i}$, we denote $c_{i,k} = c_{k - (i-1)K}$.

For each $k \in \mathcal{B}_{i}$, we sample an anchor $\mathbf{x}_{\mathrm{gen}}[k] \sim p_{\theta}(\mathbf{x} \mid s[k])$ and generate a residual direction, i.e.,

$$\mathbf{r}_{\phi}[k] = G_{\phi}(\mathbf{x}_{\mathrm{gen}}[k], s[k], c_{i,k}, b_{i}) \in \mathbb{R}^{2}, \tag{23}$$

where $G_{\phi}(\cdot)$ is a lightweight multilayer perceptron (MLP).

We enforce the hard budget $\|\boldsymbol{\delta}[k]\|_{2} \leq \alpha_{\mathrm{total}}$ while adapting to symbol-conditional spread via

$$\boldsymbol{\delta}[k] = \alpha_{\mathrm{total}} \cdot \rho(v[k]) \cdot \frac{\tanh(\mathbf{r}_{\phi}[k])}{\|\tanh(\mathbf{r}_{\phi}[k])\|_{2} + \varepsilon}, \quad k \in \mathcal{B}_{i}, \tag{24}$$

where $\varepsilon > 0$ avoids division by zero and $\rho(v[k])$ is a bounded scale, i.e.,

$$\rho(v[k]) = \operatorname{clip}\!\left(\frac{v[k]}{\mathbb{E}[v]}, \rho_{\min}, \rho_{\max}\right), \tag{25}$$

with $\mathbb{E}[v]$ computed as a running mean over training symbols. This construction yields smaller perturbations for dense symbol clusters and slightly larger perturbations where the legitimate manifold is naturally more spread, reducing out-of-manifold artifacts.
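The scaling in (24) and (25) can be illustrated with a short NumPy sketch that takes the raw MLP outputs as given. The function name `bounded_residual` and the default values of `rho_min`, `rho_max`, and `eps` are placeholders, not the settings used in the paper. Since the normalized direction has norm below one, the residual norm is at most $\alpha_{\mathrm{total}} \cdot \rho_{\max}$, so the budget in (10) is respected whenever $\rho_{\max} \leq 1$.

```python
import numpy as np

def bounded_residual(r, v, v_mean, alpha_total,
                     rho_min=0.5, rho_max=1.0, eps=1e-8):
    """Map raw residual directions r (K, 2) to bounded perturbations via (24)-(25).

    v      : (K,) per-symbol spread v[k]
    v_mean : scalar estimate of E[v]
    rho_min, rho_max, eps are placeholder values (assumptions).
    """
    r = np.asarray(r, dtype=float)
    t = np.tanh(r)
    # Normalize the squashed direction; eps guards against a zero vector.
    direction = t / (np.linalg.norm(t, axis=-1, keepdims=True) + eps)
    rho = np.clip(np.asarray(v, dtype=float) / v_mean, rho_min, rho_max)  # Eq. (25)
    return alpha_total * rho[..., None] * direction                       # Eq. (24)
```

Symbols whose spread falls below the mean receive a proportionally smaller residual, down to the floor set by `rho_min`.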

Then, the transmitted sample is

$$\mathbf{x}[k] = \mathbf{x}_{\mathrm{gen}}[k] + \boldsymbol{\delta}[k], \quad k \in \mathcal{B}_{i}. \tag{26}$$

3.4. Bob’s Covert Decoder: Coherent Neural Decoder (CND)

Bob observes $\mathbf{y}_{B}[k] = [\Re\{y_{B}[k]\}, \Im\{y_{B}[k]\}]^{\top}$ and uses the public-side information. We implement Bob as a coherent block decoder that outputs the posterior of the covert bit, given a K-symbol block:

$$\hat{m}_{i} = D_{\omega}(\mathbf{y}_{B}[\mathcal{B}_{i}], \tilde{s}[\mathcal{B}_{i}], \mathbf{c}), \tag{27}$$

where $\mathbf{y}_{B}[\mathcal{B}_{i}] \in \mathbb{R}^{2K}$ stacks the I/Q vectors over the block and $\tilde{s}[\mathcal{B}_{i}] = s[\mathcal{B}_{i}]$ during training, while $\tilde{s}[\mathcal{B}_{i}] = \hat{s}[\mathcal{B}_{i}]$ during inference. The decoder learns to exploit the block-level correlation induced by spreading while mitigating nonlinear self-interference arising from the hardware-impaired manifold.
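To build intuition for block-level processing gain, the following sketch replaces the neural decoder $D_{\omega}$ with a classical despreading correlator under a deliberately simplified model. It assumes, purely for illustration, that the residual lies along a fixed known unit direction `u`, that the anchor contribution has already been removed, and that the noise is white Gaussian; none of these assumptions reflect the learned embedder or decoder in ResDiff, and the names `despread_decode` and `simulate_ber` are hypothetical.

```python
import numpy as np

def despread_decode(y, c, u):
    """Correlator stand-in for Bob: project onto u, despread with c, take the sign.

    y : (K, 2) mean-removed I/Q block, c : (K,) +/-1 code, u : (2,) unit vector.
    """
    z = np.sum(c * (y @ u))
    return 1 if z >= 0 else 0

def simulate_ber(K, alpha, sigma, trials, rng):
    """Monte Carlo bit error rate under the toy model delta[k] = alpha * b * c_k * u."""
    u = np.array([1.0, 0.0])
    c = rng.choice([-1.0, 1.0], size=K)
    errors = 0
    for _ in range(trials):
        m = int(rng.integers(0, 2))
        b = 2 * m - 1                                   # bipolar mapping
        signal = alpha * b * c[:, None] * u[None, :]
        y = signal + sigma * rng.standard_normal((K, 2))
        errors += despread_decode(y, c, u) != m
    return errors / trials
```

In this toy model the decision statistic has mean proportional to $K\alpha$ and standard deviation proportional to $\sqrt{K}\sigma$, so at a fixed per-symbol amplitude the error rate falls as K grows. When the amplitude is instead scaled as $1/\sqrt{K}$, the ratio $\sqrt{K}\alpha/\sigma$ stays constant, and so does the error rate of this simple correlator.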

3.5. Training Objectives

Stage II trains the embedder $G_{\phi}$ and Bob’s decoder $D_{\omega}$ with $\theta$ frozen. Training samples are passed through the channel model in (1) and (2).

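We minimize block-wise binary cross-entropy:

$$\mathcal{L}_{\mathrm{rel}}(\phi, \omega) = \mathbb{E}\big[\mathrm{BCE}\big(m_{i}, D_{\omega}(\mathbf{y}_{B}[\mathcal{B}_{i}], \tilde{s}[\mathcal{B}_{i}], \mathbf{c})\big)\big]. \tag{28}$$

We penalize embedding energy to limit public-link distortion, i.e.,

$$\mathcal{L}_{\mathrm{pub}}(\phi) = \mathbb{E}\big[\|\boldsymbol{\delta}[k]\|_{2}^{2}\big], \tag{29}$$

while the hard bound is enforced by construction in (24).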

To explicitly discourage marginal artifacts, we optionally include a symbol-level proxy warden $D_{W}$ that distinguishes anchors from stego samples:

$$\mathcal{L}_{W} = -\mathbb{E}\big[\log D_{W}(\mathbf{y}_{W}^{\mathrm{anc}}[k])\big] - \mathbb{E}\big[\log\big(1 - D_{W}(\mathbf{y}_{W}^{\mathrm{stego}}[k])\big)\big], \tag{30}$$

$$\mathcal{L}_{\mathrm{adv}}(\phi) = -\mathbb{E}\big[\log D_{W}(\mathbf{y}_{W}^{\mathrm{stego}}[k])\big], \tag{31}$$

where $\mathbf{y}_{W}^{\mathrm{anc}}[k]$ is obtained by transmitting anchors without embedding and $\mathbf{y}_{W}^{\mathrm{stego}}[k]$ is obtained from (26) under the same channel conditions. Here, $D_{W}(\cdot)$ outputs the probability of the anchor class, so minimizing $\mathcal{L}_{\mathrm{adv}}$ encourages stego samples to be indistinguishable from anchors at the symbol level. We train the proxy warden $D_{W}$ jointly with the embedder and decoder in an alternating manner. All evaluation results are reported using independently trained wardens.

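The Stage II objective is

$$(\phi, \omega) = \arg\min_{\phi, \omega}\; \lambda_{\mathrm{rel}} \mathcal{L}_{\mathrm{rel}} + \lambda_{\mathrm{pub}} \mathcal{L}_{\mathrm{pub}} + \lambda_{\mathrm{adv}} \mathcal{L}_{\mathrm{adv}}, \tag{32}$$

$$D_{W} = \arg\min_{D_{W}}\; \mathcal{L}_{W}. \tag{33}$$

We alternate between updating $D_{W}$ using (33) and updating $(\phi, \omega)$ using (32).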

3.6. Training Procedures

The training procedures for Stage I and Stage II are summarized in Algorithms 1 and 2.

Algorithm 1: Stage I: Training the Conditional Diffusion Prior (CDP)

Input: Legitimate pairs $\{(s, \mathbf{x}_{0})\}$, steps T, schedule $\{\beta_{t}\}$
Output: $\theta$ and spread proxy $v(s)$
1: Initialize parameters $\theta$.
2: repeat until convergence
2.1: Sample minibatch $(s, \mathbf{x}_{0})$; sample $t \sim \mathrm{Unif}\{1, \dots, T\}$; sample $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
2.2: Form $\mathbf{x}_{t}$ by (19).
2.3: Update $\theta$ by descending $\nabla_{\theta} \mathcal{L}_{\mathrm{CDP}}$ in (20).
3: Estimate $v(s)$ using (21) and (22).

Algorithm 2: Stage II: Training VARE (Embedder) and CND (Bob’s Decoder)

Input: Frozen $\theta$, block length K, budget $\alpha_{\mathrm{total}}$, spreading code $\mathbf{c}$, weights $\lambda_{\cdot}$
Output: Embedder $\phi$, decoder $\omega$, and a training-time proxy warden $D_{W}$
1: Initialize $\phi, \omega$.
2: repeat until convergence
2.1: Sample a block $s[\mathcal{B}_{i}]$ and covert bit $m_{i}$; set $b_{i} = 2m_{i} - 1$.
2.2: Sample anchors $\mathbf{x}_{\mathrm{gen}}[k] \sim p_{\theta}(\mathbf{x} \mid s[k])$ for all $k \in \mathcal{B}_{i}$.
2.3: Set $v[k] = v(s[k])$ and compute $\boldsymbol{\delta}[k]$ by (24).
2.4: Transmit anchors $\mathbf{x}_{\mathrm{gen}}[k]$ to obtain $\mathbf{y}_{W}^{\mathrm{anc}}[k]$, and transmit stego samples $\mathbf{x}[k] = \mathbf{x}_{\mathrm{gen}}[k] + \boldsymbol{\delta}[k]$ to obtain $\mathbf{y}_{B}[k]$ and $\mathbf{y}_{W}^{\mathrm{stego}}[k]$, under the same channel distribution.
2.5: Alternating updates: update $D_{W}$ by minimizing (33) with $\mathbf{y}_{W}^{\mathrm{anc}}$ and $\mathbf{y}_{W}^{\mathrm{stego}}$, then update $(\phi, \omega)$ by minimizing (32) with $D_{W}$ fixed.

4. Experimental Results and Analysis

4.1. Experimental Setup

To isolate distributional effects from fading, we consider an AWGN channel and set $h_{B} = h_{W} = 1$ in (1) and (2). The transmitter power is normalized as $\mathbb{E}[|x[k]|^{2}] = 1$, and public symbols use unit-power QPSK. We sweep $\mathrm{SNR} \in \{0, 2, 4, \dots, 20\}$ dB by adjusting the noise variance. ResDiff is trained with the per-symbol SNR sampled uniformly from $[10, 15]$ dB and evaluated over the full 0–20 dB range without re-training, so the regions below 10 dB and above 15 dB are out of range relative to training. We further discuss mismatch sensitivity and practical mitigation in Section 4.8.

By default, we perform evaluations under a fixed impairment profile combining I/Q imbalance, memoryless PA nonlinearity, and oscillator phase jitter; Table 1 reports the parameter settings and how they enter the model in (4).
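The $(μ, ν)$ pair listed in Table 1 can be cross-checked against the stated 1 dB gain mismatch and $3^{\circ}$ phase mismatch. The sketch below assumes a common I/Q-imbalance parameterization, $μ = (1 + g e^{jφ})/2$ and $ν = (1 - g e^{-jφ})/2$, applied as $μ x + ν x^{*}$. This form is an assumption, because (4) is defined earlier in the paper, but it agrees with the tabulated values to three decimals.

```python
import numpy as np

def iq_imbalance_coeffs(gain_db: float, phase_deg: float):
    """Return (mu, nu) for an assumed I/Q-imbalance parameterization."""
    g = 10.0 ** (gain_db / 20.0)               # amplitude gain mismatch
    phi = np.deg2rad(phase_deg)                # phase mismatch in radians
    mu = (1.0 + g * np.exp(1j * phi)) / 2.0
    nu = (1.0 - g * np.exp(-1j * phi)) / 2.0
    return mu, nu

def apply_iq_imbalance(x: np.ndarray, mu: complex, nu: complex) -> np.ndarray:
    """Widely linear distortion: direct term plus image (conjugate) term."""
    return mu * x + nu * np.conj(x)

mu, nu = iq_imbalance_coeffs(gain_db=1.0, phase_deg=3.0)

# Effect on the four unit-power QPSK points.
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
distorted = apply_iq_imbalance(qpsk, mu, nu)
```

The PA nonlinearity and phase jitter from Table 1 are not modeled here, because their exact functional form depends on (4).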

We embed one covert bit over a K-symbol block with $K \in \{4, 8, 16\}$, yielding a covert rate of $R_{c} = 1/K$ bit/symbol. For the main curves, we choose one operating point per K using $α(K) = α(8) \sqrt{8/K}$ with $α(8) = 0.50$, which gives $α(4) \approx 0.70$, $α(8) = 0.50$, and $α(16) \approx 0.35$. For the stealth–reliability trade-off study, we sweep a scale factor $s \in \{0.2, 0.3, \dots, 1.0\}$ (a scalar multiplier, not the public symbol $s[k]$) and set $α_{total} = α(K)\, s$. The scaling $α(K) = α(8) \sqrt{8/K}$ keeps the block-wise perturbation energy budget approximately constant across K, since $K α(K)^{2} = 8 α(8)^{2}$ does not depend on K.
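The operating points and the budget sweep follow directly from the scaling rule. A short sketch, with hypothetical function and variable names:

```python
import math

def alpha_for_block(K: int, alpha_ref: float = 0.50, K_ref: int = 8) -> float:
    """Per-K operating point alpha(K) = alpha(K_ref) * sqrt(K_ref / K)."""
    return alpha_ref * math.sqrt(K_ref / K)

block_lengths = (4, 8, 16)
operating_points = {K: alpha_for_block(K) for K in block_lengths}

# Block-wise energy K * alpha(K)^2 should be identical for every K.
block_energy = {K: K * a * a for K, a in operating_points.items()}

# Trade-off sweep at K = 8: alpha_total = alpha(K) * s for s = 0.2, 0.3, ..., 1.0.
scales = [round(0.2 + 0.1 * i, 1) for i in range(9)]
alpha_total_sweep = [operating_points[8] * s for s in scales]

for K in block_lengths:
    print(f"K={K:2d}  alpha={operating_points[K]:.4f}  K*alpha^2={block_energy[K]:.4f}")
```

The unrounded values are $α(4) \approx 0.7071$ and $α(16) \approx 0.3536$, and the block energy equals $8 \times 0.50^{2} = 2$ for every K.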

We evaluate detectability under the two-layer threat model in Section 2.3. Willie is trained as a supervised binary classifier using balanced samples from $H_{0}$ and $H_{1}$. To avoid bias toward any single operating point, the warden is trained on a multi-SNR dataset, with SNR drawn uniformly from the evaluation range, and is then frozen to compute the AUC over the full sweep. We also report an anchors-only baseline where $δ[k] \equiv 0$ to isolate the incremental impact of Stage II residual embedding on public SER and detectability. This is a diagnostic baseline only; $H_{0}$ always corresponds to the true legitimate emission process.

For reproducibility, Table 1 summarizes the key architectures and training settings.

4.2. Stealthiness: Learned-Warden AUC Versus SNR

We quantify stealthiness by the learned-warden AUC. $\mathrm{AUC} = 0.5$ corresponds to random guessing; larger values indicate higher detectability and, therefore, weaker stealthiness. Following Section 2.3, we report the AUC for TM-1 and TM-2.
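For reference, the AUC of a detector can be computed directly from its output scores as the probability that a randomly chosen $H_{1}$ sample scores higher than a randomly chosen $H_{0}$ sample, with ties counted as one half. The sketch below is a generic implementation of that rank statistic, not the paper's evaluation code; the function name and the toy scores are hypothetical.

```python
import numpy as np

def auc_from_scores(scores_h0, scores_h1) -> float:
    """AUC = P(score_H1 > score_H0) + 0.5 * P(score_H1 == score_H0)."""
    s0 = np.asarray(scores_h0, dtype=float)
    s1 = np.asarray(scores_h1, dtype=float)
    greater = (s1[:, None] > s0[None, :]).mean()   # all H1/H0 pairs
    ties = (s1[:, None] == s0[None, :]).mean()
    return float(greater + 0.5 * ties)

# A warden that separates the classes perfectly.
perfect = auc_from_scores([0.1, 0.2], [0.8, 0.9])

# A warden whose scores carry no information (all ties).
blind = auc_from_scores([0.5, 0.5], [0.5, 0.5])

# A partially informative warden: 4 of 6 pairs are ordered correctly.
partial = auc_from_scores([0.1, 0.4, 0.35], [0.8, 0.3])
```

The pairwise comparison costs memory proportional to the product of the two sample counts, so a sort-based rank computation is preferable for large evaluation sets.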

Figure 3 reports the AUC versus SNR for $K \in \{4, 8, 16\}$. First, detectability increases with SNR for both TM-2 and TM-1, as reduced channel noise provides less masking at high SNRs. Second, increasing the block length (K) reduces detectability under both wardens. This reduction is most pronounced for TM-2, since spreading one covert bit over a longer window lowers the per-symbol information load and weakens the sequence-level correlation cues available to a windowed detector. TM-1 exhibits the same monotone behavior with respect to K, confirming that larger blocks also alleviate marginal artifacts by reducing per-symbol embedding pressure.

These results align with the ResDiff design. Stage I anchors samples on a hardware-consistent manifold, suppressing gross marginal mismatch, while Stage II restricts embedding to bounded residuals and shifts information from single-symbol separability to block-level structure. Overall, ResDiff provides controllable stealthiness across the operating SNR. A larger K yields a consistent improvement against TM-1 and TM-2.

4.3. Ablation Study: Role of the Diffusion Prior

To isolate the contribution of Stage I to stealthiness, we run an ablation that keeps the embedding budget and evaluation protocol fixed and varies only how the anchor samples are formed. We compare (i) ResDiff, our full two-stage pipeline with diffusion-generated anchors and Stage II residual embedding; (ii) an ideal-anchor baseline, where anchors are ideal QPSK symbols, while Stage II and the budget are kept unchanged; and (iii) an anchors-only setting, where we transmit the diffusion anchors and set $δ[k] \equiv 0$ (i.e., $x[k] = x_{gen}[k]$) to probe the intrinsic detectability of the learned prior.

Figure 4 reports the warden AUC versus SNR at $K = 8$ and $α_{total} = 0.50$. The ideal-anchor baseline becomes increasingly detectable with SNR, especially under TM-2, where the AUC approaches 1, indicating that embedding residuals around ideal constellation points creates a strong mismatch relative to hardware-impaired emissions. In contrast, the anchors-only setting stays close to random guessing across SNRs for both TM-1 and TM-2, suggesting that the diffusion prior alone produces anchors that are largely consistent with the legitimate hardware-impaired manifold. ResDiff lies between these two extremes: compared to the ideal-anchor baseline, diffusion anchors substantially reduce detectability, while the gap between ResDiff and the anchors-only baseline quantifies the additional detectability introduced by Stage II payload embedding under the same budget.

4.4. Qualitative Visualization: Constellation Density

To visually assess distribution consistency, Figure 5 compares the constellation density of legitimate and stego emissions at $\mathrm{SNR} = 15$ dB. Figure 5a shows the legitimate emission density; each public symbol forms a hardware-shaped cluster determined by the non-ideal RF front end. Figure 5b–d show the corresponding stego densities for $K \in \{4, 8, 16\}$. Across all K values, ResDiff preserves the cluster geometry and centroid locations and does not induce systematic constellation shifts or clearly separable sub-clusters. The main visible effect is a mild broadening of the outer contours, consistent with bounded residual injection while remaining aligned with the legitimate envelope. As K increases, the per-symbol embedding pressure is reduced, and the stego density remains visually co-located with the legitimate density. Overall, these qualitative observations corroborate the AUC trends and support the claim that ResDiff confines embedding within a hardware-consistent manifold.

4.5. Reliability and Fidelity: Covert BER and Public SER

We next evaluate the reliability of covert decoding at the legitimate receiver and the impact of embedding on the public link.

Figure 6 reports the covert-block BER as a function of the SNR. As expected, BER decreases monotonically with the SNR. More importantly, increasing the block length consistently improves covert reliability: spreading one covert bit over a longer block provides processing gain at the decoder and reduces the per-symbol burden of embedding. At the representative operating point of $\mathrm{SNR} = 12$ dB, $K \geq 8$ already achieves a BER on the order of $10^{-3}$, and the BER further approaches $10^{-4}$ at higher SNRs. This behavior is consistent with ResDiff’s design goal of shifting information from marginal separability to a block-level structure.

Figure 7 shows the public SER with and without covert embedding. Embedding introduces a measurable SER penalty in the mid-SNR regime, where the legacy decision regions are not noise-dominated and bounded residuals slightly perturb the effective constellation. The penalty becomes smaller as K increases, which matches the reduced per-symbol distortion budget under longer blocks. Overall, ResDiff maintains a controlled impact on the public link while enabling reliable covert recovery through block processing gain. We note that when the public-link symbol error rate is non-zero, errors in $\hat{s}[B_{i}]$ may propagate to covert decoding, since $\hat{s}[B_{i}]$ is used as side information at inference; this effect is mitigated by operating the public link at a sufficiently reliable SNR and by training with teacher forcing.

4.6. Stealth–Reliability Trade-Off at Fixed SNR

We characterize the explicit stealth–reliability trade-off controlled by the embedding budget. Specifically, fixing the operating condition to $\mathrm{SNR} = 20$ dB and the block length at $K = 8$, we sweep the perturbation amplitude ($α_{total}$) and evaluate both covert reliability and detectability against the sequence-aware warden (TM-2).

Figure 8 reveals a clear frontier for the $K = 8$ case at 20 dB against TM-2. As $α_{total}$ increases, the covert BER decreases rapidly, indicating that a larger residual budget provides a stronger decoding margin under hardware-induced self-interference. At the same time, the TM-2 warden’s AUC increases monotonically, suggesting that larger residuals leave more exploitable statistical evidence for detection. Notably, the AUC growth is gradual compared with the BER improvement, producing a practical knee region where reliability becomes usable while detectability remains controlled. This confirms that $α_{total}$ serves as an interpretable operating knob for selecting a deployment point according to the desired security–reliability balance. Together with the monotone K-sweep trends (Figures 3–7), this budget sweep highlights that K and $α_{total}$ provide complementary controls: K shapes spreading/processing gain and per-symbol pressure, while $α_{total}$ directly controls the residual magnitude under a fixed block structure.

4.7. Robustness to Hardware-Impairment Drift

In practice, transmitter impairments can drift over time due to aging, temperature, or other environmental changes. To assess robustness to such mismatch, we keep all trained models fixed and evaluate ResDiff under perturbed impairment parameters at inference time. We focus on a representative operating point ($\mathrm{SNR} = 15$ dB, $K = 8$, and $α_{total} = 0.50$) and report the covert BER, together with the warden’s AUC, under both TM-1 and TM-2. We consider three scenarios. Nominal matches the training impairments. Mild Drift increases the I/Q phase mismatch by $1^{\circ}$ and increases the PA AM/AM and AM/PM distortion strengths by 5% relative to the nominal setting. Severe Drift increases the phase mismatch by $3^{\circ}$ and increases both PA distortion terms by 10%. Table 2 summarizes the results. Overall, performance degrades only slightly: covert BER rises from $2.2 \times 10^{-3}$ to $3.6 \times 10^{-3}$. Stealthiness remains stable, with TM-1 AUC staying within 0.6049–0.6138 and TM-2 AUC within 0.7142–0.7268. We do not observe a clear monotonic trend in AUC with increasing drift; the fluctuations are small and remain within a narrow band across the tested settings.
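The size of these changes can be read off Table 2 with a few lines of arithmetic. The snippet below only restates the tabulated values; the variable names are hypothetical.

```python
# Table 2 values: scenario -> (covert BER, TM-1 AUC, TM-2 AUC).
table2 = {
    "Nominal": (0.0022, 0.6060, 0.7142),
    "Mild Drift": (0.0026, 0.6138, 0.7195),
    "Severe Drift": (0.0036, 0.6049, 0.7268),
}

ber = [row[0] for row in table2.values()]
tm1 = [row[1] for row in table2.values()]
tm2 = [row[2] for row in table2.values()]

# Relative BER increase from the nominal to the severe-drift scenario.
ber_ratio = table2["Severe Drift"][0] / table2["Nominal"][0]

# Width of the band covered by each warden's AUC across the three scenarios.
tm1_spread = max(tm1) - min(tm1)
tm2_spread = max(tm2) - min(tm2)

# TM-2 AUC rises at each drift level in this table, while TM-1 AUC does not.
tm1_monotone = tm1 == sorted(tm1)
tm2_monotone = tm2 == sorted(tm2)
```

The BER grows by a factor of about 1.64 under severe drift, while the AUC bands are about 0.009 wide for TM-1 and about 0.013 wide for TM-2.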

4.8. Discussion: Sensitivity to Warden Mismatch

Our AUC results adopt a standard but idealized assumption: Willie can train a detector using labeled data that is representative of the deployment conditions. In practice, perfect alignment is unlikely, and several forms of mismatch may arise. SNR mismatch: Stage II is trained on $\mathrm{SNR} \in [10, 15]$ dB, whereas our evaluation reports curves over 0–20 dB without re-training; thus, the lowest and highest SNR points effectively test out-of-range operation. Impairment drift: transmitter hardware characteristics can change over time; Table 2 indicates that the considered I/Q and PA perturbations lead to only a small BER increase and cause limited AUC variation across the tested drift levels. Detector mismatch: to reduce dependence on a single warden design, we report both TM-1 and TM-2. Channel-model mismatch: while we adopt an AWGN channel and have already evaluated a wide SNR range without re-training, real deployments may involve fading, phase rotation, or non-Gaussian interference. These effects can be addressed by domain randomization over channel/impairment parameters during Stage II training and, when feasible, lightweight calibration of Stage II while keeping the diffusion prior fixed. Nevertheless, a more comprehensive study over broader detector families and mismatch ranges remains an important direction beyond the present scope.

5. Conclusions

This work addresses the vulnerability of physical-layer covert communication to learned wardens that exploit specific RF hardware impairments. We demonstrated that conventional embedding strategies induce detectable distributional drift when facing non-ideal front-end characteristics such as PA nonlinearity. To overcome this, we proposed ResDiff, a hardware-aware framework that leverages a symbol-conditional diffusion prior to synthesize on-manifold anchors, coupled with a variance-adaptive residual embedder. By spreading covert information across symbol blocks, ResDiff achieves significant processing gain while constraining per-symbol perturbations within the legitimate statistical envelope. Experimental results validate that ResDiff establishes a robust stealth–reliability trade-off: increasing the block length (K) consistently suppresses detectability against learned wardens while enhancing covert reliability, with minimal impact on public-link fidelity. Future research will extend this generative approach to fading channels, time-varying hardware drift, and experimental validation on SDR platforms.

Author Contributions

Conceptualization, Q.F. and Q.L.; methodology, Q.F.; software, Q.F.; validation, Q.F.; formal analysis, Q.F.; investigation, Q.F. and Q.L.; resources, M.L. and Q.L.; data curation, Q.F.; writing—original draft preparation, Q.F.; writing—review and editing, Q.F., J.Z., Q.L. and L.C.; visualization, Q.F.; supervision, Q.L. and J.Z.; project administration, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the corresponding author upon request.

Conflicts of Interest

Authors Qi Feng, Junyi Zhang, Mingdi Li and Li Chen were employed by The 54th Research Institute of China Electronics Technology Group Corporation (CETC). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. System model for covert-in-cover-traffic communication. The transmitter generates a hardware-impaired legitimate emission and injects a bounded residual perturbation to carry covert information. The legitimate receiver performs public demodulation and block-level covert decoding, while the warden conducts detection under TM-1 (symbol-level) and TM-2 (block-/sequence-level) threat models.

Figure 2. Overview of ResDiff split into (a) training and (b) inference/evaluation pipelines for clarity. (a) Training pipeline. Stage I trains the conditional diffusion prior; Stage II freezes the prior and trains VARE under $L_{rel}$, $L_{pub}$, and $L_{adv}$. (b) Inference/evaluation pipeline. Anchors and bounded residuals form $x_{stego}$, evaluated by Bob’s block decoder and Willie’s TM-1/TM-2 detectors.

Figure 3. Learned-warden AUC versus SNR under the two-layer threat model. TM-2 is the block-/sequence-level detector operating on K-symbol windows, and TM-1 is the symbol-level detector operating on individual I/Q samples. An AUC closer to $0.5$ indicates better stealthiness; $\mathrm{AUC} = 0.5$ corresponds to random guessing.

Figure 4. Ablation on the Stage I diffusion prior at $K = 8$ and $α_{total} = 0.50$. Warden AUC versus SNR for the baseline with ideal anchors, ResDiff, and the anchors-only baseline, evaluated under TM-1 and TM-2. An AUC closer to $0.5$ indicates better stealthiness.

Figure 5. Constellation density under QPSK public signaling at $\mathrm{SNR} = 15$ dB. ResDiff preserves the cluster geometry of legitimate emissions while embedding covert information via bounded residual perturbations. Brighter colors indicate higher sample density, while darker colors indicate lower density.

Figure 6. Covert-block BER versus the SNR for $K \in \{4, 8, 16\}$ under the main operating points. A larger K provides block processing gain and reduces the per-symbol embedding pressure, improving reliability.

Figure 7. Public-link SER versus SNR with and without residual embedding (ResDiff versus anchors-only baseline, $δ \equiv 0$). The bounded residual perturbation keeps the public SER controlled, and the penalty decreases as K increases due to a smaller effective per-symbol distortion budget.

Figure 8. Stealth–reliability trade-off at $\mathrm{SNR} = 20$ dB with $K = 8$. By sweeping the embedding budget ($α_{total}$), we observe that increasing $α_{total}$ improves covert reliability but increases detectability.

Table 1. Implementation details for ResDiff and learned wardens.

Component	Setting
Conditional diffusion prior (Stage I)	Diffusion steps $T = 200$; cosine $β$ schedule; noise-prediction objective.
Denoiser architecture	MLP; 5 hidden layers; width 256; symbol embedding dimension 32.
Stage I optimization	Adam; learning rate $10^{-3}$; batch size 2048; 15 epochs.
Residual embedder VARE (Stage II)	MLP; 3 hidden layers; width 128; bounded output scaled to the embedding budget.
Covert decoder CND (Stage II)	MLP; 3 hidden layers; width 128; block input of K symbols with public side information.
Stage II optimization	Adam; learning rate $10^{-3}$; batch size 512; 15 epochs; loss weights $λ_{rel} = 1$, $λ_{pub} = 0.1$, $λ_{adv} = 0.5$.
TM-1 warden	Symbol-level MLP; 3 hidden layers; width 64.
TM-2 warden	Block-level 1D CNN over aligned K-symbol windows; 3 convolutional layers; width 64; kernel size 3; global average pooling.
Warden training protocol	Adam; learning rate $10^{-3}$; 2000 steps; 256 blocks per update; balanced $H_{0} / H_{1}$ samples.
Impairment profile	I/Q imbalance: gain mismatch 1 dB, phase mismatch $3^{\circ}$; $(μ, ν) \approx (1.060 + j 0.029, -0.060 + j 0.029)$ in (4). PA nonlinearity: AM/AM $[1.0, -0.05, 0.0]$, AM/PM $[0.0, 0.02, 0.0]$. Phase jitter: $θ[k] \sim \mathcal{N}(0, 0.05^{2})$ rad/symbol.

Table 2. Robustness to impairment drift with fixed trained models at $\mathrm{SNR} = 15$ dB ($K = 8$, $α_{total} = 0.50$).

Scenario	Covert BER	TM-1 AUC	TM-2 AUC
Nominal	0.0022	0.6060	0.7142
Mild Drift	0.0026	0.6138	0.7195
Severe Drift	0.0036	0.6049	0.7268

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

