1. Introduction and Related Work
Covert communication in the physical layer aims to conceal not only the content of a transmission but also its very existence [1, 2]. In this paper, we study the covert-in-cover-traffic regime, where a transmitter continuously sends legitimate public symbols while embedding an additional covert payload. In this setting, the warden’s test is not silence versus transmission but legitimate traffic versus steganographic traffic: the received waveform under embedding must remain statistically indistinguishable from the transmitter’s legitimate emission process under the same operating conditions. Throughout, we use the term hardware-aware to mean that the transmitter-side model explicitly learns and reproduces the non-ideal, transmitter-specific emission statistics induced by its RF chain rather than relying on idealized constellation-centered Gaussian assumptions.
Achieving such indistinguishability is challenging because covert embedding must satisfy three competing requirements. First, it must carry enough structure for the intended receiver to decode the covert payload at a non-trivial rate. Second, it must preserve public-link fidelity so that a legacy receiver can still demodulate the public symbols under standard quality constraints. Third, and most importantly, it must avoid introducing distributional drift that accumulates over many observations and becomes exploitable by hypothesis testing and data-driven detection [3]. These considerations motivate embedding mechanisms with explicit per-symbol budgets and designs that shift information from single-symbol separability to block-level structure.
Prior research spans several complementary directions. Information-theoretic studies characterize detectability limits and achievable covert rates, with the Additive White Gaussian Noise (AWGN) square-root law as a canonical benchmark [4, 5]. A second line develops coding and keying strategies that approach these limits and quantify the required shared secret resources [6, 7]. A third line targets practical wireless deployments by introducing uncertainty through physical-layer mechanisms, such as power randomization [8], fading uncertainty [9], randomized signaling [10], and cooperative or artificial noise assistance [11], and by leveraging enabling technologies, including multi-antenna transmission [12], intelligent reflecting surfaces [13], and unmanned aerial vehicle-assisted networks [14], as well as grant-free access architectures such as semi-grant-free nonorthogonal multiple access for physical-layer security [15]. More broadly, information hiding has also been studied in multimedia domains, including coverless image-synthesized video steganography [16]; we cite this line for context, while our focus remains on wireless cover-traffic embedding. Warden models are likewise diversified, ranging from a single passive observer to collaborative and even active or mobile wardens [17, 18]. In this work, we focus on a passive warden: Willie only observes ongoing transmissions and does not inject probes. Active or mobile wardens require the specification of an interaction model and additional side information and are therefore treated as complementary to our setting and beyond the scope of this paper.
Our focus is the cover-traffic embedding regime, where covert information is conveyed through structured modifications over short blocks. A common simplification in the literature involves treating legitimate emissions as locally Gaussian around ideal constellation points, implying that small additive perturbations appear statistically natural. This assumption is fragile under realistic RF front ends. Signal-dependent impairments such as In-phase and Quadrature (I/Q) imbalance and Power Amplifier (PA) nonlinearity induce transmitter-specific, non-Gaussian emission statistics (hardware fingerprints) [19]. Consequently, embedding rules that appear benign under idealized models can leave subtle but systematic artifacts, which become separable once the warden is allowed to learn block-level features from waveform-level baseband observations [20].
Motivated by this gap, we propose ResDiff, a two-stage learn-then-embed framework for hardware-consistent covert-in-cover-traffic signaling. We adopt a conditional diffusion model as the generative prior because diffusion models can stably learn complex, non-Gaussian, symbol-conditional emission distributions induced by RF impairments, providing high-fidelity anchors for subsequent constrained embedding [21, 22]; diffusion models have also been enhanced with adversarial mechanisms in other synthesis tasks, further motivating diffusion-based priors in challenging distribution-matching problems [23]. Compared with priors based on generative adversarial networks (GANs), diffusion training relies on a direct denoising/score-matching objective rather than an adversarial min–max game, which mitigates training instability and mode collapse, both of which are particularly undesirable here because a warden can exploit rare but structured artifacts. Compared with priors based on variational autoencoders (VAEs), diffusion better preserves fine-grained, multimodal conditional I/Q statistics that constitute transmitter-specific hardware fingerprints, producing higher-fidelity anchors that reduce distributional drift under learned detectors. Stage I trains this conditional diffusion prior to approximate the legitimate hardware-impaired emission distribution conditioned on the public symbol. Stage II freezes the prior and embeds covert information via bounded, variance-adaptive residuals spread over a K-symbol block, enabling processing gain at the intended receiver while strictly constraining per-symbol drift. A key engineering consideration is single-pass compliance: ResDiff avoids re-injecting synthesized waveforms into the same non-ideal RF chain, which would otherwise compound distortions and cause drift away from the legitimate manifold. We generate anchors directly in the post-hardware complex baseband domain and apply the residual embedder in the same domain so that the transmitter does not cascade the hardware mapping on already synthesized anchors. The precise signal-space definition and practical implementation considerations are discussed in Section 3.2.
We evaluate ResDiff against two complementary learned wardens. Our primary adversary is a block-/sequence-level detector (TM-2) operating on K-symbol windows, which can exploit correlation features induced by block-based embedding. We also report a symbol-level detector (TM-1) as a diagnostic for marginal distribution mismatch.
This work makes three contributions. First, we formulate hardware-aware covert-in-cover-traffic embedding as a distribution-consistency problem with respect to a symbol-conditional legitimate emission manifold induced by RF impairments and specify a two-layer threat model that includes both symbol-level and block-/sequence-level learned detectors. Second, we propose ResDiff, a two-stage learn-then-embed framework that learns a hardware-consistent prior using conditional diffusion, then embeds covert information via bounded, variance-adaptive residuals with explicit single-pass compliance. Third, we conduct extensive simulations under severe impairment profiles, reporting detectability, covert reliability, and public-link impact across varying signal-to-noise ratios (SNRs) and block lengths and characterizing the resulting stealth–reliability operating frontier.
The remainder of this paper is organized as follows.
Section 2 presents the system and threat models.
Section 3 details ResDiff and its training procedures.
Section 4 reports experimental results and analysis.
2. System Model and Threat Model
We consider a covert-in-cover-traffic system with a transmitter (Alice), a legitimate receiver (Bob), and a passive warden (Willie), as illustrated in Figure 1. The transmitter continuously sends public symbols while embedding an additional covert payload, and the warden attempts to distinguish legitimate traffic from embedded traffic based on received baseband observations.
All baseband signals are complex-valued. The time index $t$ denotes the symbol interval. The public symbol is $s_t \in \mathcal{S}$, where $\mathcal{S}$ is a standard modulation alphabet. In our experiments, the public link uses Quadrature Phase-Shift Keying (QPSK) so that $\mathcal{S}$ contains four unit-energy constellation points; the proposed framework is not tied to a specific $\mathcal{S}$ and applies to other modulation formats without changing the formulation. We use K to denote the covert block length: one covert bit is embedded over a block of K public symbols, yielding an effective covert rate of 1/K bit/symbol. This block-coded construction intentionally targets the low-rate covert regime, where payloads are short and sporadic and covertness is prioritized over throughput. Typical use cases include control messages, authentication, and lightweight signaling metadata. Increasing K reduces the per-symbol embedding pressure and can improve stealthiness, but it also lowers the covert rate; thus, K provides an explicit rate–covertness trade-off knob. Higher-rate variants are possible by embedding multiple bits per block, coding across blocks, or embedding in parallel over additional degrees of freedom such as subcarriers or spatial streams. These extensions increase the effective embedding pressure and may therefore raise detectability.
2.1. Signal Model and Channel
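Alice transmits a complex baseband sample ($x_t \in \mathbb{C}$). Bob and Willie receive
$$y_{B,t} = h_B x_t + n_{B,t}, \tag{1}$$
$$y_{W,t} = h_W x_t + n_{W,t}, \tag{2}$$
where $h_B$ and $h_W$ denote the channel coefficients of Bob and Willie, respectively, and $n_{B,t}$ and $n_{W,t}$ are circularly symmetric complex Gaussian noises.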
Unlike idealized transmitter models where
is an undistorted constellation point plus independent additive noise, practical RF front ends introduce signal-dependent and often non-Gaussian distortions. We model Alice’s legitimate emission as the output of a hardware-impaired mapping
where
summarizes impairment parameters such as I/Q imbalance, PA nonlinearity, and oscillator impairments. A typical impairment cascade can be expressed as
where
denotes complex conjugation,
captures I/Q amplitude and phase imbalance,
parameterizes a memoryless polynomial PA, and
captures oscillator phase jitter and any residual carrier-frequency offset.
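To make the cascade concrete, the following minimal sketch applies I/Q imbalance, a memoryless polynomial PA, and oscillator phase jitter in sequence. It is an illustration under stated assumptions rather than the exact model in (4): the function name `impair`, the widely linear I/Q parameterization, the third-order PA polynomial, the i.i.d. Gaussian jitter, and all default values are hypothetical, and carrier-frequency offset is omitted.

```python
import numpy as np

def impair(s, gain_imb_db=1.0, phase_imb_deg=5.0, a3=-0.1 + 0.05j,
           jitter_std_rad=0.02, rng=None):
    """Apply a hypothetical I/Q-imbalance -> PA -> phase-jitter cascade."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(s, dtype=complex)
    # I/Q imbalance (assumed form): widely linear map alpha*s + beta*conj(s).
    g = 10 ** (gain_imb_db / 20)
    phi = np.deg2rad(phase_imb_deg)
    alpha = 0.5 * (1 + g * np.exp(1j * phi))
    beta = 0.5 * (1 - g * np.exp(1j * phi))
    u = alpha * s + beta * np.conj(s)
    # Memoryless PA (assumed third-order polynomial): u + a3 * |u|^2 * u.
    v = u + a3 * np.abs(u) ** 2 * u
    # Oscillator phase jitter (assumed i.i.d. Gaussian phase per sample).
    theta = rng.normal(0.0, jitter_std_rad, size=v.shape)
    return v * np.exp(1j * theta)

qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
emitted = impair(qpsk, rng=np.random.default_rng(0))
```

Because the PA term depends on the instantaneous amplitude and the conjugate term mixes the I and Q rails, the emitted clusters are shifted and skewed relative to the ideal constellation, which is the kind of signal-dependent structure the text refers to as a hardware fingerprint.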
These impairments induce a transmitter-specific conditional distribution for legitimate emissions, i.e.,
whose I/Q density concentrates on a structured, transmitter-specific, non-Gaussian geometry. We refer to this geometry as the hardware-impaired signal manifold. Maintaining covertness against learned detectors requires covert embedding to remain statistically consistent with
.
2.2. Covert Embedding over K-Symbol Blocks and Bob’s Decoding
Let
denote the covert bitstream with
. The
i-th covert bit is embedded over the
K symbol intervals in the index set:
For
, Alice transmits a stego sample of the following form:
where
denotes an anchor sample that is statistically consistent with the legitimate conditional distribution (
) and
is the residual perturbation that embeds the covert bit (
) over the block (
), generated from the anchor sample (
) and the block message (
). In conventional transmission, one may take
. In ResDiff, we instead draw
so that the anchor itself follows the hardware-impaired manifold and the residual (
) only needs to carry covert information within a bounded budget. We impose a per-symbol magnitude constraint to preserve the public link, i.e.,
where
controls the embedding budget. In ResDiff,
is generated by a bounded residual module; here, we keep the mapping abstract to define the threat model and objectives. When working with the 2D real representation proposed in
Section 3, this constraint is equivalently
.
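As a rough illustration of the per-symbol magnitude constraint, the sketch below rescales any residual whose magnitude exceeds the budget and checks that the complex-magnitude and 2D real-norm forms of the constraint coincide. The helper name `project_to_budget`, the symbol `eps` for the budget, and the example values are assumptions made for illustration; ResDiff's bounded residual module is learned and may enforce the bound differently.

```python
import numpy as np

def project_to_budget(delta, eps):
    """Scale complex residuals so every sample satisfies |delta| <= eps."""
    delta = np.asarray(delta, dtype=complex)
    mag = np.abs(delta)
    # Shrink only samples that exceed the budget; leave the rest unchanged.
    scale = np.minimum(1.0, eps / np.maximum(mag, 1e-12))
    return delta * scale

raw = np.array([0.3 + 0.4j, 0.01j, 0.0])
bounded = project_to_budget(raw, eps=0.1)

# Equivalent 2D real view: the Euclidean norm of (I, Q) equals |delta|.
iq = np.stack([bounded.real, bounded.imag], axis=-1)
norms = np.linalg.norm(iq, axis=-1)
```

The projection preserves the direction of each residual in the I/Q plane, so only its magnitude is limited.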
We encode each covert bit across the block rather than forcing separability at the single-symbol level. This induces a message-dependent block-level correlation that Bob can exploit for processing gain while reducing marginal artifacts that are most accessible to symbol-level detectors.
Bob decodes the public symbols using a standard receiver chain, yielding
from
. Covert decoding is performed on a block basis. For each block (
), Bob observes the received sequence (
), together with the available public-side information, such as
, and the known block structure. He then computes a block-level decision metric and outputs an estimate (
). In particular, Bob uses the demodulated public symbols (
) as side information during inference. During training, we adopt teacher forcing and feed the ground-truth public symbols (
) to the decoder to stabilize optimization; at test time,
is replaced by
. The specific coherent and neural decoding architecture is detailed in
Section 3; here, we emphasize that Bob’s reliability gain arises from the aggregation of information across
K symbols under the bounded-embedding constraint.
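The aggregation effect can be illustrated with a deliberately simplified stand-in for the learned embedder and decoder. In the sketch below, one bit is spread over K symbols by a fixed unit-modulus signature shared by Alice and Bob, and Bob correlates the received block against that signature after removing the public symbols. Everything here is an assumption for illustration: the signature scheme, the ideal QPSK anchors (which the paper argues are detectable), the use of the true public symbols as side information, and the function names are not the ResDiff architecture.

```python
import numpy as np

def embed_block(anchors, bit, signature, eps):
    """Add the residual eps * (+/-1) * signature to a block of K anchors."""
    sign = 1.0 if bit else -1.0
    return anchors + eps * sign * signature

def decode_block(y, public_hat, signature):
    """Correlate against the signature after removing the public symbols."""
    metric = np.real(np.sum(np.conj(signature) * (y - public_hat)))
    return int(metric > 0)

def toy_ber(K, eps, snr_db, n_blocks=2000, seed=0):
    """Monte Carlo covert BER of the toy scheme over an AWGN channel."""
    rng = np.random.default_rng(seed)
    noise_std = np.sqrt(10 ** (-snr_db / 10) / 2)  # per real dimension
    errors = 0
    for _ in range(n_blocks):
        s = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, K)))
        c = np.exp(1j * 2 * np.pi * rng.random(K))  # unit-modulus signature
        bit = int(rng.integers(0, 2))
        x = embed_block(s, bit, c, eps)
        y = x + noise_std * (rng.normal(size=K) + 1j * rng.normal(size=K))
        errors += decode_block(y, s, c) != bit
    return errors / n_blocks

ber_short = toy_ber(K=8, eps=0.1, snr_db=10.0)
ber_long = toy_ber(K=64, eps=0.1, snr_db=10.0)
```

In this toy setting the decision metric has mean proportional to K times the budget while its noise standard deviation grows only as the square root of K, so a longer block lowers the error rate at the same per-symbol budget.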
2.3. Willie’s Hypothesis Test, Two-Tier Wardens, and Metrics
Willie aims to detect the presence of covert embedding from his received samples. Given
, Willie performs the binary hypothesis test:
In our primary stealthiness evaluation,
samples correspond to the true legitimate emission process (
), while
samples correspond to ResDiff transmissions (
) with diffusion anchors (
). Willie applies a decision rule (
), and we define
with a total error probability of
.
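For a score-based detector, these error probabilities can be estimated empirically. The sketch below assumes the definitions that are standard in covert communication (false alarm: deciding that embedding is present on legitimate traffic; missed detection: deciding that it is absent on embedded traffic; total error as their sum) and a simple threshold rule; the function name and example scores are hypothetical.

```python
import numpy as np

def detection_errors(scores_h0, scores_h1, threshold):
    """Return (P_FA, P_MD, P_FA + P_MD) for 'declare embedding if score > threshold'."""
    p_fa = float(np.mean(np.asarray(scores_h0) > threshold))
    p_md = float(np.mean(np.asarray(scores_h1) <= threshold))
    return p_fa, p_md, p_fa + p_md

# Hypothetical suspicion scores on legitimate and embedded traffic.
p_fa, p_md, total = detection_errors([0.1, 0.2, 0.6, 0.4],
                                     [0.3, 0.7, 0.8, 0.9], threshold=0.5)
```

A total error close to one means the detector does little better than a blind guess at this threshold, whereas a value near zero means the two hypotheses are well separated.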
We consider a strong warden model. Willie knows the public modulation alphabet (
) and the noise statistics. Our primary evaluation assumes that Willie knows K, which provides a worst-case benchmark for detectability under fixed operating conditions. If K is unknown, a practical warden may scan multiple candidate window sizes or adopt a multi-scale detector; however, this introduces a multiple-testing burden and reduces the amount of effective labeled data per candidate K, so it does not necessarily yield stronger detectability than the known-K case under the same operating conditions. We leave a systematic study of unknown-K wardens and multi-scale countermeasures for future work. Unless otherwise stated, Willie is passive: he only observes ongoing cover traffic and does not inject probes or otherwise manipulate the channel. We mention active or mobile wardens only to distinguish our scope from interaction-driven threat models; analyzing such adversaries requires additional assumptions and side information that are not needed in our waveform-level setting. Willie is assumed to be capable of training a deep detector using labeled samples that are representative of the evaluation conditions, including the same SNR range and impairment profile. This supervised-training assumption is pessimistic for stealthiness evaluation. In practice, training–deployment mismatch can arise and may change detectability; therefore, we discuss mismatch sensitivity and its expected impact in Section 4.8. Willie does not know the covert bits (
) and does not observe Alice’s internal random seeds or any secret keys used for randomization.
In TM-1, Willie applies a symbol-wise classifier to individual received I/Q samples. We denote
, and
where
is a suspicion score for
. This model captures wardens that exploit symbol-level artifacts such as constellation shifts, variance inflation, or weak multimodality. In evaluation, we report the area under the receiver operating characteristic (ROC) curve (AUC) by treating each received symbol as an independent test sample under
or
.
In TM-2, Willie processes
K-symbol windows and may exploit higher-order and correlation features, i.e.,
where
stacks the I/Q vectors over the
K-symbol window. TM-1 is a special case of TM-2 that restricts the receptive field to a single symbol.
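The window construction can be sketched as follows, assuming non-overlapping K-symbol windows and a flat layout that interleaves the I and Q components; the helper name `stack_windows` and this particular layout are illustrative choices rather than the paper's stated input format.

```python
import numpy as np

def stack_windows(y, K):
    """Reshape complex samples into (num_blocks, 2*K) real I/Q feature rows."""
    y = np.asarray(y, dtype=complex)
    n_blocks = len(y) // K
    blocks = y[: n_blocks * K].reshape(n_blocks, K)      # drop any partial block
    iq = np.stack([blocks.real, blocks.imag], axis=-1)   # (n_blocks, K, 2)
    return iq.reshape(n_blocks, 2 * K)

y = np.arange(6) + 10j * np.arange(6)
tm2_features = stack_windows(y, K=3)   # two windows of three symbols each
tm1_features = stack_windows(y, K=1)   # single-symbol special case
```

Setting K to 1 yields one I/Q pair per row, which matches the observation that the symbol-level warden is the single-symbol special case of the windowed one.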
Our design targets a three-way trade-off. Public-link fidelity requires preserving the decodability of the public symbols at Bob, which we quantify using the public symbol error rate (SER). Covert reliability requires that Bob decode the covert bits with a low block bit error rate (BER). Stealthiness requires that Willie be unable to reliably distinguish the two hypotheses. For empirical stealthiness, we report the AUC of Willie’s detector. An AUC of 0.5 corresponds to random guessing, while a larger AUC indicates greater detectability.
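For reference, the AUC can be computed directly from detector scores as the probability that a randomly chosen embedded sample receives a higher suspicion score than a randomly chosen legitimate one, with ties counted as one half. The sketch below uses an explicit pairwise comparison, which is adequate for small score sets; the function name is an assumption, and a rank-based routine would be preferable for large datasets.

```python
import numpy as np

def auc_from_scores(scores_h0, scores_h1):
    """AUC as P(score_H1 > score_H0) + 0.5 * P(tie), via pairwise comparison."""
    s0 = np.asarray(scores_h0, dtype=float)[:, None]
    s1 = np.asarray(scores_h1, dtype=float)[None, :]
    wins = np.mean(s1 > s0)
    ties = np.mean(s1 == s0)
    return float(wins + 0.5 * ties)

separable = auc_from_scores([0.0, 1.0], [2.0, 3.0])        # fully separated scores
indistinguishable = auc_from_scores([1.0, 2.0], [1.0, 2.0])  # identical score sets
```

Fully separated scores give an AUC of 1, while identical score distributions give 0.5, the random-guessing level used as the stealthiness reference in the text.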
4. Experimental Results and Analysis
4.1. Experimental Setup
To isolate distributional effects from fading, we consider an AWGN channel and set both channel coefficients to one in (1) and (2). The transmitter power is normalized to unity, and public symbols use unit-power QPSK. We sweep the SNR from 0 to 20 dB by adjusting the noise variance. ResDiff is trained with the per-symbol SNR sampled uniformly from
dB and evaluated over the full 0–20 dB range without re-training, so the low/high SNR regions are out of range relative to training. We further discuss mismatch sensitivity and practical mitigation in
Section 4.8.
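Sweeping the SNR by adjusting the noise variance can be sketched as below, assuming unit average signal power so that the total complex noise variance is the inverse of the linear SNR. The function name `awgn` and the sample count are illustrative.

```python
import numpy as np

def awgn(x, snr_db, rng):
    """Add circularly symmetric complex Gaussian noise, assuming unit signal power."""
    noise_var = 10 ** (-snr_db / 10)   # total variance E|n|^2
    std = np.sqrt(noise_var / 2)       # split equally between I and Q
    shape = np.shape(x)
    n = std * (rng.normal(size=shape) + 1j * rng.normal(size=shape))
    return np.asarray(x) + n

rng = np.random.default_rng(0)
x = np.ones(200_000, dtype=complex)
y = awgn(x, snr_db=10.0, rng=rng)
measured_noise_power = np.mean(np.abs(y - x) ** 2)   # close to 0.1 at 10 dB
```

Splitting the variance equally between the real and imaginary parts is what makes the noise circularly symmetric.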
By default, we perform evaluations under a fixed impairment profile combining I/Q imbalance, memoryless PA nonlinearity, and oscillator phase jitter;
Table 1 reports the parameter settings and how they enter the model in (
4).
We embed one covert bit over a K-symbol block with , yielding a covert rate of 1/K bit/symbol. For the main curves, we choose one operating point per K using with , which gives , , and . For the stealth–reliability trade-off study, we sweep and set . The scaling of the per-symbol budget keeps the block-wise perturbation energy budget approximately constant across K, since is constant.
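One scaling that keeps the block-wise energy bound constant is an inverse-square-root dependence of the per-symbol budget on K, since the block energy is bounded by K times the squared per-symbol budget. The sketch below assumes this form; the function name, the reference block length, and the reference budget are hypothetical values chosen for illustration and are not the paper's operating points.

```python
import math

def scaled_budget(eps_ref, k_ref, k):
    """Per-symbol budget that keeps K * eps**2 equal to k_ref * eps_ref**2."""
    return eps_ref * math.sqrt(k_ref / k)

# Hypothetical reference point: budget 0.1 at K = 16.
budgets = {k: scaled_budget(0.1, 16, k) for k in (8, 16, 32, 64)}
block_energy = {k: k * eps ** 2 for k, eps in budgets.items()}
```

Under this assumption, quadrupling the block length halves the per-symbol budget while the block energy bound stays the same.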
We evaluate detectability under the two-layer threat model in Section 2.3. Willie is trained as a supervised binary classifier using balanced samples from both hypotheses. To avoid bias toward any single operating point, the warden is trained on a multi-SNR dataset, with SNR drawn uniformly from the evaluation range, and then frozen to compute the AUC over the full sweep. We also report an anchors-only baseline, in which the residual is set to zero so that only the diffusion anchors are transmitted, to isolate the incremental impact of Stage II residual embedding on public SER and detectability. This is a diagnostic baseline only; the legitimate hypothesis always corresponds to the true legitimate emission process.
For reproducibility,
Table 1 summarizes the key architectures and training settings.
4.2. Stealthiness: Learned-Warden AUC Versus SNR
We quantify stealthiness by the learned-warden AUC. An AUC of 0.5 corresponds to random guessing; larger values indicate higher detectability and, therefore, weaker stealthiness. Following Section 2.3, we report the AUC for TM-1 and TM-2.
Figure 3 reports the AUC versus SNR for the evaluated block lengths. First, detectability increases with SNR for both TM-2 and TM-1, as reduced channel noise provides less masking at high SNRs. Second, increasing the block length K reduces detectability under both wardens. This reduction is most pronounced for TM-2, since spreading one covert bit over a longer window lowers the per-symbol information load and weakens the sequence-level correlation cues available to a windowed detector. TM-1 exhibits the same monotone behavior with respect to K, confirming that larger blocks also alleviate marginal artifacts by reducing per-symbol embedding pressure.
These results align with the ResDiff design. Stage I anchors samples on a hardware-consistent manifold, suppressing gross marginal mismatch, while Stage II restricts embedding to bounded residuals and shifts information from single-symbol separability to block-level structure. Overall, ResDiff provides controllable stealthiness across the operating SNR. A larger K yields a consistent improvement against TM-1 and TM-2.
4.3. Ablation Study: Role of the Diffusion Prior
To isolate the contribution of Stage I to stealthiness, we run an ablation that keeps the embedding budget and evaluation protocol fixed and varies only how the anchor samples are formed. We compare (i) ResDiff, our full two-stage pipeline with diffusion-generated anchors and Stage II residual embedding; (ii) an ideal-anchor baseline, where anchors are ideal QPSK symbols, while Stage II and the budget are kept unchanged; and (iii) an anchors-only setting, where we transmit the diffusion anchors and set the residual to zero to probe the intrinsic detectability of the learned prior.
Figure 4 reports the warden AUC versus SNR at
and
. The ideal-anchor baseline becomes increasingly detectable with SNR, especially under TM-2, where the AUC approaches 1, indicating that embedding residuals around ideal constellation points creates a strong mismatch relative to hardware-impaired emissions. In contrast, the anchors-only setting stays close to random guessing across SNRs for both TM-1 and TM-2, suggesting that the diffusion prior alone produces anchors that are largely consistent with the legitimate hardware-impaired manifold. ResDiff lies between these two extremes: compared to the ideal-anchor baseline, diffusion anchors substantially reduce detectability, while the gap between ResDiff and the anchors-only baseline quantifies the additional detectability introduced by Stage II payload embedding under the same budget.
4.4. Qualitative Visualization: Constellation Density
To visually assess distribution consistency,
Figure 5 compares the constellation density of legitimate and stego emissions at SNR
dB.
Figure 5a shows the legitimate emission density; each public symbol forms a hardware-shaped cluster determined by the non-ideal RF front end.
Figure 5b–d show the corresponding stego densities for
. Across all
K values, ResDiff preserves the cluster geometry and centroid locations and does not induce systematic constellation shifts or clearly separable sub-clusters. The main visible effect is a mild broadening of the outer contours, consistent with bounded residual injection while remaining aligned with the legitimate envelope. As K increases, the per-symbol embedding pressure is reduced, and the stego density remains visually co-located with the legitimate density. Overall, these qualitative observations corroborate the AUC trends and support the claim that ResDiff confines embedding within a hardware-consistent manifold.
4.5. Reliability and Fidelity: Covert BER and Public SER
We next evaluate the reliability of covert decoding at the legitimate receiver and the impact of embedding on the public link.
Figure 6 reports the covert-block BER as a function of the SNR. As expected, BER decreases monotonically with the SNR. More importantly, increasing the block length consistently improves covert reliability: spreading one covert bit over a longer block provides processing gain at the decoder and reduces the per symbol burden of embedding. At the representative operating point of
dB,
already achieves a BER on the order of
, and the BER further approaches
at higher SNRs. This behavior is consistent with ResDiff’s design goal of shifting information from marginal separability to a block-level structure.
Figure 7 shows the public SER with and without covert embedding. Embedding introduces a measurable SER penalty in the mid-SNR regime, where the legacy decision regions are not noise-dominated and bounded residuals slightly perturb the effective constellation. The penalty becomes smaller as K increases, which matches the reduced per-symbol distortion budget under longer blocks. Overall, ResDiff maintains a controlled impact on the public link while enabling reliable covert recovery through block processing gain. We note that when the public-link symbol error rate is non-zero, errors in the demodulated public symbols may propagate to covert decoding, since these symbols are used as side information at inference; this effect is mitigated by operating the public link at a sufficiently reliable SNR and by training with teacher forcing.
4.6. Stealth–Reliability Trade-Off at Fixed SNR
We characterize the explicit stealth–reliability trade-off controlled by the embedding budget. Specifically, fixing the SNR at 20 dB and the block length at , we sweep the perturbation amplitude and evaluate both covert reliability and detectability against the sequence-aware warden (TM-2).
Figure 8 reveals a clear frontier for the
case at 20 dB against TM-2. As the perturbation amplitude increases, the covert BER decreases rapidly, indicating that a larger residual budget provides a stronger decoding margin under hardware-induced self-interference. At the same time, the TM-2 warden’s AUC increases monotonically, suggesting that larger residuals leave more exploitable statistical evidence for detection. Notably, the AUC growth is gradual compared with the BER improvement, producing a practical knee region where reliability becomes usable while detectability remains controlled. This confirms that the perturbation amplitude serves as an interpretable operating knob for selecting a deployment point according to the desired security–reliability balance. Together with the monotone K-sweep trends (Figures 3–7), this budget sweep highlights that K and the perturbation amplitude provide complementary controls: K shapes spreading/processing gain and per-symbol pressure, while the perturbation amplitude directly controls the residual magnitude under a fixed block structure.
4.7. Robustness to Hardware-Impairment Drift
In practice, transmitter impairments can drift over time due to aging, temperature, or other environmental changes. To assess robustness to such mismatch, we keep all trained models fixed and evaluate ResDiff under perturbed impairment parameters at inference time. We focus on a representative operating point (
dB,
, and
) and report the covert BER, together with the warden’s AUC, under both TM-1 and TM-2. We consider three drift scenarios. Nominal matches the training impairments. Mild Drift increases the I/Q phase mismatch by
and increases the PA AM/AM and AM/PM distortion strengths by 5% relative to the nominal setting. Severe Drift increases the phase mismatch by
and increases both PA distortion terms by 10%.
Table 2 summarizes the results. Overall, performance degrades only slightly: covert BER rises from
to
. Stealthiness remains stable, with TM-1 AUC staying within 0.6049–0.6138 and TM-2 AUC within 0.7142–0.7268. We do not observe a clear monotonic trend in AUC with increasing drift; the fluctuations are small and remain within a narrow band across the tested settings.
4.8. Discussion: Sensitivity to Warden Mismatch
Our AUC results adopt a standard but idealized assumption: Willie can train a detector using labeled data that is representative of the deployment conditions. In practice, perfect alignment is unlikely, and several forms of mismatch may arise. SNR mismatch: Stage II is trained on
dB, whereas our evaluation reports curves over 0–20 dB without re-training; thus, the lowest and highest SNR points effectively test out-of-range operation. Impairment drift: transmitter hardware characteristics can change over time;
Table 2 indicates that the considered I/Q and PA perturbations lead to only a small BER increase and cause limited AUC variation across the tested drift levels. Detector mismatch: To reduce dependence on a single warden design, we report both TM-1 and TM-2. Channel–model mismatch is another practical factor; while we adopt an AWGN channel and have already evaluated a wide SNR range without re-training, real deployments may involve fading, phase rotation, or non-Gaussian interference. These effects can be addressed by domain randomization over channel/impairment parameters during Stage II training and, when feasible, lightweight calibration of Stage II while keeping the diffusion prior fixed. Nevertheless, a more comprehensive study over broader detector families and mismatch ranges remains an important direction beyond the present scope.