A Dynamic Website Fingerprinting Defense by Emulating Spatio-Temporal Traffic Features

Zhang, Dongfang; Rao, Chen; Huang, Jianan; Guan, Lei; Tian, Manjun; Liu, Weiwei

doi:10.3390/electronics14224441


by Dongfang Zhang ¹, Chen Rao ², Jianan Huang ²,*, Lei Guan ¹, Manjun Tian ¹ and Weiwei Liu ²

¹ The First Research Institute of the Ministry of Public Security, Beijing 100048, China

² School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(22), 4441; https://doi.org/10.3390/electronics14224441

Submission received: 22 October 2025 / Revised: 9 November 2025 / Accepted: 12 November 2025 / Published: 14 November 2025

(This article belongs to the Special Issue Novel Methods Applied to Security and Privacy Problems, Volume II)


Abstract

Website fingerprinting (WF) attacks analyze encrypted network traffic to exploit side-channel features such as packet sizes, inter-packet timings, and burst patterns, enabling adversaries to infer users’ browsing activities and posing persistent privacy threats even under encryption protocols like TLS. Existing WF defenses primarily rely on static perturbations of coarse statistical features, which fail to reproduce the multi-scale spatio-temporal dynamics of website traffic and are increasingly ineffective against modern deep learning-based classifiers. To address this challenge, we propose WFD-EST, a website fingerprinting defense framework that dynamically emulates spatio-temporal traffic characteristics for fine-grained obfuscation. WFD-EST constructs a multi-scale traffic representation that captures both packet-level dynamics and burst-level correlations. A diffusion-based generator, guided by a fine-tuned large-scale discriminator, synthesizes realistic target traffic templates that preserve structural consistency while reflecting temporal diversity. Based on these templates, a burst-aware manipulation module performs packet padding, insertion, and delay operations to align source flows with target spatio-temporal patterns, generating traffic indistinguishable from real target flows. Evaluations on a real-world dataset comprising 15,000 encrypted samples from three representative websites show that WFD-EST consistently outperforms two state-of-the-art defenses, reducing classification F1 scores by 0.082–0.144 while lowering bandwidth and time overheads by at least 0.086 and 0.054, respectively.

Keywords:

website fingerprinting defenses; traffic analysis; adversarial learning; spatio-temporal feature; packet manipulations

1. Introduction

As the Internet has become an indispensable infrastructure underpinning nearly all aspects of modern life, from social interactions to public services, web services have emerged as the dominant mode of user engagement, making privacy protection in web traffic a central concern [1]. Although encryption protocols such as the widely used TLS protocol have significantly enhanced content confidentiality, they do not eliminate privacy risks at the traffic level [2]. Even with encrypted payloads, network flows still expose side-channel features, including packet size distributions, inter-packet timings, and burst patterns, that uniquely characterize visited websites. Website fingerprinting (WF) attacks exploit these spatio-temporal patterns to infer users’ browsing activities without decryption, and advances in deep learning and traffic analysis have made them increasingly powerful and practical, underscoring the urgent need for effective WF defenses [3].

Defending against WF attacks has therefore become a critical challenge in privacy-preserving communication. Existing WF defense strategies for implementing traffic obfuscation generally fall into two categories: protocol format obfuscation and spatio-temporal feature obfuscation. Protocol format obfuscation attempts to disguise traffic by modifying header fields so that it mimics legitimate protocols. For example, SkypeMorph [4] encapsulates Tor traffic to resemble Skype video calls, aiming to evade censorship and classification. However, the inherent complexity and strict semantics of protocol fields make realistic morphing difficult to achieve [5]. Moreover, as modern communication increasingly relies on encryption backbones such as TLS, traffic packets have become more homogenized, and the practicality of protocol-level obfuscation has further diminished [6].

By contrast, spatio-temporal feature obfuscation conceals discriminative traffic patterns by manipulating statistical properties through operations such as packet padding, insertion, and delay [7]. As a protocol-agnostic and more flexible approach, it has become the dominant strategy for WF defense. Despite these advantages, existing feature-based methods still suffer from fundamental limitations. Most rely heavily on shallow statistical features, such as packet length and inter-packet delay distributions, while overlooking deeper temporal and burst-level patterns that capture resource interaction behaviors, the very features exploited by modern deep learning-based WF attacks [8,9]. Furthermore, many adopt coarse-grained perturbation strategies that merely distort traffic away from its original distribution but fail to accurately emulate the dynamic spatio-temporal structure of specific target websites [10,11], thereby reducing fidelity and limiting defense effectiveness. Additionally, several approaches [12,13] require invasive protocol stack modifications and lack adaptability to dynamic network conditions, further constraining real-world deployment.

These challenges point to a critical research gap: current defenses cannot effectively emulate the multi-scale spatio-temporal dynamics of website traffic, from packet-level micro-patterns to burst-level correlations and flow-level behaviors, that encode interaction semantics and underpin WF classifiers’ decision boundaries. Without capturing and obfuscating these higher-order behaviors, defenses remain vulnerable and unstable, failing to provide comprehensive protection against advanced attacks that exploit complex temporal dependencies and cross-feature correlations.

In this work, we propose WFD-EST, a website fingerprinting defense framework that dynamically emulates spatio-temporal traffic characteristics to achieve fine-grained obfuscation. WFD-EST precisely manipulates the multi-scale dynamics of source traffic so that obfuscated flows become indistinguishable from target traffic in the feature space. It first constructs a multi-scale spatio-temporal representation that captures both fine-grained packet characteristics and coarse-grained burst behaviors, enabling a unified modeling of hierarchical traffic dependencies. Guided by a fine-tuned large-scale discriminator, a diffusion-based generator synthesizes target traffic feature templates that preserve structural consistency while reproducing realistic temporal variations. Finally, a burst-aware manipulation module leverages both packet and burst level behaviors to match the interaction patterns of target traffic, enabling dynamic obfuscation to defend against WF attacks.

The main contributions of this study are as follows:

(1): We present WFD-EST, a website fingerprinting defense framework that emulates the spatio-temporal characteristics of target website traffic to guide fine-grained obfuscation. WFD-EST constructs multi-scale traffic representations, pre-generates realistic target traffic templates, and performs burst-aware packet-by-packet manipulations. By aligning source flows with the synthesized spatio-temporal features of target websites, WFD-EST reshapes traffic into patterns that closely resemble real sites, disrupting discriminative features and reducing WF attack accuracy.
(2): We design SDGLM, an adversarial learning-based generative model that synthesizes spatio-temporal features of target website traffic. SDGLM combines a diffusion-based generator for high-fidelity feature generation with a fine-tuned large-scale discriminator that provides informative feedback. The pre-generated feature templates guide packet manipulation decisions and eliminate the need for online decision-making, enabling precise and efficient obfuscation.
(3): We conduct a comprehensive evaluation on a real-world dataset comprising three website categories and 15,000 traffic samples. Across all pairwise obfuscation scenarios, WFD-EST consistently outperforms two representative WF defense baselines, reducing the F1 score by 0.082–0.144 against two state-of-the-art deep learning classifiers, while simultaneously lowering bandwidth overhead by at least 0.086 and time overhead by at least 0.054.

The remainder of this paper is organized as follows. Section 2 reviews related work on website fingerprinting attacks and defenses. Section 3 presents the adversary model and introduces the overall architecture of WFD-EST. Section 4 details the design and workflow of WFD-EST. Section 5 reports the experimental results and performance evaluation. Finally, Section 6 concludes this study.

2. Related Work

Encrypted website traffic analysis and its countermeasures form the foundation of website fingerprinting research. WF attacks aim to infer user activities or identify visited services from encrypted traffic, while WF defense schemes seek to conceal identifying patterns and mitigate such analysis without disrupting normal communication.

2.1. WF Attack Techniques

WF attack techniques can be broadly categorized into three classes: statistical analysis-based, classical machine learning-based, and deep learning-based approaches. Statistical analysis-based methods are among the earliest attempts to identify encrypted traffic by analyzing basic traffic statistics. They typically rely on handcrafted features such as packet size, count, and inter-arrival time to perform fingerprinting. Early works matched resource size distributions against known webpage templates [14], or modeled traffic instances as sets of objects characterized by length and quantity, measuring similarity via Jaccard coefficients [15]. Other studies proposed compact representations such as Bloom filters with Hamming distance metrics [16] or cumulative length features from the first 100 packets for HTTPS identification [17]. Temporal-only approaches have also been explored, such as dynamic time warping (DTW) over packet timing sequences [18]. Hidden Markov models (HMMs) have been applied to model browsing sessions as state transitions [19], while sequence alignment techniques from bioinformatics have been adapted to treat packet size sequences as symbolic strings [20]. Despite their simplicity, purely statistical methods often lack robustness against evolving traffic patterns and obfuscation defenses.

Classical machine learning-based approaches improve upon statistical methods by applying feature engineering and supervised learning. They extract hundreds of statistical features [21] and train classifiers such as Naïve Bayes, SVMs, and Random Forests for traffic categorization. TF-IDF representations have been used with Naïve Bayes for website fingerprinting in SSH tunnels [22], while incremental SVMs [23] and feature-weighted KNN variants [24] address scalability and feature importance issues. Random Forest-based pipelines incorporating feature preprocessing and selection achieve high accuracy with improved efficiency [25]. Recent works also leverage federated learning with differential privacy to mitigate data isolation while preserving user privacy [26], and entropy-based feature extraction combined with random forest for encrypted traffic identification [27]. However, these methods depend heavily on manual feature design and struggle to capture nonlinear feature interactions or high-dimensional dependencies.

Deep learning-based approaches represent the current state of the art in WF attacks. They learn hierarchical traffic representations directly from raw data, eliminating the need for manual feature extraction and achieving superior accuracy [28]. Early studies employed artificial neural networks (ANN) and stacked autoencoders (SAE) to classify more than 50 types of encrypted and unencrypted protocols [29]. CNN-based frameworks convert packet bytes into vectors to learn discriminative spatial features [30], while network-in-network architectures improve local modeling capacity and parameter efficiency [31]. NeuTic integrates multi-kernel convolutions with self-attention to capture long-range dependencies in TLS traffic [32], and hybrid CNN-LSTM architectures combine spatial and temporal modeling [33]. Graph convolutional networks (GCNs) have also been explored to exploit flow-level relational information [34]. Other innovations include converting traffic into time-size histograms for image-based classification [35], bidirectional GRU models with reconstruction mechanisms [36], and data augmentation strategies to address class imbalance and improve generalization [37]. These deep learning approaches significantly enhance WF accuracy and robustness, enabling the extraction of complex multi-scale spatio-temporal patterns from encrypted traffic.

2.2. WF Defense Schemes

As WF attacks continue to grow in sophistication, WF defense techniques have evolved along more incremental paths, with most existing approaches falling into two main categories: protocol format obfuscation and spatio-temporal feature obfuscation.

Protocol format obfuscation aims to disguise traffic by transforming the original protocol into another legitimate one, thereby evading protocol-level detection. StegoTorus [38] transforms fixed-size Tor cells into variable-length packets and re-encrypts them so that they are computationally indistinguishable from random data, embedding them within P2P or HTTP messages. FreeWave [39] modulates traffic into acoustic signals transmitted over VoIP connections, leveraging the dynamic and distributed nature of VoIP nodes to evade IP-based blocking. The uTLS library [40] allows developers to customize TLS handshake parameters to mimic popular implementations, while HTTPOS [41] modifies HTTP requests and responses to alter HTTPS fingerprints. Although effective in specific scenarios, protocol obfuscation is often complex, difficult to deploy, and increasingly unnecessary as encryption protocols like TLS dominate modern communication.

Spatio-temporal feature obfuscation modifies traffic patterns while preserving the original protocol, making it the mainstream defense approach. Early methods such as BuFLO [42] transmit fixed-size packets at fixed intervals to standardize traffic features, but at the cost of high overhead. Tamaraw [43] improves efficiency by adjusting packet size and transmission rate asymmetrically. WTF-PAD [44] injects dummy packets during idle periods to obscure timing patterns, while Front [45] dynamically sends fake packets at the beginning of flows to achieve lightweight, zero-delay obfuscation. DynaFlow [46] removes reliance on website fingerprint databases by generating traffic perturbations dynamically, though it requires modifications to Tor. Other approaches manipulate MAC-layer traffic [47], use GANs to generate malicious traffic with realistic statistical features [48], or employ transfer learning and adversarial examples to protect device traffic in black-box settings [49]. TrafficSliver [50] splits client traffic across multiple Tor circuits to hide aggregate flow patterns, while ALERT [51] inserts minimal perturbations to resist timing-based analysis. Prism [13] models standard traffic patterns using power-law state transitions for online perturbation, and Mockingbird [52] minimizes the Euclidean distance between obfuscated and target packet sequences through packet insertion.

In summary, existing WF defenses either manipulate protocol formats or perturb shallow statistical traffic features, which limits their ability to preserve realistic temporal and burst-level dynamics. Protocol format obfuscation introduces high implementation complexity and poor scalability due to the rigid structure of modern encrypted protocols. Feature-based defenses often rely on coarse-grained perturbations that fail to capture hierarchical timing and interaction patterns exploited by deep learning-based WF attacks. In contrast, WFD-EST advances beyond these approaches through a unified adversarial diffusion framework. It first constructs a multi-scale traffic representation to characterize fine-grained temporal and burst-level correlations, then employs a diffusion-based generator guided by a fine-tuned large-scale time-series discriminator to synthesize realistic target traffic features. Finally, a burst-aware obfuscation module translates these generated features into packet-level manipulations in real time. By integrating these components, WFD-EST can emulate the fine-grained spatio-temporal structure of target website traffic, achieving effective and low-overhead obfuscation against WF attacks.

3. System Design

This section begins by illustrating the adversary model under the assumption of a powerful warden equipped with advanced traffic analysis capabilities, followed by an overview of the WFD-EST architecture.

3.1. Adversary Model

As illustrated in Figure 1, we consider a typical website fingerprinting scenario where a user accesses a target website through a web browser, establishing a network connection with the web server. A powerful adversary is assumed to be capable of monitoring traffic at in-path network locations such as routers, gateways, or local access points. To enhance communication privacy, the user employs an encrypted proxy that relays requests and retrieves web resources on behalf of the client.

Although modern encryption protocols protect content confidentiality, encrypted traffic still leaks substantial side-channel information, including packet size distributions, inter-packet timings, and burst patterns, which can be exploited for analysis. By leveraging these spatio-temporal features, the adversary can infer the visited website type without decrypting any payload. Importantly, the adversary is passive: it does not tamper with, inject, or drop packets, but instead performs advanced statistical analysis and applies classical machine learning or deep learning classifiers to classify observed traffic.

To resist such analysis, a traffic obfuscation mechanism is deployed between the client and the proxy server. Customized obfuscation operations are applied at both ends of the communication to transform outgoing traffic in each direction, disguising its original spatio-temporal characteristics and producing flows that appear indistinguishable to the adversary. Specifically, traffic generated by the client is obfuscated before transmission and subsequently deobfuscated at the proxy before being forwarded to the web server. Likewise, server responses are obfuscated at the proxy side and deobfuscated upon arrival at the client. This bidirectional obfuscation/deobfuscation workflow ensures that traffic remains transparent and functional for both endpoints while simultaneously resisting WF analysis during transmission.

Formally, the obfuscation and deobfuscation processes can be expressed as:

$$Y = \mathrm{Obfuscation}(X, S), \quad X = \mathrm{Obfuscation}^{-1}(Y, S) \tag{1}$$

where $X$ denotes the original traffic, $Y$ the obfuscated traffic, and $S$ the set of applied obfuscation operations. Within this adversarial model, two evaluation scenarios are commonly considered:

(1): Open-World Scenario: This setting reflects real-world conditions where the set of websites a user may visit $W_{user} = \{w_{u1}, w_{u2}, \dots, w_{um}\}$ is much larger than the subset monitored by the adversary $W_{target} = \{w_{u1}, w_{u2}, \dots, w_{un}\}$, with $W_{target} \subseteq W_{user}$ and $|W_{user}| \gg |W_{target}|$. The adversary’s goal is to determine whether a captured traffic flow belongs to any website in $W_{target}$.
(2): Closed-World Scenario: This more constrained but widely adopted setting assumes that the adversary has complete knowledge of the user’s potential website set $W_{user}$. With this assumption, the adversary can pre-collect abundant training traces for each website and build highly optimized classifiers to identify the visited website.

Although the closed-world assumption is less realistic, it provides a rigorous benchmark for evaluating WF defenses. Therefore, this work adopts the closed-world scenario for experimental evaluation.

3.2. Overview of WFD-EST

As illustrated in Figure 2, WFD-EST is a website fingerprinting defense framework that uses adversarial training to dynamically emulate the multi-scale spatio-temporal traffic characteristics of a target website and achieves fine-grained obfuscation through packet manipulations. Its core design aims to transform website traffic into forms that preserve communication semantics while disrupting discriminative features, thereby degrading the effectiveness of traffic analysis. To this end, WFD-EST consists of three components: construction of a multi-scale spatio-temporal traffic representation, target traffic feature generation by adversarial learning, and burst-aware obfuscation through packet-by-packet manipulations.

(1): Representation Construction: WFD-EST first constructs a multi-scale spatio-temporal representation that captures both packet-level fine-grained features (e.g., packet length and inter-packet delay) and burst-level coarse-grained features (e.g., cumulative transmission bytes). This hierarchical representation reflects the structural dynamics of real-world website traffic, from specific packet-level patterns to burst-level transmission behaviors, and provides the foundation for subsequent generative modeling and obfuscation.
(2): Target Traffic Feature Generation: Leveraging the constructed multi-scale representation, WFD-EST synthesizes target traffic feature distributions through an adversarially guided diffusion process. Specifically, a diffusion-based generator with a Transformer-based temporal denoising architecture progressively refines noisy representations into structured spatio-temporal features, while a fine-tuned large-scale time-series discriminator provides adversarial feedback to ensure temporal realism and distributional alignment with the target website traffic. This integration transforms conventional noise removal into a temporally conditioned generation task, allowing the model to capture both the intrinsic stability and dynamic variability of target website traffic. The synthesized features serve as high-fidelity templates that guide burst-aware packet manipulations, enabling realistic traffic obfuscation that effectively resists website fingerprinting attacks.
(3): Burst-Aware Obfuscation: Once target features are generated, WFD-EST performs feature alignment to reconcile packet-level and burst-level characteristics, ensuring consistent manipulation decisions. Guided by these pre-generated templates, WFD-EST applies three packet-wise operations, including packet padding, insertion, and delaying, to reshape spatio-temporal patterns. Padding enlarges short packets to modify size distributions, insertion introduces dummy packets to restructure burst patterns, and delaying adjusts inter-packet intervals to conceal temporal signatures. By orchestrating these operations, WFD-EST emulates the resource interaction dynamics of target websites, generating obfuscated traffic that closely matches the target distribution and remains resilient against advanced fingerprinting analysis. Detailed methodology is presented in Section 4.
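
The three packet-wise operations, together with the inverse mapping of Equation (1), can be illustrated with a toy sketch. It is not the burst-aware algorithm of Section 4: the `Packet` record, the position-by-position alignment rule, and the function names are hypothetical choices made only to show that padding and delaying can only increase a value, that insertion adds dummy packets, and that the receiving end can undo both.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    length: int       # bytes on the wire after padding
    delay: float      # inter-packet delay in seconds after delaying
    payload_len: int  # bytes of real payload (0 marks a dummy packet)


def obfuscate(source: List[Tuple[int, float]],
              template: List[Tuple[int, float]]) -> List[Packet]:
    """Reshape (length, delay) pairs of a source flow toward a template (toy rule)."""
    out = []
    for i, (t_len, t_delay) in enumerate(template):
        if i < len(source):
            s_len, s_delay = source[i]
            # Padding can only enlarge a packet; delaying can only add time.
            out.append(Packet(max(s_len, t_len), max(s_delay, t_delay), s_len))
        else:
            # Insertion: a dummy packet that carries no real payload.
            out.append(Packet(t_len, t_delay, 0))
    # Source packets beyond the end of the template are sent unchanged.
    for s_len, s_delay in source[len(template):]:
        out.append(Packet(s_len, s_delay, s_len))
    return out


def deobfuscate(flow: List[Packet]) -> List[int]:
    """Drop dummy packets and strip padding, recovering the payload sizes."""
    return [p.payload_len for p in flow if p.payload_len > 0]


source = [(100, 0.01), (1500, 0.0)]
template = [(600, 0.02), (1200, 0.01), (300, 0.05)]
shaped = obfuscate(source, template)
print([(p.length, p.delay) for p in shaped])
print(deobfuscate(shaped))
```

In this sketch only payload sizes are restored by `deobfuscate`; the added delay is a cost paid in transit and is not reversed.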

4. Traffic Obfuscation Process of WFD-EST

This section introduces the traffic obfuscation workflow of WFD-EST, which comprises constructing multi-scale spatio-temporal representations of website traffic, generating target traffic features through adversarial learning, and performing burst-aware packet manipulations to achieve effective website fingerprinting defense.

4.1. Representation Construction

Existing WF defense methods typically focus on coarse attributes, such as overall packet length or inter-packet delay distributions. However, these features inherently fail to capture the temporal dependencies and fine-grained resource transmission behaviors that WF attacks exploit. To address this limitation, we construct a multi-scale spatio-temporal traffic representation that integrates packet-level fine-grained features with burst-level coarse-grained patterns, providing a more comprehensive foundation for subsequent generative modeling and obfuscation.

4.1.1. Packet-Level Representation

Network traffic can be modeled as a multivariate time-series signal, exhibiting strong temporal autocorrelation and inter-packet dependencies. Capturing these fine-grained temporal dynamics is essential for understanding flow behavior and guiding effective obfuscation. To capture these characteristics, each traffic flow is represented as an ordered sequence of packets based on representative per-packet features, such as packet length and inter-packet interval. Formally, a flow with $n$ packets is defined as:

$$F_{P} = \{p_{1}, p_{2}, \dots, p_{n-1}, p_{n}\} \tag{2}$$

where $p_{i} = \{l_{i}, t_{i}\}$ denotes the feature vector of the $i$-th packet, with $l_{i}$ representing the packet length and $t_{i}$ the inter-packet delay. To eliminate the impact of heterogeneous feature scales and units, all feature dimensions are standardized using Z-score normalization. This packet-level representation effectively captures the fine-grained transmission dynamics and short-range temporal relationships among individual packets.
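
The construction of Equation (2) followed by Z-score normalization can be sketched as below. This is a minimal illustration, assuming that the statistics are computed per flow and that the first packet is assigned a delay of zero; the text does not specify either choice, and the function name is hypothetical.

```python
import numpy as np


def packet_level_representation(lengths, timestamps):
    """Build F_P = {p_1, ..., p_n} with p_i = (l_i, t_i), then Z-score each feature."""
    lengths = np.asarray(lengths, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    # Inter-packet delay t_i; the first packet has no predecessor, so it gets 0
    # (an assumption of this sketch).
    delays = np.diff(timestamps, prepend=timestamps[0])
    features = np.stack([lengths, delays], axis=1)  # shape (n, 2)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (features - mean) / std


rep = packet_level_representation([120, 1500, 1500, 60], [0.00, 0.01, 0.015, 0.20])
print(rep.shape)
```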

4.1.2. Burst-Level Representation

While the packet-level representation captures per-packet transmission behaviors, it fails to reflect higher-order structural patterns that emerge during web resource interactions, such as correlated bursts of packets triggered by page rendering or content retrieval. To model these macro-level dynamics, we construct a burst-level representation that aggregates packets into temporally correlated groups.

A packet burst is defined as a set of packets transmitted in the same direction within a time window $\tau$. A traffic flow $F$ can thus be represented as a sequence of $m$ bursts:

$$F = \{G_{1}, G_{2}, \dots, G_{m}\} \tag{3}$$

Each burst inherently exhibits distinct spatio-temporal properties and reflects behavioral transitions across different resource loading phases. To characterize burst-level dynamics, we use the cumulative packet length of each burst as its feature, normalized by a quantization coefficient $\lambda_{lb}$:

$$F_{G} = \{g_{1}, g_{2}, \dots, g_{m}\}, \quad g_{i} = \left\lfloor \frac{\sum_{j=1}^{h} l_{j}}{\lambda_{lb}} \right\rfloor \tag{4}$$

where $g_{i}$ denotes the normalized cumulative packet feature of the $i$-th burst and $h$ is the number of packets within that burst.

This burst-level representation captures coarse-grained structural patterns and long-range correlations, complementing the packet-level view. Together, these two representations form a unified multi-scale spatio-temporal representation that preserves both fine-grained dynamics and higher-order burst semantics, providing a robust foundation for subsequent traffic generation and obfuscation.
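
The burst grouping and the quantized feature of Equation (4) can be sketched as follows. The sketch assumes that a burst ends when the direction flips or when a packet arrives more than the time window after the first packet of the current burst; the text does not state how the window is anchored, so this rule, the tuple layout, and the function name are illustrative only.

```python
from math import floor


def burst_level_representation(packets, tau, lam):
    """packets: iterable of (timestamp, direction, length), direction in {+1, -1}.

    Returns [g_1, ..., g_m], where g_i = floor(sum of lengths in burst i / lam).
    """
    bursts = []
    current_sum, start_time, current_dir = 0, None, None
    for ts, direction, length in packets:
        # Start a new burst on the first packet, on a direction change, or when
        # the packet falls outside the window opened by the burst's first packet.
        new_burst = (current_dir is None
                     or direction != current_dir
                     or ts - start_time > tau)
        if new_burst:
            if current_dir is not None:
                bursts.append(floor(current_sum / lam))
            current_sum, start_time, current_dir = 0, ts, direction
        current_sum += length
    if current_dir is not None:
        bursts.append(floor(current_sum / lam))
    return bursts


flow = [(0.00, 1, 500), (0.01, 1, 700), (0.02, -1, 1500),
        (0.03, -1, 1500), (0.50, -1, 400)]
print(burst_level_representation(flow, tau=0.1, lam=100))  # [12, 30, 4]
```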

4.2. Target Traffic Feature Generation

Building on the multi-scale spatio-temporal representation, WFD-EST synthesizes target traffic feature distributions that preserve hierarchical dynamics across both packet-level and burst-level behaviors. To achieve this, we propose a Stacked denoising Diffusion Generator with fine-tuned Large-scale Model guidance (SDGLM), which leverages adversarial learning to emulate the spatio-temporal characteristics of target website traffic.

In SDGLM, a diffusion-based generator progressively transforms Gaussian noise into structured traffic features aligned with the target distribution, while a fine-tuned large-scale time-series discriminator captures complex temporal dependencies and provides informative feedback to guide generation. Integrating adversarial learning into the diffusion process enables the generator to produce feature distributions that simultaneously preserve the intrinsic statistical properties of real traffic and capture dynamic variations associated with real-world web behaviors. These synthesized distributions are then used as templates for guiding fine-grained packet manipulations in the obfuscation stage.

4.2.1. Generator Design with Diffusion Model

As shown in Figure 3, the generator in SDGLM is built upon a stacked transformer encoders denoising diffusion probabilistic model (STED) [53], which progressively transforms Gaussian noise into structured traffic feature representations through a forward diffusion and reverse denoising process. Unlike standard diffusion models, SDGLM is specifically redesigned to accommodate the temporal and hierarchical nature of encrypted network traffic. To this end, the conventional U-Net denoiser is replaced with stacked Transformer encoders equipped with Gaussian relative positional encoding and timestep embedding fusion, enabling effective modeling of both short-term inter-packet correlations and long-term burst-level dependencies. A dynamic Tanh normalization further stabilizes long-sequence denoising by preserving gradient consistency across diffusion steps. This architectural redesign transforms the denoising process into a temporally conditioned feature refinement procedure, aligning the generation dynamics with the spatio-temporal structure of real traffic.

(1) Forward Diffusion: In the forward process, Gaussian noise is gradually added to the original traffic feature sample $x_{0}$, producing increasingly noisy intermediate states $x_{t}$ after $t$ steps. Instead of a linear noise schedule, which may over-distort samples at later timesteps, SDGLM adopts a cosine noise scheduler [54] to regulate the noise variance $\beta_{t}$:

$$\beta_{t} = 1 - \frac{\bar{\alpha}_{t}}{\bar{\alpha}_{t-1}}, \quad \bar{\alpha}_{t} = \frac{f(t)}{f(0)}, \quad f(t) = \cos^{2}\left(\frac{t/T + s}{1 + s} \cdot \frac{\pi}{2}\right) \tag{5}$$

where $t = 0, 1, \dots, T$ denotes the diffusion timestep, $T$ is the total number of diffusion steps, and $s$ is a small offset introduced to prevent singularities near $t = 0$. The term $\bar{\alpha}_{t}$ denotes the cumulative product of noise retention coefficients up to step $t$, which determines the fraction of original information preserved, while $\beta_{t}$ specifies the variance of Gaussian noise added at step $t$.
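
The cosine schedule of Equation (5) can be computed directly, as in the sketch below. The offset `s = 0.008` and the upper clip on the noise variance are assumptions borrowed from common practice with cosine schedules; the text does not state the values used in SDGLM.

```python
import numpy as np


def cosine_schedule(T, s=0.008, max_beta=0.999):
    """Return (alpha_bar, betas) for t = 0..T and t = 1..T respectively."""
    t = np.arange(T + 1)
    f = np.cos(((t / T + s) / (1 + s)) * np.pi / 2) ** 2
    alpha_bar = f / f[0]                          # cumulative retention, equals 1 at t = 0
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]  # per-step noise variance beta_t
    # Clipping is an added safeguard (assumption): beta_T would otherwise approach 1.
    betas = np.clip(betas, 0.0, max_beta)
    return alpha_bar, betas


alpha_bar, betas = cosine_schedule(T=1000)
print(alpha_bar[0], betas[0], betas[-1])
```

Because the cumulative retention falls slowly at small timesteps and steeply only near the end, early steps keep most of the signal, which is the property the text contrasts with a linear schedule.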

(2) Reverse Denoising: The reverse process iteratively removes the injected noise by predicting it with the Transformer-based denoiser $\epsilon_{\theta_{G}}(x_{t}, t)$, and reconstructing the sample as:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_{t}}} \left(x_{t} - \frac{1 - \alpha_{t}}{\sqrt{1 - \bar{\alpha}_{t}}} \cdot \epsilon_{\theta_{G}}(x_{t}, t)\right) + \sigma_{t} z \tag{6}$$

where $\alpha_{t} = 1 - \beta_{t}$ is the noise retention coefficient, $\sigma_{t}$ is the standard deviation of additional Gaussian noise introduced at step $t$, and $z \sim \mathcal{N}(0, I)$ is a standard Gaussian noise vector injected to enhance sample diversity. Through repeated denoising steps, the generator converges toward a synthesized feature representation $x_{0}$ that closely follows the target traffic distribution.
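
A single update of Equation (6) can be sketched as below, with the denoiser output passed in as an array so that the example stays independent of any particular network. Setting the injected noise to zero at the last step follows the usual DDPM sampling convention and is an assumption here, as are the function name and the indexing of the schedule arrays by timestep.

```python
import numpy as np


def reverse_step(x_t, t, eps_pred, alphas, alpha_bar, sigma_t, rng):
    """One reverse denoising step; alphas and alpha_bar are indexed by t (index 0 unused)."""
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    # No fresh noise is injected on the final step (conventional, assumed here).
    z = rng.standard_normal(x_t.shape) if t > 1 else np.zeros_like(x_t)
    return mean + sigma_t * z


# Sanity check: with the true noise as the prediction, step t = 1 recovers x_0.
rng = np.random.default_rng(0)
alphas = np.array([1.0, 0.9, 0.8])
alpha_bar = np.cumprod(alphas)
x0 = rng.standard_normal((4, 2))
eps = rng.standard_normal((4, 2))
x1 = np.sqrt(alpha_bar[1]) * x0 + np.sqrt(1.0 - alpha_bar[1]) * eps
print(np.allclose(reverse_step(x1, 1, eps, alphas, alpha_bar, 0.5, rng), x0))  # True
```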

To strengthen the generator’s ability to model spatio-temporal correlations and inter-feature dependencies in traffic sequences, three architectural enhancements are incorporated. First, Gaussian relative positional encoding [55] is introduced to encode temporal order into the attention mechanism, defined as:

ψ(p, x) = \exp\left(-\frac{(p - x)^{2}}{2σ^{2}}\right)

(7)

Here, σ controls the positional sensitivity. Second, timestep embedding encodes the diffusion step t into a high-dimensional vector that conditions the denoiser together with the noisy intermediate representation:

z_{t} = W_{2} \cdot \mathrm{SiLU}(W_{1} \cdot E(t) + b_{1}) + b_{2}

(8)

Here, E(t) is the timestep embedding, and W_{1}, W_{2}, b_{1}, and b_{2} are learnable parameters. Finally, dynamic tanh (DyT) normalization [56] replaces conventional normalization layers, stabilizing gradients and accelerating convergence:

\mathrm{DyT}(x) = γ \cdot \tanh(αx) + β

(9)

where α is learnable, and γ, β are constants. With these enhancements, the generator can capture both local temporal correlations and global traffic dynamics, enabling the synthesis of high-fidelity spatio-temporal feature representations that align closely with target website traffic patterns.
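Equations (7) and (9) are simple element-wise functions, and a scalar sketch makes their behavior concrete. The function names and default parameter values below are illustrative assumptions; in the actual model these operations would act on attention scores and hidden activations, respectively.

```python
import math

def gaussian_relative_weight(p, x, sigma=1.0):
    """Eq. (7): weight that decays with the squared distance between p and x."""
    return math.exp(-((p - x) ** 2) / (2 * sigma ** 2))

def dyt(values, alpha=0.5, gamma=1.0, beta=0.0):
    """Eq. (9): dynamic tanh applied element-wise to a list of activations."""
    return [gamma * math.tanh(alpha * v) + beta for v in values]

# Nearby positions receive larger weights than distant ones.
near = gaussian_relative_weight(10, 11, sigma=2.0)
far = gaussian_relative_weight(10, 20, sigma=2.0)

# DyT squashes large activations into the range [beta - gamma, beta + gamma].
squashed = dyt([-100.0, 0.0, 100.0])
```

A larger σ flattens the weight profile, so distant positions contribute more to attention.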

4.2.2. Discriminator Design with Fine-Tuned Large-Scale Model

The discriminator is responsible for distinguishing real traffic patterns from generated ones, thereby guiding the generator toward more realistic synthesis. Since the generator in SDGLM aims to emulate complex spatio-temporal traffic dynamics, an equally powerful discriminator is required to provide informative feedback. Large-scale deep learning models are particularly well-suited for this purpose, as their extensive parameter capacity enables stronger representation learning, and pre-training on massive time-series data allows them to generalize more effectively to new tasks. As a result, they can be readily adapted for adversarial learning without the need to design and train a discriminator from scratch. To this end, we adopt MOMENT [57], a large-scale pre-trained Transformer model for time-series data analysis, and fine-tune it on traffic from target websites to enable adversarial learning.

As illustrated in Figure 4, given an input spatio-temporal traffic representation x \in \mathbb{R}^{C \times D}, where C denotes the number of feature channels and D denotes the sequence length, MOMENT first applies reversible instance normalization (RevIN) for preprocessing:

x = \mathrm{RevIN}(x, M)

(10)

where M is a binary mask indicating observed (1) or missing (0) data. The normalized sequence is then divided into N non-overlapping patches of fixed length. Each patch x_{c}^{(i)} from feature channel c is embedded into a d-dimensional vector space through a patch embedding function, producing the embedded vector e_{c}^{(i)}. The collection of all embedded patches forms a three-dimensional tensor E:

E = \{e_{c}^{(i)}\}_{c = 1, i = 1}^{C, N} \in \mathbb{R}^{C \times N \times d}, \quad e_{c}^{(i)} = \mathrm{PatchEmbedding}(x_{c}^{(i)}) \in \mathbb{R}^{d}

(11)

The embedded tensor E is then passed through a stack of L Transformer encoder layers equipped with relative positional encoding, which enhances the model’s ability to capture both temporal dependencies within each feature channel and cross-channel correlations among different traffic features. This encoding process produces a contextualized representation tensor Z \in \mathbb{R}^{C \times N \times d}.

Subsequently, the contextualized outputs from all channels are concatenated to form a unified feature matrix Z, which serves as the final representation used for discrimination:

Z = \mathrm{Concat}(Z_{1}, Z_{2}, \dots, Z_{C}) \in \mathbb{R}^{N \times (C \cdot d)}

(12)

To adapt MOMENT for adversarial learning, its original prediction head is replaced with a classification head composed of a dropout layer and a linear projection. This modification enables the discriminator to output a binary classification indicating whether the input corresponds to real traffic or traffic generated by the diffusion model. The resulting feedback signal is used to guide the generator, improving the realism and fidelity of synthesized traffic features.
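The tensor shapes in Equations (11) and (12) can be traced with a small pure-Python sketch. It is not MOMENT's code: the raw patches stand in for the learned embeddings (so d equals the patch length here), the patch length of 8 is an illustrative value, and dropping a trailing remainder is an assumption of this sketch.

```python
def patchify(x, patch_len):
    """Split each of the C channels (length D) into N non-overlapping patches.

    Returns a nested list of shape C x N x patch_len. A trailing remainder
    shorter than patch_len is dropped (an assumption of this sketch).
    """
    patches = []
    for channel in x:
        n = len(channel) // patch_len
        patches.append([channel[i * patch_len:(i + 1) * patch_len] for i in range(n)])
    return patches

def concat_channels(z):
    """Concatenate per-channel outputs: C x N x d -> N x (C * d), as in Eq. (12)."""
    num_channels, num_patches = len(z), len(z[0])
    return [
        [v for c in range(num_channels) for v in z[c][i]]
        for i in range(num_patches)
    ]

# Two channels (e.g., packet length and inter-packet interval), D = 16.
x = [list(range(16)), list(range(100, 116))]
patches = patchify(x, patch_len=8)   # shape 2 x 2 x 8
features = concat_channels(patches)  # shape 2 x 16
```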

4.2.3. Adversarial Learning Process

The adversarial learning of SDGLM follows a standard min-max optimization framework, where the generator and discriminator are jointly trained to improve the quality and fidelity of generated traffic features. The discriminator provides dynamic feedback during the denoising trajectory, guiding the generator to align final denoised states with the temporal and structural patterns of real traffic. The generator learns to synthesize spatio-temporal feature sequences that not only approximate real target traffic but also evolve coherently across timesteps, while the discriminator continually refines this process by distinguishing genuine traffic from generated samples.

Specifically, the generator is optimized with a hybrid loss function that combines a reconstruction objective and an adversarial objective. The reconstruction objective ensures that the generated traffic feature sequence s(b, c, d) approximates the real traffic feature sequence S(b, c, d) in terms of temporal and structural characteristics. It is formulated as the mean squared error:

L_{M} = \frac{1}{B \cdot C \cdot D} \sum_{b = 1}^{B} \sum_{c = 1}^{C} \sum_{d = 1}^{D} \| S(b, c, d) - s(b, c, d) \|^{2}

(13)

where B is the batch size, C is the number of feature channels, and D is the sequence length. The adversarial objective encourages the generator G(\cdot) to produce samples G(z) drawn from the model distribution P_{g} that can fool the discriminator D(\cdot). This loss is calculated as:

L_{G} = \mathbb{E}_{G(z) \sim P_{g}} [\log(1 - D(G(z)))]

(14)

The discriminator, in turn, is trained to maximize its classification accuracy by correctly distinguishing real samples x_{0} \sim P_{r} from generated samples G(z) \sim P_{g}:

L_{D} = \mathbb{E}_{G(z) \sim P_{g}} [\log(1 - D(G(z)))] + \mathbb{E}_{x_{0} \sim P_{r}} [\log D(x_{0})]

(15)
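For concreteness, the three objectives in Equations (13)–(15) can be evaluated on plain Python lists. This is a hedged sketch: the discriminator outputs are passed in as probabilities, the batch means approximate the expectations, and the small constant eps is added for numerical stability (it does not appear in the equations).

```python
import math

def mse_loss(real, fake):
    """Eq. (13): mean squared error over nested lists of shape B x C x D."""
    total, count = 0.0, 0
    for real_b, fake_b in zip(real, fake):
        for real_c, fake_c in zip(real_b, fake_b):
            for r, f in zip(real_c, fake_c):
                total += (r - f) ** 2
                count += 1
    return total / count

def generator_adv_loss(d_fake, eps=1e-12):
    """Eq. (14): batch mean of log(1 - D(G(z))); minimized by the generator."""
    return sum(math.log(1 - p + eps) for p in d_fake) / len(d_fake)

def discriminator_objective(d_real, d_fake, eps=1e-12):
    """Eq. (15): value the discriminator tries to maximize."""
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    return generator_adv_loss(d_fake, eps) + real_term

# A confident, correct discriminator scores higher than an undecided one.
confident = discriminator_objective(d_real=[0.9], d_fake=[0.1])
undecided = discriminator_objective(d_real=[0.5], d_fake=[0.5])
```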

The complete adversarial learning process is thus expressed as a min-max optimization:

θ_{G}, θ_{D} = \min_{θ_{G}} [λ L_{M} + L_{G}] + \max_{θ_{D}} L_{D}

(16)

where θ_{G} and θ_{D} are the parameters of the generator and discriminator, respectively. The scalar λ is a balancing coefficient that controls the relative importance of reconstruction and adversarial learning. To facilitate stable training, λ is gradually decreased according to the current iteration \mathrm{iter}:

λ = \max \left\{λ_{0}, 1 - \frac{\mathrm{iter}}{\mathrm{max\_iter}}\right\}

(17)

Here, λ_{0} is the lower bound. This progressive weighting strategy prioritizes reconstruction in the early training stages, enabling the generator to learn basic traffic structure before introducing strong adversarial pressure. As training proceeds, the adversarial component becomes dominant, guiding the generator to produce traffic feature distributions that are not only numerically similar to real traffic but also statistically indistinguishable from it.
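The progressive weighting of Equation (17) and the generator term of Equation (16) reduce to two one-line functions. The lower bound λ_{0} = 0.1 used below is a hypothetical value chosen for illustration; the paper does not report it in this section.

```python
def reconstruction_weight(it, max_iter, lambda_0=0.1):
    """Eq. (17): linearly decaying weight, floored at lambda_0 (assumed value)."""
    return max(lambda_0, 1 - it / max_iter)

def generator_loss(l_m, l_g, lam):
    """Generator part of Eq. (16): lam * L_M + L_G."""
    return lam * l_m + l_g

# The weight starts at 1, decays linearly, and stays at lambda_0 afterwards.
weights = [reconstruction_weight(i, 1000) for i in (0, 500, 950, 1000)]
```

With these settings the reconstruction term dominates early training, and from iteration 900 onward its weight stays fixed at the floor.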

The overall adversarial diffusion optimization process of SDGLM is summarized in Algorithm 1, which integrates diffusion-based denoising with adversarial feedback to achieve temporally consistent traffic feature synthesis.

Algorithm 1. Adversarial Learning Process in SDGLM

Input: Target traffic feature distribution P_{r}

Output: Generator parameters θ_{G}, discriminator parameters θ_{D}

for iter = 1 → max_iter do
    # 1. Forward diffusion
    Sample clean traffic feature x_{0} \sim P_{r}
    for t = 1 → T do
        Compute β_{t} and \bar{α}_{t} using the cosine scheduler (Equation (5))
        x_{t} = \sqrt{\bar{α}_{t}} \cdot x_{0} + \sqrt{1 - \bar{α}_{t}} \cdot ε, where ε \sim \mathcal{N}(0, I)
    end for
    # 2. Reverse denoising
    for t = T → 1 do
        Predict ϵ_{θ_{G}}(x_{t}, t) and reconstruct x_{t-1} using Equation (6)
    end for
    # 3. Parameter update
    Compute generator losses L_{M} and L_{G} using Equations (13) and (14)
    Compute discriminator loss L_{D} using Equation (15)
    Apply adversarial optimization using Equation (16) to update θ_{G} and θ_{D}
    # 4. Progressive weighting
    Update λ according to Equation (17)
end for
return θ_{G}, θ_{D}
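The forward-diffusion step of Algorithm 1 samples x_{t} directly from x_{0} in closed form. A minimal sketch of that single step is given below, assuming the sample is a flat list of floats and that the cumulative coefficient is supplied by the caller; the function name is hypothetical.

```python
import math
import random

def q_sample(x0, alpha_bar_t, rng):
    """Draw x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.

    Returns (x_t, eps) so that eps can serve as the regression target
    for the denoiser. rng is a random.Random instance.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    keep = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1 - alpha_bar_t)
    x_t = [keep * v + noise * e for v, e in zip(x0, eps)]
    return x_t, eps

rng = random.Random(42)
x0 = [0.2, -1.5, 3.0]
clean, _ = q_sample(x0, 1.0, rng)        # nothing is corrupted
pure_noise, eps = q_sample(x0, 0.0, rng)  # the signal is fully replaced
```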

4.3. Burst-Aware Obfuscation

Once SDGLM is trained, the generator produces multi-scale spatio-temporal traffic feature distributions that serve as templates to guide packet-level manipulations for traffic obfuscation. However, since the generator focuses solely on feature synthesis, it does not guarantee consistency between fine-grained packet-level features and higher-order burst-level characteristics. To bridge this gap, WFD-EST introduces a burst-aware obfuscation process consisting of two stages: feature alignment and packet manipulation, ensuring that generated features are harmonized across scales before being applied to real traffic.

4.3.1. Feature Alignment

To reconcile the generated packet-level features with burst-level dynamics, WFD-EST performs a feature alignment step that harmonizes the fine-grained packet-level representation F_{P} = \{p_{1}, p_{2}, \dots, p_{j}\} with the coarse-grained burst-level representation F_{G} = \{g_{1}, g_{2}, \dots, g_{m}\} produced by SDGLM. For the k-th burst, the cumulative packet length must satisfy:

g_{k} \cdot λ_{lb} \leq \sum_{i \in G_{k}} l_{i} \leq (g_{k} + 1) \cdot λ_{lb}

(18)

where λ_{lb} is a quantization coefficient, G_{k} is the set of packets belonging to the k-th burst, and l_{i} is the packet length of the i-th packet.

If the cumulative packet length \sum_{i \in G_{k}} l_{i} is below the lower bound, WFD-EST constructs pseudo-bursts by inserting additional packets. These packets are not fixed to MTU size; instead, their lengths are sampled from the empirical size distribution of the corresponding direction (upstream or downstream), preserving natural traffic variability.

If the cumulative length exceeds the upper bound, redundant packets are removed. The selection is randomized but prioritizes packets whose removal minimally disturbs the empirical packet size distribution, thereby maintaining the statistical characteristics of real flows. If alignment remains unsatisfied after removal, WFD-EST defers packets to the next burst to preserve temporal continuity.

Through these iterative adjustments guided by burst-level statistics, the aligned packet-level feature set F_{P}^{*} becomes consistent with the burst dynamics of target traffic while preserving global temporal structure. This aligned representation serves as a precise template for subsequent packet manipulations.
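The alignment rule around Equation (18) can be illustrated with a simplified sketch. It deliberately departs from the description above in one respect, which is an assumption of this example: removal and deferral are collapsed into a single step that moves trailing packets out of the burst, without the distribution-aware selection. The function name, the size pool, and all numeric values are hypothetical.

```python
import random

def align_burst(lengths, g_k, lam_lb, size_pool, rng):
    """Adjust one burst so that g_k*lam_lb <= sum(lengths) <= (g_k+1)*lam_lb.

    Returns (aligned, deferred, ok). size_pool stands in for the empirical
    packet-size distribution of the burst direction.
    """
    lower, upper = g_k * lam_lb, (g_k + 1) * lam_lb
    aligned, deferred = list(lengths), []
    # Too large: move trailing packets to the next burst, keeping their order.
    while aligned and sum(aligned) > upper:
        deferred.insert(0, aligned.pop())
    # Too small: insert packets sampled from the pool without overshooting.
    while sum(aligned) < lower:
        fits = [s for s in size_pool if s > 0 and sum(aligned) + s <= upper]
        if not fits:
            break
        aligned.append(rng.choice(fits))
    ok = lower <= sum(aligned) <= upper
    return aligned, deferred, ok

rng = random.Random(7)
pool = [100, 500, 1500]
grown, moved_a, ok_a = align_burst([1500], 3, 1000, pool, rng)
shrunk, moved_b, ok_b = align_burst([1500, 1500, 1500], 1, 1000, pool, rng)
```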

4.3.2. Packet Manipulation

After feature alignment, WFD-EST performs burst-aware packet manipulations to reshape source traffic into patterns closely matching the target distribution. A buffering window of duration τ aggregates packets within each burst interval, where operations are applied.

To accurately emulate burst characteristics, a burst matching strategy is employed. Let b^{fake} denote a target burst with cumulative packet length |b^{fake}|, and b^{real} denote a real burst with length |b^{real}|. The manipulated burst size is constrained within:

|burst| = \begin{cases} Ω, & |b^{real}| < Ω \\ |b^{real}|, & Ω \leq |b^{real}| \leq Γ \\ Γ, & |b^{real}| > Γ \end{cases}

(19)

where Ω = ⌊(1 - δ) \cdot |b^{fake}|⌋ and Γ = ⌊(1 + δ) \cdot |b^{fake}|⌋, with δ \in (0, 1) controlling burst adjustment sensitivity.

If the cumulative packet length exceeds Γ, the excess packets are deferred to the next burst window, redistributing traffic over time while preserving communication semantics. If the cumulative packet length falls below Ω, WFD-EST compensates by padding or inserting packets. Padding is applied when existing packets are smaller than their target counterparts, extending them with randomized payload bytes. When additional packets are required, insertion is performed with lengths sampled from the empirical distribution of the target direction to maintain realistic variability. If no source packets are available in a given burst interval, WFD-EST skips the obfuscation to minimize unnecessary overhead. Figure 5 illustrates this burst-aware packet manipulation process, showing how different operations, including packet delaying, padding, and insertion, are selectively applied under various conditions to align real traffic with the target burst distribution. Through this burst-aware manipulation process, WFD-EST reshapes traffic at fine granularity, constructing flows that closely follow the spatio-temporal structure of the target distribution and improving resistance to WF attacks.
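The clamping rule of Equation (19) and the choice among delaying, padding or insertion, and skipping can be sketched as a small decision function. The function names, the action labels, and δ = 0.25 are assumptions made for illustration; the sketch only reports how many bytes must be deferred or added and does not touch real packets.

```python
import math

def burst_bounds(fake_len, delta):
    """Return (omega, gamma) from Eq. (19) for a target burst of fake_len bytes."""
    omega = math.floor((1 - delta) * fake_len)
    gamma = math.floor((1 + delta) * fake_len)
    return omega, gamma

def plan_burst(real_len, fake_len, delta=0.25):
    """Decide the operation for one burst window.

    Returns (action, size, byte_delta), where size is the clamped burst size
    and byte_delta is the number of bytes to defer or to add.
    """
    if real_len == 0:
        return ("skip", 0, 0)           # no source packets in this window
    omega, gamma = burst_bounds(fake_len, delta)
    size = max(omega, min(real_len, gamma))
    if real_len > gamma:
        return ("delay", size, real_len - gamma)
    if real_len < omega:
        return ("pad_or_insert", size, omega - real_len)
    return ("pass", size, 0)

plans = [plan_burst(r, 1000) for r in (0, 500, 900, 2000)]
```

A smaller δ forces the manipulated burst closer to the template at the cost of more padding and delay.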

5. Experiments

In this section, we evaluate the performance of WFD-EST against two baseline WF defense schemes across three websites, using two state-of-the-art traffic classifiers for attack evaluation. We also measure bandwidth and time overhead to provide a comprehensive assessment of defense effectiveness.

5.1. Experimental Setup

5.1.1. Dataset

To evaluate the effectiveness of WFD-EST, we construct a real-world dataset of encrypted website traffic collected from a WFD-EST prototype in a controlled laboratory environment. A Windows 10 client located in Nanjing generates website visits through the Chrome browser, while WFD-EST operates over a Shadowsocks-encrypted proxy. The proxy server is deployed on a VPS located in Beijing to emulate realistic proxied access scenarios. Traffic collection spans one month (15 October–15 November 2024), ensuring that the dataset captures temporal variations and natural network dynamics.

To minimize manual intervention and ensure consistent data acquisition, traffic collection is fully automated using a Python-based toolkit. Specifically, PyAutoGUI simulates user interactions, Selenium controls browser launch, page visits, and termination, and Scapy parses and processes captured packets. Prior to each capture, the browser cache is cleared, and each session maintains a fixed 10 s page lifetime without user interaction. Details of the traffic collection environment and configuration are summarized in Table 1.

Using this setup, we collect traffic from three representative websites, QQ Music, Baidu, and Dangdang, corresponding to music, search, and e-commerce categories, respectively. Each captured traffic trace is stored in PCAP format and further converted into a structured sequence representing the spatio-temporal characteristics of each flow. Specifically, each sample consists of packet length and inter-packet interval pairs, with uplink packets assigned positive length values and downlink packets assigned negative values. To ensure consistent input dimensions across samples, all sequences are standardized to a fixed length of 3000 by truncating longer ones and padding shorter ones as necessary. In total, the dataset consists of 15,000 encrypted traffic samples, with 5000 samples per category. This design ensures both diversity and balance across different service types, supporting a comprehensive evaluation of WF defenses. Among them, 4000 samples per category are used for generative model training, and the remaining 1000 are reserved for testing. The composition of the dataset is summarized in Table 2.
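The conversion from a captured trace to a fixed-length sample can be sketched as follows. The input tuple layout, the function name, the zero interval assigned to the first packet, and the use of (0, 0.0) as the padding value are assumptions of this example; the paper states only the signed-length convention and the fixed length of 3000.

```python
def to_fixed_sequence(packets, length=3000):
    """Convert (timestamp_s, size_bytes, is_uplink) tuples to a fixed-length
    list of (signed_length, inter_packet_interval) pairs.

    Uplink lengths are positive, downlink lengths negative. Longer traces
    are truncated; shorter ones are padded with (0, 0.0), an assumed value.
    """
    seq, prev_ts = [], None
    for ts, size, is_uplink in packets:
        gap = 0.0 if prev_ts is None else ts - prev_ts
        seq.append((size if is_uplink else -size, gap))
        prev_ts = ts
    seq = seq[:length]
    seq += [(0, 0.0)] * (length - len(seq))
    return seq

trace = [(0.0, 100, True), (0.5, 1500, False), (0.75, 60, True)]
sample = to_fixed_sequence(trace)
```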

5.1.2. Baselines for Performance Evaluation

We evaluate WFD-EST against two state-of-the-art website traffic classification models, which serve as the attack classifiers: FS-Net [36] and NeuTic [32].

FS-Net adopts an encoder–decoder architecture built with bidirectional GRU layers to model packet length sequences. The hidden dimension of each GRU layer is set to 128, and a single Bi-GRU layer is used in both the encoder and decoder. A SeLU activation function is applied in the dense layer, while both the reconstruction layer and the final classification layer employ a softmax activation.

NeuTic leverages a position encoder, a multi-kernel convolution module, and a self-attention module to capture temporal dependencies and long-range correlations in encrypted traffic. The convolution module consists of three one-dimensional convolution kernels combined with gated convolution, and two self-attention layers are used to enhance feature representation. The final classification layer applies a softmax function.

To ensure effective evaluation, two advanced WF defense schemes are introduced for performance comparison: Mockingbird [52] and Prism [13].

Mockingbird is an adversarial obfuscation method that dynamically reshapes traffic by minimizing the Euclidean distance between the burst patterns of obfuscated flows and those of randomly selected target traffic. It performs real-time burst-level comparisons and applies tailored packet insertion operations to gradually transform the source traffic toward the target distribution. By disrupting original traffic patterns and concealing discriminative features, Mockingbird significantly reduces the accuracy of WF classifiers.

Prism is a state-of-the-art defense that perturbs spatio-temporal traffic features through a transition matrix guiding packet padding and splitting. Once a target website is selected, Prism leverages its characteristic traffic patterns as references to transform the current traffic. By enforcing consistent packet lengths and applying manipulations prior to transmission, Prism effectively reshapes flow distributions and disrupts the statistical regularities exploited by WF attacks.

5.1.3. Performance Metrics

The effectiveness of a WF defense scheme is evaluated from two perspectives: overhead and classification performance. All results are obtained using ten-fold cross-validation, and the average values across all folds are reported for each metric.

(1) Overhead: Bandwidth overhead (BO) measures the additional bytes introduced by obfuscation operations. A lower BO indicates a more lightweight defense that achieves obfuscation with minimal extra traffic. It is calculated as:

BO = \frac{\sum_{F \in \mathcal{F}} (|M(F, G)| - |F|)}{\sum_{F \in \mathcal{F}} |F|}

(20)

where \mathcal{F} denotes the set of evaluated flows, |F| denotes the total transmission bytes of flow F, and |M(F, G)| denotes the total bytes of the obfuscated flow of F generated by perturbation model G(\cdot).

Time overhead (TO) captures the latency introduced by obfuscation operations. A lower TO indicates that the manipulation incurs minimal additional delay. It is calculated as:

TO = \frac{\sum_{F \in \mathcal{F}} (⟦M(F, G)⟧ - ⟦F⟧)}{\sum_{F \in \mathcal{F}} ⟦F⟧}

(21)

where ⟦F⟧ represents the duration of traffic flow F, and ⟦M(F, G)⟧ represents the duration of the corresponding obfuscated flow.

Overall, BO and TO represent the relative increases in transmission volume and duration caused by obfuscation. Both are expressed as unitless ratios, indicating proportional bandwidth and latency costs.
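Because Equations (20) and (21) share the same form, one helper covers both metrics. The sketch below assumes per-flow totals are already available as parallel lists (bytes for BO, seconds for TO); the function name and sample values are hypothetical.

```python
def relative_overhead(original, obfuscated):
    """Eqs. (20)-(21): summed per-flow increase divided by the original total.

    original and obfuscated are parallel lists of per-flow byte counts (BO)
    or per-flow durations (TO).
    """
    increase = sum(o - f for f, o in zip(original, obfuscated))
    return increase / sum(original)

# Two flows: bytes before/after obfuscation, and durations in seconds.
bo = relative_overhead([1000, 3000], [1200, 3600])
to = relative_overhead([2.0, 6.0], [2.5, 6.5])
```

Here bo evaluates to 0.2 and to evaluates to 0.125, that is, 20% more bytes and 12.5% more time than the undefended flows.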

(2) Classification Performance: To evaluate the effectiveness of a WF defense against traffic analysis, we measure how much it reduces the accuracy of deep-learning-based classifiers. Four widely used metrics are considered: Precision (Pre.), Recall (Rec.), Accuracy (Acc.), and F1-score (F1), calculated as:

\mathrm{Pre.} = \frac{TP}{TP + FP}, \quad \mathrm{Rec.} = \frac{TP}{TP + FN}, \quad \mathrm{Acc.} = \frac{TP + TN}{TP + TN + FP + FN}, \quad F1 = \frac{2 \times \mathrm{Pre.} \times \mathrm{Rec.}}{\mathrm{Pre.} + \mathrm{Rec.}}

(22)

where TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively. An effective WF defense is characterized by lower values of these metrics, indicating reduced classification capability of the adversary.
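Equation (22) translates directly into code. The sketch below is illustrative: the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the paper specifies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Eq. (22): return (precision, recall, accuracy, f1) from confusion counts.

    Ratios with a zero denominator are reported as 0.0 (an assumed convention).
    """
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return pre, rec, acc, f1

pre, rec, acc, f1 = classification_metrics(tp=80, fp=20, tn=70, fn=30)
```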

In addition, we describe the experimental procedure used to obtain the evaluation results. All experiments were conducted in a real-world network environment, where each traffic flow was replayed through the WFD-EST prototype to perform real-time obfuscation. The resulting obfuscated traffic was captured for analysis to ensure reproducibility. WFD-EST and the two baseline defenses were applied to transform source traffic flows into target website traffic across all experimental settings. For each transformation, training samples from the target website were used to learn obfuscation models, while testing samples from the source website were used to evaluate the obfuscation performance. During evaluation, the BO, TO, and classification metrics were computed from the captured traces according to Equations (20)–(22). All reported results represent averages over ten-fold cross-validation, ensuring that the evaluation outcomes reliably reflect the operational characteristics and robustness of each defense scheme.

5.2. Experiment Parameter Selection

Under the adversarial training framework of WFD-EST, the generator is responsible for producing target-like traffic features to guide obfuscation manipulations, while the discriminator provides feedback to refine generation quality. Thus, the effectiveness of both modules directly impacts the overall obfuscation performance. To assess the effectiveness of the adopted generation method in WFD-EST, we compare it against four representative generative models: WGAN-GP, TTS-GAN, vanilla Diffusion, and STED.

Specifically, WGAN-GP is a GAN-based method where both the generator and discriminator are implemented as 5-layer residual CNNs with convolution kernels of size 1. TTS-GAN is another GAN-based method that leverages Transformer architectures. Its generator consists of a 3-layer Transformer encoder followed by a Conv2D layer with kernel size 1, while the discriminator employs a 3-layer Transformer with a fully connected classification head. Vanilla Diffusion is a basic diffusion-based approach that employs 1000 denoising steps and a 19-layer residual U-Net as the denoising network. STED is the generator module utilized in WFD-EST, which operates without discriminator guidance.

To further evaluate the effectiveness of the proposed generator within the SDGLM framework, we adopt the mean squared error (MSE) between generated traffic features and the real target traffic as the fidelity metric. In this experiment, Baidu traffic, representing widely used search services, is selected as the generation target. During training, a batch size of 128 is set, with a dropout rate of 0.1 and a learning rate of 0.001, optimized using the Adam optimizer. The generator is trained with MSE as the loss function, while the discriminator uses cross-entropy. The comparative results are illustrated in Figure 6.

The proposed SDGLM achieves the best fidelity among all evaluated methods. This improvement stems from its ability to explicitly model temporal dependencies in traffic through the enhanced diffusion process and adversarial training guidance. In particular, replacing the conventional 1D U-Net with a stacked Transformer encoder and employing a cosine variance noise scheduler significantly enhance the denoising process. Moreover, leveraging the fine-tuned MOMENT model as the discriminator further improves the alignment between generated and real traffic features. While TTS-GAN, as a Transformer-based generative model, shows reasonable performance, its reliance on shallow architectures limits its ability to capture long-range traffic correlations. WGAN-GP, despite structural modifications, remains unsuitable for long-sequence generation. Compared with vanilla diffusion, STED benefits from architectural refinements and adversarial guidance, resulting in generated traffic features that more closely resemble the original distribution.

Furthermore, to evaluate the sensitivity of WFD-EST to different training configurations, we analyze the impact of key parameters, including the number of diffusion steps, training epochs, and encoder layer depth in both the generator and discriminator. The results are summarized in Figure 7.

Figure 7a shows the effect of varying the number of diffusion steps from 200 to 1400. Fewer steps lead to higher MSE due to insufficient denoising, whereas increasing the number of steps progressively improves generation quality, converging around 1000 steps. Figure 7b illustrates the impact of training epochs. With too few epochs, the adversarial learning remains undertrained, resulting in unstable synthesis. Performance stabilizes around 1000 epochs, which is therefore selected as the optimal setting. Figure 7c depicts the effect of encoder depth in the generator, ranging from 2 to 10 layers. Deeper Transformer encoders capture richer temporal dependencies, with performance converging between 6 and 10 layers. Hence, 6 layers are adopted for efficiency. Figure 7d examines the encoder depth of the discriminator using three MOMENT variants: small (6 layers), base (12 layers), and large (24 layers). The 6-layer model yields weak adversarial feedback, resulting in higher MSE, while the 12-layer model achieves a substantial improvement. The 24-layer variant offers negligible gains but increases complexity; thus, 12 layers are chosen as the optimal configuration.

5.3. Pairwise Obfuscation Experiments on Three Representative Websites

To comprehensively evaluate the effectiveness of WFD-EST, we conduct pairwise obfuscation experiments across three representative websites: Dangdang (e-commerce), Baidu (search), and QQ Music (music). These experiments allow us to assess the performance of WFD-EST in emulating target website traffic. By obfuscating traffic between different website categories, we can evaluate how WFD-EST handles the varying traffic patterns and the associated challenges in website fingerprinting defense.

5.3.1. Obfuscation Between Dangdang and Baidu

This subsection evaluates bidirectional obfuscation between Dangdang and Baidu. Specifically, we examine whether obfuscated Dangdang traffic transformed into Baidu (D.→B.) can be distinguished from real Baidu traffic, and conversely, whether Baidu traffic obfuscated into Dangdang (B.→D.) can be separated from genuine Dangdang flows. The classification results are shown in Figure 8.

As shown in Figure 8, both FS-Net and NeuTic can accurately classify Dangdang and Baidu traffic in the undefended scenario, achieving F1-scores around 0.95, which indicates strong inherent distinguishability between the two websites. After obfuscation, all three defense schemes reduce classification performance, demonstrating their ability to conceal traffic characteristics. Among them, WFD-EST consistently achieves the lowest F1-scores (0.69–0.71) in both directions, indicating the strongest defense capability. In contrast, Mockingbird shows the weakest performance (F1 above 0.78), as its simple packet padding approach lacks fine-grained control over spatio-temporal features. Prism improves upon this by incorporating packet splitting, yet it still lags behind WFD-EST. The superiority of WFD-EST stems from its fine-grained manipulation guided by adversarially generated target traffic, which better aligns obfuscated traffic with the target distribution.

To further evaluate defense effectiveness, we measure the bandwidth overhead and time overhead introduced by each scheme. Results are shown in Table 3.

The overhead results align well with the classification findings. WFD-EST achieves the best trade-off, introducing the lowest BO and TO in both directions, while Mockingbird incurs the highest overhead due to its coarse-grained perturbations. WFD-EST benefits from pre-generated target flows, enabling packet manipulations through simple template matching rather than on-the-fly perturbation computations, thereby reducing latency. Additionally, the similar overhead observed in both directions suggests that Dangdang and Baidu share structural similarities in their traffic patterns: both involve request-response interactions for content retrieval, making them mutually easier to obfuscate.

5.3.2. Obfuscation Between Baidu and QQ Music

This subsection evaluates obfuscation between Baidu and QQ Music. We examine whether obfuscated Baidu traffic transformed into QQ Music (B.→M.) can be distinguished from real QQ Music traffic, and vice versa, while also analyzing the associated bandwidth and time overheads. The classification results are shown in Figure 9.

FS-Net and NeuTic achieve high classification accuracy in the undefended scenario (F1 > 0.97), confirming that Baidu and QQ Music exhibit highly distinguishable traffic patterns. After obfuscation, all three schemes substantially reduce classification performance, indicating successful concealment of spatio-temporal characteristics. Consistent with previous findings, WFD-EST achieves the strongest defense performance, reducing F1 scores to 0.64–0.69 in both scenarios. In contrast, Mockingbird remains the weakest, as its coarse packet padding manipulation fails to sufficiently distort temporal patterns, while Prism performs moderately better through additional packet splitting but still falls short of WFD-EST. The results of WFD-EST are attributed to its fine-grained manipulations guided by adversarially generated traffic, which better align obfuscated flows with the statistical distribution of the target.

The corresponding bandwidth and time overheads for each scheme are shown in Table 4.

The overhead results exhibit a consistent trend with classification performance: WFD-EST introduces the lowest bandwidth and time overhead, while Mockingbird incurs the largest cost. Notably, the required bandwidth differs between the two obfuscation directions. Obfuscating Baidu into QQ Music incurs higher BO because QQ Music traffic involves heavier data transfer due to continuous media streaming, requiring the insertion of additional large MTU-sized packets to mimic its burst patterns. Conversely, obfuscating QQ Music into Baidu also increases bandwidth overhead, though to a lesser degree, as additional request packets must be inserted to simulate Baidu’s query-response dynamics. Overall, WFD-EST achieves the most favorable balance between defense effectiveness and operational overhead, offering robust traffic obfuscation with minimal bandwidth and latency penalties. This demonstrates its capability to provide strong protection against WF attacks while keeping additional transmission costs and delays to a minimum.

5.3.3. Obfuscation Between Dangdang and QQ Music

This subsection evaluates obfuscation between Dangdang and QQ Music, assessing both the indistinguishability of obfuscated traffic and the associated overhead. The classification results are shown in Figure 10.

As shown in Figure 10, FS-Net and NeuTic achieve high classification performance (F1 = 0.987) in the undefended setting, indicating that Dangdang and QQ Music traffic exhibit highly distinguishable patterns. After obfuscation, all methods reduce classification accuracy, demonstrating their ability to obscure traffic features. However, the obfuscation from Dangdang to QQ Music is notably more challenging than the reverse. This is because e-commerce traffic inherently involves frequent and diverse upstream requests for resource loading, patterns that are difficult to conceal when transforming into the more continuous, media-centric behavior of music streaming traffic. Conversely, obfuscating QQ Music into Dangdang is relatively easier: by inserting additional upstream request packets, music traffic can better mimic the bursty request-response structure characteristic of e-commerce sites. Across both directions, WFD-EST consistently achieves the strongest obfuscation performance, lowering F1 scores to the range of 0.70–0.79, while Mockingbird remains the weakest due to its coarse-grained padding strategy. Prism achieves moderate performance but still falls short of WFD-EST, which benefits from fine-grained, adversarially guided manipulations aligned with generated target traffic distributions.

The bandwidth and time overhead results are presented in Table 5.

The overhead analysis aligns with the classification results. Obfuscating Dangdang into QQ Music incurs a higher bandwidth overhead, as mimicking continuous media transmission requires inserting numerous large MTU-sized packets. However, this primarily involves packet insertion and thus introduces relatively low time overhead. Conversely, transforming QQ Music into Dangdang requires adding numerous small upstream request packets to emulate e-commerce traffic patterns, resulting in lower bandwidth overhead but slightly higher time overhead due to the fine-grained manipulation required to control packet sizes and timing. Overall, WFD-EST achieves the most favorable trade-off between effectiveness and overhead, demonstrating its capability to deliver robust obfuscation performance while maintaining low transmission and latency costs.

5.3.4. Remark

Overall, WFD-EST achieves an average F1 score of 0.708, with an average BO of 0.205 and an average TO of 0.113. In comparison, Mockingbird attains an average F1 score of 0.852 with a BO of 0.399 and a TO of 0.257, while Prism records an average F1 score of 0.790 with a BO of 0.291 and a TO of 0.167. These results demonstrate that WFD-EST enhances defense effectiveness by reducing the F1 score by at least 0.082 compared to the baselines, while simultaneously lowering BO by at least 0.086 and TO by at least 0.054.
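
The BO and TO averages and margins quoted above can be re-derived from Tables 3–5. The sketch below copies the six per-direction values for each scheme from those tables and averages them; the helper names `average_overheads` and `margin_over_best_baseline` are illustrative only. The F1 averages are read from Figures 8–10 and are not recomputed here.

```python
from statistics import mean

# (BO, TO) per direction, copied from Tables 3-5 in the order
# D->B, B->D, B->M, M->B, D->M, M->D.
OVERHEADS = {
    "Mockingbird": [(0.301, 0.246), (0.299, 0.232), (0.572, 0.207),
                    (0.373, 0.291), (0.543, 0.239), (0.308, 0.329)],
    "Prism": [(0.228, 0.123), (0.204, 0.139), (0.388, 0.142),
              (0.284, 0.218), (0.408, 0.118), (0.231, 0.264)],
    "WFD-EST": [(0.182, 0.081), (0.187, 0.089), (0.247, 0.099),
                (0.168, 0.150), (0.267, 0.086), (0.178, 0.174)],
}


def average_overheads(table):
    """Return {scheme: (mean BO, mean TO)} over all six directions."""
    return {
        name: (mean(bo for bo, _ in rows), mean(to for _, to in rows))
        for name, rows in table.items()
    }


def margin_over_best_baseline(averages, ours="WFD-EST"):
    """Smallest gap between a baseline's mean overhead and ours (BO, TO)."""
    our_bo, our_to = averages[ours]
    baselines = [v for k, v in averages.items() if k != ours]
    return (min(bo for bo, _ in baselines) - our_bo,
            min(to for _, to in baselines) - our_to)


averages = average_overheads(OVERHEADS)
bo_margin, to_margin = margin_over_best_baseline(averages)

for name, (bo, to) in averages.items():
    print(f"{name:12s} mean BO = {bo:.4f}  mean TO = {to:.4f}")
print(f"margin over best baseline: BO = {bo_margin:.3f}, TO = {to_margin:.3f}")
```

In both cases the closest baseline is Prism, so the "at least" margins correspond to the gap between Prism and WFD-EST.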

The results from the pairwise obfuscation experiments highlight several key insights. First, obfuscation becomes more difficult as the traffic patterns of the source and target websites diverge, since less similar traffic requires more extensive transformations. Second, WFD-EST consistently outperforms both Mockingbird and Prism, achieving stronger obfuscation with lower bandwidth and time overhead. Finally, WFD-EST offers the most favorable trade-off between effectiveness and overhead, demonstrating strong defense capability while keeping transmission and latency costs low.

5.4. Generalization Experiment Evaluation

As the previous subsection focuses on pairwise obfuscation among three representative websites, this subsection further investigates the generalization and transferability of WFD-EST under broader and cross-domain scenarios. Specifically, we adopt the open-source Maybenot Firefox dataset [58], collected during week 15 of 2021, which contains traffic from 92 websites, each with 1000 samples, totaling 92,000 samples. Among them, 800 samples per website are used for generative model training, and the remaining 200 are reserved for testing through traffic replay on the WFD-EST prototype. Two complementary experiments are conducted: a large-scale obfuscation experiment to assess performance within the dataset, and a transferability experiment to evaluate cross-domain adaptability when obfuscation knowledge is derived from external sources.
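
The per-website split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the 800/200 partition is drawn by shuffling each website's samples with a fixed seed (the text does not state how the partition was chosen), and it uses placeholder strings in place of real traces. The function name `split_per_site` is hypothetical.

```python
import random


def split_per_site(samples_by_site, n_train=800, seed=0):
    """Split each website's samples into a training and a testing portion."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = {}, {}
    for site, samples in samples_by_site.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)  # assumption: random partition
        train[site] = shuffled[:n_train]
        test[site] = shuffled[n_train:]
    return train, test


# Placeholder identifiers standing in for 92 websites x 1000 traces.
dataset = {
    f"site{i:02d}": [f"site{i:02d}/trace{j:04d}" for j in range(1000)]
    for i in range(92)
}

train, test = split_per_site(dataset)
n_train_total = sum(len(v) for v in train.values())
n_test_total = sum(len(v) for v in test.values())
print(n_train_total, n_test_total)
```

With 92 websites this yields 73,600 training samples and 18,400 testing samples, which together account for the 92,000 samples in the dataset.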

5.4.1. Evaluation on Large-Scale Obfuscation Performance

In the large-scale obfuscation setting, the obfuscation knowledge is learned and the evaluation is conducted entirely within the Maybenot dataset, allowing assessment of WFD-EST’s performance across diverse website traffic patterns. Each WF defense scheme learns traffic characteristics from all 92 websites, and during testing, a target website is selected at random for each traffic sample. The evaluation metrics include bandwidth overhead, time overhead, and classification accuracy, as illustrated in Figure 11.
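
The random target assignment used during testing can be sketched as below. Two details are assumptions made for illustration, since the text does not specify them: targets are drawn uniformly, and a sample's own website is excluded from its candidate targets. The function name `assign_targets` is hypothetical.

```python
import random


def assign_targets(source_labels, sites, seed=0):
    """Pick one obfuscation target per test sample.

    Assumptions: uniform sampling, and the source website itself is never
    chosen as the target.
    """
    rng = random.Random(seed)
    targets = []
    for src in source_labels:
        candidates = [s for s in sites if s != src]
        targets.append(rng.choice(candidates))
    return targets


sites = [f"site{i:02d}" for i in range(92)]
# 200 test samples per website, as in the split described in Section 5.4.
source_labels = [s for s in sites for _ in range(200)]
targets = assign_targets(source_labels, sites)

print(len(targets), "targets assigned;", len(set(targets)), "distinct target sites")
```

Fixing the seed keeps the source-to-target pairing identical across the compared defenses, so that differences in overhead and accuracy are not caused by different target draws.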

As shown in Figure 11, WFD-EST consistently achieves the strongest defense performance, exhibiting the lowest BO and TO while substantially reducing classifier accuracy. In contrast, Prism and Mockingbird show inferior performance with higher overhead and weaker obfuscation capability. The superiority of WFD-EST stems from its diffusion-based generative mechanism, which synthesizes realistic multi-scale spatio-temporal feature templates that closely align with target traffic distributions. These pre-generated templates enable efficient, real-time obfuscation with minimal latency and computational cost. By contrast, Prism relies on packet padding and splitting guided by fixed state transitions, limiting adaptability to dynamic traffic behaviors. Mockingbird, which performs online burst-level optimization through iterative packet insertion, suffers from higher latency and imprecise feature matching, leading to both greater overhead and weaker obfuscation fidelity.

5.4.2. Evaluation on Transferability

In the transferability setting, the obfuscation knowledge is derived from the three representative websites in our self-collected dataset and applied to the Maybenot dataset. This configuration simulates a realistic scenario where prior knowledge from a limited set of known websites is leveraged to obfuscate traffic from unseen domains, thereby testing the cross-domain generalization of WFD-EST. The results are illustrated in Figure 12.

As shown in Figure 12, WFD-EST consistently maintains the best defense performance among all evaluated schemes, even when applied to previously unseen websites. Although the transfer setting naturally introduces additional overhead due to distributional mismatch, WFD-EST achieves a balanced trade-off, incurring only moderate increases in BO and TO while maintaining the lowest classification accuracy. This robustness arises from its adversarial diffusion mechanism, which captures generalized spatio-temporal priors that adapt across heterogeneous website traffic patterns. In contrast, Prism exhibits comparable overhead but notably higher classification accuracy, indicating limited adaptability due to its predefined transition matrices. Mockingbird demonstrates the weakest transferability, showing both higher overhead and higher classifier identification rates, as its burst-level optimization is highly sensitive to mismatched target statistics, leading to unstable obfuscation outcomes.

6. Conclusions

In this work, we proposed WFD-EST, a website fingerprinting defense framework that dynamically emulates the spatio-temporal characteristics of target website traffic to achieve fine-grained traffic obfuscation. WFD-EST constructs multi-scale traffic representations, synthesizes target feature templates through an adversarial diffusion-based generative model (SDGLM), and performs burst-aware packet manipulations to align source flows with target distributions. Extensive experiments on real-world and large-scale benchmark datasets show that WFD-EST consistently surpasses two representative WF defenses, effectively reducing classification accuracy while maintaining lower bandwidth and time overhead.

While these results highlight the effectiveness of WFD-EST, several directions remain for future work. First, we plan to extend our approach to more complex open-world scenarios with larger and more diverse sets of websites, which will present greater challenges for both WF defenses and traffic analysis adversaries. Second, we aim to develop adaptive obfuscation mechanisms that dynamically adjust manipulation strategies based on network conditions, achieving stronger defense performance with lower overhead. Third, we plan to enhance the generative capability of SDGLM to improve scalability and synthesis fidelity, enabling more realistic and diverse traffic feature generation.

Author Contributions

Conceptualization, D.Z. and C.R.; methodology, D.Z. and C.R.; software, D.Z., C.R. and J.H.; validation, J.H., L.G. and M.T.; formal analysis, L.G. and M.T.; data curation, L.G. and M.T.; writing—original draft preparation, D.Z.; writing—review and editing, D.Z., C.R., J.H., L.G., M.T. and W.L.; visualization, J.H.; supervision, J.H. and W.L.; project administration, W.L.; funding acquisition, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China, grant number 2021QY0700.

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhou, J.; Fu, W.; Hu, W.; Sun, Z.; He, T.; Zhang, Z. Challenges and advances in analyzing tls 1.3-encrypted traffic: A comprehensive survey. Electronics 2024, 13, 4000.
2. Liu, Z.; Wei, Q.; Song, Q.; Duan, C. Fine-grained encrypted traffic classification using dual embedding and graph neural networks. Electronics 2025, 14, 778.
3. Dong, W.; Yu, J.; Lin, X.; Gou, G.; Xiong, G. Deep learning and pre-training technology for encrypted traffic classification: A comprehensive review. Neurocomputing 2025, 617, 128444.
4. Mohajeri, M.; Li, B.; Derakhshani, M.; Goldberg, I. Skypemorph: Protocol obfuscation for tor bridges. In Proceedings of the ACM Conference on Computer and Communications Security, Raleigh, NC, USA, 16–18 October 2012; pp. 97–108.
5. Yang, C.; Gu, Z.; Bai, J.; Li, Z.; Xiong, G.; Gou, G.; Yao, S.; Chen, X. Few-shot encrypted traffic classification: A survey. In Proceedings of the Asia-Pacific Conference on Image Processing, Electronics and Computers, Dalian, China, 12–14 April 2024; pp. 646–652.
6. Zhou, G.; Guo, X.; Liu, Z.; Li, T.; Li, Q.; Xu, K. Trafficformer: An efficient pre-trained model for traffic data. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 12–14 May 2025; pp. 1844–1860.
7. Huang, J.; Liu, W.; Liu, G.; Gao, B.; Nie, F. WF-A2D: Enhancing privacy with asymmetric adversarial defense against website fingerprinting. IEEE Trans. Inf. Forensics Secur. 2025, 20, 4739–4754.
8. Alyami, M.; Alghamdi, A.; Alkhowaiter, M.; Zou, C.; Solihin, Y. Random segmentation: New traffic obfuscation against packet-size-based side-channel attacks. Electronics 2023, 12, 3816.
9. Wang, Z.; Li, T.; Yin, M.; Yuan, X.; Luo, X.; Li, L. WF3A: A n-shot website fingerprinting with effective fusion feature attention. Comput. Secur. 2024, 140, 103796.
10. Li, D.; Zhu, Y.; Chen, M.; Wang, J. Minipatch: Undermining dnn-based website fingerprinting with adversarial patches. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2437–2451.
11. Xie, R.; Cao, J.; Zhu, Y.; Zhang, Y.; He, Y.; Peng, H.; Wang, Y.; Xu, M.; Sun, K.; Dong, E.; et al. Cactus: Obfuscating bidirectional encrypted tcp traffic at client side. IEEE Trans. Inf. Forensics Secur. 2024, 19, 7659–7673.
12. Wang, T.; Goldberg, I. Walkie-Talkie: An efficient defense against passive website fingerprinting attacks. In Proceedings of the 26th USENIX Security Symposium, Vancouver, BC, Canada, 16–18 August 2017; pp. 1375–1390.
13. Li, W.; Zhang, X.; Bao, H.; Yang, B.; Li, Z.; Shi, H.; Wang, Q. Prism: Real-time privacy protection against temporal network traffic analyzers. IEEE Trans. Inf. Forensics Secur. 2023, 18, 2524–2537.
14. Hintz, A. Fingerprinting websites using traffic analysis. In Proceedings of the International Workshop on Privacy Enhancing Technologies, Heidelberg, Berlin, 14–15 April 2002; pp. 171–178.
15. Sun, Q.; Simon, D.; Wang, Y.; Russell, W.; Padmanabhan, V.; Qiu, L. Statistical identification of encrypted web browsing traffic. In Proceedings of the IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 12–15 May 2002; pp. 19–30.
16. Qasem, A.; Zhioua, S.; Makhlouf, K. Finding a needle in a haystack: The traffic analysis version. Proc. Priv. Enhancing Technol. 2019, 2019, 270–290.
17. Shen, M.; Liu, Y.; Chen, S.; Zhu, L.; Zhang, Y. Webpage fingerprinting using only packet length information. In Proceedings of the IEEE International Conference on Communications, Shanghai, China, 20–24 May 2019; pp. 1–6.
18. Feghhi, S.; Leith, D. A web traffic analysis attack using only timing information. IEEE Trans. Inf. Forensics Secur. 2016, 11, 1747–1759.
19. Yu, S.; Zhou, W.; Jia, W.; Hu, J. Attacking anonymous web browsing at local area networks through browsing dynamics. Comput. J. 2012, 55, 410–421.
20. Zhuo, Z.; Zhang, Y.; Zhang, Z.; Zhang, X.; Zhang, J. Website fingerprinting attack on anonymity networks based on profile hidden markov model. IEEE Trans. Inf. Forensics Secur. 2017, 13, 1081–1095.
21. Moore, A.; Zuev, D. Internet traffic classification using bayesian analysis techniques. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Banff, AB, Canada, 6–10 June 2005; pp. 50–60.
22. Herrmann, D.; Wendolsky, R.; Federrath, H. Website fingerprinting: Attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier. In Proceedings of the ACM Workshop on Cloud Computing Security, Chicago, IL, USA, 13 November 2009; pp. 31–42.
23. Sun, G.; Chen, T.; Su, Y.; Li, C. Internet traffic classification based on incremental support vector machines. Mob. Netw. Appl. 2018, 23, 789–796.
24. Ma, C.; Du, X.; Cao, L. Improved knn algorithm for fine-grained classification of encrypted network flow. Electronics 2020, 9, 324.
25. Shen, M.; Liu, Y.; Zhu, L.; Xu, K.; Du, X.; Guizani, N. Optimizing feature selection for efficient encrypted traffic classification: A systematic approach. IEEE Netw. 2020, 34, 20–27.
26. Mun, H.; Lee, Y. Internet traffic classification with federated learning. Electronics 2020, 10, 27.
27. Balachandran, A.; Amritha, P. VPN network traffic classification using entropy estimation and time-related features. In Proceedings of the IOT with Smart Systems: Proceedings of ICTIS, Singapore, 6 October 2022; pp. 509–520.
28. Gao, B.; Liu, W.; Liu, G.; Nie, F. Resource knowledge-driven heterogeneous graph learning for website fingerprinting. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 968–981.
29. Wang, Z. The applications of deep learning on traffic identification. BlackHat USA 2015, 24, 1–10.
30. Lotfollahi, M.; Siavoshani, M.; Zade, R.; Saberian, M. Deep packet: A novel approach for encrypted traffic classification using deep learning. Soft Comput. 2020, 24, 1999–2012.
31. Bu, Z.; Zhou, B.; Cheng, P.; Zhang, K.; Ling, Z. Encrypted network traffic classification using deep and parallel network-in-network models. IEEE Access 2020, 8, 132950–132959.
32. Yun, X.; Wang, Y.; Zhang, Y.; Zhao, C.; Zhao, Z. Encrypted tls traffic classification on cloud platforms. IEEE/ACM Trans. Net. 2023, 31, 164–177.
33. Malekghaini, N.; Akbari, E.; Salahuddin, M.; Limam, N.; Boutaba, R.; Mathieu, B.; Moteau, S.; Tuffin, S. Deep learning for encrypted traffic classification in the face of data drift: An empirical study. Comput. Netw. 2023, 225, 109648.
34. Cui, S.; Han, X.; Han, D.; Wang, Z.; Wang, W.; Jiang, B.; Liu, B.; Lu, Z. FG-SAT: Efficient flow graph for encrypted traffic classification under environment shifts. IEEE Trans. Inf. Forensics Secur. 2025, 20, 5326–5339.
35. Horowicz, E.; Shapira, T.; Shavitt, Y. A few shots traffic classification with mini-flowpic augmentations. In Proceedings of the 22nd ACM Internet Measurement Conference, Nice, France, 25–27 October 2022; pp. 647–654.
36. Liu, C.; He, L.; Xiong, G.; Cao, Z.; Li, Z. Fs-Net: A flow sequence network for encrypted traffic classification. In Proceedings of the IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; pp. 1171–1179.
37. Mei, H.; Cheng, G.; Yuan, Y. High precision and efficient anonymous traffic classification in the real-world. IEEE Trans. Net. 2025, 33, 966–981.
38. Weinberg, Z.; Wang, J.; Yegneswaran, V.; Briesemeister, L.; Cheung, S.; Wang, F.; Boneh, D. Stegotorus: A camouflage proxy for the tor anonymity system. In Proceedings of the ACM Conference on Computer and Communications Security, Raleigh, NC, USA, 16–18 October 2012; pp. 109–120.
39. Houmansadr, A.; Riedl, T.; Borisov, N.; Singer, A. I want my voice to be heard: IP over voice-over-ip for unobservable censorship circumvention. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 24–27 February 2013.
40. Frolov, S.; Wustrow, E. The use of tls in censorship circumvention. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 24–27 February 2019.
41. Luo, X.; Zhou, P.; Chan, E.; Lee, W.; Chang, R.; Perdisci, R. HTTPOS: Sealing information leaks with browser-side obfuscation of encrypted flows. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 6–9 February 2011.
42. Dyer, K.; Coull, S.; Ristenpart, T.; Shrimpton, T. Peek-a-boo, I still see you: Why efficient traffic analysis countermeasures fail. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 20–23 May 2012; pp. 332–346.
43. Cai, X.; Nithyanand, R.; Wang, T.; Johnson, R.; Goldberg, I. A systematic approach to developing and evaluating website fingerprinting defenses. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 227–238.
44. Juarez, M.; Imani, M.; Perry, M.; Diaz, C.; Wright, M. Toward an efficient website fingerprinting defense. In Proceedings of the European Symposium on Research in Computer Security, Heraklion, Greece, 28–30 September 2016; pp. 27–46.
45. Gong, J.; Wang, T. Zero-delay lightweight defenses against website fingerprinting. In Proceedings of the 29th USENIX Security Symposium, Boston, MA, USA, 12–14 August 2020; pp. 717–734.
46. Lu, D.; Bhat, S.; Kwon, A.; Devadas, S. Dynaflow: An efficient website fingerprinting defense based on dynamically-adjusting flows. In Proceedings of the Workshop on Privacy in the Electronic Society, Kultuurikatel, Tallinn, 23 January 2018; pp. 109–113.
47. Zhang, F.; He, W.; Chen, Y.; Li, Z.; Wang, X.; Chen, S.; Liu, X. Thwarting wi-fi side-channel analysis through traffic demultiplexing. IEEE Trans. Wireless Commun. 2014, 13, 86–98.
48. Han, D.; Wang, Z.; Zhong, Y.; Chen, W.; Yang, J.; Lu, S.; Shi, X.; Yin, X. Evaluating and improving adversarial robustness of machine learning-based network intrusion detectors. IEEE J. Sel. Areas Commun. 2021, 39, 2632–2647.
49. Jiang, M.; Cui, B.; Fu, J.; Wang, T.; Yao, L.; Bhargava, B. RUDOLF: An efficient and adaptive defense approach against website fingerprinting attacks based on soft actor-critic algorithm. IEEE Trans. Inf. Forensics Secur. 2024, 19, 7794–7809.
50. Cadena, W.; Mitseva, A.; Hiller, J.; Pennekamp, J.; Reuter, S.; Filter, J.; Engel, T.; Wehrle, K.; Panchenko, A. Trafficsliver: Fighting website fingerprinting attacks with traffic splitting. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Taipei, Taiwan, 13–17 October 2020; pp. 1971–1985.
51. Qiao, L.; Wu, B.; Li, H.; Gao, C.; Yuan, W.; Luo, X. Trace-agnostic and adversarial training-resilient website fingerprinting defense. In Proceedings of the IEEE Conference on Computer Communications, Vancouver, BC, Canada, 20–23 May 2024; pp. 211–220.
52. Rahman, M.; Imani, M.; Mathews, N.; Wright, M. Mockingbird: Defending against deep-learning-based website fingerprinting attacks with adversarial traces. IEEE Trans. Inf. Forensics Secur. 2021, 16, 1594–1609.
53. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
54. Nichol, A.; Dhariwal, P. Improved denoising diffusion probabilistic models. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8162–8171.
55. Zheng, J.; Ramasinghe, S.; Li, X.; Lucey, S. Trading positional complexity vs deepness in coordinate networks. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–24 October 2022; pp. 144–160.
56. Zhu, J.; Chen, X.; He, K.; LeCun, Y.; Liu, Z. Transformers without normalization. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 10–17 June 2025; pp. 14901–14911.
57. Goswami, M.; Szafer, K.; Choudhry, A.; Cai, Y.; Li, S.; Dubrawski, A. MOMENT: A family of open time-series foundation models. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–17 July 2024; pp. 16115–16152.
58. Pulls, T.; Witwer, E. Maybenot: A framework for traffic analysis defenses. In Proceedings of the 22nd Workshop on Privacy in the Electronic Society, Copenhagen, Denmark, 26 November 2023; pp. 75–89.

Figure 1. The adversary model of WF defenses.

Figure 2. The overview of WFD-EST.

Figure 3. The architecture of SDGLM generator.

Figure 4. The architecture of SDGLM discriminator.

Figure 5. Burst-aware packet manipulation strategy for traffic obfuscation.

Figure 6. Comparison of traffic feature generation fidelity across different generative models.

Figure 7. Impact of key training parameters: (a) The number of diffusion steps, (b) The total number of training epochs, (c) The number of encoder layers in generator, (d) The number of encoder layers in discriminator.

Figure 8. Classification results of obfuscation between Dangdang (D.) and Baidu (B.): (a) FS-Net, (b) NeuTic.

Figure 9. Classification results of obfuscation between Baidu (B.) and QQ Music (M.): (a) FS-Net, (b) NeuTic.

Figure 10. Classification results of obfuscation between Dangdang (D.) and QQ Music (M.): (a) FS-Net, (b) NeuTic.

Figure 11. Large-scale obfuscation performance of three WF defense schemes within the Maybenot dataset.

Figure 12. Transferability performance of three WF defense schemes from the self-captured three representative websites to the Maybenot dataset.

Table 1. Traffic collection environment and configuration.

Option	Content
Client	a PC with Windows 10 (Nanjing), Provider: Dell Inc., Round Rock, TX, USA
Browser	Chrome 119.0.6045.200
Shadowsocks	4.4.1.0
Proxy Server	a VPS with 1 CPU, 2 GB memory (Beijing), Provider: Tencent Cloud, Shenzhen, China
Python	Python 3.11.0
Collection Period	15 October 2024–15 November 2024

Table 2. Composition of traffic dataset used for WFD-EST evaluation.

Website	Category	Samples (Training/Testing)
QQ Music	Music	4000/1000
Baidu	Search	4000/1000
Dangdang	E-commerce	4000/1000
Total		15,000

Table 3. Bandwidth and time overhead for website traffic obfuscation between Dangdang (D.) and Baidu (B.).

Scheme	D.→B. BO	D.→B. TO	B.→D. BO	B.→D. TO
Mockingbird	0.301	0.246	0.299	0.232
Prism	0.228	0.123	0.204	0.139
WFD-EST	0.182	0.081	0.187	0.089

Table 4. Bandwidth and time overhead for website traffic obfuscation between Baidu (B.) and QQ Music (M.).

Scheme	B.→M. BO	B.→M. TO	M.→B. BO	M.→B. TO
Mockingbird	0.572	0.207	0.373	0.291
Prism	0.388	0.142	0.284	0.218
WFD-EST	0.247	0.099	0.168	0.150

Table 5. Bandwidth and time overhead for website traffic obfuscation between Dangdang (D.) and QQ Music (M.).

Scheme	D.→M. BO	D.→M. TO	M.→D. BO	M.→D. TO
Mockingbird	0.543	0.239	0.308	0.329
Prism	0.408	0.118	0.231	0.264
WFD-EST	0.267	0.086	0.178	0.174

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
