SD-Fuzz: A State-Aware Industrial Control Protocol Fuzzing Framework Based on Diffusion Models

Tang, Hao; Zhang, Zhiyong; Zhao, Kejing; Liang, Zhi

doi:10.3390/electronics15102156

by Hao Tang ¹, Zhiyong Zhang ²,*, Kejing Zhao ² and Zhi Liang ²

¹ Information Engineering College, Henan University of Science and Technology, Luoyang 471023, China

² Henan International Joint Laboratory of Cyberspace Security Applications, Luoyang 471023, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(10), 2156; https://doi.org/10.3390/electronics15102156

Submission received: 14 April 2026 / Revised: 9 May 2026 / Accepted: 11 May 2026 / Published: 17 May 2026

(This article belongs to the Section Computer Science & Engineering)

Abstract

Current fuzzing techniques for industrial control protocols (ICPs) encounter notable challenges, including model training instability, limited sample diversity, and the inability to manage complex state dependencies in protocol interactions. To address these issues, this paper presents SD-Fuzz, a state-aware fuzzing framework that integrates a discrete denoising diffusion probabilistic model (DDPM) with an online Hidden Markov Model (HMM). The discrete DDPM is designed to generate syntactically valid and diverse protocol messages using cosine noise scheduling and Denoising Diffusion Implicit Model (DDIM) sampling, while the HMM performs unsupervised learning of state transitions from real traffic to guide the creation of logically consistent multi-step interaction sequences. The framework is evaluated on three representative Modbus/TCP slave implementations. Evaluations based on 5 h benchmark campaigns across multiple independent runs indicate that SD-Fuzz achieves a mean test case recognition rate (TCRR) of 91.3% and an HMM-inferred state transition coverage of 50.1%, exhibiting statistically significant improvements over the evaluated baselines. Furthermore, an extended 8 h vulnerability mining campaign demonstrates its capability to trigger deep-seated exceptions, including buffer overflows and protocol state violations, which are typically challenging to access using traditional stateless approaches. This work illustrates the feasibility of combining diffusion-based generation with lightweight state inference for automated vulnerability discovery in industrial control systems. Directions for future work include validation on physical programmable logic controller (PLC) hardware to acquire internal code coverage feedback.

Keywords: industrial control protocols; fuzz testing; diffusion model; stateful fuzzing; hidden Markov model

1. Introduction

1.1. Security Situation and Challenges of Industrial Control Systems

With the deep integration of Industry 4.0 and the Industrial Internet of Things (IIoT), industrial control systems (ICSs) have evolved rapidly from traditional closed and physically isolated architectures to highly interconnected networked systems. This shift has significantly improved production efficiency and flexibility, but it has also eliminated the long-standing physical isolation of operational technology (OT) environments, thereby exposing critical infrastructure to increasingly sophisticated cyberthreats [1,2]. Industrial network protocols serve as the essential communication backbone connecting programmable logic controllers (PLCs), remote terminal units (RTUs), human–machine interfaces (HMIs), and other core components. Their security directly impacts the stability and resilience of the entire ICS.

However, most legacy industrial protocols were originally designed with real-time performance and deterministic behavior as primary priorities. Consequently, they typically lack fundamental security mechanisms such as encryption, authentication, and integrity verification. This design legacy has resulted in numerous remotely exploitable vulnerabilities in their implementation [3]. Recent reports indicate that the number of disclosed ICS-related vulnerabilities continues to rise, with hundreds of advisories published annually and a growing proportion classified as high or critical severity [4]. The emergence of OT-targeted malware such as FrostyGoop in 2024, which specifically exploits Modbus TCP to manipulate industrial devices, further underscores the urgent security risks facing industrial protocols [5].

Fuzzing has become one of the most widely adopted techniques for automated vulnerability discovery in software and network systems, including ICS environments [6,7,8]. Fundamentally, fuzzing operates by systematically generating a massive volume of malformed or semi-valid protocol messages and injecting them into the target system. By continuously monitoring the target’s execution status for abnormal behaviors—such as process crashes, watchdog timeouts, or unauthorized state transitions—fuzzing can efficiently expose latent memory corruption and logic flaws. In recent years, deep learning approaches have been introduced to industrial protocol fuzzing to reduce reliance on manually crafted specifications. Representative methods include GANFuzz [9], SeqFuzzer [10], and WGGFuzz [11]. More recent works, such as ICSQuartz [12] and MCFICS [13], have incorporated scan-cycle awareness and coverage-guided mechanisms, leading to improved path coverage and vulnerability detection on real industrial protocols.

1.2. Main Limitations of Existing Techniques

Despite notable progress, deep learning-based industrial protocol fuzzing still faces several key limitations:

Training instability and mode collapse. Existing generative fuzzers typically rely on Generative Adversarial Networks (GANs). For instance, WGGFuzz [11] utilizes WGAN-GP to generate protocol payloads, while GANFuzz [9] applies standard GAN architectures. However, even with gradient penalty techniques, the discriminator in these models can easily overfit to the highly structured industrial traffic, causing vanishing gradients for the generator and a substantial reduction in sample diversity [14].
Insufficient generation diversity. GAN-based models rely on implicit density estimation and often struggle to capture the full high-dimensional discrete distribution of protocol messages. This limitation is particularly evident when attempting to generate rare function codes and uncommon field combinations, which heavily restricts their ability to trigger diverse and complex anomalies [10,15].
Lack of state awareness. While recent tools like AFLNet [8] have advanced stateful fuzzing for general network protocols by tracking explicit response codes to infer state transitions, industrial control protocols often lack such clear, standardized response codes. Consequently, deep learning-based ICS fuzzers (including SeqFuzzer and DiffusionFuzz) predominantly remain stateless, focusing solely on single-message generation. They cannot reliably produce multi-step interaction sequences that align with real protocol session logic. As a result, deep defects that depend on specific state transitions or execution order (e.g., sequential buffer overflows or state machine violations) are typically challenging to access [16,17].

In contrast, denoising diffusion probabilistic models (DDPMs) provide advantages in training stability, mode coverage, and generation quality through explicit forward noising and reverse denoising processes [18]. In the cybersecurity domain, DiffusionFuzz demonstrated the application of diffusion models to single-message generation for industrial protocols, achieving competitive test-case recognition rates (TCRRs) on Modbus/TCP and DNP3 [19]. Meanwhile, state-aware fuzzing techniques for general network protocols have advanced considerably [20]. Approaches such as AFLNet [8], ProFuzzBench [21], and i7Fuzzer [22] have improved state transition coverage using response-code clustering or neural-guided state abstraction. Compared to earlier tools that heavily relied on manual configurations [16,23], these data-driven methods avoid the need to explicitly construct rigid finite state machines [24,25]. To further enhance sequence modeling, classical approaches like Hidden Markov Models (HMMs) [26] and other statistical inference algorithms provide a robust mathematical foundation for unsupervised state learning. Nevertheless, no prior work has effectively combined the strong generative capabilities of diffusion models with genuine state awareness specifically for industrial control protocols [27].

1.3. Motivations and Contributions of This Work

To address the aforementioned limitations, this paper proposes SD-Fuzz, a state-aware fuzzing framework for industrial control protocols that integrates a discrete denoising diffusion probabilistic model (DDPM) with an online Hidden Markov Model (HMM). The framework aims to achieve both high message legitimacy/diversity and effective exploration of protocol state machines in a fully data-driven manner without requiring protocol specifications.

The main contributions of this work are as follows:

We apply a discrete DDPM to industrial protocol fuzzing, incorporating cosine noise scheduling and DDIM sampling to enhance the syntactic legitimacy and diversity of generated messages.
We design a lightweight online HMM module that learns hidden state transition patterns unsupervised from real interaction traffic and provides real-time guidance for generating logically consistent multi-step sequences.
We implement a closed-loop adaptive SD-Fuzz framework that combines diffusion-based generation, lightweight mutation, high-fidelity traffic replay, and online feedback. Evaluation on three representative Modbus/TCP slave implementations shows that the framework achieves a test-case recognition rate (TCRR) of 91.3% and a state transition coverage of 50.1%. It also triggers various exceptions, including memory corruption and logic errors, that are typically unreachable by stateless methods.

The remainder of this paper is organized as follows: Section 2 details the overall architecture, core modules, and algorithmic implementation of the SD-Fuzz framework. Section 3 presents the experimental setup, evaluation metrics, and comprehensive results, including vulnerability case studies. Section 4 discusses the findings and acknowledges the limitations. Finally, Section 5 concludes the paper and outlines future research directions.

2. Materials and Methods

2.1. Overall Architecture

SD-Fuzz is a fully data-driven stateful fuzzing framework for industrial control protocols that requires no prior protocol specification. The framework aims to mitigate three main challenges observed in existing generative fuzzers: training instability and mode collapse, limited sample diversity, and lack of state awareness.

The core idea is to combine a discrete denoising diffusion probabilistic model (DDPM) for generating syntactically valid protocol messages with a lightweight online Hidden Markov Model (HMM) that learns protocol state transitions from real traffic and guides sequence generation in real time.

The framework consists of five cooperating modules that form a closed feedback loop of generation, execution, feedback, and adaptation (Figure 1):

Data Preprocessing Module (DPM): cleans raw traffic and extracts dual-track features.
Data Generation Module (DGM): generates state-aware test sequences.
Lightweight Mutation Module (DMM): applies targeted low-intensity perturbations.
Data Sending and Receiving Module (DSRM): handles realistic TCP session communication.
System Listening Module (SLM): monitors execution, detects anomalies, extracts state paths, and updates the HMM online.

2.2. Traffic Capture and Preprocessing

Raw industrial protocol traffic exhibits high variability in packet length and contains noise that can degrade model training. Therefore, only legitimate sessions captured prior to fuzzing are used, and they are processed through a three-stage pipeline.

In the first stage, packets are grouped by Modbus/TCP PDU length (e.g., 6, 8, 12, 52, and 260 bytes). Within each length group, 256-dimensional positional hash embeddings are computed, followed by K-means clustering. The optimal number of clusters is determined automatically using the elbow method. This two-level clustering improves the validation test-case recognition rate (TCRR) of the subsequent DDPM by approximately 18.4%.

In the second stage, packets are deeply parsed using Scapy 2.5.0. The MBAP header fields (Transaction ID, Protocol ID, Length, and Unit ID) and TCP four-tuple are extracted and stored for high-fidelity session replay during testing.

In the third stage, a dual-track feature representation is constructed:

For the DDPM: fixed-length byte sequences (maximum length 263 bytes, short packets padded with token 256, and vocabulary size 257).
For the HMM: only the function code sequence (normal codes mapped to 1–43, exception codes ≥ 128).

This separation of syntactic (byte-level) and semantic (function-code-level) features allows independent training of the two models.
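The dual-track representation can be illustrated with a minimal sketch. It assumes the standard Modbus/TCP layout, in which the function code is the first byte after the 7-byte MBAP header; the helper names `to_ddpm_tokens` and `to_hmm_observations` and the two example frames are hypothetical, and the remapping of normal codes to 1–43 is omitted.

```python
PAD_TOKEN = 256   # padding symbol outside the byte range 0..255
MAX_LEN = 263     # fixed sequence length used for the DDPM track
MBAP_LEN = 7      # Transaction ID (2) + Protocol ID (2) + Length (2) + Unit ID (1)


def to_ddpm_tokens(frame: bytes) -> list[int]:
    """Byte-level track: one token per byte, right-padded with PAD_TOKEN."""
    if len(frame) > MAX_LEN:
        raise ValueError("frame longer than MAX_LEN")
    return list(frame) + [PAD_TOKEN] * (MAX_LEN - len(frame))


def to_hmm_observations(session: list[bytes]) -> list[int]:
    """Function-code track: the first PDU byte of every frame in a session."""
    return [frame[MBAP_LEN] for frame in session if len(frame) > MBAP_LEN]


# Two hand-written example frames: a Read Holding Registers request (0x03)
# and an exception response to it (0x83, i.e. a function code >= 128).
request = bytes.fromhex("000100000006" "01" "03" "00000002")
exception = bytes.fromhex("000100000003" "01" "83" "02")

tokens = to_ddpm_tokens(request)
observations = to_hmm_observations([request, exception])
```

Keeping the two tracks as separate functions mirrors the point made above: the byte-level tokens and the function-code observations can be fed to their respective models without either depending on the other.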

2.3. Core Generation Module: State-Aware Discrete Denoising Diffusion

2.3.1. Discrete DDPM for Single Protocol Messages

The discrete DDPM employs an 8-layer Transformer encoder–decoder architecture with approximately 46.2 million parameters, designed for byte-level sequence modeling. The Transformer architecture was explicitly chosen over Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs) due to its self-attention mechanism, which excels at capturing the long-range byte dependencies inherent in industrial protocols (e.g., the strict correlation between the MBAP length field at the beginning of a packet and the actual payload boundary at the end). Furthermore, empirical evaluations indicated that an 8-layer depth provides an optimal balance: it offers a sufficient representation capacity to model complex discrete protocol distributions while maintaining the strict inference latency constraints required for real-time fuzzing (achieving approximately 8 ms per message generation).

Let the original message be $x_0$. The forward noising process is a 1000-step Markov chain defined as

$$q(x_t \mid x_{t-1}) = \mathrm{Categorical}\left(x_t;\ (1 - \beta_t)\, x_{t-1} + \beta_t \cdot \tfrac{1}{256}\mathbf{1}\right) \tag{1}$$

where $\beta_t$ is the noise schedule. We further define $\alpha_t = 1 - \beta_t$ and the cumulative product $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, which allows direct sampling at any timestep:

$$q(x_t \mid x_0) = \mathrm{Categorical}\left(x_t;\ \bar{\alpha}_t\, x_0 + (1 - \bar{\alpha}_t) \cdot \tfrac{1}{256}\mathbf{1}\right) \tag{2}$$

The reverse process is parameterized by a neural network $p_\theta(x_{t-1} \mid x_t)$:

$$p_\theta(x_{t-1} \mid x_t) = \mathrm{Categorical}\left(x_{t-1};\ f_\theta(x_t, t)\right) \tag{3}$$

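Training minimizes the simplified variational objective, i.e., the negative log-likelihood of the original message under the denoiser:

$$L_{simple} = -\,\mathbb{E}_{t, x_0, \epsilon}\left[\sum_{t=1}^{T} \log p_\theta(x_0 \mid x_t, t)\right] \tag{4}$$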

At inference, DDIM sampling with 50 steps is used. Generating one message takes approximately 8 ms. After training convergence, the model achieves a test-case recognition rate (TCRR) of 91.3% on real Modbus/TCP datasets.
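To make the closed-form marginal in Equation (2) concrete, the following is a minimal sketch of the forward corruption and of a strided timestep schedule for 50-step sampling. The exact cosine schedule is not specified in the text; the form below, with offset `S = 0.008`, is an assumption based on the commonly used cosine schedule. The function names are hypothetical, and padding tokens are ignored for simplicity.

```python
import math
import random

T = 1000    # forward steps, as stated in the text
K = 256     # number of byte values in the uniform noise term
S = 0.008   # assumed offset of the cosine schedule (not given in the text)


def _f(u: float) -> float:
    return math.cos((u / T + S) / (1 + S) * math.pi / 2) ** 2


def alpha_bar(t: int) -> float:
    """Cumulative product alpha-bar_t under the assumed cosine schedule."""
    return _f(t) / _f(0)


def keep_probability(t: int) -> float:
    """Probability under Equation (2) that a byte still equals its x_0 value."""
    a = alpha_bar(t)
    return a + (1 - a) / K


def sample_xt(x0: list[int], t: int, rng: random.Random) -> list[int]:
    """Draw x_t ~ q(x_t | x_0): keep each byte with probability alpha-bar_t,
    otherwise resample it uniformly from the K byte values."""
    a = alpha_bar(t)
    return [b if rng.random() < a else rng.randrange(K) for b in x0]


def ddim_timesteps(steps: int = 50) -> list[int]:
    """Evenly strided, descending subset of timesteps visited at inference."""
    stride = T // steps
    return list(range(T, 0, -stride))
```

At `t = 0` the message is untouched, while at `t = T` each byte is essentially uniform, so `keep_probability(T)` approaches 1/256. The strided list only shows which timesteps a 50-step sampler would visit; the denoising network itself is not modeled here.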

2.3.2. Hidden Markov Model for State Inference and Online Evolution

The HMM is defined as $\lambda = (A, B, \pi)$, where $A = [a_{ij}]$ is the state transition matrix, $B = [b_j(k)]$ is the emission matrix that records the probability distribution of function codes for each hidden state, and $\pi$ is the initial state distribution. The number of hidden states $N$ (usually 5–7) is automatically selected using the Bayesian Information Criterion (BIC).

When approximately 10,000 real interaction sequences have been collected, a warm-started Baum–Welch update is triggered. The forward–backward algorithm is first used to compute the occupancy probability $\gamma_t^{(m)}(i)$ (the probability of being in state $i$ at time $t$ for the $m$-th sequence) and the transition probability $\xi_t^{(m)}(i, j)$ (the probability of transitioning from state $i$ to state $j$ at time $t$ for the $m$-th sequence).

The parameter re-estimation rules follow the standard expectation–maximization formulation:

$$a_{ij} = \frac{\sum_{m=1}^{M} \sum_{t=1}^{T_m - 1} \xi_t^{(m)}(i, j)}{\sum_{m=1}^{M} \sum_{t=1}^{T_m - 1} \gamma_t^{(m)}(i)}, \qquad b_j(k) = \frac{\sum_{m=1}^{M} \sum_{t=1,\ O_t^{(m)} = k}^{T_m} \gamma_t^{(m)}(j)}{\sum_{m=1}^{M} \sum_{t=1}^{T_m} \gamma_t^{(m)}(j)} \tag{5}$$

where $M$ is the total number of sequences, $T_m$ is the length of the $m$-th sequence, $O_t^{(m)}$ is the observed function code at time $t$ in the $m$-th sequence, and $\gamma_t^{(m)}(j)$ denotes the expected count (soft assignment) for state $j$ at time $t$.
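One re-estimation pass of Equation (5) can be sketched in plain Python as follows. This is an illustrative, unscaled implementation that is only numerically safe for short sequences. Observations are assumed to be already mapped to indices 0..K-1, and the toy parameters at the bottom are hypothetical. Passing in the current `A` and `B` corresponds to the warm start described above.

```python
def forward_backward(obs, A, B, pi):
    """Return occupancy (gamma) and transition (xi) probabilities for one sequence."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(n))
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    likelihood = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    return gamma, xi


def baum_welch_step(sequences, A, B, pi):
    """One re-estimation of A and B, accumulating soft counts over all sequences."""
    n, k = len(A), len(B[0])
    a_num = [[0.0] * n for _ in range(n)]
    a_den = [0.0] * n
    b_num = [[0.0] * k for _ in range(n)]
    b_den = [0.0] * n
    for obs in sequences:
        gamma, xi = forward_backward(obs, A, B, pi)
        for t, o in enumerate(obs):
            for i in range(n):
                b_num[i][o] += gamma[t][i]
                b_den[i] += gamma[t][i]
                if t < len(obs) - 1:  # transitions exist only up to T_m - 1
                    a_den[i] += gamma[t][i]
                    for j in range(n):
                        a_num[i][j] += xi[t][i][j]
    new_A = [[a_num[i][j] / a_den[i] for j in range(n)] for i in range(n)]
    new_B = [[b_num[i][o] / b_den[i] for o in range(k)] for i in range(n)]
    return new_A, new_B


# Hypothetical toy model: 2 hidden states, 2 observation symbols.
A0 = [[0.6, 0.4], [0.3, 0.7]]
B0 = [[0.8, 0.2], [0.3, 0.7]]
pi0 = [0.5, 0.5]
sequences = [[0, 0, 1, 1], [0, 1, 1, 0]]
new_A, new_B = baum_welch_step(sequences, A0, B0, pi0)
```

Each row of `new_A` and `new_B` remains a probability distribution, because the numerators in Equation (5) sum to the corresponding denominators.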

2.3.3. State-Aware Sequence Generation Algorithm

The state-aware sequence generation algorithm is the key innovation of SD-Fuzz. It couples the high-legitimacy output of the DDPM with the state knowledge learned by the HMM through a “generate first, then enforce” strategy.

Let the current hidden state be $s_{cur} \in \{1, 2, \dots, N\}$ and let the current HMM parameters be $\lambda = (A, B, \pi)$. The full algorithm (Algorithm 1) is as follows:

Algorithm 1 Generate_Realistic_Test_Sequence
Input:
    s_cur          // current hidden state (initially sampled from π)
    λ = (A, B, π)  // latest HMM parameters
    DDPM           // trained discrete diffusion model (unconditional sampling)
    MAX_LEN = 128  // maximum sequence length
Output:
    Seq            // state-consistent test sequence

Seq ← []
while |Seq| < MAX_LEN do
    s_next ∼ Categorical(A[s_cur, :])             // sample next hidden state
    F ← {f ∈ {1, …, K} | B[s_next, f] > 0.0005}   // allowed function codes (prob > 0.05%)
    if F = ∅ then
        F ← {argmax_f B[s_next, f]}               // fallback: the most likely code
    end if
    m ∼ DDPM()                                    // unconditional DDIM sampling
    m[7] ← random.choice(F)                       // force function code (byte 8, index 7, directly after the 7-byte MBAP header)
    m[4:6] ← Recalculate_MBAP_Length(m[6:])       // recompute MBAP Length field (unless intentional mismatch)
    Seq.append(m)
    s_cur ← s_next
    if s_cur ∈ Anomaly_States and Uniform(0, 1) < 0.3 then
        break                                     // 30% chance to simulate client abort
    end if
end while
return Seq

After a test sequence is generated according to Algorithm 1, the DSRM immediately sends it to the target device as a complete TCP session. The SLM then captures the device responses, extracts the returned function codes, and applies Viterbi decoding using the current HMM parameters to recover the most likely hidden-state path taken during the interaction. This path is subsequently added to the online training buffer to support the next Baum–Welch update of the HMM.

This closed feedback mechanism continuously refines the HMM’s transition and emission matrices based on real device behavior. As a result, across five independent 5 h testing campaigns, SD-Fuzz is able to achieve a mean state transition coverage of 50.1% (standard deviation ± 1.41%), with the maximum coverage reaching 52.3%.

The design of the sequence generation algorithm plays an important role in this process. By keeping the DDPM sampling unconditional, diversity and training stability are maintained. The single-byte function code enforcement maintains state consistency with minimal computational overhead, while the optional 30% probability of abrupt termination in anomaly states helps exercise cleanup and exception-handling logic. Together, these steps allow the framework to balance high message legitimacy with effective exploration of deeper protocol states.
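The generation loop can be sketched in Python as shown below. This is an illustration under stated assumptions rather than the authors' implementation: `stub_ddpm_sample` is a hypothetical stand-in for the trained diffusion model, the emission matrix is represented as one dictionary per state, and the function code is assumed to sit at index 7, directly after the 7-byte MBAP header described in Section 3.1.3.

```python
import random
import struct

MBAP_LEN = 7         # the function code follows the 7-byte MBAP header
THRESHOLD = 0.0005   # emission-probability cut-off from Algorithm 1
ABORT_PROB = 0.3     # chance of ending the session in an anomaly state


def stub_ddpm_sample(rng):
    """Hypothetical placeholder for the DDPM: a well-formed read request."""
    return bytearray(struct.pack(">HHHBBHH", rng.randrange(65536), 0, 6, 1, 0x03, 0, 2))


def generate_sequence(A, B, s_cur, anomaly_states, rng,
                      sample=stub_ddpm_sample, max_len=128):
    """Walk the HMM and pin each message to a function code allowed in the next state."""
    seq = []
    while len(seq) < max_len:
        s_next = rng.choices(range(len(A)), weights=A[s_cur])[0]
        allowed = [f for f, p in B[s_next].items() if p > THRESHOLD]
        if not allowed:  # fallback: the single most likely code
            allowed = [max(B[s_next], key=B[s_next].get)]
        m = sample(rng)
        m[MBAP_LEN] = rng.choice(allowed)
        m[4:6] = struct.pack(">H", len(m) - 6)  # Length counts Unit ID + PDU
        seq.append(bytes(m))
        s_cur = s_next
        if s_cur in anomaly_states and rng.random() < ABORT_PROB:
            break
    return seq


# Hypothetical two-state model; state 1 is treated as an anomaly state.
A = [[0.1, 0.9], [0.5, 0.5]]
B = [{0x03: 0.9, 0x06: 0.1}, {0x10: 1.0}]
seq = generate_sequence(A, B, s_cur=0, anomaly_states={1},
                        rng=random.Random(7), max_len=16)
```

Because the sampler is passed in as a parameter, the state-guidance logic stays independent of how individual messages are produced, which matches the unconditional-sampling design discussed above.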

2.4. Lightweight Mutation, Communication, and Monitoring Modules

2.4.1. Lightweight Mutation Module (DMM)

The Lightweight Mutation Module (DMM) applies controlled perturbations to the messages generated by the DDPM. To preserve state reachability while introducing anomalies, DMM uses only low-intensity targeted perturbations with the following strategies:

Bit-flip probability limited to 0.5–1.5% (on average 1–3 bits per PDU);
Symbolic boundary offsets (±1, ±2, ±4096) applied to register start address and read/write count fields;
10% probability of intentionally mutating the Unit Identifier or Protocol Identifier fields in the MBAP header;
5% probability of creating an MBAP Length field mismatch with the actual PDU length.

After each payload modification (except for deliberate length mismatch injections), the MBAP Length field is recalculated to ensure baseline protocol compliance. The contribution of this module is evaluated in an ablation study (Section 3.4).
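The four strategies can be combined in a short sketch such as the one below. Several details are assumptions, since the text does not specify them: bit flips are restricted here to the data bytes after the function code, the start address and count fields are taken to be at offsets 8 and 10, and the size of the deliberate length mismatch is chosen arbitrarily. The function name `mutate` is hypothetical.

```python
import random
import struct

BOUNDARY_OFFSETS = (-4096, -2, -1, 1, 2, 4096)


def mutate(frame: bytes, rng: random.Random, bit_flip_p: float = 0.01) -> bytes:
    """Apply the four low-intensity strategies to one Modbus/TCP frame."""
    m = bytearray(frame)
    # 1. Sparse bit flips (assumed to touch only the data bytes after the function code).
    for i in range(8, len(m)):
        for bit in range(8):
            if rng.random() < bit_flip_p:
                m[i] ^= 1 << bit
    # 2. Boundary offset on the 16-bit start address or count field, if present.
    if len(m) >= 12:
        pos = rng.choice((8, 10))
        (value,) = struct.unpack_from(">H", m, pos)
        struct.pack_into(">H", m, pos, (value + rng.choice(BOUNDARY_OFFSETS)) & 0xFFFF)
    # 3. 10% chance: corrupt the Unit Identifier or the Protocol Identifier.
    if rng.random() < 0.10:
        if rng.random() < 0.5:
            m[6] = rng.randrange(256)
        else:
            struct.pack_into(">H", m, 2, rng.randrange(1, 65536))
    # 4. Recompute Length, except for a 5% chance of a deliberate mismatch.
    length = len(m) - 6
    if rng.random() < 0.05:
        length = (length + rng.choice((-1, 1, 255))) & 0xFFFF
    struct.pack_into(">H", m, 4, length)
    return bytes(m)


original = bytes.fromhex("000100000006010300000002")
mutated = mutate(original, random.Random(1))
```

Leaving the function code and Transaction ID untouched in this sketch reflects the stated goal of preserving state reachability while still perturbing field values.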

2.4.2. Data Sending and Receiving Module (DSRM)

The Data Sending and Receiving Module (DSRM) encapsulates the generated PDUs into complete TCP/IP packets. It reuses the MBAP header fields and TCP parameters (sequence numbers, acknowledgments, window sizes, and scaling factors) extracted during preprocessing to allow the test traffic to closely resemble legitimate client behavior.

The module is implemented using Python’s asyncio library with uvloop 0.21.0 (epoll backend), a connection pool, and adaptive timeouts starting at 500 ms and increasing up to 5 s. Upon detection of an RST packet or prolonged timeout, the connection is automatically re-established while preserving the current HMM-inferred state. In the experimental environment, DSRM achieves 80–120 round-trip interactions per second.
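A minimal asyncio sketch of the adaptive timeout behavior is given below. The text states only the 500 ms starting value and the 5 s ceiling; the doubling factor and the reset after a successful response are assumptions. For brevity the sketch opens a fresh connection per round trip rather than using a connection pool, and the names `AdaptiveTimeout` and `round_trip` are hypothetical.

```python
import asyncio


class AdaptiveTimeout:
    """Timeout that starts at 500 ms and backs off towards 5 s after each miss."""

    def __init__(self, start: float = 0.5, cap: float = 5.0, factor: float = 2.0):
        self.start, self.cap, self.factor = start, cap, factor
        self.current = start

    def on_timeout(self) -> None:
        self.current = min(self.current * self.factor, self.cap)

    def on_response(self) -> None:
        self.current = self.start


async def round_trip(host: str, port: int, frame: bytes, policy: AdaptiveTimeout):
    """Send one frame and wait for response bytes within the current timeout."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(frame)
        await writer.drain()
        data = await asyncio.wait_for(reader.read(512), timeout=policy.current)
        policy.on_response()
        return data
    except asyncio.TimeoutError:
        policy.on_timeout()
        return None
    finally:
        writer.close()
        await writer.wait_closed()
```

A caller would run it with, for example, `asyncio.run(round_trip("127.0.0.1", 502, frame, AdaptiveTimeout()))` against a local Modbus/TCP slave; the address is only a placeholder.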

2.4.3. System Listening Module (SLM)

The System Listening Module (SLM) serves as the central controller of the adaptive loop and performs three main functions:

Anomaly detection: It monitors process crashes using ASan or Valgrind 3.27.0, exception responses (function code ≥ 0x80), abnormal TCP terminations (RST/FIN), and watchdog timeouts exceeding 3 s. Upon detection, the full test context (seed sequence, PCAP file, and crash stack) is recorded for later analysis.
State-path extraction: For every successful round trip, SLM applies Viterbi decoding using the current HMM parameters to infer the most likely hidden-state path and adds it to the online training buffer.
Coverage-guided scheduling: SLM tracks the set of observed state transition edges and computes state coverage in real time according to Equation (6):

$$\text{State Coverage} = \frac{|E_{obs}|}{|E_{total}|} = \frac{|E_{obs}|}{N(N - 1)} \tag{6}$$

When the coverage growth is less than 0.5% over two consecutive updates, the module increases mutation intensity by 20–50%, switches to another target instance, or prioritizes low-frequency function codes.
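State-path extraction and the coverage ratio in Equation (6) can be sketched together as follows. The Viterbi routine is the textbook log-domain version. Because the denominator N(N − 1) counts only ordered pairs of distinct states, the sketch ignores self-transitions when collecting observed edges, which is an interpretation rather than something stated in the text. The toy parameters are hypothetical.

```python
import math


def viterbi(obs, A, B, pi):
    """Most likely hidden-state path for one observation sequence (log domain)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    score = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, step = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log(A[i][j]))
            step.append(best)
            score.append(prev[best] + log(A[best][j]) + log(B[j][o]))
        back.append(step)
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for step in reversed(back):  # follow back-pointers from the last step
        state = step[state]
        path.append(state)
    return path[::-1]


def state_coverage(paths, n_states):
    """Equation (6): observed edges between distinct states over N(N - 1)."""
    edges = {(a, b) for path in paths for a, b in zip(path, path[1:]) if a != b}
    return len(edges) / (n_states * (n_states - 1))


# Hypothetical toy model: 2 hidden states, 2 observation symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
path = viterbi([0, 0, 1, 1], A, B, pi)   # [0, 0, 1, 1]
coverage = state_coverage([path], n_states=2)   # 0.5
```

In the toy run, only the edge 0 → 1 is observed out of the two possible edges between distinct states, hence a coverage of 0.5.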

3. Results

3.1. Experimental Setup

3.1.1. Hardware and Software Environment

All comparative experiments were performed on the same hardware and software platform to maintain fairness and reproducibility. The test machine was equipped with two AMD EPYC 9754 128-core processors, an NVIDIA GeForce RTX 3090D GPU (24 GB VRAM), and 60 GB of RAM. The software environment consisted of Windows 10, Python 3.12, and PyTorch 2.1.

3.1.2. Test Targets

Considering the diversity and closed-source nature of real industrial control systems, we chose three widely deployed, representative Modbus/TCP slave implementations as black-box targets:

Modbus RSSIM2 v8.21.2.7 (weight 30%): a professional-grade simulation server commonly used in training and testing.
Modbus Slave v6.1.3 (weight 30%): a popular commercial emulation tool known for its strict register boundary checks.
Modbus Poll v7.0.1 running in slave mode (weight 40%): a widely used diagnostic tool that exhibits distinct parsing behavior when operated as a slave.

These three targets effectively cover real-world implementation differences. SD-Fuzz automatically learns their hidden state transition logic through the HMM without any manual specification.

3.1.3. Target Protocol and Data Format

The target protocol is Modbus/TCP. Each message consists of a 7-byte MBAP header (Transaction Identifier, Protocol Identifier = 0, Length, Unit ID) followed by a PDU (Function Code + Data). The experiments mainly use the common function codes listed in Table 1, which form the basis for building stateful interaction sequences (see Figure 2 for the complete message format).
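As an illustration of this layout, the sketch below builds and parses a frame with Python's `struct` module. The helper names `build_frame` and `parse_frame` are hypothetical. The Length field is computed as the number of bytes that follow it, i.e., the Unit ID plus the PDU.

```python
import struct


def build_frame(transaction_id: int, unit_id: int, function_code: int,
                data: bytes = b"") -> bytes:
    """Assemble MBAP header + PDU; Length covers the Unit ID and the PDU."""
    pdu = bytes([function_code]) + data
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_frame(frame: bytes) -> dict:
    """Split a frame into its MBAP fields and PDU, and check the Length field."""
    tid, proto, length, unit = struct.unpack_from(">HHHB", frame, 0)
    return {
        "transaction_id": tid,
        "protocol_id": proto,
        "length": length,
        "unit_id": unit,
        "function_code": frame[7],
        "data": frame[8:],
        "length_ok": length == len(frame) - 6,
    }


# Read Holding Registers (0x03): start address 0, quantity 2.
frame = build_frame(1, 1, 0x03, struct.pack(">HH", 0, 2))
fields = parse_frame(frame)
```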

3.1.4. Dataset Construction

A hybrid training dataset of approximately 230,000 Modbus/TCP messages was constructed, consisting of ~180,000 messages captured from real PLCs in a physical laboratory environment and ~50,000 messages merged from public ICS benchmarks such as ProFuzzBench. After the preprocessing pipeline described in Section 2.2, the full dataset was used to train both the discrete DDPM and the HMM.

3.1.5. Comparative Baseline Methods

To comprehensively evaluate the framework and facilitate full reproducibility, SD-Fuzz was compared against five representative methods under identical hardware and target conditions:

WGGFuzz [11]: a generative model based on WGAN-GP + VAE.
Peach Fuzzer [6]: a classic industrial-grade mutation- and template-based fuzzer.
AFLNet [8]: a coverage-guided grey-box fuzzer designed for stateful network protocols.
DiffusionFuzz [19]: the latest publicly available stateless diffusion model baseline.
TXL-Fuzz [28]: a long attention mechanism-based fuzz testing model for industrial IoT protocols.

The comparative experiments were divided into two phases. Phase 1 (Benchmark Campaign) ran for 5 h with 270,000 test cases per target to assess fundamental generation metrics and state coverage. Phase 2 (Vulnerability Mining Campaign) was designed as an extended stress test, running for 8 h with 200,000 multi-step sequences to trigger deep-seated exceptions.

Initial Seed Corpus: For deep learning-based models (WGGFuzz, DiffusionFuzz, TXL-Fuzz, and SD-Fuzz), the full hybrid dataset of 230,000 messages was used for offline training. For AFLNet, a representative subset of 10,000 valid PCAP sessions served as the initial seed corpus to infer states, while Peach Fuzzer relied on manually crafted Modbus/TCP XML data and state models.

Timeout Policy and Reset Strategy: To handle the asynchronous nature of industrial device responses, a global watchdog timer was set to 3.0 s across all fuzzers to monitor process hangs. We implemented an adaptive timeout policy (ranging from 500 ms to 5 s) for SD-Fuzz and AFLNet, whereas other generative baselines (WGGFuzz, DiffusionFuzz, and TXL-Fuzz) used a fixed 2000 ms timeout. Upon detecting a prolonged timeout or receiving a TCP RST packet, the reset strategy across all baselines involved closing the socket and automatically re-establishing the three-way TCP connection. Notably, SD-Fuzz’s System Listening Module is designed to preserve and gracefully recalibrate the HMM-inferred state during this reconnection process rather than blindly restarting the generation loop.

3.2. Evaluation Metrics

Four widely accepted metrics in the protocol fuzzing community were used:

Test Case Recognition Rate (TCRR): the percentage of generated messages that the target successfully parses and responds to.
Anomaly Triggering Efficiency (ATE): the number of unique crashes and anomalies triggered per hour.
Distribution Generation Diversity (DGD): the ratio of unique function codes appearing in generated messages to the total function codes in the training data.
State Coverage: the ratio of actually exercised HMM state transition edges to the total inferred edges.
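The first three metrics reduce to simple ratios, sketched below with hypothetical function names. For DGD, the sketch counts only generated function codes that also occur in the training data; this is one reading of the definition and should be treated as an assumption.

```python
def tcrr(accepted: int, total: int) -> float:
    """Test Case Recognition Rate in percent."""
    return 100.0 * accepted / total


def ate(unique_anomalies: int, hours: float) -> float:
    """Anomaly Triggering Efficiency: unique anomalies per hour."""
    return unique_anomalies / hours


def dgd(generated_codes, training_codes) -> float:
    """Share of training-set function codes that also appear in generated messages."""
    training = set(training_codes)
    return len(set(generated_codes) & training) / len(training)
```

For example, `dgd([3, 3, 16, 99], [1, 2, 3, 4, 16])` evaluates to 0.4, since two of the five training codes are reproduced.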

Furthermore, to quantify vulnerability discovery performance, as shown in Table 2, the following metrics are defined: CTA (Categories of Triggered Anomalies), NTA (Number of Triggered Anomalies), ATITA (Average Time to Initially Trigger Anomaly, measured in hours), TNR (True Negative Rate), and TPR (True Positive Rate).

3.3. Main Experimental Results

3.3.1. Single-Message Legitimacy (TCRR)

The TCRR was evaluated from two perspectives: with a fixed number of test cases (10⁵) and within a fixed one-hour generation window. The results are presented in Figure 3 and Figure 4.

As illustrated in Figure 3, SD-Fuzz maintains a consistent lead from the outset, ultimately reaching a steady state with a mean TCRR of 91.3% across five independent runs. It demonstrates a notable performance advantage over all evaluated baselines. A Mann–Whitney U test confirms that this improvement is statistically significant (p < 0.05). This trend is further corroborated by the fixed-time results in Figure 4: within the first hour, over 90% of the messages generated by SD-Fuzz are successfully accepted and processed by the target, whereas the acceptance rates of the competing methods remain noticeably lower.

From a practical perspective, these results suggest that, while baseline methods may consume significant testing resources on invalid messages, SD-Fuzz is capable of generating high-quality traffic that appears legitimate to the target from the very first handshake.

3.3.2. Anomaly Triggering Efficiency (ATE) Comparison

As summarized in Table 2, the ATE results across the three Modbus/TCP targets indicate that SD-Fuzz generally outperforms the evaluated baselines, showing a measurable improvement over the nearest competitor, WGGFuzz. Beyond the overall ATE score, SD-Fuzz was observed to trigger a broader range of exception categories than other methods. Detailed statistics in Table 3 further reveal that SD-Fuzz successfully identified seven distinct categories of exceptions. Notably, this includes complex, high-severity vulnerabilities such as “Protocol state violation” and “Buffer overflow,” which typically require precise, multi-step state interactions to uncover. These findings suggest that the combination of high message legality (facilitated by the DDPM) and real-time state feedback (via the HMM) contributes to a more efficient exploration of deeper execution paths.

3.3.3. Generation Diversity (DGD)

As illustrated in Figure 5, SD-Fuzz demonstrates a notable performance advantage over all baselines from the early training stages, subsequently maintaining stable and high-level performance with minimal fluctuations. This trend suggests that SD-Fuzz is less susceptible to mode collapse than traditional generative models. This stability is likely attributed to two synergistic design components.

The implementation of a discrete DDPM, supported by 50-step DDIM sampling and temperature sampling, helps mitigate distribution collapse. Unlike GAN-based approaches, which may converge toward a limited set of patterns, the diffusion-based mechanism is inherently better suited to capture the full breadth of the data distribution.
The real-time constraint and online fine-tuning of the HMM emission matrix provide a mechanism to reactivate low-frequency but high-risk function codes (e.g., 15, 16, 22, and 24). While an unconditional DDPM might naturally favor common read operations (01–04), the HMM-guided correction effectively steers the generation process toward the functional space permitted by the current protocol state.

This dual-layer approach facilitates broader coverage across the function code space, providing a robust foundation for exploring long-tail execution paths and corner-case logic. Consequently, this enhanced diversity contributes to the detection of deep and complex vulnerabilities in actual protocol implementations.

3.3.4. State Transition Coverage

The state transition coverage results, presented in Figure 6, demonstrate the advantage of integrating an HMM for stateful protocol exploration. As observed, the performance gap between SD-Fuzz and the baselines is significantly more pronounced here than in the generation diversity (DGD) results (Figure 5). This discrepancy arises because, while stateless baselines can generate diverse single-packet syntax, they lack the contextual memory required to navigate multi-step session logic. Consequently, they exhibit minimal transition coverage, whereas SD-Fuzz maintains a steady upward trend in exploration.

Reaching a final coverage of 50.1%, SD-Fuzz shows that it can effectively probe more than half of the identified state transitions. This performance supports the hypothesis that a closed-loop feedback system is essential for industrial protocol fuzzing. Specifically, the continuous refinement of the HMM allows the framework to adapt to the target’s behavior, transforming each interaction into a learning opportunity that facilitates the discovery of unexplored state paths. This iterative process ultimately enables SD-Fuzz to reach deeper logic layers that remain largely inaccessible to stateless approaches.

3.3.5. Vulnerability Mining Results

The practical effectiveness of a fuzzing framework is best evaluated by its vulnerability discovery results. Table 4 details the number of exceptions triggered across three real-world, closed-source Modbus/TCP slave implementations under identical conditions (200,000 sequences per target, 8 h campaign, and fixed random seed). SD-Fuzz identified a total of 269 exceptions, representing an increase in exception count of 37.2% over TXL-Fuzz, 54.6% over WGGFuzz, and 81.8% over Peach Fuzzer.
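The totals and relative increases quoted above follow directly from the per-target counts in Table 4. The short check below reproduces the arithmetic; only the variable and function names are introduced here.

```python
# Per-target exception counts from Table 4, summed per method.
totals = {
    "SD-Fuzz": 146 + 66 + 57,
    "TXL-Fuzz": 98 + 57 + 41,
    "WGGFuzz": 82 + 51 + 41,
    "Peach Fuzzer": 61 + 42 + 45,
}


def relative_increase(ours, baseline):
    """Percentage increase of `ours` over `baseline`."""
    return 100.0 * (ours - baseline) / baseline


increases = {
    name: round(relative_increase(totals["SD-Fuzz"], count), 1)
    for name, count in totals.items()
    if name != "SD-Fuzz"
}
```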

These results suggest that the HMM-guided stateful generation and closed-loop feedback mechanism provide a distinct advantage in uncovering deep-seated vulnerabilities that are often difficult for traditional stateless approaches to access. These findings indicate that SD-Fuzz can enhance the efficiency of automated vulnerability discovery in industrial control protocols, offering a robust tool for improving the security posture of real-world ICS deployments.

3.3.6. Vulnerability Case Studies

To substantiate the vulnerability mining results reported in our evaluation, we provide a detailed root-cause analysis of a high-severity memory corruption vulnerability discovered by SD-Fuzz in a tested Modbus/TCP implementation.

The vulnerability is a stack/heap buffer overflow that can only be reached after the target enters an active operational state. Dynamic analysis revealed that SD-Fuzz successfully navigated the target into this specific state using a sequence of legitimate requests (e.g., FC = 0x03 and FC = 0x06). Once the state was activated, SD-Fuzz delivered a malformed “Write Multiple Registers” (FC = 0x10) packet. The minimized triggering input is shown in Listing 1.

Listing 1. Minimized triggering payload (hex).

MBAP : 00 2A 00 00 00 17 01
PDU : 10 00 01 FF FF 08 AA AA AA AA AA AA AA AA

This generated payload maliciously sets the Quantity of Registers to 0xFFFF while providing a Byte Count of only 0x08, creating a critical format inconsistency.
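The inconsistency can be made explicit by decoding the fields of Listing 1. The sketch below uses the standard Modbus/TCP layout (MBAP header: transaction ID, protocol ID, length, unit ID; FC 0x10 request: starting address, quantity, byte count, register values). The variable names are chosen for this example only.

```python
import struct

MBAP_HEX = "00 2A 00 00 00 17 01"
PDU_HEX = "10 00 01 FF FF 08 AA AA AA AA AA AA AA AA"

mbap = bytes.fromhex(MBAP_HEX)
pdu = bytes.fromhex(PDU_HEX)

# All multi-byte Modbus fields are big-endian.
transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", mbap)
function_code, start_addr, quantity, byte_count = struct.unpack(">BHHB", pdu[:6])
register_data = pdu[6:]

# A well-formed request carries two bytes per register.
expected_byte_count = quantity * 2
is_consistent = byte_count == expected_byte_count == len(register_data)
```

Here `quantity` decodes to 0xFFFF, so a consistent request would need 0x1FFFE bytes of register data, whereas the packet declares and carries only 8. The value 0x1FFFE is the same copy length that appears in the crash log in Listing 2.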

We reproduced the crash deterministically under a QEMU ARM64 full-system emulation environment. By attaching pwndbg to the process, we captured the exact moment of the crash (Listing 2).

Listing 2. Crash log and stack trace captured via pwndbg.

Program received signal SIGSEGV, Segmentation fault.
=> 0x0000aaaab6f1c7a0 <__memcpy_aarch64+160>: str q0, [x0], #0x10

Registers:
x0  0x000000000041fff0 (dst)
x2  0x000000000001fffe (len) <- Derived from attacker’s Quantity
lr  0x0000000000402e9c <process_write_multiple_req+220>

[+] Backtrace:
#0 0x0000aaaab6f1c7a0 in __memcpy_aarch64()
#1 0x0000000000402e9c in process_write_multiple_req()
#2 0x0000000000402190 in modbus_tcp_handle_request()
#3 0x00000000004018f4 in tcp_worker_loop()
#4 0x0000000000401230 in main()

Root-Cause Analysis: In the vulnerable process_write_multiple_req() function, the target extracts the Quantity of Registers field from the attacker-controlled packet and directly uses it to compute the length of a memory copy operation, as shown in Listing 3:

Listing 3. Vulnerable memory copy operation without bounds checking.

copy_len = quantity * 2; // No check against ByteCount or map bounds
memcpy(dst, src, copy_len);

Because the software lacked the necessary consistency validation between Quantity of Registers and Byte Count, and failed to verify the bounds of the destination buffer, the massive copy_len (0x1FFFE bytes) triggered an out-of-bounds write. This classic memory corruption ultimately led to a segmentation fault.
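The missing checks can be summarized in a short sketch, written in Python for brevity rather than in the target's own language. It is not the vendor's fix. The quantity limit (1 to 123 registers for FC 0x10) and the exception codes 0x02 and 0x03 follow the Modbus application protocol, while the function name and the `map_size` parameter (size of the register map, in registers) are hypothetical.

```python
MAX_WRITE_REGISTERS = 0x7B      # upper limit on Quantity for FC 0x10
ILLEGAL_DATA_ADDRESS = 0x02     # Modbus exception code
ILLEGAL_DATA_VALUE = 0x03       # Modbus exception code


def check_write_multiple(start_addr, quantity, byte_count, payload_len, map_size):
    """Return None if the request is safe to copy, else a Modbus exception code."""
    # Quantity must lie within the range allowed for this function code.
    if not 1 <= quantity <= MAX_WRITE_REGISTERS:
        return ILLEGAL_DATA_VALUE
    # Byte Count must agree with Quantity and with the bytes actually received.
    if byte_count != quantity * 2 or payload_len != byte_count:
        return ILLEGAL_DATA_VALUE
    # The written range must fit inside the register map (hypothetical size).
    if start_addr + quantity > map_size:
        return ILLEGAL_DATA_ADDRESS
    return None


# The fields of the triggering payload, checked against a 1024-register map.
verdict = check_write_multiple(0x0001, 0xFFFF, 0x08, 8, map_size=1024)
```

With these checks in place, the triggering request would be rejected with an exception response before any copy length is derived from the Quantity field.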

This concrete case study demonstrates SD-Fuzz’s capability to orchestrate complex, multi-step state transitions and deliver precise syntactic mutations, successfully exposing deep-seated memory safety violations that remain largely inaccessible to conventional stateless fuzzers.

3.3.7. Portability and Protocol Migration

To evaluate the generalization capability of SD-Fuzz, we applied the framework to two additional industrial protocols, EtherCAT and DNP3, without modifying the underlying model architecture. The performance metrics obtained following this migration are summarized in Table 5.

The results indicate that SD-Fuzz effectively adapts to diverse protocol structures within a reasonable training window. Specifically, it achieved a TCRR of 89.4% for EtherCAT and 90.2% for DNP3 while maintaining state coverage levels near 50%. Notably, the framework successfully identified implementation-specific vulnerabilities, such as a heap overflow in EtherCAT and a null pointer dereference in DNP3. These findings suggest that SD-Fuzz possesses the robustness to extract meaningful state representations and identify deep-seated bugs across various industrial communication standards with minimal manual intervention.

3.4. Ablation Study

The contribution of each core component to the overall performance of SD-Fuzz is evaluated through an ablation study, with the results summarized in Table 6.

The data show that removing any single module leads to a measurable decline in performance. Specifically, excluding the DDPM reduces the TCRR from 91.3% to 79.2%, highlighting its critical role in generating protocol-compliant traffic. More notably, without HMM state guidance, the state coverage falls to 0.0%, underscoring that the HMM is the primary driver for navigating the protocol’s state machine. Additionally, while removing the DMM lightweight mutation has a minimal impact on the TCRR (91.1% versus 91.3%), it leads to a substantial reduction in state coverage (from 50.1% to 34.4%). This suggests that fine-grained mutations are essential for exploring the boundary conditions of the state space. Consequently, these results indicate that all three components are synergistic and essential for the framework’s effectiveness.

3.5. Threats to Validity

Internal validity may be affected because the reported state coverage relies exclusively on HMM-inferred transitions rather than actual internal firmware code coverage (e.g., basic block or edge coverage). Given the closed-source nature of the commercial test targets, true code execution paths could not be readily obtained. External validity is limited because all experiments used software emulators and virtual machines rather than physical PLC hardware. Regarding construct validity, the TCRR reliably reflects syntactic acceptance but does not fully capture semantic correctness in every industrial scenario.

4. Discussion

4.1. Performance and State Awareness

The experimental results demonstrate that SD-Fuzz outperforms established baselines, particularly in achieving a balance between message legitimacy and state exploration depth. The transition from near-zero coverage in stateless methods to 50.1% underscores a critical shift: while diffusion models like DiffusionFuzz excel at capturing the syntax of individual packets, they lack the temporal coherence required for industrial session logic. SD-Fuzz addresses this by using the HMM as a dynamic provider of state-transition constraints. This suggests that the “generate-then-enforce” strategy is not merely a refinement but a necessary architecture for fuzzing protocols where the validity of a command is highly dependent on the preceding sequence of events.

4.2. Generalization Across Heterogeneous Protocols

The portability results on EtherCAT and DNP3 (TCRR > 89%) are significant. In the context of Industrial Internet of Things (IIoT), the diversity of proprietary protocols often renders manual reverse engineering impractical. The ability of SD-Fuzz to extract meaningful state representations with “zero-code modification” suggests that the underlying DDPM-HMM coupling captures the fundamental hierarchical nature of industrial communications, where the physical layer may vary, but the state-dependent command logic follows consistent patterns.

4.3. Critical Analysis of Limitations

Despite these advantages, the results must be interpreted with caution in light of certain constraints. The reliance on HMM-inferred states for coverage metrics, while necessary in a black-box setting, may not fully reflect the internal code coverage of the target firmware. Furthermore, while software emulators provided a controlled environment for these experiments, physical PLCs introduce asynchronous timing jitter and hardware-level interrupts that could influence the real-time feedback loop of the HMM. Future efforts will focus on cross-validating these results using hardware-in-the-loop (HIL) testing and binary-level feedback (e.g., via JTAG or ASan-instrumented firmware).

5. Conclusions

This paper presented SD-Fuzz, a state-aware fuzzing framework for industrial control protocols that integrates a discrete denoising diffusion probabilistic model (DDPM) with an online Hidden Markov Model (HMM). The framework combines the stable generation capability of the discrete DDPM with real-time state transition learning from the HMM to produce both syntactically valid messages and logically consistent multi-step interaction sequences.

Experimental results on three representative Modbus/TCP slave implementations showed that SD-Fuzz achieved a test case recognition rate (TCRR) of 91.3% and a state transition coverage of 50.1%. It also triggered more exceptions, including buffer overflows and protocol state violations, than several state-of-the-art baselines. An ablation study confirmed that the DDPM, HMM, and lightweight mutation module each contribute to the overall performance of the framework. In addition, the same implementation demonstrated reasonable portability when retrained on EtherCAT and DNP3 traffic.

The proposed approach requires no manual protocol specification and operates solely on captured real traffic, making it applicable to both open and proprietary industrial protocols. However, the current evaluation was conducted exclusively on software emulators and virtual machines. Validation on physical PLC hardware remains necessary for a more comprehensive assessment.

Future work will focus on two directions: (1) incorporating lightweight runtime code coverage feedback to further guide the fuzzing process and (2) exploring conditional or hierarchical diffusion models to enable more integrated generation of sequences and states.

Author Contributions

H.T.: Writing—original draft, Methodology, Investigation, Validation. Z.Z.: Conceptualization, Resources, Funding acquisition, Project administration, Supervision. K.Z.: Formal analysis, Writing—review and editing. Z.L.: Investigation, Formal analysis, Data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Henan Province Key Research and Development Project (241111211400); in part by the Henan Province Science and Technology Research Project (242102211077); and in part by the Henan Province University Key Scientific Research Project (23A520008).

Data Availability Statement

Data will be publicly released upon acceptance of this manuscript to facilitate full replication and further research. https://github.com/ssfth6/SDFuzz (accessed on 10 May 2026).

Acknowledgments

During the preparation of this manuscript, the authors used Gemini (https://gemini.google.com, accessed on 15 December 2025) solely for the purposes of grammar checking. The authors have reviewed and edited the output and take full responsibility for the technical content, algorithms, experimental design, and result interpretation of this publication.

Conflicts of Interest

The authors declare no potential conflicts of interest.

References

1. Karnik, N.; Bora, U.; Bhadri, K.; Kadambi, P.; Dhatrak, P. A comprehensive study on current and future trends towards the characteristics and enablers of industry 4.0. J. Ind. Inf. Integr. 2022, 27, 100294.
2. Anton, S.D.D.; Fraunholz, D.; Krohmer, D.; Reti, D.; Schneider, D.; Schotten, H.D. The global state of security in industrial control systems: An empirical analysis of vulnerabilities around the world. IEEE Internet Things J. 2021, 8, 17525–17540.
3. Beaman, C.; Redbourne, M.; Mummery, J.D.; Hakak, S. Fuzzing vulnerability discovery techniques: Survey, challenges and future directions. Comput. Secur. 2022, 120, 102813.
4. Zuo, F.; Luo, Z.; Yu, J.; Chen, T.; Xu, Z.; Cui, A.; Jiang, Y. Vulnerability detection of ICS protocols via cross-state fuzzing. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2022, 41, 4457–4468.
5. Zhang, X.; Zhang, C.; Li, X.; Du, Z.; Mao, B.; Li, Y.; Zheng, Y.; Li, Y.; Pan, L.; Liu, Y.; et al. A survey of protocol fuzzing. ACM Comput. Surv. 2024, 57, 35.
6. Luo, Z.; Zuo, F.; Shen, Y.; Jiao, X.; Chang, W.; Jiang, Y. ICS protocol fuzzing: Coverage guided packet crack and generation. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC); IEEE: New York, NY, USA, 2020; pp. 1–6.
7. Lin, P.-Y.; Huang, T.-C.; Tien, C.-W. ICPFuzzer: Proprietary communication protocol fuzzing by using machine learning and feedback strategies. Cybersecurity 2021, 4, 28.
8. Pham, V.-T.; Böhme, M.; Roychoudhury, A. Aflnet: A greybox fuzzer for network protocols. In Proceedings of the 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST); IEEE: New York, NY, USA, 2020.
9. Hu, Z.; Shi, J.; Huang, Y.; Xiong, J.; Bu, X. GANFuzz: A GAN-based industrial network protocol fuzzing framework. In Proceedings of the 15th ACM International Conference on Computing Frontiers, Ischia, Italy, 8–10 May 2018; pp. 138–145.
10. Zhao, H.; Li, Z.; Wei, H.; Shi, J.; Huang, Y. SeqFuzzer: An industrial protocol fuzzing framework from a deep learning perspective. In Proceedings of the 2019 12th IEEE Conference on Software Testing, Validation and Verification (ICST); IEEE: New York, NY, USA, 2019; pp. 59–67.
11. Yang, H.; Huang, Y.; Zhang, Z.; Li, F.; Gupta, B.B.; VijayaKumar, P. A novel generative adversarial network-based fuzzing cases generation method for industrial control system protocols. Comput. Electr. Eng. 2024, 117, 109268.
12. Villa, C.; Doumanidis, C.; Lamri, H.; Rajput, P.H.N.; Maniatakos, M. ICSQuartz: Scan Cycle-Aware and Vendor-Agnostic Fuzzing for Industrial Control Systems. In Proceedings of the Network and Distributed System Security (NDSS) Symposium, San Diego, CA, USA, 24–28 February 2025; Available online: https://www.ndss-symposium.org/wp-content/uploads/2025-795-paper.pdf (accessed on 15 December 2025).
13. Ezeobi, U.; Hounsinou, S.; Olufowobi, H.; Zhuang, Y.; Bloom, G. MCFICS: Model-based Coverage-guided Fuzzing for Industrial Control System Protocol Implementations. In Proceedings of the IECON 2024-50th Annual Conference of the IEEE Industrial Electronics Society; IEEE: New York, NY, USA, 2024; pp. 1–6.
14. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning (ICML); PMLR: New York, NY, USA, 2017; pp. 214–223. Available online: https://www.mlmi.eng.cam.ac.uk/files/gong_dissertation_reduced.pdf (accessed on 19 December 2025).
15. Luo, Z.; Zuo, F.; Jiang, Y.; Gao, J.; Jiao, X.; Sun, J. Polar: Function code aware fuzz testing of ics protocol. ACM Trans. Embed. Comput. Syst. (TECS) 2019, 18, 93.
16. Wang, J.; Chen, B.; Wei, L.; Liu, Y. Superion: Grammar-aware greybox fuzzing. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE); IEEE: New York, NY, USA, 2019; pp. 724–735.
17. Shang, Z.; Garbelini, M.E.; Chattopadhyay, S. U-fuzz: Stateful fuzzing of iot protocols on cots devices. In Proceedings of the 2024 IEEE Conference on Software Testing, Verification and Validation (ICST); IEEE: New York, NY, USA, 2024; pp. 209–220.
18. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851.
19. Zong, X.; Luo, W.; Ning, B.; He, K.; Lian, L.; Sun, Y. DiffusionFuzz: Fuzzing framework of industrial control protocols based on denoising diffusion probabilistic model. IEEE Access 2024, 12, 67795–67808.
20. Nichol, A.Q.; Dhariwal, P. Improved denoising diffusion probabilistic models. In Proceedings of the International Conference on Machine Learning; PMLR: New York, NY, USA, 2021; pp. 8162–8171. Available online: https://www.academia.edu/72572601/Improved_Denoising_Diffusion_Probabilistic_Models (accessed on 17 December 2025).
21. Natella, R.; Pham, V.T. Profuzzbench: A benchmark for stateful protocol fuzzing. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, Aarhus, Denmark, 11–17 July 2021; pp. 662–665.
22. Al Sardy, L.; Prasad, A.R.; German, R. i7Fuzzer: Neural-Guided Fuzzing for Enhancing Security Testing of Stateful Protocols. In Proceedings of the International Conference on Computer Safety, Reliability, and Security; Springer Nature: Cham, Switzerland, 2025; pp. 115–128.
23. Pereyda, J. Boofuzz: Network Protocol Fuzzing for Humans. GitHub. 2016. Available online: https://github.com/jtpereyda/boofuzz (accessed on 17 December 2025).
24. Tsankov, P.; Dashti, M.T.; Basin, D. SECFUZZ: Fuzz-testing security protocols. In Proceedings of the 2012 7th International Workshop on Automation of Software Test (AST); IEEE: New York, NY, USA, 2012; pp. 1–7.
25. Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502.
26. Godefroid, P.; Peleg, H.; Singh, R. Learn&fuzz: Machine learning for input fuzzing. In Proceedings of the 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE); IEEE: New York, NY, USA, 2017; pp. 50–59.
27. Lan, J.; Chen, C.; Cai, J.; Ming, X.; Li, M.; Wang, Y.; Zhang, Y.; Song, Y. ConDiffFuzz: Dependency-Aware Consistency Checking for Differential Fuzzing of Industrial Control Protocol Implementations. Electronics 2026, 15, 1324.
28. Chen, L.; Wang, Y.; Xiang, X.; Jin, D.; Ren, Y.; Zhang, Y.; Pan, Z.; Chen, Y. Txl-fuzz: A long attention mechanism-based fuzz testing model for industrial iot protocols. IEEE Internet Things J. 2024, 11, 38238–38245.

Figure 1. Overview of the SD-Fuzz stateful fuzzing framework.

Figure 2. Modbus/TCP message format. The dashed lines illustrate the encapsulation of the Modbus/TCP Application Data Unit (ADU) within the data payload of the underlying TCP frame.

Figure 3. Mean test case recognition rate (TCRR) under a fixed number of fuzzing test cases across 5 independent runs.

Figure 4. Mean test case recognition rate (TCRR) corresponding to test cases generated within a fixed one-hour window across 5 independent runs.

Figure 5. Variation curve of mean generated message diversity (DGD) over training epochs across 5 independent runs.

Figure 6. Mean HMM-inferred state transition coverage versus testing time across 5 independent runs.

Table 1. Common Modbus/TCP function codes used in the experiments.

Function Code (Decimal)	Description	Operation Type
01	Read Coils	Bit Read
02	Read Discrete Inputs	Bit Read
03	Read Holding Registers	Word Read
04	Read Input Registers	Word Read
05	Write Single Coil	Bit Write
06	Write Single Register	Word Write
15	Write Multiple Coils	Bulk Bit Write
16	Write Multiple Registers	Bulk Word Write

Table 2. Detailed experimental data comparison of SD-Fuzz and baselines (values represent the mean of 5 independent runs).

Test Model	Test Target	CTA (Categories)	NTA (Total Count)	ATITA (h)	TNR (%)	TPR (%)	ATE (Times/h)
SD-Fuzz	ModbusRSSim	7	225	0.60	95.28	94.15	1.48
	ModbusSlave	5	68	1.99	94.72	93.88	1.12
	xMasterSlave	6	85	1.59	95.05	94.02	1.35
WGGFuzz	ModbusRSSim	6	198	0.74	93.36	92.81	1.31
	ModbusSlave	4	57	2.58	92.54	92.39	0.99
	xMasterSlave	5	70	2.10	93.10	92.65	1.23
Peach Fuzzer	ModbusRSSim	4	92	1.83	85.42	89.76	0.68
	ModbusSlave	3	32	5.25	84.39	88.91	0.45
	xMasterSlave	4	48	3.50	85.87	89.45	0.58
AFLNet	ModbusRSSim	5	105	1.45	91.23	91.78	0.92
	ModbusSlave	4	45	3.38	90.65	91.12	0.72
	xMasterSlave	5	58	2.62	91.47	91.54	0.86
DiffusionFuzz	ModbusRSSim	5	112	1.01	92.85	92.10	0.98
	ModbusSlave	4	52	2.17	91.92	91.87	0.84
	xMasterSlave	5	67	1.69	92.48	92.03	0.96

Table 3. Statistics on exceptions triggered by SD-Fuzz (values represent the mean of 5 independent runs).

Exception Type	Frequency (Counts)	ATITA (h)	Target Count	Vulnerability Nature
Slave crash	38	0.36	6	Denial of Service
Station ID offline	92	0.15	5	Logic Error
Abnormal function code	58	0.23	6	Input Validation
Window auto-close	45	0.30	6	Severe Crash
Data length unmatched	83	0.16	6	Format Error
Abnormal address	52	0.26	5	Boundary Check
Integer overflow	12	1.13	5	Numerical Error
Protocol state violation	27	0.50	3	Logic/Timing
Buffer overflow	8	1.69	1	Memory Corruption

Table 4. Mean number of exceptions triggered by different methods in the 8 h vulnerability mining campaign (across 5 independent runs).

Model	Number of Test Cases	Target Application	Number of Exceptions
SD-Fuzz	200,000	Modbus RSSIM2 v8.21.2.7	146
		Modbus Slave v6.1.3	66
		Modbus Poll v7.0.1	57
Peach Fuzzer	200,000	Modbus RSSIM2 v8.21.2.7	61
		Modbus Slave v6.1.3	42
		Modbus Poll v7.0.1	45
WGGFuzz	200,000	Modbus RSSIM2 v8.21.2.7	82
		Modbus Slave v6.1.3	51
		Modbus Poll v7.0.1	41
TXL-Fuzz [28]	200,000	Modbus RSSIM2 v8.21.2.7	98
		Modbus Slave v6.1.3	57
		Modbus Poll v7.0.1	41

Table 5. Performance after migration to other protocols.

Protocol	Training Time	TCRR	State Coverage	Example Bug Found
EtherCAT	4.5 h	89.4%	48.7%	Heap overflow on master timeout
DNP3	5.2 h	90.2%	47.3%	Null pointer after lost link confirm

Table 6. Ablation results (Modbus Slave target, 270 k test cases).

Configuration	TCRR	State Coverage
SD-Fuzz (full)	91.3%	50.1%
– DDPM	79.2%	0.0%
– HMM state guidance	88.6%	0.0%
– DMM lightweight mutation	91.1%	34.4%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tang, H.; Zhang, Z.; Zhao, K.; Liang, Z. SD-Fuzz: A State-Aware Industrial Control Protocol Fuzzing Framework Based on Diffusion Models. Electronics 2026, 15, 2156. https://doi.org/10.3390/electronics15102156

