Article

ECHO: Enhancing Linux Kernel Fuzzing via Call Stack-Aware Crash Deduplication

1 College of Electronic and Communication Engineering, Tianjin Normal University, Tianjin 300387, China
2 College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(14), 2914; https://doi.org/10.3390/electronics14142914
Submission received: 13 June 2025 / Revised: 14 July 2025 / Accepted: 18 July 2025 / Published: 21 July 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

Fuzz testing plays a key role in improving Linux kernel security, but large-scale fuzzing often generates a high number of crash reports, many of which are redundant. These duplicated reports burden triage efforts and delay the identification of truly impactful bugs. Syzkaller, a widely used kernel fuzzer, clusters crashes using instruction pointers and sanitizer metadata. However, this heuristic may misgroup distinct issues or split similar ones caused by the same root cause. To address this, we present ECHO, a lightweight call stack-based deduplication tool that analyzes structural similarity among kernel stack traces. By computing the longest common subsequence (LCS) between normalized call stacks, ECHO groups semantically related crashes and improves post-fuzzing analysis. We integrate ECHO into the Syzkaller fuzzing workflow and use it to prioritize inputs that trigger deeper, previously untested kernel paths. Evaluated across multiple Linux kernel versions, ECHO improves average code coverage by 15.2% and discovers 20 previously unknown bugs, all reported to the Linux kernel community. Our results highlight that stack-aware crash grouping not only streamlines triage, but also enhances fuzzing efficiency by guiding seed selection toward unexplored execution paths.

1. Introduction

Ensuring the security and stability of the Linux kernel is essential for operating system reliability. As the core component responsible for managing system resources and interacting with hardware, the Linux kernel is a critical focus for bug discovery and vulnerability research. In recent years, fuzz testing [1,2,3,4,5] has emerged as a powerful approach to automatically uncovering kernel bugs, with Syzkaller [6] standing out as one of the most effective tools for syscall-aware, coverage-guided fuzzing.
However, the effectiveness of large-scale fuzzing campaigns can be hindered by the vast number of crash reports produced. Each test input that causes a kernel failure results in a crash log, many of which are generated by the same root cause or minor differences in execution context. Without careful deduplication, this can lead to repetitive debugging work, inflated bug counts, and inefficient use of developer effort. Such redundancy not only burdens maintainers and security analysts but also delays the triage and resolution of genuinely distinct and high-impact vulnerabilities.
To manage this challenge, Syzkaller adopts a lightweight deduplication strategy based on the faulting instruction pointer (IP) and runtime metadata from tools such as KASAN [7] and KCSAN [8]. This approach is computationally efficient and easy to apply across diverse kernels and environments. However, it is inherently limited in resolution. Minor context differences, such as interrupt-driven changes to stack frames or nondeterministic scheduling, can cause crashes with the same root cause to appear distinct. Conversely, different bugs may be grouped together if they result in similar faults near the same instruction pointer.
In this paper, we present ECHO, a call stack-based crash deduplication tool that improves Syzkaller’s crash grouping accuracy by analyzing the structural similarity of kernel stack traces. Our insight is that kernel call stacks offer a semantically richer view of runtime behavior than single-instruction pointers. By leveraging the sequence and structure of function calls, we can better determine whether two crash reports stem from the same underlying issue.
ECHO applies a lightweight analysis that normalizes stack traces and computes their longest common subsequence (LCS) to evaluate similarity. Crash reports are then grouped by shared call patterns, making the approach resilient to variations in the top stack frames caused by asynchronous behavior like interrupts or scheduling. Unlike ML-based clustering methods, ECHO does not rely on symbol resolution, training data, or kernel-specific embeddings. This makes it practical, portable, and broadly applicable across different Linux kernel versions and compilation environments.
We further integrate ECHO into the Syzkaller fuzzing loop to enable online deduplication and provide feedback to the input scheduler. When ECHO detects that a new crash does not match previously clustered faults, it triggers prioritization of the corresponding seed input. This encourages deeper exploration into previously uncovered or semantically distinct execution paths, improving the diversity and coverage of the fuzzing campaign.
Our evaluation spans four major versions of the Linux kernel and targets complex and high-variability subsystems such as networking and file systems. The results show that ECHO improves average code coverage by 15.2%, significantly reduces redundant crash reports, and leads to the discovery of 20 previously unknown bugs. All discovered vulnerabilities have been responsibly disclosed to the Linux kernel community.
In summary, this paper makes the following contributions:
  • We propose a call stack-based crash deduplication approach that improves grouping accuracy compared to Syzkaller’s IP-based heuristic by analyzing structural similarity in stack traces.
  • We implement ECHO as a lightweight and effective crash analysis tool, enabling more accurate crash clustering and feedback-driven fuzzing.
  • We release ECHO as open source at https://github.com/rushwow/ECHO (accessed on 13 June 2025) and demonstrate its effectiveness through real-world kernel fuzzing, where it discovered 20 new bugs and improved average coverage by 15.2%.
The rest of this paper is organized as follows. Section 2 introduces related work and motivation. Section 3 presents the methodology of ECHO. Section 4 describes the implementation of ECHO. Section 5 presents our evaluation and compares ECHO with other tools. Section 6 provides a discussion. Section 7 concludes the paper.

2. Related Work and Motivation

2.1. Kernel Fuzzing

Kernel fuzzing has become a cornerstone in uncovering security vulnerabilities within operating system kernels, particularly the Linux kernel. Among the many fuzzers developed for this task, Syzkaller [6] has emerged as a leading solution, leveraging system call awareness and coverage-guided fuzzing to discover a vast number of bugs. Its ability to automatically generate syntactically valid syscall sequences has proven effective in exposing flaws in device drivers, memory subsystems, and network stacks. Trinity [9] and kAFL [10] further explore various syscall fuzzing strategies and performance optimizations.
To further improve fuzzing efficiency, recent work incorporates static and dynamic analyses. Moonshine [11] distills real-world execution traces into seed programs, while HFL [12] employs symbolic execution to navigate rarely triggered paths. Healer [13] uses relation learning over syscall sequences to enrich semantic depth. Similarly, KSG [14] leverages eBPF [15] to infer syscall argument constraints, generating valid Syzlang [16] programs. These approaches collectively enhance Syzkaller’s ability to explore complex kernel behaviors.
Complementing the syscall-centric strategies, SyzScope [17] explores vulnerability reachability based on prior kernel versions, while UniFuzz [18] uses kernel state abstraction to focus fuzzing on semantically rich regions. Both techniques aim to improve semantic reasoning and exploitability analysis, aligning with our goal of deeper triage. StackMine [19] and S3M [20] contribute to crash trace analysis. StackMine uses statistical pattern mining over kernel stack traces, whereas S3M embeds function-level semantics into vector representations to reveal deeper crash similarities. These ideas inspire ECHO’s semantically informed deduplication strategy.

2.2. Crash Deduplication Strategies

Despite progress in fuzzing, a key limitation remains: the proliferation of redundant crash reports. Syzkaller uses faulting instruction pointers and KASAN metadata for lightweight deduplication, which struggles to differentiate semantically equivalent crashes. While Moonshine and kAFL improve seed generation and execution performance, they lack mechanisms for post-crash triage. This motivates the need for more principled and semantic-aware deduplication methods.
ECHO integrates call stack–aware deduplication into the fuzzing loop. It analyzes structural similarities in stack traces to suppress redundant crashes early, thus enhancing seed prioritization. Kernel stack traces are often noisy due to asynchronous behavior—unlike the stable stack hashes seen in AFL [21]—and may include context switches or IRQ handlers that obscure true crash origins. This necessitates a more robust strategy that interprets kernel-specific stack patterns beyond surface-level features.
Crash deduplication clusters similar crash instances to reduce developer burden. In user-space testing, common strategies compare crash signatures such as program counters or simplified stack traces. Tools like CRASHSCOPE [22] and CrashFinder [23] apply these techniques to identify duplicate bugs. In contrast, kernel fuzzing systems rarely use stack traces systematically. StackMine analyzes kernel stack patterns statistically but lacks semantic modeling. Distance-based approaches like edit distance or LCS are also challenged by the variability and noise in kernel logs.
Machine learning methods such as siamese networks [24] and neural models like StackDedup [25] have been used for crash clustering, mainly in user-space. However, their reliance on large training data and consistent naming conventions makes them less applicable to kernel contexts. Furthermore, their embedding-based models lack interpretability—an essential property for debugging and patch validation. ECHO instead uses a structure-aware approach that combines normalized stack traces, sequence similarity, and fuzzing feedback. Free from heuristics and opaque embeddings, it generalizes across kernel versions. By integrating deduplication into fuzzing scheduling, ECHO improves both triage accuracy and coverage depth.

2.3. Motivation

Despite the success of kernel fuzzers such as Syzkaller in uncovering critical security vulnerabilities, their practical value is often diminished by the substantial number of redundant crash reports. This redundancy stems from the fuzzer’s inherent randomness and the complexity of the Linux kernel’s control flow, which often leads to multiple syntactically different crash traces that are semantically equivalent. Without a reliable deduplication mechanism, developers and security analysts are left to manually triage thousands of similar bugs, significantly increasing analysis cost and decreasing the overall effectiveness of fuzzing campaigns.
To tackle this issue, we propose ECHO, a fuzzing tool that integrates a robust call stack-based crash deduplication system directly into the fuzzing loop. ECHO treats kernel stack traces as structured execution artifacts and groups crashes based on structural and contextual similarities within their call stacks. This call stack-aware perspective allows the system to distinguish genuine bug diversity from superficial variations, thereby increasing triage efficiency and guiding the fuzzer away from repeatedly exercising already-explored fault paths.

To illustrate the motivation behind our approach, consider two different crash traces observed during kernel fuzzing, as shown in Figure 1. At first glance, the two stack traces may seem unrelated: they are triggered through different APIs and show different top frames. However, both may result from the same root cause, such as a use-after-free in the ip6_fragment function. Existing heuristics in Syzkaller, which rely mainly on the faulting instruction pointer, often treat such cases as distinct due to small differences in context, leading to unnecessary duplication in bug analysis.

ECHO takes a different approach by comparing the overall structure of the call stacks. Instead of depending on specific frames or fixed positions, it analyzes common call patterns using the longest common subsequence (LCS). This method is more robust against noise from interrupts or helper functions and better captures the semantic similarity between crashes. As fuzzing continues, ECHO dynamically adjusts its similarity threshold to group crashes more effectively, starting broad to reduce early noise and becoming more precise as more data accumulate. This allows Syzkaller to focus on discovering new and meaningful bugs, rather than repeatedly analyzing similar crashes caused by the same issue.
Designing such a deduplication strategy poses several technical challenges. Kernel stack traces are often unstable due to preemption, inlining, and hardware interrupt behavior, all of which can introduce noncanonical frames. Moreover, symbol names may be cryptic or vendor-specific, requiring normalization and symbolic resolution to ensure consistency. These complexities demand a lightweight but accurate pipeline for stack parsing, normalization, and comparison. ECHO addresses these issues through a carefully designed multi-stage pipeline that sanitizes traces, extracts canonical frame sequences, and computes similarity using a hybrid metric based on longest common subsequence (LCS) and coverage feedback. No pretrained semantic models are used—ECHO instead relies on symbolic structure and execution feedback for efficient and transparent deduplication. The crash clustering results produced by ECHO are further integrated into the fuzzing loop to prioritize mutation of seeds that generate novel call paths or lead to crashes associated with previously unseen clusters. This feedback loop ensures that exploration is continually driven toward unexplored behaviors and untriggered kernel paths. Compared with prior fuzzers that rely purely on coverage-based feedback, ECHO provides an orthogonal axis of differentiation based on fault diversity, which significantly improves fuzzing efficiency and reduces analyst workload.

3. Methodology

3.1. Overview Design

We propose ECHO, a kernel fuzzer that incorporates call stack-based crash deduplication to optimize the fuzzing process, enhancing both the quality of the input corpus and the overall code coverage. As depicted in Figure 2, ECHO operates through a two-stage pipeline: Call Stack Deduplication and Corpus Refinement. These stages are strategically designed to complement each other, ensuring that redundant crashes are minimized, while allowing the fuzzer to focus on novel and deeper kernel code paths.
The first stage, Call Stack Deduplication, aims to reduce noise in the fuzzing process by grouping similar crash reports together based on call stack analysis. This deduplication process ensures that the fuzzer does not waste resources on revisiting already discovered bugs. Instead, it enables ECHO to refine its focus on new crash scenarios, enhancing the overall bug detection efficiency. By accurately distinguishing between distinct kernel faults, ECHO accelerates the identification of unique vulnerabilities, ensuring faster feedback and more productive fuzzing cycles.
In the second stage, Corpus Refinement, ECHO builds upon the deduplication results to continuously improve the seed corpus. Using feedback from previous fuzzing cycles, it refines the inputs by prioritizing those that trigger new code paths or provide unique crash information. This iterative process ensures that the fuzzer continues to explore deeper kernel logic and uncover bugs that might otherwise remain hidden. Together, these two stages form a complete workflow, wherein the fuzzer’s focus shifts from redundant crashes to new, complex vulnerabilities, ultimately increasing both the breadth and depth of kernel code coverage.
To better illustrate the modular structure of ECHO and its integration with existing fuzzing and monitoring infrastructure, we provide a component-level architecture diagram in Figure 3. This figure complements the high-level workflow in Figure 2 by detailing how the internal modules of ECHO interact with the underlying kernel fuzzing framework and runtime instrumentation components. As shown in Figure 3, ECHO builds upon a Syzkaller-based fuzzing engine, where the input generation and execution environment are inherited from Syzkaller’s infrastructure. Test cases are executed in a QEMU-based virtual machine, with KCOV collecting coverage information and KASAN detecting memory violations. These runtime signals, along with crash logs, are passed to the deduplication module of ECHO, which normalizes the stack traces and clusters crash instances based on structural similarity. The resulting clusters, combined with coverage feedback, are used to guide the prioritization of future test inputs. This componentized design enables ECHO to introduce a lightweight and effective deduplication strategy, while remaining compatible with existing kernel fuzzing pipelines.

3.2. Call Stack Deduplication

The first phase of ECHO, Call Stack Deduplication, is designed to enhance the fuzzing process by reducing redundant crash reports, improving seed corpus quality, and accelerating bug discovery. This phase generates high-quality system call inputs, executes them in a virtualized kernel environment, parses the resulting crashes, and performs call stack-based deduplication. Crashes fall into two categories: (1) unique crashes, which manifest distinct faulting conditions and root causes, and (2) repeated crashes, which are structurally similar to previously seen traces and likely stem from the same underlying defect. By filtering out repeated crashes, the fuzzer can concentrate on discovering truly new problems in the kernel rather than revisiting known ones. The Call Stack Deduplication process can be divided into three main stages: Input Generation and System Execution, Crash Parsing, and Stack Deduplication.
(1)
Input Generation and System Execution
In this phase, input syscalls are generated using a customized version of the Syzkaller syscall fuzzer. Our customized version enhances the default functionality by introducing state-aware syscall sequence generation, syscall replay features for debugging, and improved support for concurrent syscall injections via multi-threaded execution. The modified fuzzer also incorporates additional constraints to avoid known kernel hangs and to emphasize code regions uncovered in previous executions. This modified fuzzer creates system call sequences that include basic operations such as file descriptor manipulation, network interactions, memory management, and other typical kernel operations. The goal is to create syntactically valid, diverse, and comprehensive syscall sequences that cover a broad spectrum of kernel behaviors.
These generated inputs are executed in a virtualized kernel environment, leveraging QEMU/KVM [26] to simulate kernel behavior. During execution, each syscall is monitored using instrumentation tools like KCOV and KASAN, which capture important runtime metadata such as code coverage, memory corruption, and crash occurrences. The KCOV tool is specifically used to gather kernel code coverage data by tracking which code paths are executed during fuzzing, while KASAN is used to detect memory errors such as buffer overflows and use-after-free vulnerabilities.
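The instrumentation above requires the target kernel to be built with coverage and sanitizer support. A minimal fragment of the kernel build configuration is sketched below; the option names are the standard upstream Kconfig symbols, and the exact set used by ECHO may differ:

```text
# Coverage collection consumed as fuzzing feedback (KCOV)
CONFIG_KCOV=y
CONFIG_KCOV_INSTRUMENT_ALL=y
# Memory-error detection (KASAN): buffer overflows, use-after-free
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y
# Debug info for richer, symbolized crash reports
CONFIG_DEBUG_INFO=y
```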
By analyzing these data, ECHO adjusts its input selection strategy in a feedback-driven manner. Specifically, inputs that trigger novel kernel behaviors, or cover previously unexplored code paths, are prioritized for further mutation. The feedback-driven generation mechanism ensures that the fuzzer evolves based on coverage and crash results, guiding the input mutation towards areas of the kernel that have not been sufficiently explored.
This phase incorporates a weighted mutation scheme, where higher priority is given to inputs that modify syscall parameters likely to trigger kernel vulnerabilities, based on previous crash data or known kernel weaknesses. The weighted mutation scheme uses heuristics such as previous crash patterns and call path analysis to ensure that the generated inputs evolve quickly and target high-value paths. This strategy ensures that the fuzzer efficiently evolves the input corpus, targeting high-value paths and accelerating the discovery of novel kernel bugs.
(2)
Crash Parsing
Crash logs are collected whenever a crash occurs during the fuzzing process. These logs typically contain a wealth of information, including the stack trace, faulting address, exception type (e.g., page fault and null dereference), register values, code context, and kernel tainted flags. However, not all parts of the crash log are relevant for effective deduplication. In this step, ECHO focuses on extracting the most pertinent crash information, particularly the stack trace, which is key to identifying whether a crash is new or already encountered.
The first step in crash parsing is the normalization of crash logs. Raw logs are processed to remove irrelevant details like timestamps, specific CPU registers, and minor fluctuations in the execution state, all of which are not crucial for identifying root causes. Once normalized, the crash logs are parsed to extract the stack trace, which contains a sequence of function calls leading to the crash. Stack trace normalization ensures that the comparison between different crash reports is consistent. This process removes irrelevant data such as address spaces or module-specific addresses, focusing instead on the function call sequences that are shared across different kernel versions or module variations. The key functions in the stack are normalized using kernel symbols from the /proc/kallsyms file or debug symbols (when available). This makes it possible to match function names consistently and detect commonalities across different crash logs.
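A simplified sketch of this normalization step is shown below. Kernel crash frames typically look like `[ 12.345678] ip6_fragment+0x1a2/0x8c0 [ipv6]`; the sketch strips timestamps, offsets, and module names, keeping only the symbol sequence. The regular expression and the function name `normalize_trace` are our illustrative assumptions, not ECHO's exact parser:

```python
import re

# One kernel stack frame: optional bracketed address, symbol name,
# optional +offset/size suffix, optional [module] tag.
FRAME_RE = re.compile(
    r"(?:\[<[0-9a-f]+>\]\s*)?"          # optional bracketed code address
    r"(?P<func>[A-Za-z_][\w.]*)"        # kernel symbol name
    r"(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?"   # optional +offset/size suffix
    r"(?:\s+\[\w+\])?"                  # optional [module] name
)

def normalize_trace(raw_lines):
    """Reduce raw crash-log lines to a bare sequence of function names."""
    frames = []
    for line in raw_lines:
        # Drop the leading "[ seconds.micros ]" timestamp, if present.
        line = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line.strip())
        m = FRAME_RE.match(line)
        # Require an offset suffix so header lines like "Call Trace:" are skipped.
        if m and "+0x" in line:
            frames.append(m.group("func"))
    return frames
```

A real parser must also handle multi-line registers, `? `-prefixed unreliable frames, and vendor-specific formats; this sketch only conveys the idea of reducing a noisy log to a comparable symbol sequence.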
In this process, ECHO also identifies the key variables in the stack trace that are most likely to correlate with crash causes, such as memory corruption, invalid pointer dereferencing, and others. To further ensure accuracy during stack normalization, ECHO incorporates specialized cleaning strategies for non-standard stack frames, including those generated by inline functions and interrupt contexts.
For inline functions, which are often expanded into their callers and thus not represented as separate frames in the raw stack trace, we identify such inlined symbols by cross-referencing kernel DWARF debug information (when available) or leveraging inlining hints extracted from symbol metadata. In the absence of precise debug symbols, heuristics are used based on known inlined function patterns (e.g., ‘kfree’, ‘netif_rx’) and typical stack trace truncation behaviors. Interrupt and exception handling frames (e.g., ‘__irq_svc’, ‘irq_exit_rcu’, ‘do_softirq’, ‘asm_common_interrupt’) introduce transient, non-deterministic entries that do not contribute directly to crash causality. To eliminate their impact, we maintain a noise frame list $F_{\text{noise}}$ and apply a filtering rule:
$$S_{\text{clean}} = S_{\text{raw}} \setminus F_{\text{noise}}$$
where $S_{\text{raw}}$ is the original raw stack trace and $S_{\text{clean}}$ is the filtered version used in downstream comparison. This rule ensures that stack alignment remains robust against scheduling and hardware-induced noise.
Additionally, to avoid bias introduced by deeply nested frames that do not influence fault proximity, we define a weighted positional score for each function frame $f_i$ as:
$$w(f_i) = \frac{1}{1 + \mathrm{depth}(f_i)} \cdot \mathbb{1}[f_i \notin F_{\text{noise}}]$$
where $\mathrm{depth}(f_i)$ indicates the position of $f_i$ from the top of the stack and $\mathbb{1}[\cdot]$ is the indicator function that filters out noise frames. This score is later used during vector encoding to emphasize fault-proximal frames in deduplication. This refined crash data can now be used for accurate clustering in the subsequent deduplication phase.
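The filtering rule and the positional score can be sketched together as follows. The noise list here contains only the example frames named above, and the helper names are our illustrative assumptions:

```python
# Example noise frames from the paper; a real deployment would curate this list.
NOISE_FRAMES = {"__irq_svc", "irq_exit_rcu", "do_softirq", "asm_common_interrupt"}

def frame_weight(frame, depth, noise=NOISE_FRAMES):
    """w(f_i) = 1 / (1 + depth(f_i)) when f_i is not a noise frame, else 0."""
    if frame in noise:
        return 0.0
    return 1.0 / (1.0 + depth)

def clean_and_weight(stack, noise=NOISE_FRAMES):
    """Apply S_clean = S_raw minus F_noise, then weight remaining frames
    by proximity to the top of the stack (depth 0 = fault site)."""
    cleaned = [f for f in stack if f not in noise]
    return [(f, frame_weight(f, d, noise)) for d, f in enumerate(cleaned)]
```

In this scheme the faulting frame keeps weight 1.0, the next frame 0.5, and so on, so deep bookkeeping frames contribute little to the comparison.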
(3)
Stack Deduplication
Following crash parsing, we perform stack-based crash deduplication. Each crash is analyzed based on its kernel stack trace. The goal of this step is to group similar crashes into clusters, each representing a distinct bug, thus avoiding the redundancy caused by crashes triggered by similar kernel paths.
ECHO utilizes a stack trace comparison algorithm that combines Longest Common Subsequence (LCS) and Cosine Similarity. This hybrid approach ensures that both the sequence of function calls and their semantic similarities are taken into account. Two traces are considered to have “high similarity” if their LCS score (normalized by stack length) exceeds 0.7 and the Cosine Similarity of their vectorized function encodings is above 0.8. Conversely, traces are classified as “sufficiently different” if either metric falls below 0.5. The method also incorporates an adaptive threshold, which adjusts the clustering tolerance based on the number of crashes and the size of the corpus.
The stack traces are first normalized and then encoded using a hybrid representation that combines function signatures, stack depth, and positional weighting. Specifically, each stack trace is mapped to a vector of symbolic tokens representing kernel functions, ordered by call depth. Functions appearing in higher positions (closer to the fault) are assigned greater weight, while known noise functions (e.g., schedule(), irq_exit()) are assigned negligible influence. This encoding ensures that the comparison is not affected by minor variations such as different memory addresses or non-essential kernel operations, focusing instead on the core behavior that led to the crash.
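The hybrid metric can be sketched as below. The function names and the use of simple bag-of-words token counts for the Cosine Similarity are our illustrative assumptions; ECHO's actual encoding additionally applies the positional weighting described above:

```python
from collections import Counter
from math import sqrt

def lcs_len(a, b):
    """Classic dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def cosine(a, b):
    """Cosine similarity over token-count vectors of two frame sequences."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def same_bug(a, b, lcs_thresh=0.7, cos_thresh=0.8):
    """'High similarity' per the paper: length-normalized LCS above lcs_thresh
    AND Cosine Similarity above cos_thresh."""
    if not a or not b:
        return False
    lcs_score = lcs_len(a, b) / min(len(a), len(b))
    return lcs_score > lcs_thresh and cosine(a, b) > cos_thresh
```

Normalizing the LCS by the shorter stack makes the sequence check tolerant of extra frames at one end, while the cosine check guards against coincidental ordering matches between otherwise different frame sets.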

3.3. Corpus Refinement

The deduplication algorithm operates by comparing each new crash stack trace against previously stored ones. If a stack trace closely resembles one in the database (based on a pre-set threshold), the crash is added to the existing cluster. If the crash is sufficiently different, it is grouped into a new cluster. The method ensures that each distinct bug is represented by a unique cluster, minimizing false positives and ensuring that new kernel faults are captured efficiently.
Crash logs are grouped based on stack trace similarity as shown in Algorithm 1. Initially, each crash log’s stack trace is extracted and hashed to generate a unique identifier. This identifier is then used to check if a similar crash log already exists in the cluster map. If a match is found, the crash log is added to the corresponding cluster; otherwise, a new cluster is created. This process involves analyzing each crash log independently. The function ExtractStack(c) extracts the stack trace from each crash log, while the HashStack(S) function computes the hash of the stack trace. If the computed hash $h$ already exists in the crash cluster map $G$, the crash log $c$ is added to the existing group. If it does not exist, a new cluster is created for that particular stack trace. The result of this algorithm is a crash cluster map $G$, which organizes the crash logs into distinct groups based on stack trace similarity. By grouping similar crashes, this approach reduces the redundancy in the fuzzing process and ensures that the fuzzing efforts focus on unique and potentially more insightful crashes, rather than repeatedly testing similar scenarios.
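The clustering loop just described can be sketched as follows. The crash-log representation (a dict with a `stack` field) and the helper names are our illustrative assumptions standing in for ExtractStack and HashStack:

```python
import hashlib

def extract_stack(crash_log):
    """Stand-in for ExtractStack(c): here a crash log is a dict whose
    'stack' field already holds the normalized frame sequence."""
    return tuple(crash_log["stack"])

def hash_stack(stack):
    """Stand-in for HashStack(S): a stable digest over the frame sequence."""
    return hashlib.sha256("\n".join(stack).encode()).hexdigest()

def cluster_crashes(crash_logs):
    """Build the crash cluster map G keyed by stack hash."""
    clusters = {}
    for c in crash_logs:
        h = hash_stack(extract_stack(c))
        # Existing hash: join that cluster; unseen hash: start a new one.
        clusters.setdefault(h, []).append(c)
    return clusters
```

In ECHO the exact-hash lookup is only the fast path; traces whose hashes differ are still compared with the LCS/cosine metric before a genuinely new cluster is created.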
The second stage of ECHO is Corpus Refinement, which focuses on optimizing the fuzzing input corpus based on the feedback and results from the first phase. This stage ensures that fuzzing efforts are directed toward novel kernel paths while eliminating redundant or ineffective seeds, ultimately improving the efficiency of the fuzzing process.
After each fuzzing cycle, the system evaluates the effectiveness of the generated inputs by comparing newly triggered code paths with previously covered ones. To track coverage, ECHO uses KCOV, which provides detailed coverage data in bitmap format, recording executed basic blocks and functions during kernel execution. A key characteristic of this phase is its ability to focus on previously unexplored kernel paths. If a seed triggers new coverage, it is prioritized for further mutation. Conversely, seeds that do not trigger new paths are deprioritized, ensuring that computational resources are spent on areas of the kernel that have not been explored yet.
Algorithm 1: Adaptive stack clustering algorithm
   In addition to coverage, the fuzzing loop incorporates a feedback mechanism that adjusts seed mutation based on previously observed kernel behaviors. Seeds with a higher likelihood of uncovering novel kernel states, as determined by coverage feedback and crash clustering results, are given a higher priority in the mutation queue. This feedback loop helps in maintaining a diverse and effective corpus, ensuring the fuzzer consistently explores deeper and more complex kernel logic.

3.4. Deduplication-Aware Feedback

To improve the quality of generated test inputs, ECHO refines the seed corpus using deduplication-aware feedback. This mechanism identifies and deprioritizes inputs that repeatedly trigger the same crash behavior, enabling the fuzzer to explore new and meaningful execution paths in the kernel.
The deduplication component is based on the call stack grouping results generated during the Call Stack Deduplication phase described in Section 3.2. These results cluster crash reports that share structurally similar stack traces. A seed is considered novel only if it triggers a crash group that has not previously been seen.
Each seed $s$ is scored using a linear combination of branch coverage improvement and crash group novelty: $\text{score} = \gamma \cdot \Delta e + (1 - \gamma) \cdot \mathbb{I}[g \notin G_{\text{previous}}]$. Here, $\Delta e$ is the number of new edges covered, $g$ is the crash group label returned by the deduplication module, $G_{\text{previous}}$ is the set of previously encountered groups, and $\mathbb{I}[\cdot]$ is the indicator function.
The corpus refinement process in ECHO is detailed in Algorithm 2. For each seed $s \in S$, the algorithm collects branch coverage from its execution. It computes $\Delta e$, the number of new branches covered compared to the global map $M$. Simultaneously, the deduplication engine assigns a crash group $g$ to the seed based on its stack trace similarity. If $g$ is not in the previously seen crash group set $G_{\text{previous}}$, the crash is considered novel.
The score combines coverage gain and crash novelty using a weighted formula. The tunable parameter γ controls the balance between coverage and novelty. Higher-scoring seeds are retained for mutation.
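The scoring step can be sketched as follows. The function names, the default $\gamma = 0.5$, and the use of raw (unnormalized) edge counts are our illustrative assumptions:

```python
def new_edge_count(seed_cov, global_cov):
    """Delta_e: edges covered by this seed that are absent from the global map M."""
    return len(seed_cov - global_cov)

def seed_score(new_edges, crash_group, seen_groups, gamma=0.5):
    """score = gamma * Delta_e + (1 - gamma) * I[g not in G_previous].
    crash_group is None when the seed did not crash at all."""
    novelty = 1.0 if crash_group is not None and crash_group not in seen_groups else 0.0
    return gamma * new_edges + (1.0 - gamma) * novelty
```

A practical implementation would likely normalize $\Delta e$ to a comparable range before mixing it with the 0/1 novelty term; the sketch keeps the formula's literal shape.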
This design helps ECHO maintain a focused fuzzing effort: avoiding inputs that repeat known behaviors and favoring those that either discover new code regions or lead to semantically distinct crashes.
Algorithm 2: Corpus Refinement with coverage and deduplication feedback

4. Implementation

We implemented ECHO primarily using Golang (version 1.23.8) and Python (version 3.10.12), integrating it into the Syzkaller fuzzing framework with minimal but targeted modifications. The system is structured to support automated crash log collection, call stack normalization, and deduplication-guided corpus refinement for Linux kernel versions v5.15, v6.1, v6.6, and v6.12. This section details how the deduplication methodology introduced in Section 3 is realized in practice. Specifically, Algorithm 1 is employed to perform stack trace abstraction during crash parsing, while Algorithm 2 handles clustering of normalized traces based on structural similarity. These algorithms are embedded into ECHO’s runtime pipeline, enabling real-time trace normalization and grouping during the fuzzing campaign.
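To make the trace-abstraction step concrete, the following sketch shows one way such crash-parsing normalization can look in Python; the regular expressions and the noise-frame list are illustrative assumptions, not ECHO's actual filter rules:

```python
import re

# Frames typically irrelevant to the logical crash path
# (an illustrative list, not ECHO's real filter set).
NOISE = re.compile(r"^(entry_SYSCALL|do_syscall|ret_from_fork|asm_|irq_|__irq)")
# A typical oops frame looks like "  func_name+0x1a2/0x3f0".
FRAME = re.compile(r"^\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

def normalize_stack(raw_lines):
    """Reduce a raw oops/KASAN trace to an ordered list of function names."""
    frames = []
    for line in raw_lines:
        if line.lstrip().startswith("?"):
            continue                     # unreliable frames guessed by the unwinder
        m = FRAME.match(line)
        if not m:
            continue                     # registers, headers, hex dumps, ...
        name = m.group(1).split(".")[0]  # drop compiler suffixes like .isra.0 / .cold
        if NOISE.match(name):
            continue                     # syscall-entry / interrupt plumbing
        frames.append(name)
    return frames
```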
To facilitate Input Generation and System Execution, we reuse the core fuzzing engine and corpus mutator from Syzkaller, with additional instrumentation to log metadata such as syscall execution traces and crash stack outputs. Kernel configurations are adjusted to enable KCOV and KASAN, ensuring proper feedback and sanitization reporting. All fuzzing is conducted in virtualized environments using QEMU/KVM with dedicated VMs for each kernel version.
For the Crash Parsing component, we hook into Syzkaller's crash reporting logic and introduce a lightweight post-processing daemon written in Python. This module collects crash logs from the QEMU serial output and extracts the relevant stack traces. To ensure consistency, it removes noise introduced by dynamic offsets, interrupt handlers, and inline assembly frames, and transforms each trace into a normalized representation that preserves its logical call structure.
The Stack Deduplication module is implemented as a separate Python service that receives normalized stacks and performs clustering based on both structural alignment and configurable similarity thresholds. We use a modified longest common subsequence (LCS) algorithm with weighting for deeper frames to improve sensitivity to semantic differences. This implementation corresponds to Algorithm 2, where similarity is computed using a hybrid of LCS and vector-based cosine similarity, and traces exceeding the adaptive threshold are grouped together. The deduplicated crash clusters are then labeled for seed triage.
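A minimal sketch of such an LCS/cosine hybrid similarity is shown below; the mixing weight `w` is a hypothetical parameter, and the depth-dependent frame weighting and adaptive threshold described above are omitted for brevity:

```python
from collections import Counter
from math import sqrt

def lcs_len(a, b):
    """Classic O(len(a) * len(b)) longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def cosine(a, b):
    """Cosine similarity over bag-of-frames vectors (order-insensitive)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[f] * cb[f] for f in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def stack_similarity(a, b, w=0.6):
    """Blend order-sensitive LCS ratio with order-insensitive cosine."""
    if not a or not b:
        return 0.0
    lcs_ratio = lcs_len(a, b) / max(len(a), len(b))
    return w * lcs_ratio + (1 - w) * cosine(a, b)
```

Two crashes would then be merged when `stack_similarity` of their normalized traces exceeds the clustering threshold.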
To support Corpus Refinement, we extended Syzkaller’s executor feedback loop. Seeds that correspond to crash reports already covered by existing clusters are deprioritized, while those leading to new coverage or novel clusters are preserved and mutated further. The prioritization policy is enforced through a custom scoring function implemented within Syzkaller’s mutation scheduler. Finally, to ensure reproducibility and test coverage integrity, ECHO maintains experiment metadata including kernel configuration, deduplication logs, and execution timelines in JSON format. This facilitates regression analysis and debugging of false positives during post-fuzzing triage.

5. Evaluation

To evaluate the effectiveness of ECHO, we conducted a series of experiments on four representative Linux kernel versions: v5.15, v6.1, v6.6, and v6.12. These versions span different stages of kernel development and cover a range of active subsystems, providing a comprehensive basis for assessing fuzzing outcomes. All experiments were carried out on top of the Syzkaller fuzzing framework, using the same corpus, time budget, and configuration to ensure comparability.
We begin by evaluating whether ECHO improves the ability to identify previously unknown bugs that would otherwise be merged or overlooked by Syzkaller’s default deduplication logic. We compare the number of unique crash clusters discovered and analyze selected examples to understand the distinctions captured by our call stack-aware approach. We then examine whether this more accurate deduplication enables the fuzzer to reach deeper kernel execution paths. By feeding refined crash feedback back into the fuzzing loop, we observe improvements in code coverage, indicating that ECHO facilitates more effective corpus evolution. These results suggest that our method enhances both bug discovery and fuzzing efficiency when integrated into existing Syzkaller workflows.

5.1. Experimental Setup

We evaluate ECHO by integrating it into the Syzkaller kernel fuzzing infrastructure and comparing its performance against the default Syzkaller deduplication mechanism. For this evaluation, we selected four representative upstream Linux kernel versions: 5.15, 6.1, 6.6, and 6.12. These versions span recent kernel development cycles and include both long-term support (LTS) and mainline releases, ensuring diversity in subsystem evolution and code coverage opportunities.
All kernel versions used in our evaluation (v5.15, v6.1, v6.6, v6.12) were compiled from their publicly available source code repositories hosted at https://git.kernel.org. We used the gcc compiler (version 9.4.0) on Ubuntu 20.04 to compile each kernel with an identical configuration file. The KCOV [27] and KASAN features were enabled through the kernel’s .config settings by setting CONFIG_KCOV=y and CONFIG_KASAN=y, respectively. These are native instrumentation tools built into the Linux kernel infrastructure, not external tools requiring separate compiler integration. The configurations were generated using make defconfig followed by manual enabling of the relevant options via make menuconfig. We then compiled the kernels using make -jN and deployed them in QEMU-based virtual machines for fuzz testing. This setup ensures all instrumentation is integrated at compile time and activated at runtime without external patching or instrumentation pipelines.
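A hedged reconstruction of this build procedure is sketched below (the tag, directory name, and exact option set are illustrative; `scripts/config` is the kernel tree's own configuration helper, equivalent to toggling the options interactively via make menuconfig):

```shell
# Illustrative reconstruction of the kernel build described above.
git clone --depth 1 --branch v6.6 \
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git linux
cd linux
make defconfig
# Enable coverage collection and the address sanitizer at compile time.
./scripts/config -e CONFIG_KCOV -e CONFIG_KCOV_INSTRUMENT_ALL -e CONFIG_KASAN
make olddefconfig        # resolve dependencies pulled in by the new options
make -j"$(nproc)"        # build vmlinux/bzImage for the QEMU guests
```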
The experiments were conducted in QEMU-based virtual machines, each configured with 2 CPU cores and 2 GB of memory to reflect standard Syzkaller deployment environments. To ensure fair and controlled comparisons, both ECHO and the baseline (vanilla Syzkaller) were provided with the same initial seed corpus and executed under identical conditions. Each kernel version was fuzzed independently for 24 h, with real-time logging of crash statistics and coverage traces. To mitigate the effects of fuzzing randomness and ensure reproducibility, each experiment was repeated five times, and we report the averaged results across runs. All experiments were conducted on a server running Ubuntu 20.04 with a 128-core CPU and 64 GB of RAM, providing sufficient resources to support concurrent fuzzing instances without interference.

5.2. Bug Detection Capabilities

To evaluate the bug discovery effectiveness of ECHO, we conducted fuzzing experiments on four Linux kernel versions—5.15, 6.1, 6.6, and 6.12—covering multiple development cycles and including representative updates across subsystems. Table 1 provides a summary of the vulnerabilities identified during the evaluation, including their subsystem locations, affected functions, and types.
Overall, ECHO discovered 20 previously undocumented kernel bugs, all of which have been confirmed by maintainers. These vulnerabilities are primarily distributed across the networking and file system subsystems—two critical and widely exercised components in modern kernel deployments. The identified bugs include null pointer dereferences, logic errors in error-handling code, and potential deadlocks triggered under specific conditions.
Notably, these bugs were not surfaced by default Syzkaller, even when both tools were run under the same kernel versions, system configurations, and fuzzing time budgets. This difference stems from the design of ECHO, which improves crash deduplication by analyzing complete call stacks rather than relying solely on top-frame crash signatures. By grouping crashes based on deeper structural similarities in their call stacks, ECHO avoids merging distinct issues that appear superficially similar, thereby enabling better coverage of semantic differences between bugs.
This more accurate clustering leads to two practical benefits: First, it reduces noise in the feedback loop by avoiding repeated triage of the same or similar crash reports. Second, it helps ensure that truly unique bugs are not overlooked due to overly broad grouping. Together, these effects improve the overall fuzzing workflow by accelerating the analysis of meaningful bugs and allowing the fuzzer to make progress toward new execution paths.
These results suggest that improving the quality of crash deduplication can have a significant positive impact—not only on reducing manual effort during triage, but also on increasing the likelihood of discovering previously unobserved kernel bugs.

5.3. Case Study

To further demonstrate ECHO's bug detection capability, we use Bugs #9 and #20 as case studies.
Bug Case 1. Listing 1 presents a task hang vulnerability in the Linux kernel’s networking subsystem, caused by acquiring the global nlk_cb_mutex before subsequently attempting to obtain the global rtnl_mutex within the same execution path. This bug resides in the netlink infrastructure, which supports asynchronous message dumps from the kernel to userspace via netlink sockets. When a dump request is issued (e.g., for listing devices or routes), it is handled by netlink_dump() (line 1), which first acquires nlk_cb_mutex to synchronize access to control block state (line 8). It then proceeds to invoke downstream handlers that may acquire rtnl_mutex through rtnl_lock() (line 19) for accessing or modifying routing state. If another thread holds rtnl_mutex and attempts to acquire nlk_cb_mutex, a circular wait condition occurs. This causes the dump task to enter an uninterruptible sleep (D state), resulting in a soft deadlock that can stall kernel threads and eventually trigger the hung task watchdog.
Listing 1. Lock Inversion in netlink_dump.
Electronics 14 02914 i003
This bug is effectively triggered by netlink-based enumeration workloads that interact with the rtnetlink subsystem under high concurrency. Once a dump operation is initiated and attempts to access routing-related data structures, it passes through both nlk_cb_mutex and rtnl_mutex. If these locks are acquired in conflicting order across threads, a lock inversion arises, halting forward progress. In systems running routing daemons, network managers, or orchestrators that rely on netlink for topology updates, this bug can manifest as system-wide thread starvation and degraded network control-plane responsiveness.
Bug Case 2. Listing 2 presents a locking context violation in the Linux kernel’s memory allocator subsystem. The bug originates in get_page_from_freelist() (line 18), a core function responsible for retrieving pages during physical memory allocation. Internally, this function acquires the zone-level spinlock zone->lock through spin_lock_irqsave() (line 11), a locking primitive that disables local interrupts. However, under certain fast-path execution contexts—such as while holding pcp->lock or during task work teardown—blocking or nested locking is explicitly disallowed. The failure to respect these contextual constraints leads to a lock hierarchy violation detectable by the kernel’s lock validator subsystem (lockdep).
Listing 2. Unsafe zone->lock Acquisition in Page Alloc.
Electronics 14 02914 i004
This bug is triggered in rare but valid interleavings involving socket release or deferred task work execution, where memory allocation is attempted while the thread already holds a per-CPU lock. When such allocation paths invoke get_page_from_freelist(), they indirectly request zone->lock without first unwinding the upper-layer locks. The resulting circular dependency or improper nesting can lead to scheduling anomalies or even system-wide stalls in high-concurrency environments. While no official fix has been committed upstream at the time of writing, addressing this issue will likely involve revising the allocator's locking path to honor context-sensitive constraints and isolate deep allocations from incompatible preemption states.

5.4. Coverage Improvement

To assess the effectiveness of ECHO in exploring a broader range of execution paths and reaching deeper kernel states, we measured its branch coverage and compared the results against baseline Syzkaller across four Linux kernel versions: 5.15, 6.1, 6.6, and 6.12.
Table 2 summarizes the average branch coverage achieved by ECHO and Syzkaller over 24-h fuzzing sessions. On each kernel version, ECHO consistently achieves better results: 14.9% improvement on Linux 5.15, 12.4% on 6.1, 16.2% on 6.6, and 17.1% on 6.12, with an overall average improvement of 15.2%. These improvements reflect ECHO’s ability to eliminate redundant crash reports and help Syzkaller focus on more diverse inputs during corpus evolution.
As shown in Figure 4, ECHO demonstrates faster initial coverage growth and maintains a consistent advantage over time. The shaded regions in the plots represent the variance across trials, illustrating that ECHO maintains consistent coverage trends across multiple runs and indicating its stability in repeated fuzzing scenarios. Notably, the marked jump points (e.g., "Jump @ 2.8h in Linux Kernel 5.15") correspond to moments where the deduplication feedback loop injects newly clustered crash feedback into corpus refinement. At these points, ECHO identifies semantically similar crash traces and removes redundant test cases, enabling the fuzzer to explore alternative branches that were previously overshadowed by frequent shallow crashes. These jumps are not arbitrary but arise after sufficient crash data have accumulated for the deduplication engine to take effect. For example, in Linux 6.12, the jump around 3.3 h aligns with the first large batch of deduplicated reports being pruned from the corpus, which then shifts the input mutation focus toward under-explored paths in deeper subsystems. The sharp increase in unique branches covered at these moments provides empirical evidence that deduplication-guided triage contributes directly to improved fuzzing efficiency.
In summary, the observed coverage gains are not solely due to prolonged execution but largely stem from improved feedback precision in the fuzzing loop. The call stack-based deduplication mechanism in ECHO filters out redundant crashes, reduces noise, and enables Syzkaller to better utilize its mutation budget, thereby discovering more diverse execution paths and increasing coverage.

5.5. Deduplication Statistics Analysis

To further evaluate the practical impact of ECHO’s crash deduplication strategy, we conducted a comprehensive statistical and structural analysis. This section presents quantitative breakdowns and qualitative illustrations to characterize the deduplication behavior and clustering structure exhibited during fuzzing.
Quantitative Cluster Comparison. Table 3 reports the number of raw crash logs generated across four different Linux kernel versions during 24-h Syzkaller fuzzing campaigns, alongside the number of crash clusters identified using Syzkaller’s built-in IP-based deduplication and ECHO’s stack-aware analysis.
Compared to Syzkaller, ECHO significantly reduces the number of crash clusters. While the number of raw crashes ranges between 10,863 and 12,274 across versions, Syzkaller produces approximately 1975 clusters on average; ECHO reduces this to only 932 on average, a reduction of more than 52.8%. These results demonstrate that ECHO is able to collapse a large number of syntactically distinct but semantically redundant traces into more concise and accurate representations of unique bugs.
To further evaluate the quality of deduplication beyond cluster count reduction, we measure the false positive rate (FPR) and false negative rate (FNR) for both Syzkaller and ECHO. Here, false positives refer to incorrectly merged crashes from different root causes, while false negatives indicate crashes from the same root cause that were not grouped together. Based on a manually labeled subset of 100 crash traces from the v6.1 and v6.6 kernel runs, we compute the clustering outcome against ground truth clusters. As shown in Table 4, ECHO significantly reduces both FPR and FNR compared to Syzkaller. Syzkaller’s over-simplified IP-based clustering yields a higher false positive rate of 23.5% and misses 29.1% of ground truth equivalences. In contrast, ECHO maintains a much lower FPR of 8.2% and an FNR of 12.6%, demonstrating improved clustering precision and recall. These metrics reinforce that ECHO not only reduces redundant groupings but also aligns more closely with true semantic fault boundaries.
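One common way to operationalize these rates is over all pairs of labeled crashes: a false positive is a different-cause pair placed in the same cluster, and a false negative is a same-cause pair split across clusters. The paper's exact scoring procedure is not spelled out here, so the sketch below assumes this pairwise convention:

```python
from itertools import combinations

def pairwise_fpr_fnr(predicted, truth):
    """FPR: fraction of different-cause pairs wrongly merged into one cluster.
       FNR: fraction of same-cause pairs wrongly split across clusters.
       `predicted` and `truth` map crash id -> cluster label."""
    fp = fn = diff_pairs = same_pairs = 0
    for a, b in combinations(truth, 2):
        same_truth = truth[a] == truth[b]
        same_pred = predicted[a] == predicted[b]
        if same_truth:
            same_pairs += 1
            fn += not same_pred
        else:
            diff_pairs += 1
            fp += same_pred
    fpr = fp / diff_pairs if diff_pairs else 0.0
    fnr = fn / same_pairs if same_pairs else 0.0
    return fpr, fnr
```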
Comparison with Existing Approaches. To contextualize our approach within existing deduplication strategies, we provide a qualitative comparison of key features.
Table 5 contrasts our approach, ECHO, with two widely used baselines: (1) Syzkaller’s built-in IP-based deduplication, and (2) CrashFinder, a tool that clusters crashes based on stack trace similarity. Syzkaller’s method uses the top frame’s IP as the crash signature, providing only coarse-grained grouping without any structural normalization or inlined frame recovery. CrashFinder improves trace matching by comparing function names across entire traces, but still lacks support for recovering inlined call relationships or reasoning about control flow equivalence.
In contrast, ECHO incorporates frame normalization and inlined function recovery to build context-aware call-site chains, enabling fine-grained deduplication across diverse trace patterns. It also supports partial trace matching and is specifically tailored for kernel-level crash diagnostics, as opposed to general user-space tools. The deduplication granularity is significantly enhanced by constructing structural equivalence over execution context, not merely surface frame similarity.
Visualizing Cluster Reduction. To provide an intuitive understanding of deduplication effectiveness across kernel versions, we visualize the corresponding cluster statistics. Figure 5 presents a visual summary of the crash deduplication results across four Linux kernel versions. For each version, the figure reports: (1) the raw number of crashes, (2) the number of crash clusters reported by Syzkaller's IP-based deduplication, (3) the number of clusters obtained using the learning-based deduplication method S3M, and (4) the number of clusters obtained using ECHO.
The results show a consistent reduction in the number of clusters when applying ECHO's method compared to Syzkaller, suggesting that ECHO consolidates crash reports more effectively while maintaining structural distinctions. While S3M, as a representative learning-based deduplication technique, achieves fewer clusters than Syzkaller, its reliance on embedding similarity may cause occasional over-clustering, particularly when crashes share superficial lexical features. In contrast, ECHO achieves the lowest cluster counts across all kernel versions, indicating its stronger ability to preserve semantic separation while eliminating redundancy. These observations indicate that ECHO helps reduce noise without excessively collapsing semantically distinct crash behaviors.
Semantic Clustering via Stack Normalization. Beyond numeric reduction, ECHO’s strength lies in its structural understanding of crash traces. Figure 6 demonstrates a case study of two crashes that appear distinct under Syzkaller’s IP-based approach but are successfully merged by ECHO. Syzkaller only considers the instruction pointer (IP) of the top frame, resulting in superficial clustering. In contrast, ECHO normalizes the full call stack by pruning out noisy frames (e.g., syscall wrappers or allocator helpers), aligning deeper semantic call paths, and applying stack trace similarity metrics.
In the example, although the original traces differ in several superficial frames, ECHO correctly recognizes shared core functions like ip6_fragment and sock_sendmsg as critical crash roots, thus merging the two into a unified cluster. This structural call stack normalization avoids both under- and over-grouping, addressing common challenges in fuzzing triage.
Clustering Accuracy. In addition to reducing the number of crash clusters, we further evaluate the accuracy of ECHO’s clustering results using a manually curated ground truth dataset. Specifically, we collected 100 crash stack traces from Syzkaller’s output, spanning various kernel subsystems. Each trace was manually assigned to one of 15 ground truth clusters based on root-cause similarity, considering factors such as the faulting function, memory access pattern, and control flow structure.
We then applied both Syzkaller’s IP-based deduplication and ECHO’s stack-aware clustering to the dataset, and compared their results against the human annotations using two standard metrics: Adjusted Rand Index (ARI), which measures pairwise agreement between two clusterings, and Normalized Mutual Information (NMI), which captures the amount of shared information between them.
As shown in Table 6, ECHO achieves an ARI of 0.78 and an NMI of 0.81, while Syzkaller obtains lower scores of 0.36 and 0.42, respectively. These results suggest that ECHO more accurately groups semantically similar crashes, with clustering boundaries that better reflect the underlying fault contexts.
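For reference, both metrics can be computed from first principles as sketched below; the NMI here uses square-root normalization, which is one common convention (libraries such as scikit-learn default to the arithmetic mean), so values may differ slightly across tools:

```python
from collections import Counter
from math import comb, log, sqrt

def ari(labels_a, labels_b):
    """Adjusted Rand Index between two clusterings of the same items."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-level pair agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # degenerate clusterings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def nmi(labels_a, labels_b):
    """Normalized mutual information with sqrt(H(A) * H(B)) normalization."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * log(c * n / (ca[i] * cb[j])) for (i, j), c in cab.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    return mi / sqrt(ha * hb) if ha and hb else 1.0
```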

6. Discussion

While ECHO improves the semantic accuracy of crash deduplication, several limitations remain that open directions for future refinement.
Robustness Across Kernel Versions. Our evaluation spans four Linux kernel versions (v5.15, v6.1, v6.6, v6.12), each introducing incremental changes to the internal structure of drivers and subsystems. These structural updates—such as function inlining, code refactoring, or wrapper additions—can modify the observable stack trace shape while preserving the same root cause. Consequently, ECHO’s LCS-based matching may generate false negatives when comparing semantically identical crashes across different versions. To mitigate this, future versions of ECHO will integrate version-aware normalization techniques and symbol-based abstraction to better correlate logically equivalent traces.
Limitations in Handling Multi-Component Interactions. ECHO currently focuses on identifying structurally similar call stacks under the assumption of a single dominant fault path. In reality, many kernel crashes (particularly in complex modules such as e1000, netfilter, or protocol subsystems) are the result of multi-component interactions, such as asynchronous callbacks, layered protocol dispatching, shared memory use, or deferred resource release. These cases produce fragmented or interleaved call stacks that challenge traditional sequence-based clustering.
Future work will explore representing crash traces using graph-based models instead of linear call stack sequences. In such models, each function frame becomes a node, and edges represent control flow, data dependencies, or asynchronous triggers. This abstraction enables the capture of multi-component interactions, such as callbacks from network protocols into device drivers or memory deallocations propagated across subsystems. Unlike sequential matching, graph-based representations can tolerate differences in stack depth, order, and interleaving caused by concurrent execution or layered architecture.
Threats to Validity. Internal Validity. While we carefully controlled the fuzzing environment across kernel versions, some internal variations (e.g., execution scheduling or hardware interrupt timing) may still influence the structure of crash traces. We minimized such effects using consistent VM snapshots, deterministic seeds, and isolated system resources during fuzzing runs.
External Validity. The effectiveness of ECHO has been demonstrated on four mainstream Linux kernel versions. Although the method is generally portable, performance may vary on custom or downstream-maintained kernels. Future work will focus on generalizing the normalization strategy using kernel symbol abstraction and runtime metadata.
Construct Validity. Our deduplication results are measured based on structural diversity and semantic grouping. However, ground truth verification remains challenging. Manual inspection and cross-validation with CVE databases were employed to validate clustering precision.
Adaptability to Other Platforms. Although designed for Linux, ECHO's architecture is portable and can be extended to support newer Linux versions and other operating systems. The normalization and clustering pipeline operates on call stacks and symbol metadata, concepts also available on platforms such as FreeBSD, Android, or Darwin-based systems. To support newer Linux versions, ECHO does not depend on version-specific assumptions and relies only on stack traces and symbol resolution. Our tool has already shown compatibility across kernel versions ranging from v5.15 to v6.12, and future support can be achieved by automatically ingesting symbol maps from ‘/proc/kallsyms’, DWARF debug info, or ‘vmlinux’ binaries. For significant version changes that affect function inlining or refactoring, our ongoing work includes incorporating semantic stack normalization to align equivalent frames.
For non-Linux kernels, the main challenges lie in handling diverse binary formats, symbol relocation schemes, and calling conventions. For example, BSD kernels may use ELF with kernel-specific extensions, while macOS relies on Mach-O and distinct unwinding mechanisms. To address this, future versions of ECHO will include pluggable parsers for symbol table extraction, support for multiple debug formats (e.g., DWARF, STABS), and call stack interpreters that can be configured based on OS-specific stack frame layouts.
Overall, although ECHO currently focuses on Linux, its modular pipeline of crash trace parsing, frame abstraction, and structure-based clustering can be adapted with moderate effort to diverse system environments, provided basic stack and symbol information is available.

7. Conclusions

We presented ECHO, a call stack-based crash deduplication tool designed to improve the analysis of Linux kernel fuzzing results. Unlike Syzkaller’s instruction-pointer-based approach, ECHO compares stack trace structure to more accurately group related crashes. This method helps reduce redundant debugging efforts and reveals deeper kernel code paths that are often missed by IP-based heuristics. Integrated into Syzkaller’s fuzzing loop, ECHO guides the scheduler toward inputs that explore previously untested areas. Our evaluation showed that ECHO improves average code coverage by 15.2% and discovered 20 new bugs across several kernel versions. These findings demonstrate that using stack-based similarity is a practical and effective way to enhance fuzzing triage and kernel vulnerability discovery. While ECHO is designed for Linux kernel fuzzing, its underlying idea of call stack-based crash deduplication is general and may be applicable to other types of system software. In complex or concurrent systems, this approach can help organize crash information and support more effective failure analysis.

Author Contributions

S.T. designed the main methodology and the tool; Q.Z. provided valuable ideas. Project administration, B.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available upon request from all authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sutton, M.; Greene, A.; Amini, P. Fuzzing: Brute Force Vulnerability Discovery; Addison-Wesley Professional: Boston, MA, USA, 2007. [Google Scholar]
  2. Godefroid, P. Fuzzing: Hack, Art, and Science. Commun. ACM 2020, 63, 70–76. [Google Scholar] [CrossRef]
  3. McDonald, L.; Haq, M.I.U.; Barkworth, A. Survey of Software Fuzzing Techniques. ACM Comput. Surv. 2023, 55, 145. [Google Scholar] [CrossRef]
  4. Yu, Z.; Liu, Z.; Cong, X.; Li, X.; Yin, L. Fuzzing: Progress, Challenges, and Perspectives. Comput. Mater. Contin. 2024, 78, 1–29. [Google Scholar] [CrossRef]
  5. Manes, V.J.M.; Han, H.; Han, C.; Cha, S.K.; Egele, M.; Schwartz, E.J.; Woo, M. Fuzzing: Art, Science, and Engineering. arXiv 2018, arXiv:1812.00140. [Google Scholar]
  6. Vyukov, D.; Konovalov, A. Syzkaller: An Unsupervised Coverage-Guided Kernel Fuzzer. 2015. Available online: https://github.com/google/syzkaller (accessed on 19 October 2015).
  7. Google. Kernel Address Sanitizer. 2013. Available online: https://www.kernel.org/doc/html/latest/dev-tools/kasan.html (accessed on 30 June 2013).
  8. Google. Kernel Concurrency Sanitizer. 2019. Available online: https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html (accessed on 20 September 2019).
  9. Jones, D. Trinity: A Linux System Call Fuzzer. 2006. Available online: http://codemonkey.org.uk/projects/trinity (accessed on 7 November 2006).
  10. Schumilo, S.; Aschermann, C.; Gawlik, R.; Schinzel, S.; Holz, T. KAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels. In Proceedings of the 26th USENIX Security Symposium (USENIX Security ’17), Vancouver, BC, Canada, 16–18 August 2017; USENIX Association: Berkeley, CA, USA, 2017; pp. 167–182. [Google Scholar]
  11. Pailoor, S.; Aday, A.; Jana, S. MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation. In Proceedings of the USENIX Security, Baltimore, MD, USA, 15–17 August 2018; pp. 729–743. [Google Scholar]
  12. Kim, K.; Jeong, D.R.; Kim, C.H.; Jang, Y.; Shin, I.; Lee, B. HFL: Hybrid Fuzzing on the Linux Kernel. In Proceedings of the NDSS, San Diego, CA, USA, 23–26 February 2020. [Google Scholar]
  13. Sun, H.; Shen, Y.; Wang, C.; Liu, J.; Jiang, Y.; Chen, T.; Cui, A. HEALER: Relation Learning Guided Kernel Fuzzing. In Proceedings of the SOSP, Koblenz, Germany, 26–29 October 2021; pp. 344–358. [Google Scholar]
  14. Sun, H.; Shen, Y.; Liu, J.; Xu, Y.; Jiang, Y. KSG: Augmenting Kernel Fuzzing with System Call Specification Generation. In Proceedings of the USENIX ATC, Carlsbad, CA, USA, 11–13 July 2022; pp. 351–366. [Google Scholar]
  15. Borkmann, D.; Starovoitov, A. Linux eBPF. Available online: https://ebpf.io (accessed on 8 February 2024).
  16. Vyukov, D.; Konovalov, A. Syzlang: System Call Description Language. 2015. Available online: https://github.com/google/syzkaller/blob/master/docs/syscall_descriptions_syntax.md (accessed on 19 October 2015).
  17. Zou, X.; Li, G.; Chen, W.; Zhang, H.; Qian, Z. SyzScope: Revealing High-Risk Security Impacts of Fuzzer-Exposed Bugs in Linux Kernel. In Proceedings of the 31st USENIX Security Symposium (USENIX Security ’22), Boston, MA, USA, 10–12 August 2022; USENIX Association: Berkeley, CA, USA, 2022; pp. 3201–3217. [Google Scholar]
  18. Li, Y.; Ji, S.; Chen, Y.; Liang, S.; Lee, W.-H.; Chen, Y.; Lyu, C.; Wu, C.; Beyah, R.; Cheng, P.; et al. UNIFUZZ: A Holistic and Pragmatic Metrics-Driven Platform for Evaluating Fuzzers. In Proceedings of the 30th USENIX Security Symposium (USENIX Security ’21), Virtual, 11–13 August 2021; USENIX Association: Berkeley, CA, USA, 2021; pp. 2777–2794. [Google Scholar]
  19. Han, S.; Dang, Y.; Ge, S.; Zhang, D.; Xie, T. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proceedings of the 2012 34th International Conference on Software Engineering (ICSE), Zurich, Switzerland, 2–9 June 2012; pp. 145–155. [Google Scholar]
  20. Khvorov, A.; Vasiliev, R.; Chernishev, G.; Rodrigues, I.M.; Koznov, D.; Povarov, N. S3M: Siamese Stack (Trace) Similarity Measure. In Proceedings of the 18th International Conference on Mining Software Repositories (MSR ’21), Madrid, Spain, 17–19 May 2021; pp. 266–270. [Google Scholar]
  21. Zalewski, M. American Fuzzy Lop (AFL). 2013. Available online: https://lcamtuf.coredump.cx/afl/ (accessed on 3 March 2013).
  22. Moran, K.; Linares-Vásquez, M.; Bernal-Cárdenas, C.; Vendome, C.; Poshyvanyk, D. CrashScope: A Practical Tool for Automated Testing of Android Applications. In Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C), Buenos Aires, Argentina, 20–28 May 2017; pp. 15–18. [Google Scholar]
  23. Wu, R.; Zhang, H.; Cheung, S.-C.; Kim, S. CrashLocator: Locating Crashing Faults Based on Crash Stacks. In Proceedings of the 2014 International Symposium on Software Testing and Analysis (ISSTA), San Jose, CA, USA, 21–25 July 2014; pp. 204–214. [Google Scholar]
  24. Jiang, Z.; Jiang, X.; Hazimeh, A.; Tang, C.; Zhang, C.; Payer, M. Igor: Crash Deduplication Through Root-Cause Clustering. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual, 15–19 November 2021; pp. 3318–3336. [Google Scholar]
  25. Shibaev, E.; Sushentsev, D.; Golubev, Y.; Khvorov, A. Stack Trace Deduplication: Faster, More Accurately, and in More Realistic Scenarios. In Proceedings of the 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Montreal, QC, Canada, 4–7 March 2025; pp. 511–521. [Google Scholar]
  26. Bellard, F. QEMU: A Fast and Portable Dynamic Translator. In Proceedings of the USENIX ATC, Anaheim, CA, USA, 10–15 April 2005; Available online: https://www.usenix.org/conference/2005-usenix-annual-technical-conference/qemu-fast-and-portable-dynamic-translator (accessed on 17 July 2010).
  27. SimonKagstrom. KCOV. 2010. Available online: https://github.com/SimonKagstrom/kcov (accessed on 23 August 2010).
Figure 1. Comparison of distinct crash traces exhibiting divergent call stacks that ultimately converge at the same failure location.
Figure 2. Overall workflow of ECHO. ECHO integrates call stack-based crash deduplication into the fuzzing loop and iteratively refines the input corpus to enhance kernel coverage and bug discovery. The system consists of two main stages: Call Stack Deduplication (Steps ①–⑥) and Corpus Refinement (Steps ⑦ and ⑧). In the first stage, ECHO generates syscall inputs (①), executes them in a virtualized environment (②), and collects crash traces (③). It then normalizes traces (④), performs call stack clustering (⑤), and removes redundant crashes (⑥). In the second stage, deduplicated feedback refines the corpus (⑦) and prioritizes fuzzing seeds (⑧). This architecture improves triage scalability, reduces analysis noise, and enhances fuzzing efficiency.
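The clustering step (⑤) in the workflow above computes longest common subsequence (LCS) similarity between normalized call stacks, as described in the abstract. A minimal sketch of such a comparison; the function names, example stacks, and the 0.8 grouping threshold are illustrative assumptions, not ECHO's actual implementation:

```python
def lcs_len(a, b):
    # Dynamic-programming longest common subsequence over two frame lists.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, fa in enumerate(a):
        for j, fb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if fa == fb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def stack_similarity(s1, s2):
    # Normalize by the longer stack: 1.0 = identical frame order, 0.0 = disjoint.
    if not s1 or not s2:
        return 0.0
    return lcs_len(s1, s2) / max(len(s1), len(s2))

# Two crashes sharing most of their call path fall into the same cluster.
a = ["sys_sendto", "sock_sendmsg", "tcp_sendmsg", "tcp_write_xmit", "alloc_skb"]
b = ["sys_sendmsg", "sock_sendmsg", "tcp_sendmsg", "tcp_write_xmit", "alloc_skb"]
same_cluster = stack_similarity(a, b) >= 0.8  # hypothetical threshold
```

Normalizing by the longer stack penalizes traces that merely share a short common prefix, which is why two crashes converging only at a generic helper are not grouped together.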
Figure 3. Component diagram of ECHO, illustrating its integration into a Syzkaller-based fuzzing workflow. ECHO leverages KCOV for coverage feedback, KASAN for bug detection, and applies call stack-based deduplication to guide input scheduling and triage.
Figure 4. Coverage improvement comparison between ECHO and Syzkaller over 24-h fuzzing campaigns across Linux kernel versions 5.15, 6.1, 6.6, and 6.12. Each curve shows the number of unique branches covered over time, with shaded areas representing standard deviation across multiple runs. The vertical dashed lines indicate the jump points where ECHO demonstrates branch coverage acceleration due to deduplication-guided corpus refinement. Subfigures (ad) correspond to each kernel version, respectively.
Figure 5. Crash deduplication comparison among ECHO, Syzkaller, and S3M across different Linux kernel versions. ECHO achieves consistent reductions in crash cluster count.
Figure 6. Two superficially distinct crash traces merged by ECHO after call stack normalization. Differences are highlighted in red.
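The merge in Figure 6 depends on the normalization step (④), which reduces raw oops frames to bare function names so that addresses, offsets, and compiler-generated suffixes do not keep equivalent traces apart. A regex-based sketch of the idea; the pattern and suffix handling are illustrative assumptions rather than ECHO's exact rules:

```python
import re

# Matches frames like "ip_list_rcv+0x1a4/0x2b0"; captures the symbol name.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

def normalize_trace(raw_lines):
    """Strip addresses, offsets, and suffixes like '.isra.0' or '.constprop.0'
    from raw trace lines, keeping only an ordered list of function names."""
    frames = []
    for line in raw_lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append(m.group(1).split(".")[0])
    return frames

raw = [
    " ip_list_rcv+0x1a4/0x2b0",
    " __netif_receive_skb_list.constprop.0+0x24/0xb0",
    "RIP: 0010:ip_rcv_finish_core+0x42/0x1f0",
]
print(normalize_trace(raw))
# → ['ip_list_rcv', '__netif_receive_skb_list', 'ip_rcv_finish_core']
```

After this pass, the two traces in Figure 6 reduce to identical frame sequences, so the LCS comparison scores them as duplicates.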
Table 1. ECHO has discovered 20 previously undocumented vulnerabilities in the Linux kernel. The first column lists the affected modules, while the remaining columns present the corresponding source files, bug functions, and categorized bug types. All bugs were identified in key components of the kernel, including networking, file systems, memory management, and core subsystems.
ID | Source File | Bug Function | Bug Type
1 | net/ipv6/addrconf.c | addrconf_rs_timer() | deadlock
2 | net/ipv4/ip_input.c | ip_list_rcv() | deadlock
3 | drivers/net/phy/phy.c | phy_state_machine() | logic error
4 | fs/fuse/dev.c | __fuse_simple_request() | logic error
5 | fs/exfat/inode.c | exfat_write_inode() | logic error
6 | fs/ext4/super.c | ext4_put_super() | logic error
7 | fs/inode.c | find_inode_fast() | deadlock
8 | fs/super.c | iterate_supers() | logic error
9 | net/netlink/af_netlink.c | netlink_dump() | deadlock
10 | net/core/rtnetlink.c | rtnetlink_rcv_msg() | deadlock
11 | drivers/block/loop.c | loop_reconfigure_limits() | logic error
12 | fs/fat/fatent.c | fat_count_free_clusters() | logic error
13 | kernel/printk/printk.c | console_lock_spinning_enable() | deadlock
14 | fs/readdir.c | iterate_dir() | logic error
15 | net/core/rtnetlink.c | rtnl_lock() | deadlock
16 | fs/xattr.c | vfs_setxattr() | deadlock
17 | fs/ext4/ext4_jbd2.c | __ext4_journal_start_sb() | logic error
18 | kernel/events/core.c | ctx_sched_in() | null ptr deref
19 | fs/ext4/inode.c | ext4_dirty_folio() | null ptr deref
20 | mm/page_alloc.c | get_page_from_freelist() | logic error
Table 2. Code coverage comparison between ECHO and Syzkaller on different Linux kernel versions over 24 h. ECHO consistently achieves higher coverage, with an average improvement of 15.2%.
Version | ECHO | Syzkaller | Improvement
Linux Kernel v5.15 | 194,744.2 | 169,486.6 | 14.9%
Linux Kernel v6.1 | 240,022.4 | 213,462.2 | 12.4%
Linux Kernel v6.6 | 233,973.8 | 201,323.6 | 16.2%
Linux Kernel v6.12 | 240,022.2 | 204,816.4 | 17.1%
Overall | 227,190.6 | 197,272.2 | 15.2%
Table 3. Crash deduplication comparison on four Linux kernel versions.
Kernel Version | Raw Crashes | Syzkaller Clusters | ECHO Clusters
v5.15 | 11,341 | 1911 | 896
v6.1 | 12,274 | 2101 | 978
v6.6 | 10,863 | 1844 | 932
v6.12 | 11,970 | 2044 | 922
Average | 11,612 | 1975 | 932
Table 4. False positive rate and false negative rate of crash clustering.
Tool | False Positive Rate (FPR) | False Negative Rate (FNR)
Syzkaller | 23.5% | 29.1%
ECHO | 8.2% | 12.6%
Table 5. Feature-based comparison between ECHO and baseline deduplication techniques.
Feature | ECHO | Syzkaller | CrashFinder
Frame normalization | ✓ | × | ×
Inlined function recovery | ✓ | × | ×
Call-site structural equivalence | ✓ | × | ×
Support for partial traces | ✓ | ✓ | ✓
Kernel-specific trace support | ✓ | ✓ | ✓
Dedup granularity | Fine-grained | Coarse | Medium
Fuzzing integration | ✓ | ✓ | ×
Automation level | Automated | Automated | Semi-automated
Table 6. Clustering accuracy comparison on a manually labeled ground truth dataset.
Tool | Adjusted Rand Index (ARI) | Normalized Mutual Information (NMI)
Syzkaller | 0.36 | 0.42
ECHO | 0.78 | 0.81
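The Adjusted Rand Index reported in Table 6 scores how often pairs of crashes that share a ground-truth bug also share a predicted cluster, corrected for chance agreement. A stdlib sketch of the pair-counting formula; the example labelings are hypothetical, not the paper's dataset:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth, pred):
    # Pair-counting ARI: 1.0 for a perfect clustering, ~0.0 for a random one.
    cells = Counter(zip(truth, pred))                      # contingency table
    index = sum(comb(c, 2) for c in cells.values())        # agreeing pairs
    sum_t = sum(comb(c, 2) for c in Counter(truth).values())
    sum_p = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_t * sum_p / comb(len(truth), 2)         # chance baseline
    max_index = (sum_t + sum_p) / 2
    if max_index == expected:  # degenerate case: both partitions trivial
        return 1.0
    return (index - expected) / (max_index - expected)

# Six crashes from three ground-truth bugs; the tool splits bug 1 in two.
truth = [0, 0, 1, 1, 2, 2]
pred  = [0, 0, 1, 3, 2, 2]
score = adjusted_rand_index(truth, pred)  # < 1.0: the split is penalized
```

Because ARI is invariant to cluster-label permutation, it measures partition quality rather than label agreement, which is why it suits comparing deduplication tools that assign arbitrary cluster IDs.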
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tao, S.; Zhang, B.; Zhang, Q. ECHO: Enhancing Linux Kernel Fuzzing via Call Stack-Aware Crash Deduplication. Electronics 2025, 14, 2914. https://doi.org/10.3390/electronics14142914
