1. Introduction
Data compression is an indispensable technology in computation and communication systems, playing a pivotal role in reducing storage requirements, enhancing transmission efficiency, and optimizing energy consumption [1,2,3]. Among compression techniques, lossless methods enable compact data representation without information loss, preserving data integrity—a critical requirement in fields such as medical imaging, scientific datasets, and financial records [4,5]. In consumer electronics, where rapid processing, limited storage, and efficient transmission are paramount, the development of high-performance lossless compression algorithms remains a significant research focus.
A major class of lossless compression techniques is dictionary-based compression, originally introduced by Lempel and Ziv. These algorithms, including LZ77 [6], LZ4 [7], LZW [8], LZRW [9], LZSS [10], and Snappy [11], achieve compression by replacing repeated sequences of data with references to a dynamically constructed dictionary. Due to their ability to efficiently handle repetitive patterns, they are widely used in text compression, log file storage, and network protocols. However, software implementations of these algorithms often struggle to meet the throughput requirements of high-speed data processing systems.
To overcome these limitations, field programmable gate arrays (FPGAs) have emerged as a powerful acceleration platform, offering parallel processing capabilities, low-latency execution, and energy efficiency. The programmable logic units within FPGAs consist of Look-Up Tables (LUTs), Flip-Flops, and other basic logic elements, capable of performing complex logic operations and data processing [12]. FPGA-based compression architectures are particularly well-suited for computationally intensive tasks, enabling real-time data processing in applications such as high-frequency trading, network packet compression, and embedded storage systems [13].
Early research in FPGA-accelerated compression primarily focused on entropy coding techniques, such as Huffman coding and arithmetic coding, which exploit statistical redundancies in data. In 1994, Howard and Vitter [14] proposed a dynamic Huffman coding architecture that leverages parallel frequency counting and tree generation to minimize encoding latency. Further optimizations were later introduced by Garikipati et al. [3], who combined Huffman coding with run-length encoding (RLE) to improve compression efficiency while reducing power consumption. Recently, Guguloth et al. [15] introduced a modern hardware architecture based on Canonical Huffman encoding and decoding, integrated with frequency counting, sorting, state-machine optimization, and barrel-shifter techniques, which minimizes memory storage requirements and uses fewer hardware resources.
With the increasing adoption of dictionary-based algorithms, researchers have explored FPGA optimizations for LZ-family compression. Bartik et al. [16] analyzed the LZ4 algorithm’s suitability for hardware implementation, highlighting its trade-offs between throughput and resource utilization. Choi et al. [17] developed a reconfigurable LZ77 architecture that supports two operational modes—throughput-first (TF) and compression ratio-first (CF)—by dynamically adjusting the degree of parallelism during string matching. Gao et al. [18] proposed MetaZip, which improves compression throughput within the constraints of FPGA resources by adopting an adaptive parallel-width pipeline.
Zstd combines LZ77 compression with entropy coding and is well suited to compressing large data sets, such as high-frequency trading (HFT) data. However, Zstd involves many complex, highly sequential operations, which consume a significant amount of CPU resources in actual deployments [19]. Refs. [20,21] introduced hardware compression kernel architectures for Zstd that allow the algorithm to be used for real-time compression of big data streams. Ref. [22] proposed a pipelined dynamic-table FSE accelerator on FPGA to mitigate the compression performance loss.
The Snappy compression algorithm, known for its fast compression and decompression speeds, is widely used in large-scale data processing and communication systems, including search engines [23], log analysis [24], and image and video processing [25]. However, because Snappy is a lightweight algorithm designed primarily for software execution, deploying it on FPGA platforms still faces several limitations and challenges:
Snappy relies on a hash table for its dictionary, storing fixed-length strings starting at each byte, along with an input buffer for backward referencing. This setup leads to considerable redundancy between the dictionary and input buffer, wasting storage space.
The algorithm performs hash computations only for non-matching characters; hence, partial matches do not contribute to the hash values stored in the dictionary [26].
Snappy inherently supports encoding of match lengths up to 64 bytes, necessitating the segmentation of longer matching sequences into multiple smaller ones, resulting in suboptimal coding efficiency.
To address these issues, this paper proposes an optimized FPGA-based Snappy compression architecture with the following solutions:
Replacement of raw data entries in the dictionary with fingerprints, thereby eliminating redundant storage.
Inclusion of both matching and non-matching data in the hash dictionary, aiming to elevate the compression ratio.
An enhancement to the Snappy encoding format that fully utilizes the trailing two bits of its token structure, enabling support for long matches up to 1024 bytes, thereby boosting the compression ratio and addressing the limitation on match length.
This paper is organized as follows: Section 2 reviews the Snappy algorithm, including its encoding format and operational steps, along with an overview of previous hardware implementations. Section 3 introduces our enhanced Snappy encoding format, as well as the design of the hash functions and fingerprints. Section 4 presents a hardware architecture tailored to our enhanced Snappy algorithm. Section 5 reports the experimental results of our FPGA implementation and comparisons with other FPGA hardware designs. Section 6 discusses performance trade-offs and portability across FPGA platforms, and Section 7 summarizes the findings and conclusions of this paper.
3. Proposed Method
3.1. Enhanced Snappy Coding Format
Snappy’s original format utilizes three token types (0b00, 0b01, 0b10). However, Google’s extension [11] introduces a fourth type (0b11 in the least significant two bits) specifically for copy operations with match lengths of up to 64 bytes and offsets falling within the range of 2^16 to 2^32 − 1, inclusive. As shown in Table 1, this extension accommodates longer-distance back-references without necessitating larger tokens or more complex encoding methods.
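For reference, the sketch below (Python, with helper names of our own choosing) summarizes how the two least significant bits of the first token byte select among the four element types; the field widths in the comments follow Google’s published format description [11].

```python
# Sketch of the standard Snappy tag byte: the two LSBs select the element type.
TAG_LITERAL        = 0b00  # literal run; upper 6 bits encode the length (values >= 60 signal extra length bytes)
TAG_COPY_1B_OFFSET = 0b01  # copy: 3-bit length field (4-11 bytes), 11-bit offset
TAG_COPY_2B_OFFSET = 0b10  # copy: 6-bit length field (1-64 bytes), 16-bit offset
TAG_COPY_4B_OFFSET = 0b11  # copy with 4-byte offset in Google's extension; redefined in this work for long matches

def parse_tag(tag_byte: int):
    """Return (element_type, upper_six_bits) for a Snappy tag byte."""
    return tag_byte & 0b11, tag_byte >> 2

# Example: 0b000100_10 is a copy with a 2-byte offset and length field 0b000100.
elem_type, upper = parse_tag(0b00010010)
assert elem_type == TAG_COPY_2B_OFFSET and upper == 0b000100
```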
In FPGA compression, the constrained on-chip memory limits the hash table and input buffer sizes, imposing an offset limit that reduces encounters with long back-references. Despite this constraint, there remains a relatively high probability of encountering long match lengths during compression. According to Snappy’s design, when the length of a matched string exceeds 64 bytes, it must be encoded into multiple consecutive copy-type segments, potentially reducing encoding efficiency.
We conducted experiments on the Silesia dataset using both software-based Google Snappy [11] and the Xilinx Snappy FPGA implementation [30]. With a hash table configuration of 4096 entries and six hash slots, Figure 2 reveals that long matches are detected more frequently in the hardware realization than long offsets. Here, “long-match” indicates match lengths exceeding 64 bytes, while “long-offset” signifies back-reference offsets exceeding the 16-bit range of the 2-byte-offset copy.
The observed disparity arises from differences in hashing strategies: Google Snappy’s software version primarily utilizes its hash table for indexing, whereas Xilinx’s hardware implementation includes storage for both indices and partial string data. Furthermore, Xilinx’s Snappy design incorporates a Best Match Filter module, which optimizes match length identification, enhancing the hardware solution’s capacity to detect and leverage longer matches.
In this work, we propose an enhanced encoding strategy for the 0b11 token type, specifically tailored to improve compression performance in hardware-accelerated Snappy implementations. Unlike Google’s extension, which focuses on supporting larger offsets, our design prioritizes longer match lengths by redefining the 0b11 token to represent copy operations with lengths up to 1024 bytes, as illustrated in Figure 3 and Table 1. This enhancement is particularly beneficial for hardware-based compression, where longer matches significantly improve the compression ratio and reduce the number of tokens generated.
We set 1024 bytes as the maximum match length based on experimental studies showing that longer matches are extremely rare in real-world datasets—our analysis of the Silesia corpus revealed that 99.998% of matches fall within this limit. For the remaining 0.002% of cases where matches exceed 1024 bytes, the algorithm automatically splits them into multiple segments (each no more than 1024 bytes) and encodes them as separate tokens, ensuring complete coverage while maintaining format compatibility.
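A minimal sketch of this segmentation step is shown below; the 1024-byte cap is taken from the text, and the function name is ours.

```python
MAX_MATCH_LEN = 1024  # the enhanced Token 11 supports matches up to 1024 bytes

def split_match(length: int, max_len: int = MAX_MATCH_LEN):
    """Split a match of `length` bytes into segments of at most `max_len` bytes,
    each of which is emitted as a separate copy token."""
    segments = []
    while length > 0:
        seg = min(length, max_len)
        segments.append(seg)
        length -= seg
    return segments

assert split_match(2500) == [1024, 1024, 452]
```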
For offset encoding in this enhanced token, we introduce a variable-length encoding scheme that starts with 4 bits from the second token byte, followed by additional bytes as needed. In this scheme, the most significant bit (MSB) of each byte acts as a continuation flag: a value of 1 indicates that more bytes follow, while 0 marks the end of the offset encoding.
This design offers several advantages: In a 16 KB input buffer, where the maximum offset is 14 bits, our encoding requires only 3 bytes for the offset (4-bit base + 2 continuation bytes), resulting in a total of 4 bytes per Token 11. In contrast, Google’s Token 11 uses a fixed 4-byte offset field, resulting in a total of 5 bytes per token—one byte more than our design. For larger buffers, the variable-length encoding allows the offset to grow as needed, while maintaining backward compatibility and avoiding unnecessary overhead for smaller offsets. Empirical analysis shows that longer offsets occur with significantly lower probability in real-world data. Therefore, variable-length encoding is particularly suitable—shorter offsets are encoded compactly, and longer ones are only represented when necessary, minimizing bit consumption.
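The sketch below illustrates one way such a variable-length offset can be produced. The exact bit placement within the token bytes is an assumption on our part (low 4 offset bits in a nibble of the second token byte, 7 further bits per continuation byte, MSB as the continuation flag), chosen so that a 14-bit offset occupies the 3 offset bytes described above.

```python
def encode_offset(offset: int):
    """Illustrative sketch of the variable-length offset of the enhanced Token 11.

    Assumptions (not fully specified in the text): the low 4 offset bits occupy a
    nibble of the second token byte; each continuation byte carries 7 further
    offset bits, and its MSB is 1 when another byte follows, 0 otherwise."""
    base_nibble = offset & 0xF           # 4-bit base stored in the second token byte
    rest = offset >> 4
    cont_bytes = []
    while True:
        b = rest & 0x7F
        rest >>= 7
        if rest:
            cont_bytes.append(0x80 | b)  # MSB = 1: more offset bytes follow
        else:
            cont_bytes.append(b)         # MSB = 0: last offset byte
            break
    return base_nibble, bytes(cont_bytes)

# A 14-bit offset (16 KB buffer) needs the nibble plus two continuation bytes,
# i.e., 3 bytes of offset and 4 bytes per token in total, as described above.
nibble, tail = encode_offset(0x3FFF)
assert len(tail) == 2
```

Under this scheme small offsets stay compact, which is exactly the property the variable-length design is meant to exploit.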
These optimizations improve algorithmic efficiency, particularly for long matches, enabling better utilization of Snappy’s compression capabilities and increasing overall compression performance.
3.2. Hash Table
The Snappy compression algorithm relies on a hash table acting as a dictionary, storing reference data from a sliding window in on-chip RAM within an FPGA. This hash table employs a fixed number of slots to accommodate entries, with the sliding window data and corresponding file indices inserted into this dictionary. In cases of hash collisions, where multiple entries map to the same slot set, the earliest-entered entry is replaced with the most recent data due to the limited capacity.
We adopt the hash algorithm proposed in [33], given by Equation (1), where W is the 32-bit data word in the sliding window. This hashing function masks the result to 12 bits, effectively mapping a 32-bit value to a 12-bit one using only bitwise operations [33]. Consequently, the hardware resources required for calculating Equation (1) are minimal, and the computation takes only two clock cycles.
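Since Equation (1) itself is defined in [33], the snippet below is only an illustrative stand-in with the same stated properties (shift/XOR operations followed by a 12-bit mask); it is not the authors’ exact function.

```python
def hash12(W: int) -> int:
    """Illustrative stand-in for the 12-bit hash of Equation (1).

    Mirrors only the stated properties: a 32-bit sliding-window word W is mapped
    to a 12-bit table index using nothing but shifts, XORs, and a final mask."""
    W &= 0xFFFFFFFF
    h = W ^ (W >> 12) ^ (W >> 20)   # fold the upper bits onto the lower bits
    return h & 0xFFF                # mask to 12 bits (4096-entry hash table)

index = hash12(int.from_bytes(b"abcd", "little"))
assert 0 <= index < 4096
```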
3.3. Fingerprint Design
In the typical Snappy configuration [30,40], the minimum match length is standardized to 4 bytes. In this setup, each slot in the hash table stores the byte index together with the sliding window data, where the index occupies 3 bytes and the 6 bytes of sliding window data (B0 to B5) occupy the remainder. This results in a total storage requirement of 9 bytes per slot to identify a potential match.
To optimize storage usage and minimize clock cycles, our design retains only a fingerprint of the minimum-match bytes, while preserving the original values of the remaining bytes in the sliding window. This approach leverages the efficiency of fingerprint matching to identify potential matches between the sliding window and entries in the hash table.
Although fingerprint comparison can occasionally report false matches, such occurrences are resolved through a multi-step process. Initially, hash table matching is performed on fingerprints, followed by a backward match module. This module compares the sliding window data against the original data buffered in the input buffer, thereby ensuring the accuracy and integrity of the final compression encoding.
In our approach, with a 6-byte sliding window (B0, B1, B2, B3, B4, B5) and a 4-byte minimum match length, each hash table slot entry comprises a 3-byte index and 3 bytes of data values (the fingerprint of B0 to B3 together with the raw bytes B4 and B5), resulting in a compact 6-byte storage format. This represents a significant improvement over conventional Snappy designs [30,40], which store the complete sliding window data in each slot. Specifically, our approach reduces storage requirements by one-third compared to these traditional implementations when using a 6-byte sliding-window configuration.
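The difference in slot layout can be summarized as follows. The 1-byte fingerprint width is our inference from the 3-byte data field (fingerprint plus the two raw bytes B4 and B5), and the field names are ours.

```python
from dataclasses import dataclass

# Conventional slot [30,40]: 3-byte index + 6 raw sliding-window bytes = 9 bytes.
@dataclass
class ConventionalSlot:
    index: int          # 3-byte position in the input stream
    window: bytes       # B0..B5, 6 raw bytes

# Proposed slot: 3-byte index + fingerprint of B0..B3 + raw B4, B5 = 6 bytes.
# (A 1-byte fingerprint is assumed here so the 3-byte data field adds up; the
# actual fingerprint width follows Equation (4) in the paper.)
@dataclass
class FingerprintSlot:
    index: int          # 3-byte position in the input stream
    fingerprint: int    # fingerprint of the 4 minimum-match bytes B0..B3
    tail: bytes         # B4, B5 kept verbatim for extending the match
```

The one-third storage saving quoted above is exactly the 9-byte versus 6-byte difference between these two layouts.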
We employ a hash function to compute the fingerprint of the minimal match data W, composed of the four bytes B0, B1, B2, and B3. The fingerprint of W is given by Equation (4).
Figure 4 illustrates the computational workflow for hash table matching within a single slot. Initially, the fingerprint of the sliding window is computed using Equation (4) and compared against the fingerprint stored in each slot. A fingerprint match indicates a potential minimum-match-length (4-byte in Figure 4) match. Subsequently, the remaining bytes of the sliding window are compared with those stored in the slot to determine the overall match length. Finally, the offset is calculated as the difference between the index stored in the hash table and the current index of the first character in the sliding window. If this difference is less than a predefined threshold, the match is considered valid.
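Putting the steps of Figure 4 together, a behavioral sketch of the per-slot check might look as follows. Here fingerprint() stands in for Equation (4), OFFSET_THRESHOLD for the predefined offset threshold, and the slot fields follow the layout sketched above; all names are ours.

```python
from collections import namedtuple

MIN_MATCH = 4
OFFSET_THRESHOLD = 16 * 1024                         # placeholder: predefined offset limit
Slot = namedtuple("Slot", "index fingerprint tail")  # tail = raw bytes B4, B5

def fingerprint(data: bytes) -> int:
    # Stand-in for Equation (4); any deterministic function of B0..B3 works for this sketch.
    return (data[0] ^ data[1] ^ data[2] ^ data[3]) & 0xFF

def slot_match(window: bytes, cur_index: int, slot: Slot):
    """Per-slot check from Figure 4: return (match_length, offset) or (0, 0)."""
    # Step 1: fingerprint comparison over the 4 minimum-match bytes.
    if fingerprint(window[:MIN_MATCH]) != slot.fingerprint:
        return 0, 0
    # Step 2: extend the candidate match over the remaining sliding-window bytes.
    length = MIN_MATCH
    for win_byte, slot_byte in zip(window[MIN_MATCH:], slot.tail):
        if win_byte != slot_byte:
            break
        length += 1
    # Step 3: accept the match only if the back-reference distance is within range.
    # (Fingerprint collisions are later caught by the backward match module, which
    #  re-checks the candidate against the original data in the input buffer.)
    offset = cur_index - slot.index
    return (length, offset) if 0 < offset < OFFSET_THRESHOLD else (0, 0)

window = b"abcdef"
slot = Slot(index=100, fingerprint=fingerprint(b"abcd"), tail=b"ef")
assert slot_match(window, 4196, slot) == (6, 4096)
```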
4. Hardware Implementation
4.1. FPGA Architecture of Snappy Compression Algorithm
In contrast to its software-based counterpart, hardware-accelerated implementations of the Snappy algorithm offer significant enhancements in efficiency and speed within constrained resource environments. This acceleration is achieved by optimizing key operations such as hash computations, dictionary lookups, and backward matching, which are traditionally performed sequentially in software implementations [40]. By leveraging the inherent parallelism of FPGA architectures, hardware implementations enable pipelined execution of these operations, unlocking substantial performance gains.
The hardware architecture proposed in this study, as illustrated in Figure 5, comprises five essential components: the input controller, the input buffer, the pre-match module, the backward match module, and the Snappy encoder. These components work together to efficiently process incoming data streams and generate compressed output in the Snappy format.
The input controller serves as the interface between the FPGA and the external memory, facilitating the retrieval of 8-bit data streams for processing. Upon fetching, the data is stored in the input buffer, where it undergoes subsequent processing stages.
The pre-match module computes hash values and fingerprints to identify potential matches in a hash table, concurrently updating the input buffer. Matches exceeding the minimum threshold length trigger the intervention of the backward match module, which further analyzes the data to determine precise match lengths by referencing the circular input buffer.
Following the identification of matches, the data is directed to separate literal and copy streams, which are then fed into the Snappy encoder. Here, the data is encoded according to the Snappy compression format, as depicted in Figure 1 and Figure 3.
The pre-match module includes a word shift register, a hash engine, an FP (fingerprint) engine, a hash table, a slot match, an index counter, and a best match filter.
Together, these components facilitate the transformation of input bytes into a sliding window, the computation of hash values and fingerprints, the comparison of hash table entries with sliding window content, the determination of match lengths and offsets, and the validation of match accuracy.
Inspired by Xilinx’s Snappy compressor [30], our design incorporates a best match filter within the pre-match module. This filter optimizes match selection by comparing the match length starting at the current byte with the match lengths starting at the subsequent sliding-window bytes. If a longer match starting at a later byte is detected, the current match length is set to zero, ensuring that only the longest matches are passed on for compression and thus enhancing overall compression efficiency.
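A behavioral sketch of this filtering rule is given below; the lookahead depth and variable names are our own illustrative choices.

```python
def best_match_filter(match_lengths: list[int], window_size: int = 6) -> list[int]:
    """Sketch of the best-match filter: if a match starting at one of the next
    window_size - 1 bytes is longer than the match starting at byte i, drop the
    match at i (set its length to zero) so only the longest match survives."""
    filtered = list(match_lengths)
    for i in range(len(match_lengths)):
        lookahead = match_lengths[i + 1 : i + window_size]
        if lookahead and max(lookahead) > match_lengths[i]:
            filtered[i] = 0
    return filtered

# The short match at position 0 is suppressed in favor of the longer one at position 1.
assert best_match_filter([4, 9, 0, 0, 0, 0, 5]) == [0, 9, 0, 0, 0, 0, 5]
```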
4.2. Pipelined Architecture and Parallelization in Pre-Match Module
In the pre-match module, the Snappy algorithm is optimized for FPGA implementation through its repetitive and unidirectional treatment of individual bytes. This characteristic facilitates a pipelined approach, enhancing efficiency. As illustrated in Figure 6, the pre-match module features a 10-stage pipelined architecture, organized into three logical stages.
The initial stage encompasses tasks such as data reading, word shifting, hash and fingerprint computations, dictionary reading, and dictionary writing. The dictionary writing task updates the hash table, which makes the dictionary reading task access the updated data next time. Following this, the second stage executes the parallelized slot match unit, where sliding window data undergoes comparison against entries in a hash table. Finally, the third stage selects the historical position with the maximum match length from the results obtained in the preceding parallel comparisons.
Within the second stage, the time required for matching a single slot, t_slot, involves one offset comparison, one fingerprint comparison, and two data comparisons, each completed within a single clock cycle. Thus, t_slot = 4 clock cycles, and matching the six slots of stage 2 sequentially would total 6 × t_slot = 24 clock cycles.
Leveraging the independence of slot data matches, parallelization becomes feasible. By employing six matching resources in parallel, the aggregate match time is reduced to t_slot = 4 clock cycles.
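In cycle terms, a back-of-the-envelope check of these figures:

```python
# Stage-2 timing in clock cycles (one comparison per cycle).
T_SLOT = 1 + 1 + 2          # offset + fingerprint + two data comparisons = 4 cycles
N_SLOTS = 6

sequential_cycles = N_SLOTS * T_SLOT   # 24 cycles if the slots were matched one by one
parallel_cycles   = T_SLOT             # 4 cycles with six slot-match units in parallel
assert (sequential_cycles, parallel_cycles) == (24, 4)
```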
Within the pre-match module, each component operates within a single clock cycle, allowing the algorithm to process one byte of data per average clock cycle once initiated. This streamlined approach ensures efficient data processing and contributes to the overall performance of the system.
While pipeline architectures are commonly employed in FPGA-based designs [30,40], our implementation uniquely incorporates three key innovations within the pipeline stages: (1) fingerprint-based dictionary matching, (2) extended Token 11 encoding/decoding, and (3) inclusive hash dictionary management.
5. Result and Comparison
5.1. Experimental Setup
The proposed Snappy compression architecture has been implemented and evaluated on an AMD Alveo U200 acceleration card. The design features a configurable input buffer size of 16 KB and an optimized hash table structure comprising 4096 entries with six slots per entry. The resource utilization of a single compression kernel is listed in Table 2, along with the proportion of each resource within the total chip resources.
To ensure comprehensive evaluation, we verified bit-level accuracy against the canonical Snappy implementation using the Silesia corpus, Canterbury corpus, and Calgary corpus. We evaluated both single-kernel and multi-kernel configurations.
5.2. Experimental Results
Compression ratio. The Silesia corpus is a carefully curated benchmark dataset for evaluating compression algorithms [41], comprising files with diverse characteristics. As shown in Figure 7a, on the Silesia benchmark our implementation achieves compression ratios (original size/compressed size) of up to 6.2, with an average of 2.27, representing a 6.1% improvement over Xilinx’s Vitis solution [30].
We further evaluated our implementation on the Canterbury and Calgary corpora. As shown in Figure 7b, our method achieves competitive average compression ratios across these datasets, with error bars indicating standard deviation. While the results demonstrate robust performance in multi-corpus scenarios, our implementation exhibits slightly higher variance compared to Xilinx’s design [30].
Clock Frequency. Each compression kernel achieves a post-place-and-route verified maximum frequency of 320 MHz (3.125 ns period). For direct comparison with the Xilinx reference design [30], we implemented a single-kernel configuration operating at 300 MHz, achieving 279 MB/s throughput. Scaling up to an array of eight hardware kernels, each operating at the same 300 MHz frequency, increases the system’s throughput to 1.6 GB/s.
As shown in Table 3, while our implementation achieves comparable throughput (279 MB/s vs. 280 MB/s) in single-kernel operation at 300 MHz, the scaled eight-kernel configuration shows marginally lower aggregate throughput compared to Xilinx’s state-of-the-art design [30]. This difference stems from fundamental architectural trade-offs in our optimization approach.
Power consumption. Power characterization was performed using Vivado 2022.1’s post-route power analysis with SAIF activity files from the Silesia corpus benchmark simulation. The measurements show our single-kernel implementation consumes 16.2 W (on-chip power) when operating at 300 MHz, including all compression core logic and associated on-chip memory subsystems.
5.3. Key Performance-Influencing Metrics
The proposed optimizations, including the enhanced encoding format and the inclusive hash dictionary, aim to improve the compression ratio. Additionally, the slot number and hash table size significantly affect implementation performance. To systematically evaluate these key factors, we analyze five distinct configurations (Table 4) across three benchmark corpora. Figure 8 presents the corresponding compression ratios, demonstrating the effectiveness of each optimization.
Enhanced encoding format. The comparison between the baseline and Configuration 2 reveals the contribution of the enhanced encoding format to the improvement in compression ratio. The absence of enhanced encoding results in an average compression ratio degradation of 4.13%.
Inclusive hash dictionary. An experimental comparison between the baseline and Configuration 3 demonstrates that inclusively storing both matching and non-matching data can improve the compression ratio by 47.24%. This improvement is attributed to the limited dictionary size in hardware compared to software; storing only non-matching data reduces the likelihood of finding potential matches within the dictionary.
Slot number. The slot number represents the parallelism level of the hash table. Comparisons between the baseline and Configurations 4 and 5 reveal that the slot number affects the compression ratio: in general, a higher number of slots leads to a higher compression ratio. Specifically, using four slots results in a 1.76% decrease compared to six slots, while using eight slots brings only a 0.88% increase. This is because as more slots are configured, the hash table grows larger; however, once the table is large enough to accommodate all input data, further increases yield diminishing returns.
Hash table size. To investigate the relationship between compression ratio and hash table size, we conducted a series of experiments based on the baseline configuration listed in Table 4, varying the hash table size from 1K to 32K entries. As shown in Figure 9, the experimental results reveal a significant positive correlation between hash table expansion and compression performance, indicating that larger hash tables consistently improve compression efficiency. However, this improvement comes at the cost of increased memory resource utilization. Importantly, the gains in compression efficiency diminish once the hash table size exceeds the input buffer capacity, highlighting a critical trade-off for hardware designers.
Sliding window size. The sliding window size determines the number of data bytes stored in each hash table slot. As the minimum match length is standardized to 4 bytes, the first 4 bytes are hashed into a fingerprint, while the remaining window data are compared against entries in the hash table. Although a larger sliding window enables the pre-match module to process more potential matching bytes, the compression ratio remains unaffected by variations in window size. This stability is attributed to the inclusive design, which stores both matching and non-matching data in the hash table, decoupling window size from compression performance.
5.4. Comparison with Recent Works
We compare our design with other dictionary-based compression designs optimized for FPGA implementation, including Zstd [21] and Snappy [30,40].
Zstd, as described in [21], integrates dictionary compression with Huffman coding, providing an efficient lossless compression algorithm. While [30] presents the current state-of-the-art FPGA implementation of the original Snappy compression algorithm, ref. [40] focuses on optimizing this implementation (without modifying the core algorithm) specifically for low-power devices. The key distinction is that [40] maintains the standard Snappy compression format while achieving better power efficiency through RTL-level implementation optimizations.
Table 5 provides a comprehensive comparison of single-kernel resource consumption and throughput for these designs. Table 3 details the maximum achievable frequency for each compared design and the test frequencies used for the reported throughput metrics. All Snappy designs were evaluated under uniform conditions, utilizing an input buffer size of 16 KB and a hash table comprising 4096 entries with six slots per entry. To assess performance per FPGA slice resource, we introduce the Performance Complexity Ratio (PCR), and to evaluate performance per individual storage bit, the Performance Storage Ratio (PSR); for both metrics, higher values indicate superior resource utilization. Although our proposed design utilizes the largest share of FPGA slice resources (primarily LUT/FF), it consumes the least on-chip storage, including both Block RAM (BRAM) and Ultra RAM (URAM). Despite a lower PCR score compared to Zstd [21], our design achieves the best PSR among the compared algorithms.
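As the text describes them, PCR relates throughput to slice (LUT/FF) usage and PSR relates throughput to on-chip storage bits; the hypothetical helpers below simply make those ratios explicit (the exact normalization used in Table 5 may differ).

```python
def pcr(throughput_mb_s: float, slices_used: int) -> float:
    """Performance Complexity Ratio: throughput per FPGA slice (assumed definition)."""
    return throughput_mb_s / slices_used

def psr(throughput_mb_s: float, storage_bits: int) -> float:
    """Performance Storage Ratio: throughput per bit of on-chip storage,
    counting both BRAM and URAM (assumed definition)."""
    return throughput_mb_s / storage_bits
```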
In our hardware implementation, the input buffer utilizes the internal BRAM resources of the FPGA, while the hash table employs URAM for data storage. Remarkably, our proposed design, leveraging the fingerprint scheme, reduces URAM consumption by approximately one-third compared to Xilinx’s Snappy implementation. Resource balance is pivotal in FPGA algorithm design. Figure 10 shows the proportion of logic resources (LUT/FF), storage resources (BRAM/URAM), and DSP resources relative to the total available resources for each compared design implemented on its respective test platform. As depicted in Figure 10, the proportion of logic and on-chip storage resources used by our compression kernel is very low, remaining below 1.0% in all cases. While some alternative designs exhibit lower logic resource utilization, they often consume excessive storage resources, leading to an imbalance in chip resource allocation. This imbalance can restrict the deployment of multiple algorithm kernels in parallel within a chip, with potential scenarios where storage resources are exhausted despite available logic resources. This strategic trade-off, favoring lower RAM consumption without compromising performance, represents a significant achievement, particularly in scenarios where memory resources are severely constrained.
6. Discussion
6.1. Performance and Efficiency Trade-Offs
Our FPGA-based implementation of the Snappy compression algorithm achieves a throughput of up to 1.6 GB/s using eight parallel compression kernels. This result demonstrates the effectiveness of our architecture in leveraging the fine-grained parallelism and pipelining capabilities of modern FPGAs. This is primarily due to the following:
Pipelined and parallel architecture: The 10-stage pipeline and parallel slot-matching units enable high-throughput processing with minimal latency.
Enhanced encoding format: By repurposing the 0b11 token for long matches (up to 1024 bytes), we improve the compression ratio compared to prior FPGA implementations.
Fingerprint-based storage optimization: Replacing raw data entries in the dictionary with compact fingerprints reduces redundant storage by 33% while maintaining high matching accuracy through a multi-step validation process (fingerprint comparison followed by backward matching).
Inclusive hash dictionary: Unlike traditional software Snappy implementations that only hash non-matching characters, our design incorporates both matching and non-matching data in the hash table. This strategy increases the likelihood of detecting matches, further elevating the compression ratio.
However, this performance comes with trade-offs:
FPGA resource utilization: While our design reduces URAM usage significantly, it requires slightly more LUT/FF resources.
Development complexity: RTL-based FPGA optimization demands specialized expertise, increasing initial development effort compared to software solutions.
6.2. Comparison with Alternative Approaches
Compared to other FPGA-accelerated compression algorithms (Zstd [21] and Snappy [30,40]), our design offers the following:
Higher compression ratio: Our design achieves a 6.1% improvement in compression ratio over prior Snappy FPGA implementations.
Better storage efficiency: The fingerprint technique reduces on-chip memory requirements, enabling deployment in resource-constrained edge devices.
Yet, Zstd may still be preferable for applications prioritizing maximum compression ratio over speed, while LZ4 could be more suitable for ultra-low-latency scenarios.
6.3. Portability Across FPGA Platforms
The proposed Snappy compression architecture is designed with portability in mind, ensuring that the core algorithmic improvements—such as fingerprint-based storage and enhanced encoding—remain effective across different FPGA platforms. However, the performance and resource utilization will vary depending on the target device’s memory architecture and available hardware resources.
The design leverages URAM for efficient hash table storage in high-performance FPGAs (e.g., AMD/Xilinx Alveo, Intel Stratix). For devices lacking URAM (e.g., Xilinx Zynq-7000), the hash table can be implemented using BRAM, albeit with reduced throughput due to lower memory bandwidth and higher access latency.
Prior work [40] demonstrated this trade-off when porting Xilinx’s Snappy design [30] to the Zynq-7035, where throughput decreased from 280 MB/s to 148 MB/s. A similar performance penalty is expected for our design when deployed on low-power FPGAs.
The pipelined structure and parallel slot-matching units remain effective across platforms, but the degree of parallelism may need adjustment based on available logic resources (LUTs/FFs). For FPGAs with limited BRAM (e.g., low-cost Intel Cyclone), the hash table size or slot count may require reduction, slightly impacting compression ratio but maintaining functionality.
The fingerprint optimization and enhanced encoding format (supporting long matches) are architecture-independent, ensuring consistent compression ratio improvements regardless of the FPGA platform.