# Stream-Based Lossless Data Compression Applying Adaptive Entropy Coding for Hardware-Based Implementation


## Abstract


## 1. Introduction

- We found that the number of occupied entries in a look-up table used for stream-based data compression strongly correlates with the entropy of the streaming data. Using this characteristic, we have developed a novel algorithm called ASE coding. It compresses a data stream by applying the instantaneous entropy calculated from the number of occupied entries in its look-up table. We have also found that this algorithm is suitable for hardware implementation.
- We have developed a unique table management scheme for ASE coding that effectively keeps the active entries in the lower part of the look-up table. We also developed an optimization technique, called entropy culling, which dynamically reduces the number of occupied entries in the table.
- In order to reduce hardware resources and speed up the implementation, we introduced the near neighbor entry exchange, which limits entry exchange to nearby positions in the look-up table. It contributes to a compact hardware implementation by reducing the size of the multiplexer that selects an entry.
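As a concrete illustration of the first point, the instantaneous entropy used by ASE coding can be sketched in a few lines (a minimal Python sketch; the function name and the handling of k ≤ 1 are our assumptions, not taken from the paper):

```python
import math

# The code width m adapts to the number of occupied look-up table
# entries k, so a compressed symbol costs m + 1 bits: the m-bit table
# index plus the one-bit compression mark (CMark).
def entropy_calc(k: int) -> int:
    # Treating k <= 1 as a one-bit index is our assumption for the edge case.
    return math.ceil(math.log2(k)) if k > 1 else 1
```

With five occupied entries, the index is coded in ceil(log2 5) = 3 bits; as entropy culling removes entries, the code shrinks accordingly.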

## 2. Backgrounds and Definitions

## 3. ASE Coding

#### 3.1. Organization of Compressor/Decompressor

#### 3.2. Compression and Decompression Mechanisms

Algorithm 1 Compression process of ASE coding.

```
function Initialize()
    k = 0
end function

function ASECompress(s)
    I = T^-1(s)
    if I == -1 then
        CMark = 0
        comp_symbol = (s << 1) | CMark
        return (comp_symbol)_N
    else
        CMark = 1
        m = EntropyCalc()
        comp_symbol = ((I & ((1 << m) - 1)) << 1) | CMark
        return (comp_symbol)_(m+1)
    end if
end function

function T^-1(s)
    for I := 0 ... k do
        if T[I] == s then
            ArrangeTable(s)
            return I
        end if
    end for
    RegisterToTable(s)
    return -1
end function

function EntropyCalc()
    m = ceil(log2(k))
    return m
end function
```
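Algorithm 1 can be modeled in software as follows. This is a minimal Python sketch, not the hardware design: the bit serializer is omitted, so each call returns a (value, width) pair, and the width of an uncompressed output and the k ≤ 1 edge case are our assumptions.

```python
import math

class ASECompressor:
    """Software model of the ASE compression process (Algorithm 1)."""

    def __init__(self, symbol_bits=8, num_entries=256):
        self.N = symbol_bits   # width of an uncompressed symbol
        self.E = num_entries   # look-up table capacity
        self.table = []        # T: index 0 is the lowest entry; k = len(table)

    def _entropy_calc(self):
        k = len(self.table)
        return math.ceil(math.log2(k)) if k > 1 else 1

    def compress(self, s):
        """Return (comp_symbol, bit_width) for one input symbol."""
        if s in self.table:                          # T^-1(s) hit
            i = self.table.index(s)
            self.table.insert(0, self.table.pop(i))  # ArrangeTable: move to front
            m = self._entropy_calc()
            return ((i & ((1 << m) - 1)) << 1) | 1, m + 1   # CMark = 1
        # Miss: RegisterToTable and emit the raw symbol with CMark = 0.
        self.table.insert(0, s)
        if len(self.table) > self.E:
            self.table.pop()
        return s << 1, self.N + 1
```

A repeated symbol costs only m + 1 bits instead of N + 1, which is where the compression comes from.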

Algorithm 2 Decompression process of ASE coding.

```
function Initialize()
    k = 0
end function

function ASEDecompress(S)
    CMark = S & 1
    if CMark == 1 then
        m = EntropyCalc()
        I = (S >> 1) & ((1 << m) - 1)
        s = T[I]
        ArrangeTable(s)
    else
        s = S & ((1 << N) - 1)
        RegisterToTable(s)
    end if
    return s
end function

function EntropyCalc()
    m = ceil(log2(k))
    return m
end function
```
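Algorithm 2 can likewise be modeled in software (a Python sketch under our assumptions: codes arrive as already-deserialized integers with the CMark in the LSB and the raw symbol in the bits above it, and k ≤ 1 is treated as a one-bit index):

```python
import math

class ASEDecompressor:
    """Software model of the ASE decompression process (Algorithm 2)."""

    def __init__(self, symbol_bits=8, num_entries=256):
        self.N = symbol_bits   # width of an uncompressed symbol
        self.E = num_entries   # look-up table capacity
        self.table = []        # T: rebuilt identically to the compressor side

    def _entropy_calc(self):
        k = len(self.table)
        return math.ceil(math.log2(k)) if k > 1 else 1

    def decompress(self, code):
        """Recover one original symbol from a deserialized code word."""
        if code & 1:                                 # CMark set: compressed
            m = self._entropy_calc()
            i = (code >> 1) & ((1 << m) - 1)
            s = self.table[i]
            self.table.insert(0, self.table.pop(i))  # ArrangeTable
        else:                                        # CMark reset: raw symbol
            s = (code >> 1) & ((1 << self.N) - 1)
            self.table.insert(0, s)                  # RegisterToTable
            if len(self.table) > self.E:
                self.table.pop()
        return s
```

Because both sides update the table with the same rules, the decompressor's look-up table always mirrors the compressor's, without any table data being transmitted.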

#### 3.3. Look-Up Table Operation

Algorithm 3 Look-up table management functions of ASE coding. | |

functionRegisterToTable(s)for i:= k …0 doif (i + 1) < E then T[i + 1] = T[i] end ifend for T[0] = s if (k + 1) < E then k = k + 1 end ifend function | functionInitialize() $culling\_count$ = NUM_CULLINGend functionfunctionEntropyCulling()if $culling\_count$ > 0 then $culling\_count$ = $culling\_count$− 1 else k = k− 1 $culling\_count$ = NUM_CULLINGend ifend function |

functionArrangeTable(s)for i:= 0 …k doif T[i] == s thenbreakend ifend for ${s}^{\prime}$ = s for j:= 0 …i− 1 do T[j + 1] = T[j] end for T[0] = ${s}^{\prime}$ EntropyCulling() end function | functionArrangeTableNNEE(s)for i:= 0 …k doif T[i] == s thenbreakend ifend for ${s}^{\prime}$ = s ${i}^{\prime}$ = i−d if ${i}^{\prime}$ < 0 then ${i}^{\prime}$ = 0 end iffor j:= ${i}^{\prime}$ …i− 1 do T[j + 1] = T[j] end for T[${i}^{\prime}$] = ${s}^{\prime}$ EntropyCulling() end function |
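The difference between the two arrangement policies fits in a few lines of Python (a list-based sketch; the function names mirror the pseudocode, and `d` is the exchange distance):

```python
# Plain ArrangeTable moves a hit entry all the way to index 0
# (move-to-front); the near neighbor entry exchange (NNEE) variant moves
# it at most d positions, which shortens the shift network and the entry
# multiplexer in a hardware implementation.
def arrange_table(table, s):
    i = table.index(s)
    table.insert(0, table.pop(i))      # full move-to-front

def arrange_table_nnee(table, s, d):
    i = table.index(s)
    j = max(0, i - d)                  # clamp the destination to i - d
    table.insert(j, table.pop(i))      # shift only entries j .. i-1
```

With NNEE, a frequently hit entry still migrates toward index 0, but only by d positions per hit.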

The culling interval is controlled by a counter initialized to the constant `NUM_CULLING`. The EntropyCulling() function decrements the counter culling_count; when the counter reaches zero, it instead decrements the number of occupied entries k and resets the counter. This invalidates the highest occupied entry in the look-up table. The function is invoked whenever the symbol s hits in the look-up table, and is therefore called from the ArrangeTable() function.
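A minimal sketch of this counter mechanism (Python; the value of `NUM_CULLING` here is arbitrary, and the class wrapper is ours for illustration):

```python
# Every table hit decrements the counter; when it has reached zero, the
# highest occupied entry is invalidated by decrementing k and the counter
# is reset, shrinking the index width the next time EntropyCalc() runs.
NUM_CULLING = 8  # arbitrary example value for the culling interval

class CullingCounter:
    def __init__(self):
        self.culling_count = NUM_CULLING

    def on_hit(self, k):
        """Called on a table hit; returns the (possibly reduced) k."""
        if self.culling_count > 0:
            self.culling_count -= 1
        else:
            k -= 1
            self.culling_count = NUM_CULLING
        return k
```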

## 4. Evaluation

#### 4.1. Evaluation for Entropy Coding Ability

#### 4.2. Performance Effect of Near Neighbor Entry Exchange

#### 4.3. Evaluation for Compression Performance

#### 4.4. Evaluation for Hardware Implementation

## 5. Conclusions

## Author Contributions

## Acknowledgments

## Conflicts of Interest

## References


**Figure 2.** Compression mechanism in ASE coding. When an original symbol is compressed, as shown on the left side, (**a**) the first eight bits are picked up as the symbol, and (**b**) the symbol is matched in the look-up table. Because M is 3, the index is selected in three bits. To indicate that the symbol is compressed, (**c**) the Cmark bit is set. After the entropy calculation yields 2, (**d**) the selected table index is shrunk to two bits and treated as the compressed data. (**e**) The Cmark is merged into the MSB with the compressed data. When a symbol is not compressed, as shown on the right side, because the symbol does not match any entry in the look-up table, (**f**) the Cmark bit is reset. Subsequently, (**g**) the Cmark bit is merged into the MSB with the original symbol. Finally, (**h**) the serializer concatenates all data resulting from the operations above and aligns them to three bits. The aligned data stream is output from the compressor.

**Figure 3.** Decompression mechanism in ASE coding. The decompressor checks the first bit of the compressed data stream. As shown on the left side, (**a**) when the Cmark is set, (**b**) the entropy calculation yields two due to the occupation of the look-up table, and two bits are picked up from the data stream as the compressed data. Subsequently, (**c**) the data are extended to three bits. (**d**) The extended value is used as an index into the look-up table, and the data in the associated table entry are output as the original symbol. Otherwise, (**e**) when the Cmark is reset, the original symbol has been received: eight bits are picked up from the stream and output as the original symbol. Concurrently, the symbol is registered to the table.

**Figure 6.** Comparison of compression ratios of ASE coding while varying the distance of the near neighbor entry exchange.

**Figure 7.** Comparison of compression ratios among ASE coding (8/16-bit symbol input) and the conventional methods.

**Figure 8.** Comparison of compression ratios of ASE coding while varying the number of look-up table entries.

**Figure 9.** Comparison of compression ratios of ASE coding while varying the hit count of entropy culling.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yamagiwa, S.; Hayakawa, E.; Marumo, K.
Stream-Based Lossless Data Compression Applying Adaptive Entropy Coding for Hardware-Based Implementation. *Algorithms* **2020**, *13*, 159.
https://doi.org/10.3390/a13070159
