# Efficient Inverted Index Compression Algorithm Characterized by Faster Decompression Compared with the Golomb-Rice Algorithm


## Abstract


## 1. Introduction

- Methods for compressing many lists simultaneously, which mainly adapt and increase the potential of the methods listed above.

## 2. Materials and Methods

**Theorem 1.** If $n$ is a natural number, then:

**Theorem 2.**

**Proof.**

Algorithm 1: Golomb-Rice compression of a binary sequence.

Algorithm 2: Golomb-Rice decompression (zero series length encoding).
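As a rough illustration of the idea behind Algorithms 1 and 2, the sketch below applies the textbook Rice code (a Golomb code with a power-of-two divisor) to the lengths of zero series in a binary sequence. The parameter name `r`, the bit convention (quotient written as ones terminated by a zero, followed by an `r`-bit remainder), the helper names, and the decision to ignore zeros after the last one are all assumptions of this sketch and are not taken from the algorithms named above.

```python
from typing import List


def zero_runs(bits: str) -> List[int]:
    """Lengths of the zero series that precede each '1' in a binary string."""
    runs, count = [], 0
    for b in bits:
        if b == "0":
            count += 1
        else:
            runs.append(count)
            count = 0
    return runs  # assumption: zeros after the last '1' are not coded here


def rice_encode(m: int, r: int) -> str:
    """Rice codeword for m >= 0 with (hypothetical) parameter r >= 0."""
    q, rem = m >> r, m & ((1 << r) - 1)
    tail = format(rem, "b").zfill(r) if r else ""
    return "1" * q + "0" + tail  # unary quotient, terminator, r-bit remainder


def rice_decode(code: str, r: int) -> List[int]:
    """Decode a concatenation of Rice codewords back into run lengths."""
    out, i = [], 0
    while i < len(code):
        q = 0
        while code[i] == "1":  # count the unary part
            q += 1
            i += 1
        i += 1  # skip the terminating '0'
        rem = int(code[i:i + r], 2) if r else 0
        i += r
        out.append((q << r) | rem)
    return out


runs = zero_runs("0001001")
stream = "".join(rice_encode(m, 1) for m in runs)
```

Decoding `stream` with the same `r` returns the original run lengths, from which the binary sequence can be rebuilt by writing each run of zeros followed by a one.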

- distance of 0 is coded as $c\left(0\right)={\left(0\right)}_{2}=00$,
- distance of 1 is coded as $c\left(1\right)={\left(1\right)}_{2}=01$,
- distance of 2 is coded as $c\left(2\right)={\left(2\right)}_{2}=10$,
- distance of $m>2$ is coded as $l=\lfloor \frac{m}{3}\rfloor $ symbols ${\left(3\right)}_{2}=11$ followed by one symbol coding the number $m-3l$.
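The coding rule in the list above can be sketched for the 2-bit case as follows. This is a minimal illustration under stated assumptions, not the listing of Algorithms 3 and 4: the function names, the constant `ESCAPE`, and the representation of symbols as integers 0..3 are hypothetical choices made here.

```python
from typing import Iterable, Iterator, List

ESCAPE = 3  # the 2-bit symbol (3)_2 = 11 means "add 3 and keep reading"


def encode_distance(m: int) -> List[int]:
    """2-bit symbols (ints 0..3) coding one distance m >= 0."""
    if m < 0:
        raise ValueError("distance must be non-negative")
    l, rest = divmod(m, ESCAPE)  # l = floor(m / 3), rest = m - 3l
    return [ESCAPE] * l + [rest]


def decode_distances(symbols: Iterable[int]) -> Iterator[int]:
    """Inverse of encode_distance applied to a stream of symbols."""
    m = 0
    for s in symbols:
        if s == ESCAPE:
            m += ESCAPE  # escape symbol: accumulate 3 and continue
        else:
            yield m + s  # terminating symbol closes the codeword
            m = 0


def to_bits(symbols: Iterable[int]) -> str:
    """Render symbols as a bit string, two bits per symbol."""
    return "".join(format(s, "02b") for s in symbols)


codeword = to_bits(encode_distance(7))  # l = 2, remainder 1
```

For distances 0, 1 and 2 the quotient $l$ is zero, so the same function reproduces the first three rules of the list. Because every codeword is a whole number of fixed-width symbols, the decoder only compares each symbol with the escape value and adds.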

Algorithm 3: AC-SBS compression of a binary sequence.

Algorithm 4: AC-SBS decompression (zero series length encoding).

## 3. Results

## 4. Discussion



**Figure 1.** Graph of entropy and its upper and lower bounds as a function of the number of ones $k$ for a sequence consisting of ${10}^{4}$ elements, for $k\in [0,{10}^{4}]$.

**Figure 2.** Graph of entropy and its upper and lower bounds as a function of the number of ones $k$ for a sequence consisting of ${10}^{4}$ elements, for $k\in [0,{10}^{3}]$.

**Figure 3.** Dependence of the optimal codeword length in the algorithm for compression of sparse binary sequences (AC-SBS) on the value of $\log_{2}(k/n)$.

**Figure 10.** Ratio of sequence decompression times for the Golomb-Rice and AC-SBS methods on ARM and x86 architectures.

**Figure 12.** Correlation between the x86 decompression rate ratio and the ratio of the number of Golomb-Rice codewords to the number of AC-SBS codewords.

**Table 1.** Relative sequence sizes for Golomb-Rice and AC-SBS compression methods compared to entropy.

| $k/n$ | 0.0005 | 0.001 | 0.002 | 0.005 | 0.01 | 0.02 | 0.05 |
|---|---|---|---|---|---|---|---|
| ZLIB/ENT | 5.051 | 3.676 | 3.000 | 2.415 | 1.969 | 1.691 | 1.458 |
| ACSBS/ENT | 1.254 | 1.180 | 1.132 | 1.095 | 1.085 | 1.077 | 1.092 |
| RICE/ENT | 1.237 | 1.117 | 1.068 | 1.029 | 1.017 | 1.011 | 1.014 |
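The two entropy-relative rows of Table 1 can be combined to compare AC-SBS directly with Golomb-Rice. The short sketch below divides the ACSBS/ENT entry by the RICE/ENT entry in each column. It assumes that both entries in a column are measured against the same entropy value, so that the entropy cancels; the variable and function names are illustrative only.

```python
# Values copied from Table 1 (one entry per k/n column).
densities = [0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05]
acsbs_ent = [1.254, 1.180, 1.132, 1.095, 1.085, 1.077, 1.092]
rice_ent = [1.237, 1.117, 1.068, 1.029, 1.017, 1.011, 1.014]


def size_ratio(acsbs, rice):
    """Column-wise AC-SBS size relative to Golomb-Rice size."""
    return [a / r for a, r in zip(acsbs, rice)]


ratios = size_ratio(acsbs_ent, rice_ent)
for d, q in zip(densities, ratios):
    print(f"k/n = {d:<6}  AC-SBS / Golomb-Rice size = {q:.3f}")
```

For every column of the table the quotient stays below 1.08, that is, under this assumption the AC-SBS output is at most about 8% larger than the Golomb-Rice output.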

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Chmielowiec, A.; Litwin, P.
Efficient Inverted Index Compression Algorithm Characterized by Faster Decompression Compared with the Golomb-Rice Algorithm. *Entropy* **2021**, *23*, 296.
https://doi.org/10.3390/e23030296
