The Compression Optimality of Asymmetric Numeral Systems

Source coding has a rich and long history. However, a recent explosion of multimedia Internet applications (such as teleconferencing and video streaming) has renewed interest in fast compression that also squeezes out as much redundancy as possible. In 2009, Jarek Duda invented his asymmetric numeral system (ANS). Apart from having a beautiful mathematical structure, it is very efficient and offers compression with a very low coding redundancy. ANS works well for any symbol source statistics, and it has become a preferred compression algorithm in the IT industry. However, designing an ANS instance requires a random selection of its symbol spread function. Consequently, each ANS instance offers compression with a slightly different compression ratio. The paper investigates the compression optimality of ANS. It shows that ANS is optimal for any symbol source whose probability distribution is described by natural powers of 1/2. We use Markov chains to calculate ANS state probabilities. This allows us to precisely determine the ANS compression rate. We present two algorithms for finding ANS instances with a high compression ratio. The first explores state probability approximations in order to choose ANS instances with better compression ratios. The second algorithm is a probabilistic one. It finds ANS instances whose compression ratios can be made as close to the best ratio as required. This is done at the expense of the number θ of internal random "coin" tosses. The algorithm complexity is O(θL^3), where L is the number of ANS states. The complexity can be reduced to O(θL log L) if we use fast matrix inversion. If the algorithm is implemented on a quantum computer, its complexity becomes O(θ(log L)^3).


I. INTRODUCTION
The increasing popularity of working from home has dramatically intensified Internet traffic. To cope with heavy communication traffic, there are essentially two options. The first option involves an upgrade of the Internet network. It is, however, very expensive and not always available. The second option is much cheaper and employs compression of transmitted symbols. It also makes sense, as typical multimedia communication is highly redundant. Compression, also called entropy coding, has a long history that can be traced back to Shannon [14] and Huffman [13]. The well-known Huffman code is the first compression algorithm that works very well for symbol sources whose statistics follow natural powers of 1/2. Unfortunately, Internet traffic sources almost never have such simple symbol statistics.
Asymmetric numeral systems (ANS), introduced by Duda in [6], give a very versatile compression tool. They can compress symbols that occur with an arbitrary probability distribution (statistics). ANS is also very fast in both hardware and software. Currently, ANS enjoys significant uptake as a preferred compression algorithm in the IT industry. It has been adopted by Facebook, Apple, Google, Dropbox, Microsoft and Pixar, to name a few main IT companies (see https://en.wikipedia.org/wiki/Asymmetric_numeral_systems). ANS can be seen as a finite state machine (FSM) that starts from an initial state, absorbs symbols from an input source one by one and squeezes out binary encodings. The sequence of symbols is called a symbol frame. The sequence of binary encodings is called a binary frame. The heart of an ANS algorithm is its symbol spread function (or simply symbol spread for short), which assigns FSM states to symbols. The assignment is completely arbitrary as long as each symbol s is assigned a number L_s of states such that p_s ≈ L_s/L, where p_s is the probability of the symbol s and L = 2^R is the total number of states (R is a parameter that can be chosen to get an acceptable approximation of p_s by L_s/L).
Consequently, there are two closely related problems in designing ANS. The first, also called quantisation, requires the designer to approximate a symbol probability distribution P = {p_s | s ∈ S} by Q = {q_s = L_s/L | s ∈ S}, where S is the set of all symbols (also called the symbol source). It is expected that an ANS implemented for Q achieves compression as close as possible to the source entropy (measured by the average encoding length in bits per symbol). The second problem is the selection of a symbol spread for fixed ANS parameters. It turns out that some symbol spreads are better than others. Again, an obvious goal is to choose them in such a way that the average encoding length is as small as possible and close to the symbol source entropy.
Motivation. Designers of ANS have to strike a balance between efficiency and compression quality. The first choice that needs to be made is how closely symbol probabilities need to be approximated, i.e. p_s ≈ L_s/L. Clearly, the bigger the number of states (L = 2^R), the better the approximation and the better the compression. Unfortunately, a large L slows down compression. It turns out that the selection of a symbol spread also has an impact on the quality of compression. For some applications, this impact cannot be ignored. This is true when ANS is applied to build a pseudorandom bit generator. In this case, the required property of ANS is a minimal residual redundancy, i.e. the average length of binary frames should be as close as possible to the entropy of the symbols. Despite the growing popularity of ANS, there is no proof of its optimality for arbitrary symbol statistics. The work does not aim to prove optimality but rather develops tools that allow us to compare the compression quality of different ANS instances. It means that we are able to adaptively modify ANS in such a way that every modification provides a compression gain (redundancy reduction). The following issues are the main drivers behind the work: • Investigation of symbol quantisation and its impact on compression quality.
• Understanding the impact of a chosen symbol spread on the ANS compression quality.
• Designing an algorithm that builds ANS instances that maximise compression quality and, equivalently, minimise the average encoding length.
Contributions. We claim the following: • Proof of optimality of some ANS instances.
• Introduction of Markov chains to calculate ANS compression rates. Note that Markov chains have been used in the work [4] to analyse ANS-based encryption. • Designing "good" ANS instances whose state probabilities follow the approximation log2(e)/x, where x ∈ I.
• A probabilistic algorithm that builds an ANS whose compression rate is close to, or equal to, the best possible. The algorithm uses a pseudorandom number generator (PRNG) as a random "coin". • An improvement of the Duda-Niemiec ANS cryptosystem that selects at random ANS instances with the best compression rates. The rest of the work is organised as follows. Section II describes ANS and its algorithms. Section III studies the case when ANS produces optimal compression. Section IV shows how Markov chains can be used to calculate ANS state equilibrium probabilities and, consequently, average lengths of ANS encodings. Section V presents an algorithm that produces ANS instances whose state probabilities follow the approximation log2(e)/x. Section VI describes an algorithm that permits us to obtain the best (or close to the best) compression rate. Section VII suggests an alternative to the Duda-Niemiec ANS encryption. The alternative, called cryptographic ANS, allows us to design a secret ANS instance whose compression rate is close to the best. Section VIII presents the results of our experiments and, finally, Section IX concludes our work.

II. ASYMMETRIC NUMERAL SYSTEMS
Here we do not describe the ideas behind the ANS design. Instead, we refer the reader to the original papers by Duda [6], [7] and the ANS Wikipedia page, whose URL is given in the previous section. An ANS algorithm suite includes initialisation, compression (symbol frame encoding) and decompression (binary frame decoding). Algorithm 1 shows the initialisation. It also serves as a reference for basic ANS notation.

Algorithm 1: ANS Initialisation
Input: a set of symbols S, their probability distribution p : S → [0, 1] and a parameter R ∈ N^+.
Output: instantiation of encoding and decoding functions: • C(s, x) and k_s(x); • D(x) and k(x).
Steps: proceed as follows:
• calculate the number of states L = 2^R;
• determine the set of states I = {L, ..., 2L − 1};
• compute integers L_s ≈ L·p_s, where p_s is the probability of s and s ∈ S;
• choose a symbol spread function s : I → S such that |{x ∈ I : s(x) = s}| = L_s;
• establish the coding function C(s, y) = x for y ∈ {L_s, ..., 2L_s − 1}, which assigns states x ∈ I according to the symbol spread function;
• compute k_s(x) = ⌊log2(x/L_s)⌋ for x ∈ I and s ∈ S; it gives the number of output bits per symbol;
• construct the decoding function D(x) = (s, y), which for x ∈ I assigns its unique symbol (given by the symbol spread function) and the integer y, which determines the number of bits that need to be read out from the bitstream.
A compression algorithm accepts a symbol frame s and outputs a binary frame b. In most cases, the probability distribution of frame symbols is unknown, so it has to be calculated. This is done by pushing symbols one by one onto a stack and counting the occurrences of each symbol. When the last symbol s_ℓ is processed, the frame statistics are known and compression can start. It means that frame encoding starts from the last symbol s_ℓ. Algorithm 2 describes the ANS compression steps.
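The initialisation steps above can be sketched in a few lines of Python. This is our illustrative helper, not the paper's code; the symbol spread passed in is an assumption chosen by the caller, and k_s(x) is computed with the integer identity ⌊log2(x/m)⌋ = ⌊log2(⌊x/m⌋)⌋.

```python
def init_ans(Ls, spread, R):
    """Sketch of Algorithm 1. `Ls` maps each symbol to its state count
    L_s; `spread` maps each state x in I = {L, ..., 2L-1} to a symbol.
    Returns the coding function C, decoding function D and k_s(x)."""
    L = 2 ** R
    I = range(L, 2 * L)
    # states assigned to each symbol, in increasing order
    states = {s: sorted(x for x in I if spread[x] == s) for s in Ls}

    def C(s, y):
        # coding function: y in {L_s, ..., 2L_s - 1} -> state x
        return states[s][y - Ls[s]]

    def D(x):
        # decoding function: state x -> (symbol, y); the inverse of C
        s = spread[x]
        return s, Ls[s] + states[s].index(x)

    def k(s, x):
        # k_s(x) = floor(log2(x / L_s)), via integer arithmetic
        return (x // Ls[s]).bit_length() - 1

    return C, D, k

Ls = {'a': 3, 'b': 5, 'c': 8}            # L_s for p = 3/16, 5/16, 8/16
spread = {x: 'a' if x in (18, 22, 25)
          else 'b' if x in (16, 19, 21, 26, 28) else 'c'
          for x in range(16, 32)}        # an assumed symbol spread
C, D, k = init_ans(Ls, spread, R=4)
assert D(C('b', 7)) == ('b', 7)          # D inverts C
assert k('a', 16) == 2                   # floor(log2(16/3)) = 2
```

The assertions check the two structural facts used later: C and D are mutual inverses, and k_s(x) matches ⌊log2(x/L_s)⌋.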

A. ANS Example
Given a symbol source S = {s_1, s_2, s_3}, where p_1 = 3/16, p_2 = 5/16, p_3 = 8/16, and the parameter R = 4. The number of states is L = 2^R = 16 and the state set equals I = {16, 17, ..., 31}. A symbol spread s : I → S is assumed as in the accompanying table. The example is used throughout the work to illustrate our considerations.

III. OPTIMAL ANS
Given an ANS instance designed for a symbol source with probability distribution p = {p_s ; s ∈ S} and the parameter R. The set of all ANS states is I = {2^R, ..., 2^(R+1) − 1}. ANS is optimal if the average length of a binary encoding equals the symbol entropy, i.e. (1/ℓ)·Σ_{i=1}^{ℓ} |b_{s_i}| = H(S) = Σ_{s∈S} p_s·log2(p_s^{-1}), where the symbol frame s = (s_1, s_2, ..., s_ℓ) consists of ℓ symbols and b_{s_i} is the binary encoding of s_i. The same symbols of the frame can be grouped, so we get (1/ℓ)·Σ_{s∈S} Σ_{s_i = s} |b_{s_i}| = Σ_{s∈S} p_s·κ(s), where κ(s) denotes the average encoding length of s. Note that k_s = ⌊log2(x/L_s)⌋, where x ∈ {2^R, ..., 2^(R+1) − 1}, so the length k_s of an encoding depends on the state x. The shortest encoding occurs when x = 2^R and the longest when x = 2^(R+1) − 1. In particular, if L_s = 2^(R−i) (equivalently p_s = 2^(−i)), then ⌊log2(x/L_s)⌋ points to a single encoding length k_s = i for every x ∈ I. The above leads us to the following conclusion.
Lemma 2. Given an ANS described by Algorithm 2. Then a symbol s is encoded into a binary string of length k_s, where ⌊log2(2^R/L_s)⌋ ≤ k_s ≤ ⌊log2((2^(R+1) − 1)/L_s)⌋. This includes an interesting case when 2^(−1) < p_s < 2^0 with k_s ∈ {0, 1}; some symbols are encoded into void bits ∅.
The above lemmas lead us to the following conclusions.
• ANS provides optimal encoding for a symbol whose probability is a natural power of 1/2.
• An optimal ANS exists for symbol sources all of whose symbol probabilities are natural powers of 1/2.
• ANS and the Huffman code share compression optimality. Unlike the Huffman code, however, ANS also compresses symbols with an arbitrary probability distribution well. Table I shows encodings for three symbols. One of them occurs with probability p_{s1} = 1/2. Note that ANS assigns a single-bit encoding to s_1 in all ANS states. This illustrates the optimal encoding of s_1.
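The single-length behaviour for dyadic probabilities is easy to verify numerically. The sketch below (ours, using the integer identity ⌊log2(x/m)⌋ = ⌊log2(⌊x/m⌋)⌋) confirms that a symbol with p_s = 2^(−i) and L_s = L/2^i is encoded into exactly i bits from every state, matching its entropy contribution log2(1/p_s) = i:

```python
# For p_s = 2^(-i) and L_s = L / 2^i, the encoding length
# k_s(x) = floor(log2(x / L_s)) is the same for every state
# x in I = {L, ..., 2L-1}: exactly i bits.
L = 16
for i in (1, 2, 3):                      # p_s = 1/2, 1/4, 1/8
    Ls = L >> i                          # L_s = L / 2^i
    ks = {(x // Ls).bit_length() - 1 for x in range(L, 2 * L)}
    assert ks == {i}                     # a single encoding length
```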

IV. STATE PROBABILITIES AND MARKOV CHAINS
ANS can be looked at as an FSM whose internal state x ∈ I changes during compression. The behaviour of ANS states is probabilistic and can be characterised by a state probability distribution {p_x ; x ∈ I}. For a given symbol s ∈ S, the average encoding length of s is κ(s) = Σ_{x∈I} k_s(x)·p_x, where k_s(x) is the length of the encoding assigned to s when ANS is in the state x (see Algorithm 2). The average length of ANS encodings is κ = Σ_{s∈S} p_s·κ(s). When we deal with an optimal ANS, then κ = H(S), which also means that κ(s) = H(s) = log2(p_s^{-1}) for all s ∈ S. Typically, compression quality is characterised by the ratio between the length of the symbol frame and the length of the binary frame. A better measure from our point of view is the residual redundancy per symbol, defined as ∆H = κ − H(S). The measure has the following advantages: • easy identification of an optimal ANS instance, for which ∆H = 0; • quick comparison of two ANS instances: a better ANS has a smaller ∆H; • fast calculation of the length of the redundant part of a binary frame, which is ℓ·∆H bits, where ℓ is the number of symbols in the input frame. To determine ∆H for an ANS instance, it is necessary to calculate the probability distribution {p_x ; x ∈ I}. It depends on a (random) selection of the symbol spread. Fortunately, ANS state transitions can be modelled as a Markov chain [2], [10]. It is reasonable to expect that the probabilities of ANS states of their Markov chain attain an equilibrium after processing a sufficiently long sequence of symbols. Note that the ANS Markov chain is in equilibrium when state probabilities do not change after processing single symbols. Algorithm 4 shows the steps for constructing a system of linear equations whose solution gives the equilibrium distribution.
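Once the equilibrium probabilities are known, κ and ∆H are direct sums. A minimal sketch (helper names are ours; {p_x}, {p_s} and k_s(x) are assumed to be supplied by the caller):

```python
import math

def redundancy(P, ps, k):
    """Residual redundancy ΔH = κ - H(S).

    P maps each state x to its equilibrium probability p_x, ps maps
    each symbol s to p_s, and k(s, x) returns the encoding length
    k_s(x)."""
    kappa = sum(ps[s] * sum(k(s, x) * px for x, px in P.items())
                for s in ps)
    H = -sum(p * math.log2(p) for p in ps.values())
    return kappa - H

# optimal toy case: dyadic probabilities, constant encoding lengths
P = {x: 0.25 for x in range(4, 8)}
ps = {'a': 0.5, 'b': 0.25, 'c': 0.25}
k = lambda s, x: {'a': 1, 'b': 2, 'c': 2}[s]
assert abs(redundancy(P, ps, k)) < 1e-12   # ΔH = 0, i.e. optimal
```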

Algorithm 4: Equilibrium of ANS States
Input: ANS encoding table E(s, x) and symbol probability distribution {p_s ; s ∈ S}. Output: probability distribution P_eq = {p_x ; x ∈ I} of the ANS Markov chain equilibrium.

Steps:
• initialise the 2^R × 2^R matrix M to all zeros, except the main diagonal, where M[i, i] = −1 for i = 0, ..., 2^R − 1, and the last row, where M[2^R − 1, j] = 1 for all columns j;
• for every symbol s ∈ S and state x ∈ I with transition E(s, x) = x′, add the probability p_s to the entry of M that couples the equilibrium equation of x′ with p_x (the last row is left as the normalisation Σ_x p_x = 1);
• create a vector B with 2^R entries, all zero except the last entry, which equals 1;
• solve M · X = B; the solution X gives the equilibrium probability distribution P_eq = {p_x ; x ∈ I}.
The encoding table E(s, x) = x′ shows the transition of a state x to the next state x′ while encoding/compressing a symbol s.
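As a cross-check of Algorithm 4, the same equilibrium can be reached without solving M·X = B explicitly: for an ergodic chain, iterating the transition map from the uniform distribution converges to P_eq. The sketch below is our construction; the tiny encoding table is written out by hand for L = 4 and p = {1/2, 1/4, 1/4}, an optimal instance whose equilibrium turns out to be uniform (in line with Section III):

```python
def equilibrium(E, ps, I, iters=5000):
    """Equilibrium state probabilities of the ANS Markov chain,
    obtained by iterating the chain: E(s, x) -> x' is the encoding
    table and ps the symbol probability distribution."""
    P = {x: 1.0 / len(I) for x in I}
    for _ in range(iters):
        Q = {x: 0.0 for x in I}
        for x in I:
            for s, p in ps.items():
                Q[E(s, x)] += p * P[x]   # mass flowing x -> E(s, x)
        P = Q
    return P

# L = 4, spread a -> {4, 5}, b -> {6}, c -> {7}
I = [4, 5, 6, 7]
ps = {'a': 0.5, 'b': 0.25, 'c': 0.25}
table = {'a': {4: 4, 5: 4, 6: 5, 7: 5},
         'b': {x: 6 for x in I},
         'c': {x: 7 for x in I}}
P = equilibrium(lambda s, x: table[s][x], ps, I)
assert all(abs(p - 0.25) < 1e-9 for p in P.values())  # uniform
```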
The following remarks summarise the above discussion: • ANS offers compression very close to optimal when the parameter R is big enough that L_s = 2^R·p_s for all s ∈ S. In general, however, for arbitrary symbol statistics, ANS tends to leave a residual redundancy ∆H ≠ 0. • In some applications where ANS compression is used heavily (for instance, video/teleconference streaming), it makes sense to optimise ANS so ∆H is as small as possible. Note that the smaller ∆H, the more random the binary frames. This may increase the security level of systems that use compression (for instance, joint compression and encryption). • It is possible to find the best ANS instance by exhaustive search through all distinct symbol spreads. For L = 2^R states, we need to search through L!/∏_{s∈S} L_s! ANS instances. This is doable for a relatively small number of states. For the ANS in our example, the search space includes 16!/(8!·5!·3!) = 720720 instances. For a bigger L (say, above 50), the exhaustive search is intractable.

V. TUNING ANS SYMBOL SPREADS
For compression algorithms such as Zstandard and LZFSE, a typical symbol source contains n = 256 elements. To get a meaningful approximation of the source statistics, we need a bigger number L of states. A rough evaluation of the compression quality penalty given in [8] tells us that choosing L = 2n incurs an entropy loss of ∆H = 0.01 bits/symbol; if L = 4n, then ∆H = 0.003, and if L = 8n, then ∆H = 0.0006. This confirms the obvious intuition that the bigger the number of states L, the better the approximation and, consequently, the compression. However, there is a "sweet spot" for L, beyond which a further increase slows the compression algorithm without a noticeable compression rate gain.
Apart from the approximation accuracy of symbol probabilities, p_s ≈ L_s/L, the symbol spread s : I → S has an impact on the compression rate. Intuitively, one would expect that the symbol spread is chosen uniformly at random from all L!/∏_{s∈S} L_s! possibilities; this number grows exponentially with L. An important observation is that the probability distribution of ANS states during compression is not uniform. In fact, a state x ∈ I occurs with a probability that can be approximated as p_x ≈ log2(e)/x. Note that this is beneficial for the compression rate, as smaller states (with shorter encodings) are preferred over larger ones (with longer encodings). This natural probability bias of states has an impact on the compression rate, making some symbol spreads better than others. Let us take a closer look at how to choose a symbol spread so that it maintains the natural bias.
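A quick sanity check (ours) that the bias p_x ≈ log2(e)/x is indeed close to a probability distribution over I: the harmonic sum over I = {L, ..., 2L−1} is close to ln 2, and log2(e)·ln 2 = 1.

```python
import math

L = 1024
# sum of the approximated state probabilities over I = {L, ..., 2L-1}
total = sum(math.log2(math.e) / x for x in range(L, 2 * L))
assert abs(total - 1.0) < 1e-3   # normalised up to O(1/L)
```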
Recall that for a given symbol s ∈ S and state x ∈ I, the encoding algorithm (see Algorithm 2) calculates k = k_s = ⌊log2(x/L_s)⌋; extracts the k least significant bits of the state as the encoding of s; and finds the next state x′ = C(s, ⌊x/2^k⌋), where C(s, y) is equivalent to a symbol spread s and x′ ∈ L_s. Consider the properties of the coding function C(s, y) that are used in our discussion.
Fact 1. Given a symbol s and the coding function C(s, ⌊x/2^k⌋). Then the collection of states I is divided into L_s state intervals I_i, where • each interval I_i consists of all consecutive states that share the same value ⌊x/2^k⌋ and therefore the same next state C(s, I_i). Corollary 1. Given an ANS instance defined by Algorithms 1, 2 and 3. Then for each symbol s ∈ S, the state probabilities have to satisfy the following relations: Σ_{x∈I_i} p(x) = p(C(s, I_i))/p_s for i = 1, ..., L_s. (1) Recall that state probabilities can be approximated by p(x) ≈ log2(e)/x. As Equation 1 requires summing up probabilities of consecutive states, we need the following fact.
Fact 2. Given an initial part of the harmonic series (1 + 1/2 + ... + 1/r), it can be approximated as Σ_{i=1}^{r} 1/i ≈ ln(r) + γ, where the constant γ ≈ 0.577. It is easy to get the approximation of the series (1/r + ... + 1/(r+α−1)) ≈ ln((r+α)/r), where α ∈ N^+. (2) Now we are ready to find a preferred state for our symbol spread. Let us take into account Equation 1. Using the facts established above and our assumed approximation p(x) ≈ log2(e)/x, the left-hand side of the equation becomes Σ_{x=r}^{r+α−1} p(x) ≈ log2(e)·ln((r+α)/r), where r is the first state in I_i, (r+α−1) is the last, and α = 2^k. As we have assumed that p(C(s, I_i)) ≈ log2(e)·(1/y), where y points to the preferred state that needs to be included into L_s, we get ln((r+α)/r) = 1/(p_s·y). This brings us to the following conclusion.
Corollary 2. Given an ANS as defined above, a preferred state y for I_i determined for a symbol s is y = ⌊1/(p_s·ln((r+α)/r))⌉, (3) where I_i = [r, ..., r+α−1] and ⌊y⌉ is the integer closest to y; the state y is added to L_s. We can compute the average encoding length from the equilibrium probabilities, which for the above symbol spread is κ = 3619/2448 ≈ 1.4783. It turns out that this is the best compression rate, as argued in the next section. In contrast, the average lengths of the symbol spreads in Example 1 are 1.4790 and 1.4789.
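The preferred positions of Equation 3 are cheap to compute. The sketch below is ours and follows the reconstruction y = 1/(p_s·ln((r+α)/r)); for the symbol s_3 of the toy source (p = 1/2, L = 16), every interval has α = 2 and the formula lands on the odd states of I:

```python
import math

def preferred_states(intervals, p_s):
    """Preferred states for a symbol s, one per interval
    I_i = [r, r + alpha - 1], rounded to the nearest integer
    (a reconstruction of Equation 3)."""
    return [round(1 / (p_s * math.log((r + a) / r)))
            for r, a in intervals]

# symbol s3 of the toy source: p = 1/2, L = 16, alpha = 2 everywhere
ys = preferred_states([(r, 2) for r in range(16, 32, 2)], 0.5)
assert ys == [17, 19, 21, 23, 25, 27, 29, 31]
```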
Algorithm 5 may produce many symbol spreads with slightly different compression rates. Preferred positions need to be rounded to the closest integers. Besides, there may be collisions in the sets L_s (s ∈ S) that have to be removed. Intuitively, we are searching for a symbol spread that is the best match for the preferred positions. In other words, we need to introduce an appropriate distance to measure the match. Definition 1. Given an ANS with a symbol spread {L_s ; s ∈ S} and a collection of preferred positions {L*_s ; s ∈ S} calculated according to Equation 3. Then the distance between them is computed as d = Σ_{x∈I} |x − y|, where y is the preferred position that is taken by the state x.
Finding the best match for the calculated preferred positions is equivalent to identifying a symbol spread that minimises the distance, where the symbol spread runs through all possibilities. Algorithm 6 illustrates a simple, heuristic procedure: • create a table with three rows and L = 2^R columns; • put all consecutive states (i.e. L, L + 1, ..., 2L − 1) in increasing order in the first row; • insert all numbers from the preferred positions in increasing order in the second row, together with their symbol labels in the third row; • read out all states from the first row that correspond to the appropriate symbol labels (in the third row). After calculation of the equilibrium probabilities, we get the average encoding length κ = 230755/156048 ≈ 1.4787. The following observations are relevant.
• The algorithm for tuning a symbol spread is very inexpensive and can easily be applied to an ANS with a large number of states (bigger than 2^10). It gives a good compression rate. • Equation 3 gives a rational number that needs to be rounded up or down. Besides, the preferred states it points to are likely to collide. Algorithm 5 computes a symbol spread which follows the preferred positions. • Equation 3 indicates that preferred states are likely to be uniformly distributed within I.
• It seems that finding a symbol spread whose distance d to the preferred positions attains the minimum does not guarantee the best compression rate. However, it results in an ANS instance with a "good" compression rate. This could be a starting point for searching for an ANS with a better compression rate.
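The table-based matching of Algorithm 6 can be sketched as follows (our helper names; the preferred positions are assumed to come from Equation 3). All positions are pooled and sorted (rows two and three of the table) and aligned one-to-one with the sorted states of I (row one); each state takes the symbol label of the position it faces, which also resolves collisions:

```python
def match_spread(I, preferred):
    """Sketch of the table matching in Algorithm 6. `preferred` maps
    each symbol to its (possibly real-valued, possibly colliding)
    preferred positions."""
    # rows two and three: pooled positions with their symbol labels
    labelled = sorted((y, s) for s, ys in preferred.items() for y in ys)
    # row one: states in increasing order, paired with the labels
    return {x: s for x, (_, s) in zip(sorted(I), labelled)}

# toy run: two symbols whose preferred positions interleave
spread = match_spread([4, 5, 6, 7], {'a': [4.6, 6.9], 'b': [5.2, 7.4]})
assert spread == {4: 'a', 5: 'b', 6: 'a', 7: 'b'}
```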

A. Case Study
Consider a toy instance of ANS with three symbols that occur with probabilities {3/16, 5/16, 8/16} and 16 states (i.e. R = 4). We have implemented software in PARI that exhaustively searches through all 16!/(8!·5!·3!) = 720720 possible ANS symbol spreads. For each spread, we have calculated the equilibrium probabilities of the corresponding Markov chain. This allows us to compute the average length of a symbol encoding (in bits/symbol). The results are shown in Table III. Unfortunately, there is a small fraction of ANS instances whose equilibrium probabilities are impossible to establish, as the corresponding system of linear equations is linearly dependent (its rank is 15). Examples of symbol spreads with the shortest and the longest average encoding lengths are given in the tables. The probabilistic swap algorithm (Algorithm 7) is run until the average encoding length equals the minimum 3619/2448 (the optimum compression). We have run the algorithm 10^5 times. The results are presented in Table IV. The main efficiency measure is the total number of swaps of ANS state pairs required to achieve optimum compression. Note that each state swap forces the algorithm to redesign the ANS, to compute its state equilibrium probabilities and to evaluate the average encoding length. As the algorithm uses a PRNG, the number of state swaps varies. The algorithm works very well and successfully achieves optimal compression every time. In the table, we count "good swaps"; a good swap produces an ANS instance whose average encoding length gets smaller. This means that, in the worst case, we need only 19 good swaps to produce an optimal ANS. Optimality here is understood as the minimum residual redundancy.
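The swap search just described can be sketched as below. The helper names are ours: `avg_len` stands in for the expensive evaluation of κ via the Markov equilibrium, and the deterministic `Scripted` coin replaces the PRNG only so that the example is reproducible (in real use one would pass `random.Random`).

```python
def improve_spread(spread, theta, avg_len, rng):
    """Sketch of the probabilistic search (Algorithm 7, as described).

    Repeatedly proposes a swap of two states assigned to different
    symbols, keeps the swap when avg_len(spread) drops (a "good swap")
    and undoes it otherwise."""
    best = avg_len(spread)
    for _ in range(theta):
        x, y = rng.sample(list(spread), 2)
        if spread[x] == spread[y]:
            continue                     # same symbol: nothing changes
        spread[x], spread[y] = spread[y], spread[x]
        cand = avg_len(spread)
        if cand < best:
            best = cand                  # good swap: keep it
        else:
            spread[x], spread[y] = spread[y], spread[x]   # undo
    return spread, best

class Scripted:
    """Deterministic stand-in for random.Random, for this example."""
    def __init__(self, swaps):
        self.swaps = iter(swaps)
    def sample(self, population, k):
        return next(self.swaps)

# toy objective: number of mismatches against a known-best spread
target = {4: 'b', 5: 'a', 6: 'a', 7: 'b'}
mismatch = lambda sp: sum(sp[x] != target[x] for x in sp)
sp, best = improve_spread({4: 'a', 5: 'b', 6: 'a', 7: 'b'},
                          2, mismatch, Scripted([(6, 7), (4, 5)]))
assert (sp, best) == (target, 0)   # first swap undone, second kept
```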
The complexity of Algorithm 7 is O(θL^3), where θ is the number of iterations of the main loop of the algorithm. The most expensive part is the Gaussian elimination needed to find the equilibrium probabilities; it takes O(L^3) steps. We assume that θ swaps are needed to obtain the required redundancy with high probability. In other words, the algorithm becomes very expensive when the number of states L is bigger than 2^10. This is bad news. The good news is that the system of linear equations defining the equilibrium probabilities is sparse. Consider a simple geometric symbol probability distribution {1/2, 1/4, ..., 1/1024, 1/1024}. Assume that the ANS has L = 2^10 states. A simple calculation shows that the matrix M in the relation M·X = B for the Markov equilibrium has ≈ 99% zero entries. It means that we can speed up the search for an optimal ANS in the following ways: • use specialised Gaussian elimination algorithms that target sparse systems; some mathematical packages (such as MatLab) include such algorithms; • apply inversion of sparse matrices (such as the one from [5], whose complexity is O(L^2.21)). First we solve M·X = B by computing M^(-1), which gives us X. Now we swap two states that belong to different symbols and need to solve M̄·X̄ = B, where M̄ is the system that describes the equilibrium after the swap. In other words, the solution X̄ can be obtained from X by multiplying it by M̄^(-1)·M, i.e. X̄ = (M̄^(-1)·M)·X.
Further efficiency improvements can be achieved by taking a closer look at the swapping operation. For the sake of argument, consider a swap of two states and the matrices M and M̄ that describe the Markov chain probabilities before and after the swap, respectively. There are essentially two distinct cases. The swap is • simple, i.e. only two rows of M are affected by the swap. Matrix entries outside the main diagonal are swapped, but entries on the main diagonal need to be handled with care, as they do not change if their values are −1. For instance, take the ANS from Table I and swap two states as illustrated there. The matrix M̄ after the swap can be obtained as M̄ = M·∆, where ∆ is a sparse matrix that translates M into M̄. As argued above, the equilibrium solution is X̄ = ∆^(-1)·X, where ∆^(-1) = M̄^(-1)·M. We still need to calculate M̄^(-1); however, it is possible to recycle part of the computations done while computing M^(-1). This is possible as M and M̄ share the same entries (except for the two rows that correspond to the state swap); • complex, i.e. more than two rows need to be modified. This occurs when a swap causes a cascade of swaps needed to restore the increasing order in the respective sets L_s. For example, for the ANS from our running example, two extra swaps are needed to obtain a valid ANS instance. The calculation of M̄^(-1) can still be supported by the computations for M^(-1), but the matrices M and M̄ differ on more than two rows.

C. Optimisation with Quantisation
So far we have assumed that L_s = p_s·L for s ∈ S, or at least that L_s/L is a very close approximation of p_s. However, this is not always true. In practice, there are two issues that need to be dealt with.
• The first is the fact that p_s·L may have two "good" approximations when α_s < p_s·L < (α_s + 1), where α_s ∈ N^+. So we can choose L_s ∈ {α_s, α_s + 1}. This occurs more frequently when L is relatively small. • The second issue happens when there is a tail of symbols whose probabilities are small enough that p_s·L < 1. It means that those symbols have to be assigned single states by the symbol spread, i.e. L_s = 1. This is equivalent to an artificial increase of the symbol probabilities of the tail to 1/L. Note that Σ_{s∈S} p_s = 1, which is equivalent to Σ_{s∈S} L_s = L. Consequently, the numbers of states assigned to other symbols by the symbol spread have to be reduced.
The research question at hand (also called quantisation) is defined as follows. Given a symbol probability distribution P = {p_s | s ∈ S}, how do we design an ANS so that its compression is optimal (or close to it), where the ANS is built for a symbol probability distribution Q = {q_s | s ∈ S} that approximates P, with L_s = q_s·L ∈ N^+, q_s ≈ p_s and n = |S|?
Let us consider the following algorithms.
• Exhaustive search through all possible quantised ANS, where an ANS instance is determined by a selection of L_s ∈ {α_s, α_s + 1} for s ∈ S. For each selection, we run Algorithm 7. Finally, we choose the ANS instance with the best compression rate. Unfortunately, the complexity of this algorithm is exponential, or more precisely O(2^n·θL^3). A possible tail of t symbols reduces the complexity, as all t tail symbols are assigned a single state and the next t low-probability symbols are assigned α_s states. The higher counts L_s = α_s + 1 are ignored in order to compensate for the states taken by the tail symbols. Note also that some selections of L_s must be rejected if Σ_{s∈S} L_s ≠ L. • Best-fit quantised ANS. For all symbols with high probabilities, we choose L_s = α_s + 1. For symbols with low probabilities, we select L_s = α_s. This choice is balanced by the number of symbols in the tail, where L_s = 1. The intuition behind the algorithm is that selecting a higher L_s reduces the average length of encodings and vice versa. This gives us a unique (or close to unique) solution for L_s, s ∈ S. Now we can run Algorithm 7 to find the optimal (or close to optimal) ANS. The best-fit algorithm and Algorithm 7 share the same complexity.
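The best-fit rule can be sketched as follows. This is an assumed concretisation of the description above (the tie-breaking order is our choice): tail symbols with p_s·L < 1 get L_s = 1, the remaining symbols start from the floor α_s = ⌊p_s·L⌋, spare states go to the most probable symbols, and any over-allocation caused by the tail is trimmed from the least probable symbols.

```python
def best_fit_quantise(probs, R):
    """Best-fit quantisation sketch: returns L_s for each symbol such
    that sum(L_s) = L = 2^R and every symbol gets at least one state."""
    L = 2 ** R
    # floor quantisation; tail symbols with p_s * L < 1 get one state
    Ls = {s: max(1, int(p * L)) for s, p in probs.items()}
    order = sorted(probs, key=probs.get, reverse=True)
    i = 0
    while sum(Ls.values()) < L:      # spare states -> likely symbols
        Ls[order[i % len(order)]] += 1
        i += 1
    j = 0
    while sum(Ls.values()) > L:      # tail over-allocation: trim the
        s = order[-1 - (j % len(order))]   # least probable symbols
        if Ls[s] > 1:
            Ls[s] -= 1
        j += 1
    return Ls

# a source with one tail symbol (p * L < 1) at L = 16
assert best_fit_quantise({'a': 0.46, 'b': 0.3, 'c': 0.23, 'd': 0.01},
                         4) == {'a': 8, 'b': 4, 'c': 3, 'd': 1}
```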
To recap our discussion, we make the following remarks: • Finding an ANS with optimal compression is of prime concern to everybody who would like to either maximise communication bandwidth or remove as much redundancy as possible from binary frames. We model the asymptotic behaviour of ANS by Markov chains. The calculated equilibrium probabilities allow us to precisely determine the average length of binary encodings and, consequently, the ANS compression rate.
• The search for an optimal ANS can be done in two steps. (1) We run either Algorithm 5 or 6, which tunes the symbol spread using the approximation of state probabilities. The algorithm is very efficient; its complexity is O(L). Unfortunately, it does not guarantee that the calculated ANS instance is optimal. (2) Next we execute Algorithm 7, whose initial symbol spread has been calculated in the previous step. The algorithm is probabilistic and attains an optimal (or close to optimal) ANS with high probability. • Algorithm 7 can be sped up by using a specialised sparse-matrix inversion algorithm together with reusing computations from previous inversions. This allows us to find an optimal or close-to-optimal ANS for a number L of states in the range [2^10, 2^12]. This range is the most used in practice.

VII. CRYPTOGRAPHIC ANS
Duda and Niemiec [9] have proposed a randomised ANS, where the symbol spread is chosen at random. To make it practical, the authors suggest replacing a truly random source by a pseudorandom number generator (PRNG), which is controlled by a relatively short random seed/key. Two communicating parties can agree on a common secret key K. Both the sender and the receiver use it to select their symbol spread using the PRNG controlled by K. The sender can build the appropriate encoding table, while the receiver builds the matching decoding table. Consequently, the parties can use compression as encryption. Although such encryption does not provide a high degree of protection (especially against integrity and statistics attacks; see [3], [4]), it could be used effectively in low-security applications. A price to be paid is a complete lack of control over the compression rate of the ANS. This weakness can be mitigated by applying our Algorithm 7. Note that the symbol spread {L_s | s ∈ S} is public, but the particular instance chosen remains secret (see Section V). Unlike the Duda-Niemiec ANS, it achieves optimal (or close to optimal) compression. However, the effective security level (the length of the cryptographic key) is determined by the number of ANS instances produced by Algorithm 7. For instance, the ANS from Table III guarantees 15-bit security (log2(30240) ≈ 15). For an ANS with a large number of states, it is difficult to determine the precise security level. But this may be acceptable for low-security applications.

VIII. EXPERIMENTS
Algorithm 7 has been implemented in the Go language and executed on a MacBook Pro with an M1 chip. The algorithm has been slightly modified so it finds both the lower and upper bounds for ∆H. The lower bound points to an ANS which is close to optimal. In contrast, the upper bound shows an ANS whose residual redundancy ∆H is big (close to the worst case). Note that a random selection of spreads produces ANS instances whose ∆H falls somewhere between the bounds. The following results have been obtained for 10^5 iterations of the FOR loop. We see that, due to the probabilistic nature of Algorithm 7, even after a large number of iterations there is a non-zero chance of finding a spread with a lower ∆H.
In practice, there are restrictions imposed on the time needed for the execution of Algorithm 7. The following results illustrate how much time is needed between two consecutive good swaps that improve ∆H; the number of iterations of the FOR loop is 10. Note that matrix inversion constitutes the main computational overhead of Algorithm 7. The experiments presented above have applied standard Gaussian elimination (GE) for matrix inversion, whose complexity is O(L^3). Algorithm 7 can be sped up by (1) using a more efficient algorithm for sparse matrix inversion (SMI) and (2) recycling computations from previous matrix inversions. The table below gives the complexity of Algorithm 7 for different matrix inversion algorithms and for classical and quantum computers.

Classical Computer: GE: O(θL^3); SMI [5]: O(θL^2.21); SMI [12]: O(θL log L).
Quantum Computer: GE [11]: O(θ(log L)^3).
The experiments have confirmed that Algorithm 7 works well and is practical for L < 128. However, for a larger L, it gets slower and quickly becomes impractical. The good news is that there is significant room for improvement by applying fast sparse-matrix inversion algorithms together with recycling computations from previous matrix inversions. However, the optimisation of Algorithm 7 is beyond the scope of this work. Note that the algorithm becomes very fast when it uses quantum matrix inversion.

IX. CONCLUSIONS
The work addresses an important practical problem: the compression quality of the ANS algorithm. In the description of ANS, its symbol spread can be chosen at random. Each symbol spread has its own characteristic probability distribution of ANS states. Knowing the distribution, it is possible to compute the ANS compression rate or, alternatively, its residual redundancy ∆H.
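The computation of ∆H from the state distribution can be sketched as follows. The snippet assumes the standard tANS renormalisation rule (encoding s from state x emits k bits, where k is the unique integer with x >> k in [L_s, 2L_s)); the function name is ours:

```python
import math

def residual_redundancy(pi, probs, Ls, L):
    """Delta-H: average encoding length minus the source entropy (bits/symbol).

    pi    -- equilibrium probabilities of the states x = L, ..., 2L-1
    probs -- symbol probabilities p_s
    Ls    -- number of states assigned to each symbol (L_s = p_s * 2^R)
    """
    avg = 0.0
    for i, x in enumerate(range(L, 2 * L)):
        for p, ls in zip(probs, Ls):
            k = 0                          # bits emitted when encoding s from state x
            while x >> k >= 2 * ls:
                k += 1
            avg += pi[i] * p * k
    entropy = -sum(p * math.log2(p) for p in probs)
    return avg - entropy
```

For a dyadic source every state emits exactly log2(1/p_s) bits, so ∆H vanishes, matching the optimality result; for the toy source {3/16, 5/16, 8/16} the redundancy is strictly positive.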
We present two algorithms that allow a user to choose symbol spreads that minimise ∆H. Algorithm 5 determines an ANS instance (its symbol spread) whose state probabilities follow the natural ANS state bias. It is fast even for L > 2^12 but, unfortunately, it does not provide the minimal/optimal ∆H. Algorithm 7 provides a solution. It is able to find the minimal ∆H with a probability that depends on the number θ of random coin tosses.
We have conducted an experiment for L = 16 that shows the behaviour of the average length of ANS encodings. Further experiments have confirmed that matrix inversion creates a bottleneck in Algorithm 7 and makes it impractical for large L. An immediate remedy is the application of specialised algorithms for sparse matrix inversion together with recycling computations from previous matrix inversions. Development of a fast version of Algorithm 7 is left as part of our future research.
The main research challenge is, however, to construct ANS instances in such a way that their minimum residual redundancy is guaranteed by design. This means that we have to understand the interplay between symbol spreads and their equilibrium probabilities. This points to an interesting connection between ANS and random graphs [1].
Let us swap states 25 and 26. The rows of M before the swap are [not reproduced].

Lemma 1. Given a symbol source S whose probability distribution is p = {p_s ; s ∈ S} and an ANS for S. Then the ANS is optimal if and only if

    (1/ℓ) Σ_{s∈𝔰} |b_s| = p_s log_2 p_s^{-1}   for all s ∈ S,

where 𝔰 is a symbol frame in which every symbol s repeats ℓ · p_s times.

Let us consider how ANS encodes symbols. According to Algorithm 2, a symbol s is encoded into a binary string b_s [derivation not reproduced], which yields (1/ℓ) Σ_{s∈𝔰} |b_s| = p_s log_2 p_s^{-1}. So we have proven the lemma.
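For sources whose probabilities are natural powers of 1/2, the condition of Lemma 1 can be verified directly. A short worked check, under the assumption (as for Huffman codes on dyadic sources) that every occurrence of s emits the same number of bits:

```latex
% Dyadic source: p_s = 2^{-k_s}, so every occurrence of s emits
% |b_s| = k_s = \log_2 p_s^{-1} bits. A symbol frame \mathfrak{s} of
% length \ell contains \ell p_s occurrences of s, hence
\frac{1}{\ell}\sum_{s \in \mathfrak{s}} |b_s|
  \;=\; \frac{1}{\ell}\,(\ell p_s)\,k_s
  \;=\; p_s \log_2 p_s^{-1},
% which is exactly the optimality condition of Lemma 1.
```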
The equilibrium probabilities (p_16, ..., p_31) are [not reproduced]. After running Algorithm 4, we get the equilibrium probabilities (p_16, ..., p_31) as follows: [not reproduced].

Consider again the ANS from Section II with L = 16 states and three symbols S = {s_1, s_2, s_3} that occur with probabilities 3/16, 5/16 and 8/16, respectively. We follow Algorithm 6 and get the following table: [not reproduced]
• return s or, equivalently, L_s for s ∈ S;
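The step from probabilities to per-symbol state counts in this example is just L_s = p_s · 2^R (the precondition stated for Algorithm 7). A tiny sketch, with a helper name of our own choosing:

```python
from fractions import Fraction

def state_counts(probs, R):
    """L_s = p_s * 2^R for each symbol; the probabilities are assumed
    to be exact multiples of 2^-R, as Algorithm 7 requires."""
    Ls = [Fraction(p) * 2 ** R for p in probs]
    assert all(l.denominator == 1 for l in Ls), "p_s must be a multiple of 2^-R"
    return [int(l) for l in Ls]
```

For the toy source, `state_counts([3/16, 5/16, 8/16], 4)` gives [3, 5, 8], i.e. the sizes of the three state sets L_{s_1}, L_{s_2}, L_{s_3}.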

TABLE III. DISTRIBUTION OF AVERAGE LENGTHS OF ANS ENCODINGS FOR THE TOY ANS. NOTE THAT H(S) ≈ 1.477. [Table data not reproduced.]

The idea is to start from a random symbol spread. Then we continue swapping pairs of ANS states. After each swap, we calculate the residual redundancy of the new ANS instance. If the redundancy is smaller than for the old instance, we keep the change. Otherwise, we select a new pair of states for a swap. Details are given in Algorithm 7.

Algorithm 7: Search for Optimal ANS
Input: symbol probability distribution {p_s ; s ∈ S} and parameter R such that L_s = p_s 2^R for all s ∈ S.
Output: symbol spread s, or equivalently the encoding table E(s, x), of the ANS with the smallest residual redundancy.
Steps:
• initialise the symbol spread s using a PRNG;
• determine the corresponding E(s, x) and calculate its redundancy ∆H;
• assume a required minimum redundancy threshold T;
• while ∆H ≥ T do
      for x = 2^R, ..., 2^(R+1) − 1 do
          y ← PRNG(I), i.e. a random selection of a state from I;
          if s(x) ≠ s(y) then
              • calculate the equilibrium probabilities for the ANS with swapped states;
              • determine the average encoding length and ∆H_temp;
              if ∆H_temp < ∆H then
                  • update s by swapping x and y;
• return s and ∆H;

We have implemented the algorithm in the PARI/GP environment. Our experiment is run for a 16-state ANS with three symbols that occur with probabilities {3/16, 5/16, 8/16}. As the algorithm is probabilistic, its behaviour varies depending on the specific coin tosses made when choosing state swaps. The starting symbol spread is the one with the longest average encoding length, as this is the worst case:

    s(x) = s_1 if x ∈ {24, 25, 26} = L_1
         = s_2 if x ∈ {27, 28, 29, 30, 31} = L_2
         = s_3 if x ∈ {16, 17, 18, 19, 20, 21, 22, 23} = L_3

The algorithm continues to swap state pairs until the average length ... Clearly, a designer of ANS is likely to use a pseudorandom number generator to select symbol spreads. There is a better than 1/2 probability that such an instance has an average encoding length somewhere in the interval [1.48, 1.49]. It is interesting to see how quickly such an instance approaches the best cases.

B. Optimal ANS for Fixed ANS Parameters

TABLE IV. NUMBER OF SWAPS WHEN SEARCHING FOR OPTIMAL ANS INSTANCES WITH 16 STATES AND 3 SYMBOLS. COLUMNS: AVERAGE # SWAPS, MIN # SWAPS, MAX # SWAPS, MIN # GOOD SWAPS, MAX # GOOD SWAPS. [Table data not reproduced.]

Consider the ANS from Table I and swap states 22 and 26.
Public P = {p_s | s ∈ S}             −−→             Public P = {p_s | s ∈ S}
• Run Algorithm 6 → {L_s | s ∈ S}                    • Run Algorithm 6 → {L_s | s ∈ S}

The symbol spread is secret; to reconstruct it, an adversary needs to recover the key K and execute Algorithm 7. Cryptographic ANS is illustrated in Table 5.
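One way to realise the secret spread can be sketched as follows, under our own assumptions (the paper keys Algorithm 7 with K; here we simply seed a PRNG with K, so that both parties derive the same pseudorandom spread from the public counts {L_s}):

```python
import random

def keyed_spread(symbols, Ls, key):
    """Derive a deterministic pseudorandom symbol spread from a shared key.

    Both parties know the public counts {L_s} (from Algorithm 6); seeding
    the PRNG with the shared key K makes them agree on the same spread,
    while an adversary without K sees only a random-looking assignment.
    """
    pool = [s for s, l in zip(symbols, Ls) for _ in range(l)]  # L_s copies of each s
    rng = random.Random(key)
    rng.shuffle(pool)
    L = len(pool)
    return {x: pool[x - L] for x in range(L, 2 * L)}           # states L, ..., 2L-1
```

Note that Python's `random` is not cryptographically secure; a real design would replace it with a keyed cryptographic PRNG.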