Fair Numerical Algorithm of Coset Cardinality Spectrum for Distributed Arithmetic Coding

As a typical symbol-wise solution to the asymmetric Slepian-Wolf coding problem, Distributed Arithmetic Coding (DAC) non-linearly partitions the source space into disjoint cosets of unequal sizes. The distribution of DAC coset cardinalities, named the Coset Cardinality Spectrum (CCS), plays an important role in both theoretical understanding and decoder design for DAC. In general, CCS cannot be calculated directly. Instead, a numerical algorithm is usually used to obtain an approximation. This paper first finds that the existing numerical algorithm for CCS is theoretically imperfect and does not finally converge to the real CCS. To solve this problem, we refine the original numerical algorithm based on rigorous theoretical analyses. Experimental results verify that the refined numerical algorithm remedies the drawbacks of the original version.


Introduction
As an important branch of network information theory, Distributed Source Coding (DSC) finds broad potential applications in many scenarios (e.g., wireless sensor networks, genome compression, etc.). Just like traditional source coding, DSC has two forms. Lossless DSC is also called Slepian-Wolf Coding (SWC) [1]. The general form of lossy DSC is referred to as Berger-Tung coding [2], while the special case of asymmetric lossy DSC with side information at the decoder is referred to as Wyner-Ziv Coding (WZC) [3]. Up to now, the most important works on SWC have focused on its asymmetric form. Let X and Y be two correlated discrete random variables. Given Y available only at the decoder, the achievable rates of compressing X losslessly are bounded by R ≥ H(X|Y). Different from SWC, an important property of WZC is that it usually suffers a rate loss when compared to traditional lossy coding with side information available at both the encoder and the decoder. However, if the difference between the source and the side information is an independent Gaussian random variable, there is no rate loss [4,5].
This paper treats only asymmetric SWC. The asymmetric SWC problem is in essence a channel coding problem [6][7][8][9]. To see this point, one can take Y as a noisy version of X corrupted by virtual channel noise and take the bitstream of X as the index of the coset containing X. If the elements in each coset are spaced as far apart (in Hamming distance) as possible and Y is close enough (in Hamming distance) to X, then X can be recovered from its bitstream with the help of Y. From this viewpoint, the bitstream of X is actually the syndrome of a coset code. Hence, asymmetric SWC was traditionally implemented with channel codes (e.g., Turbo codes [10], Low-Density Parity-Check (LDPC) codes [11], and polar codes [12]).
Arithmetic coding is usually deemed the most important method for lossless data compression [13,14]. Due to the duality between source coding and channel coding, arithmetic coding can be easily modified for the purpose of error detection and correction [15][16][17]. In 2007, it was found that, after some slight modifications, arithmetic coding can also serve as a coset code to implement asymmetric SWC. There are mainly two approaches: one is bit puncturing [18] and the other is interval enlarging [19][20][21]. This paper focuses only on the latter approach, which is often referred to as Distributed Arithmetic Coding (DAC), while ignoring the former. For independent and identically-distributed (i.i.d.) binary sources, DAC is in general inferior to channel codes, e.g., LDPC codes and polar codes, as shown by the experimental results in [22]. However, for binary sources with memory or nonbinary sources, DAC performs significantly better than channel codes [23][24][25][26].
By nature, DAC is a many-to-one nonlinear mapping that partitions the source space into disjoint cosets of unequal cardinalities. An important problem is then how DAC coset cardinalities are distributed, which is answered by the so-called Coset Cardinality Spectrum (CCS). The idea of CCS originated in [27,28] and was formally defined for uniform and nonuniform binary sources in [29,30], respectively. The concept of CCS is very useful: it not only serves as a theoretical tool to analyze the properties of DAC [31,32], but also can be used to derive correct decoding formulae [31,32] (see the discussion in the last paragraph of Section 2).
If the stream of source symbols is grouped into length-n blocks, then DAC CCS is a tuple of n + 1 probability density functions (pdfs). It is impossible to deduce the exact closed form for each pdf directly. Instead, one can begin with the final pdf, which is usually simple and calculable, and then derive each pdf via a backward recursion. In [29], a numerical algorithm was proposed to implement the backward recursion for uniform binary sources, and then it was generalized to nonuniform binary sources in [30].
However, the numerical algorithm proposed in [29,30] is rather primitive and lacks theoretical justification. This paper makes an in-depth analysis of that algorithm and observes its results carefully, finding that it does not exactly converge to the real CCS. Based on a rigorous theoretical analysis, this paper then proposes a novel numerical algorithm that overcomes the drawbacks of the original version.
The rest of this paper is arranged as follows. Section 2 briefly reviews the background of DAC and CCS. Section 3 presents the original numerical algorithm and its trivially upgraded version. Section 4 proposes the novel numerical algorithm for DAC CCS based on solid theoretical analyses. Section 5 reports experimental results comparing the three numerical algorithms. Finally, Section 6 concludes this paper.

Review on DAC and CCS
Let X^n ≜ (X_1, ..., X_n) be a length-n binary source block with bias probability Pr(X_i = 1) = p. The entropy of X is H(X) = -p log_2 p - (1-p) log_2 (1-p). The DAC codec recursively maps every source symbol onto an interval in [0, 1) according to the following rule:

X_i = 0 ↦ [0, (1-p)^r),    X_i = 1 ↦ [1 - p^r, 1),

where r ∈ [0, 1] is called the normalized rate or overlapping factor (for r < 1, the two intervals overlap since (1-p)^r + p^r > 1). Let [l(X^n), h(X^n)) ⊂ [0, 1) denote the mapping interval of X^n. In theory, [l(X^n), h(X^n)) can be represented by -log_2(h(X^n) - l(X^n)) ∈ R bits. However, due to the indivisibility of a bit, the bitstream of X^n actually includes ⌈-log_2(h(X^n) - l(X^n))⌉ ∈ Z bits, which can be interpreted as a real number U_0 in [l(X^n), h(X^n)). Hence, there is a rate loss of δ bits, where

δ ≜ ⌈-log_2(h(X^n) - l(X^n))⌉ + log_2(h(X^n) - l(X^n)) ∈ [0, 1).

If we decode the bitstream U_0 along the path X^n, then we will obtain a tuple of (n+1) real numbers U_0^n ≜ (U_0, U_1, ..., U_n), which can be deduced recursively by

U_i = (U_{i-1} - l(X_i)) / (h(X_i) - l(X_i)),

i.e., U_i = U_{i-1}(1-p)^{-r} if X_i = 0, and U_i = (U_{i-1} - (1-p^r))p^{-r} if X_i = 1. The forward recursion of U_0^n can be easily transformed into an equivalent backward recursion [30]:

U_{i-1} = U_i(1-p)^r if X_i = 0, and U_{i-1} = (1-p^r) + U_i p^r if X_i = 1.
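For concreteness, the interval mapping and the two U-recursions above can be sketched in a few lines of Python. This is a minimal illustration under the reconstructed mapping rule; all function names are ours and are not part of the released package [33]:

```python
# A minimal sketch of the DAC interval mapping and the U-recursions,
# assuming the rule: symbol 0 -> [0, (1-p)^r), symbol 1 -> [1 - p^r, 1).

def dac_interval(x, p, r):
    """Return the (possibly overlapped) interval [l, h) of symbol x."""
    return (0.0, (1 - p) ** r) if x == 0 else (1 - p ** r, 1.0)

def forward_u(u0, xs, p, r):
    """Forward recursion: U_i = (U_{i-1} - l(X_i)) / (h(X_i) - l(X_i))."""
    us = [u0]
    for x in xs:
        l, h = dac_interval(x, p, r)
        us.append((us[-1] - l) / (h - l))
    return us                      # (U_0, U_1, ..., U_n)

def backward_u(un, xs, p, r):
    """Equivalent backward recursion: U_{i-1} = l(X_i) + (h(X_i) - l(X_i)) U_i."""
    us = [un]
    for x in reversed(xs):
        l, h = dac_interval(x, p, r)
        us.append(l + (h - l) * us[-1])
    return us[::-1]                # the same tuple, rebuilt from U_n
```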
The pdf of U_i is called the i-th CCS and denoted by f_{n,i}(u), 0 ≤ u < 1, which is usually simplified as f_i(u). In particular, f_0(u) is called the initial CCS and f_n(u) is called the final CCS. At this point, we have two choices: one is beginning from the initial CCS f_0(u) and then deducing f_i(u) for 0 < i ≤ n via a forward recursion; the other is beginning from the final CCS f_n(u) and then deducing f_i(u) for 0 ≤ i < n via a backward recursion. Unfortunately, it is impossible to deduce the initial CCS f_0(u) directly, so the forward recursion is infeasible. Instead, it is proved in [30] that, depending on the parameters p and r, f_n(u) will tend to be a piecewise uniform function with a known closed form. Once f_n(u) is known, f_i(u) for 0 ≤ i < n can be deduced via the backward recursion [30]

f_{i-1}(u) = (1-p)^{1-r} · f_i(u(1-p)^{-r}) + p^{1-r} · f_i((u - (1-p^r))p^{-r}).    (5)
In general, it is impossible to deduce f_{i-1}(u) from f_i(u) analytically, even if the closed form of f_i(u) is known. Hence, we have to resort to a numerical algorithm that mimics (5) in a discrete way.
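All three numerical algorithms discussed in this paper share the same backward driver loop and differ only in how a single discrete backward step is realized. A minimal sketch of the driver (the naming is ours) follows:

```python
import numpy as np

def ccs_backward(f_final, steps, one_step):
    """Discrete backward recursion mimicking (5).

    f_final : length-N array approximating the final CCS f_n,
              scaled so that f_final.sum() == N;
    one_step: a function mapping the approximation of f_i to that of f_{i-1}.
    Returns the approximation of f_{n-steps}.
    """
    f = np.asarray(f_final, dtype=float)
    for _ in range(steps):
        f = one_step(f)
    return f
```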
Regarding the physical meaning of CCS, there is a detailed explanation in [28]. Let us take uniform binary sources as an example. For any real number u ∈ [0, 1), if u is fed into a DAC decoder with overlapping factor r, then we will get an incomplete binary tree. In this tree, the number of level-i nodes is roughly (lim_{n→∞} f_{n,n-i}(u)) · 2^{i(1-r)}. If we implement the decoder in a breadth-first way, i.e., with the M-algorithm, then for every level-i node, the sub-tree grown from that node will have about (lim_{n→∞} f_{n,i}(u)) · 2^{(n-i)(1-r)} leaf nodes, where u is the real number at the level-i node. Obviously, level-i nodes with larger (lim_{n→∞} f_{n,i}(u)) should be deemed more likely, and vice versa. That is why DAC CCS can be used to derive correct decoding formulae.

Rounding Numerical Algorithm
The first version of the numerical algorithm was proposed in [29,30] for uniform and nonuniform binary sources, respectively. This algorithm divides the interval [0, 1) into N segments and then uses a finite number of points to approximate each pdf. For simplicity, we use f̂_i(j) to denote the approximation of f_i(j/N). Then (5) can be discretized as

f̂_{i-1}(j) = (1-p)^{1-r} · f̂_i(j') + p^{1-r} · f̂_i(j'').    (6)

Now the key is to find a good mapping from j to j' and j''. Let ⌊·⌉ denote the rounding operation. This problem was solved by a brute-force method in [29,30] as follows:

j' = ⌊j(1-p)^{-r}⌉,    j'' = ⌊(j - N(1-p^r))p^{-r}⌉.    (7)

For some j ∈ [0 : N), we will have j' ∉ [0 : N) or j'' ∉ [0 : N). This does not matter because f̂_i(j) ≡ 0 for any j ∉ [0 : N) according to the definition of CCS. Since the core of (7) is the rounding operation, this method will be formally referred to as the rounding numerical algorithm below.
Though the rounding numerical algorithm seems to work well at first glance [29,30], its rationale is flawed. On one hand, since (1-p)^{-r} > 1, we have j(1-p)^{-r} ≥ j, and thus among the N points of f̂_i(j)'s, only about N(1-p)^r < N points, i.e., 0 ≤ j < j_max ≈ N(1-p)^r, are used by the first line of (7). On the other hand,

(j - N(1-p^r))p^{-r} = j + (j - N)(p^{-r} - 1) < j,

where the last inequality is due to N > j and p^{-r} > 1. Thus among the N points of f̂_i(j)'s, only about Np^r < N points, i.e., N(1-p^r) ≈ j_min ≤ j < N, are used by the second line of (7). In summary, not all N points of f̂_i(j)'s are used to generate the f̂_{i-1}(j)'s. In other words, some points of f̂_i(j)'s are discarded without being used when we deduce the f̂_{i-1}(j)'s according to (6), which is equivalent to losing partial information of f_i(u) when we deduce f_{i-1}(u) according to (5). This is imperfect in theory and brings two negative effects in practice:
• The rounding numerical algorithm cannot generate an accurate approximation of CCS. This phenomenon will be observed in the experimental results of Section 5.
• Even if ∑_{j=0}^{N-1} f̂_i(j) = N, (6) does not strictly satisfy the normalization condition ∑_{j=0}^{N-1} f̂_{i-1}(j) = N. Thus an extra re-normalization step is needed after (6).
A sketch of one rounding step is given below.
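Under the conventions above (f̂_i(j) ≡ 0 outside [0 : N)), one rounding step (6)-(7) may be sketched in Python as follows; the final line is exactly the extra re-normalization step the second bullet calls for. Note that np.rint plays the role of ⌊·⌉ here (it rounds half to even, a minor implementation choice of ours):

```python
import numpy as np

def rounding_step(f, p, r):
    """One backward step of the rounding numerical algorithm, (6)-(7)."""
    N = len(f)
    j = np.arange(N)
    j1 = np.rint(j * (1 - p) ** (-r)).astype(int)                 # j'  in (7)
    j2 = np.rint((j - N * (1 - p ** r)) * p ** (-r)).astype(int)  # j'' in (7)
    # Out-of-range indices contribute zero, per the definition of CCS.
    g0 = np.where((0 <= j1) & (j1 < N), f[np.clip(j1, 0, N - 1)], 0.0)
    g1 = np.where((0 <= j2) & (j2 < N), f[np.clip(j2, 0, N - 1)], 0.0)
    g = (1 - p) ** (1 - r) * g0 + p ** (1 - r) * g1
    return g * N / g.sum()        # extra re-normalization after (6)
```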

Linear Numerical Algorithm
To improve the performance of the rounding numerical algorithm, more points of f̂_i(j)'s should be involved in generating the f̂_{i-1}(j)'s. This can be achieved by linear interpolation. Below we describe a trivial upgrade of the rounding numerical algorithm, which we will refer to as the linear numerical algorithm. Let ⌊·⌋ and ⌈·⌉ denote the flooring and ceiling operations, respectively. Let us define α ≜ j(1-p)^{-r} ≥ j and β ≜ (j - N(1-p^r))p^{-r} < j. In general, (6) can be refined as

f̂_{i-1}(j) = (1-p)^{1-r} · g_0(α) + p^{1-r} · g_1(β),    (9)

where

g_0(α) ≜ (⌈α⌉ - α) · f̂_i(⌊α⌋) + (α - ⌊α⌋) · f̂_i(⌈α⌉),
g_1(β) ≜ (⌈β⌉ - β) · f̂_i(⌊β⌋) + (β - ⌊β⌋) · f̂_i(⌈β⌉).

Then we discuss some special cases:
• If ⌊α⌋ ≥ N, then g_0(α) = 0; if ⌈β⌉ < 0, then g_1(β) = 0.
• If ⌊α⌋ < N and ⌈α⌉ ≥ N, then g_0(α) = f̂_i(⌊α⌋); if ⌊β⌋ < 0 and ⌈β⌉ ≥ 0, then g_1(β) = f̂_i(⌈β⌉).
However, it must be pointed out that the linear numerical algorithm still does not guarantee that all N points of f̂_i(j)'s are used to generate the f̂_{i-1}(j)'s, and a renormalization step is still needed after (9). A sketch of one linear step follows.
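One linear step may be sketched as below; for simplicity, the boundary cases are folded into a single zero-outside interpolation helper, a slight variation of the bulleted cases above:

```python
import numpy as np

def _interp(f, x):
    """Linear interpolation of f at real indices x, with f = 0 outside [0, N)."""
    N = len(f)
    lo = np.floor(x).astype(int)
    w = x - lo                                    # weight of the ceiling point
    flo = np.where((0 <= lo) & (lo < N), f[np.clip(lo, 0, N - 1)], 0.0)
    fhi = np.where((0 <= lo + 1) & (lo + 1 < N),
                   f[np.clip(lo + 1, 0, N - 1)], 0.0)
    return (1 - w) * flo + w * fhi

def linear_step(f, p, r):
    """One backward step of the linear numerical algorithm, (9)."""
    N = len(f)
    j = np.arange(N)
    alpha = j * (1 - p) ** (-r)
    beta = (j - N * (1 - p ** r)) * p ** (-r)
    g = (1 - p) ** (1 - r) * _interp(f, alpha) + p ** (1 - r) * _interp(f, beta)
    return g * N / g.sum()        # renormalization is still needed after (9)
```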

Fair Numerical Algorithm
In the rounding/linear numerical algorithms, we take f̂_i(j) as an approximation of f_i(j/N), i.e., f̂_i(j) ≈ f_i(j/N) for any j ∈ [0 : N), and for each f̂_{i-1}(j), we try to find the corresponding points f̂_i(j') and f̂_i(j''). This understanding is, however, incorrect. The correct physical meaning of f̂_i(j) is an approximation of the scaled-up probability of U_i falling into the interval [j/N, (j+1)/N), i.e.,

f̂_i(j) ≈ N · Pr(U_i ∈ [j/N, (j+1)/N)) = N ∫_{j/N}^{(j+1)/N} f_i(u) du.

According to (5), we have

∫_{j/N}^{(j+1)/N} f_{i-1}(u) du = (1-p)^{1-r} ∫_{j/N}^{(j+1)/N} f_i(u(1-p)^{-r}) du + p^{1-r} ∫_{j/N}^{(j+1)/N} f_i((u - (1-p^r))p^{-r}) du.

Let u' ≜ u(1-p)^{-r} and u'' ≜ (u - (1-p^r))p^{-r}. Let

I_0 ≜ [(j/N)(1-p)^{-r}, ((j+1)/N)(1-p)^{-r}) and I_1 ≜ [(j/N - (1-p^r))p^{-r}, ((j+1)/N - (1-p^r))p^{-r}).

Then, by changes of variables, we can obtain

∫_{j/N}^{(j+1)/N} f_i(u(1-p)^{-r}) du = (1-p)^r · Pr(U_i ∈ I_0),
∫_{j/N}^{(j+1)/N} f_i((u - (1-p^r))p^{-r}) du = p^r · Pr(U_i ∈ I_1).

Naturally, we have

f̂_{i-1}(j) = (1-p) · g_0(j) + p · g_1(j),    (15)

where

g_0(j) ≜ N · Pr(U_i ∈ I_0) and g_1(j) ≜ N · Pr(U_i ∈ I_1).

In plain words, g_0(j) is an approximation of the scaled-up probability of U_i falling into the interval I_0, and g_1(j) is an approximation of the scaled-up probability of U_i falling into the interval I_1.

Calculation of g_0(j)
Let λ_0 ≜ j(1-p)^{-r} ≥ 0 and η_0 ≜ (j+1)(1-p)^{-r} > λ_0 ≥ 0. Then we can obtain

I_0 = [λ_0/N, η_0/N).

For N sufficiently large, f_i(u) can be taken as uniform over each segment [k/N, (k+1)/N), so that the probability of U_i falling into a sub-interval of the k-th segment is proportional to the sub-interval's length. Hence we have

g_0(j) = N · Pr(U_i ∈ [λ_0/N, η_0/N)) ≈ (⌈λ_0⌉ - λ_0) · f̂_i(⌊λ_0⌋) + ∑_{k=⌈λ_0⌉}^{⌊η_0⌋-1} f̂_i(k) + (η_0 - ⌊η_0⌋) · f̂_i(⌊η_0⌋),

where f̂_i(k) ≡ 0 for k ∉ [0 : N), i.e., every segment of f̂_i overlapping I_0 contributes in proportion to the overlap length. The calculation of g_1(j) is fully parallel: with λ_1 ≜ (j - N(1-p^r))p^{-r} and η_1 ≜ (j+1 - N(1-p^r))p^{-r}, we have I_1 = [λ_1/N, η_1/N), and g_1(j) is accumulated in the same way. According to the above analysis, we obtain the fair backward step sketched below.
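Putting the pieces together, one fair step accumulates, for each destination cell j, the mass of f̂_i over [λ_0, η_0) and [λ_1, η_1) (in index units), weighting partial cells by their overlap length. A sketch (our naming, not the released implementation [33]):

```python
import numpy as np

def _mass(f, lo, hi):
    """Scaled probability of U_i in [lo/N, hi/N): accumulate f over the index
    interval [lo, hi), weighting each cell by its overlap length."""
    N, total, k = len(f), 0.0, int(np.floor(lo))
    while k < hi:
        if 0 <= k < N:                          # f is zero outside [0, N)
            total += f[k] * (min(k + 1.0, hi) - max(float(k), lo))
        k += 1
    return total

def fair_step(f, p, r):
    """One backward step of the fair numerical algorithm, (15)."""
    N = len(f)
    g = np.empty(N)
    for j in range(N):
        lam0, eta0 = j * (1 - p) ** (-r), (j + 1) * (1 - p) ** (-r)
        lam1 = (j - N * (1 - p ** r)) * p ** (-r)
        eta1 = (j + 1 - N * (1 - p ** r)) * p ** (-r)
        g[j] = (1 - p) * _mass(f, lam0, eta0) + p * _mass(f, lam1, eta1)
    return g                      # sums to N by itself; no renormalization
```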

Discussion
From the above analysis, it can be seen that all N points of f̂_i(j)'s are used to generate the f̂_{i-1}(j)'s, and the treatment is fair to every f̂_i(j): as j sweeps over [0 : N), the intervals I_0 (resp. I_1) tile a range covering [0, 1) without gap or overlap, so every segment of f̂_i is counted exactly once. For this reason, we formally refer to this method as the fair numerical algorithm. Given ∑_{j=0}^{N-1} f̂_i(j) = N, it follows that ∑_{j=0}^{N-1} g_0(j) = ∑_{j=0}^{N-1} g_1(j) = N. Hence, (15) satisfies the normalization property by itself, and no renormalization step is needed after (15).
Let us briefly discuss the convergence of the numerical algorithms. Let f(u) denote the asymptotic form of f_{n,0}(u) as n → ∞. According to (5), it is obvious that

f(u) = (1-p)^{1-r} · f(u(1-p)^{-r}) + p^{1-r} · f((u - (1-p^r))p^{-r}).    (32)

For the rounding/linear numerical algorithms, as analyzed above, parts of f(u) have been lost, so the discrete version of (32) will not exactly hold; for the fair numerical algorithm, since all information of f(u) is preserved, the discrete version of (32) will hold exactly as the number of segments N goes to infinity.
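As a quick self-check of this claim, one can iterate the fair step from the scaled uniform final CCS until it stabilizes; the result is a fixed point of the discrete analogue of (32). This is only an illustrative sketch: the tolerance and iteration cap below are arbitrary choices of ours, and fair_step is the function sketched in the previous section.

```python
import numpy as np

def asymptotic_ccs(p, r, N=1024, tol=1e-9, max_iter=100000):
    """Iterate fair_step until the discrete analogue of (32) holds."""
    f = np.ones(N)                 # f_n(u) = Pi(u), scaled so f.sum() == N
    for _ in range(max_iter):
        g = fair_step(f, p, r)
        if np.max(np.abs(g - f)) < tol:
            break
        f = g
    return f                       # satisfies f ~= fair_step(f, p, r)
```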

Experimental Results
We use a classical CCS to compare the three numerical algorithms. Given p = r = 0.5, for n sufficiently large, as i increases, f_{n-i}(u) will tend to be ladder-shaped; its closed form, denoted by (33) below, was derived in [27,28] and serves as the ground truth in our comparison.
For simplicity, we set the final CCS f_n(u) = Π(u), where Π(u) is the uniform function over [0, 1). Some results are shown in Figures 1-3, where n = 128 and N = 1024. It can be observed that for small i (e.g., 1, 2, and 3), the rounding numerical algorithm performs almost as well as the fair numerical algorithm, while the linear numerical algorithm performs very poorly, exhibiting some big spikes. When i = 8, small spikes are also observed for the rounding numerical algorithm, but no spike is observed for the fair numerical algorithm. Finally, for large i (e.g., 16, 32, and 64), the curves of the rounding numerical algorithm always fluctuate slightly around the real CCS given by (33), and more experiments show no trend that these curves will finally converge to the real CCS as i increases. On the contrary, for both the linear numerical algorithm and the fair numerical algorithm, the curves coincide with the real CCS very well for large i, with a clear trend of convergence to the real CCS as i increases.
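Under the settings above (p = r = 0.5, n = 128, N = 1024), the comparison behind Figures 1-3 can be reproduced along the following lines, reusing the step functions sketched earlier. The plotting details are our own; the released package [33] is the authoritative reference:

```python
import numpy as np
import matplotlib.pyplot as plt

p, r, N = 0.5, 0.5, 1024
f_final = np.ones(N)               # final CCS f_n(u) = Pi(u), scaled to sum N
u = (np.arange(N) + 0.5) / N

for i in (1, 2, 3, 8, 16, 32, 64):
    for name, step in (("rounding", rounding_step),
                       ("linear", linear_step),
                       ("fair", fair_step)):
        f = ccs_backward(f_final, i, lambda g, s=step: s(g, p, r))
        plt.plot(u, f, label=name)
    plt.title(f"Approximations of the (n - {i})-th CCS")
    plt.xlabel("u")
    plt.ylabel("CCS")
    plt.legend()
    plt.show()
```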

Conclusions
As an important property of DAC, CCS finds broad application in many scenarios. However, CCS is usually not directly calculable, so we have to resort to numerical algorithms. This paper finds that the original numerical algorithm proposed in our previous papers is not theoretically sound and does not work well in practice (it fails to generate an accurate approximation of CCS). Based on a rigorous theoretical analysis, this paper proposes a novel numerical algorithm that overcomes the weaknesses of the original one. The superiority of the newly-proposed numerical algorithm is well validated by experimental results.
A software package with source code to reproduce the experimental results in this paper has been released in [33].