The Optimal Fix-Free Code for Anti-Uniform Sources

Abstract: An $n$-symbol source which has a Huffman code with codelength vector $L_n = (1, 2, 3, \cdots, n-2, n-1, n-1)$ is called an anti-uniform source. In this paper, it is shown that for this class of sources, the optimal fix-free code and symmetric fix-free code is $C_n^* = (0, 11, 101, 1001, \cdots, 1\underbrace{0 \cdots 0}_{n-2}1)$.


Introduction
One of the basic problems in the context of source coding is to assign a code $C_n = (c_1, c_2, \cdots, c_n)$ with codelength vector $L_n = (\ell_1, \ell_2, \cdots, \ell_n)$ to a memoryless source with probability vector $P_n = (p_1, p_2, \cdots, p_n)$. Decoding requirements often constrain us to choose a code $C_n$ from a specific class of codes, such as prefix-free codes, fix-free codes or symmetric fix-free codes. With a prefix-free code, no codeword is the prefix of another codeword. This property ensures that decoding in the forward direction can be done without any delay (instantaneously). Alternatively, with a fix-free code no codeword is the prefix or suffix of any other codeword [1,2]. Therefore, decoding of a fix-free code in both the forward and backward directions can be done without any delay. The ability to decode in both directions makes fix-free codes more robust to transmission errors and allows faster decoding than with prefix-free codes. As a result, fix-free codes are used in video standards such as H.263+ and MPEG-4 [3]. A symmetric fix-free code is a fix-free code whose codewords are symmetric, i.e., each codeword reads the same in both directions. In general, decoder implementation for a fix-free code requires more memory compared to that for a prefix-free code. Although the decoder for a symmetric fix-free code is the same as for a fix-free code [4], symmetric codes have greater redundancy in general.
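The three code classes defined above can be told apart mechanically. The following Python sketch is illustrative only: the helper names `is_prefix_free`, `is_suffix_free`, `is_fix_free` and `is_symmetric` are assumptions introduced here and do not come from the cited works, and codewords are represented as strings over {0, 1}.

```python
from itertools import permutations


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword in the list."""
    return not any(b.startswith(a) for a, b in permutations(code, 2))


def is_suffix_free(code):
    """True if no codeword is a suffix of a different codeword in the list."""
    return not any(b.endswith(a) for a, b in permutations(code, 2))


def is_fix_free(code):
    """A fix-free code is both prefix-free and suffix-free."""
    return is_prefix_free(code) and is_suffix_free(code)


def is_symmetric(code):
    """True if every codeword reads the same forwards and backwards."""
    return all(c == c[::-1] for c in code)


for code in (["0", "10", "11"], ["00", "01", "10", "11"], ["0", "11", "101"]):
    print(code, is_prefix_free(code), is_fix_free(code), is_symmetric(code))
```

In this sketch, (0, 10, 11) is prefix-free but not fix-free because 0 is a suffix of 10, (00, 01, 10, 11) is fix-free but not symmetric, and (0, 11, 101) is a symmetric fix-free code.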
Let $S(L_n) = \sum_{i=1}^{n} 2^{-\ell_i}$ denote the Kraft sum of the codelength vector $L_n$. A well-known necessary and sufficient condition for the existence of a prefix-free code with codelength vector $L_n$ is the Kraft inequality, i.e., $S(L_n) \le 1$ [5]. However, this inequality is only a necessary condition for the existence of a fix-free code. Some sufficient conditions for the existence of a fix-free code were introduced in [6][7][8][9][10][11].
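To illustrate that the Kraft inequality is necessary but not sufficient for fix-free codes, the sketch below computes the Kraft sum exactly and searches exhaustively for a fix-free code with a given codelength vector. The function names and the choice of the vector (1, 2, 2) are assumptions made for this example; the exhaustive search is only practical for very short vectors.

```python
from fractions import Fraction
from itertools import permutations, product


def kraft_sum(lengths):
    """Exact Kraft sum: the sum of 2^(-l) over all codelengths l."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def is_fix_free(code):
    """True if no codeword is a prefix or suffix of a different codeword."""
    return not any(
        b.startswith(a) or b.endswith(a) for a, b in permutations(code, 2)
    )


def fix_free_code_exists(lengths):
    """Try every assignment of binary words having the given lengths."""
    pools = [
        ["".join(bits) for bits in product("01", repeat=l)] for l in lengths
    ]
    return any(is_fix_free(code) for code in product(*pools))


lengths = (1, 2, 2)
# The Kraft inequality holds, so a prefix-free code such as (0, 10, 11) exists.
print(kraft_sum(lengths) <= 1)
# The exhaustive search shows whether any fix-free code has these lengths.
print(fix_free_code_exists(lengths))
```

For (1, 2, 2) the search finds nothing: once a codeword of length 1 is chosen, the only word of length 2 that neither starts nor ends with it is a single word, so two distinct codewords of length 2 cannot be added.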
The optimal code for a specific class of codes is defined as the code with the minimum average codelength, i.e., $\sum_{i=1}^{n} p_i \ell_i$, among all codes in that class. The optimal prefix-free code can easily be obtained using the Huffman algorithm [12]. Recently, two methods for finding the optimal fix-free code have been developed. One is based on the A* algorithm [13], while the other is based on the concept of dominant sequences [14]. Compared to the Huffman algorithm, these methods are very complex.
A source with $n$ symbols having a Huffman code with codelength vector $L_n = (1, 2, \cdots, n-2, n-1, n-1)$ is called an anti-uniform source [15,16]. Such sources have been shown to correspond to particular probability distributions. For example, it was shown in [17] and [18], respectively, that the normalized tail of the Poisson distribution and the geometric distribution with success probability greater than some critical value are anti-uniform sources. It was demonstrated in [15,16] that a source with probability vector … As mentioned above, finding an optimal fix-free or symmetric fix-free code is complex. Thus, in this paper optimal fix-free and symmetric fix-free codes are determined for anti-uniform sources. In particular, it is proven that $C_n^* = (0, 11, 101, 1001, \cdots, 1\underbrace{0 \cdots 0}_{n-2}1)$ is an optimal fix-free code for this class of sources. Since $C_n^*$ is symmetric, this code is also an optimal symmetric fix-free code. Although for an anti-uniform source the difference between the average codelength of the optimal prefix-free code and that of $C_n^*$ is small (it is exactly equal to $p_n$), it is not straightforward to prove that $C_n^*$ is the optimal fix-free code. In [19], the optimality of $C_n^*$ among symmetric fix-free codes for a family of exponential probability distributions, which is an anti-uniform source, was discussed.
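The code $C_n^* = (0, 11, 101, 1001, \cdots)$ from the abstract and the stated gap of $p_n$ can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the example source is a dyadic distribution $(1/2, 1/4, \cdots, 2^{-(n-1)}, 2^{-(n-1)})$ chosen here because its Huffman codelengths are known to be $(1, 2, \cdots, n-1, n-1)$.

```python
from fractions import Fraction


def c_star(n):
    """Codewords 0, 11, 101, ..., 1 0^(n-2) 1 with lengths 1, 2, ..., n."""
    return ["0"] + ["1" + "0" * (k - 2) + "1" for k in range(2, n + 1)]


def average_length(probs, lengths):
    """Average codelength: the sum of p_i * l_i."""
    return sum(p * l for p, l in zip(probs, lengths))


n = 6
# Assumed example source (not from the paper): dyadic probabilities whose
# Huffman codelengths are (1, 2, ..., n-1, n-1).
probs = [Fraction(1, 2 ** i) for i in range(1, n)] + [Fraction(1, 2 ** (n - 1))]
huffman_lengths = list(range(1, n)) + [n - 1]

code = c_star(n)
gap = average_length(probs, [len(c) for c in code]) - average_length(
    probs, huffman_lengths
)
print(code)
print(gap == probs[-1])  # the gap to the Huffman code equals p_n
```

Only the last codeword is one bit longer than in the Huffman code, which is why the gap reduces to $p_n$ for any probability vector with these Huffman codelengths.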
In [20], another class of fix-free codes called weakly symmetric fix-free codes was examined. A fix-free code is weakly symmetric if the reverse of each codeword is also a codeword. In fact, every symmetric fix-free code is a weakly symmetric fix-free code, and every weakly symmetric fix-free code is a fix-free code. Thus, since the optimal code among fix-free codes and symmetric fix-free codes for anti-uniform sources is $C_n^*$, this code is also optimal for weakly symmetric fix-free codes.
The remainder of this paper is organized as follows. In Section 2, a sketch of the proofs of the main theorems, i.e., Theorems 1 and 3, is provided, followed by the main results of the paper. Then detailed proofs of these results are given in Section 3.

A Sketch of the Proofs
Since a fix-free code is also a prefix-free code, the Kraft sum of an optimal fix-free code is not greater than 1. Therefore, the Kraft sum of this code is either equal to 1 or smaller than 1.
It can be inferred from Proposition 1 that if the Kraft sum of an optimal fix-free code is smaller than 1, then the average codelength of this code is not better than that of the codelength vector $(\ell_1^*, \cdots, \ell_{n-1}^*, \ell_n^* + 1)$, where $(\ell_1^*, \cdots, \ell_n^*)$ is the codelength vector of the optimal prefix-free code. The optimal prefix-free code for an anti-uniform source has codelength vector $(1, 2, \cdots, n-2, n-1, n-1)$. Therefore, the optimal codelength vector with Kraft sum smaller than 1 for an anti-uniform source is the codelength vector $(1, 2, \cdots, n-1, n)$. Thus, if the Kraft sum of the optimal fix-free code for an anti-uniform source is smaller than 1, then the code $C_n^*$ is optimal.
Proposition 2. There is no symmetric fix-free code with Kraft sum 1 for n > 2.
According to Proposition 2, the Kraft sum for an optimal symmetric fix-free code is smaller than 1. Thus, Propositions 1 and 2 prove the following theorem.
Theorem 1. The optimal symmetric fix-free code for an anti-uniform source $P_n$ is the code $C_n^*$.
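Proposition 2 can be sanity-checked by exhaustive search over short symmetric codewords. The sketch below is not part of the proof: the bound `MAX_LEN = 4` is an assumption chosen only to keep the search small, and the helper names are hypothetical. It enumerates every set of palindromic binary words of length at most `MAX_LEN` and keeps those that are fix-free with Kraft sum exactly 1.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

MAX_LEN = 4  # assumed search bound, not a value taken from the paper

# All binary palindromes of length 1..MAX_LEN.
palindromes = [
    word
    for length in range(1, MAX_LEN + 1)
    for word in ("".join(bits) for bits in product("01", repeat=length))
    if word == word[::-1]
]


def is_fix_free(code):
    """True if no codeword is a prefix or suffix of a different codeword."""
    return not any(
        b.startswith(a) or b.endswith(a) for a, b in permutations(code, 2)
    )


def kraft_sum(code):
    """Exact Kraft sum of the codeword lengths."""
    return sum(Fraction(1, 2 ** len(c)) for c in code)


# Every symmetric fix-free code with Kraft sum 1 within the search bound.
complete_symmetric_codes = [
    code
    for size in range(1, len(palindromes) + 1)
    for code in combinations(palindromes, size)
    if kraft_sum(code) == 1 and is_fix_free(code)
]
print(complete_symmetric_codes)
```

Within this bound the only code found is (0, 1), which has $n = 2$ codewords, in agreement with Proposition 2.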
There exist fix-free codes with Kraft sum 1, for example (00, 01, 10, 11) and (01, 000, 100, 110, 111, 0010, 0011, 1010, 1011) [21]. Therefore, proving that the code $C_n^*$ is the optimal fix-free code for anti-uniform sources requires showing that the average codelength of this code is better than that of every fix-free code with Kraft sum 1. To achieve this, we use the following theorem, which was proven in [21].
Theorem 2. … where $(i, j)$ denotes the greatest common divisor of $i$ and $j$ … then no fix-free code exists with codelength vector $L_n$.
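The two fix-free codes with Kraft sum 1 quoted above from [21] can be checked directly. The following sketch uses hypothetical helper names and simply tests both defining properties for each code.

```python
from fractions import Fraction
from itertools import permutations

examples = [
    ["00", "01", "10", "11"],
    ["01", "000", "100", "110", "111", "0010", "0011", "1010", "1011"],
]


def is_fix_free(code):
    """True if no codeword is a prefix or suffix of a different codeword."""
    return not any(
        b.startswith(a) or b.endswith(a) for a, b in permutations(code, 2)
    )


def kraft_sum(code):
    """Exact Kraft sum of the codeword lengths."""
    return sum(Fraction(1, 2 ** len(c)) for c in code)


for code in examples:
    # Number of codewords, Kraft sum, and whether the code is fix-free.
    print(len(code), kraft_sum(code), is_fix_free(code))
```

Neither example is symmetric (for instance, 01 is not a palindrome), so these codes do not conflict with Proposition 2.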
According to the definition of $H_i$, we have that $H_1 = 0$ and $H_2 = 1$. Therefore, from Theorem 2, for $L_n$ with Kraft sum 1 and … there is no fix-free code.
Definition 1. For a given $n$, let …
From Theorem 2, if the Kraft sum of the optimal fix-free code is equal to 1, then the average codelength for this code is not smaller than that of the optimal codelength vector among those in $\mathcal{L}_n$ for $n > 4$. It can easily be verified that $|\mathcal{L}_n| = 0$ for $n < 7$. For anti-uniform sources, the following proposition characterizes the optimal codelength vector in $\mathcal{L}_n$ for $n \ge 7$.
The last step requires showing that the average codelength of $C_n^*$ is better than that of the codelength vector given in Proposition 3. This is done in the proof of the following theorem.
Theorem 3. The optimal fix-free code for an anti-uniform source $P_n$ is $C_n^*$ for $n > 4$.
Note that Theorem 3 is not true for $n = 4$. For example, for $P_4 = (\frac{1}{3}, \frac{1}{3}, \frac{1}{6}, \frac{1}{6})$, which is the probability vector of an anti-uniform source, the average codelength of the fix-free code (00, 01, 10, 11), which is 2, is better than that of $C_4^*$, which is $13/6$.
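The arithmetic behind this counterexample is short enough to check by machine. The sketch below uses exact fractions; the function name `average_length` is an assumption introduced for illustration.

```python
from fractions import Fraction

# P_4 = (1/3, 1/3, 1/6, 1/6), the probability vector from the text.
probs = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 6), Fraction(1, 6)]


def average_length(probs, code):
    """Average codelength: the sum of p_i times the length of codeword i."""
    return sum(p * len(c) for p, c in zip(probs, code))


c_star_4 = ["0", "11", "101", "1001"]     # C*_4, codelengths (1, 2, 3, 4)
complete_code = ["00", "01", "10", "11"]  # fix-free code with Kraft sum 1

print(average_length(probs, c_star_4))       # 13/6
print(average_length(probs, complete_code))  # 2
```

The code (00, 01, 10, 11) saves $1/6$ of a bit per symbol over $C_4^*$ for this source, which is why the restriction $n > 4$ in Theorem 3 cannot be dropped.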

Proofs of the Results in Section 2
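Proof of Proposition 1: … and consequently … Therefore, we can write … and consequently … This shows that the average codelength of $(\ell_1^*, \cdots, \ell_{n-1}^*, \ell_n^* + 1)$ is better than that of any other codelength vector, say $L_n$, with Kraft sum smaller than 1.
Proof of Proposition 2: Suppose that the Kraft sum of $L_n$, which is the codelength vector of the code $C_n$, is equal to 1. Since a prefix-free code with Kraft sum 1 is complete, a longest codeword $c = x_1 x_2 \cdots x_{m-1} 0$ and its sibling $c' = x_1 x_2 \cdots x_{m-1} 1$ both belong to $C_n$, where $m \ge 2$ because $n > 2$. However, both $c$ and $c'$ cannot be symmetric because $x_1 = 0$ and $x_1 = 1$ cannot both be true. Thus, $C_n$ is not a symmetric fix-free code.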
The following lemma will be used in the proof of Proposition 3.
(2) $\ell_n^* > 3$: The proof for this case is similar to the proof of the Huffman algorithm. Let

and we can write
where (a) follows from $\ell_n^* = \ell_{n-1}^*$, (b) follows from the definition of $L_{n-1}$ and (3), and (c) follows from (4). Therefore, for $n > 7$, since the average codelength of the codelength vector given in (5) is equal to $\sum_{i=1}^{n-1} p_i \ell_i + p_{n-1} + p_n$, this codelength vector is optimal and the proof is complete.
Proof of Proposition 3. The proposition is proved by induction on $n$. According to Lemma 1, the base of induction, i.e., $n = 7$, is true. Assume that the proposition is true for all anti-uniform sources with $n - 1$ symbols. Let $P_n = (p_1, \cdots, p_n)$ be the probability vector of an anti-uniform source. Also, suppose that