Entropic Bounds on the Average Length of Codes with a Space

We consider the problem of constructing prefix-free codes in which a designated symbol, a space, can appear only at the end of codewords. We provide a linear-time algorithm to construct almost-optimal codes with this property, meaning that their average length differs from the minimum possible by at most one. We obtain our results by uncovering a relation between this class of codes and the class of one-to-one codes. Additionally, we derive upper and lower bounds on the average length of optimal prefix-free codes with a space in terms of the source entropy.


Introduction
Modern natural languages achieve the unique parsability of written texts by inserting a special character (i.e., a space) between words [1] (see [2] for a few exceptions to this rule). Classical Information Theory, instead, studies codes that achieve the unique parsability of texts by imposing diverse combinatorial properties on the codeword set: e.g., the prefix property, unique decipherability, etc. [3]. As for the efficiency of such codes (usually measured via the average number of code symbols per source symbol), it is well known that the Shannon entropy of the information source constitutes a fundamental lower bound. On the other hand, if one drops the requirement of unique parsability of code messages into individual codewords, and simply requires that different source symbols be encoded with different codewords, one can obtain codes (known as one-to-one codes) whose efficiency lies below the source Shannon entropy (although not too much below; see, e.g., [4,5]).
Jaynes [6] took the approach of directly studying source codes in which a designated character of the code alphabet is exclusively used as a word delimiter. More precisely, Jaynes studied the possible decrease of the noiseless channel capacity (see [7], p. 8) associated with any code that uses a designated symbol as an end-of-codeword mark, as compared with the noiseless channel capacity of an unconstrained code. Quite interestingly, Jaynes proved that the decrease of the noiseless channel capacity of codes with an end-of-codeword mark becomes negligible as the maximum codeword length increases.
In this paper, we study the problem of constructing prefix-free codes in which a specific symbol (referred to as a 'space') can appear only at the end of codewords. We refer to codes of this kind as prefix codes ending with a space. We develop a linear-time algorithm that constructs 'almost'-optimal codes with this characteristic, in the sense that the average length of the constructed codes is at most one unit longer than the shortest possible average length of any prefix-free code in which the space can appear only at the end of codewords. We prove this result by highlighting a connection between our type of codes and the well-known class of one-to-one codes. We also provide upper and lower bounds on the average length of optimal prefix codes ending with a space, expressed in terms of the source entropy and the cardinality of the code alphabet.
The paper is structured as follows. In Section 2, we illustrate the relationships between prefix codes ending with a space and one-to-one codes. Specifically, we prove that, from one-to-one codes, one can easily construct prefix codes ending with a space, and we give an upper bound on the average length of the constructed codes. Subsequently, we show that, if one removes all the spaces from the codewords of a prefix code ending with a space, one obtains a one-to-one code. This result allows us to prove that the average length of our prefix codes ending with a space differs from the minimum possible by at most one. In Sections 3 and 4, we derive lower and upper bounds on the average length of optimal prefix codes ending with a space in terms of the source entropy and the cardinality of the code alphabet.

Relations between One-to-One Codes and Prefix Codes Ending with a Space
Let S = {s 1 , . . ., s n } be the set of source symbols, let p = (p 1 , . . ., p n ) be a probability distribution on S (that is, p i is the probability of source symbol s i ), and let {0, . . ., k − 1}, k ≥ 2, be the code alphabet. We denote by {0, . . ., k − 1} + the set of all non-empty sequences over the code alphabet {0, . . ., k − 1}, and by {0, . . ., k − 1} + ⊔ the set of all non-empty k-ary sequences that end with the special symbol ⊔, i.e., the space symbol.
A prefix-free code ending with a space is a one-to-one mapping C : S −→ {0, . . ., k − 1} + ∪ {0, . . ., k − 1} + ⊔ such that no codeword is a prefix of any other. A k-ary one-to-one code (see [4,5,8–11] and the references quoted therein) is a bijective mapping D : S −→ {0, . . ., k − 1} + from S to the set of all non-empty sequences over the alphabet {0, . . ., k − 1}. The average length of an arbitrary code for the set of source symbols S = {s 1 , . . ., s n }, with probabilities p = (p 1 , . . ., p n ), is ∑ n i=1 p i ℓ i , where ℓ i is the number of alphabet symbols in the codeword associated with the source symbol s i .
Without loss of generality, we assume that the probability distribution p = (p 1 , . . ., p n ) is ordered, that is, p 1 ≥ . . . ≥ p n . Under this assumption, it is apparent that the best one-to-one code proceeds by assigning the shortest codeword (e.g., in the binary case, codeword 0) to the highest-probability source symbol s 1 , the next shortest codeword 1 to the source symbol s 2 , the codeword 00 to s 3 , the codeword 01 to s 4 , and so on.
An equivalent approach for constructing an optimal one-to-one code, which we will use later, proceeds as follows. Let us consider the first n non-empty k-ary strings according to the radix order [12] (that is, the k-ary strings are ordered by length and, for equal lengths, according to the lexicographic order). We assign the strings to the symbols s 1 , . . ., s n in S by increasing string length and, for equal lengths, in inverse lexicographic order. For example, in the binary case, we assign the codeword 1 to the highest-probability source symbol s 1 , the codeword 0 to the source symbol s 2 , the codeword 11 to s 3 , the codeword 10 to s 4 , and so on. Therefore, one can see that, in the general case of a k-ary code alphabet, k ≥ 2, an optimal one-to-one code of minimal average length assigns a codeword of length ℓ i to the i-th symbol s i ∈ S, where ℓ i is given by (1). Moreover, the codewords of an optimal k-ary one-to-one code can be represented as the nodes of a k-ary tree of maximum depth h = ⌈log k (n − ⌈n/k⌉)⌉, where, for each node v, the k-ary string (codeword) associated with v is obtained by concatenating all the labels on the path from the root of the tree to v.
It is evident that, if we apply the above encoding to a sequence of source symbols, the obtained sequence of code symbols is not uniquely parsable into individual codewords. Let us see how one can recover unique parsability by appending a space ⊔ to judiciously chosen codewords of an optimal one-to-one code. To gain insight, let us consider the following example. Let S = {s 1 , s 2 , s 3 , s 4 , s 5 , s 6 , s 7 , s 8 , s 9 , s 10 } be the set of source symbols, and let us assume that the code alphabet is {0, 1}. Under the standing hypothesis that p 1 ≥ . . . ≥ p 10 , the best prefix-free code C one can obtain by the procedure of appending a space ⊔ to codewords of the optimal one-to-one code for S is the following: C(s 1 ) = 1⊔, C(s 2 ) = 0⊔, C(s 3 ) = 11, C(s 4 ) = 10, C(s 5 ) = 01⊔, C(s 6 ) = 00⊔, C(s 7 ) = 011, C(s 8 ) = 010, C(s 9 ) = 001, C(s 10 ) = 000. Observe that we started from the codewords of the optimal one-to-one code constructed according to the second procedure described above. Moreover, note that the codewords associated with symbols s 1 , s 2 , s 5 , and s 6 necessitate the space character ⊔ at their end; otherwise, the unique parsability of some encoded sequences of source symbols would not be guaranteed. On the other hand, the codewords associated with symbols s 3 , s 4 , s 7 , s 8 , s 9 , and s 10 do not necessitate the space character ⊔. Indeed, the codeword set {1⊔, 0⊔, 11, 10, 01⊔, 00⊔, 011, 010, 001, 000} satisfies the prefix-free condition (i.e., no codeword is a prefix of any other); therefore, it guarantees the unique parsability of any coded message into individual codewords. The idea of the above example can be generalized, as shown in the following lemma.
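The construction used in the example above is easy to reproduce mechanically. The following Python sketch (the function names are ours, introduced only for illustration) enumerates the first n non-empty k-ary strings in radix order, reassigns them within each length in inverse lexicographic order, and appends the space mark ⊔ to every codeword that is a proper prefix of another:

```python
from itertools import product

def one_to_one_codewords(n, k=2):
    """First n non-empty k-ary strings in radix order, reassigned
    within each length in inverse lexicographic order."""
    words, length = [], 1
    while len(words) < n:
        words += ["".join(map(str, t)) for t in product(range(k), repeat=length)]
        length += 1
    words = words[:n]
    by_len = {}
    for w in words:
        by_len.setdefault(len(w), []).append(w)
    return [w for L in sorted(by_len) for w in sorted(by_len[L], reverse=True)]

def append_spaces(words, space="\u2294"):
    """Append the space mark to every codeword that is a proper
    prefix of another codeword in the set."""
    s = set(words)
    return [w + space if any(v != w and v.startswith(w) for v in s) else w
            for w in words]

code = append_spaces(one_to_one_codewords(10))
# code == ['1⊔', '0⊔', '11', '10', '01⊔', '00⊔', '011', '010', '001', '000']
```

Running it with n = 10 and k = 2 reproduces exactly the codeword set of the example.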
Proof. Under the standing hypothesis that the probabilities of the source symbols are ordered from largest to smallest, we show how to construct a prefix-free code, by appending the special character ⊔ to the end of (some) codewords of an optimal one-to-one code for S, whose average length is upper bounded by (2). Among the class of all the prefix-free codes that one can obtain by appending the character ⊔ to the end of (some) codewords of an optimal one-to-one code for S, we aim to construct the one with the minimum average length. Therefore, we have to ensure that, in the k-ary tree representation of the code, the following basic condition holds: for any pair of nodes v i and v j , i < j, associated with the symbols s i and s j , the depth of the node v j is not smaller than the depth of the node v i . In fact, if it were otherwise, the average length of the code could be improved.
Therefore, recalling that h = ⌈log k (n − ⌈n/k⌉)⌉ is the height of the k-ary tree associated with an optimal one-to-one code, we have that the prefix-free code of minimum average length that one can obtain by appending the special character ⊔ to the end of (some) codewords of an optimal one-to-one code for S assigns a codeword of length ℓ i to the i-th symbol s i ∈ S, where ℓ i is given by (4). We stress that the obtained prefix-free code is not necessarily a prefix-free code C : S −→ {0, . . ., k − 1} + ∪ {0, . . ., k − 1} + ⊔ of minimum average length.
Now, we justify the expression (4). First, since the probabilities p 1 , . . ., p n are ordered in non-increasing fashion, the codeword lengths ℓ i of the code are ordered in non-decreasing fashion, that is, ℓ 1 ≤ . . . ≤ ℓ n . Therefore, in the k-ary tree representation of the code, the desired basic condition holds: for any pair of nodes v i and v j , i < j, associated with the symbols s i and s j , the depth of the node v i is smaller than or equal to the depth of the node v j . Furthermore, we need to append the space character only to the k-ary strings that are prefixes of some others. Therefore, let us consider the first n non-empty k-ary strings according to the radix order [12], in which, we recall, the k-ary strings are ordered by length and, for equal lengths, according to the lexicographic order. The number of strings that are a prefix of some others is exactly ⌈n/k⌉ − 1. One obtains this number by viewing the strings as nodes in a k-ary tree with labels 0, . . ., k − 1 on the edges. The number of strings that are a prefix of some others (among the n strings) is exactly equal to the number of internal nodes (except the root) of such a tree. This number of internal nodes is equal to ⌈(N − 1)/k⌉ − 1, where N is the total number of nodes, which in our case is N = n + 1 (i.e., N also counts the root of the tree).
Moreover, starting from the optimal one-to-one code constructed according to our second method, that is, by assigning k-ary strings to the symbols by increasing length and, for equal lengths, in inverse lexicographic order, one can verify which ⌈n/k⌉ − 1 symbols s i are associated with the internal nodes. In fact, since the height of the k-ary tree is h = ⌈log k (n − ⌈n/k⌉)⌉ and since all the levels of the tree, except the last two, are full, we need to append the space to all symbols s i with i from 1 to (k^(h−1) − 1)/(k − 1) − 1, while, on the second-to-last level, we have to append the space only to the symbols corresponding to the remaining internal nodes. Those remaining nodes are exactly, among all the nodes in the second-to-last level, the ones associated with the symbols that have smaller probabilities. Thus, we obtain (4).
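The counting argument in the proof is easy to check numerically. The sketch below (our own illustration, not part of the paper) counts, among the first n non-empty k-ary strings in radix order, how many are proper prefixes of others, and compares the count with ⌈n/k⌉ − 1:

```python
from itertools import product
from math import ceil

def radix_strings(n, k):
    """First n non-empty k-ary strings in radix order."""
    words, length = [], 1
    while len(words) < n:
        words += ["".join(map(str, t)) for t in product(range(k), repeat=length)]
        length += 1
    return words[:n]

def prefix_count(n, k):
    """Number of the first n radix-order strings that are proper
    prefixes of other strings in the same set."""
    words = radix_strings(n, k)
    s = set(words)
    return sum(1 for w in words if any(v != w and v.startswith(w) for v in s))

# prefix_count(n, k) agrees with ceil(n / k) - 1, e.g. for 1 <= n <= 50
```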
Note that, from Lemma 1, we obtain that the average length of any optimal (i.e., of minimum average length) prefix-free code ending with a space is upper bounded by the formula (2). Furthermore, we have an upper bound on the average length of optimal prefix-free codes ending with a space in terms of the average length of optimal one-to-one codes.
We can also derive a lower bound on the average length of optimal prefix-free codes ending with a space in terms of the average length of optimal one-to-one codes. For this purpose, we need two intermediate results. We first recall that, given a k-ary code C, its codewords can be represented as nodes in a k-ary tree with labels 0, . . ., k − 1 on the edges. Indeed, for each node v, the k-ary string (codeword) associated with v can be obtained by concatenating all the labels on the path from the root of the tree to v. We also recall that, in prefix-free codes, the codewords correspond to the leaves of the associated tree, while, in one-to-one codes, the codewords may also correspond to internal nodes of the associated tree.
Lemma 2. Let S = {s 1 , . . ., s n } be the set of source symbols, and let p = (p 1 , . . ., p n ), p 1 ≥ . . . ≥ p n > 0, be a probability distribution on S. There exists an optimal prefix-free code ending with a space C : S −→ {0, . . ., k − 1} + ∪ {0, . . ., k − 1} + ⊔ such that the following property holds: for any internal node v (except the root) of the tree representation of C, if we denote by w the k-ary string associated with the node v, then the string w⊔ belongs to the codeword set of C.
Proof. Let C be an arbitrary optimal prefix-free code ending with a space. Let us assume that, in the tree representation of C, there exists an internal node v whose associated string w is such that w⊔ does not belong to the codeword set of C. Since v is an internal node, there is at least one leaf x, a descendant of v, whose associated string is the codeword of some symbol s j . We modify the encoding by assigning the codeword w⊔ to the symbol s j . The new encoding is still prefix-free, and its average length can only decrease, since the length of the codeword newly assigned to s j cannot be greater than that of the previous one. We can repeat the argument for all internal nodes that do not satisfy the property stated in the lemma, completing the proof.
Lemma 3. Let C : S −→ {0, . . ., k − 1} + ∪ {0, . . ., k − 1} + ⊔ be an arbitrary prefix-free code. Then, the code D : S −→ {0, . . ., k − 1} + one obtains from C by removing the space ⊔ from each codeword of C is a one-to-one code.
Proof. The proof is straightforward. Since C is prefix-free, for any pair s i , s j ∈ S with s i ̸ = s j , the codeword C(s i ) is not a prefix of C(s j ) and vice versa. Therefore, since D is obtained from C by removing the space, we have four cases, according to whether or not C(s i ) and C(s j ) end with ⊔: if neither codeword ends with ⊔, then D(s i ) = C(s i ) and D(s j ) = C(s j ) are distinct; if both end with ⊔, then D(s i ) = D(s j ) would imply C(s i ) = C(s j ); and, in the two cases in which exactly one of the two codewords, say C(s i ), ends with ⊔, the equality D(s i ) = D(s j ) would imply that C(s j ) = D(s j ) is a prefix of C(s i ) = D(s i )⊔, contradicting the prefix-freeness of C. Hence, D(s i ) ̸ = D(s j ) in all cases, and D is one-to-one.
Lemma 4. Let S = {s 1 , . . ., s n } be the set of source symbols, and let p = (p 1 , . . ., p n ), p 1 ≥ . . . ≥ p n > 0, be a probability distribution on S. Then, the average length of an optimal prefix-free code C : S −→ {0, . . ., k − 1} + ∪ {0, . . ., k − 1} + ⊔ satisfies the lower bound (5), where L + is the average length of an optimal k-ary one-to-one code on S.
Proof. From Lemma 2, we know that there exists an optimal prefix-free code C with a space in which exactly ⌈n/k⌉ − 1 codewords contain the space character at the end. Let A ⊂ {1, . . ., n} be the set of indices associated with the symbols whose codewords contain the space. Moreover, from Lemma 3, we know that the code D obtained by removing the spaces from C is a one-to-one code. Putting it all together, we obtain (6). From (6), we obtain (5).
We notice that the difference between the expression (2) and the lower bound (5) is, because of (3), upper bounded as stated in (7); since this bound is at most one, the prefix-free codes ending with a space that we construct in Lemma 1 have an average length that differs from the minimum possible by at most one. Moreover, since both the upper bound (3) and the lower bound (5) are easily computable, we can determine the average length of an optimal prefix-free code C : S −→ {0, . . ., k − 1} + ∪ {0, . . ., k − 1} + ⊔ to within a tolerance of one. One can also see that the left-hand side of (7) is often much smaller than one.
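To see concretely why the overhead of the construction of Lemma 1 is often far below one, note that appending ⊔ lengthens only the spaced codewords by one symbol, so the increase in average length equals the total probability of the spaced symbols. The following sketch (our own, with illustrative names) computes this overhead for a given ordered distribution:

```python
from itertools import product

def lemma1_overhead(p, k=2):
    """Average-length increase caused by appending the space mark:
    the sum of probabilities of the symbols whose one-to-one
    codeword is a proper prefix of another codeword."""
    n = len(p)
    words, length = [], 1
    while len(words) < n:
        words += ["".join(map(str, t)) for t in product(range(k), repeat=length)]
        length += 1
    words = words[:n]
    # within each length, higher-probability symbols receive the
    # lexicographically later strings
    by_len = {}
    for w in words:
        by_len.setdefault(len(w), []).append(w)
    assigned = [w for L in sorted(by_len) for w in sorted(by_len[L], reverse=True)]
    s = set(assigned)
    return sum(pi for pi, w in zip(p, assigned)
               if any(v != w and v.startswith(w) for v in s))

# For the uniform distribution on 10 symbols, the spaced symbols are
# s1, s2, s5, s6, so the overhead is 0.4
```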
In the following sections, we will focus on providing upper and lower bounds on the average length L + of optimal k-ary one-to-one codes in terms of the k-ary Shannon entropy H k (p) = − ∑ n i=1 p i log k p i of the source distribution p. Because of Lemma 1 and Lemma 4, this will give us the corresponding upper and lower bounds on the average length of optimal prefix-free codes ending with a space.

Lower Bounds on the Average Length
In this section, we provide lower bounds on the average length of the optimal one-to-one code and, subsequently, thanks to Lemma 4, on the average length of the optimal prefix-free code with a space. For technical reasons, it will be convenient to consider one-to-one codes that make use of the empty word ϵ, that is, one-to-one mappings D ϵ : S −→ {0, 1, . . ., k − 1} + ∪ {ϵ}. It is easy to see (cf. (1)) that the optimal one-to-one code that makes use of the empty word assigns to the i-th symbol s i ∈ S a codeword of length ℓ i given by (8), where k is the cardinality of the code alphabet. Thus, denoting by L + the average length of the optimal one-to-one code that does not make use of the empty word and by L ϵ the average length of the optimal one-to-one code that does use it, we obtain the relation (9). Our first result is a generalization of the lower bound on the average length of optimal one-to-one codes presented in [5], from the binary case to the general case of k-ary alphabets, k ≥ 2. Our proof technique differs from that of [5] since we are dealing with a set of source symbols of bounded cardinality (in [5], the authors considered the case of a countable set of source symbols).
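For intuition, both average lengths can be computed by simply enumerating codeword lengths in radix order, with or without the empty word. The sketch below (the distribution is an assumption of ours, chosen only for illustration) also shows the phenomenon recalled in the Introduction: one-to-one codes can beat the entropy bound.

```python
from math import log

def one_to_one_lengths(n, k, with_empty=False):
    """Codeword lengths of an optimal one-to-one code: the n shortest
    k-ary strings, optionally including the empty word."""
    lens = []
    length = 0 if with_empty else 1
    while len(lens) < n:
        count = 1 if length == 0 else k ** length
        lens += [length] * min(count, n - len(lens))
        length += 1
    return lens

def avg_length(p, lens):
    return sum(pi * li for pi, li in zip(p, lens))

p = [0.4, 0.3, 0.2, 0.1]                        # ordered, illustrative
H2 = -sum(pi * log(pi, 2) for pi in p)          # entropy, about 1.846 bits
L_plus = avg_length(p, one_to_one_lengths(4, 2))         # 1.3
L_eps = avg_length(p, one_to_one_lengths(4, 2, True))    # 0.7
# L_eps <= L_plus, and both lie below the entropy H2
```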
Lemma 5. Let S = {s 1 , . . ., s n } be the set of source symbols and p = (p 1 , . . ., p n ) be a probability distribution on S, with p 1 ≥ . . . ≥ p n . The average length L ϵ of the optimal one-to-one code D : {s 1 , . . ., s n } −→ {0, . . ., k − 1} + ∪ {ϵ} satisfies the lower bound stated below.
Proof. The proof is an adaptation of Alon et al.'s proof [4] from the binary case to the k-ary case, k ≥ 2. We recall that the optimal one-to-one code (i.e., the one whose average length achieves the minimum L ϵ ) has codeword lengths ℓ i given by (8). For each j ∈ {0, . . ., ⌊log k n⌋}, let us define the quantities q j as in (10). It holds that ∑ ⌊log k n⌋ j=0 q j = 1. Let Y be a random variable that takes values in {0, . . ., ⌊log k n⌋} according to the probability distribution q = (q 0 , . . ., q ⌊log k n⌋ ), that is, Pr{Y = j} = q j for all j ∈ {0, . . ., ⌊log k n⌋}. From (10), we have (11). By applying the entropy grouping rule ([3], Ex. 2.27) to the distribution p, we obtain (12). We now derive an upper bound on the conditional entropy term appearing in (12). To this end, let us consider an auxiliary random variable Y ′ with the same distribution as Y, but with values ranging from 1 to ⌊log k n⌋ + 1. Let α be a positive number, whose value will be chosen later. We obtain (13). By substituting log k (µ/(µ − 1)) for α in the obtained inequality, we obtain (14). By applying (14) to (12), and since H k (Y) = H 2 (Y)/ log 2 k , we obtain (15). From (11), we have that L ϵ = E[Y]; moreover, from the inequality (28) of Lemma 7 (proven in the next Section 4), we know that (16) holds. Hence, since the function f (z) = z log k (1 + 1/z) is increasing in z, we can apply (16) to upper-bound the corresponding term and obtain the inequality (17). Rewriting (17), we finally obtain the claimed bound, and that concludes our proof.
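For the reader's convenience, we recall the entropy grouping rule invoked above in the form used here (a standard identity; the group notation p j,1 , . . ., p j,n j , denoting the masses of the symbols whose codewords have length j, is ours):

```latex
H_k(p) \;=\; H_k\bigl(q_0,\dots,q_{\lfloor \log_k n\rfloor}\bigr)
\;+\; \sum_{j=0}^{\lfloor \log_k n\rfloor} q_j\,
H_k\!\left(\frac{p_{j,1}}{q_j},\dots,\frac{p_{j,n_j}}{q_j}\right).
```

In the notation of the proof, the first term on the right-hand side is H k (Y), and the proof proceeds by upper-bounding the second term.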
By bringing into play the size of the largest probability mass in addition to the entropy, Lemma 5 can be improved, as shown in the following result.
Lemma 6. Let S = {s 1 , . . ., s n } be the set of source symbols and p = (p 1 , . . ., p n ), p 1 ≥ . . . ≥ p n , be a probability distribution on S. The average length L ϵ of the optimal one-to-one code D : {s 1 , . . ., s n } −→ {0, . . ., k − 1} + ∪ {ϵ} has the following lower bounds:
Proof. The proof proceeds as the proof of Lemma 5, with two modified steps. First, since (20) holds, by applying (20) to the right-hand side of (13), we obtain (21). Now, by applying (21) (instead of (14)) to (12), and since H k (Y) = H 2 (Y)/ log 2 k , we obtain (22). Here, instead of applying the upper bound of Lemma 7 to the right-hand side of (22), we apply the improved version proven in Lemma 8 of Section 4. Then, we simply need to rewrite the inequality, concluding the proof.
Thanks to Lemma 4 and the formula (9), the above lower bounds on L ϵ can be applied to derive our main results for prefix-free codes with a space, as shown in the following theorems.
Analogously, by exploiting (the possible) knowledge of the maximum source symbol probability value, we have the following result.
Since lim k→∞ log k (k + 1) = 1, we have that, as the cardinality of the code alphabet increases, the constraint that the space can appear only at the end of codewords becomes less and less influential.
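This limit is easy to visualize numerically; the following one-liner-style sketch (our own illustration) evaluates log k (k + 1) for growing alphabet sizes:

```python
from math import log

# log_k(k + 1) for growing alphabet sizes k
vals = {k: log(k + 1, k) for k in (2, 4, 16, 256, 65536)}
for k, v in vals.items():
    print(k, round(v, 6))
# The values decrease monotonically toward 1 (e.g. log_2(3) is about 1.585)
```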

Concluding Remarks
In this paper, we have introduced the class of prefix-free codes in which a specific symbol (referred to as a "space") can appear only at the end of codewords. We have proposed a linear-time algorithm to construct "almost"-optimal codes with this characteristic, and we have shown that their average length is at most one unit longer than the minimum average length of any prefix-free code in which the space can appear only at the end of codewords. We have proven this result by highlighting a connection between our type of codes and the well-known class of one-to-one codes. We have also provided upper and lower bounds on the average length of optimal prefix-free codes ending with a space, expressed in terms of the source entropy and the cardinality of the code alphabet.
We leave open the problem of providing an efficient algorithm to construct optimal prefix-free codes ending with a space.It seems that there is no easy way to modify the classical Huffman greedy algorithm to solve our problem.It is possible that the more powerful dynamic programming approach could be useful to provide an optimal solution to the problem, as done in [14] for optimal binary codes ending with ones.This will be the subject of future investigations.