A Catalog of Self-Affine Hierarchical Entropy Functions

For fixed k ≥ 2 and a fixed data alphabet of cardinality m, the hierarchical type class of a data string of length n = k^j for some j ≥ 1 is formed by permuting the string in all possible ways under permutations arising from the isomorphisms of the unique finite rooted tree of depth j which has n leaves and k children for each non-leaf vertex. Suppose the data strings in a hierarchical type class are losslessly encoded via binary codewords of minimal length. A hierarchical entropy function is a function on the set of m-dimensional probability distributions which describes the asymptotic compression rate performance of this lossless encoding scheme as the data length n is allowed to grow without bound. We determine infinitely many hierarchical entropy functions which are each self-affine. For each such function, an explicit iterated function system is found such that the graph of the function is the attractor of the system.


Introduction
A traditional type class consists of all permutations of a fixed finite-length data string. There is a well-developed data compression theory in which strings in a traditional type class are losslessly encoded into fixed-length binary codewords [1]. One can generalize the notion of traditional type class and the resulting data compression theory in the following natural way. Let T be a finite rooted tree; an isomorphism of T is a one-to-one mapping of the set of vertices of T onto itself which preserves the parent-child relation. Let n be the number of leaves of T, let L(T) be the set of leaves of T, and enumerate the leaves of T left to right as v_1, v_2, …, v_n. Let σ be a one-to-one mapping of {1, 2, …, n} onto itself for which there exists an isomorphism φ of T such that φ(v_i) = v_{σ(i)} for each i; the T-type class of a data string x_1 x_2 ⋯ x_n of length n is the set of all strings x_{σ(1)} x_{σ(2)} ⋯ x_{σ(n)} arising from such mappings σ.

Consider the depth one tree T = T_1(n) in which there are n children of the root, which are the leaves of the tree. Then, the notion of T_1(n)-type class coincides with the notion of traditional type class. Now let n = k^j for positive integer j and integer k ≥ 2. Consider the depth j tree T = T_j(k) with n leaves such that each non-leaf vertex has k children. Then, a T_j(k)-type class is called a hierarchical type class, and k is called the partitioning parameter of the class. In the paper [2], we dealt with hierarchical type classes in which the partitioning parameter is k = 2. In the present paper, we deal with hierarchical type classes in which the partitioning parameter is an arbitrary k ≥ 2.
Given a hierarchical type class S, there is a simple lossless coding algorithm which encodes each string in S into a fixed-length binary codeword of minimal length, and decodes the string from its codeword. This algorithm is particularly simple for the case when the partitioning parameter is k = 2, and we illustrate this case in Example 1 which follows; the case of general k ≥ 2 is discussed in [3]. In Example 1 and subsequently, x_1 * x_2 * ⋯ * x_k shall denote the data string obtained by concatenating together the finite-length data strings x_1, x_2, …, x_k (left to right).
Example 1. Let k = 2, and let S be the hierarchical type class of the data string AABBABAB. The 16 strings in S are illustrated in Figure 1. Each string x ∈ S has a tree representation in which each vertex of tree T_3(2) is assigned a label which is a substring of x. This assignment takes place as follows.
• The leaves of the tree, traversed left to right, are labeled with the respective left-to-right entries of the data string x.
• For each non-leaf vertex v, if the strings labeling the left and right children of v are x_L, x_R, respectively, then the string labeling v is x_L * x_R if x_L precedes or is equal to x_R in the lexicographical order, and is x_R * x_L otherwise.
In Figure 1, we have illustrated the tree representations of the strings AABBABAB and BAABBBAA.
The root label of all 16 tree representations will be the same string, namely, the first string in S in lexicographical order, which is the string AABBABAB in this case. Each string in S is encoded by visiting, in depth-first order, the non-leaf vertices of its tree representation whose children have different labels. Each such vertex is assigned bit 0 if its label is x_L * x_R, where x_L, x_R are the labels of its left and right children, and is assigned bit 1 otherwise (meaning that the label is x_R * x_L). The resulting sequence of bits, in the order they are obtained, is the codeword of the string. Since both encoder and decoder will know what hierarchical type class is being encoded, the decoder will know what the root label of the tree representation should be, and then the successive bits of the codeword allow the decoder to grow the tree representation from the root downward.

Before discussing the nature of the results to be obtained in this paper, we need some definitions and notation. Fix integers m, k ≥ 2, which serve as parameters in the subsequent development; k is the partitioning parameter already introduced, and m is called the "alphabet cardinality parameter" because we shall be dealing with an m-letter data alphabet, denoted A_m = {a_1, a_2, …, a_m}. For each j ≥ 0, we define a j-string x to be a string of length k^j over A_m. Note that if j ≥ 1, for each j-string x there is a unique k-tuple (x_1, x_2, …, x_k) of (j − 1)-strings, called the k-partitioning of x, such that x = x_1 * x_2 * ⋯ * x_k.

We wish to formally define the family S_{m,k} of all hierarchical type classes in which the alphabet cardinality parameter is m and the partitioning parameter is k. Instead of using the tree isomorphism definition of hierarchical type class given at the beginning of the paper, we will use an equivalent inductive definition, which is more convenient in the subsequent development. First, we define the hierarchical type class of a 0-string to be the set consisting of the string itself. Given a j-string x with j ≥ 1, assume that hierarchical type classes of (j − 1)-strings have been defined. Let (x_1, …, x_k) be the k-partitioning of x and let S_i be the hierarchical type class of x_i (i = 1, …, k). The hierarchical type class of x is then defined as

{ y_{π(1)} * y_{π(2)} * ⋯ * y_{π(k)} : π ∈ Π_k, y_i ∈ S_i (i = 1, …, k) },  (1)

where, from now on, Π_k is the set of all permutations of {1, 2, …, k}. A set is called a hierarchical type class of order j if it is the hierarchical type class of some j-string. A set is called a hierarchical type class if it is a hierarchical type class of order j for some j ≥ 0. The family S_{m,k} is then the set of all hierarchical type classes, of all orders.
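The inductive definition translates directly into code. The sketch below (function names are ours) builds the hierarchical type class of a string by recursively forming the k-partitioning and concatenating the blocks' classes in all orders, per Expression (1).

```python
from itertools import permutations, product

def hierarchical_type_class(x: str, k: int = 2) -> frozenset:
    """Hierarchical type class of x, where len(x) = k**j (Expression (1))."""
    if len(x) == 1:
        return frozenset({x})          # class of a 0-string is the string itself
    b = len(x) // k                    # k-partitioning into (j-1)-strings
    cls = [hierarchical_type_class(x[i * b:(i + 1) * b], k) for i in range(k)]
    return frozenset(
        "".join(ys)
        for pi in permutations(range(k))           # all pi in Pi_k
        for ys in product(*(cls[i] for i in pi))   # one y from each permuted class
    )

S = hierarchical_type_class("AABBABAB")
# the class of Example 1: 16 strings, including BAABBBAA
```

Running this on the string of Example 1 reproduces the 16-element class depicted in Figure 1.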
We define the type of a j-string x to be the vector (n_1, …, n_m) whose i-th component n_i is the frequency of letter a_i in x. For each j ≥ 0, let Λ_j(m,k) be the set of all types of j-strings. Let Λ(m,k) be the union of the Λ_j(m,k)'s for j ≥ 0, and let Λ+(m,k) be the union of the Λ_j(m,k)'s for j ≥ 1. A type in Λ_j(m,k) will be said to be of order j. If λ ∈ Λ(m,k), let ‖λ‖ denote the sum of the components of λ. If λ is of order j, then ‖λ‖ = k^j. All strings in a hierarchical type class have the same type, because permuting a string does not change the type. This property is listed below, along with some other properties whose simple proofs are omitted.
• Prop. 1: All strings in a hierarchical type class have the same type.
• Prop. 2: For each j ≥ 0, the distinct hierarchical type classes of order j form a partition of the set of all j-strings.
• Prop. 3: Let λ ∈ Λ(m,k), and let S_{m,k}(λ) denote the set of all hierarchical type classes in S_{m,k} whose strings are of type λ. Then S_{m,k}(λ) forms a partition of the set of all strings of type λ.
• Prop. 4: Let S ∈ S_{m,k} be a hierarchical type class of order j ≥ 1. Then there is a k-tuple (S_1, S_2, …, S_k), unique up to permutation, such that each S_i is a hierarchical type class of order j − 1 and S is expressible as Expression (1).
Global Hierarchical Entropy Function. The global hierarchical entropy function is the function

H(S) = log₂ |S|, S ∈ S_{m,k},

where, in this paper, if S is a finite set, |S| shall denote the cardinality of S. H(S) shall be called the entropy of S. Given a hierarchical type class S, its entropy H(S) has the following interpretation. Suppose H(S) > 0, and we losslessly encode the strings in S into fixed-length binary codewords of minimal length (as discussed in Example 1 and in [3]). Then this minimal length is ⌈H(S)⌉.

Lemma 1. Let S be a hierarchical type class of order j ≥ 1. Let (S_1, S_2, …, S_k) be the k-tuple of hierarchical type classes of order j − 1 associated with S according to Prop. 4, and let N(S) be the number of distinct permutations of this k-tuple. Then,

H(S) = log₂ N(S) + Σ_{i=1}^k H(S_i).  (2)

Proof. Represent S as Expression (1). Formula (2) follows easily from this expression, since distinct orderings of the k-tuple (S_1, …, S_k) yield disjoint sets of concatenations, whence |S| = N(S) |S_1| |S_2| ⋯ |S_k|.
Remark. We see now how to inductively compute entropy values H(S), as follows. If S is of order 0, then |S| = 1 and so H(S) = 0. If S is of order j ≥ 1, assume all entropy values for hierarchical type classes of smaller order have been computed. Then Equation (2) is used to compute H(S).
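The inductive computation can be sketched in code. The recursion below assumes Formula (2) takes the form H(S) = log₂ N(S) + Σ_i H(S_i), which follows from Expression (1) since distinct orderings of the child classes produce disjoint sets of concatenations; helper names are ours.

```python
from math import log2
from itertools import permutations, product

def htc(x, k=2):
    """Hierarchical type class of x, built recursively per Expression (1)."""
    if len(x) == 1:
        return frozenset({x})
    b = len(x) // k
    cls = [htc(x[i * b:(i + 1) * b], k) for i in range(k)]
    return frozenset("".join(ys) for pi in permutations(range(k))
                     for ys in product(*(cls[i] for i in pi)))

def H(x, k=2):
    """Entropy of the class of x via H(S) = log2 N(S) + sum_i H(S_i)."""
    if len(x) == 1:
        return 0.0
    b = len(x) // k
    parts = [x[i * b:(i + 1) * b] for i in range(k)]
    tup = tuple(htc(p, k) for p in parts)
    N = len(set(permutations(tup)))  # distinct permutations of the k-tuple of classes
    return log2(N) + sum(H(p, k) for p in parts)
```

For the class of Example 1 this gives H = 4 bits, matching log₂ 16 for the 16-element class.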
Discussion. Let {S_j : j ≥ 1} be a sequence of hierarchical type classes from S_{m,k} such that S_j is of order j (j ≥ 1). Consider the sequence of normalized entropies {H(S_j)/k^j : j ≥ 1}. As j becomes large, the normalized entropy H(S_j)/k^j approximates more and more closely the compression rate in bits per data sample that results from the compression scheme on S_j. It is therefore of interest to determine circumstances under which such a sequence of normalized entropies will have a limit that we can compute. We discuss our approach to this problem, which will be pursued in the rest of this paper. A hierarchical source is defined to be a family {S(λ) : λ ∈ Λ(m,k)} in which each S(λ) is a hierarchical type class selected from S_{m,k}(λ). (We will also impose a natural consistency condition on how these selections are made in our formal hierarchical source definition, to be given in the next section.) Let R denote the real line, and let P_m be the subset of R^m consisting of all m-dimensional probability vectors. We consider P_m to be a metric space with the Euclidean metric. For each λ ∈ Λ(m,k), let p_λ be the probability vector λ/‖λ‖ in P_m. Suppose there exists a (necessarily unique) continuous function h : P_m → [0, ∞) such that for each p ∈ P_m and each sequence {λ_j : j ≥ 0} for which λ_j ∈ Λ_j(m,k) (j ≥ 0) and lim_{j→∞} p_{λ_j} = p, the limit property h(p) = lim_{j→∞} H(S(λ_j))/k^j holds. Then we call the function h the hierarchical entropy function induced by the source {S(λ) : λ ∈ Λ(m,k)}. A hierarchical entropy function is defined to be any function on P_m which is the hierarchical entropy function induced by some hierarchical source. One of the goals of hierarchical data compression theory is to identify hierarchical entropy functions and to learn about their properties. In the paper [2], two hierarchical entropy functions were introduced. In the present paper, we go further by identifying infinitely many hierarchical entropy functions which are each self-affine,
and for each one of these entropy functions, we exhibit an explicit iterated function system whose attractor is the graph of the entropy function.

Hierarchical Sources
This section is devoted to the discussion of hierarchical sources. The concept of hierarchical source was informally described in the Introduction. In Section 2.1, we make this concept formal. In Section 2.2, we define the entropy-stable hierarchical sources, which are the hierarchical sources that induce hierarchical entropy functions. In Section 2.3, we introduce a particular type of entropy-stable hierarchical source called a finitary hierarchical source. The finitary hierarchical sources induce the hierarchical entropy functions that are the subject of this paper.

Formal Definition of Hierarchical Source
Let S = {S(λ) : λ ∈ Λ(m,k)} be a family of hierarchical type classes in which each class S(λ) belongs to the set of classes S_{m,k}(λ). Then S is defined to be a (Λ(m,k)-indexed) hierarchical source if the following additional condition is satisfied.
• Consistency Condition: For each λ ∈ Λ+(m,k), each of the hierarchical type classes in the k-tuple (S_1, S_2, …, S_k) associated with S(λ) in Prop. 4 also belongs to S.
We discuss how the Consistency Condition gives us a way to describe every possible hierarchical source. Let Λ(m,k)^k be the set of all k-tuples whose entries come from Λ(m,k). Let Φ(m,k) be the set of all mappings φ : Λ+(m,k) → Λ(m,k)^k such that, for each λ ∈ Λ+(m,k) of order j, the k entries of φ(λ) are types of order j − 1 which sum to λ. For each φ ∈ Φ(m,k), a family of hierarchical type classes S_φ = {S_φ(λ) : λ ∈ Λ(m,k)} is defined inductively as follows.
• If λ ∈ Λ(m,k) is of order 0, define class S_φ(λ) to be the set {a_i}, where a_i is the unique letter in A_m whose type is λ.
• If λ ∈ Λ(m,k) is of order j ≥ 1, assume class S_φ(λ*) has been defined for all types λ* of order less than the order of λ. Writing φ(λ) = (λ¹, λ², …, λᵏ), define S_φ(λ) via Expression (1), taking S_i = S_φ(λⁱ) for i = 1, …, k.
From the Consistency Condition, all possible hierarchical sources arise in this way; that is, given any Λ(m,k)-indexed hierarchical source S, there exists φ ∈ Φ(m,k) such that S = S_φ. Another advantage of the Consistency Condition is that it allows the entropies of the classes in a hierarchical source to be recursively computed. To see this, let H_φ : Λ(m,k) → [0, ∞) be the function which takes the value zero on Λ_0(m,k) and which, for each λ ∈ Λ+(m,k) with φ(λ) = (λ¹, …, λᵏ), satisfies

H_φ(λ) = log₂ N_φ(λ) + Σ_{i=1}^k H_φ(λⁱ),  (3)

where N_φ(λ) is the number of distinct permutations of this k-tuple.
By the Consistency Condition and Lemma 1, H(S_φ(λ)) = H_φ(λ) for each λ ∈ Λ(m,k).

Entropy-Stable Hierarchical Sources
The concept of entropy-stable source discussed in this section allows us to formally define the concept of hierarchical entropy function.
For each j ≥ 0, define the finite set of probability vectors

P_m(j) = {p_λ : λ ∈ Λ_j(m,k)},

where the reader will recall that p_λ = λ/‖λ‖. Note that the sets {P_m(j) : j ≥ 0} are increasing in the sense that

P_m(j) ⊂ P_m(j + 1), j ≥ 0.  (4)

Let P*_m be the countably infinite set of probability vectors which is the union of the P_m(j)'s. Suppose we have a hierarchical source S = {S(λ) : λ ∈ Λ(m,k)}. For each j ≥ 0, let h_j : P_m(j) → [0, ∞) be the unique function for which

h_j(p_λ) = H(S(λ))/k^j, λ ∈ Λ_j(m,k).

Let p ∈ P*_m. Because of the increasing sets property Equation (4), p is a member of the set P_m(j) for j sufficiently large. Consequently, h_j(p) is defined for j sufficiently large, and so it makes sense to talk about the limit of the sequence {h_j(p) : j ≥ 0}, if this limit exists. We define the source S to be entropy-stable if there exists a continuous function h : P_m → [0, ∞) such that lim_{j→∞} h_j(p) = h(p) for each p ∈ P*_m, and the function h (which is unique since P*_m is dense in P_m) is called the hierarchical entropy function induced by S. Henceforth, the terminology "hierarchical entropy function" denotes a function which is the hierarchical entropy function induced by some entropy-stable hierarchical source.

Finitary Hierarchical Sources

Note that the splitting up of (7758) into the three types (3312), (2223), (2223) indeed does make sense, because these latter three types sum to (7758) and are of order 2, one less than the order of (7758).

Example 3. Fix the alphabet cardinality parameter m to be 2, and fix the partitioning parameter k to be any integer ≥ 2. Let (r_1, r_2) belong to R(2,k). Then either (a) (r_1, r_2) = (0, 0) or (b) r_1 + r_2 = k. In case (a), we define ψ(r_1, r_2) to be the k × 2 zero matrix. In case (b), we define ψ(r_1, r_2) to be the k × 2 matrix whose first r_1 rows are (1, 0) and whose last r_2 rows are (0, 1). Letting φ = ψ*, we obtain the finitary Λ(2,k)-indexed hierarchical source S_φ.
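The splitting performed by ψ* can be sketched as follows. The greedy construction of ψ(r) below is one valid choice, not necessarily the one used in any particular example (for m > 3 the set Ψ(m,k) generally contains many mappings), so the parts it produces may differ from a given example's while still satisfying the defining constraints; function names are ours.

```python
def psi_star(lam, k):
    """Split type lam into k parts: each part is (lam - r(lam))/k plus one row
    of a binary k x m matrix with column sums r(lam) and equal row sums."""
    m = len(lam)
    r = [n % k for n in lam]                    # r(lam): remainders mod k
    base = [(n - ri) // k for n, ri in zip(lam, r)]
    rows = [[0] * m for _ in range(k)]
    sums = [0] * k
    target = sum(r) // k                        # common row sum of psi(r)
    for col in range(m):                        # place r[col] ones in column col
        for _ in range(r[col]):
            # greedy: admissible row with the smallest current row sum
            i = min((j for j in range(k) if sums[j] < target and rows[j][col] == 0),
                    key=lambda j: sums[j])
            rows[i][col] = 1
            sums[i] += 1
    return [tuple(b + row[j] for j, b in enumerate(base)) for row in rows]

parts = psi_star((7, 7, 5, 8), 3)
# three order-2 types, each summing to 9, which together sum to (7, 7, 5, 8)
```

On the type (7758) with k = 3, the three parts returned each have order 2 and sum to the original type, as in the splitting discussed above.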
Remarks. For each fixed k ≥ 2:
• The source defined in Example 3 is the unique finitary Λ(2,k)-indexed hierarchical source.
• The source defined in Example 4 is the unique finitary Λ(3,k)-indexed hierarchical source.
This is because the matrices employed in these examples are unique up to row permutation.

Theorem 1. Let m, k ≥ 2 be arbitrary, and let {S(λ) : λ ∈ Λ(m,k)} be any finitary Λ(m,k)-indexed hierarchical source. Then the source is entropy-stable, and the hierarchical entropy function induced by the source can be characterized as the unique continuous function h : P_m → [0, ∞) satisfying h(p) = lim_{j→∞} h_j(p) for each p ∈ P*_m.

Notations and Remarks.
• Fix k to be an arbitrary integer ≥ 2. Let {S(λ) : λ ∈ Λ(2,k)} be the unique finitary Λ(2,k)-indexed hierarchical source; H_{2,k} : Λ(2,k) → [0, ∞) shall denote its entropy function H_φ. For later use, we remark that

H_{2,k}((r_1, r_2)) = log₂ (k!/(r_1! r_2!)), r_1 + r_2 = k.  (5)

The hierarchical entropy function induced by this source maps P_2 into [0, ∞) and shall be denoted h_{2,k}. The relationship between the functions H_{2,k} and h_{2,k} is

h_{2,k}(p_λ) = H_{2,k}(λ)/‖λ‖, λ ∈ Λ(2,k).  (6)

• Fix k to be an arbitrary integer ≥ 2. Let {S(λ) : λ ∈ Λ(3,k)} be the unique finitary Λ(3,k)-indexed hierarchical source; H_{3,k} : Λ(3,k) → [0, ∞) shall denote its entropy function H_φ. For later use, we remark that

H_{3,k}((r_1, r_2, r_3)) = log₂ (k!/(r_1! r_2! r_3!)), r_1 + r_2 + r_3 = k.  (7)

The hierarchical entropy function induced by this source maps P_3 into [0, ∞) and shall be denoted h_{3,k}. The relationship between the functions H_{3,k} and h_{3,k} is

h_{3,k}(p_λ) = H_{3,k}(λ)/‖λ‖, λ ∈ Λ(3,k).  (8)

In Section 3, we show that hierarchical entropy function h_{2,k} is self-affine for each k ≥ 2, and in Section 4, we show that hierarchical entropy function h_{3,k} is self-affine for each k ≥ 2.

h_{2,k} Is Self-Affine
An iterated function system (IFS) on a closed nonempty subset Ω of a finite-dimensional Euclidean space is a finite nonempty set of mappings which map Ω into itself and are each contraction mappings. Given an IFS T on Ω, there exists ([6], Theorem 9.1) a unique nonempty compact set Q ⊂ Ω such that

Q = ⋃_{T ∈ T} T(Q).

Q is called the attractor of the IFS T.
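The attractor can be approximated numerically by iterating the maps on any starting set; a minimal sketch follows (the Cantor-set IFS chosen here is a standard illustrative example, not one from this paper, and all names are ours):

```python
def iterate_ifs(maps, points, n):
    """Apply every IFS map to every point, n times; the resulting set
    converges to the attractor of the IFS (in Hausdorff distance)."""
    for _ in range(n):
        points = {f(p) for f in maps for p in points}
    return points

# two contractions of ratio 1/3 on the line; the attractor is the Cantor set
maps = [lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0]
pts = iterate_ifs(maps, {0.0}, 8)
```

After 8 rounds the 256 points produced lie within 3⁻⁸ of the Cantor set, regardless of the starting set chosen.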
Suppose h : P_m → [0, ∞) is the hierarchical entropy function induced by an entropy-stable Λ(m,k)-indexed hierarchical source. Let Ω_m = P_m × R, regarded as a metric space with the Euclidean metric that it inherits from being a closed convex subset of R^{m+1}. We define h to be self-affine if there is an IFS T on Ω_m such that:
• Each mapping in T is an affine mapping.
• The attractor of T is {(p, h(p)) : p ∈ P m }, the graph of h.
For the rest of this section, k ≥ 2 is fixed. Our goal is to show that the function h_{2,k} : P_2 → [0, ∞) is self-affine, where h_{2,k} is the hierarchical entropy function induced by the unique finitary Λ(2,k)-indexed hierarchical source. For each i = 0, 1, …, k − 1:
• Define the matrix

M_i = [ i+1  k−i−1 ]
      [ i    k−i   ]

• Define T*_i : P_2 → P_2 to be the mapping

T*_i(p) = pM_i/k.  (9)

• Define the vector

v_i = (log₂ C(k, i+1), log₂ C(k, i)),

where C(k, r) denotes the binomial coefficient k!/(r!(k − r)!).
• Define T_i : Ω_2 → Ω_2 to be the mapping

T_i(p, y) = (T*_i(p), (y + p · v_i)/k),  (10)

where p · v_i denotes the usual dot product.
Remarks. It is clear that the set of mappings {T*_i : i = 0, 1, …, k − 1} is an IFS on P_2 whose attractor is P_2. This fact allows one to prove (Lemma B.3 of Appendix B) that the related set of mappings {T_i : i = 0, 1, …, k − 1} is an IFS on Ω_2. This result is the first part of the following theorem.
Theorem 2. Let k ≥ 2 be arbitrary. The following statements hold:
• (a): {T_i : i = 0, 1, …, k − 1} is an IFS on Ω_2.
• (b): h_{2,k} is self-affine and its graph is the attractor of the IFS in (a).
• (c): For each i = 0, 1, …, k − 1,

h_{2,k}(T*_i(p)) = (h_{2,k}(p) + p · v_i)/k, p ∈ P_2.  (11)

Our proof of Theorem 2 requires the following lemma.

Lemma 2. Let i ∈ {0, 1, …, k − 1}, let λ ∈ Λ(2,k), and let φ ∈ Φ(2,k) be the function given in Example 3. Then:
• (a.1): λM_i ∈ Λ(2,k) and ‖λM_i‖ = k‖λ‖.
• (a.2): if λ ∈ Λ+(2,k) and φ(λ) = (λ¹, …, λᵏ), then φ(λM_i) is, up to permutation, the k-tuple (λ¹M_i, …, λᵏM_i).
Proof. Property (a.1), whose proof we omit, is a simple consequence of the fact that M_i has row sums equal to k. Fix a type λ from Λ(2,k), and write r(λ) = (r_1, r_2). As remarked in Example 3, either r_1 = r_2 = 0 or r_1 + r_2 = k; the case r_1 + r_2 = k is handled by a direct computation of φ(λM_i) from φ(λ), and the remaining case r_1 = r_2 = 0 is much easier.

Proof of Theorem 2. We first derive part (c) and then part (b) (part (a) is already taken care of, as remarked previously). We derive part (c) by establishing Equation (11) for a fixed i ∈ {0, 1, …, k − 1}. Let φ ∈ Φ(2,k) be the function given in Example 3 and recall that H_{2,k} denotes the entropy function H_φ on Λ(2,k). Referring to the definition of T*_i in Equation (9) and T_i in Equation (10), we see that proving Equation (11) is equivalent to proving

h_{2,k}(pM_i/k) = (h_{2,k}(p) + p · v_i)/k, p ∈ P_2.  (12)

We first show that

H_{2,k}(λM_i) = H_{2,k}(λ) + λ · v_i, λ ∈ Λ(2,k).  (13)

Our proof of Equation (13) is by induction on ‖λ‖. We first must verify Equation (13) for ‖λ‖ = 1, which is the two cases λ = (1, 0) and λ = (0, 1). For λ = (1, 0), the left side of Equation (13) is the entropy of the first row of M_i, which by Equation (5) is log₂ C(k, i+1), and the right side is 0 + (1, 0) · v_i = log₂ C(k, i+1). Similarly, if λ = (0, 1), both sides of Equation (13) are equal to log₂ C(k, i). Fix λ* ∈ Λ(2,k) for which ‖λ*‖ > 1, and for the induction hypothesis assume that Equation (13) holds when ‖λ‖ is smaller than ‖λ*‖. The proof by induction is then completed by showing that Equation (13) holds for λ = λ*. Let φ(λ*) = (λ¹, …, λᵏ). By the induction hypothesis,

H_{2,k}(λˢM_i) = H_{2,k}(λˢ) + λˢ · v_i, s = 1, …, k.

Appealing to Equation (3), we then have

H_{2,k}(λ*M_i) = log₂ N + Σ_{s=1}^k H_{2,k}(λˢM_i),  (14)

where N is the number of permutations of the k-tuple (λ¹M_i, …, λᵏM_i), and

H_{2,k}(λ*) = log₂ N_2 + Σ_{s=1}^k H_{2,k}(λˢ),

where N_2 is the number of permutations of the k-tuple (λ¹, …, λᵏ). Since M_i is nonsingular (its determinant is k), we must have N = N_2. Substituting the right hand sides of the previous two equations into Equation (14), we obtain Equation (13) for λ = λ*, completing the proof by induction.

Dividing both sides of Equation (13) by ‖λ‖, and using the fact that ‖λM_i‖ = k‖λ‖, we see that

H_{2,k}(λM_i)/‖λM_i‖ = (H_{2,k}(λ)/‖λ‖ + p_λ · v_i)/k,

which by Equation (6) becomes h_{2,k}(p_{λM_i}) = (h_{2,k}(p_λ) + p_λ · v_i)/k, where p_{λM_i} = p_λ M_i/k. Equation (12) then follows since the set {p_λ : λ ∈ Λ(2,k)} = P*_2 is dense in P_2 and h_{2,k} is continuous.

Part (c) tells us that each T_i maps the graph G = {(p, h_{2,k}(p)) : p ∈ P_2} into itself. This property, together with the fact that {T*_i : i = 0, 1, …, k − 1} is an IFS on P_2 with attractor P_2, allows us to conclude that G is the attractor of the IFS {T_0, …, T_{k−1}} (Lemma B.1 of Appendix B), and h_{2,k} is self-affine because the T_i's are affine. Theorem 2(b) is therefore true.
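The identity established by this induction can be spot-checked numerically. The sketch below uses our reading of this section's definitions as assumptions: M_i with rows (i+1, k−i−1) and (i, k−i), v_i = (log₂ C(k, i+1), log₂ C(k, i)), and H_{2,k} computed by the splitting rule of Example 3; function names are ours.

```python
from math import comb, log2
from functools import lru_cache

@lru_cache(maxsize=None)
def H2(l1: int, l2: int, k: int) -> float:
    """H_{2,k}(lambda) for the finitary source: split lambda into k near-equal
    parts per Example 3 and recurse (valid when l1 + l2 is a power of k)."""
    if l1 + l2 == 1:
        return 0.0
    r1, r2 = l1 % k, l2 % k
    if r1 == 0 and r2 == 0:
        return k * H2(l1 // k, l2 // k, k)   # k identical parts, N = 1
    b1, b2 = (l1 - r1) // k, (l2 - r2) // k  # here r1 + r2 == k
    return log2(comb(k, r1)) + r1 * H2(b1 + 1, b2, k) + r2 * H2(b1, b2 + 1, k)

def identity_holds(l1, l2, k, i):
    """Check H_{2,k}(lambda M_i) == H_{2,k}(lambda) + lambda . v_i."""
    M = [[i + 1, k - i - 1], [i, k - i]]
    v = (log2(comb(k, i + 1)), log2(comb(k, i)))
    lhs = H2(l1 * M[0][0] + l2 * M[1][0], l1 * M[0][1] + l2 * M[1][1], k)
    rhs = H2(l1, l2, k) + l1 * v[0] + l2 * v[1]
    return abs(lhs - rhs) < 1e-9
```

Under these assumptions the identity checks out on small types, e.g. λ = (3, 1) with k = 2.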
Generating Hierarchical Entropy Function Plots.
We can obtain k^n points on the plot of h_{2,k} as follows. Let {T_i : i = 0, 1, …, k − 1} be the IFS on Ω_2 given in Theorem 2, such that the attractor of this IFS is the graph of h_{2,k}. Let S_0(k) = {(0, 1, 0)}, and generate subsets S_1(k), S_2(k), …, S_n(k) of R³ by the recursion

S_j(k) = ⋃_{i=0}^{k−1} T_i(S_{j−1}(k)), j = 1, 2, …, n.

In this way we obtain k^n points on the plot of h_{2,k}. Using a Dell Latitude D620 laptop, we did S_n(k) computations to obtain the plots in Figure 2, as follows.
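A sketch of this point-generation recursion, under the same assumed forms of M_i and v_i as earlier in this section (helper names ours):

```python
from math import comb, log2

def plot_points(k: int, n: int):
    """Compute S_n(k) by iterating S_j = union over i of T_i(S_{j-1}),
    starting from S_0 = {(0, 1, 0)}; points are (p1, p2, y) triples."""
    Ms = [[[i + 1, k - i - 1], [i, k - i]] for i in range(k)]
    vs = [(log2(comb(k, i + 1)), log2(comb(k, i))) for i in range(k)]
    S = {(0.0, 1.0, 0.0)}
    for _ in range(n):
        S = {((p1 * M[0][0] + p2 * M[1][0]) / k,
              (p1 * M[0][1] + p2 * M[1][1]) / k,
              (y + p1 * v[0] + p2 * v[1]) / k)
             for (p1, p2, y) in S
             for M, v in zip(Ms, vs)}
    return S

pts = plot_points(2, 6)
```

Each generated triple lies on the graph of h_{2,k} (up to the stated assumptions); for k = 2, for example, the point (1/2, 1/2, 1/2) appears, consistent with h_{2,2}(1/2, 1/2) = 1/2.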
h_{3,k} Is Self-Affine

It is the purpose of this section to study h_{3,k} : P_3 → [0, ∞), the hierarchical entropy function induced by the unique finitary Λ(3,k)-indexed hierarchical source. In R³, let Q_k be the convex hull of the set {(k,0,0), (0,k,0), (0,0,k)}. Then Q_k is an equilateral triangle whose three vertices are (k,0,0), (0,k,0), (0,0,k). We employ the well-known quadratic partition [7] of Q_k into k² congruent equilateral triangles, formed as follows. Partition each of the three sides of Q_k into k line segments of equal length by laying down k − 1 interior points along the side. For each vertex of Q_k, draw a line segment connecting the first interior points reached going out from the vertex along its two sides, then draw a line segment connecting the second interior points reached, and so forth until k − 1 line segments have been drawn. Doing this for each of the three vertices, you will have drawn a total of 3(k − 1) line segments, which subdivide Q_k into the k² congruent equilateral triangles of the quadratic partition. See Figure 3, which illustrates the quadratic partition of triangle Q_3 into nine sub-triangles.

Let V_1 be the set of all points (a, b, c) in Q_k such that a is a positive integer and b, c are non-negative integers; there are k(k+1)/2 such points, and to each v ∈ V_1 there corresponds a 3 × 3 matrix M_{1,v} with non-negative integer entries and row sums equal to k. For each v ∈ V_1, the convex hull of the rows of M_{1,v} is one of the sub-triangles in the quadratic partition of Q_k, and these sub-triangles are distinct as v varies through V_1. This gives us a total of k(k+1)/2 of the sub-triangles in the quadratic partition of Q_k, and we call these the V_1 sub-triangles of the partition. Let V_2 be the set of all (a, b, c) in Q_k such that a is a non-negative integer and b, c are positive integers; there are k(k−1)/2 such points, and to each v ∈ V_2 there corresponds a 3 × 3 matrix M_{2,v} with non-negative integer entries and row sums equal to k. For each v ∈ V_2, the convex hull of the rows of M_{2,v} is one of the sub-triangles in the quadratic partition of Q_k, and these sub-triangles are distinct as v varies through V_2. This gives us a total of k(k−1)/2 of the sub-triangles in the quadratic partition of Q_k, and we call these the V_2 sub-triangles of the partition.

The V_1 sub-triangles are all translations of each other; the V_2 sub-triangles are all translations of each other, and each one can be obtained by rotating a V_1 sub-triangle about its center 180 degrees, followed by a translation. Together, the k(k+1)/2 V_1 sub-triangles and the k(k−1)/2 V_2 sub-triangles constitute all k² sub-triangles in the quadratic partition of Q_k. We define M(k) to be the set of k² matrices

M(k) = {M_{1,v} : v ∈ V_1} ∪ {M_{2,v} : v ∈ V_2}.

Each row sum of each matrix in M(k) is equal to k. Because of this property, we can define for each M ∈ M(k) the mapping T*_M : P_3 → P_3 in which

T*_M(p) = pM/k,  (17)

and we can also define the mapping T_M : Ω_3 → Ω_3 in which

T_M(p, y) = (T*_M(p), (y + p · v_M)/k),  (18)

where

v_M ∈ R³ is the vector whose components are the H_{3,k} entropies of the rows of M.  (19)

Remarks. It is clear that the set of k² mappings {T*_M : M ∈ M(k)} is an IFS on P_3. This fact allows one to prove (Lemma B.4 of Appendix B) that the related set of k² mappings {T_M : M ∈ M(k)} is an IFS on Ω_3. In the following example, we exhibit this IFS in a special case.
Example 5. Let k = 3. Referring to Figure 3, we see that there are nine matrices in M(3), which we denote M_1, M_2, …, M_9; the rows of each M_i are the vertices of the corresponding sub-triangle of the quadratic partition of Q_3. Following Equation (19), let v_i ∈ R³ be the vector whose components are the H_{3,3} entropies of the rows of M_i. Letting α = log₂ 3 and β = log₂ 6, Formula (7) is used to obtain the components of each v_i: a component is 0, α, or β according as the corresponding row of M_i is a permutation of (3,0,0), (2,1,0), or (1,1,1). Following Equation (18), for each i = 1, 2, …, 9, let T_i : Ω_3 → Ω_3 be the mapping defined by T_i(p, y) = (pM_i/3, (y + p · v_i)/3). Theorem 3, which follows, tells us that the graph of h_{3,3} is the attractor of the IFS {T_i}_{i=1}^9.

Theorem 3. Let k ≥ 2 be arbitrary. The following statements hold:
• (a): {T_M : M ∈ M(k)} is an IFS on Ω_3.
• (b): h_{3,k} is self-affine and its graph is the attractor of the IFS in (a).
• (c): For each M ∈ M(k),

h_{3,k}(T*_M(p)) = (h_{3,k}(p) + p · v_M)/k, p ∈ P_3.  (20)
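Before proceeding to the proof, the sub-triangle counts and the entropy values α and β used above are easy to verify in code, assuming the multinomial form of Formula (7); names are ours.

```python
from math import factorial, log2

def H_order1(row, k):
    """H_{3,k} of an order-1 type (r1, r2, r3) with r1 + r2 + r3 = k:
    log2 of the multinomial coefficient k!/(r1! r2! r3!) (Formula (7))."""
    c = factorial(k)
    for r in row:
        c //= factorial(r)
    return log2(c)

def partition_counts(k):
    """Sizes of V_1 and V_2 for the quadratic partition of Q_k."""
    V1 = [(a, b, c) for a in range(1, k + 1)
          for b in range(k) for c in range(k) if a + b + c == k]
    V2 = [(a, b, c) for a in range(k)
          for b in range(1, k) for c in range(1, k) if a + b + c == k]
    return len(V1), len(V2)
```

For k = 3 this gives the counts 6 and 3 (totaling the 9 sub-triangles), and the rows (2,1,0), (1,1,1), (3,0,0) give entropies α = log₂ 3, β = log₂ 6, and 0, respectively.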
Our proof of Theorem 3 requires a couple of lemmas, which follow.

Lemma 3. Let λ ∈ Λ+(3,k), let φ ∈ Φ(3,k) be the function given in Example 4, and suppose φ(λ) is expressed in terms of non-negative integers q_i and integers r_i belonging to the set {1, 2, …, k} with r_1 + r_2 + r_3 = 2k. Then Equations (21)-(23) hold.

Proof. If each r_i < k, then by the definition of φ(λ) in Example 4, the properties in Equations (21)-(23) are true. Now suppose at least one r_i = k. Then exactly one r_i = k (since otherwise some r_i would be 0, which is not allowed). By symmetry, we may suppose that r_1 = k, and we may then express λ accordingly.

Lemma 4. Let λ ∈ Λ(3,k) and let M ∈ M(k). Then λM ∈ Λ(3,k) with ‖λM‖ = k‖λ‖ (property (a.1)), and the splitting φ(λM) is, up to permutation, obtained by applying M to the entries of the splitting φ(λ).

Proof. Property (a.1), whose proof we omit, is a simple consequence of the fact that each matrix in M(k) has row sums equal to k. Fix λ ∈ Λ+(3,k) and fix M ∈ M(k). Let r(λ) = (r_1, r_2, r_3). M is either of the form of Equation (15) (Case 1) or of the form of Equation (16) (Case 2). Throughout the rest of the proof, we employ the parameter β = (r_1 + r_2 + r_3)/k. As remarked in Example 4, β ∈ {0, 1, 2}.
Proof of Theorem 3. We first derive part (c) and then part (b) (part (a) is already taken care of, as remarked previously). We derive part (c) by establishing Equation (20) for a fixed M ∈ M(k). Let φ ∈ Φ(3,k) be the function given in Example 4 and recall that H_{3,k} denotes the entropy function H_φ on Λ(3,k). Referring to the definition of T*_M in Equation (17) and T_M in Equation (18), we see that proving Equation (20) is equivalent to proving

h_{3,k}(pM/k) = (h_{3,k}(p) + p · v_M)/k, p ∈ P_3.  (34)

We first show that

H_{3,k}(λM) = H_{3,k}(λ) + λ · v_M, λ ∈ Λ(3,k).  (35)

The proof is by induction on ‖λ‖. Equation (35) holds for ‖λ‖ = 1, which covers the three cases λ = (1,0,0), λ = (0,1,0), λ = (0,0,1). Fix λ* ∈ Λ(3,k) for which ‖λ*‖ > 1, and for the induction hypothesis assume that Equation (35) holds whenever ‖λ‖ is smaller than ‖λ*‖. The proof by induction is then completed by showing that Equation (35) holds for λ = λ*. Let φ(λ*) = (λ¹, …, λᵏ). By the induction hypothesis and Equation (3),

H_{3,k}(λ*M) = log₂ N + Σ_{s=1}^k H_{3,k}(λˢM),

where N is the number of permutations of the k-tuple (λ¹M, …, λᵏM), and

H_{3,k}(λ*) = log₂ N_2 + Σ_{s=1}^k H_{3,k}(λˢ),

where N_2 is the number of permutations of the k-tuple (λ¹, …, λᵏ). Since M is nonsingular (its determinant is k), we must have N = N_2. Substituting the right hand sides of the previous two equations appropriately, we obtain Equation (35) for λ = λ*, completing the proof by induction.

The entropy H_{m,k}(λ) is the same for types λ that are permutations of each other; this symmetry property for entropy on types then extends to P_m using the fact that the finitary source which induces h_{m,k} is entropy-stable. The well-known Shannon entropy function h_m on P_m is defined by

h_m(p_1, …, p_m) = − Σ_{i=1}^m p_i log₂ p_i,

where p_i log₂ p_i is taken to be zero if p_i = 0. We point out that h_m also satisfies properties P1-P4.
In addition, h_m satisfies the property that it attains its maximum value at the equiprobable distribution (1/m, 1/m, …, 1/m). The inequality

h_{m,k}(p) ≤ h_m(p), p ∈ P_m,

gives us a relationship between hierarchical entropy and Shannon entropy; it follows from the fact that every string in a hierarchical type class is of the same type. It is an open problem whether this inequality is strict at every non-degenerate p ∈ P_m; we have proved this strict inequality property in some special cases (for example, m = k = 2).
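The inequality can be illustrated on the class from Example 1; the sketch below compares the per-sample entropy H(S)/n with the Shannon entropy of the string's empirical distribution (helper names ours).

```python
from math import log2
from itertools import permutations, product

def htc(x, k=2):
    """Hierarchical type class of x (len(x) a power of k), per Expression (1)."""
    if len(x) == 1:
        return frozenset({x})
    b = len(x) // k
    cls = [htc(x[i * b:(i + 1) * b], k) for i in range(k)]
    return frozenset("".join(ys) for pi in permutations(range(k))
                     for ys in product(*(cls[i] for i in pi)))

def shannon(p):
    """Shannon entropy h_m(p) in bits, with 0 log 0 = 0."""
    return -sum(q * log2(q) for q in p if q > 0)

x = "AABBABAB"
rate = log2(len(htc(x))) / len(x)                    # H(S)/n = 0.5 bits/sample
p = (x.count("A") / len(x), x.count("B") / len(x))   # empirical distribution
# rate <= shannon(p), i.e. 0.5 <= 1.0
```

Here the hierarchical scheme spends 0.5 bits per sample, well below the Shannon entropy of 1 bit for the (1/2, 1/2) type.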
(In other words, we can travel in Λ_j(m,k) from any type to a type divisible by k via a path consisting of M terms, with successive terms no more than distance 1 apart in the infinity norm.) Let J ≥ 0. Suppose q_1, q_2 belong to P*_m and ‖q_1 − q_2‖_∞ ≤ k^{−J}. Fix J′ > J and types λ_1, λ_2 in Λ_{J′}(m,k) such that q_1 = p_{λ_1} and q_2 = p_{λ_2}. Starting at λ_1 and applying property (p.2) repeatedly (that is, for each j going backwards from j = J′ to j = J + 1), we obtain λ′_1 ∈ Λ_J(m,k); similarly, we find λ′_2 ∈ Λ_J(m,k), and we then appeal to the triangle inequality.

It is easily seen that S is entropy-stable by the definition in Section 2.2 if f can be extended to a continuous function on P_m (which will be the hierarchical entropy function induced by S). This extension will be possible if f is uniformly continuous on P*_m, and we establish this by showing that Σ_j ε_j < ∞, where {ε_j} is the sequence in Equation (38). Let φ ∈ Φ(m,k) be such that S_φ = S. Let j ≥ 1 and let λ, µ be types in Λ_j(m,k) for which ‖λ − µ‖_∞ ≤ 1.

It is a simple exercise in Lagrange multipliers, which we omit, to show that for the vector x = (x_1, …, x_m) satisfying the constraints in Equation (44) which maximizes the dot product x · c, the value x · c can be seen to be V(c). Therefore, we will be done if the desired bound holds for this maximum value; but this is true with equality, by Equation (41).

Lemma B.3. Let k ≥ 2 be arbitrary. Then, for each i = 0, 1, …, k − 1, the mapping T_i : Ω_2 → Ω_2 defined in Section 3 is a contraction.
Proof. Fix i in {0, 1, …, k − 1}. The mapping T*_i : P_2 → P_2 is a contraction mapping with contraction coefficient k^{−1}. Applying Lemma B.2 with σ = k^{−1}, T_i will be a contraction mapping if we can show that the variance V(v_i) obeys the bound of Inequality (46). It is easy to compute that V(v_i) = V(log₂ a_1, log₂ a_2), where a_1 = i + 1, a_2 = k − i. For any constant γ satisfying 0 < γ < 1, we have V(log₂ a_1, log₂ a_2) ≤ the right side of Inequality (46), and the right side of Inequality (46) is upper bounded by 2 max{(log₂ γ)², (log₂(γk))²}. Choosing the smallest value of γ for which (log₂(γk))² ≥ (log₂ γ)² holds for every k ≥ 2, we obtain γ = 1/√2. We have thus proved the variance bound

V(log₂ a_1, log₂ a_2) ≤ 2 (log₂(k/√2))², k ≥ 2.

Definitions. For λ = (n_1, …, n_m) ∈ Λ(m,k), r(λ) denotes the m-tuple whose i-th entry is the remainder upon division of n_i by k. Each entry of r(λ) belongs to the set {0, 1, …, k − 1} and the sum of the entries of r(λ) is an integer multiple of k.
• R(m,k) is defined to be the set of all m-tuples whose entries come from {0, 1, …, k − 1} and sum to an integer multiple of k.
• Ψ(m,k) is defined to be the set of all mappings ψ from R(m,k) to the set of binary k × m matrices such that if r = (r_1, …, r_m) belongs to R(m,k), then ψ(r) has left-to-right column sums r_1, r_2, …, r_m and row sums all equal to (r_1 + r_2 + ⋯ + r_m)/k. The set Ψ(m,k) is nonempty for each choice of parameters m, k ≥ 2 [4,5].
• If ψ ∈ Ψ(m,k), define ψ* to be the unique mapping in Φ(m,k) which does the following. If λ ∈ Λ+(m,k), then ψ*(λ) = (λ¹, …, λᵏ), where λⁱ = (λ − r(λ))/k + (the i-th row of ψ(r(λ))), i = 1, 2, …, k.

The property of attaining the maximum at the equiprobable distribution fails in general for the h_{m,k} functions, although it is true for some of them; for example, referring to Figure 2, we see that h_{2,2} and h_{2,4} do not reach their maximum at (1/2, 1/2), but h_{2,3} does. It is an open problem to determine the maximum value of each h_{2,k} and h_{3,k} and to see where the maximum is attained.