A Catalog of Self-Affine Hierarchical Entropy Functions

Kieffer, John

doi:10.3390/a4040307


Department of Electrical & Computer Engineering, University of Minnesota Twin Cities, 200 Union Street SE, Minneapolis, MN 55455, USA

Algorithms 2011, 4(4), 307-333; https://doi.org/10.3390/a4040307

Submission received: 23 September 2011 / Revised: 18 October 2011 / Accepted: 30 October 2011 / Published: 1 November 2011

(This article belongs to the Special Issue Data Compression, Communication and Processing)


Abstract

For fixed k ≥ 2 and fixed data alphabet of cardinality m, the hierarchical type class of a data string of length n = k^j for some j ≥ 1 is formed by permuting the string in all possible ways under permutations arising from the isomorphisms of the unique finite rooted tree of depth j which has n leaves and k children for each non-leaf vertex. Suppose the data strings in a hierarchical type class are losslessly encoded via binary codewords of minimal length. A hierarchical entropy function is a function on the set of m-dimensional probability distributions which describes the asymptotic compression rate performance of this lossless encoding scheme as the data length n is allowed to grow without bound. We determine infinitely many hierarchical entropy functions which are each self-affine. For each such function, an explicit iterated function system is found such that the graph of the function is the attractor of the system.

Keywords:

types; type classes; lossless compression; hierarchical entropy; self-affine functions; iterated function systems

1. Introduction

A traditional type class consists of all permutations of a fixed finite-length data string. There is a well-developed data compression theory in which strings in a traditional type class are losslessly encoded into fixed-length binary codewords [1]. One can generalize the notion of traditional type class and the resulting data compression theory in the following natural way. Let T be a finite rooted tree; an isomorphism of T is a one-to-one mapping of the set of vertices of T onto itself which preserves the parent-child relation. Let n be the number of leaves of T, let $ℒ$ (T) be the set of leaves of T, and let σ be a one-to-one mapping of {1, 2, …, n} onto $ℒ$ (T). Suppose (X₁, X₂, …, X_n) is a data string of length n. Define the T-type class of (X₁, …, X_n) to consist of all strings (Y₁, Y₂, …, Y_n) for which there exists an isomorphism ϕ of T such that

Y_{i} = X_{σ^{-1} (ϕ (σ (i)))}, \quad i = 1, 2, \dots, n

Consider the depth one tree T = T₁(n) in which there are n children of the root, which are the leaves of the tree. Then, the notion of T₁(n)-type class coincides with the notion of traditional type class. Now let n = k^j for positive integer j and integer k ≥ 2. Consider the depth j tree T = T_j(k) with n leaves such that each non-leaf vertex has k children. Then, a T_j(k)-type class is called a hierarchical type class, and k is called the partitioning parameter of the class. In the paper [2], we dealt with hierarchical type classes in which the partitioning parameter is k = 2. In the present paper, we deal with hierarchical type classes in which the partitioning parameter is an arbitrary k ≥ 2.

Given a hierarchical type class S, there is a simple lossless coding algorithm which encodes each string in S into a fixed-length binary codeword of minimal length, and decodes the string from its codeword. This algorithm is particularly simple for the case when the partitioning parameter is k = 2, and we illustrate this case in Example 1 which follows; the case of general k ≥ 2 is discussed in [3]. In Example 1 and subsequently, x¹ * x² * … * x^k shall denote the data string obtained by concatenating together the finite-length data strings x¹, x², …, x^k (left to right).

Example 1

Let k = 2, and let S be the hierarchical type class of data string AABBABAB. The 16 strings in S are illustrated in Figure 1. Each string x ∈ S has a tree representation in which each vertex of tree T₃(2) is assigned a label which is a substring of x. This assignment takes place as follows.

The leaves of the tree, traversed left to right, are labeled with the respective left-to-right entries of the data string x.
For each non-leaf vertex v, if the strings labeling the left and right children of v are x_L, x_R, respectively, then the string labeling v is x_L * x_R if x_L precedes or is equal to x_R in the lexicographical order, and is x_R * x_L, otherwise.

In Figure 1, we have illustrated the tree representations of the strings AABBABAB and BAABBBAA. The root label of all 16 tree representations will be the same string, namely, the first string in S in lexicographical order, which is the string AABBABAB in this case. Each string in S is encoded by visiting, in depth-first order, the non-leaf vertices of its tree representation whose children have different labels. Each such vertex is assigned bit 0 if its label is x_L * x_R, where x_L, x_R are the labels of its left and right children, and is assigned bit 1 otherwise (meaning that the label is x_R * x_L). The resulting sequence of bits, in the order they are obtained, is the codeword of the string. Since both encoder and decoder will know what hierarchical type class is being encoded, the decoder will know what the root label of the tree representation should be, and then the successive bits of the codeword allow the decoder to grow the tree representation from the root downward.
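The encoding and decoding steps of Example 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `label`, `encode`, `decode` and `type_class` are ours, and "depth-first order" is taken to mean a pre-order visit (a vertex before its subtrees), which may differ from the convention used in Figure 1.

```python
def label(x):
    """Canonical label of the vertex whose leaves, left to right, spell x."""
    if len(x) == 1:
        return x
    half = len(x) // 2
    a, b = label(x[:half]), label(x[half:])
    return a + b if a <= b else b + a


def encode(x):
    """Codeword of x: one bit per non-leaf vertex whose children have different labels."""
    if len(x) == 1:
        return ""
    half = len(x) // 2
    a, b = label(x[:half]), label(x[half:])
    bit = "" if a == b else ("0" if a < b else "1")
    # Pre-order visit (an assumption): this vertex, then left subtree, then right subtree.
    return bit + encode(x[:half]) + encode(x[half:])


def decode(root_label, bits):
    """Grow the tree from the root label downward, reading a bit where the children differ."""
    stream = iter(bits)

    def grow(lab):
        if len(lab) == 1:
            return lab
        half = len(lab) // 2
        a, b = lab[:half], lab[half:]          # child labels in sorted order
        if a != b and next(stream) == "1":
            a, b = b, a                        # bit 1: the children appear swapped
        return grow(a) + grow(b)

    return grow(root_label)


def type_class(x):
    """Brute-force hierarchical type class of x for k = 2."""
    if len(x) == 1:
        return {x}
    half = len(x) // 2
    left, right = type_class(x[:half]), type_class(x[half:])
    return {u + v for u in left for v in right} | {v + u for u in left for v in right}


S = type_class("AABBABAB")
root = label("AABBABAB")
codes = {x: encode(x) for x in S}
```

Under these assumptions every string in S receives a 4-bit codeword, in agreement with |S| = 16, and decoding a codeword from the shared root label returns the original string.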

Before discussing the nature of the results to be obtained in this paper, we need some definitions and notation. Fix integers m, k ≥ 2, which serve as parameters in the subsequent development; k is the partitioning parameter already introduced, and m is called the “alphabet cardinality parameter” because we shall be dealing with an m-letter data alphabet, denoted 𝒜_m = {a₁, a₂, …, a_m}. For each j ≥ 0, we define a j-string x to be a string of length k^j over 𝒜_m. Note that if j ≥ 1, for each j-string x there is a unique k-tuple (x¹, x², …, x^k) whose entries are (j − 1)-strings such that x = x¹*x²*…*x^k; this k-tuple is called the k-partitioning of x. If S₁, S₂, …, S_k are non-empty sets of j-strings, let S₁ * S₂ * … * S_k be the set of all (j + 1)-strings of the form x¹ * x² * … * x^k, where xⁱ belongs to S_i for i = 1, 2, …, k. The 0-strings are the individual letters in 𝒜_m.

We wish to formally define the family 𝒮_m,k of all hierarchical type classes in which the alphabet cardinality parameter is m and the partitioning parameter is k. Instead of using the tree isomorphism definition of hierarchical type class given at the beginning of the paper, we will use an equivalent inductive definition, which is more convenient in the subsequent development. First, we define the hierarchical type class of a 0-string to be the set consisting of the string itself. Given a j-string x with j ≥ 1, assume that hierarchical type classes of (j − 1)-strings have been defined. Let (x¹, …, x^k) be the k-partitioning of x and let S_i be the hierarchical type class of xⁱ (i = 1, …, k). The hierarchical type class of x is then defined as

\cup_{π \in Π_{k}} [S_{π (1)} * S_{π (2)} * \dots * S_{π (k)}]

(1)

where, from now on, Π_k is the set of all permutations of {1, 2, …, k}. A set is called a hierarchical type class of order j if it is the hierarchical type class of some j-string. A set is called a hierarchical type class if it is a hierarchical type class of order j for some j ≥ 0. The family 𝒮_m,k is then the set of all hierarchical type classes, of all orders.

We define the type of j-string x to be the vector (n₁, …, n_m) whose i-th component n_i is the frequency of letter a_i in x. For each j ≥ 0, let Λ_j (m, k) be the set of all types of j-strings. Let Λ(m, k) be the union of the Λ_j(m, k)'s for j ≥ 0, and let Λ⁺(m, k) be the union of the Λ_j(m, k)'s for j ≥ 1. A type in Λ_j(m, k) will be said to be of order j. If λ ∈ Λ(m, k), let ‖λ‖ denote the sum of the components of λ. If λ is of order j, then ‖λ‖ = k^j. All strings in a hierarchical type class have the same type, because permuting a string does not change the type. This property is listed below, along with some other properties whose simple proofs are omitted.

Prop. 1: All strings in a hierarchical type class have the same type.
Prop. 2: For each j ≥ 0, the distinct hierarchical type classes of order j form a partition of the set of all j-strings.
Prop. 3: Let λ ∈ Λ(m, k), and let 𝒮_m,k(λ) denote the set of all hierarchical type classes in 𝒮_m,k whose strings are of type λ. Then 𝒮_m,k(λ) forms a partition of the set of all strings of type λ.
Prop. 4: Let S ∈ 𝒮_m,k be a hierarchical type class of order j ≥ 1. Then there is a k-tuple (S₁, S₂, …, S_k), unique up to permutation, such that each S_i is a hierarchical type class of order j − 1 and S is expressible as Expression (1).

Global Hierarchical Entropy Function

The global hierarchical entropy function is the function H : 𝒮_m,k → [0, ∞) such that

H (S) ≜ \log_{2} | S |, \quad S \in 𝒮_{m, k}

where, in this paper, if S is a finite set, |S| shall denote the cardinality of S. H(S) shall be called the entropy of S. Given a hierarchical type class S, its entropy H(S) has the following interpretation. Suppose H(S) > 0, and we losslessly encode the strings in S into fixed-length binary codewords of minimal length (as discussed in Example 1 and in [3]). Then this minimal length is ⌈H(S)⌉.

Lemma 1

Let S be a hierarchical type class of order j ≥ 1. Let (S₁, S₂, …, S_k) be the k-tuple of hierarchical type classes of order j − 1 associated with S according to Prop. 4, and let N(S) be the number of distinct permutations of this k-tuple. Then,

H (S) = [\sum_{i = 1}^{k} H (S_{i})] + \log_{2} N (S)

(2)

Proof

Represent S as the Expression (1). Formula (2) follows easily from this expression.

Remark

We see now how to inductively compute entropy values H(S), as follows. If S is of order 0, then |S| = 1 and so H(S) = 0. If S is of order j ≥ 1, assume all entropy values for hierarchical type classes of smaller order have been computed. Then Equation (2) is used to compute H(S).
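The inductive computation described in this Remark can be sketched for general k as follows. The helper names (`parts`, `canon`, `entropy`, `type_class`) are ours and purely illustrative. The canonical string produced by `canon` is just one convenient way to decide whether two blocks lie in the same class, and the brute-force enumeration via Expression (1) is included only as a cross-check on small inputs.

```python
from collections import Counter
from itertools import permutations, product
from math import factorial, log2, prod


def parts(x, k):
    """The k-partitioning of x into k blocks of equal length."""
    n = len(x) // k
    return [x[i * n:(i + 1) * n] for i in range(k)]


def canon(x, k):
    """A canonical string identifying the hierarchical type class of x."""
    if len(x) == 1:
        return x
    return "".join(sorted(canon(p, k) for p in parts(x, k)))


def entropy(x, k):
    """H(S) for the class S of x, computed by the recursion of Equation (2)."""
    if len(x) == 1:
        return 0.0
    blocks = parts(x, k)
    counts = Counter(canon(p, k) for p in blocks)
    # N(S): number of distinct permutations of the k-tuple of child classes.
    n_perms = factorial(k) // prod(factorial(c) for c in counts.values())
    return sum(entropy(p, k) for p in blocks) + log2(n_perms)


def type_class(x, k):
    """Brute-force enumeration of the class of x via Expression (1)."""
    if len(x) == 1:
        return {x}
    child = [type_class(p, k) for p in parts(x, k)]
    out = set()
    for pi in permutations(range(k)):
        for combo in product(*(child[i] for i in pi)):
            out.add("".join(combo))
    return out
```

For the string of Example 1, `entropy("AABBABAB", 2)` evaluates to 4.0, matching log₂ 16.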

Discussion

Let {S_j : j ≥ 1} be a sequence of hierarchical type classes from 𝒮_m,k such that S_j is of order j (j ≥ 1). Consider the sequence of normalized entropies {H(S_j)/k^j : j ≥ 1}. As j becomes large, the normalized entropy H(S_j)/k^j approximates more and more closely the compression rate in bits per data sample that results from the compression scheme on S_j. It is therefore of interest to determine circumstances under which such a sequence of normalized entropies will have a limit that we can compute. We discuss our approach to this problem, which will be pursued in the rest of this paper. A hierarchical source is defined to be a family {S(λ) : λ ∈ Λ(m, k)} in which each S(λ) is a hierarchical type class selected from 𝒮_m,k(λ). (We will also impose a natural consistency condition on how these selections are made in our formal hierarchical source definition to be given in the next section.) Let ℝ denote the real line, and let ℙ_m be the subset of ℝ^m consisting of all m-dimensional probability vectors. We consider ℙ_m to be a metric space with the Euclidean metric. For each λ ∈ Λ(m, k), let p_λ be the probability vector λ/‖λ‖ in ℙ_m. Suppose there exists a (necessarily unique) continuous function h : ℙ_m → [0, ∞) such that for each p ∈ ℙ_m, and each sequence {λ_j : j ≥ 0} for which λ_j ∈ Λ_j(m, k) (j ≥ 0) and lim_{j→∞} p_{λ_j} = p, the limit property

h (p) = \lim_{j \to \infty} H (S (λ_{j})) / k^{j}

holds. Then we call the function h the hierarchical entropy function induced by the source {S(λ) : λ ∈ Λ(m, k)}. A hierarchical entropy function is defined to be any function on ℙ_m which is the hierarchical entropy function induced by some hierarchical source. One of the goals of hierarchical data compression theory is to identify hierarchical entropy functions and to learn about their properties. In the paper [2], two hierarchical entropy functions were introduced. In the present paper, we go further by identifying infinitely many hierarchical entropy functions which are each self-affine, and for each one of these entropy functions, we exhibit an explicit iterated function system whose attractor is the graph of the entropy function.

2. Hierarchical Sources

This section is devoted to the discussion of hierarchical sources. The concept of hierarchical source was informally described in the Introduction. In Section 2.1., we make this concept formal. In Section 2.2., we define the entropy-stable hierarchical sources, which are the hierarchical sources that induce hierarchical entropy functions. In Section 2.3., we introduce a particular type of entropy-stable hierarchical source called finitary hierarchical source. The finitary hierarchical sources induce the hierarchical entropy functions that are the subject of this paper.

2.1. Formal Definition of Hierarchical Source

Let 𝒮 = {S(λ) : λ ∈ Λ(m, k)} be a family of hierarchical type classes in which each class S(λ) belongs to the set of classes 𝒮_m,k(λ). Then 𝒮 is defined to be a (Λ(m, k)-indexed) hierarchical source if the following additional condition is satisfied.

Consistency Condition: For each S ∈ 𝒮 of order > 0, each term in the k-tuple (S₁, S₂, …, S_k) associated with S in Prop. 4 also belongs to 𝒮.

We discuss how the Consistency Condition gives us a way to describe every possible hierarchical source. Let Λ(m, k)^k be the set of all k-tuples whose entries come from Λ(m, k). Let Φ(m, k) be the set of all mappings ϕ: Λ(m, k)⁺ → Λ(m, k)^k such that whenever ϕ(λ) = (λ₁, λ₂, …, λ_k), we have

$λ = \sum_{i = 1}^{k} λ_{i}$ .
If λ is of order j, then each entry λ_i of ϕ(λ) is of order j − 1.

Each ϕ ∈ Φ(m, k) gives rise to a Λ(m, k)-indexed hierarchical source 𝒮^ϕ = {S^ϕ(λ) : λ ∈ Λ(m, k)}, defined inductively as follows.

If λ ∈ Λ(m, k) is of order 0, define class S^ϕ(λ) to be the set {a_i}, where a_i is the unique letter in 𝒜_m whose type is λ.
If λ ∈ Λ(m, k)⁺, assume class S^ϕ(λ*) has been defined for all types λ* of order less than the order of λ. Letting ϕ(λ) = (λ₁, λ₂, …, λ_k), define

$S^{ϕ} (λ) ≜ \cup_{π \in Π_{k}} [S^{ϕ} (λ_{π (1)}) * S^{ϕ} (λ_{π (2)}) * \dots * S^{ϕ} (λ_{π (k)})]$

From the Consistency Condition, all possible hierarchical sources arise in this way, that is, given any Λ(m, k)-indexed hierarchical source 𝒮, there exists ϕ ∈ Φ(m, k) such that 𝒮 = 𝒮^ϕ.

Another advantage of the Consistency Condition is that it allows the entropies of the classes in a hierarchical source to be recursively computed. To see this, let 𝒮 = {S(λ) : λ ∈ Λ(m, k)} be a Λ(m, k)-indexed hierarchical source and choose ϕ ∈ Φ(m, k) such that 𝒮 = 𝒮^ϕ. Define H_ϕ : Λ(m, k) → [0, ∞) to be the function which takes the value zero on Λ₀(m, k), and for each λ ∈ Λ⁺(m, k),

H_{ϕ} (λ) = [\sum_{i = 1}^{k} H_{ϕ} (λ_{i})] + \log_{2} N (λ)

(3)

where (λ₁, …, λ_k) is the k-tuple ϕ(λ) and N(λ) is the number of distinct permutations of this k-tuple. By the Consistency Condition and Lemma 1,

H_{ϕ} (λ) = H (S (λ)), λ \in Λ (m, k)

2.2. Entropy-Stable Hierarchical Sources

The concept of entropy-stable source discussed in this section allows us to formally define the concept of hierarchical entropy function.

For each j ≥ 0, define the finite set of probability vectors

ℙ_{m} (j) ≜ \{p_{λ} : λ \in Λ_{j} (m, k)\}

where the reader will recall that p_λ = λ/‖λ‖. Note that the sets {ℙ_m(j) : j ≥ 0} are increasing in the sense that

ℙ_{m} (j) \subset ℙ_{m} (j + 1), j \geq 0

(4)

Let $ℙ_{m}^{*}$ be the countably infinite set of probability vectors which is the union of the ℙ_m(j)'s.

Suppose we have a hierarchical source 𝒮 = {S(λ) : λ ∈ Λ(m, k)}. For each j ≥ 0, let h_j : ℙ_m(j) → [0, ∞) be the unique function for which

h_{j} (p_{λ}) = H (S (λ)) / ‖ λ ‖, λ \in Λ_{j} (m, k)

Suppose $p \in ℙ_{m}^{*}$. Because of the increasing sets property in Equation (4), p is a member of the set ℙ_m(j) for j sufficiently large. Consequently, h_j(p) is defined for j sufficiently large, and so it makes sense to talk about the limit of the sequence {h_j(p) : j ≥ 0}, if this limit exists. We define the source 𝒮 to be entropy-stable if there exists a continuous function h : ℙ_m → [0, ∞) such that

h (p) = \lim_{j \to \infty} h_{j} (p), \quad p \in ℙ_{m}^{*}

and the function h (which is unique since $ℙ_{m}^{*}$ is dense in ℙ_m) is called the hierarchical entropy function induced by 𝒮. Henceforth, the terminology “hierarchical entropy function” denotes a function which is the hierarchical entropy function induced by some entropy-stable hierarchical source.

2.3. Finitary Hierarchical Sources

If λ = (n₁, n₂, …, n_m) is a type in Λ(m, k)⁺, define

r (λ) ≜ (mod (n_{1}, k), mod (n_{2}, k), \dots, mod (n_{m}, k))

where mod(n, k) ∈ {0, 1, …, k − 1} is the remainder upon division of n by k. Each entry of r(λ) belongs to the set {0, 1, …, k − 1} and the sum of the entries of r(λ) is an integer multiple of k.

Definitions

$ℛ$ (m, k) is defined to be the set of all m-tuples whose entries come from {0, 1, …, k − 1} and sum to an integer multiple of k.
Ψ(m, k) is defined to be the set of all mappings ψ from $ℛ$ (m, k) to the set of binary k × m matrices such that if r = (r₁, …, r_m) belongs to $ℛ$ (m, k), then ψ(r) has left-to-right column sums r₁, r₂, …, r_m and row sums all equal to (r₁ + r₂ + … + r_m)/k. The set Ψ(m, k) is nonempty for each choice of parameters m, k ≥ 2 [4,5].
If ψ ∈ Ψ(m, k), define ψ* to be the unique mapping in Φ(m, k) which does the following. If λ = (n₁, n₂, …, n_m) belongs to Λ(m, k)⁺, let A = ψ(r(λ)). Then ψ*(λ) = (λ₁, λ₂, …, λ_k), where

$λ_{i} = (⌊ n_{1} / k ⌋, ⌊ n_{2} / k ⌋, \dots, ⌊ n_{m} / k ⌋) + A (i, 1 : m), i = 1, 2, \dots, k$

with A(i, 1 : m) denoting the i-th row of A.
Suppose ψ ∈ Ψ(m, k) and let ϕ = ψ*. The Λ(m, k)-indexed hierarchical source {S^ϕ(λ) : λ ∈ Λ(m, k)} is called a finitary source. For each choice of parameters m, k ≥ 2, since Ψ(m, k) is nonempty, there is at least one finitary Λ(m, k)-indexed hierarchical source. The word “finitary” is used to describe these sources because they are each definable in finite terms by the specification of mk| $ℛ$ (m, k)| bits (the entries of | $ℛ$ (m, k)| binary matrices, each of size k × m).

Example 2

Note that (1122) belongs to $ℛ$ (4, 3). Suppose

ψ (1122) = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}

Note that (7758) ∈ Λ⁺(4, 3), and that

r (7758) = mod ((7758), 3) = (1122)

Since ⌊(7758)/3⌋ = (2212), we see that ψ*(7758) = (λ₁, λ₂, λ₃), where

\begin{array}{l} λ_{1} = (2212) + (1100) = (3312) \\ λ_{2} = (2212) + (0011) = (2223) \\ λ_{3} = (2212) + (0011) = (2223) \end{array}

Note that the splitting up of (7758) into the three types (3312), (2223), (2223) indeed does make sense because these latter three types sum to (7758) and are of order 2, one less than the order of (7758).
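The arithmetic of Example 2 can be reproduced with a few lines of Python. The sketch below is illustrative only: the names `residue` and `split` are ours, and ψ is represented, as an assumption of convenience, by a dictionary that contains just the one matrix specified in the example.

```python
def residue(lam, k):
    """r(λ): component-wise remainder of λ modulo k."""
    return tuple(n % k for n in lam)


def split(lam, k, psi):
    """ψ*(λ): the k types of order j - 1 into which λ is split.

    psi maps a residue vector to a binary k x m matrix (a tuple of k rows);
    only the entries actually needed have to be present.
    """
    base = tuple(n // k for n in lam)          # component-wise floor of λ / k
    A = psi[residue(lam, k)]
    return [tuple(b + a for b, a in zip(base, row)) for row in A]


# The single value of ψ given in Example 2 (m = 4, k = 3).
psi = {(1, 1, 2, 2): ((1, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, 1))}
pieces = split((7, 7, 5, 8), 3, psi)
```

Here `pieces` is the list of the three types (3312), (2223), (2223), which sum component-wise to (7758) and each have component sum 9 = 3².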

Example 3

Fix the alphabet cardinality parameter to be 2, and fix the partitioning parameter k to be any integer ≥ 2. Let (r₁, r₂) belong to $ℛ$ (2, k). Then either (a) (r₁, r₂) = (0, 0) or (b) r₁ + r₂ = k. In case (a), we define ψ(r₁, r₂) to be the k × 2 zero matrix. In case (b), we define ψ(r₁, r₂) to be the k × 2 matrix whose first r₁ rows are (1, 0) and whose last r₂ rows are (0, 1). Letting ϕ = ψ*, we obtain finitary Λ(2, k)-indexed hierarchical source 𝒮^ϕ.

Example 4

Now fix the alphabet cardinality parameter to be 3, and fix the partitioning parameter k to be any integer ≥ 2. Let (r₁, r₂, r₃) belong to $ℛ$ (3, k). Then either (a) (r₁, r₂, r₃) = (0, 0, 0); (b) r₁ + r₂ + r₃ = k; or (c) r₁ + r₂ + r₃ = 2k. In case (a), we define ψ(r₁, r₂, r₃) to be the k × 3 zero matrix. In case (b), we define ψ(r₁, r₂, r₃) to be the k × 3 matrix whose first r₁ rows are (100), whose next r₂ rows are (010), and whose last r₃ rows are (001). In case (c), we define ψ(r₁, r₂, r₃) to be the k × 3 matrix whose first k − r₁ rows are (011), whose next k − r₂ rows are (101), and whose last k − r₃ rows are (110). Letting ϕ = ψ*, we obtain finitary Λ(3, k)-indexed hierarchical source 𝒮^ϕ.
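As a sanity check on Example 4, the following sketch builds the matrix ψ(r₁, r₂, r₃) for each of the three cases and verifies the column-sum and row-sum conditions from the definition of Ψ(m, k). The function names `psi3`, `residues` and `is_valid` are hypothetical and are introduced here only for illustration.

```python
from itertools import product


def psi3(r, k):
    """The k x 3 binary matrix assigned to r = (r1, r2, r3) in Example 4."""
    r1, r2, r3 = r
    total = r1 + r2 + r3
    if total == 0:                              # case (a)
        return [(0, 0, 0)] * k
    if total == k:                              # case (b)
        return [(1, 0, 0)] * r1 + [(0, 1, 0)] * r2 + [(0, 0, 1)] * r3
    if total == 2 * k:                          # case (c)
        return ([(0, 1, 1)] * (k - r1) + [(1, 0, 1)] * (k - r2)
                + [(1, 1, 0)] * (k - r3))
    raise ValueError("r does not belong to R(3, k)")


def residues(m, k):
    """All m-tuples over {0, ..., k - 1} whose entries sum to a multiple of k."""
    return [r for r in product(range(k), repeat=m) if sum(r) % k == 0]


def is_valid(A, r, k):
    """Check the conditions required of ψ(r) in the definition of Ψ(m, k)."""
    cols_ok = tuple(map(sum, zip(*A))) == tuple(r)
    rows_ok = all(sum(row) == sum(r) // k for row in A)
    return len(A) == k and cols_ok and rows_ok
```

For instance, with k = 3 and r = (2, 2, 2), case (c) gives the rows (011), (101), (110), whose column sums are (2, 2, 2) and whose row sums all equal 2.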

Remarks

For each fixed k ≥ 2,

The source defined in Example 3 is the unique finitary Λ(2, k)-indexed hierarchical source.
The source defined in Example 4 is the unique finitary Λ(3, k)-indexed hierarchical source.

This is because the matrices employed in these examples are unique up to row permutation.

Theorem 1

Let m, k ≥ 2 be arbitrary, and let {S(λ) : λ ∈ Λ(m, k)} be any finitary Λ(m, k)-indexed hierarchical source. Then the source is entropy-stable and the hierarchical entropy function induced by the source can be characterized as the unique continuous function h : ℙ_m → [0, ∞) such that

h (λ / ‖ λ ‖) = H (S (λ)) / ‖ λ ‖, λ \in Λ (m, k)

Theorem 1 is proved in Appendix A.

Notations and Remarks

Fix k to be an arbitrary integer ≥ 2. Let {S(λ) : λ ∈ Λ(2, k)} be the unique finitary Λ(2, k)-indexed hierarchical source. H_2,k : Λ(2, k) → [0, ∞) shall denote the entropy function

$H_{2, k} (λ) = H (S (λ)), λ \in Λ (2, k)$

For later use, we remark that

$H_{2, k} (n_{1}, n_{2}) = \log_{2} [\frac{k!}{n_{1}! n_{2}!}] = \log_{2} \binom{k}{n_{1}}, \quad (n_{1}, n_{2}) \in Λ_{1} (2, k)$

(5)

The hierarchical entropy function induced by this source maps ℙ₂ into [0, ∞) and shall be denoted h_2,k. The relationship between functions H_2,k and h_2,k is

$h_{2, k} (λ / ‖ λ ‖) = H_{2, k} (λ) / ‖ λ ‖, λ \in Λ (2, k)$

(6)
Fix k to be an arbitrary integer ≥ 2. Let {S(λ) : λ ∈ Λ(3, k)} be the unique finitary Λ(3, k)-indexed hierarchical source. H_3,k : Λ(3, k) → [0, ∞) shall denote the entropy function

$H_{3, k} (λ) = H (S (λ)), λ \in Λ (3, k)$

For later use, we remark that

$H_{3, k} (n_{1}, n_{2}, n_{3}) = \log_{2} [\frac{k!}{n_{1}! n_{2}! n_{3}!}], \quad (n_{1}, n_{2}, n_{3}) \in Λ_{1} (3, k)$

(7)

The hierarchical entropy function induced by this source maps ℙ₃ into [0, ∞) and shall be denoted h_3,k. The relationship between functions H_3,k and h_3,k is

$h_{3, k} (λ / ‖ λ ‖) = H_{3, k} (λ) / ‖ λ ‖, λ \in Λ (3, k)$

(8)

In Section 3, we show that hierarchical entropy function h_2,k is self-affine for each k ≥ 2, and in Section 4, we show that hierarchical entropy function h_3,k is self-affine for each k ≥ 2.
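For the two-letter case, the entropy function H_2,k can be evaluated directly from the recursion of Equation (3) together with the splitting rule of Example 3. The sketch below is a minimal illustration, not an optimized implementation; the names `split2`, `H2` and `h2` are ours, and the input is assumed to be a type in Λ(2, k), that is, a pair of non-negative integers summing to a power of k.

```python
from collections import Counter
from functools import lru_cache
from math import comb, factorial, log2, prod


def split2(lam, k):
    """ϕ(λ) for the source of Example 3 (m = 2)."""
    q1, r1 = divmod(lam[0], k)
    q2, r2 = divmod(lam[1], k)
    if r1 == 0 and r2 == 0:                     # case (a): zero matrix
        return [(q1, q2)] * k
    # case (b): r1 rows equal to (1, 0), followed by r2 rows equal to (0, 1)
    return [(q1 + 1, q2)] * r1 + [(q1, q2 + 1)] * r2


@lru_cache(maxsize=None)
def H2(lam, k):
    """H_{2,k}(λ), computed by the recursion of Equation (3)."""
    if sum(lam) == 1:                           # order 0: entropy zero
        return 0.0
    pieces = split2(lam, k)
    counts = Counter(pieces)
    # N(λ): number of distinct permutations of the k-tuple ϕ(λ).
    n_perms = factorial(k) // prod(factorial(c) for c in counts.values())
    return sum(H2(p, k) for p in pieces) + log2(n_perms)


def h2(lam, k):
    """H_{2,k}(λ) / ‖λ‖, the value of h_{2,k} at λ / ‖λ‖ by Equation (6)."""
    return H2(lam, k) / sum(lam)
```

On types of order 1 this reproduces Equation (5); for example `H2((2, 2), 4)` equals log₂ 6, and `h2((1, 1), 2)` equals 1/2.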

3. h_2,k Is Self-Affine

An iterated function system (IFS) on a closed nonempty subset Ω of a finite-dimensional Euclidean space is a finite nonempty set of mappings which map Ω into itself and are each contraction mappings. Given an IFS 𝒯 on Ω, there exists ([6], Theorem 9.1) a unique nonempty compact set Q ⊂ Ω such that

Q = \bigcup_{T \in 𝒯} T (Q)

Q is called the attractor of the IFS 𝒯.

Suppose h : ℙ_m → [0, ∞) is the hierarchical entropy function induced by an entropy-stable Λ(m, k)-indexed hierarchical source. Let Ω_m = ℙ_m × ℝ, regarded as a metric space with the Euclidean metric that it inherits from being regarded as a closed convex subset of ℝ^{m+1}. We define h to be self-affine if there is an IFS 𝒯 on Ω_m such that

Each mapping in 𝒯 is an affine mapping.
The attractor of 𝒯 is {(p, h(p)) : p ∈ ℙ_m}, the graph of h.

For the rest of this section, k ≥ 2 is fixed. Our goal is to show that the function h_2,k : ℙ₂ → [0, ∞) is self-affine, where h_2,k is the hierarchical entropy function induced by the unique finitary Λ(2, k)-indexed hierarchical source.

For each i = 0, 1, …, k − 1,

Define the matrix

$M_{i} ≜ [\begin{matrix} i + 1 & k - i - 1 \\ i & k - i \end{matrix}]$
Define $T_{i}^{*} : ℙ_{2} \to ℙ_{2}$ to be the mapping

$T_{i}^{*} (p) ≜ k^{- 1} p M_{i}, p \in ℙ_{2}$

(9)
Define the vector

$v_{i} ≜ (\log_{2} \binom{k}{i + 1}, \log_{2} \binom{k}{i})$
Define T_i : Ω₂ → Ω₂ to be the mapping

$T_{i} (p, y) ≜ (T_{i}^{*} (p), k^{- 1} y + k^{- 1} p \cdot v_{i}), (p, y) \in Ω_{2}$

(10)

where p · v_i denotes the usual dot product.

Remarks

It is clear that the set of mappings $\{T_{i}^{*} : i = 0, 1, \dots, k - 1\}$ is an IFS on ℙ₂. This fact allows one to prove (Lemma B.3 of Appendix B) that the related set of mappings {T_i : i = 0, 1, …, k − 1} is an IFS on Ω₂. This result is the first part of the following theorem.

Theorem 2

Let k ≥ 2 be arbitrary. The following statements hold:

(a): {T₀, T₁, …, T_{k−1}} is an IFS on Ω₂.
(b): h_2,k is self-affine and its graph is the attractor of the IFS in (a).
(c): For each i = 0, 1, …, k − 1,

$T_{i} (p, h_{2, k} (p)) = (T_{i}^{*} (p), h_{2, k} (T_{i}^{*} (p))), p \in ℙ_{2}$

(11)

Our proof of Theorem 2 requires the following lemma.

Lemma 2

Let ϕ ∈ Φ(2, k) be the function in Example 3 such that 𝒮^ϕ is the unique finitary Λ(2, k)-indexed hierarchical source. For each i = 0, 1, …, k − 1,

(a.1): If λ ∈ Λ(2, k), then λM_i ∈ Λ(2, k) and ‖λM_i‖ = k‖λ‖;
(a.2): If λ ∈ Λ(2, k)⁺ and ϕ(λ) = (λ₁, λ₂, …, λ_k), then

$ϕ (λ M_{i}) = (λ_{1} M_{i}, λ_{2} M_{i}, \dots, λ_{k} M_{i})$

Proof

Property (a.1), whose proof we omit, is a simple consequence of the fact that M_i has row sums equal to k. Fix a type λ from Λ(2, k)⁺. Letting ϕ(λ) = (λ₁, λ₂, …, λ_k) and letting ϕ(λM_i) = (µ₁, µ₂, …, µ_k), we show µ_s = λ_sM_i (s = 1, …, k), which will establish Property (a.2). Write λ in the form

λ = (k q_{1} + r_{1}, k q_{2} + r_{2})

where r(λ) = (r₁, r₂). As remarked in Example 3, either r₁ = r₂ = 0, or r₁ + r₂ = k. Let us first handle the case r₁ + r₂ = k. Then

\begin{matrix} λ_{s} = (q_{1}, q_{2}) + (1, 0), & 1 \leq s \leq r_{1} \\ λ_{s} = (q_{1}, q_{2}) + (0, 1), & r_{1} + 1 \leq s \leq k \end{matrix}

It is easy to show that

λ M_{i} = (k q_{1}^{'} + r_{1}, k q_{2}^{'} + r_{2})

where

q_{1}^{'} = (i + 1) q_{1} + i q_{2} + i, q_{2}^{'} = (k - i - 1) q_{1} + (k - i) q_{2} + k - i - 1

It follows that

μ_{s} = (q_{1}^{'}, q_{2}^{'}) + (1, 0), 1 \leq s \leq r_{1}

μ_{s} = (q_{1}^{'}, q_{2}^{'}) + (0, 1), r_{1} + 1 \leq s \leq k

For 1 ≤ s ≤ r₁, we have

λ_{s} M_{i} = (q_{1} + 1, q_{2}) M_{i} = (q_{1} (i + 1) + q_{2} i + i + 1, q_{1} (k - i - 1) + q_{2} (k - i) + k - i - 1) = (q_{1}^{'} + 1, q_{2}^{'}) = μ_{s}

For r₁ + 1 ≤ s ≤ k, we have

λ_{s} M_{i} = (q_{1}, q_{2} + 1) M_{i} = (q_{1} (i + 1) + q_{2} i + i, q_{1} (k - i - 1) + q_{2} (k - i) + k - i) = (q_{1}^{'}, q_{2}^{'} + 1) = μ_{s}

The remaining case r₁ = r₂ = 0 is much easier. We have

\begin{matrix} λ & = (q_{1} k, q_{2} k) \\ λ_{s} & = (q_{1}, q_{2}), & 1 \leq s \leq k \\ λ M_{i} & = (q_{1} k (i + 1) + q_{2} k i, q_{1} k (k - i - 1) + q_{2} k (k - i)) \\ μ_{s} & = (q_{1} (i + 1) + q_{2} i, q_{1} (k - i - 1) + q_{2} (k - i)) = λ_{s} M_{i}, & 1 \leq s \leq k \end{matrix}

Proof of Theorem 2

We first derive part(c) and then part(b) (part(a) is already taken care of, as remarked previously). We derive part(c) by establishing Equation (11) for a fixed i ∈ {0, 1, …, k − 1}. Let ϕ ∈ Φ(2, k) be the function given in Example 3 and recall that H_2,k denotes the entropy function H_ϕ on Λ(2, k). Referring to the definition of $T_{i}^{*}$ in Equation (9) and T_i in Equation (10), we see that proving Equation (11) is equivalent to proving

h_{2, k} (k^{- 1} p M_{i}) = k^{- 1} h_{2, k} (p) + k^{- 1} p \cdot v_{i}, p \in ℙ_{2}

(12)

We first show that

H_{2, k} (λ M_{i}) = H_{2, k} (λ) + λ \cdot v_{i}, λ \in Λ (2, k)

(13)

Our proof of Equation (13) is by induction on ‖λ‖. We first must verify Equation (13) for ‖λ‖ = 1, which is the two cases λ = (1, 0) and λ = (0, 1). For λ = (1, 0), the left side of Equation (13) is the entropy of the first row of M_i, which by Equation (5) is

H_{2, k} (i + 1, k - i - 1) = \log_{2} \binom{k}{i + 1}

and the right side is

H_{2, k} (1, 0) + (1, 0) \cdot v_{i} = 0 + \log_{2} \binom{k}{i + 1}

Similarly, if λ = (0, 1), both sides of Equation (13) are equal to $\log_{2} \binom{k}{i}$. Fix λ* ∈ Λ(2, k) for which ‖λ*‖ > 1, and for the induction hypothesis assume that Equation (13) holds when ‖λ‖ is smaller than ‖λ*‖. The proof by induction is then completed by showing that Equation (13) holds for λ = λ*. Let

ϕ (λ^{*}) = (λ_{1}, λ_{2}, \dots, λ_{k})

By the induction hypothesis,

H_{2, k} (λ_{s} M_{i}) = H_{2, k} (λ_{s}) + λ_{s} \cdot v_{i}, s = 1, 2, \dots, k

Adding,

\sum_{s = 1}^{k} H_{2, k} (λ_{s} M_{i}) = [\sum_{s = 1}^{k} H_{2, k} (λ_{s})] + λ^{*} \cdot v_{i}

(14)

By Lemma 2,

ϕ (λ^{*} M_{i}) = (λ_{1} M_{i}, λ_{2} M_{i}, \dots, λ_{k} M_{i})

Appealing to Equation (3), we then have

\sum_{s = 1}^{k} H_{2, k} (λ_{s} M_{i}) = H_{2, k} (λ^{*} M_{i}) - \log_{2} N

where N is the number of permutations of the k-tuple (λ₁M_i, …, λ_kM_i). Similarly,

\sum_{s = 1}^{k} H_{2, k} (λ_{s}) = H_{2, k} (λ^{*}) - \log_{2} N_{2}

where N₂ is the number of permutations of the k-tuple (λ₁, …, λ_k). Since M_i is nonsingular (its determinant is k), we have N = N₂. Substituting the right hand sides of the previous two equations into Equation (14), we obtain Equation (13) for λ = λ*, completing the proof by induction. Dividing both sides of Equation (13) by ‖λ‖, and using the fact that ‖λM_i‖ = k‖λ‖, we see that

k H_{2, k} (λ M_{i}) / ‖ λ M_{i} ‖ = (H_{2, k} (λ) / ‖ λ ‖) + p_{λ} \cdot v_{i}

which by Equation (6) becomes

k h_{2, k} ((λ M_{i}) / ‖ λ M_{i} ‖) = h_{2, k} (p_{λ}) + p_{λ} \cdot v_{i}

It is easy to see that

(λ M_{i}) / ‖ λ M_{i} ‖ = k^{- 1} p_{λ} M_{i}

Therefore,

h_{2, k} (k^{- 1} p_{λ} M_{i}) = k^{- 1} h_{2, k} (p_{λ}) + k^{- 1} (p_{λ} \cdot v_{i})

Equation (12) then follows since the set $ℙ_{2}^{*} = \{p_{λ} : λ \in Λ (2, k)\}$ is dense in ℙ₂ and h_2,k is a continuous function on ℙ₂, completing the derivation of part(c) of Theorem 2. All that remains is to prove part(b) of Theorem 2. Let G = {(p, h_2,k(p)) : p ∈ ℙ₂} be the graph of h_2,k. Part(c) is equivalent to the property that

T_{i} (G) \subset G, i = 0, 1, \dots, k - 1

This property, together with the fact that $\{T_{i}^{*} : i = 0, 1, \dots, k - 1\}$ is an IFS on ℙ₂ with attractor ℙ₂, allows us to conclude that G is the attractor of the IFS {T₀, …, T_{k−1}} (Lemma B.1 of Appendix B), and h_2,k is self-affine because the T_i's are affine. Theorem 2(b) is therefore true.
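Both Lemma 2(a.2) and Equation (13) lend themselves to a quick numerical spot check on types of small order. The sketch below is such a check under our own naming (`split2`, `H2`, `times_M`, `check` are hypothetical helpers); it repeats the splitting rule of Example 3 and the recursion of Equation (3) so that it runs standalone. It is evidence for small cases only and is no substitute for the proofs above.

```python
from collections import Counter
from functools import lru_cache
from math import comb, factorial, log2, prod


def split2(lam, k):
    """ϕ(λ) for the source of Example 3 (repeated so the check is self-contained)."""
    (q1, r1), (q2, r2) = divmod(lam[0], k), divmod(lam[1], k)
    if r1 == 0 and r2 == 0:
        return [(q1, q2)] * k
    return [(q1 + 1, q2)] * r1 + [(q1, q2 + 1)] * r2


@lru_cache(maxsize=None)
def H2(lam, k):
    """H_{2,k}(λ) via the recursion of Equation (3)."""
    if sum(lam) == 1:
        return 0.0
    pieces = split2(lam, k)
    n_perms = factorial(k) // prod(factorial(c) for c in Counter(pieces).values())
    return sum(H2(p, k) for p in pieces) + log2(n_perms)


def times_M(lam, i, k):
    """The row vector λ M_i, with M_i = [[i + 1, k - i - 1], [i, k - i]]."""
    a, b = lam
    return (a * (i + 1) + b * i, a * (k - i - 1) + b * (k - i))


def check(k, max_order):
    """Return (Lemma 2(a.2) holds, largest violation of Equation (13))."""
    lemma_ok, worst = True, 0.0
    for j in range(max_order + 1):
        for a in range(k ** j + 1):
            lam = (a, k ** j - a)
            for i in range(k):
                mu = times_M(lam, i, k)
                if j >= 1:
                    expected = [times_M(p, i, k) for p in split2(lam, k)]
                    lemma_ok &= split2(mu, k) == expected
                v_i = (log2(comb(k, i + 1)), log2(comb(k, i)))
                rhs = H2(lam, k) + lam[0] * v_i[0] + lam[1] * v_i[1]
                worst = max(worst, abs(H2(mu, k) - rhs))
    return lemma_ok, worst
```

Calling `check(k, 3)` for k = 2, 3, 4 examines every type of order at most 3 and every index i; the second returned value should be at the level of floating-point rounding error.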

Generating Hierarchical Entropy Function Plots

For each k ≥ 2, let $h_{2, k}^{*} : [0, 1] \to ℝ$ be the function

h_{2, k}^{*} (x) = h_{2, k} (x, 1 - x), 0 \leq x \leq 1

We can obtain kⁿ points on the plot of $h_{2, k}^{*}$ as follows. Let {T_i : i = 0, 1, …, k − 1} be the IFS on Ω₂ given in Theorem 2, such that the attractor of this IFS is the graph of h_2,k. Let S₀(k) = {(0, 1, 0)}, and generate subsets S₁(k), S₂(k), …, S_n(k) of ℝ³ by the recursion

S_{j} (k) = \cup_{i = 0}^{k - 1} T_{i} (S_{j - 1} (k)), j = 1, 2, \dots, n

Then S_n(k) consists of kⁿ points of the form (x, 1 − x, h_2,k(x, 1 − x)). Projecting according to

(x, 1 - x, h_{2, k} (x, 1 - x)) \to (x, h_{2, k} (x, 1 - x)) = (x, h_{2, k}^{*} (x))

we obtain kⁿ points on the plot of $h_{2, k}^{*}$. Using a Dell Latitude D620 laptop, we computed the sets S_n(k) to obtain the plots in Figure 2, as follows.

The plot of $h_{2, 2}^{*}$ used the set S₂₄ (2), consisting of 2²⁴ = 16777216 points, computed in 4.2 seconds.
The plot of $h_{2, 3}^{*}$ used the set S₁₅ (3) consisting of 3¹⁵ = 14348907 points, computed in 3.3 seconds.
The plot of $h_{2, 4}^{*}$ used the set S₁₂(4) consisting of 4¹² = 16777216 points, computed in 3.5 seconds.

We point out that the functions $h_{2, 2}^{*}$ and $h_{2, 4}^{*}$ , although their plots look similar, are not the same. For example, $h_{2, 2}^{*} (1 / 2) = 1 / 2$ , whereas $h_{2, 4}^{*} (1 / 2) = {log}_{2} (6) / 4 \approx 0.646$ .
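The point-generation recursion described in this subsection can be sketched as follows. This is not the program used to produce Figure 2; it is a small illustration in which the function name `ifs_points` is ours, each point (x, 1 − x, y) is stored as a dictionary entry x → y, and the first coordinate is kept as an exact fraction so that individual points can be looked up. It relies on the observation that the first coordinate of T_i*(x, 1 − x) is ((i + 1)x + i(1 − x))/k = (x + i)/k.

```python
from fractions import Fraction
from math import comb, log2


def ifs_points(k, n):
    """The set S_n(k), returned as a dict x -> y with (x, 1 - x, y) in S_n(k)."""
    v = [(log2(comb(k, i + 1)), log2(comb(k, i))) for i in range(k)]
    pts = {Fraction(0): 0.0}                    # S_0(k) = {(0, 1, 0)}
    for _ in range(n):
        nxt = {}
        for x, y in pts.items():
            for i in range(k):
                # Apply T_i as in Equation (10) to the point (x, 1 - x, y).
                x_new = (x + i) / k
                y_new = (y + float(x) * v[i][0] + float(1 - x) * v[i][1]) / k
                nxt[x_new] = y_new
        pts = nxt
    return pts


pts2 = ifs_points(2, 10)                        # 2**10 points on the plot for k = 2
pts4 = ifs_points(4, 3)                         # 4**3 points on the plot for k = 4
```

The entries at x = 1/2 reproduce the two values quoted above: `pts2[Fraction(1, 2)]` is 1/2 and `pts4[Fraction(1, 2)]` is log₂(6)/4.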

4. h_3,k Is Self-Affine

Fix k ≥ 2. It is the purpose of this section to study h_3,k : ℙ₃ → [0, ∞), the hierarchical entropy function induced by the unique finitary Λ(3, k)-indexed hierarchical source. In ℝ³, let Q_k be the convex hull of the set {(k, 0, 0), (0, k, 0), (0, 0, k)}. Then Q_k is an equilateral triangle whose three vertices are (k, 0, 0), (0, k, 0), (0, 0, k). We employ the well-known quadratic partition [7] of Q_k into k² congruent equilateral triangles, formed as follows. Partition each of the three sides of Q_k into k line segments of equal length by laying down k − 1 interior points along the side. For each vertex of Q_k, draw a line segment connecting the first interior points reached going out from the vertex along its two sides, then draw a line segment connecting the second interior points reached, and so forth until k − 1 line segments have been drawn. Doing this for each of the three vertices, you will have drawn a total of 3(k − 1) line segments, which subdivide Q_k into the k² congruent equilateral triangles of the quadratic partition. See Figure 3, which illustrates the quadratic partition of triangle Q₃ into nine sub-triangles.

Let V₁ be the set of all points (a, b, c) in Q_k such that a is a positive integer and b, c are non-negative integers. There are k(k + 1)/2 points in V₁. For each v = (a, b, c) in V₁, let M_1,v be the 3 × 3 matrix

M_{1, v} = [\begin{matrix} a & b & c \\ a - 1 & b + 1 & c \\ a - 1 & b & c + 1 \end{matrix}]

(15)

For each v ∈ V₁, the convex hull of the rows of M_1,v is one of the sub-triangles in the quadratic partition of Q_k, and these sub-triangles are distinct as v varies through V₁. This gives us a total of k(k + 1)/2 of the sub-triangles in the quadratic partition of Q_k, and we call these the V₁ sub-triangles of the partition. Let V₂ be the set of all (a, b, c) in Q_k such that a is a non-negative integer and b, c are positive integers. There are k(k − 1)/2 points in V₂. For each v = (a, b, c) in V₂, let M_2,v be the 3 × 3 matrix

M_{2, v} = [\begin{matrix} a & b & c \\ a + 1 & b - 1 & c \\ a + 1 & b & c - 1 \end{matrix}]

(16)

For each v ∈ V₂, the convex hull of the rows of M_2,v is one of the sub-triangles in the quadratic partition of Q_k, and these sub-triangles are distinct as v varies through V₂. This gives us a total of k(k − 1)/2 of the sub-triangles in the quadratic partition of Q_k, and we call these the V₂ sub-triangles of the partition. The V₁ sub-triangles are all translations of each other; the V₂ sub-triangles are all translations of each other and each one can be obtained by rotating a V₁ sub-triangle about its center 180 degrees, followed by a translation. Together, the k(k + 1)/2 V₁ sub-triangles and the k(k − 1)/2 V₂ sub-triangles constitute all k² sub-triangles in the quadratic partition of Q_k.
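The counting claims above are easy to check by enumeration. The following sketch lists V₁ and V₂ for a given k, builds the matrices of Equations (15) and (16), and represents each sub-triangle by the set of its three vertices (the rows of the matrix). All function names are ours.

```python
def v1_points(k):
    """Points (a, b, c) of Q_k with integer a >= 1 and b, c >= 0."""
    return [(a, b, k - a - b) for a in range(1, k + 1) for b in range(k - a + 1)]


def v2_points(k):
    """Points (a, b, c) of Q_k with integer a >= 0 and b, c >= 1."""
    return [(a, b, k - a - b) for a in range(k - 1) for b in range(1, k - a)]


def m1(v):
    """Matrix M_{1,v} of Equation (15)."""
    a, b, c = v
    return ((a, b, c), (a - 1, b + 1, c), (a - 1, b, c + 1))


def m2(v):
    """Matrix M_{2,v} of Equation (16)."""
    a, b, c = v
    return ((a, b, c), (a + 1, b - 1, c), (a + 1, b, c - 1))


def matrices(k):
    """All matrices attached to the V_1 and V_2 sub-triangles."""
    return [m1(v) for v in v1_points(k)] + [m2(v) for v in v2_points(k)]


k = 3
print(len(v1_points(k)), len(v2_points(k)), len(matrices(k)))  # 6 3 9
```

For k = 3 this reproduces the six V₁ sub-triangles and three V₂ sub-triangles of Figure 3.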

We define $ℳ$ (k) to be the set of k² matrices

ℳ (k) ≜ {M_{1, v} : v \in V_{1}} \cup {M_{2, v} : v \in V_{2}}

Each row sum of each matrix in $ℳ$ (k) is equal to k. Because of this property, we can define for each M ∈ $ℳ$ (k) the mapping $T_{M}^{*} : ℙ_{3} \to ℙ_{3}$ in which

T_{M}^{*} (p) ≜ k^{- 1} p M, p \in ℙ_{3}

(17)

and we can also define the mapping T_M : Ω₃ → Ω₃ in which

T_{M} (p, y) ≜ (T_{M}^{*} (p), k^{- 1} y + k^{- 1} p \cdot v_{M}), (p, y) \in Ω_{3}

(18)

where

v_{M} = (H_{3, k} (M (1, 1 : 3)), H_{3, k} (M (2, 1 : 3)), H_{3, k} (M (3, 1 : 3)))

(19)

Remarks

It is clear that the set of k² mappings ${T_{M}^{*} : M \in ℳ (k)}$ is an IFS on ℙ₃. This fact allows one to prove (Lemma B.4 of Appendix B) that the related set of k² mappings {T_M : M ∈ $ℳ$ (k)} is an IFS on Ω₃. In the following example, we exhibit this IFS in a special case.

Example 5

Let k = 3. Referring to Figure 3, we see that the 9 matrices in $ℳ$ (3) are

\begin{matrix} M_{1} = [\begin{matrix} 3 & 0 & 0 \\ 2 & 1 & 0 \\ 2 & 0 & 1 \end{matrix}], & M_{2} = [\begin{matrix} 2 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 2 \end{matrix}], & M_{3} = [\begin{matrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 1 & 1 & 1 \end{matrix}] \\ M_{4} = [\begin{matrix} 1 & 0 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 3 \end{matrix}], & M_{5} = [\begin{matrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \end{matrix}], & M_{6} = [\begin{matrix} 1 & 2 & 0 \\ 0 & 3 & 0 \\ 0 & 2 & 1 \end{matrix}] \\ M_{7} = [\begin{matrix} 1 & 1 & 1 \\ 2 & 0 & 1 \\ 2 & 1 & 0 \end{matrix}], & M_{8} = [\begin{matrix} 0 & 1 & 2 \\ 1 & 0 & 2 \\ 1 & 1 & 1 \end{matrix}], & M_{9} = [\begin{matrix} 0 & 2 & 1 \\ 1 & 1 & 1 \\ 1 & 2 & 0 \end{matrix}] \end{matrix}

Following Equation (19), let v_i ∈ ℝ³ be the vector whose components are the H_3,3 entropies of the rows of M_i. Letting α = log₂ 3 and β = log₂ 6, Formula (7) is used to obtain

\begin{matrix} v_{1} = (0, α, α), & v_{2} = (α, β, α), & v_{3} = (α, α, β) \\ v_{4} = (α, α, 0), & v_{5} = (β, α, α), & v_{6} = (α, 0, α) \\ v_{7} = (β, α, α), & v_{8} = (α, α, β), & v_{9} = (α, β, α) \end{matrix}

Following Equation (18), for each i = 1, 2, …, 9, let T_i : Ω₃ → Ω₃ be the mapping defined by

T_{i} (p, y) ≜ (p M_{i}, y + p \cdot v_{i}) / 3, (p, y) \in Ω_{3}

Theorem 3 which follows will tell us that the graph of h_3,3 is the attractor of the IFS ${T_{i}}_{i = 1}^{9}$ .
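The vectors v₁, …, v₉ of this example can be recomputed mechanically. The sketch below assumes that, for a type of norm k, Formula (7) gives log₂ of the multinomial coefficient (the number of distinct orderings of a string with those symbol counts); this assumption reproduces the values α and β listed above. The names `row_entropy` and `T` are ours.

```python
from math import factorial, log2

# The nine matrices of Example 5, one tuple per row.
M = {
    1: ((3, 0, 0), (2, 1, 0), (2, 0, 1)),
    2: ((2, 0, 1), (1, 1, 1), (1, 0, 2)),
    3: ((2, 1, 0), (1, 2, 0), (1, 1, 1)),
    4: ((1, 0, 2), (0, 1, 2), (0, 0, 3)),
    5: ((1, 1, 1), (0, 2, 1), (0, 1, 2)),
    6: ((1, 2, 0), (0, 3, 0), (0, 2, 1)),
    7: ((1, 1, 1), (2, 0, 1), (2, 1, 0)),
    8: ((0, 1, 2), (1, 0, 2), (1, 1, 1)),
    9: ((0, 2, 1), (1, 1, 1), (1, 2, 0)),
}


def row_entropy(row):
    """log2 of the multinomial coefficient of the counts in `row` (assumed Formula (7))."""
    count = factorial(sum(row))
    for x in row:
        count //= factorial(x)
    return log2(count)


# v[i] holds the entropies of the three rows of M_i, as in Equation (19).
v = {i: tuple(row_entropy(r) for r in rows) for i, rows in M.items()}


def T(i, p, y):
    """T_i(p, y) = (p M_i, y + p . v_i) / 3."""
    rows = M[i]
    pM = tuple(sum(p[r] * rows[r][c] for r in range(3)) / 3 for c in range(3))
    dot = sum(p[r] * v[i][r] for r in range(3))
    return pM, (y + dot) / 3


print(v[5])
print(T(5, (1.0, 0.0, 0.0), 0.0))
```

Applying T₅ to the point ((1, 0, 0), 0) gives the point above the centre (1/3, 1/3, 1/3) of the triangle, with height β/3.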

Theorem 3

Let k ≥ 2 be arbitrary. The following statements hold:

(a): {T_M : M ∈ $ℳ$ (k)} is an IFS on Ω₃.
(b): h_3,k is self-affine and its graph is the attractor of the IFS in (a).
(c): For each M ∈ $ℳ$ (k),

T_{M} (p, h_{3, k} (p)) = (T_{M}^{*} (p), h_{3, k} (T_{M}^{*} (p))), p \in ℙ_{3}

(20)
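Part (c) suggests a simple way to generate points on the graph of h_3,k, in the spirit of the S_n(k) computations of Section 3: start from a point known to lie on the graph and apply all n-fold compositions of the maps T_M. The sketch below starts from ((1, 0, 0), 0), which lies on the graph because h_3,k vanishes at degenerate distributions. It assumes that the entropy of a type of norm k is log₂ of the corresponding multinomial coefficient; the function names are ours, and this is an illustration rather than the procedure used for the paper's figures.

```python
from math import factorial, log2


def matrices(k):
    """The k**2 matrices of Equations (15) and (16)."""
    out = []
    for a in range(k + 1):
        for b in range(k + 1 - a):
            c = k - a - b
            if a >= 1:             # v = (a, b, c) lies in V_1
                out.append(((a, b, c), (a - 1, b + 1, c), (a - 1, b, c + 1)))
            if b >= 1 and c >= 1:  # v = (a, b, c) lies in V_2
                out.append(((a, b, c), (a + 1, b - 1, c), (a + 1, b, c - 1)))
    return out


def row_entropy(row):
    """log2 of the multinomial coefficient of the counts in `row` (assumption)."""
    count = factorial(sum(row))
    for x in row:
        count //= factorial(x)
    return log2(count)


def apply_map(M, k, p, y):
    """T_M(p, y) = (p M / k, (y + p . v_M) / k), Equations (17)-(19)."""
    v = [row_entropy(r) for r in M]
    pM = tuple(sum(p[i] * M[i][j] for i in range(3)) / k for j in range(3))
    return pM, (y + sum(p[i] * v[i] for i in range(3))) / k


def graph_points(k, n):
    """Images of ((1, 0, 0), 0) under all n-fold compositions of the maps T_M."""
    points = [((1.0, 0.0, 0.0), 0.0)]
    for _ in range(n):
        points = [apply_map(M, k, p, y) for M in matrices(k) for (p, y) in points]
    return points


print(len(graph_points(3, 2)))  # 81
```

Different compositions can land on the same p; by part (c) they must then produce the same height, which gives a convenient numerical sanity check.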

Our proof of Theorem 3 requires a couple of lemmas, which follow.

Lemma 3

Let λ ∈ Λ(3,k)⁺, let ϕ ∈ Φ(3,k) be the function given in Example 4, and let ϕ(λ) = (λ₁, λ₂, …, λ_k). Suppose we write

λ = (k q_{1} + r_{1}, k q_{2} + r_{2}, k q_{3} + r_{3})

where the q_i's are non-negative, the r_i's belong to the set {1, 2, …, k}, and r₁ + r₂ + r₃ = 2k. Then

card {1 \leq i \leq k : λ_{i} = (q_{1}, q_{2} + 1, q_{3} + 1)} = k - r_{1}

(21)

card {1 \leq i \leq k : λ_{i} = (q_{1} + 1, q_{2}, q_{3} + 1)} = k - r_{2}

(22)

card {1 \leq i \leq k : λ_{i} = (q_{1} + 1, q_{2} + 1, q_{3})} = k - r_{3}

(23)

Proof

If each r_i < k, then by definition of ϕ(λ) in Example 4, the properties Equations (21)–(23) are true. Now suppose at least one r_i = k. Then exactly one r_i = k (since otherwise some r_i = 0, which is not allowed). By symmetry, we may suppose that r₁ = k. We may now express λ as

λ = ((q_{1} + 1) k, q_{2} k + r_{2}, q_{3} k + r_{3})

Since r₂, r₃ ∈ {1, 2, …, k − 1}, and r₂ + r₃ = k, the definition of ϕ(λ) tells us that

card {1 \leq i \leq k : λ_{i} = (q_{1} + 1, q_{2}, q_{3}) + (0, 1, 0)} = r_{2}

(24)

card {1 \leq i \leq k : λ_{i} = (q_{1} + 1, q_{2}, q_{3}) + (0, 0, 1)} = r_{3}

(25)

Equation (24) yields Equation (23), Equation (25) yields Equation (22), and Equation (21) is vacuously true because k − r₁ = 0.

Lemma 4

Let ϕ ∈ Φ(3, k) be the function given in Example 4. Properties (a.1)-(a.2) below are true for any matrix M in the set $ℳ$ (k).

(a.1): If λ is a type in Λ(3, k), then λM ∈ Λ(3, k) and ‖λM‖ = k‖λ‖;
(a.2): If λ is a type in Λ(3, k)⁺, and ϕ(λ) = (λ₁, λ₂, …, λ_k), then ϕ(λM) is some permutation of (λ₁M, λ₂M, …, λ_kM).

Proof

Property (a.1), whose proof we omit, is a simple consequence of the fact that each matrix in $ℳ$ (k) has row sums equal to k. Fix λ ∈ Λ(3, k)⁺ and fix M ∈ $ℳ$ (k). Let r(λ) = (r₁, r₂, r₃), and let

\begin{matrix} λ & = & (k q_{1} + r_{1}, k q_{2} + r_{2}, k q_{3} + r_{3}) \\ ϕ (λ) & = & (λ_{1}, λ_{2}, \dots, λ_{k}) \\ ϕ (λ M) & = & (μ_{1}, μ_{2}, \dots, μ_{k}) \end{matrix}

M is either of the form Equation (15) (Case 1) or of the form Equation (16) (Case 2). Throughout the rest of the proof, we employ the parameter β = (r₁ + r₂ + r₃)/k. As remarked in Example 4, β ∈ {0, 1, 2}.

Proof for Case 1

We have $λ M = (k q_{1}^{'} + r_{1}, k q_{2}^{'} + r_{2}, k q_{3}^{'} + r_{3})$ , where

\begin{array}{l} q_{1}^{'} & ≜ & q_{1} a + q_{2} (a - 1) + q_{3} (a - 1) + β (a - 1) \\ q_{2}^{'} & ≜ & q_{1} b + q_{2} (b + 1) + q_{3} b + β b \\ q_{3}^{'} & ≜ & q_{1} c + q_{2} c + q_{3} (c + 1) + β c \end{array}

Note that

(q_{1}, q_{2}, q_{3}) M = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'}) - β (a - 1, b, c)

(26)

If β = 0, then

λ_{i} = (q_{1}, q_{2}, q_{3}), μ_{i} = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'}), i = 1, 2, \dots, k

From Equation (26), we have $(q_{1}, q_{2}, q_{3}) M = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'})$ , and therefore Property (a.2) follows. If β = 1, by definition of ϕ(λ) and ϕ(λM) in Example 4,

card {1 \leq i \leq k : λ_{i} = (q_{1} + 1, q_{2}, q_{3})} = r_{1}

(27)

card {1 \leq i \leq k : λ_{i} = (q_{1}, q_{2} + 1, q_{3})} = r_{2}

(28)

card {1 \leq i \leq k : λ_{i} = (q_{1}, q_{2}, q_{3} + 1)} = r_{3}

(29)

\begin{matrix} card {1 \leq i \leq k : μ_{i} = (q_{1}^{'} + 1, q_{2}^{'}, q_{3}^{'})} = r_{1} \\ card {1 \leq i \leq k : μ_{i} = (q_{1}^{'}, q_{2}^{'} + 1, q_{3}^{'})} = r_{2} \\ card {1 \leq i \leq k : μ_{i} = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'} + 1)} = r_{3} \end{matrix}

Property (a.2) then follows if the equations

\begin{array}{l} (q_{1} + 1, q_{2}, q_{3}) M & = & (q_{1}^{'} + 1, q_{2}^{'}, q_{3}^{'}) \\ (q_{1}, q_{2} + 1, q_{3}) M & = & (q_{1}^{'}, q_{2}^{'} + 1, q_{3}^{'}) \\ (q_{1}, q_{2}, q_{3} + 1) M & = & (q_{1}^{'}, q_{2}^{'}, q_{3}^{'} + 1) \end{array}

are valid. These three equations can be seen to hold using the fact from Equation (26) that

(q_{1}, q_{2}, q_{3}) M = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'}) - (a - 1, b, c)

Finally, if β = 2,

card {1 \leq i \leq k : λ_{i} = (q_{1}, q_{2} + 1, q_{3} + 1)} = k - r_{1}

(30)

card {1 \leq i \leq k : λ_{i} = (q_{1} + 1, q_{2}, q_{3} + 1)} = k - r_{2}

(31)

card {1 \leq i \leq k : λ_{i} = (q_{1} + 1, q_{2} + 1, q_{3})} = k - r_{3}

(32)

\begin{array}{l} card {1 \leq i \leq k : μ_{i} = (q_{1}^{'}, q_{2}^{'} + 1, q_{3}^{'} + 1)} = k - r_{1} \\ card {1 \leq i \leq k : μ_{i} = (q_{1}^{'} + 1, q_{2}^{'}, q_{3}^{'} + 1)} = k - r_{2} \\ card {1 \leq i \leq k : μ_{i} = (q_{1}^{'} + 1, q_{2}^{'} + 1, q_{3}^{'})} = k - r_{3} \end{array}

Property (a.2) then follows if the equations

\begin{array}{l} (q_{1}, q_{2} + 1, q_{3} + 1) M = (q_{1}^{'}, q_{2}^{'} + 1, q_{3}^{'} + 1) \\ (q_{1} + 1, q_{2}, q_{3} + 1) M = (q_{1}^{'} + 1, q_{2}^{'}, q_{3}^{'} + 1) \\ (q_{1} + 1, q_{2} + 1, q_{3}) M = (q_{1}^{'} + 1, q_{2}^{'} + 1, q_{3}^{'}) \end{array}

are valid. These equations can be seen to hold using the fact from Equation (26) that

(q_{1}, q_{2}, q_{3}) M = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'}) - 2 (a - 1, b, c)

Proof for Case 2

We have

λ M = (k q_{1}^{'} + r_{1}^{'}, k q_{2}^{'} + r_{2}^{'}, k q_{3}^{'} + r_{3}^{'})

where

\begin{array}{l} q_{1}^{'} = q_{1} a + q_{2} (a + 1) + q_{3} (a + 1) + β (a + 1) - 1 \\ q_{2}^{'} = q_{1} b + q_{2} (b - 1) + q_{3} b + β b - 1 \\ q_{3}^{'} = q_{1} c + q_{2} c + q_{3} (c - 1) + β c - 1 \\ r_{i}^{'} = k - r_{i}, i = 1, 2, 3 \end{array}

Note that

(q_{1}, q_{2}, q_{3}) M = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'}) - β (a + 1, b, c) + (1, 1, 1)

(33)

If β = 0, then

λ_{i} = (q_{1}, q_{2}, q_{3}), μ_{i} = (q_{1}^{'} + 1, q_{2}^{'} + 1, q_{3}^{'} + 1), i = 1, 2, \dots, k

From Equation (33), we have

(q_{1}, q_{2}, q_{3}) M = (q_{1}^{'} + 1, q_{2}^{'} + 1, q_{3}^{'} + 1)

and therefore Property (a.2) follows. Now suppose β = 1. The entries of

(r_{1}^{'}, r_{2}^{'}, r_{3}^{'})

belong to {1, 2, …, k} and their sum is 2k. By Lemma 3,

\begin{array}{l} card {1 \leq s \leq k : μ_{s} = (q_{1}^{'}, q_{2}^{'} + 1, q_{3}^{'} + 1)} = k - r_{1}^{'} = r_{1} \\ card {1 \leq s \leq k : μ_{s} = (q_{1}^{'} + 1, q_{2}^{'}, q_{3}^{'} + 1)} = k - r_{2}^{'} = r_{2} \\ card {1 \leq s \leq k : μ_{s} = (q_{1}^{'} + 1, q_{2}^{'} + 1, q_{3}^{'})} = k - r_{3}^{'} = r_{3} \end{array}

In view of the fact that Equations (27–29) also hold, Property (a.2) then follows if the equations

\begin{array}{l} (q_{1} + 1, q_{2}, q_{3}) M = (q_{1}^{'}, q_{2}^{'} + 1, q_{3}^{'} + 1) \\ (q_{1}, q_{2} + 1, q_{3}) M = (q_{1}^{'} + 1, q_{2}^{'}, q_{3}^{'} + 1) \\ (q_{1}, q_{2}, q_{3} + 1) M = (q_{1}^{'} + 1, q_{2}^{'} + 1, q_{3}^{'}) \end{array}

are valid. These equations can be seen to hold using the fact from Equation (33) that

(q_{1}, q_{2}, q_{3}) M = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'}) - (a, b - 1, c - 1)

Thus, Property (a.2) holds. Finally, suppose that β = 2. The entries of $(r_{1}^{'}, r_{2}^{'}, r_{3}^{'})$ belong to {1, 2, …, k} and their sum is k. Under these conditions, no entry of $(r_{1}^{'}, r_{2}^{'}, r_{3}^{'})$ can be equal to k, and so all entries belong to the set {1, 2, …, k − 1}. By definition of ϕ(λM) in Example 4,

\begin{array}{l} card {1 \leq s \leq k : μ_{s} = (q_{1}^{'} + 1, q_{2}^{'}, q_{3}^{'})} = r_{1}^{'} = k - r_{1} \\ card {1 \leq s \leq k : μ_{s} = (q_{1}^{'}, q_{2}^{'} + 1, q_{3}^{'})} = r_{2}^{'} = k - r_{2} \\ card {1 \leq s \leq k : μ_{s} = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'} + 1)} = r_{3}^{'} = k - r_{3} \end{array}

In view of the fact that Equations (30–32) also hold, Property (a.2) then follows if the equations

\begin{array}{l} (q_{1}, q_{2} + 1, q_{3} + 1) M = (q_{1}^{'} + 1, q_{2}^{'}, q_{3}^{'}) \\ (q_{1} + 1, q_{2}, q_{3} + 1) M = (q_{1}^{'}, q_{2}^{'} + 1, q_{3}^{'}) \\ (q_{1} + 1, q_{2} + 1, q_{3}) M = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'} + 1) \end{array}

are valid. These equations can be seen to hold using the fact from Equation (33) that

(q_{1}, q_{2}, q_{3}) M = (q_{1}^{'}, q_{2}^{'}, q_{3}^{'}) - (2 a + 1, 2 b - 1, 2 c - 1)

Thus, Property (a.2) holds.
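Lemma 4 can be checked by brute force for small k. Example 4 is not restated here, so the function `phi` below is a reconstruction of ϕ from Equations (27)–(32): with q_i and r_i the quotient and remainder of the i-th entry of λ on division by k, and β = (r₁ + r₂ + r₃)/k, the children are k copies of q when β = 0, r_i copies of q + e_i when β = 1, and k − r_i copies of q + (1, 1, 1) − e_i when β = 2. This reconstruction is an assumption on our part, as are all function names.

```python
def phi(lam, k):
    """Reconstructed split of a type into k children (assumption, see lead-in)."""
    q = [x // k for x in lam]
    r = [x % k for x in lam]
    beta, rem = divmod(sum(r), k)
    if rem != 0:
        raise ValueError("the norm of the type must be divisible by k")
    if beta == 0:
        return [tuple(q)] * k
    children = []
    for i in range(3):
        if beta == 1:   # r_i copies of q + e_i
            child = tuple(q[j] + (1 if j == i else 0) for j in range(3))
            children += [child] * r[i]
        else:           # beta == 2: k - r_i copies of q + (1, 1, 1) - e_i
            child = tuple(q[j] + (0 if j == i else 1) for j in range(3))
            children += [child] * (k - r[i])
    return children


def matrices(k):
    """The k**2 matrices of Equations (15) and (16)."""
    out = []
    for a in range(k + 1):
        for b in range(k + 1 - a):
            c = k - a - b
            if a >= 1:
                out.append(((a, b, c), (a - 1, b + 1, c), (a - 1, b, c + 1)))
            if b >= 1 and c >= 1:
                out.append(((a, b, c), (a + 1, b - 1, c), (a + 1, b, c - 1)))
    return out


def times(lam, M):
    """Row vector lam times the 3 x 3 matrix M."""
    return tuple(sum(lam[i] * M[i][j] for i in range(3)) for j in range(3))


def types_of_norm(n):
    """All triples of non-negative integers summing to n."""
    return [(a, b, n - a - b) for a in range(n + 1) for b in range(n + 1 - a)]


def lemma4_holds(k, norm):
    """Check (a.1) and (a.2) for every type of the given norm and every M."""
    for lam in types_of_norm(norm):
        for M in matrices(k):
            image = times(lam, M)
            if sum(image) != k * norm:
                return False
            if sorted(phi(image, k)) != sorted(times(c, M) for c in phi(lam, k)):
                return False
    return True


print(lemma4_holds(3, 9))
```

For instance, with k = 3, λ = (2, 2, 5) and M = M₁ of Example 5, the children (0, 1, 2), (1, 0, 2), (1, 1, 1) are mapped to (6, 1, 2), (7, 0, 2), (7, 1, 1), which are exactly the children of λM = (20, 2, 5).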

Proof of Theorem 3

We first derive part (c) and then part (b) (part (a) is already taken care of, as remarked previously). We derive part (c) by establishing Equation (20) for a fixed M ∈ $ℳ$ (k). Let ϕ ∈ Φ(3, k) be the function given in Example 4 and recall that H_3,k denotes the entropy function H_ϕ on Λ(3, k). Referring to the definition of $T_{M}^{*}$ in Equation (17) and T_M in Equation (18), we see that proving Equation (20) is equivalent to proving

h_{3, k} (k^{- 1} p M) = k^{- 1} h_{3, k} (p) + k^{- 1} p \cdot v_{M}, p \in ℙ_{3}

(34)

We first show that

H_{3, k} (λ M) = H_{3, k} (λ) + λ \cdot v_{M}, λ \in Λ (3, k)

(35)

The proof is by induction on ‖λ‖. Equation (35) holds for ‖λ‖ = 1, which is the three cases λ = (1, 0, 0), λ = (0, 1, 0), λ = (0, 0, 1). Fix λ* ∈ Λ(3, k) for which ‖λ*‖ > 1, and for the induction hypothesis assume that Equation (35) holds when ‖λ‖ is smaller than ‖λ*‖. The proof by induction is then completed by showing that Equation (35) holds for λ = λ*. Let ϕ(λ*) = (λ₁, λ₂, …, λ_k). By the induction hypothesis,

H_{3, k} (λ_{i} M) = H_{3, k} (λ_{i}) + λ_{i} \cdot v_{M}, i = 1, 2, \dots, k

Adding,

\sum_{i = 1}^{k} H_{3, k} (λ_{i} M) = [\sum_{i = 1}^{k} H_{3, k} (λ_{i})] + λ^{*} \cdot v_{M}

(36)

By Lemma 4, ϕ(λ*M) is a permutation of (λ₁M, λ₂M, …, λ_kM), and so by Equation (3),

\sum_{i = 1}^{k} H_{3, k} (λ_{i} M) = H_{3, k} (λ^{*} M) - {log}_{2} N

where N is the number of permutations of the k-tuple (λ₁M, …, λ_kM). Similarly,

\sum_{i = 1}^{k} H_{3, k} (λ_{i}) = H_{3, k} (λ^{*}) - {log}_{2} N_{2}

where N₂ is the number of permutations of the k-tuple (λ₁, …, λ_k). Since M is nonsingular (its determinant is k), we must have N = N₂. Substituting the right hand sides of the previous two equations into Equation (36), we obtain Equation (35) for λ = λ*, completing the proof by induction. Dividing both sides of Equation (35) by ‖λ‖, and using the fact that ‖λM‖ = k‖λ‖, we see that

k H_{3, k} (λ M) / ‖ λ M ‖ = (H_{3, k} (λ) / ‖ λ ‖) + p_{λ} \cdot v_{M}

which, using Equation (8), becomes

k h_{3, k} ((λ M) / ‖ λ M ‖) = h_{3, k} (p_{λ}) + p_{λ} \cdot v_{M}

It is easy to see that

(λ M) / ‖ λ M ‖ = k^{- 1} p_{λ} M

Therefore,

h_{3, k} (k^{- 1} p_{λ} M) = k^{- 1} h_{3, k} (p_{λ}) + k^{- 1} (p_{λ} \cdot v_{M})

Equation (34) then follows since the set $ℙ_{3}^{*} = {p_{λ} : λ \in Λ (3, k)}$ is dense in ℙ₃ and h_3,k is a continuous function on ℙ₃, completing the derivation of part (c) of Theorem 3. All that remains is to prove part (b) of Theorem 3. Letting G = {(p, h_3,k(p)) : p ∈ ℙ₃} be the graph of h_3,k, part (c) is equivalent to the property that

T_{M} (G) \subset G, M \in ℳ (k)

(37)

Note that

\cup {T_{M}^{*} (ℙ_{3}) : M \in ℳ (k)} = ℙ_{3}

since the sets in the union form the quadratic partition of ℙ₃, and so ℙ₃ must be the attractor of the IFS ${T_{M}^{*} : M \in ℳ (k)}$. This fact, together with Equation (37), allows us to conclude (via Lemma B.1 of Appendix B) that G is the attractor of the IFS {T_M : M ∈ $ℳ$ (k)}, and h_3,k is self-affine because the T_M's are affine. Theorem 3(b) is therefore true.
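The key identity (35) and the determinant claim used in the proof can be tested numerically on small types. The sketch below computes H_3,k by the recursion quoted in the proof, H(λ) = Σ_i H(λ_i) + log₂ N, with N the number of distinct permutations of the k-tuple ϕ(λ) and H = 0 on types of norm 1. As Example 4 is not restated here, `phi` is our reconstruction of ϕ from Equations (27)–(32) and should be read as an assumption; all names are ours.

```python
from collections import Counter
from functools import lru_cache
from math import factorial, log2


def phi(lam, k):
    """Reconstructed split of a type into k children (assumption, see lead-in)."""
    q = [x // k for x in lam]
    r = [x % k for x in lam]
    beta, rem = divmod(sum(r), k)
    if rem != 0:
        raise ValueError("the norm of the type must be divisible by k")
    if beta == 0:
        return [tuple(q)] * k
    children = []
    for i in range(3):
        hit, miss, copies = (1, 0, r[i]) if beta == 1 else (0, 1, k - r[i])
        children += [tuple(q[j] + (hit if j == i else miss) for j in range(3))] * copies
    return children


@lru_cache(maxsize=None)
def H(lam, k):
    """Entropy of a type whose norm is a power of k, via the recursion above."""
    if sum(lam) == 1:
        return 0.0
    children = phi(lam, k)
    n_perms = factorial(k)
    for multiplicity in Counter(children).values():
        n_perms //= factorial(multiplicity)
    return sum(H(c, k) for c in children) + log2(n_perms)


def matrices(k):
    """The k**2 matrices of Equations (15) and (16)."""
    out = []
    for a in range(k + 1):
        for b in range(k + 1 - a):
            c = k - a - b
            if a >= 1:
                out.append(((a, b, c), (a - 1, b + 1, c), (a - 1, b, c + 1)))
            if b >= 1 and c >= 1:
                out.append(((a, b, c), (a + 1, b - 1, c), (a + 1, b, c - 1)))
    return out


def times(lam, M):
    """Row vector lam times the 3 x 3 matrix M."""
    return tuple(sum(lam[i] * M[i][j] for i in range(3)) for j in range(3))


def det3(M):
    """Determinant of a 3 x 3 integer matrix."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def identity_35_gap(lam, M, k):
    """|H(lam M) - H(lam) - lam . v_M|, which Equation (35) says is zero."""
    v = [H(row, k) for row in M]
    return abs(H(times(lam, M), k) - H(lam, k) - sum(x * y for x, y in zip(lam, v)))


print(identity_35_gap((1, 1, 0), matrices(2)[0], 2))
```

A quick hand check with k = 2, λ = (1, 1, 0) and the matrix with rows (2, 0, 0), (1, 1, 0), (1, 0, 1): here λM = (3, 1, 0), H(λM) = 2, H(λ) = 1 and λ · v_M = 1.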

5. Properties of Hierarchical Entropy Functions

We conclude the paper with a discussion of some properties of the self-affine hierarchical entropy functions h_2,k and h_3,k. For each m ∈ {2, 3} and each k ≥ 2, hierarchical entropy function h_m,k obeys the following properties.

P1: h_m,k is a continuous function on ℙ_m.
P2: If two probability vectors p₁, p₂ in ℙ_m are permutations of each other, then $h_{m, k} (p_{1}) = h_{m, k} (p_{2})$.
P3: If p ∈ ℙ_m is degenerate (meaning that it is a permutation of the vector (1, 0, 0, …, 0)), then h_m,k(p) = 0.
P4: For each p ∈ ℙ_m, $0 \leq h_{m, k} (p) \leq {log}_{2} m$.

Properties P1-P4 are simple consequences of what has gone before. For example, to see why the symmetry property P2 is true, first observe that H_m,k(λ₁) = H_m,k(λ₂) if λ₁, λ₂ are types which are permutations of each other; this symmetry property for entropy on types then extends to ℙ_m using the fact that the finitary source which induces h_m,k is entropy-stable.

The well-known Shannon entropy function h_m on ℙ_m is defined by

h_{m} (p_{1}, p_{2}, \dots, p_{m}) ≜ \sum_{i = 1}^{m} - p_{i} {log}_{2} p_{i}

where p_i log₂ p_i is taken to be zero if p_i = 0. We point out that h_m also satisfies properties P1-P4. In addition, h_m satisfies the property that it attains its maximum value at the equiprobable distribution (1/m, 1/m, …, 1/m). This property fails in general for the h_m,k functions, although it is true for some of them; for example, referring to Figure 2, we see that h_2,2 and h_2,4 do not reach their maximum at (1/2, 1/2), but h_2,3 does. It is an open problem to determine the maximum value of each h_2,k and h_3,k and to see where the maximum is attained.

The inequality

h_{m, k} (p) \leq h_{m} (p), p \in ℙ_{m}, m \in {2, 3}, k \geq 2

gives us a relationship between hierarchical entropy and Shannon entropy; it follows from the fact that every string in a hierarchical type class is of the same type. It is an open problem whether this inequality is strict at every non-degenerate p ∈ ℙ_m; we have proved this strict inequality property in some special cases (for example, m = k = 2).
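The inequality can be illustrated at the finitely many distributions p_λ with ‖λ‖ = k. The sketch below assumes, as in the examples of Sections 3 and 4, that such a type has hierarchical entropy equal to log₂ of its multinomial coefficient, so that h_m,k(p_λ) is that quantity divided by k. It says nothing about the open problem of strictness on all of ℙ_m; the function names are ours.

```python
from math import factorial, log2


def shannon(p):
    """Shannon entropy in bits, with 0 log 0 taken to be 0."""
    return -sum(x * log2(x) for x in p if x > 0)


def h_at_norm_k_type(lam):
    """h_{m,k}(p_lam) for a type lam of norm k (assumed log2 multinomial / k)."""
    k = sum(lam)
    count = factorial(k)
    for x in lam:
        count //= factorial(x)
    return log2(count) / k


def types(k, m):
    """All m-tuples of non-negative integers summing to k."""
    if m == 1:
        return [(k,)]
    return [(a,) + rest for a in range(k + 1) for rest in types(k - a, m - 1)]


for lam in [(1, 1), (2, 2), (1, 1, 1)]:
    p = [x / sum(lam) for x in lam]
    print(lam, h_at_norm_k_type(lam), shannon(p))
```

For example, at (1/2, 1/2) the Shannon entropy is 1 bit, while h_2,2 takes the value 1/2.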

Figure 1. Example 1 Tree Representations and Codeword Table.

Figure 2. Plots of $h_{2, k}^{*} (x) = h_{2, k} (x, 1 - x)$ for k = 2, 3, 4.

Figure 3. Quadratic Partition Of Triangle Q₃.

Acknowledgments

The work of the author was supported in part by National Science Foundation Grant CCF-0830457.

Appendix A

In this Appendix, we prove Theorem 1. In the following, the infinity norm ‖x‖_∞ of a vector x = (x₁, x₂, …, x_m) ∈ ℝ^m is defined as max_i |x_i|.

Lemma A.1

Let $f : ℙ_{m}^{*} \to ℝ$ be a function, and let

∊_{j} = max {| f (p_{λ_{1}}) - f (p_{λ_{2}}) | : {‖ λ_{1} - λ_{2} ‖}_{\infty} \leq 1, λ_{1}, λ_{2} \in Λ_{j} (m, k)}, j \geq 0

(38)

If $\sum_{j = 0}^{\infty} ∊_{j} < \infty$ , then f is uniformly continuous on $ℙ_{m}^{*}$ .

Proof

We show there exists B > 0 such that

sup {| f (q_{1}) - f (q_{2}) | : {‖ q_{1} - q_{2} ‖}_{\infty} \leq k^{- J}, q_{1}, q_{2} \in ℙ_{m}^{*}} \leq B \sum_{j = J}^{\infty} ∊_{j}, J \geq 0

from which the uniform continuity follows. It can be shown that the following two properties hold.

(p.1): For each j ≥ 0 and each pair of distinct types λ₀, λ ∈ Λ_j(m, k), the following is true. Letting I = m‖λ₀ − λ‖_∞, there exist types λ₁, λ₂, …, λ_I in Λ_j(m, k) such that λ_I = λ and

${‖ λ_{i} - λ_{i - 1} ‖}_{\infty} \leq 1, i = 1, 2, \dots, I$

(In other words, we can travel from λ₀ to λ via a path in Λ_j(m, k) consisting of I terms, with successive terms no more than distance 1 apart in the infinity norm.)
(p.2): There is a positive integer M for which the following is true. For each j ≥ 1 and each λ₀ ∈ Λ_j(m, k), there exist types λ₁, λ₂, …, λ_M in Λ_j(m, k) such that λ_M/k ∈ Λ_{j−1}(m, k) and

${‖ λ_{i} - λ_{i - 1} ‖}_{\infty} \leq 1, i = 1, 2, \dots, M$

(In other words, we can travel in Λ_j(m, k) from any type to a type divisible by k via a path consisting of M terms, with successive terms no more than distance 1 apart in the infinity norm.)

Let J ≥ 0. Suppose q₁, q₂ belong to $ℙ_{m}^{*}$ and ‖q₁ − q₂‖_∞ ≤ k^−J. Fix J′ > J and types λ₁, λ₂ in Λ_J′(m, k) such that q₁ = p_λ₁ and q₂ = p_λ₂. Starting at λ₁ and applying property (p.2) repeatedly (that is, for each j going backwards from j = J′ to j = J + 1), we obtain $λ_{1}^{'} \in Λ_{J} (m, k)$ such that

{‖ q_{1} - p_{λ_{1}^{'}} ‖}_{\infty} \leq M \sum_{j = J + 1}^{J'} k^{- j} \leq M k^{- J}

| f (q_{1}) - f (p_{λ_{1}^{'}}) | \leq M \sum_{j = J + 1}^{J'} ∊_{j} \leq M \sum_{j = J + 1}^{\infty} ∊_{j}

Similarly, we find $λ_{2}^{'} \in Λ_{J} (m, k)$ such that

{‖ q_{2} - p_{λ_{2}^{'}} ‖}_{\infty} \leq M k^{- J}

| f (q_{2}) - f (p_{λ_{2}^{'}}) | \leq M \sum_{j = J + 1}^{\infty} ∊_{j}

By the triangle inequality, we have

{‖ p_{λ_{1}^{'}} - p_{λ_{2}^{'}} ‖}_{\infty} \leq {‖ q_{1} - q_{2} ‖}_{\infty} + 2 M k^{- J} \leq (2 M + 1) k^{- J}

and then

{‖ λ_{1}^{'} - λ_{2}^{'} ‖}_{\infty} \leq 2 M + 1

Applying property (p.1),

| f (p_{λ_{1}^{'}}) - f (p_{λ_{2}^{'}}) | \leq m (2 M + 1) ∊_{J}

and then using the triangle inequality again,

| f (q_{1}) - f (q_{2}) | \leq m (2 M + 1) ∊_{J} + 2 M \sum_{j = J + 1}^{\infty} ∊_{j} \leq B \sum_{j = J}^{\infty} ∊_{j}

where B = m(2M +1).

Proof of Theorem 1

Let 𝒮 = {S(λ) : λ ∈ Λ(m, k)} be a finitary hierarchical source. For every λ ∈ Λ(m, k), we have H(S(kλ)) = kH(S(λ)) and hence the normalized entropies H(S(kλ))/‖kλ‖ and H(S(λ))/‖λ‖ coincide. It follows that there exists a unique function $f : ℙ_{m}^{*} \to [0, \infty)$ such that

f (p_{λ}) = H (S (λ)) / ‖ λ ‖, λ \in Λ (m, k)

It is easily seen that 𝒮 is entropy-stable by the definition in Section 2.2 if f can be extended to a continuous function on ℙ_m (which will be the hierarchical entropy function induced by 𝒮). This extension will be possible if f is uniformly continuous on $ℙ_{m}^{*}$ , and we establish this by showing that Σ_j ∊_j < ∞, where {∊_j} is the sequence in Equation (38). Let ϕ ∈ Φ(m, k) be such that 𝒮^ϕ = 𝒮. Let j ≥ 1 and let λ, µ be types in Λ_j(m, k) for which ‖λ − µ‖_∞ ≤ 1. Letting

ϕ (λ) = (λ_{1}, \dots, λ_{k}), ϕ (μ) = (μ_{1}, \dots, μ_{k})

it follows that

{‖ λ_{i} - μ_{i} ‖}_{\infty} \leq 1, i = 1, 2, \dots, k

(39)

and by Lemma 1 we have

H (S (λ)) = \sum_{i = 1}^{k} H (S (λ_{i})) + {log}_{2} N_{1}

H (S (μ)) = \sum_{i = 1}^{k} H (S (μ_{i})) + {log}_{2} N_{2}

where N₁, N₂ are positive integers ≤ k!. The latter two equations imply

| f (p_{λ}) - f (p_{μ}) | \leq k^{- 1} \sum_{i = 1}^{k} | f (p_{λ_{i}}) - f (p_{μ_{i}}) | + {log}_{2} (k!) / k^{j}

from which, using Inequality (39),

| f (p_{λ}) - f (p_{μ}) | \leq ∊_{j - 1} + {log}_{2} (k!) / k^{j}

We conclude that

∊_{j} \leq ∊_{j - 1} + {log}_{2} (k!) / k^{j}, j \geq 1

from which it follows that $\sum_{j = 0}^{\infty} ∊_{j} < \infty$. Applying Lemma A.1, we can now say that f is uniformly continuous on $ℙ_{m}^{*}$.

Appendix B

This Appendix proves some auxiliary results useful for proving Theorems 2–3. Henceforth, ‖x‖₂ shall denote the Euclidean norm of a vector x in a finite-dimensional Euclidean space.

Lemma B.1

Let 𝒯 be an IFS of contraction mappings on Ω_m. Let π be the projection mapping (p, y) → p from Ω_m onto ℙ_m. Suppose for each T ∈ 𝒯, there is a contraction mapping T* on ℙ_m such that T*(p) = π(T(p, y)) for every (p, y) in Ω_m, and suppose ℙ_m is the attractor of the IFS {T* : T ∈ 𝒯}. Suppose h : ℙ_m → ℝ is a continuous mapping whose graph G_h = {(p, h(p)) : p ∈ ℙ_m} satisfies the property

T (G_{h}) \subset G_{h}, T \in 𝒯

Then G_h is the attractor of 𝒯.

Proof

Let Q be the attractor of 𝒯. Since each mapping in 𝒯 maps the compact set G_h into itself, Q ⊂ G_h by uniqueness of the attractor. The proof is completed by showing the reverse inclusion G_h ⊂ Q. Since π(Q) is the attractor of the IFS {T* : T ∈ 𝒯}, we must have π(Q) = ℙ_m by assumption. Let (p, h(p)) be an arbitrary element of G_h. Since π(Q) = ℙ_m, there exists a point in Q of the form (p, y). But (p, y) and (p, h(p)) both belong to G_h, so y = h(p). We conclude (p, h(p)) belongs to Q, and therefore G_h ⊂ Q.

Lemma B.2

Let T* : ℙ_m → ℙ_m be a contraction mapping with contraction coefficient σ ∈ (0, 1), meaning that

{‖ T^{*} (p_{1}) - T^{*} (p_{2}) ‖}_{2} \leq σ {‖ p_{1} - p_{2} ‖}_{2}, p_{1}, p_{2} \in ℙ_{m}

Let c = (c₁, c₂, …, c_m) be a vector in ℝ^m and define its variance by

V (c) ≜ \sum_{i = 1}^{m} {(c_{i} - \bar{c})}^{2}

(40)

where c̄ is the average of the entries of c. Let T : Ω_m → Ω_m be the mapping

T (p, y) ≜ (T^{*} (p), σ (y + p \cdot c)), p \in ℙ_{m}, y \in ℝ

Then T is a contraction mapping if V(c) < σ⁻²(1 − σ²)².

Proof

By the intermediate value theorem, there is a real number λ in the interval [σ, 1) such that

V (c) = λ^{- 2} σ^{- 2} {(λ^{2} - σ^{2})}^{2}

(41)

Then T is a contraction if we show that

{‖ T (p, u) - T (q, v) ‖}_{2}^{2} \leq λ^{2} {‖ (p, u) - (q, v) ‖}_{2}^{2}

(42)

for p, q in ℙ_m and u, v ∈ ℝ. The left hand side of Inequality (42) is less than or equal to

σ^{2} {‖ p - q ‖}_{2}^{2} + {[σ (p - q) \cdot c + σ (u - v)]}^{2}

The right hand side of Inequality (42) is equal to

λ^{2} {‖ p - q ‖}_{2}^{2} + λ^{2} {(u - v)}^{2}

Therefore, we will be done if we can show that

(λ^{2} - σ^{2}) {‖ p - q ‖}_{2}^{2} + λ^{2} t^{2} - {[σ (p - q) \cdot c + σ t]}^{2} \geq 0

(43)

for all p, q in ℙ_m and all real numbers t. If V(c) = 0, then we are done because the left side of Inequality (43) is identically zero (this is because λ = σ and because (p − q) · c = 0 due to the fact that the components of c are constant). We assume V(c) > 0 and therefore λ > σ. Letting Q_p,q(t) for fixed p, q be the quadratic polynomial

Q_{p, q} (t) = λ^{2} t^{2} - {[σ (p - q) \cdot c + σ t]}^{2}

the plot of Q_p,q(t) is a parabola opening upward because the coefficient of t² is the positive number λ² − σ². Therefore, Q_p,q(t) possesses a unique global minimum over t and it is easy to compute

min_{t} Q_{p, q} (t) = \frac{λ^{2} σ^{2} {[(p - q) \cdot c]}^{2}}{σ^{2} - λ^{2}}

It follows that Inequality (43) will be true for all p, q, u, v if

{(λ^{2} - σ^{2})}^{2} {‖ p - q ‖}_{2}^{2} \geq λ^{2} σ^{2} {[(p - q) \cdot c]}^{2}

holds for all p, q in ℙ_m, which in turn will be true if we can show that

x \cdot c \leq λ^{- 1} σ^{- 1} (λ^{2} - σ^{2})

holds for all x = (x₁, x₂, …, x_m) ∈ ℝ^m for which

\sum_{i = 1}^{m} x_{i}^{2} = 1, \sum_{i = 1}^{m} x_{i} = 0

(44)

It is a simple exercise in Lagrange multipliers, which we omit, to show that the vector x = (x₁, …, x_m) satisfying the constraints in Equation (44) which maximizes the dot product x · c is the vector for which

x_{i} = (c_{i} - \bar{c}) / \sqrt{V (c)}, i = 1, 2, \dots, m

For this choice of x, x · c can be seen to be $\sqrt{V (c)}$ . Therefore, we will be done if

V (c) \leq λ^{- 2} σ^{- 2} {(λ^{2} - σ^{2})}^{2}

But this is true with equality, by Equation (41).
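The two computational ingredients of this proof, the variance of Equation (40) and the Lagrange-multiplier maximizer, are simple to check numerically. The sketch below is illustrative only; the test vector c and all function names are ours.

```python
from math import sqrt


def variance(c):
    """V(c) of Equation (40): sum of squared deviations from the mean (not divided by m)."""
    mean = sum(c) / len(c)
    return sum((x - mean) ** 2 for x in c)


def maximizer(c):
    """The vector x_i = (c_i - mean) / sqrt(V(c)); requires V(c) > 0."""
    mean = sum(c) / len(c)
    scale = sqrt(variance(c))
    return [(x - mean) / scale for x in c]


def contraction_threshold(sigma):
    """The bound sigma**-2 * (1 - sigma**2)**2 that V(c) must stay below in Lemma B.2."""
    return (1 - sigma ** 2) ** 2 / sigma ** 2


c = [0.0, 1.0, 3.0]            # arbitrary example vector
x = maximizer(c)
dot = sum(a * b for a, b in zip(x, c))
print(sum(x), sum(a * a for a in x))   # approximately 0 and 1: the constraints (44)
print(dot, sqrt(variance(c)))          # the two values agree up to rounding
print(contraction_threshold(1 / 3))    # equals (k - 1/k)**2 for k = 3
```

With σ = k⁻¹ the threshold becomes (k − k⁻¹)², which is the form used in Lemmas B.3 and B.4.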

Lemma B.3

Let k ≥ 2 be arbitrary. Then, for each i = 0, 1, …, k − 1, the mapping T_i : Ω₂ → Ω₂ defined in Section 3 is a contraction.

Proof

Fix i in {0, 1, …, k − 1}. The mapping $T_{i}^{*} : ℙ_{2} \to ℙ_{2}$ is a contraction mapping with contraction coefficient k⁻¹. Applying Lemma B.2 with σ = k⁻¹, T_i will be a contraction mapping if we can show that

V ({log}_{2} (\begin{matrix} k \\ i + 1 \end{matrix}), {log}_{2} (\begin{matrix} k \\ i \end{matrix})) < {(k - k^{- 1})}^{2}, k \geq 2

(45)

It is easy to compute that

V ({log}_{2} (\begin{matrix} k \\ i + 1 \end{matrix}), {log}_{2} (\begin{matrix} k \\ i \end{matrix})) = V ({log}_{2} a_{1}, {log}_{2} a_{2})

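where a₁ = i + 1, a₂ = k − i. For any constant γ satisfying 0 < γ < 1, we have (since the variance of a vector c is no larger than the sum of the squares (c_j − s)² for any real number s, here taken to be s = log₂(γk))

V ({log}_{2} a_{1}, {log}_{2} a_{2}) \leq \sum_{j = 1}^{2} {[{log}_{2} (\frac{γ k}{a_{j}})]}^{2}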

(46)

Using the fact that

γ \leq \frac{γ k}{a_{j}} \leq γ k, j = 1, 2

the right side of Inequality (46) is upper bounded by

2 max [{({log}_{2} γ)}^{2}, {({log}_{2} {γ k})}^{2}]

Choosing the smallest value of γ for which

{({log}_{2} {γ k})}^{2} \geq {({log}_{2} γ)}^{2}

holds for every k ≥ 2, we obtain $γ = 1 / \sqrt{2}$. We have thus proved the variance bound

V ({log}_{2} a_{1}, {log}_{2} a_{2}) \leq 2 {[{log}_{2} (\frac{k}{\sqrt{2}})]}^{2}, k \geq 2

Using calculus, it is easy to show that

\sqrt{2} {log}_{2} (\frac{k}{\sqrt{2}}) < k - 0.5, k \geq 2

Thus, Inequality (45) holds, and our proof is complete.
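Inequality (45) and the two auxiliary facts used in this proof can be confirmed numerically for a range of k. This is only a finite check, not a substitute for the argument above; the function names and the range of k are ours.

```python
from math import comb, log2, sqrt


def variance(c):
    """V(c) of Equation (40)."""
    mean = sum(c) / len(c)
    return sum((x - mean) ** 2 for x in c)


def inequality_45_holds(k):
    """Check Inequality (45) for every i in {0, 1, ..., k - 1}."""
    bound = (k - 1 / k) ** 2
    return all(
        variance([log2(comb(k, i + 1)), log2(comb(k, i))]) < bound
        for i in range(k)
    )


def variance_identity_gap(k, i):
    """Difference between V(log2 C(k,i+1), log2 C(k,i)) and V(log2(i+1), log2(k-i))."""
    lhs = variance([log2(comb(k, i + 1)), log2(comb(k, i))])
    rhs = variance([log2(i + 1), log2(k - i)])
    return abs(lhs - rhs)


def calculus_bound_holds(k):
    """Check sqrt(2) * log2(k / sqrt(2)) < k - 0.5."""
    return sqrt(2) * log2(k / sqrt(2)) < k - 0.5


print(all(inequality_45_holds(k) for k in range(2, 200)))
```

The identity checked by `variance_identity_gap` holds because the variance of a two-entry vector depends only on the difference of its entries, and C(k, i + 1)/C(k, i) = (k − i)/(i + 1).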

Lemma B.4

Let k ≥ 2 be arbitrary. Then, for each matrix M in the set of matrices $ℳ$ (k), the mapping T_M : Ω₃ → Ω₃ defined in Section 4 is a contraction.

Proof

The mapping $T_{M}^{*} : ℙ_{3} \to ℙ_{3}$ is a contraction with contraction coefficient k⁻¹. Applying Lemma B.2 with σ = k⁻¹, we have to show that various variances are all less than (k − k⁻¹)². Specifically, for each (a, b, c) ∈ V₁ we wish to show

V (H_{3, k} (a, b, c), H_{3, k} (a - 1, b + 1, c), H_{3, k} (a - 1, b, c + 1)) < {(k - k^{- 1})}^{2}

(47)

and for each (a, b, c) ∈ V₂ we wish to show

V (H_{3, k} (a, b, c), H_{3, k} (a + 1, b - 1, c), H_{3, k} (a + 1, b, c - 1)) < {(k - k^{- 1})}^{2}

(48)

Using Formula (7), the variance on the left side of Inequality (47) is equal to

V ({log}_{2} a, {log}_{2} (b + 1), {log}_{2} (c + 1))

Let a₁ = a, a₂ = b + 1, a₃ = c + 1. For any constant γ satisfying 0 < γ < 1, we have

V ({log}_{2} a, {log}_{2} (b + 1), {log}_{2} (c + 1)) \leq \sum_{i = 1}^{3} {[{log}_{2} (\frac{γ (k + 1)}{a_{i}})]}^{2}

(49)

Using the fact that

γ \leq \frac{γ (k + 1)}{a_{i}} \leq γ (k + 1), i = 1, 2, 3

the right side of Inequality (49) is upper bounded by

3 max [{({log}_{2} γ)}^{2}, {({log}_{2} {γ (k + 1)})}^{2}]

Choosing the smallest value of γ for which

{({log}_{2} {γ (k + 1)})}^{2} \geq {({log}_{2} γ)}^{2}

holds for every k ≥ 2, we obtain $γ = 1 / \sqrt{3}$. We have thus proved the variance bound

V ({log}_{2} a, {log}_{2} (b + 1), {log}_{2} (c + 1)) \leq 3 {[{log}_{2} (\frac{k + 1}{\sqrt{3}})]}^{2}, (a, b, c) \in V_{1}

Similarly, the variance on the left side of Inequality (48) is V(log₂(a + 1), log₂ b, log₂ c), and

V ({log}_{2} (a + 1), {log}_{2} b, {log}_{2} c) \leq 3 {[{log}_{2} (\frac{k + 1}{\sqrt{3}})]}^{2}, (a, b, c) \in V_{2}

Using calculus, it is easy to show that

\sqrt{3} {log}_{2} (\frac{k + 1}{\sqrt{3}}) < k - 0.5, k \geq 2

Thus, for each (a, b, c) ∈ V₁, we have the desired inequality

V ({log}_{2} a, {log}_{2} (b + 1), {log}_{2} (c + 1)) < {(k - k^{- 1})}^{2}

and for each (a, b, c) ∈ V₂, we have the desired inequality

V ({log}_{2} (a + 1), {log}_{2} b, {log}_{2} c) < {(k - k^{- 1})}^{2}
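Inequalities (47) and (48) can likewise be checked for small k. The sketch below assumes, as elsewhere in our examples, that Formula (7) gives log₂ of the multinomial coefficient for a type of norm k. It also checks the reduction of the entropy variances to variances of logarithms. The function names and the range of k are ours.

```python
from math import factorial, log2


def variance(c):
    """V(c) of Equation (40)."""
    mean = sum(c) / len(c)
    return sum((x - mean) ** 2 for x in c)


def entropy(row):
    """log2 of the multinomial coefficient of the counts in `row` (assumed Formula (7))."""
    count = factorial(sum(row))
    for x in row:
        count //= factorial(x)
    return log2(count)


def v1_cases(k):
    """For each (a, b, c) in V_1: (variance in (47), variance of the three logarithms)."""
    out = []
    for a in range(1, k + 1):
        for b in range(k - a + 1):
            c = k - a - b
            rows = [(a, b, c), (a - 1, b + 1, c), (a - 1, b, c + 1)]
            out.append((variance([entropy(r) for r in rows]),
                        variance([log2(a), log2(b + 1), log2(c + 1)])))
    return out


def v2_cases(k):
    """For each (a, b, c) in V_2: (variance in (48), variance of the three logarithms)."""
    out = []
    for a in range(k - 1):
        for b in range(1, k - a):
            c = k - a - b
            rows = [(a, b, c), (a + 1, b - 1, c), (a + 1, b, c - 1)]
            out.append((variance([entropy(r) for r in rows]),
                        variance([log2(a + 1), log2(b), log2(c)])))
    return out


def lemma_b4_holds(k):
    """Check Inequalities (47) and (48) for every point of V_1 and V_2."""
    bound = (k - 1 / k) ** 2
    return all(v < bound for v, _ in v1_cases(k) + v2_cases(k))


print(all(lemma_b4_holds(k) for k in range(2, 26)))
```

For k = 3 and (a, b, c) = (1, 1, 1) in V₁, both variances equal 2/3, well below the bound (3 − 1/3)² = 64/9.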

References

1. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed.; Cambridge University Press: Cambridge, UK, 2011.
2. Kieffer, J. Hierarchical Type Classes and Their Entropy Functions. Proceedings of the 1st International Conference on Data Compression, Communication and Processing, Palinuro, Campania, Italy, 21–24 June 2011; pp. 246–254.
3. Oh, S.-Y. Information Theory of Random Trees Induced by Stochastic Grammars. Ph.D. Thesis, University of Minnesota Twin Cities, Department of Electrical & Computer Engineering, Minneapolis, MN, USA, 2011.
4. Brualdi, R. Algorithms for constructing (0,1)-matrices with prescribed row and column sum vectors. Discret. Math. 2006, 306, 3054–3062.
5. Fonseca, C.; Mamede, R. On (0,1)-matrices with prescribed row and column sum vectors. Discret. Math. 2009, 309, 2519–2527.
6. Falconer, K. Fractal Geometry; John Wiley & Sons: Hoboken, NJ, USA, 2003.
7. Soifer, A. How Does One Cut a Triangle? Springer-Verlag: Berlin, Heidelberg, Germany, 2010.

© 2011 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/.)
