Infinite Excess Entropy Processes with Countable-State Generators

We present two examples of finite-alphabet, infinite excess entropy processes generated by invariant hidden Markov models (HMMs) with countable state sets. The first, simpler example is not ergodic, but the second is. It appears these are the first constructions of processes of this type. Previous examples of infinite excess entropy processes over finite alphabets admit only invariant HMM presentations with uncountable state sets.


I. INTRODUCTION
For a stationary process (X_t), the excess entropy E is the mutual information between the infinite past ←X = ...X_{-2}X_{-1} and the infinite future →X = X_0X_1.... It has a long history and is widely employed as a measure of correlation and complexity in a variety of fields, from ergodic theory and dynamical systems to neuroscience and linguistics [1-6]; see Ref. [7] and references therein for a review.
An important question in classifying a given process is whether it is finitary (finite excess entropy) or infinitary (infinite excess entropy). Over a finite alphabet, many of the simple process classes commonly studied are always finitary. These include all i.i.d. processes, Markov chains, and processes with finite-state hidden Markov model (HMM) presentations. Several well-known examples of finite-alphabet infinitary processes do exist, however. For instance, the symbolic dynamics at the onset of chaos in the logistic map and similar dynamical systems [7] and the stationary representation of the binary Fibonacci sequence [8] are both infinitary.
These latter processes, however, only admit invariant HMM presentations [*] with uncountable state sets. Indeed, any process generated by an invariant countable-state HMM either has positive entropy rate or consists entirely of periodic sequences, and these processes satisfy neither condition; see App. B. Versions of the Santa Fe Process introduced in Ref. [6] are finite-alphabet infinitary processes with positive entropy rate. However, they were not constructed directly as HMMs, and it seems unlikely that they have any invariant countable-state presentations. To the best of our knowledge, to date there are no examples of finite-alphabet, infinitary processes with invariant countable-state presentations.
We present two such examples. The first is nonergodic, and the information conveyed from the past to the future essentially consists of the ergodic component along a given realization. This example is straightforward to construct and, though previously unpublished, we suspect that others are aware of this or similar constructions. The second, ergodic example, though, is more involved and we believe that both its structure and properties are novel.
To put these contributions in perspective, note that any stationary finite-alphabet process may be trivially represented as an invariant HMM with an uncountable state set, in which each infinite history ←x corresponds to a single state. Thus, it is clear that invariant HMMs with uncountable state sets can generate finite-alphabet infinitary processes. In contrast, for any finite-state HMM, E is always finite, bounded by the logarithm of the number of states. The case of countable-state HMMs lies in between the finite-state and uncountable-state cases, and it was previously not clear whether it is possible to have countable-state invariant HMMs that generate infinitary finite-alphabet processes and, in particular, ergodic ones. Here, we show that infinite excess entropy is indeed possible for processes generated by invariant countable-state HMMs, even ergodic ones.

II. BACKGROUND

A. Excess Entropy
Definition 1. For a stationary, finite-alphabet process (X_t)_{t∈Z}, the excess entropy E is the mutual information between the infinite past ←X = ...X_{-2}X_{-1} and the infinite future →X = X_0X_1...:

    E = I[←X; →X] = lim_{t→∞} I[←X^t; →X^t],    (1)

where ←X^t = X_{-t}...X_{-1} and →X^t = X_0...X_{t-1} are the length-t past and future, respectively.
In Refs. [7,9] it is shown that E may also be expressed alternatively as:

    E = lim_{t→∞} (H[→X^t] − h_µ · t),    (2)

where h_µ is the process entropy rate:

    h_µ = lim_{t→∞} H[→X^t] / t.    (3)

That is, the excess entropy E is the asymptotic amount of entropy (information) in length-t blocks of random variables beyond that explained by the entropy rate. The excess entropy derives its name from this formulation. We also use this formulation to establish that the process of Sec. III A is infinitary.
Expanding Eq. (2) with the chain rule and recombining terms gives another important formulation:

    E = Σ_{t=1}^{∞} (h_µ(t) − h_µ),    (4)

where h_µ(t) is the length-t entropy-rate approximation:

    h_µ(t) = H[X_{t−1} | X_0 ... X_{t−2}],    (5)

the conditional entropy in the t-th symbol given the previous t − 1 symbols. This final formulation will be used to establish that the process of Sec. III B is infinitary.
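As a quick numeric sanity check of this partial-sum formulation, the following sketch evaluates it for a biased i.i.d. coin, where every h_µ(t) equals h_µ and hence E = 0. This is a minimal illustration of ours; the bias value 0.3 is arbitrary.

```python
from itertools import product
from math import log2

# Numeric check of E = sum_t (h_mu(t) - h_mu) for a biased i.i.d. coin,
# a finitary process with E = 0.  H(t) is the length-t block entropy and
# h_mu(t) = H(t) - H(t-1) the entropy-rate approximation.
p = 0.3  # P(X = 1); an arbitrary illustrative bias

def block_entropy(t):
    """H[X_0 ... X_{t-1}], computed by brute force over all 2^t words."""
    H = 0.0
    for w in product([0, 1], repeat=t):
        pw = 1.0
        for x in w:
            pw *= p if x == 1 else 1 - p
        H -= pw * log2(pw)
    return H

h_mu = -(p * log2(p) + (1 - p) * log2(1 - p))  # entropy rate of the coin
E = sum(block_entropy(t) - block_entropy(t - 1) - h_mu for t in range(1, 8))
print(E)  # ~0, up to floating-point error: i.i.d. processes are finitary
```

For an infinitary process the analogous partial sums would grow without bound rather than converge.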

B. Hidden Markov Models
There are two primary types of hidden Markov models: edge-emitting (or Mealy) and state-emitting (or Moore). We work with the former, edge-emitting type, but the two are equivalent in that any model of one type over a finite alphabet may be converted to a model of the other type without changing the cardinality of the state set by more than a constant factor (the alphabet size). Thus, for our purposes, Mealy HMMs are sufficiently general. We also consider only invariant HMMs, as defined in Ref. [9], over finite alphabets and with countable state sets.

Definition 2. An invariant, edge-emitting, countable-state, finite-alphabet hidden Markov model (hereafter referred to simply as a countable-state HMM) is a 4-tuple (S, X, {T^{(x)}}, π) where:

1. S is a countable set of states.
2. X is a finite alphabet of output symbols.
3. T^{(x)}, x ∈ X, are symbol-labeled transition matrices: T^{(x)}_{σσ'} is the probability that state σ transitions to state σ' on symbol x.

4. π is an invariant or stationary distribution for the underlying Markov chain over states with transition matrix T = Σ_{x∈X} T^{(x)}. That is, π satisfies π = πT.
Remark. "Countable" in Property 1 means either finite or countably infinite. If the state set S is finite, we also refer to the HMM as finite-state.
A hidden Markov model may be depicted as a directed graph with labeled edges. The vertices are the states σ ∈ S and, for all σ, σ' ∈ S with T^{(x)}_{σσ'} > 0, there is a directed edge from state σ to state σ' labeled p|x for the symbol x and transition probability p = T^{(x)}_{σσ'}. These probabilities are normalized so that the sum of probabilities on all outgoing edges from each state is 1. An example is given in Fig. 1.
FIG. 1: A hidden Markov model (the ε-machine) for the Even Process. The support of this process consists of all binary sequences in which blocks of uninterrupted 1s are even in length, bounded by 0s. After each even length is reached, there is a probability p of breaking the block of 1s by inserting a 0. The machine has two internal states S = {σ_1, σ_2}, a two-symbol alphabet X = {0, 1}, and a single parameter p ∈ (0, 1) that controls the transition probabilities. The associated Markov chain over states is finite-state and irreducible and, thus, has a unique stationary distribution π = (π_1, π_2). The graphical representation of the machine is given on the left, with the corresponding transition matrices on the right. In the graphical representation the symbols labeling the transitions are colored blue, for visual contrast, while the transition probabilities are black.

The operation of an HMM may be thought of as a weighted random walk on the associated graph. That is, from the current state σ the next state σ' is determined by following an outgoing edge from σ according to the edges' relative probabilities (or weights). During the transition, the HMM outputs the symbol x labeling this edge.
The state sequence (S t ) determined in such a fashion is simply a Markov chain with transition matrix T . However, we are interested not simply in the state sequence of the HMM, but rather the associated sequence of output symbols (X t ) that are generated by reading the labels off the edges as they are followed. The interpretation is that an observer of the HMM may directly observe this sequence of output symbols, but not the hidden internal states. Alternatively, one may consider the Markov chain over edges (E t ), of which the observed symbol sequence (X t ) is simply a projection.
In either case, the process (X_t) generated by the HMM (S, X, {T^{(x)}}, π) is defined as the output sequence of edge symbols that results from running the Markov chain over states according to the stationary law with marginals P(S_0) = P(S_t) = π. It is easy to verify that this process is itself stationary, with word probabilities given by:

    P(w) = π T^{(w)} 1,    (6)

where, for a given word w = w_1...w_n ∈ X^+, T^{(w)} is the word transition matrix T^{(w)} = T^{(w_1)} ··· T^{(w_n)} and 1 is the all-1s column vector. The process language is the set of words L = {w : P(w) > 0}.
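The word-probability formula P(w) = π T^{(w)} 1 above is easy to evaluate numerically. The following sketch does so for the Even Process of Fig. 1; the parameter value p = 0.5, the derived stationary distribution, and the helper name `word_prob` are our illustrative choices, not fixed by the text.

```python
import numpy as np

# Word probabilities P(w) = pi T^(w) 1 for the Even Process of Fig. 1.
p = 0.5
T = {  # symbol-labeled transition matrices T^(x), read off Fig. 1
    0: np.array([[p, 0.0], [0.0, 0.0]]),
    1: np.array([[0.0, 1.0 - p], [1.0, 0.0]]),
}
# Stationary distribution of T = T^(0) + T^(1), solved by hand:
pi = np.array([1.0, 1.0 - p]) / (2.0 - p)

def word_prob(word):
    v = pi.copy()
    for x in word:
        v = v @ T[x]          # multiply in T^(w_1) ... T^(w_n)
    return float(v.sum())     # right-multiply by the all-1s vector

print(word_prob((1, 1, 0)))  # positive: an even block of 1s ended by a 0
print(word_prob((0, 1, 0)))  # 0: an isolated 1 is forbidden
```

Words containing an odd, 0-bounded block of 1s get probability 0, so the support matches the Even Process description in the Fig. 1 caption.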
Remark. Even for a noninvariant HMM (S, X, {T^{(x)}}, π), where the state distribution π is not stationary, one may always define a one-sided process (X_t)_{t≥0} with marginals given by:

    P(X_0 ... X_{n−1} = w) = π T^{(w)} 1.    (7)

Furthermore, though the state sequence (S_t)_{t≥0} will not be a stationary process if π is not a stationary distribution for T, the output sequence (X_t)_{t≥0} may still be stationary. In fact, Ref.
[9, Example 2.9] showed that any one-sided process over a finite alphabet X, stationary or not, may be represented as a countable-state noninvariant HMM in which the states correspond to finite-length words in X^+, of which there are only countably many. By stationarity, a one-sided stationary process generated by such a noninvariant HMM can be uniquely extended to a two-sided stationary process. So, in a sense, any two-sided stationary process (X_t)_{t∈Z} can be said to be generated by a noninvariant countable-state HMM. This is, however, a slightly unnatural interpretation of process generation, in that the two-sided process (X_t)_{t∈Z} is not literally the process obtained by reading symbols off the edges of the HMM as it transitions between states in bi-infinite time. In any case, the space of stationary finite-alphabet processes generated by noninvariant countable-state HMMs is too large: it includes all stationary finite-alphabet processes. Clearly, then, if one allows finite-alphabet processes generated by noninvariant countable-state HMMs, there are infinitary examples. For this reason we restrict to invariant HMMs, for which both the state sequence (S_t) and output sequence (X_t) are stationary. In the following development, HMM will implicitly mean invariant HMM; the qualifier will not be repeated.
We consider now an important property known as unifilarity. This property is useful in that many quantities are analytically computable only for unifilar HMMs. In particular, for unifilar HMMs the entropy rate h µ is often directly computable, unlike the nonunifilar case. Both of the examples constructed in Sec. III are unifilar, as is the Even Process HMM of Fig. 1.
Definition 3. An HMM (S, X, {T^{(x)}}, π) is unifilar if for each σ ∈ S and x ∈ X there is at most one outgoing edge from state σ labeled with symbol x in the associated graph G.
It is well known that for any finite-state unifilar HMM the entropy rate of the output process (X_t) is simply the conditional entropy in the next symbol given the current state:

    h_µ = Σ_{σ∈S} π_σ h_σ,    (8)

where π_σ is the stationary probability of state σ and h_σ = H[X_0 | S_0 = σ] is the conditional entropy in the next symbol given that the current state is σ.
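For the Even Process of Fig. 1, Eq. (8) reduces to a two-term sum, since σ_2 emits 1 deterministically. A minimal sketch, assuming p = 0.5 and our own derivation π = (1, 1 − p)/(2 − p) for the stationary distribution:

```python
from math import log2

# Eq. (8) for the Even Process of Fig. 1: state sigma_1 emits 0 or 1
# with probabilities (p, 1 - p), so h_{sigma_1} is the binary entropy
# H(p); sigma_2 emits 1 deterministically, so h_{sigma_2} = 0.
p = 0.5  # arbitrary illustrative value
pi_1, pi_2 = 1.0 / (2.0 - p), (1.0 - p) / (2.0 - p)  # stationary (our derivation)
h_sigma1 = -(p * log2(p) + (1 - p) * log2(1 - p))    # binary entropy H(p)
h_sigma2 = 0.0                                       # deterministic emission
h_mu = pi_1 * h_sigma1 + pi_2 * h_sigma2
print(h_mu)  # 2/3 for p = 0.5, since H(0.5) = 1 and pi_1 = 2/3
```

The unifilarity of the machine is what licenses this closed form; for nonunifilar presentations no such per-state sum is available in general.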
We are unaware, though, of any proof that this is generally true for countable-state HMMs. If the entropy in the stationary distribution H[π] is finite, then a proof along the lines given in Ref. [10] carries through to the countable-state case and Eq. (8) still holds. However, countable-state HMMs may sometimes have H[π] = ∞. Furthermore, it can be shown [9] that the excess entropy E is always bounded above by H[π]. So, for the infinitary process of Sec. III B we need slightly more than unifilarity to establish the value of h_µ. To this end, we consider a property known as exactness [11].
In App. A we prove the following proposition.
Proposition 1. For any countable-state, exact, unifilar HMM, the entropy rate is given by the standard formula of Eq. (8).
The HMM constructed in Sec. III B is both exact and unifilar, so Prop. 1 applies. Using this explicit formula for h_µ, we are able to bound the entropy-rate approximations h_µ(t) from below and thereby establish that the excess entropy diverges.

III. CONSTRUCTIONS
We present two constructions of (invariant) countable-state HMMs that generate infinitary processes. In the first example the output process is not ergodic, but in the second it is.
A. Heavy-Tailed Periodic Mixture: An infinitary nonergodic process with a countable-state presentation

Figure 2 depicts a countable-state HMM M for a nonergodic infinitary process P. The machine M consists of a countable collection of disjoint, strongly connected subcomponents M_i, i ≥ 2. For each i, the component M_i generates the periodic process P_i consisting of repetitions of i − 1 1s followed by a 0. The weighting (µ_i) over components is taken to be a heavy-tailed distribution with infinite entropy. For this reason, we refer to the process M generates as the Heavy-Tailed Periodic Mixture (HPM) Process.
Intuitively, the information transmitted from the past to the future of the HPM Process is the ergodic component i along with the phase of the period-i process P_i within this component. This is more information than the ergodic component i alone, which is itself already an infinite amount of information: H[(µ_2, µ_3, ...)] = ∞. Hence, E should be infinite. This intuition can be made precise using the ergodic decomposition theorem of Debowski [12], but we present a more direct proof here.

Proposition 2. The excess entropy E of the HPM Process is infinite.

Proof. For the HPM Process P we will show that (i) lim_{t→∞} H[→X^t] = ∞ and (ii) h_µ = 0. The conclusion then follows immediately from Eq. (2). To this end, we define the sets:

    W_{i,t} = {w : |w| = t and w is in the support of the component process P_i},

along with U_t = ∪_{i≤t/2} W_{i,t} and V_t = ∪_{i>t/2} W_{i,t}. Recall from Fig. 2 that each component M_i is a deterministic cycle over states σ_{i1}, ..., σ_{ii}, with T^{(1)}_{σ_{ij}σ_{i(j+1)}} = 1 for 1 ≤ j < i, T^{(0)}_{σ_{ii}σ_{i1}} = 1 for i ≥ 2, and all other transition probabilities 0. Note that all logs here (and throughout) are taken base 2, as is typical when using information-theoretic quantities.
Note that any word w ∈ W_{i,t} with i ≤ t/2 contains at least two 0s. Therefore:

1. No two distinct states σ_{ij} and σ_{i'j'} with i, i' ≤ t/2 generate the same length-t word.
2. The sets W_{i,t}, i ≤ t/2, are disjoint from each other and from V_t.
It follows that each word w ∈ W_{i,t}, with i ≤ t/2, can be generated only from a single state σ_{ij} of the HMM and has probability:

    P(w) = µ_i / i.    (9)

Hence, for any fixed t:

    H[→X^t] ≥ Σ_{i=2}^{⌊t/2⌋} Σ_{w∈W_{i,t}} P(w) log(1/P(w)) = Σ_{i=2}^{⌊t/2⌋} µ_i log(i/µ_i) ≥ Σ_{i=2}^{⌊t/2⌋} µ_i log(1/µ_i),    (10)

which diverges as t → ∞ since H[(µ_2, µ_3, ...)] = ∞. This proves Claim (i). Now, to prove Claim (ii), consider the quantity:

    h_µ(t + 1) = H[X_t | →X^t] = Σ_w P(→X^t = w) H[X_t | →X^t = w].    (11)

On the one hand, for w ∈ U_t, H[X_t | →X^t = w] = 0, since the current state and, hence, entire future are completely determined by any word w ∈ U_t. On the other hand, for w ∈ V_t, H[X_t | →X^t = w] ≤ 1, since the alphabet is binary. Moreover, the combined probability of all words in the set V_t is simply the probability of starting in some component M_i with i > t/2: P(V_t) = Σ_{i>t/2} µ_i. Thus, by Eq. (11), h_µ(t + 1) ≤ Σ_{i>t/2} µ_i. Since Σ_i µ_i converges, it follows that h_µ(t) → 0 and, hence, h_µ = 0, which verifies Claim (ii).
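The construction only requires that the component weights be heavy-tailed with H[(µ_2, µ_3, ...)] = ∞. One concrete family with this property, µ_i ∝ 1/(i log² i), can be checked numerically; this particular choice is our illustrative assumption, not one fixed by the text.

```python
from math import log2

# Illustrative heavy-tailed weighting for the HPM construction:
# mu_i proportional to 1/(i log^2 i), i >= 2.  The series converges
# (integral test), but -sum mu_i log mu_i diverges, so the partial
# entropy sums keep growing (slowly) with the truncation point.
N = 10**6
raw = [1.0 / (i * log2(i) ** 2) for i in range(2, N)]
Z = sum(raw)  # normalizer for the truncated distribution

checkpoints, partial = {}, 0.0
for n, r in enumerate(raw, start=2):
    m = r / Z
    partial -= m * log2(m)        # accumulate -mu_i log mu_i
    if n in (100, 10**4, N - 1):
        checkpoints[n] = partial
print(checkpoints)  # strictly increasing, reflecting H = infinity
```

The growth is roughly doubly logarithmic in the truncation point, which is why the divergence of H[(µ_i)] is invisible to any fixed-length simulation yet forces E = ∞ in the limit.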
B. Branching Copy Process: An infinitary ergodic process with a countable-state presentation

Figure 3 depicts a countable-state HMM M for the ergodic, infinitary Branching Copy Process. Essentially, the machine M consists of a binary tree with loop-backs and a self-loop on the root node. From the root node a path is chosen down the tree, with each left-right (or 0-1) choice equally likely. But, at each step there is also a chance of turning back towards the root. The path back is not a single step, however. It has length equal to the number of steps taken down the tree before turning back, and it copies the path taken down symbol-wise, with 0s replaced by 2s and 1s replaced by 3s. There is also a high self-loop probability at the root node on symbol 4, so some number of 4s will normally be generated after returning to the root node before proceeding again down the tree. The process generated by this machine is referred to as the Branching Copy (BC) Process, because the branch taken down the tree is copied on the loop back to the root.
By inspection we see that the machine is unifilar with synchronizing word w = 4, i.e., H[S_1 | X_0 = 4] = 0. Since the underlying Markov chain over states (S_t) is positive recurrent, the state sequence (S_t) and symbol sequence (X_t) are both ergodic. Thus, a.e. infinite future →x contains a 4, so the machine is exact. Therefore, Prop. 1 may be applied, and we know the entropy rate h_µ is given by the standard formula of Eq. (8): h_µ = Σ_σ π_σ h_σ. Since P(S_t = σ) = π_σ for any t ∈ N, we may alternatively represent this entropy rate as:

    h_µ = Σ_{w∈L_t} P(w) h̄_w,    (12)

where L_t = {w : |w| = t, P(w) > 0} is the set of length-t words in the process language L, φ(w) is the conditional state distribution induced by the word w (i.e., φ(w)_σ = P(S_t = σ | →X^t = w)), and h̄_w = Σ_σ φ(w)_σ h_σ is the φ(w)-weighted average entropy in the next symbol given knowledge of the current state σ.
Similarly, for any t ∈ N the entropy-rate approximation h_µ(t + 1) may be expressed as:

    h_µ(t + 1) = Σ_{w∈L_t} P(w) h_w,    (13)

where h_w = H[X_t | →X^t = w] is the entropy in the next symbol given the word w. Combining Eqs. (12) and (13) we have, for any t ∈ N:

    h_µ(t + 1) − h_µ = Σ_{w∈L_t} P(w) (h_w − h̄_w).    (14)

By concavity of the entropy function, the quantity h_w − h̄_w is always nonnegative. Furthermore, in Claim 5 we show that h_w − h̄_w is always bounded below by some fixed positive constant for any word w consisting entirely of 2s and 3s. Also, in Claim 3 we show that P(W_t) scales as 1/t, where W_t is the set of length-t words consisting entirely of 2s and 3s. Combining these results it follows that h_µ(t + 1) − h_µ ≥ c/t for a constant c > 0 and, hence, E = Σ_t (h_µ(t) − h_µ) = ∞. A more detailed analysis with the claims and their proofs is given below.

FIG. 3: A countable-state HMM for the Branching Copy Process. The machine M is essentially a binary tree with loop-back paths from each node in the tree to the root node and a self-loop on the root. At each node σ^1_{ij} in the tree there is a probability 2q_i of continuing down the tree and a probability p_i = 1 − 2q_i of turning back towards the root σ^1_{01} on a loop-back path. If the choice is made to head back, the next i − 1 transitions are deterministic. The path of 0s and 1s taken to get from σ^1_{01} to σ^1_{ij} is copied on the return, with 0s replaced by 2s and 1s replaced by 3s. Formally, the alphabet is X = {0, 1, 2, 3, 4} and the state set is S = {σ^k_{ij} : i ≥ 0, 1 ≤ j ≤ 2^i, 1 ≤ k ≤ max{i, 1}}. The nonzero transition probabilities are as depicted graphically, with p_i = 1 − 2q_i for all i ≥ 0, q_i = i^2/[2(i + 1)^2] for all i ≥ 1, and q_0 > 0 taken sufficiently small that H[(p_0, q_0, q_0)] ≤ 1/300. The graph is strongly connected, so the Markov chain over states is irreducible. Claim 1 shows that the Markov chain is also positive recurrent and, hence, has a unique stationary distribution π. Claim 2 gives the form of π.
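The depth probabilities implied by the Fig. 3 parameters can be checked numerically: with q_i = i²/[2(i + 1)²], the "continue down" factors telescope, 2q_1 ··· 2q_{i−1} = 1/i², and the "turn back" probability p_i decays like 2/i. A small sketch of ours (the function names are not from the text):

```python
# Branch probabilities of the BC machine (Fig. 3): the product of the
# "continue down" factors 2 q_l telescopes to 1/i^2, and the turn-back
# probability p_i = 1 - 2 q_i = (2i+1)/(i+1)^2 decays like 2/i.
def q(i):
    return i * i / (2.0 * (i + 1) ** 2)

def p(i):
    return 1.0 - 2.0 * q(i)

for i in (2, 10, 100):
    prod = 1.0
    for l in range(1, i):
        prod *= 2.0 * q(l)   # P(reach depth i | left the root downward)
    assert abs(prod - 1.0 / i**2) < 1e-12
    assert abs(p(i) - (2 * i + 1) / (i + 1) ** 2) < 1e-12
print("telescoping verified")
```

This 1/i² depth mass, multiplied by the ~2/i turn-back probability and the length-2i excursion, is what produces the ~1/t probability P(W_t) of seeing t consecutive return symbols.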
In what follows we will use the notation:

• P_σ(·) = P(· | S_0 = σ),
• V_t = {w ∈ L_t : w contains only 0s and 1s} and W_t = {w ∈ L_t : w contains only 2s and 3s},
• π^k_{ij} = P(σ^k_{ij}) is the stationary probability of state σ^k_{ij},
• R_{ij} = {σ^1_{ij}, σ^2_{ij}, ..., σ^i_{ij}}, and
• π_{ij} = Σ_{k=1}^{i} π^k_{ij} and π^1_i = Σ_{j=1}^{2^i} π^1_{ij}.
Note that, by stationarity:

    π^k_{ij} = p_i π^1_{ij}  for all 2 ≤ k ≤ i,

since the return path through R_{ij} is entered from σ^1_{ij} with probability p_i and then traversed deterministically, and:

    π^1_{i+1} = 2 q_i π^1_i  for all i ≥ 0,

since each tree state at depth i passes probability q_i to each of its two children. These facts will be used in the proof of Claim 1.
Claim 1. The underlying Markov chain over states for the HMM is positive recurrent.
Proof. Let τ = min{t > 0 : S_t = σ^1_{01}} be the first return time to the root state σ^1_{01}. Starting from the root, τ = 1 on the self-loop, which has probability p_0, and τ = 2i when the walk descends i levels into the tree and then takes the deterministic i-step return path, which has probability:

    P(τ = 2i) = 2q_0 (∏_{l=1}^{i−1} 2q_l) p_i = 2q_0 (2i + 1) / [i^2 (i + 1)^2],

using 2q_l = l^2/(l + 1)^2 and p_i = (2i + 1)/(i + 1)^2. Hence, by continuity:

    P(τ < ∞) = p_0 + 2q_0 Σ_{i≥1} [1/i^2 − 1/(i + 1)^2] = p_0 + 2q_0 = 1,

so the Markov chain is recurrent, and we have:

    E[τ] = p_0 + Σ_{i≥1} 2i · 2q_0 (2i + 1) / [i^2 (i + 1)^2] < ∞,

since the summands decay as 1/i^2, from which it follows that the chain is also positive recurrent. Note that the topology of the chain implies the first return time cannot be an odd integer greater than 1.
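The positive-recurrence computation behind Claim 1 can be sketched numerically. From the root, τ = 1 on the self-loop (probability p_0), and τ = 2i when the walk descends i levels, with probability 2q_0 ∏_{l<i} 2q_l, and then turns back, with probability p_i. The value q_0 = 10⁻⁴ below is an illustrative choice of ours; the text only requires q_0 small enough that H[(p_0, q_0, q_0)] ≤ 1/300.

```python
# Numeric sketch of Claim 1: first-return times to the root of the BC
# machine, truncated at a large maximum depth.
q0 = 1e-4          # illustrative; the paper only constrains q0 to be small
p0 = 1.0 - 2.0 * q0

def q(i):
    return i * i / (2.0 * (i + 1) ** 2)

total, mean = p0, p0       # tau = 1 contribution (root self-loop)
down = 2.0 * q0            # P(currently descended to level i), for i = 1
for i in range(1, 200000):
    pr = down * (1.0 - 2.0 * q(i))   # P(tau = 2 i): descend i, turn back
    total += pr
    mean += 2 * i * pr
    down *= 2.0 * q(i)               # descend one more level
print(total, mean)  # total -> 1 (recurrent), mean finite (positive recurrent)
```

The P(τ = 2i) terms decay like 1/i³, so E[τ] converges; had 2q_i approached 1 more slowly, the chain could have been null recurrent instead.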
Claim 3. P(W_t) scales as 1/t: there exist constants c, C > 0 such that c/t ≤ P(W_t) ≤ C/t for all t ∈ N.

Proof. For any state σ^k_{ij} with i < t, P(→X^t ∈ W_t | S_0 = σ^k_{ij}) = 0. Thus, we have:

    P(W_t) = Σ_{i≥t} Σ_{j=1}^{2^i} Σ_{k=1}^{i} π^k_{ij} P_{σ^k_{ij}}(→X^t ∈ W_t) = Σ_{i≥t} Σ_{k=1}^{i} 2^i π^k_{i1} P_{σ^k_{i1}}(→X^t ∈ W_t),    (22)

where the second equality follows from symmetry. We prove the bounds from above and below on P(W_t) separately, using Eq. (22).
Claim 4. For all t ∈ N and all w ∈ W_t, P(X_t ∈ {2, 3} | →X^t = w) ≥ 1/150.

Proof. Applying Claim 3 we have, for any t ∈ N:

    P(X_t ∈ {2, 3} | →X^t ∈ W_t) = P(W_{t+1}) / P(W_t) ≥ 1/150.

By symmetry, P(X_t ∈ {2, 3} | →X^t = w) is the same for each w ∈ W_t. Thus, the same bound must also hold for each w ∈ W_t individually: P(X_t ∈ {2, 3} | →X^t = w) ≥ 1/150 for all w ∈ W_t.
Claim 6. There is a constant c > 0 such that h_µ(t + 1) − h_µ ≥ c/t for all t ∈ N.

Proof. As noted above, since the machine satisfies the conditions of Prop. 1, the entropy rate is given by Eq. (8) and the difference h_µ(t + 1) − h_µ is given by Eq. (14). Therefore, applying Claims 3 and 5 we may bound the quantity h_µ(t + 1) − h_µ as follows:

    h_µ(t + 1) − h_µ = Σ_{w∈L_t} P(w)(h_w − h̄_w) ≥ Σ_{w∈W_t} P(w)(h_w − h̄_w) ≥ c' P(W_t) ≥ c/t,

where c' > 0 is the constant from Claim 5.
With the above decay of h_µ(t) established, we easily see that the Branching Copy Process must have infinite excess entropy.
Proposition 3. The excess entropy E for the BC Process is infinite.
Proof. By the partial-sum formulation, E = Σ_{t=1}^{∞} (h_µ(t) − h_µ). By Claim 6, this sum must diverge.

IV. CONCLUSION
Any stationary, finite-alphabet process may be represented as an invariant HMM with an uncountable state set. Thus, there exist invariant HMMs with uncountable state sets capable of generating infinitary processes over finite alphabets. It is impossible, however, for a finite-state invariant HMM to generate an infinitary process: the excess entropy E is always bounded by the entropy in the stationary distribution H[π], which is finite for any finite-state HMM. Countable-state HMMs are intermediate between the finite and uncountable cases, and it was previously unknown whether infinite excess entropy was possible in this case. We have demonstrated that it is indeed possible, by giving two explicit constructions of finite-alphabet infinitary processes generated by invariant HMMs with countable state sets.
The second example, the Branching Copy Process, is also ergodic, a strong restriction. It is a priori quite plausible that infinite E might only occur in the countable-state case for nonergodic processes. Moreover, both HMMs we constructed are unifilar, so the ε-machines [9,13] of the processes have countable state sets as well. Again, unifilarity is a strong restriction to impose, and it is a priori conceivable that infinite E might only occur in the countable-state case for nonunifilar HMMs. Our examples have shown, though, that infinite E is possible for countable-state HMMs, even if one requires both ergodicity and unifilarity.
Appendix A

Proof of Proposition 1. Since we know h_µ(t) limits to h_µ, it suffices to show that:

    lim_{t→∞} Σ_{w∈L_t} P(w)(h_w − h̄_w) = 0,    (A2)

since this sum equals h_µ(t + 1) − Σ_σ π_σ h_σ. By concavity of the entropy function, h_w − h̄_w ≥ 0 for any w. However, for a synchronizing word w = w_1...w_t, with H[S_t | →X^t = w] = 0, h_w − h̄_w is always 0, since the distribution φ(w) is concentrated on a single state. Furthermore, for any w, h_w − h̄_w ≤ h_w ≤ log |X|. Thus:

    Σ_{w∈L_t} P(w)(h_w − h̄_w) ≤ log |X| · P(NS_t),    (A3)

where NS_t is the set of length-t words that are nonsynchronizing and P(NS_t) is the combined probability of all words in this set. Since the HMM is exact, we know that for a.e. infinite future →x an observer will synchronize exactly at some finite time t' = t'(→x). And, since it is unifilar, the observer will remain synchronized for all t ≥ t'. It follows that P(NS_t) must be monotonically decreasing and limit to 0:

    lim_{t→∞} P(NS_t) = 0.    (A4)

Combining Eq. (A3) with Eq. (A4) shows that Eq. (A2) does in fact hold, which completes the proof.

Appendix B
We prove the following proposition for the entropy rate of countable-state HMMs.
Proposition 4. Let M be a countable-state HMM and let P = (X_t) be the process generated by M. If P does not consist entirely of periodic sequences, then its entropy rate h_µ is strictly positive.
Proof. For any countable-state HMM M, the future output sequence and past output sequence are conditionally independent given the current state. Thus, for all t ∈ N, H[X_t | →X^t, S_t] = H[X_t | S_t]. Also, by stationarity, H[X_t | S_t] = H[X_0 | S_0] = Σ_σ π_σ h_σ for all t. Combining these facts shows that the entropy rate is always bounded below by the standard unifilar formula of Eq. (8):

    h_µ = lim_{t→∞} H[X_t | →X^t] ≥ lim_{t→∞} H[X_t | →X^t, S_t] = Σ_σ π_σ h_σ.

Therefore, the entropy rate is positive if h_σ > 0 for any state σ with nonzero probability π_σ or, equivalently, if there are at least two outgoing edges in the associated graph from some such state σ. Now, assume there is no such state. Consider the restricted state set S' consisting of the states σ with positive probability (π_σ > 0) and the restricted graph G' associated with this state set. Clearly, the HMM M' defined by this graph with stationary distribution π generates the same process P as the original HMM. It is also easily seen that, in order to keep the distribution π stationary, the graph G' must consist entirely of disjoint strongly connected components. That is, each connected component of G' must be strongly connected. Take any strongly connected component C_i in G'. Since each state σ in C_i has only a single outgoing edge and C_i is strongly connected, it follows that C_i must be a deterministic loop of some finite length l_i. Since this holds for each strongly connected component C_i in G' and the HMM M' is always run from one of the C_i, it follows that all sequences ←→x = ...x_{−1}x_0x_1... generated by M' are periodic or, equivalently, that all sequences generated by M are periodic.