Trees in Positive Entropy Subshifts

Abstract: I give a simple proof of the fact that positive entropy subshifts contain infinite binary trees where branching happens synchronously in each branch, and that the branching times form a set with positive lower asymptotic density.


Introduction
In topological dynamics, the topological entropy of a dynamical system measures the information in orbits, by counting the exponential growth rate of the number of different partial orbits with a certain degree of accuracy. One example of a dynamical system is a subshift, a dynamical system whose points are infinite words over a finite alphabet $A$, and whose dynamics is the left shift. One way entropy can arise in a subshift is when, somewhere in partial orbits, we literally see all distinct words of $A^n$ up to $n$. One may ask to what extent entropy always arises from such "free choices".
Two formal notions that most directly capture this idea are the independence entropy of a subshift and independence sets. The former measures how much entropy comes from "set-theoretic hyperrectangles", namely sets of the form $A_1 \times A_2 \times \cdots \times A_n$ which are contained in the language of the subshift as $n \to \infty$, where $A_i \subseteq A$ are any subsets of the alphabet and we consider consecutive letters only; see [1], (p. 303) for the precise definition. Positive entropy does not imply positive independence entropy (see Proposition 1).
Independence sets, on the other hand, are defined similarly in [2,3], but we allow gaps between the positions where the prescribed letters appear, and there is no constraint on the words seen in the gaps between the positions. More precisely, in the case of a subshift $X$ with a binary alphabet $A = \{0, 1\}$, an independence set is a set $N \subseteq \mathbb{N}$ such that $X|_N = \{0, 1\}^N$. It turns out that the existence of an independence set of positive lower asymptotic density does capture positive entropy; see [2,3] for a proof in this binary subshift case, and [4-6] for analogous results in more general settings.
The following theorem gives a new proof of this result (see Corollary 1), but it is stronger, as there is an additional constraint on how the independent choices must be realized. It also applies in the case of a general alphabet.

Theorem 1.
A subshift $X \subseteq A^{\mathbb{N}}$ has positive entropy if and only if it contains a steadily branching binary tree.
We can interpret $A^{\mathbb{N}}$ as the boundary of an $|A|$-ary tree and $X$ as a closed subset of this boundary. A closed subset of the boundary of a tree is determined by finite subpaths in the tree, and it is in this sense that $X$ contains the binary tree. Steadily branching means that the embedded binary tree branches at a uniform sequence of times which form a set of positive lower asymptotic density. A formal definition is given in Section 3.
Positive entropy refers to topological entropy, and in the context of subshifts, means the exponential growth of the number of words. Countable subshifts can have a growth rate arbitrarily close to exponential (see Proposition 2), but clearly only contain trees with finitely many branchings. Thus, if we weaken the assumption of positive entropy, the existence of a steadily branching binary tree can fail in a very strong sense.
It is also easy to find examples with positive entropy where it is not possible to find a tree where branchings happen at a syndetic set of times (see Proposition 3). There are also examples where no individual tree which branches at a uniform sequence of times "captures" all of the entropy, as can be seen from [1], (Example 2). These remarks show that we cannot expect to essentially improve the conclusion about the tree under the assumption of positive entropy.
The proof of Theorem 1 is straightforward using the standard results about the winning shift, which is the set of winning turn orders in a certain word-building game associated to X. The winning shift is known to have the same entropy as X. This implies it has a point with positive density, and this point is the branching structure of a steadily branching tree in X. This note is simply a self-contained elaboration of this deduction. I give a combinatorial proof directly from the definitions, and also include an ergodic theoretic proof.
It is easy to deduce a two-sided version of the result as well. For $x \in A^{-\mathbb{N}}$ and $y \in A^{\mathbb{N}}$, write $x \cdot y \in A^{\mathbb{Z}}$ for the configuration with $x$ and $y$ back to back.

Theorem 2.
A subshift $X \subseteq A^{\mathbb{Z}}$ has positive entropy if and only if for some $x \in A^{-\mathbb{N}}$, the set $\{y \in A^{\mathbb{N}} : x \cdot y \in X\}$ contains a steadily branching binary tree.
The winning shift was introduced in [7], and has also been studied in [8]. We found out while working on [8] that a concept equivalent to the winning shift had already been discovered in the set systems setting in 2002 [9] (even if ostensibly only for binary words), and I found out while working on the present paper that [9] has been applied by dynamicists [2,3] (this paper is independent from [7], but almost contemporary), and that their proof that this gives large independence sets essentially boils down to the same proof as I give here. Nevertheless, I feel that Theorem 1 is an interesting statement about subshifts, to our knowledge the statement does not appear in the literature, and its proof through winning shifts is worth making explicit. Furthermore, unlike Theorem 1, the statement about independence sets does not trivially generalize to non-binary alphabets.
As discussed, I obtain a new proof of the fact that positive entropy implies the existence of certain types of independence sets in the case of a binary alphabet.

Corollary 1.
Let $X \subseteq \{0, 1\}^{\mathbb{N}}$ be a subshift. Then, $X$ has positive entropy if and only if it has an independence set of positive lower asymptotic density.
Again, I mention that this result is proved in [2,3], and that generalizations appear in [4-6]. To our knowledge, the first reference where the Sauer-Shelah lemma appears in a dynamics context is [4]. The most naive generalization of Corollary 1, the existence of a high density set $N$ with $X|_N = A^N$, fails for a non-binary alphabet $A$ (even for trivial reasons, by artificially increasing the alphabet of the subshift). However, one can state an equivalent condition for having higher entropy in this way [10], (Theorem 8.3), [3], (Theorem 2). (The non-trivial direction of this equivalence also appears in [7], (Proposition 5.9)).
As I show in this note, winning shifts say something stronger than Corollary 1 in the case of one-dimensional subshifts, but it is not clear how to generalize them to other settings, at least in a way that allows seeing them naturally as a dynamical system.

Question 1.
Does every expansive system of positive entropy contain a steadily branching binary tree (and what does that even mean)? More generally, can the notion of winning shift be extended to general (expansive) systems? Group/monoid actions?
The webpage www.theproofistrivial.com (accessed on 28 November 2016) is an online resource that suggests high-level proof strategies in a variety of mathematical domains.
The present note was inspired by the following quote from this webpage: "The proof is trivial! Just view it as a rational metric space whose elements are countable combinatorial games." The page was accessed during the Combinatorics, Automata and Number Theory conference that took place between 28 November 2016 and 2 December 2016.

Definitions
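Let $\mathbb{N} \ni 0$ be the natural numbers. An alphabet is a finite set $A$. Words are elements of the free monoid generated by $A$, denoted by $A^*$. Elements of $A$ are seen as belonging to $A^*$ when convenient, and $u \cdot v$ denotes the multiplication in the free monoid (formal concatenation). The length $|w|$ of a word is defined in the obvious way. Write $A^n = \{w \in A^* : |w| = n\}$.
The set $A^{\mathbb{N}}$, for a finite alphabet $A$ (or later $A^{\mathbb{Z}}$), always carries the product topology, and it is homeomorphic to the Cantor set when $|A| \geq 2$. A subshift is a closed subset $X \subseteq A^{\mathbb{N}}$ satisfying $\sigma(X) \subseteq X$, where $\sigma$ is the left shift defined by $\sigma(x)_i = x_{i+1}$. The words of a subshift, that is, the finite words occurring in its points, form its language. Writing $L_n$ for the intersection of the language of $X$ with $A^n$, the (topological) entropy of a subshift $X$ is the exponential growth rate of the number of words, $h(X) = \lim_{n \to \infty} \frac{\log |L_n|}{n}$ (which exists by the subadditivity of $\log |L_n|$). The lower asymptotic density of $N \subseteq \mathbb{N}$ is $\liminf_{k \to \infty} \frac{|N \cap [0, k)|}{k}$. A set $N \subseteq \mathbb{N}$ is syndetic if it has bounded gaps, that is, if for some $k$ the set $N + \{0, 1, \ldots, k\}$ contains all sufficiently large natural numbers, where $+$ denotes the Minkowski sum.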
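The two quantities just defined can be explored numerically on a small example. The sketch below is only an illustration under stated assumptions: it uses the golden mean shift (binary words with no two consecutive 1s) as a sample subshift, and the helper names `golden_mean_words`, `entropy_estimate` and `min_prefix_density` are hypothetical, not notation from this note.

```python
from itertools import product
from math import log, sqrt

def golden_mean_words(n):
    """Binary words of length n with no two consecutive 1s (sample language L_n)."""
    return [w for w in product((0, 1), repeat=n)
            if not any(a == 1 and b == 1 for a, b in zip(w, w[1:]))]

def entropy_estimate(n):
    """log|L_n| / n; by subadditivity this is an upper estimate of h(X)."""
    return log(len(golden_mean_words(n))) / n

def min_prefix_density(indicator, start=1):
    """Smallest value of |N ∩ [0,k)| / k over start <= k <= len(indicator),
    where indicator is a finite prefix of the characteristic sequence of N."""
    count, best = 0, None
    for k, bit in enumerate(indicator, 1):
        count += bit
        if k >= start:
            d = count / k
            best = d if best is None else min(best, d)
    return best
```

For a finite prefix one can of course only approximate a lim inf; `min_prefix_density` reports the worst prefix density seen so far, which is the quantity that matters in the proofs below.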

Definition 1.
Let $A$ be a finite alphabet and let $X \subseteq A^{\mathbb{N}}$. A tree in $X$ with branching structure $b \in \mathbb{N}^{\mathbb{N}}$ is a set of sequences $(x_z)_{z \in \mathbb{N}^{\mathbb{N}},\, z \leq b}$, where $z \leq b$ is understood coordinatewise, each $x_z$ is an element of $X$, and:

$$\forall z, z' \leq b,\ z \neq z' : \min\{i : (x_z)_i \neq (x_{z'})_i\} = \min\{i : z_i \neq z'_i\}.$$

A steadily branching binary tree is a tree with branching structure $b \in \{0, 1\}^{\mathbb{N}}$ satisfying

$$\liminf_{n \to \infty} \frac{1}{n} \sum_{i < n} b_i > 0.$$

In other words, for distinct sequences $z, z' \in \mathbb{N}^{\mathbb{N}}$, the first position where $x_z$ and $x_{z'}$ differ is the same as the first position where $z$ and $z'$ differ. For binary $b$, this means that the nonzero positions in $b$ are the positions where our tree must branch in two, and more generally, $b_i = n$ means the tree must branch $n$ times in position $i$, explaining why we call this sequence a branching structure. Note also that if $X \subseteq A^{\mathbb{N}}$ contains a tree with branching structure $b \in \mathbb{N}^{\mathbb{N}}$, then necessarily $b_i \leq |A| - 1$ for all $i \in \mathbb{N}$.

Definition 2.
Let $X \subseteq A^{\mathbb{N}}$ be a subshift. Let $W(X)$, the winning shift of $X$, be the set of all branching structures of trees in $X$.
This is defined in a game-theoretic framework in [7]: a tree in $X$ with branching structure $z$ can be interpreted as a winning strategy for the first player in a word-building game where on the $i$th turn, the first player picks a set of $z_i + 1$ symbols, then the second picks one of them, and the first player wins if the word obtained in the limit is in $X$.
It is shown in [7] that the winning shift is indeed a subshift. A subshift $Y \subseteq \mathbb{N}^{\mathbb{N}}$ is hereditary [6] if for all $y \in \mathbb{N}^{\mathbb{N}}$ we have:

$$(\exists x \in Y : y \leq x) \implies y \in Y.$$

The following fact is Proposition 5.7 of [7] (the hereditarity claim is trivial). The notation differs slightly: the reference uses a different symbol for what we call $W(X)$, and their $W(X)$ is the same as ours in the binary case, but differs in general.

Lemma 1.
The subshift $W(X)$ is hereditary and has the same number of words of each length as $X$.
Proof sketch. The hereditarity claim is trivial. For the claim about words, one can define $W(L)$ for $L \subseteq A^n$ analogously to the subshift case (using finite trees). Letting $L \subseteq A^n$ be the language of $X$ intersected with $A^n$, $W(L)$ is the language of $W(X)$ intersected with $\mathbb{N}^n$. Now, we write $L = 0L_0 \sqcup 1L_1 \sqcup \cdots \sqcup (k-1)L_{k-1}$ and compute $|W(L)|$ by exchanging a suitable sum. In the resulting chain of equalities, the first is true by definition, and the penultimate one by induction.
For a finite word $w \in \mathbb{N}^*$, write $\sum w = \sum_i w_i$ for the sum of the symbols in $w$, and $|w|$ for the length of $w$ as a word. The key to finding steadily branching trees is to study the density $\sum w/|w|$ of a word $w$. If $Y \subseteq \{0, 1\}^{\mathbb{N}}$ is a subshift, by $Y^k$ we mean the Cartesian power with the diagonal action, interpreted in an obvious way as a subshift over the alphabet $\{0, 1\}^k$.
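The defining condition of a tree can be checked mechanically on finite prefixes. The following is a minimal sketch, assuming that sequences are truncated to tuples of a common finite length; the names `first_difference` and `is_finite_tree` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

def first_difference(u, v):
    """Index of the first position where u and v differ, or None if they agree."""
    for i, (a, b) in enumerate(zip(u, v)):
        if a != b:
            return i
    return None

def is_finite_tree(family, b):
    """Check the tree condition for family = {z: x_z}, with z ranging over
    all tuples that are coordinatewise at most the branching structure b."""
    indices = list(product(*(range(s + 1) for s in b)))
    if set(family) != set(indices):
        return False
    return all(
        first_difference(family[z], family[zp]) == first_difference(z, zp)
        for z in indices for zp in indices if z != zp
    )

# Example with branching structure b = (1, 0, 1): the tree branches at
# positions 0 and 2 and does not branch at position 1.
b = (1, 0, 1)
tree = {z: (z[0], 1, 1 - z[2]) for z in product((0, 1), (0,), (0, 1))}
not_a_tree = {z: (0, 0, 0) for z in tree}
```

Here `is_finite_tree(tree, b)` holds because two labels first differ exactly where the corresponding words first differ, while the constant family fails the condition.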
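The counting claim of Lemma 1 can be tested by brute force on finite languages. The sketch below assumes the recursive reading of the finite winning set suggested by the game interpretation: a word $j \cdot z'$ is a branching structure for $L$ exactly when $z'$ is a branching structure for at least $j + 1$ of the follower sets $L_a$. The function name `winning_set` is hypothetical.

```python
from itertools import product

def winning_set(L, n):
    """Branching structures of finite trees in L, a set of length-n tuples."""
    L = set(L)
    if n == 0:
        return {()} if L else set()
    letters = {w[0] for w in L}
    # sub[a] is W(L_a), where L_a = {u : a.u in L} is the follower set of a.
    sub = {a: winning_set({w[1:] for w in L if w[0] == a}, n - 1)
           for a in letters}
    result = set()
    for tail in set().union(*sub.values()):
        # c = number of first letters whose follower set admits the tail.
        c = sum(1 for a in letters if tail in sub[a])
        for j in range(c):
            result.add((j,) + tail)
    return result

# Sample languages (illustrative choices, not taken from the references).
golden = {w for w in product((0, 1), repeat=6)
          if not any(a == 1 and b == 1 for a, b in zip(w, w[1:]))}
even_sum = {w for w in product((0, 1, 2), repeat=3) if sum(w) % 2 == 0}
```

Each tail is counted once for every first letter that admits it, which is the exchange of sums behind the equality $|W(L)| = \sum_a |W(L_a)|$. For a hereditary language such as `golden`, the winning set coincides with the language itself.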
We observe a simple combinatorial lemma.

Lemma 2.
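If a subshift $Y$ has positive entropy and is hereditary, then for some $\beta > 0$ there exist arbitrarily long words $w$ of $Y \cap \{0, 1\}^{\mathbb{N}}$ with $\sum w/|w| \geq \beta$.
Proof. Write $b(Y)$ for the binary subshift obtained from $Y$ by replacing every nonzero symbol with $1$, and let $k$ be the largest symbol appearing in $Y$. Since $Y$ is hereditary, it is easy to see that the number of words in $Y$ of length $n$ is at most the number of words in $b(Y)^k$ of length $n$. Thus, if $Y$ has positive entropy, so does $b(Y)^k$, and thus so does $b(Y)$. Suppose thus that the number of words in $b(Y)$ of every length $n$ is at least $\alpha^n$ for some $\alpha > 1$, as is clearly implied by positive entropy. The number of words of length $n$ with at most $k$ many $1$s (where $k$ is now a free parameter) is at most:

$$\left(\frac{n \cdot e}{k}\right)^{k} 2^{k}.$$

Setting $k = \beta n$, this becomes $\left(\left(\frac{n \cdot e}{\beta n}\right)^{\beta} 2^{\beta}\right)^{n}$ and we observe that as $\beta \to 0$, we have $2^{\beta} \to 1$ and $\left(\frac{n \cdot e}{\beta n}\right)^{\beta} = \left(\frac{e}{\beta}\right)^{\beta} \to 1$, so for small enough $\beta > 0$, we have $\left(\frac{n \cdot e}{\beta n}\right)^{\beta} 2^{\beta} < \alpha$, and thus there must be words of length $n$ in $b(Y)$ with at least $\beta n$ many $1$s, for arbitrarily large $n$. These are words of $Y \cap \{0, 1\}^{\mathbb{N}}$ since $Y$ is hereditary.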
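The two analytic facts used in this argument are easy to sanity-check numerically. The sketch below assumes that the counting bound has the form $(ne/k)^k 2^k$, as the expression obtained after substituting $k = \beta n$ suggests; the function names are hypothetical.

```python
from math import comb, e

def words_with_few_ones(n, k):
    """Exact number of binary words of length n with at most k many 1s."""
    return sum(comb(n, i) for i in range(k + 1))

def counting_bound(n, k):
    """The assumed upper bound (n*e/k)^k * 2^k, for 1 <= k <= n."""
    return (n * e / k) ** k * 2 ** k

def per_symbol_rate(beta):
    """The base ((e/beta)^beta) * 2^beta obtained after setting k = beta*n."""
    return (e / beta) ** beta * 2 ** beta

# The bound holds on a small range, and the rate decreases towards 1.
bound_ok = all(words_with_few_ones(n, k) <= counting_bound(n, k)
               for n in range(1, 30) for k in range(1, n + 1))
rates = [per_symbol_rate(beta) for beta in (0.1, 0.01, 0.001)]
```

Since the rate tends to 1 as $\beta \to 0$, it eventually drops below any growth rate $\alpha > 1$, which is what forces some words to contain at least $\beta n$ many 1s.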
This argument appears also in [11] (Theorem 4.8) (this theorem already includes the statement about lower asymptotic density, which we deduce in the next section).

The Proofs
Proof of Theorem 1. Obviously, a steadily branching tree implies positive entropy.
For the other direction, let $A \subseteq \mathbb{N}$ be a finite alphabet and $Y \subseteq A^{\mathbb{N}}$ a subshift. Write $s_n$ for the maximal sum $\sum w$ of a word $w$ of length $n$ in $Y$. This sequence is clearly subadditive, so $\lim_{n \to \infty} s_n/n$ exists, say $\lim_n s_n/n = \alpha$.
We outline the usual addendum that there must be a configuration $y \in Y$ such that every prefix $w$ of $y$ satisfies $\sum w/|w| \geq \alpha$. Supposing that this is not the case, every point of $Y$ has a prefix $w$ such that $\sum w/|w| < \alpha$, and this must happen after a bounded number of steps by compactness. Thus there exists $\epsilon > 0$ such that for some $m$, we always find a prefix $w$ of length at most $m$ in any point $y \in Y$, such that $\sum w/|w| < \alpha - \epsilon$. Now, given any long word $w$ of $Y$, we can split it as $w = w^0 w^1 \cdots w^k$ (where $i$ in $w^i$ denotes a superscript, not a power) with $|w^k| \leq m$ and $\sum w^i/|w^i| < \alpha - \epsilon$ for all $i < k$. If $w$ is long enough, then since $\sum uv/|uv| \leq \max(\sum u/|u|, \sum v/|v|)$ for all words $u, v$, and since the last piece $w^k$ has bounded length, we have:

$$\sum w/|w| < \alpha - \epsilon/2$$

for all long enough words, a contradiction to $\lim_n s_n/n = \alpha$. Now, we set $Y = W(X)$ and apply Lemma 2. The lemma implies that $\alpha > 0$ in the above. The point $y \in W(X)$ whose density stays above $\alpha$ gives the branching structure of a steadily branching tree in $X$.
The point of this note was to provide a combinatorial proof of Theorem 1 from first principles, as this is easy to do. However, I have also included the proof using ergodic theory (see [18] for a basic reference).
Ergodic theoretic proof of Theorem 1. Obviously a steadily branching tree implies positive entropy. For the other direction, since the winning shift has positive entropy, it admits an invariant measure $\mu$ with positive entropy, thus $\mu([a]) > 0$ for some $a > 0$. By the ergodic decomposition there exists such an ergodic measure, and by the pointwise ergodic theorem, there is a point $y \in W(X)$ where the lower asymptotic density of $a$s is positive, giving the result.
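The ingredients of this argument can be illustrated on a concrete hereditary subshift. The sketch below assumes the golden mean shift (no two consecutive 1s) as a sample $Y$; for it one finds $s_n = \lceil n/2 \rceil$, so $\alpha = 1/2$, and the point $1010\ldots$ keeps every prefix density at or above $\alpha$. All helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def golden_mean_words(n):
    """Binary words of length n with no two consecutive 1s."""
    return [w for w in product((0, 1), repeat=n)
            if not any(a == 1 and b == 1 for a, b in zip(w, w[1:]))]

def density(w):
    """Exact density sum(w)/|w| of a nonempty word."""
    return Fraction(sum(w), len(w))

# s[n] is the maximal sum of a word of length n in the sample subshift.
s = {n: max(sum(w) for w in golden_mean_words(n)) for n in range(1, 11)}
subadditive = all(s[m + n] <= s[m] + s[n]
                  for m in s for n in s if m + n in s)

# The inequality density(uv) <= max(density(u), density(v)) on short words.
short_words = [w for n in range(1, 5) for w in product((0, 1), repeat=n)]
mediant_ok = all(density(u + v) <= max(density(u), density(v))
                 for u in short_words for v in short_words)

# A prefix of the point 1010... whose prefix densities never drop below 1/2.
y = (1, 0) * 5
prefixes_ok = all(density(y[:k]) >= Fraction(1, 2)
                  for k in range(1, len(y) + 1))
```

Exact rational arithmetic via `Fraction` avoids rounding issues when comparing densities against the threshold.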
We conclude with the proofs of Theorem 2 and Corollary 1.
Proof of Theorem 2. Let $X_R \subseteq A^{\mathbb{N}}$ be the subshift of the right tails of points in $X$ and apply Theorem 1. Let $y \in \{0, 1\}^{\mathbb{N}}$ be the branching structure of some steadily branching tree and $\alpha$ the lower asymptotic density. It is easy to see that for any $n$, we can find a finite prefix $w$ of $y = wx$ of length at least $n$ such that for all prefixes $u$ of $x$ of length at most $n$, $\sum u/|u| \geq \alpha$. Otherwise, by cutting long words into ones of length at most $n$, as in the previous argument, we see that the lower asymptotic density is less than $\alpha$.
The fact we can find such wx as a branching structure implies that the tree corresponding to x can follow at least one word of length |w|. Letting n tend to infinity, by compactness, we obtain a steadily branching tree that can follow some left tail.
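Proof of Corollary 1. Supposing $X$ has positive entropy, let $y \in W(X) \cap \{0, 1\}^{\mathbb{N}}$ have positive lower asymptotic density. The set $N = \{n : y_n = 1\}$ is an independence set (which by assumption has positive lower asymptotic density): Suppose $(x_z)_{z \in \mathbb{N}^{\mathbb{N}},\, z \leq y}$ form a tree with branching structure $y$, i.e., we have:

$$\forall z, z' \leq y,\ z \neq z' : \min\{i : (x_z)_i \neq (x_{z'})_i\} = \min\{i : z_i \neq z'_i\}.$$

Supposing that $(m_i)_{i \in \mathbb{N}}$ enumerates $N$ in order, let $N_n = \{m_0, \ldots, m_{n-1}\}$, and for $w \in \{0, 1\}^n$ let $z_w \in \{0, 1\}^{\mathbb{N}}$ be the characteristic sequence of $\{m_i : i < |w|, w_i = 1\}$. Clearly $z_w \leq y$ for all $w \in \{0, 1\}^n$. The map:

$$w \mapsto x_{z_w}|_{N_n} : \{0, 1\}^n \to \{0, 1\}^{N_n}$$

is injective, since for distinct $w, w'$ the points $x_{z_w}$ and $x_{z_{w'}}$ first differ at a position of $N_n$. Thus, being an injection between finite sets of the same cardinality, this map is surjective, giving $X|_{N_n} = \{0, 1\}^{N_n}$. By compactness, we have $X|_N = \{0, 1\}^N$.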
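A finite version of this deduction can be checked by brute force. The sketch below is an illustration under stated assumptions: it takes the binary language with no two consecutive 0s as a sample non-hereditary $L$, computes its finite winning set with the same recursive reading of the game as in Lemma 1, and verifies that the 1-positions of every binary branching structure form an independence set. The names `winning_set`, `restrict` and `ones` are hypothetical.

```python
from itertools import product

def winning_set(L, n):
    """Branching structures of finite trees in L, a set of length-n tuples."""
    L = set(L)
    if n == 0:
        return {()} if L else set()
    letters = {w[0] for w in L}
    sub = {a: winning_set({w[1:] for w in L if w[0] == a}, n - 1)
           for a in letters}
    result = set()
    for tail in set().union(*sub.values()):
        c = sum(1 for a in letters if tail in sub[a])
        for j in range(c):
            result.add((j,) + tail)
    return result

def restrict(L, positions):
    """The set of patterns that words of L show at the given positions."""
    return {tuple(w[i] for i in positions) for w in L}

def ones(y):
    """Positions where the branching structure y equals 1."""
    return [i for i, s in enumerate(y) if s == 1]

n = 5
# Sample language: binary words of length n with no two consecutive 0s.
L = {w for w in product((0, 1), repeat=n)
     if all(a == 1 or b == 1 for a, b in zip(w, w[1:]))}
W = winning_set(L, n)
independent = all(len(restrict(L, ones(y))) == 2 ** len(ones(y)) for y in W)
```

In this example `(1, 0, 1, 0, 1)` is a branching structure, and indeed every pattern on positions 0, 2, 4 extends to a word of `L` by writing 1s at positions 1 and 3.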

Proofs of Supplementary Claims
In this section, I provide brief proofs for the claims made in the introduction.

Proposition 1.
There exists a subshift $X \subset \{0, 1\}^{\mathbb{N}}$ with positive entropy, such that every topologically conjugate subshift $Y \cong X$ has zero independence entropy.
Proof. By a compactness argument, any subshift $X$ with positive independence entropy must contain two points which differ in only one position, i.e., $\exists x, y \in X : x_0 \neq y_0 \wedge \forall i \neq 0 : x_i = y_i$. Positive entropy does not imply the existence of a pair of points with finitely many differences, see [19], (Theorem 1.3), and clearly this property is preserved under conjugacy.

Proposition 2.
For any function $f : \mathbb{N} \to \mathbb{N}$ satisfying $f(n) = o(a^n)$ for all $a > 1$, there exists a countable subshift $X \subset \{0, 1\}^{\mathbb{N}}$ such that for all large enough $n$, the number of words of length $n$ in $X$ is at least $f(n)$.
Proof. The forbidden patterns of $X$ are described as follows: if a finite subword contains $n$ many $1$s, they must be pairwise separated by a distance of at least $m_n$. Here, $(m_n)_n$ is some nondecreasing sequence. The number of words of $X$ of length $n$ is at least $2^{n/m_n}$, and if $m_n$ grows slowly enough this stays above $f(n)$. By the assumption on $f$ we can have $m_n \to \infty$, and it is easy to see that $X$ is then countable, since all its points have finite sum and finite subsets of $\mathbb{N}$ form a countable set.

Proposition 3.
There exists a subshift $X \subset \{0, 1\}^{\mathbb{N}}$ with positive entropy such that every point in $W(X)$ contains arbitrarily long subwords of the form $0^n$.
Proof. If $X$ itself is hereditary, then $W(W(X)) = W(X)$, so it suffices to find a hereditary $X$ with positive entropy such that every point $x \in X$ contains arbitrarily long subwords of the form $0^n$. For this, it clearly suffices to find a binary subshift $Y$ where every point contains the subword $0^n$ for arbitrarily large $n$, but some configuration contains $1$s with positive lower asymptotic density, as the smallest hereditary subshift containing $Y$ then has the claimed property: entropy comes from independently flipping the positive density sequence of $1$s, and taking the hereditary closure can clearly only add $0^n$-subwords to points. The existence of such $Y$ is folklore, see e.g. [20], (Example 7.3), for an explicit construction.