Branching Densities of Cube-Free and Square-Free Words

The binary cube-free language and the ternary square-free language are two "canonical" representatives of a wide class of languages defined by avoidance properties. Each of these two languages can be viewed as an infinite binary tree reflecting the prefix order of its elements. We study how "homogeneous" these trees are, analysing the following parameter: the density of branching nodes along infinite paths. We present combinatorial results and an efficient search algorithm, which together allowed us to obtain the following numerical results for the cube-free language: the minimal density of branching points is between 3509/9120 ≈ 0.38476 and 13/29 ≈ 0.44828, and the maximal density is between 0.72 and 67/93 ≈ 0.72043. We also prove the lower bound 223/868 ≈ 0.25691 on the density of branching points in the tree of the ternary square-free language.


Introduction
A formal language, i.e., a subset of the set of all finite words over some (usually finite) alphabet, is one of the most common objects in discrete mathematics and computer science. Languages are often defined by properties of their elements, and many "good" properties are hereditary: all factors (=contiguous subwords) of a word with such a property also possess this property. Typical hereditary properties are "to be a factor of a certain infinite word" and "to contain no factors from a given set". A factorial language forms posets under some natural order relations; the relation "to be a prefix of" is probably the simplest relation of this sort. The diagram of this relation is called a prefix tree; the structure of this tree reflects the properties of the language. For example, the prefix tree of a language L can be viewed as a deterministic (finite or infinite) automaton accepting L: each edge has the form (w, wa) and is labeled by the letter a, the root is the initial state, and all nodes are final states.
An important class of factorial languages is constituted by power-free languages. Any language of this class contains no factors from the set of α-powers for a certain integer or rational α; an α-power of a nonempty word u is the prefix of an infinite word uuu · · · of length α|u| , where |u| stands for the length of u. Power-free languages are studied in hundreds of papers starting with the seminal work by Thue [1], but the topic still contains a number of challenging open problems. One group of problems concerns the structure of prefix trees of infinite power-free languages. Let us briefly recall the related known results. In the following text, the subtree of a prefix tree means a tree consisting of some node w and all its descendants.
For infinite power-free languages, there is a natural partition into "small" and "big" [2][3][4][5][6][7][8]: in binary languages avoiding small powers (up to 7/3), the number of words grows only polynomially with length, while all other infinite power-free languages are conjectured to have exponential growth. This conjecture has been proved [4][5][6][7][8] for almost all power-free languages (up to a finite number of cases). Polynomial-size binary power-free languages

• We establish the lower bound 3509/9120 ≈ 0.38476 on the branching density in the prefix tree of CF (Theorem 3) and the lower bound 223/868 ≈ 0.25691 on the branching density in the prefix tree of SF (Theorem 4), significantly improving the bounds from [19,23];
• We construct infinite paths in the prefix tree of CF with branching density as small as 13/29 ≈ 0.44828 (Theorem 5);
• We establish the upper bound 67/93 ≈ 0.72043 on the branching density in the prefix tree of CF and construct infinite paths with branching density as big as 18/25 = 0.72 (Theorem 6).

Let us comment on the results. The proof of each of the lower bounds consists of two parts: one is purely combinatorial, while the other requires a computer search. For the cube-free language, we significantly improve the combinatorial part (Theorem 1) over the paper [23], correcting, along the way, an error in a technical statement of [23] (its Theorem 7); as to the search part, we present an efficient (quadratic) algorithm replacing an exponential algorithm of [23]. There is a chance that the new bound can be slightly improved if more computational resources are used. We also use the same search algorithm to improve the bound for the square-free language; the combinatorial part, presented in [19], is much simpler than in the cube-free case, and we see no way to improve it.
As a byproduct of the search algorithm, we find "building blocks" to construct an infinite path with small branching density. We call this density small because it is smaller than the fraction of branching points at the nth level of the tree for every big enough n.
(See Section 4.2 for the details.) Finally, a separate combinatorial argument allows us to obtain an upper bound on the branching density for the cube-free case and present an example which is very close to this bound.
After preliminaries, we state and prove the results in Sections 3-5. In Section 3, we prove Theorem 1, which constitutes the combinatorial part of Theorem 3. The tools for the search part are described in Section 4.1. Section 4.2 presents the results of the search, Theorems 3 and 4, and a short discussion. Section 4.3 is devoted to Theorem 5. Finally, Section 5 contains Theorem 6 and its proof.

Preliminaries
We study words and languages over the binary alphabet {a, b} (apart from Section 4.2, where a result over the ternary alphabet is also presented), writing λ for the empty word and |w| for the length of a finite word w. If w = xyz for some words x, y, and z (any of which can be empty), then x, y, z are, respectively, a prefix, a factor, and a suffix of w. We write y ⊆ w to indicate that y is a factor of w. The set of all finite (nonempty finite, infinite) words over an alphabet Σ is denoted by Σ* (resp., Σ+, Σ∞). Elements of Σ+ (Σ∞) are treated as functions w : {1, . . . , n} → Σ (resp., w : N → Σ). We write [i..j] for the range i, i+1, . . . , j of positive integers; the notation w[i..j] stands for the factor of the word w occupying this range, as well as for the particular occurrence of this factor in w; w[i..j] is internal if i > 1 and j < |w|. By the position of a factor, we mean its starting (=leftmost) position. The distance between factors of a word is the difference of their positions; for example, the distance between the occurrences of aa in aabaa is 3. A cyclic shift of a finite word w is any word w[i..|w|]w[1..i−1]. The complement of a finite or infinite word w is the image of w under the map which replaces all a's by b's and all b's by a's.
The prefix tree of a language L is a directed tree whose set of nodes is the set of all prefixes of words from L, and whose set of edges consists of all pairs (u, uc) such that c is a letter. Edges are labelled by the last letter of the destination node: u −c→ uc. The only node having no incoming edges, and thus the root of the tree, is λ. A prefix tree is (in)finite whenever L is (in)finite. A finite prefix tree is often considered as a finite automaton and called a trie.
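As an illustration, the prefix tree of a finite language can be built in a few lines of Python; the function name `prefix_tree` and the dictionary representation are ours, not from the paper:

```python
from collections import defaultdict

def prefix_tree(language):
    """Prefix tree of a finite language, stored as a mapping
    node (a prefix) -> set of outgoing edge labels (letters)."""
    children = defaultdict(set)
    for word in language:
        for i in range(len(word)):
            # edge (word[:i], word[:i+1]) is labeled by the letter word[i]
            children[word[:i]].add(word[i])
    return children

tree = prefix_tree({"aab", "aba", "abb"})
assert tree[""] == {"a"}        # the root (the empty word) has a single child
assert tree["a"] == {"a", "b"}  # the node 'a' is branching
```

Viewed as an automaton, every node is a final state and the empty word is the initial state, exactly as described above.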
A cube is a nonempty word of the form uuu. A word is cube-free if it has no cubes as factors; a cube is minimal if it contains no other cubes as factors. A p-cube is a minimal cube with the minimal period p (i.e., |u| = p). Other important repetitions include squares (words of the form uu) and overlaps (words having a period strictly smaller than half of their length).
The language CF of binary cube-free words is infinite and can be represented by its prefix tree T, in which the nodes are precisely all cube-free words. The label of every path from the root coincides, as a word, with the terminal node of this path. A node in T is either a leaf (infinite paths contain no leaves), or has a single child (fixed node; its outgoing edge, the letter labeling this edge, and the position of this letter in the label of the path are also called fixed), or has two children (branching point; the outgoing edges, and their labels and positions, are called free). A fragment of T is shown in Figure 1.

Figure 1. A fragment of the prefix tree of the binary cube-free language CF. Branching nodes and free edges are green, while fixed nodes and fixed edges are red. Nodes can be restored from the labels of paths.
To estimate the number of branching nodes in a path, we obtain bounds on the number of fixed positions/letters in the corresponding word. Assume that some position i in a cube-free word w is fixed; w.l.o.g., w[i] = a. Then the word w[1..i−1]b ends with a (unique) p-cube; in this case, we say that i (or w[i]) is fixed by a p-cube. We assume that some constant h is chosen (we will choose it later) and we partition fixed positions in words into two groups: those fixed by "small" p-cubes with p < h and those fixed by "big" p-cubes with p ≥ h. To get the lower bound on the branching density, we establish separate upper bounds on the numbers of positions fixed by small and big cubes. All other results involve small cubes only.
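The definition of a fixed position can be tested mechanically. A brute-force Python sketch (the helper names are ours; it is meant for small examples, not for the actual search):

```python
def ends_with_cube(w):
    """Does some suffix of w form a cube uuu (u nonempty)?"""
    n = len(w)
    return any(w[n-3*p:n-2*p] == w[n-2*p:n-p] == w[n-p:]
               for p in range(1, n // 3 + 1))

def fixed_positions(w):
    """1-based positions i of a cube-free word w that are fixed:
    replacing w[i] by the other letter creates a cube ending at i."""
    other = {"a": "b", "b": "a"}
    return [i for i in range(1, len(w) + 1)
            if ends_with_cube(w[:i-1] + other[w[i-1]])]

# After 'aa' the letter 'b' is forced (otherwise the 1-cube 'aaa' appears),
# so position 3 of 'aab' is fixed, while position 4 of 'aabb' is free.
assert fixed_positions("aab") == [3]
assert fixed_positions("aabb") == [3]
```

The cube ending at the flipped position is exactly the p-cube that fixes the position, in the terminology above.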

Positions Fixed by Big Cubes
The aim of this section is to prove the following upper bound on the density of positions fixed by big cubes in a cube-free word.

Theorem 1. For any integer h ≥ 2 and any infinite cube-free word w, the density of positions fixed by cubes with periods ≥ h in w is at most 6/(5h).
Theorem 1 is based on the following result, describing the restrictions on cubes of similar length fixing closely located letters.

Theorem 2. Suppose that t, l ≥ 1 and p, q ≥ 2 are integers, w is a word of length t+l such that w[1..t+l−1] is cube-free, the position t is fixed by a p-cube, and w ends with a q-cube. Then q is outside the red zone in Figure 2.

Figure 2. If the position t is fixed by a p-cube and w ends with a q-cube, then q (as a function of l with parameter p) must be outside the red polygon, including the red border lines. The cases q > 2p and q < p/2 are not considered.

Remark 1. Theorem 2 and Figure 2 improve and correct their earlier analogs, Theorem 7 and Figure 1 of [23]. The improvement can be seen as a few additional red patches in Figure 2 w.r.t. ([23] Figure 1), and the correction is that the triangle with vertices (2p, p), (4p, 2p), and (5p, 2p) in ([23] Figure 1) should have been painted green. This error does not affect the proofs of the main results of [23]: those proofs use only a part of the red area, drawn in ([23] Figure 8), which lies strictly inside the red area of our Figure 2. Thus, in [23], only Theorem 7 and Remark 8 are (partially) incorrect.

Remark 2.
We believe that the boundary of the red area in Figure 2 is exact for p/2 ≤ q ≤ 2p, and so the result of Theorem 2 is optimal. We do not prove this claim, because it is not important for the aims of this paper. To substantiate the claim, we provide Table 1 with the examples of the words w corresponding to green points in the corners of the boundary in Figure 2. Table 1. Example words w with the pair (l, q) in the green corner of the red zone boundary in Figure 2. One can take X = abbaab or a longer overlap-free word of similar structure.

Point | Word w (PPP in bold, QQQ underlined)

Proof of Theorem 2. We write QQQ for the q-cube which is a suffix of w. We begin with a few observations.

O1. If q = p/2, the condition w[t−p] ≠ w[t] implies that the suffix QQQ of w does not contain the position t−p. Hence, l ≥ 3q − p = p/2, giving us the red segment of the line q = p/2 in Figure 2. To get the red parts of the lines q = p and q = 2p, note that for q = p, the same argument gives l ≥ 2p, and for q = 2p, the condition w[t−2p] ≠ w[t] implies l ≥ 4p. From now on, we assume p/2 < q < 2p and q ≠ p.
O2. Let i be the bigger of the positions of PPP and QQQ in w and consider the factor w[i..t−1], having both periods p and q. If its length t − i is big enough to apply Lemma 2, the words P, Q are integral powers of shorter words, contradicting the condition that w[1..t+l−1] is cube-free. Thus, Lemma 2 must be inapplicable, giving us t − i < p + q − gcd(p, q) ≤ 3p − 2 (recall that q < 2p). Hence, the position of QQQ in w is bigger than the position of PPP, implying QQQ = w[i..t+l], so that 3q = t + l − i + 1 < l + 1 + p + q − gcd(p, q). From this, l > 2q − p, meaning that all green points with q < 2p are strictly below the line q = (l+p)/2 shown in Figure 2.

O3. If the factor w[i..t−1] considered in O2 is shorter than max{p, q}, then we are unable to restrict q: the p-periodic factor PPP and the q-periodic suffix QQQ have too short an overlap to "interact" inside w. Recall that t − i = 3q − l − 1, so all red points are strictly above the lines q = (l+p)/3 and q = l/2 in Figure 2.

Thus, to prove the theorem, it remains to justify the colouring of the stripe between the line q = (l+p)/2 and the broken line {q = (l+p)/3; q = l/2} in Figure 2. We split this stripe into zones I–IV by the lines q = l, q = p, and q = (l+2p)/3. The arguments for all zones are very similar, so we provide maximum details for Zone I and more concise proofs for Zones II–IV.
Zone I: q > (l+2p)/3. Together with q < 2p (O1) and 2q < l + p (O2), this gives the mutual location of the suffix QQQ and the factor PPP of w, as depicted in Figure 3. Equal letters denote equal factors; note that x ≠ λ since 2q < l + p, and z ≠ λ since q < 2p.
Figure 3. Location of factors of w for Zone I: q > (l+2p)/3, q < 2p, 2q < l + p. The leftmost Q consists of a suffix of P, followed by P and a prefix of P; P = xzya is partitioned accordingly.
The words y, zy, and yaxz are prefixes of Q (Figure 3). By the length argument, y is a prefix of zy, which is a prefix of yaxz. Then zyaxz ⊆ PP (Figure 3) implies zzy ⊆ PP. Since PP is cube-free, zzz ⊄ PP, and thus z is not a prefix of y. Since z and y are both prefixes of Q, we have z = yz′ with z′ ≠ λ. Further, yaxyaxz ⊆ QQ (Q begins with yaxz and ends with yax) but (yax)^3 ⊄ QQ, because QQ is cube-free. Then the fact that z and yax are both prefixes of Q implies that z = yz′ is a proper prefix of yax, so ax = z′x′ with x′ ≠ λ. Now compare zy = yz′y against yaxz = yz′x′yz′. We see that y is a proper prefix of x′y by the length argument. By Lemma 1, we can write x′ = fg, y = (fg)^n f for some words f, g; note that n ≤ 1 since x′y is cube-free. If n = 1, a cube arises in QQ; however, QQ is cube-free, so n = 0, implying y = f, x′ = yg. Finally, we can write Q = yz′ygyz′yz′yg. Note that g ≠ λ, otherwise (yz′y)^3 ⊆ QQ. From this representation of Q, we can express q, p, and l in terms of |y|, |z′|, and |g|; from Figure 3, we know that l = 2q − |yz′yb|. The resulting constraints, together with q > (l+2p)/3 (the border of Zone I), give exactly the green triangle inside Zone I with the vertices (5p/2, 3p/2), (8p/3, 5p/3), (4p, 2p).

Zone II: q ≤ (l+2p)/3 and q > p. Together with 2q > l (O3), this gives the mutual location of the suffix QQQ and the factor PPP of w, as depicted in Figure 4 (y ≠ λ since q > p; z or x can be empty).
Figure 4. Location of factors of w for Zone II: q > p, q ≤ (l+2p)/3, 2q > l. The leftmost Q consists of a suffix of P, followed by a longer prefix of P; P = zyxa is partitioned accordingly.
Since xb and yx are prefixes of Q and y is a suffix of Q (Figure 4), one has yyx ⊆ QQ. As QQ is cube-free, y is not a prefix of x. Comparing the prefixes xb and yx of Q, we have y = xby′ for some (possibly empty) y′. Then, Q = xby′xazxby′ and P = zxby′xa. We express p, q, and l = 2q − |xb| in terms of |x|, |y′|, |z|:

q = 3|x| + 2|y′| + |z| + 3, p = 2|x| + |y′| + |z| + 2, l = 5|x| + 4|y′| + 2|z| + 5. (2)

From (2), we get q = l − p − |y′| ≤ l − p; together with the boundaries of Zone II, the line q = l − p forms the green triangle inside Zone II with the vertices (2p, p), (5p/2, 3p/2), (4p, 2p) (Figure 2).
Zone III: q ≤ l, q < p. Together with q > (l+p)/3 (O3), this gives the mutual location of the suffix QQQ and the factor PPP of w, as depicted in Figure 5 (v, z ≠ λ since q < p; x, y can be empty).

Figure 5. Location of factors of w for Zone III: q < p, q ≤ l, 3q > l + p. The leftmost Q consists of a suffix of P, followed by a shorter prefix of P; the middle Q ends with some suffix y outside P′, possibly empty; P = zvxa is partitioned accordingly.

Since xa and vx are prefixes of Q and vQ ⊆ PP (Figure 5), one has vvx ⊆ PP, so v is not a prefix of x, and thus v = xav′ for some (possibly empty) v′. Then z = v′xby, Q = xav′xby, P = v′xbyxav′xa. We express p, q, and l = q + |y| in terms of |x|, |v′|, |y|:

q = 2|x| + |v′| + |y| + 2, p = 3|x| + 2|v′| + |y| + 3, l = 2|x| + |v′| + 2|y| + 2. (3)

From (3), we get q = (l + 2p − |v′|)/4 ≤ (l + 2p)/4; together with the boundaries of Zone III, this line forms the green triangle in Zone III with the vertices (p/2, p/2), (2p/3, 2p/3), (2p, p) (Figure 2).

Zone IV: q > l. One has q > p/2 (O1) and 2q < l + p (O2), and so q < p. Then the mutual location of the suffix QQQ and the factor PPP of w is as in Figure 6.

Figure 6. Location of factors of w for Zone IV: q > p/2, q > l, 2q < l + p. The leftmost Q consists of a suffix of P, followed by a shorter prefix of P; the middle Q is a proper factor of P; P = zvxa is partitioned accordingly.
Since x and vx are prefixes of Q and vQ ⊆ PP (Figure 6), one has vvx ⊆ PP; as PP is cube-free, v is not a prefix of x, and thus v = xv′ for some v′ ≠ λ. Taking y and xy, which are another pair of prefixes of Q, we get x = ybx′ (x′ is possibly empty) because xxy ⊆ QQ. Note that if v′ is a prefix of x′, then (xv′)^3 ⊆ xv′xv′xx′ ⊆ vQQ ⊆ PP, which is impossible. Thus, v′ is not a prefix of x′, and then v′ is not a prefix of y either. Since xv′ and xya are prefixes of Q, we get v′ = yag for some (possibly empty) g. Thus, Q = ybx′yagybx′ and P = gybx′ybx′yagybx′ya. We express p, q, and l = q − |yb| in terms of |x′|, |g|, |y|:

q = 3|y| + 2|x′| + |g| + 3, p = 5|y| + 3|x′| + 2|g| + 5, l = 2|y| + 2|x′| + |g| + 2. (4)

From (4), we get q = (l + 2p − |g|)/4 ≤ (l + 2p)/4 and q = p − l + |x′| ≥ p − l; together with the boundary q = l, the obtained two lines form the green triangle in Zone IV with the vertices (p/2, p/2), (2p/5, 3p/5), (2p/3, 2p/3) (Figure 2).

Thus, we have identified all "red" and "green" parts of Zones I–IV, obtaining the full picture from Figure 2. Theorem 2 is proved.
The second crucial step in the proof of Theorem 1 is the following lemma on the density of positions fixed by cubes with similar periods.

Lemma 3. Suppose that l ≥ 1 and p ≥ 2 are integers, and w is a cube-free word with |w| > l. Among any l consecutive letters of w, fewer than 8/5 + 3l/(5p) are fixed by cubes with periods in the range [p..2p−1].
Proof. Let us consider an inverse problem:

(†) Let l_0 < l_1 < · · · < l_s (s ≥ 1) be positions in w containing letters fixed by cubes with periods q_0, . . . , q_s, respectively, where q_i ∈ [p..2p−1] for all i; find a lower bound for l_s − l_0 (as a function of s and p) which applies for every sequence q_0, . . . , q_s.
The distance between any two consecutive positions l_i and l_{i+1} is lower-bounded by Theorem 2. More precisely, we use Theorem 2 to make conclusions of the form l_{i+1} − l_i ≥ l*, where the point (l*, q_{i+1}) is on the border of the red polygon in Figure 2, in which p = q_i and q = q_{i+1}. For example, since the point (25, 15) is on the segment q = l − p of the border of such a polygon built for p = 10, we conclude that the condition q_i = 10, q_{i+1} = 15 implies l_{i+1} − l_i ≥ 25. Then Theorem 2 implies the inequalities (5a)–(5d), related to the boundaries of the polygon in Figure 2 (β and α play the roles of p and q, respectively). Assume that q = (q_0, . . . , q_s) is a sequence of positive rational numbers from a given range. We define its span span(q) as the lower bound for the difference l_s − l_0 for the sequence q of periods. Precisely, span(q) is the minimum number such that there exists a sequence 0 = l_0 < l_1 < · · · < l_s = span(q) satisfying, for each i, the property "the point (l_{i+1} − l_i, q_{i+1}) is on the border of the red polygon in Figure 2, in which q_i is substituted for p". Thus, min span(q), where the minimum is taken over all sequences of length s+1 in the given range [p..2p−1], is the lower bound sought in (†).
We write span(q_i, . . . , q_j) for the span of the corresponding subsequence of q. Note that spans are additive: span(q_i, . . . , q_j) + span(q_j, . . . , q_m) = span(q_i, . . . , q_m). For the simplest case of a two-element sequence, (5a)–(5d) yield an explicit formula (6) for span(β, α). From (6), we immediately have

(∗) for any fixed β, the function span(β, α) monotonically increases for α ∈ [3β/5, 2β).

Since all borders in Figure 2 are line segments, the equality span(Cq) = C · span(q) holds for any C > 0 (if a sequence (l_0, . . . , l_s) works for q, then (Cl_0, . . . , Cl_s) works for Cq). Thus, we simplify the subsequent argument by considering a particular range for the sequence q: q_i belongs to the semiclosed interval [1, 2) for all i.
Given q, we iteratively modify it from right to left. Each modification results in a sequence of the same length, in the same range, and with the same or a smaller span; the result of the last modification is one of the "good" sequences, the span of which can be easily computed. The smallest span of a good sequence is then a lower bound for the span of any sequence q in the given range. Precise definitions are as follows. We call a sequence (r_0, . . . , r_t) canonical if it contains only the numbers 1 and 5/3, no two 5/3's are consecutive and, in addition, r_0 = r_t = 1. A sequence q = (q_0, . . . , q_s) is good if it has a nonempty canonical suffix beginning at q_0, q_1, or q_2.
Rules T1 and/or T2 replace the last number in the processed sequence q by 1 and serve as the base case in transforming q into a good sequence. Now we describe the general case, assuming that q has a nonempty canonical suffix (q i , . . . , q s ). The subsequent transformations preserve the numbers q i , . . . , q s and aim at extending the canonical suffix.
Proof of Theorem 1. We split the range from h to infinity into disjoint finite ranges such that the kth range is [2^(k−1)h .. 2^k·h − 1]. Consider the density of positions in a cube-free word w fixed by p-cubes with p in the kth range. By Lemma 3 and the definition of density, it is upper-bounded by 3/(5·2^(k−1)·h). Summing up the geometric series of all these upper bounds, we get the required bound 6/(5h).
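Spelled out, the geometric series in the last step is:

```latex
\sum_{k\ge 1} \frac{3}{5\cdot 2^{\,k-1}h}
  \;=\; \frac{3}{5h}\sum_{k\ge 0} 2^{-k}
  \;=\; \frac{3}{5h}\cdot 2
  \;=\; \frac{6}{5h}.
```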

Regular Approximations and Aho-Corasick Automata
To estimate the number of letters in a cube-free word that are fixed by small cubes, we analyze finite automata recognizing some approximations of the language CF. Let L_i be the language of all binary words containing no cubes of period ≤ i. Then L_i contains CF and is regular (as a language defined by a finite set of forbidden factors); L_i is referred to as the ith regular approximation of CF. The study of regular approximations is a standard approach to power-free languages (see, e.g., [9] (Section 3)).
A regular language given by a finite set of forbidden factors can be represented by a partial deterministic finite automaton (dfa) built by a variation of the classical Aho-Corasick algorithm, as described in [27]. Let us provide some necessary details for regular approximations of CF, following a more general scheme from [28].
To get the dfa A i accepting the language L i , one proceeds in three steps.

1. List all p-cubes with periods p ≤ i and build the prefix tree P_i of these words; then the leaves of P_i are exactly the p-cubes, and all internal nodes are cube-free words.
2. Consider P_i as a partial dfa with the initial state λ and complete this dfa, adding transitions by the Aho-Corasick rule: if there is no transition from a state u by a letter c, add the transition u −c→ v, where v is the longest suffix of uc present in P_i.
3. Delete all leaves of P_i from the obtained automaton.

The resulting partial dfa is A_i; it accepts by any state and rejects if it cannot read the word. For details see, for example, [27]. Let us fix some i ≥ 1 and analyze the properties of A_i.
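A minimal Python sketch of the three steps, under the assumption that brute-force enumeration of candidate cubes is acceptable for small i (the function names and the dictionary representation of transitions are ours):

```python
from itertools import product

def is_cube_free(w):
    n = len(w)
    return not any(w[s:s+p] == w[s+p:s+2*p] == w[s+2*p:s+3*p]
                   for p in range(1, n // 3 + 1)
                   for s in range(n - 3 * p + 1))

def build_Ai(i, alphabet="ab"):
    """Three-step construction of the partial dfa A_i for L_i."""
    # Step 1: all p-cubes (minimal cubes) with p <= i and their prefix tree;
    # a cube uuu is minimal iff both maximal proper factors are cube-free.
    cubes = ["".join(u) * 3 for p in range(1, i + 1)
             for u in product(alphabet, repeat=p)]
    cubes = [c for c in cubes if is_cube_free(c[:-1]) and is_cube_free(c[1:])]
    states = {c[:j] for c in cubes for j in range(len(c) + 1)}
    leaves = set(cubes)

    # Step 2: the Aho-Corasick rule: the missing transition from u by c
    # leads to the longest suffix of uc that is a node of the prefix tree.
    def goto(u, c):
        v = u + c
        while v not in states:
            v = v[1:]
        return v

    # Step 3: delete the leaves; transitions into them become rejections.
    delta = {(u, c): goto(u, c)
             for u in states - leaves for c in alphabet
             if goto(u, c) not in leaves}
    return states - leaves, delta

states, delta = build_Ai(1)
# For i = 1 the p-cubes are 'aaa' and 'bbb'; the state 'aa' is fixed:
assert sorted(c for c in "ab" if ("aa", c) in delta) == ["b"]
```

The states with a single remaining transition are exactly the fixed states discussed below.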
We write u.v for the state of A_i reached from the state u by the path labelled by v. The following lemma connects A_i and fixed letters in cube-free words. In accordance with the other notation, we call fixed the states of A_i with a single outgoing transition, as well as the edges representing these transitions. The next lemma shows how to get an upper bound on the number of letters in a word fixed by short cubes.

Proof. In A_i, consider the walk W from λ to λ.w labelled by w. By Lemma 4, the number of fixed positions we need to estimate equals the number of occurrences of fixed states in W, excluding the terminal occurrence of λ.w. Note that W, as well as any walk in A_i, can be obtained as follows: take a simple path between the initial and the terminal states of the walk and repeatedly insert simple cycles into the walk built so far. The simple path (say, of length n) contains at most d_i·n + c_i fixed states, and the rest contains at most d_i·(|w| − n) fixed states, whence the result.

Corollary 1.
In an infinite cube-free word, the density of the set of letters fixed by cubes with periods of at most i is upper-bounded by d i .
The numbers d_i and c_i can be computed from A_i in polynomial time using dynamic programming (due to Corollary 1, we need only d_i). A straightforward way to do this is to compute, for each pair of states u, v in the order of increasing k, the maximum fraction d[u, v, k] of fixed states in a (u, v)-walk of length at most k; then d_i = max_u d[u, u, N_i], where A_i has N_i states. This algorithm has cubic complexity, but we can do significantly better. We note that any automaton A_i has a unique nontrivial strongly connected component; this quite nontrivial fact follows from the main result of [29].

Proposition 1. Let N_i and n_i be the numbers of nodes in A_i and in its nontrivial strongly connected component, respectively. Then there exists an algorithm computing d_i from A_i in O(N_i + n_i^2) time.
Proof. Recall that the mean cost of a walk in a weighted digraph is the ratio between its cost and its length. We reduce the problem of computing d_i to the problem of finding a cycle of the minimum mean cost. Considering A_i as a digraph, we assign cost 0 to fixed edges and cost 1 to free edges. Then we replace A_i with its unique nontrivial strongly connected component A′_i, preserving the edge costs. This component contains all cycles of A_i. Now d_i = 1 − µ, where µ is the minimum mean cost of a cycle in the weighted digraph A′_i. The mean cost problem can be solved for an arbitrary strongly connected digraph with n nodes and m edges in O(nm) time and space using Karp's algorithm [30]. Noting that, in our case, m = O(n), and that the strongly connected component can be found in linear time by a textbook algorithm, we obtain the required time bound.
For the sake of completeness, let us describe Karp's algorithm for our case. Fix an arbitrary state q and define C(j, v) to be the minimum cost of a length-j walk from q to v, or ∞ if no such walk exists. The (n_i + 1) × n_i table with the values of C(j, v) for j = 0, . . . , n_i and all states of A′_i is filled using the following dynamic programming rule:

C(j, v) = min { C(j−1, z) + cost(z → v) : (z, v) is an edge of A′_i }. (7)

According to [30] (Theorem 1), µ = min_{v ∈ A′_i} max_{0 ≤ j < n_i} (C(n_i, v) − C(j, v)) / (n_i − j).
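For concreteness, here is a compact Python sketch of Karp's algorithm as just described; node 0 plays the role of the arbitrary state q, and the function name and edge-list representation are ours:

```python
import math

def min_mean_cycle(n, edges):
    """Karp's algorithm: minimum mean cost of a cycle in a strongly
    connected digraph with nodes 0..n-1 and edges (u, v, cost)."""
    INF = math.inf
    # C[j][v] = minimum cost of a length-j walk from node 0 to v, as in (7).
    C = [[INF] * n for _ in range(n + 1)]
    C[0][0] = 0.0
    for j in range(1, n + 1):
        for u, v, cost in edges:
            if C[j - 1][u] + cost < C[j][v]:
                C[j][v] = C[j - 1][u] + cost
    # Karp's formula: mu = min_v max_{0<=j<n} (C[n][v] - C[j][v]) / (n - j).
    mu = INF
    for v in range(n):
        if C[n][v] == INF:
            continue
        best = max((C[n][v] - C[j][v]) / (n - j)
                   for j in range(n) if C[j][v] < INF)
        mu = min(mu, best)
    return mu

# Two simple cycles: 0->1->0 with mean cost 1/2, and 1->2->1 with mean cost 1.
edges = [(0, 1, 0), (1, 0, 1), (1, 2, 1), (2, 1, 1)]
assert min_mean_cycle(3, edges) == 0.5
```

With cost 0 on fixed edges and cost 1 on free edges, the returned µ gives d_i = 1 − µ.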

Remark 3.
Karp's algorithm also allows one to retrieve a cycle of minimum mean cost. To do this, one stores the node z = P(j, v) which gives the minimum in the computation (7) of C(j, v) (here, j = 1, . . . , n_i, and P(j, v) is undefined if C(j, v) = ∞). The n_i × n_i table P(j, v) is then used as follows. If u is a node for which the value of µ is reached, then we build the length-n_i walk q = u_0 → u_1 → · · · → u_{n_i} = u such that P(j+1, u_{j+1}) = u_j for all j and output any simple cycle from this walk. We will need the cycles of minimum mean cost in Section 4.3.

Lower Bounds on Branching Density
We implemented the above algorithm and ran it for all i ≤ 18; for i = 18, the memory required for the table C(j, v) is over 1 GB. The results are as follows.
The statement of the theorem now follows.
In the same way, we can get the lower bound for the ternary square-free language SF. From [19] (Lemma 5), we have the upper bound 2/h on the density of positions fixed by squares of periods ≥ h. Lemmas 4 and 5 and Corollary 1 have direct analogs for ternary square-free words; Proposition 1 and the algorithm inside it remain valid for any automaton having at most two outgoing edges per state. Running the algorithm for the regular approximations of SF up to i = 30, we obtained the corresponding numbers d_i. Taking h = 31 and adding 2/h to d_30 = 19/28, we arrive at the following bound. Recall that the growth rate of a factorial language L over the alphabet Σ is the limit lim_{n→∞} |L ∩ Σ^n|^{1/n}. The growth rate β of CF is known with quite high precision [9]: 1.4575732 ≤ β ≤ 1.4575773. In terms of the prefix tree, this means that for big n, the number of nodes at the (n+1)th level is approximately β times bigger than the number of nodes at the nth level. This fact makes β − 1 a lower bound on the fraction of branching nodes at the nth level (because this level also contains nodes having no children). In Theorem 5 below, we use Proposition 1 and Remark 3 to prove that there exist infinite cube-free words with branching density strictly smaller than β − 1.
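The arithmetic behind the resulting 223/868 bound can be double-checked with exact fractions, assuming d_30 = 19/28 and h = 31 as stated above:

```python
from fractions import Fraction

# Upper bound on the density of fixed positions in a square-free ternary word:
# d_30 = 19/28 for squares of period <= 30, plus 2/h for periods >= h = 31.
d_30 = Fraction(19, 28)
h = 31
fixed = d_30 + Fraction(2, h)

# Complementing gives the lower bound on the branching density in SF.
branching = 1 - fixed
assert branching == Fraction(223, 868)   # ≈ 0.25691
```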
The above considerations can also be applied to the growth rate γ of SF, 1.3017597 ≤ γ ≤ 1.3017619 [9]. However, it is still open whether an infinite square-free word can have the branching density smaller than γ − 1. The method of Theorem 5 would not work for SF because the obtained values of d i are too small.

Cube-Free Words with Small Branching Density
Theorem 5. The minimum branching density of an infinite cube-free word is less than or equal to 13/29 ≈ 0.44828.
Proof. The result of Lemma 6 suggests an idea for constructing an infinite cube-free word with branching density less than β − 1. We see that 1 − d_15 ≈ 0.44792 < β − 1 (while 1 − d_14 ≈ 0.45833 > β − 1). Our aim is to construct an infinite cube-free word whose density of fixed positions is very close to d_15.
Using the table P(j, v) of Karp's algorithm (see Remark 3), we find that the automaton A_15 contains exactly four cycles C_1, C_2, C̄_1, and C̄_2, each of length 96, reaching the minimum mean cost (1 − d_15). All four cycles are disjoint; the labels of the cycles C̄_1 and C̄_2 are the complements of the labels of C_1 and C_2, respectively. We note that C_1 and C_2 are connected to each other by many edges. Let us consider a subgraph of A_15 consisting of C_1, C_2, and two edges connecting them as in Figure 7. Since, in the Aho-Corasick automaton, all edges with a common endpoint have the same label (if this endpoint is distinct from λ), the paths from u to v and from u′ to v in Figure 7 are labeled by the same word x_1, while the paths from both v and v′ to u′ are labeled by the same word x_2. Denote the labels of the paths from v to u and from u′ to v′ by y_1 and y_2, respectively. Then the label of C_1 is x_1y_1 (starting from u), the label of C_2 is x_2y_2 (starting from v′), and there is an "outer" cycle C_3 with the label x_1x_2 (starting from u′). We also note that x_1 and y_2 (resp., x_2 and y_1) begin with different letters.
Analyzing the subgraph of A_15 generated by C_1 and C_2, we find the cycle C_3 with the smallest mean cost: it has length 156 and 86 fixed states. The corresponding values of x_1, x_2, y_1, y_2 are as follows: Recall that the Thue-Morse word t is the fixed point of the morphism a → ab, b → ba starting with a:

t = t[1..∞] = abba baab baab abba baab abba abba baab baab abba abba baab abba baab . . .

This word is overlap-free [10], that is, it has no factors of the form cucuc, where c is a letter and u is a (possibly empty) word. We map t to an infinite binary word by the mapping φ defined by two rules: In other terms, to get φ(t), we replace each a (resp., b) in t with x_1 (resp., x_2), and then insert y_i in the middle of each factor x_i x_i of the obtained word: t = a bb ab aa bb aa ba bb a . . .
Thus, we naturally have a partition of φ(t) into factors which we call blocks, distinguishing between x-blocks x_1, x_2 and y-blocks y_1, y_2. Note that

(⋆) x_1 and x_2 have no occurrences in φ(t) other than the x-blocks.
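The Thue-Morse word and the overlap-freeness property used in this construction can be checked experimentally; a small Python sketch (helper names are ours):

```python
def thue_morse(n):
    """First n letters of the Thue-Morse word t over {a, b}:
    the fixed point of the morphism a -> ab, b -> ba starting with a."""
    t = "a"
    while len(t) < n:
        # apply the morphism; 'A' is a temporary marker for the image of 'a'
        t = t.replace("a", "A").replace("b", "ba").replace("A", "ab")
    return t[:n]

def has_overlap(w):
    """Does w contain a factor cucuc (c a letter, u possibly empty),
    i.e., a factor whose period is less than half of its length?"""
    n = len(w)
    return any(w[s:s + p + 1] == w[s + p:s + 2 * p + 1]
               for p in range(1, n // 2 + 1)
               for s in range(n - 2 * p))

assert thue_morse(16) == "abbabaabbaababba"  # matches the prefix shown above
assert not has_overlap(thue_morse(200))      # overlap-freeness on a prefix
assert has_overlap("ababa")                  # ababa = cucuc with c = a, u = b
```

Of course, checking a prefix is only a sanity test; overlap-freeness of the whole word t is the classical result [10] the proof relies on.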
Let us prove that φ(t) is cube-free. Assume to the contrary that φ(t) contains a cube; let XXX be the leftmost of the minimal cubes in φ(t). A direct check shows that all words x i y i x i and x i x j x i are cube-free. Therefore, XXX contains a whole x-block inside; below, x i denotes the leftmost such block, and x j denotes the x-block distinct from x i . The period |X| of the cube cannot be smaller than the minimum period of x i . It is easy to check that the only period of x i is |x i | − 3. Thus, |X| ≥ |x i | − 3 ≥ 74.
First, assume that xi is the only x-block in XXX. Since the word xjxixj is cube-free, XXX is an internal factor of either xiyixixj or xjxiyixi. Since |xiyixi|, |yixixj| ≤ 175 and |XXX| ≥ 74 · 3 = 222, the length argument shows that yixi (in the first case) or xiyi (in the second case) is a factor of XXX. Thus, |X| is not smaller than the minimum of the periods of the words xiyi and yixi. This minimum equals 84, which is a period of both y1x1 and y2x2. Hence, |XXX| ≥ 252 = 2|xi| + |xj| + |yi|. However, an internal factor of a word must be shorter by at least two symbols than the word itself, while |xiyixixj| = |xjxiyixi| = 252; this contradiction shows that XXX contains more than one x-block. Therefore, XXX contains either xiyixi or xixj.
Let XXX = v1 xiyixi v2. By the choice of xi, v1 is a proper suffix of xj, and v2, if nonempty, begins with the first letter of xj, which differs from the first letter of yi. Thus, if |X| equals the minimum period of xiyixi, which is |xiyi|, then v2 = λ and |XXX| < 3|X|, which is impossible. Hence, |X| > |xiyi|. Therefore, for each of the two blocks xi, we see that XXX contains another occurrence of xi at distance |X|. By (∗), these occurrences are x-blocks; both are inside v2, because v1 is a suffix of an x-block. Then X contains at least one occurrence of the block xi. As a result, XXX contains a factor xiwxiwxi for some word w which is a product of blocks; here, xiw is a cyclic shift of X. Taking the φ-pre-image of xiwxiwxi, we obtain a factor of the form auaua or bubub in t, in contradiction with overlap-freeness.
Finally, let XXX = v1 xixj v2. The word xixj has no periods smaller than |xixj| − 1. Hence, |X| ≥ |xi| + |xj| − 1, and then X contains at least one of xi, xj. Since v1 contains no x-blocks by the choice of xi, v1xi has no factor xj by (∗). Then X must contain xi, and we arrive at a contradiction as in the previous paragraph. Thus, we have proved that φ(t) is cube-free.
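The finite cube-freeness checks invoked in this proof (e.g., that the words xiyixi and xixjxi contain no cubes) are routine to mechanize. A brute-force sketch:

```python
def find_cube(w):
    """Return (start, period) of a cube xxx occurring in w, or None if w is
    cube-free; periods are tried in increasing order."""
    n = len(w)
    for p in range(1, n // 3 + 1):
        for i in range(n - 3 * p + 1):
            if all(w[i + k] == w[i + k + p] for k in range(2 * p)):
                return (i, p)
    return None

print(find_cube("aabaabaab"))         # (0, 3): the cube (aab)^3
print(find_cube("abbabaabbaababba"))  # None: a prefix of the Thue-Morse word
```

For the word lengths appearing in the proof (a few hundred letters), this quadratic check runs instantly.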
The word φ(t) corresponds to an infinite walk from the node u in the subgraph of A15 depicted in Figure 7. The walk reads x1 (rule 1 in the definition of φ) and then respects rule 2; the details are as follows. Let xi be just read; if the current letter of t coincides with the previous one, the walk returns to the "start" node of the same cycle Ci by reading yi and then reads xi again; otherwise, the walk reads xj, where j ≠ i. We know the fractions of fixed states in the cycles C1, C2, and C3; to calculate the density of positions fixed by short cubes in φ(t), we use the folklore fact that the density of the set of positions i such that t[i..i+1] = aa (resp., ab, ba, bb) equals 1/6 (resp., 1/3, 1/3, 1/6). Hence, in the partition of φ(t) into blocks, the densities of the blocks x1 and x2 are equal and twice as large as the densities of the blocks y1 and y2. We group the blocks into the labels of the cycles C1, C2, and C3: since x-blocks appear in the labels of two cycles while y-blocks appear in the label of one cycle, all three cycles appear with the same density. Thus, to get the density of fixed letters, we take the total number of fixed states in C1, C2, and C3 and divide it by the sum of the lengths of these cycles: (86 + 53 + 53)/(156 + 96 + 96) = 16/29. Then the branching density of φ(t) is at most 1 − 16/29 = 13/29. The theorem is proved.
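The folklore pair densities and the final arithmetic can be verified numerically. A sketch (the pair densities are only approximated on a finite prefix of t):

```python
from fractions import Fraction

def thue_morse(n):
    return "".join("ab"[bin(i).count("1") % 2] for i in range(n))

t = thue_morse(1 << 16)
pairs = [t[i:i + 2] for i in range(len(t) - 1)]
dens = {p: pairs.count(p) / len(pairs) for p in ("aa", "ab", "ba", "bb")}
print(dens)  # close to 1/6, 1/3, 1/3, 1/6, respectively

# density of fixed letters obtained from the three cycles
print(Fraction(86 + 53 + 53, 156 + 96 + 96))  # 16/29
print(1 - Fraction(16, 29))                   # 13/29
```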

Remark 4.
In fact, the branching density of φ(t) is exactly 13/29: refining the analysis of cube-freeness of φ(t), it is possible to show that this word contains no letters fixed by long cubes.

The Bounds on Maximum Branching Density
The branching density of infinite cube-free words can be much bigger than β − 1 ≈ 0.45758. The aim of this section is to prove the following theorem.

Theorem 6.
(1) The maximum branching density of an infinite binary cube-free word is less than 67/93 ≈ 0.72043.
(2) There exists an infinite binary cube-free word with branching density 18/25 = 0.72.

Example 1.
The branching density of the Thue-Morse word t is 2/3. Indeed, t is overlap-free, and thus all fixed letters in it are fixed by 1-cubes. Hence, the fixed letters are exactly the letters a (resp. b) preceded by the 1-square bb (resp. by aa); in each case, the density of such positions is 1/6, as mentioned in Section 4.3. Thus, the density of fixed positions is 1/3.
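The 1/3 density of fixed positions can be observed on a long prefix of t. A quick numerical check:

```python
def thue_morse(n):
    return "".join("ab"[bin(i).count("1") % 2] for i in range(n))

t = thue_morse(1 << 16)
# a position is fixed by a 1-cube iff it is preceded by the square of the other letter
fixed = sum(1 for i in range(2, len(t)) if t[i - 2] == t[i - 1] != t[i])
print(fixed / len(t))  # close to 1/3, so the branching density is close to 2/3
```

Since t contains no factor aaa or bbb, the letter after any 1-square automatically differs from it, so the condition above just locates the positions following aa or bb.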
The proof of Theorem 6 is based on the analysis of positions fixed by 1-cubes. The distance between two successive occurrences of the square of a letter in a cube-free word is 2 (aabb/bbaa), 3 (aabaa/bbabb), 4 (aababb/bbabaa), or 5 (aababaa/bbababb); it cannot be 1 (aaa/bbb) or ≥ 6 (aababab · · · /bbababa · · · ) because of cube-freeness. Hence, if we know a prefix w[1..i] of a cube-free word, this prefix ends with a 1-square, and the distance d to the next 1-square is known, then we can uniquely reconstruct w[1..i + d]. We consider an auxiliary alphabet ∆ = {2, 3, 4, 5} and refer to its elements as digits and to the words over it as codes. For every cube-free word w ∈ Σ^∞ we define its distance code dist(w) ∈ ∆^∞ as follows: dist(w)[i] is the distance between the ith and (i+1)th 1-squares in w (counting from the left). For example,

dist(abaabbaababbabaababbaabbabaa) = 22444224.

Note that dist(w) determines w up to the complement and the few letters preceding the first 1-square; in particular, it determines the branching density of w if w is cube-free. Thus, instead of infinite cube-free words, here we study their distance codes. We extend the definition of a distance code to finite words in the obvious way; for example, dist(bbabaabb) = 42. Here, dist(w) determines w up to the complement, the letters preceding the first 1-square, and the letters following the last 1-square. We define the inverse of the map dist: for a code X ∈ ∆^+, w = word(X) is the unique word which begins with aa, ends with a 1-square, and satisfies dist(w) = X. Clearly, word(X) has length [X] + 2, where [X] denotes the sum of digits in X, and has |X| letters fixed by 1-cubes. For example, the cube-free word aabbabbabaababaa = word(2345) has length 16 and 4 letters fixed by 1-cubes. The same definition of word, with the condition on the end of the word omitted, applies for infinite codes.
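The maps dist and word can be implemented directly from these definitions. A sketch (the per-digit decoding patterns follow the distance list 2-5 above):

```python
def dist(w):
    """Distance code: gaps between starting positions of successive 1-squares."""
    squares = [i for i in range(len(w) - 1) if w[i] == w[i + 1]]
    return "".join(str(b - a) for a, b in zip(squares, squares[1:]))

def word(code):
    """Inverse map: the word that begins with aa, ends with a 1-square,
    and has the given distance code."""
    other = {"a": "b", "b": "a"}
    w, c = "aa", "a"          # c is the letter of the last 1-square so far
    for d in code:
        if d == "2":          # cc -> nn (aabb pattern)
            c = other[c]; w += c + c
        elif d == "3":        # cc n cc (aabaa pattern)
            w += other[c] + c + c
        elif d == "4":        # cc nc nn (aababb pattern)
            n = other[c]; w += n + c + n + n; c = n
        elif d == "5":        # cc ncn cc (aababaa pattern)
            n = other[c]; w += n + c + n + c + c
    return w

print(dist("bbabaabb"))  # 42
print(word("2345"))      # aabbabbabaababaa
```

On cube-free inputs the two maps are mutually inverse in the sense described above; e.g., dist(word("2345")) gives back "2345", and len(word(X)) equals [X] + 2.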
Remark 5. The word word(33) = aabaabaa is not a proper factor of a cube-free word, since every one-letter extension of it contains a cube; word(434) = aababbabbabaa contains (bab)^3, as do word(435), word(534), and word(535). In the following list of cube-free words, letters fixed by p-cubes are underlined by p lines:

It is easy to check that word(24), word(42), word(44), and word(454) contain no almost-cubes and have periods of at least six. Thus, |x| ≥ 6, and hence x contains a 1-square. Then |x| is the distance between two squares of the same letter. Each of word(2) and word(4) begins and ends with different 1-squares, while word(5) begins and ends with the same square. Hence, |x| = [Y] for some factor Y of X such that (i) the total number of 2's and 4's in Y is even and (ii) YY is a factor of X. If Y contains no 5's, there are just two cases to check; in both, the lengths of the corresponding periodic factors are less than 8 · 3 − 1 = 23, as required for an almost-cube.
Therefore, 5 must occur in Y. Since YY is a factor of X, |Y| is the distance between two occurrences of 5 in X. Since 5 occurs in X only as the suffix of the block B, Y is a cyclic shift of some product of blocks. As above, we show that the longest factor of u with period |x| = [Y] is shorter than 3|x| − 1. Let C = (4^4 2)^4 4^2; then A = C2 and B = C445. Since t is overlap-free, the maximal factor of X with period |Y| looks like Y′Y′C, where Y′ is a product of blocks and a cyclic shift of Y. Then the longest factor of u with period |x| looks like v1 word(Y′Y′C) v2. Here, |word(Y′Y′C)| = 2[Y′] + [C] + 2 = 2[Y′] + 82; note that [Y′] = [Y] ≥ [B] = 93. Further, v1 is the common suffix obtained when decoding different digits (2 and 5), and v2 is the common prefix obtained when decoding different digits (2 and 4). Hence, |v1| = |v2| = 1. In total, the length of the |x|-periodic factor is at most 2[Y′] + 84, which is strictly smaller than 3[Y′] − 1 because [Y′] ≥ 93. Therefore, no almost-cubes are present in u. This proves Statement 2 of the theorem.
For Statement 1, we take a cube-free word w of maximum branching density and consider its code dist(w). By Remark 5, each digit in dist(w) corresponds to a letter fixed by a 1-cube, and a digit 5 also corresponds to a letter fixed by a 2-cube. The density of fixed positions in w is the minimum possible and thus is upper-bounded by 7/25, the density attained by u. Since 7/25 is closer to 1/4 than to 1/3, the majority of digits in dist(w) are 4's. Since w has the same branching density as each of its suffixes, we assume w.l.o.g. that dist(w) begins with 4 and represent it as a sequence of blocks: each block consists of one or more 4's at the beginning and one or more other digits at the end. Note that the words word(54^4 5) and word(c4^5 d), for any digits c, d, contain an 8-cube (cf. Case 1 above). Then a short search reveals all blocks providing a density of fixed positions not greater than 0.28 = 7/25. (We recall that blocks containing 3's are restricted, as shown in Remark 5.) We note that word((4^4 2)^5 4^4) and word((4^4 2)^5 4^3 5) contain 36-cubes, while word(4^3 2 4^3 2 4^3) contains a 14-cube. As a result, the density of fixed letters in w cannot be smaller than such density in word(((4^4 2)^4 4^4 5)^∞), which is 26/93. This gives us the upper bound 1 − 26/93 = 67/93 on the branching density of w, as required.
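The arithmetic behind the densities 7/25 and 26/93 can be double-checked. A sketch, assuming the block definitions A = C2 and B = C445 with C = (4^4 2)^4 4^2 as reconstructed in the proof above:

```python
from fractions import Fraction

C = "44442" * 4 + "44"   # C = (4^4 2)^4 4^2, digit sum 80
A = C + "2"              # block A, digit sum 82
B = C + "445"            # block B, digit sum 93

# one digit = one letter fixed by a 1-cube; each 5 fixes one more letter (2-cube)
fixed_A = len(A)                  # A contains no 5's
fixed_B = len(B) + B.count("5")
print(Fraction(fixed_A + fixed_B, 82 + 93))   # 7/25: fixed-letter density in u

period = "44442" * 4 + "44445"                # one period of ((4^4 2)^4 4^4 5)^inf
fixed = len(period) + period.count("5")
print(Fraction(fixed, sum(map(int, period)))) # 26/93
print(1 - Fraction(26, 93))                   # 67/93: the claimed upper bound
```

Blocks A and B occur with equal densities (via the Thue-Morse word), which is why their contributions are simply added in the 7/25 computation.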

Discussion and Future Work
As we have seen in this paper, the branching density of particular infinite words in a typical power-free language of exponential growth can vary significantly. Thus, a natural question is to determine the average density. The first problem is to define what "average" means; we suggest that it should be the expected density of a word randomly chosen from all infinite binary cube-free words according to a distribution that is "uniform" in some sense. One possible way to choose a random infinite word is a random walk down the prefix tree (with all finite subtrees trimmed).
Another possible next step is to check whether the ternary square-free language SF, which is another "typical" power-free language of exponential growth, demonstrates the same patterns as CF. Currently, we do not know whether some infinite square-free words have branching density strictly less than the growth rate of SF minus one. Neither do we know a reasonable bound for the maximum branching density in SF.