More Variations on Shuffle Squares

Jarosław Grytczuk; Bartłomiej Pawlik; Mariusz Pleszczyński

doi:10.3390/sym15111982

¹ Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland

² Institute of Mathematics, Silesian University of Technology, 44-100 Gliwice, Poland

* Author to whom correspondence should be addressed.

† The first author was supported by the European Regional Development Fund under the grant No. POIR.01.01.01-00-0124/17-00, on the basis of an agreement between FinAi S.A. and the National Center for Research and Development based in Warsaw.

Symmetry 2023, 15(11), 1982; https://doi.org/10.3390/sym15111982

This article belongs to the Special Issue Symmetry in Combinatorics and Discrete Mathematics

Abstract

We study an abstract variant of squares (and shuffle squares) defined by a constraint graph G, specifying which pairs of words form a square. So, a shuffle G-square is a word that can be split into two disjoint subwords U and W (of the same length), which are joined by an edge. This setting generalizes a recently introduced model of shuffle squares based on word symmetry and permutations. By using the probabilistic method, we provide a sufficient condition for a constraint graph G guaranteeing the avoidability of shuffle G-squares. By a more-elementary method (known as Rosenfeld counting), we prove that G-squares are avoidable over an alphabet of size

4 α

,

α > 1

, provided that the degree of every word of length n in G is at most

α^{n}

. We also introduce the concept of the cutting distance between words and state several conjectures involving this notion and various kinds of shuffle squares. We suspect that, for every

k ⩾ 2

, there is a constant

c_{k}

such that every even word can be turned into a shuffle square by cutting it in at most

c_{k}

places and rearranging the resulting pieces. We present some computational, as well as theoretical evidence in favor of this conjecture.

Keywords:

combinatorics on words; square-free words; shuffle squares; repetition

1. Introduction

Let

A

be a fixed alphabet, and let U be a word over

A

. A subword of U is any word obtained by deleting some (possibly zero) letters of U. For example, the word

atic

is a subword of the word mathematics. A factor of U is a special type of a subword consisting of consecutive letters of U. The fact that F is a factor of U can be written as U = PFS, where P and S are some (possibly empty) words (called a prefix and a suffix of U, respectively). For instance, thema is a factor of mathematics. A square is a word of the form S = UU for some nonempty word U, and a shuffle square is a word that can be split into two identical disjoint subwords. For example, the word hotshots is a square, while tuteurer is a shuffle square, but not a square. Clearly, every letter in a shuffle square must occur an even number of times. We will call any word with this property a tangram.
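The definitions above can be made concrete with a small brute-force sketch. The function names are illustrative and not taken from the paper, and the shuffle square test simply tries every choice of positions for the first copy, so it is only meant for short words.

```python
from collections import Counter
from itertools import combinations


def is_subword(v: str, u: str) -> bool:
    """True if v can be obtained from u by deleting some letters."""
    remaining = iter(u)
    return all(ch in remaining for ch in v)


def is_square(w: str) -> bool:
    """True if w = UU for some nonempty word U."""
    half = len(w) // 2
    return len(w) > 0 and len(w) % 2 == 0 and w[:half] == w[half:]


def is_tangram(w: str) -> bool:
    """True if every letter of w occurs an even number of times."""
    return all(count % 2 == 0 for count in Counter(w).values())


def is_shuffle_square(w: str) -> bool:
    """True if w splits into two identical disjoint subwords (brute force)."""
    n = len(w)
    if n == 0 or n % 2 or not is_tangram(w):
        return False
    for chosen in combinations(range(n), n // 2):
        picked = set(chosen)
        first = [w[i] for i in chosen]
        second = [w[i] for i in range(n) if i not in picked]
        if first == second:
            return True
    return False
```

With these helpers, the examples from the text (hotshots, tuteurer) can be checked directly.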

Shuffle squares were introduced by Henshall, Rampersad, and Shallit in [1]. The main question posed there was about the enumeration of shuffle squares of a fixed length over a given alphabet. Even for the smallest binary case, we do not have a satisfactory answer. It was only recently proven by He, Huang, Nam, and Thaper [2] that the number of binary shuffle squares of length 2n is at least

\binom{2 n}{n}

, for

n ⩾ 3

. An intriguing conjecture presented there states that almost every binary tangram is a shuffle square (in the sense that the probability of picking a shuffle square uniformly at random from all tangrams of length 2n tends to 1 with n tending to infinity).

Many challenging problems concern the avoidability of squares and their various relatives. A word U is square-free if none of its factors is a square. For instance, the word combinatorics is square-free, while the word repetitive is not. By saying that squares are avoidable, we mean that there exist arbitrarily long square-free words over some finite alphabet. In 1906, Thue [3] proved that squares are avoidable by constructing an infinite family of ternary square-free words (a three-letter alphabet is the best possible). This result is considered the starting point of combinatorics on words, an important area with many connections to other branches of mathematics and computer science.

Similarly, one may investigate the avoidability of shuffle squares. Using the probabilistic method, Currie [4] proved that shuffle squares are avoidable over more than

10^{40}

letters (see [5]). Later, this was improved by lowering the size of an alphabet down to 10, by Müller [5], to 7, by Guégan and Ochem [6], and independently, by Grytczuk, Kozik, and Zaleski [7]. Recently, Bulteau, Jugé, and Vialette [8] proved that shuffle squares are avoidable over an alphabet of size six, which is currently the best estimate.

In the present paper, we consider the avoidability of more-general shuffle squares, introduced in [9]. An anagram of a word U is any word V obtained by rearranging the letters of U. For instance, the words braze and zebra are anagrams of each other. More formally, let

σ

be a permutation of the set

{1, 2, \dots, n}

denoted as a sequence

σ = a_{1} a_{2} \dots a_{n}

. If

U = u_{1} u_{2} \dots u_{n}

is a word of length n, then

σ (U) = u_{a_{1}} u_{a_{2}} \dots u_{a_{n}}

is a σ-anagram of U (the word obtained by rearranging the letters in U according to the permutation

σ

). For instance, if U = sword and

σ = 23451

, then

σ (U) =

words is a

σ

-anagram of U. We also say that two words,

U = u_{1} u_{2} \dots u_{n}

and

V = v_{1} v_{2} \dots v_{n}

, are σ-similar if

U = σ (V)

(or

V = σ (U)

). For example, the words U = braze and V = zebra are

σ

-similar with

σ = 34512

. A σ-square is just a word of the form

U V

, where U and V are

σ

-similar. A word W is a shuffle σ-square if it can be split into two subwords that are

σ

-similar.
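As a minimal sketch of these definitions, a permutation can be stored as a list of 1-based indices, matching the sequence notation used above. The function names are assumptions made for illustration only.

```python
def apply_permutation(sigma: list[int], word: str) -> str:
    """Return sigma(U) = u_{a_1} u_{a_2} ... u_{a_n} for sigma = a_1 a_2 ... a_n."""
    if sorted(sigma) != list(range(1, len(word) + 1)):
        raise ValueError("sigma must be a permutation of 1..len(word)")
    return "".join(word[a - 1] for a in sigma)


def are_sigma_similar(u: str, v: str, sigma: list[int]) -> bool:
    """True if U = sigma(V) or V = sigma(U)."""
    return u == apply_permutation(sigma, v) or v == apply_permutation(sigma, u)


def is_sigma_square(w: str, sigma: list[int]) -> bool:
    """True if w = UV with U and V sigma-similar."""
    half = len(w) // 2
    if len(w) % 2 or len(sigma) != half:
        return False
    return are_sigma_similar(w[:half], w[half:], sigma)
```

For example, applying the permutation 23451 to sword gives words, and braze and zebra are similar under 34512.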

Of course, every tangram is a shuffle

σ

-square for some permutation

σ

. Moreover, it is not hard to demonstrate that tangrams are not avoidable. Actually, any k-ary word of length

2^{k}

contains a tangram as a factor. This leads to a natural question: For which families of permutations

σ

are the corresponding shuffle

σ

-squares avoidable? Our main result gives a partial answer amounting to a sufficient condition concerning the size of the allowed permutation families.
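The unavoidability of tangrams mentioned above follows from a pigeonhole argument on prefix parities: a word of length 2^k has 2^k + 1 prefixes (counting the empty one) but only 2^k possible parity vectors, so two prefixes share a vector and the factor between them is a tangram. The sketch below illustrates this argument; the function name is hypothetical.

```python
def find_tangram_factor(word: str):
    """Return (start, end) such that word[start:end] is a nonempty tangram,
    or None if the word has no tangram factor."""
    index = {ch: i for i, ch in enumerate(sorted(set(word)))}
    parity = 0                # bitmask: bit i is the parity of letter i so far
    first_seen = {0: 0}       # parity vector -> length of the shortest prefix having it
    for pos, ch in enumerate(word, start=1):
        parity ^= 1 << index[ch]
        if parity in first_seen:
            # Two prefixes with equal parities: the letters between them
            # all occur an even number of times.
            return first_seen[parity], pos
        first_seen[parity] = pos
    return None
```

The ternary word 0102010 of length 2^3 - 1 has pairwise distinct prefix parities, so the function returns None for it, which suggests that the length 2^k cannot be lowered in general.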

Theorem 1.

Let

α > 1

be a constant, and let

P_{n}

be a set of permutations of length n satisfying

| P_{n} | ⩽ α^{n}

. Then, there exists

k = k (α)

and arbitrarily long k-ary words avoiding all shuffle σ-squares, with

σ \in ⋃_{n = 1}^{\infty} P_{n}

.

A cyclic shuffle square is a shuffle

σ

-square for a cyclic permutation

σ

. The theorem above implies that cyclic shuffle squares are avoidable, though the resulting upper bound on the alphabet size is most likely far from optimal. A similar conclusion holds for sets

P_{n}

consisting of all permutations avoiding a fixed pattern. This follows from the famous Stanley–Wilf Conjecture proven by Marcus and Tardos [10] (see Theorem 3).

In [9], we proved that every binary tangram is a cyclic shuffle square. We also made the following stronger conjecture, which says that every binary tangram can be shifted cyclically to a shuffle square.

Conjecture 1.

Let T be an arbitrary binary tangram. Then, there exists a factorization

T = X Y

such that the word

U = Y X

is a shuffle square.

The conjecture has been verified for all binary tangrams of length at most 20. The statement is not true for larger alphabets, as, for instance, none of the cyclic shifts of the word 011022 are a shuffle square. Perhaps the following weaker, but more-general conjecture is true.

Conjecture 2.

For every integer

k ⩾ 1

, there exists an integer

q = f (k)

such that every k-ary tangram T can be factorized as

T = X_{1} X_{2} \dots X_{q}

, so that the word

U = X_{a_{1}} X_{a_{2}} \dots X_{a_{q}}

is a shuffle square, for some permutation

σ = a_{1} a_{2} \dots a_{q}

.
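Returning to Conjecture 1, the ternary counterexample 011022 and the binary cases of small length can be re-checked with a short script. This is only a sketch, not the program used by the authors: the shuffle square test tracks the letters that the first copy has read and the second copy still has to match, and the helper names are assumptions.

```python
def is_shuffle_square(w: str) -> bool:
    """True if w splits into two identical disjoint subwords."""
    if not w or len(w) % 2:
        return False
    states = {""}  # possible strings still owed to the second copy
    for i, ch in enumerate(w):
        remaining = len(w) - i - 1
        nxt = set()
        for pending in states:
            if pending and pending[0] == ch:
                nxt.add(pending[1:])          # ch goes to the second copy
            if len(pending) + 1 <= remaining:
                nxt.add(pending + ch)         # ch goes to the first copy
        states = nxt
    return "" in states


def rotations(w: str) -> list[str]:
    """All cyclic shifts YX of w = XY."""
    return [w[i:] + w[:i] for i in range(len(w))]


def has_shuffle_square_rotation(w: str) -> bool:
    return any(is_shuffle_square(r) for r in rotations(w))


def binary_tangrams(length: int) -> list[str]:
    """All binary words of the given length with evenly many 0s and 1s."""
    words = (format(code, f"0{length}b") for code in range(2 ** length))
    return [w for w in words if w.count("0") % 2 == 0 and w.count("1") % 2 == 0]
```

Calling `has_shuffle_square_rotation` on every word in `binary_tangrams(n)` for small even n repeats a small part of the verification reported above; the exhaustive search grows exponentially with n.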

The above conjecture can be stated using a more-general notion of the cutting distance, which could be of independent interest. A formal definition together with some results and computer experiments are contained in Section 3. Section 2 contains the proof of our main result stated in a more-abstract setting based on the idea of a constraint graph. The last section contains a short discussion of possible directions for future research.

2. Avoiding Abstract Squares

In this section, we present a proof of Theorem 1, which is derived as an immediate consequence of a more-general result. We start by introducing a general setting of abstract shuffle squares.

Consider any undirected graph G (loops are allowed) on the set of all finite words. A word U is called a G-square if

U = X Y

, where the factors X and Y have the same length and are joined by an edge in G. In a classical definition of a square, where

X = Y

, the graph G consists of loops only (at every vertex) and has no other edges. So, one may think of a G-square as an abstract repetitive structure in which there may be no visible similarity between the first part and its “repetition”. Similarly, a word U is a shuffle G-square if U can be split into two disjoint subwords of the same length, which form an edge of G.

Theorem 2.

For every real number

α > 1

, there exists an integer

k = k (α)

such that, for any graph G on the set of k-ary words, in which every word of length n has at most

α^{n}

neighbors, there exist arbitrarily long k-ary words avoiding shuffle G-squares.

To see that the above theorem implies Theorem 1, it suffices to define a graph G by joining each word U to its

σ

-anagram

σ (U)

, for every permutation

σ \in ⋃_{n = 1}^{\infty} P_{n}

. Clearly, every word of length n will have at most

2 | P_{n} |

neighbors in G (by the symmetry of the relation of the

σ

-similarity of the words).
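To make the abstract setting more tangible, the constraint graph can be modeled as a predicate on pairs of words. The sketch below is an illustration under that assumption (the name `adjacent` and the three sample graphs are not from the paper); it tests G-squares and, by brute force over all splits, shuffle G-squares.

```python
from itertools import combinations


def is_g_square(word: str, adjacent) -> bool:
    """True if word = XY with |X| = |Y| and X, Y joined by an edge of G."""
    half = len(word) // 2
    if len(word) == 0 or len(word) % 2:
        return False
    x, y = word[:half], word[half:]
    return adjacent(x, y) or adjacent(y, x)   # G is undirected


def is_shuffle_g_square(word: str, adjacent) -> bool:
    """True if word splits into two disjoint subwords forming an edge of G."""
    n = len(word)
    if n == 0 or n % 2:
        return False
    for chosen in combinations(range(n), n // 2):
        picked = set(chosen)
        x = "".join(word[i] for i in chosen)
        y = "".join(word[i] for i in range(n) if i not in picked)
        if adjacent(x, y) or adjacent(y, x):
            return True
    return False


# Three sample constraint graphs, given as edge predicates.
loops_only = lambda x, y: x == y                  # classical squares
reversal = lambda x, y: x == y[::-1]              # reverse squares
anagrams = lambda x, y: sorted(x) == sorted(y)    # anagram squares
```

With `loops_only` the functions reduce to ordinary squares and shuffle squares, while `reversal` gives the reverse squares considered in Section 3.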

We apply the following version of the powerful tool from the probabilistic method—the Lovász Local Lemma (see [11]).

Lemma 1

(The Local Lemma; Multiple Versions ([11])). Let

A_{1}, \dots, A_{n}

be events in any probability space with dependency graph

D = (V, E)

. Let

V = V_{1} \cup \dots \cup V_{t}

be a partition such that every event in

V_{r}

has the same probability

p_{r}

. Suppose that the maximum number of vertices from

V_{s}

adjacent to a vertex from

V_{r}

is at most

Δ_{r s}

. If there exist real numbers

0 ⩽ x_{1}, \dots, x_{t} < 1

such that

p_{r} ⩽ x_{r} \prod_{s = 1}^{t} {(1 - x_{s})}^{Δ_{r s}}

, then

\Pr (⋂_{i = 1}^{n} \bar{A_{i}}) > 0

.

The events

A_{i}

in the above lemma are typically considered as “bad” events that we want to avoid. In our case, we pick a random word W and a bad event will mean that a shuffle G-square occurs as a factor of W. Then, the lemma guarantees that the probability that none of the bad events occur is positive, provided that all the assumptions are satisfied. As a consequence, there must exist a word avoiding shuffle G-squares.

Proof of Theorem 2.

Let

α > 1

be a fixed real number, and let k be a sufficiently large natural number to be specified later. Suppose that there is a fixed graph G on the set of k-ary words satisfying the assumption of the theorem.

Consider a random word

W = w_{1} w_{2} \dots w_{N}

of arbitrary, but fixed length N. For a fixed interval

R = [i + 1, i + 2 r]

of length

2 r

, let

A_{R}

be the event that the factor

w_{i + 1} w_{i + 2} \dots w_{i + 2 r}

is a shuffle G-square. Let

V_{r}

denote the family of all such events. Clearly, the probability of events in

V_{r}

depends only on r, so we may denote it as

p_{r}

and estimate it from above as follows.

First, notice that the two parts of the shuffle G-square determine a partition of the segment R into two parts, each of size r. The number of such partitions is equal to

\frac{1}{2} \binom{2 r}{r} ⩽ \frac{4^{r}}{2 \sqrt{π r}} .

(1)

One part of the partition may be occupied by any word of length r, while the other part contains a neighboring word in the graph G. Hence,

p_{r} ⩽ \frac{4^{r}}{2 \sqrt{π r}} \cdot \frac{k^{r} \cdot α^{r}}{k^{2 r}} = \frac{1}{2 \sqrt{π r}} \cdot {(\frac{4 α}{k})}^{r} .

(2)

To estimate

Δ_{r s}

, it is enough to notice that two events

A_{R}

and

A_{S}

are independent whenever the segments R and S are disjoint. Hence,

Δ_{r s} ⩽ 2 r + 2 s = 2 (r + s)

.

Now, put

x_{s} = 4^{- s}

, and notice that

(1 - x_{s}) ⩾ e^{- 2 x_{s}}

. So, we may write the following inequalities:

x_{r} \prod_{s = 1}^{N / 2} {(1 - x_{s})}^{Δ_{r s}} ⩾ 4^{- r} \prod_{s = 1}^{N / 2} e^{- 2 x_{s} Δ_{r s}} ⩾ 4^{- r} \exp (- 2 \sum_{s = 1}^{\infty} \frac{2 r + 2 s}{4^{s}}) .

(3)

We need two well-known formulas:

\sum_{s = 1}^{\infty} \frac{1}{x^{s}} = \frac{1}{x - 1} and \sum_{s = 1}^{\infty} \frac{s}{x^{s}} = \frac{x}{{(x - 1)}^{2}} .

(4)

Both series are convergent for

x > 1

; hence, by putting

x = 4

, we obtain

\sum_{s = 1}^{\infty} \frac{1}{4^{s}} = \frac{1}{3}

and

\sum_{s = 1}^{\infty} \frac{s}{4^{s}} = \frac{4}{9}

. Thus,

\exp (- 2 \sum_{s = 1}^{\infty} \frac{2 r + 2 s}{4^{s}}) = \exp (- 4 r \sum_{s = 1}^{\infty} \frac{1}{4^{s}}) \cdot \exp (- 4 \sum_{s = 1}^{\infty} \frac{s}{4^{s}}) = e^{- 4 r / 3} \cdot e^{- 16 / 9} .

(5)

Substituting the last equality into (3), we obtain

x_{r} \prod_{s = 1}^{N / 2} {(1 - x_{s})}^{Δ_{r s}} ⩾ 4^{- r} \cdot e^{- 4 r / 3} \cdot e^{- 16 / 9} .

(6)

Hence, by Lemma 1, we obtain the desired conclusion, provided that

\frac{1}{2 \sqrt{π r}} \cdot {(\frac{4 α}{k})}^{r} ⩽ 4^{- r} \cdot e^{- 4 r / 3} \cdot e^{- 16 / 9},

(7)

which can be transformed to

{(\frac{e^{16 / 9}}{2 \sqrt{π r}})}^{1 / r} \cdot 16 α \cdot e^{4 / 3} ⩽ k .

(8)

So, the assertion of the theorem holds whenever

k ⩾ 102 α

. □
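The constant 102 can be double-checked numerically. The sketch below evaluates the left-hand side of (8) divided by α for a range of values of r; the function name and the cut-off for r are arbitrary choices made for this illustration.

```python
import math


def alphabet_factor(r: int) -> float:
    """Left-hand side of (8) divided by alpha, as a function of r."""
    base = math.exp(16 / 9) / (2 * math.sqrt(math.pi * r))
    return base ** (1 / r) * 16 * math.exp(4 / 3)


# The factor is largest at r = 1 (roughly 101.3) and approaches
# 16 * e^(4/3), about 60.7, as r grows.
worst = max(alphabet_factor(r) for r in range(1, 10_000))
print(worst, math.ceil(worst))
```

Since the maximum over r is attained at r = 1, condition (8) holds for every r as soon as k ⩾ 102 α.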

In the above proof, we did not take much care to optimize the multiplicative constant in the lower bound for the alphabet size k. Most probably, one may obtain a better estimate by more-careful manipulations of the constants or by using some other method, like entropy compression or Rosenfeld counting (see [12,13]). However, the resulting bound on the size of the alphabet will still be linear in

α

.

Let us formulate now the mentioned consequence concerning shuffle squares based on permutations avoiding a fixed pattern. Recall that, for a given permutation

π

, a permutation

σ

is said to avoid

π

as a pattern if no subsequence of

σ

is order-isomorphic to

π

. Stanley and Wilf (see [10]) formulated independently a conjecture stating that the number of permutations of length n avoiding

π

grows at most exponentially. The conjecture was proven by Marcus and Tardos in [10]. Let

S_{n} (π)

denote the set of permutations of length n avoiding

π

. Let

S (π) = \cup_{n = 1}^{\infty} S_{n} (π)

.

Theorem 3

(Marcus and Tardos [10]). For every fixed permutation π, there exists a constant

c = c_{π}

such that

| S_{n} (π) | ⩽ c^{n}

.

This theorem together with Theorem 1 immediately implies the following.

Corollary 1.

Let π be any fixed permutation. Then, there exists an integer

k = k (π)

and an infinite family of k-ary words avoiding shuffle σ-squares, with

σ \in S (π)

.

We conclude this section with a demonstration of the Rosenfeld counting argument in the case of ordinary G-squares.

Theorem 4.

Let

α > 1

be a real number, and let

k ⩾ 4 α

be an integer. Let G be any graph on the set of k-ary words, in which every word of length n has at most

α^{n}

neighbors. Then, there exist arbitrarily long k-ary G-square-free words. Moreover, the number of such words of length n is at least

{(2 α)}^{n}

.

Proof.

Denote by

H_{n}

the set of all k-ary words not containing any G-square as a factor. Let

| H_{n} | = h_{n}

. We prove by induction that

h_{n + 1} ⩾ (2 α) \cdot h_{n},

for all

n ⩾ 1

. It is not hard to confirm that the inequality holds for

n = 1

. So, assuming that it holds for all

1 ⩽ n ⩽ N - 1

, we prove that it also holds for

n = N

.

Let

F

denote the set of all k-ary words of length

N + 1

that are not G-square-free, but whose prefix of length N is G-square-free. In other words, if

U \in F

and

U = u_{1} u_{2} \dots u_{N + 1}

, then the word

U^{'} = u_{1} u_{2} \dots u_{N}

is G-square-free, but there is at least one suffix of U that is a G-square. Clearly, we have

h_{N + 1} ⩾ k \cdot h_{N} - | F | .

Let

F_{t}

, with

1 ⩽ t ⩽ (N + 1) / 2

, denote the family of all words in

F

having a G-square at the last

2 t

positions. Every word

U \in F_{t}

has the form

U = P X Y

, where

X Y

is a G-square of length

2 t

, with X joined to Y in the graph G. Clearly, the word

P X

is a G-square-free word of length

N + 1 - t

, which can be extended to a word in

F

by appending at most

α^{t}

words Y of length t (by the assumption on the graph G). It follows that

| F_{t} | ⩽ α^{t} \cdot h_{N + 1 - t},

for all

1 ⩽ t ⩽ (N + 1) / 2

. On the other hand, by the induction assumption, we have

h_{N} ⩾ {(2 α)}^{j} \cdot h_{N - j},

for each

1 ⩽ j ⩽ N - 1

. Hence, taking

j = t - 1

, we obtain

| F_{t} | ⩽ α^{t} \cdot \frac{1}{{(2 α)}^{t - 1}} \cdot h_{N} = \frac{α}{2^{t - 1}} \cdot h_{N} .

Since

| F | ⩽ \sum_{t ⩾ 1} | F_{t} |

, we obtain

h_{N + 1} ⩾ k \cdot h_{N} - \sum_{t ⩾ 1} | F_{t} | ⩾ k \cdot h_{N} - α \cdot h_{N} \cdot \sum_{t ⩾ 1} \frac{1}{2^{t - 1}} ⩾ (k - 2 α) \cdot h_{N} ⩾ (2 α) \cdot h_{N},

where the last inequality follows from the assumption

k ⩾ 4 α

. This completes the proof. □

To illustrate the aforementioned theorem, consider the following scenario. Let

A

= {1, 2, 3, 4, 5, 6, 7, 8} be an alphabet with eight letters. For every

n ⩾ 1

, let

π_{n}

be an arbitrary involution of the alphabet

A

, i.e., a permutation, which is a product of disjoint transpositions. We may assume that

π_{n}

does not have fixed points, so it can be described as a matching, i.e., a set of disjoint pairs of letters. For instance,

π_{1}

= {12, 34, 56, 78},

π_{2}

= {15, 26, 37, 48}, and

π_{3}

= {18, 27, 36, 45}, are three such involutions. It should be stressed that, for every n, the involution

π_{n}

may be chosen completely freely.

Two words of length n are related if one of them can be obtained from the other by choosing a subset of positions and applying the involution

π_{n}

to the letters occupying these positions. For instance, using the above-mentioned three involutions

π_{1}

,

π_{2}

, and

π_{3}

, we can see that the word 1 is related to two words, 1 and 2, the word 12 is related to four words, 12, 52, 16, 56, while the word 123 has the following eight siblings:

123, 823, 173, 126, 873, 826, 176, 876.

A word of the form

U W

is an involutive square if U and W are related under the involution

π_{n}

. Now, by Theorem 4, we obtain that there exist arbitrarily long words over

A

not containing any involutive squares. Indeed, for every word U of length n, there are exactly

2^{n}

related words, so the assertion follows by taking

α = 2

.
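The involutive squares example can be explored by brute force. The sketch below uses only the three involutions given in the text, so it can only detect involutive squares of length at most 6; the function names are assumptions. It lists the words related to a given word and counts the square-free words of small length, which can then be compared with the growth rate 2α = 4 from Theorem 4.

```python
from itertools import product

ALPHABET = "12345678"
# The involutions pi_1, pi_2, pi_3 from the text, written as matchings.
MATCHINGS = {
    1: ["12", "34", "56", "78"],
    2: ["15", "26", "37", "48"],
    3: ["18", "27", "36", "45"],
}


def involution(n: int) -> dict[str, str]:
    """The involution pi_n as a letter-to-letter mapping (n = 1, 2, 3 only)."""
    mapping = {}
    for a, b in MATCHINGS[n]:
        mapping[a], mapping[b] = b, a
    return mapping


def related_words(word: str) -> set[str]:
    """All words obtained by applying pi_n on a subset of positions, n = len(word)."""
    pi = involution(len(word))
    return {"".join(choice) for choice in product(*[(ch, pi[ch]) for ch in word])}


def ends_with_involutive_square(word: str) -> bool:
    """True if some suffix of length 2t (t <= 3) is an involutive square."""
    for t in range(1, min(len(word) // 2, 3) + 1):
        pi = involution(t)
        x, y = word[-2 * t:-t], word[-t:]
        if all(b in (a, pi[a]) for a, b in zip(x, y)):
            return True
    return False


def count_square_free(max_len: int) -> list[int]:
    """h_1, ..., h_max_len; meaningful for max_len <= 7, where only t <= 3 occurs."""
    current = list(ALPHABET)
    counts = [len(current)]
    for _ in range(max_len - 1):
        # A square-free word extended by one letter can only gain a square as a suffix.
        current = [w + ch for w in current for ch in ALPHABET
                   if not ends_with_involutive_square(w + ch)]
        counts.append(len(current))
    return counts
```

For instance, `related_words("123")` reproduces the eight siblings listed above, and the ratios of consecutive entries of `count_square_free(5)` can be compared with the bound of Theorem 4.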

3. The Cutting Distance between Words

In this section, we focus on shuffle squares in the context of cutting words. Let U and W be two distinct words that are anagrams of each other, i.e., every letter occurs the same number of times in both words. Then, it is possible to cut the word U into pieces that can be rearranged to give the word W. For instance, the word "below" can be cut into three pieces, b|el|ow, from which we may compose the word "elbow" (el|b|ow). It is not hard to check that two cuts are actually the minimum needed to transform "below" into "elbow". We may thus say that the cutting distance between these two words is equal to 2.

More formally, for any pair of distinct anagrams U and W, there must exist a factorization

U = U_{1} U_{2} \dots U_{q + 1}

,

q ⩾ 1

, and a permutation

τ = a_{1} a_{2} \dots a_{q + 1}

such that

W = U_{a_{1}} U_{a_{2}} \dots U_{a_{q + 1}}

. We call the minimum number q in such a factorization the cutting distance between words U and W and denote it as

q = cut (U, W)

. We may extend this definition to pairs of words

(U, W)

for which such a cutting is not possible. We, then, adopt the convention that

cut (U, W) = \infty

. Notice also that

cut (U, W) = cut (W, U)

and

cut (U, W) = 0

if and only if

U = W

. Slightly less obvious is the triangle inequality, but it is also not hard to verify it, as demonstrated below. So, the cutting distance satisfies all three requirements from the definition of a metric, making the set of all finite words a metric space.
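The definition of the cutting distance translates directly into an exhaustive search, sketched below for short words only. The function name `cut_distance` is chosen for illustration; the search tries q = 0, 1, 2, … cuts and every rearrangement of the resulting pieces.

```python
import math
from itertools import combinations, permutations


def cut_distance(u: str, w: str):
    """Minimum number of cuts turning u into w, or math.inf if they are not anagrams."""
    if u == w:
        return 0
    if sorted(u) != sorted(w):
        return math.inf
    n = len(u)
    for q in range(1, n):
        for cuts in combinations(range(1, n), q):
            bounds = (0, *cuts, n)
            pieces = [u[a:b] for a, b in zip(bounds, bounds[1:])]
            if any("".join(order) == w for order in permutations(pieces)):
                return q
    # Not reached: with n - 1 cuts every letter is a separate piece.
    raise AssertionError("anagrams are always reachable with n - 1 cuts")
```

For example, `cut_distance("below", "elbow")` returns 2, in agreement with the discussion above.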

Proposition 1.

The cutting distance

cut (U, W)

is a metric on the set of all finite words.

Proof.

As noticed above, it remains to demonstrate that every triple of words

(U, W, X)

satisfies the triangle inequality:

cut (U, W) ⩽ cut (U, X) + cut (X, W) .

(9)

The case when one of the two distances on the right-hand side is infinite is clear. So, assume without loss of generality that

cut (U, X) = p

,

cut (X, W) = q

, and

p ⩽ q

. Then, the word X has two factorizations,

X = X_{1} X_{2} \dots X_{p + 1}

and

X = Y_{1} Y_{2} \dots Y_{q + 1}

, whose parts

X_{i}

and

Y_{j}

can be rearranged to give U and W, respectively. Assuming the worst-case scenario (there is no pair of factors

(X_{i}, Y_{j})

starting or ending at the same position), we obtain a more-fragmented factorization

X = Z_{1} Z_{2} \dots Z_{p + q + 1}

such that each

X_{i}

and each

Y_{j}

is a product of some number of consecutive factors

Z_{i}

. Therefore, one may produce the word U, as well as the word W out of these factors

Z_{i}

by appropriate substitutions. This proves that

cut (U, W) ⩽ p + q

, which completes the proof. □

Given a set of words

F

and a word U, one may define the cutting distance between these objects as

cut (U, F) = \min {cut (U, X) : X \in F}

. For instance, if

F

denotes the family of shuffle squares, then

cut (U, F)

is the least number of cuts needed to turn the word U into a shuffle square.

Let

S_{k}

denote the set of all shuffle squares over a k-letter alphabet. Using this terminology, we may now restate Conjecture 2 as follows.

Conjecture 3.

Every k-ary tangram T satisfies

cut (T, S_{k}) ⩽ c_{k}

, for some finite constant

c_{k}

depending only on k.

Let us denote by

s (k, n) = \max {cut (T, S_{k}) : T is a k-ary tangram of length n}

the maximum of the cutting distances from k-ary tangrams of fixed length n to the set of shuffle squares. The above conjecture states that

s (k) = \sup {s (k, n) : n \in N}

is finite for every

k ⩾ 2

. Based on computer experiments and known results, we dare to state the following (risky) conjecture.

Conjecture 4.

We have

s (2) = 1

and

s (k) = k

for all

k ⩾ 3

.

Below, we collect some (weak) evidence in favor of the above statement. Firstly, we checked using a computer that there is no counterexample to the equality

s (2) = 1

among binary tangrams of a length up to 20.

Proposition 2.

Every binary tangram T of a length at most 20 satisfies

cut (T, S_{2}) ⩽ 1

, i.e.,

s (2, n) ⩽ 1

, for all

n ⩽ 20

.

Let us stress, however, that we do not even know if

s (2)

is finite. But even if it is not, the growth rate of the function

s (2, n)

would be interesting to study.

One theoretical fact supporting our belief is the famous necklace-splitting theorem of Goldberg and West [14] (see also [15,16]). The theorem states that every necklace (a word) with an even number of beads in each of k kinds can be fairly split between two thieves by cutting it at no more than k places. This means that the resulting pieces of the split necklace can be partitioned into two collections so that every kind of bead is equally represented in both collections. It is easy to see that this result is optimal by considering necklaces with beads of the same kind grouped into connected segments.

We state the necklace-splitting theorem below in the terminology of anagrams, squares, and the cutting distance. Denote by

A_{k}

the family of all k-ary anagram squares, i.e., words of the form

U W

, with U being an anagram of W. We may also define the analogous functions

a (k, n) = \max {cut (T, A_{k}) : T is a k-ary tangram of length n}

and

a (k) = \sup {a (k, n) : n \in N}

.

Theorem 5

(Goldberg and West [14]). Every binary tangram B satisfies

cut (B, A_{2}) ⩽ 1

. For every

k ⩾ 3

, every k-ary tangram T satisfies

cut (T, A_{k}) ⩽ k

. As a consequence,

a (2) = 1

and

a (k) = k

, for all

k ⩾ 3

.

Perhaps only the binary case should be explained. The original necklace-splitting theorem asserts that two cuts are sufficient for every binary tangram B, while we wrote

cut (B, A_{2}) ⩽ 1

. The reason is that, in the necklace-splitting problem, we must obtain two separate anagrams, while the cutting distance is measured to one word that is an anagram square. So, one cut can always be saved. For instance, consider the word 0000111111. In the necklace problem, we need two cuts, namely 00|00111|111. But, to make an anagram square, only one of them is sufficient; the first cut, 00|00111111, gives the word 0011111100, while the second cut, 0000111|111, gives the word 1110000111, and both resulting words are anagram squares.
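The cut-saving argument for the binary case can be sketched as follows: a single cut followed by swapping the two pieces is a cyclic shift, so it suffices to look for a cyclic shift whose two halves are anagrams. The function names are illustrative assumptions.

```python
from collections import Counter


def is_anagram_square(w: str) -> bool:
    """True if w = UW' with U an anagram of W' (both of the same length)."""
    half = len(w) // 2
    return len(w) > 0 and len(w) % 2 == 0 and Counter(w[:half]) == Counter(w[half:])


def one_cut_anagram_squares(w: str) -> list[str]:
    """Anagram squares obtained from w by at most one cut, i.e. by a cyclic shift."""
    shifts = (w[i:] + w[:i] for i in range(len(w)))
    return [s for s in shifts if is_anagram_square(s)]
```

For the word 0000111111, the result contains both 0011111100 and 1110000111, the two words obtained in the example above.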

This cut saving is not always possible over larger alphabets. Curiously, there also exist examples of tangrams with correspondingly larger cut distances to shuffle squares.

Proposition 3.

We have

s (3) ⩾ 3

.

Proof.

We show that there exists a ternary tangram T satisfying

cut (T, S_{3}) = 3

. Consider the word T = 011120002221. It can be cut by three cuts as

0111 | 20 | 00222 | 1

and rearranged to the word S = 00222|20|0111|1 = 002222001111. Since every run in S is of an even length, S is a shuffle square: splitting each run in half gives two copies of the subword 022011.

Now, by running a computer program, we checked that two cuts are not sufficient to make a shuffle square out of T. Hence,

cut (T, S_{3}) = 3

. □
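The computer check mentioned in the proof can be reproduced along the following lines. This is only a sketch, not the authors' program, and the function names are assumptions: it enumerates every way of cutting the word into q + 1 pieces, every rearrangement of the pieces, and tests each result for being a shuffle square.

```python
from itertools import combinations, permutations


def is_shuffle_square(w: str) -> bool:
    """True if w splits into two identical disjoint subwords."""
    if not w or len(w) % 2:
        return False
    states = {""}  # letters read by the first copy, still owed to the second
    for i, ch in enumerate(w):
        remaining = len(w) - i - 1
        nxt = set()
        for pending in states:
            if pending and pending[0] == ch:
                nxt.add(pending[1:])
            if len(pending) + 1 <= remaining:
                nxt.add(pending + ch)
        states = nxt
    return "" in states


def rearrangements(word: str, cuts: int) -> set[str]:
    """All words obtained by cutting word at exactly `cuts` places and permuting the pieces."""
    n = len(word)
    results = set()
    for positions in combinations(range(1, n), cuts):
        bounds = (0, *positions, n)
        pieces = [word[a:b] for a, b in zip(bounds, bounds[1:])]
        results.update("".join(order) for order in permutations(pieces))
    return results


def cut_to_shuffle_squares(word: str, max_cuts: int):
    """Smallest q <= max_cuts such that q cuts yield a shuffle square, else None."""
    for q in range(max_cuts + 1):
        if any(is_shuffle_square(c) for c in rearrangements(word, q)):
            return q
    return None


T = "011120002221"
print(cut_to_shuffle_squares(T, 3))
```

The printed value can be compared with the distance 3 claimed in Proposition 3.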

By computer experiments, we found that the word 011120002221 is the shortest ternary tangram with the cutting distance of three to shuffle squares. There are only two such tangrams of length 12 (up to renaming the letters). Table 1 contains a collection of data including the number of ternary tangrams of a given length and a given cutting distance to shuffle squares.

Table 1. Number of ternary tangrams (up to renaming the letters) with a given cutting distance to a shuffle square, for every length between 6 and 14.

In another experiment, we considered the following variant of shuffle squares introduced in [1]. A reverse of a word

U = u_{1} u_{2} \dots u_{n}

,

u_{i} \in A

, is a word

\tilde{U} = u_{n} u_{n - 1} \dots u_{1}

, i.e., U written backwards. A reverse square is a word of the form

U \tilde{U}

; for instance, the word rattar is an example of a reverse square. Analogously, a word that can be split into two subwords that are the reverse of each other is called a reverse shuffle square. Denote by

R_{k}

the set of all k-ary reverse shuffle squares.

As before, we are interested in the cutting distance between tangrams and reverse shuffle squares. Thus, we define

r (k, n) = \max {cut (T, R_{k}) : T is a k-ary tangram of length n}

and

r (k) = \sup {r (k, n) : n \in N}

.

It is known (and easy to prove; see [1]) that every reverse shuffle square must be at the same time an anagram square. In the binary case, the opposite is also true; a binary tangram is a reverse shuffle square if and only if it is an anagram square (see [1]). By these remarks and by the necklace-splitting theorem, we obtain immediately the following.

Proposition 4.

Every binary tangram B satisfies

cut (B, R_{2}) ⩽ 1

. As a consequence,

r (2) = 1

.

Proof.

Let B be a binary tangram. By the necklace-splitting theorem, we may write

B = U X V

, where X is an anagram of

U V

. So, cutting the word B as

B = U | X V

and moving U to the end gives an anagram square

S = X V U

. This completes the proof, since S is also a reverse shuffle square. □

It is natural to wonder what happens for larger alphabets. The discussion above together with computer experiments suggests that the following analog to Conjecture 3 may be true.

Conjecture 5.

Every k-ary tangram T satisfies

cut (T, R_{k}) ⩽ d_{k}

, for some finite constant

d_{k}

depending only on k.

This time, we know that the statement of the conjecture is true for

k = 2

, but again, for

k = 3

, even the finiteness of

r (3)

is not obvious. Nevertheless, analogous to the case of shuffle squares, we formulate the following (risky) conjecture.

Conjecture 6.

For every

k ⩾ 3

, we have

r (k) = k

.

One tiny fact in favor of this supposition is given below.

Proposition 5.

We have

r (3) ⩾ 3

.

Proof.

We show that there exists a ternary tangram T satisfying

cut (T, R_{3}) = 3

. Consider the word T = 0001201222. It can be cut by three cuts as

0 | 0 | 01 | 201222

and rearranged to the word S = 0|201222|01|0 = 0201222010. The word S is a reverse shuffle square: the letters at positions 1, 2, 5, 9, 10 form the subword 02210, and the remaining letters form its reverse, 01220.

Now, by running a computer program, we checked that two cuts are not sufficient to make a reverse shuffle square out of T. Hence,

cut (T, R_{3}) = 3

. □
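A similar brute-force sketch applies to reverse shuffle squares. Again, this is an illustration with hypothetical function names rather than the program used for the experiments; it is practical only for short words.

```python
from itertools import combinations, permutations


def is_reverse_shuffle_square(w: str) -> bool:
    """True if w splits into two disjoint subwords that are reverses of each other."""
    n = len(w)
    if n == 0 or n % 2:
        return False
    # Position 0 may be assumed to lie in the first subword, because the
    # condition "x is the reverse of y" is symmetric in x and y.
    for rest in combinations(range(1, n), n // 2 - 1):
        picked = {0, *rest}
        x = [w[i] for i in range(n) if i in picked]
        y = [w[i] for i in range(n) if i not in picked]
        if x == y[::-1]:
            return True
    return False


def rearrangements(word: str, cuts: int) -> set[str]:
    """All words obtained by cutting word at exactly `cuts` places and permuting the pieces."""
    n = len(word)
    results = set()
    for positions in combinations(range(1, n), cuts):
        bounds = (0, *positions, n)
        pieces = [word[a:b] for a, b in zip(bounds, bounds[1:])]
        results.update("".join(order) for order in permutations(pieces))
    return results


def cut_to_reverse_shuffle_squares(word: str, max_cuts: int):
    """Smallest q <= max_cuts such that q cuts yield a reverse shuffle square, else None."""
    for q in range(max_cuts + 1):
        if any(is_reverse_shuffle_square(c) for c in rearrangements(word, q)):
            return q
    return None


T = "0001201222"
print(cut_to_reverse_shuffle_squares(T, 3))
```

The printed value can be compared with the distance 3 claimed in Proposition 5.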

By computer experiments, we found that the word 0001201222 is the unique (up to renaming the letters) shortest ternary tangram with the cutting distance of three to reverse shuffle squares. Table 2 contains a collection of data including the number of ternary tangrams of a given length and a given cutting distance to reverse shuffle squares.

Table 2. Number of ternary tangrams (up to renaming the letters) with the given cutting distance to a reverse shuffle square, for every length between 6 and 14.

4. Final Comments

Let us conclude the paper with some comments on possible directions for future research. Of course, the most intriguing is the resolution of the stated conjectures and providing the answer to the following question: How many cuts are needed to turn a tangram into a shuffle (or reverse shuffle) square?

A similar question can be asked for other types of squares or actually any imaginable sets of words. For instance, consider the set

D

of all decimal words, i.e., words over the alphabet {0, 1, …, 9}, representing all positive integers written in the usual decimal notation. Let P = {2, 3, 5, 7, 11, 13, 17, 19, 23, …} be the subset of prime words representing all the prime numbers. How many cuts are needed to turn a given decimal word into a prime word? Is it true that there is an absolute constant c such that, for any

U \in D

, we have either

cut (U, P) = \infty

or

cut (U, P) ⩽ c

?

Another possible direction is to study how much the structure of a constraint graph affects the avoidability of the abstract squares. In our main result, we presented one property (an exponential upper bound for vertex degrees corresponding to words of given length) guaranteeing avoidability. Perhaps other graph-theoretic properties may be of use here. For which classes of graphs are the corresponding abstract squares avoidable? Is it true, for instance, that G-squares are avoidable for any planar graph G?

Finally, besides shuffle squares, one may consider shuffle cubes, shuffle bi-squares, or any shuffle $m$-powers, for arbitrary $m \geqslant 2$. For instance, the word 010011110100 is a shuffle cube consisting of three shuffled copies of the word 0110. In [5], Müller proved that shuffle cubes are avoidable over a six-letter alphabet. Notice that a shuffle cube may not contain any shuffle square as a factor (unlike for ordinary powers), so this result is independent of the theorem in [8] establishing the avoidability of shuffle squares over a six-letter alphabet. In [7], we proved that, for every $m \geqslant 2$, shuffle $m$-powers are avoidable on an alphabet of size $(1 + o(1))m$. Is it true that, for a sufficiently large $m$, one may avoid shuffle $m$-powers on a binary alphabet? Is there a finite number $k$ such that all shuffle $m$-powers, $m \geqslant 2$, are simultaneously avoidable on an alphabet of size $k$?
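The shuffle cube example can be checked mechanically. The following is a minimal sketch, assuming that a shuffle $m$-power is a word whose letters can be partitioned into $m$ disjoint subwords, each equal to the same root. The function names `is_shuffle_of_copies` and `is_shuffle_power` are hypothetical, and the root search simply tries every arrangement of the forced letter multiset, so it is practical only for short words.

```python
from collections import Counter
from functools import lru_cache
from itertools import permutations


def is_shuffle_of_copies(word: str, root: str, m: int) -> bool:
    """Decide whether `word` splits into m disjoint subwords equal to `root`."""
    if len(word) != m * len(root):
        return False

    @lru_cache(maxsize=None)
    def go(i: int, state: tuple) -> bool:
        # state[j] = number of letters of the root already matched by copy j;
        # the tuple is kept sorted because the copies are interchangeable.
        if i == len(word):
            return True  # the lengths agree, so every copy is complete
        tried = set()
        for j, p in enumerate(state):
            if p < len(root) and root[p] == word[i] and p not in tried:
                tried.add(p)
                nxt = tuple(sorted(state[:j] + (p + 1,) + state[j + 1:]))
                if go(i + 1, nxt):
                    return True
        return False

    return go(0, (0,) * m)


def is_shuffle_power(word: str, m: int) -> bool:
    """Brute-force test: try every root with 1/m of each letter count."""
    counts = Counter(word)
    if len(word) % m or any(c % m for c in counts.values()):
        return False
    letters = "".join(ch * (c // m) for ch, c in counts.items())
    return any(
        is_shuffle_of_copies(word, "".join(r), m)
        for r in set(permutations(letters))
    )


print(is_shuffle_of_copies("010011110100", "0110", 3))
```

Sorting the progress tuple keeps the number of memoized states small, since only the multiset of positions reached by the copies matters, not which copy reached which position.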

Author Contributions

Conceptualization, J.G. and B.P.; methodology, J.G., B.P. and M.P.; software, M.P.; formal analysis, J.G. and B.P.; investigation, J.G. and B.P.; data curation, J.G., B.P. and M.P.; writing—original draft preparation, J.G. and B.P.; writing—review and editing, J.G., B.P. and M.P.; supervision, J.G., B.P. and M.P.; project administration, J.G., B.P. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

We would like to thank the anonymous referees for the careful reading of the manuscript and many valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Henshall, D.; Rampersad, N.; Shallit, J. Shuffling and Unshuffling. Bull. Eur. Assoc. Theor. Comput. Sci. 2012, 107, 131–142.
2. He, X.; Huang, E.; Nam, I.; Thaper, R. Shuffle Squares and Reverse Shuffle Squares. arXiv 2021, arXiv:2109.12455.
3. Thue, A. Über unendliche Zeichenreihen. Nor. Vid. Selsk. Skr. Mat. Nat. Kl. 1906, 7, 1–22, reprinted in Sel. Math. Pap. Axel Thue; Nagell, T., Ed.; Universitetsforlaget: Oslo, Norway, 1977; pp. 139–158.
4. Currie, J.D. Shuffle squares are avoidable. 2014; Unpublished manuscript.
5. Müller, M. Avoiding and Enforcing Repetitive Structures in Words. Ph.D. Thesis, University of Kiel, Kiel, Germany, 2015.
6. Guégan, G.; Ochem, P. A short proof that shuffle squares are 7-avoidable. RAIRO Theor. Inform. Appl. 2016, 50, 101–103.
7. Grytczuk, J.; Kozik, J.; Zaleski, B. Avoiding Tight Twins in Sequences by Entropy Compression; Mittag-Leffler Institute: Djursholm, Sweden.
8. Bulteau, L.; Jugé, V.; Vialette, S. On shuffled-square-free words. Theor. Comput. Sci. 2023, 941, 91–103.
9. Grytczuk, J.; Pawlik, B.; Pleszczyński, M. Variations on shuffle squares. arXiv 2023, arXiv:2308.13882.
10. Marcus, A.; Tardos, G. Excluded permutation matrices and the Stanley–Wilf conjecture. J. Comb. Theory Ser. A 2004, 107, 153–160.
11. Alon, N.; Spencer, J. The Probabilistic Method, 4th ed.; Wiley: Hoboken, NJ, USA, 2016.
12. Bosek, B.; Grytczuk, J.; Nayar, B.; Zaleski, B. Nonrepetitive List Colorings of the Integers. Ann. Comb. 2021, 25, 393–403.
13. Rosenfeld, M. Another Approach to Non-Repetitive Colorings of Graphs of Bounded Degree. Electron. J. Comb. 2020, 27, P3.43.
14. Goldberg, C.H.; West, D.B. Bisection of circle colorings. SIAM J. Algebr. Discret. Methods 1985, 6, 93–106.
15. Alon, N. Splitting necklaces. Adv. Math. 1987, 63, 247–253.
16. Alon, N.; West, D. The Borsuk-Ulam theorem and bisection of necklaces. Proc. Am. Math. Soc. 1986, 98, 623–628.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Cuts	6	8	10	12	14
0	5	63	578	4817	38,933
1	7	107	1167	11,227	103,272
2	3	40	460	5074	52,955
3	-	-	-	2	35
∑	15	210	2205	21,120	195,195

Cuts	6	8	10	12	14
0	5	57	486	3758	28,361
1	7	113	1186	11,366	102,742
2	3	40	532	5964	63,650
3	-	-	1	32	442
∑	15	210	2205	21,120	195,195
