Abstract
A square is a word of the form XX, where X is any finite non-empty word. For example, couscous is a square. A shuffle square is a finite word that can be formed by self-shuffling a word; for instance, the Spanish word acaece is a shuffle square but not a square. We discuss both known and novel enumerative problems related to shuffle squares, with a focus on the number of distinct roots of binary shuffle squares. We introduce the term explicit shuffle squares, propose several conjectures, and present some preliminary results towards their resolution. Our discussion is supported by computational experiments. In particular, we determine the exact number of distinct roots of binary shuffle squares of length up to 24. We also show that every non-constant binary word of length n generates at least n different shuffle squares.
1. Introduction
Symmetry plays a significant role in the combinatorics of words, as it provides a deeper understanding of structural properties and simplifies the enumeration of patterns within sequences. By examining symmetries in word structures, we can identify invariants under specific transformations, such as reversals, rotations, or permutations of letters. In this paper, we focus on shuffle squares—a type of word characterized by a hidden symmetric structure.
Throughout this paper, we consider all words to be finite. Fix an alphabet, and let W be a word over it. The number of letters in W is called the length of W. Words P, U, and S (each possibly empty) such that $W = PUS$ are called factors of W. Moreover, P is called a prefix of W, and S is called a suffix of W.
A tangram is a word in which every letter occurs an even number of times. A special type of tangram is a square, i.e., a word W such that $W = XX$ for some word X. If $W = XX$, then X is called the root of the square W, and we say that X generates W. For example, the word cous generates the square couscous.
A shuffle square is a word W obtained by self-shuffling some finite word X; here, X is, again, called a root of the shuffle square W, and we say that X generates W. For instance, the word acaece is a self-shuffle of ace. This can be visualized by coloring the letters of the shuffle square so that it decomposes into two copies of a root: in acaece, the letters in positions 1, 2, and 4 form one copy of ace, and the letters in positions 3, 5, and 6 form the other.
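More formally, for $n \geq 1$, the word $W = w_1 w_2 \cdots w_{2n}$ is a shuffle square if there exist index sets
$$I = \{i_1 < i_2 < \cdots < i_n\} \quad \text{and} \quad J = \{j_1 < j_2 < \cdots < j_n\}$$
such that
$I \cap J = \emptyset$, $I \cup J = \{1, 2, \ldots, 2n\}$, and $w_{i_k} = w_{j_k}$ for $k = 1, 2, \ldots, n$.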
Clearly, every shuffle square is a tangram.
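The definition can be checked mechanically on short words. The following is a minimal Python sketch and is not taken from the paper: the name `is_shuffle_square` is hypothetical, and the search assumes, without loss of generality, that the k-th letter of the first copy always precedes the k-th letter of the second copy. It is a memoized backtracking search with exponential worst-case cost, so it is suitable only for small inputs.

```python
from functools import lru_cache


def is_shuffle_square(w: str) -> bool:
    """Check whether the non-empty word w is a self-shuffle of some word."""
    if not w or len(w) % 2:
        return False

    @lru_cache(maxsize=None)
    def search(pos: int, pending: str) -> bool:
        # pending: letters already given to the first copy that the
        # second copy still has to reproduce, in order.
        if len(pending) > len(w) - pos:
            return False  # too few letters left to match them all
        if pos == len(w):
            return True  # pending is empty here, so both copies agree
        c = w[pos]
        # Option 1: w[pos] is the next letter of the second copy.
        if pending and pending[0] == c and search(pos + 1, pending[1:]):
            return True
        # Option 2: w[pos] is a new letter of the first copy.
        return search(pos + 1, pending + c)

    return search(0, "")


print(is_shuffle_square("couscous"))  # True: the square of "cous"
print(is_shuffle_square("acaece"))    # True: a self-shuffle of "ace"
print(is_shuffle_square("0110"))      # False: a tangram, but not a shuffle square
```

The last call illustrates that the converse of the statement above fails: 0110 is a tangram that is not a shuffle square.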
Let $x$ be a letter and k be a positive integer. We use the notation $x^k$ for a word that is a concatenation of k copies of $x$:
$$x^k = \underbrace{x x \cdots x}_{k}.$$
The word $x^k$ is called a constant word of length k.
Words that are squares were defined by Thue in 1906 in a study [1] that is considered today to be a pioneering paper in the field of combinatorics on words [2]. Shuffle squares are relatively new; they were introduced by Henshall, Rampersad, and Shallit [3] in 2012. Henshall et al. focus on the enumeration of shuffle squares of a fixed length over a given alphabet. Let us note that counting squares of a given length over a fixed alphabet is very easy: the number of squares of length $2n$ over a k-letter alphabet is $k^n$ for all positive n and k. However, to this day, the number of binary shuffle squares of length $2n$ is known only for small values of n, as shown in Table 1 for the OEIS sequence A191755 [4].
Table 1.
The number, $f(n)$, of distinct binary shuffle squares of length $2n$ for small values of n [4].
A general formula for the function f or any of its generalizations over larger alphabets is not known. Some results from the study of these functions were obtained by He, Huang, Nam, and Thaper [5]; they demonstrated that the number of binary shuffle squares is at least and that the number of shuffle squares over a k-letter alphabet is
where $2n$ is the length under consideration. He et al. also proposed an intriguing conjecture: almost every binary tangram is a shuffle square, meaning that the probability that a tangram of length $2n$ selected uniformly at random is a shuffle square tends to 1 as $n \to \infty$. A related result was obtained by Axenovich, Person, and Puzynina [6], who showed that every binary word of length n is a shuffle square up to the deletion of $o(n)$ letters; the precise asymptotic formula for the number of letters that must be deleted is not known. Most recently, Basu and Ruciński showed [7] that there exists a ternary word requiring the deletion of at least letters to become a shuffle square. Moreover, some properties of binary shuffle squares were recently presented by Fici [8], and some generalizations of shuffle squares were introduced by Grytczuk, Pawlik, and Pleszczyński [9,10].
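For very small n, both the number of binary shuffle squares and the fraction of tangrams that are shuffle squares can be obtained by brute force. The sketch below is an illustration under our own assumptions, not the method behind Table 1: the helper names `self_shuffles`, `binary_shuffle_squares`, and `binary_tangrams` are hypothetical. It generates every self-shuffle of every binary root of length n, and its running time grows exponentially with n.

```python
from functools import lru_cache
from itertools import product


def self_shuffles(x: str) -> frozenset:
    """All distinct words obtained by interleaving two copies of x."""
    n = len(x)

    @lru_cache(maxsize=None)
    def merge(i: int, j: int) -> frozenset:
        # All interleavings of the suffixes x[i:] and x[j:].
        if i == n:
            return frozenset({x[j:]})
        if j == n:
            return frozenset({x[i:]})
        return frozenset({x[i] + t for t in merge(i + 1, j)}
                         | {x[j] + t for t in merge(i, j + 1)})

    return merge(0, 0)


def binary_shuffle_squares(n: int) -> set:
    """All binary shuffle squares of length 2n, generated root by root."""
    words = set()
    for letters in product("01", repeat=n):
        words |= self_shuffles("".join(letters))
    return words


def binary_tangrams(n: int) -> int:
    """Number of binary words of length 2n with evenly many 0s and 1s."""
    all_words = map("".join, product("01", repeat=2 * n))
    return sum(w.count("1") % 2 == 0 for w in all_words)


for n in range(1, 6):
    squares = 2 ** n  # k^n squares of length 2n for k = 2
    print(n, squares, len(binary_shuffle_squares(n)), binary_tangrams(n))
```

For n = 1, 2, 3 this yields 2, 6, and 22 shuffle squares among 2, 8, and 32 tangrams, respectively.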
Buss and Soltys [11] proved that recognizing shuffle squares is NP-complete for every sufficiently large alphabet. Recently, Bulteau and Vialette [12] improved upon this result and showed that this statement also holds for a binary alphabet.
In this paper, we focus on certain heuristics related to the combinatorial properties of binary shuffle squares, specifically investigating the relationship between the number of shuffle squares generated by given roots and, conversely, the number of distinct roots of a given shuffle square.
Note that the constant word generates only one shuffle square—, while the word generates three shuffle squares: , and . The exemplary decompositions of obtained shuffle squares are
On the other hand, the shuffle square has only one root, namely , while has two distinct roots: and —it can be demonstrated with the following decompositions:
We investigate the minimum and maximum values in both scenarios and propose some observations and conjectures.
2. Searching for Shuffle Squares for Given Roots
From now on, we consider only words over the binary alphabet $\{0, 1\}$. Let $\overline{x}$ be the letter different from $x$, i.e., $\overline{0} = 1$ and $\overline{1} = 0$.
Let us notice that every binary word generates a shuffle square.
We characterise the number of shuffle squares generated by any binary word with exactly one letter $\overline{x}$:
Proposition 1.
Let a and b be non-negative integers. The word $x^a \overline{x} x^b$ generates exactly $(a+1)(b+1)$ different shuffle squares.
Proof.
Let us notice that a shuffle square S generated by the word $x^a \overline{x} x^b$ has the prefix $x^a$ and the suffix $x^b$. Thus,
$$S = x^a \, U \, V \, x^b,$$
where U and V are words of lengths $a+1$ and $b+1$, respectively.
Notice that U must contain exactly one $\overline{x}$, as the first occurrence of $\overline{x}$ in S cannot appear after more than $2a$ occurrences of $x$. Similarly, V must also contain exactly one $\overline{x}$: the second occurrence of $\overline{x}$ cannot appear before more than $2b$ trailing occurrences of $x$. Therefore, the number of possible positions for the first $\overline{x}$ is $a+1$, and the number of possible positions for the second $\overline{x}$ is $b+1$. Consequently, we can generate $(a+1)(b+1)$ distinct words from the word $x^a \overline{x} x^b$.□
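As a small worked instance of this counting argument (an illustration added here, with parameters chosen for convenience), take the root 010, that is, a single letter 1 with $a = 1$ letter 0 before it and $b = 1$ letter 0 after it. Writing each shuffle square as 0, then a block U of length 2, then a block V of length 2, then 0, with exactly one letter 1 in each block, gives:

```latex
\begin{align*}
U = 01,\ V = 01 &: \quad 0\,01\,01\,0 = 001010,\\
U = 01,\ V = 10 &: \quad 0\,01\,10\,0 = 001100,\\
U = 10,\ V = 01 &: \quad 0\,10\,01\,0 = 010010,\\
U = 10,\ V = 10 &: \quad 0\,10\,10\,0 = 010100.
\end{align*}
```

These are $(1+1)(1+1) = 4$ distinct shuffle squares, and each of them is indeed a self-shuffle of 010.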
Of course, a constant word generates only one shuffle square. The following proposition shows that no other word has this property.
Proposition 2.
Every non-constant binary word W generates at least two different shuffle squares.
Proof.
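Without loss of generality, let us assume that $x$ is the first letter of W. Then, there exist $k \geq 1$ and a (possibly empty) word S such that
$$W = x^k \overline{x} S.$$
Let us note that W generates both shuffle squares
$$x^k \overline{x} S \, x^k \overline{x} S$$
and
$$x^{2k} \, \overline{x} \, \overline{x} \, S S,$$
which are different because they have different prefixes of length $k+1$. □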
We now state a more precise result.
Theorem 1.
Every non-constant binary word W of length n generates at least n different shuffle squares.
Proof.
We use induction on the length n of the word W. It is easy to verify manually that the statement holds if W has length 3 or less.
Assume that all non-constant words of length n generate at least n distinct shuffle squares. We need to show that all non-constant words of length $n+1$ generate at least $n+1$ distinct shuffle squares. Let us note that a non-constant word of length $n+1$ can be obtained either by adding the letter $\overline{x}$ to the end of the constant word $x^n$ or by adding the letter 0 or 1 to the end of a non-constant word V of length n.
In the first case, the resulting word $x^n \overline{x}$ generates $n+1$ shuffle squares according to Proposition 1. Therefore, we only need to investigate the second case. Without loss of generality, we can assume that V has the suffix 1 (the proof for the case of suffix 0 is analogous to the one presented below).
We need to show that each of the words $V0$ and $V1$ generates at least $n+1$ shuffle squares.
Since V is a non-constant word with the suffix 1, there exist a (possibly empty) word P and a number s ($s \geq 1$) such that
$$V = P \, 0 \, 1^s.$$
By the induction hypothesis, V generates at least n distinct shuffle squares; let us denote n of them as
$$S_1, S_2, \ldots, S_n.$$
Case 1: Root $V0$.
Note that the words
$$S_1 00, \ S_2 00, \ \ldots, \ S_n 00$$
are pairwise distinct and generated by $V0$. Hence, $V0$ generates at least n shuffle squares.
Now, observe that
$$V0 = P \, 0 \, 1^s \, 0,$$
so $V0$ generates the word
$$V0 \, V0 = P \, 0 \, 1^s \, 0 \, P \, 0 \, 1^s \, 0,$$
which is distinct from every word $S_i 00$, as it has a different suffix of length 2. Thus, $V0$ generates at least $n+1$ shuffle squares.
Case 2: Root $V1$.
Analogously to the previous case, let us note that the words
$$S_1 11, \ S_2 11, \ \ldots, \ S_n 11$$
are pairwise distinct and generated by $V1$. Hence, $V1$ generates at least n shuffle squares.
We have
$$V1 = P \, 0 \, 1^{s+1},$$
so $V1$ generates the word
$$V1 \, V1 = P \, 0 \, 1^{s+1} \, P \, 0 \, 1^{s+1},$$
which is distinct from every word $S_i 11$, as it has a different suffix of length $s+2$: every $S_i$ ends with $1^s$, so $S_i 11$ ends with $1^{s+2}$, whereas the word above ends with $0 \, 1^{s+1}$. Thus, $V1$ generates at least $n+1$ shuffle squares, which completes the proof. □
Let us note that the lower bound n given by Theorem 1 is optimal: according to Proposition 1, for every positive n, the word $x^{n-1} \overline{x}$ generates exactly n different shuffle squares.
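Both the lower bound and its tightness can be checked exhaustively for short roots. The following Python sketch is only an illustration (the names `count_self_shuffles` and `extreme_counts` are hypothetical and do not come from the paper): it counts the distinct self-shuffles of every non-constant binary root of length n and reports the smallest and largest counts.

```python
from functools import lru_cache
from itertools import product


def count_self_shuffles(x: str) -> int:
    """Number of distinct shuffle squares generated by the root x."""
    n = len(x)

    @lru_cache(maxsize=None)
    def merge(i: int, j: int) -> frozenset:
        # All interleavings of the suffixes x[i:] and x[j:].
        if i == n:
            return frozenset({x[j:]})
        if j == n:
            return frozenset({x[i:]})
        return frozenset({x[i] + t for t in merge(i + 1, j)}
                         | {x[j] + t for t in merge(i, j + 1)})

    return len(merge(0, 0))


def extreme_counts(n: int) -> tuple:
    """(min, max) count over all non-constant binary roots of length n >= 2."""
    counts = [count_self_shuffles("".join(p))
              for p in product("01", repeat=n)
              if len(set(p)) == 2]  # skip the two constant words
    return min(counts), max(counts)


for n in range(2, 8):
    smallest, largest = extreme_counts(n)
    # The root made of n - 1 zeros followed by a single 1 attains the minimum.
    print(n, smallest, count_self_shuffles("0" * (n - 1) + "1"), largest)
```

The second and third printed columns agree with n for each tested length, and the last column can be compared with the maxima discussed next.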
On the other hand, the upper bound is not known, but in Table 2, we present known terms from the sequence of the maximal number of shuffle squares for given lengths of binary words, obtained by Shallit in 2020 [13].
Table 2.
The maximal number of distinct shuffle squares generated by a binary word of length n.
In Table 3, the values of k are presented for which there exists a binary word of length n generating exactly k different shuffle squares for and .
Table 3.
The existence of binary words of length n generating exactly k distinct shuffle squares for and .
In Table 4, we present words of length n for that generate, by self-shuffling, the most shuffle squares. The numbers of shuffle squares generated by these words are presented in Table 2.
Table 4.
Words that generate the maximal number of distinct shuffle squares among binary generators of length n.
3. Searching for Roots for Given Shuffle Squares
Let us note that for every positive n, the constant word $x^{2n}$ has only one root, $x^n$, which means that over every alphabet, there are arbitrarily long words with only one root. On the other hand, the number of roots of a single word can be arbitrarily large:
Proposition 3.
Let us consider words over a fixed alphabet with at least two letters. For every natural number n, let $r(n)$ be the maximal number of roots of a single word of length $2n$. Then, $\lim_{n \to \infty} r(n) = \infty$.
Proof.
Clearly, it is sufficient to prove the statement for the binary alphabet.
Let us note that, for , there exist words of length that have at least two roots. Examples of such words are provided in Table 5. If the word W has a root $R$ and the word V has a root $Q$, then the word $WV$ has the root $RQ$. Thus, if the sets of roots for the words W and V contain k and l elements, respectively, the set of roots of the word $WV$ contains at least distinct elements (note that the roots are unique in this summation).
Table 5.
Exemplary binary words of length with multiple different roots for .
Every natural number can be decomposed as a sum
such that
Therefore, for every such m, we can construct a word W of length m such that
where the words are taken from Table 5. Thus, the number of roots, , of the word W satisfies
which completes the proof. □
The results presented in Table 6 suggest that the maximal number of different roots of a single binary shuffle square W increases with the length of W.
Table 6.
The maximum number of roots of a binary shuffle square of length $2n$ for small values of n.
A shuffle square is called explicit if it has only one root. There is no known general formula for the number of explicit shuffle squares of a given length. The number of binary explicit shuffle squares of length $2n$ is shown in Table 7 for small values of n.
Table 7.
The number of explicit shuffle squares of length $2n$ for small values of n.
The first step towards counting the binary explicit shuffle squares is the following.
Theorem 2.
Let W be a binary shuffle square of length with a prefix 0. Let . If W belongs to one of the following classes, then W is an explicit shuffle square.
- 1.
- ;
- 2.
- , where ;
- 3.
- , where ;
- 4.
- , where .
Proof.
- (1) The constant shuffle square can only be obtained using a constant root.
- (2) Of course, every root contains a prefix . Since the number of 0s and 1s in every root is a and b, respectively, then we have that the only possible root is .
- (3) Every root contains a prefix and a suffix and has to contain exactly one 1. The length of a root is , so the only possibility is .
- (4) Similarly, like in case (3), we have that the only possible root is .
□
Classes (3) and (4) in Theorem 2 suggest that all shuffle squares of the form might be explicit. However, this is not true; the word 0000100100 has two roots: 00100 and 00010.
Moreover, there exist binary explicit shuffle squares that do not belong to any of the classes mentioned in the above theorem. The shortest of such words is 010111.
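The two examples above, as well as the concatenation step used in the proof of Proposition 3, can be verified with a direct search for all roots of a given word. The sketch below is a minimal illustration under our own assumptions (the function name `roots` is hypothetical, and the search normalizes every decomposition so that the k-th letter of the first copy precedes the k-th letter of the second copy); it is exponential in the worst case and meant only for short words.

```python
def roots(w: str) -> set:
    """All roots of w (the empty set if w is not a shuffle square)."""
    if not w or len(w) % 2:
        return set()
    half = len(w) // 2
    found, seen = set(), set()
    # State: (letters of w consumed, letters of the first copy so far).
    stack = [(0, "")]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pos, first = state
        matched = pos - len(first)  # letters already assigned to the second copy
        if pos == len(w):
            found.add(first)  # here len(first) == matched == half
            continue
        c = w[pos]
        if len(first) < half:
            stack.append((pos + 1, first + c))  # c extends the first copy
        if matched < len(first) and first[matched] == c:
            stack.append((pos + 1, first))      # c is matched by the second copy
    return found


w = "0000100100"
print(roots(w))         # the two roots 00100 and 00010 (set order may vary)
print(roots("010111"))  # a single root, so 010111 is explicit

# Concatenating a root of W with a root of V gives a root of WV (here V = W).
pairs = {r + q for r in roots(w) for q in roots(w)}
print(pairs <= roots(w + w))
```

The final check uses the same word twice only for brevity; any two shuffle squares could be concatenated in the same way.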
Using values from Table 1 and values from Table 7 multiplied by 2 (Table 1 contains the total number of distinct binary shuffle squares, and Table 7 contains the number of explicit shuffle squares that start with a fixed letter), we can define the function $q$ as follows:
$$q(n) = \frac{2 \, e(n)}{f(n)},$$
where $f(n)$ is the value from Table 1 and $e(n)$ is the value from Table 7.
Values of $q(n)$ for small values of n are shown in Table 8.
Table 8.
The quotient of the number of explicit shuffle squares and the number of all shuffle squares of length $2n$ for small values of n.
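For small n, this quotient can be recomputed from scratch by recording, for every binary shuffle square of length 2n, the set of roots that generate it. The following Python sketch is an illustration under our own assumptions and is not the program used for the tables: the names `self_shuffles`, `root_table`, and `explicit_ratio` are hypothetical. It counts explicit shuffle squares with either first letter, so no factor of 2 is needed.

```python
from collections import defaultdict
from functools import lru_cache
from itertools import product


def self_shuffles(x: str) -> frozenset:
    """All distinct words obtained by interleaving two copies of x."""
    n = len(x)

    @lru_cache(maxsize=None)
    def merge(i: int, j: int) -> frozenset:
        if i == n:
            return frozenset({x[j:]})
        if j == n:
            return frozenset({x[i:]})
        return frozenset({x[i] + t for t in merge(i + 1, j)}
                         | {x[j] + t for t in merge(i, j + 1)})

    return merge(0, 0)


def root_table(n: int) -> dict:
    """Map every binary shuffle square of length 2n to its set of roots."""
    table = defaultdict(set)
    for letters in product("01", repeat=n):
        x = "".join(letters)
        for s in self_shuffles(x):
            table[s].add(x)
    return dict(table)


def explicit_ratio(n: int) -> tuple:
    """(number of explicit shuffle squares, number of shuffle squares, quotient)."""
    table = root_table(n)
    explicit = sum(1 for r in table.values() if len(r) == 1)
    return explicit, len(table), explicit / len(table)


for n in range(1, 7):
    table = root_table(n)
    most_roots = max(len(r) for r in table.values())
    print(n, explicit_ratio(n), most_roots)
```

For n = 1, 2, 3 every binary shuffle square turns out to be explicit, so the quotient equals 1; it drops below 1 no later than n = 5, where 0000100100 has two roots.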
Using values from Table 8 and Proposition 3, we state the following.
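Conjecture 1.
The quotient of the number of binary explicit shuffle squares of length $2n$ and the number of all binary shuffle squares of length $2n$ tends to 0 as $n \to \infty$.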
Indeed, the values in Table 8 decrease as n increases, which might suggest that the conjecture is true. However, from Theorem 2, we know that for every , we can create at least explicit shuffle squares, so this topic needs further investigation.
4. Summary
The presented study explores the combinatorial properties of binary shuffle squares, focusing on their number of roots and the generation of shuffle squares from given words. It was demonstrated that every non-constant word generates at least n shuffle squares, where n is the word’s length, and classes of explicit shuffle squares with unique roots were characterized. A conjecture was proposed suggesting that the ratio of explicit shuffle squares to all shuffle squares approaches zero as their length increases, warranting further investigation. Future research should aim to extend these findings to larger alphabets, refine asymptotic analyses, develop efficient algorithms for recognizing and analyzing shuffle squares, and conduct computational experiments for larger values of n. These efforts could enhance our understanding of shuffle squares and their applications in mathematics and computer science.
Author Contributions
Conceptualization, D.D. and B.P.; methodology, B.P.; software, D.D.; validation, D.D. and B.P.; formal analysis, B.P.; writing—original draft preparation, and B.P.; writing—review and editing, D.D. and B.P. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Data are contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Thue, A. Über unendliche Zeichenreihen. Norske vid. Selsk. Skr. Mat. Nat. Kl. 1906, 7, 1–22. Reprinted in Selected Mathematical Papers of Axel Thue; Nagell, T., Ed.; Universitetsforlaget: Oslo, Norway, 1977; pp. 139–158.
2. Berstel, J.; Perrin, D. The origins of combinatorics on words. Eur. J. Comb. 2007, 28, 996–1022.
3. Henshall, D.; Rampersad, N.; Shallit, J. Shuffling and Unshuffling. Bull. EATCS 2012, 107, 131–142.
4. Available online: https://oeis.org/A191755 (accessed on 2 January 2025).
5. He, X.; Huang, E.; Nam, I.; Thaper, R. Shuffle Squares and Reverse Shuffle Squares. Eur. J. Comb. 2024, 116, 103883.
6. Axenovich, M.; Person, Y.; Puzynina, S. A regularity lemma and twins in words. J. Combin. Theory Ser. A 2013, 120, 733–743.
7. Basu, A.; Ruciński, A. How far are ternary words from shuffle squares? Ars Math. Contemp. 2024.
8. Fici, G. The Shortest Interesting Binary Words. arXiv 2024, arXiv:2412.21145.
9. Grytczuk, J.; Pawlik, B.; Pleszczyński, M. More Variations on Shuffle Squares. Symmetry 2023, 15, 1982.
10. Grytczuk, J.; Pawlik, B.; Pleszczyński, M. Variations on shuffle squares. arXiv 2024, arXiv:2308.13882.
11. Buss, S.; Soltys, M. Unshuffling a square is NP-hard. J. Comput. Syst. Sci. 2014, 80, 766–776.
12. Bulteau, L.; Vialette, S. Recognizing binary shuffle squares is NP-hard. Theor. Comput. Sci. 2020, 806, 116–132.
13. Available online: https://oeis.org/A331850 (accessed on 2 January 2025).
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).