Lower Bounds, and Exact Enumeration in Particular Cases, for the Probability of Existence of a Universal Cycle or a Universal Word for a Set of Words

Chen, Herman Z. Q.; Kitaev, Sergey; Sun, Brian Y.

doi:10.3390/math8050778


by Herman Z. Q. Chen ¹, Sergey Kitaev ²,* and Brian Y. Sun ³

¹ School of Statistics and Data Science, Nankai University, Tianjin 300071, China

² Department of Mathematics and Statistics, University of Strathclyde, Glasgow G1 1XH, UK

³ College of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830046, China

* Author to whom correspondence should be addressed.

Mathematics 2020, 8(5), 778; https://doi.org/10.3390/math8050778

Submission received: 6 April 2020 / Revised: 8 May 2020 / Accepted: 9 May 2020 / Published: 12 May 2020

(This article belongs to the Section E1: Mathematics and Computer Science)


Abstract

A universal cycle, or u-cycle, for a given set of words is a circular word that contains each word from the set exactly once as a contiguous subword. The celebrated de Bruijn sequences are a particular case of such a u-cycle, where the set in question is the set $A^{n}$ of all words of length $n$ over a $k$-letter alphabet $A$. A universal word, or u-word, is a linear, i.e., non-circular, version of the notion of a u-cycle, and it is defined similarly. Removing some words in $A^{n}$ may, or may not, result in a set of words for which a u-cycle, or u-word, exists. The goal of this paper is to study the probability of existence of the universal objects in such a situation. We give lower bounds for the probability in general cases, and also derive explicit answers for the case of removing up to two words in $A^{n}$, or the case when $k = 2$ and $n \leq 4$.

Keywords: universal cycle; u-cycle; universal word; u-word; de Bruijn sequence

1. Introduction

A universal cycle, or u-cycle, for a given set $S$ with $\ell$ words of length $n$ over an alphabet $A$ is a circular word $u_{0} u_{1} \dots u_{\ell - 1}$ that contains each word from $S$ exactly once (and no other word) as a contiguous subword $u_{i} u_{i + 1} \dots u_{i + n - 1}$ for some $0 \leq i \leq \ell - 1$, where the indices are taken modulo $\ell$. The notion of a universal cycle was introduced in [1]. The celebrated de Bruijn sequences are a particular case of such a u-cycle, where the set in question is the set $A^{n}$ of all words of length $n$ over a $k$-letter alphabet $A$. A universal word, or u-word, for $S$ is a (non-circular) word $u_{0} u_{1} \dots u_{\ell + n - 2}$ that contains each word from $S$ exactly once as a contiguous subword $u_{i} u_{i + 1} \dots u_{i + n - 1}$ for some $0 \leq i \leq \ell - 1$. In this paper, we assume that $n \geq 2$ and $k \geq 2$ to make all of our definitions well-defined and to avoid trivialities.

There is a long line of research in the literature dedicated to the study of universal cycles and universal words for various sets of combinatorial structures; see, for example, [2] and references therein. We note that the existence of a u-cycle trivially implies the existence of a u-word, but not vice versa. Indeed, if $u_{0} u_{1} \dots u_{\ell - 1}$ is a u-cycle for $S$, then $u_{0} u_{1} \dots u_{\ell - 1} u_{0} u_{1} \dots u_{n - 2}$ is a u-word for $S$. In either case, problems on u-cycles and u-words are normally solved by considering de Bruijn graphs. The de Bruijn graph $B(n, k)$ consists of $k^{n}$ nodes corresponding to the words in $A^{n}$, and its directed edges are $x_{1} x_{2} \dots x_{n} \to x_{2} \dots x_{n} x_{n + 1}$, where $x_{i} \in A$ for $i \in \{1, \dots, n + 1\}$. De Bruijn graphs are an important structure that is used in solving a variety of problems, e.g., in combinatorics on words [3] and genomics [4].

Let $G = (V, E)$ be a directed graph. A directed path in $G$ is a sequence $v_{1}, \dots, v_{t}$ of distinct nodes such that there is an edge $v_{i} \to v_{i + 1}$ for each $1 \leq i \leq t - 1$. Such a path is a Hamiltonian path if it contains all nodes in $G$. A closed Hamiltonian path (that is, one for which $v_{t} \to v_{1}$ is also an edge) is a Hamiltonian cycle. If $G$ has a Hamiltonian cycle, then $G$ is Hamiltonian. It is well known, and not difficult to show, that $B(n, k)$ is Hamiltonian; moreover, any Hamiltonian cycle (resp., path) in $B(n, k)$ corresponds to a u-cycle (resp., u-word) for $A^{n}$. For example, the cycle $00 \to 01 \to 11 \to 10 \to 00$ in $B(2, 2)$ corresponds to the u-cycle 0011, from which we also get the u-word 00110.
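The correspondence in this example can be checked mechanically. The Python sketch below reads off a u-cycle from the Hamiltonian cycle of $B(2, 2)$ (taking the first letter of each node), appends the first $n - 1$ letters to get a u-word, and then verifies the defining window property. The helper names `windows` and `is_universal` are hypothetical and chosen only for this illustration.

```python
from itertools import product

def windows(word, n, circular):
    """Length-n subwords of `word`; indices wrap modulo len(word) if circular."""
    if circular:
        m = len(word)
        return ["".join(word[(i + j) % m] for j in range(n)) for i in range(m)]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def is_universal(word, S, n, circular):
    """True if each word of S occurs exactly once as a window and nothing else occurs."""
    return sorted(windows(word, n, circular)) == sorted(S)

A2 = ["".join(p) for p in product("01", repeat=2)]   # the set A^2 for A = {0, 1}
cycle = ["00", "01", "11", "10"]                      # Hamiltonian cycle in B(2, 2)
u_cycle = "".join(node[0] for node in cycle)          # first letters of the nodes
u_word = u_cycle + u_cycle[: 2 - 1]                   # append the first n - 1 letters

print(u_cycle, is_universal(u_cycle, A2, 2, circular=True))   # 0011 True
print(u_word, is_universal(u_word, A2, 2, circular=False))    # 00110 True
```

Using modular indexing in `windows` (rather than appending a prefix) keeps the circular case correct even for very short words.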

The problem in question. Now, suppose that we remove $s < k^{n}$ words from $A^{n}$, where $A$ is an alphabet of size $k$, so that all $\binom{k^{n}}{s}$ choices of the removed words are equally likely. The resulting set $S$ may, or may not, have a u-cycle or a u-word. Let $P_{c}(n, k, s)$ and $P_{w}(n, k, s)$ be the probabilities of the events that $S$ has a u-cycle and a u-word, respectively. Then, a natural question is: What are $P_{c}(n, k, s)$ and $P_{w}(n, k, s)$? Note that by definition, a u-cycle for $S$ must cover at least $n$ distinct words, and thus if $s > k^{n} - n$ then $P_{c}(n, k, s) = 0$.

It is not difficult to see that if $s = 1$, or $s = k^{n} - 1$, then with probability 1 a u-word exists. Indeed, if $s = 1$, then removing a word in $A^{n}$ corresponds to removing a node in $B(n, k)$, which turns a Hamiltonian cycle passing through it into a Hamiltonian path giving a u-word, while if $s = k^{n} - 1$, then only one word remains and it is a u-word. Similarly, it is not difficult to see that if $s = 1$, or $s = k^{n} - 1$, then with probability $1 / k^{n - 1}$ a u-cycle exists. Indeed, if $s = 1$, then a u-cycle exists exactly when the removed word is of the form $x x \dots x$ (such words are called loops), and there are $k$ such words, while if $s = k^{n} - 1$, then the single remaining word forms a u-cycle exactly when it is a loop.

In Table 1, Table 2 and Table 3 we present the values of $P_{c}(n, 2, s)$ and $P_{w}(n, 2, s)$ for $n = 2, 3, 4$ obtained by Mathematica 11.3. Even though these tables were obtained by computer, it is possible to check them by hand for $n = 2, 3$ by considering the existence of Hamiltonian cycles and paths in $B(n, 2)$. Moreover, in the case of $n = 4$, one can consider Eulerian cycles/paths (to be introduced below) in $B(3, 2)$ and also check Table 3 by hand.
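For very small parameters, $P_{c}(n, k, s)$ and $P_{w}(n, k, s)$ can also be computed by exhaustive search, which is one way to double-check small entries of such tables. The sketch below is not the Mathematica code used for the tables; it is an independent implementation, under the assumption that a u-cycle (resp., u-word) for the remaining set corresponds to a Hamiltonian cycle (resp., path) in the subgraph of $B(n, k)$ induced by the remaining words. It enumerates all $\binom{k^{n}}{s}$ removals and tests each induced subgraph with a bitmask dynamic program. The function names are hypothetical, and the running time is exponential in $k^{n}$, so this is only practical for tiny cases.

```python
from fractions import Fraction
from itertools import combinations, product

def hamiltonian(nodes, cycle):
    """Bitmask DP on the subgraph of B(n, k) induced by `nodes` (a list of tuples).

    Returns True if that subgraph has a Hamiltonian cycle (cycle=True) or a
    Hamiltonian path (cycle=False).
    """
    m = len(nodes)
    if m == 0:
        return False
    # u -> v in B(n, k) iff u without its first letter equals v without its last letter
    adj = [[nodes[i][1:] == nodes[j][:-1] for j in range(m)] for i in range(m)]
    full = (1 << m) - 1
    ends = [0] * (1 << m)   # ends[mask]: bitmask of possible last nodes of paths visiting exactly mask
    for start in ([0] if cycle else range(m)):
        ends[1 << start] |= 1 << start
    for mask in range(1, full + 1):          # a superset of mask is always a larger integer
        for v in range(m):
            if ends[mask] >> v & 1:
                for w in range(m):
                    if not mask >> w & 1 and adj[v][w]:
                        ends[mask | 1 << w] |= 1 << w
    if cycle:  # every path starts at node 0, so close it with an edge back to node 0
        return any(ends[full] >> v & 1 and adj[v][0] for v in range(m))
    return ends[full] != 0

def probabilities(n, k, s):
    """Exact (P_c(n, k, s), P_w(n, k, s)) by trying every s-subset of removed words."""
    all_words = list(product(range(k), repeat=n))
    hits_c = hits_w = total = 0
    for removed in combinations(range(len(all_words)), s):
        gone = set(removed)
        rest = [w for i, w in enumerate(all_words) if i not in gone]
        total += 1
        hits_c += hamiltonian(rest, cycle=True)
        hits_w += hamiltonian(rest, cycle=False)
    return Fraction(hits_c, total), Fraction(hits_w, total)

print(probabilities(2, 2, 2))   # (Fraction(1, 6), Fraction(5, 6))
```

The printed pair agrees with the values $P_{c}(2, 2, 2) = 1/6$ and $P_{w}(2, 2, 2) = 5/6$ derived in Section 4.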

Our results in this paper. In this paper, we not only provide lower bounds for $P_{c}(n, k, s)$ and $P_{w}(n, k, s)$ for any values of $n, k, s$ (summarized in Table 4), but also give exact values in the case of $s = 2$ in Theorems 6 and 8. For example, we will show that for $k \geq 3$ and $n \geq 2$,

$$P_{w}(n, k, 2) = \frac{2(2k^{n} - 3k + 1)}{k^{n - 1}(k^{n} - 1)}.$$

We remark that some of our proofs require rather subtle considerations, which tend to be more difficult in the case of the binary alphabet.

Preliminaries. In this paper, $B'(n, k)$ denotes the graph obtained from $B(n, k)$ after removing $s$ nodes, or $s$ edges, depending on the context.

A directed graph is strongly connected if there exists a directed path from any node to any other node. A directed graph is connected if for any pair of nodes $a$ and $b$ there exists a path between them in the underlying undirected graph. A trail in a directed graph $G$ is a sequence $v_{1}, \dots, v_{t}$ of nodes such that there is an edge $v_{i} \to v_{i + 1}$ for each $1 \leq i \leq t - 1$ and no edge is used more than once. An Eulerian trail in $G$ is a trail that goes through each edge exactly once. A closed Eulerian trail is an Eulerian cycle. A directed graph is Eulerian (resp., semi-Eulerian) if it has an Eulerian cycle (resp., Eulerian trail). Let $d^{+}(v)$ (resp., $d^{-}(v)$) denote the out-degree (resp., in-degree) of a node $v$. A directed graph is balanced if $d^{+}(v) = d^{-}(v)$ for each node $v$ in the graph. The following result is well known and is not hard to prove.

Theorem 1.

A directed graph $G$ is semi-Eulerian if and only if at most one vertex $v$ has $d^{+}(v) - d^{-}(v) = 1$, at most one vertex $u$ has $d^{-}(u) - d^{+}(u) = 1$, every other vertex $w$ has $d^{+}(w) = d^{-}(w)$, and $G$ is connected. A graph is Eulerian if and only if it is balanced and (strongly) connected.
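Theorem 1 translates directly into a test on degrees and connectivity. The following Python sketch (function names hypothetical) applies it to a list of directed edges, ignoring isolated vertices, and tries it on $B(2, 2)$ with some pairs of edges removed. It relies on the standard fact that a balanced digraph in which every vertex carries an edge is strongly connected as soon as it is connected, so weak connectivity suffices in the balanced case.

```python
from collections import defaultdict
from itertools import product

def eulerian_type(edges):
    """Return "cycle", "trail" or None for a directed multigraph of (tail, head) pairs.

    Degree and connectivity test of Theorem 1; isolated vertices are ignored.
    """
    if not edges:
        return None
    diff = defaultdict(int)    # out-degree minus in-degree
    nbrs = defaultdict(set)    # underlying undirected graph
    for u, v in edges:
        diff[u] += 1
        diff[v] -= 1
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, stack = {edges[0][0]}, [edges[0][0]]
    while stack:
        for w in nbrs[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) != len(nbrs):                           # not connected
        return None
    if all(d == 0 for d in diff.values()):               # balanced and connected
        return "cycle"
    if sorted(d for d in diff.values() if d != 0) == [-1, 1]:
        return "trail"
    return None

def de_bruijn_edges(n, k, removed=()):
    """Edges of B(n, k); the word w of length n + 1 is the edge w[:-1] -> w[1:]."""
    return [(w[:-1], w[1:]) for w in product(range(k), repeat=n + 1) if w not in removed]

print(eulerian_type(de_bruijn_edges(2, 2)))                          # cycle
print(eulerian_type(de_bruijn_edges(2, 2, {(0, 0, 1), (0, 1, 0)})))  # trail
print(eulerian_type(de_bruijn_edges(2, 2, {(1, 0, 0), (0, 0, 1)})))  # None
```

In the last call the removed edges are the out-special node 100 and the in-special node 001 of $B(3, 2)$; only the loop remains at the vertex 00, so the graph is disconnected.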

The line graph $L(G)$ of a directed graph $G$ is the directed graph whose vertex set corresponds to the edge set of $G$, and $L(G)$ has an edge $e \to f$ if, in $G$, the head of $e$ is the tail of $f$. It is well known, and not difficult to show, that $B(n, k) = L(B(n - 1, k))$, and thus a Hamiltonian path (resp., cycle) in $B(n, k)$ corresponds to an Eulerian trail (resp., cycle) in $B(n - 1, k)$. This property will often be used throughout this paper to show the existence of u-cycles and u-words.
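The identity $B(n, k) = L(B(n - 1, k))$ can be confirmed by computer for small parameters. In the Python sketch below (hypothetical helper names), the edge $u \to v$ of $B(n - 1, k)$ is labelled by the word $u$ followed by the last letter of $v$, and the relabelled line graph is compared with $B(n, k)$.

```python
from itertools import product

def de_bruijn(n, k):
    """Nodes and edges of B(n, k): u -> v iff u[1:] == v[:-1]."""
    nodes = list(product(range(k), repeat=n))
    edges = {(u, v) for u in nodes for v in nodes if u[1:] == v[:-1]}
    return nodes, edges

def line_graph(edges):
    """Edges of L(G): e -> f whenever the head of e is the tail of f."""
    return {(e, f) for e in edges for f in edges if e[1] == f[0]}

def is_line_graph_of_previous(n, k):
    """Check B(n, k) = L(B(n - 1, k)) under the labelling u -> v  |->  u + v[-1]."""
    nodes, edges = de_bruijn(n, k)
    _, prev_edges = de_bruijn(n - 1, k)
    label = {e: e[0] + e[1][-1:] for e in prev_edges}
    relabelled = {(label[e], label[f]) for e, f in line_graph(prev_edges)}
    return set(label.values()) == set(nodes) and relabelled == edges

print(all(is_line_graph_of_previous(n, k) for n, k in [(2, 2), (3, 2), (2, 3), (3, 3)]))  # True
```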

Nodes of the form $x^{n}$, as well as edges of the form $x^{n} \to x^{n}$, are loops. Nodes of the form $y x^{n - 1}$ with $y \neq x$ are out-special, and nodes of the form $x^{n - 1} y$ with $y \neq x$ are in-special. A node is special if it is out-special or in-special. The following theorem will be used multiple times in the paper.

Theorem 2 ([5]).

Let $u$ and $v$ be two distinct non-loop nodes in $B(n, k)$. Then, there exist $k$ distinct node-disjoint paths from $u$ to $v$ if and only if $u$ is not out-special and $v$ is not in-special.
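Theorem 2 is quoted from [5]. Purely as an illustration, the maximum number of node-disjoint $u \to v$ paths (read here, as an assumption, as internally node-disjoint, with a direct edge $u \to v$ counting as one path) can be computed with the standard node-splitting reduction to a unit-capacity maximum flow (Menger's theorem). The Python sketch below uses shortest augmenting paths in the Edmonds–Karp style; the function name is hypothetical.

```python
from collections import deque
from itertools import product

def disjoint_paths(n, k, u, v):
    """Maximum number of internally node-disjoint u -> v paths in B(n, k).

    Each node x is split into (x, "in") -> (x, "out") with capacity 1, each
    non-loop edge x -> y becomes (x, "out") -> (y, "in") with capacity 1, and a
    maximum flow from (u, "out") to (v, "in") is computed.
    """
    cap = {}

    def add(a, b):
        cap[(a, b)] = 1
        cap.setdefault((b, a), 0)            # residual (reverse) arc

    for x in product(range(k), repeat=n):
        add((x, "in"), (x, "out"))
        for letter in range(k):
            y = x[1:] + (letter,)
            if y != x:                        # loops never help between distinct nodes
                add((x, "out"), (y, "in"))
    adj = {}
    for a, b in cap:
        adj.setdefault(a, []).append(b)

    source, sink, flow = (u, "out"), (v, "in"), 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:   # BFS for a shortest augmenting path
            a = queue.popleft()
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        b = sink
        while parent[b] is not None:          # push one unit of flow back along the path
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        flow += 1

print(disjoint_paths(3, 2, (0, 1, 0), (1, 0, 1)), disjoint_paths(3, 2, (1, 0, 0), (1, 0, 1)))  # 2 1
```

In the second call $u = 100$ is out-special: both of its non-loop successors funnel through the node 001, so fewer than $k = 2$ disjoint paths exist, in line with Theorem 2.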

Organization of the paper. In Section 2 and Section 3 we provide lower bounds for $P_{c}(n, k, s)$ and $P_{w}(n, k, s)$ in the cases of $k \geq 3$ and $k = 2$, respectively. In Section 4 we give exact values of $P_{c}(n, k, 2)$ and $P_{w}(n, k, 2)$, and in Section 5 we provide some concluding remarks.

2. The Case of the Alphabet of Size $k \geq 3$

Let

$$S(n, k, s) := \sum \binom{k}{s_{1}} \prod_{i = 2}^{n - 2} \binom{(i - 1)\binom{k}{2}}{s_{i}},$$

where the sum is taken over all solutions of $s_{1} + 2 s_{2} + \dots + (n - 2) s_{n - 2} = s$ with $s_{i} \geq 0$ for $1 \leq i \leq n - 2$. In the next theorem, we will obtain the following lower bounds for $k \geq 3$ and $n \geq 3$:

$$P_{c}(n, k, s) \geq \frac{S(n, k, s)}{\binom{k^{n}}{s}} \tag{1}$$

$$P_{w}(n, k, s) \geq \frac{1}{\binom{k^{n}}{s}} \left( S(n, k, s) + \sum \alpha \binom{k}{s_{1}} \prod_{i = 2}^{n - 2} \binom{(i - 1)\binom{k}{2}}{s_{i}} \right), \tag{2}$$

where $\alpha = k^{n} - s + s_{1} - k + 1$ and the sum is taken over all solutions of $s_{1} + 2 s_{2} + \dots + (n - 2) s_{n - 2} = s - 1$ with $s_{i} \geq 0$ for $1 \leq i \leq n - 2$. The case of $n = 2$ and $k \geq 3$ will be considered in Theorem 4 below.

Theorem 3.

For $k, n \geq 3$, the lower bounds in (1) and (2) hold.
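To get a feel for the bounds (1) and (2), they can be evaluated exactly with rational arithmetic. The Python sketch below (function names hypothetical) enumerates the solutions of $s_{1} + 2 s_{2} + \dots + (n - 2) s_{n - 2} = t$ recursively and sums the products of binomial coefficients that define $S(n, k, s)$ and the extra term in (2).

```python
from fractions import Fraction
from math import comb

def weighted_solutions(total, m):
    """Yield (s_1, ..., s_m) with s_i >= 0 and s_1 + 2 s_2 + ... + m s_m == total."""
    if m == 0:
        if total == 0:
            yield ()
        return
    for s_m in range(total // m + 1):
        for rest in weighted_solutions(total - m * s_m, m - 1):
            yield rest + (s_m,)

def term(n, k, sol):
    """binom(k, s_1) * prod_{i=2}^{n-2} binom((i - 1) * binom(k, 2), s_i)."""
    value = comb(k, sol[0])
    for i in range(2, n - 1):
        value *= comb((i - 1) * comb(k, 2), sol[i - 1])
    return value

def theorem3_bounds(n, k, s):
    """Right-hand sides of (1) and (2), assuming k >= 3, n >= 3 and 1 <= s < k^n."""
    m = n - 2
    S = sum(term(n, k, sol) for sol in weighted_solutions(s, m))
    extra = sum((k**n - s + sol[0] - k + 1) * term(n, k, sol)
                for sol in weighted_solutions(s - 1, m))
    total = comb(k**n, s)
    return Fraction(S, total), Fraction(S + extra, total)

print(theorem3_bounds(3, 3, 1))   # (Fraction(1, 9), Fraction(1, 1))
```

For $s = 1$ the two bounds equal $k / k^{n} = 1/k^{n - 1}$ and $(k + k^{n} - k)/k^{n} = 1$, which are exactly the values of $P_{c}(n, k, 1)$ and $P_{w}(n, k, 1)$ discussed in Section 1.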

Proof.

Assume $k \geq 3$. We observe that removing from $B(n - 1, k)$ all $i$-cycles, $1 \leq i \leq n - 2$, of the binary form, that is, those involving only nodes $x_{1} \dots x_{n - 1}$ with $x_{j} \in \{x, y\}$ for $1 \leq j \leq n - 1$ and some $x, y \in \{1, \dots, k\}$, results in a strongly connected and balanced graph $B'(n - 1, k)$. Indeed, clearly $B'(n - 1, k)$ is balanced. To justify that $B'(n - 1, k)$ is strongly connected, we need to show that for any edge $e = A \to B$ belonging to a removed binary cycle, there is a directed path $P_{AB}$ from $A$ to $B$ which does not go through any edge from the removed binary cycles. Then, in a path $P_{XY}$ in $B(n - 1, k)$ from a node $X$ to a node $Y$, we can replace any such $e$ with $P_{AB}$, so that it gives a path in $B'(n - 1, k)$ from $X$ to $Y$.

Suppose $A = x_{1} \dots x_{n - 1}$ and $B = x_{2} \dots x_{n}$, where all $x_{j} \in \{x, y\}$ for some $x$ and $y$. Let $z \neq x, y$. Then, $P_{AB}$ is given by

$$A \to x_{2} \dots x_{n - 1} z \to x_{3} \dots x_{n - 1} z x_{2} \to x_{4} \dots x_{n - 1} z x_{2} x_{3} \to \dots \to B,$$

since none of the edges in $P_{AB}$ belongs to an $i$-cycle for $i < n - 1$. So, $B'(n - 1, k)$ is Eulerian, and thus its line graph $B'(n, k)$ is Hamiltonian, and there exists a u-cycle corresponding to it.

To justify (1), we consider $i$-cycles, $2 \leq i \leq n - 2$, of the form

$$x^{m} y^{j} x^{m} y^{j} \dots \to x^{m - 1} y^{j} x^{m} y^{j} \dots \to x^{m - 2} y^{j} x^{m} y^{j} \dots \to \cdots,$$

where $x < y$, $m + j = i$ and $1 \leq m, j \leq i - 1$. Note that no two such cycles can share an edge. Thus, we can remove in $B(n - 1, k)$ $s_{i}$ such $i$-cycles for $1 \leq i \leq n - 2$ so that the total number of removed edges (corresponding to the total number of removed nodes in $B(n, k)$) is $s$. Clearly, the number of one-cycles (loops) is $k$, and for $i \geq 2$, the number of such $i$-cycles is $(i - 1)\binom{k}{2}$.

To justify (2), we note that if all of the $s$ removed edges come from the binary cycles considered above, then the same lower bound as in (1) is obtained. This bound can be improved as follows. Begin with removing $s - 1$ edges coming from the binary $i$-cycles as above, which results in an Eulerian graph, so that we can remove any edge $e$ in such a graph and obtain a semi-Eulerian graph corresponding to a u-word. When counting the possibilities to remove such an $e$, we do not want $e$ to be a loop, because this would result in some double counting. However, if $e$ is not a loop, all the cases will be different from the cases already considered, because before we were removing entire $i$-cycles for some $i$. This explains the term $\alpha = k^{n} - (s - 1) - (k - s_{1})$ in (2). □
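The first step of this proof can be sanity-checked by computer for small parameters. In the sketch below (hypothetical helper names), an edge of $B(n - 1, k)$ is identified with a word of length $n$; it lies on a binary $i$-cycle with $1 \leq i \leq n - 2$ exactly when the word uses at most two letters and has a period $p \leq n - 2$. The check is run only for $n = 4$ with $k = 3, 4$, and is an illustration rather than a replacement for the proof.

```python
from itertools import product

def binary_cycle_edges(n, k):
    """Words of length n (edges of B(n-1, k)) lying on a binary i-cycle, 1 <= i <= n - 2."""
    def has_period(w, p):
        return all(w[j] == w[j + p] for j in range(len(w) - p))
    return {w for w in product(range(k), repeat=n)
            if len(set(w)) <= 2 and any(has_period(w, p) for p in range(1, n - 1))}

def balanced_and_strongly_connected(n, k, removed):
    """Is B(n-1, k) with the edge words in `removed` deleted balanced and strongly connected?"""
    nodes = list(product(range(k), repeat=n - 1))
    fwd = {x: [] for x in nodes}
    bwd = {x: [] for x in nodes}
    for w in product(range(k), repeat=n):
        if w not in removed:
            fwd[w[:-1]].append(w[1:])
            bwd[w[1:]].append(w[:-1])
    if any(len(fwd[x]) != len(bwd[x]) for x in nodes):
        return False

    def reaches_all(adj):
        seen, stack = {nodes[0]}, [nodes[0]]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return len(seen) == len(nodes)

    return reaches_all(fwd) and reaches_all(bwd)

for k in (3, 4):
    removed = binary_cycle_edges(4, k)
    print(k, len(removed), balanced_and_strongly_connected(4, k, removed))
# 3 9 True
# 4 16 True
```

For $n = 4$ the removed edges are the $k$ loops and the $2\binom{k}{2}$ edges of the binary 2-cycles, in line with the count of $(i - 1)\binom{k}{2}$ binary $i$-cycles used above.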

In the proof of the next theorem we need the following simple lemma. There, by a circular binary string of length $k$, we mean a sequence of digits 0 and 1 placed around a circle in positions labeled by $1, 2, \dots, k$.

Lemma 1.

For $k \geq 2$, the number of circular binary strings of length $k$ with $i$ 1s and $k - i$ 0s, $0 \leq i \leq k$, in which no two 1s are adjacent is given by

$$\binom{k - i - 1}{i - 1} + \binom{k - i}{i},$$

with the convention that $\binom{a}{b} = 0$ unless $0 \leq b \leq a$.

Proof.

Let $h(k, i)$ be the number of binary (non-circular) strings of length $k$ with $i$ 1s in which no two 1s are adjacent. Then, $h(k, i) = \binom{k - i + 1}{i}$. Indeed, such strings are obtained by placing $i$ 1s in a binary string of length $k - i + 1$ and then replacing each 1, except the rightmost one, by 10. For the circular case, if 0 is in position 1, then we clearly have $h(k - 1, i) = \binom{k - i}{i}$ such strings. On the other hand, if 1 is in position 1, then we have $h(k - 3, i - 1) = \binom{k - i - 1}{i - 1}$ such strings, since then positions $k$ and 2 must be occupied by 0s. This completes the proof. □
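Lemma 1 is easy to confirm by brute force for small $k$. The Python sketch below (hypothetical function names) counts labelled circular 0/1 strings directly and compares the result with the formula, using a binomial helper that returns 0 outside $0 \leq b \leq a$, which is needed for $i = 0$ and $i = k$.

```python
from itertools import product
from math import comb

def binom(a, b):
    """Binomial coefficient, taken to be 0 unless 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0

def circular_count(k, i):
    """Labelled circular 0/1 strings of length k with i ones and no two ones adjacent."""
    return sum(1 for bits in product((0, 1), repeat=k)
               if sum(bits) == i
               and not any(bits[j] and bits[(j + 1) % k] for j in range(k)))

def lemma1(k, i):
    return binom(k - i - 1, i - 1) + binom(k - i, i)

print(all(circular_count(k, i) == lemma1(k, i) for k in range(2, 13) for i in range(k + 1)))  # True
```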

Theorem 4.

Let $n = 2$ and $k \geq 3$, and define

$$f(k, s) := \sum_{i = 0}^{k(k - 3)/2} \binom{k(k - 3)/2}{i} \binom{k}{s - 2i},$$

$$g(k, s) := \sum_{i \geq 3} (i - 1)! \left( \binom{k - i - 1}{i - 1} + \binom{k - i}{i} \right) \binom{k}{s - i},$$

$$U(k, s) := f(k, s) + 2 f(k, s - k) + g(k, s) + 2 g(k, s - k)$$

and

$$V(k, s) := k(k - 3) \left( \binom{k}{s - 1} + 2 \binom{k}{s - k - 1} \right).$$

Then,

$$P_{c}(2, k, s) \geq \frac{U(k, s)}{\binom{k^{2}}{s}}. \tag{3}$$

Also,

$$P_{w}(2, k, s) \geq \frac{U(k, s) + V(k, s) + 2k \sum_{j = 1}^{k - 1} \left( f(k, s - j) + g(k, s - j) \right)}{\binom{k^{2}}{s}}. \tag{4}$$
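The bounds (3) and (4) can be evaluated exactly as follows. This Python sketch (function names hypothetical) implements $f$, $g$, $U$ and $V$ verbatim, with binomial coefficients taken to be 0 outside $0 \leq b \leq a$ so that terms with negative arguments such as $s - k$ vanish; the sum defining $g$ can stop at $i = k$, since the Lemma 1 factor is 0 beyond it.

```python
from fractions import Fraction
from math import comb, factorial

def binom(a, b):
    """Binomial coefficient with binom(a, b) = 0 unless 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0

def f(k, s):
    m = k * (k - 3) // 2                      # number of non-special 2-cycles
    return sum(binom(m, i) * binom(k, s - 2 * i) for i in range(m + 1))

def g(k, s):
    # a single i-cycle on i pairwise non-consecutive nodes (Lemma 1), (i - 1)! cycles each
    return sum(factorial(i - 1) * (binom(k - i - 1, i - 1) + binom(k - i, i)) * binom(k, s - i)
               for i in range(3, k + 1))

def theorem4_bounds(k, s):
    """Right-hand sides of (3) and (4) for n = 2 and k >= 3."""
    U = f(k, s) + 2 * f(k, s - k) + g(k, s) + 2 * g(k, s - k)
    V = k * (k - 3) * (binom(k, s - 1) + 2 * binom(k, s - k - 1))
    paths = 2 * k * sum(f(k, s - j) + g(k, s - j) for j in range(1, k))
    total = comb(k * k, s)
    return Fraction(U, total), Fraction(U + V + paths, total)

print(theorem4_bounds(3, 2))   # (Fraction(1, 12), Fraction(3, 4))
```

For $s = 1$ the sketch returns $1/k$ and $1$, the exact values of $P_{c}(2, k, 1)$ and $P_{w}(2, k, 1)$, and for $k = 3$, $s = 2$ the bound $3/4$ stays below the exact value $P_{w}(2, 3, 2) = 5/6$ given by Theorem 8.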

Proof.

Instead of removing $s$ nodes in $B(2, k)$, we consider removing $s$ edges in $B(1, k)$, whose nodes are $1, \dots, k$, each carrying a loop, and in which, for every pair of distinct nodes $x$ and $y$, both $x \to y$ and $y \to x$ are present. We call a 2-cycle in $B(1, k)$ special if it involves nodes $x$ and $x + 1$ for $1 \leq x \leq k - 1$, or nodes 1 and $k$. Clearly, the number of non-special two-cycles is $\binom{k}{2} - k = \frac{k(k - 3)}{2}$.

To justify (3), note that removing all the edges in any $i$ of the non-special 2-cycles in $B(1, k)$, and then removing $s - 2i$ loops, results in a balanced and strongly connected graph, showing the existence of a u-cycle in this case. The number of ways to proceed in this way is clearly given by $f(k, s)$. Moreover, we can proceed in the same way after first removing the $k$ edges either from the cycle $1 \to 2 \to \dots \to k \to 1$, or from the cycle $k \to (k - 1) \to \dots \to 1 \to k$, which explains the term $2 f(k, s - k)$.

To produce a more subtle estimate, we remove just a single $i$-cycle, for a fixed $i \geq 3$, from $B(1, k)$, which is clearly not counted previously. The only condition on removing such an $i$-cycle is that it must not involve any of the edges in a special two-cycle, which guarantees strong connectivity of the obtained graph. The number of ways to select $i$ nodes to form such an $i$-cycle, that is, $i$ nodes no two of which are consecutive modulo $k$, is given by Lemma 1, and since there are edges in both directions between any pair of selected nodes, there are $(i - 1)!$ ways to choose a cycle on the chosen nodes. The remaining $s - i$ edges to be removed after removing the $i$ edges of the $i$-cycle can be chosen among the $k$ loops. This explains the term $g(k, s)$. Finally, removing the $k$ edges in either the cycle $1 \to 2 \to \dots \to k \to 1$, or the cycle $k \to (k - 1) \to \dots \to 1 \to k$, and then removing an $i$-cycle as above results in a balanced and strongly connected graph, and explains the term $2 g(k, s - k)$. This completes the justification of (3).

To justify (4), first note that all cases considered in proving (3) can be used in the case of u-words. To improve the bound, we note that a directed path (on distinct nodes) of length $j$, $1 \leq j \leq k - 1$, consisting of edges coming from special 2-cycles, can be removed, and then some other cycles can be removed as discussed in the case of u-cycles, which results in a semi-Eulerian graph and thus corresponds to a u-word. There are two ways to pick the direction of such a path on special two-cycles, and $k$ ways to pick its start, justifying the term $2k \sum_{j = 1}^{k - 1} \left( f(k, s - j) + g(k, s - j) \right)$. Finally, the following two options also result in a semi-Eulerian graph not considered above:

remove any non-loop edge among the $k(k - 3)$ edges not coming from special two-cycles, and remove the remaining edges from loops. This gives $k(k - 3)\binom{k}{s - 1}$ possibilities;
remove the $k$ edges in either the cycle $1 \to 2 \to \dots \to k \to 1$, or the cycle $k \to (k - 1) \to \dots \to 1 \to k$, then remove one more non-loop edge among the $k(k - 3)$ edges not coming from special two-cycles, and remove the remaining edges from loops, which gives $2k(k - 3)\binom{k}{s - k - 1}$ possibilities.

This explains the term $V(k, s)$ and completes the proof of (4). □

3. The Case of the Alphabet of Size $k = 2$

Recall that in Table 1, Table 2 and Table 3 we present the values of $P_{c}(n, 2, s)$ and $P_{w}(n, 2, s)$ for $n = 2, 3, 4$.

Let

$$T(n, s) = \sum \binom{2}{s_{1}} \binom{1}{s_{2}} \binom{2}{s_{n - 1}},$$

where the sum is taken over all solutions of $s_{1} + 2 s_{2} + (n - 1) s_{n - 1} = s$ with $0 \leq s_{1} \leq 2$, $0 \leq s_{2} \leq 1$ and $0 \leq s_{n - 1} \leq 2$. Then, for $k = 2$ and $n \geq 5$, we will show in the next theorem that

$$P_{c}(n, 2, s) \geq \frac{T(n, s)}{\binom{2^{n}}{s}}, \tag{5}$$

$$P_{w}(n, 2, s) \geq \frac{T(n, s) + \sum (2^{n} - s + s_{1} - 1) \binom{2}{s_{1}} \binom{1}{s_{2}} \binom{2}{s_{n - 1}}}{\binom{2^{n}}{s}}, \tag{6}$$

where the sum is taken over all solutions of $s_{1} + 2 s_{2} + (n - 1) s_{n - 1} = s - 1$ with $0 \leq s_{1} \leq 2$, $0 \leq s_{2} \leq 1$ and $0 \leq s_{n - 1} \leq 2$.

Theorem 5.

For $n \geq 5$, the lower bounds in (5) and (6) hold.
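Since there are only two loops, one 2-cycle and two $(n - 1)$-cycles to choose from, the bounds (5) and (6) reduce to a small finite sum. The Python sketch below (function name hypothetical) evaluates them exactly; it assumes, as the ranges $0 \leq s_{1} \leq 2$, $0 \leq s_{2} \leq 1$, $0 \leq s_{n - 1} \leq 2$ indicate, that the factor for the two $(n - 1)$-cycles is $\binom{2}{s_{n - 1}}$.

```python
from fractions import Fraction
from math import comb

def theorem5_bounds(n, s):
    """Right-hand sides of (5) and (6) for k = 2 and n >= 5."""
    def weighted_sum(target, weight):
        acc = 0
        for s1 in range(3):              # the two loops
            for s2 in range(2):          # the single 2-cycle
                for s_last in range(3):  # the two (n - 1)-cycles
                    if s1 + 2 * s2 + (n - 1) * s_last == target:
                        acc += weight(s1) * comb(2, s1) * comb(1, s2) * comb(2, s_last)
        return acc

    T = weighted_sum(s, lambda s1: 1)
    extra = weighted_sum(s - 1, lambda s1: 2**n - s + s1 - 1)
    total = comb(2**n, s)
    return Fraction(T, total), Fraction(T + extra, total)

print(theorem5_bounds(5, 2))   # (Fraction(1, 248), Fraction(1, 8))
```

For $s = 2$ the bound (5) equals $2/\binom{2^{n}}{2}$, which coincides with the exact value of $P_{c}(n, 2, 2)$ given by Theorem 6.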

Proof.

We have $k = 2$ and $n \geq 5$. We observe that removing two loops, the two-cycle, and the two $(n - 1)$-cycles of the form

$$x^{n - 2} y \to x^{n - 3} y x \to x^{n - 4} y x^{2} \to \dots \to y x^{n - 2} \to x^{n - 2} y$$

in $B(n - 1, k)$ results in a strongly connected and balanced graph $B'(n - 1, k)$. Indeed, clearly $B'(n - 1, k)$ is balanced. To justify that $B'(n - 1, k)$ is strongly connected, we need to show that for any edge $e = A \to B$ belonging to a removed cycle, there is a directed path $P_{AB}$ from $A$ to $B$ which does not go through any edge from the removed cycles. Then, in a path $P_{XY}$ in $B(n - 1, k)$ from a node $X$ to a node $Y$, we can replace any such $e$ with $P_{AB}$, giving a path in $B'(n - 1, k)$ from $X$ to $Y$. We consider two cases.

Case 1.

$A = x^{i} y x^{n - i - 2}$ for $0 \leq i \leq n - 2$, and $B$ is the successor of $A$ on the removed $(n - 1)$-cycle, that is, $B = x^{i - 1} y x^{n - i - 1}$ if $i \geq 1$ and $B = x^{n - 2} y$ if $i = 0$. If $i = 0$, then $P_{AB} = A \to x^{n - 1} \to B$. If $i = 1$, then $P_{AB}$ is given by

$$A \to y x^{n - 3} y \to x^{n - 3} y y \to \dots \to y^{n - 1} \to y^{n - 2} x \to y^{n - 3} x^{2} \to \dots \to B.$$

If $i = n - 2$, then $P_{AB}$ is given by

$$A \to x^{n - 3} y y \to \dots \to y^{n - 1} \to y^{n - 2} x \to y^{n - 3} x^{2} \to \dots \to B.$$

In all other cases ($2 \leq i \leq n - 3$), $P_{AB}$ is given by

$$A \to x^{i - 1} y x^{n - i - 2} y \to x^{i - 2} y x^{n - i - 2} y^{2} \to x^{i - 3} y x^{n - i - 2} y^{2} x \to x^{i - 4} y x^{n - i - 2} y^{2} x^{2} \to \dots \to y^{2} x^{i - 1} y x^{n - i - 3} \to y x^{i - 1} y x^{n - i - 2} \to B.$$

Case 2.

$A = x_{1} x_{2} \dots x y x y$ and $B = x_{2} \dots y x y x$ are in the 2-cycle, where $x \neq y$. Then, for even $n$, $P_{AB}$ is given by

$$A \to x_{2} \dots y x y y \to x_{3} \dots x y y x \to x_{4} \dots y y x y \to x_{5} \dots y x y x \to \dots \to B,$$

and for odd $n$, $P_{AB}$ is given by

$$A \to x_{2} \dots y x y y \to x_{3} \dots x y y x \to x_{4} \dots y y x x \to x_{5} \dots y x x y \to x_{6} \dots x x y x \to x_{7} \dots x y x y \to \dots \to B.$$

So, $B'(n - 1, k)$ is Eulerian, and thus its line graph $B'(n, k)$ is Hamiltonian, and there exists a u-cycle corresponding to it.

To justify (5), we note that no two of the cycles considered above (the two loops, the two-cycle and the two $(n - 1)$-cycles) share an edge. Thus, we can remove in $B(n - 1, k)$ $s_{i}$ such $i$-cycles for $i \in \{1, 2, n - 1\}$ so that the total number of removed edges (corresponding to the total number of removed nodes in $B(n, k)$) is $s$.

To justify (6), we note that if all of the $s$ removed edges come from the cycles considered above, then the same lower bound as in (5) is obtained. This bound can be improved as follows. Begin with removing $s - 1$ edges coming from the $i$-cycles as above, which results in an Eulerian graph, so that we can remove any edge $e$ in such a graph and obtain a semi-Eulerian graph corresponding to a u-word. When counting the possibilities to remove such an $e$, we do not want $e$ to be a loop, because this would result in some double counting. However, if $e$ is not a loop, all the cases will be different from the cases already considered, because before we were removing entire $i$-cycles for some $i$. This explains the factor $2^{n} - (s - 1) - (2 - s_{1})$ in (6). □
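The starting configuration of this proof can be checked by computer for $n = 5$. The Python sketch below (hypothetical helper names) builds the edge words of the two loops, the 2-cycle and the two $(n - 1)$-cycles $x^{n - 2} y \to x^{n - 3} y x \to \dots \to y x^{n - 2} \to x^{n - 2} y$ of $B(n - 1, 2)$, deletes them, and tests whether the remainder is balanced and strongly connected. It only illustrates the claim for one value of $n$.

```python
from itertools import product

def removed_edges_k2(n):
    """Edge words (length n) of the two loops, the 2-cycle and the two (n-1)-cycles of B(n-1, 2)."""
    removed = set()
    for x, y in ((0, 1), (1, 0)):
        removed.add((x,) * n)                                 # loop at the node x^{n-1}
        removed.add(tuple((x, y)[j % 2] for j in range(n)))   # an edge of the 2-cycle
        start = (x,) * (n - 2) + (y,)                         # the node x^{n-2} y
        for r in range(n - 1):
            node = start[r:] + start[:r]                      # the rotations, in cycle order
            removed.add(node + node[:1])                      # edge node -> node[1:] + node[0]
    return removed

def balanced_and_strongly_connected(n, removed):
    """Is B(n-1, 2) with the edge words in `removed` deleted balanced and strongly connected?"""
    nodes = list(product((0, 1), repeat=n - 1))
    fwd = {x: [] for x in nodes}
    bwd = {x: [] for x in nodes}
    for w in product((0, 1), repeat=n):
        if w not in removed:
            fwd[w[:-1]].append(w[1:])
            bwd[w[1:]].append(w[:-1])
    if any(len(fwd[x]) != len(bwd[x]) for x in nodes):
        return False

    def reaches_all(adj):
        seen, stack = {nodes[0]}, [nodes[0]]
        while stack:
            for y in adj[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return len(seen) == len(nodes)

    return reaches_all(fwd) and reaches_all(bwd)

removed = removed_edges_k2(5)
print(len(removed), balanced_and_strongly_connected(5, removed))   # 12 True
```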

4. Exact Values of $P_{c} (n, k, 2)$ and $P_{w} (n, k, 2)$

Theorem 6.

We have $P_{c}(2, 2, 2) = \frac{1}{6}$, and for $n \geq 3$ and $k \geq 2$,

$$P_{c}(n, k, 2) = \frac{k(k - 1)}{\binom{k^{n}}{2}}.$$

Proof.

If two nodes are removed in $B(2, 2)$, the only possibility for the graph to stay Hamiltonian (and thus to correspond to a u-cycle) is if the removed nodes are the two loops, which explains that $P_{c}(2, 2, 2) = \frac{1}{6}$. On the other hand, if $n \geq 3$ and $k \geq 2$, then in order to obtain an Eulerian graph by removing two edges in $B(n - 1, k)$ we must either remove two loops, or remove a two-cycle. Each of these gives $\binom{k}{2}$ possibilities, thus explaining the formula for $P_{c}(n, k, 2)$. □

The proof of Theorem 8 relies on the following theorem, which appears intuitively true, but whose proof is rather involved and requires consideration of many cases; we were not able to find this result in the literature.

Theorem 7.

Let $e = a \to b$ be an edge in $B(n, k)$. Then, there exists a Hamiltonian cycle in $B(n, k)$ that goes through $e$, with the only exception when $k = 2$, $a$ is out-special and $b$ is in-special.

Proof.

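If $k = 2$, $a = y x^{n - 1}$ and $b = x^{n - 1} y$, then no Hamiltonian cycle can cover the loop $x^{n}$ and go through $e$: apart from the loop itself, the only edge coming into $x^{n}$ comes from $a$ and the only edge going out of $x^{n}$ goes to $b$, so a Hamiltonian cycle must contain $a \to x^{n} \to b$ rather than $e$. This is not the case for $k \geq 3$.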

If either $a$ or $b$ is a loop $x^{n}$, then the statement is true. Indeed, if $k = 2$, then there is only one edge (other than the loop) coming into $x^{n}$, and one edge going out of $x^{n}$, so these edges are part of any Hamiltonian cycle. On the other hand, if $k \geq 3$, then suppose $a = y x^{n - 1}$ and $b = x^{n}$; the case when $a = x^{n}$ can be considered similarly. Since $B(n, k)$ has a Hamiltonian cycle, the corresponding u-cycle $U$ goes through an edge $z x^{n - 1} \to x^{n}$ with $x \neq z$. If $y = z$, we are done. Otherwise, we can swap all $y$'s and $z$'s in $U$ to obtain the desired Hamiltonian cycle from the new u-cycle.

Thus, we can assume that neither a nor b is a loop.

In what follows, we use the following approach. We consider edges $e_{1} = A \to B$ and $e_{2} = B \to C$ in $B(n - 1, k)$ corresponding to $a$ and $b$, respectively. Next, we demonstrate that after removing $e_{1}$ and $e_{2}$ (corresponding to removing $a$ and $b$ in $B(n, k)$) the obtained graph $B'(n - 1, k)$ remains connected. This is done via finding alternative directed paths $P_{XY}$ from $X$ to $Y$, $X \neq Y$, where $X, Y \in \{A, B\}$ or $X, Y \in \{B, C\}$. Together with the fact that $B'(n - 1, k)$ is balanced if $A = C$, and otherwise $A$ has one extra edge coming in and $C$ has one extra edge coming out, this shows that $B'(n - 1, k)$ has an Eulerian trail, corresponding to a Hamiltonian path in the graph $B'(n, k)$ obtained from $B(n, k)$ after removing $a$ and $b$. Such a Hamiltonian path can clearly be extended to a Hamiltonian cycle in $B(n, k)$ by adding back the removed nodes $a$ and $b$, which are joined by the edge $e$.

Suppose that $n = 3$. If $k = 2$, then we have 8 possibilities for $e$ (loops are not involved, and we cannot have $a$ be out-special and $b$ be in-special). Each of the eight possible choices of $e$ can be found in one of the following two Hamiltonian cycles in $B(3, 2)$, giving the desired result:

$$100 \to 000 \to 001 \to 010 \to 101 \to 011 \to 111 \to 110 \to 100$$

$$001 \to 011 \to 111 \to 110 \to 101 \to 010 \to 100 \to 000 \to 001$$
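The case analysis for $n = 3$, $k = 2$ can be verified mechanically. The Python sketch below (hypothetical helper names) checks that both listed cycles are Hamiltonian cycles of $B(3, 2)$, lists the edges $a \to b$ with neither endpoint a loop and not ($a$ out-special and $b$ in-special), and confirms that the two cycles together cover all of them.

```python
from itertools import product

CYCLES = [
    ["100", "000", "001", "010", "101", "011", "111", "110"],
    ["001", "011", "111", "110", "101", "010", "100", "000"],
]
NODES = {"".join(map(str, w)) for w in product((0, 1), repeat=3)}

def is_hamiltonian_cycle(cycle):
    """Does `cycle` list every node of B(3, 2) once, with consecutive nodes joined by edges?"""
    steps = zip(cycle, cycle[1:] + cycle[:1])
    return len(cycle) == len(NODES) and set(cycle) == NODES and all(a[1:] == b[:-1] for a, b in steps)

def is_loop(w):
    return len(set(w)) == 1

def out_special(w):   # y x^{n-1} with y != x
    return len(set(w[1:])) == 1 and w[0] != w[1]

def in_special(w):    # x^{n-1} y with y != x
    return len(set(w[:-1])) == 1 and w[-1] != w[0]

targets = {(a, b) for a in NODES for b in NODES
           if a[1:] == b[:-1] and not is_loop(a) and not is_loop(b)
           and not (out_special(a) and in_special(b))}
covered = {edge for c in CYCLES for edge in zip(c, c[1:] + c[:1])}

print(all(is_hamiltonian_cycle(c) for c in CYCLES), len(targets), targets <= covered)  # True 8 True
```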

Thus, we can assume $k \geq 3$. Let $a = x y z$ and $b = y z h$. Then, $A = x y$, $B = y z$ and $C = z h$. Letting $t \neq x, z$, we see that $P_{AB} = A \to y t \to t y \to B$ if $t \neq y$, or else $P_{AB} = A \to y t \to B$. $P_{BC}$ is found in the same way.

In what follows, we assume that $n \geq 4$ (and $a$ and $b$ are not loops).

If $A = C$, then $A$ and $B$ form a 2-cycle, and since $n \geq 4$, neither of these vertices is special. Thus, $P_{AB}$ and $P_{BC}$ exist by Theorem 2. So, we can assume that $A \neq C$.

Suppose that one of A, B, or C is a loop.

Case 1.

$A = x^{n - 1}$ is a loop, $B = x^{n - 2} y$, $C = x^{n - 3} y z$, and $y \neq x$. $P_{BC}$ exists by Theorem 2, and for $k \geq 3$, $P_{AB}$ is given by

$$A = x^{n - 1} \to x^{n - 2} t \to x^{n - 3} t x \to x^{n - 4} t x^{2} \to \dots \to t x^{n - 2} \to B = x^{n - 2} y,$$

where $t \neq y, x$. For $k = 2$, we note that $P_{AB}$ does not exist in this case. However, it is sufficient for us to prove that there exists $P_{BA}$ that does not use the edge $e_{2}$ ($e_{1}$ clearly will not be used). Such a path is given by

$$B = x^{n - 2} y \to x^{n - 3} y \bar{z} \to x^{n - 4} y \bar{z} x \to x^{n - 5} y \bar{z} x^{2} \to \dots \to A = x^{n - 1},$$

where $\bar{z}$ denotes the letter distinct from $z$. In the case $z = x$, the path above has one more step than otherwise.

Case 2.

$B = y^{n - 1}$ is a loop, $A = x y^{n - 2}$, $C = y^{n - 2} z$, $y \neq x$, and $z \neq y$. For $k \geq 3$, $P_{BC}$ is essentially $P_{AB}$ in Case 1, and $P_{AB}$ is given by

$$A = x y^{n - 2} \to y^{n - 2} t \to y^{n - 3} t y \to y^{n - 4} t y^{2} \to \dots \to t y^{n - 2} \to B = y^{n - 1},$$

where $t \neq y, x$. The case $k = 2$ corresponds to $a$ being out-special and $b$ being in-special, and it is the exception in the statement of the theorem (the loop $B$ becomes unreachable from any other node).

Case 3.

$C = z^{n - 1}$ is a loop, $A = x y z^{n - 3}$, $B = y z^{n - 2}$, and $y \neq z$. $P_{AB}$ exists by Theorem 2, and for $k \geq 3$, $P_{BC}$ is given by

$$B = y z^{n - 2} \to z^{n - 2} t \to z^{n - 3} t z \to z^{n - 4} t z^{2} \to \dots \to t z^{n - 2} \to C = z^{n - 1},$$

where $t \neq y, z$. Similarly to Case 1, for $k = 2$, $P_{BC}$ does not exist, but we can find $P_{CB}$ not using $e_{1}$ and $e_{2}$:

$$C = z^{n - 1} \to z^{n - 2} \bar{x} \to z^{n - 3} \bar{x} y \to z^{n - 4} \bar{x} y z \to z^{n - 5} \bar{x} y z^{2} \to \dots \to B = y z^{n - 2},$$

where $\bar{x}$ is the letter different from $x$, so the last step is not the edge $e_{1}$.

Thus, we can assume that none of $A$, $B$, or $C$ is a loop. Moreover, we only need to consider the following two cases, because otherwise $P_{AB}$ and $P_{BC}$ are given by Theorem 2.

Case i.

$A = y x^{n - 2}$ is out-special and $B = x^{n - 2} z$ is in-special, where $x \neq y, z$. In this case, $P_{AB} = y x^{n - 2} \to x^{n - 1} \to B = x^{n - 2} z$.

Case ii.

$B = y x^{n - 2}$ is out-special and $C = x^{n - 2} z$ is in-special, where $x \neq y, z$. This is essentially Case i. □

Theorem 8.

We have $P_{w}(2, 2, 2) = \frac{5}{6}$ and for $n \geq 3$,

$$P_{w}(n, 2, 2) = \frac{2^{n} - 3}{(2^{n} - 1) 2^{n - 3}}.$$

Moreover, for $k \geq 3$ and $n \geq 2$, we have

$$P_{w}(n, k, 2) = \frac{2(2k^{n} - 3k + 1)}{k^{n - 1}(k^{n} - 1)}.$$

Proof.

If $n = 2$ and $k = 2$, then it is easy to see that the graph obtained from $B(2, 2)$ by removing two nodes has a Hamiltonian path unless the removed nodes are 01 and 10. This gives $P_{w}(2, 2, 2) = \frac{5}{6}$.

So, we can assume that either $n \geq 3$ and $k \geq 2$, or $n \geq 2$ and $k \geq 3$.

Let $a$ and $b$ be the nodes in $B(n, k)$ corresponding to the two removed words of length $n$ over a $k$-letter alphabet. Note that $B'(n - 1, k)$ is semi-Eulerian (in particular, respecting the conditions on in-degrees and out-degrees) if and only if

either a or b is a loop, in which case clearly a Hamiltonian path in $B^{'} (n, k)$ exists, or
$a \to b$ (or $b \to a$ ) is an edge in $B (n, k)$ , in which case a Hamiltonian path in $B^{'} (n, k)$ exists by Theorem 7 with one exception.

Thus, exactly one of the following four cases of choosing $a$ can occur in order for $B'(n, k)$ to have a Hamiltonian path.

Case 1. $a$ is a loop, in which case $b$ can be any node. Clearly, there are $k(k^{n} - 1)$ ways to choose such $a$ and $b$.

Case 2. (i) $a = x y x y \dots x$ or (ii) $a = x y x y \dots y$, where $x \neq y$. In case (i), $b$ can only be of the form $t t \dots t$, or $y x y x \dots y$, or $z x y x y \dots y$, or $y x y x \dots x z$, where $z \neq y$; in case (ii), $b$ can only be of the form $t t \dots t$, or $y x y x \dots x$, or $z x y x y \dots x$, or $y x y x \dots y z$, where $z \neq y$. In either case, $b$ can be chosen in $k + 1 + (k - 1) + (k - 1) = 3k - 1$ ways, depending on the respective choices of $t$ and $z$. There are $k(k - 1)$ choices for $a$, giving in total $k(k - 1)(3k - 1)$ possibilities for this case.

Case 3. $a = y x^{n - 1}$ is out-special or $a = x^{n - 1} y$ is in-special. In either case, $b$ can be either a loop, or the other endpoint of an edge coming into $a$ or going out of $a$. There are $k$ loops, $k$ edges coming in, and $k$ edges coming out of $a$, but one of these edges is connected to a loop. Thus, in this case we have $3k - 1$ choices for $b$, and in total $2k(k - 1)(3k - 1)$ possibilities (the factor 2 corresponds to the choice of being out-special or in-special). However, the last formula only works for $k \geq 3$, because when $k = 2$, we cannot remove $a$ and $b$ connected by an edge when both of them are special (in this case a loop becomes isolated). So, if $k = 2$, we have $2k(k - 1)(3k - 1 - 1) = 16$ possibilities.

Case 4. In all other cases of $a$, $b$ can be any of the $2k$ nodes connected to $a$ by an edge, or any of the $k$ loops. So, we have

$$3k\left(k^{n} - k - k(k - 1) - 2k(k - 1)\right) = 3k(k^{n} - 3k^{2} + 2k)$$

possibilities.

Since every pair $\{a, b\}$ appears twice in our arguments (once as $(a, b)$ and once as $(b, a)$), Cases 1–4 give

$$P_{w}(n, 2, 2) = \frac{2(2^{n} - 1) + 10 + 16 + 6(2^{n} - 8)}{(2^{n} - 1) 2^{n}}$$

and for $k \geq 3$,

$$P_{w}(n, k, 2) = \frac{k(k^{n} - 1) + k(k - 1)(3k - 1) + 2k(k - 1)(3k - 1) + 3k(k^{n} - 3k^{2} + 2k)}{(k^{n} - 1) k^{n}}.$$

Simplifying these expressions gives the formulas in the statement of the theorem.

□
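The final simplification can be double-checked with exact rational arithmetic. The Python sketch below (function names hypothetical) adds up the ordered-pair counts of Cases 1–4, divides by the $k^{n}(k^{n} - 1)$ ordered pairs of distinct nodes, and compares the result with the closed forms in Theorem 8 over a range of $n$ and $k$.

```python
from fractions import Fraction

def case_count_probability(n, k):
    """Ordered pairs (a, b) counted in Cases 1-4, divided by all k^n (k^n - 1) ordered pairs."""
    case1 = k * (k**n - 1)
    case2 = k * (k - 1) * (3 * k - 1)
    case3 = 2 * k * (k - 1) * (3 * k - 1 - (1 if k == 2 else 0))
    case4 = 3 * k * (k**n - 3 * k**2 + 2 * k)
    return Fraction(case1 + case2 + case3 + case4, k**n * (k**n - 1))

def closed_form(n, k):
    """The expressions for P_w(n, k, 2) in Theorem 8 (k = 2 requires n >= 3)."""
    if k == 2:
        return Fraction(2**n - 3, (2**n - 1) * 2**(n - 3))
    return Fraction(2 * (2 * k**n - 3 * k + 1), k**(n - 1) * (k**n - 1))

print(all(case_count_probability(n, 2) == closed_form(n, 2) for n in range(3, 12)))   # True
print(all(case_count_probability(n, k) == closed_form(n, k)
          for n in range(2, 8) for k in range(3, 7)))                                  # True
```

For instance, the closed forms give $P_{w}(3, 2, 2) = 5/7$ and $P_{w}(2, 3, 2) = 5/6$.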

5. Directions of Further Research

A universal cycle (or a universal word) for an arbitrary set $S$ is a cyclic sequence (or a non-circular sequence) whose substrings of length $n$ encode $|S|$ distinct instances in $S$. U-cycles and u-words have been studied for a wide variety of combinatorial objects, including permutations [6,7], partitions [8], subsets [9], multisets [10], labeled graphs [11], various functions and passwords; for more information, the reader is referred to [12]. There are many studies on universal cycles and universal words because of their applications, including dynamic connections in overlay networks [13], genomics [4], software calculation of the ruler function in computer words [12], etc. An interesting direction of research would be extending our studies of $P_{c}(n, k, s)$ and $P_{w}(n, k, s)$ to other combinatorial structures. It would also be interesting to explore new methods to compute the exact values of $P_{c}(n, k, s)$ and $P_{w}(n, k, s)$. Finally, our bounds for the general case presented in (1) and (2) can be improved by conducting a more subtle analysis of the removed cycles in the proof of Theorem 3, and exploring how far the improvement could go is another interesting direction.

Author Contributions

conceptualization, S.K.; methodology, H.Z.Q.C. and S.K.; software, H.Z.Q.C.; validation, H.Z.Q.C. and S.K.; formal analysis, H.Z.Q.C., S.K. and B.Y.S.; investigation, H.Z.Q.C., S.K. and B.Y.S.; resources, H.Z.Q.C., S.K. and B.Y.S.; data curation, H.Z.Q.C.; writing—original draft preparation, H.Z.Q.C. and S.K.; writing—review and editing, H.Z.Q.C., S.K. and B.Y.S.; visualization, H.Z.Q.C., S.K. and B.Y.S.; supervision, S.K.; project administration, S.K.; funding acquisition, N/A. All authors have read and agreed to the published version of the manuscript.

Funding

The work of the first author was supported by the National Natural Science Foundation of China (Grant Number 11901319) and the Fundamental Research Funds for the Central Universities (Grant Number 63191349). The third author was partially supported by the National Natural Science Foundation of China (Grant Numbers 11701491, 11726629 and 11726630) and the China Postdoctoral Science Foundation (Grant Number 2017M621188).

Acknowledgments

The authors are thankful to the reviewers for a number of useful comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chung, F.; Diaconis, P.W.; Graham, R.L. Universal cycles for combinatorial structures. Discrete Math. 1992, 110, 43–59.
2. Gardner, K.B.; Godbole, A. Universal cycles of restricted words. J. Combin. Math. Combin. Comput. 2018, 106, 153–173.
3. Moreno, E. De Bruijn sequences and De Bruijn graphs for a general language. Inform. Process. Lett. 2005, 96, 214–219.
4. Zerbino, D.; Birney, E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18, 821–829.
5. Baumslag, M. An algebraic analysis of the connectivity of DeBruijn and shuffle-exchange digraphs. Discrete Appl. Math. 1995, 61, 213–227.
6. Johnson, J.R. Universal cycles for permutations. Discrete Math. 2009, 309, 5264–5270.
7. Kitaev, S.; Potapov, V.N.; Vajnovszki, V. On shortening u-cycles and u-words for permutations. Discrete Appl. Math. 2019, in press.
8. Casteels, K.; Stevens, B. Universal cycles of (n − 1)-partitions of an n-set. Discrete Math. 2009, 309, 5332–5340.
9. Jackson, B. Universal cycles of k-subsets and k-permutations. Discrete Math. 1993, 117, 141–150.
10. Hurlbert, G.; Johnson, T.; Zahl, J. On universal cycles for multisets. Discrete Math. 2009, 309, 5321–5327.
11. Brockman, G.; Kay, B.; Snively, E. On universal cycles of labeled graphs. Electron. J. Combin. 2010, 17, 1–9.
12. Wong, D. Novel Universal Cycle Constructions for a Variety of Combinatorial Objects. Ph.D. Thesis, University of Guelph, Guelph, ON, Canada, 2015.
13. Sridhar, M.A.; Raghavendra, C.S. Fault-tolerant networks based on the de Bruijn graph. IEEE Trans. Comput. 1991, 40, 1167–1174.

Table 1. Values of $P_{c} (2, 2, s)$ and $P_{w} (2, 2, s)$ for $s \geq 1$.


s	1	2	3
$P_{c} (2, 2, s)$	$\frac{1}{2}$	$\frac{1}{6}$	0
$P_{w} (2, 2, s)$	1	$\frac{5}{6}$	1

Table 2. Values of $P_{c} (3, 2, s)$ and $P_{w} (3, 2, s)$ for $s \geq 1$.


s	1	2	3	4	5	6	7
$P_{c} (3, 2, s)$	$\frac{1}{4}$	$\frac{1}{14}$	$\frac{1}{28}$	$\frac{3}{70}$	$\frac{1}{28}$	0	0
$P_{w} (3, 2, s)$	1	$\frac{5}{7}$	$\frac{13}{28}$	$\frac{5}{14}$	$\frac{5}{14}$	$\frac{13}{28}$	1

Table 3. Values of $P_{c} (4, 2, s)$ and $P_{w} (4, 2, s)$ for $s \geq 1$.


s	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
$P_{c} (4, 2, s)$	$\frac{1}{8}$	$\frac{1}{60}$	$\frac{1}{140}$	$\frac{3}{910}$	$\frac{1}{546}$	$\frac{1}{728}$	$\frac{1}{1144}$	$\frac{1}{1287}$	$\frac{1}{1430}$	$\frac{1}{1144}$	$\frac{1}{728}$	$\frac{3}{1820}$	$\frac{1}{280}$	0	0
$P_{w} (4, 2, s)$	1	$\frac{13}{30}$	$\frac{13}{70}$	$\frac{1}{10}$	$\frac{23}{364}$	$\frac{355}{8008}$	$\frac{199}{5720}$	$\frac{62}{2145}$	$\frac{153}{5720}$	$\frac{31}{1144}$	$\frac{3}{91}$	$\frac{1}{20}$	$\frac{13}{140}$	$\frac{29}{120}$	1

Table 4. References for the lower bounds for $P_{c} (n, k, s)$ and $P_{w} (n, k, s)$. The gray cells refer to exact values.


k \ n	2	3	4	≥5
2	Table 1	Table 2	Table 3	Theorem 5
≥3	Theorem 4	Theorem 3

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

