1. Introduction
Parts of the work have been submitted to the IEEE Symposium on Information Theory (ISIT), Ann Arbor, MI, USA, 2025 [
1]. This extension contains added proofs for results on orthogonal de Bruijn sequences and a completely new section on orthogonal Kautz sequences.
De Bruijn sequences [
2,
3] are combinatorial objects that have found many practical applications, which range from pseudorandomness generation, hashing, and lookup table design to DNA assembly and molecular data storage [
4]. The utility of the de Bruijn sequences of order
k stems from the fact that they have the property of covering all
k-sequences over a finite alphabet as substrings exactly once. De Bruijn sequences have been further generalized to include
balancing constraints [5], in which case every k-sequence is allowed to appear exactly ℓ times or at most ℓ times, or general substring constraints, as described in [
6]. In addition, they have been modified to accommodate other practical constraints, such as run length bounds, in which case, the sequences are known as
Kautz sequences [
7].
Another interesting extension of the concept of de Bruijn sequences is
orthogonal de Bruijn sequences, introduced in [
8] for the purpose of optimizing experimental designs in synthetic biology (they were also independently studied in the mathematics literature [
9,
10] under the name of arc-disjoint de Bruijn cycles). In a nutshell, orthogonal de Bruijn sequences are the de Bruijn sequences of order
k that satisfy the joint (cross) property, where every
(k+1)-sequence appears in at most one of the sequences in the collection. The de Bruijn property of the sequences is used to ensure both the
diversity of DNA sequence binding probes of length
k and the shortest sequence length property under the diversity constraint (since DNA strings have to be synthesized for testing and since the cost of synthesis prohibits the use of long strings). Interestingly, the orthogonality constraint aims to reduce the undesired cross-hybridization of longer probes designed to target only one of the sequences; although, in the definition, the constraint is imposed on length-
(k+1) substrings, other constrained substring lengths (such as
) are equally relevant. From the perspective of DNA-based data storage, orthogonal de Bruijn sequences are relevant for multistage primer-based random access [
11]. There, groups of strings sharing a common
k-substring can be accessed together (for any possible choice of the substring), and then further partitioned into subgroups as needed, using more specialized primers that are not shared by the whole group (say, using primers of length
,
). A drawback of orthogonal de Bruijn sequences is that their number is strongly restricted by the alphabet size. We can increase an orthogonal collection by relaxing the notion of orthogonality, as described in
Section 3 (following the preliminaries of
Section 2). There, we study orthogonal de Bruijn sequences in which
(k+1)-sequences are allowed to appear at most
ℓ times, with
ℓ . The main result is that the number of generalized orthogonal de Bruijn sequences scales with
ℓ.
Another practical issue with orthogonal de Bruijn sequences is that each
k-sequence has only one context in which it appears in each of the sequences. To increase the number of different contexts, we can examine
b-balanced de Bruijn sequences in which each
k-sequence is allowed to appear exactly
b times [
5]. In this case, we can investigate the (new) notion of orthogonality of balanced de Bruijn sequences, as outlined in
Section 4. For ease of synthesis, it is desirable to maintain composition/weight constraints for the DNA sequences, resulting in counting and construction problems pertaining to fixed-weight (fixed composition) de Bruijn sequences, which are introduced and studied in
Section 5.
We conclude our exposition with a review of Kautz and orthogonal Kautz sequences and the introduction of balanced and fixed-weight Kautz and orthogonal Kautz sequences, as described in
Section 6. The relevance of the added run length constraint is that the sequences avoid what is known as
homopolymers of DNA symbols, which are known to cause both DNA synthesis and sequencing errors. This is particularly the case for nanopore sequencers, as first described and experimentally evaluated in [
12].
2. Preliminaries
We start by reviewing relevant concepts and definitions pertaining to (orthogonal) de Bruijn sequences and graphs.
Definition 1. Let , and let be an alphabet of size . A circular sequence is called a -de Bruijn sequence if each sequence in appears as a circular substring of exactly once. More specifically, for each sequence , there is a unique index such that . Here, we used to denote the integer set for two integers a and b that satisfy . We do not distinguish sequences that are circular shifts of each other.
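The de Bruijn property in Definition 1 can be checked mechanically. The following Python sketch is only an illustration: the function names are chosen here and do not come from the paper, and the test string 012002211 is the ternary order-2 sequence that appears later in Example 2.

```python
from itertools import product

def circular_windows(seq, length):
    """Return the length-`length` circular substrings of `seq`, one per start index."""
    n = len(seq)
    return ["".join(seq[(i + j) % n] for j in range(length)) for i in range(n)]

def is_de_bruijn(seq, alphabet, k):
    """True if every length-k word over `alphabet` occurs exactly once circularly."""
    windows = circular_windows(seq, k)
    targets = {"".join(t) for t in product(alphabet, repeat=k)}
    # Same number of windows as targets, and the windows hit every target:
    # together these force each target to occur exactly once.
    return len(windows) == len(targets) and set(windows) == targets

print(is_de_bruijn("012002211", "012", 2))
print(is_de_bruijn("012012012", "012", 2))
```

Because circular shifts are not distinguished, any rotation of an accepted string is accepted as well.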
Fundamental for the study of de Bruijn sequences is the notion of a
de Bruijn graph of order
k and alphabet size
, which is denoted by
. A de Bruijn graph is a directed graph
with vertex set
and arc set
In words, there exists an arc from a vertex u to a vertex v if and only if the length-(k−1) suffix of u is the same as the length-(k−1) prefix of v.
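As a minimal sketch of this construction, the following Python code builds the vertex and arc lists of a de Bruijn graph of order k. The function name and the representation of vertices as strings are assumptions made for illustration.

```python
from itertools import product

def de_bruijn_graph(alphabet, k):
    """Return (vertices, arcs) of the order-k de Bruijn graph over `alphabet`."""
    vertices = ["".join(t) for t in product(alphabet, repeat=k)]
    # Dropping the first symbol of u and appending any symbol a yields exactly
    # the vertices whose length-(k-1) prefix equals the length-(k-1) suffix of u.
    arcs = [(u, u[1:] + a) for u in vertices for a in alphabet]
    return vertices, arcs

vertices, arcs = de_bruijn_graph("012", 2)
print(len(vertices), len(arcs))
```

Every vertex has as many outgoing arcs as there are symbols, and the constant vertices (such as 00) carry a loop.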
Definition 2. A collection of -de Bruijn sequences is called orthogonal if each sequence in appears at most once in .
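Orthogonality in the sense of Definition 2 amounts to counting length-(k+1) circular substrings across the whole collection. The sketch below is illustrative only. The second sequence, 001122021, was constructed by hand for this example and is not taken from the paper, and the function names are hypothetical.

```python
from collections import Counter

def window_counts(sequences, length):
    """Count circular substrings of the given length over all sequences."""
    counts = Counter()
    for seq in sequences:
        n = len(seq)
        for i in range(n):
            counts["".join(seq[(i + j) % n] for j in range(length))] += 1
    return counts

def is_orthogonal(sequences, k):
    """True if no length-(k+1) word occurs more than once in the collection."""
    return all(c <= 1 for c in window_counts(sequences, k + 1).values())

a = "012002211"   # ternary order-2 de Bruijn sequence (see Example 2)
b = "001122021"   # hand-constructed mate, an assumption of this sketch
print(is_orthogonal([a, b], 2))
```

Here the nine length-3 substrings of the first sequence and the nine of the second are pairwise distinct, so the pair is accepted.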
It is clear that there exists a one-to-one correspondence between -de Bruijn sequences and Eulerian circuits (i.e., closed walks that traverse each arc exactly once) in . Furthermore, a length- string appears in a -de Bruijn sequence if and only if the corresponding Eulerian circuit traverses from to to , where , , and .
Similarly, there exists a one-to-one correspondence between -de Bruijn sequences and Hamiltonian cycles (i.e., closed walks that traverse each vertex exactly once) in . The length- string appears in a -de Bruijn sequence if and only if the corresponding Hamiltonian cycle traverses from to .
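Both correspondences can be read off a sequence by sliding a window over it. In the following Python sketch (function names are assumptions), windows of length k−1 give the vertices of the closed walk on the lower-order graph, where each arc is a length-k substring, while windows of length k give the vertices of the closed walk on the order-k graph.

```python
def circular_windows(seq, length):
    """Length-`length` circular substrings of `seq`, one per start index."""
    n = len(seq)
    return ["".join(seq[(i + j) % n] for j in range(length)) for i in range(n)]

def closed_walk_arcs(walk):
    """Consecutive vertex pairs of a closed walk, including the wrap-around arc."""
    return list(zip(walk, walk[1:] + walk[:1]))

seq, k = "012002211", 2
eulerian_walk = circular_windows(seq, k - 1)   # vertices are (k-1)-windows
hamiltonian_walk = circular_windows(seq, k)    # vertices are k-windows

# Eulerian: every arc of the lower-order graph is used exactly once.
print(len(set(closed_walk_arcs(eulerian_walk))))
# Hamiltonian: every vertex of the order-k graph is visited exactly once.
print(len(set(hamiltonian_walk)))
```

The length-3 substring 012 shows up as the two consecutive arcs (0, 1), (1, 2) of the first walk and as the single arc (01, 12) of the second.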
The relevance of de Bruijn graphs
for the analysis of orthogonal de Bruijn sequences comes from the fact that certain arc-constrained Hamiltonian cycles correspond to orthogonal sequences. In that context, it was shown in [
8] that for
, the number of orthogonal
-de Bruijn sequences is bounded between
and
.
Example 1. Let and . The circular sequence is a -de Bruijn sequence since its length-2 substrings are —all the length-2 sequences over without repetition. Figure 1a shows the de Bruijn graph used to generate the sequence. Figure 1b depicts the Eulerian circuit on that corresponds to, which is . Figure 1c illustrates the de Bruijn graph , where corresponds to the Hamiltonian cycle . The length-3
substring 012
in corresponds to the two-step walk in the Eulerian cycle on and the arc in the Hamiltonian cycle on .
We also review the balanced de Bruijn sequences studied in [
5].
Definition 3 ([
5] Definition 4)
. Let k and be as before, and let . A circular sequence is called a b-balanced
-de Bruijn sequence
if each sequence in appears as a circular substring of exactly b times.
3. Generalized Orthogonal de Bruijn Sequences
Our first result pertains to a generalization of orthogonal de Bruijn sequences, defined below.
Definition 4. Let ℓ . A collection of -de Bruijn sequences is called ℓ-orthogonal if each sequence in appears at most ℓ times in .
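Definition 4 replaces the threshold 1 of Definition 2 by ℓ. A minimal Python sketch of the corresponding check follows. The function names are hypothetical, and the list with a repeated entry is used only to exercise the counter, not as a collection considered in the paper.

```python
from collections import Counter

def max_window_multiplicity(sequences, length):
    """Largest number of occurrences of any circular substring of this length."""
    counts = Counter()
    for seq in sequences:
        n = len(seq)
        for i in range(n):
            counts["".join(seq[(i + j) % n] for j in range(length))] += 1
    return max(counts.values())

def is_l_orthogonal(sequences, k, ell):
    """True if every length-(k+1) word occurs at most `ell` times in total."""
    return max_window_multiplicity(sequences, k + 1) <= ell

a = "012002211"
print(is_l_orthogonal([a, a], 2, 1), is_l_orthogonal([a, a], 2, 2))
```

Setting ell to 1 recovers the orthogonality test of Definition 2.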
Let denote the maximum cardinality of a collection of ℓ-orthogonal -de Bruijn sequences. We have the following bound on :
Proposition 1. We have .
Proof. The proof follows an argument similar to that of [
8] Corollary 4. A collection of
ℓ-orthogonal
-de Bruijn sequences corresponds to a collection of Hamiltonian cycles in
such that each arc is used at most
ℓ times. Note that the all-zero vertex
in
(assuming
has
incoming arcs,
outgoing arcs, and a loop. Note that for a directed graph G and a vertex v of G, an incoming arc of v refers to an arc of the form (u, v) for some vertex u ≠ v, while an outgoing arc of v refers to an arc of the form (v, u) for some vertex u ≠ v; a loop on v is an arc (v, v). Also note that a Hamiltonian cycle cannot involve a loop. Therefore, by the pigeonhole principle, any collection of more than
ℓ Hamiltonian cycles on
uses at least one of the
incoming arcs of
more than
ℓ times. □
Theorem 1. If ℓ and , then
This result is intuitively expected, despite its proof being nontrivial. Before proceeding with the proof, we introduce the concept of “wiring” described in [
8,
9].
Definition 5. Let G be a directed graph with an Eulerian circuit C, and let v be a vertex in G. For a graph to be Eulerian, each vertex must have the same in-degree and out-degree, so that its incoming arcs and outgoing arcs are paired up in the circuit. A wiring
of v [8], or a transition system
at v [9], is a 1-regular bipartite undirected graph (i.e., a matching) between two vertex sets representing the incoming arcs and outgoing arcs of v. More precisely, the wiring of v induced by C is the wiring in which an incoming arc e is connected to an outgoing arc f if and only if the Eulerian circuit C uses e and f in consecutive order. In the case that v has a self-loop, we treat that loop as both an incoming arc and an outgoing arc. We say two Eulerian circuits are
compatible [
9] if the induced wirings are edge-disjoint at each vertex, i.e., two Eulerian circuits are compatible if they do not use the same in/out arc pair at any vertex. This leads to the following characterization of orthogonal de Bruijn sequences: A collection of
-de Bruijn sequences is orthogonal if and only if their corresponding Eulerian circuits in
are pairwise compatible.
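The wiring induced at each vertex can be extracted directly from a sequence: every length-(k+1) circular substring contributes one pairing between an incoming arc (its first k symbols) and an outgoing arc (its last k symbols) at the vertex formed by its middle k−1 symbols. The Python sketch below illustrates this under the assumption that arcs are represented by length-k strings. The function names and the second sequence are not from the paper.

```python
from collections import defaultdict

def induced_wirings(seq, k):
    """Map each vertex to its set of (incoming arc, outgoing arc) pairs."""
    n = len(seq)
    wirings = defaultdict(set)
    for i in range(n):
        w = "".join(seq[(i + j) % n] for j in range(k + 1))
        # w[:k] enters the vertex w[1:k]; w[1:] leaves it.
        wirings[w[1:k]].add((w[:k], w[1:]))
    return wirings

def are_compatible(seq_a, seq_b, k):
    """True if the induced wirings are edge-disjoint at every vertex."""
    wa, wb = induced_wirings(seq_a, k), induced_wirings(seq_b, k)
    return all(not (wa[v] & wb[v]) for v in list(wa))

a, b = "012002211", "001122021"
print(sorted(induced_wirings(a, 2)["0"]))
print(are_compatible(a, b, 2))
```

For these two sequences the compatibility test agrees with the substring-based orthogonality test, as the characterization above predicts.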
In our proof of Theorem 1, we will make use of the following lemmas regarding wirings.
Lemma 1 ([
8] Theorem 7)
. Let be a directed graph with an Eulerian circuit C and let . Let denote the in/out-degree of v (i.e., the in-degree of v, which is equal to the out-degree of v). If , there exists an Eulerian circuit , denoted by such that and are edge-disjoint;
for all .
In words, we can rewire v with respect to C to obtain so that does not use the same in/out arc pair of v as C; the wirings at other vertices in remain the same as in C.
Lemma 2 ([
8] Theorem 8; [
9] Theorem 1)
. Let be a directed graph with an Eulerian circuit C, and let be compatible Eulerian circuits of G. Furthermore, let . If , then there exists an Eulerian circuit , denoted by such that the following holds: and are edge-disjoint for ;
for all .
Intuitively speaking, given the current Eulerian circuit C and previous circuits , we can rewire v with respect to C to obtain such that none of uses the same in/out arc pair of v as .
Proof of Theorem 1. We first prove
for
. Let
be an Eulerian circuit of
. We arbitrarily partition the vertices in
into
ℓ groups, say
. Then, for
we recursively define
where for a collection of vertices
the notation
refers to an Eulerian circuit obtained by rewiring each vertex in
with respect to
C given
. Writing
, we define
, where
and
for
.
Remark 1. It is noteworthy that a different labeling of elements in may lead to different . That is, is not unique. However, in this proof, we can use any one of to construct the circuits of interest.
Next, write
. Note that every vertex in
has in/out-degree
. Thus, by Lemma 2, for any Eulerian circuit
C and any vertex
of
, we can always rewire
with respect to
C given any collection of fewer than
K compatible Eulerian circuits. This ensures that the following recursive definitions hold: For
and for
Next, define
and for
let
Note that in (
6), the conditioned circuits start from
instead of
. We claim that the following collection
is
ℓ-orthogonal. Let
be in the
mth group of vertices,
for some
. Observe that the following holds:
For each , the wiring of is the same in all . Denote it by . Denote by and the shared wiring of in and , respectively.
The wirings and are edge-disjoint for , and . Thus, each in/out arc pair of used in any is used exactly ℓ times in .
Even if and share some edge in their bipartite graphs, that edge is used only ℓ times (m times by and times by ). This establishes the claim.
Now we prove
as long as
. Again, let
be an Eulerian circuit of
. We recursively apply Lemma 1 to define for
that
Next, define
and for
let
A similar argument shows that
is
ℓ-orthogonal. □
Example 2. This example demonstrates the rewiring process from the proof of Theorem 1. Consider , , and . We seek to find Eulerian circuits on such that each in/out arc pair of each vertex is used at most twice. We first partition into and . Also, we select the starting circuit to be the Eulerian circuit in for the -de Bruijn sequence 012002211
. With a slight abuse of notation, we write . According to (
7)–(
9)
, the rewired circuits are , , and (Figure 2). Comparing Figure 1b and Figure 2, we see that each vertex has two edge-disjoint wirings (gray and violet), each of which appears twice in the collection .
4. Orthogonal Balanced de Bruijn Sequences
We start with several definitions relevant to the balanced de Bruijn sequence defined in Definition 3.
Definition 6. A b-balanced -de Bruijn sequence is self-orthogonal if any sequence in appears at most once in . A collection of b-balanced -de Bruijn sequences is called orthogonal if each sequence in appears at most once in .
Remark 2. Clearly, a b-balanced de Bruijn sequence in an orthogonal collection is self-orthogonal.
Definition 7. Let , G be a directed graph, and let C be a circuit of G. We say that C is a b-circuit of G if C visits each vertex of G exactly b times.
Remark 3. A b-circuit is sometimes referred to as an exact b-walk [13].
Proposition 2. There is a one-to-one correspondence between self-orthogonal b-balanced -de Bruijn sequences and b-circuits on . A collection of b-balanced -de Bruijn sequences is orthogonal if and only if each sequence is self-orthogonal and their corresponding b-circuits are arc-disjoint.
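The two sequence-side conditions behind Proposition 2 can be tested as follows. This is a sketch with hypothetical function names. The small binary strings are chosen here for illustration, with k = 1 and b = 2, and are not examples from the paper.

```python
from collections import Counter
from itertools import product

def window_counter(seq, length):
    """Count the circular substrings of `seq` of the given length."""
    n = len(seq)
    return Counter("".join(seq[(i + j) % n] for j in range(length)) for i in range(n))

def is_b_balanced(seq, alphabet, k, b):
    """True if every length-k word over `alphabet` occurs exactly b times."""
    counts = window_counter(seq, k)
    words = ["".join(t) for t in product(alphabet, repeat=k)]
    return len(seq) == b * len(words) and all(counts[w] == b for w in words)

def is_self_orthogonal(seq, k):
    """True if no length-(k+1) word occurs more than once."""
    return max(window_counter(seq, k + 1).values()) <= 1

for s in ("0011", "0101"):
    print(s, is_b_balanced(s, "01", 1, 2), is_self_orthogonal(s, 1))
```

Both strings are 2-balanced, but only 0011 is self-orthogonal: as a closed walk it uses four distinct arcs and visits each of the two vertices twice, which is a 2-circuit in the sense of Definition 7.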
Example 3. Consider , , and . The circular sequences and are two 2-balanced -de Bruijn sequences since each length-2 sequence over appears exactly twice in both and . The sequence is not self-orthogonal since the length-3 sequence 202 appears twice in . On the other hand, is self-orthogonal, and it corresponds to the 2-circuit on (see Figure 3).
Fix . Observe that when b increases, the number of length-(k+1) sequences in any two b-balanced -de Bruijn sequences increases as well, making them less likely to be orthogonal. This motivates the following definition: For , define to be the minimum such that there exist c orthogonal b-balanced -de Bruijn sequences.
We start by establishing a lower bound on .
Proposition 3. If , then .
Proof. Similar to the proof of Proposition 1, we observe that there are outgoing arcs for each vertex in . By Proposition 2, a self-orthogonal b-balanced de Bruijn sequence uses exactly b incoming and outgoing arcs at each vertex. Therefore, since c orthogonal b-balanced de Bruijn sequences share no arcs, they must use distinct outgoing arcs at each vertex. This is impossible if . □
We also have the following upper bound on
Theorem 2. For , is at most the smallest prime power that is greater than or equal to . Furthermore, if each prime factor of c divides b.
To prove Theorem 2, we establish the following lemmas.
Lemma 3. Assume , for some prime p and positive integer m, and . Then there exist c orthogonal b-balanced -de Bruijn sequences.
Proof. Since
is a prime power, by [
10] Lemma 2, there exist cycles
on
such that the following holds:
For with , the cycles and are arc-disjoint.
For , the cycle does not visit the vertex representing the sequence (assuming ) and visits all the other vertices exactly once.
Then, for each we can construct a circuit as follows. Let be the cycle with an additional transition following the loop on the vertex representing the all- sequence, where . Note that this definition is valid since must visit the all- sequence. Also, let be the cycle with an additional transition following the loop at the all- vertex. Then, let be a combination of (a combination of is a closed walk whose arcs are exactly those in ). Since each pair of meets at some vertex, such a closed walk always exists. Then, observe the following: (a) the arcs used in are those in and the loops at the all- vertices for ; (b) are arc-disjoint and do not use any loops; (c) for , the all- vertex is visited b times in (twice by , never by , and once by each remaining ); (d) each of the remaining vertices in is also visited b times in (once by each ). Therefore, is a b-circuit on . The arc-disjointness and no-loop property of the s imply that and are arc-disjoint for . Thus, are orthogonal b-balanced -de Bruijn sequences. □
Lemma 4. Let and be two directed graphs. Write and . Assume has arc-disjoint -circuits and has arc-disjoint -circuits. If and are coprime, then the tensor-product graph has arc-disjoint -circuits.
Proof. Let be arc-disjoint -circuits of and be arc-disjoint -circuits of .
Then, for each
and
, define
to be the subgraph of
with arc set
where for a directed graph
G, we use
to denote its arc set.
We first show that is a circuit. Since is a circuit of length , we can traverse following a circular sequence of distinct arcs , where each and the target of equals the source of . Similarly, we can represent by a circular sequence of arcs , where each and the target of equals the source of . Then, with a slight abuse of notation, we write Now fix an arc of . We can traverse starting from and by following , , etc. Then, the next time is used is after steps. However, note that there are exactly arcs in . Therefore, right before the next use of in , each arc in has been traversed, and thus represents a single closed walk on . Furthermore, if some arc is repeated, starting the traversal from that arc would contradict the fact that the next use of each arc has to happen after steps. Hence, is a circuit. Then, since and are arc-disjoint for and and for , we have that and are arc-disjoint when either or .
To verify that each is a circuit, it remains to show that each visits each vertex in exactly times. Fix a vertex v in . Write for some and . On , the vertex has exactly outgoing arcs, say . Similarly, the outgoing arcs of the vertex on are denoted by . Therefore, the outgoing arcs of on are of the form , where and . Since has exactly outgoing arcs in , it is visited exactly times in . Since v is arbitrary, is a -circuit.
In conclusion, we have shown that is a collection of arc-disjoint -circuits. □
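The position-wise pairing used in this proof can be illustrated on a toy instance. In the Python sketch below, the two input walks are chosen here, not taken from the paper: the first is a 2-circuit of length 4 on the two-vertex complete digraph with loops, and the second is a 1-circuit of length 3 on a three-vertex graph. Their lengths are coprime, so pairing them step by step closes up only after 12 steps.

```python
from collections import Counter
from math import gcd

def pair_circuits(walk1, walk2):
    """Traverse both closed walks in lockstep until both return to their start."""
    n1, n2 = len(walk1), len(walk2)
    if gcd(n1, n2) != 1:
        raise ValueError("the circuit lengths must be coprime")
    return [(walk1[i % n1], walk2[i % n2]) for i in range(n1 * n2)]

walk1 = [0, 0, 1, 1]   # arcs 0->0, 0->1, 1->1, 1->0; each vertex visited twice
walk2 = [0, 1, 2]      # arcs 0->1, 1->2, 2->0; each vertex visited once
combined = pair_circuits(walk1, walk2)

visits = Counter(combined)
arcs = Counter(zip(combined, combined[1:] + combined[:1]))
print(len(combined), set(visits.values()), max(arcs.values()))
```

Each of the six product vertices is visited 2 · 1 = 2 times and no product arc repeats, which matches the conclusion of Lemma 4 for this instance.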
Lemma 5. The tensor-product graph is isomorphic to .
Sketch of the proof. Without loss of generality, assume that the alphabets of
,
, and
are
,
, and
, respectively. It is not hard to show that the following mapping is a graph isomorphism:
where for
, we write
and
for the quotient and remainder of
when divided by
, respectively. Note that the construction in (
10) is related to the proofs of [
10] Lemma 3 and [
14] Lemma 3. □
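Lemma 5 can be checked by brute force on small parameters. The Python sketch below assumes that the isomorphism acts entry-wise and sends a pair of symbols (a, b) to sigma2 · a + b, where sigma2 denotes the second alphabet size. This is the inverse direction of a quotient-and-remainder map and is consistent with the coefficient 3 used in Example 4. All names are hypothetical.

```python
from itertools import product

def de_bruijn_arcs(sigma, k):
    """Arc set of the order-k de Bruijn graph over {0, ..., sigma-1}."""
    return {(u, u[1:] + (a,))
            for u in product(range(sigma), repeat=k) for a in range(sigma)}

def tensor_arcs(arcs1, arcs2):
    """Arcs of the tensor product: move along an arc in both factors at once."""
    return {((u1, u2), (v1, v2)) for (u1, v1) in arcs1 for (u2, v2) in arcs2}

def merge(u1, u2, sigma2):
    """Entry-wise encoding of a pair of words as one word over a larger alphabet."""
    return tuple(sigma2 * a + b for a, b in zip(u1, u2))

sigma1, sigma2, k = 2, 3, 2
product_arcs = tensor_arcs(de_bruijn_arcs(sigma1, k), de_bruijn_arcs(sigma2, k))
image = {(merge(*u, sigma2), merge(*v, sigma2)) for u, v in product_arcs}
print(image == de_bruijn_arcs(sigma1 * sigma2, k))
```

The check compares arc sets only. Since every vertex of these graphs has outgoing arcs, equality of arc sets under a vertex bijection is enough here.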
Sketch of the proof of Theorem 2. Lemma 3 established the first statement in the theorem. Now, assume that each prime factor of c divides b. Then, we can write and , where are distinct prime numbers, are positive integers, and none of the divides R.
Now consider the following de Bruijn graphs . By Lemma 3, for each , has arc-disjoint -circuits. We also know that has one R-circuit, which is essentially an Eulerian circuit on it. Then, by repeatedly applying Lemma 4 and Lemma 5, we can see that has arc-disjoint -circuits. Equivalently, has c arc-disjoint b-circuits. Therefore, . We conclude that by invoking Proposition 3. □
Example 4. Consider , , and in Theorem 2. In this case, , , and . Next, examine the two de Bruijn graphs and . We associate with the Eulerian circuit .
Lemma 3 implies that has two arc-disjoint 2-circuits, constructed as follows. The cycles are
Then, the circuit is defined by following the loop on the vertex 11
. Written as a sequence, . Note that the loop on 11
induces the length-3
substring 111
in . Similarly, we have
Let and be a combination of and and one of and , respectively. For example, we can take
Then, and are arc-disjoint 2-circuits on . Then, we define and , where we treat , , and E as infinitely long periodic sequences, with additions performed entry-wise. Here the coefficient 3 in front of and arises from the size of the alphabet of E. The first few terms of are , and the period of is 32 × 27 = 864 since has period 32 and E has period 27, and these two periods are coprime. By Lemmas 4 and 5, and correspond to two arc-disjoint 6-circuits on .
5. Orthogonal Fixed-Weight de Bruijn Sequences
As pointed out, in many applications we are only allowed to use sequences with constrained compositions. We therefore first generalize the definition of de Bruijn sequences so that their length-k substrings belong to a constrained subset of length-k sequences.
Definition 8 ([
6]
Section 1)
. Let , be an alphabet of size , and let L be a subset of . A circular sequence is called a de Bruijn sequence with respect to
L if each sequence in L appears as a circular substring of exactly once.
Similar to the nonrestricted case, restricted de Bruijn sequences can be characterized as Eulerian circuits of a specialized directed graph, defined as follows.
Definition 9 ([
6]
Section 2; [
4]
Section 3)
. Let , and let be an alphabet of size . Let L be a proper subset of . Define the de Bruijn graph with respect to
L as a directed graph with the vertex set being the collection of all length- prefixes and length- suffixes of sequences in L, and with arc set Then, there is a one-to-one correspondence between de Bruijn sequences with respect to L and Eulerian circuits of . Note that for some choices of L, there may not exist a de Bruijn sequence with respect to L or an Eulerian circuit in .
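A minimal Python sketch of the graph in Definition 9 follows. It assumes the usual reading in which every word of L contributes one arc from its length-(k−1) prefix to its length-(k−1) suffix. The function name and the small word set are chosen here for illustration.

```python
def restricted_de_bruijn_graph(words):
    """Return (vertices, arcs) of the de Bruijn graph with respect to `words`."""
    words = set(words)
    # Vertices: all prefixes and suffixes obtained by dropping one end symbol.
    vertices = {w[:-1] for w in words} | {w[1:] for w in words}
    # One arc per word, from its prefix to its suffix.
    arcs = {(w[:-1], w[1:]) for w in words}
    return vertices, arcs

# Length-2 ternary words without a repeated symbol, as a small restricted set.
vertices, arcs = restricted_de_bruijn_graph(["01", "02", "10", "12", "20", "21"])
print(sorted(vertices), len(arcs))
```

A word is recovered from its arc by appending the last symbol of the head vertex to the tail vertex, so distinct words always give distinct arcs.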
We say that two de Bruijn sequences with respect to L are orthogonal if they have no common length-(k+1) circular substring. As in the unrestricted case, two de Bruijn sequences with respect to L are orthogonal if and only if their corresponding Eulerian circuits on are compatible. We seek to analyze the restricted de Bruijn graph in Definition 9, where L is the collection of all sequences of certain restricted weights.
Formally, write , where represents the set of “weighted symbols”, while represents the set of “nonweighted” symbols, with both and nonempty and disjoint. Then, for any and each , we define the weight of , , as the number of entries of in :
Then, for
we define
to be the collection of all sequences in
with weight between
and
w. Our interest lies in orthogonal de Bruijn sequences with respect to
for
, i.e., compatible Eulerian circuits on
when
w and
differ by 1. Note that this is the smallest difference between
w and
we can consider due to the following fact: If
, then
is not strongly connected and thus not Eulerian unless
[
6]. The reason is that any vertex whose weight representation contains the substring 1010 can never reach a vertex with weight representation
. Here, the weight representation of
, denoted by
, is the binary vector
Next, we generalize the arguments in [
6] and characterize the vertices of
as well as their degrees. To simplify our discussion, we further assume that
. First, note that any length-
prefix or suffix of any string from
is necessarily of weight
,
, or
w. Conversely, any length-
string whose weight is between
and
w can be seen as a prefix of some string in
. As a result, the vertex set
is equal to the set
. Then, for
with
, we observe that its predecessors and successors are
and
, respectively. Thus, the in-degree and out-degree of each
with
are equal to
. Similarly, for
with
, the in- and out-degrees equal
. For
with
, there are no restrictions, and thus, its in- and out-degrees equal
.
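The degree pattern described above can be confirmed numerically. The Python sketch below uses the setting of Example 5 as an assumption: the DNA alphabet with C and G as the weighted symbols, words of length 4, and admissible weights 2 and 3. The names ALPHABET, WEIGHTED, and degree_profile are introduced here for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
WEIGHTED = set("CG")   # assumed set of weighted symbols

def weight(word):
    """Number of weighted symbols in `word`."""
    return sum(ch in WEIGHTED for ch in word)

def fixed_weight_words(k, w):
    """All length-k words whose weight is w-1 or w."""
    return ["".join(t) for t in product(ALPHABET, repeat=k)
            if w - 1 <= weight(t) <= w]

def degree_profile(k, w):
    """Map each vertex weight to the set of (in-degree, out-degree) pairs seen."""
    indeg, outdeg = Counter(), Counter()
    for word in fixed_weight_words(k, w):
        outdeg[word[:-1]] += 1
        indeg[word[1:]] += 1
    profile = {}
    for v in outdeg.keys() | indeg.keys():
        profile.setdefault(weight(v), set()).add((indeg[v], outdeg[v]))
    return profile

print(degree_profile(4, 3))
```

Vertices of the two extreme weights have in/out-degree 2 (the number of weighted, respectively nonweighted, symbols), while vertices of the middle weight are unrestricted and have in/out-degree 4.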
The following theorem characterizes the number of orthogonal de Bruijn sequences with respect to :
Theorem 3. Let . The de Bruijn graph has a collection of compatible Eulerian circuits. Furthermore, this is the largest possible collection. In particular, has an Eulerian circuit.
This finding is consistent with the results of [
4] Proposition 3, which state that
always has an Eulerian circuit as long as
.
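For a concrete instance, an Eulerian circuit of the fixed-weight graph can be found with Hierholzer's algorithm, a standard method that is used here only for illustration and is not the construction in the proof below. The parameters again follow the setting of Example 5 (DNA alphabet, weighted symbols C and G, length 4, weights 2 and 3), and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

def fixed_weight_words(alphabet, weighted, k, w):
    """All length-k words containing w-1 or w weighted symbols."""
    return ["".join(t) for t in product(alphabet, repeat=k)
            if w - 1 <= sum(ch in weighted for ch in t) <= w]

def eulerian_sequence(words):
    """Circular sequence read off an Eulerian circuit found by Hierholzer's algorithm."""
    out = defaultdict(list)
    for word in sorted(words):
        out[word[:-1]].append(word[1:])      # one arc per word
    stack, circuit = [min(out)], []
    while stack:
        v = stack[-1]
        if out[v]:
            stack.append(out[v].pop())       # follow an unused arc
        else:
            circuit.append(stack.pop())      # dead end: vertex is finished
    circuit.reverse()
    # Each arc contributes the last symbol of the vertex it enters.
    return "".join(v[-1] for v in circuit[1:])

def circular_windows(seq, length):
    n = len(seq)
    return ["".join(seq[(i + j) % n] for j in range(length)) for i in range(n)]

words = fixed_weight_words("ACGT", "CG", 4, 3)
seq = eulerian_sequence(words)
print(len(seq), Counter(circular_windows(seq, 4)) == Counter(words))
```

The final comparison checks that the resulting circular sequence contains every admissible word exactly once and nothing else.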
Proof of Theorem 3. Without loss of generality, we assume that since the other case can be handled similarly. Since each vertex in the de Bruijn graph has in/out-degree , , or , there is no collection of more than pairwise compatible Eulerian circuits of .
Before we proceed, we recall the vertex-splitting technique mentioned in [
9]: Given a directed graph
, a vertex
v with both in-degree and out-degree
d that has no loop, and a wiring (transition system)
W on
v, we can define another directed graph
by “splitting
v along
W” as follows: The vertex set of
is
, where
are newly introduced vertices. In other words, we remove
v from
G and add
d new vertices. Each
has in/out-degree 1, and its predecessor and successor are determined by exactly one edge of
W. The arcs on all the other vertices
remain the same as in
G.
Then, we enumerate and . For each vertex of of weight , we can define edge-disjoint wirings on by the following rule: For , the wiring is defined by pairing the incoming arc with the outgoing arc for each . Similarly, for each vertex in of weight w and for , we define the wiring by pairing the incoming arc with the outgoing arc for each . Recall that , and thus are edge-disjoint as well.
Now, for each , we define to be the directed graph obtained from by splitting every vertex of weight along and splitting every vertex of weight w along . If each admits an Eulerian circuit, then by “merging back the split vertices”, we obtain Eulerian circuits on such that for each vertex with weight or w, no in/out arc-pair is used twice in these . Then, since the remaining weight- vertices have degree , for each j we can repeatedly apply Lemma 2 to rewire each weight- vertex in given . After this process, we obtain compatible Eulerian circuits on .
It remains to show that each
indeed has an Eulerian circuit. Since each vertex in
either has in/out degree 1 or in/out degree
, it suffices to verify that
is strongly connected. Let
and
be two vertices in
. We first show that from
, we can reach
if
has weight
. Write the weight representations of
and
as
and
, respectively. Also, consider the following generalized de Bruijn graph
, where
consists of only one weighted symbol 1 and one nonweighted symbol 0. By [
6] Corollary 2.3, there is a path
P from
to
on
. Write the binary sequence induced by
P as
, where
and
. Then, observe that we can “follow the same path
P” to walk on
from
to some vertex having the same weight representation as
. That is, there is a path
on
from
to some
such that the induced sequence of
has weight representation
and the weight representation of
satisfies
. This claim can be easily established by induction on the length of
P.
Then, note that since and have the same weight representation and they all share the same weight , we can traverse from to using a walk that induces the sequence . The weight representation of each length-k substring in has a prefix that is a circular shift of , and thus each length-k substring in has weight w or . Therefore, is a valid walk in . Furthermore, each length- substring in has a weight representation equal to a circular shift of , and thus the walk in only visits vertices of weight . Thus, the walk is unaffected by the splitting process and remains present in . In conclusion, we can walk on from to and then from to , provided that has weight .
Now consider the case where has weight w or . If has weight , we can traverse in a reverse direction from to its unique predecessor , and then if still has weight , traverse to , and so on. Since each traversal appends a symbol of weight 1 to the left of and removes the right-most symbol of , we will reach a weight- vertex (say ) in after reverse steps, where is the last 0 in . We can then apply the previous result to see that we can start from to reach , and then, since is obtained from a reverse traversal from , walk from to . A similar argument holds when has weight w.
These arguments show that is strongly connected and thus has an Eulerian circuit. □
Example 5. Consider , , , and . Then, and represent the collection of all length-4 sequences over with weight 2 or 3, where the “weight” of a word is determined by the number of symbols that are either C or G (i.e., the content). The vertices of the de Bruijn graph are all length-3 sequences of weight 1, 2, or 3. For example, consider the vertex , which has weight 1. It has two incoming arcs, one from and the other from . The two outgoing arcs of this vertex point toward and , respectively.
For each vertex of with weight 1 or 3, the wiring associates each of its predecessors to the successor such that the first symbol of equals the last symbol of . For example, the vertex has predecessors and successors , and the wiring pairs with and with . Then, in the split graph , the original vertex is split into two vertices and , where the subscript C or G denotes the first symbol of its only predecessor, which is equal to the last symbol of its only successor. More explicitly, has only one incoming arc from and only one outgoing arc toward , and has only one incoming arc from and only one outgoing arc toward .
We show next that from , we can, for example, reach on this split graph . Since has weight , we first traverse the graph in the reverse direction, starting from , to reach its only predecessor , which has weight . Then, the weight representations of and read as 100
and 101
, respectively. We then apply the proof steps in [6] Lemma 2.2 and [6] Corollary 2.3 to construct a path P from 100
to 101
on the binary version of the de Bruijn graph . It turns out that this path P induces the sequence 1001101
on , and by an abuse of notation we write . Based on P, we find that the path is a valid path on from to . More explicitly, traverses in the following order: . Note that the weight representation of is the same as that of . Then, we can initiate a walk from to following the path on . These arguments show that from we can reach through .
6. Generalized Orthogonal Kautz Sequences
Next, we study possible extensions of the concepts of generalized orthogonal de Bruijn sequences to Kautz sequences. More explicitly, we introduce the notions of ℓ-orthogonal Kautz sequences, orthogonal balanced Kautz sequences, and orthogonal fixed-weight Kautz sequences. We start by recalling the definition of Kautz sequences.
Definition 10. Let , and let be an alphabet of size . Define to be the collection of all sequences in that do not have two or more adjacent identical characters (i.e., no runs/homopolymers of length at least two). Then, a circular sequence is called a -Kautz sequence if each sequence in appears as a circular substring of exactly once.
Remark 4. Using the terminology in Definition 8, a Kautz sequence can be interpreted as a de Bruijn sequence with respect to the set .
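Following Remark 4, a Kautz sequence can be tested like a de Bruijn sequence, with the target set restricted to words that have no two adjacent identical symbols. The Python sketch below is illustrative. The function names and the ternary test string 010212 are chosen here and are not taken from the paper.

```python
from itertools import product

def kautz_words(alphabet, k):
    """All length-k words over `alphabet` with no two adjacent identical symbols."""
    return {"".join(t) for t in product(alphabet, repeat=k)
            if all(a != b for a, b in zip(t, t[1:]))}

def is_kautz(seq, alphabet, k):
    """True if every Kautz word of length k occurs exactly once circularly."""
    n = len(seq)
    windows = ["".join(seq[(i + j) % n] for j in range(k)) for i in range(n)]
    targets = kautz_words(alphabet, k)
    return len(windows) == len(targets) and set(windows) == targets

print(is_kautz("010212", "012", 2))
print(len(kautz_words("ACGT", 2)))
```

Over the DNA alphabet there are 12 admissible words of length 2, so a Kautz sequence of that order has length 12.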
Similar to de Bruijn sequences, Kautz sequences are closely related to
Kautz graphs. Using the notation in Definition 9, a Kautz graph
is a generalized de Bruijn graph with respect to the set
. Consequently, there is a one-to-one correspondence between the
-Kautz sequences and the Eulerian circuits on
[
8]. In addition, there is also a one-to-one correspondence between the
-Kautz sequences and the Hamiltonian cycles on
[
8].
Example 6. Consider the DNA alphabet and . The sequence is a -Kautz sequence since each length-2
sequence of unequal symbols from appears as a circular substring of exactly once. Figure 4a depicts the Kautz graph , which has an Eulerian circuit corresponding to . The wiring induced by that circuit is illustrated in Figure 4b. The sequence can also be represented by a Hamiltonian cycle on the Kautz graph in Figure 4c. This cycle is given by .
6.1. ℓ-Orthogonal Kautz Sequences
By generalizing Definition 4, we describe the notion of ℓ-orthogonality of Kautz sequences as follows. A collection of -Kautz sequences is called ℓ-orthogonal if each sequence in appears at most ℓ times in , where and is an alphabet of size . Similar to the case of de Bruijn sequences, for any collection of -Kautz sequences, the following statements are equivalent:
is ℓ-orthogonal.
The corresponding Eulerian circuits on use each pair of consecutive arcs at most ℓ times.
The corresponding Hamiltonian cycles on use each arc at most ℓ times.
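The equivalence above can be illustrated with a direct count. The sketch below assumes that the constrained substrings are the circular substrings of length k + 1, which is how a pair of consecutive arcs in the Eulerian-circuit characterization reads; the function names are hypothetical and the code does not verify that the inputs are Kautz sequences.

```python
from collections import Counter


def circular_substrings(seq, length):
    """Return all length-`length` substrings of `seq`, read circularly."""
    n = len(seq)
    return ["".join(seq[(i + j) % n] for j in range(length)) for i in range(n)]


def max_multiplicity(collection, k):
    """Largest number of times any (k+1)-substring occurs across the collection."""
    counts = Counter()
    for seq in collection:
        counts.update(circular_substrings(seq, k + 1))
    return max(counts.values())


def is_ell_orthogonal(collection, k, ell):
    """Check that no (k+1)-substring occurs more than `ell` times in total."""
    return max_multiplicity(collection, k) <= ell
```

As a small check over the alphabet `"012"` with k = 2, the order-2 Kautz sequences `010212` and `012021` share the length-3 substrings `120` and `021`, so this pair is 2-orthogonal but not 1-orthogonal under the assumption above.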
Let be the maximum cardinality of any collection of ℓ-orthogonal -Kautz sequences. By adapting the proofs of Proposition 1 and Theorem 1, we can bound as follows.
Proposition 4. We have . If and , then we further have and .
Sketch of the proof. First observe that every vertex in the Kautz graph has incoming arcs, outgoing arcs, and no loop. The same arguments as in the proof of Proposition 1 prove the upper bound on .
Furthermore, note that by adapting the proof of Theorem 1, we can in fact establish the following claim: For any Eulerian directed graph G with minimum in/out-degree and any , G has a collection of Eulerian circuits such that each pair of consecutive arcs appears in this collection at most ℓ times, where for and for . The lower bounds on can then be immediately deduced from the fact that the minimum in/out-degree of is . □
Example 7. We verify that the lower bound of in Proposition 4 holds for , , and . Setting these parameters in Proposition 4 gives . Thus, we seek to find a collection of four -Kautz sequences such that each sequence in appears at most twice in . Following the proof of Theorem 1, we partition into and and start with the Eulerian circuit on . After the rewiring process, we obtain the following three -Kautz sequences:
It can be checked that the collection is 2-orthogonal. For example, the string appears twice (in and in ), while the string appears once (in ), and the string does not appear at all.
6.2. Orthogonal Balanced Kautz Sequences
We now generalize Definition 3 in order to define balanced Kautz sequences as follows.
Definition 11. Let k and be as before, and let . We say that a circular sequence is a b-balanced -Kautz sequence if each sequence in appears as a circular substring of exactly b times.
The following terminology is similar to that used in Section 4. We say that a collection of b-balanced -Kautz sequences is orthogonal if each sequence in appears at most once in . A necessary condition for a -Kautz sequence to belong to an orthogonal collection is that itself contains no sequence in more than once. In this case, we say that is self-orthogonal. Then, there is a one-to-one correspondence between self-orthogonal b-balanced -Kautz sequences and b-circuits on , where the b-circuits are defined as in Definition 7.
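The two properties used here, b-balance and self-orthogonality, can both be tested by counting circular substrings. The sketch below is hedged: it assumes that self-orthogonality constrains the circular substrings of length k + 1, in line with the orthogonality notion for ordinary Kautz sequences, and all function names are illustrative.

```python
from collections import Counter
from itertools import product


def circular_substrings(seq, length):
    """Return all length-`length` substrings of `seq`, read circularly."""
    n = len(seq)
    return ["".join(seq[(i + j) % n] for j in range(length)) for i in range(n)]


def kautz_words(alphabet, k):
    """Return all length-k words over `alphabet` with no two adjacent equal symbols."""
    return [
        "".join(w)
        for w in product(alphabet, repeat=k)
        if all(a != b for a, b in zip(w, w[1:]))
    ]


def is_b_balanced_kautz(seq, k, alphabet, b):
    """Check that every admissible k-word occurs exactly b times circularly."""
    observed = Counter(circular_substrings(seq, k))
    expected = Counter({w: b for w in kautz_words(alphabet, k)})
    return observed == expected


def is_self_orthogonal(seq, k):
    """Check that no circular (k+1)-substring occurs more than once (assumed reading)."""
    counts = Counter(circular_substrings(seq, k + 1))
    return max(counts.values()) <= 1
```

Over the alphabet `"012"` with k = 2, the sequence `010121202102` is 2-balanced and self-orthogonal under this reading, whereas `010212010212` (an ordinary Kautz sequence written twice) is 2-balanced but not self-orthogonal.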
For , we define to be the smallest such that there exist c orthogonal b-balanced -Kautz sequences. We have the following bounds on :
Proposition 5. Assume and . Then we have .
Sketch of the proof. The proof of the lower bound on is similar to that of Proposition 3 and is omitted.
To prove the upper bound, it suffices to prove that has c arc-disjoint b-circuits. We first invoke Lemma 2 to deduce that there are pairwise compatible Eulerian circuits on . These circuits correspond to arc-disjoint Hamiltonian cycles on . Then, we can arbitrarily partition the collection of these cycles into c groups, each having b cycles. The combination of the cycles in each group is a b-circuit on , and the b-circuits combined from different groups are arc-disjoint. Therefore, has c arc-disjoint b-circuits, which proves the upper bound on . □
Remark 5. Since de Bruijn sequences are also characterized by both Eulerian circuits and Hamiltonian cycles, applying the same steps in the proof of Proposition 5 gives , where is defined in Section 4. However, Theorem 2 always gives an upper bound of strictly smaller than since there is always a power of two contained in .
6.3. Orthogonal Fixed-Weight Kautz Sequences
We first define fixed-weight Kautz sequences. Following the definitions in Section 5, consider the case with each symbol in having weight 1 and each symbol in having zero weight. For , we then define a fixed-weight Kautz sequence with parameters as a de Bruijn sequence with respect to . We have the following conditions on the existence of such sequences:
Proposition 6. A fixed-weight Kautz sequence with parameters exists if and only if and w satisfy one of the following conditions:
- 1. ;
- 2. ;
- 3. and .
Proof. We first prove that the conditions in Proposition 6 imply the existence of a fixed-weight Kautz sequence with associated parameters. If , then , and the fixed-weight sequences are simply the -Kautz sequences. Similarly, the case , corresponds to -Kautz sequences. Among the four subcases with and , the subcase corresponds to -Kautz sequences since . The remaining three subcases , , and can be seen as a direct consequence of Proposition 7, which we will state and prove later.
Now, assume none of the conditions in Proposition 6 holds. We seek to show that the generalized de Bruijn graph is not Eulerian. Since Condition 3 fails to hold, we have either or . First assume . Since Condition 2 fails to hold and , we can assume . Now, note that the vertex set of is . Thus, we can pick a vertex with weight in the form of , where and . Since has weight , each of its predecessors must start with a weight-1 symbol. Then, since starts with and no vertex can have two adjacent identical symbols, has predecessors. Similarly, each of its successors must end with a weight-1 symbol, but since ends with , has successors. Therefore, the in-degree of is not equal to its out-degree, which implies that is not Eulerian.
The case is handled similarly. Since Condition 1 fails to hold and , we can assume . Then we can pick a vertex with weight w in the form of , where and . A similar argument shows that has in-degree and out-degree , which are not equal. □
Example 8. Let and . First consider , which fails to meet any of the three conditions in Proposition 6. Figure 5a shows the fixed-weight Kautz graph . It can be seen that is not Eulerian since the vertex has one successor and two predecessors, and . As a result, no fixed-weight Kautz sequence with parameters exists. This observation is consistent with Proposition 6. Next, consider the case and , which satisfies Condition 3 in Proposition 6. The fixed-weight Kautz graph is shown in Figure 5b. It can be verified that the sequence can be represented by an Eulerian circuit on . Thus is a fixed-weight Kautz sequence with parameters . This result agrees with Proposition 6.
Orthogonality of fixed-weight Kautz sequences with , , or can be deduced from the orthogonality of ordinary Kautz sequences, as explained in [8]. For other allowed cases of in Proposition 6, we have the following results.
Proposition 7. Assume that and . Let denote the minimum in/out-degree of the ordinary Kautz graph . We have the following:
has pairwise compatible Eulerian circuits.
has pairwise compatible Eulerian circuits.
has pairwise compatible Eulerian circuits.
Sketch of the proof. We only prove the result for the case of , since the other two cases can be handled similarly.
This proof is similar to that of Theorem 3. Write for simplicity. The vertex set of G is . Any vertex in G with weight 0 has in/out-degree , any vertex with weight has in/out-degree , and any other vertex has in/out-degree . We apply the same vertex-splitting technique as in the proof of Theorem 3. For each vertex with weight 0 or , there exist edge-disjoint wirings of . Define the split graphs as follows: is obtained by splitting each vertex with weight 0 or along its jth wiring . Then, as long as each split graph is Eulerian, we can select of them, merge each one back, and then rewire each remaining vertex with weight in one by one via Lemma 2. This process gives pairwise compatible Eulerian circuits.
To show that each split graph is Eulerian, first note that each split vertex has in/out-degree 1 and each unmodified vertex has in/out-degree . Thus, each vertex in has equal in-degree and out-degree. Then, we show that is strongly connected. Note that it suffices to show that the subgraph H of obtained by removing all the split vertices is strongly connected since each split vertex can be either forwardly or reversely traversed to one vertex in H. It is noteworthy that H is the same for all ’s and is identical to the subgraph of obtained by removing all the vertices with weight 0 or .
Let and be two vertices in H. Since the weight of is at least 1, there must be a symbol in that is weighted. That is, there exists some such that . Now consider the walk P on from to that induces the sequence , where are defined as follows:
It can be seen that no two adjacent symbols in the sequence are the same, and thus the walk P is indeed a valid walk on . Furthermore, the weight representation of the sequence is the same as . This implies that the walk P only visits vertices having the same weight as . Therefore, P is also a valid walk in the subgraph H.
Similarly, since the weight of is at most , there exists some such that . Then, consider the vertex , where are defined in a similar way as follows:
It follows that the walk from to that induces the sequence is a valid walk in H. Then, note that we have since and is either or . Similarly, . In particular, we have . Thus, we can walk in from to inducing the sequence . Denote this walk by . Furthermore, any vertex visited by contains the substring . Recall that has weight 1 and has weight 0. As a result, any vertex visited by must have weight between 2 and , implying that the whole walk lies in H. In conclusion, we can walk in H from to , then from to , and then from to . These arguments demonstrate that H is strongly connected, and thus so is each split graph .
These arguments prove the statement of the result. □