Compression Sensitivity of the Burrows–Wheeler Transform and Its Bijective Variant

Hyodam Jeon; Dominik Köppl

doi:10.3390/math13071070

Department of Computer Science and Engineering, University of Yamanashi, Kōfu 400-8511, Japan

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(7), 1070; https://doi.org/10.3390/math13071070

This article belongs to the Section E: Applied Mathematics

Abstract

The Burrows–Wheeler Transform (BWT) is a widely used reversible data compression method, forming the foundation of various compression algorithms and indexing structures. Prior research has analyzed the sensitivity of compression methods and repetitiveness measures to single-character edits, particularly in binary alphabets. However, the impact of such modifications on the compression efficiency of the bijective variant of the BWT (BBWT) remains largely unexplored. This study extends previous work by examining the compression sensitivity of both the BWT and the BBWT when applied to larger alphabets, including alphabet reordering. We establish theoretical bounds on the increase in compression size due to character modifications in structured sequences such as Fibonacci words. Our lower bounds put the sensitivity of the BBWT on the same scale as that of the BWT, with compression size changes exhibiting logarithmic multiplicative growth and square-root additive growth, depending on the edit type and the input data. These findings contribute to a deeper understanding of repetitiveness measures.

Keywords:

lossless data compression; Burrows–Wheeler Transform (BWT); bijective BWT (BBWT); compression sensitivity; string transformations; Fibonacci words; Lyndon factorization; compression efficiency analysis

MSC:

68P30

1. Introduction

The Burrows–Wheeler transform (BWT) [1] has attracted great attention in interdisciplinary fields such as lossless data compression and text indexing. It lies at the heart of compression algorithms like bzip2 and text indexing data structures such as the FM-index [2]. By compressing single character runs of the BWT, we obtain a compressed but reversible transformation, which can be augmented with techniques akin to the FM-index to give rise to compressed text indices [3,4,5,6,7,8]. Because of its reversible nature, the BWT is also used in bioinformatics applications such as sequence alignment and genome assembly [9,10]. Workshops (e.g., [11,12]) and books (e.g., [13]) have been dedicated exclusively to the BWT and its applications.

Given word T of length n, the BWT of T is a permutation of its characters. In detail, we sort all cyclic conjugates of T lexicographically and concatenate the last characters of these conjugates to form the BWT of T. The BWT is a reversible transformation by application of the so-called Gessel–Reutenauer transform [14].

Among the various variants of the BWT (e.g., [15,16,17,18,19]), the bijective BWT (BBWT) [20] is one of the well-received ones and is a word isomorphism. A word isomorphism maps words to words injectively such that every word is the image of exactly one word. This is not the case for the BWT, regardless of whether we add additional information such as an artificial delimiter (known as the $ character) or a starting position, cf. [21,22,23].

In this article, we focus on the run-length compression of the BWT and the BBWT: run-length compression is usually the first step in the compression pipeline of the BWT and its variants. In addition, compressed text indices such as the r-index [4] store the BWT in a run-length compressed form. The run-length compression of word T is the number of maximal runs of equal characters in T. For instance, the word

mississippi

can be written in an exponential notation as

m^{1} i^{1} s^{2} i^{1} s^{2} i^{1} p^{2} i^{1}

and therefore has eight runs. We denote the run-length compression of word T by

runs (T)

. Given word T, we define the following two repetitiveness measures:

$r = r (T) = runs (BWT (T))$ and
$ρ = ρ (T) = runs (BBWT (T))$ .
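The run count used in both measures can be sketched in a few lines of Python. The function names `runs` and `exponential_notation` are illustrative choices, not taken from the article.

```python
from itertools import groupby


def runs(word: str) -> int:
    """Number of maximal runs of equal characters in `word`."""
    return sum(1 for _ in groupby(word))


def exponential_notation(word: str) -> str:
    """Write `word` as a sequence of character^run-length pairs."""
    return " ".join(f"{ch}^{len(list(group))}" for ch, group in groupby(word))


if __name__ == "__main__":
    print(exponential_notation("mississippi"))
    print(runs("mississippi"))
```

Applying `runs` to the output of a BWT or BBWT routine then yields r and ρ, respectively.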

In this article, we investigate the sensitivity of the BWT and the BBWT to single-character edits. This means that we analyze how the run-length compression of the BWT and the BBWT changes when we modify a single character of the input word. Previous research has shown that the run-length compression of the BWT is sensitive to single-character edits in binary alphabets [24]. Here, we extend this research to larger alphabets and analyze the sensitivity of the BBWT to single-character edits. Compression sensitivity has been studied before; we review the related work in the following section.

2. Related Work and Contribution

The sensitivity [25] of a repetitiveness measure m is the maximum difference in the sizes of

m (T)

for word T and for a single-character edited word

T^{'}

. Sensitivity measures the robustness of a repetitiveness measure against small changes in the input word introduced by various sources of input (source code changes, biological sequencing errors, typos, etc.). Akagi et al. [25] reviewed known results that directly imply a sensitivity for repetitiveness measures such as for Lempel–Ziv 78 [26] or the BWT [24]. Additionally, they offered and improved upper and lower bounds on the multiplicative sensitivity of various compressors and measures including the Lempel–Ziv dictionary compressors [27,28] and the smallest string attractors [29].

In detail, for two words

W_{1}

and

W_{2}

, we let

ed (W_{1}, W_{2})

denote the edit distance between

W_{1}

and

W_{2}

. We define the additive sensitivity

{AS}_{m}

and multiplicative sensitivity

{MS}_{m}

of a repetitiveness measure

m

by

${AS}_{m} (n) = {max}_{W_{1} \in Σ^{n}} \{m (W_{2}) - m (W_{1}) ∣ W_{2} \in Σ^{*} : ed (W_{1}, W_{2}) = 1\}$ , and
${MS}_{m} (n) = {max}_{W_{1} \in Σ^{n}} \{\frac{m (W_{2})}{m (W_{1})} ∣ W_{2} \in Σ^{*} : ed (W_{1}, W_{2}) = 1\}$ .
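For very small n, the two definitions can be evaluated by exhaustive search. The sketch below is only an illustration: the helper names are hypothetical, and the measure plugged in is the plain run count of the word itself, as a stand-in for r or ρ. It assumes n ≥ 1 so that the measure of the unedited word is nonzero.

```python
from itertools import groupby, product


def runs(word):
    """Number of maximal runs of equal characters (stand-in measure m)."""
    return sum(1 for _ in groupby(word))


def single_edits(word, alphabet):
    """All words at edit distance exactly 1 from `word` over `alphabet`."""
    edits = set()
    for i in range(len(word) + 1):
        for c in alphabet:
            edits.add(word[:i] + c + word[i:])       # insertion
    for i in range(len(word)):
        edits.add(word[:i] + word[i + 1:])           # deletion
        for c in alphabet:
            edits.add(word[:i] + c + word[i + 1:])   # substitution
    edits.discard(word)  # substituting a character by itself is no edit
    return edits


def sensitivities(measure, n, alphabet):
    """Brute-force (AS_m(n), MS_m(n)); exponential in n, assumes n >= 1."""
    best_add, best_mul = float("-inf"), float("-inf")
    for letters in product(alphabet, repeat=n):
        w1 = "".join(letters)
        m1 = measure(w1)
        for w2 in single_edits(w1, alphabet):
            m2 = measure(w2)
            best_add = max(best_add, m2 - m1)
            best_mul = max(best_mul, m2 / m1)
    return best_add, best_mul


if __name__ == "__main__":
    print(sensitivities(runs, 3, "ab"))
```

Passing a function that computes runs(BWT(W)) or runs(BBWT(W)) instead of `runs` gives the quantities studied in this article, although only for word lengths where enumeration is feasible.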

The sensitivity has been studied for lexparse [30] by Nakashima et al. [31] and for the size of the compact directed acyclic word graph [32] by Fujimaru et al. [33]. In particular, Giuliani et al. [24] showed that

{MS}_{r} (n) = Ω (log n)

and

{AS}_{r} (n) = Ω (\sqrt{n})

.

Our contribution. In this article, we show identical results for the BBWT, confirming that it is also sensitive to single-character edits. Concretely, we establish that

{MS}_{ρ} (n) = Ω (log n)

with Theorem 5 and

{AS}_{ρ} (n) = Ω (\sqrt{n})

with Lemma 47. In detail, we obtain the asymptotically same results regarding

{MS}_{ρ} (n)

:

in Theorem 5 for deletion,
in Theorem 6 and Theorem 7 for substituting a character with a smaller or larger one, respectively, and
in Theorem 8 and Theorem 9 for insertion of $a$ or a strictly smaller character #, respectively.

We also obtain the asymptotically same results regarding

{AS}_{ρ} (n)

:

in Theorem 10 for deletion,
in Theorem 12 for inserting a large character, and
in Theorem 11 and Theorem 13 for substituting a character with a smaller or larger one, respectively.

Additionally, we broaden the study of the sensitivity of the BWT by allowing larger alphabets (Theorem 2) and alphabet reordering (Theorem 4), obtaining the same asymptotic complexities as reported by Giuliani et al. [24].

Since our major contribution is on the BBWT, we also briefly review known results related to it.

BBWT. Since its inception [20], the BBWT has been studied under various aspects. We are aware of construction algorithms (cf. [34] or [35] and the references therein), indexes [35] based on the BBWT, studies about the relationship of

ρ

and r [36],

ρ (T)

and

ρ

of the reverse of T [37].

3. Preliminaries

In this section, we provide the necessary definitions and terminology used throughout the paper. A list of symbols is given in Table 1.

Words. We let

Σ

be a finite and ordered alphabet with cardinality

σ

. The elements of

Σ

are called characters. A word over

Σ

is a finite sequence

W = W [0] W [1] \dots W [n - 1] = W [0 . . n - 1]

of characters from

Σ

. The order of the alphabet induces the lexicographic order on words, which we also denote by

≺_{lex}

.

We denote the length of W by

| W |

, with

ε

being the unique word of length 0. We denote the set of words of length n by

Σ^{n}

, and represent the set of all words on

Σ

by

Σ^{*} = ⋃_{n \geq 0} Σ^{n}

. Given word

W = W [0 . . n - 1]

, we define its reverse by

rev (W) = W [n - 1] W [n - 2] \dots W [0]

. If

W = X Y Z

for words

W, X, Y, Z

, then

X, Y, Z

are, respectively, a prefix, a subword, and a suffix of W. We call word

W^{'}

a conjugate of W if and only if there is integer

i \in [0 . . | W | - 1]

such that

W^{'} = W [i . . | W | - 1] W [0 . . i - 1]

. In this case, we write

W^{'} = {conj}_{i} (W)

. In particular,

W = {conj}_{0} (W)

. We call word U a circular factor of word W if it is a prefix of

{conj}_{i} (W)

for some

i \in [0 . . | W | - 1]

; in this case, we call i (the starting position of) an occurrence of U. If we can express word W as

W = V^{k}

for word V and integer

k \geq 2

, then we call W a power, otherwise we call W primitive. Finally, W is primitive if and only if it has

| W |

distinct conjugates.
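The notions of conjugate, circular factor, and primitivity translate directly into code. The following sketch uses 0-based indices as in the definitions above; the function names are illustrative.

```python
def conj(word: str, i: int) -> str:
    """The conjugate conj_i(W) = W[i..|W|-1] W[0..i-1]."""
    return word[i:] + word[:i]


def conjugates(word: str) -> list:
    """All |W| conjugates of `word`, indexed by starting position."""
    return [conj(word, i) for i in range(len(word))]


def is_primitive(word: str) -> bool:
    """A nonempty word is primitive iff it has |W| distinct conjugates."""
    return len(word) > 0 and len(set(conjugates(word))) == len(word)


def is_circular_factor(u: str, word: str) -> bool:
    """True iff `u` is a prefix of some conjugate of `word`."""
    return any(c.startswith(u) for c in conjugates(word))


if __name__ == "__main__":
    print(conjugates("banana"))
    print(is_primitive("abab"), is_primitive("abaab"))
```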

Given two words

V, W

, the longest common prefix of V and W, denoted

lcp (V, W)

, is the unique word U such that U is a prefix of both V and W, and

V [| U |] \neq W [| U |]

if neither of the two words is a prefix of the other.

The Burrows–Wheeler Transform (BWT). We define the BWT of word W based on its conjugates. For that, we define two concepts, an order and a list of conjugates sorted in that order. First, we define the omega-order [16] of two words T and S as follows:

T ≺_{ω} S

if either

T^{ω} ≺_{lex} S^{ω}

or

T^{ω} = S^{ω}

and

| T | < | S |

. Here,

S^{ω}

denotes the infinite word obtained by concatenating word S an infinite number of times. The omega-order coincides with the lexicographic order if neither of two words is a proper prefix of the other but may differ otherwise. Second, we let

M (W)

be the list of sorted conjugates of word W in omega-order.

Now, we can define the Burrows–Wheeler Transform (BWT) [1] of the word W, denoted by

BWT (W)

, as the word obtained by reading the last character of each conjugate in

M (W)

.

For instance, the BWT of word

mississippi

is

pssmipissii

. By construction, it follows that W and

W^{'}

are conjugates if and only if

BWT (W) = BWT (W^{'})

. We denote by

r (W) = runs (BWT (W))

the number of runs in the BWT of word W. For example,

r (mississippi) = runs (pssmipissii) = 8

.
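A direct, quadratic-space sketch of this definition sorts all rotations explicitly. Since all conjugates of one word have the same length, plain lexicographic sorting agrees with the omega-order here. The function names are illustrative, and the approach is meant for small examples only.

```python
from itertools import groupby


def bwt(word: str) -> str:
    """BWT via explicit sorting of all conjugates (rotations) of `word`."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def runs(word: str) -> int:
    """Number of maximal runs of equal characters."""
    return sum(1 for _ in groupby(word))


def r(word: str) -> int:
    """The measure r(W) = runs(BWT(W))."""
    return runs(bwt(word))


if __name__ == "__main__":
    print(bwt("mississippi"), r("mississippi"))
```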

Table 1. Definitions of symbols introduced in this article.

Symbol	Meaning
r	number of runs in the BWT
ρ	number of runs in the bijective BWT
n	length
k	index
#	a character lexicographically smaller than $a$
$c$	a character lexicographically larger than $b$
$F_{k}$	kth Fibonacci word
$f_{k}$	kth Fibonacci number
$X_{k}$	kth central word
$L_{k}$	kth Lyndon Fibonacci word
$F_{k}^{♭}$	kth Fibonacci word deleting its last character
$L_{k}^{♭}$	kth Fibonacci Lyndon word deleting its last character
$P_{k}$	${ab}^{k} aa$
$E_{k}$	${ab}^{k} {aba}^{k - 2}$
$Q_{k}$	${ab}^{k} a$
$Q_{k}^{♭}$	${ab}^{k}$
$W_{k}$	$(\prod_{i = 2}^{k - 1} P_{i} E_{i}) Q_{k}$
$W_{k}^{♭}$	$W_{k}$ deleting its last character
$\bar{P_{k}}$	${ba}^{k} bb$
$\bar{E_{k}}$	${ba}^{k} {bab}^{k - 2}$
$\bar{Q_{k}}$	${ba}^{k} b$
$\bar{Q_{k}^{♭}}$	${ba}^{k}$
$\bar{W_{k}}$	$(\prod_{i = 2}^{k - 1} \bar{P_{i}} \bar{E_{i}}) \bar{Q_{k}}$
$\bar{W_{k}^{♭}}$	$\bar{W_{k}}$ deleting its last character
$C_{k}$	Lyndon word of $W_{k}$
$C_{k}^{♭}$	$C_{k}$ deleting its last character $b$
$D_{k}$	$C_{k}^{♭}$ deleting its last character $a$
$H_{k - 1}$	$E_{k - 1}$ changed into ${ab}^{k - 1} a^{k - 3}$
$S_{k - 1}$	$E_{k - 1}$ changed into ${ab}^{k - 1} {abca}^{k - 3}$
$R_{k - 1}$	$E_{k - 1}$ changed into ${ab}^{k - 1} {aca}^{k - 3}$
$β (W)$	subword of BWT(W) corresponding to the range of contiguous conjugates prefixed by W
$β^{'} (W)$	subword of BWT(W) applied to a specific edit operation
$α$	$f_{2 k - 3} + f_{2 k - 5} + \dots + f_{3} + f_{1}$
$M (W)$	the list of lexicographically sorted conjugates of word W

Lyndon Words. A word is called a Lyndon word if it is lexicographically strictly smaller than all of its conjugates [38]. In particular, a Lyndon word must be primitive. Each primitive word S has exactly one conjugate that is Lyndon. We denote this conjugate by

LynConj (S)

and call it the Lyndon conjugate of S. The Lyndon factorization [39] of word W is a unique factorization of W into Lyndon words. In detail, it decomposes word W into a list of Lyndon words

S_{1}^{e_{1}}, S_{2}^{e_{2}}, \dots, S_{m}^{e_{m}}

such that

W = S_{1}^{e_{1}} S_{2}^{e_{2}} \dots S_{m}^{e_{m}}

, where

S_{m} ≺_{lex} S_{m - 1} ≺_{lex} \dots ≺_{lex} S_{1}

and

e_{i} \geq 1

. By construction, word S is Lyndon if and only if its Lyndon factorization consists of only one factor, i.e., S itself. We denote the multiset of Lyndon factors in the Lyndon factorization of S by

L (S)

. As an example, we consider

LynConj (mississippi) = imississipp

. The Lyndon factorization of

mississippi

is

m \cdot (iss)^{2} \cdot ipp \cdot i

. We have

L (mississippi) = {m, iss, iss, ipp, i}

.
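One standard way to compute the Lyndon factorization is Duval's linear-time algorithm. The sketch below uses that algorithm as an illustration; the article itself does not prescribe a particular method, and the function names are illustrative.

```python
def lyndon_factorization(word: str) -> list:
    """Lyndon factors of `word` in order (Duval's algorithm)."""
    factors = []
    n, i = len(word), 0
    while i < n:
        j, k = i + 1, i
        # Extend the current candidate while it stays a prefix of a
        # power of a Lyndon word.
        while j < n and word[k] <= word[j]:
            k = i if word[k] < word[j] else k + 1
            j += 1
        # Emit all complete copies of the Lyndon word of length j - k.
        while i <= k:
            factors.append(word[i:i + j - k])
            i += j - k
    return factors


def lyndon_conjugate(word: str) -> str:
    """Smallest conjugate; the Lyndon conjugate if `word` is primitive."""
    return min(word[i:] + word[:i] for i in range(len(word)))


if __name__ == "__main__":
    print(lyndon_factorization("mississippi"))
    print(lyndon_conjugate("mississippi"))
```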

Bijective BWT (BBWT). The Bijective BWT (BBWT) [20] of word T is the word obtained by sorting all conjugates of the Lyndon factors in the multiset

L (T)

in

ω

-order and then concatenating the last character of each sorted conjugate. For example, the BBWT of the word

mississippi

is

ipssmpissii

In this article, we denote by

ρ (W)

the number of runs in the BBWT of W, that is,

ρ (W) = runs (BBWT (W))

. For instance,

ρ (mississippi) = runs (ipssmpissii) = 8

.
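The BBWT definition can be sketched by combining a Lyndon factorization with a comparator for the omega-order. The sketch relies on the known fact that, for two words with different infinite powers, U^ω is smaller than V^ω exactly when UV is lexicographically smaller than VU. The factorization uses Duval's algorithm, and all function names are illustrative.

```python
from functools import cmp_to_key


def lyndon_factorization(word):
    """Lyndon factors of `word` in order (Duval's algorithm)."""
    factors, n, i = [], len(word), 0
    while i < n:
        j, k = i + 1, i
        while j < n and word[k] <= word[j]:
            k = i if word[k] < word[j] else k + 1
            j += 1
        while i <= k:
            factors.append(word[i:i + j - k])
            i += j - k
    return factors


def omega_cmp(u, v):
    """Compare u and v in omega-order: by u^ω vs. v^ω, then by length."""
    if u + v < v + u:
        return -1
    if u + v > v + u:
        return 1
    return len(u) - len(v)


def bbwt(word):
    """BBWT: last characters of the omega-sorted conjugates of all factors."""
    conjs = [f[i:] + f[:i]
             for f in lyndon_factorization(word)
             for i in range(len(f))]
    conjs.sort(key=cmp_to_key(omega_cmp))
    return "".join(c[-1] for c in conjs)


if __name__ == "__main__":
    print(bbwt("mississippi"))
```

For a Lyndon word, the factorization has a single factor, so this routine returns the same word as the BWT of that word.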

Fibonacci Words. Fibonacci words are so-called standard words ([40], Section 10.1), which are defined as follows.

F_{0} = b

,

F_{1} = a

,

F_{k + 1} = F_{k} F_{k - 1},

for every

k \geq 1

. For all

k \geq 0

,

| F_{k} | = f_{k}

, where

{f_{k}}_{k \geq 0}

are the Fibonacci numbers

1, 1, 2, 3, 5, 8, 13, 21, \dots

, defined by the recurrence

f_{0} = f_{1} = 1,

f_{k + 1} = f_{k} + f_{k - 1},

for

k \geq 1

. Since Fibonacci numbers grow exponentially in k, we have

k = Θ (log | F_{k} |)

. We also introduce so-called central words [41]

X_{k}

for

k \geq 2

, defined by the equations

F_{2 k} = X_{2 k} ab, F_{2 k + 1} = X_{2 k + 1} ba

for all

k \geq 1

. The central words

X_{2 k}

and

X_{2 k + 1}

are palindromes. In particular,

X_{2} = ε

. The recursive structure of words

X_{2 k}

and

X_{2 k + 1}

is also known [42]:

$X_{2 k} = X_{2 k - 1} ba X_{2 k - 2} = X_{2 k - 2} ab X_{2 k - 1}$ and
$X_{2 k + 1} = X_{2 k} ab X_{2 k - 1} = X_{2 k - 1} ba X_{2 k}$ .

We study Fibonacci words in this article because they have the minimal number of BWT runs among binary words. This is because Mantaci et al. [43] have shown that the BWT of a binary word has exactly two runs if and only if it is a conjugate of a standard word or a conjugate of a power of a standard word. Further, there is rich literature (e.g., [44,45,46]) about Fibonacci words and their rotations.
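The definitions of this paragraph can be checked on small instances. The sketch below builds Fibonacci words and central words with the indexing used above (F_0 = b, F_1 = a) and computes the BWT by sorting rotations. The function names are illustrative.

```python
from itertools import groupby


def fib_word(k: int) -> str:
    """k-th Fibonacci word with F_0 = b, F_1 = a, F_{k+1} = F_k F_{k-1}."""
    prev, cur = "b", "a"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, cur + prev
    return cur


def central_word(k: int) -> str:
    """Central word X_k for k >= 2: F_k without its last two characters."""
    return fib_word(k)[:-2]


def bwt(word: str) -> str:
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def runs(word: str) -> int:
    return sum(1 for _ in groupby(word))


if __name__ == "__main__":
    for k in range(2, 8):
        f = fib_word(k)
        print(k, f, central_word(k), bwt(f), runs(bwt(f)))
```

On these inputs, each BWT consists of a block of b's followed by a block of a's, in line with the two-run characterization cited above.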

4. Multiplicative Sensitivity of $r$ by $Ω (log n)$

As a warm-up, we follow the steps of (Giuliani et al. [24], Section 3), who studied a family of words related to Fibonacci words for which they could observe a multiplicative sensitivity of

Θ (log n)

for the number of character runs in the BWT. We here show a similar result, but use a new character (#) instead of one already appearing in the binary word. To facilitate notation, we write < for

<_{ω}

when sorting conjugates. We build our proofs on the insights from the following results from the literature.

Lemma 1.

(Remark 11 from [16]). All conjugates of a word have the same BWT.

Lemma 2.

(Proposition 4 of [24]). We let

F_{2 k}^{♭}

be the word obtained by removing the last character of

F_{2 k}

. Then,

r (F_{2 k}^{♭}) = 2 k

.

Lemma 3.

(Lemma 7 of [24]).

{conj}_{n - 3} (F_{2 k}^{♭})

is the smallest conjugate in

M (F_{2 k}^{♭})

.

Lemma 4.

We let

v \in Σ^{*}

be the Lyndon conjugate of

F_{2 k}^{♭}

that contains at least two distinct characters and let # be a character that does not occur in v. Then,

r (v) \leq r (# v) = r (v #) \leq r (v) + 2

.

Proof .

We refer to

{conj}_{n - 3} (F_{2 k}^{♭})

from Lemma 3 as v, and we consider two indices i and j with

0 \leq i, j \leq f_{2 k} - 1

. The conjugates of v with index i and j are

{conj}_{i} (v)

,

{conj}_{j} (v)

, respectively. Without loss of generality, we assume that

{conj}_{i} (v) < {conj}_{j} (v)

; thus,

v [i . . | v | - 1] v [0 . . i - 1] < v [j . . | v | - 1] v [0 . . j - 1]

. We prove that inserting # preserves this order, distinguishing two cases; Figure 1 sketches the setting.

Figure 1. Sketch of the setting

{conj}_{i} (v) < {conj}_{j} (v)

considered in the proof of Lemma 4.

Case 1: |lcp( ${conj}_{i} (v), {conj}_{j} (v)$ )| < min( $| v | - i + 1, | v | - j + 1$ );
Case 2: |lcp( ${conj}_{i} (v), {conj}_{j} (v)$ )| > min( $| v | - i + 1, | v | - j + 1$ ).

The red rectangle in Figure 2 is an example of a common prefix of

{conj}_{i} (v)

and

{conj}_{j} (v)

. In Case 1, it is

{conj}_{i} (v) < {conj}_{j} (v)

, meaning that the character of

{conj}_{i} (v)

in position |lcp(

{conj}_{i} (v), {conj}_{j} (v)

)|+1 is smaller than the character in the same position in

{conj}_{j} (v)

. Thus, inserting # in position

| v |

does not change the lexicographic order between

{conj}_{i} (v)

and

{conj}_{j} (v)

. The order is preserved.

Figure 2. Illustration of the first case in Lemma 4. Inserting # does not change the lexicographic order between

{conj}_{i} (v)

and

{conj}_{j} (v)

.

The red rectangle in Figure 3 depicting the longest common prefix of the two strings in question is longer than |lcp(

{conj}_{i} (v), {conj}_{j} (v)

)|. In Case 2, it must be

i > j

, which means

| v [i . . | v | - 1] | < | v [j . . | v | - 1] |

. When it is

i < j

, then

| v [j . . | v | - 1] | < | v [i . . | v | - 1] |

, meaning that # appears first in

{conj}_{j} (v)

. As a result,

{conj}_{j} (v) < {conj}_{i} (v)

, which contradicts

{conj}_{i} (v) < {conj}_{j} (v)

. Thus, in Case 2, we only consider when it is

i > j

, as illustrated in Figure 3.

Figure 3. Illustration of the second case in Lemma 4.

Furthermore, we split the second case into two subcases. We let u be the unique circular factor of v that is smaller than all other circular factors of v of the same length.

Case 2 (a): when u is a prefix of $v [i . . | v | - 1]$ ;
Case 2 (b): when $v [0 . . i - 1]$ is a prefix of u.

When it is Case 2 (a), u appears only in the prefix of

v [0 . . i - 1]

. Thus, the first difference between

{conj}_{i} (v)

and

{conj}_{j} (v)

lies within the unique occurrence of u. The situation is depicted at Figure 4. After inserting the #,

{conj}_{i} (v)

becomes

v [i . . | v | - 1] # v [0 . . i - 1]

, creating factor

# u

at position

| v | - i + 1

, which is not only unique but also smallest among other factors of length

| # u |

in v. Any factor that appears in the same position in

v [j . . | v | - 1] # u

is greater than

# u

. Thus, the order is preserved.

Figure 4. Illustration of Case 2 (a) in Lemma 4. Inserting # does not affect the lexicographic order between

{conj}_{i} (v)

and

{conj}_{j} (v)

.

In Case 2 (a), u is the smallest prefix which appears only once in

v [0 . . i - 1]

. Since v is a Lyndon word,

v [0 . . i - 1]

is also the smallest factor in v. However, in Case 2 (b), u is longer than

v [0 . . i - 1]

. We sketch the situation in Figure 5, where we visualize u with a purple rectangle. Therefore,

v [0 . . i - 1]

must appear more than twice in u. If

v [0 . . i - 1]

appears only once, u is analogous with

v [0 . . i - 1]

. Also, from

{conj}_{i} (v) \neq {conj}_{j} (v)

, there must be a difference in

v [0 . . i - 1]

. Moreover, since v is primitive, v cannot be expressed in the form

Z^{k}

for word Z and an integer

k \geq 2

. The first distinct character between

{conj}_{i} (v) and {conj}_{j} (v)

is within

{conj}_{i} (v) [| v | - i + 1 . . | v | - 1]

. We assume otherwise that there is no mismatching character pair with

v [0 . . i - 1]

and the prefix of

{conj}_{i} (v)

, which is

v [i . . 2 i - 1]

. Since

v [0 . . i - 1] = v [i . . 2 i - 1]

,

{conj}_{i} (v)

also has a smallest prefix, which contradicts the fact that v is the one and only Lyndon conjugate. Moreover,

{conj}_{i} (v)

becomes

v [0 . . i - 1] v [i . . 2 i - 1] \dots = v {[0 . . i - 1]}^{2} \dots

, thus contradicting its primitivity.

Figure 5. Illustration of Case 2 (b) in Lemma 4.

In this way, after inserting a #, the analogous behavior of Case 2 (a) is observed.

The order of original conjugates of v is preserved with respect to the original BWT according to the cases above. Thus, the only difference in inserting # in v occurs in conjugates of

# v

and

v #

. On the one hand, we observe that

# v

is now the smallest among all conjugates of

M (# v)

, and it ends with the last character of v. On the other hand,

v #

becomes the second smallest conjugate and ends with #. Hence, we have BWT(

# v

) =

BWT (v) [0] \cdot # \cdot BWT (v) [1 . . | v | - 1]

, which concludes the proof. □
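The statement of Lemma 4 can be checked experimentally on a small instance. The sketch below takes v to be the smallest conjugate of the Fibonacci word of order 6 without its last character, and compares BWT(v) with BWT(#v) and BWT(v#). It relies on '#' sorting before 'a' and 'b' in ASCII; the function names are illustrative.

```python
from itertools import groupby


def runs(word):
    return sum(1 for _ in groupby(word))


def bwt(word):
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def fib_word(k):
    """F_0 = b, F_1 = a, F_{k+1} = F_k F_{k-1}."""
    prev, cur = "b", "a"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, cur + prev
    return cur


def lemma4_instance(k):
    """Return (v, BWT(v), BWT(#v), BWT(v#)) for the Lyndon conjugate v
    of F_{2k} without its last character."""
    flat = fib_word(2 * k)[:-1]
    v = min(flat[i:] + flat[:i] for i in range(len(flat)))
    return v, bwt(v), bwt("#" + v), bwt(v + "#")


if __name__ == "__main__":
    v, b_v, b_hash_v, b_v_hash = lemma4_instance(3)
    print(v, b_v, b_hash_v, b_v_hash)
    print(runs(b_v), runs(b_hash_v))
```

Other orders can be probed by calling `lemma4_instance` with a different k.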

Theorem 1.

We let

F_{2 k}

be the Fibonacci word of even order

2 k > 4

, and

f_{2 k} = | F_{2 k} |

. We let

F_{2 k}^{♭} #

be the word that results from substituting the

b

of

F_{2 k}

by a # at position

f_{2 k} - 1

. Then,

r (F_{2 k}^{♭} #) = 2 k + 2

.

Proof.

We let

S = F_{2 k}^{♭} #

. From Lemma 2,

r (F_{2 k}^{♭}) = 2 k

. And by Lemma 3, we know that

{conj}_{n - 3} (F_{2 k}^{♭})

is the smallest conjugate among

M (F_{2 k}^{♭})

. By Lemma 4, we have

2 k \leq r (# {conj}_{n - 3} (F_{2 k}^{♭})) \leq 2 k + 2

. More precisely, it is

2 k + 2

since

# {conj}_{n - 3} (F_{2 k}^{♭})

is the smallest conjugate in

M (F_{2 k}^{♭} #)

and

{conj}_{n - 3} (F_{2 k}^{♭}) #

is the second smallest conjugate. The relative order among the conjugates of

# {conj}_{n - 3} (F_{2 k}^{♭})

coincides with that of the conjugates of

F_{2 k}^{♭}

, using the same argument as in the proof of Lemma 4. This means that to obtain

BWT (S)

, it suffices to insert a # between the first two

b

s in

BWT (F_{2 k}^{♭})

. Since

r (# {conj}_{n - 3} (F_{2 k}^{♭})) = r (# F_{2 k}^{♭}) = r (F_{2 k}^{♭} #)

, we obtain the claim. □

5. Additive Sensitivity of $r$ by $Ω (\sqrt{n})$

In Section 4, we presented a word such that substituting one of its characters by #, which is strictly lexicographically smaller than all its characters, resulted in a logarithmic multiplicative increase in the number of runs r in the BWT. We now follow (Giuliani et al. [24], Section 4), who presented a family of words where a single edit can produce an additive increase of

Θ (\sqrt{n})

in r. Like before, we want to study the sensitivity when introducing a new character (#) in Section 5.1 or additionally when inverting the order of the alphabet in Section 5.2.

Definition 1.

For any

k > 5

, we let

P_{i} = {ab}^{i} aa

and

E_{i} = {ab}^{i} {aba}^{i - 2}

for all

i \in [2 . . k - 1]

, and

Q_{k} = {ab}^{k} a

. Then,

W_{k} = (\prod_{i = 2}^{k - 1} P_{i} E_{i}) Q_{k} = (\prod_{i = 2}^{k - 1} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) {ab}^{k} a .

(1)

The length of these words is

n = \sum_{i = 2}^{k - 1} (3 i + 4) + (k + 2) = \frac{3 k^{2} + 7 k}{2} - 9 .

(2)

Thus,

k = Θ (\sqrt{n})

.

W_{k}^{♭}

is

W_{k}^{♭} = (\prod_{i = 2}^{k - 1} P_{i} E_{i}) {ab}^{k} = (\prod_{i = 2}^{k - 1} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) {ab}^{k} .

(3)

We append a character #, which is lexicographically smaller than the character

a

, to the end of

W_{k}^{♭}

and name the resulting word

W_{k}^{♭} #

.

Also,

W_{k}

, with its characters

a

and

b

swapped, is defined as

\bar{W_{k}}

, which is

\bar{W_{k}} = (\prod_{i = 2}^{k - 1} \bar{P_{i}} \bar{E_{i}}) \bar{Q_{k}} = (\prod_{i = 2}^{k - 1} {ba}^{i} {bbba}^{i} {bab}^{i - 2}) {ba}^{k} b .

(4)
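The words of Definition 1 are easy to generate, which allows checking the length formula of Equation (2) on concrete values of k. In the sketch below, the function names (`P`, `E`, `Q`, `W`, `W_flat`, `bar`) mirror the symbols of the definition but are otherwise illustrative.

```python
def P(i):
    """P_i = a b^i a a."""
    return "a" + "b" * i + "aa"


def E(i):
    """E_i = a b^i a b a^(i-2)."""
    return "a" + "b" * i + "ab" + "a" * (i - 2)


def Q(k):
    """Q_k = a b^k a."""
    return "a" + "b" * k + "a"


def W(k):
    """W_k = P_2 E_2 P_3 E_3 ... P_{k-1} E_{k-1} Q_k."""
    return "".join(P(i) + E(i) for i in range(2, k)) + Q(k)


def W_flat(k):
    """W_k without its last character, as in Equation (3)."""
    return W(k)[:-1]


def bar(word):
    """Swap the characters a and b, as in Equation (4)."""
    return word.translate(str.maketrans("ab", "ba"))


if __name__ == "__main__":
    for k in range(6, 10):
        print(k, len(W(k)), (3 * k * k + 7 * k) // 2 - 9)
```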

To characterize the BWT of words

W_{k}^{♭} #

,

\bar{W_{k}}

and

\bar{W_{k}^{♭}} c

, we partition each of the sorted conjugate lists

M (W_{k}^{♭} #)

,

M (\bar{W_{k}})

,

M (\bar{W_{k}^{♭}} c)

into distinct groups of consecutive conjugates having identical prefixes and define the subword of BWT(

W_{k}

) corresponding to each of these prefixes.

Given

X \in Σ^{*}

, we denote by

β (X, W_{k})

the subword of BWT

(W_{k})

corresponding to the range of contiguous conjugates prefixed by X. We omit the second parameter of

β (X, W_{k})

when it is clear from the context.

β (X)

is the concatenation of the last characters of the conjugates with prefix X, taken in sorted order. For example, for the word

banana

, there are two conjugates starting with the prefix

an

, which are, in sorted order,

anaban

and

ananab

; thus,

β

(

an

) of

banana

is

nb

.
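A small sketch of β makes the definition concrete: sort the conjugates, keep those with the given prefix, and read off their last characters. The function name `beta` is illustrative.

```python
def beta(prefix: str, word: str) -> str:
    """Last characters of the sorted conjugates of `word` that start
    with `prefix`, i.e., the corresponding subword of BWT(word)."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rot[-1] for rot in rotations if rot.startswith(prefix))


if __name__ == "__main__":
    for prefix in ["a", "an", "b", "n"]:
        print(prefix, beta(prefix, "banana"))
```

With the empty prefix, `beta` returns the whole BWT, and the β values of all one-character prefixes, taken in alphabet order, concatenate to it.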

Lemma 5.

(Proposition 28 of [24]). It holds that

r (W_{k}) = 6 k - 12

.

5.1. BWT of $W_{k}$ After Substituting a Character

The lemmas presented below characterize the BWT of

W_{k}

after certain modifications have been applied. Rather than deriving the entire structure of the BWT from scratch, we analyze how replacing a character affects either the relative order or the final character of the conjugates of

W_{k}

. We let

M (W_{k}^{♭} #)

be the list of lexicographically sorted conjugates of the word

W_{k}^{♭} #

.

Lemma 6.

β (#, W_{k}^{♭} #) = b

.

Proof.

The first conjugate in

M (W_{k}^{♭} #)

is

# P_{2} \dots b

. Since the lexicographic order of # is smaller than all other characters, a conjugate starting with # is smaller than every conjugate starting with

a

. # can be obtained by the last character of

W_{k}^{♭} #

, which is preceded by a

b

. □

Lemma 7.

β (a^{i} b, W_{k}^{♭} #)

=

{ba}^{k - i - 2}

for all

i \in [4 . . k - 2]

.

Proof.

Given integer

i \in [4 . . k - 2]

, the conjugates of

M (W_{k}^{♭} #)

starting with

a^{i} b

are

\begin{matrix} a^{i - 1} P_{i + 2} \dots b < a^{i - 1} P_{i + 3} \dots a < \dots < a^{i - 1} P_{k - 1} \dots a < a^{i - 1} Q_{k}^{♭} # \dots a . \end{matrix}

In

M (W_{k}^{♭} #)

, a prefix

a^{i} b

can only be obtained by concatenating the suffix

a^{i - 1}

of

E_{j}

for some

j \in [i + 1 . . k - 1]

with the prefix

ab

of

P_{j + 1}

, or with the prefix

ab

of

Q_{k}^{♭} #

if

j = k - 1

. Note that all these conjugates end with an

a

, with the exception of the conjugate starting with

a^{i - 1} P_{i + 2}

, since this is where the unique occurrence of

{ba}^{i} b

can be found. □

Lemma 8.

β (aaab, W_{k}^{♭} #) = b^{5} {(ab)}^{k - 6} a

.

Proof.

The conjugates in

M (W_{k}^{♭} #)

starting with

aaab

are

\begin{matrix} aa E_{2} \dots b & < aa E_{3} \dots b < aa E_{4} \dots b < aa P_{5} \dots b < aa E_{5} \dots b \\ < aa P_{6} \dots a < aa E_{6} \dots b < \dots < aa P_{k - 1} \dots a < aa E_{k - 1} \dots b \\ < aa Q_{k}^{♭} # \dots a . \end{matrix}

In

M (W_{k}^{♭} #)

, the conjugates that start with

aaab

can be obtained for all

i \in [4 . . k - 1]

from the concatenation of the suffix

aa

from

E_{i}

with

P_{i + 1}

or with

Q_{k}^{♭} #

if

i = k - 1

. If

i \in [2 . . k - 1]

, concatenation of the suffix

aa

of

P_{i}

with the prefix

ab

of

E_{i}

also makes

aaab

. Also, we can sort the conjugates with following order:

⋃_{i = 2}^{4} {aa E_{i}} \cup ⋃_{i = 5}^{k - 1} {aa P_{i}, aa E_{i}} \cup {aa Q_{k}^{♭} #}

. All conjugates of

aa E_{i}

end with a

b

and if

i \in [5 . . k - 2]

,

aa

of

E_{i}

concatenated with

P_{i + 1}

or

Q_{k}^{♭} #

if

i = k - 1

ends with an

a

. On the other hand,

aa P_{5}

ends with

b

. □

Lemma 9.

β (aab, W_{k}^{♭} #) = {aaba}^{2 k - 8}

.

Proof.

The conjugates in

M (W_{k}^{♭} #)

starting with

aab

are

\begin{matrix} a E_{2} \dots a & < a E_{3} \dots a < a P_{4} \dots b \\ < a E_{4} \dots a < a P_{5} \dots a < a E_{5} \dots a < \dots < a P_{k - 1} \dots a < a E_{k - 1} \dots a < a Q_{k}^{♭} # \dots a . \end{matrix}

Each of the conjugates starting with

aaab

from Lemma 8 induces a conjugate starting with

aab

, obtained by shifting on the left one character

a

. It follows that all of these conjugates end with an

a

. The other conjugates that start with

aab

are those obtained by concatenating the suffix

a

of

E_{3}

with the prefix

ab

of

P_{4}

which ends with

b

. □
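The closed forms of Lemmas 6 to 9 can be compared against a direct computation for small k. The sketch below builds the word of Equation (3) with # appended and evaluates β by sorting rotations. It relies on '#' sorting before 'a' in ASCII, and the function names are illustrative.

```python
def P(i):
    return "a" + "b" * i + "aa"


def E(i):
    return "a" + "b" * i + "ab" + "a" * (i - 2)


def w_flat_hash(k):
    """W_k without its last character, followed by #."""
    return "".join(P(i) + E(i) for i in range(2, k)) + "a" + "b" * k + "#"


def beta(prefix, word):
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rot[-1] for rot in rotations if rot.startswith(prefix))


def check_lemmas_6_to_9(k):
    """Map each lemma number to whether its formula matches for this k."""
    w = w_flat_hash(k)
    return {
        6: beta("#", w) == "b",
        7: all(beta("a" * i + "b", w) == "b" + "a" * (k - i - 2)
               for i in range(4, k - 1)),
        8: beta("aaab", w) == "b" * 5 + "ab" * (k - 6) + "a",
        9: beta("aab", w) == "aab" + "a" * (2 * k - 8),
    }


if __name__ == "__main__":
    for k in (7, 8):
        print(k, check_lemmas_6_to_9(k))
```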

Lemma 10.

β (ab, W_{k}^{♭} #) = b^{k - 2} # {aba}^{2 k - 6}

.

Proof.

The conjugates in

M (W_{k}^{♭} #)

starting with the prefix

ab

are

\begin{matrix} {aba}^{k - 3} Q_{k}^{♭} # \dots b & < {aba}^{k - 4} P_{k - 1} \dots b < \dots < ab P_{3} \dots b \\ < P_{2} \dots # < E_{2} \dots a < P_{3} \dots b \\ < E_{3} \dots a < P_{4} \dots a < E_{4} \dots a < \dots < P_{k - 1} \dots a < E_{k - 1} \dots a < Q_{k}^{♭} # \dots a . \end{matrix}

For all two distinct integers

i, i^{'}

with

i > i^{'} \geq 0

, we have

{aba}^{i} b < {aba}^{i^{'}} b

. Thus, the first conjugate in the lexicographic order starting with

ab

is the one in which this prefix is followed by the longest run of

a

. The smallest of these conjugates can be found from the suffix

{aba}^{k - 3} b

of

E_{k - 1}

, followed by the suffix

{aba}^{i - 2} b

of

E_{i}

for all

2 \leq i \leq k - 2

taken in decreasing order.

By construction of

E_{i}

, for all

2 \leq i \leq k - 1

, these conjugates must end in a

b

. The remaining conjugates starting with

ab

are exactly those of either

P_{i}

or

E_{i}

, for all

2 \leq i \leq k - 1

, or

Q_{k}^{♭} #

. The conjugates can be obtained by shifting on the left one character

a

from the conjugates starting with

aab

from Lemma 9, with the exception of one starting with

P_{3}

since it ends with a

b

, and the other starting with

P_{2}

which ends with #, while the other conjugates end with an

a

. □

Lemma 11.

β (b^{i} #, W_{k}^{♭} #) = b

for all

1 \leq i \leq k - 1

.

Proof.

The conjugate in

M (W_{k}^{♭} #)

starting with

b^{i} #

for all

1 \leq i \leq k - 1

is

b^{i} # P_{2} \dots b

. This conjugate can be obtained by a suffix of

Q_{k}^{♭} #

, and is always preceded by a

b

. □

Lemma 12.

β (ba, W_{k}^{♭} #) = a^{k - 5} {bbbab}^{k - 5} {ab}^{k - 2} a

.

Proof.

The conjugates in

M (W_{k}^{♭} #)

starting with

ba

are

\begin{matrix} {ba}^{k - 3} Q_{k}^{♭} # \dots a & < {ba}^{k - 4} P_{k - 1} \dots a < \dots < {ba}^{3} P_{6} \dots a \\ < baa E_{2} \dots b < baa E_{3} \dots b < baa E_{4} \dots b < baa P_{5} \dots a \\ < baa E_{5} \dots b < baa E_{6} \dots b < \dots < baa E_{k - 1} \dots b < ba P_{4} \dots a \\ < {baba}^{k - 3} Q_{k}^{♭} # \dots b < {baba}^{k - 4} P_{k - 1} \dots b < \dots < bab P_{3} \dots b < babbbaa \dots a . \end{matrix}

We have as many circular occurrences of

ba

as the number of maximal character runs of

b

in

W_{k}^{♭} #

. Then, for all

2 \leq i \leq k - 1

,

Case 1: one run of $b$ in $P_{i}$ and
Case 2: two runs in $E_{i}$ .

For Case 1, we have one conjugate starting with

baa E_{i}

for each

i \in [2 . . k - 1]

. Since each run of

b

within each word of

P_{i}

has length at least 2, all conjugates of Case 1 end with

b

. For Case 2, for all

i \in [2 . . k - 1]

we can distinguish between two subcases, based on where

ba

starts:

Case 2 (a): from the first run of $b$ in $E_{i}$ , which gives the conjugate starting with ${baba}^{i - 2} P_{i + 1}$ when $i \in [2 . . k - 2]$ or ${baba}^{k - 3} Q_{k}^{♭} #$ if $i = k - 1$ . Since this run of $b$ has length at least 2, the conjugates of Case 2 (a) always end with $b$ .
Case 2 (b): from the second run of $b$ in $E_{i}$ , which gives the conjugate starting with ${ba}^{i - 2} P_{i + 1}$ for all $i \in [2 . . k - 2]$ , and ${ba}^{k - 3} Q_{k}^{♭} #$ if $i = k - 1$ . Each conjugate of Case 2 (b) is obtained by shifting two characters to the right in each conjugate in Case 2 (a). Therefore, these conjugates end with an $a$ .

Observe that only for Case 2 (b) we have conjugates starting with

baaaa

. Hence, the first conjugate in the lexicographic order is the one starting with

{ba}^{k - 3} Q_{k}^{♭} #

, followed by those starting with

{ba}^{k - 4} P_{k - 1} < {ba}^{k - 5} P_{k - 2} < \dots < baaa P_{6}

.

Among the remaining conjugates, those that have the prefix $baaab$ start with $baa P_{5}$ from Case 2 (b) or with $baa E_{i}$ from Case 1. Thus, we can sort them according to lexicographic order. Then, the only remaining conjugate with the prefix $baab$ starts with $ba P_{4}$ from Case 2 (b). Finally, let us focus on the conjugates from Case 2 (a), which start with $bab$

. These conjugates are sorted according to the length of the runs of

a

s following the common prefix

bab

, similarly to the sorting of conjugates from Case 2 (b). The last conjugate left is the one starting with

b P_{3}

from Case 2 (b). Since

b P_{3}

is lexicographically greater than all other cases, this is the greatest conjugate of

W_{k}^{♭} #

starting with

ba

and we can conclude our claim. □
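The statement of Lemma 12 can be checked numerically for a small $k$. The following is a minimal sketch, not part of the original proof. It assumes that $W_{k}$ is built from the factors $P_{i} = ab^{i}aa$, $E_{i} = ab^{i}aba^{i-2}$ and $Q_{k} = ab^{k}a$ (the complements of the factors given in Section 5.2), and that $W_{k}^{♭} #$ is $W_{k}$ with its last character replaced by #. The helper names `build_w` and `beta` are hypothetical.

```python
def build_w(k: int) -> str:
    """Assumed definition: W_k = (prod_{i=2}^{k-1} P_i E_i) Q_k."""
    parts = []
    for i in range(2, k):
        parts.append("a" + "b" * i + "aa")                  # P_i = a b^i aa
        parts.append("a" + "b" * i + "ab" + "a" * (i - 2))  # E_i = a b^i a b a^(i-2)
    parts.append("a" + "b" * k + "a")                       # Q_k = a b^k a
    return "".join(parts)


def beta(prefix: str, word: str) -> str:
    """Last characters of the sorted conjugates of `word` that start with `prefix`."""
    conjugates = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(c[-1] for c in conjugates if c.startswith(prefix))


k = 6
# '#' sorts below 'a' in ASCII, matching the order # < a < b used in the text.
w_sharp = build_w(k)[:-1] + "#"
# Right-hand side of Lemma 12: a^(k-5) bbba b^(k-5) a b^(k-2) a
expected = "a" * (k - 5) + "bbba" + "b" * (k - 5) + "a" + "b" * (k - 2) + "a"
print(beta("ba", w_sharp) == expected)
```

Sorting all rotations explicitly takes quadratic space, which is acceptable for such small instances; a suffix-array based construction would be preferable for large $k$.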

Lemma 13.

β (b^{j} a, W_{k}^{♭} #) = a b^{2 k - 2 j - 2} a

for all

2 \leq j \leq k - 2

.

Proof.

The conjugates starting with

b^{i} a

with integer

2 \leq i \leq k - 2

in

M (W_{k}^{♭} #)

are

\begin{matrix} b^{i} aa E_{i} \dots a & < b^{i} aa E_{i + 1} \dots b < \dots < b^{i} aa E_{k - 1} \dots b \\ < b^{i} {aba}^{k - 3} Q_{k}^{♭} # \dots b < b^{i} {aba}^{k - 4} P_{k - 1} \dots b < \dots < b^{i} {aba}^{i - 1} P_{i + 2} \dots b \\ < b^{i} {aba}^{i - 2} P_{i + 1} \dots a . \end{matrix}

All runs of $b$ of length at least $i$ , for $2 \leq i \leq k - 2$ , appear in either

Case 1:: $P_{j}$ or
Case 2:: $E_{j}$ for all $i \leq j \leq k - 1$ .

Let us consider these two cases separately. For all

i \leq j \leq k - 1

, the conjugate starting within

P_{j}

has prefix

b^{i} aa E_{j}

. For all

i \leq j \leq k - 2

, the conjugate starting within

E_{j}

has prefix

b^{i} {aba}^{j - 2} P_{j + 1}

, and for

j = k - 1

, we have the conjugate with prefix

b^{i} {aba}^{k - 3} Q_{k}^{♭} #

. By construction, the conjugates from Case 1 are sorted by increasing length $j$ of the run of $b$ in $E_{j}$ .

The conjugates covered by Case 2 are sorted according to the decreasing length of the run of

a

, following the common prefix

b^{i} ab

. Only when the run of

b

is exactly i long, its conjugate ends with

a

. Thus, the conjugates ending with an

a

are those starting with

P_{i}

and

E_{i}

, which have prefixes

b^{i} aa E_{i}

and

b^{i} {aba}^{i - 2} P_{i + 1}

. □
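As a sanity check on the length of the word in Lemma 13, the counting behind the two cases can be written out explicitly. This only restates the case analysis above, with $i$ denoting the exponent that is called $j$ in the lemma statement.

```latex
\begin{align*}
|\beta(b^{i} a, W_{k}^{\flat}\#)|
  &= \underbrace{(k - i)}_{\text{Case 1: } P_{j},\ j \in [i..k-1]}
   + \underbrace{(k - i)}_{\text{Case 2: } E_{j},\ j \in [i..k-1]}
   = 2k - 2i \\
  &= \underbrace{1}_{P_{i}:\ \text{ends with } a}
   + \underbrace{(2k - 2i - 2)}_{j > i:\ \text{end with } b}
   + \underbrace{1}_{E_{i}:\ \text{ends with } a} .
\end{align*}
```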

Lemma 14.

β (b^{k - 1} a, W_{k}^{♭} #) = aa

.

Proof.

The two conjugates in

M (W_{k}^{♭} #)

which start with

b^{k - 1} a

are

\begin{matrix} b^{k - 1} aa E_{k - 1} \dots a < b^{k - 1} {aba}^{k - 3} Q_{k}^{♭} # \dots a . \end{matrix}

The conjugates with the prefix

b^{k - 1} a

start within

P_{k - 1}

or

E_{k - 1}

. These conjugates have the prefixes

b^{k - 1} aa E_{k - 1}

and

b^{k - 1} {aba}^{k - 3} Q_{k}^{♭} #

, respectively. One can see that these conjugates taken in this order are already sorted, and both conjugates end with

a

. □

Lemma 15.

β (b^{k} #, W_{k}^{♭} #) = a

.

Proof.

The last conjugate in

M (W_{k}^{♭} #)

is

b^{k} # P_{2} \dots a

. The last conjugate in lexicographic order starts with

b^{k} # P_{2}

, and since the run of

b

is maximal, it ends with

a

, and the claim follows. □

We summarize the lemmas above in the following theorem.

Theorem 2.

r (W_{k}^{♭} #) - r (W_{k}) = 2 k - 5

for every

k \geq 6

.

Proof.

The BWT of $W_{k}^{♭} #$ is BWT( $W_{k}^{♭} #$ ) = $β (#) \prod_{i = 2}^{k - 1} β (a^{k - i} b) \cdot \prod_{i = 1}^{k - 1} β (b^{i} #) β (b^{i} a) \cdot β (b^{k} #)$ ; see Table 2. Moreover,

r (W_{k}^{♭} #) = 8 k - 17

which has

2 k - 5

more runs than

r (W_{k}) = 6 k - 12

, cf. Lemma 5.

Table 2. Classification of the number of runs obtained in Theorem 2. The total number of runs is

8 k - 17

.

The character # is lexicographically smaller than $a$ , so the conjugate starting with # is smaller than any conjugate starting with $a$ . Moreover, every conjugate contributing a character to $β (a^{i} b)$ is smaller than every conjugate contributing a character to $β (a^{i^{'}} b)$ , for every $1 \leq i^{'} < i \leq k - 2$ . In addition, every conjugate contributing a character to $β (b^{j} a)$ is smaller than every conjugate contributing a character to $β (b^{j^{'}} a)$ for every $1 \leq j < j^{'} \leq k - 1$ . Finally, the conjugate starting with $b^{i} #$ is smaller than every conjugate starting with $b^{i} a$ and greater than every conjugate starting with $b^{i^{'}} a$ for $i^{'} < i$ . Since we considered all the disjoint ranges of conjugates of $W_{k}^{♭} #$ based on their common prefix, the word BWT( $W_{k}^{♭} #$ ) is $β (#) \prod_{i = 2}^{k - 1} β (a^{k - i} b) \cdot \prod_{i = 1}^{k - 1} β (b^{i} #) β (b^{i} a) \cdot β (b^{k} #)$

. With the structure of BWT(

W_{k}^{♭} #

), we can derive its number of runs. The words

β (#)

and

\prod_{i = 2}^{k - 4} β (a^{k - i} b)

have

2 (k - 6)

runs: we start with 1 run from

β (#)

=

b

which is merged by

β (a^{k - 2} b) β (a^{k - 3} b)

=

bba

. Each subsequent word $β (a^{i} b)$ up to $β (a^{4} b)$ adds 2 new runs.

β

(aaab)

,

β

(

aab

),

β

(

ab

) have

2 (k - 5)

, 3, 5 runs, respectively. However, the boundaries between

β

(aaab)

and

β

(

aab

) are merged by an

a

; therefore,

β (aab)

has 2 runs.

β (b #)

has 1 run, followed by

β (ba)

which has 7 runs. Then, for each $i \in [2 . . k - 2]$ , the words $β (b^{i} #)$ and $β (b^{i} a)$ contribute 1 and 3 runs, respectively, which makes $4 (k - 3)$ runs in total.

β (b^{k - 1} #)

adds 1 run. Also,

β (b^{k - 1} a)

adds 1 run and is the last run since

β (b^{k} #)

does not add a new run, since it consists only of an

a

that merges with the previous one. Altogether, we have

2 (k - 6) + 2 (k - 5) + 2 + 5 + 1 + 7 + 4 (k - 3) + 1 + 1 = 8 k - 17

, and the claim holds. The main difference in the runs of

W_{k}^{♭} #

and

W_{k}

occurs from the prefix beginning with

b^{i} #

that concatenates with

b^{i} a

, repeating

baba

for

i \in [2 . . k - 1]

, while

W_{k}

repeats only

ba

. Thus, the number of runs increases additively by

2 k - 5 = Θ (k) = Θ (\sqrt{n})

.
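The last equality relies on the length $n$ of $W_{k}$ growing quadratically in $k$. A short derivation is given below, assuming the factor lengths $|P_{i}| = i + 3$, $|E_{i}| = 2 i + 1$ and $|Q_{k}| = k + 2$, which follow from $P_{i} = ab^{i}aa$, $E_{i} = ab^{i}aba^{i-2}$ and $Q_{k} = ab^{k}a$ (the complements of the factors stated in Section 5.2).

```latex
\begin{align*}
n = |W_{k}|
  &= \sum_{i=2}^{k-1} \bigl( (i + 3) + (2i + 1) \bigr) + (k + 2)
   = \sum_{i=2}^{k-1} (3i + 4) + k + 2 \\
  &= 3 \Bigl( \frac{k(k-1)}{2} - 1 \Bigr) + 4(k - 2) + k + 2
   = \frac{3k^{2} + 7k - 18}{2} .
\end{align*}
```

Under these assumptions $n = Θ (k^{2})$, hence $k = Θ (\sqrt{n})$; for instance, $k = 6$ gives $n = 66$.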

Table 3, Table 4, Table 5, Table 6, and Table 7 list the conjugates in

M (W_{k}^{♭} #)

. The first column partitions conjugates by common prefixes and names the common prefix shared by all conjugates in a partition. The second column shows the remaining part of the respective conjugate that follows the prefix of its partition. The remaining part of a conjugate decides its relative order inside its partition. The BWT column shows the last character of each conjugate. □
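The run counts in Theorem 2 can be reproduced by brute force for small $k$. The sketch below is illustrative only and is not the authors' method. It assumes the factorization $P_{i} = ab^{i}aa$, $E_{i} = ab^{i}aba^{i-2}$, $Q_{k} = ab^{k}a$ of $W_{k}$, obtains $W_{k}^{♭} #$ by replacing the last character of $W_{k}$ with #, and computes the BWT by sorting all rotations. The function names `build_w`, `bwt` and `runs` are hypothetical.

```python
def build_w(k: int) -> str:
    """Assumed definition: W_k = (prod_{i=2}^{k-1} P_i E_i) Q_k."""
    parts = []
    for i in range(2, k):
        parts.append("a" + "b" * i + "aa")                  # P_i
        parts.append("a" + "b" * i + "ab" + "a" * (i - 2))  # E_i
    parts.append("a" + "b" * k + "a")                       # Q_k
    return "".join(parts)


def bwt(word: str) -> str:
    """BWT as the last column of the lexicographically sorted rotations."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rot[-1] for rot in rotations)


def runs(text: str) -> int:
    """Number of maximal runs of equal characters."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


k = 6
w = build_w(k)
w_sharp = w[:-1] + "#"          # '#' < 'a' < 'b' in ASCII order
r_w = runs(bwt(w))
r_sharp = runs(bwt(w_sharp))
# Theorem 2 predicts 6k - 12 and 8k - 17, i.e. a difference of 2k - 5.
print(k, r_w, r_sharp, r_sharp - r_w)
```

For $k = 6$ the theorem predicts 24 and 31 runs, so the printed difference should be 7.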

Table 3. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 1.

Table 4. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 2.

Table 5. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 3.

Table 6. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 4.

Table 7. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 5.

5.2. BWT of $\bar{W_{k}}$ After Substituting a Character

In this subsection, we consider the word

\bar{W_{k}} = (\prod_{i = 2}^{k - 1} \bar{P_{i}} \bar{E_{i}}) \bar{Q_{k}} = (\prod_{i = 2}^{k - 1} {ba}^{i} bb \cdot {ba}^{i} {bab}^{i - 2}) \cdot {ba}^{k} b

, where we swapped

a

with

b

in

W_{k}

. The following series of lemmas characterize the subword of BWT(

\bar{W_{k}}

) using

M (\bar{W_{k}})

for each range we consider.

Lemma 16.

β (a^{k} b, \bar{W_{k}}) = b

.

Proof.

The first conjugate in

M (\bar{W_{k}})

is

a^{k} b \dots b .

The first conjugate in lexicographic order must start with the longest run of

a

s. By the definition of

\bar{W_{k}}

, the longest run of

a

has length k, which is obtained by

a^{k}

of

\bar{Q_{k}}

, which is preceded by a

b

. □

Lemma 17.

β (a^{i} b, \bar{W_{k}}) = {ba}^{2 k - 2 i - 1} b

for all

i \in [2 . . k - 1]

.

Proof.

With integer

i \in [2 . . k - 1]

, the conjugates starting with

a^{i} b

in

M (\bar{W_{k}})

are

\begin{matrix} a^{i} {bab}^{i - 2} \dots b & < a^{i} {bab}^{i - 1} \dots a < \dots < a^{i} {bab}^{k - 3} \dots a < a^{i} b \bar{P_{2}} \dots a \\ < a^{i} {bbba}^{k - 1} \dots a < \dots < a^{i} {bbba}^{i + 1} \dots a \\ < a^{i} {bbba}^{i} \dots b . \end{matrix}

For all

i \in [2 . . k - 1]

, the factor of

a^{i} b

can only be obtained for all

j \in [i . . k - 1]

, from

a^{i} {bab}^{j - 2}

from

\bar{E_{j}}

, or

a^{i} bb

from

\bar{P_{j}}

, and if

j = k

,

a^{i} b

from

\bar{Q_{k}}

. We can sort these conjugates according to the lexicographic order. Note that all these conjugates end with $a$ , with the exception of the two conjugates starting with $a^{i} b$ obtained from $\bar{E_{i}}$ and $\bar{P_{i}}$ , which end with $b$ . □

Lemma 18.

β (ab, \bar{W_{k}}) = {ba}^{k - 2} {baa}^{k - 5} {baaab}^{k - 5}

.

Proof.

In

M (\bar{W_{k}})

, the conjugates starting with

ab

are

\begin{matrix} abaaabb \bar{E_{3}} \dots b & < aba \bar{P_{3}} \dots a < abab \bar{P_{4}} \dots a < \dots < {abab}^{k - 3} \bar{Q_{k}} \dots a \\ < ab \bar{P_{4}} \dots b < ab \bar{P_{2}} \dots a < abb \bar{E_{k - 1}} \dots a < \dots < abb \bar{E_{5}} \dots a \\ < abb \bar{P_{5}} \dots b < abb \bar{E_{4}} \dots a < abb \bar{E_{3}} \dots a < abb \bar{E_{2}} \dots a \\ < abbb \bar{P_{6}} \dots b < \dots < {ab}^{k - 3} \bar{Q_{k}} \dots b . \end{matrix}

We have as many circular occurrences of

ab

as the number of maximal (circular) runs of

b

in

\bar{W_{k}}

. Then, for all

i \in [2 . . k - 1]

, we have three cases.

Case 1:: one run of $a$ in $\bar{P_{i}}$ ,
Case 2:: two runs of $a$ in $\bar{E_{i}}$ ,
Case 3:: one run of $a$ in $\bar{Q_{k}}$ .

For Case 1, we have one conjugate starting with

abb \bar{E_{i}}

, for each

i \in [2 . . k - 1]

. Since each run of

a

within each word of

\bar{P_{i}}

is of length at least 2, all conjugates in Case 1 end in

a

. For Case 2, for all

i \in [2 . . k - 1]

, we can distinguish between two sub-cases based on where

ab

starts.

Case 2 (a):: from the first run of $a$ in $\bar{E_{i}}$ , starting with ${abab}^{i - 2} \bar{P_{i + 1}}$ , if $i \in [2 . . k - 2]$ , or ${abab}^{k - 3} \bar{Q_{k}}$ ,
Case 2 (b):: from the second run in $\bar{E_{i}}$ , starting with ${ab}^{i - 2} \bar{P_{i + 1}}$ , if $i \in [2 . . k - 2]$ , or ${ab}^{k - 3} \bar{Q_{k}}$ .

Similarly to Case 1, each conjugate for Case 2 (a), ends with

a

. Each conjugate in Case 2 (b) is obtained by shifting two characters on the right in each conjugate in Case 2 (a). Therefore, all these conjugates end with

b

.

For Case 3, the conjugate starting with

ab

in

\bar{Q_{k}}

has

ab \bar{P_{2}}

as a prefix and is preceded by

a

. Observe that only in Case 2 (b) we have a conjugate that starts with $abaaa$ , namely the one starting with $a \bar{P_{3}}$ , and it is the smallest conjugate of $\bar{W_{k}}$ starting with $ab$ . It is followed by the conjugates with the prefix $abab$ , which start with $aba \bar{P_{3}} < abab \bar{P_{4}} < \dots < {abab}^{k - 3} \bar{Q_{k}}$ from Case 2 (a).

Among the remaining conjugates, those with the prefix $abba$ start with $ab \bar{P_{4}}$ from Case 2 (b) or $ab \bar{P_{2}}$ from Case 3. Next follow the conjugates with the prefix $abbba$ , which start with $abb \bar{E_{i}}$ from Case 1, for all $i \in [2 . . k - 1]$ , or with $abb \bar{P_{5}}$ from Case 2 (b). The last remaining conjugates start with ${ab}^{i - 2} \bar{P_{i + 1}}$ for $i \in [5 . . k - 2]$ or with ${ab}^{k - 3} \bar{Q_{k}}$ , all of which are obtained from Case 2 (b). Since ${ab}^{k - 3} \bar{Q_{k}}$ is greater than all other conjugates, it is the greatest conjugate of $\bar{W_{k}}$ starting with $ab$ and we conclude this proof. □

Lemma 19.

β (ba, \bar{W_{k}}) = {bb}^{2 k - 8} {babba}^{k - 2}

.

Proof.

The conjugates in

M (\bar{W_{k}})

that start with

ba

are

\begin{matrix} {ba}^{k} b \bar{P_{2}} \dots b & < {ba}^{k - 1} {bab}^{k - 3} \bar{Q_{k}} \dots b < {ba}^{k - 1} bb \bar{E_{k - 1}} \dots b < {ba}^{k - 2} {bab}^{k - 4} \bar{P_{k - 1}} \dots b \\ < {ba}^{k - 2} bb \bar{E_{k - 2}} \dots b < \dots < {ba}^{4} babb \bar{P_{5}} \dots b < {ba}^{4} bb \bar{E_{4}} \dots b \\ < baaabab \bar{P_{4}} \dots b < baaabb \bar{E_{3}} \dots a < baaba \bar{P_{3}} \dots b < baabb \bar{E_{2}} \dots b \\ < ba \bar{P_{3}} \dots a < bab \bar{P_{4}} \dots a < \dots < {bab}^{k - 3} \bar{Q_{k}} \dots a . \end{matrix}

For integer i, we can see that

{ba}^{i} {bab}^{i - 2}

is lexicographically smaller than

{ba}^{i} bb

. Thus, the first conjugate in lexicographic order starting with

ba

is the one followed by the longest run of

a

, and it can be found by

{ba}^{k} b

of

\bar{Q_{k}}

, followed by conjugates starting with

{ba}^{i} {bab}^{i - 2}

of

\bar{E_{i}}

and

{ba}^{i} bb

of

\bar{P_{i}}

for all

i \in [2 . . k - 1]

taken in decreasing order. By construction of

\bar{E_{i}}

, for

i \in [2 . . k - 1]

, these conjugates must end with a

b

. Likewise, the conjugates starting with $\bar{P_{i}}$ end with $b$ , with the exception of the conjugate starting with $\bar{P_{3}}$ , since it is preceded by the last character $a$ of $\bar{E_{2}}$ . The remaining conjugates starting with $ba$ are exactly those that start with

{bab}^{i - 2} \bar{P_{i + 1}}

if

i \in [2 . . k - 2]

or

{bab}^{k - 3} \bar{Q_{k}}

. All of these conjugates end with

a

, since they are preceded by

a

. □

Lemma 20.

β (bba, \bar{W_{k}}) = b^{2 k - 8} abba

.

Proof.

The conjugates starting with

bba

in

M (\bar{W_{k}})

are

\begin{matrix} b \bar{Q_{k}} \dots b & < b \bar{E_{k - 1}} \dots b < b \bar{P_{k - 1}} \dots b < \dots < b \bar{E_{5}} \dots b < b \bar{P_{5}} \dots b < b \bar{E_{4}} \dots b \\ < b \bar{P_{4}} \dots a < b \bar{E_{3}} \dots b < b \bar{E_{2}} \dots b < b \bar{P_{2}} \dots a . \end{matrix}

These conjugates are obtained by following four cases.

Case 1:: concatenating suffix $b$ of $\bar{P_{j}}$ with $\bar{E_{j}}$ for all $j \in [2 . . k - 1]$ ,
Case 2:: concatenating suffix $b$ of $\bar{E_{j}}$ with $\bar{P_{j + 1}}$ for all $j \in [3 . . k - 2]$ ,
Case 3:: concatenating suffix $b$ of $\bar{E_{k - 1}}$ with $\bar{Q_{k}}$ ,
Case 4:: concatenating suffix $b$ of $\bar{Q_{k}}$ with $\bar{P_{2}}$ .

The first conjugate in lexicographic order starting with

bba

is the one followed by the longest run of

a

. The smallest of these conjugates can be found by Case 3, concatenation of the suffix

b

of

\bar{E_{k - 1}}

with

\bar{Q_{k}}

. We can directly observe that

{bba}^{j} {bab}^{j - 2} < {bba}^{j} bb

holds for every integer

j \geq 0

. Thus, the next conjugate will have the prefix

b \bar{E_{j}}

from Case 1 and

b \bar{P_{j}}

from Case 2, alternating in decreasing order of $j$ . Since $b \bar{E_{j}}$ of Case 1 and $b \bar{Q_{k}}$ of Case 3 are preceded by a $b$ , these conjugates end with a $b$ . Similarly, $b \bar{P_{j + 1}}$ is preceded by a $b$ for all $j \in [4 . . k - 2]$ , whereas $b \bar{P_{4}}$ is preceded by an $a$ . Lastly, conjugates with the prefix

bbaaa

and

bbaa

by Case 1 end with a

b

. The greatest lexicographic conjugate is from Case 4 as it has the smallest runs of

a

which is two and ends with

a

.

We can sort all of these conjugates according to the order of the words in

{b \bar{Q_{k}}} \cup ⋃_{j = 4}^{k - 1} {b \bar{E_{j}} , b \bar{P_{j}}} \cup ⋃_{j^{'} = 2}^{3} {b \bar{E_{j^{'}}}} \cup {b \bar{P_{2}}} .

□

Lemma 21.

β (bbba, \bar{W_{k}}) = b {(ab)}^{k - 6} a^{5}

.

Proof.

The conjugates in

M (\bar{W_{k}})

starting with

bbba

are

\begin{matrix} bb \bar{Q_{k}} \dots b & < bb \bar{E_{k - 1}} \dots a < bb \bar{P_{k - 1}} \dots b < \dots < bb \bar{E_{6}} \dots a < bb \bar{P_{6}} \dots b \\ < bb \bar{E_{5}} \dots a < bb \bar{P_{5}} \dots a < bb \bar{E_{4}} \dots a < bb \bar{E_{3}} \dots a < bb \bar{E_{2}} \dots a . \end{matrix}

The conjugates starting with

bbba

arise in two cases.

Case 1:: from the concatenation of the suffix $bb$ of $\bar{E_{j - 1}}$ with the prefix $ba$ of $\bar{P_{j}}$ for all $j \in [5 . . k - 1]$ or of $\bar{Q_{k}}$ if $j = k$ ;
Case 2:: from the concatenation of the suffix $bb$ of $\bar{P_{j}}$ with the prefix $ba$ of $\bar{E_{j}}$ for all $j \in [2 . . k - 1]$ .

Thus, all conjugates starting with

bbba

are sorted according to the lexicographic order of the words in

{bb \bar{Q_{k}}} \cup ⋃_{j = 5}^{k - 1} {bb \bar{E_{j}} , bb \bar{P_{j}}} \cup ⋃_{j = 2}^{4} {bb \bar{E_{j}}} .

All conjugates starting with

bb \bar{P_{j}}

for all

j \in [6 . . k - 1]

or

bb \bar{Q_{k}}

in Case 1 end with

b

. Otherwise, conjugates starting with

bb \bar{P_{5}}

of Case 1 or

bb \bar{E_{j}}

for all

j \in [2 . . k - 1]

of Case 2 end with

a

. □

Lemma 22.

β (b^{j} a, \bar{W_{k}}) = b^{k - j - 2} a

for all

j \in [4 . . k - 2]

.

Proof.

For $j \in [4 . . k - 3]$ , a conjugate with the prefix $b^{j} a$ arises only by concatenating the suffix $b^{j - 1}$ of $\bar{E_{j^{'} - 1}}$ with $\bar{P_{j^{'}}}$ for some $j^{'} \in [j + 2 . . k - 1]$ , or the suffix $b^{j - 1}$ of $\bar{E_{k - 1}}$ with $\bar{Q_{k}}$ ; these conjugates are sorted in decreasing order of the length of the run of $a$ that follows. All of these conjugates end with a $b$ , with the exception of the conjugate starting with $b^{j - 1} \bar{P_{j + 2}}$ , which ends with an $a$ since this occurrence of $b^{j}$ is a maximal run preceded by an $a$ . For $j = k - 2$ , the only conjugate starting with $b^{k - 2} a$ starts with

b^{k - 3} \bar{Q_{k}}

and since the run of

b

is maximal it ends with

a

, and the claim follows. □

The following theorem presents the shape of the BWT of

\bar{W_{k}}

.

Theorem 3.

For every

k \geq 6, r (\bar{W_{k}}) = 6 k - 12

, cf. Table 8.

Table 8. Classification of the number of runs obtained in Theorem 3. The total number of runs is

6 k - 12

.

Proof.

Let us put the results of Lemmas 16 to 22 together. Every conjugate contributing a character to

β (a^{i} b)

is smaller than a conjugate contributing a character to

β (a^{i^{'}} b)

, for every

1 \leq i^{'} < i \leq k

. Symmetrically, every conjugate in

β (b^{j} a)

is greater than every conjugate in

β (b^{j^{'}} a)

, when

1 \leq j^{'} < j \leq k - 2

. Since we considered all the disjoint ranges of conjugates of

\bar{W_{k}}

based on their common prefix, the word

\prod_{i = 0}^{k - 1} β (a^{k - i} b) \cdot \prod_{i = 1}^{k - 2} β (b^{i} a)

is the BWT of

\bar{W_{k}}

.

With the structure of BWT(

\bar{W_{k}}

), we can derive its number of runs. The word

\prod_{i = 0}^{k - 1} β (a^{k - i} b)

has exactly

2 k + 3

runs: we start with 1 run from

β (a^{k} b)

but it is merged by a

b

from

β (a^{k - 1} b)

. Each of the words from $β (a^{k - 1} b)$ down to $β (aab)$ has 3 runs. However, the boundaries between consecutive words merge because each word starts and ends with $b$ . Thus, each $β (a^{i} b)$ for $i \in [2 . . k - 1]$ contributes only 2 runs. By counting, we observe that $β (ab)$ has 7 runs. The remaining part of the BWT, that is, $\prod_{i = 1}^{k - 2} β (b^{i} a)$ , contributes $4 k - 15$ runs: the word

β (ba)

, has 4 runs, but the first

b

merges with a

b

from

β (ab)

, so we only charge 3 runs for this word. Then,

β (bba)

and

β (bbba)

add 4 and

1 + 2 (k - 6) + 1

runs, respectively. Finally, each $β (b^{i} a)$ for $i \in [4 . . k - 3]$ adds 2 runs. The word

β (b^{k - 2} a)

does not add new runs, as it consists only of an

a

that merges with the previous one. Altogether, we have

2 (k - 2) + 7 + 3 + 4 + 1 + 2 (k - 6) + 1 + 2 (k - 6) = 6 k - 12

, and the claim holds. □
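The shape of BWT( $\bar{W_{k}}$ ) described by Lemmas 16 to 22 can be compared against a brute-force BWT for a small $k$. The following sketch is a plausibility check rather than a proof. It builds $\bar{W_{k}}$ from the factors $\bar{P_{i}} = ba^{i}bb$, $\bar{E_{i}} = ba^{i}bab^{i-2}$ and $\bar{Q_{k}} = ba^{k}b$ given at the start of this subsection, and concatenates the words from the lemmas in the order used in the proof of Theorem 3. The function names are hypothetical.

```python
def build_w_bar(k: int) -> str:
    """bar W_k = (prod_{i=2}^{k-1} bar P_i bar E_i) bar Q_k."""
    parts = []
    for i in range(2, k):
        parts.append("b" + "a" * i + "bb")                  # bar P_i
        parts.append("b" + "a" * i + "ba" + "b" * (i - 2))  # bar E_i
    parts.append("b" + "a" * k + "b")                       # bar Q_k
    return "".join(parts)


def bwt(word: str) -> str:
    """BWT as the last column of the lexicographically sorted rotations."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rot[-1] for rot in rotations)


def runs(text: str) -> int:
    """Number of maximal runs of equal characters."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


def bwt_from_lemmas(k: int) -> str:
    """Concatenate the words beta(...) of Lemmas 16 to 22 in sorted-prefix order."""
    pieces = ["b"]                                              # Lemma 16: beta(a^k b)
    for i in range(k - 1, 1, -1):                               # Lemma 17: i = k-1, ..., 2
        pieces.append("b" + "a" * (2 * k - 2 * i - 1) + "b")
    pieces.append("b" + "a" * (k - 2) + "ba" + "a" * (k - 5)
                  + "baaa" + "b" * (k - 5))                     # Lemma 18: beta(ab)
    pieces.append("b" * (2 * k - 7) + "babb" + "a" * (k - 2))   # Lemma 19: beta(ba)
    pieces.append("b" * (2 * k - 8) + "abba")                   # Lemma 20: beta(bba)
    pieces.append("b" + "ab" * (k - 6) + "a" * 5)               # Lemma 21: beta(bbba)
    for j in range(4, k - 1):                                   # Lemma 22: j in [4..k-2]
        pieces.append("b" * (k - j - 2) + "a")
    return "".join(pieces)


k = 6
print(bwt_from_lemmas(k) == bwt(build_w_bar(k)), runs(bwt(build_w_bar(k))))
```

For $k = 6$ Theorem 3 predicts $6 k - 12 = 24$ runs.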

The following lemmas describe the BWT of

\bar{W_{k}}

after applying one specific edit operation.

\bar{W_{k}^{♭}} c

is a word obtained by replacing the last character

b

of

\bar{W_{k}}

with

c

, where

c

is lexicographically larger than

b

. The number of runs in the BWT of

\bar{W_{k}^{♭}} c

can be derived by comparing the BWT of

\bar{W_{k}^{♭}} c

to the BWT of

\bar{W_{k}}

, for which we explicitly counted the number of runs, so we omit these parts of the proofs. As before, we use

M (\bar{W_{k}^{♭}} c)

, which is a list of lexicographically sorted conjugates of word

\bar{W_{k}^{♭}} c

. Substituting the last character with

c

in

\bar{W_{k}}

also increases the number of runs by

Θ (k)

.
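The claimed increase can be observed on a small instance. The sketch below assumes the factors $\bar{P_{i}} = ba^{i}bb$, $\bar{E_{i}} = ba^{i}bab^{i-2}$, $\bar{Q_{k}} = ba^{k}b$ from the beginning of this subsection and replaces the last character of $\bar{W_{k}}$ by $c$. It uses a naive rotation sort, and the helper names are hypothetical.

```python
def build_w_bar(k: int) -> str:
    """bar W_k = (prod_{i=2}^{k-1} bar P_i bar E_i) bar Q_k."""
    parts = []
    for i in range(2, k):
        parts.append("b" + "a" * i + "bb")                  # bar P_i
        parts.append("b" + "a" * i + "ba" + "b" * (i - 2))  # bar E_i
    parts.append("b" + "a" * k + "b")                       # bar Q_k
    return "".join(parts)


def bwt(word: str) -> str:
    """BWT as the last column of the lexicographically sorted rotations."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rot[-1] for rot in rotations)


def runs(text: str) -> int:
    """Number of maximal runs of equal characters."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


k = 6
w_bar = build_w_bar(k)
w_c = w_bar[:-1] + "c"      # substitute the last b by c, with a < b < c
r_bar = runs(bwt(w_bar))
r_c = runs(bwt(w_c))
print(k, r_bar, r_c, r_c - r_bar)
```

For $k = 6$ the expected values are $6 k - 12 = 24$ and $8 k - 17 = 31$, that is, an increase of $2 k - 5 = 7$ runs.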

Lemma 23.

β (a^{k} c, \bar{W_{k}^{♭}} c) = b

.

Proof.

The first conjugate in

M (\bar{W_{k}^{♭}} c)

starts with

a^{k} c \dots b

. The first conjugate in lexicographic order must start with the longest run of

a

. By the definition of

\bar{W_{k}^{♭}} c

, the longest run of

a

is obtained by suffix

a^{k} c

of

\bar{Q_{k}^{♭}} c

, preceded by a

b

. □

Lemma 24.

β (a^{i} b, \bar{W_{k}^{♭}} c) = {ba}^{2 k - 2 i - 2} b

for all

i \in [2 . . k - 1]

.

Proof.

The conjugates in

M (\bar{W_{k}^{♭}} c)

starting with the prefix

a^{i} b

for

i \in [2 . . k - 1]

are

\begin{matrix} a^{i} {bab}^{i - 2} \bar{P_{i + 1}} \dots b & < a^{i} {bab}^{i - 1} \bar{P_{i + 2}} \dots a < \dots < a^{i} {bab}^{k - 4} \bar{P_{k - 1}} \dots a < a^{i} {bab}^{k - 3} \bar{Q_{k}^{♭}} c \dots a \\ < a^{i} bb \bar{E_{k - 1}} \dots a < a^{i} bb \bar{E_{k - 2}} \dots a < \dots < a^{i} bb \bar{E_{i + 1}} \dots a \\ < a^{i} bb \bar{E_{i}} \dots b . \end{matrix}

For every integer

i \in [2 . . k - 1]

, the conjugates in

M (\bar{W_{k}^{♭}} c)

starting with

a^{i} b

can only be obtained from two cases:

Case 1:: $a^{i} {bab}^{j - 2}$ of $\bar{E_{j}}$ for all $j \in [i . . k - 1]$ ,
Case 2:: $a^{i} bb$ of $\bar{P_{j}}$ for all $j \in [i . . k - 1]$ .

We can sort these conjugates according to the lexicographic order of

⋃_{j = i}^{k - 2} {a^{i} {bab}^{j - 2} \bar{P_{j + 1}}} \cup {a^{i} {bab}^{k - 3} \bar{Q_{k}^{♭}} c} \cup ⋃_{j = i}^{k - 1} {a^{i} bb \bar{E_{j}}}

. Note that all these conjugates end with an

a

, with the exception of the conjugate starting with

a^{i} {bab}^{i - 2} \bar{P_{i + 1}}

and

a^{i} bb \bar{E_{i}}

, since only there the run of $a$ has length exactly $i$ and is thus preceded by a $b$ . □

Lemma 25.

β (a^{i} c, \bar{W_{k}^{♭}} c) = a

for all

i \in [1 . . k - 1]

.

Proof.

The only conjugate in $M (\bar{W_{k}^{♭}} c)$ starting with $a^{i} c$ for $i \in [1 . . k - 1]$ has the form $a^{i} c \bar{P_{2}} \dots a$ . For any two distinct integers $i, i^{'}$ with $i > i^{'} \geq 0$ , we have $a^{i} c < a^{i^{'}} c$ . Also, since the order on the alphabet of $\bar{W_{k}^{♭}} c$ is $a < b < c$ , it is also clear that $a^{i} b < a^{i} c$ . The conjugates starting with $a^{i} c$ are obtained from the suffix $a^{i} c$ of $\bar{Q_{k}^{♭}} c$ , and since the run of $a$ in $\bar{Q_{k}^{♭}} c$ has length $k$ , each of these conjugates with $i \in [1 . . k - 1]$ is preceded by an $a$ and thus ends with $a$ . □

Lemma 26.

β (ab, \bar{W_{k}^{♭}} c) = {ba}^{k - 2} {ba}^{k - 5} {baaab}^{k - 5}

.

Proof.

In

M (\bar{W_{k}^{♭}} c)

, the conjugates starting with

ab

are

\begin{matrix} a \bar{P_{3}} \dots b & < aba \bar{P_{3}} \dots a < abab \bar{P_{4}} \dots a < \dots < {abab}^{k - 3} \bar{Q_{k}^{♭}} c \dots a \\ < ab \bar{P_{4}} \dots b \\ < abb \bar{E_{k - 1}} \dots a < \dots < abb \bar{E_{5}} \dots a \\ < abb \bar{P_{5}} \dots b < abb \bar{E_{4}} \dots a < abb \bar{E_{3}} \dots a < abb \bar{E_{2}} \dots a \\ < abbb \bar{P_{6}} \dots b < \dots < {ab}^{k - 3} \bar{Q_{k}^{♭}} c \dots b . \end{matrix}

We have as many circular occurrences of

ab

as the number of maximal runs of

a

in

\bar{W_{k}^{♭}} c

. Then, for all

i \in [2 . . k - 1]

, we have two cases.

Case 1:: one run in $\bar{P_{i}}$ obtained by concatenating suffix $abb$ of $\bar{P_{i}}$ with $\bar{E_{i}}$ , for each $i \in [2 . . k - 1]$ , and
Case 2:: two runs in $\bar{E_{i}}$ .

For Case 1, since each run of

a

within each word of

⋃_{i = 2}^{k - 1} abb \bar{E_{i}}

is of length of at least 2, all conjugates in Case 1 end with an

a

.

For Case 2, for all

i \in [2 . . k - 1]

, we can distinguish between two sub-cases, based on where

ab

starts, if either

Case 2 (a):: from the first $a$ in $\bar{E_{i}}$ or
Case 2 (b):: from the second $a$ in $\bar{E_{i}}$ .

For Case 2 (a), we can see that these conjugates are of the type

{abab}^{i - 2} \bar{P_{i + 1}}

if

i \in [2 . . k - 2]

or

{abab}^{k - 3} \bar{Q_{k}^{♭}} c

. Similarly to Case 1, each conjugate for Case 2 (a) ends with

a

. Each conjugate in Case 2 (b) is obtained by shifting two characters on the right each conjugate in Case 2 (a). Therefore, all of these conjugates end with a

b

and have prefix of type

{ab}^{i - 2} \bar{P_{i + 1}}

, if

i \in [2 . . k - 2]

or

{ab}^{k - 3} \bar{Q_{k}^{♭}} c

. All these conjugates end with a

b

since

a

is preceded by

b

. Observe that only in Case 2 (b) we have a conjugate starting with $abaaab$ , namely $a \bar{P_{3}}$ . Hence, it is the first conjugate in lexicographic order, followed by those starting with $aba \bar{P_{3}} < abab \bar{P_{4}} < \dots < {abab}^{k - 3} \bar{Q_{k}^{♭}} c$ from Case 2 (a), all of which start with

abab

.

Next follows the conjugate with the prefix $abba$ , which is $ab \bar{P_{4}}$ from Case 2 (b). Then follow the conjugates with the prefix $abbba$ : first those starting with $abb \bar{E_{i}}$ for all $i \in [5 . . k - 1]$ from Case 1, in decreasing order of $i$ . Then,

abb \bar{P_{5}}

from Case 2 (b) and

abb \bar{E_{4}}

,

abb \bar{E_{3}}

,

abb \bar{E_{2}}

from Case 1 follow.

The remaining conjugates are those which start with a prefix of

{ab}^{i} a

for

i \in [4 . . k - 2]

, which are obtained by

{ab}^{i - 1} \bar{P_{i + 2}}

if

i \in [4 . . k - 3]

or

{ab}^{k - 3} \bar{Q_{k}^{♭}} c

, from Case 2 (b). These conjugates are sorted by increasing length of the run of $b$ that follows the leading $a$ . Then, the result is

\begin{matrix} {a \bar{P_{3}}} & \cup ⋃_{j = 2}^{k - 2} {{abab}^{j - 2} \bar{P_{j + 1}}} \cup {{abab}^{k - 3} \bar{Q_{k}^{♭}} c} \cup {ab \bar{P_{4}}} \cup ⋃_{i = 0}^{k - 6} {abb \bar{E_{k - i - 1}}} \\ \cup {abb \bar{P_{5}}} \cup ⋃_{i = 0}^{2} {abb \bar{E_{4 - i}}} \cup ⋃_{j = 4}^{k - 3} {{ab}^{j - 1} \bar{P_{j + 2}}} \cup {{ab}^{k - 3} \bar{Q_{k}^{♭}} c} . \end{matrix}

□

Lemma 27.

β (ba, \bar{W_{k}^{♭}} c) = b^{2 k - 6} {abca}^{k - 2}

.

Proof.

The conjugates in

M (\bar{W_{k}^{♭}} c)

starting with the prefix

ba

are

\begin{matrix} {ba}^{k} c \bar{P_{2}} \dots b & < {ba}^{k - 1} {bab}^{k - 3} \bar{Q_{k}^{♭}} c \dots b < {ba}^{k - 1} bb \bar{E_{k - 1}} \dots b < \dots < {ba}^{4} babb \bar{P_{5}} \dots b \\ < {ba}^{4} bb \bar{E_{4}} \dots b < {ba}^{3} bab \bar{P_{4}} \dots b < {ba}^{3} bb \bar{E_{3}} \dots a < baaba \bar{P_{3}} \dots b \\ < baabb \bar{E_{2}} \dots c < ba \bar{P_{3}} \dots a < bab \bar{P_{4}} \dots a < \dots < {bab}^{k - 3} \bar{Q_{k}^{♭}} c \dots a . \end{matrix}

The conjugates starting with the prefix

ba

arise in three cases.

Case 1:: one occurrence of $ba$ in $\bar{P_{j}}$ , for all $j \in [2 . . k - 1]$ ,
Case 2:: two occurrences of $ba$ in $\bar{E_{j}}$ , for all $j \in [2 . . k - 1]$ ,
Case 3:: one occurrence of $ba$ in $\bar{Q_{k}^{♭}} c$ .

The conjugates in Case 1 start with

{ba}^{j}

for all

j \in [2 . . k - 1]

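. Since ${ba}^{i} b < {ba}^{i^{'}} b$ whenever $i > i^{'}$ , the conjugates are sorted in decreasing order of $j$ . All conjugates for $j \in [4 . . k - 1]$ end with a $b$ ; the conjugate starting with $\bar{P_{3}}$ ends with an $a$ , since it is preceded by the last character $a$ of $\bar{E_{2}}$ , and the conjugate starting with $\bar{P_{2}}$ ends with $c$ , since it is preceded by $c$ .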

In Case 2, we can distinguish between two sub-cases based on where

ba

starts:

Case 2 (a):: first run of $ba$ from the prefix of $\bar{E_{j}}$ ,
Case 2 (b):: from the second run of $ba$ in $\bar{E_{j}}$ .

The conjugates in Case 2 (a) are the type of

{ba}^{j} {bab}^{j - 2} \bar{P_{j + 1}}

, if

j \in [2 . . k - 2]

or

{ba}^{k - 1} {bab}^{k - 3} \bar{Q_{k}^{♭}} c

. All of these conjugates are preceded by

\bar{P_{j}}

, thus ending with

b

. The conjugates in Case 2 (b) start from

{bab}^{j - 2} \bar{P_{j + 1}}

if

j \in [2 . . k - 2]

or

{bab}^{k - 3} \bar{Q_{k}^{♭}} c

and end with an

a

.

In Case 3, only one conjugate can be found by a prefix of

{ba}^{k} c

, which ends with

b

.

Observe that only for Case 3 we have a conjugate with the longest run of

a

after

b

. Hence, the first conjugate in lexicographic order is

{ba}^{k} c \bar{P_{2}}

from Case 3. It is followed by

{ba}^{k - 1} {bab}^{k - 3} \bar{Q_{k}^{♭}} c < {ba}^{k - 1} bb \bar{E_{k - 1}} < {ba}^{k - 2} {bab}^{k - 4} \bar{P_{k - 1}} < {ba}^{k - 2} bb \bar{E_{k - 2}} < \dots < {ba}^{4} babb \bar{P_{5}} < {ba}^{4} bb \bar{E_{4}}

. All of these conjugates end with a

b

.

Among the remaining conjugates, those having prefix

baaab

either start with

baaabab \bar{P_{4}}

from Case 2 (a) or

baaabb \bar{E_{3}}

from Case 1. Then, the remaining conjugates with prefix

baab

are those starting with

baaba \bar{P_{3}}

from Case 2 (a) or

baabb \bar{E_{2}}

from Case 1. Lastly,

k - 2

conjugates from Case 2 (b) follow, which are

{bab}^{j - 2} \bar{P_{j + 1}}

for all

j \in [2 . . k - 2]

or

{bab}^{k - 3} \bar{Q_{k}^{♭}} c

. All of these conjugates end with an

a

.

We prove our claim by sorting lexicographically the conjugates in

{{ba}^{k} c \bar{P_{2}}} \cup ⋃_{j = 0}^{k - 3} {{ba}^{k - j - 1} {bab}^{k - j - 3} \bar{P_{k - j}} \cdot {ba}^{k - j - 1} bb \bar{E_{k - j - 1}}} \cup ⋃_{j = 2}^{k - 2} {{bab}^{j - 2} \bar{P_{j + 1}}} \cup {{bab}^{k - 3} \bar{Q_{k}^{♭}} c} .

□

Lemma 28.

β (bba, \bar{W_{k}^{♭}} c) = b^{2 k - 8} abb

.

Proof.

The conjugates in

M (\bar{W_{k}^{♭}} c)

starting with prefix

bba

are

\begin{matrix} {bba}^{k} c \bar{P_{2}} \dots b & < {bba}^{k - 1} {bab}^{k - 3} \bar{Q_{k}^{♭}} c \dots b < {bba}^{k - 1} bb \bar{E_{k - 1}} \dots b < \dots \\ < {bba}^{5} {bab}^{3} \bar{P_{6}} \dots b < {bba}^{5} bb \bar{E_{5}} \dots b < {bba}^{4} babb \bar{P_{5}} \dots b \\ < {bba}^{4} bb \bar{E_{4}} \dots a < {bba}^{3} bab \bar{P_{4}} \dots b < bbaaba \bar{P_{3}} \dots b . \end{matrix}

The conjugates with the prefix

bba

arise in three cases.

Case 1:: concatenating suffix $b$ of $\bar{E_{k - 1}}$ with $\bar{Q_{k}^{♭}} c$ ,
Case 2:: concatenation of suffix $b$ of $\bar{E_{j}}$ with $\bar{P_{j + 1}}$ if $j \in [3 . . k - 2]$ or $\bar{Q_{k}^{♭}} c$ ,
Case 3:: concatenating suffix $b$ of $\bar{P_{j}}$ with $\bar{E_{j}}$ , for all $j \in [2 . . k - 1]$ .

The conjugates in Case 1 and Case 3 end with

b

. Also, conjugates from Case 2 end with

b

with an exception of a conjugate starting with

b \bar{P_{4}}

since it is preceded by an

a

. We conclude this proof by sorting lexicographically the conjugates in

{{bba}^{k} c \bar{P_{2}}} \cup ⋃_{j = 0}^{k - 5} {{bba}^{k - j - 1} {bab}^{k - j - 3} \bar{P_{k - j}} , {bba}^{k - j - 1} bb \bar{E_{k - j - 1}}} \cup ⋃_{j = 0}^{1} {{bba}^{3 - j} {bab}^{1 - j} \bar{P_{4 - j}}} .

□

Lemma 29.

β (bbba, \bar{W_{k}^{♭}} c) = b {(ab)}^{k - 6} aaaaa

.

Proof.

The conjugates in

M (\bar{W_{k}^{♭}} c)

starting with the prefix

bbba

are

\begin{matrix} bb \bar{Q_{k}^{♭}} c \dots b & < bb \bar{E_{k - 1}} \dots a < bb \bar{P_{k - 1}} \dots b < \dots < bb \bar{E_{6}} \dots a < bb \bar{P_{6}} \dots b \\ < bb \bar{E_{5}} \dots a < bb \bar{P_{5}} \dots a < bb \bar{E_{4}} \dots a < bb \bar{E_{3}} \dots a < bb \bar{E_{2}} \dots a . \end{matrix}

Analogously to Lemma 28, the conjugates starting with

bbba

can be obtained from three cases.

Case 1:: concatenating suffix $bb$ of $\bar{E_{k - 1}}$ with $\bar{Q_{k}^{♭}} c$ ,
Case 2:: concatenation of suffix $bb$ of $\bar{E_{j}}$ with $\bar{P_{j + 1}}$ if $j \in [4 . . k - 2]$ or $\bar{Q_{k}^{♭}} c$ ,
Case 3:: concatenating a suffix $bb$ of $\bar{P_{j}}$ with $\bar{E_{j}}$ , for all $j \in [2 . . k - 1]$ .

The conjugate in Case 1 is the smallest conjugate starting with

bbba

since it has a longest run of

a

and ends with a

b

. In addition, the conjugates of Case 3 end with an

a

since

bb

are preceded by an

a

. In Case 2, all the conjugates end with

b

with an exception of a conjugate starting with

bb \bar{P_{5}}

since it is preceded by an

a

. We can sort these conjugates by

{bb \bar{Q_{k}^{♭}} c} \cup ⋃_{j = 0}^{k - 6} {{bbba}^{k - j - 1} {bab}^{k - j - 3} \bar{P_{k - j}} , {bbba}^{k - j - 1} bb \bar{E_{k - j - 1}}} \cup ⋃_{j = 0}^{2} {{bbba}^{4 - j} {bab}^{2 - j} \bar{P_{5 - j}}} .

□

Lemma 30.

β (b^{j} a, \bar{W_{k}^{♭}} c) = b^{k - j - 2} a

for all

j \in [4 . . k - 2]

.

Proof.

In

M (\bar{W_{k}^{♭}} c)

, the conjugates starting with prefix

b^{j} a

for all

j \in [4 . . k - 2]

are

\begin{matrix} b^{j - 1} \bar{Q_{k}^{♭}} c \dots b < b^{j - 1} \bar{P_{k - 1}} \dots b < b^{j - 1} \bar{P_{k - 2}} \dots b < \dots < b^{j - 1} \bar{P_{j + 3}} \dots b < b^{j - 1} \bar{P_{j + 2}} \dots a . \end{matrix}

Observe that the only conjugates with the prefix

b^{j} a

for

j \in [4 . . k - 2]

start with concatenating

b^{j - 1}

either to

\bar{Q_{k}^{♭}} c

or

\bar{P_{j^{'}}}

if

j^{'} \in [j + 2 . . k - 1]

. One can see that these conjugates taken in this order are already sorted, and all conjugates end with a

b

, with the exception of a conjugate starting with

b^{j - 1} \bar{P_{j + 2}}

, since it is preceded by an

a

, therefore ending with an

a

. We have all conjugates ordered according to the lexicographic order of the words in

b^{j - 1} \bar{Q_{k}^{♭}} c \cup ⋃_{j^{'} = 0}^{k - j - 3} {b^{j - 1} \bar{P_{k - j^{'} - 1}}}

. This concludes our proof. □

Lemma 31.

β (c, \bar{W_{k}^{♭}} c) = a

.

Proof.

The only conjugate in

M (\bar{W_{k}^{♭}} c)

that starts with prefix

c

is

c \bar{P_{2}} \dots a

. Since

c

is lexicographically larger than the other characters

a

and

b

, this is the largest conjugate in

M (\bar{W_{k}^{♭}} c)

, and it ends with an

a

. □

The following theorem puts the lemmas above together.

Theorem 4.

Substituting the last character

b

of

\bar{W_{k}}

by

c

increases r by

2 k - 5

, cf. Table 9.

Table 9. Classification of the number of runs obtained in Theorem 4. The total number of runs is

8 k - 17

.

Proof.

Every conjugate contributing a character to

β (a^{i} b)

is smaller than a conjugate contributing a character to

β (a^{i^{'}} b)

for every

1 \leq i^{'} < i \leq k - 1

. By symmetry, every conjugate contributing a character to

β (b^{j} a)

is greater than each conjugate contributing a character to

β (b^{j^{'}} a)

for every

1 \leq j^{'} < j \leq k - 2

. With the structure of the BWT of

(\bar{W_{k}^{♭}} c)

, we can easily derive its number of runs.

β (a^{k} c) \cdot β (a^{k - 1} c) \cdot \prod_{i = 1}^{k - 2} β (a^{i} b) \cdot β (a^{i} c)

has exactly

4 k - 2

runs: we start from 1 run from

β (a^{k} c)

but it is merged with

β (a^{k - 1} b)

.

β (a^{k - 1} b)

and

β (a^{k - 1} c)

add 2 runs. Then, concatenating each

β (a^{i} b)

and

β (a^{i} c)

for all

i \in [2 . . k - 2]

in a decreasing order, we add 3 and 1 runs each, which results in

4 (k - 3)

runs. By counting, we observe that

β (ab), β (ac)

add 7 and 1 runs, respectively.

The words

β (ba), β (bba), β (bbba)

have exactly 5, 3, and

2 k - 10

runs, respectively. However, since the runs at the boundary between

β (bba)

and

β (bbba)

merge, the first

b

of

β (bbba)

does not count as a new run, so the last count becomes

2 k - 11

. The remaining part of the BWT, that is,

\prod_{j = 4}^{k - 3} β (b^{j} a) \cdot β (b^{k - 2} a) \cdot β (c)

has

2 k - 12

runs: we concatenate the words from

β (b^{4} a)

up to

β (b^{k - 3} a)

, each of which adds 2 runs. The final part

β (b^{k - 2} a), β (c)

does not add new runs, as it consists only of an

a

that merges with the previous one. Altogether, we have

2 + 4 (k - 3) + 7 + 1 + 5 + 3 + 2 k - 11 + 2 (k - 6) = 8 k - 17

, and the claim holds.

The main difference between

\bar{W_{k}}

and

\bar{W_{k}^{♭}} c

comes from

a^{i} b

that is concatenated with

a^{i} c

for

i \in [2 . . k - 1]

, which repeats

baba

, while

\bar{W_{k}}

repeats

ba

only, making

2 k - 5 = Θ (k)

more runs. Table 10, Table 11, Table 12 and Table 13 describe the scheme of the BWT of word

\bar{W_{k}^{♭}} c

. We have

r (\bar{W_{k}^{♭}} c) = r (\bar{W_{k}}) + 2 k - 5

. From Definition 1, we have

k = Θ (\sqrt{n})

. Thus,

r (\bar{W_{k}^{♭}} c) - r (\bar{W_{k}}) = 2 k - 5 = Θ (\sqrt{n})

. □

Table 10. Lexicographically sorted conjugates of

\bar{W_{k}^{♭}} c

studied in Theorem 4, Part 1.

Table 11. Lexicographically sorted conjugates of

\bar{W_{k}^{♭}} c

studied in Theorem 4, Part 2.

Table 12. Lexicographically sorted conjugates of

\bar{W_{k}^{♭}} c

studied in Theorem 4, Part 3.

Table 13. Lexicographically sorted conjugates of

\bar{W_{k}^{♭}} c

studied in Theorem 4, Part 4.

6. Multiplicative Sensitivity of $ρ$ by $Ω (log n)$

Recall that

ρ (W) = runs (BBWT (W))

. In this section, we return our attention to Fibonacci words. Similar to Section 4, we use them to construct a family of words with a multiplicative sensitivity of

Ω (log n)

for the number of runs

ρ

in the BBWT. Before that, we start with some helpful lemmas known in the literature.

Lemma 32.

([41], Lemma 3). The

2 k

th Fibonacci word

F_{2 k}

is

X_{2 k} ab

. The Lyndon conjugate of the Fibonacci word

F_{2 k}

is

L_{2 k} = a X_{2 k} b

.

Lemma 33.

([37], Lemma 6). We let

L_{2 k}

be the Lyndon conjugate of the Fibonacci word

F_{2 k}

. Then,

r (BBWT (L_{2 k})) = 2

.

Lemma 34.

([41], Lemma 8). If

k < n

, then the Lyndon conjugate of

F_{k}

is a prefix or a suffix of

a X_{n} b

. If

F_{k} = X_{k} ba

, then its Lyndon conjugate

a X_{k} b

is a prefix of

a X_{n} b

; and if

F_{k} = X_{k} ab

, then its Lyndon conjugate

a X_{k} b

is a suffix of

a X_{n} b

.

The next lemma addresses the extended Burrows–Wheeler transform [16], which takes a subset of steps from the BBWT by expecting the input to be a set of primitive words (i.e., the Lyndon factors in case of the BBWT). We translate the following known result to the BBWT:

Lemma 35.

(Corollary 4 of [47]). We let

{T_{1}, \dots, T_{m}}

be a conjugate-free set of primitive words and let

r^{'}

be the number of runs of its extended Burrows–Wheeler transform. Then,

m \leq r^{'}

.

Corollary 1.

We let

T_{1}, \dots, T_{m}

be the distinct Lyndon factors of a word T. Then,

m \leq ρ (T)

.
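Corollary 1 can be checked experimentally on small inputs. The following is a minimal Python sketch, assuming a non-empty input string: it computes the Lyndon factorization with Duval's algorithm and the BBWT by sorting all conjugates of all Lyndon factors according to their infinite repetitions. The helper names (`lyndon_factorization`, `bbwt`, `runs`, `rho`) are our own and hypothetical, and the quadratic sorting is chosen for readability, not efficiency.

```python
def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bbwt(s):
    """Last characters of all conjugates of all Lyndon factors, sorted in omega-order."""
    factors = lyndon_factorization(s)
    # Two distinct infinite repetitions u^inf and v^inf differ within |u| + |v|
    # characters (Fine and Wilf), so a prefix of this width suffices as sort key.
    width = 2 * max(len(f) for f in factors)
    rows = []
    for f in factors:
        for p in range(len(f)):
            conj = f[p:] + f[:p]
            key = (conj * (width // len(conj) + 1))[:width]
            rows.append((key, conj[-1]))
    rows.sort()
    return "".join(last for _, last in rows)


def runs(s):
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


def rho(s):
    return runs(bbwt(s))


for word in ["banana", "aaaa"]:
    m = len(set(lyndon_factorization(word)))  # number of distinct Lyndon factors
    print(word, lyndon_factorization(word), bbwt(word), m, rho(word))
    assert m <= rho(word)
```

The word aaaa shows why the bound counts distinct factors: it has four Lyndon factors, all equal to a, while its BBWT consists of a single run.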

In what follows, we establish a lower bound on the multiplicative sensitivity of

ρ

with the Lyndon conjugates of Fibonacci words by leveraging Corollary 1.

6.1. Editing the Last Position of $L_{2 k}$

We start with deleting the last character of

L_{2 k}

, which directly leads to the following insight.

Theorem 5.

ρ (L_{2 k}^{♭}) \geq k

.

Proof.

L_{2 k}^{♭} = a X_{2 k}

is not a Lyndon word; therefore, its Lyndon factorization has more than one factor. According to Lemma 32, the Lyndon conjugate of the Fibonacci word

F_{2 k} = X_{2 k} ab

is

a X_{2 k} b

. The central word

X_{2 k}

is

X_{2 k - 1} b a X_{2 k - 2}

, so the Lyndon conjugate of

F_{2 k}

is

a X_{2 k} b = a X_{2 k - 1} ba X_{2 k - 2} b

. Its prefix

a X_{2 k - 1} b

is

L_{2 k - 1}

, and its suffix

a X_{2 k - 2} b

is

L_{2 k - 2}

.

However, by deleting the last character

b

,

L_{2 k}^{♭}

becomes

a X_{2 k - 1} ba X_{2 k - 2}

, meaning that the suffix

L_{2 k - 2}

no longer occurs. Thus,

L_{2 k - 1}

is one of the Lyndon factors, since it is no longer followed by

L_{2 k - 2}

. The remaining part of

L_{2 k}^{♭}

is

a X_{2 k - 2}

. The same as

X_{2 k}

, central word

X_{2 k - 2}

can be divided as

X_{2 k - 3} ba X_{2 k - 4}

; thus,

a X_{2 k - 2} = a X_{2 k - 3} ba X_{2 k - 4}

. We can find Lyndon factor

L_{2 k - 3} = a X_{2 k - 3} b

in the prefix. The remaining part is

a X_{2 k - 4}

, which, like

a X_{2 k - 2}

above, is not a Lyndon word. Hence, the factorization of

a X_{2 k - 4}

starts with the Lyndon factor

L_{2 k - 5}

, and the remaining

a X_{2 k - 6}

starts with the Lyndon factor

L_{2 k - 7}

. This continues until, finally,

a X_{4}

is divided as

a X_{3} ba X_{2}

, where

X_{2}

is

ε

. Therefore, the Lyndon factors of

L_{2 k}^{♭}

are

L_{2 i - 1}

for every

i \in [2 . . k]

, together with the last remaining character

a

, which is a Lyndon word by itself. In total,

L_{2 k}^{♭}

has k Lyndon factors, as depicted in Figure 6. □

Figure 6. Factorization of

L_{2 k}^{♭}

into Lyndon factors studied in the proof of Theorem 5.

L_{2 k}^{♭}

has k Lyndon factors.

By Lemma 33 and Theorem 5, we conclude that the multiplicative sensitivity for deleting the last character of

L_{2 k}

is

Ω (k)

.
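The factorization in the proof of Theorem 5 can be reproduced for small k. The sketch below assumes the indexing F_1 = a, F_2 = ab, F_i = F_{i-1} F_{i-2}, which we inferred from the proof (X_2 is empty and the last factor is a); the function names are our own. It checks that deleting the last character of L_{2k} leaves exactly the Lyndon factors L_{2k-1}, L_{2k-3}, ..., L_3 followed by a.

```python
def fib_word(i):
    """F_1 = a, F_2 = ab, F_i = F_{i-1} F_{i-2} (indexing is an assumption)."""
    prev, cur = "a", "ab"
    if i == 1:
        return prev
    for _ in range(i - 2):
        prev, cur = cur, cur + prev
    return cur


def fib_lyndon(i):
    """L_1 = a, and L_i = a X_i b for i >= 2, where X_i is F_i without its last two characters."""
    return "a" if i == 1 else "a" + fib_word(i)[:-2] + "b"


def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def deleted_last(k):
    """L_{2k} with its last character b removed."""
    return fib_lyndon(2 * k)[:-1]


for k in range(2, 8):
    factors = lyndon_factorization(deleted_last(k))
    expected = [fib_lyndon(2 * i - 1) for i in range(k, 1, -1)] + ["a"]
    assert factors == expected
    print(k, len(factors), [len(f) for f in factors])
```

Since all k factors are pairwise distinct, Corollary 1 applies directly to this factorization.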

Theorem 6.

We let

L_{2 k}^{♭} #

be the word obtained by substituting the last character

b

of

L_{2 k}

by #. Then,

ρ (L_{2 k}^{♭} #) \geq k + 1 .

Proof.

Since # is lexicographically smaller than

a

,

L_{2 k}^{♭} #

is not a Lyndon word, so its Lyndon factorization has several factors. Since # is smaller than both

a

and

b

, the single character # forms a Lyndon factor of its own. In addition,

L_{2 k}^{♭}

is factorized as in the proof of Theorem 5, which produces the Lyndon factors

L_{2 i - 1}

for

i \in [2 . . k]

and the last Lyndon factor

a

. Thus,

L_{2 k}^{♭} #

has one more Lyndon factor, namely #, giving

k + 1

Lyndon factors in total. We depict the Lyndon factorization in Figure 7. □

Figure 7. Factorization of

L_{2 k}^{♭} #

into Lyndon factors studied in the proof of Theorem 6.

L_{2 k}^{♭} #

has

k + 1

Lyndon factors.

By Lemma 33 and Theorem 6, we conclude that the multiplicative sensitivity for substituting the last character of

L_{2 k}

is

Ω (k)

. We observe a similar result when substituting the last character with a larger character instead of a smaller one (#).

Theorem 7.

ρ (L_{2 k}^{♭} c) \geq k

.

Proof.

The lexicographic order between

a

,

b

, and

c

is

a < b < c

. Recall that

L_{2 k}^{♭}

makes

L_{2 i - 1}

for

i \in [2 . . k]

, and

a

as a Lyndon factor. In

L_{2 k}^{♭} c

,

c

is in position

f_{2 k}

; therefore, it does not affect anything until the last Lyndon factor

a

.

ac

is the Lyndon word itself because

a < c

. Therefore,

L_{2 k}^{♭} c

makes a k number of Lyndon factors, shown in Figure 8. □

Figure 8. Factorization of

L_{2 k}^{♭} c

into Lyndon factors studied in the proof of Theorem 7.

L_{2 k}^{♭} c

has k Lyndon factors.

6.2. Insertions at Specific Locations

According to Corollary 1,

ρ

is lower bounded by the number of distinct Lyndon factors. After editing

L_{2 k}

at any position, we can still find consecutive Lyndon conjugates of lower order which can merge to a higher order. For instance,

L_{2 k - 1} \cdot L_{2 k - 2}

merges into

L_{2 k}

, which decreases the number of Lyndon factors. Also,

L_{2 k - 3} \cdot L_{2 k - 2}

merges into

L_{2 k - 1}

. Our idea is to avoid consecutive Fibonacci Lyndon conjugates so that they do not merge, since merging would decrease the number of distinct Lyndon factors. We therefore edit specific positions of Fibonacci Lyndon conjugates, which again increases the number of runs. The following theorems describe the bijective BWT of

L_{2 k}

after some specific edit operations are applied.

Theorem 8.

We let

L_{2 k}

be a Fibonacci Lyndon conjugate. By inserting # at position α in

L_{2 k}

, ρ is at least k.

Proof.

We let

α

be the sum of the odd-indexed Fibonacci numbers

f_{2 k - 3} + f_{2 k - 5} + \dots + f_{3} + f_{1}

. Recall that the Fibonacci word

F_{i} = X_{i} c

with

c \in {ab, ba}

has the Lyndon conjugate

L_{i} = a X_{i} b

. Further,

L_{2 k} = L_{2 k - 1} \cdot L_{2 k - 2} = a X_{2 k - 1} b \cdot a X_{2 k - 2} b

. Thus, we start with

a X_{2 k - 1} b \cdot a X_{2 k - 2} b

. To obtain many distinct Lyndon factors, we aim to produce Lyndon factors that are not consecutive. Knowing

X_{2 k - 1} = X_{2 k - 3} ba X_{2 k - 2}

,

a X_{2 k - 2}

merges with

a X_{2 k - 3} b

into

L_{2 k - 1}

, so it is best to divide

X_{2 k - 2}

.

a X_{2 k - 2}

divides into

a X_{2 k - 3} ba X_{2 k - 4}

. In this case, it is best to add

a X_{2 k - 3} b

as a new Lyndon factor since it is smallest among those Lyndon factors that are not consecutive with

X_{2 k - 1}

, the same as

a X_{2 k - 2}

;

a X_{2 k - 4}

divides into

a X_{2 k - 5} ba X_{2 k - 6}

, and we add

a X_{2 k - 5} b

as a Lyndon factor.

a X_{2 k - 6}

divides into

a X_{2 k - 7} ba X_{2 k - 8}

as we add

a X_{2 k - 7} b

as a Lyndon factor. The addition of Lyndon factors of

2 i - 1

for

i \in [1 . . k - 1]

continues until

a X_{5} = a X_{3} ba X_{4}

appears since

a X_{3} b

is the second smallest Lyndon factor in Fibonacci. Thus, we need

L_{1} = a

as the last Lyndon factor, which is obtained by inserting a # in

a X_{4}

, dividing

a X_{4}

into

a # X_{4}

. Since # is lexicographically smaller than every other character, # together with everything to its right forms a single Lyndon factor. Thus, we obtain k Lyndon factors by inserting # in

L_{2 k}

:

k - 1

factors from

L_{2 k - 3} \cdot L_{2 k - 5} \dots L_{1}

and one from # concatenated with the remaining characters. This is shown in Figure 9. □

Figure 9. Inserting # at position

α

in

L_{2 k}

considered in the proof in Theorem 8.

By Lemma 33 and Theorem 8, we conclude that the multiplicative sensitivity for inserting a character into

L_{2 k}

is

Ω (k)

. In the same way, we can also insert the special character # to observe a similar behavior:

Theorem 9.

We let

L_{2 k}

be a Fibonacci Lyndon conjugate. By inserting # at position

f_{2 k} - 2

in

L_{2 k}

, ρ is at least

k + 1

.

Proof.

Unlike Theorem 8, we can obtain some Lyndon factors on the right side of

a X_{2 k} b

, adding

a X_{2 k - 1} b = L_{2 k - 1}

as a Lyndon factor. We divide

a X_{2 k - 2} b

into

a X_{2 k - 3} ba X_{2 k - 4} b

and obtain

a X_{2 k - 3} b = L_{2 k - 3}

. Further, we divide

a X_{2 k - 4} b

into

a X_{2 k - 5} ba X_{2 k - 6} b

, making

L_{2 k - 5}

. We divide

a X_{2 k - 6} b

and can obtain Lyndon factors such as

L_{2 k - 7} \dots L_{5}

. Lastly,

a X_{4} b

divides into

a X_{3} ba X_{2} b

, but since

X_{2}

is

ε

, the last Lyndon factor obtained here is

L_{3}

. To make more Lyndon factors, we can add # between

a

and

b

, turning into

a # b

, adding 2 Lyndon factors which are

a = L_{1}

and

# b

. Thus, we can obtain

k + 1

Lyndon factors here: k factors by

L_{2 k - 1}, L_{2 k - 3} \dots L_{1}

and one from

# b

. We visualize the Lyndon factorization in Figure 10. □

Figure 10. Insertion of # at position

f_{2 k} - 2

in

L_{2 k}

raises

ρ

to at least the number of distinct Lyndon factors

k + 1

studied in Theorem 9.
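Both insertions can be replayed on small instances. The following sketch assumes the indexing F_1 = a, F_2 = ab (inferred from the proofs) and uses our own helper names. Following the proof of Theorem 8, # is inserted after the first α = f_1 + f_3 + ... + f_{2k-3} characters. Following the proof of Theorem 9, # is inserted between the final a and b of L_{2k}. How these places map onto the position indices stated in the theorems depends on the indexing convention, which we do not assume here.

```python
def fib_word(i):
    """F_1 = a, F_2 = ab, F_i = F_{i-1} F_{i-2} (indexing is an assumption)."""
    prev, cur = "a", "ab"
    if i == 1:
        return prev
    for _ in range(i - 2):
        prev, cur = cur, cur + prev
    return cur


def fib_lyndon(i):
    """L_1 = a, and L_i = a X_i b for i >= 2, where X_i is F_i without its last two characters."""
    return "a" if i == 1 else "a" + fib_word(i)[:-2] + "b"


def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def insert_hash_alpha(k):
    """L_{2k} with # inserted after its first alpha characters (proof of Theorem 8)."""
    word = fib_lyndon(2 * k)
    alpha = sum(len(fib_word(i)) for i in range(1, 2 * k - 2, 2))  # f_1 + f_3 + ... + f_{2k-3}
    return word[:alpha] + "#" + word[alpha:]


def insert_hash_end(k):
    """L_{2k} with # inserted between its final a and b (proof of Theorem 9)."""
    word = fib_lyndon(2 * k)
    return word[:-1] + "#" + word[-1:]


for k in range(2, 8):
    f8 = lyndon_factorization(insert_hash_alpha(k))
    f9 = lyndon_factorization(insert_hash_end(k))
    # k - 1 factors L_{2k-3}, ..., L_1 followed by one factor starting with #
    assert f8[:-1] == [fib_lyndon(2 * i - 1) for i in range(k - 1, 0, -1)]
    assert f8[-1].startswith("#")
    # k factors L_{2k-1}, ..., L_1 followed by the factor #b
    assert f9 == [fib_lyndon(2 * i - 1) for i in range(k, 0, -1)] + ["#b"]
    print(k, len(f8), len(f9))
```

In both cases all factors are pairwise distinct, so their number is a lower bound on ρ by Corollary 1.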

7. Additive Sensitivity of $ρ$ by $Ω (\sqrt{n})$

Here, we study the additive sensitivity of

ρ

with an approach similar to Section 5. In what follows, we establish that the additive sensitivity of

ρ

is

Ω (\sqrt{n})

. To that end, we again make use of the word

W_{k}

. Recall that

W_{k} = (\prod_{i = 2}^{k - 1} P_{i} E_{i}) Q_{k} = (\prod_{i = 2}^{k - 1} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) {ab}^{k} a

.
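The relation between k and the length n of the word can be made explicit by summing the lengths of the building blocks. The following worked computation is our own and reads the definition above as P_i = a b^i aa, E_i = a b^i ab a^{i-2}, and Q_k = a b^k a; it agrees with k = Θ(√n).

```latex
\begin{align*}
|P_i| &= i + 3, \qquad |E_i| = 2i + 1, \qquad |Q_k| = k + 2, \\
n = |W_k| &= \sum_{i=2}^{k-1} (3i + 4) + (k + 2)
  = \frac{3 (k - 2)(k + 1)}{2} + 4 (k - 2) + k + 2
  = \frac{3 k^2 + 7 k - 18}{2} .
\end{align*}
```

For instance, k = 10 gives n = 176.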

Lemma 36.

The Lyndon conjugate

C_{k}

of

W_{k}

is

a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} P_{i} E_{i}) \cdot P_{k - 1} {ab}^{k - 1} ab

.

Proof.

The Lyndon conjugate of

W_{k}

starts with the longest runs of

a

, which can be obtained by concatenating suffix

a^{k - 3}

of

E_{k - 1}

with prefix

a

of

Q_{k}

. Therefore,

C_{k} = a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} P_{i} E_{i}) \cdot P_{k - 1} {ab}^{k - 1} ab = a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) \cdot {ab}^{k - 1} aa {ab}^{k - 1} ab

. □

Lemma 37.

ρ (C_{k}) = 6 k - 12

.

Proof.

According to Lemma 1, all conjugates have the same BWT, thus

r (W_{k}) = r (C_{k}) = 6 k - 12

. Also, since

C_{k}

is a Lyndon word,

r (C_{k}) = ρ (C_{k}) = 6 k - 12

. □
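Lemmas 36 and 37 can be explored numerically. The sketch below builds W_k and C_k from the definitions above (read as P_i = a b^i aa, E_i = a b^i ab a^{i-2}, Q_k = a b^k a) with our own helper names. It checks that C_k is the smallest conjugate of W_k and that both words have the same BWT, and it prints the measured number of runs next to 6k - 12 for comparison. We only test k ≥ 6 here; smaller values are outside the scope of this sketch.

```python
def P(i):
    return "a" + "b" * i + "aa"


def E(i):
    return "a" + "b" * i + "ab" + "a" * (i - 2)


def Q(k):
    return "a" + "b" * k + "a"


def W(k):
    return "".join(P(i) + E(i) for i in range(2, k)) + Q(k)


def C(k):
    """Candidate Lyndon conjugate of W_k as stated in Lemma 36."""
    body = "".join(P(i) + E(i) for i in range(2, k - 1))
    return "a" * (k - 2) + "b" * k + "a" + body + P(k - 1) + "a" + "b" * (k - 1) + "ab"


def rotations(w):
    return [w[p:] + w[:p] for p in range(len(w))]


def bwt(w):
    """BWT via sorting all conjugates (quadratic space, for illustration only)."""
    return "".join(r[-1] for r in sorted(rotations(w)))


def runs(s):
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


for k in range(6, 13):
    w, c = W(k), C(k)
    assert c == min(rotations(w))   # C_k is the smallest conjugate of W_k
    assert bwt(w) == bwt(c)         # conjugates share the same BWT
    print(k, len(w), runs(bwt(c)), 6 * k - 12)
```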

Recall that the runs in the BBWT and BWT are the same if the input word is Lyndon, cf. Lemma 1. Thus, we can leverage BWT computation if the input word is Lyndon since we can obtain the number of runs in the same way as in Section 5 by using

β (W)

for word W. In this section, we focus on three variations of the word

C_{k}

: deleting its last character and substituting its last character

b

with

c

or #.

7.1. Deletions and Edits of $C_{k}$ with a Character Smaller than $a$

Recall that

C_{k} = a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) \cdot {ab}^{k - 1} aa {ab}^{k - 1} ab

. Thus,

C_{k}^{♭}

, which is obtained by deleting the last character

b

, is

C_{k}^{♭} = a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) \cdot {ab}^{k - 1} aa {ab}^{k - 1} a

. Recall that the Lyndon conjugate of

C_{k}^{♭}

is the strictly smallest conjugate of all conjugates of

C_{k}^{♭}

. Since the longest run of

a

s in a conjugate of

C_{k}^{♭}

is obtained by concatenating the last

a

with

a^{k - 2} b^{k} a

,

C_{k}^{♭}

cannot be a Lyndon word. In fact, it has two Lyndon factors: the first is

a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) \cdot {ab}^{k - 1} aa {ab}^{k - 1}

, which we call

D_{k}

from now on, and the second is the single character

a

. Figure 11 shows the Lyndon factorization. Since

r (a)

is 1, the only thing left to check is

r (D_{k})

. In

D_{k}

, we made a slight modification to the subword

E_{k - 1}

. In fact,

E_{k - 1} = {ab}^{k - 1} {aba}^{k - 3}

was changed to

{ab}^{k - 1} a^{k - 3}

, which we call

H_{k - 1}

in this section. Since

D_{k}

is a Lyndon word, we determine

ρ (D_{k})

using

M (D_{k})

with the BWT as we did before.

Figure 11. Introducing

D_{k}

from

C_{k}^{♭}

studied in Section 7.1.

D_{k}

is the first Lyndon factor of

C_{k}^{♭}

.

Lemma 38.

β (a^{k - 2} b, D_{k}) = b

.

Proof.

The only conjugate in

M (D_{k})

starting with prefix

a^{k - 2} b

is

a^{k - 2} b P_{2} \dots b

. The first conjugate in lexicographic order must start with the longest run of

a

. By the definition of

D_{k}

, the longest run of

a

has length

k - 2

, and it is obtained by concatenating suffix

a^{k - 3}

with prefix

a

of

Q_{k}

that is preceded by

b

(otherwise, we could extend the sequence of

a

characters). □

Lemma 39.

β (a^{i} b, D_{k}) = {ba}^{k - i - 2}

, for all

i \in [4 . . k - 3]

.

Proof.

With integer

i \in [4 . . k - 3]

, the conjugates in

M (D_{k})

starting with

a^{i} b

are

\begin{matrix} a^{i - 1} P_{i + 2} \dots b < a^{i - 1} P_{i + 3} \dots a < \dots < a^{i - 1} P_{k - 1} \dots a < a^{i} b^{k} a P_{2} \dots a . \end{matrix}

For all

i \in [4 . . k - 3]

, the factor

a^{i} b

can only be obtained from the concatenation of suffix

a^{i - 1}

from

E_{j - 1}

,

with the prefix $ab$ of $P_{j}$ for a $j \in [i + 2 . . k - 1]$ or
with the prefix $ab$ of $Q_{k}$ , if $j = k$ .

We can sort these conjugates according to the lexicographic order of

⋃_{j = i + 2}^{k - 1} P_{j} \cup Q_{k}

. All these conjugates end with an

a

, with the exception of the conjugate starting with

a^{i - 1} P_{i + 2}

, since

D_{k}

has a unique occurrence of

{ba}^{i} b

. □

Lemma 40.

β (aaab, D_{k}) = bbbbb {(ab)}^{k - 7} baa

.

Proof.

The conjugates in

M (D_{k})

starting with

aaab

are

\begin{matrix} aa E_{2} \dots b & < aa E_{3} \dots b < aa E_{4} \dots b < aa P_{5} \dots b < aa E_{5} \dots b \\ < aa P_{6} \dots a < aa E_{6} \dots b < \dots < aa P_{k - 2} \dots a < aa E_{k - 2} \dots b \\ < aa H_{k - 1} \dots b < aa P_{k - 1} \dots a < aa Q_{k} \dots a . \end{matrix}

The above conjugates are obtained in the following cases.

Case 1:: by concatenating the suffix $aa$ of $E_{i - 1}$ with the prefix $ab$ of $P_{i}$ , if only $i \in [5 . . k - 1]$ ,
Case 2:: by concatenating the suffix $aa$ of $P_{i}$ , with the prefix $ab$ of $E_{i}$ , for all $i \in [2 . . k - 2]$ or with $H_{k - 1}$ if $i = k - 1$ ,
Case 3:: by concatenating the suffix $aa$ of $H_{k - 1}$ with the prefix $ab$ of $Q_{k}$ .

All these conjugates starting with

aaab

are sorted according to the lexicographic order of the words in

⋃_{i = 2}^{4} {aa E_{i}} \cup ⋃_{j = 5}^{k - 2} {aa P_{j} \cdot aa E_{j}} \cup {aa H_{k - 1}} \cup {aa P_{k - 1}} \cup {aa Q_{k}} .

The conjugates starting either with

aa P_{i}

, for all

i \in [6 . . k - 1]

in Case 1 or Case 3, end with an

a

. On the other hand, conjugates of Case 2 or

aa P_{5}

in Case 1 end with a

b

. □

Lemma 41.

β (aab, D_{k}) = {baaba}^{2 k - 8} .

Proof.

The conjugates in

M (D_{k})

starting with

aab

are

\begin{matrix} a P_{2} \dots b & < a E_{2} \dots a < a E_{3} \dots a < a P_{4} \dots b < a E_{4} \dots a \\ < a P_{5} \dots a < a E_{5} \dots a < \dots < a H_{k - 1} \dots a < a P_{k - 1} \dots a < a Q_{k} \dots a . \end{matrix}

Each of the cases from Case 1 to Case 3 in Lemma 40 induces a conjugate starting with

aab

, obtained by removing the leftmost character

a

. It follows that all of these conjugates end with

a

. The other two conjugates that start with an

aab

are obtained by

concatenating the suffix $a$ of $Q_{k}$ with the prefix $ab$ of $P_{2}$ or
concatenating suffix $a$ of $E_{3}$ to the prefix $ab$ of $P_{4}$ .

In both cases, the obtained conjugates end with

b

. We conclude this proof by sorting lexicographically the conjugates in

a P_{2} \cup ⋃_{i = 2}^{3} {a E_{i}} \cup ⋃_{i^{'} = 4}^{k - 2} {a P_{i^{'}} \cdot a E_{i^{'}}} \cup {a H_{k - 1}} \cup {a P_{k - 1}} \cup {a Q_{k}} .

□

Lemma 42.

β (ab, D_{k}) = b^{k - 3} {aaba}^{2 k - 6} .

Proof.

The conjugates in

M (D_{k})

starting with

ab

are

\begin{matrix} {aba}^{k - 4} P_{k - 1} \dots b & < \dots < ab P_{3} \dots b \\ < P_{2} \dots a < E_{2} \dots a < P_{3} \dots b \\ < E_{3} \dots a < P_{4} \dots a < E_{4} \dots a < \dots < P_{k - 2} \dots a < E_{k - 2} \dots a \\ < H_{k - 1} \dots a < P_{k - 1} \dots a < Q_{k} \dots a . \end{matrix}

The above conjugates are obtained in the following cases.

Case 1:: $P_{i}$ for all $i \in [2 . . k - 1]$ ,
Case 2:: prefix $ab$ of $E_{i}$ , for all $i \in [2 . . k - 1]$ ,
Case 3:: ${aba}^{i - 2}$ from $E_{i}$ , for all $i \in [2 . . k - 2]$ or $ab$ from $H_{k - 1}$ ,
Case 4:: $ab$ from $Q_{k}$ .

For two distinct integers

i, i^{'}

with

i > i^{'} \geq 0

, we have

{aba}^{i} > {aba}^{i^{'}}

. Thus, the first conjugate in lexicographic order starting with

ab

is the one followed by the longest run of

a

s. The smallest of these conjugates can be found by concatenating the suffix

{aba}^{k - 4}

with the prefix

ab

of

P_{k - 1}

from Case 3. Then, the remaining conjugates in Case 3 which are

{aba}^{i - 2}

of

E_{i}

for all

i \in [2 . . k - 3]

follow in decreasing order. By construction of

E_{i}

, for all

i \in [2 . . k - 2]

, these conjugates must end with a

b

. Note that the remaining cases are obtained by shifting the character

a

from the conjugates starting with

aab

from Lemma 41 with the exception of the character starting with

P_{3}

. It follows that the latter ends with a

b

, while all the other conjugates end with

a

. □

Lemma 43.

β (ba, D_{k}) = {ba}^{k - 6} {bbbab}^{k - 4} {ab}^{k - 3} a .

Proof.

The conjugates in

M (D_{k})

starting with

ba

are

\begin{matrix} {ba}^{k - 2} b^{k} a P_{2} \dots b & < {ba}^{k - 4} P_{k - 1} \dots a < {ba}^{k - 5} P_{k - 2} \dots a < \dots < baaa P_{6} \dots a \\ < baa E_{2} \dots b < baa E_{3} \dots b < baa E_{4} \dots b < baa P_{5} \dots a \\ < baa E_{5} \dots b < \dots < baa E_{k - 2} \dots b < baa H_{k - 1} \dots b < ba P_{2} \dots b \\ < ba P_{4} \dots a \\ < {baba}^{k - 4} P_{k - 1} \dots b < \dots < bab P_{3} \dots b \\ < b P_{3} \dots a . \end{matrix}

The conjugates above are obtained from the following cases.

Case 1:: suffix $baa$ of $P_{i}$ concatenating with $E_{i}$ for all $i \in [2 . . k - 2]$ or $H_{k - 1}$ if $i = k - 1$ ,
Case 2:: runs in $E_{i}$ for all $i \in [2 . . k - 2]$ ,
Case 3:: suffix $ba$ from $Q_{k}$ concatenating with $P_{2}$ ,
Case 4:: ${ba}^{k - 3}$ of $H_{k - 1}$ concatenating with $Q_{k}$ .

We have as many circular occurrences of

ba

as the number of maximal runs of

b

s in

D_{k}

. For Case 1, we have one conjugate starting with

baa E_{i}

for all

i \in [2 . . k - 2]

or

baa

H_{k - 1}

. Since each run of

b

s within each word from

⋃_{i = 2}^{k - 1} P_{i}

is of length of at least 2, all conjugates of Case 1 end with

b

.

For Case 2, for all

i \in [2 . . k - 2]

, we can distinguish two subcases based on where

ba

starts:

Case 2 (a):: the first run of $ba$ in $E_{i}$ , which has a type of ${baba}^{i - 2}$ for all $i \in [2 . . k - 2]$ ,
Case 2 (b):: the second run of $ba$ in $E_{i}$ , which has a type of ${ba}^{i - 2}$ for all $i \in [2 . . k - 2]$ .

For Case 2 (a), these conjugates start with ${baba}^{i - 2} P_{i + 1}$ for $i \in [2 . . k - 2]$ . Similarly to Case 1, each conjugate of Case 2 (a) ends with a $b$ . Each conjugate of Case 2 (b) is obtained from the corresponding conjugate of Case 2 (a) by shifting two characters to the right. Therefore, all of these conjugates end with an $a$ and have prefixes of the type ${ba}^{i - 2} P_{i + 1}$ for $i \in [2 . . k - 2]$ .
For Case 3, the conjugate starting with $ba$ in $Q_{k}$ has $ba P_{2}$ as a prefix, and it is preceded by a $b$ .
Lastly, for Case 4, the conjugates start with ${ba}^{k - 3}$ concatenating with $Q_{k}$ which ends with a $b$ .
Observe that only for Case 4 and Case 2 (b) we have conjugates starting with $baaaa$ . Hence, the first conjugate in lexicographic order is the one from Case 4 starting with ${ba}^{k - 3} Q_{k}$ , followed by those from Case 2 (b) which are ${ba}^{k - 4} P_{k - 1} < {ba}^{k - 5} P_{k - 2} < \dots < baaa P_{6}$ .

Among the remaining conjugates, those having prefix

baaa

either start with

baa P_{5}

from Case 2 (b) and from Case 1 starting with

baa E_{i}

for all

i \in [2 . . k - 2]

or

baa

H_{k - 1}

if

i = k - 1

. We can sort them according to the order of the words in

⋃_{i = 2}^{4} {baa E_{i}} \cup {baa P_{5}} \cup ⋃_{i = 5}^{k - 2} {baa E_{i}} \cup {baa H_{k - 1}} .

Then, the remaining conjugates with prefix

baa

are those starting with

ba P_{2}

from Case 3 and

ba P_{4}

from Case 2 (b). Finally, let us focus on the conjugates from Case 2 (a). These conjugates are sorted according to the length of the run of

a

s following the common prefix

bab

. The last conjugate left is the one starting with

b P_{3}

from Case 2 (b). Since this conjugate is greater than each conjugate considered in Case 2 (a), this is the greatest conjugate of

D_{k}

starting with

ba

and the thesis follows. □

Lemma 44.

β (b^{j} a, D_{k}) = {bab}^{2 k - 2 j - 2} a

for all

j \in [2 . . k - 2]

.

Proof.

With integer

i \in [2 . . k - 2]

, the conjugates in

M (D_{k})

starting with the prefix

b^{i} a

are

\begin{matrix} b^{i} a^{k - 3} Q_{k} \dots b & < b^{i} aa E_{i} \dots a < b^{i} aa E_{i + 1} \dots b < \dots < b^{i} aa E_{k - 2} \dots b < b^{i} aa H_{k - 1} \dots b \\ < b^{i} a P_{2} \dots b < b^{i} {aba}^{k - 4} P_{k - 1} \dots b < \dots < b^{i} {aba}^{i - 1} P_{i + 2} \dots b \\ < b^{i} {aba}^{i - 2} P_{i + 1} \dots a . \end{matrix}

Case 1:: concatenating $b^{i} aa$ of $P_{j}$ with $E_{j}$ for all $j \in [i . . k - 2]$ or with $H_{k - 1}$ if $j = k - 1$ ,
Case 2:: concatenating $b^{i} {aba}^{j - 2}$ of $E_{j}$ with $P_{j + 1}$ if only $j \in [i . . k - 2]$ ,
Case 3:: concatenating $b^{i} a^{k - 3}$ of $H_{k - 1}$ with $Q_{k}$ ,
Case 4:: concatenating $b^{i} a$ with $P_{2}$ .

We consider these four cases separately. For all

j \in [i . . k - 2]

, the conjugate starting within

P_{j}

has a prefix of

b^{i} aa E_{j}

or

b^{i} aa H_{k - 1}

(Case 1). For all

j \in [i . . k - 2]

, the conjugates starting within

E_{j}

have a prefix of

b^{i} {aba}^{j - 2} P_{j + 1}

(Case 2). In addition, conjugate starting within a word in Case 3 has a prefix of

b^{i} a^{k - 3} Q_{k}

. Finally, the conjugates starting with

Q_{k}

starts with

b^{i} a P_{2}

(Case 4). By construction, we can see that first we have all the conjugates first from Case 3 and then from Case 1 sorted according to the lexicographic order into

⋃_{j = i}^{k - 2} b^{i} aa E_{j} \cup b^{i} aa H_{k - 1}

; then, we have the conjugate from Case 4, then Case 2 sorted according to the decreasing length of the run of

a

s following the common prefix

b^{i} ab

. Moreover, we note that only when the run of

b

s is exactly of length

i

, the conjugate ends with

a

. Thus, the only conjugates ending with an

a

are those starting within

b^{i} aa E_{i}

and

b^{i} {aba}^{i - 2} P_{i + 1}

. □

Lemma 45.

β (b^{k - 1} a, D_{k}) = aab

.

Proof.

There are three conjugates in

M (D_{k})

starting with prefix

b^{k - 1} a

. These conjugates are

\begin{matrix} b^{k - 1} a^{k - 3} Q_{k} \dots a < b^{k - 1} {aaab}^{k - 1} \dots a < b^{k - 1} a P_{2} \dots b . \end{matrix}

Observe that the only conjugates with prefix

b^{k - 1} a

have the prefixes, respectively, of

b^{k - 1} a^{k - 3} Q_{k}

,

b^{k - 1} aa H_{k - 1}

, and

b^{k - 1} a P_{2}

. One can see that these conjugates taken in this order are already sorted, and only the conjugate starting within

Q_{k}

ends with

b

, while the other two have

a

. □

Lemma 46.

β (b^{k} a, D_{k}) = a

.

Proof.

The last conjugate in

M (D_{k})

with prefix

b^{k} a

is

b^{k} a P_{2} \dots a

. Finally, the only occurrence of

b^{k}

is within

Q_{k}

. Hence, the last conjugate in lexicographic order starts with

b^{k} a P_{2}

, and since the run of

b

s is maximal, it ends with an

a

, and the thesis follows. □

We summarize the above lemmas as follows.

Lemma 47.

For integer

k \geq 10

,

ρ (D_{k}) = 8 k - 18

, cf. Table 14. The BWT of the word

D_{k}

is given by

BBWT (D_{k}) = \prod_{i = 2}^{k - 1} β (a^{k - i} b) \cdot \prod_{i = 1}^{k} β (b^{i} a)

.

Table 14. Classification of the number of runs obtained in Lemma 47. The total number of runs is

8 k - 18

.

Proof.

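Every conjugate contributing a character to

β (a^{i} b)

is smaller than each conjugate contributing a character to

β (a^{i^{'}} b)

for every

1 \leq i^{'} < i \leq k - 2

. Symmetrically, every conjugate contributing a character to

β (b^{j} a)

is greater than any conjugate contributing a character to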

β (b^{j^{'}} a)

, for every

1 \leq j^{'} < j \leq k

. Since we considered all the disjoint ranges of conjugates of

D_{k}

based on their common prefix, the word

\prod_{i = 2}^{k - 1} β (a^{k - i} b) \cdot \prod_{i = 1}^{k} β (b^{i} a)

is the BWT of

D_{k}

.

With the structure of

BBWT (D_{k})

, we can easily derive its number of runs. The word

\prod_{i = 2}^{k - 4} β (a^{k - i} b)

has exactly

2 (k - 6)

runs. We start with 2 runs from

β (a^{k - 2} b) β (a^{k - 3} b) = bba,

and then, concatenating each

β (a^{i} b)

up to

β (a^{4} b)

adds 2 new runs each. By counting, we observe that

β (aaab), β (aab), β (ab)

have

2 (k - 6)

, 4, and 4 runs, respectively. The runs at the boundaries between these words do not merge. The word

β (ba)

has exactly 8 runs. The remaining part of the BWT, that is,

\prod_{i = 2}^{k} β (b^{i} a)

, has

4 (k - 3) + 2

runs. Concatenating each

β (b^{2} a)

to

β (b^{k - 2} a)

adds 4 new runs each. The word

β (b^{k - 1} a)

adds only one run by

b

, as it contains an

a

that merges with the previous one. Finally,

β (b^{k} a)

adds one run. Altogether, we have

2 (k - 6) + 2 (k - 6) + 4 + 4 + 8 + 4 (k - 3) + 2 = 8 k - 18

, and the claim holds. □
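The decomposition used in Lemma 47 can be replayed mechanically. The sketch below uses our own helper names and reads D_k as C_k without its last two characters ab. It computes β(x, D_k) as the last characters of the sorted conjugates that start with x, reassembles the BWT from the blocks β(a^{k-i} b) and β(b^i a), and prints the measured number of runs next to 8k - 18 for comparison.

```python
def P(i):
    return "a" + "b" * i + "aa"


def E(i):
    return "a" + "b" * i + "ab" + "a" * (i - 2)


def D(k):
    """First Lyndon factor of C_k with its last character deleted (Section 7.1)."""
    body = "".join(P(i) + E(i) for i in range(2, k - 1))
    return "a" * (k - 2) + "b" * k + "a" + body + P(k - 1) + "a" + "b" * (k - 1)


def rotations(w):
    return [w[p:] + w[:p] for p in range(len(w))]


def beta(prefix, word):
    """Last characters of the sorted conjugates of word that start with prefix."""
    rows = sorted(r for r in rotations(word) if r.startswith(prefix))
    return "".join(r[-1] for r in rows)


def runs(s):
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


def assembled_bwt(k):
    """Concatenation of beta(a^{k-i} b) for i = 2..k-1 and beta(b^i a) for i = 1..k."""
    d = D(k)
    left = [beta("a" * (k - i) + "b", d) for i in range(2, k)]
    right = [beta("b" * i + "a", d) for i in range(1, k + 1)]
    return "".join(left + right)


for k in range(10, 15):
    d = D(k)
    full = "".join(r[-1] for r in sorted(rotations(d)))
    assert assembled_bwt(k) == full            # the blocks cover all conjugates in sorted order
    assert beta("a" * (k - 2) + "b", d) == "b"  # Lemma 38
    assert beta("b" * (k - 1) + "a", d) == "aab"  # Lemma 45
    assert beta("b" * k + "a", d) == "a"        # Lemma 46
    print(k, runs(full), 8 * k - 18)
```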

Using Lemma 47 above, we can finally obtain the runs of

C_{k}^{♭} = D_{k} a

.

Theorem 10.

ρ (C_{k}^{♭}) = 8 k - 17

.

Proof.

C_{k}^{♭} = a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) \cdot {ab}^{k - 1} aa {ab}^{k - 1} a

. The Lyndon conjugate of

C_{k}^{♭}

is the smallest conjugate starting with the longest runs of

a

, thus it is the one starting with

a^{k - 1}

. Therefore,

C_{k}^{♭}

is not a Lyndon word, and its Lyndon factorization consists of the last character

a

and the remaining prefix

D_{k}

. Figure 12 depicts the Lyndon factorization of

C_{k}^{♭}

. Since the lexicographic order between

a

and

D_{k}

is

a

< D_{k}

, the BBWT of

C_{k}^{♭}

gains one additional run because the first conjugate of

D_{k}

from Lemma 38 ends with a

b

. Therefore,

ρ (C_{k}^{♭}) = ρ (a) + ρ (D_{k}) = 8 k - 17

. □

Figure 12. Lyndon factorization of

C_{k}^{♭}

. We obtain

ρ (C_{k}^{♭})

by knowing the number of runs of both its Lyndon factors and where these conjugates are sorted in the BBWT. The analysis is in the proof of Theorem 10.

With Lemma 37 and Theorem 10, we determine that the additive sensitivity of

ρ

for

C_{k}

is

Θ (\sqrt{n})

when deleting the last character.

Theorem 11.

ρ (C_{k}^{♭} #) = 8 k - 16

.

Proof.

C_{k}^{♭} # = a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) \cdot {ab}^{k - 1} aa {ab}^{k - 1} a #

.

C_{k}^{♭} #

is Lyndon factorized into three parts, which are

D_{k}

,

a

and #, because the lexicographic order of

a

is lower than

D_{k}

, and moreover # is smaller than both

D_{k}

and

a

. Therefore, the

ρ (C_{k}^{♭} #) = ρ (#) + ρ (a) + ρ (D_{k}) = 8 k - 16

. We show a sketch in Figure 13. □

Figure 13. Lyndon factorization of

C_{k}^{♭} #

. Compared to Figure 12, we have one additional Lyndon factor. The analysis is in the proof of Theorem 11.

With Lemma 37 and Theorem 11, we obtain that the additive sensitivity of

ρ

for

C_{k}

is

Θ (\sqrt{n})

when substituting the last character.

7.2. Editing $C_{k}$ with a Character Larger than $b$

Now, we consider editing

C_{k}

with a character

c

that is lexicographically larger than any character in

C_{k}

. We consider two edit operations: appending

c

to the end of

C_{k}

, and substituting the last character of

C_{k}

with

c

.

7.2.1. Appending $c$ to $C_{k}$

Now, we prove that adding

c

to

C_{k}

, i.e.,

C_{k}

becomes

C_{k} c

, also adds

Θ (\sqrt{n})

runs to the BBWT.

C_{k} c = a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) \cdot {ab}^{k - 1} aa {ab}^{k - 1} abc

. We illustrate

C_{k} c

in Figure 14. Similar to Section 7.1, we slightly modify

E_{k - 1}

to

{ab}^{k - 1} {abca}^{k - 3}

. In this section, we call this modified subword

S_{k - 1}

. The character

c

is lexicographically larger than any character in

C_{k}

. Thus,

C_{k} c

is itself a Lyndon word. Recall that the BWT and the BBWT of a Lyndon word coincide, and hence so do their numbers of runs, so we obtain

ρ (C_{k} c)

by using BWT with

M (C_{k} c)

the same way we did in previous lemmas.

Figure 14. Introducing the Lyndon word

C_{k} c

studied in Section 7.2.
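The Lyndon property of C_k c, and the resulting reduction to the BWT, can be checked directly for a small k. The following Python sketch is only an illustration: it tests the Lyndon property by comparing the word with all of its proper conjugates and computes the BWT by sorting conjugates. The function names and the choice k = 7 are assumptions introduced here, and the word is built from the expansion of C_k c given above with {ab}^{i} read as a b^i.

```python
from itertools import groupby


def c_flat(k):
    """C_k^flat as expanded in Section 7; {ab}^i is read as a b^i."""
    middle = "".join(
        "a" + "b" * i + "aa" + "a" + "b" * i + "ab" + "a" * (i - 2)
        for i in range(2, k - 1)  # i = 2..k-2
    )
    tail = "a" + "b" * (k - 1) + "aa" + "a" + "b" * (k - 1) + "a"
    return "a" * (k - 2) + "b" * k + "a" + middle + tail


def is_lyndon(w):
    """A word is Lyndon iff it is strictly smaller than all its proper conjugates."""
    return all(w < w[i:] + w[:i] for i in range(1, len(w)))


def bwt(w):
    """BWT via the sorted conjugates of w (no terminator symbol)."""
    rots = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rots)


def runs(t):
    """Number of maximal character runs of t."""
    return sum(1 for _ in groupby(t))


k = 7                       # illustrative choice
w = c_flat(k) + "bc"        # C_k c: the last block ends with ... a b c
print(is_lyndon(w), is_lyndon(c_flat(k)), runs(bwt(w)))
```

Since C_k c is a Lyndon word, the run count printed for its BWT is also the run count of its BBWT; the sketch makes no claim about which values of k the closed formulas of this section cover.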

Lemma 48.

β (a^{k - 2} b, C_{k} c) = c

.

Proof.

The first conjugate in

M (C_{k} c)

is

a^{k - 2} b^{k} a P_{2} \dots c

. The first conjugate must start with the longest run of

a

s. In

C_{k} c

, the longest run of

a

has a length of

k - 2

, which forms the prefix of the word itself; it is obtained by concatenating the suffix

a^{k - 3}

of

S_{k - 1}

with

Q_{k}

, and it is preceded by a

c

. □

Lemma 49.

β (a^{i} b, C_{k} c) = {ba}^{k - i - 2}

for all

i \in [4 . . k - 3]

.

Proof.

In

M (C_{k} c)

, the conjugates starting with

a^{i} b

for

i \in [4 . . k - 3]

are

\begin{matrix} a^{i - 1} P_{i + 2} \dots b < a^{i - 1} P_{i + 3} \dots a < \dots < a^{i - 1} P_{k - 1} \dots a < a^{i - 1} Q_{k} \dots a . \end{matrix}

For all

i \in [4 . . k - 3]

, the factor

a^{i} b

can only be obtained, for all

j \in [i + 2 . . k]

, from the concatenation of the suffix

a^{i - 1}

of

E_{j - 1}

with prefix

ab

of

P_{j}

, if

j \in [i + 2 . . k - 1]

or from the concatenation of the suffix

a^{i - 1}

of

S_{k - 1}

with prefix

ab

of

Q_{k}

. We can sort these conjugates according to the lexicographic order of

⋃_{j = i + 2}^{k - 1} {a^{i - 1} P_{j}} \cup {a^{i - 1} Q_{k}}

. Note that all these conjugates end with an

a

, with the exception of the conjugate starting with

a^{i - 1} P_{i + 2}

, since this is the only place where an occurrence of

{ba}^{i} b

can be found. □

Lemma 50.

β (aaab, C_{k} c) = bbbbb {(ab)}^{k - 6} a .

Proof.

In

M (C_{k} c)

, the conjugates starting with

aaab

are

\begin{matrix} aa E_{2} \dots b & < aa E_{3} \dots b < aa E_{4} \dots b < aa P_{5} \dots b < aa E_{5} \dots b \\ < aa P_{6} \dots a < aa E_{6} \dots b < \dots < aa P_{k - 2} \dots a \\ < aa E_{k - 2} \dots b < aa P_{k - 1} \dots a < aa S_{k - 1} \dots b \\ < aa Q_{k} \dots a . \end{matrix}

Similarly to Lemma 49,

aaab

can be obtained from concatenation of the suffix

aa

of

E_{j - 1}

, with the prefix

ab

of

P_{j}

, if

j \in [5 . . k - 1]

, or concatenating

aa

of

S_{k - 1}

with prefix

ab

of

Q_{k}

. On the other hand, there are more conjugates from concatenating suffix

aa

of

P_{j^{'}}

to the prefix

ab

of

E_{j^{'}}

, for all

j^{'} \in [2 . . k - 2]

, or with

S_{k - 1}

if

j^{'} = k - 1

. All the conjugates starting with

aaab

are sorted according to the lexicographic order of the words in

⋃_{j = 2}^{4} {aa E_{j}} \cup {aa P_{5} \cdot aa E_{5}} \cup ⋃_{j^{'} = 6}^{k - 2} {aa P_{j^{'}} \cdot aa E_{j^{'}}} \cup {aa P_{k - 1} \cdot aa S_{k - 1}} \cup {aa Q_{k}}

. Note that all the conjugates starting either with

aa P_{j}

, for all

j \in [6 . . k - 1]

, or with

aa Q_{k}

, end with

a

. On the other hand, the conjugates starting either with

aa P_{5}

or with

aa E_{j}

, for all

j \in [2 . . k - 2]

or

aa S_{k - 1}

, end with

b

. □

Lemma 51.

β (aab, C_{k} c) = {baaba}^{2 k - 8}

.

Proof.

The conjugates starting with

aab

in

M (C_{k} c)

are

\begin{matrix} a P_{2} \dots b & < a E_{2} \dots a < a E_{3} \dots a < a P_{4} \dots b \\ < a E_{4} \dots a < a P_{5} \dots a < a E_{5} \dots a < \dots < a P_{k - 1} \dots a < a S_{k - 1} \dots a < a Q_{k} \dots a . \end{matrix}

Each of the conjugates starting with

aaab

from Lemma 50 induces a conjugate starting with

aab

, obtained by moving its leading character

a

to the end. It follows that all of these conjugates end with

a

. The other conjugates starting with

aab

are the ones obtained by concatenating the suffix

a

of

E_{3}

and the prefix

ab

of

P_{4}

, and the one obtained by concatenating the suffix

a

of

Q_{k}

and the prefix

ab

of

P_{2}

. Moreover, both conjugates end with a

b

. We conclude this proof by sorting lexicographically the conjugates in

{a P_{2}} \cup ⋃_{i = 2}^{3} {a E_{i}} \cup ⋃_{i = 4}^{k - 2} {a P_{i} \cdot a E_{i}} \cup {a P_{k - 1} \cdot a S_{k - 1}} \cup {a Q_{k}}

. □

Lemma 52.

β (ab, C_{k} c) = b^{k - 3} {aaba}^{2 k - 6} b

.

Proof.

The conjugates in

M (C_{k} c)

starting with

ab

are

\begin{matrix} {aba}^{k - 4} P_{k - 1} \dots b & < {aba}^{k - 5} P_{k - 2} \dots b < \dots < ab P_{3} \dots b \\ < P_{2} \dots a < E_{2} \dots a < P_{3} \dots b \\ < E_{3} \dots a < P_{4} \dots a < E_{4} \dots a < \dots < P_{k - 1} \dots a < S_{k - 1} \dots a < Q_{k} \dots a \\ < abc \dots b . \end{matrix}

For any two integers

i, i^{'}

with

i > i^{'} \geq 0

, we have

{aba}^{i} b < {aba}^{i^{'}} b

. Thus, the first conjugate in lexicographic order starting with

ab

is the one followed by the longest run of

a

s. The smallest of these conjugates can be found by concatenating the suffix

{aba}^{k - 4}

of

E_{k - 2}

with

P_{k - 1}

, followed by the suffix

{aba}^{i - 3}

of

E_{i - 1}

concatenated with

P_{i}

, for all

i \in [3 . . k - 2]

, taken in decreasing order. By construction of

E_{i}

, for all

i \in [2 . . k - 2]

, these conjugates all end with a

b

. The remaining conjugates starting with

ab

are exactly those conjugates that have as prefix either

P_{i}

or

E_{i}

, for all

i \in [2 . . k - 2]

,

P_{k - 1}

,

S_{k - 1}

or

Q_{k}

. Note that all of these conjugates are obtained by moving the leading character

a

to the end of the conjugates starting with

aab

from Lemma 51, with the exception of one starting with

P_{3}

. It follows that the latter ends with a

b

, while all the other conjugates end with

a

. Finally, the conjugate starting with the prefix

abc

follows, which ends with

b

. □

Lemma 53.

β (ba, C_{k} c) = a^{k - 6} {bbbab}^{k - 4} {ab}^{k - 3} ab .

Proof.

In

M (C_{k} c)

, the conjugates starting with

ba

are

\begin{matrix} {ba}^{k - 4} P_{k - 1} \dots a & < {ba}^{k - 5} P_{k - 2} \dots a < \dots < baaa P_{6} \dots a < baa E_{2} \dots b \\ < baa E_{3} \dots b < baa E_{4} \dots b < baa P_{5} \dots a \\ < baa E_{5} \dots b < baa E_{6} \dots b < \dots < baa E_{k - 2} \dots b < baa S_{k - 1} \dots b \\ < ba P_{2} \dots b < ba P_{4} \dots a < {baba}^{k - 4} P_{k - 1} \dots b < \dots < baba P_{4} \dots b \\ < bab P_{3} \dots b < b P_{3} \dots a < {babca}^{k - 3} Q_{k} \dots b . \end{matrix}

We have as many circular occurrences of

ba

as the number of maximal runs of

b

in

C_{k} c

. We have four cases.

Case 1: one run of $b$s in $P_{i}$, for all $i \in [2 . . k - 1]$,
Case 2: two runs in $E_{i}$, for all $i \in [2 . . k - 2]$,
Case 3: one run of $ba$ in $Q_{k}$,
Case 4: one run of $ba$ in $S_{k - 1}$.

For Case 1, we have one conjugate starting with

baa E_{i}

, for each

i \in [2 . . k - 2]

, or

baa S_{k - 1}

. Since each run of

b

s within each word from

⋃_{i = 2}^{k - 1} {P_{i}}

is of length of at least 2, all conjugates in Case 1 end with a

b

.

For Case 2 and all

i \in [2 . . k - 2]

, we can distinguish between two subcases based on where

ba

starts:

Case 2 (a): a first run of $ba$ in $E_{i}$, which has a type of ${baba}^{i - 2}$ for all $i \in [2 . . k - 2]$,
Case 2 (b): a second run of $ba$ in $E_{i}$, which has a type of ${ba}^{i - 2}$ for all $i \in [2 . . k - 2]$.

For Case 2 (a), we can see that these conjugates are of the type ${baba}^{i - 2} P_{i + 1}$ , for $i \in [2 . . k - 2]$ . Analogously to Case 1, each conjugate in Case 2 (a) ends with a $b$ . Each conjugate in Case 2 (b) is obtained from the corresponding conjugate in Case 2 (a) by shifting two characters to the right. Therefore, all of these conjugates end with an $a$ and have prefixes of the type ${ba}^{i - 2} P_{i + 1}$ , for all $i \in [2 . . k - 2]$ .
For Case 3, the conjugate starting with $ba$ in $Q_{k}$ has $ba P_{2}$ as prefix, and it is preceded by a $b$ .
Finally, in Case 4, there is one run of $ba$ , having a prefix of ${babca}^{k - 3} Q_{k}$ , ending with $b$ .
Only for Case 2 (b) we have conjugates starting with $baaaa$ . Hence, the first conjugate in lexicographic order is the one starting with ${ba}^{k - 4} P_{k - 1}$ , followed by those ${ba}^{k - 5} P_{k - 2} < \dots < baaa P_{6}$ .

Among the remaining conjugates, those having prefix

baaa

either start with

baa P_{5}

from Case 2 (b) or

baa E_{i}

from Case 1, for all

i \in [2 . . k - 2]

or

baa S_{k - 1}

if

i = k - 1

. We can sort these conjugates by following the order of

⋃_{i = 2}^{4} {baa E_{i}} \cup {baa P_{5}} \cup ⋃_{i = 5}^{k - 2} {baa E_{i}} \cup {baa S_{k - 1}}

. Then, the remaining conjugates with prefix

baa

are those starting with

ba P_{2}

from Case 3 and

ba P_{4}

from Case 2 (b). Finally, we focus on the conjugates from Case 2 (a). These conjugates are sorted according to the length of the run of

a

s following the common prefix

bab

. The last two conjugates left are one starting with

b P_{3}

from Case 2 (b), and the one from Case 4, which is

{babca}^{k - 3} Q_{k}

. These two conjugates are already sorted. Since these conjugates are greater than other conjugates, these are the greatest conjugates of

M (C_{k} c)

starting with

ba

. □

Lemma 54.

β (b^{i} a, C_{k} c) = {ab}^{2 k - 2 i - 2} ab

for all

i \in [2 . . k - 2]

.

Proof.

With integer

i \in [2 . . k - 2]

, conjugates in

M (C_{k} c)

with prefix

b^{i} a

are

\begin{matrix} b^{i} aa E_{i} \dots a & < b^{i} aa E_{i + 1} \dots b < \dots < b^{i} aa E_{k - 2} \dots b < b^{i} aa S_{k - 1} \dots b \\ < b^{i} a P_{2} \dots b < b^{i} {aba}^{k - 4} P_{k - 1} \dots b \\ < b^{i} {aba}^{k - 5} P_{k - 2} \dots b < \dots < b^{i} {aba}^{i - 1} P_{i + 2} \dots b \\ < b^{i} {aba}^{i - 2} P_{i + 1} \dots a < b^{i} {abca}^{k - 3} Q_{k} \dots b . \end{matrix}

With integer

i \in [2 . . k - 2]

, these conjugates are obtained in the following cases.

Case 1: concatenating $b^{i} aa$ of $P_{j}$ with $E_{j}$, for all $j \in [i . . k - 2]$, or with $S_{k - 1}$ if $j = k - 1$,
Case 2: concatenating $b^{i} {aba}^{j - 2}$ of $E_{j}$ with $P_{j + 1}$, for all $j \in [i . . k - 2]$,
Case 3: concatenating $b^{i} a$ of $Q_{k}$ with $P_{2}$,
Case 4: concatenating $b^{i} {abca}^{k - 3}$ of $S_{k - 1}$ with $Q_{k}$.

We consider the four cases separately. For all

j \in [i . . k - 1]

, the conjugate starting within

P_{j}

(Case 1) has as prefix

b^{i} aa E_{j}

if

j \in [i . . k - 2]

or

b^{i} aa S_{k - 1}

if

j = k - 1

. Also, when

j \in [i . . k - 2]

, the conjugate starting within

E_{j}

(Case 2) has the prefix of

b^{i} {aba}^{j - 2} P_{j + 1}

. In addition, the conjugate starting within

Q_{k}

(Case 3) has as prefix

b^{i} a P_{2}

. Finally, the conjugate that begins within

S_{k - 1}

(Case 4) has a prefix of

b^{i} {abca}^{k - 3}

. By construction, we can see that all the conjugates from Case 1 are sorted according to the lexicographic order of the words in

⋃_{j = i}^{k - 2} {b^{i} aa E_{j}} \cup {b^{i} aa S_{k - 1}}

; then, we have the conjugate from Case 3. Next come the conjugates from Case 2, sorted according to the decreasing length of the run of

a

s following the common prefix

b^{i} ab

. Finally, the conjugate of Case 4 follows. Moreover, we note that a conjugate ends with an

a

only when its leading run of

b

s has length exactly i. Thus, the only conjugates ending with an

a

are those starting within

P_{i}

and

E_{i}

, i.e., those with prefixes

b^{i} aa E_{i}

and

b^{i} {aba}^{i - 2} P_{i + 1}

. □

Lemma 55.

β (b^{k - 1} a, C_{k} c

) =

aba

.

Proof.

In

M (C_{k} c)

, the conjugates with prefix

b^{k - 1} a

are

\begin{matrix} b^{k - 1} aa S_{k - 1} \dots a < b^{k - 1} a P_{2} \dots b < b^{k - 1} {abca}^{k - 3} Q_{k} \dots a . \end{matrix}

Observe that the only conjugates with prefix

b^{k - 1} a

start within

P_{k - 1}

,

Q_{k}

and

S_{k - 1}

. These conjugates have prefixes of, respectively,

b^{k - 1} aa S_{k - 1}, b^{k - 1} a P_{2}, b^{k - 1} {abca}^{k - 3} Q_{k}

. One can see that these conjugates taken in this order are already sorted, and only the conjugate starting within

Q_{k}

ends with

b

, while the other two end with

a

. □

Lemma 56.

β (b^{k} a, C_{k} c

) =

a

.

Proof.

In

M (C_{k} c)

, the conjugate with prefix

b^{k} a

is

b^{k} a P_{2} \dots a

. The only occurrence of

b^{k} a

is within

Q_{k}

. Since the run of

b

s is maximal, it ends with

a

. □

Lemma 57.

β (bc, C_{k} c) = a

.

Proof.

In

M (C_{k} c)

, the conjugate starting with

bc

is

{bca}^{k - 3} Q_{k} \dots a

. The only occurrence of

bc

is in

S_{k - 1}

, preceded by an

a

. □

Lemma 58.

β (c, C_{k} c) = b

.

Proof.

In

M (C_{k} c)

, the last conjugate is

{ca}^{k - 3} Q_{k} \dots b

since

c

is the largest character in

C_{k} c

. The only occurrence of

c

is the last character of

C_{k} c

. Hence, the last conjugate in lexicographic order starts with

{ca}^{k - 3} Q_{k}

. Since

c

is preceded by

b

, this conjugate contributes a

b

to the BWT. □
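Several of the blocks determined in Lemmas 48 to 58 can be recomputed by brute force. The sketch below is an illustration, not the authors' method: we read β(v, w) as the part of the BWT contributed by the conjugates of w that have v as a prefix, which is how the lemmas above use it, and the function names and k = 7 are assumptions chosen here.

```python
def c_flat(k):
    """C_k^flat as expanded in Section 7; {ab}^i is read as a b^i."""
    middle = "".join(
        "a" + "b" * i + "aa" + "a" + "b" * i + "ab" + "a" * (i - 2)
        for i in range(2, k - 1)  # i = 2..k-2
    )
    tail = "a" + "b" * (k - 1) + "aa" + "a" + "b" * (k - 1) + "a"
    return "a" * (k - 2) + "b" * k + "a" + middle + tail


def beta(prefix, w):
    """Last characters of the sorted conjugates of w that start with prefix."""
    rots = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rots if r.startswith(prefix))


k = 7                   # illustrative choice
w = c_flat(k) + "bc"    # C_k c
for v in ["a" * (k - 2) + "b", "b" * (k - 1) + "a", "b" * k + "a", "bc", "c"]:
    print(v, "->", beta(v, w))
```

For the prefixes listed in the loop, the blocks agree with Lemmas 48, 55, 56, 57, and 58, respectively; the longer blocks of the remaining lemmas can be inspected in the same way.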

The following theorem puts the above lemmas together.

Theorem 12.

ρ (C_{k} c) = 8 k - 12

, cf. Table 15. It holds that

BBWT (C_{k} c) = BWT (C_{k} c) = \prod_{i = 2}^{k - 1} β (a^{k - i} b) \cdot \prod_{i = 1}^{k} β (b^{i} a) \cdot β (bc) \cdot β (c)

.

Table 15. Classification of the number of runs obtained in Theorem 12. The total number of runs is

8 k - 12

.

Proof.

Every conjugate contributing a character to

β (a^{i} b)

is smaller than any conjugate of

β (a^{i^{'}} b)

, for all

1 \leq i^{'} < i \leq k - 2

. Symmetrically, every conjugate of

β (b^{j} a)

is greater than any conjugate of

β (b^{j^{'}} a)

, for every

1 \leq j^{'} < j \leq k

. Since we considered all the disjoint ranges of conjugates of

C_{k} c

based on their common prefix,

\prod_{i = 2}^{k - 1} β (a^{k - i} b) \cdot \prod_{i = 1}^{k} β (b^{i} a) \cdot β (bc) \cdot β (c)

is the BBWT and BWT of

C_{k} c

.

With the structure of BWT(

C_{k} c

), we can easily derive its number of runs. The word

\prod_{i = 2}^{k - 4} β (a^{k - i} b)

has exactly

2 k - 11

runs: we start with 1 run from

β (a^{k - 2} b) = c

, and then each of the words from

β (a^{k - 3} b)

to

β (aaaab)

adds 2 runs. By counting, we observe that

β (aaab), β (aab), β (ab)

, have

2 k - 10

, 4, 5 runs, respectively. The boundaries between these words do not merge. The word

β (ba)

has exactly 8 runs. The remaining parts of the BWT

\prod_{i = 2}^{k} β (b^{i} a)

have

4 (k - 3) + 3

runs: each of the words from

β (bba)

to

β (b^{k - 2} a)

adds 4 runs, and

β (b^{k - 1} a)

adds 3 runs. On the other hand, the words

β (b^{k} a)

and

β (bc)

do not add new runs, as they consist only of an

a

that merges with the previous one. For the last element,

β (c)

adds one run. Altogether, we have

2 k - 11 + 2 k - 10 + 8 + 4 + 5 + 4 k - 12 + 3 + 1 = 8 k - 12

, and the claim holds. □
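The final sum can be double-checked by separating the terms in k from the constants. The grouping below only rearranges the quantities stated in the proof and introduces no new counts:

```latex
\begin{align*}
\rho(C_k c) &= (2k - 11) + (2k - 10) + 8 + 4 + 5 + (4k - 12) + 3 + 1 \\
            &= (2k + 2k + 4k) + (-11 - 10 + 8 + 4 + 5 - 12 + 3 + 1) \\
            &= 8k - 12 .
\end{align*}
```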

With Lemma 37 and Theorem 12, we obtain that the additive sensitivity of

ρ

for

C_{k}

is

Θ (\sqrt{n})

when appending a character.

7.2.2. Substituting the Last Position of $C_{k}$ with $c$

Here, we focus on the word

C_{k}^{♭} c

that we obtain by substituting the last character of

C_{k}

with

c

, which is lexicographically larger than any character in

C_{k}

. See Figure 15 for a visualization. As in Section 7.2.1,

E_{k - 1}

changes to

{ab}^{k - 1} {aca}^{k - 3},

and we refer to it as

R_{k - 1}

below. According to its definition,

C_{k}^{♭}

c

=

a^{k - 2} b^{k} a \cdot (\prod_{i = 2}^{k - 2} {ab}^{i} {aaab}^{i} {aba}^{i - 2}) \cdot {ab}^{k - 1} aa {ab}^{k - 1} a

c

. Recall from the proof of Theorem 10 that

C_{k}^{♭}

is not a Lyndon word. The Lyndon factors of

C_{k}^{♭}

are

D_{k}

and

a

. There, we proved that the number of runs of

C_{k}^{♭}

is

8 k - 17

. We start with the observation that

C_{k}^{♭} c

is a Lyndon word.

Figure 15. Introducing the Lyndon word

C_{k}^{♭}

c

studied in Section 7.2.2.

Lemma 59.

C_{k}^{♭} c

is a Lyndon word.

Proof.

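The longest run of

a

s has length

k - 2

and occurs only once, namely as the prefix

a^{k - 2} b

of

C_{k}^{♭} c

. Every other conjugate therefore starts with a shorter run of

a

s or with a larger character, and is lexicographically larger. Thus,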

C_{k}^{♭} c

is a Lyndon word. □

Thus, we prove

ρ (C_{k}^{♭} c)

using the

M (C_{k}^{♭} c)

as we did above.

Lemma 60.

β (a^{k - 2} b, C_{k}^{♭} c) = c .

Proof.

The first conjugate in

M (C_{k}^{♭} c)

is

a^{k - 2} b^{k} a P_{2} \dots c

. The first conjugate in lexicographic order must start with the longest run of

a

s. By the definition of

C_{k}^{♭}

, the longest run of

a

has length

k - 2

, and it is obtained by concatenating the suffix

a^{k - 3}

of

R_{k - 1}

with

Q_{k}

, which is preceded by a

c

. □

Lemma 61.

β (a^{i} b, C_{k}^{♭} c) = {ba}^{k - 2 - i}

for all

i \in [4 . . k - 3]

.

Proof.

All conjugates in

M (C_{k}^{♭} c)

starting with the prefix

a^{i} b

for any

i \in [4 . . k - 3]

are given below.

\begin{matrix} a^{i - 1} P_{i + 2} \dots b < a^{i - 1} P_{i + 3} \dots a < \dots < a^{i - 1} P_{k - 1} \dots a < a^{i - 1} Q_{k} \dots a . \end{matrix}

For all

i \in [4 . . k - 3]

, the factor

a^{i} b

can only be obtained, for all

j \in [i + 2 . . k - 1]

, by concatenating the suffix

a^{i - 1}

of

E_{j - 1}

, with the prefix ab of

P_{j}

, or by concatenating suffix

a^{i - 1}

of

R_{k - 1}

with the prefix

ab

of

Q_{k}

. We can sort these conjugates according to the lexicographic order of

⋃_{j = i}^{k - 3} {a^{i - 1} P_{j + 2}} \cup {a^{i - 1} Q_{k}}

. Note that all these conjugates end with an

a

, with the exception of the conjugate starting with

a^{i - 1} P_{i + 2}

, since this is the only place where an occurrence of

{ba}^{i} b

can be found. □

Lemma 62.

β (aaab, C_{k}^{♭} c) = bbbbb {(ab)}^{k - 6} a

.

Proof.

The conjugates in

M (C_{k}^{♭} c)

starting with the prefix

aaab

are

\begin{matrix} aa E_{2} \dots b & < aa E_{3} \dots b < aa E_{4} \dots b < aa P_{5} \dots b < aa E_{5} \dots b \\ < aa P_{6} \dots a < aa E_{6} \dots b < \dots < aa P_{k - 2} \dots a < aa E_{k - 2} \dots b \\ < aa P_{k - 1} \dots a < aa R_{k - 1} \dots b < aa Q_{k} \dots a . \end{matrix}

These conjugates are obtained from the following cases.

Case 1: concatenating suffix $aa$ of $P_{i}$ with prefix $ab$ of $E_{i}$, for all $i \in [2 . . k - 2]$, or with $R_{k - 1}$ if $i = k - 1$,
Case 2: concatenating suffix $aa$ of $E_{i - 1}$ with prefix $ab$ of $P_{i}$, for all $i \in [5 . . k - 1]$,
Case 3: concatenating suffix $aa$ of $R_{k - 1}$ with prefix $ab$ of $Q_{k}$.

All these conjugates starting with

aaab

are sorted according to the lexicographic order of the words in

⋃_{i = 2}^{4} {aa E_{i}} \cup {aa P_{5} \cdot aa E_{5}} \cup ⋃_{i = 6}^{k - 2} {aa P_{i} \cdot aa E_{i}} \cup {aa P_{k - 1} \cdot aa R_{k - 1}} \cup {aa Q_{k}}

. Note that all the conjugates starting with

aa P_{i}

, for all

i \in [6 . . k - 1]

(Case 2), as well as the conjugate of Case 3, end with

a

. On the other hand, the conjugate starting with

aa P_{5}

(Case 2) and all conjugates of Case 1 end with a

b

. □

Lemma 63.

β (aab, C_{k}^{♭} c) = {baaba}^{2 k - 8} .

Proof.

The conjugates in

M (C_{k}^{♭} c)

that start with the prefix

aab

are

\begin{matrix} a P_{2} \dots b & < a E_{2} \dots a < a E_{3} \dots a < a P_{4} \dots b < a E_{4} \dots a < a P_{5} \dots a \\ < a E_{5} \dots a < \dots < a P_{k - 2} \dots a < a E_{k - 2} \dots a < a P_{k - 1} \dots a < a R_{k - 1} \dots a \\ < a Q_{k} \dots a . \end{matrix}

Each of the conjugates starting with

aaab

from Lemma 62 induces a conjugate starting with

aab

, obtained by moving its leading character

a

to the end. It follows that all of these conjugates end with

a

. The other conjugates starting with

aab

are the ones obtained by concatenating suffix

a

of

Q_{k}

with

ab

of

P_{2}

, and another is obtained by concatenating suffix

a

of

E_{3}

with

ab

of

P_{4}

. Moreover, both conjugates end with a

b

. We prove our claim by sorting the conjugates according to the lexicographic order of the words in

{a P_{2} \cdot a E_{2} \cdot a E_{3}} \cup ⋃_{i = 4}^{k - 2} {a P_{i} \cdot a E_{i}} \cup {a P_{k - 1} \cdot a R_{k - 1}} \cup {a Q_{k}}

. □

Lemma 64.

β (ab, C_{k}^{♭} c) = b^{k - 3} {aaba}^{2 k - 6}

.

Proof.

In

M (C_{k}^{♭} c)

, the conjugates which start with prefix

ab

are

\begin{matrix} {aba}^{k - 4} P_{k - 1} \dots b & < {aba}^{k - 5} P_{k - 2} \dots b < \dots < ab P_{3} \dots b \\ < P_{2} \dots a < E_{2} \dots a < P_{3} \dots b < E_{3} \dots a \\ < P_{4} \dots a < E_{4} \dots a < \dots < P_{k - 2} \dots a < E_{k - 2} \dots a \\ < P_{k - 1} \dots a < R_{k - 1} \dots a < Q_{k} \dots a . \end{matrix}

For any two integers

i, i^{'}

with

i > i^{'} \geq 0

, we have

{aba}^{i} b < {aba}^{i^{'}} b

. Thus, the first conjugate in lexicographic order starting with

ab

is the one which is followed by the longest run of

a

s. The smallest of these conjugates can be found by concatenating the suffix

{aba}^{k - 4}

of

E_{k - 2}

with the prefix

ab

of

P_{k - 1}

, followed by the suffix

{aba}^{i - 3}

of

E_{i - 1}

concatenated with the prefix

ab

of

P_{i}

, for all

i \in [3 . . k - 2]

all taken in decreasing order. By construction of

E_{i}

, for all

i \in [2 . . k - 2]

, these conjugates must end with a

b

. The remaining conjugates starting with

ab

are exactly those conjugates having as prefix either

P_{i}

for all

i \in [2 . . k - 1]

and

E_{i^{'}}

for all

i^{'} \in [2 . . k - 2]

or

R_{k - 1}

and

Q_{k}

. Note that all of these conjugates are obtained by moving the leading character

a

to the end of the conjugates starting with

aab

from Lemma 63, with the exception of one starting with

P_{3}

. It follows that the latter ends with a

b

, while all the other conjugates end with an

a

. □

Lemma 65.

β (ac, C_{k}^{♭} c) = b .

Proof.

In

M (C_{k}^{♭} c)

, the conjugate that starts with prefix

ac

is

{aca}^{k - 3} Q_{k} \dots b

. Since

c

is lexicographically larger than both

a

and

b

, this conjugate is larger than every conjugate with prefix

ab

. The only occurrence of

ac

is within

R_{k - 1}

, where it is preceded by a

b

. □

Lemma 66.

β (ba, C_{k}^{♭} c) = a^{k - 6} {bbbab}^{k - 4} {ab}^{k - 3} ab .

Proof.

In

M (C_{k}^{♭} c)

, the conjugates starting with the prefix

ba

are

\begin{matrix} {ba}^{k - 4} P_{k - 1} \dots a & < {ba}^{k - 5} P_{k - 2} \dots a < \dots < baaa P_{6} \dots a \\ < baa E_{2} \dots b < baa E_{3} \dots b < baa E_{4} \dots b < baa P_{5} \dots a \\ < baa E_{5} \dots b < baa E_{6} \dots b < \dots < baa R_{k - 1} \dots b \\ < ba P_{2} \dots b < ba P_{4} \dots a \\ < {baba}^{k - 4} P_{k - 1} \dots b < {baba}^{k - 5} P_{k - 2} \dots b < \dots < bab P_{3} \dots b \\ < b P_{3} \dots a < {baca}^{k - 3} Q_{k} \dots b . \end{matrix}

One can notice that we have as many circular occurrences of

ba

as the number of maximal runs of

b

s in

M (C_{k}^{♭} c)

. The conjugates are obtained from the cases below.

Case 1: one run of $b$s in $P_{i}$, for all $i \in [2 . . k - 1]$,
Case 2: two runs in $E_{i}$, for all $i \in [2 . . k - 2]$,
Case 3: one run in $Q_{k}$,
Case 4: one run in $R_{k - 1}$.

For Case 1, we have one conjugate starting with

baa E_{i}

for each

i \in [2 . . k - 2]

, or with

baa R_{k - 1}

. Since each run of

b

s within each word from

⋃_{i = 2}^{k - 1} {P_{i}}

is of length of at least 2, all conjugates in Case 1 end with a

b

.

For Case 2, with integer

i \in [2 . . k - 2]

, we can distinguish between two subcases based on where

ba

starts:

Case 2 (a): a first run of $ba$ in $E_{i}$, which has a prefix of ${baba}^{i - 2} P_{i + 1}$ for all $i \in [2 . . k - 2]$,
Case 2 (b): a second run of $ba$ in $E_{i}$, which has a prefix of ${ba}^{i - 2} P_{i + 1}$ for all $i \in [2 . . k - 2]$.

Similarly to Case 1, all the conjugates in Case 2 (a) end with a $b$ .
Each conjugate in Case 2 (b) is obtained from the corresponding conjugate in Case 2 (a) by shifting two characters to the right. Therefore, all of these conjugates end with an $a$ .
For Case 3, the conjugate starting with $ba$ in $Q_{k}$ has $ba P_{2}$ as a prefix, and it is preceded by a $b$ .
For Case 4, $ba$ in $R_{k - 1}$ has ${baca}^{k - 3}$ as a prefix, and it is preceded by a $b$ .
Observe that only for Case 2 (b) we have conjugates starting with $baaaa$ . Hence, the first conjugate in lexicographic order is the one starting with ${ba}^{k - 4} P_{k - 1}$ followed by ${ba}^{k - 5} P_{k - 2} < \dots < baaa P_{6}$ .
Among the remaining conjugates, those having prefix $baaa$ start either with $baa P_{5}$ from Case 2 (b), or with $baa E_{i}$ for some $i \in [2 . . k - 2]$ or with $baa R_{k - 1}$ from Case 1. Thus, we can sort them according to the order of the words in $⋃_{i = 2}^{4} {baa E_{i}} \cup {baa P_{5}} \cup ⋃_{i = 5}^{k - 2} {baa E_{i}} \cup {baa R_{k - 1}}$ . Then, the remaining conjugates with prefix $baa$ are those starting with $ba P_{2}$ from Case 3 and $ba P_{4}$ from Case 2 (b).

Finally, we focus on the conjugates from Case 2 (a). These conjugates are sorted according to the length of the run of

a

s following the common prefix

bab

. The last conjugates left are the one starting with

b P_{3}

from Case 2 (b) and the one starting with

{baca}^{k - 3}

from Case 4. These two conjugates are already sorted and are greater than the conjugates of all other cases, which completes the analysis of all conjugates. □

Lemma 67.

β (b^{i} a, C_{k}^{♭} c) = {ab}^{2 k - 2 i - 2} ab

for all

i \in [2 . . k - 2]

.

Proof.

In

M (C_{k}^{♭} c)

, the conjugates starting with

b^{i} a

for all

i \in [2 . . k - 2]

are

\begin{matrix} b^{i} aa E_{i} \dots a & < b^{i} aa E_{i + 1} \dots b < b^{i} aa E_{i + 2} \dots b < \dots < b^{i} aa R_{k - 1} \dots b \\ < b^{i} a P_{2} \dots b < b^{i} {aba}^{k - 4} P_{k - 1} \dots b < \dots < b^{i} {aba}^{i - 1} P_{i + 2} \dots b \\ < b^{i} {aba}^{i - 2} P_{i + 1} \dots a < b^{i} {aca}^{k - 3} Q_{k} \dots b . \end{matrix}

All runs of

b

s of length of at least

i \in [2 . . k - 2]

are obtained from the cases below.

Case 1: suffix $b^{i} aa$ in $P_{j}$, for all $j \in [i . . k - 1]$,
Case 2: $b^{i} {aba}^{j - 2}$ in $E_{j}$, for all $j \in [i . . k - 2]$,
Case 3: $b^{i} a$ in $Q_{k}$,
Case 4: $b^{i} {aca}^{k - 3}$ in $R_{k - 1}$.

Consider the four cases separately. The conjugate starting within $P_{j}$ (Case 1) has as prefix $b^{i} aa E_{j}$ if $j \in [i . . k - 2]$ or $b^{i} aa R_{k - 1}$ if $j = k - 1$ .
And for all $j \in [i . . k - 2]$ , the conjugate starting within $E_{j}$ (Case 2) has as prefix $b^{i} {aba}^{j - 2} P_{j + 1}$ .
In addition, the conjugate starting within $Q_{k}$ (Case 3) has as prefix $b^{i} a P_{2}$ .
Finally, the conjugate that begins within $R_{k - 1}$ (Case 4) has as prefix $b^{i} {aca}^{k - 3}$ .

By construction, we have all the conjugates from Case 1 sorted according to the lexicographic order of the words in

⋃_{j = i}^{k - 2} {b^{i} aa E_{j}} \cup {b^{i} aa R_{k - 1}}

; then, we have the conjugate from Case 3. Then, the conjugates of Case 2 are sorted according to the decreasing length of the run of

a

s following the common prefix

b^{i} ab

. Finally, the conjugate of Case 4 follows. Moreover, note that a conjugate ends with an

a

only when its leading run of

b

s has length exactly i. Thus, the only conjugates ending with an

a

are those starting within

P_{i}

and

E_{i}

, i.e., those with prefix

b^{i} aa E_{i}

and

b^{i} {aba}^{i - 2} P_{i + 1}

. □

Lemma 68.

β (b^{k - 1} a, C_{k}^{♭} c) = aba .

Proof.

In

M (C_{k}^{♭} c)

, there are exactly three conjugates that start with prefix

b^{k - 1} a

. These are

\begin{matrix} b^{k - 1} {aaab}^{k - 1} {aca}^{k - 3} Q_{k} \dots a < b^{k - 1} a P_{2} \dots b < b^{k - 1} {aca}^{k - 3} Q_{k} \dots a . \end{matrix}

Observe that the only conjugates with prefix

b^{k - 1} a

start within

P_{k - 1}

,

Q_{k}

, and

R_{k - 1}

. These conjugates have prefixes of, respectively,

b^{k - 1} aa R_{k - 1}

,

b^{k - 1} a P_{2}

,

b^{k - 1} {aca}^{k - 3} Q_{k}

. One can see that these conjugates taken in this order are already sorted, and only the conjugate starting within

Q_{k}

ends with

b

, while the other two end with

a

. □

Lemma 69.

β (b^{k} a, C_{k}^{♭} c) = a .

Proof.

In

M (C_{k}^{♭} c)

, only one conjugate starts with a prefix of

b^{k} a

and it is

b^{k} a P_{2} \dots a

. The only occurrence of

b^{k} a

is within

Q_{k}

, preceded by

a

. □

Lemma 70.

β (c, C_{k}^{♭} c) = a .

Proof.

The last conjugate in

M (C_{k}^{♭} c)

is the one starting with

c

, namely

{ca}^{k - 3} Q_{k} \dots a

. The only occurrence of

c

is within

R_{k - 1}

. Since

c

is preceded by an

a

, this conjugate ends with

a

. □

The following theorem puts the above lemmas together.

Theorem 13.

ρ (C_{k}^{♭} c) = 8 k - 13

, cf. Table 16. The BBWT of

C_{k}^{♭} c

is BBWT(

C_{k}^{♭} c

) =

\prod_{i = 2}^{k - 1} β (a^{k - i} b) \cdot β (ac) \cdot \prod_{i = 1}^{k} β (b^{i} a) \cdot β (c)

.

Table 16. Classification of the number of runs obtained in Theorem 13. The total number of runs is

8 k - 13

.

Proof.

Every conjugate contributing a character to

β (a^{i} b)

is smaller than any conjugate of

β (a^{i^{'}} b)

, for all

1 \leq i^{'} < i \leq k - 2

. Symmetrically, every conjugate contributing a character to

β (b^{j} a)

is greater than any conjugate of

β (b^{j^{'}} a)

, for every

1 \leq j^{'} < j \leq k

. Since we considered all the disjoint ranges of conjugates of

C_{k}^{♭} c

based on their common prefix,

\prod_{i = 2}^{k - 1} β (a^{k - i} b) \cdot β (ac) \cdot \prod_{i = 1}^{k} β (b^{i} a) \cdot β (c)

is the BBWT and BWT of

C_{k}^{♭} c

.

With the structure of BWT(

C_{k}^{♭} c

), we can easily derive its number of runs. The word

\prod_{i = 2}^{k - 4} β (a^{k - i} b)

has exactly

2 k - 11

runs: we start with 1 run from

β (a^{k - 2} b) = c

, and then each of the words from

β (a^{k - 3} b)

to

β (a^{4} b)

adds 2 runs. By counting, we observe that

β (aaab), β (aab),

and

β (ab)

contribute

2 k - 10

, 4, and 4 runs, respectively. The boundaries between these words do not merge. The conjugates in

β (ac)

and

β (ba)

contribute 1 and 8 runs, respectively. The remaining parts of the BWT

\prod_{i = 2}^{k} β (b^{i} a)

contribute

4 (k - 3) + 3

runs: each of the words from

β (bba)

to

β (b^{k - 2} a)

adds 4 runs, and

β (b^{k - 1} a)

adds 3 runs. The words

β (b^{k} a)

and

β (c)

do not add new runs, as they consist only of an

a

that merges with the previous one. In total, we have

2 k - 11 + 2 k - 10 + 4 + 4 + 1 + 8 + 4 k - 12 + 3 = 8 k - 13

, and the claim holds. □
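The block decomposition stated in Theorem 13 can be reproduced by brute force for a small k. The following Python sketch is an illustration under assumptions and not the construction used in the proofs: β(v, w) is again read as the part of the BWT contributed by the conjugates of w with prefix v, and the names `theorem13_blocks`, `beta`, `bwt`, `runs`, and the choice k = 7 are introduced here. The sketch prints the measured number of runs next to the value of 8k - 13 without asserting that the two agree for this particular k.

```python
from itertools import groupby


def c_flat(k):
    """C_k^flat as expanded in Section 7; {ab}^i is read as a b^i."""
    middle = "".join(
        "a" + "b" * i + "aa" + "a" + "b" * i + "ab" + "a" * (i - 2)
        for i in range(2, k - 1)  # i = 2..k-2
    )
    tail = "a" + "b" * (k - 1) + "aa" + "a" + "b" * (k - 1) + "a"
    return "a" * (k - 2) + "b" * k + "a" + middle + tail


def bwt(w):
    """BWT via the sorted conjugates of w (no terminator symbol)."""
    rots = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rots)


def beta(prefix, w):
    """Last characters of the sorted conjugates of w that start with prefix."""
    rots = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(r[-1] for r in rots if r.startswith(prefix))


def theorem13_blocks(k):
    """The word C_k^flat c and its blocks in the order listed in Theorem 13."""
    w = c_flat(k) + "c"
    prefixes = ["a" * (k - i) + "b" for i in range(2, k)]   # a^(k-2) b, ..., ab
    prefixes.append("ac")
    prefixes += ["b" * i + "a" for i in range(1, k + 1)]    # ba, ..., b^k a
    prefixes.append("c")
    return w, [beta(p, w) for p in prefixes]


def runs(t):
    """Number of maximal character runs of t."""
    return sum(1 for _ in groupby(t))


k = 7  # illustrative choice
w, blocks = theorem13_blocks(k)
print("blocks reproduce the BWT:", "".join(blocks) == bwt(w))
print("measured runs:", runs(bwt(w)), "| value of 8k - 13:", 8 * k - 13)
```

The prefixes form a prefix-free set that covers every conjugate, and they are listed in increasing lexicographic order, which is why concatenating the blocks yields the BWT.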

8. Conclusions

In this article, we analyzed the sensitivity of the Burrows–Wheeler Transform (BWT) and its bijective variant (BBWT) to single-character edits. We extended previous work on the BWT to a four-character alphabet setting and to alphabet reordering. Our findings reveal that BWT and BBWT exhibit similar sensitivity characteristics, with compression size changes that can follow a multiplicative logarithmic or additive square-root growth. These insights clarify that the BWT and BBWT are not robust repetitiveness measures, although robustness is a crucial property for data compression applications. As future work, we would like to find positions in a word for which we can predict the compression size changes when editing that position. That would allow us to design algorithms to improve the compression power of BWT/BBWT by editing the word in a way that minimizes the compression size changes.

Author Contributions

Conceptualization, D.K.; Writing—original draft, H.J. All authors have read and agreed to the published version of the manuscript.

Funding

JSPS KAKENHI Grant Number 23H04378 and Yamanashi Wakate Grant Number 2291.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Burrows, M.; Wheeler, D.J. A Block Sorting Lossless Data Compression Algorithm; Technical Report 124; Digital Equipment Corporation: Palo Alto, CA, USA, 1994.
2. Ferragina, P.; Manzini, G. Opportunistic Data Structures with Applications. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, Redondo Beach, CA, USA, 12–14 November 2000; pp. 390–398.
3. Gagie, T.; Navarro, G.; Prezza, N. Optimal-Time Text Indexing in BWT-runs Bounded Space. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–10 January 2018; pp. 1459–1477.
4. Gagie, T.; Navarro, G.; Prezza, N. Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space. J. ACM 2020, 67, 2:1–2:54.
5. Bertram, N.; Fischer, J.; Nalbach, L. Move-r: Optimizing the r-index. In Proceedings of the 22nd International Symposium on Experimental Algorithms (SEA 2024), Vienna, Austria, 23–26 July 2024; Volume 301, pp. 1:1–1:19.
6. Cobas, D.; Gagie, T.; Navarro, G. A Fast and Small Subsampled R-Index. In Proceedings of the 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021), Wrocław, Poland, 5–7 July 2021; Volume 191, pp. 13:1–13:16.
7. Arakawa, Y.; Navarro, G.; Sadakane, K. Bi-Directional r-Indexes. In Proceedings of the 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022), Prague, Czech Republic, 27–29 June 2022; Volume 223, pp. 11:1–11:14.
8. Shivakumar, V.S.; Ahmed, O.Y.; Kovaka, S.; Zakeri, M.; Langmead, B. Sigmoni: Classification of nanopore signal with a compressed pangenome index. Bioinformatics 2024, 40, i287–i296.
9. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359.
10. Li, R.; Yu, C.; Li, Y.; Lam, T.W.; Yiu, S.; Kristiansen, K.; Wang, J. SOAP2: An improved ultrafast tool for short read alignment. Bioinformatics 2009, 25, 1966–1967.
11. Ferragina, P.; Manzini, G.; Muthukrishnan, S.M. The Burrows–Wheeler Transform: Ten Years Later; DIMACS: Piscataway, NJ, USA, 2004.
12. Gagie, T.; Manzini, G.; Navarro, G.; Stoye, J. 25 Years of the Burrows–Wheeler Transform (Dagstuhl Seminar 19241). Dagstuhl Rep. 2019, 9, 55–68.
13. Adjeroh, D.; Bell, T.; Mukherjee, A. The Burrows–Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching; Springer: Berlin/Heidelberg, Germany, 2008.
14. Gessel, I.M.; Reutenauer, C. Counting Permutations with Given Cycle Structure and Descent Set. J. Comb. Theory Ser. A 1993, 64, 189–215.
15. Schindler, M. A fast block-sorting algorithm for lossless data compression. In Proceedings of the DCC ’97. Data Compression Conference, Snowbird, UT, USA, 25–27 March 1997; p. 469.
16. Mantaci, S.; Restivo, A.; Rosone, G.; Sciortino, M. An extension of the Burrows–Wheeler Transform. Theor. Comput. Sci. 2007, 387, 298–312.
17. Kufleitner, M. On Bijective Variants of the Burrows–Wheeler Transform. In Proceedings of the Prague Stringology Conference 2009 (PSC 2009), Prague, Czech Republic, 31 August–2 September 2009; pp. 65–79.
18. Daykin, J.W.; Smyth, W.F. A bijective variant of the Burrows–Wheeler Transform using V-order. Theor. Comput. Sci. 2014, 531, 77–89.
19. Daykin, J.W.; Groult, R.; Guesnet, Y.; Lecroq, T.; Lefebvre, A.; Léonard, M.; Prieur-Gaston, É. Binary block order Rouen Transform. Theor. Comput. Sci. 2016, 656, 118–134.
20. Gil, J.Y.; Scott, D.A. A Bijective String Sorting Transform. arXiv 2012, arXiv:1201.3077.
21. Likhomanov, K.M.; Shur, A.M. Two Combinatorial Criteria for BWT Images. In Computer Science–Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6651, pp. 385–396.
22. Giuliani, S.; Lipták, Z.; Masillo, F.; Rizzi, R. When a dollar makes a BWT. Theor. Comput. Sci. 2021, 857, 123–146.
23. Giuliani, S.; Lipták, Z.; Masillo, F. When a Dollar in a Fully Clustered Word Makes a BWT. CEUR Workshop Proc. 2022, 3284, 122–135.
24. Giuliani, S.; Inenaga, S.; Lipták, Z.; Romana, G.; Sciortino, M.; Urbina, C. Bit Catastrophes for the Burrows–Wheeler Transform. In Developments in Language Theory; Springer: Berlin/Heidelberg, Germany, 2023; Volume 13911, pp. 86–99.
25. Akagi, T.; Funakoshi, M.; Inenaga, S. Sensitivity of string compressors and repetitiveness measures. Inf. Comput. 2023, 291, 104999.
26. Lagarde, G.; Perifel, S. Lempel-Ziv: A “one-bit catastrophe” but not a tragedy. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–10 January 2018; pp. 1478–1495.
27. Ziv, J.; Lempel, A. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 1977, 23, 337–343.
28. Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 1978, 24, 530–536.
29. Kempa, D.; Prezza, N. At the roots of dictionary compression: String attractors. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, Los Angeles, CA, USA, 25–29 June 2018; pp. 827–840.
30. Navarro, G.; Ochoa, C.; Prezza, N. On the Approximation Ratio of Ordered Parsings. IEEE Trans. Inf. Theory 2021, 67, 1008–1026.
31. Nakashima, Y.; Köppl, D.; Funakoshi, M.; Inenaga, S.; Bannai, H. Edit and Alphabet-Ordering Sensitivity of Lex-Parse. In Proceedings of the 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024), Bratislava, Slovakia, 26–30 August 2024; Volume 306, pp. 75:1–75:15.
32. Blumer, A.; Blumer, J.; Haussler, D.; Ehrenfeucht, A.; Chen, M.T.; Seiferas, J.I. The Smallest Automaton Recognizing the Subwords of a Text. Theor. Comput. Sci. 1985, 40, 31–55.
33. Fujimaru, H.; Nakashima, Y.; Inenaga, S. On Sensitivity of Compact Directed Acyclic Word Graphs. In Combinatorics on Words; Springer: Berlin/Heidelberg, Germany, 2023; Volume 13899, pp. 168–180.
34. Olbrich, J.; Ohlebusch, E.; Büchler, T. Generic Non-recursive Suffix Array Construction. ACM Trans. Algorithms 2024, 20, 18.
35. Bannai, H.; Kärkkäinen, J.; Köppl, D.; Piątkowski, M. Constructing and Indexing the Bijective and Extended Burrows–Wheeler Transform. Inf. Comput. 2024, 297, 1–30.
36. Badkobeh, G.; Bannai, H.; Köppl, D. Bijective BWT based Compression Schemes. In String Processing and Information Retrieval; Springer: Berlin/Heidelberg, Germany, 2024; Volume 14899, pp. 16–25.
37. Biagi, E.; Cenzato, D.; Lipták, Z.; Romana, G. On the Number of Equal-Letter Runs of the Bijective Burrows–Wheeler Transform. In Proceedings of the International Conference on Information and Communication Technology for Competitive Strategies (ICTCS), Jaipur, India, 8–9 December 2023; Volume 3587, pp. 129–142.
38. Lyndon, R.C. On Burnside’s Problem. Trans. Am. Math. Soc. 1954, 77, 202–215.
39. Chen, K.T.; Fox, R.H.; Lyndon, R.C. Free differential calculus, IV. The quotient groups of the lower central series. Ann. Math. 1958, 68, 81–95.
40. Lothaire, M. Combinatorics on Words, 2nd ed.; Cambridge Mathematical Library, Cambridge University Press: Cambridge, UK, 1997.
41. Saari, K. Lyndon words and Fibonacci numbers. J. Comb. Theory Ser. A 2014, 121, 34–44.
42. De Luca, A.; Mignosi, F. Some Combinatorial Properties of Sturmian Words. Theor. Comput. Sci. 1994, 136, 361–385.
43. Mantaci, S.; Restivo, A.; Sciortino, M. Burrows–Wheeler transform and Sturmian words. Inf. Process. Lett. 2003, 86, 241–246.
44. Christodoulakis, M.; Iliopoulos, C.S.; Ardila, Y.J.P. Simple Algorithm for Sorting the Fibonacci String Rotations. In Proceedings of the SOFSEM, Merin, Czech Republic, 21–27 January 2006; Volume 3831, pp. 218–225.
45. Séébold, P. Fibonacci Morphisms and Sturmian Words. Theor. Comput. Sci. 1991, 88, 365–384.
46. Giuliani, S.; Inenaga, S.; Lipták, Z.; Prezza, N.; Sciortino, M.; Toffanello, A. Novel Results on the Number of Runs of the Burrows–Wheeler-Transform. In Proceedings of the SOFSEM, Bolzano-Bozen, Italy, 25–29 January 2021; Volume 12607, pp. 249–262.
47. Boucher, C.; Cenzato, D.; Lipták, Z.; Rossi, M.; Sciortino, M. r-Indexing the eBWT. In Proceedings of the SPIRE, Lille, France, 4–6 October 2021; Volume 12944, pp. 3–12.

Figure 1. Sketch of the setting

{conj}_{i} (v) < {conj}_{j} (v)

considered in the proof of Lemma 4.

Figure 2. Illustration of the first case in Lemma 4. Inserting # does not change the lexicographic order between

{conj}_{i} (v)

and

{conj}_{j} (v)

.

Figure 3. Illustration of the second case in Lemma 4.

Figure 4. Illustration of Case 2 (a) in Lemma 4. Inserting # does not affect the lexicographic order between

{conj}_{i} (v)

and

{conj}_{j} (v)

.

Figure 5. Illustration of Case 2 (b) in Lemma 4.

Figure 6. Factorization of

L_{2 k}^{♭}

into Lyndon factors studied in the proof of Theorem 5.

L_{2 k}^{♭}

has k Lyndon factors.

Figure 7. Factorization of

L_{2 k}^{♭} #

into Lyndon factors studied in the proof of Theorem 6.

L_{2 k}^{♭} #

has

k + 1

Lyndon factors.

Figure 8. Factorization of

L_{2 k}^{♭} c

into Lyndon factors studied in the proof of Theorem 7.

L_{2 k}^{♭} c

has k Lyndon factors.

Figure 9. Inserting # at position

α

in

L_{2 k}

considered in the proof of Theorem 8.

Figure 10. Insertion of # at position

f_{2 k} - 2

in

L_{2 k}

increases

ρ

by at least the number of distinct Lyndon factors

k + 1

, as studied in Theorem 9.

Figure 11. Introducing

D_{k}

from

C_{k}^{♭}

studied in Section 7.1.

D_{k}

is the first Lyndon factor of

C_{k}^{♭}

.

Figure 12. Lyndon factorization of

C_{k}^{♭}

. We obtain

ρ (C_{k}^{♭})

by knowing the number of runs of both its Lyndon factors and where these conjugates are sorted in the BBWT. The analysis is in the proof of Theorem 10.
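The pipeline described in this caption (Lyndon factorization, sorting the conjugates of the factors, reading off the runs) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard definition of the BBWT, uses Duval's algorithm for the factorization, and compares conjugates by a prefix of their infinite powers. A prefix of twice the maximum conjugate length suffices by the periodicity lemma of Fine and Wilf. The function names are hypothetical.

```python
def lyndon_factors(word):
    """Lyndon factorization via Duval's algorithm."""
    factors = []
    i, n = 0, len(word)
    while i < n:
        j, k = i + 1, i
        while j < n and word[k] <= word[j]:
            # Restart the comparison on a strictly larger character,
            # otherwise keep extending the current periodic prefix.
            k = i if word[k] < word[j] else k + 1
            j += 1
        while i <= k:
            factors.append(word[i:i + j - k])
            i += j - k
    return factors


def bbwt(word):
    """BBWT: last characters of all conjugates of all Lyndon factors,
    sorted by the order of their infinite powers (omega-order)."""
    if not word:
        return ""
    conjugates = [f[p:] + f[:p]
                  for f in lyndon_factors(word)
                  for p in range(len(f))]
    width = 2 * max(len(c) for c in conjugates)

    def omega_key(conjugate):
        # Prefix of length `width` of the infinite power of the conjugate.
        repetitions = width // len(conjugate) + 1
        return (conjugate * repetitions)[:width]

    conjugates.sort(key=omega_key)
    return "".join(c[-1] for c in conjugates)


def count_runs(text):
    """Number of maximal equal-letter runs, i.e., rho when applied to the BBWT."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


if __name__ == "__main__":
    print(lyndon_factors("banana"))    # ['b', 'an', 'an', 'a']
    print(bbwt("banana"))              # annbaa
    print(count_runs(bbwt("banana")))  # 4
```

For a Lyndon word, the factorization consists of the word itself, so the sketch returns the same string as a BWT computed from sorted conjugates.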

Figure 13. Lyndon factorization of

C_{k}^{♭} #

. Compared to Figure 12, we have one additional Lyndon factor. The analysis is in the proof of Theorem 11.

Figure 14. Introducing the Lyndon word

C_{k} c

studied in Section 7.2.

Figure 15. Introducing the Lyndon word

C_{k}^{♭}

c

studied in Section 7.2.2.

Table 2. Classification of the number of runs obtained in Theorem 2. The total number of runs is

8 k - 17

.

BWT of $W_{k}^{♭} #$	Runs
$β (#) = b$	1
$β (a^{i} b) = {ba}^{k - i - 2}$ for all $i \in [4 . . k - 2]$	$2 k - 11$ but, when merged, $2 k - 12$
$β (aaab) = b^{5} {(ab)}^{k - 6} a$	$2 k - 10$ but, when merged, $2 k - 11$
$β (aab) = {aaba}^{2 k - 8}$	3 but, when merged, 2
$β (ab) = b^{k - 2} # {aba}^{2 k - 6}$	5
$β (b^{i} #) = b$ for all $i \in [1 . . k - 1]$	$k - 1$
$β (ba) = a^{k - 5} {bbbab}^{k - 5} {ab}^{k - 2} a$	7
$β (b^{j} a) = a b^{2 (k - j - 1)} a$ for all $j \in [2 . . k - 2]$	$3 (k - 3)$
$β (b^{k - 1} a) = aa$	1
$β (b^{k} #) = a$	1 but, when merged, 0

Table 3. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 1.

Prefix	Remaining Part	BWT
#	$P_{2}$	$b$
$a^{k - 2} b$	$b^{k - 1} #$	$b$
$a^{k - 3} b$	$b^{k - 2} aa$	$b$
$a^{k - 3} b$	$b^{k - 1} #$	$a$
$a^{k - 4} b$	$b^{k - 3} aa$	$b$
	$b^{k - 2} aa$	$a$
	$b^{k - 1} #$	$a$
...	...	...

Table 4. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 2.

Prefix	Remaining Part	BWT
$a^{3} b$	$bab$	$b$
	$bbaba$	$b$
	$bbbabaa$	$b$
	$b^{4} aa$	$b$
	$b^{4} {aba}^{3}$	$b$
	$b^{5} aa$	$a$
	$b^{5} {aba}^{4}$	$b$
	$b^{6} aa$	$a$
	$b^{6} {aba}^{5}$	$b$
	...	...
	$b^{k - 2} aa$	$a$
	$b^{k - 2} {aba}^{k - 3}$	$b$
	$b^{k - 1} #$	$a$
$a^{2} b$	$bab$	$a$
	$bbaba$	$a$
	$bbbaa$	$b$
	$bbbabaa$	$a$
	$b^{4} aa$	$a$
	$b^{4} {aba}^{3}$	$a$
	...	...
	$b^{k - 2} aa$	$a$
	$b^{k - 2} {aba}^{k - 3}$	$a$
	$b^{k - 1} #$	$a$

Table 5. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 3.

Prefix	Remaining Part	BWT
$ab$	$a^{k - 3} Q_{k}^{♭} #$	$b$
	$a^{k - 4} P_{k - 1}$	$b$
	...	...
	$P_{3}$	$b$
	$baa$	$#$
	$bab$	$a$
	$bbaa$	$b$
	$bbaba$	$a$
	$b^{3} aa$	$a$
	$b^{3} {aba}^{2}$	$a$
	...	...
	$b^{k - 2} aa$	$a$
	$b^{k - 2} {aba}^{k - 3}$	$a$
	$b^{k - 1} #$	$a$
$b #$	$P_{2}$	$b$
$ba$	$a^{k - 4} Q_{k}^{♭} #$	$a$
	$a^{k - 5} P_{k - 1}$	$a$
	...	...
	$a^{2} P_{6}$	$a$
	$b^{2} ab$	$b$
	$b^{3} aba$	$b$
	$b^{4} abaa$	$b$
	${aab}^{5} aa$	$a$
	${aab}^{5} {aba}^{3}$	$b$
	...	...
	${aab}^{k - 1} {aba}^{k - 3}$	$b$
	$P_{4}$	$a$
	$b P_{3}$	$b$
	...	...
	${ba}^{k - 3} Q_{k}^{♭} #$	$b$
	$b^{3} aa$	$a$

Table 6. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 4.

Prefix	Remaining Part	BWT
$bb #$	$P_{2}$	$b$
$b^{2} a$	${aab}^{2} ab$	$a$
	${aab}^{3} aba$	$b$
	...	...
	${aab}^{k - 1} {aba}^{k - 3}$	$b$
	${ba}^{k - 3} Q_{k}^{♭} #$	$b$
	...	...
	$ba P_{4}$	$b$
	$b P_{3}$	$a$
$bbb #$	$P_{2}$	$b$
$b^{3} a$	${aab}^{3} aba$	$a$
	${aab}^{4} abaa$	$b$
	...	...
	${aab}^{k - 1} {aba}^{k - 3}$	$b$
	${ba}^{k - 3} Q_{k}^{♭} #$	$b$
	...	...
	$baa P_{5}$	$b$
	$ba P_{4}$	$a$
$bbbb #$	$P_{2}$	$b$
...	...	...

Table 7. Lexicographically sorted conjugates of

W_{k}^{♭} #

studied in Theorem 2, Part 5.

Prefix	Remaining Part	BWT
$b^{k - 2} #$	$P_{2}$	$b$
$b^{k - 2} a$	${aab}^{k - 2} {aba}^{k - 4}$	$a$
	${aab}^{k - 1} {aba}^{k - 3}$	$b$
	${ba}^{k - 3} Q_{k}^{♭} #$	$b$
	${ba}^{k - 4} P_{k - 1}$	$a$
$b^{k - 1} #$	$P_{2}$	$b$
$b^{k - 1} a$	${aab}^{k - 1} {aba}^{k - 3}$	$a$
$b^{k - 1} a$	${ba}^{k - 3} Q_{k}^{♭} #$	$a$
$b^{k} #$	$P_{2}$	$a$

Table 8. Classification of the number of runs obtained in Theorem 3. The total number of runs is

6 k - 12

.

BWT of $\bar{W_{k}}$	Runs
$β (a^{k} b) = b$	1
$β (a^{i} b) = {ba}^{2 (k - 1 - i) + 1} b$ for all $i \in [2 . . k - 1]$	$2 k - 3$ but, when merged, $2 k - 4$
$β (ab) = {ba}^{k - 2} {baa}^{k - 5} {baaab}^{k - 5}$	7 but, when merged, 6
$β (ba) = b^{2 k - 6} {abba}^{k - 2}$	4 but, when merged, 3
$β (bba) = b^{2 k - 8} abba$	4
$β (bbba) = b {(ab)}^{k - 6} a^{5}$	$2 k - 10$
$β (b^{i} a) = b^{k - i - 2} a$ for all $i \in [4 . . k - 2]$	$2 k - 12$

Table 9. Classification of the number of runs obtained in Theorem 4. The total number of runs is

8 k - 17

.

BWT of $\bar{W_{k}^{♭}} c$	Runs
$β (a^{k} c) = b$	1
$β^{'} (a^{k - 1} b) = bb$	1 but, when merged, 0
$β^{'} (a^{i} b) = {ba}^{2 k - 2 i - 2} b$ for all $i \in [2 . . k - 2]$	$3 k - 9$
$β^{'} (a^{i} c) = a$ for all $i \in [1 . . k - 1]$	$k - 1$
$β^{'} (ab) = {ba}^{k - 2} {ba}^{k - 5} {baaab}^{k - 5}$	7
$β^{'} (ba) = b^{2 k - 6} abc a^{k - 2}$	5
$β^{'} (bba) = b^{2 k - 8} abb$	3
$β^{'} (bbba) = b {(ab)}^{k - 6} a^{5}$	$2 k - 10$ but, when merged, $2 k - 11$
$β^{'} (b^{i} a) = b^{k - i - 2} a$ for all $i \in [4 . . k - 2]$	$2 k - 12$
$β^{'} (c) = a$	1 but, when merged, 0

Table 10. Lexicographically sorted conjugates of

\bar{W_{k}^{♭}} c

studied in Theorem 4, Part 1.

Prefix	Remaining Part	BWT
$a^{k} c$	$\bar{P_{2}}$	$b$
$a^{k - 1} b$	${ab}^{k - 2}$	$b$
$a^{k - 1} b$	${bba}^{k - 1}$	$b$
$a^{k - 1} c$	$\bar{P_{2}}$	$a$
$a^{k - 2} b$	${ab}^{k - 3}$	$b$
	${ab}^{k - 2}$	$a$
	${bba}^{k - 1}$	$a$
	${bba}^{k - 2}$	$b$
$a^{k - 2} c$	$\bar{P_{2}}$	$a$
$a^{k - 3} b$	${ab}^{k - 4}$	$b$
	${ab}^{k - 3}$	$a$
	${ab}^{k - 2}$	$a$
	${bba}^{k - 1}$	$a$
	${bba}^{k - 2}$	$a$
	${bba}^{k - 3}$	$b$
$a^{k - 3} c$	$\bar{P_{2}}$	$a$
...	...	...
$aab$	$ab$	$b$
	${ab}^{2}$	$a$
	...	$a$
	${ab}^{k - 2}$	$a$
	${bba}^{k - 1}$	$a$
	...	$a$
	${bba}^{3}$	$a$
	${bba}^{2}$	$b$
$aa c$	$\bar{P_{2}}$	$a$

Table 11. Lexicographically sorted conjugates of

\bar{W_{k}^{♭}} c

studied in Theorem 4, Part 2.

Prefix	Remaining Part	BWT
$ab$	$aaabb$	$b$
	$a \bar{P_{3}}$	$a$
	$ab \bar{P_{4}}$	$a$
	$abb \bar{P_{5}}$	$a$
	...	$a$
	${ab}^{k - 4} \bar{P_{k - 1}}$	$a$
	${ab}^{k - 3} \bar{Q_{k}^{♭}} c$	$a$
	$\bar{P_{4}}$	$b$
	$b \bar{E_{k - 1}}$	$a$
	$b \bar{E_{k - 2}}$	$a$
	...	$a$
	$b \bar{E_{5}}$	$a$
	$b \bar{P_{5}}$	$b$
	$b \bar{E_{4}}$	$a$
	$b \bar{E_{3}}$	$a$
	$b \bar{E_{2}}$	$a$
	$bb \bar{P_{6}}$	$b$
	$bbb \bar{P_{7}}$	$b$
	$bbbb \bar{P_{8}}$	$b$
	...	$b$
	$b^{k - 5} \bar{P_{k - 1}}$	$b$
	$b^{k - 4} \bar{Q_{k}^{♭}} c$	$b$
$a c$	$\bar{P_{2}}$	$a$

Table 12. Lexicographically sorted conjugates of

\bar{W_{k}^{♭}} c

studied in Theorem 4, Part 3.

Prefix	Remaining Part	BWT
$ba$	$a^{k - 1} c$	$b$
	$a^{k - 2} {bab}^{k - 3}$	$b$
	$a^{k - 2} bb$	$b$
	$a^{k - 3} {bab}^{k - 4}$	$b$
	$a^{k - 3} bb$	$b$
	...	...
	$aaababb$	$b$
	$aaabb$	$b$
	$aaabab$	$b$
	$aabb$	$a$
	$aba$	$b$
	$abb$	$c$
	$\bar{P_{3}}$	$a$
	$b \bar{P_{4}}$	$a$
	...	$a$
	$b^{k - 4} \bar{P_{k - 1}}$	$a$
	$b^{k - 3} \bar{Q_{k}^{♭}} c$	$a$
$bba$	$a^{k - 1} c$	$b$
	$a^{k - 2} {bab}^{k - 3}$	$b$
	$a^{k - 2} bb$	$b$
	...	...
	$a^{4} {bab}^{3}$	$b$
	$a^{4} bb$	$b$
	$a^{3} babb$	$b$
	$aaabb$	$a$
	$aabab$	$b$
	$aba$	$b$

Table 13. Lexicographically sorted conjugates of

\bar{W_{k}^{♭}} c

studied in Theorem 4, Part 4.

Prefix	Remaining Part	BWT
$bbba$	$a^{k - 1} c$	$b$
	$a^{k - 2} {bab}^{k - 3}$	$a$
	$a^{k - 2} bb$	$b$
	$a^{k - 3} {bab}^{k - 4}$	$a$
	$a^{k - 3} bb$	$b$
	...	...
	$a^{5} {bab}^{4}$	$a$
	$a^{5} bb$	$b$
	$a^{4} {bab}^{3}$	$a$
	$a^{4} bb$	$a$
	$a^{3} {bab}^{2}$	$a$
	$aabab$	$a$
	$aba$	$a$
$bbbba$	$a^{k - 1} c$	$b$
	$a^{k - 2} bb$	$b$
	...	$b$
	$a^{6} bb$	$b$
	$a^{5} bb$	$a$
...	...	...
$b^{k - 3} a$	$a^{k - 1} c$	$b$
$b^{k - 3} a$	$a^{k - 2} bb$	$a$
$b^{k - 2} a$	$a^{k - 1} c$	$a$
$c$	$\bar{P_{2}}$	$a$

Table 14. Classification of the number of runs obtained in Lemma 47. The total number of runs is

8 k - 18

.

BBWT of $\bar{W_{k}^{♭}} c$	Runs
$β (a^{k - 2} b) = b$	1
$β (a^{i} b) = {ba}^{k - i - 2}$ for all $i \in [4 . . k - 3]$	$2 k - 12$ but, when merged, $2 k - 13$
$β (aaab) = bbbbb {(ab)}^{k - 7} baa$	$2 k - 12$
$β (aab) = {baaba}^{2 k - 8}$	4
$β (ab) = b^{k - 3} {aaba}^{2 k - 6}$	4
$β (ba) = {ba}^{k - 6} {bbbab}^{k - 4} {ab}^{k - 3} a$	8
$β (b^{j} a) = {bab}^{2 k - 2 j - 2} a$ for all $j \in [2 . . k - 2]$	$4 k - 12$
$β (b^{k - 1} a) = aab$	2 but, when merged, 1
$β (b^{k} a) = a$	1

Table 15. Classification of the number of runs obtained in Theorem 12. The total number of runs is

8 k - 12

.

BWT of $C_{k} c$	Runs
$β (a^{k - 2} b) = c$	1
$β (a^{i} b) = {ba}^{k - i - 2}$ for all $i \in [4 . . k - 3]$	$2 k - 12$
$β (aaab) = bbbbb {(ab)}^{k - 6} a$	$2 k - 10$
$β (aab) = {baaba}^{2 k - 8}$	4
$β (ab) = b^{k - 3} {aaba}^{2 k - 6} b$	5
$β (ba) = a^{k - 6} {bbbab}^{k - 4} {ab}^{k - 3} ab$	8
$β (b^{i} a) = {ab}^{2 k - 2 i - 2} ab$ for all $i \in [2 . . k - 2]$	$4 k - 12$
$β (b^{k - 1} a) = aba$	3
$β (b^{k} a) = a$	1 but, when merged, 0
$β (bc) = a$	1 but, when merged, 0
$β (c) = b$	1

Table 16. Classification of the number of runs obtained in Theorem 13. The total number of runs is

8 k - 13

.

BWT of $C_{k}^{♭} c$	Runs
$β (a^{k - 2} b) = c$	1
$β (a^{i} b) = {ba}^{k - 2 - i}$ for all $i \in [4 . . k - 3]$	$2 k - 12$
$β (aaab) = bbbbb {(ab)}^{k - 6} a$	$2 k - 10$
$β (aab) = {baaba}^{2 k - 8}$	4
$β (ab) = b^{k - 3} {aaba}^{2 k - 6}$	4
$β (ac) = b$	1
$β (ba) = a^{k - 6} {bbbab}^{k - 4} {ab}^{k - 3} ab$	8
$β (b^{i} a) = {ab}^{2 k - 2 i - 2} ab$ for all $i \in [2 . . k - 2]$	$4 k - 12$
$β (b^{k - 1} a) = aba$	3
$β (b^{k} a) = a$	1 but, when merged, 0
$β (c) = a$	1 but, when merged, 0
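The totals stated in the captions of the classification tables (Tables 2, 8, 9, 14, 15, and 16) can be cross-checked by summing the per-row counts, using the merged value wherever a row reads "but, when merged". The following Python sketch does this for a range of k. The function names and the tested range of k are assumptions made for illustration; the row values are transcribed from the tables above.

```python
def table_rows(k):
    """Per-row run counts (after merging), transcribed from the tables."""
    return {
        "Table 2": [1, 2*k - 12, 2*k - 11, 2, 5, k - 1, 7, 3*(k - 3), 1, 0],
        "Table 8": [1, 2*k - 4, 6, 3, 4, 2*k - 10, 2*k - 12],
        "Table 9": [1, 0, 3*k - 9, k - 1, 7, 5, 3, 2*k - 11, 2*k - 12, 0],
        "Table 14": [1, 2*k - 13, 2*k - 12, 4, 4, 8, 4*k - 12, 1, 1],
        "Table 15": [1, 2*k - 12, 2*k - 10, 4, 5, 8, 4*k - 12, 3, 0, 0, 1],
        "Table 16": [1, 2*k - 12, 2*k - 10, 4, 4, 1, 8, 4*k - 12, 3, 0, 0],
    }


def stated_totals(k):
    """Totals as stated in the table captions."""
    return {
        "Table 2": 8*k - 17,
        "Table 8": 6*k - 12,
        "Table 9": 8*k - 17,
        "Table 14": 8*k - 18,
        "Table 15": 8*k - 12,
        "Table 16": 8*k - 13,
    }


def check_totals(k_values):
    """Return (table, k, summed, stated) for every mismatch found."""
    mismatches = []
    for k in k_values:
        expected = stated_totals(k)
        for name, rows in table_rows(k).items():
            if sum(rows) != expected[name]:
                mismatches.append((name, k, sum(rows), expected[name]))
    return mismatches


if __name__ == "__main__":
    # Assumption: k is large enough for all exponents in the tables to be
    # non-negative; the sums are linear in k, so any range works arithmetically.
    print(check_totals(range(8, 60)))  # an empty list means all totals agree
```

Such a check only confirms the arithmetic of the tallies; it does not verify the individual row counts, which follow from the sorted conjugates analyzed in the proofs.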

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
