Article

Compression Sensitivity of the Burrows–Wheeler Transform and Its Bijective Variant

Department of Computer Science and Engineering, University of Yamanashi, Kōfu 400-8511, Japan
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(7), 1070; https://doi.org/10.3390/math13071070
Submission received: 27 February 2025 / Revised: 21 March 2025 / Accepted: 24 March 2025 / Published: 25 March 2025
(This article belongs to the Section E: Applied Mathematics)

Abstract

The Burrows–Wheeler Transform (BWT) is a widely used reversible data compression method, forming the foundation of various compression algorithms and indexing structures. Prior research has analyzed the sensitivity of compression methods and repetitiveness measures to single-character edits, particularly in binary alphabets. However, the impact of such modifications on the compression efficiency of the bijective variant of the BWT (BBWT) remains largely unexplored. This study extends previous work by examining the compression sensitivity of both the BWT and the BBWT when applied to larger alphabets, including alphabet reordering. We establish theoretical bounds on the increase in compression size due to character modifications in structured sequences such as Fibonacci words. Our lower bounds put the sensitivity of the BBWT on the same scale as that of the BWT, with compression size changes exhibiting logarithmic multiplicative growth and square-root additive growth patterns depending on the edit type and the input data. These findings contribute to a deeper understanding of repetitiveness measures.

1. Introduction

The Burrows–Wheeler transform (BWT) [1] has attracted great attention in interdisciplinary fields such as lossless data compression and text indexing. It lies at the heart of compression algorithms like bzip2 and text indexing data structures such as the FM-index [2]. By compressing single character runs of the BWT, we obtain a compressed but reversible transformation, which can be augmented with techniques akin to the FM-index to give rise to compressed text indices [3,4,5,6,7,8]. Because of its reversible nature, the BWT is also used in bioinformatics applications such as sequence alignment and genome assembly [9,10]. Workshops (e.g., [11,12]) and books (e.g., [13]) have been dedicated exclusively to the BWT and its applications.
Given word T of length n, the BWT of T is a permutation of its characters. In detail, we sort all cyclic conjugates of T lexicographically and concatenate the last characters of these conjugates to form the BWT of T. The BWT is a reversible transformation by application of the so-called Gessel–Reutenauer transform [14].
Among various variants of the BWT (e.g., [15,16,17,18,19]), the bijective BWT (BBWT) [20] can be considered as one of the well-perceived ones that is a word isomorphism. A word isomorphism maps a word to another word injectively, and each word is a unique image of another word. For instance, this is not the case for the BWT, whether we add additional information such as an artificial delimiter (known as the $ character) or a starting position, cf. [21,22,23].
In this article, we focus on the run-length compression of the BWT and the BBWT: run-length compression is usually the first step in the compression pipeline of the BWT and its variants. In addition, compressed text indices such as the r-index [4] store the BWT in a run-length compressed form. We measure the size of the run-length compression of a word T by the number of maximal runs of equal characters in T. For instance, the word mississippi can be written in exponential notation as $\mathtt{m}^1\mathtt{i}^1\mathtt{s}^2\mathtt{i}^1\mathtt{s}^2\mathtt{i}^1\mathtt{p}^2\mathtt{i}^1$ and therefore has eight runs. We denote this number of runs by runs ( T ) . Given word T, we define the following two repetitiveness measures:
  • r = r ( T ) = runs ( BWT ( T ) ) and
  • ρ = ρ ( T ) = runs ( BBWT ( T ) ) .
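As a minimal sketch of the run-length compression described above, the following Python snippet splits a word into its maximal runs and counts them. The function names `run_length_encode` and `runs` are our own choices for illustration and do not stem from the literature.

```python
from itertools import groupby


def run_length_encode(word):
    """Return the maximal runs of `word` as (character, length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(word)]


def runs(word):
    """Number of maximal runs of equal characters in `word`."""
    return len(run_length_encode(word))


if __name__ == "__main__":
    # The exponential notation of mississippi, one pair per run.
    print(run_length_encode("mississippi"))
    print(runs("mississippi"))
```

Applying `runs` to the output of a BWT or BBWT implementation yields the measures r and ρ, respectively.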
In this article, we investigate the sensitivity of the BWT and the BBWT to single-character edits. This means that we analyze how the run-length compression of the BWT and the BBWT changes when we modify a single character of the input word. Previous research has shown that the run-length compression of the BWT is sensitive to single-character edits in binary alphabets [24]. Here, we extend this research to larger alphabets and analyze the sensitivity of the BBWT to single-character edits. Research on compression sensitivity is not a new topic. In the following section, we review the related work of which we are aware.

2. Related Work and Contribution

The sensitivity [25] of a repetitiveness measure m is the maximum difference between m ( T ) for a word T and m ( T′ ) for a word T′ obtained from T by a single-character edit. Sensitivity measures the robustness of a repetitiveness measure against small changes in the input word introduced by various sources of input (source code changes, biological sequencing errors, typos, etc.). Akagi et al. [25] reviewed known results that directly imply a sensitivity for repetitiveness measures such as for Lempel–Ziv 78 [26] or the BWT [24]. Additionally, they offered and improved upper and lower bounds on the multiplicative sensitivity of various compressors and measures including the Lempel–Ziv dictionary compressors [27,28] and the smallest string attractors [29].
In detail, for two words $W_1$ and $W_2$, we let $\mathrm{ed}(W_1, W_2)$ denote the edit distance between $W_1$ and $W_2$. We define the additive sensitivity $\mathrm{AS}_m$ and the multiplicative sensitivity $\mathrm{MS}_m$ of a repetitiveness measure $m$ by
  • $\mathrm{AS}_m(n) = \max_{W_1 \in \Sigma^n} \{ m(W_2) - m(W_1) \mid W_2 \in \Sigma^* : \mathrm{ed}(W_1, W_2) = 1 \}$, and
  • $\mathrm{MS}_m(n) = \max_{W_1 \in \Sigma^n} \{ m(W_2) / m(W_1) \mid W_2 \in \Sigma^* : \mathrm{ed}(W_1, W_2) = 1 \}$.
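For very small n, both sensitivities can be evaluated exhaustively, which may help to build intuition for the two definitions. The following sketch enumerates all words of length n and all words at edit distance 1; the helper names (`single_edits`, `sensitivity`, `r`) are hypothetical, and we assume that edits only use characters of the given alphabet. The running time is exponential in n, so this is only meant for toy inputs.

```python
from itertools import groupby, product


def runs(word):
    """Number of maximal runs of equal characters."""
    return sum(1 for _ in groupby(word))


def bwt(word):
    """BWT via explicitly sorting all conjugates (rotations)."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rot[-1] for rot in rotations)


def r(word):
    """Number of runs in the BWT of `word`."""
    return runs(bwt(word))


def single_edits(word, alphabet):
    """All words at edit distance exactly 1 from `word`."""
    out = set()
    for i in range(len(word) + 1):
        for c in alphabet:
            out.add(word[:i] + c + word[i:])            # insertion
    for i in range(len(word)):
        out.add(word[:i] + word[i + 1:])                # deletion
        for c in alphabet:
            if c != word[i]:
                out.add(word[:i] + c + word[i + 1:])    # substitution
    return out


def sensitivity(measure, n, alphabet):
    """Return (AS_m(n), MS_m(n)) for n >= 1 by exhaustive search."""
    best_add, best_mul = float("-inf"), 0.0
    for letters in product(alphabet, repeat=n):
        w1 = "".join(letters)
        m1 = measure(w1)
        for w2 in single_edits(w1, alphabet):
            m2 = measure(w2)
            best_add = max(best_add, m2 - m1)
            best_mul = max(best_mul, m2 / m1)
    return best_add, best_mul


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, sensitivity(r, n, "ab"))
```

Passing `runs` instead of `r` as the measure evaluates the sensitivity of plain run-length compression without any transform.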
The sensitivity has been studied for lexparse [30] by Nakashima et al. [31] and for the size of the compact directed acyclic word graph [32] by Fujimaru et al. [33]. In particular, Giuliani et al. [24] showed that $\mathrm{MS}_r(n) = \Omega(\log n)$ and $\mathrm{AS}_r(n) = \Omega(\sqrt{n})$.
Our contribution. In this article, we show identical results for the BBWT, confirming that it is also sensitive to single-character edits. Concretely, we establish that $\mathrm{MS}_\rho(n) = \Omega(\log n)$ with Theorem 5 and $\mathrm{AS}_\rho(n) = \Omega(\sqrt{n})$ with Lemma 47. In detail, we obtain the asymptotically same results regarding $\mathrm{MS}_\rho(n)$:
  • in Theorem 5 for deletion,
  • in Theorem 6 and Theorem 7 for substituting a character with a smaller or larger one, respectively, and
  • in Theorem 8 and Theorem 9 for insertion of the character a or of a strictly smaller character #, respectively.
We also obtain the asymptotically same results regarding AS ρ ( n ) :
  • in Theorem 10 for deletion,
  • in Theorem 12 for inserting a large character, and
  • in Theorem 11 and Theorem 13 for substituting a character with a smaller or larger one, respectively.
Additionally, we broaden the study of the sensitivity of the BWT by allowing larger alphabets (Theorem 2) and alphabet reordering (Theorem 4), obtaining the same asymptotic complexities as reported by Giuliani et al. [24].
Since our major contribution is on the BBWT, we also briefly review known results related to it.
BBWT. Since its inception [20], the BBWT has been studied under various aspects. We are aware of construction algorithms (cf. [34] or [35] and the references therein), indexes [35] based on the BBWT, studies about the relationship of ρ and r [36], ρ ( T ) and ρ of the reverse of T [37].

3. Preliminaries

In this section, we provide the necessary definitions and terminology used throughout the paper. A list of symbols is given in Table 1.
Words. We let $\Sigma$ be a finite and ordered alphabet with cardinality $\sigma$. The elements of $\Sigma$ are called characters. A word over $\Sigma$ is a finite sequence $W = W[0]W[1]\cdots W[n-1] = W[0..n-1]$ of characters from $\Sigma$. The order of the alphabet induces the lexicographic order on words, which we denote by $<_{\mathrm{lex}}$.
We denote the length of $W$ by $|W|$, with $\varepsilon$ being the unique word of length 0. We denote the set of words of length $n$ by $\Sigma^n$, and represent the set of all words on $\Sigma$ by $\Sigma^* = \bigcup_{n \ge 0} \Sigma^n$. Given word $W = W[0..n-1]$, we define its reverse by $\mathrm{rev}(W) = W[n-1]W[n-2]\cdots W[0]$. If $W = XYZ$ for words $W, X, Y, Z$, then $X$, $Y$, $Z$ are, respectively, a prefix, a subword, and a suffix of $W$. We call word $W'$ a conjugate of $W$ if and only if there is an integer $i \in [0..|W|-1]$ such that $W' = W[i..|W|-1]\,W[0..i-1]$. In this case, we write $W' = \mathrm{conj}_i(W)$. In particular, $W = \mathrm{conj}_0(W)$. We call word $U$ a circular factor of word $W$ if it is a prefix of $\mathrm{conj}_i(W)$ for some $i \in [0..|W|-1]$; in this case, we call $i$ (the starting position of) an occurrence of $U$. If we can express word $W$ as $W = V^k$ for a word $V$ and an integer $k \ge 2$, then we call $W$ a power, otherwise we call $W$ primitive. Finally, $W$ is primitive if and only if it has $|W|$ distinct conjugates.
Given two words $V, W$, the longest common prefix of $V$ and $W$, denoted $\mathrm{lcp}(V, W)$, is the unique word $U$ such that $U$ is a prefix of both $V$ and $W$, and $V[|U|] \ne W[|U|]$ if neither of the two words is a prefix of the other.
The Burrows–Wheeler Transform (BWT). We define the BWT of word $W$ based on its conjugates. For that, we define two concepts, an order and a list of conjugates sorted in that order. First, we define the omega-order [16] of two words $T$ and $S$ as follows: $T <_\omega S$ if either $T^\omega <_{\mathrm{lex}} S^\omega$, or $T^\omega = S^\omega$ and $|T| < |S|$. Here, $S^\omega$ denotes the infinite word obtained by concatenating word $S$ an infinite number of times. The omega-order coincides with the lexicographic order if neither of the two words is a proper prefix of the other but may differ otherwise. Second, we let $M(W)$ be the list of conjugates of word $W$ sorted in omega-order.
Now, we can define the Burrows–Wheeler Transform (BWT) [1] of the word W, denoted by BWT ( W ) , as the word obtained by reading the last character of each conjugate in M ( W ) .
For instance, the BWT of word mississippi is pssmipissii . By construction, it follows that $W$ and $W'$ are conjugates if and only if $\mathrm{BWT}(W) = \mathrm{BWT}(W')$. We denote by $r(W) = \mathrm{runs}(\mathrm{BWT}(W))$ the number of runs in the BWT of word $W$. For example, $r(\mathtt{mississippi}) = \mathrm{runs}(\mathtt{pssmipissii}) = 8$.
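The definition can be turned into a few lines of Python. The sketch below is a naive quadratic-space construction that materializes all conjugates, not an efficient BWT algorithm; since all conjugates of one word have the same length, sorting them lexicographically coincides with sorting them in omega-order. The function names are chosen for illustration only.

```python
from itertools import groupby


def conjugates(word):
    """All conjugates conj_i(word) for i = 0, ..., len(word) - 1."""
    return [word[i:] + word[:i] for i in range(len(word))]


def bwt(word):
    """Concatenate the last characters of the sorted conjugates."""
    return "".join(conj[-1] for conj in sorted(conjugates(word)))


def r(word):
    """Number of maximal character runs in the BWT of `word`."""
    return sum(1 for _ in groupby(bwt(word)))


if __name__ == "__main__":
    print(bwt("mississippi"), r("mississippi"))
```

Because the transform only depends on the set of conjugates, calling `bwt` on any conjugate of a word returns the same output.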
Table 1. Definitions of symbols introduced in this article.
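| Symbol | Meaning |
| --- | --- |
| $r$ | run length of the BWT |
| $\rho$ | run length of the bijective BWT |
| $n$ | length |
| $k$ | index |
| # | a character lexicographically smaller than a |
| c | a character lexicographically larger than b |
| $F_k$ | kth Fibonacci word |
| $f_k$ | kth Fibonacci number |
| $X_k$ | kth central word |
| $L_k$ | kth Lyndon Fibonacci word |
| $F'_k$ | kth Fibonacci word deleting its last character |
| $L'_k$ | kth Fibonacci Lyndon word deleting its last character |
| $P_k$ | $\mathtt{a}\mathtt{b}^k\mathtt{aa}$ |
| $E_k$ | $\mathtt{a}\mathtt{b}^k\mathtt{ab}\mathtt{a}^{k-2}$ |
| $Q_k$ | $\mathtt{a}\mathtt{b}^k\mathtt{a}$ |
| $Q'_k$ | $\mathtt{a}\mathtt{b}^k$ |
| $W_k$ | $\left(\prod_{i=2}^{k-1} P_i E_i\right) Q_k$ |
| $W'_k$ | $W_k$ deleting its last character |
| $\overline{P_k}$ | $\mathtt{b}\mathtt{a}^k\mathtt{bb}$ |
| $\overline{E_k}$ | $\mathtt{b}\mathtt{a}^k\mathtt{ba}\mathtt{b}^{k-2}$ |
| $\overline{Q_k}$ | $\mathtt{b}\mathtt{a}^k\mathtt{b}$ |
| $\overline{Q'_k}$ | $\mathtt{b}\mathtt{a}^k$ |
| $\overline{W_k}$ | $\left(\prod_{i=2}^{k-1} \overline{P_i}\,\overline{E_i}\right) \overline{Q_k}$ |
| $\overline{W'_k}$ | $\overline{W_k}$ deleting the last character |
| $C_k$ | Lyndon word of $W_k$ |
| $C'_k$ | $C_k$ deleting its last character b |
| $D_k$ | $C'_k$ deleting its last character a |
| $H_{k-1}$ | $E_{k-1}$ changed into $\mathtt{a}\mathtt{b}^{k-1}\mathtt{a}^{k-3}$ |
| $S_{k-1}$ | $E_{k-1}$ changed into $\mathtt{a}\mathtt{b}^{k-1}\mathtt{abc}\mathtt{a}^{k-3}$ |
| $R_{k-1}$ | $E_{k-1}$ changed into $\mathtt{a}\mathtt{b}^{k-1}\mathtt{ac}\mathtt{a}^{k-3}$ |
| $\beta(W)$ | subword of BWT(W) corresponding to the range of contiguous conjugates prefixed by W |
| $\beta'(W)$ | subword of BWT(W) applied to a specific edit operation |
| $\alpha$ | $f_{2k-3} + f_{2k-5} + \cdots + f_3 + f_1$ |
| $M(W)$ | the list of lexicographically sorted conjugates of word W |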
Lyndon Words. A word is called a Lyndon word if it is lexicographically strictly smaller than all of its other conjugates [38]. In particular, a Lyndon word must be primitive. Each primitive word $S$ has exactly one conjugate that is Lyndon. We denote this conjugate by $\mathrm{LynConj}(S)$ and call it the Lyndon conjugate of $S$. The Lyndon factorization [39] of word $W$ is a unique factorization of $W$ into Lyndon words. In detail, it decomposes word $W$ into a list of Lyndon words $S_1^{e_1}, S_2^{e_2}, \ldots, S_m^{e_m}$ such that $W = S_1^{e_1} S_2^{e_2} \cdots S_m^{e_m}$, where $S_m <_{\mathrm{lex}} S_{m-1} <_{\mathrm{lex}} \cdots <_{\mathrm{lex}} S_1$ and $e_i \ge 1$. By construction, word $S$ is Lyndon if and only if its Lyndon factorization consists of only one factor, i.e., $S$ itself. We denote the multiset of Lyndon factors in the Lyndon factorization of $S$ by $L(S)$. As an example, we consider $\mathrm{LynConj}(\mathtt{mississippi}) = \mathtt{imississipp}$. The Lyndon factorization of mississippi is $\mathtt{m} \cdot \mathtt{iss}^2 \cdot \mathtt{ipp} \cdot \mathtt{i}$. We have $L(\mathtt{mississippi}) = \{\mathtt{m}, \mathtt{iss}, \mathtt{iss}, \mathtt{ipp}, \mathtt{i}\}$.
Bijective BWT (BBWT). The Bijective BWT (BBWT) [20] of word $T$ is the word obtained by sorting all conjugates of the Lyndon factors in the multiset $L(T)$ in $\omega$-order and then concatenating the last character of each sorted conjugate. For example, the BBWT of the word mississippi is ipssmpissii . In this article, we denote by $\rho(W)$ the number of runs in the BBWT of $W$, which means $\rho(W) = \mathrm{runs}(\mathrm{BBWT}(W))$. For instance, $\rho(\mathtt{mississippi}) = \mathrm{runs}(\mathtt{ipssmpissii}) = 8$.
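The following sketch illustrates the definition of the BBWT on small inputs. It is a naive construction and not one of the algorithms from the cited literature: we compute the Lyndon factorization with Duval's algorithm (a standard technique that we substitute here for illustration) and compare two words T and S in omega-order via the concatenations TS and ST, which is a known way to compare the infinite powers of T and S. All function names are our own.

```python
from functools import cmp_to_key
from itertools import groupby


def lyndon_factorization(word):
    """Duval's algorithm: list of Lyndon factors in non-increasing order."""
    factors, i, n = [], 0, len(word)
    while i < n:
        j, k = i + 1, i
        while j < n and word[k] <= word[j]:
            k = i if word[k] < word[j] else k + 1
            j += 1
        while i <= k:                      # emit copies of the factor of length j - k
            factors.append(word[i:i + j - k])
            i += j - k
    return factors


def omega_compare(t, s):
    """Negative if t precedes s in omega-order, positive if it follows."""
    ts, st = t + s, s + t
    if ts != st:                           # infinite powers differ
        return -1 if ts < st else 1
    return len(t) - len(s)                 # same infinite power: shorter first


def bbwt(word):
    """Last characters of all conjugates of all Lyndon factors, omega-sorted."""
    rotations = [f[i:] + f[:i]
                 for f in lyndon_factorization(word)
                 for i in range(len(f))]
    rotations.sort(key=cmp_to_key(omega_compare))
    return "".join(rot[-1] for rot in rotations)


def rho(word):
    """Number of maximal character runs in the BBWT of `word`."""
    return sum(1 for _ in groupby(bbwt(word)))


if __name__ == "__main__":
    print(lyndon_factorization("mississippi"))
    print(bbwt("mississippi"), rho("mississippi"))
```

For a Lyndon word, the factorization consists of the word itself, so `bbwt` then returns the same string as the BWT.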
Fibonacci Words. Fibonacci words are so-called standard words ([40], Section 10.1), which are defined as follows: $F_0 = \mathtt{b}$, $F_1 = \mathtt{a}$, and $F_{k+1} = F_k F_{k-1}$ for every $k \ge 1$. For all $k \ge 0$, $|F_k| = f_k$, where $\{f_k\}_{k \ge 0}$ are the Fibonacci numbers $1, 1, 2, 3, 5, 8, 13, 21, \ldots$, defined by the recurrence $f_0 = f_1 = 1$, $f_{k+1} = f_k + f_{k-1}$ for $k \ge 1$. Since Fibonacci numbers grow exponentially in $k$, we have $k = \Theta(\log |F_k|)$. We also introduce so-called central words [41] $X_k$ for $k \ge 2$, which are palindromes defined by the equations $F_{2k} = X_{2k}\,\mathtt{ab}$ and $F_{2k+1} = X_{2k+1}\,\mathtt{ba}$ for all $k \ge 1$. In particular, $X_2 = \varepsilon$. The recursive structure of the words $X_{2k}$ and $X_{2k+1}$ is also known [42]:
  • $X_{2k} = X_{2k-1}\,\mathtt{ba}\,X_{2k-2} = X_{2k-2}\,\mathtt{ab}\,X_{2k-1}$ and
  • $X_{2k+1} = X_{2k}\,\mathtt{ab}\,X_{2k-1} = X_{2k-1}\,\mathtt{ba}\,X_{2k}$.
We study Fibonacci words in this article because they have the minimal number of BWT runs among binary words. This is because Mantaci et al. [43] have shown that the BWT of a binary word has exactly two runs if and only if it is a conjugate of a standard word or a conjugate of a power of a standard word. Further, there is rich literature (e.g., [44,45,46]) about Fibonacci words and their rotations.
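The recurrences above are easy to experiment with. The following sketch generates Fibonacci words, extracts the central words by cutting off the last two characters, and computes the number of BWT runs with a naive rotation sort. The function names are hypothetical, and the code is only meant for small k since the word length grows exponentially.

```python
from itertools import groupby


def fibonacci_word(k):
    """F_0 = b, F_1 = a, F_{k+1} = F_k F_{k-1}."""
    prev, cur = "b", "a"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, cur + prev
    return cur


def central_word(k):
    """X_k for k >= 2: F_k without its last two characters."""
    return fibonacci_word(k)[:-2]


def r(word):
    """Number of runs in the BWT, computed by sorting all conjugates."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return sum(1 for _ in groupby(rot[-1] for rot in rotations))


if __name__ == "__main__":
    for k in range(2, 9):
        word = fibonacci_word(k)
        x = central_word(k)
        print(k, len(word), word, x == x[::-1], r(word))
```

The fourth printed column checks whether the central word is a palindrome, and the last column reports the number of BWT runs of the respective Fibonacci word.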

4. Multiplicative Sensitivity of r by Ω ( log n )

As a starting point, we follow the steps of (Giuliani et al. [24], Section 3), who studied a family of words related to Fibonacci words for which they could observe a multiplicative sensitivity of $\Theta(\log n)$ for the number of character runs in the BWT. We show a similar result here, but use a new character (#) instead of one already appearing in the binary word. To simplify notation, we write $<$ for $<_\omega$ when sorting conjugates. We build our proofs on the following results from the literature.
Lemma 1.
(Remark 11 from [16]). All conjugates of a word have the same BWT.
Lemma 2.
(Proposition 4 of [24]). Let $F'_{2k}$ be the word obtained by removing the last character of $F_{2k}$. Then, $r(F'_{2k}) = 2k$.
Lemma 3.
(Lemma 7 of [24]). $\mathrm{conj}_{n-3}(F'_{2k})$ is the smallest conjugate in $M(F'_{2k})$.
Lemma 4.
Let $v \in \Sigma^*$ be a Lyndon word of $F'_{2k}$ that contains at least two distinct characters and let # be a character that does not occur in $v$. Then, $r(v) \le r(\#v) = r(v\#) \le r(v) + 2$.
Proof .
Throughout this proof, we write $v$ for $\mathrm{conj}_{n-3}(F'_{2k})$ from Lemma 3 and consider two indices $i, j$ with $0 \le i, j < f_{2k} - 1$. The conjugates of $v$ with index $i$ and $j$ are $\mathrm{conj}_i(v)$ and $\mathrm{conj}_j(v)$, respectively. We assume that the two conjugates are ordered as $\mathrm{conj}_i(v) < \mathrm{conj}_j(v)$; thus, $v[i..|v|-1]\,v[0..i-1] < v[j..|v|-1]\,v[0..j-1]$. We prove the claim separately in two cases, where Figure 1 sketches the setting.
Case 1:
  $|\mathrm{lcp}(\mathrm{conj}_i(v), \mathrm{conj}_j(v))| < \min(|v| - i + 1, |v| - j + 1)$;
Case 2:
  $|\mathrm{lcp}(\mathrm{conj}_i(v), \mathrm{conj}_j(v))| > \min(|v| - i + 1, |v| - j + 1)$.
The red rectangle in Figure 2 is an example of a common prefix of $\mathrm{conj}_i(v)$ and $\mathrm{conj}_j(v)$. In Case 1, $\mathrm{conj}_i(v) < \mathrm{conj}_j(v)$ means that the character of $\mathrm{conj}_i(v)$ at position $|\mathrm{lcp}(\mathrm{conj}_i(v), \mathrm{conj}_j(v))| + 1$ is smaller than the one at the same position in $\mathrm{conj}_j(v)$. Thus, inserting # at position $|v|$ does not change the lexicographic order between $\mathrm{conj}_i(v)$ and $\mathrm{conj}_j(v)$, and the order is preserved.
The red rectangle in Figure 3, which depicts the longest common prefix of the two strings in question, is longer than $|\mathrm{lcp}(\mathrm{conj}_i(v), \mathrm{conj}_j(v))|$. In Case 2, we must have $i > j$, which means $|v[i..|v|-1]| < |v[j..|v|-1]|$. If instead $i < j$, then $|v[j..|v|-1]| < |v[i..|v|-1]|$, meaning that # appears first in $\mathrm{conj}_j(v)$. As a result, $\mathrm{conj}_j(v) < \mathrm{conj}_i(v)$, which contradicts $\mathrm{conj}_i(v) < \mathrm{conj}_j(v)$. Thus, in Case 2, we only consider $i > j$, as illustrated in Figure 3.
Furthermore, we split the second case into two subcases. Let $u$ be the unique circular factor of $v$ that is smaller than all other circular factors of $v$ having the same length.
Case 2 (a):
  when $u$ is a prefix of $v[i..|v|-1]$;
Case 2 (b):
  when $v[0..i-1]$ is a prefix of $u$.
In Case 2 (a), $u$ appears only in the prefix of $v[0..i-1]$. Thus, the first difference between $\mathrm{conj}_i(v)$ and $\mathrm{conj}_j(v)$ lies within the unique occurrence of $u$. The situation is depicted in Figure 4. After inserting the #, $\mathrm{conj}_i(v)$ becomes $v[i..|v|-1]\,\#\,v[0..i-1]$, creating the factor $\#u$ at position $|v| - i + 1$, which is not only unique but also the smallest among all factors of length $|\#u|$ in $v$. Any factor that appears at the same position in $v[j..|v|-1]\,\#u$ is greater than $\#u$. Thus, the order is preserved.
In Case 2 (a), $u$ is the smallest prefix that appears only once in $v[0..i-1]$. Since $v$ is a Lyndon word, $v[0..i-1]$ is also the smallest factor in $v$. However, in Case 2 (b), $u$ is longer than $v[0..i-1]$. We sketch the situation in Figure 5, where we visualize $u$ with a purple rectangle. Therefore, $v[0..i-1]$ must appear more than twice in $u$. If $v[0..i-1]$ appears only once, $u$ is analogous to $v[0..i-1]$. Also, from $\mathrm{conj}_i(v) \ne \mathrm{conj}_j(v)$, there must be a difference in $v[0..i-1]$. Moreover, since $v$ is primitive, $v$ cannot be expressed in the form $Z^k$ for a word $Z$ and an integer $k \ge 2$. The first distinct character between $\mathrm{conj}_i(v)$ and $\mathrm{conj}_j(v)$ is within $\mathrm{conj}_i(v)[|v| - i + 1..|v| - 1]$. Assume otherwise that there is no mismatching character pair between $v[0..i-1]$ and the prefix of $\mathrm{conj}_i(v)$, which is $v[i..2i-1]$. Since then $v[0..i-1] = v[i..2i-1]$, $\mathrm{conj}_i(v)$ also starts with the smallest prefix, which contradicts the fact that $v$ is the one and only Lyndon conjugate. Moreover, $\mathrm{conj}_i(v)$ becomes $v[0..i-1]\,v[i..2i-1] = v[0..i-1]^2$, thus contradicting its primitivity.
In this way, after inserting a #, we observe the same behavior as in Case 2 (a).
According to the cases above, the order of the original conjugates of $v$ is preserved with respect to the original BWT. Thus, the only difference caused by inserting # in $v$ occurs at the conjugates $\#v$ and $v\#$. On the one hand, we observe that $\#v$ is now the smallest among all conjugates of $M(\#v)$, and it ends with the last character of $v$. On the other hand, $v\#$ becomes the second smallest conjugate and ends with #. Hence, we have $\mathrm{BWT}(\#v) = \mathrm{BWT}(v)[0] \cdot \# \cdot \mathrm{BWT}(v)[1..|v|-1]$, which concludes the proof. □
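The final identity of the proof can be checked experimentally on small instances. The sketch below computes the Lyndon conjugate v of the Fibonacci word of even order without its last character and compares the BWT of #v with the right-hand side of the identity. It relies on the fact that, in Python's string comparison, # precedes the lowercase letters; all function names are hypothetical.

```python
from itertools import groupby


def bwt(word):
    """BWT via explicitly sorting all conjugates."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rot[-1] for rot in rotations)


def runs(word):
    return sum(1 for _ in groupby(word))


def lyndon_conjugate(word):
    """Smallest conjugate; it is a Lyndon word if `word` is primitive."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def fibonacci_word(k):
    prev, cur = "b", "a"                   # F_0, F_1
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, cur + prev
    return cur


def insertion_identity_holds(v):
    """Check BWT(#v) = BWT(v)[0] # BWT(v)[1..] for a Lyndon word v."""
    b = bwt(v)
    return bwt("#" + v) == b[0] + "#" + b[1:]


if __name__ == "__main__":
    for k in range(2, 7):
        v = lyndon_conjugate(fibonacci_word(2 * k)[:-1])
        print(k, runs(bwt(v)), runs(bwt("#" + v)), insertion_identity_holds(v))
```

The two run counts printed per line correspond to r(v) and r(#v) in the statement of Lemma 4.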
Theorem 1.
Let $F_{2k}$ be the Fibonacci word of even order $2k > 4$, and $f_{2k} = |F_{2k}|$. Let $F'_{2k}\#$ be the word that results from substituting the b at position $f_{2k} - 1$ by a #. Then, $r(F'_{2k}\#) = 2k + 2$.
Proof. 
Let $S = F'_{2k}\#$. From Lemma 2, $r(F'_{2k}) = 2k$. By Lemma 3, we know that $\mathrm{conj}_{n-3}(F'_{2k})$ is the smallest conjugate in $M(F'_{2k})$. By Lemma 4, we have $2k \le r(\#\,\mathrm{conj}_{n-3}(F'_{2k})) \le 2k + 2$. More precisely, it is $2k + 2$ since $\#\,\mathrm{conj}_{n-3}(F'_{2k})$ is the smallest conjugate in $M(F'_{2k}\#)$ and $\mathrm{conj}_{n-3}(F'_{2k})\,\#$ is the second smallest conjugate. The relative order among the conjugates of $\#\,\mathrm{conj}_{n-3}(F'_{2k})$ coincides with that of the conjugates of $F'_{2k}$, using the same argument as in the proof of Lemma 4. This means that to obtain $\mathrm{BWT}(S)$, it suffices to insert a # between the first two bs in $\mathrm{BWT}(F'_{2k})$. Since $r(\#\,\mathrm{conj}_{n-3}(F'_{2k})) = r(\#F'_{2k}) = r(F'_{2k}\#)$, we obtain the claim.    □

5. Additive Sensitivity of r by $\Omega(\sqrt{n})$

In Section 4, we presented a word such that substituting one of its characters by #, which is strictly lexicographically smaller than all its characters, resulted in a logarithmic multiplicative increase in the number of runs r in the BWT. We now follow (Giuliani et al. [24], Section 4), who presented a family of words where a single edit can produce an additive increase of $\Theta(\sqrt{n})$ in r. As before, we study the sensitivity when introducing a new character (#) in Section 5.1 and additionally when inverting the order of the alphabet in Section 5.2.
Definition 1.
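For any $k > 5$, we let $P_i = \mathtt{a}\mathtt{b}^i\mathtt{aa}$ and $E_i = \mathtt{a}\mathtt{b}^i\mathtt{ab}\mathtt{a}^{i-2}$ for all $i \in [2..k-1]$, and $Q_k = \mathtt{a}\mathtt{b}^k\mathtt{a}$. Then,
$W_k = \left(\prod_{i=2}^{k-1} P_i E_i\right) Q_k = \left(\prod_{i=2}^{k-1} \mathtt{a}\mathtt{b}^i\mathtt{aa}\,\mathtt{a}\mathtt{b}^i\mathtt{ab}\mathtt{a}^{i-2}\right) \mathtt{a}\mathtt{b}^k\mathtt{a}.$
The length of these words is
$n = \sum_{i=2}^{k-1} (3i + 4) + (k + 2) = \frac{3k^2 + 7k}{2} - 9.$
Thus, $k = \Theta(\sqrt{n})$. The word $W'_k$, which is $W_k$ without its last character, is
$W'_k = \left(\prod_{i=2}^{k-1} P_i E_i\right) \mathtt{a}\mathtt{b}^k = \left(\prod_{i=2}^{k-1} \mathtt{a}\mathtt{b}^i\mathtt{aa}\,\mathtt{a}\mathtt{b}^i\mathtt{ab}\mathtt{a}^{i-2}\right) \mathtt{a}\mathtt{b}^k.$
We append #, which is lexicographically smaller than the character a, to the end of $W'_k$ and name the resulting word $W'_k\#$.
Also, $W_k$ with its characters a and b swapped is denoted by $\overline{W_k}$, which is
$\overline{W_k} = \left(\prod_{i=2}^{k-1} \overline{P_i}\,\overline{E_i}\right) \overline{Q_k} = \left(\prod_{i=2}^{k-1} \mathtt{b}\mathtt{a}^i\mathtt{bb}\,\mathtt{b}\mathtt{a}^i\mathtt{ba}\mathtt{b}^{i-2}\right) \mathtt{b}\mathtt{a}^k\mathtt{b}.$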
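To make Definition 1 concrete, the following sketch builds the words of this section as Python strings and evaluates the number of BWT runs by brute force. The function names (`W`, `W_prime_hash`, `W_bar`) are our own, and the code does not enforce the restriction k > 5, so that tiny instances can be inspected as well.

```python
from itertools import groupby


def P(i):
    return "a" + "b" * i + "aa"


def E(i):
    return "a" + "b" * i + "ab" + "a" * (i - 2)


def Q(k):
    return "a" + "b" * k + "a"


def W(k):
    """W_k = P_2 E_2 ... P_{k-1} E_{k-1} Q_k."""
    return "".join(P(i) + E(i) for i in range(2, k)) + Q(k)


def W_prime_hash(k):
    """W_k with its last character a replaced by #."""
    return W(k)[:-1] + "#"


def W_bar(k):
    """W_k with the characters a and b swapped."""
    return W(k).translate(str.maketrans("ab", "ba"))


def r(word):
    """Number of runs in the BWT, computed by sorting all conjugates."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return sum(1 for _ in groupby(rot[-1] for rot in rotations))


if __name__ == "__main__":
    for k in range(6, 12):
        n = (3 * k * k + 7 * k) // 2 - 9
        print(k, n, len(W(k)), r(W(k)), r(W_prime_hash(k)), r(W_bar(k)))
```

The printed run counts can be compared against the closed formulas stated in the lemmas and theorems of this section.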
To characterize the BWT of the words $W'_k\#$, $\overline{W_k}$, and $\overline{W'_k}\mathtt{c}$, we partition each of the lists of conjugates $M(W'_k\#)$, $M(\overline{W_k})$, $M(\overline{W'_k}\mathtt{c})$ into distinct groups of consecutive conjugates having identical prefixes and define the subword of BWT($W_k$) corresponding to each of these prefixes.
Given $X \in \Sigma^*$, we denote by $\beta(X, W_k)$ the subword of $\mathrm{BWT}(W_k)$ corresponding to the range of contiguous conjugates prefixed by $X$. We omit the second parameter of $\beta(X, W_k)$ when it is clear from the context. $\beta(X)$ is the concatenation of the last characters of the conjugates with prefix $X$, taken in sorted order. For example, for the word banana and $X = \mathtt{an}$, there are two conjugates starting with the prefix an , which are, in sorted order, anaban and ananab ; thus, $\beta(\mathtt{an})$ of banana is nb .
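A direct way to evaluate β on small inputs is to sort all conjugates and keep the last characters of those starting with the given prefix. The following sketch does exactly that; `beta` is a hypothetical helper name, and the quadratic-space rotation sort is only suitable for short words.

```python
def sorted_conjugates(word):
    """The list M(word) of all conjugates in sorted order."""
    return sorted(word[i:] + word[:i] for i in range(len(word)))


def beta(prefix, word):
    """Last characters of the sorted conjugates of `word` that start with `prefix`."""
    return "".join(conj[-1]
                   for conj in sorted_conjugates(word)
                   if conj.startswith(prefix))


if __name__ == "__main__":
    print(sorted_conjugates("banana"))
    print(beta("an", "banana"))
    # Concatenating the blocks of a set of prefixes that partitions all
    # conjugates, taken in sorted order, gives back the whole BWT.
    print(beta("a", "banana") + beta("b", "banana") + beta("n", "banana"))
```

Since the conjugates sharing a prefix are contiguous in the sorted list, each call returns a contiguous subword of the BWT.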
Lemma 5.
(Proposition 28 of [24]). $r(W_k) = 6k - 12$.

5.1. BWT of W k After Substituting a Character

The lemmas presented below characterize the BWT of $W_k$ after certain modifications have been applied. Rather than deriving the entire structure of the BWT from scratch, we analyze how replacing a character affects either the relative order or the final character of the conjugates of $W_k$. We let $M(W'_k\#)$ be the list of lexicographically sorted conjugates of the word $W'_k\#$.
Lemma 6.
$\beta(\#, W'_k\#) = \mathtt{b}$.
Proof. 
The first conjugate in $M(W'_k\#)$ is $\#P_2 \cdots \mathtt{b}$. Since # is lexicographically smaller than all other characters, a conjugate starting with # is smaller than every conjugate starting with a . The # is the last character of $W'_k\#$, where it is preceded by a b . □
Lemma 7.
$\beta(\mathtt{a}^i\mathtt{b}, W'_k\#) = \mathtt{b}\mathtt{a}^{k-i-2}$ for all $i \in [4..k-2]$.
Proof. 
Given an integer $i \in [4..k-2]$, the conjugates of $M(W'_k\#)$ starting with $\mathtt{a}^i\mathtt{b}$ are
$\mathtt{a}^{i-1}P_{i+2} \cdots \mathtt{b} < \mathtt{a}^{i-1}P_{i+3} \cdots \mathtt{a} < \cdots < \mathtt{a}^{i-1}P_{k-1} \cdots \mathtt{a} < \mathtt{a}^{i-1}Q'_k\# \cdots \mathtt{a}.$
In $M(W'_k\#)$, a prefix $\mathtt{a}^i\mathtt{b}$ can only be obtained by concatenation of the suffix $\mathtt{a}^{i-2}$ of $E_i$ with the prefix ab of $P_{i+1}$, or with the prefix ab of $Q'_k\#$ if $i = k$. Note that all these conjugates end with an a , with the exception of the conjugate starting with $\mathtt{a}^{i-1}P_{i+1}$, since this is where the unique occurrence of $\mathtt{b}\mathtt{a}^{i-1}\mathtt{b}$ can be found.    □
Lemma 8.
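$\beta(\mathtt{aaab}, W'_k\#) = \mathtt{b}^5(\mathtt{ab})^{k-6}\mathtt{a}$.
Proof. 
The conjugates in $M(W'_k\#)$ starting with aaab are
$\mathtt{aa}E_2 \cdots \mathtt{b} < \mathtt{aa}E_3 \cdots \mathtt{b} < \mathtt{aa}E_4 \cdots \mathtt{b} < \mathtt{aa}P_5 \cdots \mathtt{b} < \mathtt{aa}E_5 \cdots \mathtt{b} < \mathtt{aa}P_6 \cdots \mathtt{a} < \mathtt{aa}E_6 \cdots \mathtt{b} < \cdots < \mathtt{aa}P_{k-1} \cdots \mathtt{a} < \mathtt{aa}E_{k-1} \cdots \mathtt{b} < \mathtt{aa}Q'_k\# \cdots \mathtt{a}.$
In $M(W'_k\#)$, the conjugates that start with aaab can be obtained for all $i \in [4..k-1]$ from the concatenation of the suffix aa of $E_i$ with $P_{i+1}$, or with $Q'_k\#$ if $i = k - 1$. If $i \in [2..k-1]$, the concatenation of the suffix aa of $P_i$ with the prefix ab of $E_i$ also yields aaab . We can sort the conjugates in the following order: first $\mathtt{aa}E_i$ for $i = 2, 3, 4$, then the pairs $\mathtt{aa}P_i, \mathtt{aa}E_i$ for $i = 5, \ldots, k-1$, and finally $\mathtt{aa}Q'_k\#$. All conjugates starting with $\mathtt{aa}E_i$ end with a b , and if $i \in [5..k-2]$, the conjugate formed by aa of $E_i$ concatenated with $P_{i+1}$, or with $Q'_k\#$ if $i = k - 1$, ends with a . On the other hand, the conjugate starting with $\mathtt{aa}P_5$ ends with b .    □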
Lemma 9.
$\beta(\mathtt{aab}, W'_k\#) = \mathtt{aab}\mathtt{a}^{2k-8}$.
Proof. 
The conjugates in $M(W'_k\#)$ starting with aab are
$\mathtt{a}E_2 \cdots \mathtt{a} < \mathtt{a}E_3 \cdots \mathtt{a} < \mathtt{a}P_4 \cdots \mathtt{b} < \mathtt{a}E_4 \cdots \mathtt{a} < \mathtt{a}P_5 \cdots \mathtt{a} < \mathtt{a}E_5 \cdots \mathtt{a} < \cdots < \mathtt{a}P_{k-1} \cdots \mathtt{a} < \mathtt{a}E_{k-1} \cdots \mathtt{a} < \mathtt{a}Q'_k\# \cdots \mathtt{a}.$
Each of the conjugates starting with aaab from Lemma 8 induces a conjugate starting with aab , obtained by shifting one character a to the left. It follows that all of these conjugates end with an a . The other conjugate that starts with aab is the one obtained by concatenating the suffix a of $E_3$ with the prefix ab of $P_4$, which ends with b . □
Lemma 10.
$\beta(\mathtt{ab}, W'_k\#) = \mathtt{b}^{k-2}\#\mathtt{ab}\mathtt{a}^{2k-6}$.
Proof. 
The conjugates in $M(W'_k\#)$ starting with the prefix ab are
$\mathtt{ab}\mathtt{a}^{k-3}Q'_k\# \cdots \mathtt{b} < \mathtt{ab}\mathtt{a}^{k-4}P_{k-1} \cdots \mathtt{b} < \cdots < \mathtt{ab}P_3 \cdots \mathtt{b} < P_2 \cdots \# < E_2 \cdots \mathtt{a} < P_3 \cdots \mathtt{b} < E_3 \cdots \mathtt{a} < P_4 \cdots \mathtt{a} < E_4 \cdots \mathtt{a} < \cdots < P_{k-1} \cdots \mathtt{a} < E_{k-1} \cdots \mathtt{a} < Q'_k\# \cdots \mathtt{a}.$
For any two distinct integers $i, i'$ with $i > i' \ge 0$, we have $\mathtt{ab}\mathtt{a}^i\mathtt{b} < \mathtt{ab}\mathtt{a}^{i'}\mathtt{b}$. Thus, the first conjugate in the lexicographic order starting with ab is the one followed by the longest run of a . The smallest of these conjugates stems from the suffix $\mathtt{ab}\mathtt{a}^{k-3}\mathtt{b}$ of $E_{k-1}$, followed by those from the suffixes $\mathtt{ab}\mathtt{a}^{i-2}\mathtt{b}$ of $E_i$ for all $2 \le i \le k-2$ taken in decreasing order.
By construction of $E_i$, for all $2 \le i \le k-1$, these conjugates must end in a b . The remaining conjugates starting with ab are exactly those starting with either $P_i$ or $E_i$, for all $2 \le i \le k-1$, or $Q'_k\#$. These conjugates can be obtained by shifting one character a to the left from the conjugates starting with aab from Lemma 9, with the exception of the one starting with $P_3$, since it ends with a b , and the one starting with $P_2$, which ends with #, while the other conjugates end with an a . □
Lemma 11.
$\beta(\mathtt{b}^i\#, W'_k\#) = \mathtt{b}$ for all $1 \le i \le k-1$.
Proof. 
The conjugate in $M(W'_k\#)$ starting with $\mathtt{b}^i\#$ for all $1 \le i \le k-1$ is $\mathtt{b}^i\#P_2 \cdots \mathtt{b}$. This conjugate starts within the suffix $\mathtt{b}^k\#$ of $Q'_k\#$ and is always preceded by a b .    □
Lemma 12.
$\beta(\mathtt{ba}, W'_k\#) = \mathtt{a}^{k-5}\mathtt{bbba}\mathtt{b}^{k-5}\mathtt{a}\mathtt{b}^{k-2}\mathtt{a}$.
Proof. 
The conjugates in $M(W'_k\#)$ starting with ba are
$\mathtt{b}\mathtt{a}^{k-3}Q'_k\# \cdots \mathtt{a} < \mathtt{b}\mathtt{a}^{k-4}P_{k-1} \cdots \mathtt{a} < \cdots < \mathtt{b}\mathtt{a}^3P_6 \cdots \mathtt{a} < \mathtt{baa}E_2 \cdots \mathtt{b} < \mathtt{baa}E_3 \cdots \mathtt{b} < \mathtt{baa}E_4 \cdots \mathtt{b} < \mathtt{baa}P_5 \cdots \mathtt{a} < \mathtt{baa}E_5 \cdots \mathtt{b} < \mathtt{baa}E_6 \cdots \mathtt{b} < \cdots < \mathtt{baa}E_{k-1} \cdots \mathtt{b} < \mathtt{ba}P_4 \cdots \mathtt{a} < \mathtt{bab}\mathtt{a}^{k-3}Q'_k\# \cdots \mathtt{b} < \mathtt{bab}\mathtt{a}^{k-4}P_{k-1} \cdots \mathtt{b} < \cdots < \mathtt{bab}P_3 \cdots \mathtt{b} < \mathtt{babbbaa} \cdots \mathtt{a}.$
We have as many circular occurrences of ba as the number of maximal character runs of b followed by an a in $W'_k\#$. For all $2 \le i \le k-1$, these are
Case 1:
  one run of b in $P_i$ and
Case 2:
  two runs in $E_i$.
For Case 1, we have one conjugate starting with $\mathtt{baa}E_i$ for each $i \in [2..k-1]$. Since each run of b within each word $P_i$ is of length at least 2, all conjugates of Case 1 end with b . For Case 2, for all $i \in [2..k-1]$ we can distinguish between two subcases, based on where ba  starts:
Case 2 (a):
  from the first run of b in $E_i$, which gives $\mathtt{bab}\mathtt{a}^{i-2}P_{i+1}$ when $i \in [2..k-2]$ or $\mathtt{bab}\mathtt{a}^{k-3}Q'_k\#$ if $i = k-1$. Since this run of b has length at least 2, the conjugates of Case 2 (a) always end with b .
Case 2 (b):
  from the second run of b in $E_i$, which gives $\mathtt{b}\mathtt{a}^{i-2}P_{i+1}$ for all $i \in [2..k-2]$, and $\mathtt{b}\mathtt{a}^{k-3}Q'_k\#$. Each conjugate of Case 2 (b) is obtained by shifting two characters to the right in each conjugate in Case 2 (a). Therefore, these conjugates end with an a .
Observe that only for Case 2 (b) we have conjugates starting with baaaa . Hence, the first conjugate in the lexicographic order is the one starting with $\mathtt{b}\mathtt{a}^{k-3}Q'_k\#$, followed by those starting with $\mathtt{b}\mathtt{a}^{k-4}P_{k-1} < \mathtt{b}\mathtt{a}^{k-5}P_{k-2} < \cdots < \mathtt{baaa}P_6$.
Among the remaining conjugates, those that have the prefix baaa start with $\mathtt{baa}P_5$ from Case 2 (b) or $\mathtt{baa}E_i$ from Case 2 (a). Thus, we can sort them according to the lexicographic order. Then, the remaining conjugates, which start with baa , are obtained by $\mathtt{ba}P_3$ only. Finally, let us focus on the conjugates from Case 2 (a), which start with ba . These conjugates are sorted according to the length of the runs of a s following the common prefix bab , similarly to the sorting of conjugates from Case 2 (b). The last conjugate left is the one starting with $\mathtt{b}P_3$ from Case 2 (b). Since $\mathtt{b}P_3$ is lexicographically greater than all other cases, this is the greatest conjugate of $W'_k\#$ starting with ba , and we can conclude our claim.    □
Lemma 13.
$\beta(\mathtt{b}^j\mathtt{a}, W'_k\#) = \mathtt{a}\mathtt{b}^{2k-2j-2}\mathtt{a}$ for all $2 \le j \le k-2$.
Proof. 
The conjugates starting with $\mathtt{b}^i\mathtt{a}$ for an integer $2 \le i \le k-2$ in $M(W'_k\#)$ are
$\mathtt{b}^i\mathtt{aa}E_i \cdots \mathtt{a} < \mathtt{b}^i\mathtt{aa}E_{i+1} \cdots \mathtt{b} < \cdots < \mathtt{b}^i\mathtt{aa}E_{k-1} \cdots \mathtt{b} < \mathtt{b}^i\mathtt{ab}\mathtt{a}^{k-3}Q'_k\# \cdots \mathtt{b} < \mathtt{b}^i\mathtt{ab}\mathtt{a}^{k-4}P_{k-1} \cdots \mathtt{b} < \cdots < \mathtt{b}^i\mathtt{ab}\mathtt{a}^{i-1}P_{i+2} \cdots \mathtt{b} < \mathtt{b}^i\mathtt{ab}\mathtt{a}^{i-2}P_{i+1} \cdots \mathtt{a}.$
All runs of b of length at least $i$, with $2 \le i \le k-2$, that are followed by an a appear in either
Case 1:
   $P_j$ or
Case 2:
   $E_j$ for all $i \le j \le k-1$.
Let us consider these two cases separately. For all $i \le j \le k-1$, the conjugate starting within $P_j$ has prefix $\mathtt{b}^i\mathtt{aa}E_j$. For all $i \le j \le k-2$, the conjugate starting within $E_j$ has prefix $\mathtt{b}^i\mathtt{ab}\mathtt{a}^{j-2}P_{j+1}$, and for $j = k-1$, we have the conjugate with prefix $\mathtt{b}^i\mathtt{ab}\mathtt{a}^{k-3}Q'_k\#$. By construction, all conjugates from Case 1 are sorted according to the lexicographic order of the words with respect to the length of the run of b in $E_j$.
The conjugates covered by Case 2 are sorted according to the decreasing length of the run of a following the common prefix $\mathtt{b}^i\mathtt{ab}$. Only when the run of b is exactly $i$ long does its conjugate end with a . Thus, the conjugates ending with an a are those starting within $P_i$ and $E_i$, which have prefixes $\mathtt{b}^i\mathtt{aa}E_i$ and $\mathtt{b}^i\mathtt{ab}\mathtt{a}^{i-2}P_{i+1}$.    □
Lemma 14.
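$\beta(\mathtt{b}^{k-1}\mathtt{a}, W'_k\#) = \mathtt{aa}$.
Proof. 
The two conjugates in $M(W'_k\#)$ which start with $\mathtt{b}^{k-1}\mathtt{a}$ are
$\mathtt{b}^{k-1}\mathtt{aa}E_{k-1} \cdots \mathtt{a} < \mathtt{b}^{k-1}\mathtt{ab}\mathtt{a}^{k-3}Q'_k\# \cdots \mathtt{a}.$
The conjugates with the prefix $\mathtt{b}^{k-1}\mathtt{a}$ continue with $E_{k-1}$ or $Q'_k\#$. These conjugates have the prefixes $\mathtt{b}^{k-1}\mathtt{aa}E_{k-1}$ and $\mathtt{b}^{k-1}\mathtt{ab}\mathtt{a}^{k-3}Q'_k\#$, respectively. One can see that these conjugates taken in this order are already sorted, and both conjugates end with a .    □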
Lemma 15.
$\beta(\mathtt{b}^k\#, W'_k\#) = \mathtt{a}$.
Proof. 
The last conjugate in $M(W'_k\#)$ is $\mathtt{b}^k\#P_2 \cdots \mathtt{a}$. The last conjugate in lexicographic order starts with $\mathtt{b}^k\#P_2$, and since the run of b is maximal, it ends with a , and the claim follows. □
In conclusion, we obtain the following theorem.
Theorem 2.
r ( W k # ) r ( W k ) = 2 k 5 for every k 6 .
Proof. 
The BWT of the W b k # is BWT( W b k # ) = β ( # ) i = 2 k 1 β ( a k i b ) · i = 1 k 1 β ( b i # ) β ( b i a ) · β ( b k a ) . We refer to Table 2. Moreover, r ( W k # ) = 8 k 17 which has 2 k 5 more runs than r ( W k ) = 6 k 12 , cf. Lemma 5.
The lexicographic order of # is lower than an a , and a conjugate starting with # is smaller than any conjugate starting with a . Moreover, every conjugate in β ( a i b ) is smaller than every one in β ( a i b ) , for every 1 i i k 2 . In addition, every conjugate contributing a character to β ( b j a ) is smaller than a conjugate contributing a character to β ( b j a ) for every 1 j j k 1 . And with a conjugate starting with b i # , the number is smaller than that of b i a . Since we considered all the disjoint ranges of conjugates of W k # based on their common prefix, the word BWT( W k # ) is β ( # ) i = 2 k 1 β ( a k i b ) · i = 1 k 1 β ( b i # ) β ( b i a ) · β ( b k a ) . With the structure of BWT( W k # ), we can derive its number of runs. The words β ( # ) and i = 2 k 4 β ( a k i b ) have 2 ( k 6 ) runs: we start with 1 run from β ( # ) = b which is merged by β ( a k 2 b ) β ( a k 3 b ) = bba . And concatenating them β ( a i b ) up to β ( a 4 b ) adds 2 new runs each. β ( aaab ) , β ( aab ), β ( ab ) have 2 ( k 5 ) , 3, 5 runs, respectively. However, the boundaries between β ( aaab ) and β ( aab ) are merged by an a ; therefore, β ( aab ) has 2 runs. β ( b # ) has 1 run, followed by β ( ba ) which makes 7 runs. Then, β ( b i # ) and β ( b i a ) repeat, making 1 and 3 runs until i = k 2 thus makes 4 ( k 3 ) runs. β ( b k 1 # ) adds 1 run. Also, β ( b k 1 a ) adds 1 run and is the last run since β ( b k # ) does not add new runs, since it consists only of a a that merges with the previous one. Altogether, we have 2 ( k 6 ) + 2 ( k 5 ) + 2 + 5 + 1 + 7 + 4 ( k 3 ) + 1 + 1 = 8 k 17 , and the claim holds. The main difference in the runs of W k # and W k occurs from the prefix beginning with b i # that concatenates with b i a , repeating baba for i [ 2 . . k 1 ] , while W k repeats only ba . Thus, it makes additive runs of 2 k 5 = Θ ( k ) = Θ ( n ) .
Table 3, Table 4, Table 5, Table 6, and Table 7 show $M(W_k\#)$. The first column partitions the conjugates by common prefixes and names the common prefix shared by all conjugates in a partition. The second column shows the remaining part of the respective conjugate, which follows the prefix of its partition. This remaining part decides the relative order of a conjugate inside its partition. The BWT column shows the last character of each conjugate. □

5.2. BWT of $\overline{W_k}$ After Substituting a Character

In this subsection, we consider the word $\overline{W_k} = \prod_{i=2}^{k-1} \overline{P_i}\,\overline{E_i} \cdot \overline{Q_k} = \prod_{i=2}^{k-1} ba^{i}bb \cdot ba^{i}bab^{i-2} \cdot ba^{k}b$, obtained by swapping a with b in $W_k$. The following series of lemmas characterizes the subwords of $\mathrm{BWT}(\overline{W_k})$, using $M(\overline{W_k})$ for each range we consider.
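As a sanity check on this definition, the following minimal Python sketch builds $W_k$ and its complement and counts BWT runs by sorting all rotations (quadratic space, so only suitable for small k). The helper names `build_w`, `complement`, `bwt`, and `runs` are illustrative assumptions and not part of the paper; the blocks $P_i = ab^{i}aa$, $E_i = ab^{i}aba^{i-2}$, and $Q_k = ab^{k}a$ follow the recap of $W_k$ given in Section 7. Because swapping a and b reverses the lexicographic order of equal-length words, the BWT of the complement is the reversed complement of the BWT, so both words have the same number of runs.

```python
def build_w(k: int) -> str:
    """W_k = (prod_{i=2}^{k-1} P_i E_i) Q_k over the alphabet {a, b}."""
    parts = []
    for i in range(2, k):
        parts.append("a" + "b" * i + "aa")                  # P_i
        parts.append("a" + "b" * i + "ab" + "a" * (i - 2))  # E_i
    parts.append("a" + "b" * k + "a")                       # Q_k
    return "".join(parts)


def complement(word: str) -> str:
    """Swap the characters a and b."""
    return word.translate(str.maketrans("ab", "ba"))


def bwt(word: str) -> str:
    """BWT via the sorted list of all rotations (conjugates), no sentinel."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def runs(text: str) -> int:
    """Number of maximal runs of equal characters."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


if __name__ == "__main__":
    # Prints k, n, r(W_k), r(complement of W_k), and the stated value 6k - 12
    # side by side for comparison.
    for k in range(6, 10):
        w = build_w(k)
        print(k, len(w), runs(bwt(w)), runs(bwt(complement(w))), 6 * k - 12)
```

The rotation-sorting approach is chosen for transparency, not efficiency; a suffix-array-based construction would be the usual choice for long inputs.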
Lemma 16.
$\beta(a^{k}b, \overline{W_k}) = b$.
Proof.
The first conjugate in $M(\overline{W_k})$ is $a^{k}b \cdots b$. The first conjugate in lexicographic order must start with the longest run of a's. By the definition of $\overline{W_k}$, the longest run of a's has length $k$; it is the factor $a^{k}$ of $\overline{Q_k}$, which is preceded by a b. □
Lemma 17.
$\beta(a^{i}b, \overline{W_k}) = ba^{2k-2i-1}b$ for all $i \in [2..k-1]$.
Proof.
For an integer $i \in [2..k-1]$, the conjugates starting with $a^{i}b$ in $M(\overline{W_k})$ are
$a^{i}bab^{i-2} \cdots b < a^{i}bab^{i-1} \cdots a < \cdots < a^{i}bab^{k-3} \cdots a < a^{i}b\overline{P_2} \cdots a < a^{i}bbba^{k-1} \cdots a < \cdots < a^{i}bbba^{i+1} \cdots a < a^{i}bbba^{i} \cdots b.$
For all $i \in [2..k-1]$, the factor $a^{i}b$ can only be obtained, for all $j \in [i..k-1]$, from $a^{i}bab^{j-2}$ in $\overline{E_j}$ or from $a^{i}bb$ in $\overline{P_j}$, and, if $j = k$, from $a^{i}b$ in $\overline{Q_k}$. We can sort these conjugates according to the lexicographic order. Note that all these conjugates end with a, with the exception of the two conjugates whose occurrence of $a^{i}b$ stems from $\overline{E_i}$ and $\overline{P_i}$, which end with b. □
Lemma 18.
$\beta(ab, \overline{W_k}) = ba^{k-2}baa^{k-5}baaab^{k-5}$.
Proof.
In $M(\overline{W_k})$, the conjugates starting with ab are
$abaaabb\overline{E_2} \cdots b < aba\overline{P_3} \cdots a < abab\overline{P_4} \cdots a < \cdots < abab^{k-3}\overline{Q_k} \cdots a < ab\overline{P_4} \cdots b < ab\overline{P_2} \cdots a < abb\overline{E_{k-1}} \cdots a < \cdots < abb\overline{E_5} \cdots a < abb\overline{P_5} \cdots b < abb\overline{E_4} \cdots a < abb\overline{E_3} \cdots a < abb\overline{E_2} \cdots a < abbb\overline{P_6} \cdots b < \cdots < ab^{k-3}\overline{Q_k} \cdots b.$
We have as many circular occurrences of ab as the number of maximal (circular) runs of b in $\overline{W_k}$. Then, for all $i \in [2..k-1]$, we have three cases.
Case 1:
  one run of ab in $\overline{P_i}$,
Case 2:
  two runs of ab in $\overline{E_i}$,
Case 3:
  one run of ab in $\overline{Q_k}$.
For Case 1, we have one conjugate starting with $abb\overline{E_i}$, for each $i \in [2..k-1]$. Since each run of a's within each word $\overline{P_i}$ has length at least 2, all conjugates in Case 1 end with a. For Case 2, for all $i \in [2..k-1]$, we distinguish two sub-cases based on where ab starts.
Case 2 (a):
  from the first run of a's in $\overline{E_i}$, starting with $abab^{i-2}\overline{P_{i+1}}$ if $i \in [2..k-2]$, or with $abab^{k-3}\overline{Q_k}$,
Case 2 (b):
  from the second run in $\overline{E_i}$, starting with $ab^{i-2}\overline{P_{i+1}}$ if $i \in [2..k-2]$, or with $ab^{k-3}\overline{Q_k}$.
Similarly to Case 1, each conjugate of Case 2 (a) ends with a. Each conjugate of Case 2 (b) is obtained by shifting a conjugate of Case 2 (a) by two characters to the right. Therefore, all these conjugates end with b.
For Case 3, the conjugate starting with ab in $\overline{Q_k}$ has $ab\overline{P_2}$ as a prefix and is preceded by a. Observe that only in Case 2 (b) do we have a conjugate that starts with abaaa, obtained by $a\overline{P_3}$; it is the lexicographically first conjugate starting with ab. It is followed by the conjugates starting with abab from Case 2 (a), namely $aba\overline{P_3} < abab\overline{P_4} < \cdots < abab^{k-3}\overline{Q_k}$.
Among the remaining conjugates, those with the prefix abb start with $ab\overline{P_4}$ from Case 2 (b) or $ab\overline{P_2}$ from Case 3. Then, among the remaining conjugates, those with the prefix abbb from Case 2 (a), for all $i \in [2..k-1]$, or $abb\overline{P_5}$ from Case 2 (b) follow. The last remaining conjugates have the prefix $ab^{i-2}$ for $i \in [6..k-1]$ or $ab^{k-3}\overline{Q_k}$, and are obtained from Case 2 (b). Since $ab^{k-3}\overline{Q_k}$ is greater than all other conjugates, it is the greatest conjugate of $\overline{W_k}$ starting with ab, which concludes the proof. □
Lemma 19.
$\beta(ba, \overline{W_k}) = bb^{2k-8}babba^{k-2}$.
Proof.
The conjugates in $M(\overline{W_k})$ that start with ba are
$ba^{k}b\overline{P_2} \cdots b < ba^{k-1}bab^{k-3}\overline{Q_k} \cdots b < ba^{k-1}bb\overline{E_{k-1}} \cdots b < ba^{k-2}bab^{k-4}\overline{P_{k-1}} \cdots b < ba^{k-2}bb\overline{E_{k-2}} \cdots b < \cdots < ba^{4}babb\overline{P_5} \cdots b < ba^{4}bb\overline{E_4} \cdots b < baaabab\overline{P_4} \cdots b < baaabb\overline{E_3} \cdots a < baaba\overline{P_3} \cdots b < baabb\overline{E_2} \cdots b < ba\overline{P_3} \cdots a < bab\overline{P_4} \cdots a < \cdots < bab^{k-3}\overline{Q_k} \cdots a.$
For an integer $i$, we can see that $ba^{i}bab^{i-2}$ is lexicographically smaller than $ba^{i}bb$. Thus, the first conjugate in lexicographic order starting with ba is the one followed by the longest run of a's; it is found at $ba^{k}b$ of $\overline{Q_k}$ and is followed by the conjugates starting with $ba^{i}bab^{i-2}$ of $\overline{E_i}$ and $ba^{i}bb$ of $\overline{P_i}$ for all $i \in [2..k-1]$, taken in decreasing order of $i$. By construction of $\overline{E_i}$, for $i \in [2..k-1]$, these conjugates must end with a b. The conjugates obtained from $\overline{P_i}$ also end with b, with the exception of the conjugate starting with $\overline{P_3}$, since it is preceded by an a from $\overline{P_2}$. The remaining conjugates starting with ba are exactly those that have the prefix $bab^{i-2}\overline{P_{i+1}}$ if $i \in [2..k-2]$, or $bab^{k-3}\overline{Q_k}$. All of these conjugates end with a, since they are preceded by a. □
Lemma 20.
$\beta(bba, \overline{W_k}) = b^{2k-8}abba$.
Proof.
The conjugates starting with bba in $M(\overline{W_k})$ are
$b\overline{Q_k} \cdots b < b\overline{E_{k-1}} \cdots b < b\overline{P_{k-1}} \cdots b < \cdots < b\overline{E_5} \cdots b < b\overline{P_5} \cdots b < b\overline{E_4} \cdots b < b\overline{P_4} \cdots a < b\overline{E_3} \cdots b < b\overline{E_2} \cdots b < b\overline{P_2} \cdots a.$
These conjugates are obtained by the following four cases.
Case 1:
  concatenating the suffix b of $\overline{P_j}$ with $\overline{E_j}$ for all $j \in [2..k-1]$,
Case 2:
  concatenating the suffix b of $\overline{E_j}$ with $\overline{P_{j+1}}$ for all $j \in [3..k-2]$,
Case 3:
  concatenating the suffix b of $\overline{E_{k-1}}$ with $\overline{Q_k}$,
Case 4:
  concatenating the suffix b of $\overline{Q_k}$ with $\overline{P_2}$.
The first conjugate in lexicographic order starting with bba is the one followed by the longest run of a's. This smallest conjugate is found by Case 3, the concatenation of the suffix b of $\overline{E_{k-1}}$ with $\overline{Q_k}$. We can directly observe that $bba^{j}bab^{j-2} < bba^{j}bb$ holds for every integer $j \ge 0$. Thus, the next conjugates have the prefix $b\overline{E_j}$ from Case 1 and $b\overline{P_j}$ from Case 2, alternating in decreasing order of $j$. Since $b\overline{E_j}$ of Case 1 and $b\overline{Q_k}$ of Case 3 are preceded by a b, those conjugates end with a b. Likewise, $b\overline{P_{j+1}}$ is preceded by b for all $j \in [4..k-2]$, whereas $b\overline{P_4}$ is preceded by an a. Lastly, the conjugates with the prefixes bbaaa and bbaa from Case 1 end with a b. The lexicographically greatest conjugate is from Case 4, as it has the shortest run of a's, which has length two, and it ends with a.
We can sort all of these conjugates according to the order of the words in
$\{b\overline{Q_k}\} \cup \bigcup_{j=4}^{k-1} \{b\overline{E_j},\ b\overline{P_j}\} \cup \bigcup_{j=2}^{3} \{b\overline{E_j}\} \cup \{b\overline{P_2}\}.$ □
Lemma 21.
$\beta(bbba, \overline{W_k}) = b(ab)^{k-6}a^{5}$.
Proof.
The conjugates in $M(\overline{W_k})$ starting with bbba are
$bb\overline{Q_k} \cdots b < bb\overline{E_{k-1}} \cdots a < bb\overline{P_{k-1}} \cdots b < \cdots < bb\overline{E_6} \cdots a < bb\overline{P_6} \cdots b < bb\overline{E_5} \cdots a < bb\overline{P_5} \cdots a < bb\overline{E_4} \cdots a < bb\overline{E_3} \cdots a < bb\overline{E_2} \cdots a.$
The conjugates starting with bbba are obtained by two cases.
Case 1:
  from the concatenation of the suffix bb of $\overline{E_{j-1}}$ with the prefix ba of $\overline{P_j}$ for all $j \in [5..k-1]$, or of $\overline{Q_k}$ if $j = k$;
Case 2:
  from the concatenation of the suffix bb of $\overline{P_j}$ with the prefix ba of $\overline{E_j}$ for all $j \in [2..k-1]$.
Thus, all conjugates starting with bbba are sorted according to the lexicographic order of the words in $\{bb\overline{Q_k}\} \cup \bigcup_{j=5}^{k-1} \{bb\overline{E_j},\ bb\overline{P_j}\} \cup \bigcup_{j=2}^{4} \{bb\overline{E_j}\}$. All conjugates starting with $bb\overline{P_j}$ for $j \in [6..k-1]$ or with $bb\overline{Q_k}$ in Case 1 end with b. The remaining conjugates, starting with $bb\overline{P_5}$ of Case 1 or with $bb\overline{E_j}$ for $j \in [2..k-1]$ of Case 2, end with a. □
Lemma 22.
$\beta(b^{j}a, \overline{W_k}) = b^{k-j-2}a$ for all $j \in [4..k-2]$.
Proof.
All runs of b's with a length $j$ in the range $[4..k-3]$ appear only by concatenating the suffix $b^{j-1}$ of $\overline{E_{j+1}}$ with the prefix ba of $\overline{P_{j'}}$ for all $j' \in [j+2..k-1]$, in decreasing order. All of these conjugates end with a b, with the exception of the conjugate $b^{j-1}\overline{P_{j+2}}$, which ends with an a since the suffix $b^{j-1}$ is preceded by an a. Finally, the last conjugate in lexicographic order, starting with $b^{k-2}a$, is within $b^{k-3}\overline{Q_k}$, and since its run of b's is maximal, it ends with a, and the claim follows. □
The following theorem presents the shape of the BWT of W k ¯ .
Theorem 3.
For every $k \ge 6$, $r(\overline{W_k}) = 6k-12$; cf. Table 8.
Proof.
Let us put the results from Lemma 16 to Lemma 22 together. Every conjugate contributing a character to $\beta(a^{i}b)$ is smaller than every conjugate contributing a character to $\beta(a^{i'}b)$, for every $1 \le i' < i \le k$. Symmetrically, every conjugate in $\beta(b^{j}a)$ is greater than every conjugate in $\beta(b^{j'}a)$, when $1 \le j' < j \le k-2$. Since we considered all the disjoint ranges of conjugates of $\overline{W_k}$ based on their common prefix, the word $\prod_{i=0}^{k-1} \beta(a^{k-i}b) \cdot \prod_{i=1}^{k-2} \beta(b^{i}a)$ is the BWT of $\overline{W_k}$.
With the structure of $\mathrm{BWT}(\overline{W_k})$, we can derive its number of runs. The word $\prod_{i=0}^{k-1} \beta(a^{k-i}b)$ has exactly $2k+3$ runs: we start with 1 run from $\beta(a^{k}b)$, but it merges with the leading b of $\beta(a^{k-1}b)$. Then, each of the words $\beta(a^{k-1}b)$ up to $\beta(aab)$ has 3 runs. However, the boundaries between these words merge because b appears on both sides. Thus, each $\beta(a^{i}b)$ for $i \in [2..k-1]$ contributes 2 runs. By counting, we observe that $\beta(ab)$ has 7 runs. The remaining part of the BWT, that is, $\prod_{i=1}^{k-2} \beta(b^{i}a)$, has $4k-15$ runs: the word $\beta(ba)$ has 4 runs, but its first b merges with a b from $\beta(ab)$, so we only charge 3 runs for this word. Then, $\beta(bba)$ and $\beta(bbba)$ add 4 and $1 + 2(k-6) + 1$ runs, respectively. Finally, each $\beta(b^{i}a)$ for $i \in [4..k-3]$ adds 2 runs. The word $\beta(b^{k-2}a)$ does not add new runs, as it consists only of an a that merges with the previous one. Altogether, we have $2(k-2) + 7 + 3 + 4 + 1 + 2(k-6) + 1 + 2(k-6) = 6k-12$, and the claim holds. □
The following lemmas describe the BWT of $\overline{W_k}$ after applying one specific edit operation. $\overline{W_k}c$ is the word obtained by replacing the last character b of $\overline{W_k}$ with c, where c is lexicographically larger than b. The number of runs in the BWT of $\overline{W_k}c$ can be derived by comparing it with the BWT of $\overline{W_k}$, for which we explicitly counted the number of runs; we therefore omit these parts of the proofs and use $M(\overline{W_k}c)$, the list of lexicographically sorted conjugates of $\overline{W_k}c$. Substituting the last character of $\overline{W_k}$ with c also increases the number of runs by $\Theta(k)$.
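The effect of this substitution can be explored empirically with a small Python sketch. It builds $\overline{W_k}$ directly from the complemented blocks, replaces the final b with c, and reports the change in the number of BWT runs next to the value $2k-5$ stated in Theorem 4. All function names (`build_w_bar`, `substitute_last`, `run_increase`) are hypothetical helpers introduced for illustration, and the BWT is computed naively by sorting rotations, which is only practical for small k.

```python
def build_w_bar(k: int) -> str:
    """Complement of W_k: blocks b a^i bb, b a^i b a b^(i-2), and b a^k b."""
    parts = []
    for i in range(2, k):
        parts.append("b" + "a" * i + "bb")                  # complemented P_i
        parts.append("b" + "a" * i + "ba" + "b" * (i - 2))  # complemented E_i
    parts.append("b" + "a" * k + "b")                       # complemented Q_k
    return "".join(parts)


def substitute_last(word: str, ch: str) -> str:
    """Replace the last character of word by ch."""
    return word[:-1] + ch


def bwt(word: str) -> str:
    """BWT via sorted rotations; Python orders 'a' < 'b' < 'c' as required."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def runs(text: str) -> int:
    """Number of maximal runs of equal characters."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


def run_increase(k: int) -> int:
    """Observed change in BWT runs caused by the substitution b -> c."""
    word = build_w_bar(k)
    return runs(bwt(substitute_last(word, "c"))) - runs(bwt(word))


if __name__ == "__main__":
    # Observed increase next to the stated value 2k - 5, for comparison.
    for k in range(6, 10):
        print(k, run_increase(k), 2 * k - 5)
```

Two boundary facts are easy to confirm with this sketch: the smallest conjugate starts with $a^{k}c$ and is preceded by b (Lemma 23), and the unique conjugate starting with c is the largest one and is preceded by a (Lemma 31).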
Lemma 23.
$\beta(a^{k}c, \overline{W_k}c) = b$.
Proof.
The first conjugate in $M(\overline{W_k}c)$ starts with $a^{k}c$ and ends with b. The first conjugate in lexicographic order must start with the longest run of a's. By the definition of $\overline{W_k}c$, the longest run of a's is obtained from the suffix $a^{k}c$ of $\overline{Q_k}c$, which is preceded by a b. □
Lemma 24.
$\beta(a^{i}b, \overline{W_k}c) = ba^{2k-2i-2}b$ for all $i \in [2..k-1]$.
Proof.
The conjugates in $M(\overline{W_k}c)$ starting with the prefix $a^{i}b$ for $i \in [2..k-1]$ are
$a^{i}bab^{i-2}\overline{P_{i+1}} \cdots b < a^{i}bab^{i-1}\overline{P_{i+2}} \cdots a < \cdots < a^{i}bab^{k-4}\overline{P_{k-1}} \cdots a < a^{i}bab^{k-3}\overline{Q_k}c \cdots a < a^{i}bb\overline{E_{k-1}} \cdots a < a^{i}bb\overline{E_{k-2}} \cdots a < \cdots < a^{i}bb\overline{E_{i+1}} \cdots a < a^{i}bb\overline{E_i} \cdots b.$
For every integer $i \in [2..k-1]$, the conjugates in $M(\overline{W_k}c)$ starting with $a^{i}b$ can only be obtained from two cases:
Case 1:
  $a^{i}bab^{j-2}$ of $\overline{E_j}$ for all $j \in [i..k-1]$,
Case 2:
  $a^{i}bb$ of $\overline{P_j}$ for all $j \in [i..k-1]$.
We can sort these conjugates according to the lexicographic order of $\bigcup_{j=i}^{k-2} \{a^{i}bab^{j-2}\overline{P_{j+1}}\} \cup \{a^{i}bab^{k-3}\overline{Q_k}c\} \cup \bigcup_{j=i}^{k-1} \{a^{i}bb\overline{E_j}\}$. Note that all these conjugates end with an a, with the exception of the conjugates starting with $a^{i}bab^{i-2}\overline{P_{i+1}}$ and $a^{i}bb\overline{E_i}$, since these are the only places where an occurrence of $a^{i}b$ is preceded by b. □
Lemma 25.
$\beta(a^{i}c, \overline{W_k}c) = a$ for all $i \in [1..k-1]$.
Proof.
For each $i \in [1..k-1]$, the only conjugate in $M(\overline{W_k}c)$ starting with $a^{i}c$ has the form $a^{i}c\overline{P_2} \cdots a$. For any two distinct integers $i, i'$ with $i > i' \ge 0$, we have $a^{i}c < a^{i'}c$. Also, since the characters of $\overline{W_k}c$ are ordered as a < b < c, it is clear that $a^{i}b < a^{i}c$. The conjugates starting with $a^{i}c$ are obtained from the suffix $a^{i}c$ of $\overline{Q_k}c$, and since the run of a's in $\overline{Q_k}c$ has length $k$, all conjugates starting with $a^{i}c$ for $i \in [1..k-1]$ end with a. □
Lemma 26.
$\beta(ab, \overline{W_k}c) = ba^{k-2}ba^{k-5}baaab^{k-5}$.
Proof.
In $M(\overline{W_k}c)$, the conjugates starting with ab are
$a\overline{P_3} \cdots b < aba\overline{P_3} \cdots a < abab\overline{P_4} \cdots a < \cdots < abab^{k-3}\overline{Q_k}c \cdots a < ab\overline{P_4} \cdots b < abb\overline{E_{k-1}} \cdots a < \cdots < abb\overline{E_5} \cdots a < abb\overline{P_5} \cdots b < abb\overline{E_4} \cdots a < abb\overline{E_3} \cdots a < abb\overline{E_2} \cdots a < abbb\overline{P_6} \cdots b < \cdots < ab^{k-3}\overline{Q_k}c \cdots b.$
We have as many circular occurrences of ab as the number of maximal runs of a in $\overline{W_k}c$. Then, for all $i \in [2..k-1]$, we have two cases.
Case 1:
  one run in $\overline{P_i}$, obtained by concatenating the suffix abb of $\overline{P_i}$ with $\overline{E_i}$, for each $i \in [2..k-1]$, and
Case 2:
  two runs in $\overline{E_i}$.
For Case 1, since each run of a's within each word $abb\overline{E_i}$, $i \in [2..k-1]$, has length at least 2, all conjugates in Case 1 end with an a.
For Case 2, for all $i \in [2..k-1]$, we distinguish two sub-cases based on where ab starts:
Case 2 (a):
  from the first a in $\overline{E_i}$, or
Case 2 (b):
  from the second a in $\overline{E_i}$.
For Case 2 (a), these conjugates are of the type $abab^{i-2}\overline{P_{i+1}}$ if $i \in [2..k-2]$, or $abab^{k-3}\overline{Q_k}c$. Similarly to Case 1, each conjugate of Case 2 (a) ends with a. Each conjugate of Case 2 (b) is obtained by shifting a conjugate of Case 2 (a) by two characters to the right. Therefore, all of these conjugates have a prefix of the type $ab^{i-2}\overline{P_{i+1}}$ if $i \in [2..k-2]$, or $ab^{k-3}\overline{Q_k}c$, and they end with a b since their leading a is preceded by b. Observe that only in Case 2 (b) do we have a conjugate starting with abaaab, which is $a\overline{P_3}$. Hence, it is the first conjugate in lexicographic order, followed by those starting with abab from Case 2 (a), namely $aba\overline{P_3} < abab\overline{P_4} < \cdots < abab^{k-3}\overline{Q_k}c$.
Next follows the conjugate with the prefix abba, which is $ab\overline{P_4}$ from Case 2 (b); then those having the prefix abbba, which start with $abb\overline{E_i}$ for all $i \in [5..k-1]$ from Case 1, follow in decreasing order of $i$. Then, $abb\overline{P_5}$ from Case 2 (b) and $abb\overline{E_4}$, $abb\overline{E_3}$, $abb\overline{E_2}$ from Case 1 follow.
The remaining conjugates are those which start with a prefix $ab^{i}a$ for $i \in [4..k-2]$; they are obtained by $ab^{i-1}\overline{P_{i+2}}$ if $i \in [4..k-3]$, or by $ab^{k-3}\overline{Q_k}c$, from Case 2 (b). These conjugates are sorted according to the length of the run of a's following the common prefix. Then, the result is
$\{a\overline{P_3}\} \cup \bigcup_{j=2}^{k-2} \{abab^{j-2}\overline{P_{j+1}}\} \cup \{abab^{k-3}\overline{Q_k}c\} \cup \{ab\overline{P_4}\} \cup \bigcup_{i=0}^{k-6} \{abb\overline{E_{k-i-1}}\} \cup \{abb\overline{P_5}\} \cup \bigcup_{i=0}^{2} \{abb\overline{E_{4-i}}\} \cup \bigcup_{j=4}^{k-3} \{ab^{j-1}\overline{P_{j+2}}\} \cup \{ab^{k-3}\overline{Q_k}c\}.$
 □
Lemma 27.
$\beta(ba, \overline{W_k}c) = b^{2k-6}abca^{k-2}$.
Proof.
The conjugates in $M(\overline{W_k}c)$ starting with the prefix ba are
$ba^{k}c\overline{P_2} \cdots b < ba^{k-1}bab^{k-3}\overline{Q_k}c \cdots b < ba^{k-1}bb\overline{E_{k-1}} \cdots b < \cdots < ba^{4}babb\overline{P_5} \cdots b < ba^{4}bb\overline{E_4} \cdots b < ba^{3}bab\overline{P_4} \cdots b < ba^{3}bb\overline{E_3} \cdots a < baaba\overline{P_3} \cdots b < baabb\overline{E_2} \cdots c < ba\overline{P_3} \cdots a < bab\overline{P_4} \cdots a < \cdots < bab^{k-3}\overline{Q_k}c \cdots a.$
The conjugates starting with the prefix ba stem from three parts of the word.
Case 1:
  one run of ba in $\overline{P_j}$, for all $j \in [2..k-1]$,
Case 2:
  two runs from $\overline{E_j}$, for all $j \in [2..k-1]$,
Case 3:
  one run from $\overline{Q_k}c$.
The conjugates in Case 1 start with $ba^{j}$ for all $j \in [2..k-1]$. Since $ba^{i} < ba^{i'}$ only if $i > i'$, the conjugates are sorted in decreasing order of $j$. All conjugates for $j \in [3..k-1]$ end with a b, except for the conjugate with prefix $\overline{P_2}$, since it is preceded by c.
In Case 2, we distinguish two sub-cases based on where ba starts:
Case 2 (a):
  from the first run of ba, at the prefix of $\overline{E_j}$,
Case 2 (b):
  from the second run of ba in $\overline{E_j}$.
The conjugates in Case 2 (a) are of the type $ba^{j}bab^{j-2}\overline{P_{j+1}}$ if $j \in [2..k-2]$, or $ba^{k-1}bab^{k-3}\overline{Q_k}c$. All of these conjugates are preceded by $\overline{P_j}$, thus ending with b. The conjugates in Case 2 (b) start with $bab^{j-2}\overline{P_{j+1}}$ if $j \in [2..k-2]$, or with $bab^{k-3}\overline{Q_k}c$, and end with an a.
In Case 3, there is only one conjugate, which has the prefix $ba^{k}c$ and ends with b.
Observe that only in Case 3 do we have a conjugate with the longest run of a's after b. Hence, the first conjugate in lexicographic order is $ba^{k}c\overline{P_2}$ from Case 3. It is followed by $ba^{k-1}bab^{k-3}\overline{Q_k}c < ba^{k-1}bb\overline{E_{k-1}} < ba^{k-2}bab^{k-4}\overline{P_{k-1}} < ba^{k-2}bb\overline{E_{k-2}} < \cdots < ba^{4}babb\overline{P_5} < ba^{4}bb\overline{E_4}$. All of these conjugates end with a b.
Among the remaining conjugates, those having the prefix baaab start either with $baaabab\overline{P_4}$ from Case 2 (a) or with $baaabb\overline{E_3}$ from Case 1. Then, the remaining conjugates with the prefix baab are those starting with $baaba\overline{P_3}$ from Case 2 (a) or with $baabb\overline{E_2}$ from Case 1. Lastly, the $k-2$ conjugates from Case 2 (b) follow, which are $bab^{j-2}\overline{P_{j+1}}$ for all $j \in [2..k-2]$ and $bab^{k-3}\overline{Q_k}c$. All of these conjugates end with an a.
We prove our claim by sorting lexicographically the conjugates in
$\{ba^{k}c\overline{P_2}\} \cup \bigcup_{j=0}^{k-3} \{ba^{k-j-1}bab^{k-j-3}\overline{P_{k-j}} \cdot ba^{k-j-1}bb\overline{E_{k-j-1}}\} \cup \bigcup_{j=2}^{k-2} \{bab^{j-2}\overline{P_{j+1}}\} \cup \{bab^{k-3}\overline{Q_k}c\}.$ □
Lemma 28.
$\beta(bba, \overline{W_k}c) = b^{2k-8}abb$.
Proof.
The conjugates in $M(\overline{W_k}c)$ starting with the prefix bba are
$bba^{k}c\overline{P_2} \cdots b < bba^{k-1}bab^{k-3}\overline{Q_k}c \cdots b < bba^{k-1}bb\overline{E_{k-1}} \cdots b < \cdots < bba^{5}bab^{3}\overline{P_6} \cdots b < bba^{5}bb\overline{E_5} \cdots b < bba^{4}babb\overline{P_5} \cdots b < bba^{4}bb\overline{E_4} \cdots a < bba^{3}bab\overline{P_4} \cdots b < bbaaba\overline{P_3} \cdots b.$
The conjugates with the prefix bba are obtained by three cases.
Case 1:
  concatenating the suffix b of $\overline{E_{k-1}}$ with $\overline{Q_k}c$,
Case 2:
  concatenating the suffix b of $\overline{E_j}$ with $\overline{P_{j+1}}$ if $j \in [3..k-2]$, or with $\overline{Q_k}c$,
Case 3:
  concatenating the suffix b of $\overline{P_j}$ with $\overline{E_j}$, for all $j \in [2..k-1]$.
The conjugates in Case 1 and Case 3 end with b. Also, the conjugates from Case 2 end with b, with the exception of the conjugate starting with $b\overline{P_4}$, since it is preceded by an a. We conclude this proof by sorting lexicographically the conjugates in
$\{bba^{k}c\overline{P_2}\} \cup \bigcup_{j=0}^{k-5} \{bba^{k-j-1}bab^{k-j-3}\overline{P_{k-j}},\ bba^{k-j-1}bb\overline{E_{k-j-1}}\} \cup \bigcup_{j=0}^{1} \{bba^{3-j}bab^{1-j}\overline{P_{4-j}}\}.$ □
Lemma 29.
$\beta(bbba, \overline{W_k}c) = b(ab)^{k-6}aaaaa$.
Proof.
The conjugates in $M(\overline{W_k}c)$ starting with the prefix bbba are
$bb\overline{Q_k}c \cdots b < bb\overline{E_{k-1}} \cdots a < bb\overline{P_{k-1}} \cdots b < \cdots < bb\overline{E_6} \cdots a < bb\overline{P_6} \cdots b < bb\overline{E_5} \cdots a < bb\overline{P_5} \cdots a < bb\overline{E_4} \cdots a < bb\overline{E_3} \cdots a < bb\overline{E_2} \cdots a.$
Analogously to Lemma 28, the conjugates starting with bbba can be obtained from three cases.
Case 1:
  concatenating the suffix bb of $\overline{E_{k-1}}$ with $\overline{Q_k}c$,
Case 2:
  concatenating the suffix bb of $\overline{E_j}$ with $\overline{P_{j+1}}$ if $j \in [4..k-2]$, or with $\overline{Q_k}c$,
Case 3:
  concatenating the suffix bb of $\overline{P_j}$ with $\overline{E_j}$, for all $j \in [2..k-1]$.
The conjugate in Case 1 is the smallest conjugate starting with bbba, since it has the longest run of a's, and it ends with a b. In addition, the conjugates of Case 3 end with an a, since bb is preceded by an a. In Case 2, all the conjugates end with b, with the exception of the conjugate starting with $bb\overline{P_5}$, since it is preceded by an a. We can sort these conjugates by
$\{bb\overline{Q_k}c\} \cup \bigcup_{j=0}^{k-6} \{bba^{k-j-1}bab^{k-j-3}\overline{P_{k-j}},\ bba^{k-j-1}bb\overline{E_{k-j-1}}\} \cup \bigcup_{j=0}^{2} \{bba^{4-j}bab^{2-j}\overline{P_{5-j}}\}.$
 □
Lemma 30.
$\beta(b^{j}a, \overline{W_k}c) = b^{k-j-2}a$ for all $j \in [4..k-2]$.
Proof.
In $M(\overline{W_k}c)$, the conjugates starting with the prefix $b^{j}a$ for all $j \in [4..k-2]$ are
$b^{j-1}\overline{Q_k}c \cdots b < b^{j-1}\overline{P_{k-1}} \cdots b < b^{j-1}\overline{P_{k-2}} \cdots b < \cdots < b^{j-1}\overline{P_{j+3}} \cdots b < b^{j-1}\overline{P_{j+2}} \cdots a.$
Observe that the only conjugates with the prefix $b^{j}a$ for $j \in [4..k-2]$ are obtained by concatenating $b^{j-1}$ either with $\overline{Q_k}c$ or with $\overline{P_{j'}}$ for $j' \in [j+2..k-1]$. One can see that these conjugates, taken in this order, are already sorted, and all conjugates end with a b, with the exception of the conjugate starting with $b^{j-1}\overline{P_{j+2}}$, which is preceded by an a and therefore ends with an a. We have all conjugates ordered according to the lexicographic order of the words in $\{b^{j-1}\overline{Q_k}c\} \cup \bigcup_{j'=0}^{k-j-3} \{b^{j-1}\overline{P_{k-j'-1}}\}$. This concludes our proof. □
Lemma 31.
$\beta(c, \overline{W_k}c) = a$.
Proof.
The only conjugate in $M(\overline{W_k}c)$ that starts with the prefix c is $c\overline{P_2} \cdots a$. Since c is lexicographically larger than the other characters a and b, it is the greatest conjugate in $M(\overline{W_k}c)$, and it ends with an a. □
The following theorem puts the lemmas above together.
Theorem 4.
Substituting the last character b of $\overline{W_k}$ by c increases $r$ by $2k-5$; cf. Table 9.
Proof.
Every conjugate contributing a character to $\beta(a^{i}b)$ is smaller than every conjugate contributing a character to $\beta(a^{i'}b)$, for every $1 \le i' < i \le k-1$. By symmetry, every conjugate contributing a character to $\beta(b^{j}a)$ is greater than each conjugate contributing a character to $\beta(b^{j'}a)$, for every $1 \le j' < j \le k-2$. With the structure of the BWT of $\overline{W_k}c$, we can easily derive its number of runs. The word $\beta(a^{k}c) \cdot \beta(a^{k-1}c) \cdot \prod_{i=1}^{k-2} \beta(a^{i}b) \cdot \beta(a^{i}c)$ has exactly $4k-2$ runs: we start with 1 run from $\beta(a^{k}c)$, but it is merged with $\beta(a^{k-1}b)$. The words $\beta(a^{k-1}b)$ and $\beta(a^{k-1}c)$ add 2 runs. Then, concatenating each $\beta(a^{i}b)$ and $\beta(a^{i}c)$ for all $i \in [2..k-2]$ in decreasing order, we add 3 and 1 runs each, which results in $4(k-3)$ runs. By counting, we observe that $\beta(ab)$ and $\beta(ac)$ add 7 and 1 runs, respectively.
The words $\beta(ba)$, $\beta(bba)$, and $\beta(bbba)$ have exactly 5, 3, and $2k-10$ runs, respectively, but since the boundary between $\beta(bba)$ and $\beta(bbba)$ merges, the first b of $\beta(bbba)$ does not count, which reduces the latter to $2k-11$. The remaining part of the BWT, that is, $\prod_{j=4}^{k-3} \beta(b^{j}a) \cdot \beta(b^{k-2}a) \cdot \beta(c)$, has $2k-12$ runs: each of the words $\beta(b^{4}a)$ up to $\beta(b^{k-3}a)$ adds 2 runs, and the final words $\beta(b^{k-2}a)$ and $\beta(c)$ do not add new runs, as they consist only of an a that merges with the previous one. Altogether, we have $2 + 4(k-3) + 7 + 1 + 5 + 3 + 2k-11 + 2(k-6) = 8k-17$, and the claim holds.
The main difference between $\overline{W_k}$ and $\overline{W_k}c$ comes from $\beta(a^{i}b)$ being concatenated with $\beta(a^{i}c)$ for $i \in [2..k-1]$, which repeats baba, while $\overline{W_k}$ repeats ba only, making $2k-5 = \Theta(k)$ more runs. Table 10, Table 11, Table 12 and Table 13 describe the scheme of the BWT of the word $\overline{W_k}c$. We have $r(\overline{W_k}c) = r(\overline{W_k}) + 2k-5$. From Definition 1, we have $k = \Theta(\sqrt{n})$. Thus, $r(\overline{W_k}c) - r(\overline{W_k}) = 2k-5 = \Theta(\sqrt{n})$. □
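The run tallies used in the proofs of Theorems 3 and 4 are plain arithmetic and can be re-checked mechanically. The following Python sketch sums the individual contributions exactly as they are listed in the two proofs and relates the difference $2k-5$ to the word length $n$. The function names are illustrative assumptions; `word_length` is derived here from the block lengths $|P_i| = i+3$, $|E_i| = 2i+1$, and $|Q_k| = k+2$ implied by the definition of $W_k$, which gives $n = (3k^2 + 7k - 18)/2$ and hence $k = \Theta(\sqrt{n})$.

```python
import math


def word_length(k: int) -> int:
    """n = |W_k|, summing the lengths of P_i, E_i (i = 2..k-1) and Q_k."""
    return sum((i + 3) + (2 * i + 1) for i in range(2, k)) + (k + 2)


def tally_unedited(k: int) -> int:
    """Contributions listed in the proof of Theorem 3."""
    return 2 * (k - 2) + 7 + 3 + 4 + (1 + 2 * (k - 6) + 1) + 2 * (k - 6)


def tally_edited(k: int) -> int:
    """Contributions listed in the proof of Theorem 4."""
    return 2 + 4 * (k - 3) + 7 + 1 + 5 + 3 + (2 * k - 11) + 2 * (k - 6)


if __name__ == "__main__":
    for k in (6, 10, 100, 1000):
        n = word_length(k)
        gap = tally_edited(k) - tally_unedited(k)
        # The ratio gap / sqrt(n) stays bounded, illustrating Theta(sqrt(n)).
        print(k, n, tally_unedited(k), tally_edited(k), gap, gap / math.sqrt(n))
```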

6. Multiplicative Sensitivity of $\rho$ by $\Omega(\log n)$

Recall that $\rho(W) = \mathrm{runs}(\mathrm{BBWT}(W))$. In this section, we return our attention to Fibonacci words. Similar to Section 4, we use them to construct a family of words with a multiplicative sensitivity of $\Theta(\log n)$ for the number of runs $\rho$ in the BBWT. Before that, we start with some helpful lemmas known from the literature.
Lemma 32.
([41], Lemma 3). The $2k$th Fibonacci word $F_{2k}$ is $X_{2k}ab$. The Lyndon conjugate of the Fibonacci word $F_{2k}$ is $L_{2k} = aX_{2k}b$.
Lemma 33.
([37], Lemma 6). We let $L_{2k}$ be the Lyndon conjugate of the Fibonacci word $F_{2k}$. Then, $r(\mathrm{BBWT}(L_{2k})) = 2$.
Lemma 34.
([41], Lemma 8). If $k < n$, then the Lyndon conjugate of $F_k$ is a prefix or a suffix of $aX_{n}b$. If $F_k = P_k ba$, then its Lyndon conjugate $aP_k b$ is a prefix of $aP_n b$; and if $F_k = P_k ab$, then its Lyndon conjugate $aP_k b$ is a suffix of $aP_n b$.
The next lemma addresses the extended Burrows–Wheeler transform [16], which takes a subset of steps from the BBWT by expecting the input to be a set of primitive words (i.e., the Lyndon factors in case of the BBWT). We translate the following known result to the BBWT:
Lemma 35.
(Corollary 4 of [47]). We let $\{T_1, \ldots, T_m\}$ be a conjugate-free set of primitive words and let $r$ be the number of runs of its extended Burrows–Wheeler transform. Then, $m \le r$.
Corollary 1.
We let $T_1, \ldots, T_m$ be the distinct Lyndon factors of a word $T$. Then, $m \le \rho(T)$.
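To make the bound of Corollary 1 concrete, here is a minimal Python sketch that computes the Lyndon factorization with Duval's algorithm and a naive BBWT, and then compares the number of distinct Lyndon factors with the number of BBWT runs on a few toy words. The functions `lyndon_factors`, `bbwt`, and `runs` are illustrative helpers, not code from the paper. The BBWT is obtained by sorting all conjugates of all Lyndon factors by their infinite powers; this sketch assumes a non-empty input and approximates the infinite-power order by comparing prefixes of length twice the longest factor, which suffices by the periodicity lemma of Fine and Wilf.

```python
def lyndon_factors(word: str) -> list:
    """Lyndon factorization via Duval's algorithm."""
    factors = []
    i, n = 0, len(word)
    while i < n:
        j, k = i + 1, i
        while j < n and word[k] <= word[j]:
            k = i if word[k] < word[j] else k + 1
            j += 1
        while i <= k:
            factors.append(word[i:i + j - k])
            i += j - k
    return factors


def bbwt(word: str) -> str:
    """Naive bijective BWT of a non-empty word."""
    factors = lyndon_factors(word)
    conjugates = [f[i:] + f[:i] for f in factors for i in range(len(f))]
    width = 2 * max(len(f) for f in factors)

    def omega_key(u: str) -> str:
        # Prefix of the infinite power u^omega, long enough to compare
        # any two conjugates collected above.
        return (u * (width // len(u) + 1))[:width]

    conjugates.sort(key=omega_key)
    return "".join(u[-1] for u in conjugates)


def runs(text: str) -> int:
    """Number of maximal runs of equal characters."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


if __name__ == "__main__":
    for word in ("aaba", "banana", "aabab", "aaaa"):
        factors = lyndon_factors(word)
        print(word, factors, bbwt(word), len(set(factors)), runs(bbwt(word)))
```

The word aaaa illustrates why the factors have to be counted without multiplicity: it has four Lyndon factors, all equal to a, but its BBWT consists of a single run.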
In what follows, we establish a lower bound on the multiplicative sensitivity of ρ with the Lyndon conjugates of Fibonacci words by leveraging Corollary 1.

6.1. Editing the Last Position of $L_{2k}$

We start with deleting the last character of $L_{2k}$, which directly leads to the following insight.
Theorem 5.
Let $aX_{2k}$ be the word obtained from $L_{2k}$ by deleting its last character. Then, $\rho(aX_{2k}) \ge k$.
Proof.
The word $aX_{2k}$ is not a Lyndon word; therefore, its Lyndon factorization has more than one factor. According to Lemma 34, the Lyndon word of the Fibonacci word $F_{2k} = X_{2k}ab$ is $aX_{2k}b$. The central word $X_{2k}$ is $X_{2k-1}baX_{2k-2}$, so the Lyndon word of $F_{2k}$ is $aX_{2k}b = aX_{2k-1}baX_{2k-2}b$. Its prefix $aX_{2k-1}b$ is $L_{2k-1}$, and its suffix $aX_{2k-2}b$ is $L_{2k-2}$.
However, by deleting the last character b, $L_{2k}$ becomes $aX_{2k-1}baX_{2k-2}$, meaning that the suffix $L_{2k-2}$ no longer exists. Thus, $L_{2k-1}$ is one of the Lyndon factors, since it is not followed by $L_{2k-2}$. The remaining part is $aX_{2k-2}$. In the same way as $X_{2k}$, the central word $X_{2k-2}$ can be divided as $X_{2k-3}baX_{2k-4}$; thus, $aX_{2k-2} = aX_{2k-3}baX_{2k-4}$. We find the Lyndon factor $L_{2k-3} = aX_{2k-3}b$ as its prefix. The remaining part is $aX_{2k-4}$, which is not a Lyndon word for the same reason as $aX_{2k-2}$ above, so $aX_{2k-4}$ is factorized with $L_{2k-5}$ as a prefix, and the remaining $aX_{2k-6}$ has $L_{2k-7}$ as a prefix. Finally, $aX_4$ is divided as $aX_3baX_2$, where $X_2$ is $\varepsilon$. Therefore, the Lyndon factors of $aX_{2k}$ are $L_{2i-1}$ for every $i \in [2..k]$, followed by the last remaining part a, which is a Lyndon word itself. The number of Lyndon factors is $k$, which we depict in Figure 6. □
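By Lemma 33 and Theorem 5, we conclude that the multiplicative sensitivity for deleting the last character of $L_{2k}$ is $\Omega(k)$, which is $\Omega(\log n)$ because the length $n = f_{2k}$ of $L_{2k}$ grows exponentially in $k$.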
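The factorization used in the proof of Theorem 5 can be reproduced with a short Python sketch. It assumes the indexing $F_0 = b$, $F_1 = a$, $F_n = F_{n-1}F_{n-2}$, which is consistent with $X_2 = \varepsilon$ and $F_{2k} = X_{2k}ab$ as used in this section; the helper names (`fibonacci_word`, `lyndon_conjugate`, `lyndon_factors`, `bwt`, `runs`) are illustrative. The sketch checks that deleting the last character of $L_{2k}$ yields the Lyndon factors $L_{2k-1}, L_{2k-3}, \ldots, L_3, a$, and that the BWT of the Lyndon word $L_{2k}$, which coincides with its BBWT, has two runs.

```python
def fibonacci_word(n: int) -> str:
    """F_0 = b, F_1 = a, F_n = F_{n-1} F_{n-2} (assumed indexing)."""
    prev, cur = "b", "a"
    for _ in range(n):
        prev, cur = cur, cur + prev
    return prev


def lyndon_conjugate(n: int) -> str:
    """L_n = a X_n b for n >= 2, where X_n is F_n without its last two characters."""
    return "a" + fibonacci_word(n)[:-2] + "b"


def lyndon_factors(word: str) -> list:
    """Lyndon factorization via Duval's algorithm."""
    factors = []
    i, n = 0, len(word)
    while i < n:
        j, k = i + 1, i
        while j < n and word[k] <= word[j]:
            k = i if word[k] < word[j] else k + 1
            j += 1
        while i <= k:
            factors.append(word[i:i + j - k])
            i += j - k
    return factors


def bwt(word: str) -> str:
    """BWT via sorted rotations; equals the BBWT when word is a Lyndon word."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def runs(text: str) -> int:
    """Number of maximal runs of equal characters."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


if __name__ == "__main__":
    for k in range(2, 7):
        word = lyndon_conjugate(2 * k)
        shortened = word[:-1]  # delete the last character b
        print(k, len(word), runs(bwt(word)), len(lyndon_factors(shortened)))
```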
Theorem 6.
We let $L_{2k}\#$ be the word obtained by substituting the last character b of $L_{2k}$ by #. Then, $\rho(L_{2k}\#) \ge k+1$.
Proof.
Since # is lexicographically smaller than a, $L_{2k}\#$ is not a Lyndon word and thus has several Lyndon factors. Since # is smaller than both a and b, the single character # is a Lyndon word. In addition, the prefix of $L_{2k}\#$ preceding # is factorized as in Theorem 5, which produces the Lyndon factors $L_{2i-1}$ for $i \in [2..k]$ and the Lyndon factor a. $L_{2k}\#$ has one more Lyndon factor, which is #, and therefore has $k+1$ Lyndon factors. We depict the Lyndon factorization in Figure 7. □
By Lemma 33 and Theorem 6, we conclude that the multiplicative sensitivity for substituting the last character of $L_{2k}$ is $\Omega(k)$. We observe a similar result when substituting the last character with a larger character instead of a smaller one (#).
Theorem 7.
$\rho(L_{2k}c) \ge k$.
Proof.
The lexicographic order between a, b, and c is a < b < c. Recall that deleting the last character of $L_{2k}$ yields $L_{2i-1}$ for $i \in [2..k]$ and a as Lyndon factors. In $L_{2k}c$, the character c is at position $f_{2k}$; therefore, it does not affect anything until the last Lyndon factor a. The word ac is a Lyndon word itself because a < c. Therefore, $L_{2k}c$ has $k$ Lyndon factors, as shown in Figure 8. □

6.2. Insertions at Specific Locations

According to Corollary 1, $\rho$ is lower bounded by the number of distinct Lyndon factors. After editing $L_{2k}$ at any position, we can still find consecutive Lyndon conjugates of lower order which can merge to one of higher order. For instance, $L_{2k-1} \cdot L_{2k-2}$ merge into $L_{2k}$, which can decrease the number of Lyndon factors. Also, $L_{2k-3} \cdot L_{2k-2}$ merge into $L_{2k-1}$. Our idea is to avoid consecutive Fibonacci Lyndon conjugates so that they do not merge, because doing so avoids a decrease in the number of distinct Lyndon factors. Now, we consider editing specific locations of Fibonacci Lyndon conjugates, which also results in an increase in runs. The following theorems describe the bijective BWT of $L_{2k}$ after some specific edit operations are applied.
Theorem 8.
We let $L_{2k}$ be a Fibonacci Lyndon conjugate. By inserting # at position $\alpha$ in $L_{2k}$, $\rho$ is at least $k$.
Proof.
We let $\alpha$ be the sum of the odd-indexed Fibonacci numbers $f_{2k-3} + f_{2k-5} + \cdots + f_3 + f_1$. Recall that the Fibonacci word $F_i = X_i c$ with $c \in \{ab, ba\}$ has the Lyndon conjugate $L_i = aX_i b$. Further, $L_{2k} = L_{2k-1} \cdot L_{2k-2} = aX_{2k-1}b \cdot aX_{2k-2}b$. Thus, we start with $aX_{2k-1}b \cdot aX_{2k-2}b$. To obtain many distinct Lyndon factors, we aim to produce Lyndon factors that are not consecutive. Knowing $X_{2k-1} = X_{2k-3}baX_{2k-2}$, the word $aX_{2k-2}$ merges with $aX_{2k-3}b$ into $L_{2k-1}$, so it is best to divide $X_{2k-2}$. The word $aX_{2k-2}$ divides into $aX_{2k-3}baX_{2k-4}$. In this case, it is best to add $aX_{2k-3}b$ as a new Lyndon factor, since it is the smallest among those Lyndon factors that are not consecutive with $X_{2k-1}$. In the same way as $aX_{2k-2}$, the word $aX_{2k-4}$ divides into $aX_{2k-5}baX_{2k-6}$, and we add $aX_{2k-5}b$ as a Lyndon factor. Then, $aX_{2k-6}$ divides into $aX_{2k-7}baX_{2k-8}$, and we add $aX_{2k-7}b$ as a Lyndon factor. The addition of the Lyndon factors $L_{2i-1}$ for $i \in [1..k-1]$ continues until $aX_5 = aX_3baX_4$ appears, since $aX_3b$ is the second smallest Fibonacci Lyndon factor. Thus, we need $X_1 = a$ as the last Lyndon factor, and it is obtained by inserting a # in $aX_4$, dividing $aX_4$ into $a\#X_4$. Since # is lexicographically smaller than any word to its right, the part starting with # becomes a Lyndon factor. Thus, we obtain $k$ Lyndon factors by inserting # in $L_{2k}$: $k-1$ factors from $L_{2k-3} \cdot L_{2k-5} \cdots L_1$ and one from # concatenated with the remaining words. This is shown in Figure 9. □
By Lemma 33 and Theorem 8, we conclude that the multiplicative sensitivity for inserting a character into $L_{2k}$ is $\Omega(k)$. In the same way, we can also insert the special character # at another position to observe a similar behavior:
Theorem 9.
We let $L_{2k}$ be a Fibonacci Lyndon conjugate. By inserting # at position $f_{2k}-2$ in $L_{2k}$, $\rho$ is at least $k+1$.
Proof.
Unlike Theorem 8, we can obtain some Lyndon factors on the right side of $aX_{2k}b$, adding $aX_{2k-1}b = L_{2k-1}$ as a Lyndon factor. We divide $aX_{2k-2}b$ into $aX_{2k-3}baX_{2k-4}b$ and obtain $aX_{2k-3}b = L_{2k-3}$. Further, we divide $aX_{2k-4}b$ into $aX_{2k-5}baX_{2k-6}b$, making $L_{2k-5}$. We divide $aX_{2k-6}b$ and obtain the Lyndon factors $L_{2k-7}, \ldots, L_5$. Lastly, $aX_4b$ divides into $aX_3baX_2b$, but since $X_2$ is $\varepsilon$, the last Lyndon factor obtained here is $L_3$. To make more Lyndon factors, we add # between a and b, turning ab into a#b, which adds the 2 Lyndon factors $a = L_1$ and #b. Thus, we obtain $k+1$ Lyndon factors here: $k$ factors $L_{2k-1}, L_{2k-3}, \ldots, L_1$ and one from #b. We visualize the Lyndon factorization in Figure 10. □

7. Additive Sensitivity of $\rho$ by $\Omega(\sqrt{n})$

Here, we study the additive sensitivity of $\rho$ with an approach similar to Section 5. In what follows, we establish that the additive sensitivity of $\rho$ is $\Omega(\sqrt{n})$. To that end, we again make use of the word $W_k$. Recall that $W_k = \left(\prod_{i=2}^{k-1} P_i E_i\right) Q_k = \prod_{i=2}^{k-1} ab^{i}aa \cdot ab^{i}aba^{i-2} \cdot ab^{k}a$.
Lemma 36.
The Lyndon conjugate $C_k$ of $W_k$ is $a^{k-2}b^{k}a \cdot \prod_{i=2}^{k-2} P_i E_i \cdot P_{k-1}\,ab^{k-1}ab$.
Proof.
The Lyndon conjugate of $W_k$ starts with the longest run of a's, which is obtained by concatenating the suffix $a^{k-3}$ of $E_{k-1}$ with the prefix a of $Q_k$. Therefore, $C_k = a^{k-2}b^{k}a \cdot \prod_{i=2}^{k-2} P_i E_i \cdot P_{k-1}\,ab^{k-1}ab = a^{k-2}b^{k}a \cdot \prod_{i=2}^{k-2} ab^{i}aa\,ab^{i}aba^{i-2} \cdot ab^{k-1}aa\,ab^{k-1}ab$. □
Lemma 37.
$\rho(C_k) = 6k-12$.
Proof.
According to Lemma 1, all conjugates have the same BWT; thus, $r(W_k) = r(C_k) = 6k-12$. Also, since $C_k$ is a Lyndon word, $r(C_k) = \rho(C_k) = 6k-12$. □
Recall that the runs in the BBWT and the BWT are the same if the input word is a Lyndon word, cf. Lemma 1. Thus, we can leverage the BWT computation if the input word is Lyndon, since we can obtain the number of runs in the same way as in Section 5 by using $\beta(W)$ for a word $W$. In this section, we focus on three variations of the word $C_k$: deleting its last character, and substituting its last character b with c or #.
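Lemma 36 and the first step of Lemma 37 can be checked directly for small k. The following Python sketch builds $W_k$ and the claimed Lyndon conjugate $C_k$ from their definitions, verifies that $C_k$ is the smallest rotation of $W_k$, and confirms that both words have the same BWT. The helpers `p`, `e`, `build_w`, `build_c`, `is_lyndon`, and `bwt` are illustrative names, and the check is only meant for $k \ge 6$, where the run $a^{k-2}$ is the unique longest run of a's.

```python
def p(i: int) -> str:
    """P_i = a b^i aa."""
    return "a" + "b" * i + "aa"


def e(i: int) -> str:
    """E_i = a b^i a b a^(i-2)."""
    return "a" + "b" * i + "ab" + "a" * (i - 2)


def build_w(k: int) -> str:
    """W_k = (prod_{i=2}^{k-1} P_i E_i) Q_k with Q_k = a b^k a."""
    return "".join(p(i) + e(i) for i in range(2, k)) + "a" + "b" * k + "a"


def build_c(k: int) -> str:
    """C_k = a^(k-2) b^k a . prod_{i=2}^{k-2} P_i E_i . P_{k-1} a b^(k-1) ab."""
    middle = "".join(p(i) + e(i) for i in range(2, k - 1))
    return "a" * (k - 2) + "b" * k + "a" + middle + p(k - 1) + "a" + "b" * (k - 1) + "ab"


def is_lyndon(word: str) -> bool:
    """A word is Lyndon if it is strictly smaller than all its other rotations."""
    return all(word < word[i:] + word[:i] for i in range(1, len(word)))


def bwt(word: str) -> str:
    """BWT via sorted rotations; identical for all conjugates of a word."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    for k in range(6, 10):
        w, c = build_w(k), build_c(k)
        smallest = min(w[i:] + w[:i] for i in range(len(w)))
        print(k, c == smallest, is_lyndon(c), bwt(c) == bwt(w))
```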

7.1. Deletions and Edits of C k with a Character Smaller than a

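Recall that $C_k = \mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a} \cdot \prod_{i=2}^{k-2} \mathtt{a}\mathtt{b}^{i}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2} \cdot \mathtt{a}\mathtt{b}^{k-1}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{k-1}\mathtt{ab}$. Thus, the word $C'_k$ obtained by deleting the last character $\mathtt{b}$ is $C'_k = \mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a} \cdot \prod_{i=2}^{k-2} \mathtt{a}\mathtt{b}^{i}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2} \cdot \mathtt{a}\mathtt{b}^{k-1}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{k-1}\mathtt{a}$. Recall that the Lyndon conjugate of $C'_k$ is the strictly smallest of all conjugates of $C'_k$. Since a conjugate of $C'_k$ obtains a longer run of $\mathtt{a}$s by concatenating the last $\mathtt{a}$ with the prefix $\mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a}$, $C'_k$ cannot be a Lyndon word. In fact, it has two Lyndon factors, namely $\mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a} \cdot \prod_{i=2}^{k-2} \mathtt{a}\mathtt{b}^{i}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2} \cdot \mathtt{a}\mathtt{b}^{k-1}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{k-1}$ and $\mathtt{a}$; from now on, we refer to the first factor as $D_k$. Figure 11 shows the Lyndon factorization. Since $r(\mathtt{a})$ is 1, the only thing left to check is $r(D_k)$. Compared to $C_k$, the subword $E_{k-1}$ is slightly modified in $D_k$: $E_{k-1} = \mathtt{a}\mathtt{b}^{k-1}\mathtt{ab}\mathtt{a}^{k-3}$ becomes $\mathtt{a}\mathtt{b}^{k-1}\mathtt{a}^{k-3}$, which we call $H_{k-1}$ in this section. Since $D_k$ is a Lyndon word, we determine $\rho(D_k)$ using $M(D_k)$ with the BWT as we did before.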
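The two-factor Lyndon factorization described above can be reproduced with Duval's algorithm. This is a minimal sketch: `lyndon_factorization` and `C` are illustrative names, and the closed form of $C_k$ is taken from Lemma 36 under the assumption $k \ge 6$ made by this sketch.

```python
def lyndon_factorization(s: str) -> list[str]:
    """Duval's algorithm: factor s into a non-increasing sequence of Lyndon words."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        # Extend the current run of repeated Lyndon words as far as possible.
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def C(k: int) -> str:
    """Closed form of C_k as stated in Lemma 36."""
    P = lambda i: "a" + "b" * i + "aa"
    E = lambda i: "a" + "b" * i + "ab" + "a" * (i - 2)
    middle = "".join(P(i) + E(i) for i in range(2, k - 1))
    return "a" * (k - 2) + "b" * k + "a" + middle + P(k - 1) + "a" + "b" * (k - 1) + "ab"


if __name__ == "__main__":
    k = 8
    deleted = C(k)[:-1]                      # C_k without its last character b
    factors = lyndon_factorization(deleted)
    print(len(factors), factors[-1])         # two factors, the last one is "a"
    print(factors[0] == deleted[:-1])        # the first factor plays the role of D_k
```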
Lemma 38.
$\beta(\mathtt{a}^{k-2}\mathtt{b}, D_k) = \mathtt{b}$.
Proof. 
The only conjugate in $M(D_k)$ starting with the prefix $\mathtt{a}^{k-2}\mathtt{b}$ is $\mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a}P_2\cdots\mathtt{b}$. The first conjugate in lexicographic order must start with the longest run of $\mathtt{a}$. By the definition of $D_k$, the longest run of $\mathtt{a}$ has length $k-2$, and it is obtained by concatenating the suffix $\mathtt{a}^{k-3}$ of $H_{k-1}$ with the prefix $\mathtt{a}$ of $Q_k$; this run is preceded by $\mathtt{b}$ (otherwise, we could extend the sequence of $\mathtt{a}$ characters). □
Lemma 39.
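$\beta(\mathtt{a}^{i}\mathtt{b}, D_k) = \mathtt{b}\mathtt{a}^{k-i-2}$, for all $i \in [4..k-3]$.
Proof. 
With integer $i \in [4..k-3]$, the conjugates in $M(D_k)$ starting with $\mathtt{a}^{i}\mathtt{b}$ are
$$\mathtt{a}^{i-1}P_{i+2}\cdots\mathtt{b} < \mathtt{a}^{i-1}P_{i+3}\cdots\mathtt{a} < \cdots < \mathtt{a}^{i-1}P_{k-1}\cdots\mathtt{a} < \mathtt{a}^{i}\mathtt{b}^{k}\mathtt{a}P_2\cdots\mathtt{a}.$$
For all $i \in [4..k-3]$, the factor $\mathtt{a}^{i}\mathtt{b}$ can only be obtained from the concatenation of the suffix $\mathtt{a}^{i-1}$ of $E_{j-1}$ (of $H_{k-1}$ if $j = k$),
  • with the prefix $\mathtt{ab}$ of $P_j$ for a $j \in [i+2..k-1]$ or
  • with the prefix $\mathtt{ab}$ of $Q_k$, if $j = k$.
We can sort these conjugates according to the lexicographic order of $\bigcup_{j=i+2}^{k-1}\{P_j\} \cup \{Q_k\}$. All these conjugates end with an $\mathtt{a}$, with the exception of the conjugate starting with $\mathtt{a}^{i-1}P_{i+2}$, since $D_k$ has a unique occurrence of $\mathtt{b}\mathtt{a}^{i}\mathtt{b}$. □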
Lemma 40.
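$\beta(\mathtt{aaab}, D_k) = \mathtt{bbbbb}(\mathtt{ab})^{k-7}\mathtt{baa}$.
Proof. 
The conjugates in $M(D_k)$ starting with $\mathtt{aaab}$ are
$$\mathtt{aa}E_2\cdots\mathtt{b} < \mathtt{aa}E_3\cdots\mathtt{b} < \mathtt{aa}E_4\cdots\mathtt{b} < \mathtt{aa}P_5\cdots\mathtt{b} < \mathtt{aa}E_5\cdots\mathtt{b} < \mathtt{aa}P_6\cdots\mathtt{a} < \mathtt{aa}E_6\cdots\mathtt{b} < \cdots < \mathtt{aa}P_{k-2}\cdots\mathtt{a} < \mathtt{aa}E_{k-2}\cdots\mathtt{b} < \mathtt{aa}H_{k-1}\cdots\mathtt{b} < \mathtt{aa}P_{k-1}\cdots\mathtt{a} < \mathtt{aa}Q_k\cdots\mathtt{a}.$$
The above conjugates are obtained in the following cases.
Case 1:
  by concatenating the suffix $\mathtt{aa}$ of $E_{i-1}$ with the prefix $\mathtt{ab}$ of $P_i$, for $i \in [5..k-1]$,
Case 2:
  by concatenating the suffix $\mathtt{aa}$ of $P_i$ with the prefix $\mathtt{ab}$ of $E_i$, for all $i \in [2..k-2]$, or of $H_{k-1}$ if $i = k-1$,
Case 3:
  by concatenating the suffix $\mathtt{aa}$ of $H_{k-1}$ with the prefix $\mathtt{ab}$ of $Q_k$.
All these conjugates starting with $\mathtt{aaab}$ are sorted according to the lexicographic order of the words in $\bigcup_{i=2}^{4}\{\mathtt{aa}E_i\} \cup \bigcup_{j=5}^{k-2}\{\mathtt{aa}P_j, \mathtt{aa}E_j\} \cup \{\mathtt{aa}H_{k-1}\} \cup \{\mathtt{aa}P_{k-1}\} \cup \{\mathtt{aa}Q_k\}$. The conjugates starting with $\mathtt{aa}P_i$ for $i \in [6..k-1]$ in Case 1, as well as the conjugate of Case 3, end with an $\mathtt{a}$. On the other hand, the conjugates of Case 2 and the conjugate starting with $\mathtt{aa}P_5$ in Case 1 end with a $\mathtt{b}$. □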
Lemma 41.
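$\beta(\mathtt{aab}, D_k) = \mathtt{baab}\mathtt{a}^{2k-8}$.
Proof. 
The conjugates in $M(D_k)$ starting with $\mathtt{aab}$ are
$$\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{a}E_2\cdots\mathtt{a} < \mathtt{a}E_3\cdots\mathtt{a} < \mathtt{a}P_4\cdots\mathtt{b} < \mathtt{a}E_4\cdots\mathtt{a} < \mathtt{a}P_5\cdots\mathtt{a} < \mathtt{a}E_5\cdots\mathtt{a} < \cdots < \mathtt{a}H_{k-1}\cdots\mathtt{a} < \mathtt{a}P_{k-1}\cdots\mathtt{a} < \mathtt{a}Q_k\cdots\mathtt{a}.$$
Each of the conjugates from Case 1 to Case 3 in Lemma 40 induces a conjugate starting with $\mathtt{aab}$, obtained by removing its leading character $\mathtt{a}$. It follows that all of these conjugates end with $\mathtt{a}$. The other two conjugates that start with $\mathtt{aab}$ are obtained by
  • concatenating the suffix $\mathtt{a}$ of $Q_k$ with the prefix $\mathtt{ab}$ of $P_2$ or
  • concatenating the suffix $\mathtt{a}$ of $E_3$ with the prefix $\mathtt{ab}$ of $P_4$.
In both cases, the obtained conjugates end with $\mathtt{b}$. We conclude this proof by sorting lexicographically the conjugates in $\{\mathtt{a}P_2\} \cup \bigcup_{i=2}^{3}\{\mathtt{a}E_i\} \cup \bigcup_{i=4}^{k-2}\{\mathtt{a}P_i, \mathtt{a}E_i\} \cup \{\mathtt{a}H_{k-1}\} \cup \{\mathtt{a}P_{k-1}\} \cup \{\mathtt{a}Q_k\}$. □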
Lemma 42.
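$\beta(\mathtt{ab}, D_k) = \mathtt{b}^{k-3}\mathtt{aab}\mathtt{a}^{2k-6}$.
Proof. 
The conjugates in $M(D_k)$ starting with $\mathtt{ab}$ are
$$\mathtt{ab}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{b} < \cdots < \mathtt{ab}P_3\cdots\mathtt{b} < P_2\cdots\mathtt{a} < E_2\cdots\mathtt{a} < P_3\cdots\mathtt{b} < E_3\cdots\mathtt{a} < P_4\cdots\mathtt{a} < E_4\cdots\mathtt{a} < \cdots < P_{k-2}\cdots\mathtt{a} < E_{k-2}\cdots\mathtt{a} < H_{k-1}\cdots\mathtt{a} < P_{k-1}\cdots\mathtt{a} < Q_k\cdots\mathtt{a}.$$
The above conjugates are obtained in the following cases.
Case 1:
   $P_i$ for all $i \in [2..k-1]$,
Case 2:
  prefix $\mathtt{ab}$ of $E_i$, for all $i \in [2..k-2]$,
Case 3:
   $\mathtt{ab}\mathtt{a}^{i-2}$ from $E_i$, for all $i \in [2..k-2]$, or $\mathtt{ab}$ from $H_{k-1}$,
Case 4:
   $\mathtt{ab}$ from $Q_k$.
For two distinct integers $i, i'$ with $i > i' \ge 0$, we have $\mathtt{ab}\mathtt{a}^{i}\mathtt{b} < \mathtt{ab}\mathtt{a}^{i'}\mathtt{b}$. Thus, the first conjugate in lexicographic order starting with $\mathtt{ab}$ is the one followed by the longest run of $\mathtt{a}$s. The smallest of these conjugates can be found by concatenating the suffix $\mathtt{ab}\mathtt{a}^{k-4}$ of $E_{k-2}$ with the prefix $\mathtt{ab}$ of $P_{k-1}$ from Case 3. Then, the remaining conjugates in Case 3, which start with the suffix $\mathtt{ab}\mathtt{a}^{i-2}$ of $E_i$ for all $i \in [2..k-3]$, follow in decreasing order of $i$. By construction of $E_i$, for all $i \in [2..k-2]$, these conjugates must end with a $\mathtt{b}$. Note that the conjugates of the remaining cases are obtained by removing the leading character $\mathtt{a}$ from the conjugates starting with $\mathtt{aab}$ from Lemma 41, with the exception of the conjugate starting with $P_3$. It follows that the latter ends with a $\mathtt{b}$, while all the other conjugates end with $\mathtt{a}$. □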
Lemma 43.
$\beta(\mathtt{ba}, D_k) = \mathtt{b}\mathtt{a}^{k-6}\mathtt{bbba}\mathtt{b}^{k-4}\mathtt{a}\mathtt{b}^{k-3}\mathtt{a}$.
Proof. 
The conjugates in $M(D_k)$ starting with $\mathtt{ba}$ are
$$\mathtt{b}\mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{b}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{a} < \mathtt{b}\mathtt{a}^{k-5}P_{k-2}\cdots\mathtt{a} < \cdots < \mathtt{baaa}P_6\cdots\mathtt{a} < \mathtt{baa}E_2\cdots\mathtt{b} < \mathtt{baa}E_3\cdots\mathtt{b} < \mathtt{baa}E_4\cdots\mathtt{b} < \mathtt{baa}P_5\cdots\mathtt{a} < \mathtt{baa}E_5\cdots\mathtt{b} < \cdots < \mathtt{baa}E_{k-2}\cdots\mathtt{b} < \mathtt{baa}H_{k-1}\cdots\mathtt{b} < \mathtt{ba}P_2\cdots\mathtt{b} < \mathtt{ba}P_4\cdots\mathtt{a} < \mathtt{bab}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{b} < \cdots < \mathtt{bab}P_3\cdots\mathtt{b} < \mathtt{b}P_3\cdots\mathtt{a}.$$
The conjugates above are obtained in the following cases.
Case 1:
the suffix $\mathtt{baa}$ of $P_i$ concatenated with $E_i$ for all $i \in [2..k-2]$, or with $H_{k-1}$ if $i = k-1$,
Case 2:
the runs in $E_i$ for all $i \in [2..k-2]$,
Case 3:
the suffix $\mathtt{ba}$ of $Q_k$ concatenated with $P_2$,
Case 4:
the suffix $\mathtt{b}\mathtt{a}^{k-3}$ of $H_{k-1}$ concatenated with $Q_k$.
We have as many circular occurrences of $\mathtt{ba}$ as the number of maximal runs of $\mathtt{b}$s in $D_k$. For Case 1, we have one conjugate starting with $\mathtt{baa}E_i$ for all $i \in [2..k-2]$, or with $\mathtt{baa}H_{k-1}$. Since each run of $\mathtt{b}$s within each word from $\bigcup_{i=2}^{k-1}\{P_i\}$ is of length at least 2, all conjugates of Case 1 end with $\mathtt{b}$.
For Case 2, for all $i \in [2..k-2]$, we can distinguish two subcases based on where $\mathtt{ba}$ starts:
Case 2 (a):
 the first occurrence of $\mathtt{ba}$ in $E_i$, which has the form $\mathtt{bab}\mathtt{a}^{i-2}$ for all $i \in [2..k-2]$,
Case 2 (b):
 the second occurrence of $\mathtt{ba}$ in $E_i$, which has the form $\mathtt{b}\mathtt{a}^{i-2}$ for all $i \in [2..k-2]$.
  • For Case 2 (a), these conjugates start with $\mathtt{bab}\mathtt{a}^{i-2}P_{i+1}$, for $i \in [2..k-2]$. Similarly to Case 1, each conjugate of Case 2 (a) ends with a $\mathtt{b}$. Each conjugate in Case 2 (b) is obtained by shifting the corresponding conjugate in Case 2 (a) by two characters to the right. Therefore, all of these conjugates end with an $\mathtt{a}$ and have prefixes of the form $\mathtt{b}\mathtt{a}^{i-2}P_{i+1}$, for $i \in [2..k-2]$.
  • For Case 3, the conjugate starting with $\mathtt{ba}$ in $Q_k$ has $\mathtt{ba}P_2$ as a prefix, and it is preceded by a $\mathtt{b}$.
  • Lastly, for Case 4, the conjugate starts with $\mathtt{b}\mathtt{a}^{k-3}$ concatenated with $Q_k$, and it ends with a $\mathtt{b}$.
  • Observe that only for Case 4 and Case 2 (b) we have conjugates starting with $\mathtt{baaaa}$. Hence, the first conjugate in lexicographic order is the one from Case 4 starting with $\mathtt{b}\mathtt{a}^{k-3}Q_k$, followed by those from Case 2 (b), which are $\mathtt{b}\mathtt{a}^{k-4}P_{k-1} < \mathtt{b}\mathtt{a}^{k-5}P_{k-2} < \cdots < \mathtt{baaa}P_6$.
Among the remaining conjugates, those having the prefix $\mathtt{baaa}$ start either with $\mathtt{baa}P_5$ from Case 2 (b), or, from Case 1, with $\mathtt{baa}E_i$ for some $i \in [2..k-2]$ or with $\mathtt{baa}H_{k-1}$. We can sort them according to the order of the words in
$$\bigcup_{i=2}^{4}\{\mathtt{baa}E_i\} \cup \{\mathtt{baa}P_5\} \cup \bigcup_{i=5}^{k-2}\{\mathtt{baa}E_i\} \cup \{\mathtt{baa}H_{k-1}\}.$$
Then, the remaining conjugates with prefix $\mathtt{baa}$ are those starting with $\mathtt{ba}P_2$ from Case 3 and $\mathtt{ba}P_4$ from Case 2 (b). Finally, let us focus on the conjugates from Case 2 (a). These conjugates are sorted according to the length of the run of $\mathtt{a}$s following the common prefix $\mathtt{bab}$. The last conjugate left is the one starting with $\mathtt{b}P_3$ from Case 2 (b). Since this conjugate is greater than each conjugate considered in Case 2 (a), it is the greatest conjugate of $D_k$ starting with $\mathtt{ba}$, and the claim follows. □
Lemma 44.
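$\beta(\mathtt{b}^{i}\mathtt{a}, D_k) = \mathtt{bab}^{2k-2i-2}\mathtt{a}$ for all $i \in [2..k-2]$.
Proof. 
With integer $i \in [2..k-2]$, the conjugates in $M(D_k)$ starting with the prefix $\mathtt{b}^{i}\mathtt{a}$ are
$$\mathtt{b}^{i}\mathtt{a}^{k-3}Q_k\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{aa}E_i\cdots\mathtt{a} < \mathtt{b}^{i}\mathtt{aa}E_{i+1}\cdots\mathtt{b} < \cdots < \mathtt{b}^{i}\mathtt{aa}E_{k-2}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{aa}H_{k-1}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{b} < \cdots < \mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-1}P_{i+2}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2}P_{i+1}\cdots\mathtt{a}.$$
Case 1:
  concatenating $\mathtt{b}^{i}\mathtt{aa}$ of $P_j$ with $E_j$ for all $j \in [i..k-2]$, or with $H_{k-1}$ if $j = k-1$,
Case 2:
  concatenating $\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{j-2}$ of $E_j$ with $P_{j+1}$ for $j \in [i..k-2]$,
Case 3:
  concatenating $\mathtt{b}^{i}\mathtt{a}^{k-3}$ of $H_{k-1}$ with $Q_k$,
Case 4:
  concatenating $\mathtt{b}^{i}\mathtt{a}$ of $Q_k$ with $P_2$.
We consider these four cases separately. For all $j \in [i..k-1]$, the conjugate starting within $P_j$ has the prefix $\mathtt{b}^{i}\mathtt{aa}E_j$ if $j \le k-2$, or $\mathtt{b}^{i}\mathtt{aa}H_{k-1}$ if $j = k-1$ (Case 1). For all $j \in [i..k-2]$, the conjugate starting within $E_j$ has the prefix $\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{j-2}P_{j+1}$ (Case 2). In addition, the conjugate starting within $H_{k-1}$ has the prefix $\mathtt{b}^{i}\mathtt{a}^{k-3}Q_k$ (Case 3). Finally, the conjugate starting within $Q_k$ has the prefix $\mathtt{b}^{i}\mathtt{a}P_2$ (Case 4). By construction, we first have the conjugate from Case 3 and then those from Case 1, sorted according to the lexicographic order of $\bigcup_{j=i}^{k-2}\{\mathtt{b}^{i}\mathtt{aa}E_j\} \cup \{\mathtt{b}^{i}\mathtt{aa}H_{k-1}\}$; then, we have the conjugate from Case 4, followed by those of Case 2 sorted by decreasing length of the run of $\mathtt{a}$s following the common prefix $\mathtt{b}^{i}\mathtt{ab}$. Moreover, a conjugate ends with $\mathtt{a}$ only when its run of $\mathtt{b}$s is exactly of length $i$. Thus, the only conjugates ending with an $\mathtt{a}$ are those starting with $\mathtt{b}^{i}\mathtt{aa}E_i$ and $\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2}P_{i+1}$. □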
Lemma 45.
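$\beta(\mathtt{b}^{k-1}\mathtt{a}, D_k) = \mathtt{aab}$.
Proof. 
There are three conjugates in $M(D_k)$ starting with the prefix $\mathtt{b}^{k-1}\mathtt{a}$. These conjugates are
$$\mathtt{b}^{k-1}\mathtt{a}^{k-3}Q_k\cdots\mathtt{a} < \mathtt{b}^{k-1}\mathtt{aaa}\mathtt{b}^{k-1}\cdots\mathtt{a} < \mathtt{b}^{k-1}\mathtt{a}P_2\cdots\mathtt{b}.$$
Observe that the only conjugates with prefix $\mathtt{b}^{k-1}\mathtt{a}$ have the prefixes $\mathtt{b}^{k-1}\mathtt{a}^{k-3}Q_k$, $\mathtt{b}^{k-1}\mathtt{aa}H_{k-1}$, and $\mathtt{b}^{k-1}\mathtt{a}P_2$, respectively. One can see that these conjugates taken in this order are already sorted, and only the conjugate starting within $Q_k$ ends with $\mathtt{b}$, while the other two end with $\mathtt{a}$. □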
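Individual blocks such as the one in Lemma 45 can be inspected mechanically. The sketch below treats $\beta(U, W)$ as the part of the BWT of $W$ that belongs to the sorted conjugates starting with $U$, which is how we read the notation here. The function names and the construction of $D_k$ as $C_k$ without its last two characters are our own, and the sketch assumes $k \ge 6$.

```python
def beta(prefix: str, w: str) -> str:
    """Last characters of the sorted rotations of w that start with `prefix`."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(rot[-1] for rot in rotations if rot.startswith(prefix))


def D(k: int) -> str:
    """D_k: the closed form of C_k (Lemma 36) with its final 'ab' removed."""
    P = lambda i: "a" + "b" * i + "aa"
    E = lambda i: "a" + "b" * i + "ab" + "a" * (i - 2)
    middle = "".join(P(i) + E(i) for i in range(2, k - 1))
    return "a" * (k - 2) + "b" * k + "a" + middle + P(k - 1) + "a" + "b" * (k - 1)


if __name__ == "__main__":
    k = 8
    d = D(k)
    print(beta("a" * (k - 2) + "b", d))   # block of Lemma 38
    print(beta("b" * (k - 1) + "a", d))   # block of Lemma 45
    print(beta("b" * k + "a", d))         # block of Lemma 46
```

Concatenating the blocks for all prefixes of the form $\mathtt{a}^{i}\mathtt{b}$ and $\mathtt{b}^{i}\mathtt{a}$ in sorted order yields the full BWT, which is the idea behind the summary lemma that follows.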
Lemma 46.
$\beta(\mathtt{b}^{k}\mathtt{a}, D_k) = \mathtt{a}$.
Proof. 
The last conjugate in $M(D_k)$ is the one with prefix $\mathtt{b}^{k}\mathtt{a}$, namely $\mathtt{b}^{k}\mathtt{a}P_2\cdots\mathtt{a}$. The only occurrence of $\mathtt{b}^{k}$ is within $Q_k$. Hence, the last conjugate in lexicographic order starts with $\mathtt{b}^{k}\mathtt{a}P_2$, and since the run of $\mathtt{b}$s is maximal, it ends with an $\mathtt{a}$, and the claim follows. □
We summarize the above lemmas as follows.
Lemma 47.
For integer $k \ge 10$, $\rho(D_k) = 8k - 18$, cf. Table 14. The BWT of the word $D_k$ is given by $\mathrm{BBWT}(D_k) = \prod_{i=2}^{k-1} \beta(\mathtt{a}^{k-i}\mathtt{b}) \cdot \prod_{i=1}^{k} \beta(\mathtt{b}^{i}\mathtt{a})$.
Proof. 
Every conjugate counted in $\beta(\mathtt{a}^{i}\mathtt{b})$ is smaller than each conjugate counted in $\beta(\mathtt{a}^{i'}\mathtt{b})$ for every $1 \le i' < i \le k-2$. Symmetrically, every conjugate counted in $\beta(\mathtt{b}^{j}\mathtt{a})$ is greater than any conjugate counted in $\beta(\mathtt{b}^{j'}\mathtt{a})$, for every $1 \le j' < j \le k$. Since we considered all the disjoint ranges of conjugates of $D_k$ based on their common prefix, the word $\prod_{i=2}^{k-1} \beta(\mathtt{a}^{k-i}\mathtt{b}) \cdot \prod_{i=1}^{k} \beta(\mathtt{b}^{i}\mathtt{a})$ is the BWT of $D_k$.
With the structure of $\mathrm{BBWT}(D_k)$, we can easily derive its number of runs. The word $\prod_{i=2}^{k-4} \beta(\mathtt{a}^{k-i}\mathtt{b})$ has exactly $2(k-6)$ runs. We start with 2 runs from $\beta(\mathtt{a}^{k-2}\mathtt{b})\,\beta(\mathtt{a}^{k-3}\mathtt{b}) = \mathtt{bba}$, and then, concatenating each $\beta(\mathtt{a}^{i}\mathtt{b})$ down to $\beta(\mathtt{a}^{4}\mathtt{b})$ adds 2 new runs each. By counting, we observe that $\beta(\mathtt{aaab})$, $\beta(\mathtt{aab})$, $\beta(\mathtt{ab})$ have $2(k-6)$, 4, and 4 runs, respectively. The boundaries between these words do not merge. The word $\beta(\mathtt{ba})$ has exactly 8 runs. The remaining part of the BWT, that is, $\prod_{i=2}^{k} \beta(\mathtt{b}^{i}\mathtt{a})$, has $4(k-3)+2$ runs. Concatenating each word from $\beta(\mathtt{b}^{2}\mathtt{a})$ to $\beta(\mathtt{b}^{k-2}\mathtt{a})$ adds 4 new runs each. The word $\beta(\mathtt{b}^{k-1}\mathtt{a}) = \mathtt{aab}$ adds only one run, by its $\mathtt{b}$, as its leading $\mathtt{a}$s merge with the previous run. Finally, $\beta(\mathtt{b}^{k}\mathtt{a})$ adds one run. Altogether, we have $2(k-6) + 2(k-6) + 4 + 4 + 8 + 4(k-3) + 2 = 8k - 18$, and the claim holds. □
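For readers who want to retrace the count, the sum can be regrouped as follows. This is only a restatement of the arithmetic in the proof, with the block names written under each term as our own annotation.

```latex
\begin{align*}
\rho(D_k)
  &= \underbrace{2(k-6)}_{\beta(\mathtt{a}^{k-2}\mathtt{b}),\,\ldots,\,\beta(\mathtt{a}^{4}\mathtt{b})}
   + \underbrace{2(k-6)}_{\beta(\mathtt{aaab})}
   + \underbrace{4}_{\beta(\mathtt{aab})}
   + \underbrace{4}_{\beta(\mathtt{ab})}
   + \underbrace{8}_{\beta(\mathtt{ba})}
   + \underbrace{4(k-3)}_{\beta(\mathtt{b}^{2}\mathtt{a}),\,\ldots,\,\beta(\mathtt{b}^{k-2}\mathtt{a})}
   + \underbrace{1}_{\beta(\mathtt{b}^{k-1}\mathtt{a})}
   + \underbrace{1}_{\beta(\mathtt{b}^{k}\mathtt{a})} \\
  &= (4k - 24) + 16 + (4k - 12) + 2 \\
  &= 8k - 18 .
\end{align*}
```

For instance, $k = 10$ gives $8 \cdot 10 - 18 = 62$ runs.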
Using Lemma 47 above, we can finally obtain the number of runs of $C'_k = D_k\mathtt{a}$.
Theorem 10.
$\rho(C'_k) = 8k - 17$.
Proof. 
$C'_k = \mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a} \cdot \prod_{i=2}^{k-2} \mathtt{a}\mathtt{b}^{i}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2} \cdot \mathtt{a}\mathtt{b}^{k-1}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{k-1}\mathtt{a}$. The Lyndon conjugate of $C'_k$ is the smallest conjugate and hence starts with the longest run of $\mathtt{a}$, thus it is the one starting with $\mathtt{a}^{k-1}$. Therefore, $C'_k$ is not a Lyndon word; instead, its Lyndon factorization consists of $D_k$ followed by the single character $\mathtt{a}$. Figure 12 depicts the Lyndon factorization of $C'_k$. Since $\mathtt{a} < D_k$ in lexicographic order, the factor $\mathtt{a}$ contributes the first character of the BBWT, and it adds one run because the first conjugate of $D_k$ from Lemma 38 ends with a $\mathtt{b}$. Therefore, $\rho(C'_k) = \rho(\mathtt{a}) + \rho(D_k) = 8k - 17$. □
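With Lemma 37 and Theorem 10, we determine that the additive sensitivity of $\rho$ for $C_k$ is $\Theta(\sqrt{n})$ when deleting the last character: the number of runs grows by $(8k-17) - (6k-12) = 2k-5$, while $n = |C_k| = \Theta(k^2)$.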
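Reading the two run counts against the word length suggests a square-root gap. The length formula below is our own computation from the block definitions ($|P_i| = i+3$, $|E_i| = 2i+1$, $|Q_k| = k+2$) and is not stated in this form in the text.

```latex
\begin{align*}
\rho(D_k\mathtt{a}) - \rho(C_k) &= (8k - 17) - (6k - 12) = 2k - 5, \\
n = |C_k| = |W_k| &= \sum_{i=2}^{k-1} \bigl( (i+3) + (2i+1) \bigr) + (k+2)
   = \frac{3k^2 + 7k}{2} - 9 = \Theta(k^2), \\
\text{hence}\quad 2k - 5 &= \Theta(\sqrt{n}).
\end{align*}
```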
Theorem 11.
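$\rho(C'_k\#) = 8k - 16$.
Proof. 
$C'_k\# = \mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a} \cdot \prod_{i=2}^{k-2} \mathtt{a}\mathtt{b}^{i}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2} \cdot \mathtt{a}\mathtt{b}^{k-1}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{k-1}\mathtt{a}\#$. The Lyndon factorization of $C'_k\#$ has three factors, $D_k$, $\mathtt{a}$, and $\#$, because $\mathtt{a}$ is lexicographically smaller than $D_k$, and $\#$ is smaller than both $D_k$ and $\mathtt{a}$. Therefore, $\rho(C'_k\#) = \rho(\#) + \rho(\mathtt{a}) + \rho(D_k) = 8k - 16$. We show a sketch in Figure 13. □
With Lemma 37 and Theorem 11, we obtain that the additive sensitivity of $\rho$ for $C_k$ is $\Theta(\sqrt{n})$ when substituting the last character, since the number of runs grows by $(8k-16) - (6k-12) = 2k-4$.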
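The structure behind Theorems 10 and 11 can be observed with a small BBWT implementation. The sketch below is a simple quadratic-time construction of our own: it factorizes the input with Duval's algorithm, collects the conjugates of all Lyndon factors, and sorts them by comparing their infinite repetitions, truncated to twice the longest factor length. All function names are illustrative, and `#` is assumed to be smaller than the letters (as in ASCII).

```python
def lyndon_factorization(s: str) -> list[str]:
    """Duval's algorithm."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bwt(w: str) -> str:
    """Rotation-based BWT without end marker."""
    return "".join(r[-1] for r in sorted(w[i:] + w[:i] for i in range(len(w))))


def bbwt(s: str) -> str:
    """Bijective BWT: last characters of all factor conjugates in omega-order."""
    factors = lyndon_factorization(s)
    conjugates = [f[i:] + f[:i] for f in factors for i in range(len(f))]
    width = 2 * max(len(f) for f in factors)  # enough to decide u^omega vs v^omega

    def omega_key(u: str) -> str:
        return (u * (width // len(u) + 1))[:width]

    return "".join(u[-1] for u in sorted(conjugates, key=omega_key))


def C(k: int) -> str:
    """Closed form of C_k as stated in Lemma 36."""
    P = lambda i: "a" + "b" * i + "aa"
    E = lambda i: "a" + "b" * i + "ab" + "a" * (i - 2)
    middle = "".join(P(i) + E(i) for i in range(2, k - 1))
    return "a" * (k - 2) + "b" * k + "a" + middle + P(k - 1) + "a" + "b" * (k - 1) + "ab"


if __name__ == "__main__":
    k = 8
    deleted = C(k)[:-1]   # C_k without its last character
    d = deleted[:-1]      # the factor called D_k in the text
    print(bbwt(deleted) == "a" + bwt(d))           # deletion: one extra leading a
    print(bbwt(deleted + "#") == "#a" + bwt(d))    # substitution by '#': two extra characters
```

In both cases the BBWT is the BWT of $D_k$ with one or two single-character runs prepended, which is exactly the additive behavior used in the two proofs.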

7.2. Editing $C_k$ with a Character Larger than $\mathtt{b}$

Now, we consider editing $C_k$ with a character $\mathtt{c}$ that is lexicographically larger than any character in $C_k$. In this part, we consider two edit operations: appending $\mathtt{c}$ to the end of $C_k$, and substituting the last character of $C_k$ with $\mathtt{c}$.

7.2.1. Appending $\mathtt{c}$ to $C_k$

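Now, we prove that appending $\mathtt{c}$ to $C_k$, i.e., turning $C_k$ into $C_k\mathtt{c}$, also adds $\Theta(\sqrt{n})$ runs to the BBWT. Here, $C_k\mathtt{c} = \mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a} \cdot \prod_{i=2}^{k-2} \mathtt{a}\mathtt{b}^{i}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2} \cdot \mathtt{a}\mathtt{b}^{k-1}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{k-1}\mathtt{abc}$. We illustrate $C_k$ in Figure 14. Similar to Section 7.1, $E_{k-1}$ is slightly modified, this time to $\mathtt{a}\mathtt{b}^{k-1}\mathtt{abc}\mathtt{a}^{k-3}$. In this section, we call this modified subword $S_{k-1}$. The character $\mathtt{c}$ is lexicographically larger than any character in $C_k$. Thus, $C_k\mathtt{c}$ is a Lyndon word itself. Recall that the runs of a Lyndon word are the same in both the BWT and the BBWT, so we obtain $\rho(C_k\mathtt{c})$ by using the BWT with $M(C_k\mathtt{c})$ the same way we did in the previous lemmas.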
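Whether a word is Lyndon can be tested directly from the definition: a word is Lyndon exactly when it is strictly smaller than each of its proper non-empty suffixes. The sketch below applies this test to $C_k$ and its edited variants; `is_lyndon` and `C` are illustrative names, and $k \ge 6$ is an assumption of the sketch.

```python
def is_lyndon(w: str) -> bool:
    """True iff w is strictly smaller than all of its proper non-empty suffixes."""
    return all(w < w[i:] for i in range(1, len(w)))


def C(k: int) -> str:
    """Closed form of C_k as stated in Lemma 36."""
    P = lambda i: "a" + "b" * i + "aa"
    E = lambda i: "a" + "b" * i + "ab" + "a" * (i - 2)
    middle = "".join(P(i) + E(i) for i in range(2, k - 1))
    return "a" * (k - 2) + "b" * k + "a" + middle + P(k - 1) + "a" + "b" * (k - 1) + "ab"


if __name__ == "__main__":
    k = 8
    c = C(k)
    print(is_lyndon(c))              # C_k itself
    print(is_lyndon(c + "c"))        # appending a larger character keeps it Lyndon
    print(is_lyndon(c[:-1]))         # deleting the last b breaks the property
    print(is_lyndon(c[:-1] + "c"))   # substituting the last b by c restores it
```

The quadratic-time test is only meant for small experiments; Duval's algorithm decides the same question in linear time.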
Lemma 48.
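$\beta(\mathtt{a}^{k-2}\mathtt{b}, C_k\mathtt{c}) = \mathtt{c}$.
Proof. 
The first conjugate in $M(C_k\mathtt{c})$ is $\mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a}P_2\cdots\mathtt{c}$. The first conjugate must start with the longest run of $\mathtt{a}$s. In $C_k\mathtt{c}$, the longest run of $\mathtt{a}$ has length $k-2$ and is a prefix of $C_k\mathtt{c}$ itself; it is obtained by concatenating the suffix $\mathtt{a}^{k-3}$ of $S_{k-1}$ with the prefix $\mathtt{a}$ of $Q_k$, and it is preceded by a $\mathtt{c}$. □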
Lemma 49.
$\beta(\mathtt{a}^{i}\mathtt{b}, C_k\mathtt{c}) = \mathtt{b}\mathtt{a}^{k-i-2}$ for all $i \in [4..k-3]$.
Proof. 
In $M(C_k\mathtt{c})$, the conjugates starting with $\mathtt{a}^{i}\mathtt{b}$ for $i \in [4..k-3]$ are
$$\mathtt{a}^{i-1}P_{i+2}\cdots\mathtt{b} < \mathtt{a}^{i-1}P_{i+3}\cdots\mathtt{a} < \cdots < \mathtt{a}^{i-1}P_{k-1}\cdots\mathtt{a} < \mathtt{a}^{i-1}Q_k\cdots\mathtt{a}.$$
For all $i \in [4..k-3]$, the factor $\mathtt{a}^{i}\mathtt{b}$ can only be obtained, for $j \in [i+2..k]$, from the concatenation of the suffix $\mathtt{a}^{i-1}$ of $E_{j-1}$ with the prefix $\mathtt{ab}$ of $P_j$ if $j \in [i+2..k-1]$, or from the concatenation of the suffix $\mathtt{a}^{i-1}$ of $S_{k-1}$ with the prefix $\mathtt{ab}$ of $Q_k$. We can sort these conjugates according to the lexicographic order of $\bigcup_{j=i+2}^{k-1}\{\mathtt{a}^{i-1}P_j\} \cup \{\mathtt{a}^{i-1}Q_k\}$. Note that all these conjugates end with an $\mathtt{a}$, with the exception of the conjugate starting with $\mathtt{a}^{i-1}P_{i+2}$, since this is the only place where $\mathtt{b}\mathtt{a}^{i}\mathtt{b}$ occurs. □
Lemma 50.
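$\beta(\mathtt{aaab}, C_k\mathtt{c}) = \mathtt{bbbbb}(\mathtt{ab})^{k-6}\mathtt{a}$.
Proof. 
In $M(C_k\mathtt{c})$, the conjugates starting with $\mathtt{aaab}$ are
$$\mathtt{aa}E_2\cdots\mathtt{b} < \mathtt{aa}E_3\cdots\mathtt{b} < \mathtt{aa}E_4\cdots\mathtt{b} < \mathtt{aa}P_5\cdots\mathtt{b} < \mathtt{aa}E_5\cdots\mathtt{b} < \mathtt{aa}P_6\cdots\mathtt{a} < \mathtt{aa}E_6\cdots\mathtt{b} < \cdots < \mathtt{aa}P_{k-2}\cdots\mathtt{a} < \mathtt{aa}E_{k-2}\cdots\mathtt{b} < \mathtt{aa}P_{k-1}\cdots\mathtt{a} < \mathtt{aa}S_{k-1}\cdots\mathtt{b} < \mathtt{aa}Q_k\cdots\mathtt{a}.$$
Similarly to Lemma 49, $\mathtt{aaab}$ can be obtained by concatenating the suffix $\mathtt{aa}$ of $E_{j-1}$ with the prefix $\mathtt{ab}$ of $P_j$, if $j \in [5..k-1]$, or by concatenating the suffix $\mathtt{aa}$ of $S_{k-1}$ with the prefix $\mathtt{ab}$ of $Q_k$. In addition, there are the conjugates obtained by concatenating the suffix $\mathtt{aa}$ of $P_j$ with the prefix $\mathtt{ab}$ of $E_j$, for all $j \in [2..k-2]$, or of $S_{k-1}$ if $j = k-1$. All the conjugates starting with $\mathtt{aaab}$ are sorted according to the lexicographic order of the words in $\bigcup_{j=2}^{4}\{\mathtt{aa}E_j\} \cup \{\mathtt{aa}P_5, \mathtt{aa}E_5\} \cup \bigcup_{j=6}^{k-2}\{\mathtt{aa}P_j, \mathtt{aa}E_j\} \cup \{\mathtt{aa}P_{k-1}, \mathtt{aa}S_{k-1}\} \cup \{\mathtt{aa}Q_k\}$. Note that all the conjugates starting either with $\mathtt{aa}P_j$, for all $j \in [6..k-1]$, or with $\mathtt{aa}Q_k$, end with $\mathtt{a}$. On the other hand, the conjugates starting with $\mathtt{aa}P_5$, with $\mathtt{aa}E_j$ for some $j \in [2..k-2]$, or with $\mathtt{aa}S_{k-1}$, end with $\mathtt{b}$. □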
Lemma 51.
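$\beta(\mathtt{aab}, C_k\mathtt{c}) = \mathtt{baab}\mathtt{a}^{2k-8}$.
Proof. 
The conjugates starting with $\mathtt{aab}$ in $M(C_k\mathtt{c})$ are
$$\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{a}E_2\cdots\mathtt{a} < \mathtt{a}E_3\cdots\mathtt{a} < \mathtt{a}P_4\cdots\mathtt{b} < \mathtt{a}E_4\cdots\mathtt{a} < \mathtt{a}P_5\cdots\mathtt{a} < \mathtt{a}E_5\cdots\mathtt{a} < \cdots < \mathtt{a}P_{k-1}\cdots\mathtt{a} < \mathtt{a}S_{k-1}\cdots\mathtt{a} < \mathtt{a}Q_k\cdots\mathtt{a}.$$
Each of the conjugates starting with $\mathtt{aaab}$ from Lemma 50 induces a conjugate starting with $\mathtt{aab}$, obtained by removing its leading character $\mathtt{a}$. It follows that all of these conjugates end with $\mathtt{a}$. The other conjugates starting with $\mathtt{aab}$ are the one obtained by concatenating the suffix $\mathtt{a}$ of $E_3$ with the prefix $\mathtt{ab}$ of $P_4$, and the one obtained by concatenating the suffix $\mathtt{a}$ of $Q_k$ with the prefix $\mathtt{ab}$ of $P_2$. Both of these conjugates end with a $\mathtt{b}$. We conclude this proof by sorting lexicographically the conjugates in $\{\mathtt{a}P_2\} \cup \bigcup_{i=2}^{3}\{\mathtt{a}E_i\} \cup \bigcup_{i=4}^{k-2}\{\mathtt{a}P_i, \mathtt{a}E_i\} \cup \{\mathtt{a}P_{k-1}, \mathtt{a}S_{k-1}\} \cup \{\mathtt{a}Q_k\}$. □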
Lemma 52.
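$\beta(\mathtt{ab}, C_k\mathtt{c}) = \mathtt{b}^{k-3}\mathtt{aab}\mathtt{a}^{2k-6}\mathtt{b}$.
Proof. 
The conjugates in $M(C_k\mathtt{c})$ starting with $\mathtt{ab}$ are
$$\mathtt{ab}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{b} < \mathtt{ab}\mathtt{a}^{k-5}P_{k-2}\cdots\mathtt{b} < \cdots < \mathtt{ab}P_3\cdots\mathtt{b} < P_2\cdots\mathtt{a} < E_2\cdots\mathtt{a} < P_3\cdots\mathtt{b} < E_3\cdots\mathtt{a} < P_4\cdots\mathtt{a} < E_4\cdots\mathtt{a} < \cdots < P_{k-1}\cdots\mathtt{a} < S_{k-1}\cdots\mathtt{a} < Q_k\cdots\mathtt{a} < \mathtt{abc}\cdots\mathtt{b}.$$
For any two distinct integers $i, i'$ with $i > i' \ge 0$, we have $\mathtt{ab}\mathtt{a}^{i}\mathtt{b} < \mathtt{ab}\mathtt{a}^{i'}\mathtt{b}$. Thus, the first conjugate in lexicographic order starting with $\mathtt{ab}$ is the one followed by the longest run of $\mathtt{a}$s. The smallest of these conjugates can be found by concatenating the suffix $\mathtt{ab}\mathtt{a}^{k-4}$ of $E_{k-2}$ with $P_{k-1}$, followed by the suffix $\mathtt{ab}\mathtt{a}^{i-3}$ of $E_{i-1}$ concatenated with $P_i$, for all $i \in [3..k-2]$, taken in decreasing order. By construction of $E_i$, for all $i \in [2..k-2]$, these conjugates all end with a $\mathtt{b}$. The remaining conjugates starting with $\mathtt{ab}$, apart from the one with prefix $\mathtt{abc}$, are exactly those conjugates that have as prefix either $P_i$ or $E_i$, for some $i \in [2..k-2]$, or $P_{k-1}$, $S_{k-1}$, or $Q_k$. Note that all of these conjugates are obtained by removing the leading character $\mathtt{a}$ from the conjugates starting with $\mathtt{aab}$ from Lemma 51, with the exception of the one starting with $P_3$. It follows that the latter ends with a $\mathtt{b}$, while all the other conjugates end with $\mathtt{a}$. Finally, the conjugate starting with the prefix $\mathtt{abc}$ follows, which ends with $\mathtt{b}$. □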
Lemma 53.
$\beta(\mathtt{ba}, C_k\mathtt{c}) = \mathtt{a}^{k-6}\mathtt{bbba}\mathtt{b}^{k-4}\mathtt{a}\mathtt{b}^{k-3}\mathtt{ab}$.
Proof. 
In $M(C_k\mathtt{c})$, the conjugates starting with $\mathtt{ba}$ are
$$\mathtt{b}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{a} < \mathtt{b}\mathtt{a}^{k-5}P_{k-2}\cdots\mathtt{a} < \cdots < \mathtt{baaa}P_6\cdots\mathtt{a} < \mathtt{baa}E_2\cdots\mathtt{b} < \mathtt{baa}E_3\cdots\mathtt{b} < \mathtt{baa}E_4\cdots\mathtt{b} < \mathtt{baa}P_5\cdots\mathtt{a} < \mathtt{baa}E_5\cdots\mathtt{b} < \mathtt{baa}E_6\cdots\mathtt{b} < \cdots < \mathtt{baa}E_{k-2}\cdots\mathtt{b} < \mathtt{baa}S_{k-1}\cdots\mathtt{b} < \mathtt{ba}P_2\cdots\mathtt{b} < \mathtt{ba}P_4\cdots\mathtt{a} < \mathtt{bab}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{b} < \cdots < \mathtt{baba}P_4\cdots\mathtt{b} < \mathtt{bab}P_3\cdots\mathtt{b} < \mathtt{b}P_3\cdots\mathtt{a} < \mathtt{babc}\mathtt{a}^{k-3}Q_k\cdots\mathtt{b}.$$
We have as many circular occurrences of $\mathtt{ba}$ as the number of maximal runs of $\mathtt{b}$ in $C_k\mathtt{c}$ that are followed by $\mathtt{a}$. We have four cases.
Case 1:
  one run of $\mathtt{b}$s in $P_i$, for all $i \in [2..k-1]$,
Case 2:
  two runs in $E_i$ for all $i \in [2..k-2]$,
Case 3:
  one run of $\mathtt{b}$s in $Q_k$,
Case 4:
  one run of $\mathtt{b}$s followed by $\mathtt{a}$ in $S_{k-1}$.
For Case 1, we have one conjugate starting with $\mathtt{baa}E_i$, for each $i \in [2..k-2]$, or with $\mathtt{baa}S_{k-1}$. Since each run of $\mathtt{b}$s within each word from $\bigcup_{i=2}^{k-1}\{P_i\}$ is of length at least 2, all conjugates in Case 1 end with a $\mathtt{b}$.
For Case 2 and all $i \in [2..k-2]$, we can distinguish between two subcases based on where $\mathtt{ba}$ starts:
Case 2 (a):
  the first occurrence of $\mathtt{ba}$ in $E_i$, which has the form $\mathtt{bab}\mathtt{a}^{i-2}$ for all $i \in [2..k-2]$,
Case 2 (b):
  the second occurrence of $\mathtt{ba}$ in $E_i$, which has the form $\mathtt{b}\mathtt{a}^{i-2}$ for all $i \in [2..k-2]$.
  • For Case 2 (a), these conjugates are of the form $\mathtt{bab}\mathtt{a}^{i-2}P_{i+1}$, for $i \in [2..k-2]$. Analogously to Case 1, each conjugate of Case 2 (a) ends with a $\mathtt{b}$. Each conjugate in Case 2 (b) is obtained by shifting the corresponding conjugate in Case 2 (a) by two characters to the right. Therefore, all of these conjugates end with an $\mathtt{a}$ and have prefixes of the form $\mathtt{b}\mathtt{a}^{i-2}P_{i+1}$, for all $i \in [2..k-2]$.
  • For Case 3, the conjugate starting with $\mathtt{ba}$ in $Q_k$ has $\mathtt{ba}P_2$ as prefix, and it is preceded by a $\mathtt{b}$.
  • Finally, in Case 4, there is one conjugate, having the prefix $\mathtt{babc}\mathtt{a}^{k-3}Q_k$ and ending with $\mathtt{b}$.
  • Only for Case 2 (b) we have conjugates starting with $\mathtt{baaaa}$. Hence, the first conjugate in lexicographic order is the one starting with $\mathtt{b}\mathtt{a}^{k-4}P_{k-1}$, followed by $\mathtt{b}\mathtt{a}^{k-5}P_{k-2} < \cdots < \mathtt{baaa}P_6$.
Among the remaining conjugates, those having the prefix $\mathtt{baaa}$ start either with $\mathtt{baa}P_5$ from Case 2 (b), or, from Case 1, with $\mathtt{baa}E_i$ for some $i \in [2..k-2]$ or with $\mathtt{baa}S_{k-1}$. We can sort these conjugates by following the order of $\bigcup_{i=2}^{4}\{\mathtt{baa}E_i\} \cup \{\mathtt{baa}P_5\} \cup \bigcup_{i=5}^{k-2}\{\mathtt{baa}E_i\} \cup \{\mathtt{baa}S_{k-1}\}$. Then, the remaining conjugates with prefix $\mathtt{baa}$ are those starting with $\mathtt{ba}P_2$ from Case 3 and $\mathtt{ba}P_4$ from Case 2 (b). Finally, we focus on the conjugates from Case 2 (a). These conjugates are sorted according to the length of the run of $\mathtt{a}$s following the common prefix $\mathtt{bab}$. The last two conjugates left are the one starting with $\mathtt{b}P_3$ from Case 2 (b), and the one from Case 4, which starts with $\mathtt{babc}\mathtt{a}^{k-3}Q_k$. These two conjugates are already sorted. Since they are greater than all other conjugates considered, they are the greatest conjugates of $M(C_k\mathtt{c})$ starting with $\mathtt{ba}$. □
Lemma 54.
$\beta(\mathtt{b}^{i}\mathtt{a}, C_k\mathtt{c}) = \mathtt{a}\mathtt{b}^{2k-2i-2}\mathtt{ab}$ for all $i \in [2..k-2]$.
Proof. 
With integer $i \in [2..k-2]$, the conjugates in $M(C_k\mathtt{c})$ with prefix $\mathtt{b}^{i}\mathtt{a}$ are
$$\mathtt{b}^{i}\mathtt{aa}E_i\cdots\mathtt{a} < \mathtt{b}^{i}\mathtt{aa}E_{i+1}\cdots\mathtt{b} < \cdots < \mathtt{b}^{i}\mathtt{aa}E_{k-2}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{aa}S_{k-1}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{k-5}P_{k-2}\cdots\mathtt{b} < \cdots < \mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-1}P_{i+2}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2}P_{i+1}\cdots\mathtt{a} < \mathtt{b}^{i}\mathtt{abc}\mathtt{a}^{k-3}Q_k\cdots\mathtt{b}.$$
With integer $i \in [2..k-2]$, these conjugates are obtained in the following cases.
Case 1:
  concatenating $\mathtt{b}^{i}\mathtt{aa}$ of $P_j$ with $E_j$, for all $j \in [i..k-2]$, or with $S_{k-1}$ if $j = k-1$,
Case 2:
  concatenating $\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{j-2}$ of $E_j$ with $P_{j+1}$ for all $j \in [i..k-2]$,
Case 3:
  concatenating $\mathtt{b}^{i}\mathtt{a}$ of $Q_k$ with $P_2$,
Case 4:
  concatenating $\mathtt{b}^{i}\mathtt{abc}\mathtt{a}^{k-3}$ of $S_{k-1}$ with $Q_k$.
We consider the four cases separately. For all $j \in [i..k-1]$, the conjugate starting within $P_j$ (Case 1) has as prefix $\mathtt{b}^{i}\mathtt{aa}E_j$ if $j \in [i..k-2]$, or $\mathtt{b}^{i}\mathtt{aa}S_{k-1}$ if $j = k-1$. Also, when $j \in [i..k-2]$, the conjugate starting within $E_j$ (Case 2) has the prefix $\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{j-2}P_{j+1}$. In addition, the conjugate starting within $Q_k$ (Case 3) has as prefix $\mathtt{b}^{i}\mathtt{a}P_2$. Finally, the conjugate that begins within $S_{k-1}$ (Case 4) has the prefix $\mathtt{b}^{i}\mathtt{abc}\mathtt{a}^{k-3}$. By construction, all the conjugates from Case 1 come first, sorted according to the lexicographic order of the words in $\bigcup_{j=i}^{k-2}\{\mathtt{b}^{i}\mathtt{aa}E_j\} \cup \{\mathtt{b}^{i}\mathtt{aa}S_{k-1}\}$; then, we have the conjugate from Case 3. Next, we have the conjugates from Case 2, sorted according to the decreasing length of the run of $\mathtt{a}$s following the common prefix $\mathtt{b}^{i}\mathtt{ab}$. Finally, the conjugate of Case 4 follows. Moreover, a conjugate ends with an $\mathtt{a}$ only when its run of $\mathtt{b}$s is exactly of length $i$. Thus, the only conjugates ending with an $\mathtt{a}$ are those starting within $P_i$ and $E_i$, i.e., those with prefixes $\mathtt{b}^{i}\mathtt{aa}E_i$ and $\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2}P_{i+1}$. □
Lemma 55.
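$\beta(\mathtt{b}^{k-1}\mathtt{a}, C_k\mathtt{c}) = \mathtt{aba}$.
Proof. 
In $M(C_k\mathtt{c})$, the conjugates with prefix $\mathtt{b}^{k-1}\mathtt{a}$ are
$$\mathtt{b}^{k-1}\mathtt{aa}S_{k-1}\cdots\mathtt{a} < \mathtt{b}^{k-1}\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{b}^{k-1}\mathtt{abc}\mathtt{a}^{k-3}Q_k\cdots\mathtt{a}.$$
Observe that the only conjugates with prefix $\mathtt{b}^{k-1}\mathtt{a}$ start within $P_{k-1}$, $Q_k$, and $S_{k-1}$. These conjugates have the prefixes $\mathtt{b}^{k-1}\mathtt{aa}S_{k-1}$, $\mathtt{b}^{k-1}\mathtt{a}P_2$, and $\mathtt{b}^{k-1}\mathtt{abc}\mathtt{a}^{k-3}Q_k$, respectively. One can see that these conjugates taken in this order are already sorted, and only the conjugate starting within $Q_k$ ends with $\mathtt{b}$, while the other two end with $\mathtt{a}$. □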
Lemma 56.
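$\beta(\mathtt{b}^{k}\mathtt{a}, C_k\mathtt{c}) = \mathtt{a}$.
Proof. 
In $M(C_k\mathtt{c})$, the conjugate with prefix $\mathtt{b}^{k}\mathtt{a}$ is $\mathtt{b}^{k}\mathtt{a}P_2\cdots\mathtt{a}$. The only occurrence of $\mathtt{b}^{k}\mathtt{a}$ is within $Q_k$. Since the run of $\mathtt{b}$s is maximal, the conjugate ends with $\mathtt{a}$. □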
Lemma 57.
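$\beta(\mathtt{bc}, C_k\mathtt{c}) = \mathtt{a}$.
Proof. 
In $M(C_k\mathtt{c})$, the conjugate starting with $\mathtt{bc}$ is $\mathtt{bc}\mathtt{a}^{k-3}Q_k\cdots\mathtt{a}$. The only occurrence of $\mathtt{bc}$ is in $S_{k-1}$, preceded by an $\mathtt{a}$. □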
Lemma 58.
$\beta(\mathtt{c}, C_k\mathtt{c}) = \mathtt{b}$.
Proof. 
In $M(C_k\mathtt{c})$, the last conjugate is $\mathtt{c}\mathtt{a}^{k-3}Q_k\cdots\mathtt{b}$ since $\mathtt{c}$ is the largest character in $C_k\mathtt{c}$. The only occurrence of $\mathtt{c}$ is the last character of $C_k\mathtt{c}$. Hence, the last conjugate in lexicographic order starts with $\mathtt{c}\mathtt{a}^{k-3}Q_k$. Since $\mathtt{c}$ is preceded by $\mathtt{b}$, this conjugate contributes a $\mathtt{b}$ to the BWT. □
The following theorem puts the above lemmas together.
Theorem 12.
$\rho(C_k\mathtt{c}) = 8k - 12$, cf. Table 15. It holds that $\mathrm{BBWT}(C_k\mathtt{c}) = \mathrm{BWT}(C_k\mathtt{c}) = \prod_{i=2}^{k-1} \beta(\mathtt{a}^{k-i}\mathtt{b}) \cdot \prod_{i=1}^{k} \beta(\mathtt{b}^{i}\mathtt{a}) \cdot \beta(\mathtt{bc}) \cdot \beta(\mathtt{c})$.
Proof. 
Every conjugate counted in $\beta(\mathtt{a}^{i}\mathtt{b})$ is smaller than any conjugate counted in $\beta(\mathtt{a}^{i'}\mathtt{b})$, for all $1 \le i' < i \le k-2$. Symmetrically, every conjugate counted in $\beta(\mathtt{b}^{j}\mathtt{a})$ is greater than any conjugate counted in $\beta(\mathtt{b}^{j'}\mathtt{a})$, for every $1 \le j' < j \le k$. Since we considered all the disjoint ranges of conjugates of $C_k\mathtt{c}$ based on their common prefix, $\prod_{i=2}^{k-1} \beta(\mathtt{a}^{k-i}\mathtt{b}) \cdot \prod_{i=1}^{k} \beta(\mathtt{b}^{i}\mathtt{a}) \cdot \beta(\mathtt{bc}) \cdot \beta(\mathtt{c})$ is the BBWT and BWT of $C_k\mathtt{c}$.
With the structure of $\mathrm{BWT}(C_k\mathtt{c})$, we can easily derive its number of runs. The word $\prod_{i=2}^{k-4} \beta(\mathtt{a}^{k-i}\mathtt{b})$ has exactly $2k-11$ runs: we start with 1 run from $\beta(\mathtt{a}^{k-2}\mathtt{b}) = \mathtt{c}$, and then concatenating each word from $\beta(\mathtt{a}^{k-3}\mathtt{b})$ to $\beta(\mathtt{aaaab})$ adds 2 runs each. By counting, we observe that $\beta(\mathtt{aaab})$, $\beta(\mathtt{aab})$, $\beta(\mathtt{ab})$ have $2k-10$, 4, and 5 runs, respectively. The boundaries between these words do not merge. The word $\beta(\mathtt{ba})$ has exactly 8 runs. The remaining part $\prod_{i=2}^{k} \beta(\mathtt{b}^{i}\mathtt{a})$ of the BWT has $4(k-3)+3$ runs: concatenating each word from $\beta(\mathtt{bba})$ to $\beta(\mathtt{b}^{k-2}\mathtt{a})$ adds 4 runs each, and $\beta(\mathtt{b}^{k-1}\mathtt{a})$ adds 3 runs. On the other hand, the words $\beta(\mathtt{b}^{k}\mathtt{a})$ and $\beta(\mathtt{bc})$ do not add new runs, as they consist only of an $\mathtt{a}$ that merges with the previous one. Finally, $\beta(\mathtt{c})$ adds one run. Altogether, we have $2k-11 + 2k-10 + 8 + 4 + 5 + 4k-12 + 3 + 1 = 8k - 12$, and the claim holds. □
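The final sum can be regrouped as follows; the labels under the terms are our own annotation of which blocks contribute which summand, and the last line compares the result with Lemma 37.

```latex
\begin{align*}
\rho(C_k\mathtt{c})
  &= \underbrace{(2k-11)}_{\beta(\mathtt{a}^{k-2}\mathtt{b}),\,\ldots,\,\beta(\mathtt{a}^{4}\mathtt{b})}
   + \underbrace{(2k-10)}_{\beta(\mathtt{aaab})}
   + \underbrace{4}_{\beta(\mathtt{aab})}
   + \underbrace{5}_{\beta(\mathtt{ab})}
   + \underbrace{8}_{\beta(\mathtt{ba})}
   + \underbrace{4(k-3)}_{\beta(\mathtt{b}^{2}\mathtt{a}),\,\ldots,\,\beta(\mathtt{b}^{k-2}\mathtt{a})}
   + \underbrace{3}_{\beta(\mathtt{b}^{k-1}\mathtt{a})}
   + \underbrace{0 + 0}_{\beta(\mathtt{b}^{k}\mathtt{a}),\,\beta(\mathtt{bc})}
   + \underbrace{1}_{\beta(\mathtt{c})} \\
  &= (4k - 21) + 17 + (4k - 12) + 4 \\
  &= 8k - 12, \\
\rho(C_k\mathtt{c}) - \rho(C_k) &= (8k - 12) - (6k - 12) = 2k .
\end{align*}
```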
With Lemma 37 and Theorem 12, we obtain that the additive sensitivity of $\rho$ for $C_k$ is $\Theta(\sqrt{n})$ when appending a character, since the number of runs grows by $(8k-12) - (6k-12) = 2k$ while $n = \Theta(k^2)$.

7.2.2. Substituting the Last Position of $C_k$ with $\mathtt{c}$

Here, we focus on the word $C'_k\mathtt{c}$ that we obtain by substituting the last character of $C_k$ with $\mathtt{c}$, which is lexicographically larger than any character in $C_k$. See Figure 15 for a visualization. As in Section 7.2.1, $E_{k-1}$ changes, this time to $\mathtt{a}\mathtt{b}^{k-1}\mathtt{ac}\mathtt{a}^{k-3}$, and we refer to it as $R_{k-1}$ below. According to its definition, $C'_k\mathtt{c} = \mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a} \cdot \prod_{i=2}^{k-2} \mathtt{a}\mathtt{b}^{i}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{i}\mathtt{ab}\mathtt{a}^{i-2} \cdot \mathtt{a}\mathtt{b}^{k-1}\mathtt{aa}\,\mathtt{a}\mathtt{b}^{k-1}\mathtt{ac}$. Recall from the proof of Theorem 10 that $C'_k$ is not a Lyndon word. The Lyndon factors of $C'_k$ are $D_k$ and $\mathtt{a}$. There, we proved that the number of runs of $C'_k$ is $8k-17$. We start with the first observation that $C'_k\mathtt{c}$ is a Lyndon word.
Lemma 59.
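$C'_k\mathtt{c}$ is a Lyndon word.
Proof. 
The longest run of $\mathtt{a}$ in $C'_k\mathtt{c}$ has length $k-2$ and occurs only as part of its prefix $\mathtt{a}^{k-2}\mathtt{b}$. Hence, $C'_k\mathtt{c}$ is strictly smaller than all of its other conjugates. Thus, $C'_k\mathtt{c}$ is a Lyndon word. □
Thus, we determine $\rho(C'_k\mathtt{c})$ using $M(C'_k\mathtt{c})$ as we did above.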
Lemma 60.
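$\beta(\mathtt{a}^{k-2}\mathtt{b}, C'_k\mathtt{c}) = \mathtt{c}$.
Proof. 
The first conjugate in $M(C'_k\mathtt{c})$ is $\mathtt{a}^{k-2}\mathtt{b}^{k}\mathtt{a}P_2\cdots\mathtt{c}$. The first conjugate in lexicographic order must start with the longest run of $\mathtt{a}$s. By the definition of $C'_k\mathtt{c}$, the longest run of $\mathtt{a}$ has length $k-2$, and it is obtained by concatenating the suffix $\mathtt{a}^{k-3}$ of $R_{k-1}$ with the prefix $\mathtt{a}$ of $Q_k$, which is preceded by a $\mathtt{c}$. □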
Lemma 61.
$\beta(\mathtt{a}^{i}\mathtt{b}, C'_k\mathtt{c}) = \mathtt{b}\mathtt{a}^{k-2-i}$ for all $i \in [4..k-3]$.
Proof. 
All conjugates in $M(C'_k\mathtt{c})$ starting with the prefix $\mathtt{a}^{i}\mathtt{b}$ for any $i \in [4..k-3]$ are given below.
$$\mathtt{a}^{i-1}P_{i+2}\cdots\mathtt{b} < \mathtt{a}^{i-1}P_{i+3}\cdots\mathtt{a} < \cdots < \mathtt{a}^{i-1}P_{k-1}\cdots\mathtt{a} < \mathtt{a}^{i-1}Q_k\cdots\mathtt{a}.$$
For all $i \in [4..k-3]$, the factor $\mathtt{a}^{i}\mathtt{b}$ can only be obtained, for $j \in [i+2..k-1]$, by concatenating the suffix $\mathtt{a}^{i-1}$ of $E_{j-1}$ with the prefix $\mathtt{ab}$ of $P_j$, or by concatenating the suffix $\mathtt{a}^{i-1}$ of $R_{k-1}$ with the prefix $\mathtt{ab}$ of $Q_k$. We can sort these conjugates according to the lexicographic order of $\bigcup_{j=i}^{k-3}\{\mathtt{a}^{i-1}P_{j+2}\} \cup \{\mathtt{a}^{i-1}Q_k\}$. Note that all these conjugates end with an $\mathtt{a}$, with the exception of the conjugate starting with $\mathtt{a}^{i-1}P_{i+2}$, since this is the only place where $\mathtt{b}\mathtt{a}^{i}\mathtt{b}$ occurs. □
Lemma 62.
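$\beta(\mathtt{aaab}, C'_k\mathtt{c}) = \mathtt{bbbbb}(\mathtt{ab})^{k-6}\mathtt{a}$.
Proof. 
The conjugates in $M(C'_k\mathtt{c})$ starting with the prefix $\mathtt{aaab}$ are
$$\mathtt{aa}E_2\cdots\mathtt{b} < \mathtt{aa}E_3\cdots\mathtt{b} < \mathtt{aa}E_4\cdots\mathtt{b} < \mathtt{aa}P_5\cdots\mathtt{b} < \mathtt{aa}E_5\cdots\mathtt{b} < \mathtt{aa}P_6\cdots\mathtt{a} < \mathtt{aa}E_6\cdots\mathtt{b} < \cdots < \mathtt{aa}P_{k-2}\cdots\mathtt{a} < \mathtt{aa}E_{k-2}\cdots\mathtt{b} < \mathtt{aa}P_{k-1}\cdots\mathtt{a} < \mathtt{aa}R_{k-1}\cdots\mathtt{b} < \mathtt{aa}Q_k\cdots\mathtt{a}.$$
These conjugates are obtained from the following cases.
Case 1:
  concatenating the suffix $\mathtt{aa}$ of $P_i$ with the prefix $\mathtt{ab}$ of $E_i$, for all $i \in [2..k-2]$, or of $R_{k-1}$ if $i = k-1$,
Case 2:
  concatenating the suffix $\mathtt{aa}$ of $E_{i-1}$ with the prefix $\mathtt{ab}$ of $P_i$ for all $i \in [5..k-1]$,
Case 3:
  concatenating the suffix $\mathtt{aa}$ of $R_{k-1}$ with the prefix $\mathtt{ab}$ of $Q_k$.
All these conjugates starting with $\mathtt{aaab}$ are sorted according to the lexicographic order of the words in $\bigcup_{i=2}^{4}\{\mathtt{aa}E_i\} \cup \{\mathtt{aa}P_5, \mathtt{aa}E_5\} \cup \bigcup_{i=6}^{k-2}\{\mathtt{aa}P_i, \mathtt{aa}E_i\} \cup \{\mathtt{aa}P_{k-1}, \mathtt{aa}R_{k-1}\} \cup \{\mathtt{aa}Q_k\}$. Note that the conjugates starting with $\mathtt{aa}P_i$, for all $i \in [6..k-1]$, of Case 2, as well as the conjugate of Case 3, end with $\mathtt{a}$. On the other hand, the conjugate starting with $\mathtt{aa}P_5$ of Case 2 and the conjugates of Case 1 end with a $\mathtt{b}$. □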
Lemma 63.
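$\beta(\mathtt{aab}, C'_k\mathtt{c}) = \mathtt{baab}\mathtt{a}^{2k-8}$.
Proof. 
The conjugates in $M(C'_k\mathtt{c})$ that start with the prefix $\mathtt{aab}$ are
$$\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{a}E_2\cdots\mathtt{a} < \mathtt{a}E_3\cdots\mathtt{a} < \mathtt{a}P_4\cdots\mathtt{b} < \mathtt{a}E_4\cdots\mathtt{a} < \mathtt{a}P_5\cdots\mathtt{a} < \mathtt{a}E_5\cdots\mathtt{a} < \cdots < \mathtt{a}P_{k-2}\cdots\mathtt{a} < \mathtt{a}E_{k-2}\cdots\mathtt{a} < \mathtt{a}P_{k-1}\cdots\mathtt{a} < \mathtt{a}R_{k-1}\cdots\mathtt{a} < \mathtt{a}Q_k\cdots\mathtt{a}.$$
Each of the conjugates starting with $\mathtt{aaab}$ from Lemma 62 induces a conjugate starting with $\mathtt{aab}$, obtained by removing its leading character $\mathtt{a}$. It follows that all of these conjugates end with $\mathtt{a}$. The other conjugates starting with $\mathtt{aab}$ are the one obtained by concatenating the suffix $\mathtt{a}$ of $Q_k$ with the prefix $\mathtt{ab}$ of $P_2$, and the one obtained by concatenating the suffix $\mathtt{a}$ of $E_3$ with the prefix $\mathtt{ab}$ of $P_4$. Both of these conjugates end with a $\mathtt{b}$. We prove our claim by sorting the conjugates according to the lexicographic order of the words in $\{\mathtt{a}P_2, \mathtt{a}E_2, \mathtt{a}E_3\} \cup \bigcup_{i=4}^{k-2}\{\mathtt{a}P_i, \mathtt{a}E_i\} \cup \{\mathtt{a}P_{k-1}, \mathtt{a}R_{k-1}\} \cup \{\mathtt{a}Q_k\}$. □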
Lemma 64.
$\beta(\mathtt{ab}, C'_k\mathtt{c}) = \mathtt{b}^{k-3}\mathtt{aab}\mathtt{a}^{2k-6}$.
Proof. 
In $M(C'_k\mathtt{c})$, the conjugates which start with the prefix $\mathtt{ab}$ are
$$\mathtt{ab}\mathtt{a}^{k-4}P_{k-1}\cdots\mathtt{b} < \mathtt{ab}\mathtt{a}^{k-5}P_{k-2}\cdots\mathtt{b} < \cdots < \mathtt{ab}P_3\cdots\mathtt{b} < P_2\cdots\mathtt{a} < E_2\cdots\mathtt{a} < P_3\cdots\mathtt{b} < E_3\cdots\mathtt{a} < P_4\cdots\mathtt{a} < E_4\cdots\mathtt{a} < \cdots < P_{k-2}\cdots\mathtt{a} < E_{k-2}\cdots\mathtt{a} < P_{k-1}\cdots\mathtt{a} < R_{k-1}\cdots\mathtt{a} < Q_k\cdots\mathtt{a}.$$
For any two distinct integers $i, i'$ with $i > i' \ge 0$, we have $\mathtt{ab}\mathtt{a}^{i}\mathtt{b} < \mathtt{ab}\mathtt{a}^{i'}\mathtt{b}$. Thus, the first conjugate in lexicographic order starting with $\mathtt{ab}$ is the one which is followed by the longest run of $\mathtt{a}$s. The smallest of these conjugates can be found by concatenating the suffix $\mathtt{ab}\mathtt{a}^{k-4}$ of $E_{k-2}$ with the prefix $\mathtt{ab}$ of $P_{k-1}$, followed by the suffix $\mathtt{ab}\mathtt{a}^{i-3}$ of $E_{i-1}$ concatenated with the prefix $\mathtt{ab}$ of $P_i$, for all $i \in [3..k-2]$, all taken in decreasing order. By construction of $E_i$, for all $i \in [2..k-2]$, these conjugates must end with a $\mathtt{b}$. The remaining conjugates starting with $\mathtt{ab}$ are exactly those conjugates having as prefix either $P_i$ for some $i \in [2..k-1]$, $E_i$ for some $i \in [2..k-2]$, $R_{k-1}$, or $Q_k$. Note that all of these conjugates are obtained by removing the leading character $\mathtt{a}$ from the conjugates starting with $\mathtt{aab}$ from Lemma 63, with the exception of the one starting with $P_3$. It follows that the latter ends with a $\mathtt{b}$, while all the other conjugates end with an $\mathtt{a}$. □
Lemma 65.
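$\beta(\mathtt{ac}, C'_k\mathtt{c}) = \mathtt{b}$.
Proof. 
In $M(C'_k\mathtt{c})$, the conjugate that starts with the prefix $\mathtt{ac}$ is $\mathtt{ac}\mathtt{a}^{k-3}Q_k\cdots\mathtt{b}$. Since $\mathtt{c}$ is lexicographically larger than $\mathtt{a}$ and $\mathtt{b}$, every conjugate with prefix $\mathtt{ac}$ is larger than every conjugate with prefix $\mathtt{ab}$. The only occurrence of $\mathtt{ac}$ is within $R_{k-1}$, where it is preceded by a $\mathtt{b}$. □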
Lemma 66.
β ( ba , C k c ) = a k 6 bbbab k 4 ab k 3 ab .
Proof. 
In M ( C k c ) , the conjugates starting with the prefix ba are
ba k 4 P k 1 a < ba k 5 P k 2 a < < baaa P 6 a < baa E 2 b < baa E 3 b < baa E 4 b < baa P 5 a < baa E 5 b < baa E 6 b < < baa R k 1 b < ba P 2 b < ba P 4 a < baba k 4 P k 1 b < baba k 5 P k 2 b < < bab P 3 b < b P 3 a < baca k 3 Q k b .
One can notice that we have as many circular occurrences of ba as the number of maximal runs of b s in M ( C k c ) . The conjugates are obtained from the cases below.
Case 1:
  one run of $\mathtt{b}$s in $P_i$, for all $i \in [2..k-1]$,
Case 2:
  two runs in $E_i$, for all $i \in [2..k-2]$,
Case 3:
  one run in $Q_k$,
Case 4:
  one run in $R_{k-1}$.
For Case 1, we have one conjugate starting with $\mathtt{baa}E_i$ for each $i \in [2..k-1]$. Since each run of $\mathtt{b}$s within each word from $\bigcup_{i=2}^{k-1}\{P_i\}$ has a length of at least 2, all conjugates in Case 1 end with a $\mathtt{b}$.
For Case 2, with integer $i \in [2..k-2]$, we can distinguish between two subcases based on where $\mathtt{ba}$ starts:
Case 2 (a):
  the first run of $\mathtt{b}$s in $E_i$, whose conjugate has the prefix $\mathtt{baba}^{i-2}P_{i+1}$, for all $i \in [2..k-2]$,
Case 2 (b):
  the second run of $\mathtt{b}$s in $E_i$, whose conjugate has the prefix $\mathtt{ba}^{i-2}P_{i+1}$, for all $i \in [2..k-2]$.
  • Similarly to Case 1, all the conjugates in Case 2 (a) end with a $\mathtt{b}$.
  • Each conjugate in Case 2 (b) is obtained by shifting the corresponding conjugate in Case 2 (a) two characters to the right. Therefore, all of these conjugates end with an $\mathtt{a}$.
  • For Case 3, the conjugate starting with $\mathtt{ba}$ in $Q_k$ has $\mathtt{ba}P_2$ as a prefix, and it is preceded by a $\mathtt{b}$.
  • For Case 4, the conjugate starting with $\mathtt{ba}$ in $R_{k-1}$ has $\mathtt{baca}^{k-3}$ as a prefix, and it is preceded by a $\mathtt{b}$.
  • Observe that only in Case 2 (b) do we have conjugates starting with $\mathtt{baaaa}$. Hence, the first conjugate in lexicographic order is the one starting with $\mathtt{ba}^{k-4}P_{k-1}$, followed by $\mathtt{ba}^{k-5}P_{k-2} < \cdots < \mathtt{baaa}P_6$.
  • Among the remaining conjugates, those having the prefix $\mathtt{baaa}$ either start with $\mathtt{baa}P_5$ from Case 2 (b) or with $\mathtt{baa}E_i$ from Case 1, for all $i \in [2..k-1]$. Thus, we can sort them according to the order of the words in $\bigcup_{i=2}^{4}\{\mathtt{baa}E_i\} \cup \{\mathtt{baa}P_5\} \cup \bigcup_{i=5}^{k-2}\{\mathtt{baa}E_i\}$. Then, the remaining conjugates with prefix $\mathtt{baa}$ are those starting with $\mathtt{ba}P_2$ from Case 3 and $\mathtt{ba}P_4$ from Case 2 (b).
Finally, we focus on the conjugates from Case 2 (a). These conjugates are sorted according to the length of the run of $\mathtt{a}$s following the common prefix $\mathtt{bab}$. The last conjugates left are the one starting with $\mathtt{b}P_3$ from Case 2 (b) and the one starting with $\mathtt{baca}^{k-3}$ from Case 4. These two conjugates are in lexicographic order and are greater than the conjugates of all other cases. Therefore, we have analyzed all conjugates. □
Lemma 67.
$\beta(\mathtt{b}^{i}\mathtt{a}, C_k^{c}) = \mathtt{ab}^{2k-2i-2}\mathtt{ab}$ for all $i \in [2..k-2]$.
Proof.
In $M(C_k^{c})$, the conjugates starting with $\mathtt{b}^{i}\mathtt{a}$ for all $i \in [2..k-2]$ are
$$\mathtt{b}^{i}\mathtt{aa}E_i\cdots\mathtt{a} < \mathtt{b}^{i}\mathtt{aa}E_{i+1}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{aa}E_{i+2}\cdots\mathtt{b} < \cdots < \mathtt{b}^{i}\mathtt{aa}R_{k-1}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{aba}^{k-4}P_{k-1}\cdots\mathtt{b} < \cdots < \mathtt{b}^{i}\mathtt{aba}^{i-1}P_{i+2}\cdots\mathtt{b} < \mathtt{b}^{i}\mathtt{aba}^{i-2}P_{i+1}\cdots\mathtt{a} < \mathtt{b}^{i}\mathtt{aca}^{k-3}Q_k\cdots\mathtt{b}.$$
For $i \in [2..k-2]$, all runs of $\mathtt{b}$s of length at least $i$ are obtained from the cases below.
Case 1:
  suffix $\mathtt{b}^{i}\mathtt{aa}$ in $P_j$, for all $j \in [i..k-1]$,
Case 2:
  $\mathtt{b}^{i}\mathtt{aba}^{j-2}$ in $E_j$, for all $j \in [i..k-2]$,
Case 3:
  $\mathtt{b}^{i}\mathtt{a}$ in $Q_k$,
Case 4:
  $\mathtt{b}^{i}\mathtt{aca}^{k-3}$ in $R_{k-1}$.
  • Consider the four cases separately. The conjugate starting within $P_j$ (Case 1) has as prefix $\mathtt{b}^{i}\mathtt{aa}E_j$ if $j \in [i..k-2]$, or $\mathtt{b}^{i}\mathtt{aa}R_{k-1}$ if $j = k-1$.
  • For all $j \in [i..k-2]$, the conjugate starting within $E_j$ (Case 2) has as prefix $\mathtt{b}^{i}\mathtt{aba}^{j-2}P_{j+1}$.
  • In addition, the conjugate starting within $Q_k$ (Case 3) has as prefix $\mathtt{b}^{i}\mathtt{a}P_2$.
  • Finally, the conjugate that begins within $R_{k-1}$ (Case 4) has as prefix $\mathtt{b}^{i}\mathtt{aca}^{k-3}$.
By construction, all the conjugates from Case 1 come first, sorted according to the lexicographic order of the words in $\bigcup_{j=i}^{k-2}\{\mathtt{b}^{i}\mathtt{aa}E_j\} \cup \{\mathtt{b}^{i}\mathtt{aa}R_{k-1}\}$; then, we have the conjugate from Case 3. Then, the conjugates of Case 2 are sorted according to the decreasing length of the run of $\mathtt{a}$s following the common prefix $\mathtt{b}^{i}\mathtt{ab}$. Finally, the conjugate of Case 4 follows. Moreover, note that a conjugate ends with an $\mathtt{a}$ only when its run of $\mathtt{b}$s has a length of exactly $i$. Thus, the only conjugates ending with an $\mathtt{a}$ are those starting within $P_i$ and $E_i$, i.e., those with prefix $\mathtt{b}^{i}\mathtt{aa}E_i$ and $\mathtt{b}^{i}\mathtt{aba}^{i-2}P_{i+1}$. □
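Lemma 68.
$\beta(\mathtt{b}^{k-1}\mathtt{a}, C_k^{c}) = \mathtt{aba}$.
Proof.
In $M(C_k^{c})$, there are exactly three conjugates that start with the prefix $\mathtt{b}^{k-1}\mathtt{a}$. These are
$$\mathtt{b}^{k-1}\mathtt{aaab}^{k-1}\mathtt{aca}^{k-3}Q_k\cdots\mathtt{a} < \mathtt{b}^{k-1}\mathtt{a}P_2\cdots\mathtt{b} < \mathtt{b}^{k-1}\mathtt{aca}^{k-3}Q_k\cdots\mathtt{a}.$$
Observe that the only conjugates with prefix $\mathtt{b}^{k-1}\mathtt{a}$ start within $P_{k-1}$, $Q_k$, and $R_{k-1}$. These conjugates have the prefixes $\mathtt{b}^{k-1}\mathtt{aa}R_{k-1}$, $\mathtt{b}^{k-1}\mathtt{a}P_2$, and $\mathtt{b}^{k-1}\mathtt{aca}^{k-3}Q_k$, respectively. One can see that these conjugates, taken in this order, are already sorted, and that only the conjugate starting within $Q_k$ ends with a $\mathtt{b}$, while the other two end with an $\mathtt{a}$. □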
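Lemma 69.
$\beta(\mathtt{b}^{k}\mathtt{a}, C_k^{c}) = \mathtt{a}$.
Proof.
In $M(C_k^{c})$, only one conjugate starts with the prefix $\mathtt{b}^{k}\mathtt{a}$, namely $\mathtt{b}^{k}\mathtt{a}P_2\cdots\mathtt{a}$. The only occurrence of $\mathtt{b}^{k}\mathtt{a}$ is within $Q_k$, where it is preceded by an $\mathtt{a}$. □
Lemma 70.
$\beta(\mathtt{c}, C_k^{c}) = \mathtt{a}$.
Proof.
The only conjugate in $M(C_k^{c})$ that starts with the prefix $\mathtt{c}$ is $\mathtt{ca}^{k-3}Q_k\cdots\mathtt{a}$, which is also the last conjugate in lexicographic order. The only occurrence of $\mathtt{c}$ is in $R_{k-1}$, where it is preceded by an $\mathtt{a}$; hence, this conjugate ends with an $\mathtt{a}$. □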
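Lemmas 64 to 70 can be spot-checked mechanically for a small $k$. The following Python sketch is not part of the original proofs. It assumes the block definitions $P_i = \mathtt{ab}^{i}\mathtt{aa}$, $E_i = \mathtt{ab}^{i}\mathtt{aba}^{i-2}$, $R_{k-1} = \mathtt{ab}^{k-1}\mathtt{aca}^{k-3}$, and $Q_k = \mathtt{ab}^{k}\mathtt{a}$, as they can be read off from the conjugates listed in the proofs above, and it sorts all conjugates naively. The names `word`, `sorted_conjugates`, and `beta` are hypothetical helpers introduced only for this illustration.

```python
def word(k: int) -> str:
    """Assumed word P_2 E_2 ... P_{k-2} E_{k-2} P_{k-1} R_{k-1} Q_k.

    Any rotation of it has the same set of conjugates, so the choice of the
    starting point does not matter for beta().
    """
    P = lambda i: "a" + "b" * i + "aa"
    E = lambda i: "a" + "b" * i + "ab" + "a" * (i - 2)
    R = "a" + "b" * (k - 1) + "ac" + "a" * (k - 3)
    Q = "a" + "b" * k + "a"
    return "".join(P(i) + E(i) for i in range(2, k - 1)) + P(k - 1) + R + Q


def sorted_conjugates(w: str) -> list[str]:
    """All rotations of w in lexicographic order (naive, quadratic space)."""
    return sorted(w[i:] + w[:i] for i in range(len(w)))


def beta(u: str, w: str) -> str:
    """Last characters of the sorted conjugates of w that start with u."""
    return "".join(c[-1] for c in sorted_conjugates(w) if c.startswith(u))


k = 7
w = word(k)
print(beta("ac", w))             # Lemma 65: b
print(beta("ba", w))             # Lemma 66 with k = 7: abbbabbbabbbbab
print(beta("b" * (k - 1) + "a", w))  # Lemma 68: aba
print(beta("c", w))              # Lemma 70: a
```

Calling `beta("", w)` returns the last column of the whole sorted conjugate matrix, which makes it easy to compare the concatenation of the individual ranges with the full transform for other small values of $k$.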
The following theorem puts the above lemmas together.
Theorem 13.
$\rho(C_k^{c}) = 8k - 13$, cf. Table 16. The BBWT of $C_k^{c}$ is $\mathrm{BBWT}(C_k^{c}) = \prod_{i=2}^{k-1} \beta(\mathtt{a}^{k-i}\mathtt{b}) \cdot \beta(\mathtt{ac}) \cdot \prod_{i=1}^{k} \beta(\mathtt{b}^{i}\mathtt{a}) \cdot \beta(\mathtt{c})$.
Proof.
Every conjugate contributing a character to $\beta(\mathtt{a}^{i}\mathtt{b})$ is smaller than any conjugate of $\beta(\mathtt{a}^{i'}\mathtt{b})$, for all $1 \le i' < i \le k-2$. Symmetrically, every conjugate contributing a character to $\beta(\mathtt{b}^{j}\mathtt{a})$ is greater than any conjugate of $\beta(\mathtt{b}^{j'}\mathtt{a})$, for every $1 \le j' < j \le k$. Since we considered all the disjoint ranges of conjugates of $C_k^{c}$ based on their common prefix, $\prod_{i=2}^{k-1} \beta(\mathtt{a}^{k-i}\mathtt{b}) \cdot \beta(\mathtt{ac}) \cdot \prod_{i=1}^{k} \beta(\mathtt{b}^{i}\mathtt{a}) \cdot \beta(\mathtt{c})$ is both the BBWT and the BWT of $C_k^{c}$.
With the structure of $\mathrm{BWT}(C_k^{c})$, we can easily derive its number of runs. The word $\prod_{i=2}^{k-4} \beta(\mathtt{a}^{k-i}\mathtt{b})$ has exactly $2k-11$ runs: we start with 1 run from $\beta(\mathtt{a}^{k-2}\mathtt{b}) = \mathtt{c}$, and each of the words from $\beta(\mathtt{a}^{k-3}\mathtt{b})$ to $\beta(\mathtt{a}^{4}\mathtt{b})$ adds 2 runs. By counting, we observe that $\beta(\mathtt{aaab})$, $\beta(\mathtt{aab})$, and $\beta(\mathtt{ab})$ contribute $2k-10$, 4, and 4 runs, respectively. No runs merge at the boundaries between these words. The words $\beta(\mathtt{ac})$ and $\beta(\mathtt{ba})$ contribute 1 and 8 runs, respectively. The remaining part $\prod_{i=2}^{k} \beta(\mathtt{b}^{i}\mathtt{a})$ of the BWT contributes $4(k-3)+3$ runs: each of the words from $\beta(\mathtt{bba})$ to $\beta(\mathtt{b}^{k-2}\mathtt{a})$ adds 4 runs, and $\beta(\mathtt{b}^{k-1}\mathtt{a})$ adds 3 runs. The words $\beta(\mathtt{b}^{k}\mathtt{a})$ and $\beta(\mathtt{c})$ do not add new runs, as each consists only of an $\mathtt{a}$ that merges with the preceding run. In total, we have $(2k-11) + (2k-10) + 4 + 4 + 1 + 8 + (4k-12) + 3 = 8k-13$, and the claim holds. □
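The run count in the proof above can be replayed for concrete values of $k$. The sketch below is only an illustration: it transcribes the ranges $\beta(\cdot)$ from Lemmas 64 to 70 and Table 16 as Python strings, concatenates them in the order given by Theorem 13, and counts maximal equal-letter runs. The function names `bbwt_blocks` and `runs` are hypothetical, and the restriction $k \ge 7$ is an assumption of this sketch, chosen so that no two neighboring ranges merge because of an empty run.

```python
from itertools import groupby


def bbwt_blocks(k: int) -> list[str]:
    """Ranges beta(.) of Theorem 13 in lexicographic order of their prefixes."""
    assert k >= 7, "sketch assumes k >= 7"
    blocks = ["c"]                                                # beta(a^{k-2} b)
    blocks += ["b" + "a" * (k - 2 - i) for i in range(k - 3, 3, -1)]  # beta(a^i b), i = k-3 .. 4
    blocks.append("b" * 5 + "ab" * (k - 6) + "a")                 # beta(aaab)
    blocks.append("baab" + "a" * (2 * k - 8))                     # beta(aab)
    blocks.append("b" * (k - 3) + "aab" + "a" * (2 * k - 6))      # beta(ab)
    blocks.append("b")                                            # beta(ac)
    blocks.append("a" * (k - 6) + "bbba" + "b" * (k - 4)
                  + "a" + "b" * (k - 3) + "ab")                   # beta(ba)
    blocks += ["a" + "b" * (2 * k - 2 * i - 2) + "ab"
               for i in range(2, k - 1)]                          # beta(b^i a), i = 2 .. k-2
    blocks += ["aba", "a", "a"]                 # beta(b^{k-1} a), beta(b^k a), beta(c)
    return blocks


def runs(s: str) -> int:
    """Number of maximal equal-letter runs of s."""
    return sum(1 for _ in groupby(s))


for k in range(7, 20):
    assert runs("".join(bbwt_blocks(k))) == 8 * k - 13
```

Counting the runs on the concatenated string, rather than summing per-range counts, takes the merges at the range boundaries into account automatically.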

8. Conclusions

In this article, we analyzed the sensitivity of the Burrows–Wheeler Transform (BWT) and its bijective variant (BBWT) to single-character edits. We extended previous work on the BWT to a four-character alphabet setting and to alphabet reordering. Our findings reveal that the BWT and the BBWT exhibit similar sensitivity characteristics, with compression size changes that can follow a multiplicative logarithmic or an additive square-root growth. These insights clarify that the BWT and the BBWT are not robust repetitiveness measures, even though robustness is a crucial property for data compression applications. As future work, we would like to find positions in a word for which we can predict the change in compression size caused by editing that position. That would allow us to design algorithms to improve the compression power of the BWT/BBWT by editing the word in a way that minimizes the compression size changes.
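For small inputs, the effect of an edit at each position can be explored by brute force. The following Python sketch is not the authors' method and is far from efficient: it computes the BBWT from its standard definition (Lyndon factorization with Duval's algorithm, then sorting the conjugates of all factors by the order of their infinite repetitions) and reports, for every position, how a substitution changes the number of runs. The function names, including `substitution_profile`, are hypothetical.

```python
from itertools import groupby


def lyndon_factors(s: str) -> list[str]:
    """Lyndon factorization of s via Duval's algorithm."""
    n, i, out = len(s), 0, []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            out.append(s[i:i + j - k])
            i += j - k
    return out


def bbwt(s: str) -> str:
    """Naive BBWT: last characters of all factor conjugates in omega-order."""
    n = len(s)
    conjs = [f[i:] + f[:i] for f in lyndon_factors(s) for i in range(len(f))]
    # Two infinite periodic words with periods p and q that agree on their
    # first p + q characters are equal, so a prefix of length 2n decides.
    key = lambda c: (c * (2 * n // len(c) + 1))[: 2 * n]
    return "".join(c[-1] for c in sorted(conjs, key=key))


def runs(s: str) -> int:
    """Number of maximal equal-letter runs of s."""
    return sum(1 for _ in groupby(s))


def substitution_profile(s: str, ch: str) -> list[int]:
    """Change in the number of BBWT runs when s[p] is replaced by ch."""
    base = runs(bbwt(s))
    return [runs(bbwt(s[:p] + ch + s[p + 1:])) - base for p in range(len(s))]


print(bbwt("banana"))  # annbaa
print(substitution_profile("banana", "c"))
```

Positions with a large positive entry in the profile are the ones where a single substitution hurts the run-length compression of the BBWT the most on that particular input.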

Author Contributions

Conceptualization, D.K.; Writing—original draft, H.J. All authors have read and agreed to the published version of the manuscript.

Funding

JSPS KAKENHI Grant Number 23H04378 and Yamanashi Wakate Grant Number 2291.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Burrows, M.; Wheeler, D.J. A Block Sorting Lossless Data Compression Algorithm; Technical Report 124; Digital Equipment Corporation: Palo Alto, CA, USA, 1994.
  2. Ferragina, P.; Manzini, G. Opportunistic Data Structures with Applications. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, Redondo Beach, CA, USA, 12–14 November 2000; pp. 390–398.
  3. Gagie, T.; Navarro, G.; Prezza, N. Optimal-Time Text Indexing in BWT-runs Bounded Space. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–10 January 2018; pp. 1459–1477.
  4. Gagie, T.; Navarro, G.; Prezza, N. Fully Functional Suffix Trees and Optimal Text Searching in BWT-Runs Bounded Space. J. ACM 2020, 67, 2:1–2:54.
  5. Bertram, N.; Fischer, J.; Nalbach, L. Move-r: Optimizing the r-index. In Proceedings of the 22nd International Symposium on Experimental Algorithms (SEA 2024), Vienna, Austria, 23–26 July 2024; Volume 301, pp. 1:1–1:19.
  6. Cobas, D.; Gagie, T.; Navarro, G. A Fast and Small Subsampled R-Index. In Proceedings of the 32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021), Wrocław, Poland, 5–7 July 2021; Volume 191, pp. 13:1–13:16.
  7. Arakawa, Y.; Navarro, G.; Sadakane, K. Bi-Directional r-Indexes. In Proceedings of the 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022), Prague, Czech Republic, 27–29 June 2022; Volume 223, pp. 11:1–11:14.
  8. Shivakumar, V.S.; Ahmed, O.Y.; Kovaka, S.; Zakeri, M.; Langmead, B. Sigmoni: Classification of nanopore signal with a compressed pangenome index. Bioinformatics 2024, 40, i287–i296.
  9. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359.
  10. Li, R.; Yu, C.; Li, Y.; Lam, T.W.; Yiu, S.; Kristiansen, K.; Wang, J. SOAP2: An improved ultrafast tool for short read alignment. Bioinformatics 2009, 25, 1966–1967.
  11. Ferragina, P.; Manzini, G.; Muthukrishnan, S.M. The Burrows–Wheeler Transform: Ten Years Later; DIMACS: Piscataway, NJ, USA, 2004.
  12. Gagie, T.; Manzini, G.; Navarro, G.; Stoye, J. 25 Years of the Burrows–Wheeler Transform (Dagstuhl Seminar 19241). Dagstuhl Rep. 2019, 9, 55–68.
  13. Adjeroh, D.; Bell, T.; Mukherjee, A. The Burrows–Wheeler Transform: Data Compression, Suffix Arrays, and Pattern Matching; Springer: Berlin/Heidelberg, Germany, 2008.
  14. Gessel, I.M.; Reutenauer, C. Counting Permutations with Given Cycle Structure and Descent Set. J. Comb. Theory Ser. A 1993, 64, 189–215.
  15. Schindler, M. A fast block-sorting algorithm for lossless data compression. In Proceedings of the DCC ’97. Data Compression Conference, Snowbird, UT, USA, 25–27 March 1997; p. 469.
  16. Mantaci, S.; Restivo, A.; Rosone, G.; Sciortino, M. An extension of the Burrows–Wheeler Transform. Theor. Comput. Sci. 2007, 387, 298–312.
  17. Kufleitner, M. On Bijective Variants of the Burrows–Wheeler Transform. In Proceedings of the Prague Stringology Conference 2009 (PSC 2009), Prague, Czech Republic, 31 August–2 September 2009; pp. 65–79.
  18. Daykin, J.W.; Smyth, W.F. A bijective variant of the Burrows–Wheeler Transform using V-order. Theor. Comput. Sci. 2014, 531, 77–89.
  19. Daykin, J.W.; Groult, R.; Guesnet, Y.; Lecroq, T.; Lefebvre, A.; Léonard, M.; Prieur-Gaston, É. Binary block order Rouen Transform. Theor. Comput. Sci. 2016, 656, 118–134.
  20. Gil, J.Y.; Scott, D.A. A Bijective String Sorting Transform. arXiv 2012, arXiv:1201.3077.
  21. Likhomanov, K.M.; Shur, A.M. Two Combinatorial Criteria for BWT Images. In Computer Science–Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6651, pp. 385–396.
  22. Giuliani, S.; Lipták, Z.; Masillo, F.; Rizzi, R. When a dollar makes a BWT. Theor. Comput. Sci. 2021, 857, 123–146.
  23. Giuliani, S.; Lipták, Z.; Masillo, F. When a Dollar in a Fully Clustered Word Makes a BWT. CEUR Workshop Proc. 2022, 3284, 122–135.
  24. Giuliani, S.; Inenaga, S.; Lipták, Z.; Romana, G.; Sciortino, M.; Urbina, C. Bit Catastrophes for the Burrows–Wheeler Transform. In Developments in Language Theory; Springer: Berlin/Heidelberg, Germany, 2023; Volume 13911, pp. 86–99.
  25. Akagi, T.; Funakoshi, M.; Inenaga, S. Sensitivity of string compressors and repetitiveness measures. Inf. Comput. 2023, 291, 104999.
  26. Lagarde, G.; Perifel, S. Lempel-Ziv: A “one-bit catastrophe” but not a tragedy. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–10 January 2018; pp. 1478–1495.
  27. Ziv, J.; Lempel, A. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory 1977, 23, 337–343.
  28. Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory 1978, 24, 530–536.
  29. Kempa, D.; Prezza, N. At the roots of dictionary compression: String attractors. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, Los Angeles, CA, USA, 25–29 June 2018; pp. 827–840.
  30. Navarro, G.; Ochoa, C.; Prezza, N. On the Approximation Ratio of Ordered Parsings. IEEE Trans. Inf. Theory 2021, 67, 1008–1026.
  31. Nakashima, Y.; Köppl, D.; Funakoshi, M.; Inenaga, S.; Bannai, H. Edit and Alphabet-Ordering Sensitivity of Lex-Parse. In Proceedings of the 49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024), Bratislava, Slovakia, 26–30 August 2024; Volume 306, pp. 75:1–75:15.
  32. Blumer, A.; Blumer, J.; Haussler, D.; Ehrenfeucht, A.; Chen, M.T.; Seiferas, J.I. The Smallest Automaton Recognizing the Subwords of a Text. Theor. Comput. Sci. 1985, 40, 31–55.
  33. Fujimaru, H.; Nakashima, Y.; Inenaga, S. On Sensitivity of Compact Directed Acyclic Word Graphs. In Combinatorics on Words; Springer: Berlin/Heidelberg, Germany, 2023; Volume 13899, pp. 168–180.
  34. Olbrich, J.; Ohlebusch, E.; Büchler, T. Generic Non-recursive Suffix Array Construction. ACM Trans. Algorithms 2024, 20, 18.
  35. Bannai, H.; Kärkkäinen, J.; Köppl, D.; Piątkowski, M. Constructing and Indexing the Bijective and Extended Burrows–Wheeler Transform. Inf. Comput. 2024, 297, 1–30.
  36. Badkobeh, G.; Bannai, H.; Köppl, D. Bijective BWT based Compression Schemes. In String Processing and Information Retrieval; Springer: Berlin/Heidelberg, Germany, 2024; Volume 14899, pp. 16–25.
  37. Biagi, E.; Cenzato, D.; Lipták, Z.; Romana, G. On the Number of Equal-Letter Runs of the Bijective Burrows–Wheeler Transform. In Proceedings of the 24th Italian Conference on Theoretical Computer Science (ICTCS 2023), Palermo, Italy, 13–15 September 2023; Volume 3587, pp. 129–142.
  38. Lyndon, R.C. On Burnside’s Problem. Trans. Am. Math. Soc. 1954, 77, 202–215.
  39. Chen, K.T.; Fox, R.H.; Lyndon, R.C. Free differential calculus, IV. The quotient groups of the lower central series. Ann. Math. 1958, 68, 81–95.
  40. Lothaire, M. Combinatorics on Words, 2nd ed.; Cambridge Mathematical Library, Cambridge University Press: Cambridge, UK, 1997.
  41. Saari, K. Lyndon words and Fibonacci numbers. J. Comb. Theory Ser. A 2014, 121, 34–44.
  42. De Luca, A.; Mignosi, F. Some Combinatorial Properties of Sturmian Words. Theor. Comput. Sci. 1994, 136, 361–385.
  43. Mantaci, S.; Restivo, A.; Sciortino, M. Burrows–Wheeler transform and Sturmian words. Inf. Process. Lett. 2003, 86, 241–246.
  44. Christodoulakis, M.; Iliopoulos, C.S.; Ardila, Y.J.P. Simple Algorithm for Sorting the Fibonacci String Rotations. In Proceedings of the SOFSEM, Merin, Czech Republic, 21–27 January 2006; Volume 3831, pp. 218–225.
  45. Séébold, P. Fibonacci Morphisms and Sturmian Words. Theor. Comput. Sci. 1991, 88, 365–384.
  46. Giuliani, S.; Inenaga, S.; Lipták, Z.; Prezza, N.; Sciortino, M.; Toffanello, A. Novel Results on the Number of Runs of the Burrows–Wheeler-Transform. In Proceedings of the SOFSEM, Bolzano-Bozen, Italy, 25–29 January 2021; Volume 12607, pp. 249–262.
  47. Boucher, C.; Cenzato, D.; Lipták, Z.; Rossi, M.; Sciortino, M. r-Indexing the eBWT. In Proceedings of the SPIRE, Lille, France, 4–6 October 2021; Volume 12944, pp. 3–12.
Figure 1. Sketch of the setting $\mathrm{conj}_i(v) < \mathrm{conj}_j(v)$ considered in the proof of Lemma 4.
Figure 2. Illustration of the first case in Lemma 4. Inserting # does not change the lexicographic order between $\mathrm{conj}_i(v)$ and $\mathrm{conj}_j(v)$.
Figure 3. Illustration of the second case in Lemma 4.
Figure 4. Illustration of Case 2 (a) in Lemma 4. Inserting # does not affect the lexicographic order between $\mathrm{conj}_i(v)$ and $\mathrm{conj}_j(v)$.
Figure 5. Illustration of Case 2 (b) in Lemma 4.
Figure 6. Factorization of $L_{2k}$ into Lyndon factors studied in the proof of Theorem 5. $L_{2k}$ has $k$ Lyndon factors.
Figure 7. Factorization of $L_{2k}\#$ into Lyndon factors studied in the proof of Theorem 6. $L_{2k}\#$ has $k+1$ Lyndon factors.
Figure 8. Factorization of L 2 k c into Lyndon factors studied in the proof of Theorem 7. L 2 k c has k Lyndon factors.
Figure 9. Insertion of # at position $\alpha$ in $L_{2k}$, considered in the proof of Theorem 8.
Figure 10. Insertion of # at position f 2 k 2 in L 2 k increases ρ by at least the number of distinct Lyndon factors k + 1 studied in Theorem 9.
Figure 11. Introducing $D_k$ from $C_k$ studied in Section 7.1. $D_k$ is the first Lyndon factor of $C_k$.
Figure 12. Lyndon factorization of $C_k$. We obtain $\rho(C_k)$ by knowing the number of runs of both its Lyndon factors and where these conjugates are sorted in the BBWT. The analysis is in the proof of Theorem 10.
Figure 13. Lyndon factorization of $C_k\#$. Compared to Figure 12, we have one additional Lyndon factor. The analysis is in the proof of Theorem 11.
Figure 14. Introducing the Lyndon word $C_k^{c}$ studied in Section 7.2.
Figure 15. Introducing the Lyndon word $C_k^{c}$ studied in Section 7.2.2.
Table 2. Classification of the number of runs obtained in Theorem 2. The total number of runs is $8k-17$.
| BWT of $W_k\#$ | Runs |
|---|---|
| $\beta(\#) = \mathtt{b}$ | 1 |
| $\beta(\mathtt{a}^{i}\mathtt{b}) = \mathtt{ba}^{k-i-2}$ for all $4 \le i \le k-2$ | $2k-11$ but, when merged, $2k-12$ |
| $\beta(\mathtt{aaab}) = \mathtt{b}^{5}(\mathtt{ab})^{k-6}\mathtt{a}$ | $2k-10$ but, when merged, $2k-11$ |
| $\beta(\mathtt{aab}) = \mathtt{aaba}^{2k-8}$ | 3 but, when merged, 2 |
| $\beta(\mathtt{ab}) = \mathtt{b}^{k-2}\#\mathtt{aba}^{2k-6}$ | 5 |
| $\beta(\mathtt{b}^{i}\#) = \mathtt{b}$ for all $i \in [1..k-1]$ | $k-1$ |
| $\beta(\mathtt{ba}) = \mathtt{a}^{k-5}\mathtt{bbbab}^{k-5}\mathtt{ab}^{k-2}\mathtt{a}$ | 7 |
| $\beta(\mathtt{b}^{j}\mathtt{a}) = \mathtt{ab}^{2(k-j-1)}\mathtt{a}$ for all $j \in [2..k-2]$ | $3(k-3)$ |
| $\beta(\mathtt{b}^{k-1}\mathtt{a}) = \mathtt{aa}$ | 1 |
| $\beta(\mathtt{b}^{k}\#) = \mathtt{a}$ | 1 but, when merged, 0 |
Table 3. Lexicographically sorted conjugates of $W_k\#$ studied in Theorem 2, Part 1.
Prefix    Remaining Part    BWT
# P 2 b
a k 2 b b k 1 # b
a k 3 b b k 2 aa b
b k 1 # a
a k 4 b b k 3 aa b
b k 2 aa a
b k 1 # a
.........
Table 4. Lexicographically sorted conjugates of $W_k\#$ studied in Theorem 2, Part 2.
Prefix    Remaining Part    BWT
a 3 b bab b
bbaba b
bbbabaa b
b 4 aa b
b 4 aba 3 b
b 5 aa a
b 5 aba 4 b
b 6 aa a
b 6 aba 5 b
......
b k 2 aa a
b k 2 aba k 3 b
b k 1 # a
a 2 b bab a
bbaba a
bbbaa b
bbbabaa a
b 4 aa a
b 4 aba 3 a
......
b k 2 aa a
b k 2 aba k 3 a
b k 1 # a
Table 5. Lexicographically sorted conjugates of $W_k\#$ studied in Theorem 2, Part 3.
Prefix    Remaining Part    BWT
ab a k 3 Q k # b
a k 4 P k 1 b
......
P 3 b
baa #
bab a
bbaa b
bbaba a
b 3 aa a
b 3 aba 2 a
......
b k 2 aa a
b k 2 aba k 3 a
b k 1 # a
b # P 2 b
ba a k 4 Q k # a
a k 5 P k 1 a
......
a 2 P 6 a
b 2 ab b
b 3 aba b
b 4 abaa b
aab 5 aa a
aab 5 aba 3 b
......
aab k 1 aba k 3 b
P 4 a
b P 3 b
......
ba k 3 Q k # b
b 3 aa a
Table 6. Lexicographically sorted conjugates of $W_k\#$ studied in Theorem 2, Part 4.
Prefix    Remaining Part    BWT
bb # P 2 b
b 2 a aab 2 ab a
aab 3 aba b
......
aab k 1 aba k 3 b
ba k 3 Q k # b
......
ba P 4 b
b P 3 a
bbb # P 2 b
b 3 a aab 3 aba a
aab 4 abaa b
......
aab k 1 aba k 3 b
ba k 3 Q k # b
......
baa P 5 b
ba P 4 a
bbbb # P 2 b
.........
Table 7. Lexicographically sorted conjugates of $W_k\#$ studied in Theorem 2, Part 5.
Prefix    Remaining Part    BWT
b k 2 # P 2 b
b k 2 a aab k 2 aba k 4 a
aab k 1 aba k 3 b
ba k 3 Q k # b
ba k 4 P k 1 a
b k 1 # P 2 b
b k 2 a aab k 1 aba k 3 a
ba k 3 Q k # a
b k # P 2 a
Table 8. Classification of the number of runs obtained in Theorem 3. The total number of runs is $6k-12$.
| BWT of $\overline{W_k}$ | Runs |
|---|---|
| $\beta(\mathtt{a}^{k}\mathtt{b}) = \mathtt{b}$ | 1 |
| $\beta(\mathtt{a}^{i}\mathtt{b}) = \mathtt{ba}^{2(k-1-i)+1}\mathtt{b}$ for all $i \in [2..k-1]$ | $2k-3$ but, when merged, $2k-4$ |
| $\beta(\mathtt{ab}) = \mathtt{ba}^{k-2}\mathtt{baa}^{k-5}\mathtt{baaab}^{k-5}$ | 7 but, when merged, 6 |
| $\beta(\mathtt{ba}) = \mathtt{b}^{2k-6}\mathtt{abba}^{k-2}$ | 4 but, when merged, 3 |
| $\beta(\mathtt{bba}) = \mathtt{b}^{2k-8}\mathtt{abba}$ | 4 |
| $\beta(\mathtt{bbba}) = \mathtt{b}(\mathtt{ab})^{k-6}\mathtt{a}^{5}$ | $2k-10$ |
| $\beta(\mathtt{b}^{i}\mathtt{a}) = \mathtt{b}^{k-i-2}\mathtt{a}$ for all $i \in [4..k-2]$ | $2k-12$ |
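Table 9. Classification of the number of runs obtained in Theorem 4. The total number of runs is $8k-17$.
| BWT of $\overline{W_k}\mathtt{c}$ | Runs |
|---|---|
| $\beta(\mathtt{a}^{k}\mathtt{c}) = \mathtt{b}$ | 1 |
| $\beta(\mathtt{a}^{k-1}\mathtt{b}) = \mathtt{bb}$ | 1 but, when merged, 0 |
| $\beta(\mathtt{a}^{i}\mathtt{b}) = \mathtt{ba}^{2k-2i-2}\mathtt{b}$ for all $i \in [2..k-2]$ | $3k-9$ |
| $\beta(\mathtt{a}^{i}\mathtt{c}) = \mathtt{a}$ for all $i \in [1..k-1]$ | $k-1$ |
| $\beta(\mathtt{ab}) = \mathtt{ba}^{k-2}\mathtt{ba}^{k-5}\mathtt{baaab}^{k-5}$ | 7 |
| $\beta(\mathtt{ba}) = \mathtt{b}^{2k-6}\mathtt{abca}^{k-2}$ | 5 |
| $\beta(\mathtt{bba}) = \mathtt{b}^{2k-8}\mathtt{abb}$ | 3 |
| $\beta(\mathtt{bbba}) = \mathtt{b}(\mathtt{ab})^{k-6}\mathtt{a}^{5}$ | $2k-10$ but, when merged, $2k-11$ |
| $\beta(\mathtt{b}^{i}\mathtt{a}) = \mathtt{b}^{k-i-2}\mathtt{a}$ for all $i \in [4..k-2]$ | $2k-12$ |
| $\beta(\mathtt{c}) = \mathtt{a}$ | 1 but, when merged, 0 |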
Table 10. Lexicographically sorted conjugates of $\overline{W_k}\mathtt{c}$ studied in Theorem 4, Part 1.
Prefix    Remaining Part    BWT
a k c P 2 ¯ b
a k 1 b ab k 2 b
bba k 1 b
a k 1 c P 2 ¯ a
a k 2 b ab k 3 b
ab k 2 a
bba k 1 a
bba k 2 b
a k 2 c P 2 ¯ a
a k 3 b ab k 4 b
ab k 3 a
ab k 2 a
bba k 1 a
bba k 2 a
bba k 3 b
a k 3 c P 2 ¯ a
.........
aab ab b
ab 2 a
... a
ab k 2 a
bba k 1 a
... a
bba 3 a
bba 2 b
aa c P 2 ¯ a
Table 11. Lexicographically sorted conjugates of $\overline{W_k}\mathtt{c}$ studied in Theorem 4, Part 2.
Prefix    Remaining Part    BWT
ab aaabb b
a P 3 ¯ a
ab P 4 ¯ a
abb P 5 ¯ a
... a
ab k 4 P k 1 ¯ a
ab k 3 Q k ¯ c a
P 4 ¯ b
b E k 1 ¯ a
b E k 2 ¯ a
... a
b E 5 ¯ a
b P 5 ¯ b
b E 4 ¯ a
b E 3 ¯ a
b E 2 ¯ a
bb P 6 ¯ b
bbb P 7 ¯ b
bbbb P 8 ¯ b
... b
b k 5 P k 1 ¯ b
b k 4 Q k ¯ c b
a c P 2 ¯ a
Table 12. Lexicographically sorted conjugates of $\overline{W_k}\mathtt{c}$ studied in Theorem 4, Part 3.
Prefix    Remaining Part    BWT
ba a k 1 c b
a k 2 bab k 3 b
a k 2 bb b
a k 3 bab k 4 b
a k 3 bb b
......
aaababb b
aaabb b
aaabab b
aabb a
aba b
abb #
P 3 ¯ a
b P 4 ¯ a
... a
b k 4 P k 1 ¯ a
b k 3 Q k ¯ c a
bba a k 1 c b
a k 2 bab k 3 b
a k 2 bb b
......
a 4 bab 3 b
a 4 bb b
a 3 babb b
aaabb a
aabab b
aba b
Table 13. Lexicographically sorted conjugates of $\overline{W_k}\mathtt{c}$ studied in Theorem 4, Part 4.
Prefix    Remaining Part    BWT
bbba a k 1 c b
a k 2 bab k 3 a
a k 2 bb b
a k 3 bab k 4 a
a k 3 bb b
......
a 5 bab 4 a
a 5 bb b
a 4 bab 3 a
a 4 bb a
a 3 bab 2 a
aabab a
aba a
bbbba a k 1 c b
a k 2 bb b
... b
a 6 bb b
a 5 bb a
.........
b k 3 a a k 1 c b
a k 2 bb a
b k 2 a a k 1 c a
c P 2 ¯ a
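Table 14. Classification of the number of runs obtained in Lemma 47. The total number of runs is $8k-18$.
| BBWT of $\overline{W_k}\mathtt{c}$ | Runs |
|---|---|
| $\beta(\mathtt{a}^{k-2}\mathtt{b}) = \mathtt{b}$ | 1 |
| $\beta(\mathtt{a}^{i}\mathtt{b}) = \mathtt{ba}^{k-i-2}$ for all $i \in [4..k-3]$ | $2k-12$ but, when merged, $2k-13$ |
| $\beta(\mathtt{aaab}) = \mathtt{bbbbb}(\mathtt{ab})^{k-7}\mathtt{baa}$ | $2k-12$ |
| $\beta(\mathtt{aab}) = \mathtt{baaba}^{2k-8}$ | 4 |
| $\beta(\mathtt{ab}) = \mathtt{b}^{k-3}\mathtt{aaba}^{2k-6}$ | 4 |
| $\beta(\mathtt{ba}) = \mathtt{ba}^{k-6}\mathtt{bbbab}^{k-4}\mathtt{ab}^{k-3}\mathtt{a}$ | 8 |
| $\beta(\mathtt{b}^{j}\mathtt{a}) = \mathtt{bab}^{2k-2j-2}\mathtt{a}$ for all $j \in [2..k-2]$ | $4k-12$ |
| $\beta(\mathtt{b}^{k-1}\mathtt{a}) = \mathtt{aab}$ | 2 but, when merged, 1 |
| $\beta(\mathtt{b}^{k}\mathtt{a}) = \mathtt{a}$ | 1 |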
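Table 15. Classification of the number of runs obtained in Theorem 12. The total number of runs is $8k-12$.
| BWT of $C_k^{c}$ | Runs |
|---|---|
| $\beta(\mathtt{a}^{k-2}\mathtt{b}) = \mathtt{c}$ | 1 |
| $\beta(\mathtt{a}^{i}\mathtt{b}) = \mathtt{ba}^{k-i-2}$ for all $i \in [4..k-3]$ | $2k-12$ |
| $\beta(\mathtt{aaab}) = \mathtt{bbbbb}(\mathtt{ab})^{k-6}\mathtt{a}$ | $2k-10$ |
| $\beta(\mathtt{aab}) = \mathtt{baaba}^{2k-8}$ | 4 |
| $\beta(\mathtt{ab}) = \mathtt{b}^{k-3}\mathtt{aaba}^{2k-6}\mathtt{b}$ | 5 |
| $\beta(\mathtt{ba}) = \mathtt{a}^{k-6}\mathtt{bbbab}^{k-4}\mathtt{ab}^{k-3}\mathtt{ab}$ | 8 |
| $\beta(\mathtt{b}^{i}\mathtt{a}) = \mathtt{ab}^{2k-2i-2}\mathtt{ab}$ for all $i \in [2..k-2]$ | $4k-12$ |
| $\beta(\mathtt{b}^{k-1}\mathtt{a}) = \mathtt{aba}$ | 3 |
| $\beta(\mathtt{b}^{k}\mathtt{a}) = \mathtt{a}$ | 1 but, when merged, 0 |
| $\beta(\mathtt{bc}) = \mathtt{a}$ | 1 but, when merged, 0 |
| $\beta(\mathtt{c}) = \mathtt{b}$ | 1 |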
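Table 16. Classification of the number of runs obtained in Theorem 13. The total number of runs is $8k-13$.
| BWT of $C_k^{c}$ | Runs |
|---|---|
| $\beta(\mathtt{a}^{k-2}\mathtt{b}) = \mathtt{c}$ | 1 |
| $\beta(\mathtt{a}^{i}\mathtt{b}) = \mathtt{ba}^{k-2-i}$ for all $i \in [4..k-3]$ | $2k-12$ |
| $\beta(\mathtt{aaab}) = \mathtt{bbbbb}(\mathtt{ab})^{k-6}\mathtt{a}$ | $2k-10$ |
| $\beta(\mathtt{aab}) = \mathtt{baaba}^{2k-8}$ | 4 |
| $\beta(\mathtt{ab}) = \mathtt{b}^{k-3}\mathtt{aaba}^{2k-6}$ | 4 |
| $\beta(\mathtt{ac}) = \mathtt{b}$ | 1 |
| $\beta(\mathtt{ba}) = \mathtt{a}^{k-6}\mathtt{bbbab}^{k-4}\mathtt{ab}^{k-3}\mathtt{ab}$ | 8 |
| $\beta(\mathtt{b}^{i}\mathtt{a}) = \mathtt{ab}^{2k-2i-2}\mathtt{ab}$ for all $i \in [2..k-2]$ | $4k-12$ |
| $\beta(\mathtt{b}^{k-1}\mathtt{a}) = \mathtt{aba}$ | 3 |
| $\beta(\mathtt{b}^{k}\mathtt{a}) = \mathtt{a}$ | 1 but, when merged, 0 |
| $\beta(\mathtt{c}) = \mathtt{a}$ | 1 but, when merged, 0 |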
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jeon, H.; Köppl, D. Compression Sensitivity of the Burrows–Wheeler Transform and Its Bijective Variant. Mathematics 2025, 13, 1070. https://doi.org/10.3390/math13071070
