# Reversed Lempel–Ziv Factorization with Suffix Trees

## Abstract


## 1. Introduction

#### 1.1. Our Contribution

`#`. However, they need random access to the suffix array, which makes it hard to achieve linear time within a working space of $o(n \lg n)$ bits. We can avoid random access to the suffix array with a different approach based on suffix tree traversals. Precursors of this line of research include the work of Gusfield [17] ([APL16]) and Nakashima et al. [18]. The former studies the non-overlapping Lempel–Ziv–Storer–Szymanski (LZSS) factorization [2,19], while the latter studies the Lempel–Ziv-78 factorization [20]. Although the techniques they use are similar to ours, they still need $\mathcal{O}(n \lg n)$ bits of space. To actually improve the space bounds, we follow two approaches: On the one hand, we use the leaf-to-root traversals proposed by Fischer et al. [21] ([Section 3]) for the overlapping LZSS factorization [2], during which they mark visited nodes acting as signposts for candidates for previous occurrences of the factors. On the other hand, we use the root-to-leaf traversals proposed in [22] for the leaves corresponding to the text positions of T to find the lowest marked nodes whose paths to the root constitute the lengths of the non-overlapping LZSS factors. Although we mimic two approaches for computing factorizations different from the reversed LZ factorization, we can show that these traversals on the suffix tree of $T \cdot \# \cdot T^{\mathsf{R}}$ help us to detect the factors of the reversed LZ factorization. Our result is as follows:

**Theorem 1.**

- in $\mathcal{O}(\epsilon^{-1} n)$ time using $(2+\epsilon) n \lg n + \mathcal{O}(n)$ bits (excluding the read-only text T), or
- in $\mathcal{O}(\epsilon^{-1} n)$ time using $\mathcal{O}(\epsilon^{-1} n \lg \sigma)$ bits,

**Theorem 2.**

- in $\mathcal{O}(\epsilon^{-1} n)$ time using $(2+\epsilon) n \lg n + \mathcal{O}(n)$ bits (excluding the read-only text T), or
- in $\mathcal{O}(\epsilon^{-1} n \log_{\sigma}^{\epsilon} n)$ time using $\mathcal{O}(\epsilon^{-1} n \lg \sigma)$ bits,

#### 1.2. Related Work

#### 1.3. Structure of This Article

## 2. Preliminaries

`c` in $T[1..j]$, and the select query $T.\mathrm{select}_{\mathtt{c}}(j)$ gives the position of the $j$-th `c` in T, if it exists. We stipulate that $\mathrm{rank}_{\mathtt{c}}(0) = \mathrm{select}_{\mathtt{c}}(0) = 0$. If the alphabet is binary, i.e., when T is a bit vector, there are data structures [35,36] that use $o(|T|)$ extra bits of space and can compute $\mathrm{rank}$ and $\mathrm{select}$ in constant time. There are representations [37] with the same constant-time bounds that can be constructed in time linear in $|T|$. We say that a bit vector has a rank-support and a select-support if it is endowed with data structures providing constant-time access to $\mathrm{rank}$ and $\mathrm{select}$, respectively.
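
As a minimal sketch of the two queries, the following Python functions compute rank and select by plain scanning. They take linear time and only pin down the 1-based semantics and the convention $\mathrm{rank}_{\mathtt{c}}(0) = \mathrm{select}_{\mathtt{c}}(0) = 0$; they are not the constant-time structures of [35,36,37]. The function names and the choice to return `None` when the $j$-th occurrence does not exist are assumptions made for illustration.

```python
def rank(T, c, j):
    """Number of occurrences of c in T[1..j] (1-based, inclusive); 0 for j == 0."""
    return T[:j].count(c)


def select(T, c, j):
    """1-based position of the j-th c in T; 0 for j == 0; None if it does not exist."""
    if j == 0:
        return 0
    seen = 0
    for pos, ch in enumerate(T, start=1):
        if ch == c:
            seen += 1
            if seen == j:
                return pos
    return None


T = "abbabbabab"  # running example of the article
```

For every character that occurs at least $j$ times, `rank(T, c, select(T, c, j))` equals `j`, which is the usual inverse relation between the two queries.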

`#` and `$` that do not appear in T, with $\$ < \# < c$ for every character $c \in \Sigma$. Under this assumption, none of the suffixes of $T \cdot \#$ and $T^{\mathsf{R}} \cdot \$$ has another suffix as a prefix. Let $R := T \cdot \# \cdot T^{\mathsf{R}} \cdot \$$. R has length $|R| = 2|T| + 2 = 2n$.
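
The construction of R can be sketched in a few lines of Python. The helper names `build_R` and `some_suffix_is_prefix_of_another` are hypothetical, and the brute-force check is quadratic. It is only meant to illustrate, on the running example $T = \mathtt{abbabbabab}$, why the two delimiters guarantee that no suffix is a prefix of another suffix.

```python
def build_R(T, sep="#", end="$"):
    """Return R = T . # . T^R . $; both delimiters must not occur in T."""
    assert sep not in T and end not in T
    return T + sep + T[::-1] + end


def some_suffix_is_prefix_of_another(S):
    """True if some suffix of S is a proper prefix of another suffix of S."""
    suffixes = [S[k:] for k in range(len(S))]
    return any(a != b and b.startswith(a) for a in suffixes for b in suffixes)


T = "abbabbabab"
n = len(T) + 1  # n counts the characters of T . #
R = build_R(T)
```

Without a delimiter the property fails for this T, since the suffix `b` is a prefix of the suffix `bab`.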

- $\mathrm{leaf\_rank}(\lambda)$ returns the leaf-rank of the leaf $\lambda$;
- $\mathrm{depth}(v)$ returns the depth of the node $v$, which is the number of nodes on the path between $v$ and the root (exclusive) such that the root has depth zero;
- $\mathrm{level\_anc}(\lambda, d)$ returns the level-ancestor of $\lambda$ at depth $d$; and
- $\mathrm{lca}(u, v)$ returns the lowest common ancestor (LCA) of $u$ and $v$.
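
To make the semantics of these operations concrete, here is a small pointer-based sketch on a toy tree. It is an illustration under assumptions, not the succinct or compressed representation used in the article: the class `Tree`, its parent-array input, and the example tree are hypothetical, every operation walks parent pointers instead of running in constant time, and $\mathrm{leaf\_rank}$ is omitted.

```python
class Tree:
    """Rooted tree given by parent pointers; node 0 is the root (parent None)."""

    def __init__(self, parent):
        self.parent = parent

    def depth(self, v):
        """Number of edges between v and the root; the root has depth zero."""
        d = 0
        while self.parent[v] is not None:
            v = self.parent[v]
            d += 1
        return d

    def level_anc(self, v, d):
        """Ancestor of v at depth d (v itself if depth(v) == d)."""
        steps = self.depth(v) - d
        assert steps >= 0
        for _ in range(steps):
            v = self.parent[v]
        return v

    def lca(self, u, v):
        """Lowest common ancestor of u and v."""
        du, dv = self.depth(u), self.depth(v)
        if du > dv:
            u = self.level_anc(u, dv)
        else:
            v = self.level_anc(v, du)
        while u != v:
            u, v = self.parent[u], self.parent[v]
        return u


# Toy tree: root 0 with children 1 and 2; node 1 has leaves 3 and 4; node 2 has leaf 5.
tree = Tree([None, 0, 0, 1, 1, 2])
```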

## 3. Reversed LZ Factorization

#### 3.1. Coding

#### 3.2. Factorization Algorithm

**Lemma 1.**

**Proof.**

- $T[j-\ell+1..j]^{\mathsf{R}} = T[i..i+\ell-1]$; or
- $R[i..]$ and $R[2n-j..]$ have the longest common prefix of length $\ell$.
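
The link between the two conditions is the index mapping $R[2n-j] = T[j]$: the suffix $R[2n-j..]$ spells $T[j], T[j-1], \ldots, T[1]$ followed by `$`. The following sketch checks this on the running example by brute force. The helper names `lcp`, `sub`, and `reverse_match_length` are hypothetical, and the check ignores any side condition of the lemma (such as $j < i$) that is not visible in the two items above.

```python
def lcp(a, b):
    """Length of the longest common prefix of the strings a and b."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def sub(S, i, j):
    """S[i..j] with 1-based inclusive bounds (empty if j < i)."""
    return S[i - 1:j]


T = "abbabbabab"
n = len(T) + 1
R = T + "#" + T[::-1] + "$"


def reverse_match_length(i, j):
    """LCP of R[i..] and R[2n-j..], i.e. the longest l with T[j-l+1..j]^R = T[i..i+l-1]."""
    return lcp(R[i - 1:], R[2 * n - j - 1:])
```

For instance, `reverse_match_length(5, 3)` compares `bbabab...` with `bba$` and yields 3, matching the fact that the reverse of $T[1..3] = \mathtt{abb}$ is a prefix of $T[5..]$.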

#### 3.2.1. Overview

`#` at position n), we do not need Player 1 to declare the leaf with suffix number n a phrase leaf. We also terminate the algorithm when both players meet at position n without checking whether we have found a new factor starting at position n.

Algorithm 1: Algorithm of Section 3.2.2 computing the non-overlapping reversed LZ factorization. The function $\mathrm{max\_sufnum}$ is described in Section 3.3.

#### 3.2.2. One-Pass Algorithm in Detail

`foreach` loop (Line 20) of the algorithm can be expressed more verbosely as a loop iterating over all depth offsets $d$ in increasing order while computing $v \leftarrow \mathrm{level\_anc}(\lambda^{\mathsf{R}}, d)$ until reaching either the root or a node marked in $B_V$. Subsequently, the turn of Player 1 starts (cf. Line 7). We depict the state after the first turn of Player 2 in Figure 4.

#### 3.2.3. Time Complexity

#### 3.3. Determining the Referred Position

- $\mathrm{max\_sufnum}(v)$ returns the maximum among all suffix numbers of the leaves in the subtree rooted at $v$.

Algorithm 2: Determining the referred positions in a second pass described in Section 3.3.

## 4. Computing $\mathsf{LPnrF}$

`#` at the end is not needed, but simplifies the analysis since $T[1..n-1] \cdot \#$ has precisely $n$ characters.) Having $\mathsf{LPnrF}$, we can iteratively compute the reversed LZ factorization because $F_x = T[k_x..k_x + \max(0, \mathsf{LPnrF}[k_x] - 1)]$ with $k_x := 1 + \sum_{y=1}^{x-1} |F_y|$ for $x \in [1..z]$.
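
The iteration over $k_x$ can be sketched as follows. The function name is hypothetical, and the $\mathsf{LPnrF}$ values are copied by hand from the example table of the article for the string `abbabbabab#`; the code itself does not compute them.

```python
def factorize_with_lpnrf(text, lpnrf):
    """Cut text into factors F_x = text[k_x .. k_x + max(0, LPnrF[k_x] - 1)] (1-based)."""
    starts, factors = [], []
    k = 1
    while k <= len(text):
        # max(0, LPnrF[k] - 1) + 1 characters, i.e. at least one character per factor
        length = max(1, lpnrf[k - 1])
        starts.append(k)
        factors.append(text[k - 1:k - 1 + length])
        k += length
    return starts, factors


text = "abbabbabab#"
LPNRF = [0, 0, 2, 1, 3, 3, 2, 3, 2, 1, 0]  # taken from the article's example table
starts, factors = factorize_with_lpnrf(text, LPNRF)
# starts  == [1, 2, 3, 5, 8, 11]
# factors == ['a', 'b', 'ba', 'bba', 'bab', '#']
```

The starting positions 1, 2, 3, 5, and 8 agree with the suffix numbers of the phrase leaves reported for the running example; the last factor consists of the delimiter alone.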

**Lemma 2.** $\mathsf{LPnrF}[i-1] - 1 \le \mathsf{LPnrF}[i] \le n - i$ for $i \in [2..n]$.
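
The inequality can be checked on the running example with a brute-force computation. The sketch below assumes that $\mathsf{LPnrF}[i]$ is the length of the longest prefix of $T[i..]$ whose reverse occurs in $T[1..i-1]$, as in the description accompanying Figure 3; the function name `lpnrf_naive` is hypothetical, and its polynomial running time is unrelated to the linear-time algorithms of this section.

```python
def lpnrf_naive(text):
    """Brute-force LPnrF: for each i, the longest prefix of text[i..] whose
    reverse occurs in text[1..i-1] (1-based positions)."""
    n = len(text)
    table = []
    for i in range(1, n + 1):
        left = text[:i - 1]  # text[1..i-1]
        best = 0
        for l in range(1, n - i + 2):
            if text[i - 1:i - 1 + l][::-1] in left:
                best = l
            else:
                break  # if length l fails, every longer prefix fails as well
        table.append(best)
    return table


text = "abbabbabab#"
n = len(text)
table = lpnrf_naive(text)
lemma_holds = all(
    table[i - 2] - 1 <= table[i - 1] <= n - i for i in range(2, n + 1)
)
```

On this input, `table` coincides with the $\mathsf{LPnrF}$ row of the article's example table and `lemma_holds` is `True`.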

#### 4.1. Adaptation of the Single-Pass Algorithm

#### 4.2. Algorithm of Crochemore et al.

- $\mathrm{maxsuf\_leaf}(j_1, j_2)$ returns the leaf with the maximum suffix number among all leaves whose leaf-ranks are in $[j_1..j_2]$.
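
A naive stand-in for this query is sketched below. It assumes that the children of each suffix tree node are ordered lexicographically, so that the leaf with leaf-rank $r$ carries the $r$-th smallest suffix of R, and it identifies a leaf by its leaf-rank. The function names and the sorting-based construction are assumptions for illustration; the article relies on a proper range-maximum data structure rather than a linear scan.

```python
def suffix_array(R):
    """Suffix numbers (1-based) of R in lexicographic order with $ < # < letters."""
    order = {"$": 0, "#": 1}

    def key(k):
        return [order.get(ch, 2 + ord(ch)) for ch in R[k - 1:]]

    return sorted(range(1, len(R) + 1), key=key)


def maxsuf_leaf(sa, j1, j2):
    """Leaf-rank in [j1..j2] (1-based, inclusive) whose suffix number is maximal."""
    return max(range(j1, j2 + 1), key=lambda r: sa[r - 1])


R = "abbabbabab#bababbabba$"
sa = suffix_array(R)
```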

- If $\mathrm{leaf\_rank}(\lambda') = \mathrm{leaf\_rank}(\lambda) + 1$ (meaning $\lambda'$ is to the right of $\lambda$ and there is no leaf between $\lambda$ and $\lambda'$), we terminate.
- Otherwise, we set $\lambda'_R$ to the leaf with the largest suffix number among the leaves with leaf-ranks in the range $[\mathrm{leaf\_rank}(\lambda) + 1..\mathrm{leaf\_rank}(\lambda') - 1]$. If $\mathrm{sufnum}(\lambda'_R) > 2n - i$, we set $\lambda_R \leftarrow \lambda'_R$ and recurse. Otherwise, we terminate.

## 5. Open Problems

#### 5.1. Overlapping Reversed LZ Factorization

`a` and $P\mathtt{a}$. The second factor is $P\mathtt{a}$ since $(P\mathtt{a})^{\mathsf{R}} = \mathtt{a}P$. However, a coding of the second factor needs to store additional information about $P$ to support restoring the characters of this factor. It seems that we need to store the entire left arm of $P$, including the middle character for odd palindromes.

#### 5.2. Computing $\mathsf{LPF}$ in Linear Time with Compressed Space

#### 5.3. Applications in Compressors

**Lemma 3.**

**Proof.**

`ab`, `bc`, or `ca`) as a substring (cf. [46] ([Theorem 5])). □

**Lemma 4.**

## Funding

`JP18F18120` and `JP21K17701`.

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Flip Book

## References

1. Kolpakov, R.; Kucherov, G. Searching for gapped palindromes. Theor. Comput. Sci. **2009**, 410, 5365–5373.
2. Storer, J.A.; Szymanski, T.G. Data compression via textual substitution. J. ACM **1982**, 29, 928–951.
3. Crochemore, M.; Langiu, A.; Mignosi, F. Note on the greedy parsing optimality for dictionary-based text compression. Theor. Comput. Sci. **2014**, 525, 55–59.
4. Weiner, P. Linear Pattern Matching Algorithms. In Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), Iowa City, IA, USA, 15–17 October 1973; pp. 1–11.
5. Sugimoto, S.; Tomohiro, I.; Inenaga, S.; Bannai, H.; Takeda, M. Computing Reversed Lempel–Ziv Factorization Online. Available online: http://stringology.org/papers/PSC2013.pdf#page=115 (accessed on 15 April 2021).
6. Chairungsee, S.; Crochemore, M. Efficient Computing of Longest Previous Reverse Factors. In Proceedings of the Computer Science and Information Technologies, Yerevan, Armenia, 28 September–2 October 2009; pp. 27–30.
7. Badkobeh, G.; Chairungsee, S.; Crochemore, M. Hunting Redundancies in Strings. In International Conference on Developments in Language Theory; LNCS; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6795, pp. 1–14.
8. Chairungsee, S. Searching for Gapped Palindrome. Available online: https://www.sciencedirect.com/science/article/pii/S0304397509006409 (accessed on 15 April 2021).
9. Charoenrak, S.; Chairungsee, S. Palindrome Detection Using On-Line Position. In Proceedings of the 2017 International Conference on Information Technology, Singapore, 27–29 December 2017; pp. 62–65.
10. Charoenrak, S.; Chairungsee, S. Algorithm for Palindrome Detection by Suffix Heap. In Proceedings of the 2019 7th International Conference on Information Technology: IoT and Smart City, Shanghai, China, 20–23 December 2019; pp. 85–88.
11. Blumer, A.; Blumer, J.; Ehrenfeucht, A.; Haussler, D.; McConnell, R.M. Building the Minimal DFA for the Set of all Subwords of a Word On-line in Linear Time. In International Colloquium on Automata, Languages, and Programming; LNCS; Springer: Berlin/Heidelberg, Germany, 1984; Volume 172, pp. 109–118.
12. Ehrenfeucht, A.; McConnell, R.M.; Osheim, N.; Woo, S. Position heaps: A simple and dynamic text indexing data structure. J. Discret. Algorithms **2011**, 9, 100–121.
13. Gagie, T.; Hon, W.; Ku, T. New Algorithms for Position Heaps. In Annual Symposium on Combinatorial Pattern Matching; LNCS; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7922, pp. 95–106.
14. Crochemore, M.; Iliopoulos, C.S.; Kubica, M.; Rytter, W.; Walen, T. Efficient algorithms for three variants of the LPF table. J. Discret. Algorithms **2012**, 11, 51–61.
15. Manber, U.; Myers, E.W. Suffix Arrays: A New Method for On-Line String Searches. SIAM J. Comput. **1993**, 22, 935–948.
16. Dumitran, M.; Gawrychowski, P.; Manea, F. Longest Gapped Repeats and Palindromes. Discret. Math. Theor. Comput. Sci. **2017**, 19, 205–217.
17. Gusfield, D. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology; Cambridge University Press: Cambridge, UK, 1997.
18. Nakashima, Y.; Tomohiro, I.; Inenaga, S.; Bannai, H.; Takeda, M. Constructing LZ78 tries and position heaps in linear time for large alphabets. Inf. Process. Lett. **2015**, 115, 655–659.
19. Ziv, J.; Lempel, A. A universal algorithm for sequential data compression. IEEE Trans. Inf. Theory **1977**, 23, 337–343.
20. Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory **1978**, 24, 530–536.
21. Fischer, J.; Tomohiro, I.; Köppl, D.; Sadakane, K. Lempel–Ziv Factorization Powered by Space Efficient Suffix Trees. Algorithmica **2018**, 80, 2048–2081.
22. Köppl, D. Non-Overlapping LZ77 Factorization and LZ78 Substring Compression Queries with Suffix Trees. Algorithms **2021**, 14, 44.
23. Sadakane, K. Compressed Suffix Trees with Full Functionality. Theory Comput. Syst. **2007**, 41, 589–607.
24. Belazzougui, D.; Cunial, F. Indexed Matching Statistics and Shortest Unique Substrings. In International Symposium on String Processing and Information Retrieval; LNCS; Springer: Cham, Switzerland, 2014; Volume 8799, pp. 179–190.
25. Franek, F.; Holub, J.; Smyth, W.F.; Xiao, X. Computing Quasi Suffix Arrays. J. Autom. Lang. Comb. **2003**, 8, 593–606.
26. Crochemore, M.; Ilie, L. Computing Longest Previous Factor in linear time and applications. Inf. Process. Lett. **2008**, 106, 75–80.
27. Belazzougui, D.; Cunial, F.; Kärkkäinen, J.; Mäkinen, V. Linear-time String Indexing and Analysis in Small Space. ACM Trans. Algorithms **2020**, 16, 17:1–17:54.
28. Goto, K.; Bannai, H. Space Efficient Linear Time Lempel–Ziv Factorization for Small Alphabets. In Proceedings of the 2014 Data Compression Conference, Snowbird, UT, USA, 26–28 March 2014; pp. 163–172.
29. Kärkkäinen, J.; Kempa, D.; Puglisi, S.J. Lightweight Lempel–Ziv Parsing. In International Symposium on Experimental Algorithms; LNCS; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7933, pp. 139–150.
30. Kosolobov, D. Faster Lightweight Lempel–Ziv Parsing. In International Symposium on Mathematical Foundations of Computer Science; LNCS; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9235, pp. 432–444.
31. Belazzougui, D.; Puglisi, S.J. Range Predecessor and Lempel–Ziv Parsing. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, Arlington, VA, USA, 10–12 January 2016; pp. 2053–2071.
32. Okanohara, D.; Sadakane, K. An Online Algorithm for Finding the Longest Previous Factors. In European Symposium on Algorithms; LNCS; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5193, pp. 696–707.
33. Prezza, N.; Rosone, G. Faster Online Computation of the Succinct Longest Previous Factor Array. In Conference on Computability in Europe; LNCS; Springer: Cham, Switzerland, 2020; Volume 12098, pp. 339–352.
34. Bannai, H.; Inenaga, S.; Köppl, D. Computing All Distinct Squares in Linear Time for Integer Alphabets. In Proceedings of the 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017), Warsaw, Poland, 4–6 July 2017; LIPIcs; Volume 78, pp. 22:1–22:18. Available online: https://link.springer.com/chapter/10.1007/978-3-662-48057-1_16 (accessed on 16 April 2021).
35. Jacobson, G. Space-efficient Static Trees and Graphs. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science, Research Triangle Park, NC, USA, 30 October–1 November 1989; pp. 549–554.
36. Clark, D.R. Compact Pat Trees. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 1996.
37. Baumann, T.; Hagerup, T. Rank-Select Indices Without Tears. In Proceedings of the Algorithms and Data Structures, 16th International Symposium, WADS 2019, Edmonton, AB, Canada, 5–7 August 2019; LNCS; Volume 11646, pp. 85–98.
38. Munro, J.I.; Navarro, G.; Nekrich, Y. Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, Barcelona, Spain, 16–19 January 2017; pp. 408–424.
39. Burrows, M.; Wheeler, D.J. A Block Sorting Lossless Data Compression Algorithm; Technical Report 124; Digital Equipment Corporation: Palo Alto, CA, USA, 1994.
40. Lempel, A.; Ziv, J. On the Complexity of Finite Sequences. IEEE Trans. Inf. Theory **1976**, 22, 75–81.
41. Fischer, J.; Mäkinen, V.; Navarro, G. Faster entropy-bounded compressed suffix trees. Theor. Comput. Sci. **2009**, 410, 5354–5364.
42. Manacher, G.K. A New Linear-Time “On-Line” Algorithm for Finding the Smallest Initial Palindrome of a String. J. ACM **1975**, 22, 346–351.
43. Apostolico, A.; Breslauer, D.; Galil, Z. Parallel Detection of all Palindromes in a String. Theor. Comput. Sci. **1995**, 141, 163–173.
44. Köppl, D. Exploring Regular Structures in Strings. Ph.D. Thesis, TU Dortmund, Dortmund, Germany, 2018.
45. Grossi, R.; Vitter, J.S. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM J. Comput. **2005**, 35, 378–407.
46. Fleischer, L.; Shallit, J.O. Words Avoiding Reversed Factors, Revisited. arXiv **2019**, arXiv:1911.11704.

**Figure 1.** The reversed LZ and the non-overlapping LZSS factorization of the string $T = \mathtt{abbabbabab}$. A factor $F$ is visualized by a rounded rectangle. Its coding consists of a mere character if it has no reference; otherwise, its coding consists of its referred position $p$ and its length $\ell$ such that $F = T[p-\ell+1..p]^{\mathsf{R}}$ for the reversed LZ factorization, and $F = T[p..p+\ell-1]$ for the non-overlapping LZSS factorization.
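
The coding described in this caption can be decoded with a short loop. The sketch below is an illustration under assumptions: the function name and the list-based representation of the coding are hypothetical, and the example coding was derived by hand from the definition for $T = \mathtt{abbabbabab}$ rather than read off the figure, so the referred positions shown in the figure may differ where several are valid.

```python
def decode_reversed_lz(coding):
    """Decode a reversed LZ coding: each item is either a single character or a
    pair (p, l) standing for the reverse of the already decoded T[p-l+1..p]."""
    out = []
    for item in coding:
        if isinstance(item, str):
            out.append(item)
        else:
            p, l = item
            assert l <= p <= len(out)  # the reference lies in the decoded part
            out.extend(reversed(out[p - l:p]))
    return "".join(out)


# Factors a | b | ba | bba | bab with hand-derived referred positions.
coding = ["a", "b", (2, 2), (3, 3), (5, 3)]
decoded = decode_reversed_lz(coding)
```

Because a referenced substring must end before the factor starts, the decoder only ever reads characters that have already been written, which is what the non-overlapping condition buys.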

**Figure 2.** Witness node $w$ of a referencing factor $F$ starting at text position $i$. Given that $j$ is the referred position of $F$, the witness $w$ of $F$ is the node in the suffix tree having (a) $F$ as a prefix of its string label and (b) the leaves with suffix numbers $2n-j$ and $i$ in its subtree. Lemma 1 shows that $w$ is uniquely defined to be the node whose string label is $F$.

**Figure 3.** A reversed-LZ factor $F$ starting at position $i$ in $R$ with a referred position $j \ge |F| + 1$. If $\mathtt{a} = \overline{\mathtt{a}}$ with $\mathtt{a}, \overline{\mathtt{a}} \in \Sigma$, then we could extend $F$ by one character, contradicting its definition as the longest prefix of $T[i..]$ whose reverse occurs in $T[1..i-1]$. Hence, $\mathtt{a} \ne \overline{\mathtt{a}}$ and $F$ is a right-maximal repeat.

**Figure 4.** Suffix tree of $T \cdot \# \cdot T^{\mathsf{R}} \cdot \$$ used in Section 3.2, where $T = \mathtt{abbabbabab}$ is our running example. The nodes are labeled by their preorder numbers. The suffix number of each leaf $\lambda$ is the underlined number drawn in dark yellow below $\lambda$. We trimmed the label of each edge to a leaf having more than two characters and display only the first character and the vertical dots ‘⋮’ as a sign of omission. The tree shows the state of Algorithm 1 after the first turn of both players. The nodes visited by Player 2 are colored in blue, and the phrase leaves are colored in green. Players 1 and 2 are each represented by a hand pointing to the leaf they visited during the first turn.

**Figure 5.** Continuation of Figure 4 with the state at the fifth turn of Player 1. In addition to the coloring used in Figure 4, witnesses are colored in red. In this figure, Player 1 has just finished her turn by making the node with preorder number 32 the witness $w$ of the leaf with suffix number 5. With $w$ we know that the factor starting at text position 5 has the length $\mathrm{str\_depth}(w)$ and that the next phrase leaf has suffix number 8. For visualization purposes, we left the hand of Player 2 below the leaf of her last turn.

**Figure 6.** State of our running example at termination of Algorithm 1. We have computed the bit vector $B_L$ of length $n = 11$ storing a one at the entries $1, 2, 3, 5$, and $8$, i.e., the suffix numbers of the phrase leaves, which are marked in green, and the bit vector $B_W$ of length 38 (the maximum preorder number of an $\mathsf{ST}$ node) storing a one at the entries $20, 22$, and $32$, i.e., the preorder numbers of the witnesses, which are colored in red. During the second pass described in Section 3.3, we compute $W$ storing the referred positions in the order of the witness ranks (left table).

**Figure 7.** Setting of Section 4.1. Nodes marked in $B_V$ are colored in blue. Curly arcs symbolize paths that can visit multiple nodes (which are not visualized). When visiting the lowest ancestor of $\lambda$ marked in $B_V$ for computing $\mathsf{LPnrF}[i-1]$, Player 1 determines $\tilde{w} = \mathrm{suffixlink}(w)$ such that she can skip the nodes on the path from the root to the leaf $\tilde{\lambda}$ for computing $\mathsf{LPnrF}[i]$ (these nodes are symbolized by the curly arc highlighted in yellow on the right). There are leaves $\lambda^{\mathsf{R}}$ and $\tilde{\lambda}^{\mathsf{R}}$ with suffix numbers of at least $2n-i+2$ and $2n-i+3$, respectively, since otherwise $w$ would not have been marked in $B_V$ by Player 2.

**Figure 8.** Computing $\mathsf{LPnrF}$ with [14] ([Algorithm 2]) as explained in Section 4.2. Starting at the leaf $\lambda_R$, we jump to the leftmost leaf $\lambda'$ with $\mathrm{lca}(\lambda', \lambda) = \mathrm{lca}(\lambda_R, \lambda)$. Then, we use the operation $\mathrm{max\_sufnum}(\mathcal{I})$ returning the leaf-rank of the leaf $\lambda'_R$ having the largest suffix number among the query interval $\mathcal{I} = [\mathrm{leaf\_rank}(\lambda)+1..\mathrm{leaf\_rank}(\lambda')-1]$. If $\mathrm{sufnum}(\lambda'_R) > 2n - i$, we recurse by setting $\lambda_R \leftarrow \lambda'_R$. The LCA of $\lambda'_R$ and $\lambda$ is at least as deep as the child $v$ of $u$ on the path towards $\lambda$ (the figure shows the case that $v = \mathrm{lca}(\lambda'_R, \lambda)$), and hence $\ell_R[i]$ is at least $\mathrm{str\_depth}(v)$ if we recurse.

**Table 1.** Complexity bounds of related approaches described in Section 1.2 for a selectable parameter $\epsilon \in (0, 1]$.

Using $(1+\epsilon) n \lg n + \mathcal{O}(n)$ bits of working space (excluding the read-only text T):

| Reference | Type | Time |
|---|---|---|
| [21] ([Corollary 3.7]) | overlapping LZSS | $\mathcal{O}(\epsilon^{-1} n)$ |
| [34] ([Lemma 6]) | $\mathsf{LPF}$ | $\mathcal{O}(\epsilon^{-1} n)$ |
| [22] ([Theorem 1]) | non-overlapping LZSS | $\mathcal{O}(\epsilon^{-1} n)$ |
| [22] ([Theorem 3]) | $\mathsf{LPnF}$ | $\mathcal{O}(\epsilon^{-1} n)$ |

Using $\mathcal{O}(\epsilon^{-1} n \lg \sigma)$ bits of working space:

| Reference | Type | Time |
|---|---|---|
| [21] ([Corollary 3.4]) | overlapping LZSS | $\mathcal{O}(\epsilon^{-1} n)$ |
| [34] ([Lemma 6]) | $\mathsf{LPF}$ | $\mathcal{O}(\epsilon^{-1} n \log_{\sigma}^{\epsilon} n)$ |
| [22] ([Theorem 1]) | non-overlapping LZSS | $\mathcal{O}(\epsilon^{-1} n \log_{\sigma}^{\epsilon} n)$ |
| [22] ([Theorem 3]) | $\mathsf{LPnF}$ | $\mathcal{O}(\epsilon^{-1} n \log_{\sigma}^{\epsilon} n)$ |

**Table 2.** Construction time and needed space in bits for the succinct suffix tree (SST) and compressed suffix tree (CST) representations, cf. [21] ([Section 2.2]).

| | SST | CST |
|---|---|---|
| Time | $\mathcal{O}(n/\epsilon)$ | $\mathcal{O}(\epsilon^{-1} n)$ |
| Space | $(2+\epsilon) n \lg n + \mathcal{O}(n)$ | $\mathcal{O}(\epsilon^{-1} n \lg \sigma)$ |

**Table 3.** Time bounds for certain operations needed by our LZ factorization algorithms. Although not explicitly mentioned in [21], the time for $\mathrm{prev\_leaf}$ is obtained with the Burrows–Wheeler transform [39] stored in the CST [38] ([A.1]) by constant-time partial rank queries, see [27] ([Section 3.4]) or [38] ([A.4]).

| Operation | SST Time | CST Time |
|---|---|---|
| $\mathrm{sufnum}(\lambda)$ | $\mathcal{O}(1/\epsilon)$ | $\mathcal{O}(n)$ |
| $\mathrm{str\_depth}(v)$ | $\mathcal{O}(1/\epsilon)$ | $\mathcal{O}(\mathrm{str\_depth}(v))$ |
| $\mathrm{suffixlink}(v)$ | $\mathcal{O}(1/\epsilon)$ | $\mathcal{O}(1)$ |
| $\mathrm{prev\_leaf}$ | $\mathcal{O}(1/\epsilon)$ | $\mathcal{O}(1)$ |

| $i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $T\#$ | a | b | b | a | b | b | a | b | a | b | # |
| $\mathsf{LPnrF}$ | 0 | 0 | 2 | 1 | 3 | 3 | 2 | 3 | 2 | 1 | 0 |
| $\mathsf{LPnF}$ | 0 | 0 | 1 | 3 | 3 | 3 | 2 | 3 | 2 | 1 | 0 |
| $\mathsf{LPrF}$ | 0 | 6 | 5 | 5 | 4 | 3 | 2 | 3 | 2 | 1 | 0 |


© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Köppl, D.
Reversed Lempel–Ziv Factorization with Suffix Trees. *Algorithms* **2021**, *14*, 161.
https://doi.org/10.3390/a14060161
