Reversed Lempel–Ziv Factorization with Suffix Trees

Abstract: We present linear-time algorithms computing the reversed Lempel–Ziv factorization [Kolpakov and Kucherov, TCS'09] within the space bounds of two different suffix tree representations. We can adapt these algorithms to compute the longest previous non-overlapping reverse factor table [Crochemore et al., JDA'12] within the same space, but pay a multiplicative logarithmic time penalty.


Introduction
The non-overlapping reversed Lempel–Ziv (LZ) factorization was introduced by Kolpakov and Kucherov [1] as a helpful tool for detecting gapped palindromes, i.e., substrings of a given text T of the form S R GS for two strings S and G, where S R denotes the reverse of S. This factorization is defined as follows: Given a factorization T = F 1 · · · F z of a string T, it is the non-overlapping reversed LZ factorization of T if each factor F x , for x ∈ [1 . . z], is either the leftmost occurrence of a character or the longest prefix of F x · · · F z whose reverse has an occurrence in F 1 · · · F x−1 . It is a greedy parsing in the sense that it always selects the longest possible such prefix as the candidate for the factor F x . The factorization can be written like a macro scheme [2], i.e., as a list storing either plain characters or pairs of referred positions and lengths, where a referred position is a previous text position from where the characters of the respective factor can be borrowed. Among all variants of such a left-to-right parsing that use the reverse of the formerly parsed part of the text as a reference, the greedy parsing achieves optimality with respect to the number of factors [3] ([Theorem 3.1]), since the reversed occurrence of F x can be the prefix of any suffix in F 1 · · · F x−1 , and thus fulfills the suffix-closed property [3] ([Definition 2.2]). Kolpakov and Kucherov [1] also gave an algorithm computing the reversed LZ factorization in O(n lg σ) time using O(n lg n) bits of space, by applying Weiner's suffix tree construction algorithm [4] on the reversed text T R . Later, Sugimoto et al. [5] presented an online factorization algorithm running in O(n lg² σ) time using O(n lg σ) bits of space. We can also compute the reversed LZ factorization with the longest previous non-overlapping reverse factor table LPnrF, which stores the length of the longest previous non-overlapping reverse factor for each text position.
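To make the greedy parsing concrete, the following is a minimal quadratic-time Python sketch that computes the factorization directly from the definition. The function name and the naive substring searches are ours for illustration; the algorithms discussed in this paper avoid such searches to reach linear time.

```python
def reversed_lz(t: str) -> list:
    """Naive greedy reversed LZ factorization (quadratic-time sketch)."""
    factors, i = [], 0  # i is the 0-based starting position of the next factor
    while i < len(t):
        length = 0
        # greedily extend: longest prefix of t[i:] whose reverse occurs
        # entirely in the already parsed part t[:i] (non-overlapping)
        while i + length < len(t) and t[i:i+length+1][::-1] in t[:i]:
            length += 1
        if length == 0:
            factors.append(t[i])  # fresh factor: leftmost occurrence of a character
            i += 1
        else:
            factors.append(t[i:i+length])
            i += length
    return factors
```

For example, `reversed_lz("abbaabba")` yields the factors `['a', 'b', 'ba', 'abba']`: the third factor `ba` is the reverse of the prefix `ab`, and the fourth factor `abba` is the reverse of the palindromic prefix `abba`.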
There are algorithms [6][7][8][9][10] computing LPnrF in linear time for strings whose characters are drawn from alphabets of constant size; the data structures they use include the suffix automaton [11], the suffix tree of T R , the position heap [12], and the suffix heap [13]. Finally, Crochemore et al. [14] presented a linear-time algorithm working with integer alphabets by leveraging the suffix array [15]. To find the longest gapped palindromes of the form S R GS with the length of G restricted to a given interval I, Dumitran et al. [16] ([Theorem 1]) modified the definition of LPnrF by restricting the distance of the previous reverse occurrence relative to the starting position of the respective factor to lie within I, and achieved the same time and space bounds as [14]. However, all mentioned linear-time approaches use either pointer-based data structures of O(n lg n) bits, or multiple integer arrays of length n, to compute LPnrF or the reversed LZ factorization.

Our Contribution
The aim of this paper is to compute the reversed LZ factorization in less space while retaining the linear time bound. For that, we follow the idea of Crochemore et al. [14] ([Section 4]), who built text indexing data structures on T · # · T R to compute LPnrF for an artificial character #. However, they need random access to the suffix array, which makes it hard to achieve linear time within working space bounds of o(n lg n) bits. We can omit the need for random access to the suffix array by a different approach based on suffix tree traversals. As precursors of this line of research, we can include the work of Gusfield [17] ([APL16]) and Nakashima et al. [18]. The former studies the non-overlapping Lempel-Ziv-Storer-Szymanski (LZSS) factorization [2,19], while the latter studies the Lempel-Ziv-78 factorization [20]. Although their techniques are similar to ours, they still need O(n lg n) bits of space. To actually improve the space bounds, we follow two approaches: On the one hand, we use the leaf-to-root traversals proposed by Fischer et al. [21] ([Section 3]) for the overlapping LZSS factorization [2], during which they mark visited nodes acting as signposts for candidates for previous occurrences of the factors. On the other hand, we use the root-to-leaf traversals proposed in [22] for the leaves corresponding to the text positions of T to find the lowest marked nodes, whose paths to the root constitute the lengths of the non-overlapping LZSS factors. Although we mimic two approaches for computing factorizations different from the reversed LZ factorization, we can show that these traversals on the suffix tree of T · # · T R help us to detect the factors of the reversed LZ factorization.
Our result is as follows:

Theorem 1. Given a text T of length n − 1 whose characters are drawn from an integer alphabet with size σ = n O(1) , we can compute its reversed LZ factorization
• in O(ε⁻¹ n) time using (2 + ε)n lg n + O(n) bits (excluding the read-only text T), or
• in O(ε⁻¹ n) time using O(ε⁻¹ n lg σ) bits,
for a selectable parameter ε ∈ (0, 1].

On the downside, we have to admit that the results are not based on new tools, but rather on a combination of already existing data structures with different algorithmic ideas. On the upside, Theorem 1 presents the first linear-time algorithm computing the reversed LZ factorization using a number of bits linear in the size of the input text T, which is o(n lg n) bits for lg σ = o(lg n). Interestingly, this has not yet been achieved for the seemingly easier non-overlapping LZSS factorization, for which we have O(ε⁻¹ n log σ n) time within the same space bound [22] ([Theorem 1]). We can also adapt the algorithm of Theorem 1 to compute LPnrF, but losing the linear time for the O(n lg σ)-bits solution:

Theorem 2. Given a text T of length n − 1 whose characters are drawn from an integer alphabet with size σ = n O(1) , we can compute a 2n-bits representation of its longest previous non-overlapping reverse factor table LPnrF
• in O(ε⁻¹ n) time using (2 + ε)n lg n + O(n) bits (excluding the read-only text T), or
• in O(ε⁻¹ n log σ n) time using O(ε⁻¹ n lg σ) bits,
for a selectable parameter ε ∈ (0, 1]. We can augment our LPnrF representation with an o(n)-bits data structure to provide constant-time random access to LPnrF entries.
We obtain the 2n-bits representation of LPnrF with the same compression technique used for the permuted longest common prefix array [23] ([Theorem 1]), see [24] ([Definition 4]) for several other examples.

Related Work
To put the above theorems into the context of space-efficient factorization algorithms that can also compute factor tables like LPnrF, we briefly list some approaches for different variants of the LZ factorization and of LPnrF. We give Table 1 as an overview. We are aware of approaches to compute the overlapping and non-overlapping LZSS factorization, as well as the longest previous factor (LPF) table LPF [25,26] and the longest previous non-overlapping factor table LPnF [14]. We can observe in Table 1 that only the overlapping LZSS factorization does not come with a multiplicative log σ n time penalty when working within O(ε⁻¹ n lg σ) bits. Note that the time and space bounds have an additional multiplicative ε⁻¹ penalty (unlike described in the references therein) because the currently best construction algorithms of the compressed suffix tree (described later in Section 2) work in O(ε⁻¹ n) time and need O(ε⁻¹ n lg σ) bits of space [27] ([Section 6.1]).
Regarding space-efficient algorithms computing the LZSS factorization, we are aware of the linear-time algorithm of Goto and Bannai [28] using n lg n + O(σ lg n) bits of working space. For n bits of space, Kärkkäinen et al. [29] can compute the factorization in O(n lg n lg lg σ) time, which got improved to O(n(lg σ + lg lg n)) by Kosolobov [30]. Finally, the algorithm of Belazzougui and Puglisi [31] uses O(n lg σ) bits of working space and O(n lg lg σ) time.
Another line of research is the online computation of LPF. Here, Okanohara and Sadakane [32] gave a solution that works in n lg σ + O(n) bits of space and needs O(n lg³ n) time. This time bound got recently improved to O(n lg² n) by Prezza and Rosone [33].

Structure of this Article
This article is structured as follows: In Section 2, we start with the introduction of the suffix tree representations we build on the string T · # · T R , and introduce the reversed LZ factorization in Section 3. We present in Section 3.2 our solution for the claim of Theorem 1 without the referred positions, which we compute subsequently in Section 3.3. Finally, we introduce LPnrF in Section 4, and give two solutions for Theorem 2. One is a derivation of our reversed-LZ factorization algorithm of Section 3.2.2 (cf. Section 4.1), the other is a translation of [14] ([Algorithm 2]) to suffix trees (cf. Section 4.2).

Preliminaries
With lg we denote the logarithm log 2 to base two. Our computational model is the word RAM model with machine word size Ω(lg n) for a given input size n. Accessing a word costs O(1) time.
Let T be a text of length n − 1 whose characters are drawn from an integer alphabet Σ = [1 . . σ] with σ = n O(1) . Given X, Y, Z ∈ Σ * with T = XYZ, then X, Y and Z are called a prefix, substring and suffix of T, respectively. We call T[i . .] the i-th suffix of T, and denote the substring of T beginning at position i and ending at position j by T[i . . j]. For j < i, [i . . j] and T[i . . j] denote the empty set and the empty string, respectively. The reverse T R of T is the concatenation T R := T[n − 1] · T[n − 2] · · · T[1]. Given a character c ∈ Σ and an integer j, the rank query T. rank c (j) counts the occurrences of c in T[1 . . j], and the select query T. select c (j) gives the position of the j-th c in T, if it exists. We stipulate that rank c (0) = select c (0) = 0. If the alphabet is binary, i.e., when T is a bit vector, there are data structures [35,36] that use o(|T|) extra bits of space and can compute rank and select in constant time, respectively. There are representations [37] with the same constant-time bounds that can be constructed in time linear in |T|. We say that a bit vector has a rank-support and a select-support if it is endowed with data structures providing constant-time access to rank and select, respectively.
From now on, we assume that there exist two special characters # and $ that do not appear in T, with $ < # < c for every character c ∈ Σ. Under this assumption, none of the suffixes of T · # and T R · $ has another suffix as a prefix. Let R := T · # · T R · $. R has length |R| = 2|T| + 2 = 2n.
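Building R is a one-liner; as a small sanity check, |R| = 2|T| + 2 = 2n indeed holds. The function name build_r is our invention for this sketch.

```python
def build_r(t: str) -> str:
    """Concatenate T, the separator #, the reversed text T^R, and the terminal $.

    Assumes # and $ do not occur in T (the text stipulates $ < # < c
    for every character c of the alphabet)."""
    assert '#' not in t and '$' not in t
    return t + '#' + t[::-1] + '$'
```

For instance, `build_r("ab")` returns `"ab#ba$"`, a string of length 6 = 2n for n = 3.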
The suffix tree ST of R is the tree obtained by compacting the suffix trie, which is the trie of all suffixes of R. ST has 2n leaves and at most 2n − 1 internal nodes. The string stored in a suffix tree edge e is called the label of e. The children of a node v are sorted lexicographically with respect to the labels of the edges connecting the children with v. We identify each node of the suffix tree by its pre-order number. We do so implicitly such that we can say, for instance, that a node v is marked in a bit vector B, i.e., B[v] = 1, but actually have B[i] = 1, where i is the pre-order number of v. The string label of a node v is defined as the concatenation of all edge labels on the path from the root to v; v's string depth, denoted by str_depth(v), is the length of v's string label. The operation suffixlink(v) returns the node with string label S[2 . .] or the root node, given that the string label of v is S with |S| ≥ 2 or a single character, respectively. suffixlink is undefined for the root node.
The leaf corresponding to the i-th suffix R[i . .] is labeled with the suffix number i ∈ [1 . . 2n]. We write sufnum(λ) for the suffix number of a leaf λ. The leaf-rank is the preorder rank (∈ [1 . . 2n]) of a leaf among the set of all ST leaves. For instance, the leftmost leaf in ST has leaf-rank 1, while the rightmost leaf has leaf-rank 2n. To avoid confusing the leaf-rank with the suffix number of a leaf, let us bear in mind that the leaf-ranks correspond to the lexicographical order of the suffixes (represented by the leaves) in R, while the suffix numbers induce a ranking based on the text order of R's suffixes. In this context, the function suffixlink(λ) returns the leaf whose suffix number is sufnum(λ) + 1. The reverse function of suffixlink on leaves is prev_leaf(λ) that returns the leaf whose suffix number is sufnum(λ) − 1, or 2n if sufnum(λ) = 1 (We do not need to compute suffixlink(λ) for a leaf with sufnum(λ) = 2n, but want to compute prev_leaf(λ) for the border case sufnum(λ) = 1.).
In this article, we focus on the following two ST representations: the compressed suffix tree (CST) [23,38] and the succinct suffix tree (SST) [21] ([Section 2.2.3]). Both can be computed in O(ε⁻¹ n) time, where the former is due to a construction algorithm given by Belazzougui et al. [27] ([Section 6.1]), and the latter due to [21] ([Theorem 2.8]), see Table 2. These two representations provide some of the above described operations in the time bounds listed in Table 3. Each representation additionally stores the pointer smallest_leaf to the leaf with suffix number 1, and supports the following operations in constant time, independent of ε: leaf_rank(λ) returns the leaf-rank of the leaf λ; depth(v) returns the depth of the node v, which is the number of nodes on the path between v and the root (exclusive) such that the root has depth zero; level_anc(λ, d) returns the level-ancestor of λ on depth d; and lca(u, v) returns the lowest common ancestor (LCA) of u and v.
As previously stated, we implicitly represent nodes by their pre-order numbers such that the above operations actually take pre-order numbers as arguments.

Table 3. Time bounds for certain operations needed by our LZ factorization algorithms. Although not explicitly mentioned in [21], the time for prev_leaf is obtained with the Burrows-Wheeler transform [39].

Reversed LZ Factorization
A factorization of T of size z partitions T into z substrings F 1 · · · F z = T. Each such substring F x is called a factor. A factorization is called a reversed LZ factorization if each factor F x is either the leftmost occurrence of a character or the longest prefix of F x · · · F z that occurs at least once in (F 1 · · · F x−1 ) R , for x ∈ [1 . . z]. A similar but much better-studied factorization is the non-overlapping LZSS factorization, where each factor F x is either the leftmost occurrence of a character or the longest prefix of F x · · · F z that occurs at least once in F 1 · · · F x−1 , for x ∈ [1 . . z].

Coding
We classify factors into fresh and referencing factors: We say that a factor is fresh if it is the leftmost occurrence of a character. We call all other factors referencing. A referencing factor F x has a reference pointing to the ending position of its longest previous non-overlapping reverse occurrence; as a tie break, we always select the leftmost such ending position. We call this ending position the referred position of F x . More precisely, the referred position of a factor F x = T[i . . i + ℓ − 1] is the smallest text position j with j ≤ i − 1 and T[j − ℓ + 1 . . j] R = T[i . . i + ℓ − 1]. If we represent each referencing factor as a pair consisting of its referred position and its length, we obtain the coding shown in Figure 1.
Although our tie breaking rule selecting the leftmost position among all candidates for the referred position seems up to now arbitrary, it technically simplifies the algorithm in that we only have to index the very first occurrence.
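Decoding such a coding is straightforward, which the following Python sketch illustrates; the pair layout (referred position, length) follows the coding above, while the function name and the hand-derived example coding for the text abbaabba are our assumptions.

```python
def decode(coding: list) -> str:
    """Decode a list of fresh characters and (referred position, length) pairs.

    A referred position j is the 1-based ending position of the reverse
    occurrence, so a referencing factor copies the length characters
    ending at j and reverses them."""
    t = []
    for item in coding:
        if isinstance(item, str):
            t.append(item)            # fresh factor: a plain character
        else:
            j, length = item          # referred position and factor length
            s = ''.join(t)
            t.append(s[j-length:j][::-1])  # copy backwards from position j
    return ''.join(t)
```

For example, `decode(['a', 'b', (2, 2), (4, 4)])` reconstructs the text `abbaabba`: the pair `(2, 2)` copies `ab` reversed, and `(4, 4)` copies `abba` reversed.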

Factorization Algorithm
In the following, we describe our factorization algorithm working with ST. This algorithm performs traversals on paths connecting leaves with the root, during which it marks certain nodes. One kind of these marked nodes are phrase leaves: A phrase leaf is a leaf whose suffix number is the starting position of a factor. We say that a phrase leaf λ corresponds to a factor F if the suffix number of λ is the starting position of F. We call all other leaves non-phrase leaves. Another kind are witnesses, a notion borrowed from [21] ([Section 3]): Witnesses are nodes that create a connection between referencing factors and their referred positions. We formally define them as follows: given that λ is the phrase leaf corresponding to a referencing factor F, the witness w of F is the LCA of λ and a leaf with suffix number 2n − j, where j is the ending position of a previous non-overlapping reverse occurrence of F. The smallest such j is the referred position of λ, which is needed for the coding in Section 3.1. See Figure 2 for a sketch of the setting. In what follows, we show that the witness of a referencing factor F is the node whose string label is F. Generally speaking, for each substring S of T, there is always a node whose string label has S as a prefix, but there may be no node whose string label is precisely S. This is in particular the case for the non-overlapping LZSS factorization [22] ([Section 3.1]). Here, we can make use of the fact that the suffix number 2n − j for a referred position j is always larger than the length of T, which we want to factorize.

Figure 2. Witness node w of a referencing factor F starting at text position i. Given j is the referred position of F, the witness w of F is the node in the suffix tree having (a) F as a prefix of its string label and (b) the leaves with suffix numbers 2n − j and i in its subtree. Lemma 1 shows that w is uniquely defined to be the node whose string label is F.

Lemma 1. The witness of each referencing factor exists and is well-defined.
Proof. To show that each referencing factor is indeed the string label of an ST node, we review the definition of right-maximal repeats: A right-maximal repeat is a substring of R having at least two occurrences R[i 1 . . i 1 + ℓ − 1] and R[i 2 . . i 2 + ℓ − 1] that are followed by two distinct characters, i.e., R[i 1 + ℓ] ≠ R[i 2 + ℓ]. A right-maximal repeat is the string label of an ST node since this node has at least two children; those two children are connected by edges whose labels start with R[i 1 + ℓ] and R[i 2 + ℓ], respectively. It is therefore sufficient to show that each referencing factor F is a right-maximal repeat. Given j is the referred position of a factor F starting at text position i, the reverse of F occurs as T[j − |F| + 1 . . j], and hence F itself occurs in R at position 2n − j (inside T R ), in addition to its occurrence at position i. For the case that j ≤ |F|, we have j = |F| since the reverse occurrence needs j − |F| + 1 ≥ 1; the occurrence of F at position 2n − j thus ends at position 2n − 1 and is followed by the unique character $, while the occurrence at position i is followed by a character of Σ ∪ {#}, and thus F is a right-maximal repeat. For the other case that j ≥ |F| + 1, assume that F is not a right-maximal repeat. Then the occurrences of F at positions i and 2n − j are followed by the same character, such that the reverse of T[i . . i + |F|] also occurs in T[1 . . i − 1], ending at position j ≤ i − 1. However, this means that F is not the longest reversed factor being a prefix of T[i . .], a contradiction. We visualized the situation in Figure 3.
Consequently, the referred position of a factor F x = T[i . . i + ℓ − 1] is the smallest text position j in T with j ≤ i − 1 such that one of the two equivalent conditions holds: (a) T[j − ℓ + 1 . . j] R = T[i . . i + ℓ − 1], or (b) the leaf with suffix number 2n − j is contained in the subtree rooted in the witness of F x .

Figure 3. A reversed-LZ factor F starting at position i in R with a referred position j ≥ |F| + 1. If a = ā with a, ā ∈ Σ, then we could extend F by one character, contradicting its definition to be the longest prefix of T[i . .] whose reverse occurs in T[1 . . i − 1]. Hence, a ≠ ā and F is a right-maximal repeat.

Overview
We explain our factorization algorithm in terms of a cooperative game with two players (We use this notion only for didactic purposes; the terminology must not be confused with game theory. Here, a player is basically a subroutine of the algorithm having private and shared variables.), whose pseudo code we sketched in Algorithm 1. Player 1 and Player 2 are allowed to access the leaves with suffix numbers in the ranges [1 . . n] and [n . . 2n − 1], respectively. Player 1 (resp. Player 2) starts at the leaf with the smallest (resp. largest) suffix number, and is allowed to access the leaf with the subsequently next (resp. previous) suffix number via suffixlink (resp. prev_leaf). Hence, Player 1 simulates a linear forward scan of the text T, while Player 2 simulates a linear backward scan of T R . Both players take turns at accessing leaves at the same pace. To be more precise, in the i-th turn, Player 1 processes the leaf with suffix number i, whereas Player 2 processes the leaf with suffix number 2n − i. In one turn, a player accesses a leaf λ and possibly performs a traversal on the path connecting the root with λ. For such a traversal, we use level ancestor queries to visit each node on the path in constant time. Whenever Player 2 accesses the leaf with suffix number n (shared among both players), the game ends; at that time both players access the same leaf (cf. Line 6 in Algorithm 1). In the following, we call this game a pass (with the meaning that we pass all relevant text positions). Depending on the allowed working space, our algorithm consists of one or two passes (cf. Section 3.3). The goal of Player 2 is to keep track of all nodes she visits. Player 2 does this by maintaining a bit vector B V of length 4n such that B V [v] stores whether a node v has already been visited by Player 2, where we represent a node v by its pre-order number when using it as an index of a bit vector. To keep things simple, we initially mark the root node in B V at the start of each pass. By doing so, after the i-th turn of Player 2 we can read any substring of T[1 . . i] R by a top-down traversal from the suffix tree root, visiting only nodes marked in B V . This is because of the invariant that the set of nodes marked in B V is upper-closed, i.e., if a node v is marked in B V , then all its ancestors are marked in B V as well.
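The bookkeeping of Player 2 can be sketched on a toy parent-pointer tree (our simplification; in the paper the traversal runs on ST via level ancestor queries). The walk stops at the first node already marked, which preserves the upper-closed invariant and charges each node at most once over all turns, giving O(n) total time for Player 2.

```python
def mark_path(parent: list, visited: list, leaf: int) -> int:
    """Walk from `leaf` towards the root, marking every unmarked node.

    Stops at the first node already marked in `visited` (the root is
    pre-marked, so the loop always terminates); since the marked set is
    upper-closed, all ancestors of that node are marked as well."""
    v = leaf
    while not visited[v]:
        visited[v] = True
        v = parent[v]
    return v  # the lowest previously visited ancestor
```

On a toy tree with root 0 and parent pointers `[0, 0, 0, 1, 1, 2]`, marking the path of leaf 3 visits nodes 3 and 1 before stopping at the pre-marked root; a later call for leaf 4 stops already at node 1.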
The goal of Player 1 is to find the phrase leaves and the witnesses. For that, she maintains two bit vectors B L and B W of length n and 4n, respectively, whose entries are marked similarly to B V by using the suffix numbers (∈ [1 . . n]) of the leaves accessed by Player 1 and preorder numbers of the internal nodes. We initially mark smallest_leaf in B L since text position 1 is always the starting position of the fresh factor F 1 . By doing so, after the i-th turn of Player 1 we know the ending positions of those factors contained in T[1 . . i], which are marked in B L . To sum up, after the i-th turn of both players we know the computed factors starting at text positions up to i thanks to Player 1, and can find the factor lengths thanks to Player 2, which we explain in detail in Section 3.2.2. There, we will show that the actions of Player 2 allow Player 1 to determine the starting position of the next factor. For that, she computes the string depth of the lowest ancestor marked in B V of the previously visited phrase leaf. See Appendix A.
As a side note: since we are only interested in the factorization of T[1 . . n − 1] (omitting the appended # at position n), we do not need Player 1 to declare the leaf with suffix number n a phrase leaf. We also terminate the algorithm when both players meet at position n without checking whether we have found a new factor starting at position n.
Algorithm 1: Algorithm of Section 3.2.2 computing the non-overlapping reversed LZ factorization. The function max_sufnum is described in Section 3.3.

One-Pass Algorithm in Detail
In detail, a pass works as follows: at the start, Player 1 and Player 2 select smallest_leaf and prev_leaf(prev_leaf(smallest_leaf)), i.e., the leaves with suffix numbers 1 and 2n − 1, respectively. Now the players take action in alternating turns, starting with Player 1. Nevertheless, we first explain the actions of Player 2, since Player 2 acts independently of Player 1, while Player 1's actions depend on Player 2.
Suppose that Player 2 is at a leaf λ R (cf. Line 20 of Algorithm 1). Player 2 traverses the path from λ R to the root upwards and marks all visited nodes in B V until arriving at a node v already marked in B V (such a node exists since we mark the root in B V at the beginning of a pass.). When reaching the marked node v, we end the turn of Player 2, and move Player 2 to prev_leaf(λ R ) at Line 23 (and terminate the whole pass in Line 6 when this leaf has suffix number n). The foreach loop (Line 20) of the algorithm can be more verbosely expressed with a loop iterating over all depth offsets d in increasing order while computing v ← level_anc(λ R , d) until either reaching the root or a node marked in B V . Subsequently, the turn of Player 1 starts (cf. Line 7). We depict the state after the first turn of Player 2 in Figure 4.
If Player 1 is at a non-phrase leaf λ, we skip the turn of Player 1, move Player 1 to suffixlink(λ) at Line 19, and let Player 2 take action. Now suppose that Player 1 is at a phrase leaf λ corresponding to a factor F. Then we traverse the path from the root to λ downwards to find the lowest ancestor w of λ marked in B V . If w is the root node, then F is a fresh factor (cf. Line 11), and we know that the next factor starts immediately after F (cf. Line 13). Consequently, the leaf suffixlink(λ) is a phrase leaf. Otherwise, w is the witness of λ, and str_depth(w) = |F| (cf. Line 14). Hence, sufnum(λ) + str_depth(w) is the suffix number of the phrase leaf λ′ that Player 1 will subsequently access. We therefore mark w and sufnum(λ′) = sufnum(λ) + str_depth(w) in B W and in B L , respectively (cf. Lines 16 and 18). We depict the fifth turn of our running example in Figure 5, during which Player 1 marks a witness node. Finally, we end the turn of Player 1, move Player 1 to suffixlink(λ) at Line 19, and let Player 2 take action. Every prefix of T[i . .] whose reverse occurs in T[1 . . i − 1] is therefore a prefix of the string label of a node marked in B V . In particular, we search the longest string label among those nodes, which we obtain with the lowest ancestor of λ marked in B V .

Time Complexity
First, let us agree that we never compute the suffix number of a leaf, since this is a costly operation for the CST (cf. Table 3). Although we need the suffix numbers at multiple occasions, we can infer them if each player maintains a counter for the suffix number of the currently visited leaf. A counter is initialized with 1 (resp. 2n − 1) and becomes incremented (resp. decremented) by one when moving to the succeeding (resp. preceding) leaf in suffix number order. This works since both players traverse the leaves linearly in the order of the suffix numbers (either in ascending or descending order).
Player 2 visits n leaves, and visits only unvisited nodes during a leaf-to-root traversal. Hence, Player 2's actions take O(n) overall time.
Player 1 also visits n leaves. Since Player 1 has no business with the non-phrase leaves, we only need to analyze the time spent by Player 1 for a phrase leaf corresponding to a factor F: If F is fresh, then the root-to-leaf traversal ends prematurely at the root, and hence we can determine in constant time whether F is fresh or not. If F is referencing, we descend from the root to the lowest ancestor w marked in B V , and compute str_depth(w) to determine the suffix number of the next phrase leaf (cf. Line 15 of Algorithm 1). Since depth(w) ≤ str_depth(w), we visit at most |F| + 1 nodes before reaching w. Computing str_depth(w) takes O(1/ε) time for the SST, and O(|F|) time for the CST. This seems costly, but we compute str_depth(w) for each factor only once. Since the sum of all factor lengths is n, we spend O(n + z/ε) time or O(n) time for computing all factor lengths when using the SST or the CST, respectively. We finally obtain the time bounds stated in Theorem 1 for computing the factorization.

Determining the Referred Position
Up to now, we can determine the reversed-LZ factors F 1 · · · F z = T with B L marking the starting position of each factor with a one. Yet, we still lack the referred positions necessary for the coding of the factors (cf. Section 3.1). To obtain them, we have two options: The first option is easier, but comes with the requirement for a support data structure on ST for the operation max_sufnum(v) returning the maximum among all suffix numbers of the leaves in the subtree rooted in v.
We can build such a support data structure in O(ε⁻¹ n) time (resp. O(ε⁻¹ n log σ n) time) using O(n) bits to support max_sufnum in O(ε⁻¹) time (resp. O(ε⁻¹ log σ n) time) for the SST (resp. CST); see [22] ([Section 3.3]). Being able to query max_sufnum, we can directly compute the referred position of a factor F when discovering its witness w during a turn of Player 1 by max_sufnum(w). max_sufnum(w) gives us the suffix number of a leaf that has already been accessed by Player 2 since Player 2 accesses the leaves in descending order with respect to the suffix numbers, and w must have already been accessed by Player 2 during a leaf-to-root traversal (otherwise w would not have been marked in B V ). Since R[max_sufnum(w) . . max_sufnum(w) + str_depth(w) − 1] = F R , the referred position of F is 2n − max_sufnum(w). Consequently, we can compute the coding of the factors during a single pass (cf. Line 17 of Algorithm 1), and are done when the pass finishes.
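The identity "referred position = 2n − max_sufnum(w)" can be checked with a brute-force stand-in for max_sufnum that enumerates the leaves below the witness as the suffixes of R having F as a prefix. This is toy code with our own naming, using 1-based positions as in the text; the actual support data structure of [22] avoids this enumeration.

```python
def referred_position(t: str, i: int, length: int) -> int:
    """Referred position of the factor T[i..i+length-1] (1-based),
    computed as 2n - max_sufnum over the suffixes of R prefixed by F."""
    n = len(t) + 1
    r = t + '#' + t[::-1] + '$'          # R = T # T^R $ with |R| = 2n
    f = t[i-1:i-1+length]                # the factor F
    # leaves in the witness's subtree = suffixes of R having F as a prefix
    sufnums = [s for s in range(1, 2*n + 1) if r[s-1:].startswith(f)]
    return 2*n - max(sufnums)            # largest suffix number = smallest j
```

For T = abbaabba, the factor `ba` starting at position 3 has referred position 2, and the factor `abba` starting at position 5 has referred position 4, matching the leftmost tie-breaking rule of Section 3.1.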
The second option does not need to compute max_sufnum and retains the linear time bound when using CST. Here, the idea is to run an additional pass, whose pseudo code is given in Algorithm 2. For this additional pass, we do the following preparations: Let z W be the number of witnesses, which is at most z since there can be multiple factors having the same witness. We keep B L and B W marking the phrase leaves and the witnesses, respectively. However, we clear B V such that Player 2 has again the job to log her visited nodes in B V . We augment B W with a rank-support such that we can enumerate the witnesses with ranks from 1 to at most z W , which we call the witness rank. We additionally create an array W of z W lg n bits. We want W[B W . rank 1 (w)] to store the referred position 2n − max_sufnum(w) ∈ [1 . . n − 1] for each witness w such that we can read the respective referred position from W when Player 1 accesses w. We assign the task for maintaining W to Player 2. Player 2 can handle this task by taking additional action when visiting a witness (i.e., a node marked in B W ) during a leaf-to-root traversal: When visiting a witness node w with witness rank i from a leaf λ, we write W[i] ← 2n − sufnum(λ) if w is not yet marked in B V (cf. Line 15 in Algorithm 2). Like before, Player 2 terminates her turn whenever she visits an already visited node. The actions of Player 1 differ in that she no longer needs to compute B L and B W : When Player 1 visits a phrase leaf λ, she locates the lowest ancestor w of λ marked in B V , which has to be marked in B W , too (as a side note: storing the depth of the witness of each phrase leaf in a list, sorted by the suffix numbers of these leaves, helps us to directly jump to the respective witness in constant time. We can encode this list as a bit vector of length O(n) by storing each depth in unary coding (cf. [22] ([Section 3.4])). 
Nevertheless, we can afford the root-to-witness traversals of Player 1 since we visit at most ∑ x=1..z |F x | = n nodes in total.). With the rank-support on B W , we can compute w's witness rank i, and obtain the referred position of λ with W[i] (cf. Line 10 of Algorithm 2). We show the final state after the first pass in Figure 6, together with W computed in the second pass.
Overall, the time complexity is O(ε⁻¹ n) time when working with either the SST or the CST. We use o(n) additional bits of space for the rank-support of B W , but costly z W lg n bits for the array W. However, we can bound z W by O(n lg σ/ lg n) since z W is at most the number of distinct reversed LZ factors, and by an enumeration argument [40] ([Thm. 2]), a text of length n can be partitioned into at most O(n/ log σ n) distinct factors. Hence, we can store W in z W lg n = O(n lg σ) bits of space. With that, we finally obtain the working space bound of O(ε⁻¹ n lg σ) bits for the CST solution as claimed in Theorem 1.

Computing LPnrF
The longest previous non-overlapping reverse factor table LPnrF has n entries, where LPnrF[i] is the length of the longest prefix of T[i . .] whose reverse occurs in T[1 . . i − 1]. The counterpart of LPnrF for the non-overlapping LZSS factorization is the longest previous non-overlapping factor table LPnF, storing at LPnF[i] the length of the longest prefix of T[i . .] that occurs in T[1 . . i − 1]. Like for LPnF, consecutive entries of LPnrF can decrease by at most one, i.e., LPnrF[i] ≥ LPnrF[i − 1] − 1. Hence, we can encode LPnrF in 2n bits by writing the differences LPnrF[i] − LPnrF[i − 1] + 1 ≥ 0 in unary, obtaining a bit sequence of (a) n ones for the n entries and (b) ∑ i=2..n (LPnrF[i] − LPnrF[i − 1] + 1) ≤ n many zeros. We can decode this bit sequence by reading the differences linearly because we know that LPnrF[1] = 0.
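A minimal sketch of this unary difference encoding follows; the function names are ours, and the example table is hand-computed from the definition for the text abbaabba.

```python
def encode_lpnrf(lp: list) -> str:
    """Unary-encode the differences LPnrF[i] - LPnrF[i-1] + 1 (each >= 0):
    for every entry one '1', preceded by that many '0's (the first entry
    LPnrF[1] = 0 contributes a bare '1')."""
    bits = ['1']                       # first entry: LPnrF[1] = 0
    for prev, cur in zip(lp, lp[1:]):
        d = cur - prev + 1
        assert d >= 0                  # consecutive entries drop by at most one
        bits.append('0' * d + '1')
    return ''.join(bits)

def decode_lpnrf(bits: str) -> list:
    """Read the zero-run lengths back and undo the differences."""
    lp = []
    for run in bits.split('1')[:-1]:   # zero-runs preceding each '1'
        lp.append(0 if not lp else lp[-1] + len(run) - 1)
    return lp
```

For the table `[0, 0, 2, 1, 4, 3, 2, 1]` (LPnrF of abbaabba), the encoding uses exactly 16 bits, i.e., 2 bits per entry, and round-trips losslessly.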

Algorithm 2:
Determining the referred positions in a second pass described in Section 3.3.

Adaptation of the Single-Pass Algorithm
Having an O(n)-bit representation of LPnrF gives us hope to find an algorithm computing LPnrF in a total working space of o(n lg n) bits. Indeed, we can adapt the algorithm devised for the reversed LZ factorization to compute LPnrF. For that, we just have to promote all leaves to phrase leaves such that the condition in Line 7 of Algorithm 1 is always true. Consequently, Player 1 performs a root-to-leaf traversal to find, for each leaf, its lowest ancestor marked in B_V. By doing so, however, the time complexity becomes O(n²), since we visit up to ∑_{i=1}^{n} LPnrF[i] = O(n²) nodes during these traversals (there are strings like T = a · · · a for which this sum becomes Θ(n²)).
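To see the quadratic blow-up concretely: for the unary string T = a · · · a of length n, the longest previous non-overlapping reverse factor at (0-based) position i has length min(i, n − i), because the reversed prefix a^ℓ needs ℓ characters both before and from position i on. The sum of all entries is therefore about n²/4. A quick check:

```python
def lpnrf_unary(n, i):
    """LPnrF for the unary string T = a^n at 0-based position i: the
    longest prefix of T[i..] whose reverse fits entirely into T[..i-1]
    has length min(i, n - i)."""
    return min(i, n - i)

# The sum of all entries grows quadratically in n:
total = sum(lpnrf_unary(64, i) for i in range(64))   # = 64**2 // 4
```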
To lower this time bound, we follow the same strategy as in [22].

Algorithm of Crochemore et al.
We can also run the algorithm of Crochemore et al. [14] ([Algorithm 2]) with our suffix tree representations to obtain the same space and time bounds as stated in Theorem 2. For that, let us explain this algorithm in suffix tree terminology: For each leaf λ with suffix number i, the idea for computing LPnrF[i] is to scan the leaves for the leaf λ′ with 2n − sufnum(λ′) being the referred position, and hence the string depth of lca(λ, λ′) is LPnrF[i]. To compute λ′, we approach λ from the left and from the right to find λ_L (resp. λ_R) having the deepest LCA with λ among all leaves to the left (resp. right) side of λ whose suffix numbers are greater than 2n − i. Then either λ_L or λ_R is λ′. Let ℓ_L[i] ← str_depth(lca(λ_L, λ)) and ℓ_R[i] ← str_depth(lca(λ_R, λ)). Then LPnrF[i] = max(ℓ_L[i], ℓ_R[i]), and the referred position is either 2n − sufnum(λ_L) or 2n − sufnum(λ_R), depending on whose respective LCA has the deeper string depth. Note that the referred positions in this algorithm are not necessarily always the leftmost possible ones.
Correctness. Let j be the referred position of the leaf λ with suffix number i such that R[i . .] and R[2n − j . .] have the LCP F of length LPnrF[i]. Due to Lemma 1, there is a suffix tree node w whose string label is F. Consequently, λ and the leaf with suffix number 2n − j are in the subtree rooted at w. Now suppose that we have computed λ_L and λ_R according to the algorithm described above. Without loss of generality, let us stipulate that the leaf λ′ with suffix number 2n − j is to the right of λ (the case to the left of λ works with λ_L by symmetry), and assume, for the sake of contradiction, that ℓ_R[i] < LPnrF[i]. Then w is a strict descendant of the node w′ being the LCA of λ and λ_R. Hence, λ′ is to the left of λ_R, i.e., λ′ is between λ and λ_R. Since sufnum(λ′) = 2n − j > 2n − i (as j < i), this contradicts the selection of λ_R as the closest leaf on the right-hand side of λ with a suffix number larger than 2n − i.
Finding the Starting Points. Finally, to find the starting points of λ_L and λ_R, being initially the leaves with the maximal suffix number to the left and to the right of λ, respectively, we use a data structure answering maxsuf_leaf(j_1, j_2), which returns the leaf with the maximum suffix number among all leaves whose leaf-ranks are in [j_1 . . j_2].
We can modify the data structure computing max_sufnum in Section 3.3 to return the leaf-rank instead of the suffix number (the data structure used for max_sufnum first computes the leaf-rank and then the respective suffix number). Finally, we need to take into account the border case that λ is the leftmost or the rightmost leaf in the suffix tree, in which case we only need to approach λ from the right side or from the left side, respectively.
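The maxsuf_leaf interface can be illustrated with a textbook sparse table over hypothetical suffix numbers listed in leaf-rank order. Note that the data structure described above is succinct, while this illustration spends O(n lg n) words; it serves only to pin down the interface:

```python
class MaxSufLeaf:
    """Illustration of maxsuf_leaf(j1, j2): a sparse table over the
    suffix numbers in leaf-rank order, returning the leaf-rank holding
    the maximum suffix number in the rank interval [j1..j2] (0-based,
    inclusive)."""

    def __init__(self, sufnums):
        self.sufnums = sufnums
        n = len(sufnums)
        self.table = [list(range(n))]      # level 0: intervals of length 1
        k = 1
        while (1 << k) <= n:
            prev = self.table[-1]
            half = 1 << (k - 1)
            row = []
            for i in range(n - (1 << k) + 1):
                a, b = prev[i], prev[i + half]
                row.append(a if sufnums[a] >= sufnums[b] else b)
            self.table.append(row)
            k += 1

    def query(self, j1, j2):
        # cover [j1..j2] with two overlapping power-of-two blocks
        k = (j2 - j1 + 1).bit_length() - 1
        a = self.table[k][j1]
        b = self.table[k][j2 - (1 << k) + 1]
        return a if self.sufnums[a] >= self.sufnums[b] else b

# Hypothetical suffix numbers listed in leaf-rank order:
ms = MaxSufLeaf([5, 2, 7, 1, 6, 3])
```

For instance, ms.query(1, 4) returns the leaf-rank whose suffix number is maximal among the ranks 1 to 4.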
The algorithm explained up to now already computes LPnrF correctly, but visits O(n) leaves per LPnrF entry, or O(n²) leaves in total. To improve this bound to O(n) leaves, we apply two tricks. To ease the explanation of these tricks, let us focus on the right-hand side of λ; the left-hand side is treated symmetrically.
Overview of the Algorithmic Improvements. Given that we want to compute ℓ_R[i], we start with a pointer λ̂_R to a leaf to the right of λ with suffix number larger than 2n − i, and approach λ with λ̂_R from the right until there is no leaf closer to λ on its right side with a suffix number larger than 2n − i. Then λ̂_R is λ_R, and we can compute ℓ_R[i] as the string depth of the LCA of λ_R and λ. If we scanned the suffix tree leaves linearly to reach λ_R with the pointer λ̂_R, this would give us O(n) leaves to process. The first trick lets us reduce the number of these leaves to at most 2ℓ_R[i] for computing ℓ_R[i]. The broad idea is that, with the max_sufnum operation, we can find a leaf closer to λ whose LCA with λ is at least one string depth deeper than the LCA with the previously processed leaf. In total, the first trick lets us compute LPnrF by processing at most ∑_{i=1}^{n} max(ℓ_L[i], ℓ_R[i]) = O(n²) leaves. In the second trick, we show that we can reuse the already computed neighboring leaves λ_L and λ_R by following their suffix links such that we process at most 2(ℓ_R[i + 1] − ℓ_R[i] + 1) leaves (instead of 2ℓ_R[i + 1]) for computing ℓ_R[i + 1]. Finally, by a telescoping sum, we obtain a linear number of leaves to process.

On termination, ℓ_R[i] = str_depth(lca(λ̂_R, λ)) because there is no leaf λ′ on the right of λ closer to λ than λ̂_R with str_depth(lca(λ′, λ)) > str_depth(lca(λ̂_R, λ)) and sufnum(λ′) > 2n − i. Hence, sufnum(λ̂_R) gives the referred position, and we continue with the computation of ℓ_R[i + 1]. See Figure 8 for a visualization. Broadly speaking, the idea is that the closer λ̂_R gets to λ, the deeper the string depth of lca(λ̂_R, λ) becomes. However, we have to stop when there is no closer leaf with a suffix number larger than 2n − i. So we first scan until reaching a leaf λ′ having the same lowest common ancestor with λ as λ̂_R has, and then search within the interval of leaves between λ and λ′ for the remaining leaf λ_R with the largest suffix number. We search for λ′ because we can jump from λ̂_R to λ′ with a range minimum query on the LCP array returning the index of the leftmost minimum in a given range. We can answer such a query with an O(n)-bit data structure in O(ε⁻¹) or O(ε⁻¹ log_σ n) time for the SST or the CST, respectively, and build it in O(ε⁻¹ n) or O(ε⁻¹ n log_σ n) time (cf. [22] ([Section 3.3]) and [41] ([Lemma 3]) for details). However, with this algorithm, we may still visit up to ∑_{i=1}^{n} max(ℓ_L[i], ℓ_R[i]) = O(n²) leaves in total. Here, the second trick comes into play: Suppose that we have computed ℓ_R[i − 1], where the final pointer λ̂_R had suffix number j. If ℓ_R[i − 1] > 0, we start the computation of ℓ_R[i] with λ̂_R ← suffixlink(λ̂_R): Then R[sufnum(λ̂_R) . .] is lexicographically larger than R[sufnum(λ) . .]; hence λ̂_R is to the right of λ with sufnum(λ̂_R) = j + 1 (generally speaking, given two leaves λ_1 and λ_2 whose LCA is not the root, leaf_rank(λ_1) < leaf_rank(λ_2) if and only if leaf_rank(suffixlink(λ_1)) < leaf_rank(suffixlink(λ_2))). Otherwise (ℓ_R[i − 1] = 0), we reset λ̂_R ← maxsuf_leaf(leaf_rank(λ), 2n). By doing so, we ensure that λ̂_R is always a leaf to the right of λ with sufnum(λ̂_R) > 2n − i (if such a leaf exists), and that the scan for ℓ_R[i] can continue where the scan for ℓ_R[i − 1] stopped. In total, we obtain an algorithm that visits O(n) leaves and spends O(ε⁻¹) or O(ε⁻¹ log_σ n) time per leaf when using the SST or the CST, respectively. We need O(n) bits of working space on top of ST since we only need the values ℓ_L[i − 1 . . i], ℓ_R[i − 1 . . i], λ_L, and λ_R to compute LPnrF[i]. We note that Crochemore et al. [14] do not need the suffix tree topology, since they only access the suffix array, its inverse, and the LCP array, which we translated to ST leaves and the string depths of their LCAs.
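For reference, LPnrF itself can be pinned down by a brute-force implementation straight from its definition (quadratic time, 0-based indexing; this is not the suffix-tree algorithm discussed above):

```python
def lpnrf_bruteforce(t):
    """Reference implementation of LPnrF from the definition (0-based):
    lp[i] is the largest l such that the reverse of t[i:i+l] occurs
    entirely inside t[:i] (non-overlapping). The reverse of t[i:i+l] is
    a suffix of the reverse of t[i:i+l+1], so occurrences are monotone
    in l and the incremental while loop is correct."""
    n = len(t)
    lp = [0] * n
    for i in range(n):
        l = 0
        while i + l < n and t[i:i + l + 1][::-1] in t[:i]:
            l += 1
        lp[i] = l
    return lp
```

Such a reference implementation is handy for testing the more involved linear-time algorithms against small inputs.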

Open Problems
There are some problems left open, which we would like to address in what follows:

Overlapping Reversed LZ Factorization
Crochemore et al. [14] ([Section 5]) gave a variation of LPnrF that supports overlaps, called the longest previous reverse factor table LPrF, where LPrF[i] is the maximum length ℓ such that T[i . . i + ℓ − 1] = T[j . . j + ℓ − 1]^R for a j < i. The respective factorization, called the overlapping reversed LZ factorization, was proposed by Sugimoto et al. [5] ([Definition 4]): A factorization F_1 · · · F_z = T is called the overlapping reversed LZ factorization of T if each factor F_x is either the leftmost occurrence of a character or the longest prefix of F_x · · · F_z that has at least one reversed occurrence in F_1 · · · F_x starting before F_x, for x ∈ [1 . . z]. We can compute the overlapping reversed LZ factorization with LPrF analogously to computing the (non-overlapping) reversed LZ factorization with LPnrF. As an example, the overlapping reversed LZ factorization of T = abbabbabab is a · bbabba · bab. Table 4 gives an example for LPrF.
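Sugimoto et al.'s definition can be turned directly into a brute-force parser. The sketch below is cubic-time and only meant to make the definition concrete; note that the reversed occurrence may overlap the factor itself, so the candidate lengths are not prefix-closed and all of them have to be tried:

```python
def overlapping_reversed_lz(t):
    """Brute-force parser following Sugimoto et al.'s definition: each
    factor is the leftmost occurrence of a character, or the longest
    prefix of the remaining text whose reverse occurs starting strictly
    before the factor (the occurrence may overlap the factor itself).
    Cubic time; for illustration only."""
    factors, i, n = [], 0, len(t)
    while i < n:
        best = 0
        for l in range(1, n - i + 1):
            rev = t[i:i + l][::-1]
            # the reversed occurrence must start at some j < i; it then
            # lies within F_1 ... F_x automatically since j + l < i + l
            if any(t[j:j + l] == rev for j in range(i)):
                best = l
        factors.append(t[i:i + max(best, 1)])  # fresh character if best == 0
        i += max(best, 1)
    return factors
```

On the example text, the parser reproduces the factorization a · bbabba · bab: the second factor has (bbabba)^R = abbabb, which occurs starting at the first text position.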
Since LPrF[i] ≥ LPnrF[i] by definition, the overlapping factorization is likely to have fewer factors. Unfortunately, this factorization cannot be expressed in a compact coding like the one of Section 3.1 that stores enough information to restore the original text. To see this, take a palindrome P and compute the overlapping reversed LZ factorization of aPa. The factorization creates the two factors a and Pa. The second factor is Pa since (Pa)^R = aP. However, a coding of the second factor needs to store additional information about P to support restoring the characters of this factor. It seems that we need to store the entire left arm of P, including the middle character for odd-length palindromes.
Besides searching for an efficient coding for the overlapping reversed LZ factorization, we would like to improve the working space bounds needed for its computation. All algorithms we are aware of [5,14] employ Manacher's algorithm [42,43] to find the maximal palindromes at each text position. To run in linear time, Manacher's algorithm stores the arm lengths of these palindromes in a plain array of n lg n bits. Unfortunately, we are unaware of any time/space trade-offs regarding this array.

Computing LPF in Linear Time with Compressed Space
Having a 2n-bit representation for four different kinds of longest previous factor tables (we can exchange LPnrF with LPrF in the proof of Lemma 2), we wonder whether it is possible to compute any of these variants in linear time with o(n lg n) bits of space. If we want to compute LPF or LPnrF within a working space of O(n lg σ) bits, it seems hard to achieve linear running time. That is because we need access to the string depth of the suffix tree node w for each entry LPF[i] (resp. LPnrF[i]), where w is the lowest node having the leaf λ with suffix number i and a leaf with a suffix number less than i (resp. greater than 2n − i for LPnrF) in its subtree, cf. [34] ([Lemma 6]) for LPF and the actions of Player 1 in Section 4.1 for LPnrF. While we need to compute str_depth(w) for determining the starting position of the subsequent factor (i.e., the suffix number of the next phrase leaf, cf. Line 16) in the reversed LZ factorization, the algorithms for computing LPF (cf. [34] ([Lemma 6]) or [44] ([Section 3.4.4])) and LPnrF work independently of the computed factor lengths and can therefore store a batch of str_depth queries. Our question would be whether there is a δ = O((n lg σ)/lg n) such that we can access δ suffix array positions with an O(n lg σ)-bit suffix array representation in O(δ) time. (We can afford storing δ integers of lg n bits in O(n lg σ) bits.) Grossi and Vitter [45] ([Theorem 3]) give a partial answer for sequential accesses to suffix array regions with large LCP values. Belazzougui and Cunial [24] ([Theorem 1]) faced the same problem when computing matching statistics, but could evade the evaluation of str_depth with backward search steps on the reversed Burrows–Wheeler transform. Unfortunately, we do not see how to apply their solution here since the referred positions of LPF and LPnrF have to belong to specific text ranges (which is not the case for matching statistics).

Applications in Compressors
Although it seems appealing to use the reversed LZ factorization for compression, we have to note that the bounds for the number of factors z are not promising:

Lemma 3. The size of the reversed LZ factorization can be as small as lg n + 1 and as large as n.

Proof. The lower bound is obtained for the unary string T = a · · · a: the factorization starts with a · a, and afterwards each factor F_x can match the reverse of the entire already parsed prefix F_1 · · · F_{x−1}, doubling the length of the parsed prefix. This yields the factorization a · a · aa · aaaa · · · consisting of lg n + 1 factors.
For the upper bound, we consider the ternary string T = abc · abc · · · abc, whose factorization consists only of factors of length one since T^R = cba · cba · · · cba contains no length-2 substring of T (namely ab, bc, or ca) as a substring (cf. [46] ([Theorem 5])).
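Both bounds of Lemma 3 can be replayed with a brute-force parser of the non-overlapping reversed LZ factorization (a direct, quadratic implementation of the definition, 0-based):

```python
def reversed_lz(t):
    """Brute-force non-overlapping reversed LZ parser: each factor is
    the leftmost occurrence of a character or the longest prefix of the
    rest whose reverse occurs entirely in the already parsed prefix."""
    factors, i, n = [], 0, len(t)
    while i < n:
        l = 1
        # extend the factor while the reverse of the longer prefix
        # still occurs completely before position i
        while i + l < n and t[i:i + l + 1][::-1] in t[:i]:
            l += 1
        factors.append(t[i:i + l])
        i += l
    return factors
```

For T = a^16 the parser emits the lg n + 1 = 5 factors a · a · aa · aaaa · aaaaaaaa, and for T = (abc)^4 it emits n = 12 factors of length one.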
Even for binary alphabets, there are strings for which z = Θ(n):

Lemma 4 ([46] (Theorem 9)). There exists an infinite text T whose characters are drawn from the binary alphabet such that, for every substring S of T with |S| ≥ 5, S^R is not a substring of T.
Funding: This work is funded by the JSPS KAKENHI Grant Numbers JP18F18120 and JP21K17701.

Data Availability Statement: Not applicable.
Acknowledgments: We thank a CPM'2021 reviewer for pointing out that it suffices to store W in z_W lg n bits of space in Section 3.3, and that the currently best construction algorithm of the compressed suffix tree indeed needs O(ε⁻¹ n) time instead of just O(n) time.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Flip Book
In this appendix, we provide a detailed execution of the algorithm sketched in