Article
Peer-Review Record

Reversed Lempel–Ziv Factorization with Suffix Trees

Algorithms 2021, 14(6), 161; https://doi.org/10.3390/a14060161
by Dominik Köppl
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 17 April 2021 / Revised: 13 May 2021 / Accepted: 20 May 2021 / Published: 21 May 2021
(This article belongs to the Special Issue Combinatorial Methods for String Processing)

Round 1

Reviewer 1 Report

The paper describes novel (almost) linear algorithms computing the reversed Lempel-Ziv factorization and the Longest Previous non-overlapping reversed Factor (LPnrF) array using succinct and compressed suffix trees. The paper is in line with a series of works by the same author in which succinct and compressed suffix trees are used to compute different Lempel-Ziv-like structures on strings. 

The considered problems are not particularly well motivated, in my opinion. The reversed Lempel-Ziv factorization seems of little use besides the computation of gapped palindromes, for which it was introduced. Further, none of the techniques used for the presented solutions is novel, and mostly cosmetic or incremental changes were needed to adapt them for this work (this fact is, however, adequately emphasized in the text): the bottom-up traversals of the suffix tree with markings appeared in the LZ-CISS algorithm by the same author, and the top-down descent using level ancestor queries was presented, for instance, in the recent paper on the computation of the LPF array by the same author. 

Nevertheless, the addressed problems have been attracting the attention of many researchers in the last decade, and the new results close obvious open problems in this line of investigation. The text is well written and is full of examples and illustrations clarifying the content, so it is a pleasure to read. Weighing all the pros and cons, I recommend accepting the paper. 

Minor suggestions and typos:
line 62: "an LPnrF entry" -> "LPnrF entries"
line 83: "an solution" -> "a solution"
lines 133-137: what are the lambda, u, v here? pre-order numbers of the vertices? or some identifiers?
page 4, footnote: "case that" -> "case"
Figure 1, caption: "T[p-\ell-1 .. p]^R" -> "T[p-\ell+1 .. p]^R"
line 196: "all node" -> "all nodes"
line 206: is it not the case that we know the ending positions of those factors that start at positions from [1..i], not only those that are contained in T[1..i]?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The reversed Lempel-Ziv factorization represents a string w as a concatenation of factors F_1F_2...F_n
where each F_k is either a letter, when this is the left-most occurrence of this letter,
or F_k is the longest prefix of F_kF_{k+1}...F_n which occurs also in F_{k-1}^rF_{k-2}^r...F_{1}^r, where w^r is the reverse of w.
It is known that this problem can be solved in linear time.
The paper presents linear running time algorithms for this problem, while trying to minimize the used space.
Since the computation of such factorization always involves some sort of suffix array,
the goal is to use the space as (some sort of) compressed suffix array.
Then this approach is generalized to the computation of a table,
which for a position i returns the length and position of the longest substring that is a prefix of w[i+1..] and w[1..i-1]^r;
this increases the computation time to n log n.
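
The definition summarized above can be illustrated with a brute-force sketch. This is not the suffix-tree algorithm of the paper: it takes far more than linear time and no care over space, and is only meant to make the objects concrete. The function names are hypothetical, indices are 0-based (the summary above is 1-based), and the table is assumed to store, for each position i, the length of the longest prefix of w[i..] whose reverse occurs in the part of w before position i; only lengths are returned, not positions.

```python
def longest_reversed_match(text, i):
    """Length of the longest prefix of text[i:] whose reverse occurs in text[:i].

    A string F occurs in the reversed prefix exactly when F reversed
    occurs in the prefix itself, so we search the reversed prefix directly.
    """
    reversed_prefix = text[:i][::-1]
    length = 0
    # If a prefix of text[i:] matches, every shorter prefix matches too,
    # so extending one character at a time finds the maximum.
    while i + length < len(text) and text[i:i + length + 1] in reversed_prefix:
        length += 1
    return length


def reversed_lz_factorization(text):
    """Greedy reversed Lempel-Ziv factorization (naive sketch)."""
    factors = []
    i = 0
    while i < len(text):
        # A letter with no earlier occurrence forms a factor of length 1.
        length = max(1, longest_reversed_match(text, i))
        factors.append(text[i:i + length])
        i += length
    return factors


def lpnrf_lengths(text):
    """Table of longest-match lengths for every position (naive sketch)."""
    return [longest_reversed_match(text, i) for i in range(len(text))]
```

For the string "abba", the sketch yields the factors "a", "b", "ba": the last factor "ba" is the reverse of the prefix "ab". Because the reference is read from a prefix that ends before the factor starts, a factor and its reversed source never overlap.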


This goal is reached. The representations of the suffix arrays are treated as black boxes and the technique used to compute the reversed LZ factorization
uses a recent technique of traversing the suffix array (first developed for the usual LZ factorization).
Some technical details need to be solved on the way.


The reversed Lempel-Ziv factorization is a nice and reasonable object,
and the minimization of space is a usual goal in string-oriented algorithmics.
So I find the problem reasonable and worth investigating.
On the downside, all the crucial tools of the solution are known and the paper mostly presents a neat blend of known techniques
together with technical skills needed to merge them together.
Still, I think that the topic and the results are of appropriate quality and they deserve a publication.


My main reservation is the writing. While linguistically and stylistically it is very good,
I am puzzled by the use of formalism, which often replaces plain-word description
and makes reading a difficult and time-consuming task.
I would ask the author to try to improve and simplify the writing.
In particular I gave up on one of the arguments.


Detailed comments:
My main complaint is about the usage of the formalisms.
I would propose to use the formalism together with some intuitive description/plain words, not instead.
Lines 161-165 are a prime example: it is a long, notation-heavy description of a very clear and natural notion.
 

I am thankful for the detailed examples, but their size makes them cumbersome.
I gave up on trying to understand them.


Please make the definition of the problem more precise in the introduction.

Perhaps comment, whether such LZ factorization is optimal, when it is computed greedily (I guess it is).

36
"we follow the following"

58
Please expand the LPnrF acronym somewhere.

Please note somewhere that the suffix tree has the children sorted by the same order in each node.
This is always the case in any application, but this is not canon, in the sense that textbooks do not explicitly require this.

125
As you use suffixlink, perhaps use suffixlink^{-1} instead of prev_leaf?

151
Could you comment on why you always select the leftmost position?

161-165
This formal definition is very difficult to parse. Please try to rephrase it using plainer words.

161
is the leaf corresponding to F

In your definition, corresponding leaves do not correspond to any particular factor.

Lemma 1
Please move the comment after the lemma or before it. Also, introduce somehow, what could go wrong and what we need to prove.

178 The comment is a bit misleading: the reason why in LZSS there could be overlaps is because we read both substrings from left-to-right.
When the earlier one is read in the opposite direction, there cannot be any overlap.

Figure 2 What is F^T? I think it should be F^R.

205
Remove the comma after "Player 1".

207
and can find the next factor thanks to Player 2.

Please explain why and how. Sentence unclear.

259
I think that you can visit |F|+1 nodes, but this is still fine.
Please mention that the nodes in B_L form an upper-closed set, i.e. when v is in the set, all its ancestors are in it as well.

309: but simplifies the analysis for T# having exactly n characters

Unclear, do you mean T#[i..] instead of T#?

You lost me in 4.2. What does the algorithm do exactly, and why is it correct?
Please explain in more detail, starting from the "first trick". In particular,
try to explain without the usage of formalisms when they are not needed.

l372 Shouldn't the equality be an inequality?

l437-439
Unclear

443-444
Unclear, how is \gamma quantified? Is \gamma = 1 OK?

5.3 Section name is a bit strange

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The author presents an algorithm for computing the reversed Lempel-Ziv factorization using a combined visit either of a compact suffix tree or of a succinct suffix tree representation of the given input text (on an integer alphabet) concatenated to its reverse. The time bound matches that needed to construct the compact/succinct suffix tree. He also shows how to adapt the algorithm in order to compute (a succinct representation of) the longest previous non-overlapping reverse factor table. 

The manuscript does not present novel techniques; however, it combines in a non-trivial way several ideas or tricks already presented. The aim is more theoretical than practical, which is fine for this journal. I particularly appreciated the exposition and the figures, which greatly help the reader understand the algorithms. 

I have no major concerns and I propose to accept the paper in the present form. 

Minor issue: line 241 str_depth(w) --> string label of w 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
