A Time-Identified R-Tree: A Workload-Controllable Dynamic Spatio-Temporal Index Scheme for Streaming Processing
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This article proposes a new indexing mechanism for streaming spatiotemporal data. The article is well written and easy to read. The theme addressed is very important, and I believe it will interest the IJGI audience. I would like to raise some issues to improve the article's quality.
1) There are minor typos: "spatial temporal" should be "spatiotemporal"; "grows while data inserted" should be "grows while data is inserted". At line 170, you should not start a sentence with "And".
2) There are some assumptions in the article without supporting evidence:
line 34: "well constructed dynamic index"
line 68: "supoprts resilient deploy"
3) The authors should state the dataset size and update frequency.
4) My main concern with this article is the experiments. Streaming spatiotemporal data demands a distributed and parallel architecture using streaming software such as Airflow, Kafka, Spark, etc. The authors ran their experiments on a single laptop, which does not seem appropriate for the problem addressed. It seems to me to be a toy example.
Author Response
Thank you for your comments. Please see the attachment (combined response-letter.pdf, revision.pdf, and diff.pdf).
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Methodology Section:
The architecture of TIRF is only briefly explained, and there is no detailed information about how the components have been implemented. Is the Stream-diverter implemented on top of an external tool? Is it open-source software? Is the Warehouse implemented with some kind of database? An object repository? Please provide a detailed explanation, and the source code for every part of your contribution.
The explanation of TIR should be clearer. Please clarify the time complexity of parent.remove() in Algo#2, line 16.
Experimental Section:
This is a key section in any CS/Algorithms/IS manuscript, as it is used to verify the author's claims. Unfortunately, the experiments are questionable because important information and details are missing. Any reader should be able to reproduce your experiments; however, this is not possible with your manuscript.
Strengths:
- Two large, real-life datasets are chosen, plus one dataset of random points. This is useful for validating different use cases.
Weaknesses:
- Source code should be made available. GitHub is a popular platform for doing so.
- Line 233: "Each dataset has been pre-processed". What was the objective? How was it done? What is the schema of the pre-processed data?
- The authors back their claims of performance benefits by comparing against Postgres. However, this is not an apples-to-apples comparison: the prototype is "executed purely in memory" (line 247), whereas PostgreSQL is not an in-memory system. You need to run your experiments against a comparable (in-memory) system.
- How is the time axis in Figure 7-a and Figure 8-a measured?
Comments on the Quality of English Language
Minor typos in the document may hinder clarity. A few examples are listed below. Please run a grammar checker.
Line 42: "To make sure every element to be received". I do not understand what you mean.
Line 170: "And RUM appends a update memo": use "an" instead of "a".
Line 288: Then we "teat" the search performance. Should this be "test"?
Line 288: There are two "type" of time range. Please use "types".
Miscellaneous:
Section 5 is titled "Experiment". I would suggest something like "Experimental Evaluation".
Author Response
Thank you for your comments. Please see the attachment (combined response-letter.pdf, revision.pdf, and diff.pdf).
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This manuscript suffers from notable deficiencies in presenting a coherent background and articulating the specific problems it seeks to resolve, which leaves the reader without a clear picture. The document requires clarification on the following aspects:
- What does the stored data consist of? Is it a single, temporally evolving location dataset or multiple streams of data?
- How are R-trees constructed? What methods are employed for data addition, removal, and retrieval?
- The experimental and comparative research should primarily focus on evaluating the proposed approach against other storage structures.
Author Response
Thank you for your comments. Please see the attachment (combined response-letter.pdf, revision.pdf, and diff.pdf).
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I am satisfied with the authors' response to my first-round review.
Author Response
We would like to thank you for your valuable comments and advice in improving the manuscript.
Reviewer 2 Report
Comments and Suggestions for Authors
Thanks for addressing the reviewers' comments.
Comment #1
In the experimental section, the results of the KD-tree experiments are presented in Fig. 8. What is this KD-tree? No explanation of it has been provided. You have two options to correct this issue:
1) Explain what a KD-tree is, as you did with the other types of indexes.
2) Remove the KD-tree from the charts and all other references.
Comment #2
Figures 8.A and 9.A look significantly different after re-running the experiments. You have updated the charts; however, the discussion has not been updated to reflect the changed results.
Author Response
We would like to thank you for your valuable comments and advice in improving the manuscript.
Comment #1: In the experimental section, the results of the KD-tree experiments are presented in Fig. 8. What is this KD-tree? No explanation of it has been provided.
At line 264, an explanation has been added: "Kd-tree (a K-dimension tree for 2D attribute)".
Comment #2: Figures 8.A and 9.A look significantly different after re-running the experiments. You have updated the charts; however, the discussion has not been updated to reflect the changed results.
After the experimental environment was changed, the relative positions of the curves in Fig. 8 and Fig. 9 did not change, but the curves became more stable, and the total time costs decreased consistently. The change in results therefore does not affect the original discussion. We apologize that the lack of explanation in our previous response led to this misunderstanding.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
An efficient spatiotemporal indexing model for streaming processing is proposed, but how to integrate it with existing database systems needs further research.
Author Response
We would like to thank you for your valuable comments and advice in improving the manuscript. Your suggestion has also motivated our future research, and we will do our best to meet your expectations.