Similarity Search on Semantic Trajectories Using Text Processing
Round 1
Reviewer 1 Report
This paper introduces the SETHE (SEmantic Trajectory HuntEr) framework, which aims at finding subtrajectories matching a given query in a database of semantically enriched trajectories. This framework is based on the encoding of semantically enriched trajectories as tuples of sequences containing strings of text. It accelerates the retrieval of results for complicated queries and enables the finding of trajectory subsequences that approximately match a query, ranking the pertinent trajectories by weights that can be attributed to each aspect of the query.
After an analysis of existing methods to query semantically enriched trajectory databases in section 2, the authors present their approach in section 3. In this section, they explain how semantically enriched trajectories and queries are encoded, define the notion of Aspect-Based Semantic Subsequence Trajectory, which is crucial in the execution of the queries, and give a detailed account of the querying method. Section 4 gives an example of how a query is run, showing the details of how matching subsequences are found and similarity scores are computed for each trajectory. Section 5 describes an experiment whose objective is to evaluate the performance of the method, regarding both the running time and the number of matching trajectories found. This experiment shows that the SETHE method is not consistently faster than a SPARQL query, but is more efficient than SPARQL on complicated queries, for which SPARQL running times are much more irregular. It also finds more matching trajectories than SPARQL, which is especially useful when there are no or few exact matches.
The encoding and especially the querying method are new and interesting; the capacity of the method to find partial matches is also an important aspect of its novelty. The paper is generally well written and well organized. In particular, section 4 is extremely useful, as it gives the reader a detailed view of how the process functions. It is a very helpful addition to section 3, where the descriptions of the algorithms are sometimes harder to understand. The experiment in section 5 is appropriate to evaluate the performance of the algorithm.
Here are a few points that warrant revision:
1) My only major problem is with subsection 3.2. The textual descriptions of the three algorithms should give the reader more detail on how they function.
In particular, on a first reading, it is difficult to understand how the algorithm combines the subsequences matching the first regular expression e₁ and the second regular expression e₂ to obtain a tree containing only subsequences matching both expressions.
The text of Algorithm 1 implies that a second call of the function createTree is made with the regular expression e₂ on the running tree, that is on the tree constructed by the first call of the function createTree, but the procedure of the function createTree only adds nodes to the tree, and does nothing to remove paths that do not satisfy e₂.
The example in Section 4 makes it clear that both calls of the createTree function create nodes on different levels of the tree, and that no level is affected by both calls. It is a key point that needs to be mentioned in the description of Algorithm 1. This also means that the query is supposed to contain only one of the name or the category for a given POI.
In general, the reader needs to grasp a few key points to really understand how each algorithm works, and it would be useful if the authors revised this subsection to stress these key points.
2) There are a few places with errors or where better formulations would improve the paper:
l.14 : enabled mobile object monitoring in geographic spaces, such as people, ... → enabled the monitoring in geographic spaces of mobile entities such as people, …
l.18 : with the […] (GPS) is being the most… → with the […] (GPS) being the most…
l.20 : Generally, the points of a GPS trajectory are not only ordered in time but also have timestamps
l.28-29 : I do not understand why time is listed in this enumeration
l.47-48 : I think « Examples of POI include... » should be « Examples of POI aspects include... »
l.79-81 : This sentence is technically not true, as the means of transport aspect for the first location is also not satisfied.
l.185 : When it is written that other aspects are optional, does that mean that they are optional for a complete trajectory (which then either has or does not have that aspect completed), or also for individual POIs in a trajectory (in which case some aspects may only be partially completed for a given trajectory)?
l.186-188 : The reader understands that aspect-based subsequence trajectories are introduced to enable the matching of trajectories with queries that do not necessarily encompass all points of the trajectories, but this is only implicitly written at lines 186-188 (with a more direct explanation at line 218). A more explicit formulation explaining the interest of aspect-based subsequence trajectories would make the article easier to follow.
Non-numbered lines after l.191 : There appears to be some confusion in this sentence. I would guess it should read : SST_{f1} represents a subsequence of ST_{f1}. […] the SST_{f1} POIs are sequential points of ST_{f1}
Table 2 : Euclidean(5.0, 1.0)=4.0
Table 3 : (?-),Explanation : until the last expression of the trajectory begins
l.243-248 : The definitions of W, D and L imply that no distance and no weight can be attributed to the aspect « Category ». Is it a technical choice made to ensure that at least one aspect has the role of an unmodifiable pivot, which facilitates the comparison of queries with trajectories, or is this a more philosophical choice, made with the view that the POI categories are the structuring elements of the queries and that one should not apply similarity measures on this particular aspect?
Equation (1) : mob b should be mod b
l.377 does not fit on the page
Section 4 Example : I do not see how the three trajectories STF, STG and STH match the regular expression e_2=(church|chapel)(.*)(museum), unless automatic translation is involved. To make the example clear, categories in the query and in the semantic trajectories should match (preferably be in English, but if the trajectories come from a real dataset, the query should be modified to match the categories of the dataset).
Table 17 : It is doubtful that categories museidipisa, cappelledipisa, torridipisa… accurately represent the semantic queries in Table 16, as the dataset has other categories, like torridifirenze, that could also match these semantic queries.
Table 17 : Q2 : two uses of .* have a different rendering. Besides, I do not understand why (cappelledipisa|chiesedipisa) appears twice in the regular expression e₂. Won’t the SETHE process only identify subtrajectories that visit two of these objects (which is not asked in the query)?
Q10 : There seems to be a .* missing at the end of the expression α₁.
3) Lastly, I find that allowing only exact matches for the weather (in part 3), and then the transportation (in part 5) aspect of trajectories is interesting for the description of the method, but the authors should stress that such a stringent condition as "equals" will only yield results if the words encoding the trajectories and the queries are chosen from a restricted vocabulary.
Another limitation is that SETHE does not seem to allow trajectories to have missing information for a given aspect (unless a default setting gives a similarity of 0 for the aspects for which information is missing). Would it be possible to adapt the method to query a database where a lot of aspect values are missing?
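The remarks above on the Section 4 example and on Table 17 (Q2) can be illustrated with a minimal sketch. It assumes a hypothetical encoding in which a trajectory is a space-separated string of POI categories; the two sample trajectories and the exact shape of the patterns are illustrative and are not taken from the manuscript. Only the category names and the expression e_2=(church|chapel)(.*)(museum) come from the points raised above.

```python
import re

# Hypothetical encoding: one trajectory = POI categories joined by spaces.
# Both sample trajectories are invented for illustration.
one_church = "chiesedipisa torridipisa museidipisa"
two_churches = "chiesedipisa cappelledipisa museidipisa"

# Assumed shape of a pattern in which the church/chapel alternation
# appears twice: it requires two church/chapel visits before the museum.
repeated = re.compile(
    r"(cappelledipisa|chiesedipisa)(.*)"
    r"(cappelledipisa|chiesedipisa)(.*)(museidipisa)"
)

# Same pattern with the alternation used once: one visit is enough.
single = re.compile(r"(cappelledipisa|chiesedipisa)(.*)(museidipisa)")

# English categories, as in e_2 from the Section 4 example.
english = re.compile(r"(church|chapel)(.*)(museum)")


def matches(pattern, encoded):
    """Return True if the pattern is found anywhere in the encoded trajectory."""
    return pattern.search(encoded) is not None


print(matches(single, one_church))      # one church, then a museum
print(matches(repeated, one_church))    # only one church/chapel visit
print(matches(repeated, two_churches))  # two church/chapel visits
print(matches(english, one_church))     # English pattern, Italian categories
```

Under these assumptions, the repeated alternation rejects a trajectory that visits a single church before the museum, and the English expression cannot match Italian category names without a translation step.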
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
This article proposes a new approach for representing and querying semantic trajectories based on text-processing techniques.
The authors describe a framework, called SETHE (SEmantic Trajectory HuntEr), that performs similarity queries on semantically enriched trajectory databases.
This article is of interest to the community, as it proposes new algorithms and a new way to query semantic trajectories that takes into account not only the POI names and categories, but also the semantic trajectory aspects.
Their approach is validated on a case study using TripBuilder, a trajectory dataset built from Flickr data, combined with Wikipedia data.
The main objective of this study is to compare their approach with SPARQL queries.
A good point is that the code is shared.
We have some questions:
- The authors give quantitative measures of performance, number of results per query and memory consumption, but they do not give quality measures such as recall and precision for each query.
Regarding the evaluation part, we would like to have more elements on recall and precision about the results of the different queries.
- About the query language: how can the complexity of expressing a query be reduced for end users, such as geographers or others who are not computer scientists?
- About your environment: how can end-user satisfaction be evaluated? Can you compare it to other frameworks on non-technical aspects?
- Is your environment generic enough to accept other datasets?
- This environment only works if the input data are enriched trajectories? What if the dataset contains only raw data?
- You do not discuss the disambiguation of terms. Did you never encounter this kind of problem?
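The precision and recall requested in the first question above could be computed per query along the following lines. This is a minimal sketch, assuming that a ground-truth set of relevant trajectories is available for each query; the function name, the trajectory identifiers and the convention of returning 0.0 for an empty set are all hypothetical choices, not part of SETHE.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: trajectory IDs returned by the system under evaluation.
    relevant:  trajectory IDs judged relevant (assumed ground truth).
    An empty denominator yields 0.0 (an assumed convention).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical identifiers, for illustration only.
retrieved_ids = ["t1", "t2", "t3", "t4"]
relevant_ids = ["t1", "t2", "t5"]

p, r = precision_recall(retrieved_ids, relevant_ids)
print(f"precision={p:.2f}, recall={r:.2f}")
```

Reporting both values for each query, for SETHE and for SPARQL, would show whether the additional approximate matches are actually relevant.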
Some mistakes
line 283 : "Finally, SETHE uses two regex functions over the database to look for the semantic trajectories that passed through the museum and the Leaning Tower of Pisa."
Why "the Leaning Tower of Pisa"?
line 390 : "The trajectories of the means of transport and rating aspects relate to STF, STG e STH..." : "e" → "and"
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
This is quite important research on improving semantic trajectory studies.
However, some improvements may enhance the quality of the paper.
Some wording needs to be changed:
GPS (Global Positioning System/US) - GNSS (Global Navigation Satellite System/UN). GPS provides XYZ coordinates, but the authors only utilize the XY (2D) information in this study.
In the conclusion, the authors mention that this study is based on the spatio-temporal approach, but I could not find any temporal facts.
The authors may need to categorize some elements based on time, such as temperature and choice of transportation.
Trajectories may be strongly impacted by time, and a more complex weight model could be developed to improve this study.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors have taken the remarks of the first report into account in a thorough manner. These changes make the article easier to understand, in particular in the detailed descriptions of their core algorithms.
I have two editing corrections :
Algorithm 4, l.11 : extractSubtrajectory should be extractSubsequence (it is a recursive call of the function)
l. 496-501 repeat l. 490-495
Reviewer 2 Report
Thank you for taking our comments and suggestions into account.
A small mistake remains: the same paragraph is repeated. Lines 490-495 are the same as lines 496-501.
Lines 496-501 : "Due to the ranking algorithm, SETHE can return more results, which implies a better recall, as shown in Figure 7. The first SETHE query results are the same as the SPARQL query; they are those results that fit perfectly with the user query. The blue bars in the graph in Figure 7 represents the result set returned by SETHE that is not retrieved by SPARQL. The trajectories in the blue area are similar to the user query specifications but do not fit perfectly with the SPARQL queries."