Open Access Article

Peer-Review Record

Trajectory Clustering and k-NN for Robust Privacy Preserving Spatiotemporal Databases

Algorithms 2018, 11(12), 207; https://doi.org/10.3390/a11120207

by Elias Dritsas¹,*, Maria Trigka¹, Panagiotis Gerolymatos² and Spyros Sioutas¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 30 October 2018 / Revised: 9 December 2018 / Accepted: 10 December 2018 / Published: 14 December 2018

(This article belongs to the Special Issue Humanistic Data Mining: Tools and Applications)

Round 1

Reviewer 1 Report

1. Recommendation: Major revisions

This paper investigates the k-anonymity of mobile users (where the anonymity set consists of the k nearest neighbours) based on real trajectory data. It constructs a vector of the form (x, y, g, v), where x and y are the spatial coordinates, g is the direction angle, and v is the velocity of a mobile user, and studies the problem in four-dimensional space. The experimental results, based on real spatial data sets, demonstrate the anonymity robustness (the so-called Vulnerability) of the proposed method.
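As an illustration of the setting summarized above, the following is a minimal sketch of how a user's k nearest neighbours could be found among four-dimensional (x, y, g, v) vectors. It is not the authors' implementation: the Euclidean distance, the function name `k_nearest`, and the sample values are assumptions made for illustration only.

```python
import math

def k_nearest(query, others, k):
    """Return the ids of the k users whose (x, y, g, v) vectors are closest to `query`."""
    # Assumption: plain Euclidean distance over the unscaled 4-D vectors.
    ranked = sorted(others.items(), key=lambda item: math.dist(query, item[1]))
    return [user_id for user_id, _ in ranked[:k]]

# Hypothetical feature vectors (x, y, angle in radians, speed) at one time-stamp.
users = {
    "u1": (0.0, 0.0, 0.10, 1.0),
    "u2": (1.0, 0.0, 0.10, 1.0),
    "u3": (0.0, 2.0, 0.20, 1.5),
    "u4": (5.0, 5.0, 1.50, 3.0),
}
query = users["u1"]
candidates = {uid: vec for uid, vec in users.items() if uid != "u1"}
anonymity_set = k_nearest(query, candidates, k=2)
print(anonymity_set)
```

Because the four attributes have different units, a real pipeline would probably normalize them before computing distances; the sketch skips that step for brevity.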

The work is interesting; however, it requires major revision.

Firstly, the paper needs a more relevant literature review to support the proposed model, covering clustering methods and their application to trajectories in spatio-temporal databases. It currently lacks the critical commentary and theoretical support needed to convince readers why K-Means and the k-nearest-neighbours algorithm were chosen for the clustering problem. The authors also do not provide a sufficiently critical literature review to explain why the proposed approach should yield a more satisfactory improvement, so its contribution and novelty relative to the literature are difficult to see. Readers need a convincing literature review to understand the reasoning behind why the employed models can solve the problem; this is the authors’ key contribution. In addition, the authors should review the drawbacks of existing approaches more critically and then clearly define the main stream of research: How did previous studies perform? Which methodologies did they employ? Which problems remain to be solved? Why is the proposed approach suitable for solving the critical problem? A more convincing literature review is needed to indicate clearly the state of the art in clustering problems.

By the way, only 8 of the cited journal papers were published after 2010, and none of the cited papers are from Algorithms. Please take note of the state of the art and of the audience of Algorithms. Also, since this is a journal paper, please avoid citing the “arXiv preprint arXiv:1501.00614” paper.

Secondly, in Section 2.4, the authors should detail how the proposed framework operates in the experimental results section, i.e., give a brief explanation that ties the text to the flow chart (Figure 1) and shows how the proposed methodology is applied in the experiments.

Thirdly, in Section 4, the authors should provide an insightful discussion that reveals the real merits of the proposed approach.

Finally, readers need a more convincing literature review to understand the necessity of the proposed model. I hope the authors can ultimately support their proposition with a comprehensive literature review, a clear flow-chart illustration, and insightful discussion.

Author Response

Point 1. Firstly, the paper needs a more relevant literature review to support the proposed model, covering clustering methods and their application to trajectories in spatio-temporal databases. It currently lacks the critical commentary and theoretical support needed to convince readers why K-Means and the k-nearest-neighbours algorithm were chosen for the clustering problem. The authors also do not provide a sufficiently critical literature review to explain why the proposed approach should yield a more satisfactory improvement, so its contribution and novelty relative to the literature are difficult to see. Readers need a convincing literature review to understand the reasoning behind why the employed models can solve the problem; this is the authors’ key contribution. In addition, the authors should review the drawbacks of existing approaches more critically and then clearly define the main stream of research: How did previous studies perform? Which methodologies did they employ? Which problems remain to be solved? Why is the proposed approach suitable for solving the critical problem? A more convincing literature review is needed to indicate clearly the state of the art in clustering problems.

Response: The literature review was revised, and the “Discussion” section was carefully rewritten. In addition, a new subsection entitled “Problem Definition” was added to “Preliminaries”, and the sections on “Clustering” and “Classification” were enriched. We believe that these changes now justify our choice and analysis.

Point 2. By the way, only 8 of the cited journal papers were published after 2010, and none of the cited papers are from Algorithms. Please take note of the state of the art and of the audience of Algorithms.

Response: We revised the references and replaced them with recent, relevant ones. We also added journal papers from Algorithms.

Point 3. Also, since this is a journal paper, please avoid citing the “arXiv preprint arXiv:1501.00614” paper.

Response: We removed this reference from our manuscript and replaced it with an appropriate one.

Point 4. Secondly, in Section 2.4, the authors should detail how the proposed framework operates in the experimental results section, i.e., give a brief explanation that ties the text to the flow chart (Figure 1) and shows how the proposed methodology is applied in the experiments.

Response: The SMaRT framework is described in detail in reference [12] of the resubmitted manuscript. This system creates a number of trajectories and stores them in a relational database. For each Cartesian point (x, y), the tuple (angle, velocity) is also computed, assuming that the trajectory is linear between consecutive time-stamps. In other words, the SMaRT system approximates users’ non-linear trajectories with linear ones from time-stamp to time-stamp, and this linear approximation makes the attributes (x, y) correlated with (angle, velocity). This assumption affected the results: we observed that the attributes (angle, velocity) did not enhance the robustness of either method. This is due to the curse of dimensionality. Relevant comments and discussion are already included in the Experiments section.

We processed the data sets (trajectory data points for a number of time-stamps and mobile users), stored in a .csv file, in the MATLAB environment. The methods presented in Section 2.4 and the flow chart in Figure 1 are independent procedures, and any clustering and classification method (or combination of them) can be applied.
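The derivation of (angle, velocity) from consecutive positions, under the piecewise-linear assumption described in this response, can be sketched as follows. This is an illustrative reading and not the SMaRT system's code: the function names, the fixed sampling interval `dt`, and the sample trajectory are assumptions.

```python
import math

def motion_attributes(p_prev, p_next, dt):
    """Angle (radians) and speed of the straight segment between two positions."""
    dx = p_next[0] - p_prev[0]
    dy = p_next[1] - p_prev[1]
    angle = math.atan2(dy, dx)           # direction of the linear segment
    speed = math.hypot(dx, dy) / dt      # segment length divided by elapsed time
    return angle, speed

def to_feature_vectors(points, dt=1.0):
    """Turn [(x, y), ...] sampled every `dt` into [(x, y, angle, speed), ...]."""
    vectors = []
    for prev, nxt in zip(points, points[1:]):
        angle, speed = motion_attributes(prev, nxt, dt)
        vectors.append((prev[0], prev[1], angle, speed))
    return vectors

# Hypothetical trajectory of one user over three time-stamps.
trajectory = [(0.0, 0.0), (3.0, 4.0), (3.0, 6.0)]
features = to_feature_vectors(trajectory, dt=1.0)
for vec in features:
    print(vec)
```

Since angle and speed are computed entirely from successive (x, y) pairs, they carry no information beyond the positions themselves, which is consistent with the correlation the authors mention.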

Point 5. Thirdly, in Section 4, the authors should provide an insightful discussion that reveals the real merits of the proposed approach.

Response: We added text at the end of the “Discussion” section that presents the advantages of our approach, along with its constraints and issues to be elaborated in future work.

Reviewer 2 Report

Please see attached document.

Comments for author File: Comments.pdf

Author Response

Point 1. A more descriptive definition of the problem is needed prior to the methods section. A preliminaries section should be included that adds a description of the problem, and a review of the classical clustering algorithms used.

Response: In the “Preliminaries” section, we added a subsection named “Problem Definition”, which gives a more comprehensive and substantial description of the studied problem that led us to design the methods presented in the “System model” section. It also explains the performance measure used to evaluate the compared approaches.

Point 2. It is not clear why vulnerability is defined as it is. Perhaps highlighting the problem first, and then providing a brief review of prior efforts for addressing location privacy can better highlight this.

Response: The definition of vulnerability is explained in “Problem Definition”.

Point 3. Is it possible that by studying only Euclidean measures of similarities between trajectories, nonlinear and higher-order measures are neglected that might compromise location privacy? This should be explored further.

Response: This is a very good comment. We will further examine the investigated approaches from this perspective in the extended version of this work.

Point 4. A more descriptive explanation for some of the assumptions regarding motion on roads and highways versus other motions should be provided.

Response: The assumptions about road networks stem from the SMaRT system, from which we extracted the trajectory data used in our experiments. Thus, further analysis is outside the scope of this work.

Point 5. Why use crisp techniques when fuzzy clustering with various levels of fuzziness and shape of the covariance matrices along with relative Euclidean distance may be more appropriate for the data?

Response: K-Means is expected to be a good option for exclusive clustering, whereas Fuzzy C-Means may give good results for overlapping clusters. Fuzzy C-Means assigns each mobile object to several clusters with varying degrees of membership. It also has a much higher time complexity than K-Means.
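The contrast drawn in this response can be sketched in a few lines: a crisp (K-Means style) assignment picks one cluster per object, while the standard Fuzzy C-Means membership formula spreads each object across all clusters. The sketch assumes fixed, already-known cluster centers, Euclidean distance, and a fuzzifier `m = 2`; these choices and all names are illustrative and are not taken from the manuscript.

```python
import math

def hard_assignment(point, centers):
    """K-Means style: index of the single nearest center."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-Means style: degree of membership in every cluster (sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:  # point coincides with one or more centers: share membership there
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]   # closer centers get larger weight
    total = sum(inverse)
    return [w / total for w in inverse]

# Hypothetical 2-D centers and one mobile object lying between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
obj = (1.0, 0.0)
label = hard_assignment(obj, centers)
memberships = fuzzy_memberships(obj, centers)
print(label, memberships)
```

The fuzzy variant evaluates a weight for every object-cluster pair at each iteration and also needs the exponentiation step, which is one reason it tends to cost more per iteration than the crisp assignment.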

Point 6. Could adding transformations and lagged samples improve or decrease the effectiveness of the proposed approach?

Response: This is a useful remark that requires further analysis and discussion. We take it into serious consideration, and such an approach will be a subject of our future work.

Point 7. Fig. 2 is difficult to read, as the text is too small and not clearly laid out; it can be improved.

Response: Fig. 2 was enlarged, and the new version is included in the manuscript.

Point 8. Fig. 3 should have units on the axis and perhaps some legend or description to highlight the different trajectories.

Response: Fig. 3 shows mobile users’ positions in the X-Y plane. Each color corresponds to the points of a different mobile user’s trajectory. Axis units and labels were added, and the enhanced version was incorporated in the resubmitted manuscript. Some colors in the figure are repeated, but each occurrence relates to a different user trajectory. This happens because the programming environment, e.g., MATLAB, uses a restricted number of colors to plot the trajectories, and the number of trajectories is much greater than the number of available colors. Ultimately, the number of trajectories is too large to use a legend entry for each one.

Point 9. Some language, grammar, and sentence-structure issues should be addressed: for instance, “Facebook, Twitter” should be “Facebook and Twitter”, “in case of clustering them or not” should be “in cases of with and without clustering”, “irrespective of the used method” should be “irrespective of the method used”, “k nearest neighbor” should be hyphenated as “k-nearest neighbor”, “ids” should be “labels”, etc. Overall, editing of the paper should improve the language.

Response: We revised the entire manuscript for grammar and sentence structure, taking the aforementioned points into consideration.

Round 2

Reviewer 1 Report

The authors have fully addressed all my concerns.

Author Response

There are no additional comments to address.

Reviewer 2 Report

The manuscript is significantly improved in the revision. However, there are some omissions that should be addressed in the paper:

1) How can this problem benefit from neural networks and deep learning approaches?

2) Is there a certain robustness to the technique to handle outliers, missing values, corrupted signals, noise, artifacts, and other disturbances?

3) Can there be hidden latent patterns that are difficult to extract and require more advanced processing?

Author Response

Point 1. How can this problem benefit from neural networks and deep learning approaches?

Response: We are studying how neural networks and deep learning approaches (see the literature listed below) could be used to rapidly construct clusters of mobile objects with similar motion patterns more accurately (with negligible false positives), thereby achieving more robust privacy-preserving techniques.

Point 2. Is there a certain robustness to the technique to handle outliers, missing values, corrupted signals, noise, artifacts, and other disturbances?

Response: The proposed method cannot address the aforementioned issues effectively. For this reason, we aim to use deep learning techniques.

Point 3. Can there be hidden latent patterns that are difficult to extract and require more advanced processing?

Response: The suggested method cannot address such cases effectively. We plan to use deep learning techniques, which we are studying through the literature listed below.

Studying Literature

Algorithms 2018, 11(2), 21. https://doi.org/10.3390/a11020021

Algorithms 2018, 11(12), 192. https://doi.org/10.3390/a11120192

Algorithms 2018, 11(10), 158. https://doi.org/10.3390/a11100158

https://arxiv.org/pdf/1610.04794.pdf

SIGMOD Record, June 2016 (Vol. 45, No. 2). https://sigmodrecord.org/publications/sigmodRecord/1606/pdfs/04_vision_Wang.pdf

SIGMOD’18, June 10–15, 2018, Houston, TX, USA. http://pages.cs.wisc.edu/~anhai/papers1/deepmatcher-sigmod18.pdf

ACM SIGKDD ’16, August 13–17, 2016, San Francisco, CA, USA. https://www.kdd.org/kdd2016/papers/files/rfp0191-wangAemb.pdf

Proceedings of the VLDB Endowment, Vol. 9, No. 13. http://www.vldb.org/pvldb/vol9/p1425-boehm.pdf

Proceedings of the VLDB Endowment, Vol. 7, No. 13. http://www.vldb.org/pvldb/vol7/p1772-tencent.pdf

Round 3

Reviewer 2 Report

The authors have sufficiently revised the manuscript to address my concerns. I have no more questions, comments, or suggestions.
