Article
Peer-Review Record

Multi-Scale Massive Points Fast Clustering Based on Hierarchical Density Spanning Tree

ISPRS Int. J. Geo-Inf. 2023, 12(1), 24; https://doi.org/10.3390/ijgi12010024
by Song Chen 1, Fuhao Zhang 1, Zhiran Zhang 2,3, Siyi Yu 4, Agen Qiu 1, Shangqin Liu 1,5 and Xizhi Zhao 1,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 8 November 2022 / Revised: 18 December 2022 / Accepted: 12 January 2023 / Published: 14 January 2023
(This article belongs to the Special Issue GIS Software and Engineering for Big Data)

Round 1

Reviewer 1 Report

This paper proposes a method called Multi-Scale Massive Points Fast Clustering based on Hierarchical Density Spanning Tree (MSCHT), but two questions remain.

(1) Because MSCHT also has a tree-link structure, links are cut when the distance is longer than a specific threshold. How is this threshold determined?

(2) More comprehensive experiments on real and synthetic data are recommended to verify the clustering quality, efficiency, and scalability of the algorithm.
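To make question (1) concrete, the following is a minimal sketch of threshold-based link cutting on a tree. It is not the authors' implementation: the function name `cut_tree`, the sample points, and the tree links are all hypothetical, and Euclidean distance is assumed as the link length.

```python
from math import dist


def cut_tree(points, edges, threshold):
    """Label points by connected component after dropping links longer than threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in edges:
        # Keep a link only if its length does not exceed the threshold.
        if dist(points[a], points[b]) <= threshold:
            parent[find(a)] = find(b)

    # Map each component root to a consecutive cluster label.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


# Hypothetical data: five points on a line, linked as a chain.
points = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

fine = cut_tree(points, edges, threshold=2)    # [0, 0, 1, 1, 2]
coarse = cut_tree(points, edges, threshold=5)  # [0, 0, 0, 0, 1]
```

The same tree yields three clusters at threshold 2 and two clusters at threshold 5, which is why the rule for choosing the threshold directly determines the result at each scale.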

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

This research furnishes a method (MSCHT) for fast clustering of massive point data in multi-scale maps. The methodology is clearly illustrated and analyzed. One question:

Which density calculation method (RD, KNN, …) is better for MSCHT, or does the choice of density calculation strategy depend on the case studied? It would be better if the authors included a related discussion in the current version.
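As an illustration of why the choice may depend on the data, here is a minimal sketch of two common local density estimates: a fixed-radius neighbor count and an inverse k-th-nearest-neighbor distance. How these correspond to the manuscript's RD and KNN options is an assumption, and the function names and sample points are hypothetical.

```python
from math import dist, inf


def radius_density(points, i, radius):
    """Number of other points within `radius` of point i."""
    return sum(
        1 for j, q in enumerate(points)
        if j != i and dist(points[i], q) <= radius
    )


def knn_density(points, i, k):
    """Inverse of the distance from point i to its k-th nearest neighbor."""
    distances = sorted(
        dist(points[i], q) for j, q in enumerate(points) if j != i
    )
    kth = distances[k - 1]
    return inf if kth == 0 else 1.0 / kth


# Hypothetical data: three nearby points and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]

dense_r = radius_density(points, 0, radius=1.5)   # 2
sparse_r = radius_density(points, 3, radius=1.5)  # 0
dense_k = knn_density(points, 0, k=2)             # 1.0
sparse_k = knn_density(points, 3, k=2)
```

The radius count gives every isolated point the same density of zero, while the k-nearest-neighbor estimate still ranks isolated points against one another, so the two strategies can behave differently on data with heterogeneous density.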

Some details:

Line 46. “The higher the number of grids,…” -> The more the number of grids…

Lines 62-63. “…based on this tree structure [10] and [11].” The main points of [11] that are referenced here should be briefly summarized.

Figure 1. typing error in “Cutting amd merging linked points”.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report

The scale effect is a key issue in spatial analysis, and is particularly pronounced in large spatial datasets. Multi-scale clustering can yield full recognition of point patterns and the detection of hierarchical spatial structures. This paper proposes a multi-scale clustering algorithm based on a hierarchical density spanning tree, namely MSCHT. It constructs a spanning tree according to the local density, and generates multi-scale clusters using different distance thresholds. The time efficiency and clustering quality have been demonstrated on two real-world datasets, i.e., building points in Beijing and landslide hazard sites in Pingwu County, Sichuan Province. However, there are too many spelling problems that should be corrected, and the experiments seem unfair. I would like to point out some detailed concerns and questions about this work.

 

Spelling Problem

(1) Please unify the format of abbreviations. The authors capitalize the first letter of each word of the full name given in brackets, such as CFSFDP (Clustering by Fast Search and Find of Density Peaks) (Line 56, Page 2). However, sometimes they do not capitalize every word, as in OPTICS (Ordering points to identify the clustering structure) (Line 104, Page 3). There are many more abbreviations like this. The authors should unify the format.

(2) The full names of some abbreviations have not been introduced, such as STING (Line 112, Page 3), CLIQUEENCLUS (Line 123, Page 3), MST (Line 262, Page 7), and so on.

(3) There are many misspelled words. “Web map clustering” should be “web map clustering” (Line 322, Page 10); “Leadlet-webmap” should be “Leaflet-webmap” (Table 2); “dc” in Eq. (3) should be corrected to “eps”, since “dc” is the distance threshold. “eps” (Table 2) and “Eps” (Table 3), as well as “k” (Line 98, Page 3) and “K” (Tables 2 and 3), should be unified. The authors should check the spelling more carefully.

 (4) There is no “” in English. The special punctuation in Eq. (1) should be corrected as commas “,”.

(5) “k” in K-means denotes the number of clusters (Line 98, Page 3), while elsewhere it denotes the number of nearest neighbors (Line 222, Page 6). The difference should be clarified in the manuscript.

 Other Problem

(1) Only the multi-scale clustering methods are introduced in the related work. In fact, there are many definitions of scale. Hence, the authors should explain what a scale is for non-domain readers, and present the significance of introducing scale into clustering.

(2) The validation of clustering accuracy in Table 3 is insufficient. In an application scenario of multi-scale clustering, the clustering accuracy should be compared at each scale, rather than comparing only the best results obtained by parameter tuning. In other words, the ground truth should be multi-scale labels, which might be obtained by manual annotation. Moreover, more challenging synthetic datasets, containing clusters with heterogeneous density, weak connectivity, island-shaped structures (Peng D. et al., 2022), and so on, should be selected to assess the clustering accuracy of MSCHT.

 Reference: Peng, D., Gui, Z.*, Wang, D. et al. Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nat. Commun. 13, 5455 (2022). https://www.nature.com/articles/s41467-022-33136-9.

(3) The comparison of clustering speed in Table 2 is unfair, since K-means and DBSCAN have to be rerun with adjusted parameters at each scale, while MSCHT only changes the distance threshold to prune the tree. MSCHT should be compared with hierarchical (e.g., DIANA and AGNES) or grid-based (e.g., STING) clustering algorithms, because they can transform the scale from the initial clustering results and do not need to calculate from scratch.

(4) In addition, the runtime of MSCHT in Table 2 for the initial clustering is rather long. Hence, it is doubtful whether the current version qualifies as efficient clustering for large-scale datasets. The authors should consider extending MSCHT to parallel versions using GPGPU and distributed computing techniques such as Apache Spark for performance acceleration.

 (5) How do the authors determine the eps and dc at each scale? The manuscript does not explain the criteria for setting parameters, and how to vary dc when transforming the scale. Some adaptive methods to specify the appropriate parameters according to the data distribution should be proposed.

(6) The clustering results in Fig. 6 and 7 are interesting. However, it is unclear whether the clustering results accord with real-world patterns. Some evidence should be presented to demonstrate that the multi-scale results make sense, for example, that the initial clusters of building points are in line with the administrative divisions.
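The per-scale accuracy comparison requested in concern (2) above could be organized as in the following minimal sketch. The Rand index is used here only as a stand-in metric, since the review does not name one; the function name `rand_index`, the scale names, and all labels are hypothetical.

```python
from itertools import combinations


def rand_index(truth, pred):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum(
        (truth[a] == truth[b]) == (pred[a] == pred[b]) for a, b in pairs
    )
    return agree / len(pairs)


# Hypothetical multi-scale ground truth and predictions for five points.
truth_by_scale = {"fine": [0, 0, 1, 1, 2], "coarse": [0, 0, 0, 0, 1]}
pred_by_scale = {"fine": [0, 0, 1, 1, 1], "coarse": [0, 0, 0, 0, 1]}

# One score per scale instead of a single best-tuned score.
scores = {
    scale: rand_index(truth_by_scale[scale], pred_by_scale[scale])
    for scale in truth_by_scale
}
# scores == {"fine": 0.8, "coarse": 1.0}
```

Reporting one score per scale makes it visible when a method matches the coarse structure but merges clusters at a finer scale, which a single tuned result would hide.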

 

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

 Accept in present form
