Open Access Article

Peer-Review Record

Efficient Clustering Method for Graph Images Using Two-Stage Clustering Technique

Electronics 2025, 14(6), 1232; https://doi.org/10.3390/electronics14061232

by Hyuk-Gyu Park 1, Kwang-Seong Shin 2,*,† and Jong-Chan Kim 2,*,†

Reviewer 1: Yan Wang

Reviewer 2: Inácio Fonseca

Reviewer 3: Anonymous

Submission received: 21 January 2025 / Revised: 20 February 2025 / Accepted: 15 March 2025 / Published: 20 March 2025

(This article belongs to the Special Issue Pattern Recognition and Image Processing: Latest Advances and Prospects)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Reviewer's report:

Title: Efficient Clustering Method for Graph Images Using Two-Stage Clustering Technique

This paper presents and evaluates a two-stage clustering framework in which an initial (coarse) clustering stage is followed by a second (refinement) stage. Numerical examples demonstrate the efficiency of the hybrid approach for clustering unstructured data in the form of graph images.

Some detailed comments are as follows:

1. Abstract: There is no specific content in the abstract.

2. How to choose optimal clustering method for two stages? Supplying some foundation or comparison experiments results will be more interesting.

3. Clustering has also been well handled by deep learning methods. What are the advantages of the proposed method?

4. How can it be verified that the parameters estimated by the proposed method are optimal? Is there some theoretical foundation?

5. The paper illustrates the effectiveness of the proposed two-stage clustering technique by 8,118 graph images derived from angle-based depth. It would be more convincing if this can be demonstrated by more data from different fields. In addition, the discussion of computational complexity of the algorithm is advised.

6. Experiments demonstrated the proposed method’s results quantitatively only. Qualitative results and comparisons with related methods are necessary.

7. Some errors, such as Lines 344-345, “Section ??”.

There might be more problems in the English and phrasing in your paper. You should check the entire paper carefully and correct all of them.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

1. Abstract: There is no specific content in the abstract.
The abstract, which was omitted during LaTeX editing, has been added to the revised manuscript.

2. How to choose optimal clustering method for two stages? Supplying some foundation or comparison experiments results will be more interesting.
We have added Section 3.5, "Two-Stage Clustering Method Configuration," to provide a detailed explanation of why K-means (or DBSCAN) is used in the first stage and why different techniques are employed in the second stage.

3. Clustering has also been well handled by deep learning methods. What are the advantages of the proposed method?
Section 2.2.3 has been revised to introduce deep learning-based clustering methods and explain their differences from our proposed approach.

4. How can it be verified that the parameters estimated by the proposed method are optimal? Is there some theoretical foundation?
Section 4.1.4, "Parameter Setting and Sensitivity Analysis," has been added to explain that, for DBSCAN, it is common practice to select ϵ at the 90th to 95th percentile of the sorted k-distance graph. We also explain that the k value was set between 3 and 6 to prevent over-segmentation and because this range gave the best results in our experiments.
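
The percentile heuristic mentioned in this response can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `k_distance_eps`, the neighbour count `k=4`, and the linear interpolation between ranks are all assumptions made for the example. The neighbour count here is the k of the k-distance graph and is not necessarily the same parameter as the k value of 3 to 6 mentioned in the response.

```python
import math

def k_distance_eps(points, k=4, percentile=90.0):
    """Pick a DBSCAN eps candidate from the sorted k-distance curve.

    Hypothetical helper: `k` and `percentile` are illustrative defaults,
    not values confirmed by the manuscript.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    k_dists = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])  # distance to the k-th nearest neighbour
    k_dists.sort()  # this sorted list is the k-distance curve
    # Read off the requested percentile with linear interpolation.
    rank = (percentile / 100.0) * (len(k_dists) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return k_dists[lo] + (k_dists[hi] - k_dists[lo]) * (rank - lo)
```

A value returned for a percentile in the 90 to 95 range would then be passed as the `eps` argument of a DBSCAN implementation. Points whose k-th neighbour lies farther away than this value tend to be treated as noise.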

6. Experiments demonstrated the proposed method’s results quantitatively only. Qualitative results and comparisons with related methods are necessary.
We have made the following additions to the manuscript:
In Chapter 5 (Conclusion), we have added supplementary content regarding the extensible application of data sources and potential areas for data utilization.
In Section 3.3.1, we have incorporated content addressing Computational Complexity.
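
For orientation, a standard textbook estimate of the cost of Lloyd-style K-means is given here. It is not reproduced from Section 3.3.1 of the manuscript, and the symbols are notation introduced for this note: n points, d feature dimensions, t iterations, k_1 first-stage clusters, k_2 sub-clusters per first-stage cluster, and n_j points in first-stage cluster j. The estimate assumes a comparable iteration count t in every run.

```latex
\begin{align*}
T_{\text{single}} &= O(n \, k_1 k_2 \, d \, t) \\
T_{\text{two-stage}} &= O(n \, k_1 \, d \, t) + \sum_{j=1}^{k_1} O(n_j \, k_2 \, d \, t)
  = O\bigl(n \, (k_1 + k_2) \, d \, t\bigr),
  \qquad \sum_{j=1}^{k_1} n_j = n
\end{align*}
```

Under these assumptions, with k_1 = k_2 = 3 each point is compared against 3 + 3 = 6 centres per iteration in the two-stage scheme, versus 9 centres for a single-stage run that produces the same number of final clusters.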

7. Some errors, such as Lines 344-345, “Section ??”.
We have removed incorrect links and markings, and completed a comprehensive review of English expressions and errors throughout the manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for your paper.

In general, the authors should carry out a careful review to remove anomalies that are considered basic in terms of formal writing and the quality of the images.

1- Please add your abstract to the PDF.
2- Improve the quality of Figure 2.
3- Lines 111, 132, 150, 157, 176, 182, 197, 238, 250, 256: remove the full stop.
4- Lines 147, 153, 163, 170, 172, 174, 179, 194, 244, 247: correct punctuation
5- Lines 156, 192: ‘based on distance- or density-’ --> ‘based on distance or density’. Why use "distance-based" or "density-based" and not "distance based" or "density based"? So in my opinion it should be left without the "-" , but that's up to the authors.
6- As a general rule, the ‘:’ in the enumeration does not space the last word. In other words, ‘Visualisation :’ --> ‘Visualisation:’ (Line 280). But as there are several examples with the space and it seems to be uniform, it's up to the authors to decide whether the space should be removed.
7- Line 305: ‘This chapter’ --> ‘This section’
8- Referencing: lines 344, 345,
9- Figures 3 and 5 are not referenced in the text.
10- Line 464: Figure number missing

Questions for the authors.
1- When you combine two methods, each with its own strengths and weaknesses, how do you ensure that combining the two results in the best aspects of each?
This question is important for the reader, and you should explain what considerations you have taken to ensure that the combined result is better.
The opposite can happen if you're not careful.

2- Consider detailing step 2 of Algorithm 1 (ProcessFolder) in an algorithm.
3- Consider detailing step 19 of Algorithm 1 (TwoStage) in an algorithm.

4- Algorithm 1, Step 2: What is G? It is not used below. And what is F? The text does not say what any of the letters used in this algorithm stand for. That's not allowed in science and maths. Anyone who knows Python can understand Lines 8, 11, 15, 16 from the Python technical documentation, but the fundamental aspects of the proposed new algorithm should be more properly detailed, in my opinion.
https://medium.com/@tarammullin/dbscan-parameter-estimation-ff8330e3a3bd
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN.fit_predict
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

5: Line 317-326: The preprocessing of the image until the features are obtained (line 324, formula) can and should, in my opinion, be documented in an algorithm, or a reference used if the method is described in another article.
6: Details on the ‘subsequent subdivision’ into two phases (KM->KM)? (lines 387-390), Why is 3 in the second-stage? not 4, 5 ...
7: Lines 463-467: dual k-means, in the first stage with 3 clusters, and in each cluster you again apply a split into 3 clusters. How do the boundaries of the first 3 clusters look after stage 2 if there are sub-clusters close to each other, for example cluster 3.1 close to cluster 2.3 (after stage 1 and stage 2)? In other words: do you foresee any extra processing if the performance metrics show values in line with this situation?
9: In single K-Means the number of cluster is 8, so shouldn't it be 9 to compare on an equal basis with dual stage k-Means?

Final:
Lines 556-560 This procedure is not very detailed, but is important for understanding.

Reference no. 24. This reference is part one of a study and I think you can reference part two:
Hoeser, T.; Bachofer, F.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review—Part II: Applications. Remote Sens. 2020, 12, 3053. https://doi.org/10.3390/rs12183053

The references are balanced between "fundamental papers" - where the work is actually done, and "systematic review or survey" papers.
You have 21 references as "fundamental" papers and 9 references as "Systematic review or survey" papers.
All articles have citations according to google scholar.

Author Response

1- Please add your abstract to the PDF.
The abstract has been incorporated into the PDF file in this revision.

2- Improve the quality of Figure 2.
The resolution of Figure 2 has been increased and its readability has been improved in this revision.

3- Lines 111, 132, 150, 157, 176, 182, 197, 238, 250, 256: remove the full stop.
As requested, the modifications have been made.

4- Lines 147, 153, 163, 170, 172, 174, 179, 194, 244, 247: correct punctuation
As requested, the modifications have been made.

5- Lines 156, 192: ‘based on distance- or density-’ --> ‘based on distance or density’. Why use "distance-based" or "density-based" and not "distance based" or "density based"? So in my opinion it should be left without the "-" , but that's up to the authors.
As requested, the modifications have been made.

6- As a general rule, the ‘:’ in the enumeration does not space the last word. In other words, ‘Visualisation :’ --> ‘Visualisation:’ (Line 280). But as there are several examples with the space and it seems to be uniform, it's up to the authors to decide whether the space should be removed.
As requested, the modifications have been made.

7- Line 305: ‘This chapter’ --> ‘This section’
As requested, the modifications have been made.

8- Referencing: lines 344, 345,
As requested, the modifications have been made.

9- Figures 3 and 5 are not referenced in the text.
As requested, the modifications have been made.

10- Line 464: Figure number missing
As requested, the modifications have been made.

Questions for the authors.
1- When you combine two methods, each with its own strengths and weaknesses, how do you ensure that combining the two results in the best aspects of each?
This question is important for the reader, and you should explain what considerations you have taken to ensure that the combined result is better.
The opposite can happen if you're not careful.

-> Section 3.4, "Why Two-Stage Clustering? Strengths and Considerations," has been added to provide an explanation of our implemented measures.

2- Consider detailing step 2 of Algorithm 1 (ProcessFolder) in an algorithm.

-> The second stage (ProcessFolder) has been documented and explained as a separate algorithm.

3- Consider detailing step 19 of Algorithm 1 (TwoStage) in an algorithm.

-> It has been documented and explained as a separate Algorithm 3.

4- Algorithm 1, Step 2: What is G? It is not used below. And what is F? The text does not say what any of the letters used in this algorithm stand for. That's not allowed in science and maths. Anyone who knows Python can understand Lines 8, 11, 15, 16 from the Python technical documentation, but the fundamental aspects of the proposed new algorithm should be more properly detailed, in my opinion.
https://medium.com/@tarammullin/dbscan-parameter-estimation-ff8330e3a3bd
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN.fit_predict
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

-> We have added definitions and explanations for G and F.

5: Line 317-326: The preprocessing of the image until the features are obtained (line 324, formula) can and should, in my opinion, be documented in an algorithm, or a reference used if the method is described in another article.

-> The image preprocessing steps have been algorithmically formalized and explained in detail.

6: Details on the ‘subsequent subdivision’ into two phases (KM->KM)? (lines 387-390), Why is 3 in the second-stage? not 4, 5 ...

-> Section 3.6, "Rationale for Two-Stage K-means Clustering," has been added to explain the reasoning behind our approach.

7: Lines 463-467: dual k-means, in the first stage with 3 clusters, and in each cluster you again apply a split into 3 clusters. How do the boundaries of the first 3 clusters look after stage 2 if there are sub-clusters close to each other, for example cluster 3.1 close to cluster 2.3 (after stage 1 and stage 2)? In other words: do you foresee any extra processing if the performance metrics show values in line with this situation?

-> Section 4.1.5, "Post-Processing for Boundary Refinement," has been added with detailed explanations.
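
The situation raised in this question (a sub-cluster such as 3.1 lying close to a sub-cluster such as 2.3) can be detected with a simple centroid check. The sketch below is one possible diagnostic under stated assumptions, not the post-processing described in Section 4.1.5: labels are assumed to be `(parent, child)` tuples, and the function names and the `threshold` parameter are hypothetical.

```python
import math
from itertools import combinations

def centroids(points, labels):
    """Mean point of each (parent, child) cluster label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(col) / len(ps) for col in zip(*ps))
            for lab, ps in groups.items()}

def close_cross_parent_pairs(points, labels, threshold):
    """Sub-cluster pairs from different parents with centroids closer than threshold."""
    cents = centroids(points, labels)
    pairs = []
    for a, b in combinations(sorted(cents), 2):
        # Only pairs that straddle a first-stage boundary are of interest.
        if a[0] != b[0] and math.dist(cents[a], cents[b]) < threshold:
            pairs.append((a, b))
    return pairs
```

Any pair returned by `close_cross_parent_pairs` marks a first-stage boundary that may have cut through a natural group. Such pairs would be candidates for merging or reassignment in a refinement step.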

9: In single K-Means the number of cluster is 8, so shouldn't it be 9 to compare on an equal basis with dual stage k-Means?
-> We have explained that single K-means produces 8 clusters, whereas two-stage K-means produces 9: the first stage creates 3 clusters, and the additional clustering in the second stage brings the total to 9 final clusters.
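
The 3-then-3 arrangement discussed in this exchange can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: it uses Lloyd's algorithm with a deterministic farthest-point initialisation (an assumption chosen here for reproducibility), and the function names and the `(parent, child)` label format are hypothetical.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centres chosen so far (assumed for this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre for each point.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def two_stage_kmeans(points, k1=3, k2=3):
    """Stage 1: k1 coarse clusters. Stage 2: split each into k2 sub-clusters.
    Returns a (parent, child) label per point, so at most k1 * k2 clusters."""
    coarse = kmeans(points, k1)
    final = [None] * len(points)
    for c in range(k1):
        idx = [i for i, lab in enumerate(coarse) if lab == c]
        if not idx:
            continue  # an empty coarse cluster has nothing to split
        sub = kmeans([points[i] for i in idx], k2)
        for i, s in zip(idx, sub):
            final[i] = (c, s)
    return final
```

With `k1=3` and `k2=3` the result contains up to 3 × 3 = 9 distinct labels. This is why a two-stage run yields 9 clusters, while the single-stage baseline in the manuscript used 8.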

Reviewer 3 Report

Comments and Suggestions for Authors

I have the following comments and concerns.

1. Lack of Novelty: While the study applies a two-stage clustering approach to graph image data, the concept of two-stage clustering is not entirely new. The paper could better emphasize what makes its approach significantly different from previous work.

2. Limited Benchmark Comparisons: The paper compares its method against single-stage K-means and DBSCAN clustering. To establish a more substantial baseline, it would be beneficial to include comparisons with other hybrid clustering techniques or graph neural network-based clustering models.

3. Poor organization: The paper's presentation and organization are poor. It does not even provide an abstract, which is unacceptable.

Author Response

------------------------------------------------------------------------------------------------------------------------------------------------

1. Lack of Novelty: While the study applies a two-stage clustering approach to graph image data, the concept of two-stage clustering is not entirely new. The paper could better emphasize what makes its approach significantly different from previous work.

As the reviewer points out, two-stage clustering itself is not a novel concept. This research, however, proposes a specialized approach for effectively processing graph image data. The key contributions that differentiate our research from existing studies are as follows:

1. Specialized Feature Extraction for Graph Image Data
While previous studies primarily focused on clustering structured data or general image data, our research enables more effective clustering by designing specialized feature vectors (F) that leverage graph structures.

2. A Practical Alternative to Deep Learning Using Traditional Methods
Many recent studies use deep learning-based approaches such as Graph Neural Networks (GNNs), but deep learning models have disadvantages, including the need for high-end GPUs and large training datasets and the difficulty of model optimization. Our research presents a practical alternative: non-deep-learning (non-DL) techniques based on two-stage K-means, which offer lower computational costs and faster processing speeds.

3. Lightweight Clustering Model Applicable in Resource-Constrained Environments
While fast and lightweight clustering techniques are needed in industrial inspection, medical image analysis, and other fields, existing deep learning models are difficult to apply in these environments. Our research demonstrates that effective clustering is possible without deep learning models and proposes an efficient methodology that can be used even on low-specification devices.

Revision: To further emphasize the "differentiation from existing research," we have added content related to deep learning-based clustering techniques in the "Related Work" section and made efforts to establish the rationale for our research.

------------------------------------------------------------------------------------------------------------------------------------------------
2. Limited Benchmark Comparisons: The paper compares its method against single-stage K-means and DBSCAN clustering. To establish a more substantial baseline, it would be beneficial to include comparisons with other hybrid clustering techniques or graph neural network-based clustering models.

We deeply appreciate the reviewer's comments. We chose K-means and DBSCAN as baselines in this study for their computational efficiency and practical applicability, and we agree that more diverse experimental comparisons would strengthen the credibility of the research.
However, due to current experimental environment constraints and data processing limitations, we are unable to conduct additional benchmark experiments. As such, we respectfully request the opportunity to submit direct comparisons with hybrid clustering techniques and GNN-based clustering methods through subsequent research.

------------------------------------------------------------------------------------------------------------------------------------------------

3. Poor organization: The paper's presentation and organization are poor. It does not even provide an abstract, which is unacceptable.

We sincerely apologize for the inconvenience caused by LaTeX editing errors in the initial submission, which resulted in the omission of the abstract, paragraph markers, and various parameters. In this revised version, we have thoroughly reviewed and corrected these issues. We kindly request your favorable consideration of this improved manuscript. Thank you for your understanding.

------------------------------------------------------------------------------------------------------------------------------------------------

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revised version is much improved. However, the proposed method is a two-stage technique that combines existing methods, so the novelty is somewhat weak. In addition, more experiments comparing the method with recent techniques, including deep learning, are encouraged.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Dear Reviewer,

First of all, I sincerely appreciate your in-depth and detailed review.

In response to the first review report, we have revised the manuscript to clarify why we employed a conventional, rather than a deep learning-based, clustering method. Specifically, we have added a discussion introducing deep learning-based clustering techniques and their advantages, while also elaborating on the limitations that necessitate the use of the proposed two-stage clustering approach in our study.

We fully agree with your insightful comments. However, incorporating additional experiments using deep learning as suggested would require fundamental changes to the overall idea and structure of the paper. Given these constraints, we plan to conduct further research on deep learning-based clustering techniques and submit a separate paper on this topic in the future.

We kindly ask for your understanding regarding this matter. Thank you once again for your valuable feedback.

Best regards,
Kwang-Seong Shin

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for your work in improving the article.
My comments are always suggestions, which the authors are free to accept or not.
Corrections to be made: in lines 360, 362, 418, 471, 530, 641
Line 491: check that the (-1) belongs to the text

Author Response

Dear Reviewer,

First of all, I sincerely appreciate your in-depth and detailed review.

As per your suggestions, we have corrected the errors in lines 360, 362, 418, 471, 530, 641, and 491. Thank you for your meticulous efforts and valuable feedback.

Best regards,
Kwang-Seong Shin

Reviewer 3 Report

Comments and Suggestions for Authors

I do not have concerns anymore.

Author Response

Dear Reviewer,

I sincerely appreciate the time and effort you have dedicated to reviewing our manuscript. Your thorough and insightful comments have been incredibly valuable in improving the quality and clarity of our work.

Your detailed suggestions and careful examination of various aspects of the paper have helped us refine both the technical content and overall presentation. We are truly grateful for your keen attention to detail and constructive feedback, which have significantly contributed to enhancing the manuscript.

Thank you once again for your invaluable input and for sharing your expertise.

Best regards,
Kwang-Seong Shin
