
Open Access Article

Peer-Review Record

InfoFlow: A Distributed Algorithm to Detect Communities According to the Map Equation

Big Data Cogn. Comput. 2019, 3(3), 42; https://doi.org/10.3390/bdcc3030042

by Park K. Fung

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous


Submission received: 25 April 2019 / Revised: 8 July 2019 / Accepted: 18 July 2019 / Published: 22 July 2019

Round 1

Reviewer 1 Report

The authors propose a community detection approach based on the map equation, with the aim of adapting InfoMap to a distributed framework.

The proposed approach is interesting, but there are some points that the authors should clarify.

In the evaluation section, more details should be provided about the results obtained with the different metrics (NMI, Omega, and so on). Furthermore, the authors should compare the running time of InfoMap and InfoFlow.

Finally, I suggest performing a linguistic revision.

Author Response

We thank the reviewer for his/her constructive feedback. Below, we give our responses, interspersed with the original reviewer comments:

(1) In the evaluation section, more details should be provided about the results obtained with the different metrics (NMI, Omega, and so on). Furthermore, the authors should compare the running time of InfoMap and InfoFlow.

Response: The results section has been rewritten to clarify the performance metrics (NMI, Omega, and running time) in both the text and the tables.

(2) Finally, I suggest performing a linguistic revision.

Response: The manuscript has been rewritten with linguistic improvements.

Reviewer 2 Report

GENERAL OVERVIEW AND MAIN CONTRIBUTIONS:

The author of this paper combines big data technology and distributed computing with partitioning and clustering to develop a distributed community detection algorithm that can handle big networks.

The main contributions of this paper are the following:

- The paper develops discrete mathematics to adapt InfoMap to a distributed computing framework.

- The paper develops the mathematics for a greedy algorithm, InfoFlow, which has logarithmic time complexity, compared with the linear complexity of InfoMap.

WEAKNESSES

The ideas of the paper are very interesting; however, there are some open questions:

- The proposed algorithm, InfoFlow, is explained in Section 2.3. However, it would be useful to also present the algorithm in pseudocode, not only through the mathematical explanation.

- The results section should also present the runtime of each approach, InfoMap and InfoFlow, together with the CPU, memory, and disk figures. It would also be useful for the reader to see the speedup achieved by the proposed InfoFlow.

- The same suggestion also applies to Table 2.

- It would also be interesting to compare the results of the two algorithms using Apache Spark on multiple nodes, for example, 5 and 10 nodes.

- Figures 2 and 3 should be improved and presented differently, for example, with two or three images side by side on the same row of the page.

- An outline of the paper's structure should be included at the end of the Introduction.

- We also suggest that the author use “we” or the impersonal form instead of “I” throughout the text.

SOME TYPOS

- Line 30: “the network. [1-3]” should be “the network [1-3].”

- Line 83: “Error! No sequence specified.”

- The spelling of “testbed” should be harmonized throughout the text (test bed, Test Bed, …).

- …

In conclusion, the paper can be improved provided that the authors answer the above-mentioned questions and modify the paper according to the suggestions.

Author Response

We thank the reviewer for his/her constructive feedback. Below, we give our responses, interspersed with the original reviewer comments:

(1) The proposed algorithm, InfoFlow, is explained in Section 2.3. However, it would be useful to also present the algorithm in pseudocode, not only through the mathematical explanation.

Response: Section 2.3 has been rewritten so that the InfoFlow algorithm is explained in the same detail as InfoMap. In addition, the graphical illustration of InfoFlow has been moved from Section 3 to Section 2.4 to further clarify the algorithm.

(2) The results section should also present the runtime of each approach, InfoMap and InfoFlow, together with the CPU, memory, and disk figures. It would also be useful for the reader to see the speedup achieved by the proposed InfoFlow.

Response: Section 3 has been rewritten. The computing resources are described in lines 248-252. Runtime performance is now reported in both the text and the tables.

(3) The same suggestion also applies to Table 2.

Response: Done.

(4) It would also be interesting to compare the results of the two algorithms using Apache Spark on multiple nodes, for example, 5 and 10 nodes.

Response: Benchmarking on computer clusters is currently under way and will be submitted in a future addendum.

(5) Figures 2 and 3 should be improved and presented differently, for example, with two or three images side by side on the same row of the page.

Response: Done.

(6) An outline of the paper's structure should be included at the end of the Introduction.

Response: Done.

(7) We also suggest that the author use “we” or the impersonal form instead of “I” throughout the text.

Response: Done.

Reviewer 3 Report

Summary:

The author applied big data technology and distributed computing to handle big networks.

By combining the two, the author developed a distributed community detection algorithm based on the map equation to address the challenges of big networks. Benchmark tests show some improvement in both time complexity and accuracy.

Pro:

+ The authors have notable contributions to the field.

+ The proposed algorithm improves time complexity (from linear to logarithmic).

Con:

- The related work discussed does not convey the bigger picture of the presented work.

- The paper should mention, or compare against, popular clustering algorithms such as K-means.

- A discussion section comparing the algorithm with other algorithms is needed.

- Why is the algorithm built on top of the map equation and InfoMap? This needs more justification.

The author may want to outline future work.

Author Response

We thank the reviewer for his/her constructive feedback. Below, we give our responses, interspersed with the original reviewer comments:

(1) The related work discussed does not convey the bigger picture of the presented work.

Response: The entire manuscript, in particular the introduction and discussion sections, has been rewritten, with particular emphasis on the context of related work and the comparison with other algorithms.

(2) The paper should mention, or compare against, popular clustering algorithms such as K-means.

Response: Different algorithms and their context are now discussed in the introductory section. Clustering and the K-means algorithm are now mentioned in lines 48-53.

(3) A discussion section comparing the algorithm with other algorithms is needed.

Response: Comparisons with other work are now given in the introductory section (lines 69-80) and in the discussion section (lines 309-330).

(4) Why is the algorithm built on top of the map equation and InfoMap? This needs more justification.

Response: This is now elaborated in the introductory section, in lines 54-68.

(5) The author may want to outline future work.

Response: Future directions are highlighted in the discussion section, in lines 309-320.

Round 2

Reviewer 1 Report

I think the authors have successfully addressed my questions and comments in this revised version.

Reviewer 2 Report

In this second version, the author has reformulated the entire paper; in particular, the introduction and discussion sections have been rewritten.

The author has addressed most of my comments and suggestions, and the paper has improved.
