Peer-Review Record

A Preliminary Study of Large Scale Pulsar Candidate Sifting Based on Parallel Hybrid Clustering

Universe 2022, 8(9), 461; https://doi.org/10.3390/universe8090461
by Zhi Ma 1, Zi-Yi You 1,*, Ying Liu 1, Shi-Jun Dang 1, Dan-Dan Zhang 1, Ru-Shuang Zhao 1,2, Pei Wang 2, Si-Yao Li 3 and Ai-Jun Dong 1
Submission received: 27 July 2022 / Revised: 31 August 2022 / Accepted: 1 September 2022 / Published: 5 September 2022
(This article belongs to the Section Astroinformatics and Astrostatistics)

Round 1

Reviewer 1 Report

Referee Opinion on the paper Ma, Z.; You, Z.-Y.; Liu, Y.; Dang, S.-J.; Zhang, D.-D.; Zhao, R.-S.; Wang, P.; Li, S.-Y.; Dong, A.-J. “A preliminary study of large-scale pulsar candidate sifting based on parallel hybrid clustering.” Universe 186186

 

The paper Ma, Z.; You, Z.-Y.; Liu, Y.; Dang, S.-J.; Zhang, D.-D.; Zhao, R.-S.; Wang, P.; Li, S.-Y.; Dong, A.-J. “A preliminary study of large-scale pulsar candidate sifting based on parallel hybrid clustering” addresses the problem of sifting large-scale pulsar candidates. The problem analyzed in the paper is interesting and important, and the results obtained in the work may also be interesting and important. Unfortunately, the paper is extremely vague and incomprehensible to readers, even professional astrophysicists. As a result, it can only be understood by experts working on this exact subject, and the paper is not suitable for publication in its current form; it requires major revision before it can be considered for publication. The most important points are the following:

1) Section 1. The relation between pulsar candidate selection and the clustering problem, as well as the relation of this section as a whole to the main subject of the paper, is unclear.

2) Section 2. This section contains a list of previous papers, but it does not explain the ideas behind these methods and, even worse, it does not explain what results were obtained with them. Table 1 is interesting but also unclear. Moreover, it refers to the PHCAL method, which is discussed later in the work (Section 3.3); this makes the paragraph even more difficult to understand.

3) Section 3. This section describes the method. Unfortunately, it is also very unclear and reads like part of a program manual rather than a scientific article. The entire chapter must be rewritten, and both the motivation for each stage and the description of each step must be thoroughly explained.

4) Section 4. In this section the authors present their results. They seem to be interesting and important, but this section is also very hard to read and understand. The whole chapter looks as if the authors want to present too much in too short a paper. As a result, the individual results are presented too briefly, without explaining where and how they were obtained and without discussing their significance.

Author Response

Thank you very much for your comments on our paper. We have carefully revised our paper according to your comments, as follows.

1) Section 1. The relation between pulsar candidate selection and the clustering problem, as well as the relation of this section as a whole to the main subject of the paper, is unclear.

Answer: To give readers a clear understanding of the relation between pulsar candidate selection and the clustering problem, and of the relation of Section 1 to the main subject of the paper, we have added new content and revised the related sentences in Section 1. This paper focuses on pulsar candidate selection in a filtering system. Existing pulsar candidate selection methods can be divided into three categories according to their underlying principles: traditional scoring methods, classifiers based on Machine Learning (ML), and diagnostic plot recognition models based on Deep Learning (DL). However, in actual pulsar survey processing, the ratio of pulsar to non-pulsar samples is extremely imbalanced, and most of the input data sets are unlabeled. As a result, preparing the large number of training samples required by supervised methods (including candidate signal classifiers based on supervised ML and diagnostic subplot recognition models based on supervised DL) entails a tremendous workload. Therefore, to digest and mine a large number of pulsar candidates, we present a semi-supervised Parallel Hybrid Clustering Analyzer (PHCAL) that selects the rare pulsar-like samples from large-scale, extremely imbalanced candidate signals. The experimental results show that the PHCAL algorithm ensures the efficiency of pulsar candidate sifting. Furthermore, it can cluster candidates into more meaningful categories rather than performing only binary classification, which may promote the discovery of special pulsars that appear as outliers. The revised sentences in Section 1 are highlighted in red.

2) Section 2. This section contains a list of previous papers, but it does not explain the ideas behind these methods and, even worse, it does not explain what results were obtained with them. Table 1 is interesting but also unclear. Moreover, it refers to the PHCAL method, which is discussed later in the work (Section 3.3); this makes the paragraph even more difficult to understand.

Answer: To describe the motivations and challenges of this work more clearly, Section 2 “Related Works” was deleted, and its original content was moved into Section 1 “Introduction” and Section 3 (now Section 2) “The Method”. Section 2.1 “Idea of Hybrid Clustering” was added to explain the idea of hybrid clustering. In Section 2.1, Table 1 lists the advantages and disadvantages of several kinds of artificial intelligence methods for pulsar candidate selection. As can be seen in Table 1, the idea of hybrid clustering is to combine the advantages of the different clustering algorithms discussed, which could make it better suited to data distributions of multiple shapes. Compared with other kinds of methods, it can therefore further ensure the depth and stability of data mining for pulsar candidate signals. Accordingly, the hybrid clustering scheme of PHCAL combines clustering ideas based on density hierarchy and on partitioning, which benefits the clustering of large numbers of pulsar candidates. The related revised sentences are highlighted in red.

3) Section 3. This section describes the method. Unfortunately, it is also very unclear and reads like part of a program manual rather than a scientific article. The entire chapter must be rewritten, and both the motivation for each stage and the description of each step must be thoroughly explained.

Answer: To describe the technical innovation of the method more clearly and explain the individual steps thoroughly, Section 3 (now Section 2) “The Method” has been reorganized into “2.1 Idea of Hybrid Clustering”, “2.2 The Hybrid Clustering Scheme”, “2.3 Data Partition Strategy for Parallelization”, “2.4 A Spark-based Parallelization Model” and “2.5 Time Complexity Analysis”. Moreover, the entire chapter has been rewritten, and the related revised sentences are highlighted in red.

4) Section 4. In this section the authors present their results. They seem to be interesting and important, but this section is also very hard to read and understand. The whole chapter looks as if the authors want to present too much in too short a paper. As a result, the individual results are presented too briefly, without explaining where and how they were obtained and without discussing their significance.

Answer: In Section 4 (now Section 3) “Experiments and Results”, the overall performance analysis of PHCAL covers two aspects: the clustering effect test in Section 3.5 and the running time evaluation in Section 3.6. Note that the experimental data sets and data preprocessing are described in Sections 3.1 and 3.2, the choice of performance metrics and the parameter settings are described in Sections 3.3 and 3.4, and the hardware and software conditions are described in Section 3.5. Because the original section “3.6 Time Complexity Analysis” involves theoretical analysis of PHCAL rather than an experimental step, it was moved to “2.5 Time Complexity Analysis” and the related sentences were revised. In addition, some sentences in Sections 3.5 and 3.6 have been revised to make the individual experimental results easier to understand.

Furthermore, Section 4 “Conclusions” has been rewritten to further emphasize the experimental results and their significance. In this paper, PHCAL, a parallel hybrid clustering algorithm for large-scale pulsar candidate sifting, is proposed. The experimental results show that PHCAL identifies pulsars with high performance (Precision and Recall) on both HTRU2 and AOD-FAST. Meanwhile, its running time on both data sets is significantly reduced compared with its serial execution mode. Although PHCAL is shown to be feasible, it is only a preliminary research result. In summary, our proposed approach provides theoretical and practical references for sifting the large number of candidate signals observed by advanced telescopes such as FAST.

Thank you for your comments, which have been a valuable guide for writing our paper. If our revised manuscript has any remaining problems, please do not hesitate to inform us. We greatly appreciate your kind help.

Yours sincerely,

Corresponding author: Ziyi You

2022-08-29

Author Response File: Author Response.pdf

Reviewer 2 Report

Review of "A preliminary study of large scale pulsar candidate sifting based on parallel hybrid clustering" by Ma et al.

 

This paper reports a new parallel pulsar candidate sifting algorithm based on semi-supervised clustering, proposed to solve the problem of mining the large volume of pulsar data from the Five-hundred-meter Aperture Spherical radio Telescope (FAST). Judging from the analysis results on the two data sets HTRU2 and AOD-FAST, the authors conclude that the proposed algorithm provides a feasible approach to astronomical data mining of FAST observations. Because this paper is clearly and logically written, I have no major comments on this manuscript. However, there are many typos in the manuscript, and the authors should improve it by checking it carefully. Once this is done, the manuscript should be accepted for publication in the MDPI journal.

I list the points for improvement below.

1. Line 7: "...)2..." should be "...) 2..."

2. Line 39: "...show that, the PHCAL..." should be "...show that the PHCAL..."

3. "...the conclusion is..." should be "...the conclusions are..."

4. Lines 47-83: artifificial --> artificial; classififier --> classifier; profifile --> profile. Please check these words in particular.

5. Line 91. "...several extension of..." should be "...several extensions of...".

6. Line 99. "...perform single a step to...." should be "...perform a single step to....".

7. Line 105. "...same cluster which contains..." should be "...same cluster that contains...".

8. Line 220. "...sample ofthis..." should be "...sample of this..."

Author Response

Thank you very much for your comments on our paper. We have carefully revised our paper according to your comments, as follows.

1) Line 7: "...)2..." should be "...) 2..."

Answer: has been revised.

2) Line 39: "...show that, the PHCAL..." should be "...show that the PHCAL..."

Answer: has been revised.

3) "...the conclusion is..." should be "...the conclusions are..."

Answer: has been revised.

4) Lines 47-83: artifificial --> artificial; classififier --> classifier; profifile --> profile. Please check these words in particular.

Answer: have been revised.

5) Line 91. "...several extension of..." should be "...several extensions of...".

Answer: has been revised.

6) Line 99. "...perform single a step to...." should be "...perform a single step to....".

Answer: has been revised.

7) Line 105. "...same cluster which contains..." should be "...same cluster that contains...".

Answer: has been revised.

8) Line 220. "...sample ofthis..." should be "...sample of this..."

Answer: has been revised.

Thank you for your comments, which have been a valuable guide for writing our paper. If our revised manuscript has any remaining problems, please do not hesitate to inform us. We greatly appreciate your kind help.

 

Yours sincerely,

Corresponding author: Ziyi You

2022-08-29

Round 2

Reviewer 1 Report

Second Referee Opinion on the paper Ma, Z.; You, Z.-Y.; Liu, Y.; Dang, S.-J.; Zhang, D.-D.; Zhao, R.-S.; Wang, P.; Li, S.-Y.; Dong, A.-J. “A preliminary study of large-scale pulsar candidate sifting based on parallel hybrid clustering.” Universe 1861860

The present version of the paper Ma, Z.; You, Z.-Y.; Liu, Y.; Dang, S.-J.; Zhang, D.-D.; Zhao, R.-S.; Wang, P.; Li, S.-Y.; Dong, A.-J. “A preliminary study of large-scale pulsar candidate sifting based on parallel hybrid clustering” is much better and more understandable. It is still difficult to read for readers who are not experts in this field, but the paper already meets the conditions for publication, and given the difficulty and complexity of the issues under consideration, further correction would be difficult and would take a long time. For this reason, I recommend accepting the paper for publication in its current form.