A Scintillation Hodoscope for Measuring the Flux of Cosmic Ray Muons at the Tien Shan High Mountain Station
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The Authors present an interesting study on the performance of an underground muon detector sensitive to the rate of the EAS muonic component. The detector, based on a hodoscopic ensemble of scintillating tiles, is able to perform a pure rate measurement, separating it from the integrated energy measurement obtained with other facilities at the observatory.
Overall the study is clean, although the manuscript in my opinion would benefit from some clarifications in each section. In the following, I list my remarks and suggestions divided by section.
Introduction:
- My only concern about this section, besides some minor text editing, regards the references. They are often grouped at the end of paragraphs. I think the clarity of the chapter would benefit from a more precise placement of the citations, directly after the concept they refer to.
Hardware description
- The description of the instrument is largely delegated to Figure 1, but some information is missing: what is the vertical distance between the upper and lower tiers of the hodoscope? How does it affect the geometrical acceptance of the observatory?
- What is the reason for the particular arrangement of tiles in the upper tier? Is the "hole" between tile 4 and tile 23 coincident with the position of the lower tier?
- Can you give some more details on how the (-27...-10)us time window for the memorization of the time series was chosen?
- Figure 2 is probably meant to show both the typical pulse shapes and the graphical interface of the data-taking program. Even though the second aspect may be informative for readers (in this case the caption or the text should include a brief description of the available features), I think that the first one should have priority, maybe with a dedicated figure with proper labels and axes. What is represented by the vertical red line in the right panel?
- Again in Figure 2, I see considerable variability in the pulse shapes, with some pulses showing an important positive component. Since these features may be important for the ML discriminator described later, I would add a brief description of the factors affecting these shapes.
Data Processing
- The acronyms naming the different ML algorithms used are defined only in the caption of Figure 3. Please define them in the main text.
- The discussion of the effect of the tg parameter on the overall performance of the discriminators is delegated to Table 1. I think this deserves a dedicated discussion in the text, or even a graphical comparison between the ROC curves of the same algorithm with different tg.
- Figure 3 label is confusing, I assume that it is not a rate [sec^-1] but a percentage?
- The different ML algorithms need to be at least briefly described, with references and/or a discussion in the text.
- How was the 80-20 partition of the data set into training/test samples decided? With such asymmetry, is there any risk of overfitting?
- I assume that the working point (i.e. the cut in the chosen ML discriminator) is chosen to obtain <1% false positives. This is not clearly stated in the text and/or in Figure 3.
- The meaning of "feature" in the discussion regarding Figure 4 is not clear to me. Why does a lower tg seem to be connected with fewer features? Please clarify in the text.
- In the last sentence of the chapter it is stated that online discrimination of true pulses is achievable with the described technique. Are there references (similar techniques used online in other similar experiments) supporting that statement?
Data Processing
- Figure 5 left panel. I would mention the high significance of the signal with respect to the fluctuations of the background visible in the plot.
- Figure 5 right panel shows a three-Gaussian fit from which the contribution of different amplitudes (i.e. different numbers of "contemporary" muons) is obtained. However, I see that the first peak seems to be slightly shifted with respect to the distribution. How do the systematic uncertainties of this fit affect the final estimation?
- In line 251-252 it is said that the baseline is calculated as average inside a 2tg interval around the peak. Before it was said that tg is rather short (0.5-1 usec). Doesn't this bias the signal?
- I feel like that the last sentence of 4.1 and the first of 4.2 need to be supported by a reference.
- Figure 6. Can you comment on the error bars?
- Energy of the knee: "PeV"->"eV"?
Conclusions
- Energy of the knee: "PeV"->"eV"?
Minor text editing suggestions
Abstract
- analogue -> analogic?
Introduction
- I suggest breaking the sentence in lines 58-60 in two
Hardware
- "total amount of rel. particles" -> total number of
- "middle"-> center
- "pos tion"-> position
- Caption figure 2: "prepared ready" -> pre-processed
- "a pair most optimum combinations seems to be" -> "the combination that maximizes the performance is.."
Data processing
- "Estimating" -> The estimation
- "Remarkable, that" -> It is remarkable that
Author Response
Open Review
(x) I would not like to sign my review report
( ) I would like to sign my review report
Quality of English Language
( ) The English could be improved to more clearly express the research.
(x) The English is fine and does not require any improvement.
Does the introduction provide sufficient background and include all relevant references? Must be improved
Is the research design appropriate? Yes
Are the methods adequately described? Can be improved
Are the results clearly presented? Can be improved
Are the conclusions supported by the results? Yes
Are all figures and tables clear and well-presented? Must be improved
Introduction:
1. My only concern about this section, besides some minor text editing, regards the references. They are often grouped at the end of paragraphs. I think the clarity of the chapter would benefit from a more precise placement of the citations, directly after the concept they refer to.
++++ ANSWER:
The Introduction was modified in accordance with this remark.
Hardware description
2. The description of the instrument is largely delegated to Figure 1, but some information is missing: what is the vertical distance between the upper and lower tiers of the hodoscope? How does it affect the geometrical acceptance of the observatory?
++++ ANSWER:
At present this distance is only about 0.7 m, but the dimensions of the underground room allow it to be increased later to 2.5-3 m. The whole installation is still at the stage of active design and construction, so it seems premature to report overly specific details of its current configuration. As mentioned in the article, the total number of scintillators may still be increased, and both the number and the arrangement of the detectors in the lower tier may change.
The main goal of the present work was to demonstrate that modern machine learning algorithms can in principle be applied to the automated investigation of cosmic ray muons with the scintillation hodoscope method, and to design a complete software toolchain suitable for practical use in this task.
3. What is the reason for the particular arrangement of tiles in the upper tier? Is the "hole" between tile 4 and tile 23 coincident with the position of the lower tier?
++++ ANSWER:
This is a consequence of the internal layout of the underground room: the "hole" is where the ladder leading from the floor of the room to the upper tier ends.
Again, since the final configuration of the hodoscope detectors is not yet settled, it seems inappropriate to report excessive detail in the present publication.
4. Can you give some more details on how the (-27...-10)us time window for the memorization of the time series was chosen?
++++ ANSWER:
Generally, the trigger signal arrives at the registration point in the underground room with a delay of about 20-23 us. This includes both the time needed for the electric pulse generated by the trigger system of the surface shower installation to reach the underground room over the cable line, and a ~10 us delay deliberately inserted before generation of the trigger pulse, so that the amplitudes of the analog signals at the outputs of all shower particle detectors are properly stored before digitization. Because of various random factors, the arrival time of the trigger varies between EAS events within +-(3-5) us. The borders of the time window for storing the oscillograms of the hodoscope signal were therefore selected experimentally, so that the signal of the shower front particles falls approximately in the middle of the window in every case.
It is doubtful whether such a detailed explanation is necessary in the article, but it may be inserted there.
5. Figure 2 is probably meant to show both the typical pulse shapes and the graphical interface of the data-taking program. Even though the second aspect may be informative for readers (in this case the caption or the text should include a brief description of the available features), I think that the first one should have priority, maybe with a dedicated figure with proper labels and axes. What is represented by the vertical red line in the right panel?
6. Again in Figure 2, I see considerable variability in the pulse shapes, with some pulses showing an important positive component. Since these features may be important for the ML discriminator described later, I would add a brief description of the factors affecting these shapes.
++++ ANSWERS to 5 and 6:
Some comments concerning the two screenshots in Figure 2, as well as the corresponding program facility designed for on-line checking of the hodoscope data, were added in the appropriate places of Section 2 and Section 3.
In particular, the paragraph beginning with the words "Left frame of Figure 2 illustrates the outlook..." discusses the differences between oscillograms of detector signals registered in the same EAS event. These may arise both from circumstances specific to a particular EAS event (say, a unique distribution of the muon flux density over the hodoscope detectors) and from permanent peculiarities of each detector (such as the scintillator properties, the parameters of the signal transfer electronics, the matching quality of the signal cable line, the intensity of electric interference on the cable, etc.). As stated there, it is this multitude of random or poorly controlled parameters that makes the machine learning technique appropriate for analyzing the information acquired from such an installation.
Data Processing
7. The acronyms naming the different ML algorithms used are defined only in the caption of Figure 3. Please define them in the main text.
++++ ANSWER:
All acronyms are now introduced in the first paragraph of Section 3, immediately after the first mention of the corresponding classifiers.
8. The discussion of the effect of the tg parameter on the overall performance of the discriminators is delegated to Table 1. I think this deserves a dedicated discussion in the text, or even a graphical comparison between the ROC curves of the same algorithm with different tg.
++++ ANSWER:
In practice, the time gate duration was preselected experimentally, by trying various values of the $t_g$ parameter on small pieces of the labelled dataset and immediately evaluating the efficiency of the classifier models used. Each variant was judged as a tradeoff between the growing amount of useful information at larger $t_g$ (since the number of $i(t)$ counts accessible to the classifier grows as the time gate widens), which is good, and the simultaneously increasing influence of random noise and parasitic oscillations on the stability of classifier operation, which is bad. The four values shown in Table 1 were then selected as the most promising for detailed investigation.
This explanation, in shortened form, was added to the text.
As follows from Table 1, almost all 'time gate/classifier type' combinations give comparable quality in the considered task (with the possible exception of the SGD model), so the data presented there seem quite sufficient for selecting an optimum combination.
9. Figure 3 label is confusing, I assume that it is not a rate [sec^-1] but a percentage?
++++ ANSWER:
The labels in the plots of that figure were changed as recommended.
10. The different ML algorithms need to be at least briefly described, with references and/or a discussion in the text.
++++ ANSWER:
A short description of the operating principle of the classifier models used in the work was included at the beginning of Section 3.
11. How was the 80-20 partition of the data set into training/test samples decided? With such asymmetry, is there any risk of overfitting?
++++ ANSWER:
This was done in accordance with a recommendation found in Ref. [33]. As I understand from that book, it is a rather standard choice.
12. I assume that the working point (i.e. the cut in the chosen ML discriminator) is chosen to obtain <1% false positives. This is not clearly stated in the text and/or in Figure 3.
++++ ANSWER:
The final selection of the operating algorithm variant, with an explanation of the reasons, is made in the discussion of the detailed numerical data presented in Table 1 (in the paragraph starting with "Detailed results on verification of binary classifiers..."). Figure 3 is only an auxiliary illustration of that selection.
13. The meaning of "feature" in the discussion regarding Figure 4 is not clear to me. Why does a lower tg seem to be connected with fewer features? Please clarify in the text.
++++ ANSWER:
As explained in Section 3, the bulk of the features used by the classifiers in the considered task is the array of amplitude values that fall inside the time gate around the global minimum of the waveform oscillogram. The total number of these values, and consequently the length of the feature set, depends on the duration of the time gate $t_g$: the longer $t_g$, the more features are used.
A reference to this explanation was added to the discussion of Figure 4.
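The dependence of the feature count on the gate duration can be illustrated with a short sketch. The sampling step `dt_us`, the symmetric gate of half-width $t_g$ around the minimum, and the function name are assumptions introduced here for illustration only; the response states just that the feature set grows with $t_g$ and, elsewhere, that two shape coefficients are added.

```python
def feature_count(t_g_us, dt_us=0.05, n_shape_features=2):
    """Number of classifier features for a time gate of half-width t_g_us.

    Assumptions (not taken from the manuscript): the waveform is sampled
    every dt_us microseconds, and the gate spans [-t_g, +t_g] around the
    global minimum of the oscillogram.
    """
    samples_in_gate = 2 * round(t_g_us / dt_us) + 1
    return samples_in_gate + n_shape_features


# A wider gate yields a longer feature vector.
for t_g in (0.5, 1.0):
    print(t_g, feature_count(t_g))
```

With these assumed numbers, doubling the gate from 0.5 us to 1 us nearly doubles the length of the feature vector, which is the trend the answer describes.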
14. In the last sentence of the chapter it is stated that online discrimination of true pulses is achievable with the described technique. Are there references (similar techniques used online in other similar experiments) supporting that statement?
++++ ANSWER:
The processing time of the hodoscope data in each EAS event, a few milliseconds, is negligible compared with the interval between EAS registrations at the Tien Shan installation, where the typical rate is 3-5 events per minute. Real-time operation is therefore indeed possible.
A similar explanation was added to the text.
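The timing argument can be checked with a line of arithmetic. In the sketch below, the rate of 5 events per minute is the upper value quoted in the answer, while the 5 ms processing time is an assumed stand-in for "a few milliseconds"; the function name is hypothetical.

```python
def processing_load(events_per_minute, processing_time_ms):
    """Fraction of wall-clock time spent processing hodoscope data."""
    busy_seconds_per_minute = events_per_minute * processing_time_ms / 1000.0
    return busy_seconds_per_minute / 60.0


# 5 events/min: upper rate quoted in the response.
# 5 ms per event: assumed value standing in for "a few milliseconds".
load = processing_load(events_per_minute=5, processing_time_ms=5.0)
print(f"{load:.2e}")
```

Under these assumptions the processor is busy for well under 0.1% of the time, so the margin for on-line operation is large.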
Data Processing
15. Figure 5 left panel. I would cite the high significance of the signal with respect to the fluctuations of the background visible in the plot
++++ ANSWER:
Numerical estimates of the standard deviation of random oscillations (~10), the amplitude of the useful pulsed signal (~140), and the signal-to-noise ratio (~14) were explicitly given in the text.
16. Figure 5 right panel shows a three-Gaussian fit from which the contribution of different amplitudes (i.e. different numbers of "contemporary" muons) is obtained. However, I see that the first peak seems to be slightly shifted with respect to the distribution. How do the systematic uncertainties of this fit affect the final estimation?
++++ ANSWER:
Apparently, the displacement of the first peak in this particular case is caused by a relative excess of low-amplitude pulses with A<150, which may result, e.g., from particles on slanted trajectories near the side of a scintillator block that traverse only part of the scintillator. In any case, the relative value of this displacement does not exceed one standard deviation of its estimate. Presumably, in the full-scale experiment this effect should not significantly influence the estimate of the particle density in EAS, since the random fluctuations of the density between detectors would be even larger because of the low average number of muons passing through each detector. If it nevertheless does, a corresponding correction may be derived later, say, through simulation of the whole registration process of EAS particles.
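For reference, a three-Gaussian fit of an amplitude spectrum has the generic form below. The symbols $N_k$, $\mu_k$ and $\sigma_k$ are notation introduced here; this record does not say whether the manuscript constrains the peak positions or widths relative to one another.

$$ f(A) = \sum_{k=1}^{3} N_k \exp\left[-\frac{(A-\mu_k)^2}{2\sigma_k^2}\right] $$

In this notation, the shift discussed above concerns the fitted position $\mu_1$ of the first peak.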
17. In lines 251-252 it is said that the baseline is calculated as an average inside a 2tg interval around the peak. Before it was said that tg is rather short (0.5-1 usec). Doesn't this bias the signal?
++++ ANSWER:
On the contrary, restricting the baseline calculation to the close vicinity of the considered pulse avoids possible bias from random interference and noise oscillations that may be present in other parts of the same oscillogram. This limitation was therefore made deliberately.
18. I feel like that the last sentence of 4.1 and the first of 4.2 need to be supported by a reference.
++++ ANSWER:
These statements are based on the lateral distribution functions of muon density in EAS measured in the early experiments at the Tien Shan station and discussed, in particular, in Ref. [1]. This explanation, with a citation of that reference, was added in the appropriate places of both sentences.
19. Figure 6. Can you comment on the error bars?
++++ ANSWER:
The vertical error bars in the plot show the standard deviation of the amplitude estimates $A$ obtained in the EAS events belonging to a particular $N_e/R$ combination; the horizontal bars indicate the width of the intervals in the distance parameter $R$.
This explanation was added to the text.
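How such error bars could be computed is sketched below for events of a single $N_e$ interval. The function name, the use of the sample standard deviation, the choice of the bin mean as the plotted point, and the input values are assumptions made for illustration; they are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean, stdev


def bin_amplitudes(events, r_edges):
    """Group amplitudes by distance bin and summarize each bin.

    events  : iterable of (R, A) pairs for one N_e interval
    r_edges : ascending bin edges over the distance R
    """
    groups = defaultdict(list)
    for r, a in events:
        for lo, hi in zip(r_edges[:-1], r_edges[1:]):
            if lo <= r < hi:
                groups[(lo, hi)].append(a)
                break
    summary = {}
    for (lo, hi), amps in sorted(groups.items()):
        summary[(lo, hi)] = {
            "A_mean": mean(amps),
            # Vertical bar: spread of the amplitude estimates in the bin.
            "A_err": stdev(amps) if len(amps) > 1 else 0.0,
            # Horizontal bar: half of the R interval width.
            "R_halfwidth": (hi - lo) / 2.0,
        }
    return summary


# Made-up (R, A) values, for illustration only.
events = [(5, 300), (8, 340), (12, 150), (18, 170), (19, 130)]
result = bin_amplitudes(events, r_edges=[0, 10, 20])
```

Note that the standard deviation of the estimates describes their spread within a bin; the uncertainty of the bin mean would be smaller by a factor of the square root of the number of events.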
20. Energy of the knee: "PeV"->"eV"?
++++ CORRECTED.
Conclusions
21. Energy of the knee: "PeV"->"eV"?
++++ CORRECTED.
22. Minor text editing suggestions
Abstract
analogue -> analogic?
Introduction
I suggest breaking the sentence in lines 58-60 in two
Hardware
"total amount of rel. particles" -> total number of
"middle"-> center
"pos tion"-> position
Caption figure 2: "prepared ready" -> pre-processed
"a pair most optimum combinations seems to be" -> "the combination that maximizes the performance is.."
Data processing
"Estimating" -> The estimation
"Remarkable, that" -> It is remarkable that
++++ ANSWER:
Everything mentioned above was fixed; many thanks for the corrections.
Reviewer 2 Report
Comments and Suggestions for Authors
Please find all the comments in the review report.
Comments for author File: Comments.pdf
Author Response
Open Review
(x) I would not like to sign my review report
( ) I would like to sign my review report
Quality of English Language
(x) The English could be improved to more clearly express the research.
( ) The English is fine and does not require any improvement.
Does the introduction provide sufficient background and include all relevant references? Yes
Is the research design appropriate? Can be improved
Are the methods adequately described? Must be improved
Are the results clearly presented? Can be improved
Are the conclusions supported by the results? Can be improved
Are all figures and tables clear and well-presented? Can be improved
peer-review-48069577.v2.pdf
Submission Date
22 June 2025
Date of this review
27 Jun 2025 06:35:46
The authors present the development of a new underground scintillation hodoscope for measuring muon flux at the Tien Shan High Mountain Cosmic Ray Station. The setup is novel, and the authors implement a machine learning-based pulse identification method for high-efficiency data processing. The paper is well-structured in general, but there are important issues on language, clarity, and technical methodology that need revision before publication.
Major comments:
1. Testing and validation details are not described in this manuscript. Were these performed?
2. The hodoscope data described in the manuscript are time series that exhibit time dependencies. Did you take special care when splitting the dataset into train, validation and test sets?
3. Since extracting train and test sets from overlapping time windows may cause data leakage, please clarify the extraction method.
++++ ANSWERS to 1, 2, and 3:
Though I am not completely sure that I grasp these questions correctly, I will try to answer them as far as I understand them.
Although the input data in the considered task are amplitude distributions of the time-series kind, which indeed depend on a time argument, every such distribution represents a single element (instance) of the data from the viewpoint of the machine learning algorithms. Consequently, the preparation of every such element was reduced to the extraction of a few necessary features: a set of normalized amplitudes around the position of the supposed peak candidate and several numbers characterizing the overall waveform shape.
So I do not see any "overlapping time windows" in the statement of this task. Perhaps my understanding is wrong?
Then, a relatively small subset of the elements (distributed uniformly, nevertheless, over the whole period of data taking, to avoid possible systematic influence of any slowly varying factors) was labelled manually to form the training dataset.
As recommended in Ref. [34], the training and testing subsets were selected from the common array of labelled elements (previously shuffled over the period of data taking, again as recommended) using the standard 'train_test_split' facility from the Scikit-Learn library, in the hope that its authors did take into account all subtle nuances.
Then the classification algorithms were trained and their output verified, again with the corresponding Scikit-Learn procedures. The results of this analysis are presented in Section 3.
In the text of the article, the short remark concerning preparation of the training/testing datasets, in the paragraph "For evaluation of the automated pulse searching program..." of Section 3, was replaced with a more detailed explanation.
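The splitting step described above can be sketched as follows. The response names scikit-learn's `train_test_split`; the code below is a plain-Python stand-in written for illustration so that it runs without third-party packages, and it is not the library's implementation. The function name, the seed, and the toy data are assumptions.

```python
import random


def shuffled_split(instances, labels, test_fraction=0.2, seed=0):
    """Shuffle labelled instances and split them into train/test parts."""
    idx = list(range(len(instances)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(seq, ids):
        return [seq[i] for i in ids]

    return (pick(instances, train_idx), pick(instances, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Ten toy feature vectors, one per oscillogram, with hypothetical labels.
X = [[i, i + 0.5] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = shuffled_split(X, y)
```

Because each oscillogram contributes exactly one feature vector, every instance lands in either the training or the testing part, never both, which is the point of the answer about overlapping windows.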
4. Clarify the normalization method mentioned in line 147: using the min/max values across the entire time range causes data leakage.
++++ ANSWER:
The bulk of the features used by the classifiers consists of an array of the amplitude values i(t) that fall inside the time gate around the global minimum of the waveform oscillogram. Normalization of these features to the [0,1] interval is quite adequate for the task. Two additional features that characterize the overall shape of the signal, the skewness and kurtosis coefficients, are calculated over the complete original waveform distribution I(t), before any cutting or normalization is applied to it.
Although a similar remark was already present in the text, a more detailed explanation was added there for clarity.
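A minimal sketch of this feature preparation is given below. Several details are assumptions and not statements about the manuscript: the gate is taken symmetric and expressed in samples, the min/max for normalization are taken over the gated samples of that one oscillogram (the manuscript may use the whole waveform instead), population moments are used, and kurtosis is reported as excess kurtosis. The function name and the toy waveform are hypothetical.

```python
from statistics import mean, pstdev


def extract_features(waveform, half_gate):
    """Build one feature vector from a single oscillogram I(t).

    waveform  : list of raw amplitude samples (must not be constant)
    half_gate : samples kept on each side of the global minimum
                (assumed symmetric gate, truncated at the edges)
    """
    # Shape coefficients over the complete, un-normalized waveform.
    mu, sigma = mean(waveform), pstdev(waveform)
    z = [(v - mu) / sigma for v in waveform]
    skew = mean([x ** 3 for x in z])
    kurt = mean([x ** 4 for x in z]) - 3.0  # excess kurtosis (a convention)

    # Gate around the global minimum (the pulse is taken as negative-going).
    i_min = waveform.index(min(waveform))
    gate = waveform[max(0, i_min - half_gate): i_min + half_gate + 1]

    # Min-max normalization of the gated samples to [0, 1].
    lo, hi = min(gate), max(gate)
    norm = [(v - lo) / (hi - lo) for v in gate]
    return norm + [skew, kurt]


# Toy oscillogram with one negative pulse, for illustration only.
wave = [0, 1, -1, 0, -40, -140, -60, -10, 0, 1, 0, -1]
features = extract_features(wave, half_gate=2)
```

Every quantity here is computed from a single oscillogram, so no statistic is shared between instances of the training and testing subsets.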
5. Were any of the hyperparameters tuned for best performance?
++++ ANSWER:
No, all the considered classifier models used the default hyperparameter values built into the Scikit-Learn library (this remark was added to the text). As the presented study shows, these defaults ensure both sufficient quality of the results and acceptable operation speed of the algorithms.
Editorial comments:
...
++++ ANSWER:
All listed misspellings and language mistakes were fixed as recommended. Many thanks for these remarks.
Figure 2: Please make the legends and labels larger and clearer.
++++ ANSWER:
This figure shows two screenshots of the window of a program facility designed for on-line checking of the hodoscope data. To make them clearer, both frames in Figure 2 were somewhat enlarged, and textual comments concerning that program were added in the appropriate places of Section 2 and Section 3.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I am in general satisfied with the answers and the modifications made to the manuscript, which definitely improved the clarity and the accessibility of the study, with the exception of my question number 11 of my previous report, regarding the training of the ML algorithm in the classification of bad/good pulses.
After reading the manuscript in its current form, I am even more strongly convinced that this study would benefit from a deeper description of the training methodology. I suggest clarifying better, and quantitatively, the difference between "good" and "bad" events in the training sample and including a figure showing typical examples of the two, and also an example of a mis-identified pulse (false positive and/or false negative in the test sample), in order to allow the reader to appreciate the robustness of the ML approach for this application.
Author Response
Open Review
(x) I would not like to sign my review report
( ) I would like to sign my review report
Quality of English Language
( ) The English could be improved to more clearly express the research.
(x) The English is fine and does not require any improvement.
Does the introduction provide sufficient background and include all relevant references? Yes
Is the research design appropriate? Yes
Are the methods adequately described? Must be improved
Are the results clearly presented? Can be improved
Are the conclusions supported by the results? Yes
Are all figures and tables clear and well-presented? Yes
Comments and Suggestions for Authors
I am in general satisfied with the answers and the modifications made to the manuscript, which definitely improved the clarity and the accessibility of the study, with the exception of my question number 11 of my previous report, regarding the training of the ML algorithm in the classification of bad/good pulses.
After reading the manuscript in its current form, I am even more strongly convinced that this study would benefit from a deeper description of the training methodology. I suggest clarifying better, and quantitatively, the difference between "good" and "bad" events in the training sample and including a figure showing typical examples of the two, and also an example of a mis-identified pulse (false positive and/or false negative in the test sample), in order to allow the reader to appreciate the robustness of the ML approach for this application.
++++ ANSWER:
As practical experience has shown, oscillograms that contain a scintillation pulse and those that do not can be effectively distinguished by their appearance, particularly after the normalization described in Section 3. Correspondingly, manual labeling of the training dataset proceeded simply by examining the waveforms stored in the database and pressing the corresponding 'Yes' or 'No' button (which, nevertheless, required the development, for convenience, of a special visual program illustrated by Figure 2). A sample of typical oscillograms marked in this way is now shown in Figure 3, as an illustration of the sufficient reliability of this approach.
Reviewer 2 Report
Comments and Suggestions for Authors
Many thanks to the authors for taking into account my comments. The paper looks quite good, and I recommend publication after these comments are addressed.
Suggestion:
- For training and testing split, please also refer to the TimeSeriesSplit feature in Scikit-learn.
Editorial comments:
Line 21: “a phenomenon was discovered of an accelerated rise of the energy deposit left at interaction of the muon particles”->“a phenomenon of an accelerated rise of the energy deposit left at interaction of the muon particles was discovered”
Line 32: “As known”->“As is known”
Line 35: “As such badly understood observations”->“Among such poorly understood observations”
Line 43: “and are described in detail”->“are described in detail”
Line 43: rephrase the sentence to make it clear: “today occurred in trend with the so-called “muon deficit” problem actively discussed in latter”
Line 49: “did steady reported”->“steadily reported”
Author Response
Comments and Suggestions for Authors
Many thanks to the authors for taking into account my comments. The paper looks quite good, and I recommend publication after these comments are addressed.
Suggestion:
For training and testing split, please also refer to the TimeSeriesSplit feature in Scikit-learn.
++++ ANSWER:
At present, the appropriate utilities from the Scikit-learn library are used for all standard operations on the data. In particular, the 'sklearn.model_selection.train_test_split' facility is applied for shuffling and splitting the labeled dataset before evaluation of the training procedure.
Also, although the input data in the considered task are amplitude distributions over a time argument, every such distribution is, from the viewpoint of the machine learning algorithms, a single atomic element of data represented by a simple tuple of numbers. So I suppose that using a more sophisticated algorithm is superfluous for the time being and may be postponed to further studies, when we will indeed be working with data of the time-series type, for example in the planned search for anomalous patterns in the time series of the environmental radiation background in the Tien Shan mountains.
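For contrast with a shuffled split, the ordered splitting that the reviewer's suggestion points to can be sketched in plain Python. This is an illustration in the spirit of a forward-chaining scheme such as scikit-learn's `TimeSeriesSplit`, written as a stand-in and not as that class's implementation; the function name and the sample counts are assumptions.

```python
def forward_chaining_splits(n_samples, n_splits=3):
    """Yield (train_idx, test_idx) with each test block after its train block."""
    test_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * test_size
        yield (list(range(0, train_end)),
               list(range(train_end, train_end + test_size)))


# Indices are assumed to be ordered by registration time.
folds = list(forward_chaining_splits(n_samples=8, n_splits=3))
for train_idx, test_idx in folds:
    print(train_idx, test_idx)
```

Such a scheme matters when neighbouring samples are correlated in time; when each instance is an independent oscillogram, as the answer argues, a shuffled split is not expected to leak information.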
Editorial comments:
......
++++ ANSWER:
All suggested corrections were applied; many thanks.