Article
Peer-Review Record

Machine Learning Approach to Shield Optimization at Muon Collider

by Luca Castelli 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 27 January 2025 / Revised: 19 February 2025 / Accepted: 26 February 2025 / Published: 3 March 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

While overall the paper is fine, and particularly the first sections are well written, I'm wondering why there is quite a lack of details in sections 3 and 4. Particularly, in section 3.2 there are no references or explanations of current studies of ML optimization for detectors (I would expect at least a reference to MODE). Also, details are missing in several parts of this section (e.g. what loss function is used? After how many epochs is convergence reached? What are the training and evaluation times? ...). Finally, the final section should be more comprehensive and include a general discussion of the purposes and the results obtained.

In the following, I will list some line-by-line comments:

  • l5: physics 5 performance tungsten -> physics 5 performance, tungsten
  • l10: in -> for
  • l21: Nozzle -> nozzle
  • l25: Nozzle -> nozzle's
  • l31: already declared BIB
  • l32: why?
  • Figure 1: missing reference
  • Figure 2: missing reference
  • l49: the two step simulation -> a two-step simulation process
  • Table 1: Electron/Positron -> Electrons/Positrons
  • l64: already defined BIB (same l75)
  • Figure 3: missing reference to parameters in the text
  • l85: IP already defined
  • l119: use BIB
  • l119-124: I don't understand this point, what do you want to prove with Figure 7?
  • l145: missing .
  • l150-157: is this design feasible to build? can you put some considerations on this?
Comments on the Quality of English Language

Check for clarity and phrasing in some parts; overall, the English language is good.

Author Response

While overall the paper is fine, and particularly the first sections are well written, I'm wondering why there is quite a lack of details in sections 3 and 4.

Particularly, in section 3.2 there are no references or explanations of current studies of ML optimization for detectors (I would expect at least a reference to MODE).

The detector optimization studies are not correlated with the studies presented in this proceeding. They follow a different approach, based on surrogate models that reproduce a physical object (like a photon) and the BIB effects on the calorimeter in order to optimize the calorimeter parameters, while this proceeding focuses on a "step before": I do not look at what is happening in the detector, I just try to minimize the background that goes inside it. Of course this is a simplification, which is why I am moving toward a more inclusive observable, as mentioned in Section 4.

I am currently in contact with other people from MODE in order to develop the method described in section 4. Added the citation. 

Also, details are missing in several parts of this section (e.g. what loss function is used? After how many epochs is convergence reached? What are the training and evaluation times? ...).

Added some details on the XGBoost model used to identify the optimal geometry in Table 2, and the time (200 ms) at the end of the paragraph. For the Random Forest, I made a few attempts after having already achieved good results with XGBoost. Since I did not obtain anything close, I discarded it, but I thought it was worth mentioning. For the DNN, instead, I made several attempts, changing the architecture, the number of epochs, the learning rate, and so on. In this case I did not achieve any result, in the sense that the network was not able to produce a Delta distribution peaked around 0%.
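
As an illustration of what such a setup could look like, here is a minimal sketch (my own reconstruction, assuming a tabular geometry-to-flux regression; the file name, feature names and hyperparameters are placeholders, not the ones actually used in the paper):

    # Minimal sketch: an XGBoost regressor mapping nozzle geometry parameters
    # to the integrated BIB flux. Feature names, hyperparameters and the file
    # name are placeholders, not the values actually used in the paper.
    import pandas as pd
    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("nozzle_dataset.csv")  # hypothetical file, one row per simulation
    features = ["tip_angle_deg", "tip_radius_cm", "base_radius_cm", "length_cm"]
    X, y = df[features], df["bib_flux"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = xgb.XGBRegressor(
        n_estimators=500,          # illustrative hyperparameters
        max_depth=6,
        learning_rate=0.05,
        objective="reg:squarederror",
    )
    model.fit(X_train, y_train)

    # Relative deviation between predicted and simulated flux, in percent:
    # a well-performing model yields a Delta distribution peaked around 0.
    delta = 100.0 * (model.predict(X_test) - y_test) / y_test
    print(f"mean Delta = {delta.mean():.2f}%, width = {delta.std():.2f}%")

Gradient-boosted trees are generally well suited to small tabular datasets with few features, which may explain why XGBoost outperformed the DNN here.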

Finally, the final section should be more comprehensive and include a general discussion of the purposes and the results obtained.

Added a conclusion section to summarize the work focusing on the purpose, the results and the next steps.

 

  • l5: physics 5 performance tungsten -> physics 5 performance, tungsten

done

  • l10: in -> for

done

  • l21: Nozzle -> nozzle

done

  • l25: Nozzle -> nozzle's

done

  • l31: already declared BIB

ok

  • l32: why?

it causes high occupancy and background noise, preventing precise track reconstruction and degrading overall physics performance.

  • Figure 1: missing reference

Left: added. Right: this is my own picture, used for this paper and a couple of presentations, not taken from other papers.

  • Figure 2: missing reference

added

  • l49: the two step simulation -> a two-step simulation process

done

  • Table 1: Electron/Positron -> Electrons/Positrons

done

  • l64: already defined BIB (same l75)

ok

  • Figure 3: missing reference to parameters in the text

added

  • l85: IP already defined

ok

  • l119: use BIB

ok

  • l119-124: I don't understand this point, what do you want to prove with Figure 7?

That low-statistics simulations are still reliable.

  • l145: missing .

done

  • l150-157: is this design feasible to build? can you put some considerations on this?

There is no current study concerning the construction of these objects, for two main reasons. The first is that the support structure depends on the final geometry, since we are moving in the direction of saving several portions of tungsten; but, to be honest, the main one is that only two people are currently working on the nozzle design: I am working on the 3 TeV version, the other on the 10 TeV one. However, we think the design is feasible, although challenging.

Reviewer 2 Report

Comments and Suggestions for Authors

This paper reports the methods and results of the geometry optimization procedure of the nozzles envisaged for the experimental apparatus for a future Muon Collider.
A key feature of muon colliders is the background induced by muon beam decays. The electrons produced in this way interact with the beam pipe, leading to a dense cloud of particles that reaches the detector. In order to mitigate the impact of such particles, tungsten nozzles surrounding the beam pipe have been proposed. The study presented by the author is focused on the optimization of the design and material of such nozzles. The novelty of the study consists in the methodology adopted, based on machine learning techniques. The results obtained in this way are compared with a standard approach based on Fluka simulations. The paper is well written and easily readable; the goals are illustrated clearly and the methodology is described adequately. Further details on the implementation of the machine learning models used could enrich the paper. The results, which demonstrate the effectiveness of this novel approach, are reported in a clear way.

L16: "intrinsic instability" could be replaced with "limited lifetime even in a highly boosted regime" to better underline the problem.

L21, L54: "Nozzle" --> "nozzle". The capital "N" could be removed, since everywhere else you always use a lowercase "n".

L23, L28, L72: "TeV" or "TeV" (italic)? Please fix a convention and write it always in the same way.

L23: It would be good to add a sentence explaining the motivations for the 3 and 10 TeV stages, to underline the importance of your study.

L52, L55: is it "scored" or "stored"?

L84-86: it is not clear which results demonstrate that the optimization of the last part is critical. Could you please elaborate a bit by adding some numbers, tables, examples, or plots that support this sentence?

Fig. 4: why is the statistical error bar on the mu+ so high?

Fig. 5: in the printed version of the paper, the numbers on the X and Y axes as well as the axis titles are too small. Please increase the font size. In the caption you can use "located" instead of "concentrated" to avoid repetition.

L98: ML--> Machine Learning (ML)

L116: 0.02% of a bx: does this mean that you are using ~100 muon decays to train your model? Are these decays sufficient to avoid overtraining?

L117: it is not clear what you mean by elements of the dataset. What are these elements? Are they different sets of particle hits convoluted with different geometries?

L114: what are the ML algorithms that you mention here? Are they the ones mentioned in L132-134? If so, you could add a sentence like "the details of the models will be given later" or divide this long section into 3 subsections, like:

  1. data preparation (3.2.1)
  2. Model architecture and optimization  (3.2.2)
  3. Results and application  (3.2.3)

You could also consider having:

  • section 3: Nozzle optimization with Fluka
  • section 4: Nozzle optimization with Machine Learning

L120 and Fig. 7: is the photon energy spectrum of the BIB with 1.6% of decays simulated obtained with Fluka? If yes, it would be good to mention it explicitly in the text, in the caption, and also in the legend for the blue curve.

Fig7: the word "Pipeline" appears for the first time. Could you please add a little bit of context in the text or just change it to something like "ML prediction" in the legend?

Fig7: do you have the same plot with the pipeline trained with the same geometry used in the Two Step, as a validation of your pipeline model?

Formula 1: which geometries do Flux_sim and Flux_pred correspond to? From what you say it seems that Flux_sim is the one generated by Fluka with low statistics (what is low statistics? 10^4 or 10^2 muons) and Flux_pred is the one obtained with the geometry predicted by the ML. Could you please add more details?

L141-144: it would be good to first describe the figure of merit used to quantify the performance of the model (L142-143) and then the results for the best configuration.

L150-151: please also specify in the text whether these numbers refer to the readout window.

L151, concerning the increase of the neutron flux: this could actually be a problem if you are considering a silicon-based technology for the inner tracker, given the possible radiation damage induced by thermal neutrons. Can you elaborate a bit on this point (i.e., did you consider including the effect of detector radiation damage in the pipeline)?

After the references there is an "Appendix A" without text.

Author Response

L16: "intrinsic instability" could be replaced with "limited lifetime even in a highly boosted regime" to better underline the problem.

Changed

L21, L54: "Nozzle" --> "nozzle". The capital "N" could be removed, since everywhere else you always use a lowercase "n".

Changed

L23, L28, L72: "TeV" or "TeV" (italic)? Please fix a convention and write it always in the same way.

Italic adopted everywhere

L23: It would be good to add a sentence explaining the motivations for the 3 and 10 TeV stages, to underline the importance of your study.

Added the following sentence: 

however the current project aims to reach $\sqrt{s} = 10\ TeV$ with a first stage operating at $\sqrt{s} = 3\ TeV$, therefore optimizations for these center of mass energies are required. 

L52, L55: is it "scored" or "stored"?

Usually we use the term "scoring plane" in FLUKA to identify the surface that determines which particles passing through it are saved in the output, hence "scored". However, "scored" and "stored" have exactly the same meaning in this context. Changed to "stored" to be clearer.

L84-86: it is not clear which results demonstrate that the optimization of the last part is critical. Could you please elaborate a bit by adding some numbers, tables, examples, or plots that support this sentence?

Added a comparison between two configurations and the corresponding BIB results, which show that reducing the tip and increasing the bottom part increases the BIB, while increasing the tip reduces the BIB despite reducing the base radius. Removed Figure 3 and pointed to the new figure for the statistics.

Fig. 4: why is the statistical error bar on the mu+ so high?

Because the simulation produced very few particles of this species. The error is assumed to be Poissonian, i.e. sqrt(N). When scaled to the expected flux per bunch crossing, the error scales as shown in the figure.
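
For clarity (my own sketch of the scaling just described, with a generic scale factor $k$ that is not a number from the paper): if a species has $N$ simulated entries and the counts are scaled by $k$ to the expected flux per bunch crossing, then $\mathrm{Flux} = kN$, $\sigma_{\mathrm{Flux}} = k\sqrt{N}$, and the relative error $\sigma_{\mathrm{Flux}}/\mathrm{Flux} = 1/\sqrt{N}$ remains large whenever $N$ is small, as for the $\mu^+$ sample.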

Fig. 5: in the printed version of the paper, the numbers on the X and Y axes as well as the axis titles are too small. Please increase the font size. In the caption you can use "located" instead of "concentrated" to avoid repetition.

Done

L98: ML--> Machine Learning (ML)

Done

L116: 0.02% of a bx: does this mean that you are using ~100 muon decays to train your model? Are these decays sufficient to avoid overtraining?

Actually, I made a mistake when I stated that I used 10^4 decays for 1.6% of a bunch crossing. It is actually 2x10^5; to be precise, 2636 decays x 10 cycles x 8 spawns (using FLUKA terms, but it actually means 2636 decays times 80 parallel processes). Therefore, I considered 1000 decays.

L117: it is not clear what you mean by elements of the dataset. What are these elements? Are they different sets of particle hits convoluted with different geometries?

An element of the dataset is a set of geometrical parameters that define the geometry of the nozzles (features) together with the corresponding BIB flux (target). Specified in the text.
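
To make "element of the dataset" concrete, a purely illustrative example (the parameter names, values and units are invented placeholders, not the paper's actual ones):

    # One dataset element: nozzle geometry parameters as features,
    # the integrated BIB flux as the regression target (placeholder values).
    element = {
        "features": {
            "tip_angle_deg": 10.0,
            "tip_radius_cm": 1.0,
            "base_radius_cm": 60.0,
            "length_cm": 600.0,
        },
        "target": {"bib_flux": 2.4e7},  # invented number, in the paper's flux units
    }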

L114: what are the ML algorithms that you mention here? Are they the ones mentioned in L132-134? If so, you could add a sentence like "the details of the models will be given later"

Done

or divide this long section into 3 subsections, like:

  1. data preparation (3.2.1)
  2. Model architecture and optimization  (3.2.2)
  3. Results and application  (3.2.3)

You could also consider having:

  • section 3: Nozzle optimization with Fluka
  • section 4: Nozzle optimization with Machine Learning

Divided into 3 subsections.

L120 and Fig. 7: is the photon energy spectrum of the BIB with 1.6% of decays simulated obtained with Fluka? If yes, it would be good to mention it explicitly in the text, in the caption, and also in the legend for the blue curve.

Added a clarification in the text, in the caption, and in the legend (see the next comment for more).

Fig7: the word "Pipeline" appears for the first time. Could you please add a little bit of context in the text or just change it to something like "ML prediction" in the legend?

Figure 7 shows two FLUKA simulations, not ML results. The point is to show that, even using much lower statistics, the results are still reliable.

Fig7: do you have the same plot with the pipeline trained with the same geometry used in the Two Step, as a validation of your pipeline model?

As above. Actually, I never predict the full energy distribution with ML methods, only the integrated BIB flux.

Formula 1: which geometries do Flux_sim and Flux_pred correspond to? From what you say it seems that Flux_sim is the one generated by Fluka with low statistics (what is low statistics? 10^4 or 10^2 muons) and Flux_pred is the one obtained with the geometry predicted by the ML. Could you please add more details?

I defined, where first mentioned, that 1.6% is the high statistics and 0.02% the low statistics. I specified in Section 3.2.1 how the dataset has been divided into training and test sets.
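
For reference, a plausible form of Equation (1), consistent with the Delta distribution discussed in these responses (the exact definition is the one given in the paper): $\Delta = \frac{\mathrm{Flux}_{\mathrm{pred}} - \mathrm{Flux}_{\mathrm{sim}}}{\mathrm{Flux}_{\mathrm{sim}}} \times 100\%$, where $\mathrm{Flux}_{\mathrm{sim}}$ is the flux obtained from the low-statistics (0.02%) FLUKA simulation of a test geometry and $\mathrm{Flux}_{\mathrm{pred}}$ is the model prediction for the same geometry.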

L141-144: it would be good to first describe the figure of merit used to quantify the performance of the model (L142-143) and then the results for the best configuration.

The Delta distribution. Actually, XGBoost was the only model able to produce a distribution peaked around 0. Applying a standard scaler slightly reduced the width of the distribution.

L150-151: please also specify in the text whether these numbers refer to the readout window.

Done

L151, concerning the increase of the neutron flux: this could actually be a problem if you are considering a silicon-based technology for the inner tracker, given the possible radiation damage induced by thermal neutrons. Can you elaborate a bit on this point (i.e., did you consider including the effect of detector radiation damage in the pipeline)?

The ATLAS technical report for the silicon pixel sensors foresees a fluence of 131x10^14 1-MeV-n-eq/cm^2 (https://hal.science/hal-03978880) for HL-LHC, while at the Muon Collider it is expected to be of order 1x10^14 cm^-2/year (https://arxiv.org/pdf/2303.08533). Therefore, considering existing technologies, we are quite safe, and a factor of 2 on the neutron flux should not be a problem, so I do not plan to include it in the analysis.
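
As a rough back-of-the-envelope check using only the numbers quoted above (my own illustration, not a statement from the paper): even including the factor of 2, i.e. $2\times10^{14}\ \mathrm{cm^{-2}\,year^{-1}}$, accumulating the HL-LHC design fluence of $131\times10^{14}\ \mathrm{cm^{-2}}$ would take $131/2 \approx 65$ years of operation, so existing silicon technologies leave a comfortable margin.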

After the references there is an "Appendix A" without text.

Removed

Reviewer 3 Report

Comments and Suggestions for Authors

The paper includes an interesting suggestion of what kind of background is to be anticipated in a future muon collider experiment. The contents of the paper will be useful in the design stage of the muon collider experiment. Therefore, this paper should be published in Particles. However, the referee notes a few points to be addressed in this paper. The author should provide answers on these points.

1. Lines 79-80: the number of samples used in the simulation. Was it 10^4 muons or 6.35x10^5 muons?

2. Figures 4 and 8: you have listed the main backgrounds as γ, n, e… What are the origins of these backgrounds? Please write them clearly in the paper. Are they coming from the muon beam or induced in the shield surrounding the collision point?

3. In Figure 4, you have provided the possible flux of the background. For the soft gamma rays, you have provided this number as 3x10^7 per beam crossing. How many muons are passing in this case? Please provide the definite number of muons in the beam.

4. Lines 138-139, on Eq. (1): what is the predicted flux? No definition is given.

5. Figure 8: why does the outside shield bend at r = -100 cm? What is the reason? Please write it in the paper.

Comments: why don't you add some future prospects for this project? Actually, if we can measure the multiple muons appearing outside the thick shield, maybe we can do a good job, if we can select muons above 40 GeV. Therefore, not only the design of the collision point but also the estimation of the shield thickness, preventing the penetration of the hadron background, will be a very important task for the simulation. The inner detector of the beam pipe may only act for the separation of the collision time. Don't you have any plan to extend your simulation to include the total thickness of the shield?

Author Response

  1. Lines 79-80: the number of samples used in the simulation. Was it 10^4 muons or 6.35x10^5 muons?

The number is indeed wrong; to be precise, it is 2.1x10^5: 2632 decays per simulation, 80 parallel simulations.

  2. Figures 4 and 8: you have listed the main backgrounds as γ, n, e… What are the origins of these backgrounds? Please write them clearly in the paper. Are they coming from the muon beam or induced in the shield surrounding the collision point?

Line 17 in the introduction mentions that the BIB arises from the interaction between decay products and the machine. Added a clarification in lines 31-33.

  3. In Figure 4, you have provided the possible flux of the background. For the soft gamma rays, you have provided this number as 3x10^7 per beam crossing. How many muons are passing in this case? Please provide the definite number of muons in the beam.

Muons per beam added in line 31

  4. Lines 138-139, on Eq. (1): what is the predicted flux? No definition is given.

Defined right before the equation

  5. Figure 8: why does the outside shield bend at r = -100 cm? What is the reason? Please write it in the paper.

Explained in lines 43-44: the bend at 1 m was already present in the original design.

Comments: why don't you add some future prospects for this project? Actually, if we can measure the multiple muons appearing outside the thick shield, maybe we can do a good job, if we can select muons above 40 GeV.

I am not sure I fully understood this comment. Concerning the muons, studies are ongoing to assess the feasibility of instrumenting the nozzles themselves or of placing a detector between the final focusing magnets to tag forward muons which cross the nozzles and do not enter the detector. Added a mention of and a reference to this study in the conclusion.

Therefore, not only the design of the collision point but also the estimation of the shield thickness, preventing the penetration of the hadron background, will be a very important task for the simulation. The inner detector of the beam pipe may only act for the separation of the collision time. Don't you have any plan to extend your simulation to include the total thickness of the shield?

I am not sure what you mean. The full simulation, including the shielding described in this proceeding, has already been done. For example, in this paper (https://arxiv.org/abs/2405.19314) we assess the performance on several Higgs-related parameters.

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

From page 1 to page 5 (lines 1-127) I recognized the author's effort, so these parts read without any problem. OK.

However, I require a more detailed explanation of lines 136-164.

    1. Especially the relation between the 0.02% of decays and the flux of 1.3x10^4 written in the caption of Figure 7. A detailed explanation is needed.

Does it mean that the 1.3x10^4 low statistics corresponds to 0.02%? Or is the beam intensity greater, but you have collided only 0.02% of them in your simulation? Or did you use only 0.02% of the products as the current sample? Please clarify.

  2. How did you set the flux_prediction in line 157? Is it only an average of the samples? It is not defined. (I pointed this out last time.)
  3. Under line 179, Figure 10:

The conditions of the simulation that created this figure should be written in the figure caption in more detail. The collision energy is also missing from the caption.

In short, the main background is soft gamma rays of a few MeV, so it may be possible to eliminate them with an intelligent detector.


Author Response

  1. Especially the relation between the 0.02% of decays and the flux of 1.3x10^4 written in the caption of Figure 7. A detailed explanation is needed.

Does it mean that the 1.3x10^4 low statistics corresponds to 0.02%? Or is the beam intensity greater, but you have collided only 0.02% of them in your simulation? Or did you use only 0.02% of the products as the current sample? Please clarify.

I removed the number from the caption because it is misleading. 1.3x10^4 is the number of low-statistics simulations performed. Each low-statistics simulation has a different geometry, as stated in lines 131-132. In line 130 I specified that in each simulation 0.02% of the total expected decays were generated.

  2. How did you set the flux_prediction in line 157? Is it only an average of the samples? It is not defined. (I pointed this out last time.)

The flux is the target variable of the models. The predicted flux is the output of the ML model given the geometrical parameters as input. I tried to stress this in line 135 by adding an equation to define it, and lines 136-137 explain it.
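
In symbols (a sketch matching the description above, with placeholder notation): $\mathrm{Flux}_{\mathrm{pred}} = f_{\theta}(p_1, \ldots, p_n)$, where $f_{\theta}$ is the trained XGBoost regressor and $p_1, \ldots, p_n$ are the geometrical parameters of the nozzle given as input; it is a per-geometry prediction, not an average over samples.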

  3. Under line 179, Figure 10:

Improved the description

The conditions of the simulation that created this figure should be written in the figure caption in more detail. The collision energy is also missing from the caption.

Added the $\sqrt{s}$ and rephrased to highlight the statistics used and the geometries considered.

In short, the main background is soft gamma rays of a few MeV, so it may be possible to eliminate them with an intelligent detector.

I do not fully agree with this statement. It is indeed true that the main component of the background consists of soft gammas; however, the impact on the physics performance is not straightforward. Section 4 of ref. [1] (https://link.springer.com/article/10.1140/epjc/s10052-023-11889-x) presents all the studies concerning the detector performance, comparing object reconstruction with and without the BIB. I added the citation in line 145, where I mention this issue. One of the most critical aspects of the BIB is its impact on the first two layers of the vertex detector due to e+/e-; therefore, even if they are a small fraction of the BIB flux, their impact is relevant. I am currently studying how the respective particle species should be weighted in order to optimize the nozzle (mentioned in lines 200-208).

If you think it could be useful, I can expand the discussion in lines 36-39.
