Article
Peer-Review Record

Use of Deep Learning to Improve the Computational Complexity of Reconstruction Algorithms in High Energy Physics

Appl. Sci. 2021, 11(23), 11467; https://doi.org/10.3390/app112311467
by Núria Valls Canudas *, Míriam Calvo Gómez *, Elisabet Golobardes Ribé * and Xavier Vilasis-Cardona *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 15 November 2021 / Revised: 28 November 2021 / Accepted: 1 December 2021 / Published: 3 December 2021
(This article belongs to the Special Issue Women in Artificial Intelligence (AI))

Round 1

Reviewer 1 Report

I like this paper as it gives a clear description of the use of machine learning to replace an existing system.  One connection that I think should be made in any revision is that there seems to be a connection between some of the networks that you use and "pooling layers" where a maximum value over some stencil is sought.  I think that the authors should discuss this connection, if any, to their work.

 

Author Response

Dear Reviewer,

First of all, many thanks for your comments on our manuscript.

Regarding the connection with pooling layers, it is true that the behaviour of a max pooling layer in a convolutional network closely resembles what we try to model in the first local maxima network. However, there is a difference in the windows to which the operation is applied. In a pooling layer, a kernel window is applied sequentially to all possible positions without analysing the same 'cell' twice. In our approach, by contrast, we want to evaluate every single cell together with its neighbours, which requires convolution-like behaviour.

As far as we know, pooling layers do not perform a convolution on the input data, so they do not provide the exact behaviour we need for the local maxima analysis. It is nonetheless an interesting connection that we will investigate further.
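To illustrate the distinction, here is a minimal NumPy sketch (our own illustration, not code from the manuscript): the pooling function reads each cell only once in non-overlapping windows, while the sliding-window function evaluates every cell against its full 3x3 neighbourhood, which is the behaviour the local maxima step needs.

```python
import numpy as np

def max_pool_2x2(grid):
    """Non-overlapping 2x2 max pooling: each cell is read only once."""
    grid = np.asarray(grid, dtype=float)
    h, w = grid.shape
    return grid[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def local_maxima(grid):
    """Stride-1 evaluation: every cell is compared with its full 3x3 neighbourhood."""
    grid = np.asarray(grid, dtype=float)
    padded = np.pad(grid, 1, constant_values=-np.inf)
    states = np.zeros(grid.shape, dtype=int)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            window = padded[i:i + 3, j:j + 3]
            states[i, j] = int(grid[i, j] >= window.max())
    return states
```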

A new version of the manuscript will be uploaded with some minor changes, clarifying some explanations and correcting spelling. If you have any other comments or would like to discuss anything else, we will be pleased to go through another review iteration.

Reviewer 2 Report

This paper describes a new approach to perform reconstruction of calorimeter data from the LHCb experiment, using machine learning (neural networks) to emulate the traditional cellular automaton approach. The new algorithm is shown to reproduce the original LHCb reconstruction closely, while giving significantly improved timing performance for high-occupancy data. In light of the increasing complexity of event reconstruction in high luminosity environments which we will see during the HL-LHC era and beyond, this is important work. It is worthy of being published. 

 

I don’t have any significant objections to the methods, or the results. However, in places the descriptions and definitions are unclear or ambiguous which makes it hard to follow exactly what was done, and interpret the results. As such, I provide a number of suggestions below which I hope the authors can implement. This should hopefully make the paper easier to follow, especially for non-experts.

 

General comments:

 

The results indicate very good performance in terms of resource use for very high-multiplicity environments. This is important, but just as important is to sustain (or even improve) the reconstruction performance. I see the study of this is limited to the ‘relative error’ given in Tables 3 and 4. Are there any more absolute metrics which can be used to assess performance? e.g. using MC truth information to label clusters as false positives, or false negatives (noise and inefficiency, respectively), or to account for possible merged clusters? With the information currently presented in the paper, it’s not clear if the proposed algorithm can perform at the required level to be considered for future use in collider experiments.

 

Some confusion is raised by the two different uses of ‘cell’ - in one case this refers to a repeating physical unit of the calorimeter. In the other it refers to the more abstract cell used in the CA. In fact the definition of the CA cell is never given for the specific context of calorimeter clustering - one might assume that this is taken to be the physical calorimeter cell itself, but this should be clearly stated. In fact, I would propose relabelling the calorimeter unit (from ‘cell’ to something else) so that every use of ‘cell’ in the paper is clear and unambiguous.

 

In places the description is obscured by some language issues, which I hope the editorial team can smooth out. Often these are small, but in places they do obscure the meaning. This includes mismatches in number (singular verbs on plural nouns and vice versa, mismatched pronouns), incorrect wording (e.g. this -> these, throughout -> throughput, loosing -> losing, hole -> whole, Montecarlo -> Monte Carlo), some unusual language (‘domain experts’, ‘what is willing to be reconstructed’, ‘major upgrade currently undergoing’, ‘determinant information’, ‘inciding particles’) 

 

Section 1: Introduction

 

The description of the LHCb ECAL on L35-42 could be improved. While this is only the initial introduction, before more details are given in Sec 2, it is currently slightly confusing for a non-expert. For example, it’s not immediately obvious how the representation as a 2D grid is reconciled with the segmentation into three regions (perhaps it would be better to first describe the segmentation, and then say that each of the three regions can be represented by a 2D grid). In fact, at this point the information on regions with different segmentations is probably not needed, and removing it will make the paper easier to follow. Such information can be moved to Sec 2.

 

Further, it’s assumed that the reader understands that the particles generally strike the calorimeter at a near-perpendicular angle (i.e that the orientation of the grid with respect to incident particles is obvious). Adding a few words to make this explicit would avoid ambiguity (notably, calorimeters are in general also segmented in the longitudinal direction).



Section 2: Data processing in HEP

 

The description of the LHCb ECAL on L85- could be improved. A figure would in fact be very helpful here, since the words do not clearly convey the structure and composition of the calorimeter. 

 

It would be useful to say a bit more about typical cluster sizes, since this is relevant to the problem being solved. It is stated that a particle hitting the centre of a cell in the inner region will deposit all its energy in that cell, which implies that one might expect a maximum cluster size of ~4 for particles hitting a corner. Making this a bit more explicit would be helpful (i.e. what is the mean cluster size in this region? What is the variance/SD?). 

 

Similarly, a statement about the occupancy in Run 2, and some tentative statements about how this will change in future, would be useful in the context of the work performed and results shared later.

 

Finally, one wonders how the performance of the clustering algorithm (and ECAL reconstruction in general) is evaluated. When presenting a new method it is important to demonstrate that it is fit for purpose, so describing the relevant performance metrics here would be useful. Presumably there are quantities like purity, efficiency, and split/merged cluster fractions which could be used.

 

L105-113: the discussion of the throughput and trigger could also be improved. The term ‘trigger’ is first used on L106 as a shorthand for the hardware trigger. Later it is used more generally for the overall system, so this should be made consistent. It’s also not clear what is meant by ‘the time constraints of the trigger throughput’ - either this is a throughput constraint or a timing constraint (or else something else which isn’t at present clear). The wording here should be revised for clarity.

 

Section 3.2: Local maxima formation

The definition & description of the CA ruleset could be improved. Firstly, the parameter ‘t’ is not defined. I suppose that this is the iteration index, but that should be made clear (also that t can only take values 0 and 1 for this particular algorithm).
Secondly, the function f() seems to represent the rule used to define the state (although that isn’t explicitly stated) but for the initial iteration the input state f(cij^0) is not well defined. Is f(cij^0) defined as the ‘input value’ of the cell? This would make sense but isn’t clear.

 

In fact, the function f(C_i,j^t+1) may be redundant: in all cases you can replace with C_i,j^t+1, since you present simple rules in Eq (1,2,3) which define this C^t+1 in terms of C^t.

 

L170-71: What does it mean that “This range of values [0-99] have been chosen taking into account the statistical number of ones on each sample given the ruleset.”

 

Eq (1): If two neighbouring calorimeter cells happen to share the same value, this would seem to indicate that both cells iterate to a state=0 at t=1. It would also imply that it is not possible for any adjacent cells to have state=1 at t=1. Is this really the designed behaviour? I suspect not - probably the ‘>’ should be ‘>=’ in Eq. (1) - this is suggested by the fact that figure 1 (top right) does show some adjacent cells with state=1 at t=1. Secondly, the prevalence of such occurrences will depend on the range within which the random values are sampled, so a 0->99 range is not a realistic representation of a 0->4096 range as in the real LHCb calorimeter. Why not use this realistic scheme? Perhaps I misunderstand the method, in which case it would be useful for this to be more clearly described.
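For reference, a minimal LaTeX sketch of the rule as interpreted in this comment, with the suggested '>=', assuming a 3x3 Moore neighbourhood N(i,j); the manuscript's exact notation may differ.

```latex
C_{i,j}^{t+1} =
\begin{cases}
1 & \text{if } C_{i,j}^{t} \geq C_{k,l}^{t} \quad \forall (k,l) \in N(i,j), \\
0 & \text{otherwise.}
\end{cases}
```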

 

L177-: It’s not clear what samples are used for testing and training the neural network. This information should be added to the document. A scheme with random number generation is mentioned, but so is LHCb simulation data. Presumably the target sample is produced by passing the ‘raw’ data of calorimeter cell values through the CA. This process isn’t so clear at the moment (i.e. that the NN is being trained to emulate the CA).

 

Figure 1: What does ‘input’ and ‘output’ refer to here? By context I guess that the input is the map of calorimeter cell ‘values’ (0-99, or 0-4096?) from either random generation (top) or LHCb simulation (bottom). Is the output from the CA or from the NN here? A useful comparison would be between the CA itself, and the NN defined to emulate it.

 

It would be useful to have a definition of the ‘accuracy’ of the neural network.

 

Section 3.3: Clustering

I don’t follow why there are 2004 states in this case. The text justifies this through the total number of calorimeter cells, and the fact that at most ¼ of these can be local maxima simultaneously. In a CA the state is a local property of the cell. Why should the number of possible states here depend on the global properties of the problem?

 

I would instead expect the number of states to be 3, based on the ruleset presented in Eq (2): {-1,0,+1}
Furthermore, dividing 6016 by four I obtain 1504, not 2004, so either I misunderstand the logic or there is a problem in calculating this number on L189.

 

In Eq (2), what is Ktag, and how does it differ from K? Similarly, what is a ‘tag’ here? In a CA there is one and only one attribute of a cell at a given time increment t: the state. If by ‘tag’, you mean ‘state’, please use the usual nomenclature. Similarly, the text also mentions ‘labels’ which is presumably referring to the neural network. The authors should be clear and careful with their terminology to avoid confusion.

 

The text in L199-204 is hard to follow, and it’s not clear what encoding problem this refers to. If this is relevant to the method or results, it should be described in more detail. If not, then probably the text can be simplified to avoid touching on tangential matters which are not relevant. 

 

Section 3.4: Clustering and Overlap Formulation

 

L220-225 is again hard to follow (the justification and procedure to use a 7x7 window). What is a ‘5x5 convolution’? What does it mean that an energy fraction belongs to a cell? This needs to be rewritten so a reader can follow exactly what was done, and why. I think figure 2 could be used here, since it includes both the 7x7 and two 5x5 windows. Presumably the 7x7 window ensures that any overlaps in the neighbourhood of the cell under test can be found and a 5x5 window framed around each one. The language is quite loose (e.g. what is the ‘given cell’? what is the ‘central cell’? what does the ‘predicted value’ refer to?)

 

On L239 (central cluster information) we learn that the output of the 7x7 window is a single state allocated to the central cell in the window. This state is either 0 or 1, but it’s not clear what the rule is for assigning this state. Related: the first option in the ruleset is used ‘if is central cluster’ - does this refer to a state=1 from this information stream? Again, this isn’t clear and should be described more fully.
From Table 2 it is apparent that a central cluster (case 1) is not simply a cluster with no overlaps (since that is a separate row, case 2, in the table). Is a central cluster one without any neighbours at all in the 3x3 vicinity?

 

The ruleset is presented in a quite complicated way here. In reality the steps are quite simple, so it would be useful to have these described verbally first, e.g.

  1. ‘Central clusters’ are defined as cells with no overlaps (or similar?) - i.e. there are no cells in the 3x3 window which overlap with another local maximum.
  2. For other cases, the state at step t+1 is calculated such that only a fraction of the contribution of overlap cells at step t contributes.

Then equation (3) makes more sense.

 

Is the third rule in Eq (3) necessary? This seems like a special case of the first rule, i.e. c_i,j^t+1 = c_i,j^t

 

L275: One of the training samples is augmented by rotating the calorimeter coordinate system by 90,180,270 degrees. Since the rulesets are symmetric with respect to this rotation I don’t see how this leads to a larger training sample - aren’t the four rotations 100% correlated? Is the correlation broken when you choose a subset of 30000 7x7 windows, rather than running over the full set of windows? How are these 30k windows chosen?

 

It would be insightful to report the performance (table 3) separately for the six different cases being considered, which are quite different in their complexity. I ask the authors to consider this option, or explain why it is not possible/meaningful.

 

Section 4 (results)

 

The relative error is given in Table 3, which relates the energy reported by the ‘original LHCb algorithm’ to that from the new approach. I suppose this is the (energy difference)/energy, but it would be useful to spell that out. 

 

Later (Table 4) different metrics are quoted - namely the ‘mean of relative error’ and ‘STD of relative error’. Are these the same as the metric from Table 3? If so, the authors should use a common terminology. If they differ, this should be explained. 

 

Further, the row labels in Table 4 are ambiguous. What is ‘original (python)’ here? Previously the label ‘original’ has been used to label the official LHCb algorithm from which the energy difference is computed for the new approach, but that can’t be the case here (it would give an energy difference of zero by construction). The caption should clearly explain what the two rows refer to. 

 

On L297, the text is ambiguous: ‘a version of this algorithm’ - which algorithm? Two are discussed in the previous sentence.

 

L321-325. The motivation for executing the three regions in parallel is not clear. What is the case here? In particular, I don’t follow this sentence: “the highest curve is settled by the outer region reconstruction which cuts over 11% of the events in the iterative curve”
What does it mean to settle a curve? What is the iterative curve? Where does the 11% come from? (and what does it mean for it to be cut by reconstruction?) I recommend that this be reworded for clarity.

Author Response

Dear Reviewer,

First of all, many thanks for the dedication you put into reviewing this manuscript. All the comments were really valuable and have helped us improve the article a lot.

Almost every comment has led to changes in the manuscript. Below we provide a detailed reply to each of them.

In case any of the comments or changes are not clear enough, or there is anything that needs further discussion, we will be pleased to review it again.

 

General comments:

 

The results indicate very good performance in terms of resource use for very high-multiplicity environments. This is important, but just as important is to sustain (or even improve) the reconstruction performance. I see the study of this is limited to the ‘relative error’ given in Tables 3 and 4. Are there any more absolute metrics which can be used to assess performance? e.g. using MC truth information to label clusters as false positives, or false negatives (noise and inefficiency, respectively), or to account for possible merged clusters? With the information currently presented in the paper, it’s not clear if the proposed algorithm can perform at the required level to be considered for future use in collider experiments.

For the results given in the article regarding the relative error, the selection of MC particles took into account the same selection conditions as the current algorithm in LHCb. We are also working on the integration of this algorithm into the LHCb framework, so that many more efficiency tests can be made once that is finished.

 

Some confusion is raised by the two different uses of ‘cell’ - in one case this refers to a repeating physical unit of the calorimeter. In the other it refers to the more abstract cell used in the CA. In fact the definition of the CA cell is never given for the specific context of calorimeter clustering - one might assume that this is taken to be the physical calorimeter cell itself, but this should be clearly stated. In fact, I would propose relabelling the calorimeter unit (from ‘cell’ to something else) so that every use of ‘cell’ in the paper is clear and unambiguous.

This has been clarified in the text: 'readout cell' now refers to the calorimeter unit and 'cell' refers to the CA.

 

In places the description is obscured by some language issues, which I hope the editorial team can smooth out. Often these are small, but in places they do obscure the meaning. This includes mismatches in number (singular verbs on plural nouns and vice versa, mismatched pronouns), incorrect wording (e.g. this -> these, throughout -> throughput, loosing -> losing, hole -> whole, Montecarlo -> Monte Carlo), some unusual language (‘domain experts’, ‘what is willing to be reconstructed’, ‘major upgrade currently undergoing’, ‘determinant information’, ‘inciding particles’) 

A thorough revision of the language has been made.

 

Section 1: Introduction

 

The description of the LHCb ECAL on L35-42 could be improved. While this is only the initial introduction, before more details are given in Sec 2, it is currently slightly confusing for a non-expert. For example, it’s not immediately obvious how the representation as a 2D grid is reconciled with the segmentation into three regions (perhaps it would be better to first describe the segmentation, and then say that each of the three regions can be represented by a 2D grid). In fact, at this point the information on regions with different segmentations is probably not needed, and removing it will make the paper easier to follow. Such information can be moved to Sec 2.

Included in sections 1 and 2.

 

Further, it’s assumed that the reader understands that the particles generally strike the calorimeter at a near-perpendicular angle (i.e that the orientation of the grid with respect to incident particles is obvious). Adding a few words to make this explicit would avoid ambiguity (notably, calorimeters are in general also segmented in the longitudinal direction).

Included in section 2.



Section 2: Data processing in HEP

 

The description of the LHCb ECAL on L85- could be improved. A figure would in fact be very helpful here, since the words do not clearly convey the structure and composition of the calorimeter. 

A schematic view of the ECAL has been included.

 

It would be useful to say a bit more about typical cluster sizes, since this is relevant to the problem being solved. It is stated that a particle hitting the centre of a cell in the inner region will deposit all its energy in that cell, which implies that one might expect a maximum cluster size of ~4 for particles hitting a corner. Making this a bit more explicit would be helpful (i.e. what is the mean cluster size in this region? What is the variance/SD?). 

Included in section 2 with a reference to a study on the cluster sizes.

 

Similarly, a statement about the occupancy in Run 2, and some tentative statements about how this will change in future, would be useful in the context of the work performed and results shared later.

Included in section 2.

 

Finally, one wonders how the performance of the clustering algorithm (and ECAL reconstruction in general) is evaluated. When presenting a new method it is important to demonstrate that it is fit for purpose, so describing the relevant performance metrics here would be useful. Presumably there are quantities like purity, efficiency, and split/merged cluster fractions which could be used.

A sentence has been included in section 2; a more detailed description of the performance metrics is given in section 4.

 

L105-113: the discussion of the throughput and trigger could also be improved. The term ‘trigger’ is first used on L106 as a shorthand for the hardware trigger. Later it is used more generally for the overall system, so this should be made consistent. It’s also not clear what is meant by ‘the time constraints of the trigger throughput’ - either this is a throughput constraint or a timing constraint (or else something else which isn’t at present clear). The wording here should be revised for clarity.

It has been re-written.

 

Section 3.2: Local maxima formation

The definition & description of the CA ruleset could be improved. Firstly, the parameter ‘t’ is not defined. I suppose that this is the iteration index, but that should be made clear (also that t can only take values 0 and 1 for this particular algorithm).
Secondly, the function f() seems to represent the rule used to define the state (although that isn’t explicitly stated) but for the initial iteration the input state f(cij^0) is not well defined. Is f(cij^0) defined as the ‘input value’ of the cell? This would make sense but isn’t clear.

Added and clarified.

 

In fact, the function f(C_i,j^t+1) may be redundant: in all cases you can replace with C_i,j^t+1, since you present simple rules in Eq (1,2,3) which define this C^t+1 in terms of C^t.

Taken into account for all the equations.

 

L170-71: What does it mean that “This range of values [0-99] have been chosen taking into account the statistical number of ones on each sample given the ruleset.”

Re-written.

 

Eq (1): If two neighbouring calorimeter cells happen to share the same value, this would seem to indicate that both cells iterate to a state=0 at t=1. It would also imply that it is not possible for any adjacent cells to have state=1 at t=1. Is this really the designed behaviour? I suspect not - probably the ‘>’ should be ‘>=’ in Eq. (1) - this is suggested by the fact that figure 1 (top right) does show some adjacent cells with state=1 at t=1. Secondly, the prevalence of such occurrences will depend on the range within which the random values are sampled, so a 0->99 range is not a realistic representation of a 0->4096 range as in the real LHCb calorimeter. Why not use this realistic scheme? Perhaps I misunderstand the method, in which case it would be useful for this to be more clearly described.

The first part has already been corrected. The choice of value range has been rewritten and clarified in more detail in a footnote. To answer your question, the number of occurrences does not need to be the same as in the simulation samples, since the network used in this case is convolutional with a 3 by 3 kernel. Hence, the individual cells are evaluated in groups of 3 by 3 rather than as a whole image, and the network should be able to learn the 'ruleset patterns' independently of how often they occur. We hope this is clearer now.
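As a hedged illustration of this point (the layer count and filter numbers below are our own assumptions, not the architecture reported in the manuscript), a fully convolutional network with 3x3 kernels and 'same' padding evaluates every cell together with its 3x3 neighbourhood:

```python
import tensorflow as tf

def build_local_maxima_net(height, width):
    """Fully convolutional model: 3x3 kernels with 'same' padding, so each cell
    is evaluated with its 3x3 neighbourhood, independently of how often a given
    pattern appears in the training sample."""
    inputs = tf.keras.Input(shape=(height, width, 1))
    x = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
    x = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(x)
    outputs = tf.keras.layers.Conv2D(1, 3, padding='same', activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs)
```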

 

L177-: It’s not clear what samples are used for testing and training the neural network. This information should be added to the document. A scheme with random number generation is mentioned, but so is LHCb simulation data. Presumably the target sample is produced by passing the ‘raw’ data of calorimeter cell values through the CA. This process isn’t so clear at the moment (i.e. that the NN is being trained to emulate the CA).

More information regarding the training and testing data has been included.

 

Figure 1: What does ‘input’ and ‘output’ refer to here? By context I guess that the input is the map of calorimeter cell ‘values’ (0-99, or 0-4096?) from either random generation (top) or LHCb simulation (bottom). Is the output from the CA or from the NN here? A useful comparison would be between the CA itself, and the NN defined to emulate it.

Clarified in the text and in the figure caption. We have also added a fifth image showing the 'expected output' of the network, which is what we compare against in order to compute the accuracy of the network's output.

 

It would be useful to have a definition of the ‘accuracy’ of the neural network.

Included.
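A minimal sketch of one natural definition (our assumption; the definition added to the manuscript may differ): the fraction of cells on which the thresholded network output matches the expected output produced by the CA.

```python
import numpy as np

def cellwise_accuracy(nn_output, ca_expected, threshold=0.5):
    """Fraction of cells where the thresholded network prediction equals the
    expected CA output."""
    predicted = (np.asarray(nn_output, dtype=float) > threshold).astype(int)
    expected = np.asarray(ca_expected).astype(int)
    return float((predicted == expected).mean())
```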

 

Section 3.3: Clustering

Thanks to your comments and suggestions, we have realised that, as you said, the formulation of this step can be simplified a lot. The explanations have been improved accordingly.

I don’t follow why there are 2004 states in this case. The text justifies this through the total number of calorimeter cells, and the fact that at most ¼ of these can be local maxima simultaneously. In a CA the state is a local property of the cell. Why should the number of possible states here depend on the global properties of the problem?

 

I would instead expect the number of states to be 3, based on the ruleset presented in Eq (2): {-1,0,+1}
Furthermore, dividing 6016 by four I obtain 1504, not 2004, so either I misunderstand the logic or there is a problem in calculating this number on L189.

 

In Eq (2), what is Ktag, and how does it differ from K? Similarly, what is a ‘tag’ here? In a CA there is one and only one attribute of a cell at a given time increment t: the state. If by ‘tag’, you mean ‘state’, please use the usual nomenclature. Similarly, the text also mentions ‘labels’ which is presumably referring to the neural network. The authors should be clear and careful with their terminology to avoid confusion.

 

The text in L199-204 is hard to follow, and it’s not clear what encoding problem this refers to. If this is relevant to the method or results, it should be described in more detail. If not, then probably the text can be simplified to avoid touching on tangential matters which are not relevant. 

 

Section 3.4: Clustering and Overlap Formulation

 

L220-225 is again hard to follow (the justification and procedure to use a 7x7 window). What is a ‘5x5 convolution’? What does it mean that an energy fraction belongs to a cell? This needs to be rewritten so a reader can follow exactly what was done, and why. I think figure 2 could be used here, since it includes both the 7x7 and two 5x5 windows. Presumably the 7x7 window ensures that any overlaps in the neighbourhood of the cell under test can be found and a 5x5 window framed around each one. The language is quite loose (e.g. what is the ‘given cell’? what is the ‘central cell’? what does the ‘predicted value’ refer to?)

We have tried to explain the windowing process in more detail for clarification.
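A minimal sketch of the windowing step as we understand it (function and variable names, and the zero padding, are our own assumptions): a 7x7 window framed around a local maximum contains, for any overlapping maximum in its 3x3 vicinity, the full 5x5 window around that maximum as well.

```python
import numpy as np

def extract_window(grid, i, j, size=7):
    """Return the size x size window centred on cell (i, j), zero-padded at the
    calorimeter borders."""
    half = size // 2
    padded = np.pad(np.asarray(grid, dtype=float), half)
    return padded[i:i + size, j:j + size]
```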

 

On L239 (central cluster information) we learn that the output of the 7x7 window is a single state allocated to the central cell in the window. This state is either 0 or 1, but it’s not clear what the rule is for assigning this state. Related: the first option in the ruleset is used ‘if is central cluster’ - does this refer to a state=1 from this information stream? Again, this isn’t clear and should be described more fully.
From Table 2 it is apparent that a central cluster (case 1) is not simply a cluster with no overlaps (since that is a separate row, case 2, in the table). Is a central cluster one without any neighbours at all in the 3x3 vicinity?

This has been clarified substantially. Regarding Table 2, there was a mistake in the description of the datasets that has been corrected. We have also added a column evaluating the RMSE for each of the cases (since the network is trained as a regressor, this is the metric chosen to evaluate the performance).
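For reference, the standard RMSE definition used for a regressor (variable names are ours, not the manuscript's):

```python
import numpy as np

def rmse(predicted, target):
    """Root mean squared error between predicted and target cell energies."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((predicted - target) ** 2)))
```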

On the definition of the number of states, we realised that the range of values from 0 to 4096 is given in ADC counts. Since all the calorimeter data samples we work with are expressed as energy (MeV), we have updated this number to the equivalent energy range of 10240 MeV (4096 ADC counts at a gain of 2.5 MeV per count).

The ruleset is presented in a quite complicated way here. In reality the steps are quite simple, so it would be useful to have these described verbally first, e.g.

  1. ‘Central clusters’ are defined as cells with no overlaps (or similar?) - i.e. there are no cells in the 3x3 window which overlap with another local maximum.
  2. For other cases, the state at step t+1 is calculated such that only a fraction of the contribution of overlap cells at step t contributes.

Then equation (3) makes more sense.

It has been clarified.

 

Is the third rule in Eq (3) necessary? This seems like a special case of the first rule, i.e. c_i,j^t+1 = c_i,j^t

True again. It has been corrected.

 

L275: One of the training samples is augmented by rotating the calorimeter coordinate system by 90,180,270 degrees. Since the rulesets are symmetric with respect to this rotation I don’t see how this leads to a larger training sample - aren’t the four rotations 100% correlated? Is the correlation broken when you choose a subset of 30000 7x7 windows, rather than running over the full set of windows? How are these 30k windows chosen?

It is true that the rotated samples are correlated. However, we used this technique purely as data augmentation, in order to have a balanced dataset across all the cases mentioned before.
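A minimal sketch of the rotation-based augmentation described here (our own illustration, assuming the windows and targets are 2D arrays):

```python
import numpy as np

def augment_with_rotations(window, target):
    """Return the original (window, target) pair plus its 90, 180 and 270 degree
    rotations, used purely to balance the dataset across cases."""
    pairs = [(np.asarray(window), np.asarray(target))]
    for k in (1, 2, 3):  # 90, 180, 270 degrees counter-clockwise
        pairs.append((np.rot90(window, k), np.rot90(target, k)))
    return pairs
```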

 

It would be insightful to report the performance (table 3) separately for the six different cases being considered, which are quite different in their complexity. I ask the authors to consider this option, or explain why it is not possible/meaningful.

Done. The metric used to evaluate the network in Table 3 has also been changed to RMSE, for consistency with Table 2 and because it was redundant with information given in Table 4.

 

Section 4 (results)

 

The relative error is given in Table 3, which relates the energy reported by the ‘original LHCb algorithm’ to that from the new approach. I suppose this is the (energy difference)/energy, but it would be useful to spell that out. 

It has been clarified.

 

Later (Table 4) different metrics are quoted - namely the ‘mean of relative error’ and ‘STD of relative error’. Are these the same as the metric from Table 3? If so, the authors should use a common terminology. If they differ, this should be explained. 

It has been clarified.

 

Further, the row labels in Table 4 are ambiguous. What is ‘original (python)’ here? Previously the label ‘original’ has been used to label the official LHCb algorithm from which the energy difference is computed for the new approach, but that can’t be the case here (it would give an energy difference of zero by construction). The caption should clearly explain what the two rows refer to. 

It has been clarified.

 

On L297, the text is ambiguous: ‘a version of this algorithm’ - which algorithm? Two are discussed in the previous sentence.

It has been clarified.

 

L321-325. The motivation for executing the three regions in parallel is not clear. What is the case here? In particular, I don’t follow this sentence: “the highest curve is settled by the outer region reconstruction which cuts over 11% of the events in the iterative curve”
What does it mean to settle a curve? What is the iterative curve? Where does the 11% come from? (and what does it mean for it to be cut by reconstruction?) I recommend that this be reworded for clarity.

This has been rewritten for clarity.
