Identifying D Mesons from Radiative W Decays at the Large Hadron Collider

: In this paper, we present two machine learning algorithms to identify D mesons produced in a colour singlet state from radiative W boson decays at the LHC. The combined network algorithm is able to identify D mesons via its hadronic decays with an efficiency of 47% while suppressing a background of quark and gluon jets by a factor of 100. Using the developed algorithm, we perform a prospective study for the measurement of B ( W → D s γ ) .


Introduction
The large amount of W bosons produced in pp collisions at the Large Hadron Collider (LHC) enables searches for exclusive hadronic decays.These decay modes can offer precision studies of QCD factorisation [1] and are sensitive to the coupling of the W boson with the photon.However, the searches for the hadronic decays are still challenging due to the large background dominated by various QCD processes.Of all these decay modes, W → D s γ has the largest branching fraction predicted by the Standard Model with the value of B = (3.7 ± 1.5) × 10 −8 .No such decay has been observed so far and the best upper limit is set by the LHCb collaboration with the value B(W → D s γ) < 6.4 × 10 −4 at a 95% confidence level [2].The limit is obtained by analysing K + K − π + final states, which make up 5.4% of the D s decays.This improves on an earlier limit of B(W → D s γ) < 1.3 × 10 −3 set by the CDF collaboration [3], using only ϕ(K + K − )π + and K * 0 K + final states, which comprise 3.9% of all D s decays.The algorithm presented in this paper offers a new approach to identify D mesons specific to radiative W boson decay, using inclusive tagging, and is sensitive to all decays at the possible expense of higher backgrounds.As a proof of principle, we focus on the D s meson because of its highest predicted branching ratio.
A recent study [4] demonstrated that jets originating from a radiative decay of a colour-singlet charmonium state can be distinguished from coloured jets.With machine learning algorithms, we can differentiate between jets originating from radiatively produced D s mesons and background jets from gluons and quarks.The main characteristic is that they are produced without accompanying fragmentation tracks and produce isolated jets.With retraining, the algorithm offers an opportunity to identify other mesons originating from hadronic decays of colour-singlet states as well.This would improve future searches for these rare decays and could improve the measurement precision using data to be collected during the ongoing LHC Run 3.
In the following section, the simulation setup is described together with the algorithm where a deep neural network (DNN), a convolutional neural network (CNN), and a combined network are used to identify signal mesons.In Section 3, the results are presented and an overview is given of the network performance and stability.In Section 4, prospects for the search for W → D s γ are assessed.

Simulated Samples
Proton-proton collisions are simulated at 13.6 TeV to match the Run 3 data taking period of the LHC.The sample of D s particles is obtained via the hadronic decay of the W → D s γ.The matrix element for the process pp → W is generated at LO accuracy in QCD using MADGRAPHv5 [5,6].The NN23LO1 PDF set [7] is used in the generation.The W → D s γ decay as well as parton showering and subsequent hadronisation are performed using Pythia8 [8] with the A14 ATLAS tune [9].
The main background processes (in terms of D s identification) are pp → gg and pp → qq where g and q denotes a gluon and a quark, respectively.The background samples are modelled separately using the same setup as for the signal events.
The detector response is simulated via the Delphes [10] package using the ATLAS detector configuration files.Jets are reconstructed as pFlow jets with the anti-k t [11] jet clustering algorithm with ∆R = 0.4, and are required to satisfy p T >25 GeV and |η| < 2.1 selection criteria.Jets are considered as a D s meson if the angular distance to the truth D s particle is ∆R < 0.2.The entire configuration can be found in [12].

D s Identification using Machine Learning Algorithm
The full set of W → D s γ signal sample consists of 180k events, the qq background sample contains 45k, and the gg background sample contains 30k events.In addition, qq and gg, 30k, 30k and 45k Z → Υ/(J/ψ)/ϕ + γ events are considered as background to ensure that the network is able to reject other colour-singlet states.This makes the full background sample with 160k events comparable to the signal.Before the training all the samples were divided into training and testing sets, consisting of 70% and 30% of the full dataset, respectively.To create the machine learning algorithm, TensorFlow [13] and Keras [14] libraries were used.To determine the model performance, we use the receiver operating characteristic (ROC) curve and in particular the area under the ROC curve (AuC).The network hyperparameters were optimised with grid search to make sure that the best performing models were used to obtain the results.

Deep Neural Network
Signal jets originate from the decay of an isolated D s (not surrounded by fragmentation tracks) and will be more collimated than background jets.This is particularly true for gluon jets, since gluon-initiated jets have higher particle multiplicity and a softer fragmentation function, due to the large colour factor.Variables ∆ϕ and ∆η, which measure the width of the jet in the ϕ and η directions, as well as R em and R track which measure the ∆R with respect to the jet axis in case of tracks and electromagnetic clusters can be used to distinguish jets originating from D s from the background jets.The multiplicity of charged and neutral particles (n ch and n 0 ) in jets originating from D s is lower compared to jets from quarks and gluons.From the lower constituent multiplicity it can also be deducted that signal jets have lower invariant mass.The m tr measures the invariant mass of all charged tracks while m j defines the invariant mass of all constituents in the jet.Jets emerging from D s mesons are also less surrounded with hadronic activity caused by the fragmentation.The p core and f core measure the ratio of sum p T in a cone and the jet p T , and the ratio of sum E T in a cone and the jet total E T , respectively.The E had /E em defines the energy ratio in the hadronic and the electromagnetic calorimeter.
We start with the variables used in [4].These variables are further extended with the absolute values of the total charge and the jet-charge (p T weighted charge sum [15]).The charge is expected to peak at zero for gluon jets, at one for signal jets, and have a higher average value for quark jets.In addition, with the b-jet tagging we gain some discriminating power against b-jets.A particular class of generalised angularities (λ k β ) [16] are also added to the algorithm, which are efficient in distinguishing quark jets from gluon jets.
Furthermore, the N-Subjettiness [17] is also used, which measures to what degree the jet is composed of N subjets.For our signal jets, the N-subjettiness was expected to be close to zero, since all the radiation is aligned with the direction of the jet, meaning N (or fewer) subjets.gg background jets have τ N >> 0, since a large fraction of their energy is distributed away from the jet direction.All the variables used for the ML algorithm are listed in Table 1 and also shown in Figure 1.Based on the optimisation results, the final model consists of one input layer and two hidden layers with 35, 20 and 12 nodes, respectively.The activation function for the input layer and both hidden layers is tanh.As is common with classification problems, the output layer is activated with the sigmoid function.The full set of hyperparameters is summarised in Table 2.A feature importance plot for the DNN network is also presented in Figure 2. It can be seen that the most important features are the charge ratio of the hadronic and electromagnetic energy deposit, and the N-Subjettiness, while the R em variable has very little impact on the network performance.This indicates kinematics of the generated sample does not have a major impact on the obtained results.

Convolutional Neural Network
Another approach for developing a D s identification algorithm is to use a CNN.In this case the input variables are low level variables: energy deposited in the electromagnetic and the hadronic calorimeter, and track transverse momentum, which are plotted as a 3D image.The advantage of this approach is that one can use relatively raw data instead of carefully constructed variables.
In the context of this analysis, these energy deposits and the track transverse momentum are converted into a 20 × 20 jet image.Since the jet reconstruction parameter is ∆R = 0.4, and the segmentation of the ECAL is 0.02 × 0.02, the grid size of the jet image is equal to the smallest possible tower size in the η-ϕ plane.The variables are introduced in three different channels as is the case of an RGB picture, where the hadronic deposit is noted with blue, the electromagnetic deposit with green and the track transverse momentum with red.The schematic illustration of the jet image is shown in Figure 3.  2.

Combined Network
To further improve the efficiency of the network, the DNN and the CNN models are merged into a single network.In this case, the output of the DNN and the output of the CNN are the inputs of the next combined layer.The last layer of the model performs the classification and the results depend both on the output of the CNN and the DNN.
The best performing combined network has slightly different number of nodes within the DNN layers: 33, 20 and 14, respectively.Another significant change compared to the previously introduced models is the absence of the dense layers after the convolutional layers.Instead, a combined dense layer is introduced with 8 nodes and ReLu activation function.The classification happens in the last sigmoid layer.The parameters of the combined model are summarised in the last column of Table 2.

Results
The ROC curves of the different models are presented in Figure 4, while the output distributions of the models can be seen in Figure 5. Table 3 shows the AuC values of the different networks defined previously.As is expected, the combined model performs the best with 0.956, which corresponds to a signal efficiency of 47% at a background rejection factor of 100 or 15% at a background rejection factor of 1000.Using DNN only one can reach a signal efficiency of 38% for a background rejection factor of 100 or 15% for 1000, while using only CNN the efficiency is 35% at 100 or 9% at 1000 times background rejection.As it can be seen, the performance is significantly better against a single background of gluon jets than against quark jets.This can be further improved if one uses only a gluon sample for training to an AuC of 0.991.The tagging rate of the network for various samples used and not used during the training is presented in Table 4.Here a cut-off value of 0.75 is used.We find that for charm jets the results are not materially different from the generic quark-jet sample and this indicates that the absence of fragmentation tracks around the jets and a narrow jet with low multiplicity are more important than the exact D-meson decay topology.For hadronic τ decays, we find a high tagging rate, which is not surprising, given that τ leptons are also produced in a colour-singlet state and more than 5% of the D s mesons decay to τs.We investigate the stability of the network performance under variations of the simulation parameters.To study this, we apply the recommended variations of the Pythia8 framework.These variations cover a range of possible events that differ from the base simulation: variation 1 is related to the underlying event activity, variation 2 covers the jet shapes and substructure, and the three variations 3 cover the effects of initial and final state radiation.The results of the variance in the model performance is presented in Table 5.The effect of pileup is also taken into account during the analysis.Within the Delphes framework the additional tracking and vertexing information is not available, meaning that our estimate is worse than the real life conditions on the LHC experiments.We simulated samples with a pileup of ⟨µ⟩ = 40 meaning on average 40 pileup interaction, which is the expected amount for LHC Run 3 conditions.The retrained network, without further optimisation, shows a drop of 0.076 in the AuC, meaning that, while pileup has a significant effect, the model is still able to identify D s mesons.One can note, however, that pileup mitigation techniques implemented in Delphes are suboptimal; hence, the expected effect with real data is smaller.

Discussion
In this section prospects for the measurement of B(W → D s γ) using the method described previously are studied.For the purpose of this exercise it is assumed that lowpileup data corresponding to the integrated luminosity of 1 fb −1 are collected during LHC Run 3. Events are required to have one jet tagged as D s and an isolated photon with p T > 30 GeV.Events with invariant mass of jet-photon system ±10 GeV around W boson mass are selected.Triggering efficiency is assumed to be 100%.The optimised network cutoff of 0.75 provides the best sensitivity.Total signal efficiency for W + → D s γ (W − → D s γ) is estimated to be 15.5% (18.7%), respectively.
In order to estimate background level, large MC samples of pp → gg and pp → qq, as well as pp → qγ, Z → ee, and Z → ττ are generated with MADGRAPHv5 and Pythia8.The detector response is simulated via the Delphes package using the ATLAS detector configuration files.Backgrounds are normalised according to their generated cross-sections.The total level of background is estimated to be 930,000 events corresponding to the integrated luminosity of 1 fb −1 .The background is dominated by the QCD process while less than 1% of the total background arises from Z boson events.Figure 6 shows the distribution of D s tagged jet-plusphoton invariant mass for the backgrounds and W → D s γ signal normalised to the integrated luminosity of 1 fb −1 .The signal histogram is overlaid and scaled by a factor of 10,000.
The CL s method [18,19] is used to calculate the upper limit on the branching fraction of the W → D s γ decay.The expected number of signal plus background events is where ϵ is event selection efficiency of the signal, σ pp→W is the inclusive production crosssection for the W boson evaluated at the NNLO in QCD, Ldt is the integrated luminosity, and N bg is the expected number of background events.Uncertainties on ϵ, Ldt, and N bg are assumed to be Gaussian, and correlations between these are neglected.In this study total signal uncertainty is assumed to be 10% and has only marginal impact on the calculated limit.The uncertainty on the background level is assumed to be 0.5% as obtained in the ATLAS search for radiative Higgs boson decay [20].The upper 95% CL (confidence level) limit on the "signal strength" σ pp→W B(W → D s γ), with production cross-section fixed, is set using Poisson statistics and the above equation.The limit is obtained with CL s = CL s+b /CL b ≤ 0.05, where CL s+b is the confidence level for signal and background, and Cl b is the confidence level for the background alone.The calculated CL s exclusion as a function of branching fraction of W → D s γ is shown in Figure 7.The expected upper limit at the 95% CL is determined to be:

Figure 1 .
Figure 1.Distributions of the variables used for D s identification, using DNN.The signal is presented with a solid blue line, while the gg and qq backgrounds are drawn with dashed red and dotted green lines respectively.

Figure 1 .
Figure 1.Distributions of the variables used for D s identification, using DNN.The signal is presented with a solid blue line, while the gg and qq backgrounds are drawn with dashed red and dotted green lines, respectively.

Figure 2 .
Figure 2. Feature importance plot of DNN.The blue bars represent the weight of each feature (variable) within the network.

Figure 3 .
Jet image construction from low level variables, where (a) shows 2 different signal and (b) 2 different background events.The hadronic deposit is noted with blue circle pattern, the electromagnetic deposit with green square grid and the track transverse momentum with red star pattern composing an RGB input picture to the CNN algorithm.Our CNN model consists of 5 layers: 3 convolutional and 2 fully connected dense layers.The number of nodes in the convolutional layers are 30, 8 and 8, respectively.The window sizes are [3 × 3] and [5 × 5] in the last layer, while the activation function is tanh in all three cases.A maxpooling layer is added after the second convolutional layer.The number of nodes in the first dense layers is 10 with the ReLU activation function.The output layer is again a dense sigmoid.The parameters of the final CNN model are summarised in Table

Figure 4 .Figure 5 .
Figure 4. ROC curves for the different network types.

Figure 6 .
Figure 6.Distribution of the invariant mass of D s tagged jet-plus-photon system.The signal is scaled with a factor of 10 4 .

Figure 7 .
Figure 7. Expected upper limit on branching fraction of the W → D s γ decay.The vertical line corresponds to CL s = 0.05.The branching fractions higher than 2.87 × 10 −4 are excluded at 95% CL.
T in a cone of ∆R <0.1 and the jet p T p core2 ratio of sum p T in a cone of ∆R <0.2 and the jet p T f core1 ratio of sum ET in a cone of ∆R < 0.1 and the jet total ET f core2 ratio of sum ET in a cone of ∆R < 0.2 and the jet total ET f core3 ratio of sum ET in a cone of ∆R < 0.3 and the jet total ET had /E em ratio of the hadronic versus electromagnetic energy deposited in the calorimeter τ 0 , τ 1 , τ 2 N-Subjettiness

Table 2 .
Hyperparameters of the different network types.

Table 3 .
Overview of the training results using the combined network.Mixed background test samples contain 50% quark and 50% gluon jets.

Table 4 .
Jet tagging rate for different samples.For c c and b b samples, the tagging rate is separately evaluated for events, where the jet contains a truth D s .

Table 5 .
Variations in the AuC for different Pythia8 tunes.