Machine Learning Approach to Analyze the Heavy Quark Diffusion Coefficient in Relativistic Heavy Ion Collisions

The diffusion coefficient of heavy quarks in a deconfined medium is examined in this research using a deep convolutional neural network (CNN) trained with data on heavy flavor hadrons from relativistic heavy ion collisions. The CNN is trained using observables such as the nuclear modification factor R_AA and the elliptic flow v_2 of non-prompt J/ψ from B-hadron decays in different centralities, where B-meson evolutions are calculated with the Langevin equation and the instantaneous coalescence model. The CNN outputs the parameters that characterize the temperature and momentum dependence of the heavy quark diffusion coefficient. By inputting the experimental data on non-prompt J/ψ (R_AA, v_2) from various collision centralities into multiple channels of a well-trained network, we derive the values of the diffusion coefficient parameters. Additionally, we evaluate the uncertainty in the extracted diffusion coefficient by taking into account the uncertainties present in the experimental data (R_AA, v_2), which serve as inputs to the deep neural network.


I. INTRODUCTION
In recent years, deep learning methods have developed rapidly and are increasingly applied in industry and scientific research. In particular, deep learning methods are used to handle high-dimensional data and uncover patterns, such as in image recognition. In scientific research, deep learning has already found applications in many areas of physics [1][2][3]. In theoretical high-energy nuclear physics, an increasing number of studies utilize deep learning methods to analyze computational data from theoretical models, such as the study of the equation of state of QGP matter [4], the dynamical evolution of the QGP [5], and the identification of spinodal clumping in high-energy nuclear collisions [6].
At the Relativistic Heavy-Ion Collider (RHIC) and the Large Hadron Collider (LHC) [7][8][9], a new deconfined state of matter, called the Quark-Gluon Plasma (QGP), is predicted and produced. Properties such as the initial energy density and the coupling strength of the QGP are typically studied through the final-state light hadron spectra [10][11][12][13] or the distribution of heavy flavor particles [14][15][16][17][18][19]. For heavy flavor particles, the distribution is initially influenced by cold nuclear effects; the particles then interact with the QGP medium, resulting in energy loss [20][21][22][23]. Numerous models have been established to account for the various effects mentioned above [24][25][26][27][28][29], aiming ultimately to predict the nuclear modification factor and anisotropic flows of open heavy flavor hadrons. These models investigate the energy loss mechanisms of heavy quarks in the QGP from different perspectives. Relevant nuclear modification factors and collective flows of D mesons or B mesons have also been measured experimentally by the STAR [30], ALICE [31][32][33], and CMS [34,35] Collaborations, and are closely related to the energy loss process of heavy quarks. As it is not straightforward to explain the experimental observables R_AA and v_2(p_T) simultaneously with a single value of the diffusion coefficient D_s 2πT, a more realistic expression with temperature and momentum dependence is needed. Due to the complexity of the heavy quark diffusion and hydrodynamic evolution, it is useful to employ deep neural networks to analyze the relationship between the diffusion coefficient and experimental observables. CNNs have proven suitable for analyzing high-dimensional datasets and quantifying the value of the diffusion coefficient while considering multiple hot and cold nuclear matter effects.
In previous studies, Bayesian statistical analysis has been employed to analyze the experimental data on soft particles [36][37][38][39] and to quantitatively extract the diffusion coefficient from experimental observables of charmed hadrons in heavy-ion collisions [40]. Although there is abundant experimental data on D mesons, our main focus is on the evolution of B mesons. This is because the diffusion coefficient is directly related to the drag term in the Langevin equation, which arises from elastic scattering processes rather than medium-induced gluon radiation. As the mass of the heavy quark increases, the contribution of elastic scattering to the energy loss becomes more significant, while the contribution of gluon radiation is relatively weaker. In this work, we treat the R_AA and v_2 of non-prompt J/ψ from B-meson decays as inputs of the CNN and the parameters of the diffusion coefficient as outputs of the network. After the neural network is trained with supervision, its inputs are sampled from within the error bars of the experimental data points in order to generate the corresponding diffusion coefficient with an associated uncertainty. This paper is organized as follows: in Section II, we introduce the Langevin plus Instantaneous Coalescence Model (LICM) to generate datasets of heavy flavor evolutions with different parameter values, which are used as training datasets for the CNN. In Section III, the values of the shadowing factor and of the temperature and momentum dependence of the diffusion coefficient are quantitatively extracted based on the experimental data on non-prompt J/ψ from B-meson decays. A final summary is given in Section IV.

II. FRAMEWORKS
A. Generating datasets with LICM

Bottom quarks are produced in the hard scatterings of nucleon partons. We adopt the fixed-order plus next-to-leading log formalism (FONLL) [41,42] and the NNPDF30NLO PDF set [43] to calculate the initial momentum distribution of bottom quarks in nucleon-nucleon collisions. Because Pb-Pb collisions can be regarded as a superposition of nucleon-nucleon collisions accompanied by cold nuclear matter effects, the initial momentum distribution of bottom quarks in Pb-Pb collisions can be taken as the momentum distribution in pp collisions multiplied by a shadowing factor. The nuclear shadowing factor is calculated with the EPS09 NLO package [44] for 5.02 TeV Pb-Pb collisions. The production of bottom quarks primarily arises from binary collision processes; hence the spatial distribution of bottom quarks in nuclear collisions is proportional to the distribution of binary collisions [45]. Here, T_{A(B)}(x_T) = ∫ dz ρ_{A(B)}(x_T, z) represents the thickness function of each of the two nuclei. The nucleon distribution ρ(x_T, z) in the nucleus follows a Woods-Saxon distribution.
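As a concrete illustration, the thickness function T_A(x_T) can be evaluated by numerically integrating a Woods-Saxon density along the beam axis. The sketch below uses illustrative Pb parameters (ρ0 ≈ 0.17 fm⁻³, R ≈ 6.62 fm, a ≈ 0.546 fm); these values and the integration settings are assumptions for this sketch, not taken from the text.

```python
import math

def woods_saxon(r, rho0=0.17, R=6.62, a=0.546):
    # Woods-Saxon nuclear density in fm^-3; R and a are illustrative Pb values
    return rho0 / (1.0 + math.exp((r - R) / a))

def thickness(xT, zmax=20.0, n=2000):
    # T_A(x_T) = integral over z of rho(sqrt(x_T^2 + z^2)), midpoint rule
    dz = 2.0 * zmax / n
    return sum(
        woods_saxon(math.sqrt(xT**2 + (-zmax + (i + 0.5) * dz) ** 2)) * dz
        for i in range(n)
    )
```

The thickness function peaks at the center of the nucleus (x_T = 0) and falls off steeply beyond the nuclear radius, which is what makes the binary-collision density strongly centrality dependent.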
After their production, bottom quarks propagate through the high-temperature QGP medium and lose energy. The energy loss of bottom quarks in the QGP is primarily attributed to scattering with thermal partons as well as to medium-induced gluon radiation. Since the bottom quark mass is much larger than the typical temperature of the medium, the momentum change in each interaction is relatively small, and the propagation can be regarded as Brownian motion. The momentum evolution of bottom quarks can therefore be described with the Langevin equation,

dp/dt = −η_D(p) p + ξ + f_g.    (1)

On the right-hand side, the first two terms represent the drag and noise terms, which come from elastic collisions with thermal light partons. The drag coefficient is defined as η_D(p) = κ/(2T E_b), where E_b = √(m_b² + p²) is the energy of the bottom quark and the bottom quark mass is taken to be m_b = 4.75 GeV. κ represents the momentum-diffusion coefficient; it is related to the spatial diffusion coefficient D_s via D_s κ = 2T². In the noise term ξ, the time correlation and momentum dependence are both neglected for simplicity, and ξ is treated as Gaussian white noise satisfying

⟨ξ^i(t) ξ^j(t′)⟩ = κ δ^{ij} δ(t − t′).

The third term, f_g = −dp_g/dt with p_g the momentum of the emitted gluon, represents the recoil force on the bottom quark from gluon emission. The average number of gluons emitted in a small time interval t ∼ t + Δt is [46]

⟨N_g(t, Δt)⟩ = Δt ∫ dx dk_T² (dN_g / dx dk_T² dt),

where x = E_g/E_b is the fraction of the bottom quark energy carried by the radiated gluon, dN_g/(dx dk_T² dt) is the spectrum of emitted gluons from the higher-twist calculation [22,23], and k_T is the transverse momentum of the gluon. The position of the heavy quark is updated in each time step as x(t + Δt) = x(t) + (p/E_b) Δt. Heavy quarks are randomly generated based on the initial spatial and momentum distributions, and propagate in the QGP with energy loss described by Eq. (1). When bottom quarks diffuse and move into certain regions where the QGP local
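The drag-plus-noise part of the Langevin update above can be sketched as a simple Euler step; the gluon-radiation recoil term f_g is omitted for brevity, and the discretized noise standard deviation √(κ/Δt) follows from the white-noise correlator. This is a minimal one-dimensional sketch, not the full model implementation.

```python
import math
import random

def langevin_step(p, T, kappa, dt, mb=4.75):
    # One Euler step of Eq. (1) without the radiation recoil term f_g (1D sketch).
    # Drag: eta_D = kappa / (2 T E_b); noise: Gaussian with variance kappa / dt,
    # so that xi * dt has variance kappa * dt as required by the correlator.
    E = math.sqrt(mb**2 + p**2)
    eta_D = kappa / (2.0 * T * E)
    noise = random.gauss(0.0, math.sqrt(kappa / dt))
    return p + (-eta_D * p + noise) * dt
```

Iterating this step over many time intervals drags an initially hard quark toward thermal momenta while the noise maintains fluctuations around the drift.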
temperature is low, heavy quarks undergo hadronization, combining with light quarks to form B mesons or with an anti-heavy quark to form quarkonium. In the high-momentum region, B mesons are produced from bottom quarks predominantly through fragmentation, while in the intermediate- and low-momentum regions the coalescence process dominates. In this study, we primarily focus on bottom quarks with transverse momentum p_T ≤ 15 GeV/c, and therefore utilize a coalescence model to describe the hadronization of bottom quarks into B mesons,

dN_M/dp_M ∝ ∫ dp_1 dp_2 (dN_1/dp_1)(dN_2/dp_2) f^W_M(q_r) δ(p_M − p_1 − p_2).

The momentum distribution of the B meson, dN_M/dp_M, is proportional to the distribution of bottom quarks, dN_1/dp_1, and to that of the thermal light quarks, dN_2/dp_2. The heavy quark distribution is given by the Langevin equation, while the thermal light quarks are taken to follow a Fermi distribution. Their coalescence probability is determined by the Wigner function f^W_M(q_r), which can be obtained via the Weyl transform of the B-meson wave function. In principle, the complete Wigner function f^W(q_r, x_r) constrains both the relative distance and the relative momentum between the two particles forming a bound state. The spatial constraint becomes crucial and can significantly reduce the coalescence probability when the two particle species are rare in the QGP, as observed in the charmonium coalescence process. However, for a B meson composed of one heavy quark and one light anti-quark, with light quarks plentiful in the QGP, we assume that the heavy quark can readily find a light quark in proximity, thus satisfying the spatial constraint. By integrating over the spatial part of the complete Wigner function, we simplify it to a normalized Gaussian function A_0 exp(−q_r² σ²), where A_0 is the normalization factor. The width of the Gaussian is connected with the root-mean-square radius of the B meson, ⟨r²⟩_B [47], with the value ⟨r²⟩_B = 0.43 fm [45]. m_1 is the bottom quark mass, while the thermal mass of the light quark used in the coalescence process is approximated as m_2 = 0.3 GeV.
Here q_r denotes the relative momentum between the bottom quark and the light quark in the center-of-mass (COM) frame, constructed from p_1^cm and p_2^cm, the momenta of the bottom quark and the light quark in the COM frame. The delta function ensures momentum conservation in the coalescence process, p_M = p_1 + p_2. As the transition between the QGP and the hadronic gas is a smooth crossover at LHC energies, we perform the coalescence process at the critical temperature T_c = 150 MeV. The space-time evolution of the bulk medium is well described by hydrodynamic equations; we employ the MUSIC package to provide the hot-medium profiles in 5.02 TeV Pb-Pb collisions [27,48,49]. The local temperature of the medium varies with coordinates and time, and these variations are incorporated into the Langevin equation. After hadronization, B mesons continue to diffuse in the hadronic gas with a different value of the diffusion coefficient, and decay into non-prompt J/ψ after kinetic freeze-out at the temperature T_fo = 120 MeV.
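A minimal sketch of the Gaussian-smeared Wigner function used for the coalescence probability is given below. The normalization A_0 is set to 1, and the mapping σ² = (2/3)⟨r²⟩_B (converted from fm² to GeV⁻² via ħc) is an illustrative assumption; the text only states that σ is connected with the B-meson rms radius.

```python
import math

HBARC = 0.1973  # GeV*fm, conversion between fm and GeV^-1

def coalescence_prob(qr, r2_B=0.43):
    # Gaussian Wigner function A0 * exp(-qr^2 * sigma^2) with A0 = 1 here.
    # sigma^2 = (2/3) <r^2>_B is an assumed illustrative mapping, with
    # <r^2>_B in fm^2 converted to GeV^-2; qr is in GeV.
    sigma2 = (2.0 / 3.0) * r2_B / HBARC**2
    return math.exp(-qr * qr * sigma2)
```

The probability is maximal at vanishing relative momentum and falls off on a momentum scale set by the inverse B-meson size, so only sufficiently collinear quark pairs coalesce efficiently.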
Recently, lattice QCD calculations have presented new results for the spatial diffusion coefficient of heavy quarks at different temperatures [20], which is found to be smaller than in previous quenched lattice QCD calculations [50,51] and recent phenomenological estimates [40,[52][53][54]. This conclusion is also supported by other theoretical results [55][56][57], prompting us to reexamine the relationship between experimental measurements of heavy quarks and the diffusion coefficient. In the high-temperature and high-momentum regimes, the diffusion coefficient can be calculated through perturbative QCD [58][59][60]. However, such calculations are not sufficient to simultaneously explain the R_AA and v_2 of open heavy flavor particles measured in nuclear collisions [61], suggesting that non-perturbative processes play an indispensable role in the temperature and momentum dependence of the diffusion coefficient. In Bayesian statistical analysis, a parameterized form of the diffusion coefficient has been proposed that includes a linear temperature-dependent term and a perturbative QCD term, such as D_s 2πT ∝ A(p)(α + βT) + (1 − A(p)) 8π/(q̂/T³) [40]. The first term represents the contribution from non-perturbative processes, while the second represents the contribution from perturbative processes; q̂ is the heavy quark transport coefficient, calculated from elastic scatterings between heavy and light quarks [62]. The spatial diffusion coefficient from lattice QCD calculations corresponds to the p = 0 case. In this work, we introduce a concise formula, Eq. (5), which encodes the temperature dependence of the spatial diffusion coefficient through the term T/T_c and the momentum dependence through the energy and mass m_Q of the heavy quark. The parameter α represents the value of D_s 2πT at the critical temperature with momentum p = 0, while β and γ control the degree of the temperature and momentum dependence, respectively. In the hadronic gas, the coupling strength between B- and D-mesons and the medium becomes much smaller, so their contribution to the R_AA and v_2 of open heavy flavor particles is limited. There, the mean value of the spatial diffusion coefficient of the B meson is approximated as D_s^M 2πT = 9 in the range 0.8T_c < T < T_c, before the kinetic freeze-out of the B meson [63].
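The roles of the parameters (α, β, γ) can be illustrated with a hypothetical realization of Eq. (5). The exact functional form used in the paper is not reproduced here; the ansatz below merely respects the stated behavior: α is the value of D_s 2πT at T = T_c and p = 0, β scales the temperature dependence through T/T_c, and γ the momentum dependence through the heavy quark energy relative to its mass.

```python
import math

def Ds2piT(T, p, alpha, beta, gamma, Tc=0.150, mQ=4.75):
    # Hypothetical ansatz illustrating Eq. (5): alpha sets the value at
    # (T = Tc, p = 0), beta the slope in T/Tc, gamma the momentum dependence
    # through E/mQ. This is NOT the paper's exact formula.
    E = math.sqrt(mQ**2 + p**2)
    return alpha * (1.0 + beta * (T / Tc - 1.0)) * (E / mQ) ** gamma
```

With this form, a nonzero β makes the coefficient grow away from T_c and a nonzero γ enhances it at large transverse momentum, matching the qualitative description in the text.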

B. Deep neural networks
In the dynamical evolution of bottom quarks, our theoretical model produces a wide range of R_AA and v_2 values for non-prompt J/ψ in 5.02 TeV Pb-Pb collisions by varying the shadowing factor, denoted S, and the three parameters (α, β, γ) in Eq. (5). This dataset serves as the training data for the CNN. Experimental measurements are available for the nuclear modification factor R_AA(p_T) of non-prompt J/ψ in three different centralities, as well as for the elliptic flow coefficient v_2(p_T). In our approach, the three centralities of R_AA(p_T) are treated as separate channels, while v_2(p_T) acts as an additional channel in the input layer of the CNN. The output layer of the CNN produces the corresponding parameter values of the diffusion coefficient and the shadowing factor, which are treated as labels for the input data. Theoretical calculations based on the LICM establish a mapping between the parameter combinations (S, α, β, γ) and the experimental observables (R_AA, v_2). Figure 1 provides a graphical representation of the network structure, which consists of four hidden layers, including average-pooling layers and fully connected layers, with the ReLU activation function. The three output channels related to the diffusion coefficient parameters are projected to positive values, while the channel associated with the shadowing effect is projected into the range 0-1 using the sigmoid function.
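The input/output structure described above can be sketched with a toy forward pass in NumPy: four input channels (three R_AA centralities plus v_2), a 1D convolution with ReLU, global average pooling, and a dense head whose outputs are mapped to (0, 1) for S and to positive values for (α, β, γ). The layer sizes and the softplus mapping for positivity are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    # x: (in_channels, length); w: (out_channels, in_channels, kernel);
    # plain "valid" 1D cross-correlation
    out_ch, in_ch, k = w.shape
    L = x.shape[1] - k + 1
    out = np.zeros((out_ch, L))
    for o in range(out_ch):
        for i in range(L):
            out[o, i] = np.sum(w[o] * x[:, i:i + k])
    return out

def forward(x, w1, w2, b2):
    # Four channels (3x RAA centralities + v2) -> conv -> ReLU ->
    # global average pooling -> dense head.
    h = np.maximum(conv1d(x, w1), 0.0)   # ReLU
    h = h.mean(axis=1)                   # average pooling over pT bins
    z = w2 @ h + b2                      # fully connected layer
    S = 1.0 / (1.0 + np.exp(-z[0]))      # sigmoid: shadowing factor in (0, 1)
    params = np.log1p(np.exp(z[1:]))     # softplus: positive (alpha, beta, gamma)
    return S, params
```

In a real training loop the weights would be fitted against the labeled (S, α, β, γ) combinations; here random weights simply demonstrate the channel layout and the output projections.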
To generate the training dataset, we randomly select parameter values within the ranges specified in Table I and run the Langevin model. Because of the significant uncertainty in the shadowing factor S in model calculations, which can notably affect the final B-meson observables, we treat the shadowing factor as a parameter to be optimized within the deep neural network; for 5.02 TeV Pb-Pb collisions, its values are randomly selected from the range 0.6 to 1.0. The spatial diffusion coefficient D_s 2πT at the critical temperature T_c is selected within the range 2.0 ≤ α ≤ 6.0, while the values of β and γ, characterizing the temperature and momentum dependence, are chosen within 0 ≤ β ≤ 8.0 and 0 ≤ γ ≤ 1.0, respectively. We generate 20K events as the training and validation datasets, where each event corresponds to one combination of parameters. The performance of the neural network is influenced both by the size of the training dataset and by the structure of the deep neural network. This type of uncertainty has been examined in previous studies [64] and is partially investigated in this work by varying the size of the training datasets; however, such uncertainties are not as significant as those arising from the error bars of the experimental data points used as input to the CNN. We partition 70% of the total dataset as training data, while the remaining 30% serves as validation data. By treating (R_AA, v_2) as inputs to the CNN and the parameter values as labels, we can calculate the loss of the CNN for both the training and validation datasets. The learning curve shown in Fig. 3 demonstrates that the loss decreases to below 5% after 250 training epochs. Notably, the loss of the CNN on the training datasets closely aligns with the loss on the validation datasets, indicating that the CNN model does not exhibit significant overfitting or underfitting.
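The parameter sampling and the 70/30 train/validation split described above can be sketched as follows, using the ranges of Table I; the mapping from a parameter combination to its (R_AA, v_2) curves via the Langevin model is omitted here.

```python
import random

random.seed(42)

# Parameter ranges from Table I
RANGES = {
    "S": (0.6, 1.0),       # shadowing factor
    "alpha": (2.0, 6.0),   # Ds*2piT at Tc, p = 0
    "beta": (0.0, 8.0),    # temperature dependence
    "gamma": (0.0, 1.0),   # momentum dependence
}

def sample_parameters(n):
    # Draw n parameter combinations uniformly within the Table I ranges
    return [{k: random.uniform(*r) for k, r in RANGES.items()} for _ in range(n)]

events = sample_parameters(20_000)       # 20K events, one per combination
split = int(0.7 * len(events))           # 70% training, 30% validation
train, val = events[:split], events[split:]
```

Each sampled combination would then be fed through the LICM calculation to produce the labeled (R_AA, v_2) curves that form one training example.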
Using the well-trained CNN model, we feed the experimental data points into the neural network. Considering the error bars of the experimental data points (depicted in Fig. 4), we sample within the error bars associated with each data point. These samples are then used as inputs to the neural network, allowing us to account for the impact of experimental uncertainties on the diffusion coefficient parameters. To capture this influence, we randomly generate 10K samples within the experimental error bars around the data points, as illustrated in Fig. 4. We consequently obtain 10K different outputs, which are plotted in Fig. 5. Each data point represents an individual event, with the corresponding values of α and β on the x- and y-axes, respectively. As previously mentioned, α signifies the value of D_s 2πT at the critical temperature and zero momentum, while β represents its temperature dependence. The majority of events are concentrated in the region 4 ≤ α ≤ 6.5 and 0 ≤ β ≤ 5.0. Furthermore, we also plot the values of γ, which characterizes the transverse momentum dependence, and the shadowing factor S arising from the cold nuclear matter effect in Fig. 6. From the distributions, the majority of events fall within 0.0 ≤ γ ≤ 0.2 and 0.75 ≤ S ≤ 0.9, as depicted in Fig. 6. The distribution of events reflects the uncertainty in the parameter values resulting from the experimental data errors and from the neural network structure. Based on the distributions obtained from the CNN outputs, we directly extract the mean values of these parameters, which are presented in Table II. It is worth noting that the value of α remains larger than the results obtained from lattice QCD calculations at T = T_c [20] and is consistent with the values given by previous model calculations. Additionally, the temperature dependence (β) is strong, while the momentum dependence (γ) is relatively weak.
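The uncertainty propagation described above can be sketched as follows: sample input curves uniformly within the error bars, push each sample through the trained network, and average the outputs. The `predict` callable below is a hypothetical stand-in for the trained CNN.

```python
import random
import statistics

random.seed(1)

def sample_within_errors(data):
    # data: list of (central value, symmetric error) per pT bin;
    # draw one input curve uniformly within the experimental error bars
    return [random.uniform(v - e, v + e) for v, e in data]

def extract_means(data, predict, n_samples=10_000):
    # Feed n_samples sampled curves through `predict` (a hypothetical stand-in
    # returning (alpha, beta, gamma, S)) and average each output component
    outputs = [predict(sample_within_errors(data)) for _ in range(n_samples)]
    return [statistics.fmean(col) for col in zip(*outputs)]
```

The spread of the individual outputs around these means is what produces the scatter plotted in Figs. 5 and 6.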

IV. SUMMARY
In this study, a convolutional neural network is employed to extract the temperature and momentum dependence of the spatial diffusion coefficient of heavy quarks, using experimental data on non-prompt J/ψ from B-meson decays. The Langevin equation is utilized to describe the dynamical evolution of heavy quarks in the QGP and of B mesons in the hadronic gas, while the instantaneous coalescence model describes the hadronization of bottom quarks into B mesons. By taking different values of the shadowing factor and the diffusion coefficient, the nuclear modification factors and elliptic flows of non-prompt J/ψ in multiple centralities of 5.02 TeV Pb-Pb collisions are generated, and the CNN model is trained under supervision with these model calculations. To extract the values of the diffusion coefficient, we sample (R_AA, v_2) from the experimental data points within their error bars and use these samples as inputs to the CNN; the corresponding values of the diffusion coefficient and the shadowing factor are obtained concurrently. The dispersion in the diffusion coefficient values can be partially attributed to the uncertainties present in the experimental data. The mean values of the diffusion coefficient parameters are also extracted. This research contributes to the understanding of the heavy quark diffusion coefficient through a data-driven analysis approach.

FIG. 1 .
FIG. 1. Schematic figure showing the structure of the convolutional neural network. R_AA(p_T) in three centralities and one v_2(p_T) are taken as the four channels of the input layer, while the parameters related to the spatial diffusion coefficient are the output.
parameters          region
shadowing factor    S ∈ [0.6, 1.0]
D_s 2πT at T_c      α ∈ [2.0, 6.0]
T-dependence        β ∈ [0.0, 8.0]
p_T-dependence      γ ∈ [0.0, 1.0]

III. RESULTS AND ANALYSIS

In the previous sections, we introduced the theoretical model used to generate the training dataset for the CNN by varying the values of the model parameters. Some events (R_AA(p_T), v_2(p_T)), randomly selected from one channel of the training datasets, are shown in Fig. 2. The lines cover the experimental data, which indicates that the training range of the model encompasses the distribution of the experimental data; the model can therefore be effectively applied to analyze it. As the experimental data points for R_AA are located at p_T ≥ 2 GeV/c and those for v_2 at p_T ≥ 4 GeV/c, we truncate the training data by retaining only data with p_T above 4 GeV/c. This ensures that the shape of the training data aligns as closely as possible with the experimental data, making it easier to incorporate as input; see Fig. 2.

FIG. 2 .
FIG. 2. Some events randomly selected from one channel of the training dataset. The lines represent the nuclear modification factors of non-prompt J/ψ calculated with different values of the parameters in the centrality 30-100% in 5.02 TeV Pb-Pb collisions.

FIG. 3 .
FIG. 3. The loss of the CNN model as a function of the training epochs. The losses of the model calculated with the training datasets and with the validation datasets are plotted respectively.

FIG. 4 .
FIG. 4. To account for the error bars of the experimental data on non-prompt J/ψ in 5.02 TeV Pb-Pb collisions, we sample the values of R_AA and v_2 within the error bars of each data point and take them as inputs of the deep neural network. Some lines representing random events are plotted in the figures. The four panels represent the four channels of the network. The experimental data are cited from the CMS Collaboration [34, 35].

FIG. 5 .
FIG. 5. The distribution of parameter values (α, β) from the CNN. The x-axis represents α, while the y-axis represents β. Each point represents one event.

FIG. 6.
FIG. 6. The distribution of parameter values (S, γ) obtained from the CNN model. The x-axis corresponds to the values of S or γ, while the y-axis represents the number of events.

TABLE I .
The ranges of the parameters sampled for the training dataset

TABLE II.
The mean values of the parameters extracted from the CNN outputs