Enhancing Spectroscopic Experiment Calibration through Differentiable Programming

: In this work, we present an innovative calibration technique leveraging differentiable programming to enhance energy resolution and reduce the energy scale systematic uncertainty in X-ray spectroscopic experiments. This approach is demonstrated using synthetic data and is applicable in general to various spectroscopic measurements. This method extends the scope of differentiable programming for calibration, employing Kernel Density Estimation (KDE) to achieve a target Probability Density Function (PDF) for a fully differentiable model of the calibration. To assess the effectiveness of the calibration, we conduct a toy simulation replicating the entire detector response chain and compare it with a standard calibration. This ensures a robust and reliable calibration methodology, holding promise for improving energy resolution and providing a more versatile and efficient approach without the need for extensive fine-tuning.


Introduction
Calibrating spectroscopic X-ray detectors stands as a critical task in numerous scientific experiments, with far-reaching implications across diverse fields, particularly in nuclear and atomic physics.Among these detectors, Silicon Drift Detectors (SDDs) [1,2] feature prominently, and they are often employed in substantial numbers to reach the precision needed in measurement of energy spectra.The accuracy and reliability of the calibration process directly impact the uncertainty associated with the energy scale, rendering it a pivotal aspect of precision X-ray spectroscopy.In various cases, it represents the dominant source of systematic uncertainty in exotic atoms with SDDs [3], Transition Edge Sensors (TES) [4][5][6][7], and even in scintillators [8] in particle physics.
Traditionally, the calibration of SDDs, like other spectroscopic detectors, has relied upon fitting the observed data to known spectral lines within the measurement spectrum, in situ or in special runs.While this approach has proven its efficacy in numerous experiments, it has inherent limitations.The method's precision hinges on the quality of the calibration data, and any defects in the calibration process are accounted for as a systematic uncertainty in the measurements.
Recently, we proposed a pioneering approach to spectroscopic detector calibration [9], leveraging differentiable programming and achieving a record FWHM at the copper line with the VIP-2 apparatus at the Gran Sasso National Laboratories.This approach showed promise in calibrating detectors with high precision.However, its utility was primarily tailored to situations involving a single spectral line of interest, making it challenging to extend its applicability to more general scenarios.
To address this limitation, this work introduces a significant advancement in spectroscopic detector calibration, enhancing the method by approximating the spectra Probability Density Functions (PDFs) using Kernel Density Estimation (KDE).The incorporation of KDE resolves the issue of differentiability, rendering the approach more versatile and adaptable.In this novel calibration approach, we compare the target PDF with the KDE representation, thus providing a robust calibration methodology that is not constrained by the presence of specific spectral lines.
Our findings indicate that this refined method offers increased flexibility, making it well-suited for calibration across various spectroscopic features.Furthermore, it enables the minimization of associated systematic uncertainties, ultimately enhancing the resolution of spectral peaks to unprecedented levels.
To illustrate the capabilities of this innovative calibration approach, we have constructed a synthetic data-based toy scenario that simulates the entire detector calibration process for a multitude of channels or calibration batches.This simulation serves as a demonstrative model, showcasing the broad applicability and significant benefits of our advanced calibration method in enhancing the precision and reliability of spectroscopic measurements.

Differentiable Programming
Differentiable programming, a powerful computational paradigm, has emerged as a pivotal tool for scientific computing and machine learning, with its roots extending from the domain of the CERN's Large Hadron Collider and particle physics [10][11][12].This innovative approach enables the automated differentiation of mathematical functions, providing a foundation for the creation of complex, fully differentiable models.These models can subsequently be fine-tuned and optimized through the application of gradient-based techniques, making them adaptable for a wide range of scientific applications.
In general, we can divide the differentiation in programming into three categories: numerical, symbolic, and automatic.
Numerical differentiation is the most straightforward, and can approximate the gradient ∇ f (x) of a function by evaluating it at two or more points in proximity of the evaluation point.This method is simple but can be computationally expensive.It is based on Newton's difference quotient, and can be expressed as: where h = |h| is the step size, which can be optimized depending on the function to be differentiated, with higher-order methods generally used to improve the accuracy.In the limit of h → 0, the numerical differentiation converges to the true gradient.Numerical differentiation has been widely adopted in scientific computing, but its performance degrades linearly with the dimensionality of x; additionally, a theoretical (truncation) error is always present due to the neglected terms in the power series representation.Symbolic differentiation leverages computer algebra systems to compute the derivative of a function.This method is exact, but can be computationally expensive and slow, especially for complex functions.Symbolic differentiation is based on the application of the chain rule and the rules of differentiation to the function to be differentiated.It is particularly useful for functions with a small number of variables, and can be used to derive the analytical form of the gradient.If there is no closed-form solution, other approaches have to be used.
Automatic differentiation is a hybrid approach that combines the advantages of numerical and symbolic differentiation, achieving an exact (up to machine precision) numerical evaluation of the function derivative per point, without theoretical truncation errors.It is based on the chain rule, and can be implemented in two ways: forward mode and reverse mode.In the forward mode, the derivative is computed by applying the chain rule to the function, while in the reverse mode, the derivative is computed by applying the chain rule to the function's output.If f (x) is defined as R n → R m , the forward mode will scale with the input dimensionality n, while the reverse mode will scale with the output dimensionality m, and the choice of the mode can be optimized depending on the function to be differentiated.It is the most efficient method for computing gradients, and it is widely used in machine learning and scientific computing.
To fully leverage modern differentiable programming frameworks, we use JAX [13], Google's open-source state-of-the-art library for large-scale machine learning research, which can automatically differentiate native Python and NumPy functions on GPUs and TPUs.

Toy Setup
In the following, we present a toy scenario to illustrate the calibration process of a spectroscopic detector.This scenario is designed to simulate the entire chain of detector response, calibration, and energy reconstruction, providing a comprehensive overview of the calibration process.The toy scenario is constructed using synthetic data, enabling the evaluation of the calibration method's performance and its comparison in a standard calibration setting.
The strategy: We assume the standard conditions of typical X-ray experiments, where the runs of data taking are divided into two categories: physics runs, where the actual measurements are taken, and calibration runs, where the calibration parameters are determined.In the physics run, the goal is to characterize an unknown line in the presence of two known emissions.This is a common scenario in X-ray spectroscopy, where lines from material contamination are present and hard to remove, and these lines are typically exploited to determine in situ the energy scale uncertainty of the detector.
We simulate a sequence of calibration and physics runs, alternating ten physics runs with one single calibration run.The sequence is repeated ten times.The detector response is left free to vary according to a stochastic process for the entire duration of the simulation.
To avoid benchmarking biases, we assume that the calibration runs perfectly characterize the detectors, i.e., the calibration parameters are determined with infinite precision.This is a conservative choice, as in reality, the calibration runs are affected by uncertainties, which would be reflected in the final calibration.
We will then show that this optimization can leverage the information from the known lines in situ, using them as anchors to improve the overall detector calibration, with a specialized dynamical correction.
The simulation: In our toy scenario, we begin at the truth level, where we create a simulated emission spectrum for the physics runs.This spectrum consists of key spectral lines (for simplicity, we do not consider K α 2 ), including the Titanium K α (Ti K α ) line at µ 1 = 4508.9eV, the Copper K α (Cu K α ) line at µ 2 = 8047.8eV, and an additional line at 6156 eV, which we intend to characterize.To add a realistic touch, we also introduce a flat background component to the spectrum.At truth level, the intrinsic width of the Ti and Cu lines is assumed to be much smaller than the resolution of the detector.In this toy, we simulate 100 batches representing either different detector units or different calibration slots.
The next step involves accounting for the detector's response to these emissions.We assume a Gaussian detector response, which is a commonly encountered response function.This convolution process results in the detector energy signal for each data batch or detector being calibrated.The detector energy only accounts for the detector response, and represents the spectrum which would be obtained in case of a "perfect" calibration.
To proceed with the calibration, we assume a linear relationship (without loss of generality, any fully differentiable function can be used) between the detector ADC (analogueto-digital converter) values and the actual energy: where E is the calibrated energy from the ADC values, using p 0,i and p 1,i constants in the batch i.The coefficients p 0,i and p 1,i are usually referred to as offset and gain in the context of calibrations.ADC values are typically stored from the front-end electronics of X-ray spectroscopic experiments, which is the raw data to be calibrated.Inverting Equation (2), we derive the ADC spectrum for each batch; from this step, we can perform the actual calibration and compare the results with the true value.The p 0,i and p 1,i intrinsically depend on many physical factors such as temperature, pressure, electronics, radiation damage, and so on.We approximate the time dependence of these effects as a stochastic process; in more realistic scenarios, autoregressive or more complex models [14,15] could be used.We simulate the typical chain of physics runs and calibration runs, as shown in Figure 1.The calibration runs sample the state of the detector response in the unshaded areas, which is then applied to the physics runs in the shaded areas.In X-ray spectroscopy, for example, the calibration runs are performed using known sources, excited via the X-ray tube.The calibration parameters are then applied to the physics runs, where the actual measurements are taken.The shaded area represents the physics runs where the parameters are applied.For simplicity, and as a conservative choice, we assume the calibration runs to exactly determine the true values of the parameters.Without loss of generality, the linearity in Equation ( 2) is a simplifying assumption but is commonly employed as a reasonable approximation for many calibration procedures; SDDs as an example are demonstrated to have a linear response [16].The method presented here does not depend on this assumption; more complicated relationships can be used, provided their differentiability.
We define the calibration matrix as: The goal of the optimization procedure is to find the optimal P. Representative spectra for the first three steps are shown in Figure 2.After these steps, we perform the calibration as typically performed in spectroscopic experiments:

•
Use the calibration constants: From known spectroscopic lines, the calibration constants p 0,i ,p 1,i are derived in the calibration runs.These are then applied to the subsequent physics runs, until a new calibration is performed.

•
Calibrating detector data: Using C and P, we apply the calibration to the entire dataset.This process corrects the recorded detector ADCs for each batch, ensuring that they are accurately aligned with the true energy values of the emitted lines.
• Combining the data: Finally, we combine the calibrated data from each batch together.This results in a fully calibrated dataset where the detector signals are corrected to reflect the actual energy values of the emitted lines.
This toy scenario, summarized in Figure 3, provides a simplified but illustrative example of the calibration process commonly used in spectroscopic experiments, where the goal is to align detector signals with the true energy values of emitted spectral lines.
To showcase the gains and benefits of our novel technique, we compare the standard calibration reconstruction detailed above with the differentiable optimization.This is performed on a dedicated GPU (NVIDIA Tesla T4), and we characterize the calibrated spectrum with standard tools (MINUIT2) [17,18].detectors and different batches (top center, yellow) to obtain the detector energy; finally, we go to the detector ADC signals (right, blue) assuming for each detector a slightly different response.From this point onwards, the procedure is a normal calibration; the detector ADC signals are calibrated from calibration runs to obtain the reconstructed detector energy (bottom center, yellow).Finally, the spectra are added together to obtain the final spectrum, which is used to determine the properties of the unknown peak (bottom left, yellow).In case of a perfect calibration, where the p 0,i ,p 1,i correspond to the true ones, the reconstructed energy would be the same as the true detector energy (top center, yellow) from the first step.

Gradient-Based Optimization
To achieve a gradient-based optimization, we have first to define a loss function, which encapsulates the knowledge of what a "good calibration" is.In [9], the loss function was expressed in terms of a log likelihood of the adherence of the data to the model, event by event on one spectroscopic line.Since we do not want to constrain our optimization to one transition only, we reformulate the loss function in a simpler way, which additionally can take into account the entire shape of the spectrum.The best candidate for the loss function is the χ 2 of the spectrum with respect to the target PDF, which expresses the correct positions of the peaks in energy.In our case, this is just the two known lines which are used as anchors for the method, and a polynomial background.
where µ 1 and µ 2 are the centers of the peaks as detailed above, and σ 1 , σ 2 the corresponding widths.The presence of the additional peak (not included in the target), as we shall see, will contribute as a constant value in the optimization and therefore will not play any role.The , evaluate the distance of the data to the target PDF.In this case, the O j is the observed data, i.e., the value of the spectrum in the j-th bin, and E j is the value of the target.For the differentiable approach, however, this involves the use of the spectrum histogram, which is inherently non-differentiable, because of the intrinsic discontinuity of the bin edges.In order to provide flexibility and allow for the gradientbased optimization, we employ the Kernel Density Estimation (KDE) as proxy to evaluate the χ 2 .With this approach, we can obtain a smooth and differentiable representation of the spectrum, which can be used in the optimization process.The bandwidth (bw) of the KDE has to be chosen with care, and in general will depend on the resolution of the detector considered.In this toy example, we found a value of bw = 0.01; however, one can rely on known bandwidth estimators [19].The loss function, as described above, in our case becomes: With this definition, the gradient can be then evaluated for all the i batches, with respect to the calibration matrix P. With the learning rate vector, R, which in our case has two components, we update the calibration matrix: and we repeat this until ∇ P (loss) is small enough.After convergence, the resulting P ′ is associated to a smaller loss, i.e., a better adherence of the data to the target.The evolution of the gradient for the first 200 iterations of the optimization is shown in Figure 4.As can be seen, the loss reaches a plateau around 85, which is the offset due to the additional peak present in the spectrum but not on the target PDF.This does not represent a concern, since the constant value is ignored in the differentiable optimization.

Results
In this section, the comparison of the differentiable optimization and the standard scenario is presented.A further comparison is also made with the true detector energy (the perfect calibration) obtained from the convolution of the truth-level energy with the detector response, which represents the best possible result of the optimization.In Figure 5, top, the evolution of the KDE PDF estimation before the start of the optimization (left) and at the end (right) is shown.As can be seen, the data are much more adherent to the target, both in terms of sharpness of the peaks and in terms of their position.As discussed earlier, the additional peak not included in the target does not degrade the performance of the method.(c,d) plots, on the left, the comparison of the calibrated spectrum with the standard approach (yellow) and after the differentiable optimization presented in this work (blue) is shown.As can be seen, and already hinted by the PDF distribution on the top plots, there is clear gain both in terms of stability and resolution.On the right, the optimized spectrum (blue) is compared against the true detector level energy spectrum (red), i.e., the one obtained with a perfect calibration.
The spectrum obtained after the differentiable optimization is shown in Figure 5, bottom plot left, and compared against the standard calibration.Again, clear improvements are visible on all the lines, both in terms of scale and resolution, quantitatively detailed below.On the bottom right, the true detector-level spectrum is also shown.Our method is shown to be able to bring the energy reconstruction almost at the same level of the true energy after detector response, that is, assuming a perfect calibration (bottom right, red).
Figure 6, left, shows the difference of the centroid of the lines, determined with a line shape fit, with respect to the true energy of the peaks.While the standard calibration shows a degree of divergence with respect to the true level energy, the optimized procedure presented in this work pushes the position to the correct values within the level of the statistical uncertainty of the fit.Figure 6, right, shows the width of the peaks as a function of the peak energy.Also, in this case, the differentiable optimization brings the line very close to the limit of the perfect calibration, with sizable gains throughout the spectrum.
(a) (b) Figure 6.On (a), the relative error of the peaks as a function of the energy, determined with a line shape fit to the centroid.The perfect (truth-level) calibration is shown in green, the standard calibration in blue, and the differentiable optimization in orange.Values as close as possible to 0 eV on the y-axis represent a better calibration.While the standard calibration shows a high degree of incompatibility, the optimized calibration aligns very well with the perfect calibration.On the (b), the same comparison is made for the line widths, from a line shape fit to the centroid.The optimized calibration demonstrates that it can push the energy resolution very close to the maximum possible value.
To further prove the result of the method, the calibration constants can be compared against their true value with the relative difference.This is shown in Figure 7 for the two calibration parameters A and B (P 1 and P 2 ) utilized for the energy calibration.The differentiable optimization is shown in blue, and the standard calibration is shown in orange, bringing in the latter a clear gain.

Discussion
In this work, we generalized the approach presented in [9] to the entire spectrum, moving away from the unbinned loss function and using instead the KDE representation for the underlying PDF of the data.This way, arbitrarily complex spectrum shapes can be taken into account in a differentiable manner.The gains of this method were showcased in a simplified, but illustrative example of the calibration process commonly used in spectroscopic experiments.The gradient-based optimization is shown to outperform the standard way of dealing with calibrations.Great gains are visible both at the level of the precision of the calibration, and also in terms of improvements in the resolution.It is worth emphasizing again that in many cases the dominant source of systematic uncertainty originates from the energy scale.The differentiable optimization shows to potentially minimize this uncertainty to the statistical level, greatly improving the overall precision of spectroscopic measurement.One additional benefit of this method is to unlock the possibility of using known but low-yield transitions present in the data taking as an in situ additional source calibration.Spectroscopic experiments, e.g., in nuclear physics, often rely on special calibration runs to provide lines of known energy which are in turn used to determine the calibration parameters of the detectors.The determined set of parameters is then utilized to calibrate the actual physics run, and the procedure is repeated in this order for the entire data-taking campaign.Usually, in physics runs, spectra present additional transitions, e.g., from elements within the experimental apparatus, which cannot be directly used for calibration because of their lower yield.The method presented in this work can be used to refine the calibration parameters obtained from the calibration runs, leveraging the in situ capability of the additional transitions during the physics runs, even at lower yield, a capability which could not be exploited in standard methods otherwise.
Finally, the computing power needed for this method needs to be considered.In the case of this simplified example with 100 batches to be calibrated, the differentiable optimization was run on a dedicated GPU; however, in more realistic scenarios, where usually hundreds of detectors are taking data for longer times, this could represent a limitation.This overhead mainly originates from the computation of the KDE, which is carried out event by event.One possible mitigation would be to move from a KDE description to a binned KDE (bKDE), already pioneered in HEP [12], at the cost of an expected slight degradation of the performance of the method.

Conclusions
In this work, we presented a general way to optimize the calibration of spectroscopic experiments, building upon previous approaches [9].The method presented here has several advantages over the previous one, and can be summarized as follows: • The previous unbinned likelihood-based loss function, L(ADC i , P i ), which was suited to rare event searches in underground laboratories, is now replaced by a global χ 2 -based loss function.This allows for a wealth of different shapes to be taken into account, and the method can be applied to a broader range of spectroscopic experiments.

•
Unlike previous methods, the χ 2 -based loss function allows the use of KDE as a representation of the underlying PDF, providing a fully differentiable model, which can be used in gradient-based optimization.The KDE can be trivially implemented as a binned KDE, reducing the computational overhead, and allowing the method to be used with high statistics data.

•
In the case of precision measurements or characterization of spectroscopic lines, the global loss function ensures a much more contained degree of distortion of the calibration parameters, making the calibration more robust and reliable, and enabling a more accurate estimation of the associated systematic uncertainties.
This differentiable optimization technique demonstrated significant improvements in the calibration process compared to the standard approach.The method was able to align detector signals with the true energy values of emitted spectral lines, resulting in a much more accurate and reliable calibration.The differentiable optimization approach, with the formulation of a loss function using the χ 2 of the spectrum with respect to the target PDF, showed substantial gains in the accuracy of the calibration parameters, and allowed us to optimize spectra of arbitrarily complex shapes.The results of the differentiable optimization were validated with various descriptors, including the comparison of the calibrated spectrum, the relative difference of the calibration parameters, and the width of the peaks.In all cases, the differentiable optimization method outperformed the standard calibration, bringing the energy reconstruction close to the true energy values and significantly improving the accuracy of the parameters.
The SIDDHARTA-2 experiment at DAΦNE [20], after a commissioning run where it performed the most precise kaonic helium shift measurement in gas [21], has completed a kaonic neon run in 2023.These data can be used to link the higher-order transitions to the kaon mass, currently affected by a large uncertainty due to the incompatibility of the two most precise measurements [22].The collaboration plans to employ the method presented in this work to reach the level of uncertainty needed to provide an additional high-precision measurement.
Currently, photon detectors employing black silicon (b-Si) are being developed and tested, providing intriguing possibilities of applications in nuclear physics.They are demonstrated to achieve a more robust radiation hardness [23,24], and show excellent capabilities in terms of energy resolution and efficiency for the near-infrared and ultraviolet range [25], and also in X-rays with black Silicon Drift Detectors [26].In precision measurements, an inexact calibration can lead to a significant systematic uncertainty, preventing researchers from unlocking the full potential of innovations in material science.The method presented in this work can be employed to fully exploit the capabilities of new detection technologies.
Finally, this method has the potential to have an impact on detectors known for complex calibration, or that have a very sensible response on the temperature or pressure, such as TES detectors.
Funding: This research was funded by the National Institute for Nuclear Physics (INFN, Italy) and by the EU STRONG-2020 project (Grant Agreement No. 824093) and the EU Horizon 2020 project under the MSCA (Grant Agreement 754496).This publication was also made possible through the support of Grant 62099 from the John Templeton Foundation.The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Figure 1 .
Simulation of the time dependence of the two parameters p 0 (a) and p 1 (b).The time dependence is modeled as a stochastic process.The blue line represents the fine-grain evolution, the red line is the average on the single run.The shaded area represents physics runs where the parameters (derived from unshaded areas' calibration runs) are applied.

Figure 2 .
The spectra obtained from the first three steps of the simulation for a single batch.On (a), the truth level energy with the Ti and Cu K α and an additional line to be measured.(b) shows the detector energy, obtained from the convolution of the previous step with the detector response.Finally, on (c), the detector ADC counts assuming a linear calibration function is shown.

Figure 3 .
Figure 3.A representation of the toy scenario considered in this work.At truth energy level, we generate three lines and a flat background (top left, green).Subsequently, this is convolved for different

Figure 4 .
Figure 4.The loss function as a function of the number of iterations in the gradient-based optimization.

Figure 5 .
(a,b), evolution of the PDF of the data, estimated via KDE (blue) and the target PDF (orange).On the left, before the optimization, and on the right, at the end.

Figure 7 .
The relative difference of the calibration parameters, offset (a) and gain (b), using the standard (orange) and optimized procedure (blue) with respect to the true values used in the simulation.As can be seen, the distributions of the calibration parameters are closer to their true values, especially for the gain, reflecting the increased performance of the calibration.