1. Introduction
Soft gamma rays are emitted by many poorly measured astrophysical processes, motivating exploration of the low MeV region of the astronomical electromagnetic spectrum [1]. First, positrons in the Milky Way can be observed through the 511 keV line produced by electron-positron annihilation. However, it is unclear where the positrons in the interstellar medium originate, and their quantity and spatial distribution are also uncertain. Mapping the distribution of the 511 keV line is therefore key to understanding antimatter within our galaxy [2]. Second, gamma rays from the galactic center have been observed in excess of what is expected from known astrophysical processes. This excess could be due to Weakly Interacting Massive Particles (WIMPs), because the WIMP dark matter model predicts a self-annihilation process that produces low MeV gamma rays. It is important to determine whether the galactic center excess is due to physics beyond the Standard Model or can be attributed to millisecond pulsars [3]. Third, energetic sources such as active galactic nuclei, supernovae, and merging binary stars emit gamma radiation. Studying the polarization of these gamma rays can reveal the structure of jets, accretion disks, and magnetic fields, as well as the underlying causes of the emission. Neutron star mergers are accompanied by gamma ray bursts, and localizing these events gives insight into the dynamics of the final stages of the binary system [4].
With such phenomena in mind, soft gamma rays can be viewed as a powerful probe for many areas of astrophysics. However, the low MeV band remains a largely unexplored region of the electromagnetic spectrum. Instruments aboard the Compton Gamma Ray Observatory, such as COMPTEL and OSSE, investigated this energy range but suffered from a low signal-to-noise ratio and a small effective area. Consequently, an all-sky gamma ray observatory called GammaTPC has been proposed. This instrument concept promises a larger field of view, better pointing capability and energy resolution, and high sensitivity to polarization, providing a significant improvement in sensitivity over previous instruments [5].
GammaTPC uses liquid argon Time Projection Chamber (TPC) technology [6] to localize gamma ray sources. Gamma rays scatter in the liquid argon, creating energetic recoil electrons that lose energy by both ionizing and exciting nearby argon atoms. The excited atoms return to their ground state and emit photons that are promptly detected by silicon photomultipliers. The liberated charges then drift in an applied electric field to a 2D array of charge-sensitive pixels, where they are read out. The distribution of charge across this 2D array locates the charges in the plane of the array. Additionally, the time between the initial flash of light and the detection of the charge, combined with the known electron drift speed, determines the interaction depth, giving a 3D reconstruction of the event. Fine-grained charge readout with pixels enables detailed measurement of the 3D charge cloud from each gamma ray scatter. Two main challenges associated with this technology are power consumption and charge diffusion. First, lower-noise charge readout requires more power. Second, the electrons read out by the pixels risk being so diffuse that the signal in individual pixels does not reach the detection threshold. These issues are addressed with a novel charge readout system in which a coarse grid of wires is placed above the pixel grid. Electrons passing through the wires induce a current that powers the pixels behind the wire mesh, so power is conserved until readout is required. Furthermore, the coarseness of the grid allows full measurement of the signal from diffuse charge clouds, which would otherwise lose significant charge to sub-threshold signals in the pixels.
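The position reconstruction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GammaTPC code: the function name `reconstruct_position` is hypothetical, the in-plane position is taken as a simple charge-weighted centroid, and the drift speed of 1.6 mm/μs is an assumed, illustrative value for electrons in liquid argon.

```python
import numpy as np

# Assumed, illustrative electron drift speed in liquid argon (mm per microsecond).
DRIFT_SPEED_MM_PER_US = 1.6


def reconstruct_position(pixel_xy_mm, pixel_charge, t_light_us, t_charge_us,
                         drift_speed=DRIFT_SPEED_MM_PER_US):
    """Hypothetical helper: estimate (x, y, z) of a charge deposit.

    pixel_xy_mm  : (N, 2) pixel centre coordinates in mm
    pixel_charge : (N,) collected charge per pixel (any consistent unit)
    t_light_us   : time of the prompt scintillation flash (microseconds)
    t_charge_us  : arrival time of the charge at the pixel plane (microseconds)
    """
    xy = np.asarray(pixel_xy_mm, dtype=float)
    weights = np.asarray(pixel_charge, dtype=float)
    # In-plane position: charge-weighted centroid of the pixel hits.
    x, y = (weights[:, None] * xy).sum(axis=0) / weights.sum()
    # Depth: drift time multiplied by the (assumed) drift speed.
    z = drift_speed * (t_charge_us - t_light_us)
    return np.array([x, y, z])
```

For example, two pixels at x = 0 and x = 0.5 mm holding charges 1 and 3, with a 10 μs drift time, give a centroid of x = 0.375 mm and a depth of 16 mm under the assumed drift speed.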
By measuring the energy of the scattered electrons and the light from atomic de-excitation, we can infer the energy of the incident gamma ray. Furthermore, by measuring the positions of the first two Compton scatters, we can infer the sky location of the gamma ray source using the Compton formula, which constrains the source location to a circular loop on the sky [7]. However, the degeneracy of this location constraint about the cone axis can be lifted if we also obtain the initial direction of the scattered electron. If measured perfectly, the initial scattering direction reduces the inferred location of the gamma ray source from a circle to a point. In practice, the loops are reduced to arcs of non-negligible width, and this width reflects the uncertainty in the energy and position used in the kinematic reconstruction. Overlapping arcs from multiple scattering events localize the gamma ray source, with the energy and position uncertainties determining the point-spread function of the image.
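The kinematics above can be made concrete with a short sketch (hypothetical helper names; energies in keV, units with c = 1). The Compton formula gives the cone opening angle θ from the recoil electron energy T and the scattered photon energy E′, cos θ = 1 − m_e c² (1/E′ − 1/(T + E′)), assuming the scattered photon is fully absorbed. If the electron's initial direction is also measured, momentum conservation, p_γ = p_γ′ + p_e with |p_e| = √(T² + 2 T m_e c²), fixes the incident direction and breaks the degeneracy about the cone axis. Measurement uncertainties, which produce the finite arc width in practice, are ignored here.

```python
import numpy as np

ME_C2_KEV = 510.99895  # electron rest energy (keV)


def compton_cone_angle(t_electron_kev, e_scattered_kev):
    """Cone opening angle (radians) from the Compton formula.

    Assumes the scattered photon is fully absorbed, so the incident energy
    is t_electron_kev + e_scattered_kev.
    """
    e_incident = t_electron_kev + e_scattered_kev
    cos_theta = 1.0 - ME_C2_KEV * (1.0 / e_scattered_kev - 1.0 / e_incident)
    if not -1.0 <= cos_theta <= 1.0:
        raise ValueError("energies are kinematically inconsistent")
    return float(np.arccos(cos_theta))


def incident_direction(x1, x2, t_electron_kev, e_scattered_kev, electron_dir):
    """Unit vector along which the incident photon was travelling.

    x1, x2       : positions of the first and second scatters
    electron_dir : initial direction of the recoil electron (any length)
    Uses momentum conservation p_in = p_scattered + p_electron (c = 1).
    """
    scat_dir = np.asarray(x2, dtype=float) - np.asarray(x1, dtype=float)
    scat_dir = scat_dir / np.linalg.norm(scat_dir)
    e_dir = np.asarray(electron_dir, dtype=float)
    e_dir = e_dir / np.linalg.norm(e_dir)
    t = t_electron_kev
    p_electron = np.sqrt(t * t + 2.0 * t * ME_C2_KEV)  # relativistic momentum
    p_in = e_scattered_kev * scat_dir + p_electron * e_dir
    return p_in / np.linalg.norm(p_in)
```

The source itself lies on the sky in the direction opposite to the returned travel direction. Without `electron_dir`, only `compton_cone_angle` is available, which is exactly the circular-loop constraint described in the text.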
The interaction locations are the minimum information required to locate the gamma ray source, but additionally obtaining the scattered electrons' initial directions improves the signal-to-noise ratio and reduces the number of gamma rays needed for localization. Algorithmic extraction of these interaction parameters from the charge readout data has been performed previously but is not straightforward. Recent advances in machine learning suggest that deep learning could achieve considerably better accuracy and precision than algorithmic techniques. This is especially important because we aim for sub-millimeter accuracy, so that the error in the kinematic reconstruction of the incident photon direction is not dominated by the interaction location error. We therefore employ a convolutional neural network to estimate the interaction location and the initial direction of the scattered electron.
The contributions and outline of our work can be summarized as follows. In Section 1, we motivate how a proposed instrument called GammaTPC can shed light on physics associated with the poorly measured MeV sky, and why it requires high-accuracy machine learning models to optimize its pointing capability. The ultimate purpose of our work is to develop these high-performing ML models. In Section 2, we discuss prior work on physics-based and machine learning based estimation and compare and contrast it with our investigation. Section 3 describes the model architecture we have developed for GammaTPC, the model hyperparameters, and the generation and details of the data set. Notably, we use very large training data sets generated from highly realistic simulated scattering events, together with custom code that simulates the effects of detector readout. Our results are discussed in Section 4, which details the high accuracy of the models' predictions of the initial direction and initial interaction location and includes plots of the electron tracks overlaid with visualizations of the models' predictions. Results for models trained on a broader data set are shown in Section 5, and we conclude and present paths for future work in Section 6.
2. Related Work
Prior efforts on electron track reconstruction can be divided into two approaches: physics-based estimation and data-driven estimation. The reconstruction of electron tracks is of central importance for Compton telescopes and X-ray polarimetry and has been investigated by researchers in both disciplines.
With respect to physics-based estimation, Bellazzini et al. [8] outlined the Momentum Method for reconstructing the direction of photoelectron emission. More recently, Li et al. [9] presented an algorithm based on graph theoretic concepts for X-ray polarimetry, and Yoneda et al. [10] have attempted to apply this technique to electron-tracking Compton cameras. These techniques are inherently limited because they apply only to strip-based detectors. In this context, Bernard et al. [11] have outlined the potential accuracy of time projection chambers for measuring the photon direction in Compton events, as well as for polarimetry of linearly polarized radiation.
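As a simplified illustration of the kind of physics-based baseline discussed above, the first steps of a charge-moment analysis can be written as follows: compute the charge-weighted barycenter of the track and take the principal axis of the second-moment tensor as the track direction. This is a generic sketch (hypothetical helper `principal_axis`), not a reimplementation of any of the cited methods; in particular, the axis is defined only up to sign, and practical methods add further steps to identify the track start.

```python
import numpy as np


def principal_axis(points, charges):
    """Charge-weighted barycenter and principal axis of a track.

    points  : (N, 3) array of charge positions
    charges : (N,) array of non-negative charge weights
    The returned axis is a unit vector defined only up to sign.
    """
    points = np.asarray(points, dtype=float)
    w = np.asarray(charges, dtype=float)
    barycenter = (w[:, None] * points).sum(axis=0) / w.sum()
    centered = points - barycenter
    # Charge-weighted second-moment (covariance) tensor, shape (3, 3).
    second_moment = np.einsum("i,ij,ik->jk", w, centered, centered) / w.sum()
    # eigh returns eigenvalues in ascending order; the last eigenvector
    # corresponds to the largest spread, i.e. the major axis of the track.
    _, eigvecs = np.linalg.eigh(second_moment)
    return barycenter, eigvecs[:, -1]
```

For a straight, uniformly charged track the major axis coincides with the track line; curvature from multiple scattering, which is substantial for the higher-energy electrons considered here, biases this estimate.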
With respect to data-driven estimation, there has been limited prior work applying machine learning algorithms to similar electron track reconstruction. Ikeda et al. [12] and Takada et al. [13] used convolutional neural networks based on the VGG [14] and U-Net [15] architectures (the U-Net architecture consists of a contracting section and an expansive section that concatenates features from the contracting section, giving the architecture a shape resembling the letter "U", hence the name) to predict the scattering positions and electron recoil directions from track images obtained from PIC anode and cathode strips. Similarly, Peirson et al. [16] and Peirson and Romani [17,18] used an ensemble of deterministic neural networks based on the ResNet [19] architecture, trained on Monte Carlo event simulations for the IXPE detector.
Our work focuses on 3D image inputs, in contrast to the aforementioned strip-based detectors, which produce conventional single-channel images as inputs for the ML model. The more complex input makes the problem correspondingly more complex. Furthermore, the proposed GammaTPC instrument is a liquid-based detector and is thus able to reconstruct electrons with energies of order 100 keV and above, whereas the aforementioned studies focused on electrons with energies of order 1 keV. The higher energy poses additional challenges for the ML task. Additionally, the higher dimensionality of the input (3D images) precludes the direct adoption of established CNN architectures, as was done in the prior works. We therefore design our own model architecture, as described in the following section.
3. Methods
The input to the CNN model is a 3D image of the voxelized charge readout from the detector. We simulate this readout in two steps. First, because GammaTPC is an instrument concept and real data are not yet available, we use the PENetration and Energy LOss of Positrons and Electrons (PENELOPE) [20] code to simulate the energy loss along the tracks created by scattered electrons (a copy of PENELOPE can be obtained from the Nuclear Energy Agency at https://www.oecd-nea.org/tools/abstract/detail/nea-1525/, accessed on 1 November 2022). PENELOPE provides a detailed simulation of electron transport through various media (in this case, liquid argon), including elastic and inelastic scattering off atoms, bremsstrahlung, and electron-positron pair production. The output is a table of steps through the medium, each corresponding to one of these interaction types. At each step, the deposited energy is recorded, along with any daughter particles, which are also tracked. Chaining the steps together yields the simulated electron track. In the second step of our simulation pipeline, we use custom code that simulates the creation of clouds of ionization electrons from the energy deposited along the electron track, the diffusion of these clouds as the electrons drift to the pixel readout, and the readout of the charge collected on the pixels. The drift distance of the charge to the pixel plane, the readout noise, and the pixel threshold are all adjustable parameters in this custom detector effects and readout code.
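A toy version of the second simulation step might look like the following. It is a minimal sketch under loudly stated assumptions, not the authors' custom code: `simulate_readout` is a hypothetical name and every parameter default is illustrative; the W-value of 23.6 eV per ionization electron in liquid argon is used without recombination or statistical fluctuations; diffusion is treated as isotropic and Gaussian with σ = √(2 D t); all steps of a track share one drift distance; and the drift coordinate is binned with the same pitch as the pixels.

```python
import numpy as np

W_VALUE_EV = 23.6  # mean energy per ionization electron in liquid argon (eV)


def simulate_readout(step_xyz_mm, step_energy_kev, drift_distance_mm,
                     pitch_mm=0.5, drift_speed=1.6, diffusion=1e-3,
                     noise_e=20.0, threshold_e=100.0, rng=None):
    """Hypothetical toy readout: energy deposits -> drifted charge -> voxels.

    step_xyz_mm     : (N, 3) energy-deposit positions along the track (mm)
    step_energy_kev : (N,) energy deposited at each position (keV)
    drift_speed     : assumed drift speed (mm/us)
    diffusion       : assumed isotropic diffusion coefficient (mm^2/us)
    noise_e, threshold_e : per-voxel noise sigma and threshold (electrons)
    """
    rng = np.random.default_rng() if rng is None else rng
    steps = np.asarray(step_xyz_mm, dtype=float)
    energy_ev = np.asarray(step_energy_kev, dtype=float) * 1e3
    # Ionization electrons per step (no recombination, no fluctuations).
    n_electrons = np.round(energy_ev / W_VALUE_EV).astype(int)
    electrons = np.repeat(steps, n_electrons, axis=0)
    # Gaussian diffusion with sigma = sqrt(2 D t), t = drift time.
    drift_time_us = drift_distance_mm / drift_speed
    sigma = np.sqrt(2.0 * diffusion * drift_time_us)
    electrons = electrons + rng.normal(0.0, sigma, size=electrons.shape)
    # Voxelize: cubic voxels of side pitch_mm covering all electrons.
    lo = np.floor(electrons.min(axis=0) / pitch_mm) * pitch_mm
    hi = electrons.max(axis=0) + pitch_mm
    edges = [np.arange(a, b + pitch_mm, pitch_mm) for a, b in zip(lo, hi)]
    counts, _ = np.histogramdd(electrons, bins=edges)
    # Add readout noise, then zero out voxels below threshold.
    signal = counts + rng.normal(0.0, noise_e, size=counts.shape)
    signal[signal < threshold_e] = 0.0
    return signal
```

The returned 3D array plays the role of the voxelized charge image fed to the CNN. Increasing `drift_distance_mm` widens the charge cloud, spreading a fixed amount of charge over more voxels so that more of them fall below `threshold_e`, which is the diffusion problem described in Section 1.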
Because the machine learning model acts as a surrogate for a physical process, it needs to reflect the underlying characteristics of that process. Based on physical criteria, the learning algorithm that approximates the mapping should be translation invariant and should preserve locality. Translation invariance is desired because the behavior of scattered electrons is independent of their position in the TPC, at least to the degree of analysis relevant for this work. Locality enables the model to consider voxels in spatial proximity when learning the track head location and the initial direction. Because convolutional layers are equivariant to spatial translations [21,22], they are a judicious choice for this task. When coupled with pooling layers, they are approximately translation invariant [21,22,23,24]; that is, convolution-based feature extraction is not affected by the absolute position of a feature in the feature map. Additionally, the convolution operation preserves the locality of features within the extent of the filter size [25]. Based on these criteria for the inductive biases of the learning algorithm, we use a convolutional neural network to learn a mapping from the raw three-dimensional images of the charge readout from Compton scatters in the LArTPC to the location of the electron track head or the initial direction of the scattered electron.
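The equivariance and invariance properties cited above are easy to check numerically. The sketch below uses a circular (periodic) 3D convolution computed with the FFT, for which translation equivariance under circular shifts is exact; layers with zero padding and local, strided pooling, as in a real CNN, satisfy these properties only approximately. The random volume and kernel are stand-ins, not detector data.

```python
import numpy as np

AXES = (0, 1, 2)


def circular_conv3d(volume, kernel):
    """Circular (periodic) 3D convolution computed with the FFT."""
    spectrum = np.fft.fftn(volume) * np.fft.fftn(kernel, s=volume.shape, axes=AXES)
    return np.real(np.fft.ifftn(spectrum))


rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))  # stand-in for a voxelized charge image
kernel = rng.random((3, 3, 3))  # stand-in for a learned 3x3x3 filter
shift = (2, -1, 3)

# Equivariance: convolving a shifted input equals shifting the convolved output.
shifted_then_conv = circular_conv3d(np.roll(volume, shift, axis=AXES), kernel)
conv_then_shifted = np.roll(circular_conv3d(volume, kernel), shift, axis=AXES)
equivariant = np.allclose(shifted_then_conv, conv_then_shifted)

# Invariance: a global max pool discards position, so the pooled value is unchanged.
invariant = np.isclose(shifted_then_conv.max(), circular_conv3d(volume, kernel).max())
print(equivariant, invariant)  # prints: True True
```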
We include spatial dropout [26,27] and batch normalization [28] layers after each convolutional layer to reduce overfitting. The feature maps are flattened at the end of the feature extraction stages, which use three-dimensional convolutions. The final output of the fully connected layers is a three-dimensional prediction of either the location of the electron track head or the initial direction of the scattered electron. The model architecture and ancillary hyperparameters were established with Bayesian optimization, followed by manual fine tuning. The outline of the model architecture used in this investigation is similar to that of Buuck et al. [29], who performed a probabilistic analysis of the reconstruction. Our head model uses a mean squared error (MSE) loss function, whereas the initial direction model uses a cosine loss function, because the magnitude of the initial direction vector is not relevant to the kinematic reconstruction. Furthermore, our head and initial direction models use a flatten layer instead of the global average pooling layer used in that work. A schematic of the model is shown in Figure 1.
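The two loss functions can be sketched in NumPy as follows (hypothetical helper names; the actual training code is not reproduced here). We write the cosine loss as one minus the cosine similarity, averaged over the batch; some libraries instead return the negative cosine similarity, which differs only by a constant and yields the same gradients. Either way, the loss depends only on the angle between the predicted and true vectors, which is why the magnitude of the predicted direction does not matter.

```python
import numpy as np


def mse_loss(y_true, y_pred):
    """Mean squared error, as used for the head-location model."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


def cosine_loss(y_true, y_pred, eps=1e-12):
    """Batch mean of 1 - cos(angle) between true and predicted vectors.

    Invariant to rescaling either vector; eps guards against zero norms.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    dot = np.sum(y_true * y_pred, axis=-1)
    norms = np.linalg.norm(y_true, axis=-1) * np.linalg.norm(y_pred, axis=-1)
    return float(np.mean(1.0 - dot / np.maximum(norms, eps)))
```

A prediction of (5, 0, 0) for a true direction of (1, 0, 0) gives zero cosine loss but a large MSE, illustrating why MSE would wrongly penalize the direction model for the vector's length.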
Replacing the global average pooling layer with a flatten layer increases the model complexity, but this is justified by the size of our data set compared with that of Buuck et al. [29]. Approximately 100,000 electron tracks were generated with PENELOPE for each of the chosen energies, ranging from 50 to 1000 keV, and were split in a 95:5 ratio into training/validation and test sets; the same ratio was used to split the training and validation sets. Individual models were trained for each energy with a batch size of 64. We used the Adam optimizer with an initial learning rate of
and decreased it by a factor of four whenever the validation loss did not improve for four consecutive epochs, training for at most 100 epochs. We also halted training if the learning rate was below
and the validation loss did not decrease for 12 consecutive epochs. The typical training times for the direction and head models in Section 4 were 1.5 h and 30 min, respectively, while the models in Section 5 took approximately 7.5 h to train. Our training scripts are publicly available (https://gitlab.com/gammatpc-ml/data-set-and-training-scripts, accessed on 1 November 2020), and additional code or data are available upon request. Finally, for our readout simulation, we use drift distances of 1, 5, and 10 cm for each energy and pixel pitches of 200, 300, 400, and 500 μm.
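The data split and the training schedule described above can be summarized in plain Python. This is one plausible reading of the text, written as a hypothetical sketch: the learning-rate reduction mirrors a reduce-on-plateau rule, the combined stopping condition follows the description literally, and because the initial learning rate and the learning-rate threshold are not stated in this section, `initial_lr` and `lr_floor` are placeholders that the caller must supply.

```python
def split_counts(n_events, holdout_fraction=0.05):
    """Sizes of (train, validation, test) after two successive 95:5 splits."""
    n_test = round(n_events * holdout_fraction)
    n_trainval = n_events - n_test
    n_val = round(n_trainval * holdout_fraction)
    return n_trainval - n_val, n_val, n_test


class PlateauSchedule:
    """Hypothetical learning-rate schedule with a combined stopping rule.

    The rate is divided by `factor` after `patience` epochs without
    improvement in validation loss. Training stops once the rate is below
    `lr_floor` and the loss has not improved for `stop_patience` epochs,
    or when `max_epochs` is reached.
    """

    def __init__(self, initial_lr, lr_floor, factor=4.0, patience=4,
                 stop_patience=12, max_epochs=100):
        self.lr = initial_lr
        self.lr_floor = lr_floor
        self.factor = factor
        self.patience = patience
        self.stop_patience = stop_patience
        self.max_epochs = max_epochs
        self.best = float("inf")
        self.since_best = 0    # epochs since the last improvement
        self.since_reduce = 0  # epochs since last improvement or LR cut
        self.epoch = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        self.epoch += 1
        if val_loss < self.best:
            self.best = val_loss
            self.since_best = 0
            self.since_reduce = 0
        else:
            self.since_best += 1
            self.since_reduce += 1
            if self.since_reduce >= self.patience:
                self.lr /= self.factor
                self.since_reduce = 0
        if self.epoch >= self.max_epochs:
            return True
        return self.lr < self.lr_floor and self.since_best >= self.stop_patience
```

For 100,000 tracks, `split_counts(100_000)` returns `(90250, 4750, 5000)` training, validation, and test events. In a training loop, `schedule.step(val_loss)` would be called once per epoch and `schedule.lr` passed to the optimizer.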
6. Conclusions and Future Work
We have developed machine learning models to predict the origin and initial direction of electron recoil events in the proposed liquid argon gamma ray observatory GammaTPC. The precision of the electron track reconstruction is a major driver of the instrument's overall ability to locate gamma ray sources on the sky. Consequently, the optimized pointing capability of GammaTPC gives it strong discovery potential for phenomena associated with gamma ray emission in the low MeV range, allowing deeper insight into phenomena ranging from black hole jets to WIMP dark matter.
We have found that sub-mm accuracy for the track origin predictions is possible for low energy events, especially at small drift lengths. Our initial direction model predicts the initial direction exceptionally well for high energy events. At energies of 500 keV and below, the direction model is significantly less accurate, and at long drift distances it is largely unable to determine the initial direction. We have tested the limits of the head and initial direction models by training them over a spectrum of energies and drift distances, finding that the larger, more varied data set reduces the models' efficacy. The broadly trained models perform worse on the energies they were trained on than the specialized models do, and they have little ability to interpolate to events with energies not in the training data set.
It is possible to localize gamma ray sources more precisely by quantifying the uncertainty in the machine learning models' estimates, and we intend to explore this with post-hoc methods such as the Laplace approximation [30] in future work. At present, the models provide only point predictions of the initial direction and head location. Quantifying the uncertainty would allow us to remove erroneous predictions that do not contribute to source localization. We also plan to tune the model hyperparameters so that the models maintain adequate predictive capability when the training data consist of mixed energies and drift lengths.
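As an illustration of the kind of post-hoc method mentioned above, the sketch below applies a Laplace approximation to a linear output layer with a Gaussian likelihood, treating the penultimate-layer features as fixed (a "last-layer" variant). This is an illustrative choice, not the authors' stated plan; for this linear-Gaussian case the approximation is exact. The names `last_layer_laplace` and `predict_with_variance` are hypothetical.

```python
import numpy as np


def last_layer_laplace(features, targets, noise_var=1.0, prior_precision=1.0):
    """Laplace approximation for a linear output layer, Gaussian likelihood.

    features : (N, D) penultimate-layer activations (treated as fixed)
    targets  : (N,) regression targets for one output component
    Returns the MAP weights and the posterior covariance (inverse Hessian).
    """
    phi = np.asarray(features, dtype=float)
    y = np.asarray(targets, dtype=float)
    # Hessian of the negative log posterior with respect to the weights.
    hessian = phi.T @ phi / noise_var + prior_precision * np.eye(phi.shape[1])
    cov = np.linalg.inv(hessian)
    w_map = cov @ (phi.T @ y) / noise_var
    return w_map, cov


def predict_with_variance(phi_new, w_map, cov, noise_var=1.0):
    """Predictive mean and variance for new feature vectors."""
    phi_new = np.atleast_2d(np.asarray(phi_new, dtype=float))
    mean = phi_new @ w_map
    var = noise_var + np.einsum("nd,de,ne->n", phi_new, cov, phi_new)
    return mean, var
```

Predictions whose variance exceeds a chosen cutoff could then be discarded before the kinematic reconstruction, which is the filtering step described in the text.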