Fast and Accurate Gamma Imaging System Calibration Based on Deep Denoising Networks and Self-Adaptive Data Clustering

Gamma imagers play a key role in both industrial and medical applications. Modern gamma imagers typically employ iterative reconstruction methods in which the system matrix (SM) is a key component for obtaining high-quality images. An accurate SM can be acquired from an experimental calibration step with a point source across the field of view (FOV), but at the cost of a long calibration time to suppress noise, posing challenges to real-world applications. In this work, we propose a time-efficient SM calibration approach for a 4π-view gamma imager based on a short-time measured SM and deep-learning-based denoising. The key steps include decomposing the SM into multiple detector response function (DRF) images, categorizing the DRFs into multiple groups with a self-adaptive K-means clustering method to address the sensitivity discrepancy, and independently training a separate denoising deep network for each DRF group. We investigate two denoising networks and compare them against a conventional Gaussian filtering method. The results demonstrate that the SM denoised with deep networks yields imaging performance comparable to that of the long-time measured SM. The SM calibration time is reduced from 1.4 h to 8 min. We conclude that the proposed SM denoising approach is promising and effective in enhancing the productivity of the 4π-view gamma imager, and it is also generally applicable to other imaging systems that require an experimental calibration step.


Introduction
A gamma imager is an important tool in industrial and medical applications for visually inspecting and measuring the spatial distribution of gamma radiation. Industrial gamma imagers, such as coded-aperture gamma cameras [1,2] and Compton cameras [3,4], are widely used in homeland security and nuclear emergency response scenarios. Medical gamma imaging devices, such as planar gamma cameras, Single Photon Emission Computed Tomography (SPECT), and Positron Emission Tomography (PET), have been the backbone of clinical and preclinical molecular imaging.
In either of the above gamma imagers, image reconstruction is a key step that solves source distribution images from the measured photon position and energy information. Most modern gamma imagers employ a statistical image reconstruction algorithm, such as the maximum likelihood expectation maximization (MLEM) algorithm [5]. Recent studies have suggested that the iterative reconstruction methods could yield better image resolution and signal-to-noise performance [4,6] compared to analytical methods such as back-projection, filtered back-projection, and correlation analysis methods [7].

The 4π-View Gamma Imager
The proposed SM denoising approach is validated with the 4π-view gamma imager developed in our lab. To facilitate understanding, we briefly describe the gamma imager design in this section. Readers are referred to Ref. [26] for more details.
As shown in Figure 1a, the core component of the gamma imager is a 3D positionsensitive radiation detector block. When gamma photons emitted from surrounding radiation source(s) hit the detector block, every detector element has a different photon detection probability depending on the direction of the gamma ray due to varied photon attenuation from other detector elements on the photon path. Therefore, the accumulated photon events' distribution over a period of time reflects the directional distribution of radiation sources.
We also investigated the gamma positioning accuracy and image resolution performance of the gamma imager with the denoised SM.

As shown in Figure 1b,c, we assembled a realistic detector block with cerium-doped gadolinium aluminum gallium garnet (GAGG(Ce)) scintillators and silicon photomultiplier (SiPM) arrays on both ends. The entire scintillator block was 67.5 × 67.5 × 20 mm³ in size, consisting of 16 × 16 GAGG(Ce) scintillators (EPIC Crystal, Shanghai, China) with a size of 4.05 × 4.05 × 20 mm³ each. Each scintillator was coated with a totally reflective material, BaSO₄ (EPIC Crystal, Shanghai, China), on its four side surfaces. Two 16 × 16 SiPM arrays (FJ30035, Onsemi, Phoenix, AZ, USA) were coupled to the two end surfaces of the scintillator block.
The signals of SiPM arrays were read by a lab-developed ASIC chip [41]. Since each ASIC had 8 × 8 channels, we virtually divided the detector block into 2 × 2 sub-blocks (Figure 1b,c), and each sub-block contained 8 × 8 GAGG(Ce) scintillators. One should note that the sub-block division strategy impacts the detector performance as well as the denoising process (see Sections 2.3 and 2.4 for details).
To formulate the imaging problem, we defined a spherical coordinate system with the origin at the geometrical center of the detector. As shown in Figure 2, the image was defined on a 4π-view sphere surface, denoted by the polar angle θ and azimuth angle ϕ. We discretized the image domain into 181 (θ) × 360 (ϕ) pixels with a pixel size of 1° × 1°. The radioactivity in the i-th image pixel is denoted as x_i.
The detector block measures the 3D position of each detected photon and histograms the measured events into the projection {p_j}, where p_j denotes the number of photon events in the j-th detector bin. In the transverse direction, the intrinsic detector resolution is determined by the scintillator size; therefore, there are 16 × 16 detector bins transversely. In the depth direction, we use a dual-end read-out technique [27] to calculate the photon interaction position. According to our measurement, the position estimation accuracy is ~4 mm, resulting in 5 detector bins in the depth direction. Therefore, in each scintillator block, we define a total of 16 × 16 × 5 detector bins.
The imaging task aims to reconstruct the radioactivity image {x_i} using the measured projection dataset {p_j}. We used the MLEM algorithm [5] in image reconstruction as follows:

$$ x_i^{k+1} = \frac{x_i^{k}}{\sum_{j} a_{ij}} \sum_{j} a_{ij} \frac{p_j}{\sum_{i'} a_{i'j}\, x_{i'}^{k}} $$

where x_i^{k+1} and x_i^{k} indicate the reconstructed images after k + 1 and k iterations, respectively. The system matrix {a_ij} denotes the probability of one photon emitted from the i-th image voxel being detected in the j-th detector bin.
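As an illustration, the update above can be sketched in a few lines of NumPy. This is a minimal sketch with hypothetical helper names and a toy system, not the authors' implementation:

```python
import numpy as np

def mlem_reconstruct(x, A, p, n_iter=500, eps=1e-12):
    """MLEM iterations: x_i <- x_i / sum_j a_ij * sum_j a_ij p_j / (A x)_j.

    A : (n_bins, n_pixels) system matrix {a_ij}
    p : (n_bins,) measured projection counts {p_j}
    x : (n_pixels,) initial image estimate (e.g. uniform)
    """
    sens = A.sum(axis=0)                      # per-pixel sensitivity sum_j a_ij
    for _ in range(n_iter):
        proj = A @ x                          # forward projection (A x)_j
        ratio = p / np.maximum(proj, eps)     # measured / estimated projections
        x = x / np.maximum(sens, eps) * (A.T @ ratio)  # back-project and update
    return x

# toy example: 2 image pixels, 3 detector bins, noiseless data
A = np.array([[0.6, 0.1], [0.3, 0.3], [0.1, 0.6]])
x_true = np.array([100.0, 50.0])
p = A @ x_true
x = mlem_reconstruct(np.ones(2), A, p)
```

For noiseless, consistent data the iterations converge to the true activity; with real (noisy) projections the iteration count trades off noise against resolution.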

System Matrix Calibration and Detector Response Function
Figure 3a demonstrates a realistic scheme, with multiple gamma imagers manufactured in a volume-producing procedure, that requires extensive SM calibrations. Due to hardware- and assembly-induced variations, the SM of each imager needs to be individually calibrated. In this study, we chose two representative imagers, Imager 1 and Imager 2.
In the radiation source monitoring application scenario illustrated in Figure 3b, the gamma imager is installed on the ceiling with a top-view posture. In this case, a 2π image FOV is required. Therefore, although the gamma imager itself allows 4π imaging, in what follows, we define a 2π FOV for the realistic imaging system (Figure 3c). The developed methodology is expected to be applicable in the entire 4π FOV.
In the experimental calibration process, the SM is measured with a point source surrounding the imager and then pre-stored in the computer. Each imager is mounted on a motion controller so that it is rotatable about the center and is exposed to a stationary radioactive source (Figure 3d) to calibrate the system matrix. We rotate the imager to achieve hemispherical coverage of the point-source illumination, with θ from 0° to 90° and ϕ from 0° to 359° (Figure 3c). The whole calibration process is described as follows: (1) Define a 10 × 36 grid in the 2π image FOV, with θ ranging from 0° to 90° in 10° intervals and ϕ ranging from 0° to 350° in 10° intervals. The SM of the gamma imager depends on the energy of the radiation source; therefore, the calibration process is repeated for each source. In this study, we chose two representative radiation sources, 99m Tc and 137 Cs, to investigate the fast calibration approach. The 99m Tc source represents low-energy radiation and is widely used in medical applications. The 137 Cs source has medium gamma energy and is regularly used in industrial applications.
In a typical calibration process, each of the 99m Tc or 137 Cs radiation sources used in the calibration has an activity of ~15 mCi and is placed ~0.9 m away from the detector. At each measurement point, we acquire the projection in 15 s durations. The overall calibration measurement requires ~1.4 h for each imager and each radiation source. After spline interpolation, the total number of acquired photon events in each detector bin varies from 1.60 × 10^6 to 2.17 × 10^8 for 99m Tc in a 112-168 keV energy window and from 5.41 × 10^6 to 3.59 × 10^7 for 137 Cs in a 530-785 keV energy window. Figure 4a indicates the geometric relationship between the calibration measurement and the measured SM. The projection of each detector bin can be extracted as a column of the SM. Figure 4b shows representative columns of the SMs measured in a typical acquisition time (~1.4 h) for two detector bins. There are 360 (ϕ) × 91 (θ) = 32,760 elements in each column, which is visualized as a 360 × 91 color-scale image. In this work, we define such an image as a detector response function (DRF), as it represents the detection probability distribution for a certain detector bin over all image pixels.
For a single detector bin, gamma photons emitted from different directions have various detection probabilities because of different attenuation distances between sources and the detector bin induced by other detector bins along the path. The total acquired counts in each detector bin and for each radiation source are marked in the upper-right corner of each DRF image in Figure 4b.
In Figure 4c, we show four DRFs that were also acquired for the same detector bins but with a 10% acquisition time. The counts are marked in the upper-right corner. Figure 4d demonstrates line profiles along the white double-headed arrows in Figure 4b,c for DRFs measured in full acquisition time and in 10% acquisition time. Obviously, DRFs measured with a 10% acquisition time differ from their long-time measured counterparts in terms of significantly increased statistical noise. Therefore, an effective denoising process is mandatory for the noisy SM.
In what follows, we chose the DRF as the input of the neural networks because (1) the SM is the combination of the DRFs of all the detector bins; thus, the denoised SM can be obtained by assembling all the denoised DRFs together; and (2) the DRF can be shaped into a 2D smooth image of an appropriate size while retaining its physical meaning, which is beneficial for denoising.
In Figure 5a, we show sensitivity maps over the entire detector block for the 99m Tc and 137 Cs sources. The maps are displayed from the side view, and layers 1 to 5 are indicated in the upper row in Figure 5a. Clearly, the sensitivity of each detector bin varies with the position of the detector bin as well as the source type. For the 99m Tc source, the detector bins on the surface have more counts due to the low penetrating capability of the 140 keV gamma ray. The last row of detector bins in each layer has the highest sensitivity due to the lower-hemisphere FOV. Additionally, the discrepancy of properties between each crystal bar and each electronic signal read-out element contributes to the non-uniformity. For the 137 Cs source, the "cross" pattern is caused by the signal read-out setting described in Section 2.1. Since the signals in each sub-block are read out individually, if a photon interacts with the detector block through Compton scattering and deposits a portion of energy in two sub-blocks, the produced signals on one sub-block or both sub-blocks may be low enough to be rejected by the energy window discrimination logic, leading to a low photon event distribution on the edge of each sub-block.
Since it has been shown that the optimal parameter set of a denoising network is highly relevant to the count level of the images [37], the non-uniformity of sensitivity poses a challenge in training denoising networks.

Self-Adaptive, Sensitivity-Dependent Data-Grouping Strategy
To address this issue, we separated the DRF images into multiple groups according to each detector bin's sensitivity. For each group, the denoising network training and parameter optimization processes were performed individually.
To accommodate the system response discrepancy for different radioactive sources and different machines, we implemented a self-adaptive, unsupervised K-means clustering algorithm in data grouping. The Euclidean distance was used as a metric of similarity for detector bins' sensitivity (i.e., the sum of counts in each DRF image). Cluster centroids were initialized with random items, and then DRFs of 1280 detector bins were automatically categorized into three groups according to their sensitivity.
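The grouping step can be sketched as a minimal 1-D K-means over the DRF count totals. This is illustrative code with hypothetical helper names, not the authors' implementation; the random-item centroid initialization follows the description above:

```python
import numpy as np

def kmeans_1d(values, k=3, n_iter=100, seed=0):
    """Cluster detector-bin sensitivities (total counts per DRF image)
    into k groups using Euclidean distance, with centroids initialized
    from randomly chosen items."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(n_iter):
        # assign each sensitivity value to its nearest centroid
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # recompute centroids as cluster means (keep empty clusters in place)
        new = np.array([values[labels == c].mean() if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# toy sensitivities with three distinct count levels
sens = np.concatenate([np.full(5, 1e8), np.full(7, 7e7), np.full(10, 2e7)])
labels, centroids = kmeans_1d(sens, k=3)
```

In the paper's setting, `values` would hold the 1280 per-DRF count totals, and the resulting three groups each receive their own denoising network.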
In Figure 5b, we show the total counts in each detector bin in descending order, and the grouping result is labeled with different colors. For both the 99m Tc and 137 Cs sources, we defined three data groups. For the 99m Tc source, there are 75 DRFs in Group 1 with an average count level of 1.6853 × 10^8, 167 DRFs in Group 2 with an average count level of 6.9547 × 10^7, and 1038 DRFs in Group 3 with an average count level of 2.1096 × 10^7. For the 137 Cs source, Group 1 has 307 DRFs with 2.4511 × 10^7 counts on average, Group 2 has 566 DRFs with 1.6439 × 10^7 counts on average, and Group 3 has 407 DRFs with 1.0682 × 10^7 counts on average.
We further show the position maps of detectors in each data group (labeled in different colors) for the 99m Tc source and 137 Cs source in Figure 5c. For each radioactive source and each DRF group, an individual network is trained.

Deep-Learning-Based Denoising
We proposed two deep learning networks for the SM denoising task: a U-net encoder-decoder network [39] and a residual U-net (Res-U-net) framework, which is the combination of a U-net and a residual connection [42]. Both networks accept DRF images as the network input and produce denoised DRF images as the output. Figure 6 illustrates the U-net network architecture in this study. The width of each block indicates the number of feature maps in the layer, the length denotes the input size of the matrix, and the arrows stand for different operations. The entire network consists of an encoder, a bottleneck, and a symmetrical decoder, making up a U-shape. The encoder contains four stacks; in each stack, there are 2 convolutional layers with a 3 × 3 kernel followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling layer with a stride of 2. The bottleneck has 2 convolutional layers. The decoder consists of four stacks of convolutional layers and up-convolutional layers which expand the feature maps. A fully convolutional layer is added at the end to match the feature maps to the label. Between the encoding layers and their corresponding decoding layers, there are skip connections to propagate low-level features to high-resolution layers and compensate for information loss in max pooling.

Network Architectures
U-Net Architecture
We made two adaptations to the original U-net [39]. First, image padding was applied in each convolutional layer to keep a constant size of feature maps. Second, since there are 4 max pooling layers and 4 up-convolutional layers, we adapted the length and width of the input image to be multiples of 16 so that the output image had the same size as the input.
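A miniature of this architecture can be sketched in PyTorch. To keep the example short, it uses two pooling stages instead of the paper's four, and the channel widths are illustrative assumptions, not the paper's values:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU; padding=1 keeps the feature-map size
    constant (the first adaptation described above)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.net(x)

class MiniUNet(nn.Module):
    """Two-level miniature of the U-net: encoder, bottleneck, symmetric
    decoder with skip connections, and a final 1x1 conv to match the label."""
    def __init__(self, ch=16):
        super().__init__()
        self.enc1, self.enc2 = ConvBlock(1, ch), ConvBlock(ch, 2 * ch)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.bottleneck = ConvBlock(2 * ch, 4 * ch)
        self.up2 = nn.ConvTranspose2d(4 * ch, 2 * ch, 2, stride=2)
        self.dec2 = ConvBlock(4 * ch, 2 * ch)
        self.up1 = nn.ConvTranspose2d(2 * ch, ch, 2, stride=2)
        self.dec1 = ConvBlock(2 * ch, ch)
        self.out = nn.Conv2d(ch, 1, 1)
    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.out(d1)

# H and W must be multiples of 2^depth (16 for the paper's 4-level network)
with torch.no_grad():
    y = MiniUNet()(torch.zeros(1, 1, 96, 368))
```

With padded 96 × 368 DRF inputs, the output keeps the same size as the input, which is the point of the second adaptation.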

Res-U-Net Architecture
Different from the U-net network, in the Res-U-net network structure (as shown in Figure 7), a skip connection is added between the input and output of the whole network. The adoption of the residual connection concept could ease the training of the network, resolve the degradation problem, and potentially improve training accuracy.
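The global skip connection can be sketched as a thin wrapper around any backbone network. This is illustrative only: `body` stands in for the U-net described above and is here just a trivial two-layer CNN:

```python
import torch
import torch.nn as nn

class ResidualWrapper(nn.Module):
    """Res-U-net idea: the inner network only has to learn the residual
    (the noise), and a global skip adds the input back to its output."""
    def __init__(self, body):
        super().__init__()
        self.body = body
    def forward(self, x):
        return x + self.body(x)   # skip connection from input to output

# stand-in for the U-net backbone (hypothetical, for illustration)
body = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 1, 3, padding=1))
net = ResidualWrapper(body)
x = torch.randn(1, 1, 96, 368)
with torch.no_grad():
    y = net(x)
```

If the backbone outputs zero, the wrapper reduces to the identity, which is what makes the residual formulation easier to train on near-clean inputs.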

Dataset Preparation and Network Training
The training and testing datasets were produced from the SM calibration measurements described in Section 2.2 with the following steps: (1) By using all the events acquired in the full acquisition time of Imager 1, we produced a full-count SM (FC-SM). (2) We generated low-count SM (LC-SM) by randomly picking 10% events from the fully acquired list mode data, representing an SM that can be measured with a 10% acquisition time. (3) We extracted 1280 pairs of full-count DRFs and low-count DRFs from FC-SM and LC-SM and used them as the label and input dataset, respectively, which were fed into the deep networks. For each source energy and each DRF group, an individual network was trained. (4) We repeated down-sampling steps (2) and (3) 20 times to produce 20 independent LC-SMs and used all of them as the training data so that the deep networks had sufficient input data to avoid overfitting.
One should note that (1) to match the magnitude of the FC-SM, each LC-SM is multiplied by 10, and (2) due to the 4 pairs of max pooling and up-convolutional layers in both the U-net and Res-U-net architectures, the length and width of the input matrix should preferably be multiples of 16 so that the output images match the size of the input DRFs. Therefore, we added padding around the input DRFs and transformed their size into 368 × 96.
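Step (2) of the dataset preparation, the ×10 rescaling, and the padding can be sketched as follows (hypothetical helper names; binomial thinning of the histogram is statistically equivalent to randomly keeping 10% of the list-mode events in each bin):

```python
import numpy as np

def make_low_count_drf(fc_drf, fraction=0.1, seed=0):
    """Thin a full-count DRF histogram binomially (each event kept with
    probability `fraction`), then rescale so the magnitude matches the
    full-count DRF (the x10 factor for fraction=0.1)."""
    rng = np.random.default_rng(seed)
    lc = rng.binomial(fc_drf.astype(np.int64), fraction)
    return lc / fraction

def pad_drf(drf, target=(96, 368)):
    """Zero-pad a 91 x 360 DRF image to 96 x 368 so that four 2x pooling
    stages divide the image size evenly (multiples of 16)."""
    out = np.zeros(target, dtype=float)
    h, w = drf.shape
    out[:h, :w] = drf
    return out

fc = np.full((91, 360), 1000)          # toy full-count DRF, 1000 counts/pixel
lc = make_low_count_drf(fc)
padded = pad_drf(lc)
```

Repeating the thinning with different seeds yields the 20 statistically independent LC-SM realizations used for training.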
We ran the network training process separately for each gamma energy and each of the data groups, as described in Section 2.3. All the training data were extracted from the calibration measurements for Imager 1.
We evaluated the efficacy of the trained denoising network with two approaches: Intra-device testing. We produced another 10 LC-SMs with measured data of Imager 1 as the testing data (statistically independent of the training data). The denoised SMs were compared to the FC-SMs.
Inter-device testing. We produced 1 FC-SM and 10 LC-SMs from the calibrated data of Imager 2. Instead of training other denoising networks for Imager 2, we directly used the networks trained with the data of Imager 1 to denoise Imager 2's LC-SMs. We expected this approach to reveal the potential for real-world acceleration of the calibration process in a volume production pipeline, since a long-time calibration measurement is required for only one device.
However, when implementing the inter-device evaluation, the count levels of Imager 1 and Imager 2 were different, even for DRFs from the same detector bin, leading to a mismatch between the training and testing data noise levels. This was caused by the different properties of the scintillation crystals and digital processing units of the two devices. Moreover, due to the non-linear response of the networks, the mismatch of the count level must be taken care of. Therefore, we applied detector-by-detector scaling to compensate for the mismatch as follows: (1) Calculate the DRF-wise scaling factors F_j = (total counts of the j-th DRF from Imager 1) / (total counts of the j-th DRF from Imager 2);
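Step (1) can be sketched as follows (hypothetical helper name; each entry along the first axis of `sm` holds one DRF image):

```python
import numpy as np

def drf_scaling_factors(sm1, sm2, eps=1e-12):
    """DRF-wise factors F_j = total counts of the j-th DRF from Imager 1
    divided by total counts of the j-th DRF from Imager 2."""
    c1 = sm1.reshape(sm1.shape[0], -1).sum(axis=1)   # per-DRF totals, Imager 1
    c2 = sm2.reshape(sm2.shape[0], -1).sum(axis=1)   # per-DRF totals, Imager 2
    return c1 / np.maximum(c2, eps)

# toy data: Imager 1 has exactly twice the counts of Imager 2 in every bin
sm1 = np.ones((4, 91, 360)) * 2.0
sm2 = np.ones((4, 91, 360))
F = drf_scaling_factors(sm1, sm2)
```

Scaling each of Imager 2's DRFs by its F_j brings its count level to that of the training data before the network is applied.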

Implementation Details
We used a batch size of 16 in all the training tasks. The epoch numbers for each network were empirically chosen to ensure convergence, as listed in Table 1. In all cases, we used the MSE loss function and the Adam optimizer with an initial learning rate of 0.0001 and an exponential learning-rate decay of 0.996. All the computations were carried out on a workstation equipped with an NVIDIA GeForce RTX 2080 GPU card. We used a hybrid programming framework with MATLAB V9.8 and Python V3.6 with the PyTorch framework.
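This training configuration can be sketched in PyTorch as follows. The model here is a trivial stand-in for the denoising network, and the epoch count is illustrative (the real values are in Table 1):

```python
import torch

# hypothetical stand-in for the U-net / Res-U-net denoiser
model = torch.nn.Conv2d(1, 1, 3, padding=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)   # initial lr 0.0001
# exponential decay rate 0.996: lr is multiplied by 0.996 after each epoch
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.996)
loss_fn = torch.nn.MSELoss()

x = torch.randn(16, 1, 96, 368)    # batch size 16, padded DRF size
for epoch in range(3):             # illustrative; see Table 1 for real counts
    opt.zero_grad()
    loss = loss_fn(model(x), x)    # in practice: denoised LC-DRF vs FC-DRF
    loss.backward()
    opt.step()
    sched.step()
```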

Conventional Gaussian-Filtering-Based Denoise Approach
We also implemented a traditional Gaussian-filtering-based denoising method for comparison. For each individual DRF image, we filtered the image with a 2D Gaussian kernel as follows:

$$ G(u, v) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{u^{2} + v^{2}}{2\sigma^{2}} \right) $$

where σ denotes the standard deviation of the Gaussian kernel function. To obtain the best Gaussian filtering performance for a fair comparison, we tested on the LC-SM (Imager 1) data (multiplied by 10 to match the magnitude of the FC-SM (Imager 1)) with σ ranging from 0.1 to 15 pixels in steps of 0.1 pixels for each detector bin. We used the mean square error (MSE) between the full-count DRF image and the filtered low-count DRF image as the figure of merit to determine an optimal σ for each DRF. Figure 8 illustrates the MSE curves of two representative detector bins (indicated in Figure 4a) for the 99m Tc and 137 Cs sources. The optimal σ values that yielded the smallest MSE are marked in each sub-figure.
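The per-DRF σ search can be sketched with SciPy (the DRF here is an illustrative toy image, not measured data):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def best_sigma(lc_drf, fc_drf, sigmas=np.arange(0.1, 15.01, 0.1)):
    """Filter the (x10-scaled) low-count DRF with each candidate sigma and
    keep the one minimizing the MSE against the full-count DRF."""
    mses = [np.mean((gaussian_filter(lc_drf, s) - fc_drf) ** 2) for s in sigmas]
    i = int(np.argmin(mses))
    return sigmas[i], mses[i]

# toy DRF: a smooth bump plus Gaussian noise standing in for counting noise
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:91, 0:360]
fc = 1e3 * np.exp(-((xx - 180) ** 2 + (yy - 45) ** 2) / (2 * 30 ** 2))
lc = fc + rng.normal(0, 50, fc.shape)
sigma_opt, mse_opt = best_sigma(lc, fc)
```

Because the optimal σ depends on each DRF's count level and shape, the search is run independently for every detector bin, as described above.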

SSIM between System Matrices
The structural similarity index (SSIM) directly reflects the difference between the FC-SM and the denoised LC-SM. The SSIM between two system matrices is represented by the mean SSIM value over all corresponding DRF pairs; likewise, the SSIM for each DRF group is calculated as the group mean. For two system matrices SM^A and SM^B composed of N DRFs each, the formula can be written as follows:

$$ \mathrm{SSIM}\left(\mathrm{SM}^{A}, \mathrm{SM}^{B}\right) = \frac{1}{N} \sum_{j=1}^{N} \mathrm{SSIM}\left(\mathrm{DRF}_{j}^{A}, \mathrm{DRF}_{j}^{B}\right) $$
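The aggregation can be sketched as follows. For brevity, a simplified single-window SSIM (statistics over the whole image) is used in place of the standard sliding-window version:

```python
import numpy as np

def ssim_global(a, b, data_range=None, k1=0.01, k2=0.03):
    """Simplified SSIM with one global window; enough to show the formula's
    luminance/contrast/structure terms and the standard constants."""
    if data_range is None:
        data_range = max(a.max(), b.max()) - min(a.min(), b.min())
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2))

def sm_ssim(sm_a, sm_b):
    """SSIM between two SMs = mean SSIM over their N corresponding DRFs."""
    return float(np.mean([ssim_global(d1, d2) for d1, d2 in zip(sm_a, sm_b)]))

sm = np.random.default_rng(0).random((5, 91, 360))   # toy SM of 5 DRFs
```

Identical system matrices give an SSIM of 1, and any mismatch pushes the mean below 1.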

Positioning Bias
We tested the positioning accuracy by imaging a single point source at exactly known angular positions. We experimentally placed a point source at 6 × 6 different positions with θ = {17°, 35°, 46°, 53°, 64°, 81°} and ϕ = {64°, 82°, 127°, 189°, 261°, 333°}. The distribution map of the testing positions is shown in Figure 9. At each point, we collected around 1M photon events. Each reconstructed image was calculated with 10,000 MLEM iterations. The experiments were conducted twice, once with a 99m Tc source and once with a 137 Cs source. We calculated the positioning bias, which refers to the angular deviation between the reconstructed position of the radioactive source and the ground truth (denoted as θ_true and ϕ_true). The reconstructed position (θ̂, ϕ̂) is determined by the centroid of the image in both the θ and ϕ directions:

$$ \hat{\theta} = \frac{\sum_{\theta,\phi} \theta \, v(\theta,\phi)}{\sum_{\theta,\phi} v(\theta,\phi)}, \qquad \hat{\phi} = \frac{\sum_{\theta,\phi} \phi \, v(\theta,\phi)}{\sum_{\theta,\phi} v(\theta,\phi)} $$

where v(θ, ϕ) represents the value at pixel location (θ, ϕ) of the reconstructed image. Then, the positioning bias was calculated as the angle between the direction vectors of the reconstructed and true positions:

$$ \mathrm{bias} = \arccos\!\left( \frac{\vec{r}(\hat{\theta},\hat{\phi}) \cdot \vec{r}(\theta_{true},\phi_{true})}{\left|\vec{r}(\hat{\theta},\hat{\phi})\right| \times \left|\vec{r}(\theta_{true},\phi_{true})\right|} \right) $$

where r(θ, ϕ) denotes the direction vector corresponding to the angular position (θ, ϕ).
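The centroid and angular-deviation computation can be sketched as follows (hypothetical helper; ϕ wraparound at 0°/360° is ignored for simplicity):

```python
import numpy as np

def positioning_bias_deg(img, theta_true, phi_true):
    """Centroid-based reconstructed position and its angular deviation from
    the ground truth. `img` is a (181, 360) image indexed by integer
    degrees (theta, phi)."""
    thetas = np.arange(img.shape[0])[:, None]
    phis = np.arange(img.shape[1])[None, :]
    w = img.sum()
    theta_hat = (thetas * img).sum() / w       # image centroid in theta
    phi_hat = (phis * img).sum() / w           # image centroid in phi

    def unit(theta, phi):
        # unit direction vector for spherical angles in degrees
        t, p = np.radians(theta), np.radians(phi)
        return np.array([np.sin(t) * np.cos(p),
                         np.sin(t) * np.sin(p),
                         np.cos(t)])

    cosang = np.clip(unit(theta_hat, phi_hat) @ unit(theta_true, phi_true),
                     -1.0, 1.0)
    return np.degrees(np.arccos(cosang))

# toy check: a single-pixel "point source" exactly at (46 deg, 127 deg)
img = np.zeros((181, 360))
img[46, 127] = 1.0
bias = positioning_bias_deg(img, 46, 127)
```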

FWHM Resolution
Image resolution is also an important image quality index. We calculated the full-width-at-half-maximum (FWHM) resolution from the reconstructed single-point-source images described in Section 2.6.2. We fit each image with a 2D non-isotropic Gaussian function, from which we calculated the FWHM of the point source in both the θ and ϕ directions. Then, we calculated the FWHM resolution as

$$ \mathrm{resolution} = \sqrt{ \mathrm{FWHM}_{\theta}^{2} + \mathrm{FWHM}_{\phi}^{2} } $$
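A minimal sketch of the combination, using the standard FWHM = 2√(2 ln 2)·σ relation for a Gaussian (the fitted standard deviations here are illustrative inputs):

```python
import numpy as np

def fwhm_resolution(sigma_theta, sigma_phi):
    """Combine the per-axis FWHMs of the fitted non-isotropic 2D Gaussian
    into a single resolution figure: sqrt(FWHM_theta^2 + FWHM_phi^2)."""
    fwhm = 2.0 * np.sqrt(2.0 * np.log(2.0)) * np.array([sigma_theta, sigma_phi])
    return float(np.sqrt((fwhm ** 2).sum()))

# e.g. fitted standard deviations of 2 degrees along each axis
res = fwhm_resolution(2.0, 2.0)
```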

Denoised SMs
We chose 10 LC-SMs of Imager 1 as the testing set to perform intra-device evaluation. Figures 10 and 11 show representative DRF images of FC-SM, LC-SM, Gaussian-filtering-based denoised SM (G-DSM), U-net-based denoised SM (U-DSM), and Res-U-net-based denoised SM (R-DSM) for 99m Tc and 137 Cs sources, respectively. We also statistically calculated the SSIM value between each DRF image and the corresponding FC-SM case, shown in the bottom-right corners of the images. For each group, we chose one representative detector bin (indicated in the first row in Figures 10 and 11 with a highlighted box) and plotted its DRFs in the rest of the rows.
For both the 99m Tc source and the 137 Cs source, the DRFs of LC-SMs (third row in Figures 10 and 11) are evidently different from those of FC-SMs (second row) due to increased noise. Compared with LC-SMs, the DRFs of G-DSMs (fourth row) are much smoother after Gaussian filtering but with an unavoidable loss of details. DRFs of U-DSMs (fifth row) and those of R-DSMs (sixth row) are visually more similar to those of FC-SMs after U-net-based denoising and Res-U-net-based denoising, respectively. R-DSMs yield slightly better recovery of details.
The mean and standard deviation (SD) of SSIM calculated between the 10 testing LC-SMs and FC-SM, as well as between DSMs and FC-SM, for 99m Tc and 137 Cs sources are listed in Tables 2 and 3, respectively. For both 99m Tc and 137 Cs sources, the three denoising methods improve SSIM, among which U-DSMs and R-DSMs reach higher SSIM values, while G-DSMs have the worst performance.
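The SSIM statistics in Tables 2 and 3 can be reproduced with a short loop over detector bins, e.g. using `skimage.metrics.structural_similarity`. The function name and the stacked-array layout below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mean_ssim(dsm_drfs, fc_drfs):
    """Mean and SD of SSIM between denoised and full-count DRF images.

    dsm_drfs, fc_drfs: arrays of shape (n_bins, H, W), one DRF image
    per detector bin.
    """
    scores = []
    for denoised, reference in zip(dsm_drfs, fc_drfs):
        rng = reference.max() - reference.min()
        # data_range must be given explicitly for float images
        scores.append(ssim(denoised, reference,
                           data_range=rng if rng > 0 else 1.0))
    return float(np.mean(scores)), float(np.std(scores))
```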

Performance of Reconstructed Images-Positioning Bias
In terms of image reconstruction evaluation, we tested the 36 different point source positions described in Section 2.6.2. The projections used for reconstruction were also measured in experiments with a count level of 1M. Each reconstructed image was obtained using the MLEM algorithm with 10,000 iterations.
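The MLEM reconstruction step can be sketched as follows. This is the standard textbook multiplicative MLEM update, not the authors' production code; the flattened system-matrix shape is an illustrative assumption.

```python
import numpy as np

def mlem(system_matrix, projection, n_iter=10000, eps=1e-12):
    """Basic MLEM reconstruction.

    system_matrix: (n_detector_bins, n_image_pixels) non-negative matrix A.
    projection:    (n_detector_bins,) measured counts p.
    Update rule:   x <- (x / s) * A^T (p / (A x)),  with sensitivity s = A^T 1.
    """
    A = system_matrix
    sensitivity = A.sum(axis=0) + eps      # column sums, A^T 1
    x = np.ones(A.shape[1])                # uniform non-negative start
    for _ in range(n_iter):
        forward = A @ x + eps              # forward projection A x
        x *= (A.T @ (projection / forward)) / sensitivity
    return x
```

Because the update is multiplicative, non-negativity of the image is preserved automatically at every iteration.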
The reconstructed images of 99m Tc and 137 Cs point sources at five representative positions are illustrated in Figures 12 and 13, respectively. The yellow box and green cross in each of the images in the first column indicate the zone for displaying the zoomed images and the true position of the point source. One can observe in Figures 12 and 13 that the image quality of LC-SM is poor with dispersive hot-dot artifacts. Using Gaussian-filtered SMs moderately improves the image quality; however, in certain positions, there are still visible artifacts, which lead to notable positioning bias. After implementing U-net-based denoising and Res-U-net-based denoising, the reconstructed images were obviously more similar to those with FC-SM, leading to better positional accuracy.
In Figures 14 and 15, we present box plots for the mean value and SD of the positioning bias of reconstructed images at the 36 different source positions indicated in Figure 9. We first calculated the mean value and SD for the 10 testing datasets at each source position for the LC-SM, G-DSM, U-DSM, and R-DSM cases. Then, the mean and SD results at all 36 source positions were presented in box plots. It is important to note that since there is only one dataset for FC-SM, the mean positioning bias is exactly the value of the single dataset, and there is no SD statistic for FC-SM. Both U-net- and Res-U-net-based denoising achieve <2.5° positioning bias for the 99m Tc source and <2° for the 137 Cs source, outperforming LC-SM and G-DSM. Additionally, U-DSM and R-DSM have a lower bias SD, indicating that the deep-learning-based denoising methods are more robust to different LC-SMs than Gaussian filtering. In Figure 15b, the bias SD of U-DSM concentrates around 0.2°, making the box plot appear as a line.

Image Performance-FWHM Resolution
The mean and SD values of FWHM resolution for 99m Tc and 137 Cs at the 36 different source positions are shown in Figures 16 and 17, respectively. For the 99m Tc source, the mean resolution for FC-SM, U-DSM, and R-DSM mostly stayed below 20°, better than LC-SM and G-DSM. For the 137 Cs source, both U-net- and Res-U-net-based deep learning methods achieved around 10~20° image resolution with a few exceptions, outperforming the 20~45° resolution with LC-SM and 10~35° with G-DSM. The SD values of U-DSM and R-DSM were also much lower for both 99m Tc and 137 Cs sources, indicating that the SMs with deep learning denoising methods yield more robust image reconstruction.



Imaging Performance-Positioning Bias
As described in Section 2.4.2, for inter-device evaluation, we trained the denoising networks with data measured in Imager 1 and applied the networks in the denoising tasks for Imager 2. Figures 18 and 19 show the reconstructed images of 99m Tc and 137 Cs point sources at five representative positions. The images in the second to sixth columns correspond to the reconstructed images using an FC-SM, an LC-SM, and three denoised SMs with the Gaussian filtering method (G-DSM), with the U-net-based denoising method (U-DSM), and with the Res-U-net-based denoising method (R-DSM). One can observe severe distortion with a noisy LC-SM (third column) or G-DSM (fourth column). U-DSM and R-DSM (fifth and sixth columns) yield better image quality and visually more similar image shapes to the FC-SM cases (second columns).

Quantitative analyses of the mean and SD values of positioning bias are summarized in Figures 20 and 21 for the two radiation sources. For the 99m Tc source, the positioning bias is <2.6° in all cases. Although the image quality of LC-SM shows evident degradation, as shown in Figure 18, the positioning bias does not significantly increase, probably due to the centroid calculation step. In general, LC-SM, G-DSM, U-DSM, and R-DSM show comparable positioning accuracy, among which U-DSM performs slightly worse. However, as shown in Figure 20b, the SD values of positioning bias for U-DSM and R-DSM are significantly smaller than those for the other cases. For the 137 Cs source (Figure 21a), U-DSM and R-DSM achieve <2.5° average positioning bias, close to that of FC-SM. For the LC-SM and G-DSM cases, the mean positioning bias is higher, ranging up to 5°. In Figure 21b, the SD values of the positioning bias for the U-DSM and R-DSM cases outperform those for the LC-SM and G-DSM cases.

Imaging Performance-FWHM Resolution
The FWHM resolution performance analyses are summarized in Figures 22 and 23. The SD values (Figures 22b and 23b) also demonstrate the advantage of deep-learning-based denoising over the LC-SM and G-DSM cases. There are no significant differences between the imaging performance using U-net and Res-U-net networks.



Discussion
In this study, we proposed a deep-learning-based denoising method to realize time-efficient SM calibration for a 4π-view gamma detector. Two network architectures were investigated, including U-net and Res-U-net, and they both outperformed the non-denoised LC-SM and the SM denoised with the conventional Gaussian filtering method in terms of more accurate source position estimation and improved FWHM resolution. The trained networks were validated with measured data both from the same imager device and from a different imager to test the versatility of the proposed method. With our proposed method, the system matrix calibration time can be significantly reduced from 1.4 h to 8 min while positioning accuracy and image resolution remain comparable.
To accommodate the significant response discrepancy between different detector elements, we proposed a self-adaptive data-grouping method and trained a separate network for each group. We clustered the DRF images into three different groups to accommodate different noise levels. The number of groups was chosen as a balance among various considerations, e.g., count distribution determined by signal read-out setup, image FOV, and attenuation features of gamma photons in GAGG(Ce). Using more groups might further reduce the discrepancy but at the cost of data processing complexity. Having fewer DRF images in one group may also reduce the training capacity for each network and cause over-fitting. However, the data-grouping strategy can be flexible for different system designs.
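The self-adaptive grouping step can be sketched with K-means on a per-bin sensitivity feature. The feature choice below (log of total DRF counts) and the function names are our illustrative assumptions; the paper's exact clustering feature may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def group_drfs(drf_images, n_groups=3, seed=0):
    """Cluster DRF images into sensitivity groups by their total counts.

    drf_images: array of shape (n_bins, H, W), one DRF image per detector bin.
    Returns a label per bin, with group 0 = highest-sensitivity bins.
    """
    counts = drf_images.reshape(len(drf_images), -1).sum(axis=1)
    features = np.log1p(counts).reshape(-1, 1)   # 1D feature: log total counts
    labels = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=seed).fit_predict(features)
    # Relabel clusters so group 0 has the highest mean counts
    order = np.argsort([-counts[labels == g].mean() for g in range(n_groups)])
    remap = {g: i for i, g in enumerate(order)}
    return np.array([remap[l] for l in labels])
```

A separate denoising network would then be trained on the DRF images of each group, so that every network sees inputs with a comparable noise level.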
The convergence property and model performance varied among the different DRF groups. Taking 99m Tc as an example, Figures 24 and 25 show the training loss of the three different groups for the U-net model and Res-U-net model, respectively. For both networks, Group 1 takes the largest number of epochs to converge, and Group 3 takes the fewest. This is because Group 1 comprises the fewest DRF images and thus has fewer iterations per epoch, while Group 3 has more iterations per epoch. In Table 4, we list the mean and SD of SSIM of the different DRF groups calculated between the 10 testing LC-SMs and FC-SM, as well as between DSMs and FC-SM, for 99m Tc. For all cases, SSIM decreases from Group 1 to Group 3. We believe this is because the detector bins in Group 1 have higher sensitivity and lower noise; therefore, they are more similar to the full-count case. In general, the deep-learning-based denoising approach improves the SSIM of all three DRF groups and outperforms Gaussian filtering.

The imaging performance for the 99m Tc and 137 Cs sources is different. One can notice in Figures 12, 13, 18 and 19 that the image quality of 99m Tc is intrinsically better than that of 137 Cs, with fewer dispersive artifacts. Additionally, the image degradation of 137 Cs is much more severe than that of 99m Tc in terms of both positioning accuracy and resolution. We believe this is because the higher-energy gamma photons of 137 Cs (662 keV) have stronger penetration capability through the detector, which reduces the sensitivity and increases the noise. On the other hand, increased Compton scattering interactions for the 137 Cs source may also lead to image degradation. However, compared with 99m Tc, the proposed deep-learning-based denoising methods bring more significant improvements for 137 Cs. We believe this is because the adverse impacts of LC-SM are stronger for 137 Cs than for 99m Tc, especially for positioning bias.
There are other ways to further optimize our work. First, the performance of deep learning is highly dependent on the extensiveness of training data, so it would be better to use data generated from more devices to train the networks. In the present study, we only used the SM data from one device (Imager 1) for training. Additionally, when performing an inter-device evaluation, the mismatch of noise levels caused by different hardware leads to performance degradation compared to intra-device evaluation. In future work, we will utilize SM data from different devices for network training to improve reliability. Second, in this study, we selected two radioactive isotopes, 99m Tc and 137 Cs, as representatives of the most regularly used gamma sources in medical and industrial applications. We plan to test the method with expanded collections of gamma sources in further implementations. Third, in this study, we mainly focused on applying deep learning denoising to the system matrix and practically resolving the problem of the gamma imager calibration process. Therefore, we utilized classical network architectures to primarily prove the feasibility of the approach. The hyperparameters of the networks were chosen with reference to existing works that have achieved satisfactory results [35,39]. However, more comprehensive optimization of the network parameters may further improve the performance. In future work, we plan to conduct an ablation study on the network parameters (e.g., number of convolutional layers, kernel size, optimizer) for better results and explore other deep-learning models (e.g., generative adversarial network (GAN) [43]) for the SM denoising task. Additionally, the extension of datasets mentioned above may also help improve the model's efficiency.
Our proposed deep-learning-based denoising method generally applies to other imaging systems that rely on an experimental calibration step to accommodate comprehensive system response factors in an accurately measured SM. Our proposed method effectively addresses the challenge of long calibration time, which represents a major obstruction that limits the application of experimental measurement in real practice. We expect that the presented technique can be extended to other gamma imaging devices, including industrial gamma cameras, SPECT, and PET systems.

Conclusions
In this study, we proposed a time-efficient SM calibration method with short-time measured SM and deep-learning-based denoising. To deal with the sensitivity discrepancy across different detector bins, we proposed a self-adaptive K-means clustering method to classify DRF images into multiple groups fed to independent network training processes. We investigated two denoising networks with U-net and Res-U-net architectures and compared them against a conventional Gaussian filtering method. Through intra-device and inter-device studies, we demonstrated that the denoised SMs with deep networks effectively reduce the noise-induced image degradation and faithfully yield imaging performance comparable with the long-time measured SM. Hence, the system matrix calibration time can be reduced from 1.4 h to 8 min. We conclude that the proposed SM denoising approach is promising and effective in enhancing the productivity of the 4π-view gamma imager, and it is also generally applicable to other imaging systems that require an experimental calibration step.