Noise2Noise Improved by Trainable Wavelet Coefficients for PET Denoising

Abstract: The significant statistical noise and limited spatial resolution of positron emission tomography (PET) data in sinogram space result in degraded quality and accuracy of reconstructed images. Although high-dose radiotracers and long acquisition times improve PET image quality, the patient's radiation exposure increases and patient motion during the scan becomes more likely. Recently, various data-driven techniques based on supervised deep neural network learning have made remarkable progress in reducing noise in images. However, these conventional techniques require clean target images, which are of limited availability for PET denoising. Therefore, in this study, we utilized the Noise2Noise framework, which requires only noisy image pairs for network training, to reduce the noise in PET images. A trainable wavelet transform was proposed to improve the performance of the network. The proposed network was fed wavelet-decomposed images consisting of low- and high-pass components, and the inverse wavelet transform of the network output produced denoised images. The proposed Noise2Noise filter with wavelet transforms outperforms the original Noise2Noise method in the suppression of artefacts and the preservation of abnormal uptakes. The quantitative analysis of the simulated PET uptake confirms the improved performance of the proposed method compared with the original Noise2Noise technique. In the clinical data, 10 s images filtered with Noise2Noise are virtually equivalent to 300 s images filtered with a 6 mm Gaussian filter. The incorporation of wavelet transforms in Noise2Noise network training results in improved image contrast. In conclusion, the performance of Noise2Noise filtering for PET images was improved by incorporating the trainable wavelet transform in the self-supervised deep learning framework.


Introduction
Reconstructing clear positron emission tomography (PET) images from noisy observations without loss of spatial resolution remains a challenge because of the severe noise corruption in raw PET data and the limited resolution of the scanner. For decades, numerous studies have attempted to address this problem using various statistical and numerical approaches and signal processing techniques [1-11]. Various filters, such as bilateral, nonlocal-means, and wavelet-based filters, have been proposed to reduce the noise in corrupted images without blurring anatomical boundaries [1-3,10,11]. In addition, iterative reconstruction algorithms in Bayesian frameworks incorporating specifically designed regularization functions have also been proposed [5,10,12-14]. These algorithms can significantly improve the quality of reconstructed PET images by employing proper noise models and adequate optimization methods. However, several problems remain, such as hyperparameter selection, optimal noise modeling, and a high computational burden. In recent years, data-driven machine learning techniques based on deep neural networks have made remarkable progress on many challenging signal and image processing tasks [15].
In general, supervised learning for image denoising requires a paired dataset of corrupted images and clean targets [6,16,17]. Unsupervised learning methods for PET images have also been proposed [7,8,18]; they utilize the deep image prior (DIP), which trains a convolutional neural network as the regularizer for a given cost function [19]. Recently, the Noise2Noise framework, which trains neural networks using only noisy images, was proposed [9,20]. The only mandatory requirement for successful Noise2Noise training is that the noise in the input and in the training target be identically distributed and independent. Once this requirement is satisfied, different noisy realizations of the same image can be fed to the neural network as the input and the target. Conveniently, list-mode PET data acquisition allows the generation of independent noisy sinograms from different time frames. This is the main difference from the DIP-based methods above, which require only the corrupted input at the training phase and generate clean results. Moreover, DIP sometimes suffers from overfitting, in which the output tends to replicate the corrupted input [19,21]. In this study, we applied Noise2Noise to list-mode PET data to determine whether this self-supervised denoising method is effective for short-scan PET data.
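The identical-and-independent noise requirement can be emulated on simulated count data by binomial thinning: routing each recorded count to one of two sinograms with probability 1/2 yields two statistically independent Poisson realizations, analogous in spirit to splitting list-mode data into disjoint time frames. The helper below is a hypothetical illustration, not part of the original pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_poisson_sinogram(y, p=0.5, rng=rng):
    """Split Poisson counts y into two independent Poisson sinograms.

    Assigning each event to sinogram A with probability p (binomial
    thinning) gives two count arrays that are independent Poisson
    variables with means p*lam and (1-p)*lam -- the same i.i.d.-noise
    property that disjoint list-mode time frames provide.
    """
    y = np.asarray(y, dtype=np.int64)
    y_a = rng.binomial(y, p)     # events routed to realization A
    return y_a, y - y_a          # remaining events form realization B
```

The two returned sinograms can then be reconstructed separately and used as the input/target pair for Noise2Noise training.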
We also focused on enhancing the generalization ability of the neural network, as generalization is a major technical issue in deep-learning-based medical image processing because of the limited size of available datasets. Motivated by the proven efficacy of image denoising in a wavelet basis [22-24], we propose incorporating a wavelet transform (WT) with trainable coefficients into the neural network. Simulation and clinical data show that the proposed method improves the generalization ability of the trained network compared with the ordinary Noise2Noise method.

PET Data Model and Image Reconstruction
The PET image and observation were modeled by the following equation:

y = Ax + r + s,

where y is the observed data, A is the projection matrix, x is the desired image to be reconstructed, r is the random component, and s is the scatter component. Both analytic reconstruction methods, e.g., filtered back-projection, and iterative algorithms can be applied to y to find the target image x. In this study, we used the standard ordered-subset expectation-maximization (OS-EM) algorithm for image reconstruction. Although the noise distribution in the sinogram is independent, the noise in the reconstructed image may not be independent because of systematic noise sources introduced by the reconstruction [25,26]. However, the following experimental results show that this method works well with both simulation and real data. We used six outer iterations and 21 subsets for both simulation and clinical data. No postfilter was applied to the reconstructed images.
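To make the OS-EM step concrete, here is a minimal NumPy sketch of the sub-iteration update for the model y = Ax + r + s. This is an illustration only, not the MATLAB projector or the Siemens E7tool pipeline used in this work; the function name, the dense system matrix, and the combined randoms-plus-scatter argument are simplifications for the sketch.

```python
import numpy as np

def os_em(y, A, subsets, n_iter=6, r_s=None):
    """Ordered-subset EM reconstruction for the model y = A x + r + s.

    y       : measured sinogram counts, shape (n_bins,)
    A       : system (projection) matrix, shape (n_bins, n_voxels)
    subsets : list of index arrays partitioning the sinogram bins
    r_s     : combined randoms + scatter estimate (defaults to zero)
    """
    if r_s is None:
        r_s = np.zeros_like(y, dtype=float)
    x = np.ones(A.shape[1])               # uniform non-negative start
    for _ in range(n_iter):               # outer iterations
        for idx in subsets:               # one sub-update per subset
            As, ys, rs = A[idx], y[idx], r_s[idx]
            proj = As @ x + rs            # forward projection of estimate
            ratio = np.where(proj > 0, ys / proj, 0.0)
            sens = As.sum(axis=0)         # subset sensitivity image
            x = x * (As.T @ ratio) / np.maximum(sens, 1e-12)
    return x
```

With six outer iterations and 21 subsets, as used here, each voxel receives 126 multiplicative updates.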

Noise2Noise Training and Trainable Wavelets
Unlike normal supervised learning for image denoising, Noise2Noise exploits pairs of noisy images (never a clean target) in the training phase, as follows:

argmin_θ Σ_n || f_θ(x_n + n_{n,1}) − (x_n + n_{n,2}) ||²,

where f is the neural network with parameters θ, x_n is the clean target image, and n_{n,1} and n_{n,2} are independent noise realizations for each image. In this study, x_n + n_{n,·} corresponds to the results of the OS-EM algorithm for different time frames. Using the list-mode PET data, we can divide the original reference-scan-time data into independent short-scan-time frames, satisfying the i.i.d. assumption of Noise2Noise in sinogram space. The generalization power of the trained network, which depends on the number of patients and list-mode time bins, is often limited in medical imaging applications. To improve generalization for a small dataset, we propose applying a WT, allowing the images to be handled in multiple spectral bands. The network was fed decomposed images consisting of low-pass and high-pass components, and the inverse WT converted the network output into a denoised image (Figure 1a). Accordingly, the overall training procedure can be described by:

argmin_{θ, θ_W, θ_W*} Σ_n || W*_{θ_W*}( f_θ( W_{θ_W}(x_n + n_{n,1}) ) ) − (x_n + n_{n,2}) ||²,

where W_{θ_W} and W*_{θ_W*} are the forward and inverse WTs with coefficients θ_W and θ_W*. The WTs were initialized with the elements of the discrete Haar filter banks [27]. The single-level normalized low-pass and high-pass Haar filter banks before network training are defined by:

h = [1/√2, 1/√2],  g = [1/√2, −1/√2].

It is well known that the 1D forward WT is equivalent to a convolution between a given signal and the above filters, followed by dyadic downsampling. The inverse WT is obtained by convolving the dyadically upsampled wavelet-domain signals with the transposed filter banks (Figure 1b). For the 2D WT, the low-pass and high-pass transforms are applied along each axis, producing four components: low-resolution (LL), vertical (LH), horizontal (HL), and diagonal (HH).
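The single-level Haar analysis and synthesis described above can be sketched in a few lines of NumPy. Note that this sketch uses the fixed orthogonal Haar banks (the initialization of the trainable coefficients) for both directions and assumes even-length signals; in the proposed network the forward and inverse banks are independent learned parameters.

```python
import numpy as np

# Normalized single-level Haar filter banks (the training initialization):
# low-pass h = [1/sqrt(2), 1/sqrt(2)], high-pass g = [1/sqrt(2), -1/sqrt(2)].
h = np.array([1.0, 1.0]) / np.sqrt(2.0)
g = np.array([1.0, -1.0]) / np.sqrt(2.0)

def haar_forward(x):
    """Forward 1D WT: convolve with each filter, then dyadic downsample (x2)."""
    x = np.asarray(x, dtype=float)
    lo = x[0::2] * h[0] + x[1::2] * h[1]   # low-pass coefficients
    hi = x[0::2] * g[0] + x[1::2] * g[1]   # high-pass coefficients
    return lo, hi

def haar_inverse(lo, hi):
    """Inverse 1D WT: dyadic upsample, convolve with transposed banks, sum."""
    x = np.empty(2 * lo.size)
    x[0::2] = lo * h[0] + hi * g[0]
    x[1::2] = lo * h[1] + hi * g[1]
    return x
```

Applying `haar_forward` along each image axis in turn yields the four 2D components (LL, LH, HL, HH) that are fed to the network as separate channels.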
In conventional wavelet theory using fixed filter banks, the inverse transform is performed with the same filters used for the forward transform. However, we used independent filters for the forward and inverse transforms; a detailed discussion can be found in Section 3. All network parameters and wavelet coefficients were trained using an efficient backpropagation algorithm [28].
We used a U-net-based convolutional neural network combined with the DenseNet architecture [29,30], which is widely used for various tasks in biomedical imaging [31-35]. The network parameters and wavelet coefficients were optimized using the Adam optimizer with a learning rate of 10⁻⁴. The batch size was 8, and the network was implemented in TensorFlow on a GTX 1080 Ti GPU.

Figure 1. (a) The forward and inverse wavelet transforms (WT), performed before and after the deep neural network, have trainable coefficients. To achieve continuity of the 3D data, we used a 2.5D input and output. (b) Schematic explanation of the 1D wavelet transform and its inverse. * denotes convolution with the Haar filter banks (θ·), which were learnable parameters during training. The arrows show dyadic downsampling (↓2) and upsampling (↑2).

Simulation Data
Twenty segmented synthetic magnetic resonance images were collected from BrainWeb (Cocosco et al. 1997), and ground-truth PET images were generated. The assigned PET uptake values for the different brain tissues were 0.5 for gray matter, 0.125 for white matter and background, and 0.75 for randomly located lesions in gray matter. After forward projection using a 2D matrix operation implemented in MATLAB 2018a (The MathWorks Inc., Natick, MA, USA), Poisson noise was added to the projection data to generate 100 independent realizations of the noisy sinogram with 1/30 of the counts relative to the reference. We trained the Noise2Noise network models with and without the incorporation of the trainable WT in 2.5D mode; three adjacent slices were fed into the networks together as different channels. The numbers of training and test sets were 1635 and 545 slices, obtained from fifteen and five image volumes, respectively. After training the neural networks, the normalized root-mean-square error (NRMSE), peak signal-to-noise ratio (PSNR), and structural similarity metric (SSIM) were calculated between the ground-truth PET image and the test data, as follows:

NRMSE = √( Σ_{j∈ROI} (x_j − x̂_j)² / Σ_{j∈ROI} x̂_j² ),

PSNR = 10 log₁₀( max(x̂)² / MSE ),

SSIM = (2 μ_x μ_x̂ + c₁)(2 σ_{x x̂} + c₂) / ((μ_x² + μ_x̂² + c₁)(σ_x² + σ_x̂² + c₂)),

where x_j is the j-th voxel of the test image generated from the trained network; x̂ is the ground truth; and ROI is the predefined region, which was the simulated abnormal lesions or the whole gray matter. μ and σ denote the average and standard deviation of the images, and we used the default SSIM function from MATLAB 2018a.
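As a rough sketch (the study itself used MATLAB 2018a, including its default SSIM implementation), the NRMSE and PSNR metrics over an ROI can be computed as follows; the function names and the boolean ROI-mask convention are assumptions for illustration.

```python
import numpy as np

def nrmse(x, ref, roi):
    """Normalized RMSE over a boolean ROI mask (e.g., lesions or gray matter)."""
    d = x[roi] - ref[roi]
    return np.sqrt(np.sum(d**2) / np.sum(ref[roi]**2))

def psnr(x, ref):
    """Peak signal-to-noise ratio in dB, with the peak taken from the reference."""
    mse = np.mean((x - ref)**2)
    return 10.0 * np.log10(ref.max()**2 / mse)
```

SSIM is omitted here for brevity; equivalent implementations are available in common image-processing libraries.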

Clinical Data
We retrospectively used fourteen [18F]FDG brain scans (eight males and six females, age = 50.9 ± 21.6 years) acquired on a Biograph mCT 40 scanner (Siemens Healthcare, Knoxville, TN, USA). Nine images were used to train the neural networks and five to test and evaluate their performance (981 and 545 slices, respectively). The list-mode PET data were acquired 60 min after the intravenous injection of [18F]FDG (5.18 MBq/kg) for 5 min in a single bed position. To obtain noisy sinograms, we divided the 5 min list-mode data into 10 s bins. The PET images were then reconstructed using the OS-EM algorithm (six iterations and 21 subsets, no post-filter) with the E7tool provided by Siemens, in which CT-derived attenuation maps were used for attenuation and scatter correction. The matrix and pixel sizes of the reconstructed PET images were 200 × 200 × 109 and 2.04 × 2.04 × 2.03 mm³, respectively. The same training parameters were used to train the Noise2Noise networks with and without the incorporation of the WT. We also calculated the PSNR and SSIM for each test image, which consisted of 10 independent samples acquired from the full-count data, after normalizing them using the 99th-percentile value of the full-count image.

Results and Discussion
Figure 2 shows the noisy simulation (input) and denoised images produced by the Gaussian filter and by the Noise2Noise filters with and without the incorporation of the trainable WT. The proposed Noise2Noise filter with WT outperformed the original Noise2Noise method in the preservation of the abnormal uptakes indicated by the red arrows. Moreover, increased uptake in normal gray matter tissue was observed with the original Noise2Noise method (blue arrows). Noise2Noise with the wavelet transform yielded smaller errors in abnormal lesions and normal gray matter regions than the original Noise2Noise (Figure 3a,b). The quantitative analysis using the image quality metrics also confirmed the improved performance of the proposed method compared with the original Noise2Noise technique (Figure 3c,d).

In the clinical data, the 10 s images filtered with the Noise2Noise model were the most similar to the 300 s images (Figure 4). The incorporation of the WT, which yielded multiple downscaled images in the Noise2Noise network training, resulted in improved image contrast (red arrows). Moreover, the quantitative analysis of the clinical data also confirmed that Noise2Noise with the wavelet transform outperformed the original Noise2Noise (Figure 5); the proposed Noise2Noise with wavelet transform yielded the best results among the filtering methods compared.

One of the interesting properties of the proposed network is that it does not obey the orthogonality of the Haar wavelet because different filter banks are used for the forward and inverse wavelet transforms.
This departure from mathematical completeness is a limitation of this study; however, the network nevertheless produced the desired denoised images. Moreover, the updates of the filter banks were more efficient because they were not shared and did not overlap in the back-propagation path at each iteration. If the same filter banks were used for both transforms during training, they would be updated twice, once for the forward and once for the inverse transform, resulting in undesired changes in the computed gradient.
In this study, we only evaluated the Haar wavelet in the network because it is one of the simplest discrete wavelet transforms. However, various wavelets were proposed, including continuous wavelets [36,37]. In the future, we will investigate various wavelet transforms to find the best structure for the Noise2Noise framework.
The noise characteristics of PET images are complex, over-dispersed, and correlated [26]. Various studies have attempted to reduce the noise in reconstructed images by applying handcrafted priors or data-driven methods [6-8,16]. Most data-driven methods based on deep neural networks have focused on supervised learning, which requires clean target images. Recently, the Noise2Noise framework [20] has produced promising outcomes in image denoising without ground truth. The present study shows that the Noise2Noise method can mitigate the considerable noise in reconstructed PET images. Noise2Noise network training is simple and does not require the design of a complex noise model for the image. Moreover, the incorporation of a WT with trainable coefficients led to improved generalization in Noise2Noise training, as shown in the clinical data experiments.

Conclusions
We proposed the Noise2Noise framework to reduce the noise in low-count PET images. Discrete Haar wavelet transform was applied to the input and output of the neural network, where both images were noisy but contained identical and independent noise statistics. Filter banks of the given wavelets were updated during the training.

After training, Noise2Noise both with and without the WT successfully alleviated the noise in the input. Furthermore, the performance of Noise2Noise filtering for PET images was improved by incorporating a trainable WT. In computer simulations, Noise2Noise with WT recovered the intensities of the randomly added lesions better than the original Noise2Noise. Experiments using clinical images also showed the improved performance of Noise2Noise training with WT in terms of PSNR and SSIM. In the future, more rigorous evaluations will be performed using various wavelet transforms and radiotracers.

Informed Consent Statement: Informed consent was waived because of the retrospective nature of the study and the use of anonymized clinical data in the analysis.