Deep Learning-Based Denoising in Brain Tumor CHO PET: Comparison with Traditional Approaches

Abstract: 18F-choline (CHO) PET images remain noisy despite minimal physiological activity in the normal brain, and this study developed a deep learning-based denoising algorithm for brain tumor CHO PET. Thirty-nine presurgical CHO PET/CT datasets were retrospectively collected from patients with pathologically confirmed primary diffuse glioma. Two conventional denoising methods, block-matching and 3D filtering (BM3D) and non-local means (NLM), and two deep learning-based approaches, Noise2Noise (N2N) and Noise2Void (N2V), were established for image denoising; all methods were developed without paired data. All algorithms improved the image quality to a certain extent, with N2N demonstrating the best contrast-to-noise ratio (CNR) (4.05 ± 3.45), the best CNR improvement ratio (13.60% ± 2.05%) and the lowest entropy (1.68 ± 0.17) compared with the other approaches. Minimal changes were identified in traditional tumor PET features, including the maximum standardized uptake value (SUVmax), SUVmean and total lesion activity (TLA), while the tumor-to-normal (T/N) ratio increased owing to the reduced noise. These results suggest that the N2N algorithm achieves sufficient denoising performance while preserving the original features of tumors, and may generalize to abundant brain tumor PET images.


Introduction
Positron emission tomography (PET) is an emerging imaging modality that detects the photons produced after positron annihilation from a radionuclide-tagged substrate, and it has been widely applied to evaluate tissue metabolism at the molecular level. 18F-fluorodeoxyglucose (FDG) is the most commonly used radiotracer due to the altered glucose metabolism of tumors; however, FDG displays high background activity in the normal brain because of its abundant glucose consumption, which hinders the clinical application of FDG for evaluating brain tumors. Radiolabeled choline (CHO) has therefore been developed as an alternative PET tracer to assess lipid metabolism, since rapidly proliferating cells typically display elevated phospholipid production (resulting in increased choline uptake) while the physiological activity of the normal brain is low. As a result, CHO PET of the central nervous system allows a clearer visualization of the metabolic tumor. However, the images remain noisy due to the limited number of photons captured during the PET scan, and higher photon counts rely on a higher tracer dose and a longer scanning time, which may cause ionizing radiation damage to patients and medical staff or decrease examination efficiency. Therefore, a denoising method for brain tumor CHO PET images that produces a clearer image while retaining the tumor properties would facilitate the clinical interpretation of PET data and the diagnosis of brain tumors.
Some traditional denoising methods have been applied for PET image noise reduction. Clinically, the Gaussian filter is a basic and simple approach that convolves the reconstructed image with a Gaussian kernel. However, the Gaussian filter smooths out noise and detailed structures at the same time, so this method is not edge-preserving. Multiple conventional algorithms, including the bilateral filter [1], non-local means (NLM) [2], the wavelet filter [3], block-matching and 3D filtering (BM3D) [4] and the image-guided filter, have been developed to reduce PET image noise while trying to preserve details, but their clinical utility remains to be improved.
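The Gaussian-filter baseline described above can be sketched in a few lines; the sigma and toy image below are illustrative choices, not values from this study:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_denoise(img, sigma=1.0):
    """Convolve the image with an isotropic Gaussian kernel.

    This smooths noise but also blurs edges, which is why the
    edge-preserving methods discussed above were developed.
    """
    return gaussian_filter(img.astype(np.float64), sigma=sigma)

# Toy example: a noisy constant image becomes smoother
rng = np.random.default_rng(0)
noisy = 5.0 + rng.normal(0, 1, size=(64, 64))
smoothed = gaussian_denoise(noisy, sigma=2.0)
```

The trade-off is visible numerically: the standard deviation (noise) drops sharply, while any genuine edge in the image would be blurred by the same kernel.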
Over the past several years, deep learning techniques in image processing have provided novel approaches to acquiring high-quality images in the field of nuclear medicine imaging. Supervised networks require paired input and target datasets, and have been proposed for PET image noise reduction, employing high-quality images to restore the noisy ones [5][6][7]. However, acquiring clean PET data is of limited feasibility in clinical practice, and obtaining paired datasets requires additional work that burdens the clinical workflow. Therefore, unsupervised and self-supervised learning techniques, which can train models without labels, have been applied to PET imaging. Deep image prior (DIP) [8], which employs random noise and a corrupted image as the input and target without pre-training, shows a CNN's intrinsic ability to learn structural information from a single corrupted image, and has been developed for FDG PET denoising with the patient's prior image as the input [9][10][11]. Meanwhile, networks trained on large datasets, such as Noise2Noise (N2N) and Noise2Void (N2V), have been reported to be useful for improving PET image quality without clean data. N2N [12] requires two datasets with independent zero-mean noise, and has been verified to improve PET image quality to a similar extent as supervised methods (e.g., Noise2Clean) [13,14]. N2V [15] employs a noisy dataset masked by a blind-spot network as the input and the same original dataset as the corresponding target, and has been applied to simulated as well as public brain PET data [16]. Notably, the majority of deep learning denoising methods have been developed for FDG PET, whose images reflect basic anatomical structures and can be corrected with MRI, while CHO PET has a lower background activity in almost all normal brain structures and highlights the metabolic tumors.
Therefore, networks pretrained on FDG PET images are not applicable to CHO PET, and an alternative denoising algorithm is required.
This study developed a deep learning-based denoising algorithm for brain tumor CHO-PET that exhibits sufficient performance while preserving the detail of lesions. The method was explored and validated without a paired dataset, aiming to expand its generalizability for abundant retrospective images.

Patients
This was a retrospective investigation of a prospective CHO PET cohort, which was approved by the Institutional Review Board of Peking Union Medical College Hospital (PUMCH) (ethics code ZS-2660), and informed consent was collected from all patients. Eligibility criteria were: (1) age ≥ 18 years and Karnofsky Performance Score (KPS) ≥ 70; (2) suspicion of a primary brain tumor with surgery planned; (3) head CHO PET/CT performed prior to surgery; (4) histopathological proof that the brain lesions were primary diffuse glioma; and (5) no anti-tumor treatment prior to PET/CT or surgery. Finally, 39 patients with pathologically confirmed primary diffuse glioma underwent CHO PET/CT and were included in the current study. The role of CHO PET quantitative parameters in distinguishing World Health Organization (WHO) grade and molecular markers was reported previously [17,18]; the current study focuses on the denoising of CHO PET images.

CHO PET/CT Data Acquisition and Tumor Segmentation
18F-fluoroethylcholine (CHO) was synthesized as described previously, and a dose of 5.55 MBq (0.15 mCi) of CHO per kilogram of body weight was given intravenously. PET/CT was acquired 40-60 min after CHO injection on a Biograph 64 TruePoint TrueV PET/CT system (Siemens Medical Solutions, Erlangen, Germany). The original image was acquired with a slice thickness of 3 mm and interpolated into DICOM data with a matrix of 336 × 336 × 148 as a standard protocol, with a physical pixel size of approximately 1 × 1 × 1.5 mm. The final output DICOM data were processed by Gaussian post-filtering. The standardized uptake value (SUV) of each pixel, normalized by body weight and decay factor, was subsequently calculated.
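The body-weight SUV normalization mentioned above can be sketched with the standard textbook formula; this is not the vendor's exact implementation (decay-correction conventions vary between systems), and the example values are illustrative:

```python
import math
import numpy as np

F18_HALF_LIFE_MIN = 109.77  # physical half-life of 18F in minutes

def suv_bw(activity_bq_ml, injected_dose_bq, body_weight_kg,
           minutes_post_injection):
    """Body-weight SUV with decay correction of the injected dose.

    SUV = tissue activity [Bq/mL] / (decayed dose [Bq] / body weight [g]),
    assuming ~1 g of tissue per mL.
    """
    decay = math.exp(-math.log(2) * minutes_post_injection / F18_HALF_LIFE_MIN)
    dose_at_scan = injected_dose_bq * decay
    return np.asarray(activity_bq_ml) * (body_weight_kg * 1000.0) / dose_at_scan

# Example: 70 kg patient, 5.55 MBq/kg injected, imaged 50 min post-injection
dose = 5.55e6 * 70
suv = suv_bw(activity_bq_ml=4000.0, injected_dose_bq=dose,
             body_weight_kg=70.0, minutes_post_injection=50.0)
```

With these numbers, a tissue concentration of 4000 Bq/mL corresponds to an SUV of roughly 1, i.e., uptake at the level of a uniform whole-body distribution.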
Tumors were segmented semiautomatically using 3D Slicer 4.10.2 (https://www.slicer.org/, accessed on 13 November 2019) by a nuclear medicine physician, as previously reported. Briefly, three spherical reference regions of interest (ROI) were first manually placed on the contralateral normal cortex to calculate the maximum and mean SUV (Nmax and Nmean) of the normal brain. The tumoral ROI was semiautomatically defined as the regions with SUV/Nmean > 2.0 and SUV/Nmax > 1.0 if the lesion displayed significant CHO activity, or delineated on the CT image and co-registered to the CHO PET image if the lesion did not reveal significant CHO uptake. Manual editing was performed to ensure the continuity of the ROI and to remove structures with physiological CHO uptake.
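The thresholding rule above can be sketched as a boolean mask; the manual-editing steps (continuity, removal of physiological uptake) are not reproduced, and the toy volume is illustrative:

```python
import numpy as np

def tumor_mask(suv, n_mean, n_max):
    """Candidate tumor ROI from the paper's thresholds:
    voxels with SUV/Nmean > 2.0 AND SUV/Nmax > 1.0.
    """
    suv = np.asarray(suv)
    return (suv / n_mean > 2.0) & (suv / n_max > 1.0)

# Toy volume: background SUV ~0.2, a 2 x 2 x 2 bright lesion at SUV 1.5
vol = np.full((10, 10, 10), 0.2)
vol[4:6, 4:6, 4:6] = 1.5
mask = tumor_mask(vol, n_mean=0.25, n_max=0.4)
```

Here the lesion satisfies both ratios (1.5/0.25 = 6 > 2 and 1.5/0.4 = 3.75 > 1) while the background fails the first, so only the eight lesion voxels are selected.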
The brain was segmented from the CT image by thresholding to delineate the skull and using the floodfill function of OpenCV (version 4.5.3, https://opencv.org/, accessed on 6 October 2021) to fill the interior region (which represented the brain tissue). The CT-based brain segmentation was co-registered to the PET data using the resize function of OpenCV, and the normal brain was defined by subtracting the tumor segmentation from the brain segmentation.
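A minimal sketch of the threshold-and-fill idea, substituting scipy's `binary_fill_holes` for the OpenCV floodfill used in the paper; the HU threshold and toy slice are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def brain_mask_from_ct(ct_hu, skull_threshold=300):
    """Rough CT-based brain mask: threshold the skull (high HU),
    fill the enclosed interior, then keep the interior only.
    """
    skull = np.asarray(ct_hu) > skull_threshold
    filled = binary_fill_holes(skull)
    return filled & ~skull  # interior = filled region minus the skull itself

# Toy 2D slice: a bony ring (high HU) enclosing soft tissue
ct = np.zeros((32, 32))
ct[8:24, 8:24] = 500   # "skull" block
ct[10:22, 10:22] = 40  # interior soft tissue
mask = brain_mask_from_ct(ct)
```

On real data, the threshold must separate bone from soft tissue reliably, and the result would still need the co-registration and tumor-subtraction steps described above.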

Tumor Feature Definition
Five traditional features, namely, SUVmax, SUVmean, metabolic tumor volume (MTV), total lesion activity (TLA) and the tumor-to-normal contralateral cortex activity ratio (T/N ratio), were defined to quantify the metabolic characteristics of the tumor [18]. SUVmax, SUVmean and TLA represent the maximum, mean and total radioactivity of the tumoral ROI, while the T/N ratio is the ratio of the tumor SUVmax to the mean SUV of the contralateral brain (Nmean). MTV represents the volume of the ROI and remained unchanged before and after postprocessing. Changes in SUVmax, SUVmean, TLA and the T/N ratio during postprocessing were calculated to reflect the influence of denoising on the original features of the tumor, and Pearson correlation coefficients of SUVmax, SUVmean, MTV and TLA with denoising performance were computed to evaluate whether the denoising was tumor feature-dependent.
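The five features defined above can be computed directly from an SUV volume and a boolean tumor ROI; the default voxel volume below follows the ~1 × 1 × 1.5 mm voxels described earlier (1.5 mm³ = 0.0015 mL) and is an assumption, as is the toy volume:

```python
import numpy as np

def tumor_features(suv, mask, n_mean, voxel_volume_ml=0.0015):
    """SUVmax, SUVmean, MTV, TLA and T/N ratio for a boolean tumor ROI."""
    vals = np.asarray(suv)[np.asarray(mask, bool)]
    suv_max = float(vals.max())
    suv_mean = float(vals.mean())
    mtv = vals.size * voxel_volume_ml  # metabolic tumor volume (mL)
    tla = suv_mean * mtv               # total lesion activity
    tn_ratio = suv_max / n_mean        # tumor-to-normal ratio
    return {"SUVmax": suv_max, "SUVmean": suv_mean,
            "MTV": mtv, "TLA": tla, "T/N": tn_ratio}

# Toy volume with a small lesion and one hottest voxel
vol = np.full((10, 10, 10), 0.2)
vol[4:6, 4:6, 4:6] = 1.5
vol[5, 5, 5] = 2.0
mask = vol > 1.0
feats = tumor_features(vol, mask, n_mean=0.25)
```

Note that MTV depends only on the mask, which is why it is unchanged by denoising, while SUVmax, SUVmean, TLA and T/N all depend on the (denoised) voxel values.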

Denoising (Postprocessing)
Two conventional denoising methods (BM3D and NLM) and two deep learning-based approaches (N2N and N2V) were established for image denoising. BM3D and NLM are conventional denoising approaches that serve as standard references in the field of image restoration. NLM filtering uses the weighted average value of similar blocks, with a square neighborhood of 10, a search window size of 21 and a Gaussian filtering parameter of 5. BM3D performs denoising through nonlocal grouping, collaborative filtering and aggregation, with the standard deviation of the additive white Gaussian noise set to 15.
N2N and N2V employed the U-Net architecture, as shown in Figures 1 and 2. The retrospective nature of the study allowed only the original PET images to serve as targets: N2N added Gaussian noise to create the input images, while N2V masked each input patch with a blind-spot network to make training possible. An L2 loss, minimized by the Adam optimizer, was used for both algorithms.
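The way training pairs were formed can be sketched without the U-Net itself; this is a simplified NumPy illustration of the pairing logic only (the mask fraction, noise sigma and neighbour-shift range are assumptions, and real N2V excludes the zero offset when picking a replacement neighbour):

```python
import numpy as np

rng = np.random.default_rng(42)

def n2n_pair(img, noise_sigma=0.1):
    """Noise2Noise-style pair as used here: the stored PET image is the
    target, and the input is the same image plus extra Gaussian noise."""
    return img + rng.normal(0, noise_sigma, img.shape), img

def n2v_pair(img, mask_fraction=0.01):
    """Noise2Void-style blind-spot pair: a small fraction of pixels is
    replaced by randomly shifted neighbours, and the L2 loss would be
    evaluated only at the masked positions."""
    inp = img.copy()
    mask = rng.random(img.shape) < mask_fraction
    ys, xs = np.nonzero(mask)
    dy = rng.integers(-2, 3, ys.size)  # random neighbour offsets
    dx = rng.integers(-2, 3, xs.size)
    ny = np.clip(ys + dy, 0, img.shape[0] - 1)
    nx = np.clip(xs + dx, 0, img.shape[1] - 1)
    inp[ys, xs] = img[ny, nx]
    return inp, img, mask

img = rng.random((64, 64))
noisy_in, target = n2n_pair(img)
bs_in, bs_target, bs_mask = n2v_pair(img)
```

In both schemes the network never sees a clean target, which is what makes them usable on a retrospective dataset such as this one.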

Denoising Evaluation
The contrast-to-noise ratio (CNR) quantifies the contrast between the tumor region and the normal brain of the same patient, defined as

$$\mathrm{CNR} = \frac{M_{\mathrm{tumor}} - M_{\mathrm{norm}}}{SD_{\mathrm{norm}}}$$

where $M_{\mathrm{tumor}}$ is the mean pixel value of the tumor region, $M_{\mathrm{norm}}$ is the mean value of the complement pixels (the pixels outside the tumor), and $SD_{\mathrm{norm}}$ is the standard deviation of the complement pixels. To evaluate the performance of the different methods, the CNR improvement ratio is subsequently defined as

$$\mathrm{CNR\ improvement\ ratio} = \frac{\mathrm{CNR}_{\mathrm{denoised}} - \mathrm{CNR}_{\mathrm{original}}}{\mathrm{CNR}_{\mathrm{original}}} \times 100\%$$

where $\mathrm{CNR}_{\mathrm{denoised}}$ and $\mathrm{CNR}_{\mathrm{original}}$ denote the CNR of the denoised and original PET data, respectively. Inspired by the concept of thermodynamics, Shannon [19] proposed entropy as a measure of information. Entropy ($H$) represents the statistical randomness of a gray-scale image and characterizes the image texture, defined as

$$H = -\sum_{x \in \chi} p(x) \log p(x)$$

where $p(x)$, calculated from the gray-level histogram, is the normalized probability density of the pixels with value $x$, and $\chi$ is the set of gray levels.
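A minimal sketch of the three metrics above, assuming a NumPy image and a boolean tumor mask (the histogram bin count and the toy image are illustrative choices, and the entropy here uses base-2 logarithms):

```python
import numpy as np

def cnr(suv, tumor_mask):
    """(M_tumor - M_norm) / SD_norm, where "norm" is every pixel
    outside the tumor ROI."""
    t = np.asarray(tumor_mask, bool)
    vals = np.asarray(suv, float)
    norm = vals[~t]
    return (vals[t].mean() - norm.mean()) / norm.std()

def cnr_improvement(cnr_denoised, cnr_original):
    """CNR improvement ratio in percent."""
    return (cnr_denoised - cnr_original) / cnr_original * 100.0

def entropy(img, levels=256):
    """Shannon entropy of the grey-level histogram (bits)."""
    hist, _ = np.histogram(img, bins=levels)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Toy image: noisy background with a bright square "tumor"
rng = np.random.default_rng(1)
img = rng.normal(1.0, 0.2, (50, 50))
img[20:30, 20:30] += 3.0
mask = np.zeros((50, 50), bool)
mask[20:30, 20:30] = True
c = cnr(img, mask)
```

A perfectly uniform image has entropy 0, and reducing background noise raises the CNR, which is the behaviour the results below quantify.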

Baseline Characteristics
In total, 38.5% (n = 15), 25.6% (n = 10) and 35.9% (n = 14) of patients were diagnosed with WHO grade II, III and IV primary diffuse gliomas, respectively, and the CHO activity of the lesions increased progressively with tumor grade. The baseline characteristics of the 39 enrolled patients are displayed in Table 1.

Computation Time
The training times and parameters for N2N and N2V, and the computation times for N2N, N2V, BM3D and NLM, are shown in Table 2. We utilized an integral-image algorithm with memory optimization [20] to accelerate NLM, while the other algorithms (BM3D, N2N and N2V) were accelerated by GPU. The experiment was carried out on a computer with an NVIDIA GeForce RTX3070 GPU and an Intel(R) Core(TM) i7-10700F CPU @ 2.90 GHz. Judging from the computation times, BM3D and NLM can hardly meet the needs of clinical application.

Denoising Performance
All algorithms improved the image quality to a certain extent, and Figure 3 compares the input image with the denoised results. Both traditional algorithms (NLM and BM3D) enhanced the image quality to a similar extent, with mean CNRs of 3.85 ± 3.06 and 3.86 ± 3.06, respectively, while BM3D displayed a higher CNR improvement ratio (1.62% ± 3.41%) than NLM (1.37% ± 3.03%). The deep learning methods (N2N and N2V), on the other hand, presented higher CNRs and CNR improvement ratios than NLM and BM3D. N2V achieved a CNR of 3.89 ± 3.12 and a CNR improvement ratio of 7.75% ± 2.07%. N2N produced clearer output images with a higher CNR (4.05 ± 3.45) and CNR improvement ratio (13.60% ± 2.05%), outperforming all other methods. Values and distributions of the CNR and CNR improvement ratio are shown in Table 3 and Figure 4, respectively.

Zoomed views of the tumor region and the normal brain region of a brain tumor patient (with the window width set to the maximum and minimum value of each image) illustrate the differences between the methods. In accordance with their mathematical definitions, NLM and BM3D blurred the speckle noise and the tumor simultaneously, which decreased the noise of the normal brain but also removed details of the tumor. Conversely, N2V and N2N removed the speckles to provide clearer images, and the tumor structure was better preserved. Among the four algorithms, N2N exhibited the clearest normal brain and best protected the tumor details.
Figure 5 shows the distributions of entropy for the different methods. Higher entropy suggests that more information is present in the image, and N2V preserved the most information with a higher entropy (2.42 ± 0.16) than the other methods. From another perspective, higher entropy also correlates with a more disordered image; by this measure, N2N exhibited the best denoising performance with the lowest entropy (1.68 ± 0.17).

Influence of Denoising on Tumor Features
SUVmax, SUVmean, TLA and the T/N ratio are the most crucial parameters for malignancy evaluation in clinical settings, and minimal changes were identified in SUVmax, SUVmean and TLA. SUVmax measures the maximum radioactivity of the tumor region, which represents the highest malignancy, and it remained essentially unchanged after denoising (change ratios of 6.29% for N2N, 4.77% for N2V, 3.02% for BM3D and 4.86% for NLM). Little change was noticed in SUVmean, which reflects the average radioactivity of the tumor region, with change ratios of 0.85% for N2N, 0.58% for N2V, 2.25% for BM3D and 1.52% for NLM. N2N and N2V showed smaller changes in TLA than the conventional methods, while all methods exhibited changes within 3% of the original images. In accordance with the denoising purpose, the T/N ratios increased for all algorithms, with improvement ratios of 20.93% and 5.46% for N2N and N2V, while smaller changes were identified for the conventional methods (2.54% for BM3D and 1.51% for NLM). Table 4 shows the mean improvement ratio of the tumor features with outliers detected and removed. Figure 6 shows the distributions of the tumor features for the denoised data.
There was also no correlation of the CNR improvement ratio with tumor size, SUVmax, SUVmean or TLA, with Pearson correlation coefficients ranging from −0.110 to −0.031. Correlation coefficients of the CNR and CNR improvement ratio with the tumor features are displayed in Table 5 (correlation of CNR and CNR improvement ratio with MTV, SUVmax, SUVmean and TLA). Note that the tumor-feature change ratios were calculated with outliers, defined as values differing from the median by more than three scaled median absolute deviations, detected and removed.
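The outlier rule quoted above (more than three scaled median absolute deviations from the median, which matches MATLAB's `isoutlier` default) can be sketched as follows; the sample values are illustrative:

```python
import numpy as np

def remove_outliers(x, n_mads=3.0):
    """Drop values differing from the median by more than n_mads
    scaled median absolute deviations."""
    x = np.asarray(x, float)
    med = np.median(x)
    # The 1.4826 factor makes the MAD a consistent estimator of the
    # standard deviation for normally distributed data.
    smad = 1.4826 * np.median(np.abs(x - med))
    if smad == 0:
        return x[x == med]
    return x[np.abs(x - med) <= n_mads * smad]

vals = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 8.0])
kept = remove_outliers(vals)
```

Here the single extreme value (8.0) is far beyond three scaled MADs of the median and is removed, while the five clustered values are retained.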

Discussion
CHO PET is an effective approach for molecular-level lipid metabolism evaluation in tumor diagnosis. PET images are characterized by limited spatial resolution and noise, and this paper established conventional and deep learning methods for CHO PET denoising. All proposed algorithms reduced noise to a certain extent, and the deep learning methods (N2N and N2V) preserved more details of the tumor region than the conventional filters (BM3D and NLM). N2N had the best performance among all algorithms, with a higher CNR improvement ratio of 13.60% ± 2.05%, a lower entropy of 1.68 ± 0.17 and fewer changes in the tumor features (SUVmax, SUVmean and TLA). The proposed method provides clearer images for physicians' diagnosis with good preservation of the representative features of the tumor region, and can be expected to be applied to data from other radiotracers with low background activity.
Conventional filters reduce noise in the spatial domain by computing each pixel value as a weighted average of correlated pixels. Although algorithms such as BM3D and NLM search for correlated pixels and estimate their weights, they blur the results when the noise level is high. As presented in Figure 3, the normal brain regions in the results of the conventional methods show unwanted blurring. Deep learning-based methods, on the other hand, are data-driven and learn to remove noise from images directly, instead of filtering neighborhood pixels, which smooths both noise and detail. Although it has narrower application scenarios, a well-trained deep learning algorithm may perform better on a specific assignment. In our study, the deep learning-based networks outperformed the conventional filters with higher CNRs and CNR improvement ratios, indicating the capability of deep learning for the denoising of brain tumor CHO PET data.
Previous studies are mostly based on FDG PET data [21], as FDG is the most widely applied radiotracer with abundant applications. The injected dose of CHO is similar to that of FDG, but the uptake mechanisms of the two tracers differ fundamentally. FDG, a glucose analog that is absorbed but not further metabolized, can delineate the major brain structures owing to the significant glucose consumption of the cortex. CHO, on the other hand, reflects phospholipid membrane production, which is generally low in both the cortex and the medulla. Therefore, CHO PET images present low background metabolism across the whole brain and are expected to contain less structural information from normal brain tissue. Compared with FDG PET, the total tracer uptake is lower and the characteristics of the images and noise are different. Supervised denoising networks trained on FDG PET images may therefore have difficulty generalizing to CHO PET images.
A variety of neural networks have been designed to produce denoised images, and the majority are trained on paired data, such as short-scan-time and long-scan-time data. Although these approaches can be expected to deliver better denoising results, their limitations narrow their practical application. Supervised denoising networks rely on experiments producing long-scan-time or down-sampled data to generate paired datasets. However, short-scan-time or down-sampled data are not always accessible for retrospective datasets that have been post-processed by Gaussian filters. Hence, techniques that require ground-truth data are difficult to adopt in clinical practice. In contrast, the N2N and N2V denoising networks are self-supervised/unsupervised learning approaches and have the potential for clinical generalization. Our denoising algorithm was developed from the final output DICOM data, unlike some previous studies that denoise as one of the post-processing steps. Processing the final DICOM data provides clearer images with important tumor features preserved, and it allows us to handle the abundant retrospective (pre-existing) data. In addition to CHO PET, amino acid PET is also recognized for its low background activity and high tumor-to-normal contrast in brain tumors, and our denoising method may also be utilized for amino acid PET to produce clearer images.
Entropy is considered to represent the variance of an image, with a higher value indicating greater disorder for the same subject. The N2N network tends to learn the low-frequency signal to minimize the regression loss, so its output presented low entropy. The N2V approach is based on the assumption that image noise is pixel-wise independent, so it filters out unpredictable noise while leaving structured noise. Since the uptake of normal brain tissue is assumed to be low and evenly distributed, the high-uptake spots in the normal brain region are considered noise, yet they do not appear to be pixel-wise independent. As a result, the entropy of the N2V output was higher. BM3D and NLM had similar entropy scores, lower than N2V, because they sacrificed details relative to N2V.
The denoising methods reflected little changes in tumor features such as SUVmax, SUVmean and TLA, suggesting a minimum influence on tumor grading and threshold of diagnosis. Therefore, the methods are feasible for the clinical experience and research findings. Conversely, there is also no correlation of the CNR improvement ratio with the tumor features, indicating the denoising algorithms are robust and the denoising performances of the normal brain are not influenced by the intrinsic tumor feature.
This study has a few limitations. First, ground-truth or reference images were not available for the evaluation of the denoising results, so the proposed method was not compared with supervised networks. Second, the N2N assumption that the noise added to the noisy images is Gaussian is not precise, so the method can be improved in future research. Third, the network has not been verified for other tracers, other tumor types or other tissues. Fourth, denoising methods for 3D data remain to be developed, and designing new networks and comparing them with more effective algorithms will be topics of our future research.

Conclusions
In this work, deep learning-based networks trained on full-dose original images were applied to CHO PET denoising, demonstrating higher overall image quality than conventional approaches. The proposed N2N network was more effective than NLM, BM3D and N2V at removing noise while preserving tumor features and detailed structures.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The code and model are available online (https://github.com/xushuo0629, accessed on 10 May 2022).

Conflicts of Interest:
The authors declare no conflict of interest.