Medical Radiation Exposure Reduction in PET via Super-Resolution Deep Learning Model

In positron emission tomography (PET) imaging, image quality correlates with the injected [18F]-fluorodeoxyglucose (FDG) dose and acquisition time. If a super-resolution (SR) deep learning technique can restore the image quality of short-acquisition PET images, the injected FDG dose could be reduced. Therefore, the aim of this study was to clarify whether the SR deep learning technique could improve the image quality of the 50%-acquisition-time image to the level of the 100%-acquisition-time image. One hundred and eight adult patients were enrolled in this retrospective observational study. The supervised data were divided into nine subsets for nested cross-validation. The mean peak signal-to-noise ratio and structural similarity in the SR-PET image were 31.3 dB and 0.931, respectively. The mean opinion scores of the 50% PET image, SR-PET image, and 100% PET image were 3.41, 3.96, and 4.23 for the lung level, 3.31, 3.80, and 4.27 for the liver level, and 3.08, 3.67, and 3.94 for the bowel level, respectively. Thus, compared with the 50% PET image, the SR-PET image was more similar to the 100% PET image and showed subjectively improved image quality. The use of the SR deep learning technique could reduce the injected FDG dose and thus lower radiation exposure.


Introduction
Deep neural networks have been applied in computer vision tasks, such as segmentation, image classification, denoising, image generation, image synthesis, and super-resolution (SR). Among them, SR is one of the most popular approaches used for increasing the resolution of degraded low-resolution (LR) images. Many SR models have been published since the SR convolutional neural network (SRCNN) was first reported by Dong et al. at the European Conference on Computer Vision 2014 [1]. A summary of the SR challenge in the New Trends in Image Restoration and Enhancement workshop is reported annually [2][3][4][5][6]. Moreover, many open-access state-of-the-art (SOTA) models have been published on websites [7]. These technologies are attracting attention not only in natural image processing but also in medical image processing, and deep neural networks have been applied to nuclear medicine [8].
Positron emission tomography (PET) is a functional imaging modality that uses radiotracers, such as [18F]-fluorodeoxyglucose (FDG), and has been used for the diagnosis of cancer and assessment of the extent of disease in oncology, combined with anatomical data from computed tomography (CT) or magnetic resonance imaging (MRI) [9][10][11]. However, one of the limitations of PET imaging is its relatively poor spatial resolution, as compared with CT or MRI, because of physical factors, such as scatter, counting statistics, positron range, and patient motion. Currently, a small number of clinical PET systems using silicon photomultipliers (SiPM) are commercially available, such as the Signa PET/MRI system (GE Healthcare, Waukesha, WI) [12], Discovery MI PET/CT system (GE Healthcare) [13], Vision PET/CT system (Siemens, Munich, Germany) [14], and Vereos PET/CT system (Philips, Amsterdam, Netherlands) [15]. These clinical PET systems achieve high energy resolution (<10% in full-width at half-maximum (FWHM)) and precise time-of-flight (TOF) measurements (<400 ps FWHM in coincidence time resolution). Although the PET systems equipped with SiPM are expensive, the use of these systems is expected to become widespread in the future because they can improve the image resolution as compared with conventional PET systems without increasing radiation exposure for the patients [12][13][14][15][16][17].
In Japan, radiation exposure management has been obligatory since April 2020, due to the enforcement of the partial revision of the Enforcement Regulations of the Medical Care Law, which includes the safety management of radiation for medical use based on the established Japanese diagnostic reference levels (DRLs) 2020 [18]. Abe et al. reported the details of Japan's DRLs 2020 for nuclear medicine [19]. Depending on the examination protocol, the radiation exposure in FDG-PET/CT imaging is higher than that in contrast-enhanced CT imaging [20,21]. Thus, the absorbed radiation dose per examination should be as low as possible, particularly for young patients who are sensitive to radiation and potentially require repeated follow-up studies. Previously, Queiroz et al. reported that the quality of PET images with half the injected FDG dose was clinically acceptable [22]. In addition, Sekine et al. demonstrated that the PET image quality in a TOF PET/MR system was clinically adequate with 60% of the usually injected FDG dose in patients with a body mass index (BMI) > 25 kg/m² and 50% of the injected FDG dose in patients with a BMI < 25 kg/m² [23].
The noise-equivalent count correlates with the acquisition time and injected dose in FDG-PET. From the standpoint of radiation exposure, a reduced injected FDG dose is desirable for patients. However, the resulting decrease in the signal-to-noise ratio (SNR) and structural similarity (SSIM) of PET images would affect disease diagnosis. In clinical practice, diagnostic examinations are performed based on defined procedure guidelines for tumor imaging [20,24], so it is not easy to obtain images with low-dose FDG injections for ethical reasons. Thus, to pursue further reductions in the injected FDG dose, it is important to simulate low-dose data by under-sampling data acquired with the normally injected FDG dose.
We hypothesized that it would be possible to reduce the radiation exposure in PET examinations by using SR deep learning techniques to improve the quality of low-quality PET images obtained with short acquisition times. Thus, the aim of this study was to clarify whether applying the SR deep learning model to PET images obtained with a short acquisition time could improve their quality to a level similar to that of the conventional full-acquisition-time PET image.

Related Works
SR techniques have been applied in PET imaging to solve the problem of blurry reconstructed PET images due to noise and artifacts. The traditional SR approach is based on interpolation, most commonly the bilinear or bicubic method. These interpolation methods increase the number of pixels and improve the image resolution by evaluating a polynomial function constructed from the known neighboring points. Since the publication of SRCNN, research on SR using convolutional neural networks (CNNs) has dramatically advanced, and various deeper network architectures have been proposed [1][2][3][4][5][6][7]. These SR models succeeded in achieving SR with higher accuracy than previous interpolation methods. Recent SOTA methods with SR deep learning techniques have shown exceptional performance for natural images [2][3][4][5][6][7]. Following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) standard, Ooi et al. systematically reviewed the SR deep learning algorithms [25]. Although unsupervised learning models such as the generative adversarial network (GAN) have also been proposed [26], most of these methods are based on the framework of supervised learning. These supervised learning SR frameworks artificially create under-sampled low-resolution images from the given ground-truth high-resolution images and are trained to recover the original ground-truth images from the low-resolution images. Previously, image-generation techniques using GAN, such as the generation of normal-dose PET images from reduced-dose images, have been considered to reduce radiation exposure in nuclear medicine [27,28]. By contrast, SR deep learning has seen limited use for the purpose of radiation exposure reduction in PET imaging.

Patients and Image Acquisition
This retrospective observational study was conducted according to the guidelines of the Declaration of Helsinki and approved by the institutional review board (approval No. 020-0070). The need for written informed consent was waived due to the retrospective nature of the study. All images were acquired using the Vereos PET-CT system at our institution between April 2019 and May 2020. A total of 108 adult patients (108 examinations, 25,678 PET images) were included in this study (Table 1). The number of images varied from patient to patient (Table 1). No more than one whole-body scan was performed on each patient. All patients fasted for ≥6 h before FDG injection (ca. 4 MBq/kg), and emission scanning was initiated approximately 60 min post-injection. The effective dose was used as the measure of whole-body dose so that the radiologic detriments from different radiation exposures could be compared. The effective dose from 18F-FDG PET scans was calculated as the product of the injected 18F-FDG radioactivity and the dose coefficient recommended in International Commission on Radiological Protection Publication 80 [11,29]. With this coefficient, the effective dose for adults is 7.0 mSv when 370 MBq of 18F-FDG is administered (approximately 0.019 mSv/MBq). All images were reconstructed using an ordered-subset expectation maximization (OSEM) algorithm with time-of-flight information and point-spread function correction. The reconstructed images had a matrix size of 144 × 144 and a voxel size of 4.0 × 4.0 × 4.0 mm. In this study, the "ground-truth" PET images for each patient were reconstructed with an acquisition time of 90 s, which standardized the acquisition time across all patients. To obtain PET images with simulated reduced injected FDG doses, three types of short-acquisition-time PET images (10%, 20%, and 50%) were reconstructed from identical PET emission data for each patient.
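The effective-dose calculation described above can be illustrated with a minimal sketch. It assumes the adult dose coefficient implied by the text (7.0 mSv for 370 MBq, roughly 0.019 mSv/MBq) and the approximate weight-based protocol of 4 MBq/kg; the function names are hypothetical and are not taken from the study.

```python
# Assumption: adult 18F-FDG dose coefficient implied by the text,
# i.e. 7.0 mSv for 370 MBq (roughly 0.019 mSv/MBq).
DOSE_COEFFICIENT_MSV_PER_MBQ = 7.0 / 370.0


def effective_dose_msv(activity_mbq, coefficient=DOSE_COEFFICIENT_MSV_PER_MBQ):
    """Effective dose (mSv) = injected activity (MBq) x dose coefficient (mSv/MBq)."""
    if activity_mbq < 0:
        raise ValueError("injected activity must be non-negative")
    return activity_mbq * coefficient


def injected_activity_mbq(body_weight_kg, activity_per_kg=4.0):
    """Weight-based injected activity, assuming about 4 MBq/kg as in the protocol."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_kg * activity_per_kg


# Median injected activity reported in the text, and a hypothetical half-dose injection.
full = effective_dose_msv(258.7)
half = effective_dose_msv(258.7 * 0.5)
print(f"full dose: {full:.1f} mSv, half dose: {half:.1f} mSv")
```

Because the relationship is linear, halving the injected activity halves the PET-related effective dose, which is the premise behind simulating reduced doses with shortened acquisition times.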

Super Resolution
We used the residual dense network (RDN) model for this study, which achieved SOTA performance at the time of research planning [30]. In short, this SR model fully exploits the hierarchical features from all convolutional layers through the residual dense block (RDB), which allows direct connections from the state of the preceding RDB to all layers of the current RDB. Consequently, the RDN model achieved better or comparable performance against SOTA methods in experiments on benchmark datasets with different degradations [30]. This SR model accepts only 8-bit three-channel RGB color images as input. Because PET images have a single grayscale channel, the single-channel data were placed in each of the three channels to construct the three-channel images that the model requires. Thus, all reconstructed PET image data were anonymized and converted from 16-bit grayscale digital imaging and communications in medicine (DICOM) files to 8-bit three-channel grayscale portable network graphics (PNG) files using the "mat2gray" function in MATLAB (MATLAB2019b, The MathWorks, Natick, MA, USA). We used a computer with two graphics processing units (NVIDIA GeForce GTX 1080 Ti, 11 GB; NVIDIA Corporation, Santa Clara, CA, USA). Because RDN training does not require separately acquired low-resolution images, the full-acquisition-time PET images (ground truth) were used as training data, and the low-resolution inputs were generated by downsampling them to one-fourth size with the bilinear method. After the training, the predicted SR image was obtained by upsampling the low-resolution input test images, which were not included in the training data, by a factor of 4. The training hyperparameters were as follows: maximum number of training epochs, 100; initial learning rate, 10⁻⁵; mini-batch size, 4. We divided the reconstructed PET images into 9 equal subsets according to the number of patients. Based on the subsets, we performed a 9-fold cross-validation procedure.
We imported the half-acquisition-time PET image (50% PET image) as the LR image to the RDN and subsequently predicted the SR image. Overall, the output SR image had a four-fold upscaling resolution from these 50% PET images. Figure 1 shows the flow chart of the proposed method and the model architecture in this study.
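The intensity conversion described in this section can be sketched as follows. This is a plain-Python illustration, not the authors' MATLAB code: it assumes a min-max rescaling similar in spirit to "mat2gray", followed by 8-bit quantization and replication of the gray value into three channels. The function names and the handling of constant images are assumptions made for the example.

```python
def rescale_to_unit(pixels):
    """Linearly rescale a 2-D list of raw intensities so min -> 0.0 and max -> 1.0."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Assumption for this sketch: a constant image maps to all zeros.
        return [[0.0 for _ in row] for row in pixels]
    scale = 1.0 / (hi - lo)
    return [[(v - lo) * scale for v in row] for row in pixels]


def to_uint8_rgb(pixels):
    """Quantize to 8 bits and copy the gray value into R, G, and B."""
    out = []
    for row in rescale_to_unit(pixels):
        out_row = []
        for g in row:
            q = int(round(g * 255.0))  # 256 levels replace up to 65,536
            out_row.append((q, q, q))
        out.append(out_row)
    return out


# Hypothetical 2 x 2 patch of 16-bit intensities.
patch = [[0, 32768], [65535, 16384]]
print(to_uint8_rgb(patch))
```

Since the rescaling depends on each image's own minimum and maximum, the absolute voxel values are not preserved, which is relevant to the loss of quantitative information discussed later in the paper.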

Evaluation
We divided the supervised data into nine subsets for nested cross-validation [31]. Each subset was an independent combination of 96 patients used for training and 12 patients used for test images to prevent the overlap of patient images between the training and testing images within the subsets. To determine the effectiveness of SR for radiation exposure reduction in PET examinations, we performed objective and subjective evaluations. The objective evaluation is aimed at determining whether the SR model can be used to approximate the reference image better, and the subjective evaluation is aimed at determining whether the SR model can achieve an image quality appropriate for diagnosis.
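A patient-level split of this kind can be sketched as below. The source does not state how patients were assigned to subsets, so the shuffling, the seed, and the integer patient identifiers are assumptions made purely for illustration.

```python
import random


def patient_folds(patient_ids, n_folds=9, seed=0):
    """Split patients into n_folds disjoint test groups; the rest form each training set."""
    ids = list(patient_ids)
    if len(ids) % n_folds != 0:
        raise ValueError("number of patients must be divisible by n_folds")
    random.Random(seed).shuffle(ids)  # hypothetical assignment, not from the paper
    size = len(ids) // n_folds
    folds = []
    for i in range(n_folds):
        test = ids[i * size:(i + 1) * size]
        test_set = set(test)
        train = [p for p in ids if p not in test_set]
        folds.append((train, test))
    return folds


folds = patient_folds(range(108))
for train, test in folds:
    # 96 training patients and 12 test patients, with no patient in both groups.
    print(len(train), len(test), len(set(train) & set(test)))
```

Splitting by patient rather than by image keeps all slices from one examination on the same side of the split, which is what prevents overlap between training and test images.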
For the objective evaluation, the 50% PET images were first downsampled using the bilinear method with a scaling factor of 4. Next, the downsampled 50% PET images were upsampled by the bilinear method and by the RDN model with a scaling factor of 4, so that the output SR images and the original "ground-truth" images could be compared at the same matrix size. The peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were calculated using the MATLAB "PSNR" and "SSIM" functions [32,33]. These objective indices are well-known quality metrics for the comparison of two images. The PSNR is an explicit numerical criterion based on the mean squared error (MSE). In contrast, the SSIM is considered to be correlated with the quality perception of the human visual system, and it is designed by modeling any image distortion as a combination of three factors (loss of correlation, luminance distortion, and contrast distortion). For a reference image x and test image y, the PSNR and SSIM are calculated as follows:

$$\mathrm{PSNR}(x, y) = 10 \log_{10} \frac{V_p^2}{\mathrm{MSE}(x, y)}$$

$$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $V_p$ is the peak value of the signal, which was set to $2^8 - 1 = 255$ in this study; MSE(x, y) is the simplest and most widely used full-reference quality metric, calculated by averaging the squared intensity differences between the distorted and reference image pixels; $\mu_x$ and $\mu_y$ are the means of x and y; $\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y; $\sigma_{xy}$ is the covariance of x and y; $C_1 = (k_1 L)^2$ and $C_2 = (k_2 L)^2$ are small constants that stabilize the division, as in the standard SSIM definition; and L is the dynamic range of the pixel values (255 for 8-bit grayscale images). The value of PSNR(x, y) approaches infinity as MSE(x, y) approaches zero, and a small value of PSNR(x, y) implies large numerical differences between x and y. For typical images, the SSIM lies between 0 and 1, where 1 denotes perfect similarity between two images. These measures were calculated for all images. For the subjective evaluation, the 50% PET images were upsampled using the RDN model. Bilinear interpolation was used to resample the other PET images to the same matrix size as the output SR-PET images.
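To make the two metrics concrete, the following sketch computes the PSNR and a simplified SSIM on flattened pixel lists. It is not the MATLAB implementation used in the study. The SSIM here uses global image statistics, whereas SSIM is usually computed over local windows and averaged, so its values will differ from those reported. The constants k1 = 0.01 and k2 = 0.03 are the commonly used defaults and are an assumption, as is the small example data.

```python
import math


def mse(x, y):
    """Mean squared error between two equally sized pixel lists."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def psnr(x, y, peak=255.0):
    """PSNR in dB; infinite when the images are identical."""
    err = mse(x, y)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM from whole-image statistics (a simplification of windowed SSIM)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


# Hypothetical 8-bit pixel values for a reference and a distorted image.
reference = [10, 50, 90, 130]
distorted = [15, 45, 95, 125]
print(f"PSNR: {psnr(reference, distorted):.2f} dB")
print(f"SSIM (global): {ssim_global(reference, distorted):.4f}")
```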
Three experienced board-certified nuclear medicine physicians (KH, 14 years; SW, 4 years; and RK, 1 year after board certification) visually evaluated all the images independently, without access to the image label (i.e., which recovery method was used). We selected images at the lung, liver, and bowel levels of each patient for the subjective evaluation. All images were provided to the physicians in random order for image review. We performed a mean opinion score (MOS) test to quantify the perceived image quality [34]. Specifically, we asked these physicians to assign scores from 1 (poor image quality) to 5 (excellent image quality) to the original PET image, 10% PET image, 20% PET image, 50% PET image, and image super-resolved via RDN from the 50% PET image. Intraclass correlation coefficients (ICCs) were used to assess agreement between quantitative measurements in terms of consistency and conformity [35]. Based on ICC selection guidelines, we used the following form, with two-way mixed effects for model selection, consistency for definition selection, and single rater for type selection [36]:

$$\mathrm{ICC}(3, 1) = \frac{MS_R - MS_E}{MS_R + (k - 1)\, MS_E}$$

where 3 refers to the two-way mixed-effects model, 1 refers to the single-rater type, k is the number of raters (k = 3 in this study), $MS_R$ is the mean square for rows, and $MS_E$ is the mean square for error. When $MS_E = 0$, ICC(3, 1) = 1. ICC values less than 0.5 indicated poor reliability, values between 0.5 and 0.75 indicated moderate reliability, values between 0.75 and 0.9 indicated good reliability, and values greater than 0.90 indicated excellent reliability [36]. Bland-Altman plots were generated to evaluate the agreement of the MOS between operators in each image set. In the Bland-Altman plot, the horizontal axis shows the mean of the MOS between operators, and the vertical axis represents the difference in MOS between operators (d). Cohen's weighted kappa (κ) was used as a measure of inter-operator agreement [37,38]. The weighting takes the magnitude of the disagreement between the operators into account in the calculation.
The interpretation of agreement for κ was categorized as follows: poor (κ < 0), slight (0 ≤ κ ≤ 0.2), fair (0.21 ≤ κ ≤ 0.4), moderate (0.41 ≤ κ ≤ 0.6), substantial (0.61 ≤ κ ≤ 0.80), and almost perfect (κ > 0.8).
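As an illustration of the reliability analysis, the sketch below computes a single-rater, consistency-type ICC under a two-way mixed-effects model from the mean squares of a two-way layout, and maps the result to the reliability categories given above. It assumes the standard ICC(3,1) form, (MS_R - MS_E) / (MS_R + (k - 1) MS_E). The study itself used SPSS, and the scores in the example are hypothetical.

```python
def icc_3_1(scores):
    """ICC(3,1) for a table with one row per rated image and one column per rater."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least 2 rated images")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 raters and equal row lengths")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # mean square for rows
    ms_error = ss_error / ((n - 1) * (k - 1))  # mean square for error
    denom = ms_rows + (k - 1) * ms_error
    if denom == 0:
        raise ValueError("scores have no variance; ICC is undefined")
    return (ms_rows - ms_error) / denom


def reliability_label(icc):
    """Reliability category using the thresholds quoted in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical MOS table: 4 images rated by 2 raters.
mos = [[1, 2], [2, 1], [3, 4], [4, 3]]
value = icc_3_1(mos)
print(f"ICC(3,1) = {value:.2f} ({reliability_label(value)})")
```

Because the consistency definition ignores systematic offsets between raters, a rater who scores every image exactly one point higher than another does not lower this ICC.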
The Wilcoxon signed-rank test was used for statistical analysis. Differences were considered statistically significant when p < 0.05. All statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS) Statistics 26 (IBM Corp., Armonk, NY, USA).

Results
In the objective evaluation, the mean PSNR was 30.9 dB (95% confidence interval [CI]: 30.7-31.0 dB) for bilinear upsampling and 31.3 dB (95% CI: 31.1-31.5 dB) for the images super-resolved using the RDN model. In addition, the mean SSIM was 0.927 (95% CI: 0.924-0.930) for bilinear upsampling and 0.931 (95% CI: 0.928-0.934) for the images super-resolved using the RDN model. Statistically significant differences were observed in both PSNR and SSIM (p < 0.05). Thus, the quality of the super-resolved image obtained using the RDN model was significantly better than that of the conventional bilinear upsampled image.
For the subjective evaluation, Figure 2 shows an example of an image at the liver level. Figure 3 shows the MOS results for each image set, and Table 2 summarizes the subjective evaluation results. As shown in Table 2, the MOS of the super-resolved image obtained via the RDN model was significantly higher at all levels than that of the 50% PET image upsampled by the bilinear method (p < 0.05). However, the MOS of the super-resolved image obtained using the RDN model was significantly lower at all levels than that of the original image upsampled using the bilinear method (p < 0.05). Furthermore, Figure 4 shows the Bland-Altman plots for the inter-operator difference in MOS for each image set. The ICC estimates and their 95% CIs for the 50% PET image set were 0.62 (95% CI: 0.48-0.73) for the lung level, 0.56 (95% CI: 0.56-0.39) for the liver level, and 0.14 (95% CI: −0.18 to 0.39) for the bowel level. The ICC estimates and their 95% CIs for the SR-PET image set were 0.48 (95% CI: 0.28-0.63) for the lung level, 0.43 (95% CI: 0.22-0.60) for the liver level, and 0.32 (95% CI: 0.07-0.52) for the bowel level. All weighted kappa values (κ) exceeded 0.8 (Table 3).

Discussion
18F-FDG PET-CT scans are required to provide accurate tumor diagnoses and monitor the metabolic response of patients to treatment. However, this examination involves considerable radiation exposure [11,39]. By replacing CT with MRI, the resulting PET/MR system can eliminate the radiation exposure from CT scans. Many previous studies have focused on replacing CT with MRI for the registration of anatomical and functional information from PET. However, reduction of the injected FDG dose has received less attention. The major problem associated with a reduced injected FDG dose is the increase in image noise [21]. In this study, we evaluated the image quality of 18F-FDG PET images simulating 50% of the typically injected FDG dose. Using the RDN model, we created an SR-PET image set from the 50% PET images obtained using a Vereos PET/CT scanner equipped with a SiPM detector. In our objective evaluation of PSNR and SSIM, the SR-PET image set showed higher similarity to the "ground-truth" image than did the images produced by conventional bilinear upsampling. In addition, our subjective evaluation of MOS by three experienced board-certified nuclear medicine physicians suggested that the SR-PET image set was of significantly higher quality than the 50% PET image set. On the other hand, the MOS of the SR-PET image set was significantly lower than that of the original "ground-truth" image. Moreover, the ICC indicated poor to moderate reliability. Our results thus suggest that SR-PET images could enable the use of a low injected FDG dose in whole-body PET scans. This is useful not only for adults, but also for pediatric patients and for adolescents and young adults.
Wang et al. demonstrated the generation of diagnostic 18F-FDG PET images of pediatric cancer patients from an ultra-low-dose (6.25%) 18F-FDG PET image by using a convolutional neural network algorithm [40]. Recently, image generation techniques, such as generative adversarial networks, have been considered for generating PET images, such as reconstructing normal-dose images from lower-dose images [27,28]. Based on these reports, the use of simulated half-injected FDG dose images is reasonable [22,23]. The median injected FDG dose and PET-related exposure dose in the clinical data were 258.7 MBq and 4.9 mSv, respectively (Table 1). If an SR image sufficient for diagnosis can be obtained from a half-acquisition-time PET image, it can be expected that the exposure dose will be reduced by half. Because a lower injected FDG dose results in lower radiation exposure in PET scans, future work should consider images corresponding to a much lower injected dose as the input image set.
This study had some limitations. The first limitation is the selection of the SR deep-learning model architecture. As many new architectures are released every year, it was difficult to verify which model would be optimal for our purposes [2][3][4][5][6]. Therefore, we selected the RDN model, which had achieved excellent results at the time this research was planned. However, the RDN model was tuned for natural color images and not for medical grayscale images. Nevertheless, since there are many colored quantitative images in medicine for evaluating blood flow and function in the human body [41,42], training with three RGB channels instead of one grayscale channel would be useful for transfer learning of the created model. Because the RDN model could not be applied directly to the PET images, we converted the 16-bit DICOM images to 8-bit PNG images. As a result, the expressivity of the intensity histogram was reduced. If an SR-PET image were generated directly from the 16-bit DICOM image, the expressivity of the intensity histogram would be maintained. Recently, a model of direct super-resolution for DICOM images was proposed. Sim et al. designed deep convolutional networks for working with grayscale DICOM images of the brain [43]. Thus, further improvement can be expected from a model that imports DICOM images directly, without converting them to PNG images.
The second limitation of our study was inter-operator bias. As shown in Figure 4 and Table 3, the inter-operator agreement was almost perfect. However, the absolute value of d for our proposed method exceeded 1 at some points. This suggests that operator 2 understood the SR-PET images generated by the RDN model better than the other operators did.
The third limitation was the lack of evaluation of the maximum standardized uptake value (SUVmax). In this study, we assessed the image quality of the SR-PET image and did not sufficiently consider its quantitative diagnostic ability. As the de facto standard quantitative index, SUVmax has been used to express the degree of FDG uptake. Normally, SUVmax is calculated from the DICOM tags, with each voxel expressed as a 16-bit integer, in the Metavol software package for PET-CT volumetric analysis [44,45]. Because quantitative measurements using SUVmax are important in clinical use, the quantitative value of each voxel must be preserved. In this study, however, the SUVmax was not calculated because we converted the 16-bit DICOM images into 8-bit PNG images. Therefore, the current challenge is to preserve the quantitative value of SUVmax when using SR. Based on the above, we will improve the SR model and generate diagnostic PET images from much lower injected FDG doses in future research.

Conclusions
In conclusion, we evaluated the image quality of super-resolved images obtained from 50% simulated injected FDG-dose PET images using the RDN model. By using the SR model, the image quality was improved compared to that before SR image processing. Although there are some limitations to its clinical use, our results suggest that implementing the SR model could be effective in reducing the injected FDG dose for PET examination.

Funding:
This study was supported in part by the Grants-in-Aid for Regional R&D Proposal-Based Program from the Northern Advancement Center for Science & Technology of Hokkaido Japan.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Hokkaido University Hospital (approval No. 020-0070).

Informed Consent Statement:
The need for written informed consent was waived due to the retrospective nature of the study.

Data Availability Statement:
Not applicable.