Generating High-Resolution CT Slices from Two Image Series Using Deep-Learning-Based Resolution Enhancement Methods

Medical image super-resolution (SR) has mainly been developed for a single image in the literature. However, there is a growing demand for high-resolution, thin-slice medical images. We hypothesized that fusing two planes of a computed tomography (CT) study and applying an SR model to the third plane could yield high-quality thin-slice SR images. From the same CT study, we collected axial planes of 1 mm and 5 mm in thickness and coronal planes of 5 mm in thickness. Four SR algorithms were then used for SR reconstruction. Quantitative measurements were performed for image quality testing. We also tested the effects of different regions of interest (ROIs). Based on quantitative comparisons, the image quality obtained when the SR models were applied to the sagittal plane was better than that obtained when applying the models to the other planes. The results were statistically significant according to the Wilcoxon signed-rank test. The overall effect of the enhanced deep residual network (EDSR) model was superior to those of the other three resolution-enhancement methods. A maximal ROI containing minimal blank areas was the most appropriate for quantitative measurements. Fusing two series of thick-slice CT images and applying SR models to the third plane can yield high-resolution thin-slice CT images. EDSR provided superior SR performance across all ROI conditions.


Introduction
Medical imaging is an important activity in modern medical systems and the best means of diagnosing and treating patients [1]. For example, in the diagnosis and management of pulmonary diseases, medical imaging has evolved from simple planar film and sonography to computed tomography (CT), including low-dose CT. Chest CT is the best imaging modality for revealing pulmonary conditions such as emphysema, interstitial fibrosis, tumor masses, and early lung cancer [2]. There is a growing demand for high-resolution (HR) thin-slice medical images in clinical applications to identify early lung cancer with sub-centimeter ground-glass opacity, characterize lung fibrosis, determine the relationships between lung lesions and their surrounding structures, and provide quantitative measurements [3][4][5]. Thin-slice CT images also provide adequate information for generating navigation guides for bronchoscopy [6]. Modern CT scanners have achieved enhanced contrast, higher resolution, and reduced radiation doses.
Image resolution and slice thickness must inevitably be balanced with the file size of medical images, which impacts the cost of storage, speed of file exchange on the intranet or across institutions, and human resources required for interpretation. Although the original data from modern scanners can be reconstructed into thin-slice images, most images in the picture archiving and communication system (PACS) are not thin-slice images, meaning they have low spatial resolution in the Z plane. This system can reduce storage demand,

Materials and Methods
Four SR models were applied in our study. In Sections 2.1.1 to 2.1.4, we present a brief description of each network model and its modifications. Sections 2.2 to 2.5 describe the Digital Imaging and Communications in Medicine (DICOM) data collection process, the process for generating LR DICOM data, and the training and testing procedures.

SRCNN (Super-Resolution Convolutional Neural Network)
SRCNN was the first CNN adapted to image SR reconstruction [11]. Its network is simple and consists of three components for patch extraction and representation, nonlinear mapping, and reconstruction. Figure 1 presents the SR flow in SRCNN. An input LR image is resized through bicubic interpolation to match the target size (i.e., SR size). Then, it is passed through the SRCNN to obtain an SR image. SRCNN is simple, fast, and provides good quality compared to classical computer vision algorithms for SR. In this study, the resolution of our LR and HR images was the same, so we skipped the interpolation step and applied SRCNN directly.
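The three-stage structure can be sketched as follows in PyTorch. This is a minimal illustration, not our exact implementation; the layer widths (64 and 32 filters) and kernel sizes (9-5-5) follow the original SRCNN paper, and single-channel input is assumed for CT slices.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-stage SRCNN: patch extraction, nonlinear mapping, reconstruction."""
    def __init__(self):
        super().__init__()
        self.extract = nn.Conv2d(1, 64, kernel_size=9, padding=4)      # patch extraction
        self.map = nn.Conv2d(64, 32, kernel_size=5, padding=2)         # nonlinear mapping
        self.reconstruct = nn.Conv2d(32, 1, kernel_size=5, padding=2)  # reconstruction
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Because our LR and HR slices share the same resolution,
        # the bicubic upsampling step is skipped and x is fed in directly.
        x = self.relu(self.extract(x))
        x = self.relu(self.map(x))
        return self.reconstruct(x)

lr = torch.randn(1, 1, 96, 96)   # one single-channel 96 x 96 LR patch
sr = SRCNN()(lr)                 # output has the same spatial size as the input
```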


VDSR (Very Deep Super Resolution)
Kim et al. found that if a network has more convolution layers, it can provide better accuracy [20]. The VDSR network has a depth of 20 layers. However, such a deep network typically encounters convergence problems during training. Therefore, the authors used residual learning and gradient clipping to overcome this issue. LR and HR images share significant low-frequency information, so training a model to learn the differences (i.e., high-frequency information) between LR and HR is advantageous. Figure 2 presents the flow for producing SR images using VDSR. Similar to the SRCNN described above, because our LR and HR images have the same resolution, bicubic interpolation was omitted here.
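The two stabilizing techniques described above can be sketched in PyTorch as follows. This is an illustrative training step only (the learning rate and clipping norm are placeholder values, not the paper's settings): the network predicts the residual, and gradients are clipped before each optimizer step.

```python
import torch
import torch.nn as nn

def make_vdsr(depth=20):
    """VDSR body: `depth` 3x3 conv layers (64 channels) predicting the residual."""
    layers = [nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(depth - 2):
        layers += [nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True)]
    layers.append(nn.Conv2d(64, 1, 3, padding=1))
    return nn.Sequential(*layers)

model = make_vdsr()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

lr_patch = torch.randn(1, 1, 96, 96)
hr_patch = torch.randn(1, 1, 96, 96)

# Residual learning: the network predicts only the high-frequency
# difference, which is added back onto the LR input.
sr_patch = lr_patch + model(lr_patch)
loss = nn.functional.mse_loss(sr_patch, hr_patch)
loss.backward()

# Gradient clipping keeps the very deep network from diverging early on.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.4)
optimizer.step()
```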


SRResNet (Generator Network for SRGAN)
SRResNet is the generator for SRGAN [21], and we adopted this deep learning model as one of the evaluation networks. SRResNet is based on the concept of a residual network and contains 16 residual blocks. Figure 3 presents the SRResNet network. SRResNet applies pixel shuffling (subpixel convolution) for upsampling the feature map size.
In SRResNet, the upsampling block plays the role of changing the LR size to match the HR size. In this study, because our LR images had the same resolution as the HR images, we modified the upsampling block to provide 1:1 outputs, and removed the secondary upsampling block.
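The pixel-shuffling (subpixel convolution) upsampling mentioned above can be illustrated with the following sketch, assuming PyTorch's built-in `nn.PixelShuffle`: a convolution expands the channel count by a factor of r², and the shuffle rearranges those channels into an r-times-larger spatial grid. In our modified network this block was replaced with a 1:1 output, so the example below shows only the generic SRResNet behavior.

```python
import torch
import torch.nn as nn

# Subpixel upsampling block in the style of SRResNet.
r = 2  # upscale factor (illustrative; our study used 1:1 outputs instead)
block = nn.Sequential(
    nn.Conv2d(64, 64 * r * r, kernel_size=3, padding=1),
    nn.PixelShuffle(r),  # (N, 64*r^2, H, W) -> (N, 64, H*r, W*r)
    nn.PReLU(),
)

x = torch.randn(1, 64, 24, 24)
y = block(x)
print(y.shape)  # torch.Size([1, 64, 48, 48])
```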


EDSR (Enhanced Deep Residual Network)
EDSR is based on SRResNet. Lim et al. made several modifications to the original network and achieved enhanced results [22]. The EDSR structure is presented in Figure 4. Compared to SRResNet, EDSR removes the batch normalization layers: although batch normalization can normalize features in an SR task, preserving the original features instead of normalizing them can yield enhanced detail in SR images. Additionally, Lim et al. found that network training becomes more numerically unstable as the number of feature maps increases. Therefore, they applied a residual scaling factor of 0.1 to each residual block to overcome this issue. The original EDSR network contains 16 residual blocks; in our study, we increased this to 32 residual blocks. Similar to SRResNet, we modified the upsampling block for 1:1 outputs because our LR and HR images had the same resolution.
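The two EDSR modifications described above (no batch normalization, residual scaling of 0.1) can be sketched as a single residual block; this is a minimal illustration in PyTorch, not our exact training code.

```python
import torch
import torch.nn as nn

class EDSRBlock(nn.Module):
    """EDSR-style residual block: no batch normalization, scaled residual branch."""
    def __init__(self, channels=64, scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.scale = scale

    def forward(self, x):
        # Scaling the residual branch by 0.1 keeps training numerically
        # stable when many wide blocks are stacked.
        return x + self.scale * self.body(x)

# As in our modified network, 32 such blocks can be chained into the trunk.
trunk = nn.Sequential(*[EDSRBlock() for _ in range(32)])
out = trunk(torch.randn(1, 64, 24, 24))
```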

Data Collection
Deidentified paired axial-plane images from chest CT scans and corresponding coronal-plane images from the same studies were collected. A portion of the CT images came from two retrospective studies, and the waiver of informed consent was approved by the Institutional Review Board of Taipei Veterans General Hospital (TPEVGH IRB No.: 2019-07-046BC, 2021-04-014BC). The remaining CT images came from a study evaluating interstitial lung disease, and their use was approved by the Institutional Review Board of Taipei Veterans General Hospital, where the participants provided informed consent (TPEVGH IRB No.: 2017-07-010CC). All CT studies were saved in the original DICOM format from the PACS. The deidentification process involved the elimination of all patient names and identifiers in the DICOM tags. All dates were modified to January 1st of the same year. We also generated new unique identifiers (UIDs) for each series. Beyond the CT images, we did not collect any other clinical data from the original studies. The CT scanner manufacturers corresponding to the images included Philips (Amsterdam, The Netherlands), Siemens Healthcare GmbH (Erlangen, Germany), and Toshiba (now Canon Medical Systems Corp., Otawara, Japan). The CT scanner parameters for each case are listed in Supplementary Table S1.

Image Preprocessing Process
For each CT scan, we reserved axial-plane slices of 1 mm and 5 mm in thickness, as well as coronal-plane slices of 5 mm in thickness. If the axial and coronal planes were not exactly orthogonal to each other, the CT set was discarded. Table 1 summarizes the requirements for collecting CT sets from DICOM. In the image preprocessing process, our goal was to generate LR DICOM data to serve as an input when training SR models. We used the world coordinates of the axial thin slices to perform image registration in 3D space for the axial thick slices and coronal thick slices, and then combined the slices by averaging to generate LR DICOM data. The LR DICOM data contain more detail compared to the axial thick slices alone. Figure 5 presents the process flow for generating LR DICOM data.
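The averaging step at the end of this pipeline can be sketched with numpy. The sketch assumes the two thick-slice volumes have already been registered and resampled onto the thin-slice (z, y, x) grid, as described above; the resampling itself is omitted.

```python
import numpy as np

def fuse_to_lr(axial_thick: np.ndarray, coronal_thick: np.ndarray) -> np.ndarray:
    """Average two thick-slice volumes already resampled onto the thin-slice grid.

    Both inputs are assumed to have been registered in 3D using the world
    coordinates of the axial thin slices, so they share one (z, y, x) grid.
    """
    assert axial_thick.shape == coronal_thick.shape
    return (axial_thick.astype(np.float32) + coronal_thick.astype(np.float32)) / 2.0

# Toy volumes standing in for resampled DICOM stacks.
axial = np.full((8, 4, 4), 100, dtype=np.int16)
coronal = np.full((8, 4, 4), 200, dtype=np.int16)
lr_volume = fuse_to_lr(axial, coronal)  # voxel-wise mean of the two series
```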
All 35 sets of lung CT DICOM data were processed in this manner to generate LR DICOM data. The thin-slice DICOM data in the axial plane were used as HR data. We split the 35 datasets into training, evaluation, and testing sets. Eleven datasets were used as training data and two datasets were used for evaluation during training. The remaining 22 datasets were reserved as testing data.

SR Model Selection and Training
In the training procedure, we read each pair of HR and LR DICOM data and stacked the images into a 3D array. According to the training settings, the pairs of HR and LR images were taken from a specific plane (i.e., axial, coronal, or sagittal) in the corresponding 3D array to act as inputs to train the model. Figure 6 presents a diagram of the training process.

After the HR and LR images were selected, we cropped random 96 × 96 pixel patches from the images. SRCNN, VDSR, SRResNet, and EDSR were trained using different parameters according to our training experience. The type of loss function was selected by referencing the original paper associated with each model. Table 2 summarizes the training settings for each model.
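The paired patch cropping can be sketched as follows; a minimal numpy version, assuming the HR and LR slices are pixel-aligned so the same window is taken from both.

```python
import numpy as np

def random_paired_patch(hr, lr, size=96, rng=None):
    """Crop the same random size x size window from an aligned HR/LR slice pair."""
    rng = rng or np.random.default_rng()
    h, w = hr.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    window = (slice(top, top + size), slice(left, left + size))
    return hr[window], lr[window]  # identical windows keep the pair aligned

hr_slice = np.arange(512 * 512, dtype=np.float32).reshape(512, 512)
lr_slice = hr_slice.copy()
hr_patch, lr_patch = random_paired_patch(hr_slice, lr_slice)
```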

Testing Procedure
We used 22 sets of HR and LR DICOM data to test each different model. Figure 7 presents a testing diagram with peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) calculations.
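The PSNR used in this testing step can be computed directly from the mean squared error; a minimal numpy sketch follows. The `data_range` of 255 is illustrative (for 8-bit images), and for CT data the appropriate intensity range would be substituted. SSIM is more involved and is typically taken from a library such as `skimage.metrics.structural_similarity`.

```python
import numpy as np

def psnr(sr: np.ndarray, hr: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between an SR slice and its HR ground truth."""
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

hr = np.zeros((4, 4))
sr = hr + 10.0                  # constant error of 10 -> MSE = 100
value = psnr(sr, hr)            # 10 * log10(255^2 / 100) ~ 28.13 dB
```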


Quantitative and Statistical Analysis
Two common metrics, namely PSNR and SSIM, were considered to compare quality between SR images generated using the deep learning models and ground-truth HR images. We also computed PSNR and SSIM values for segmented body and lung regions to focus on the ROIs relevant to physicians. The method for ROI selection is illustrated in Figure 8 [23,24]. This type of evaluation can help reduce the effects of blank areas in medical images. The one-sided Wilcoxon signed-rank test was used to assess differences in scores between SR images (including LR images) generated by any two models. Boxplots were used to visualize PSNR and SSIM changes between model-generated SR images and ground-truth HR images. All numerical analyses and data preprocessing were performed using Python and R.
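The paired, one-sided comparison described above can be sketched with SciPy. The PSNR values below are made-up placeholder numbers purely to show the call; they are not results from this study.

```python
import numpy as np
from scipy.stats import wilcoxon

# Paired per-study PSNR scores for two models (illustrative numbers only).
psnr_model_a = np.array([32.1, 31.8, 33.0, 32.5, 31.9, 32.7, 33.2, 32.0])
psnr_model_b = np.array([31.5, 31.6, 32.2, 32.0, 31.4, 32.1, 32.6, 31.7])

# One-sided test: are model A's paired PSNR scores significantly greater?
stat, p = wilcoxon(psnr_model_a, psnr_model_b, alternative="greater")
```

A small p-value here would indicate that model A's per-study scores are systematically higher than model B's, which is how the plane-wise and model-wise comparisons in this study were assessed.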


Results
We trained the deep learning models on a server with an AMD EPYC 7542 CPU and four NVIDIA A100 GPUs. The server was not dedicated and was shared by several colleagues; the number of GPUs available for training was limited to two, so the training times may have had greater variance. Table 3 lists the time required to train the different models. Table 4 lists the average processing times per SR DICOM slice inferred by the different models and saved as DICOM files. The ground-truth HR images, LR images, and sample comparisons for each SR DICOM slice generated along the sagittal axis are presented in Figure 9. The upper case is a typical ground-glass nodule that was confirmed to be early-stage lung cancer through surgical resection. All SR images maintain the ground-glass features of this lesion, but only EDSR maintains the correct size and structure in terms of human visual quality. In the EDSR images, the fine structures of the surrounding lung markers are more clearly defined and correctly connected. The lower case is a cavitary lesion. Although the details of the upper border of the cavitary lesion are blurred in all SR images, the EDSR image preserves the arrangement of the proximal vessels and bronchi.

Figure 9. SR image samples of different AI models. All images were generated by applying SR models in the sagittal plane.
Table 5 lists the mean image quality metrics (SSIM and PSNR) of the different SR images compared to ground-truth HR images. Figure 10 presents a boxplot of the results for a visual comparison of the performance of each model. Model inferences for different planes are plotted near each other, and LR results are also included in the graph for visual comparison. The SR models applied to the sagittal plane exhibited superior performance compared to those applied to the axial and coronal planes. Statistically, the PSNR and SSIM values of the SR images in the sagittal plane were significantly higher than those of the other two planes (see Tables 6 and 7), indicating that our hypothesis is correct. EDSR was statistically better than the other models, with the exception of SRResNet in terms of PSNR.
A demonstration of the proposed method for selecting an ROI is presented in Figure 11. While one ROI covered the entire image (blue box), we also selected bounding boxes for the body area (orange box) and the lung fields (grey box). The smaller bounding boxes ignored most of the ambient air and focused on the structures of interest. As expected, more uniform black ambient-air areas increased the PSNR and SSIM: when the entire image was included in the calculation, the PSNR and SSIM values were relatively high, whereas the lung-field bounding boxes yielded the lowest values (see Table 7 and Figure 12). Regardless, applying EDSR to the sagittal plane yielded the best results for all three ROIs (see Table 8). Overall, the PSNR and SSIM increased in the same manner regardless of ROI selection (see Figure 13). EDSR was superior to the other models on average, but SRResNet sometimes performed better than EDSR.

Discussion
Deep learning techniques have been successfully applied to the field of medical imaging since 2014 [25]. In this study, we utilized SR methods to estimate HR medical image sequences from LR images. In contrast to most approaches in this area, we constructed a 3D LR data cube by combining two registered orthogonal image series (i.e., axial and coronal) from real-world clinical data. Furthermore, we compared the results of different SR models applied to different plane orientations of the 3D data cube. The PSNR and SSIM results of applying the SR models to the sagittal plane of the LR images were significantly better than the results for the other orientations. The image quality achieved by the EDSR model was higher than that achieved by the other SR models. SRResNet also exhibited consistent quality gains across all test cases. The generated SR CT images may be used in other computer-aided diagnosis tasks, such as pulmonary nodule detection and classification for early lung cancers [26][27][28]. Kazem et al. found that high-resolution SR images could assist decision-makers in diagnosis [29].
When selecting ROIs in this study, we found that blank areas caused by ambient air can erroneously increase the scores of quantitative measures. Unlike natural images, medical images contain many blank areas: some are attributed to the ambient air and some to the scanning field of view. The intensity differences in these regions are small and may inflate similarity scores. The ROIs considered ranged from a 512 × 512 pixel DICOM image slice to a 32 × 32 window [30]. In our experiments, the higher the proportion of air and the smaller the proportion of body tissue, the higher the PSNR and SSIM values. Additionally, regardless of which ROI was selected, SRResNet and EDSR achieved the best performance. Although the trend of change in the quality scores appeared constant regardless of ROI size, a general rule should be developed for selecting ROIs in future medical image SR studies.
Deep networks for SR are known to be influenced by the scale factor, which can be translated into the multiplier of a single voxel [17,19]. Small scale factors tend to yield better SR results. Our method combined two series of medical image planes to enrich the information in input LR images and reduce the theoretical scale factor. In this manner, the scale factor could be less than three on average, even when the original slice thickness was 5 mm.
Our study has several weaknesses. First, we only considered CT images in which the axial and coronal planes were exactly orthogonal to each other. Such images accounted for only one-tenth of our original data pool. CT images acquired at arbitrary angles require 3D registration and fusion, a process that demands significant computational power and requires further work and testing before it can be integrated into our SR workflow. Second, CT images may have different reconstruction kernels, and we did not limit the reconstruction method of the input images. In most of the clinical cases collected, we merged 5 mm images reconstructed with so-called smooth kernels to predict 1 mm axial images reconstructed with sharp kernels. Most previous studies used HR images to generate simulated LR images in which the CT reconstruction kernel or MR sequence remained the same [16,19,31]. There are also deep-learning-based image conversion methods for CT reconstruction kernels [32]. However, no articles have discussed the influence of different reconstruction kernels on input images in SR problems. To observe the effects of input images with different combinations of reconstruction kernels, a much larger dataset is required before conclusions can be drawn. Finally, we did not have doctors provide visual scores for the output images. Many studies have derived human visual scores through various processes using different questions [30,33]. However, we performed some preliminary tests ourselves. The SR images could be easily distinguished from HR and LR images, and senior physicians could point out differences between the images. The images were not identical in all aspects, particularly in terms of fine structures and artifacts. To derive more objective visual scores, we require a thorough testing design, with cropping for ROIs, randomly selected images, and suitable questions.
On the basis of this study, several recommendations can be made to expand our current work in the future. First, we can expand our collection of image files to balance the numbers of conventional CT, LDCT, and contrast-enhanced CT studies. Second, the SR algorithms used in our method should be updated for better performance. Third, many new SR methods for enhancing detail between medical image slices have emerged, and these should be collected and compared. Finally, because medical images are interpreted by physicians, a well-designed human visual scoring process should be applied in further studies to meet clinical requirements.

Conclusions
We fused two orthogonal CT planes and applied several deep-learning-based SR methods to a third plane to generate HR thin-slice CT images. Based on qualitative and quantitative comparisons, the EDSR model outperformed the other deep-learning-based resolution-enhancement methods. A maximal ROI containing minimal blank areas should be selected for reliable quantitative measurements. Based on our real-world clinical dataset, the proposed method has excellent potential for clinical tasks and may also be beneficial for research tasks, such as updating old CT images to facilitate various types of longitudinal image-based research.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics12112725/s1, Table S1: Parameters of CT scanners of each case.
Informed Consent Statement: Informed consent was obtained from all subjects involved in study 2017-07-010CC. Patient consent was waived due to the retrospective, observational design of studies 2019-07-046BC and 2021-04-014BC.