In this section, we describe the dataset and the implementation details of the model. Two widely used image quality metrics, PSNR and SSIM, are adopted to evaluate the model objectively. We also compare our model with other advanced super-resolution methods and analyze the results.
4.3. Evaluation Indexes
In order to verify the performance of the super-resolution reconstruction network, it is necessary to evaluate the reconstructed images. There are two approaches to evaluating image quality: subjective and objective. Subjective evaluation is affected by many factors; it mainly assesses the reconstructed SR image by its visual effect. Objective evaluation quantitatively analyzes the image through specific metrics. In this paper, the performance of super-resolution networks is evaluated using two widely recognized image quality metrics, the Peak Signal-to-Noise Ratio (PSNR) [32] and the Structural Similarity Index (SSIM) [33]. The PSNR and SSIM of the reconstructed SR results are calculated in MATLAB to compare our method with other methods.
PSNR. PSNR, measured in dB, is based on the error between corresponding pixels of an image pair and is an objective standard for image evaluation. In general, a higher PSNR indicates a higher-quality reconstruction. It is defined via the mean squared error (MSE):

MSE = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} [X(i,j) - Y(i,j)]^2

Then, PSNR can be expressed as:

PSNR = 10 \log_{10} \left( \frac{MAX^2}{MSE} \right)

where X and Y are images of size H \times W, and MAX is the maximum possible pixel value (255 for 8-bit images). The larger the PSNR, the smaller the image distortion and the better the image quality.
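As an illustration of this definition (not the exact MATLAB routine used in our experiments), PSNR for 8-bit images can be sketched in Python as:

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """PSNR (dB) between two same-sized images; max_val is the peak pixel value."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mse = np.mean((x - y) ** 2)      # mean squared error over all pixels
    if mse == 0.0:
        return float("inf")          # identical images: zero distortion
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For a maximally distorted 8-bit pair (all 0 versus all 255), MSE = 255^2 and the PSNR is 0 dB.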
SSIM. SSIM is an index used to measure the structural similarity between two images in terms of luminance, contrast, and structure. The SSIM expression is:

SSIM(X, Y) = \frac{(2\mu_X \mu_Y + C_1)(2\sigma_{XY} + C_2)}{(\mu_X^2 + \mu_Y^2 + C_1)(\sigma_X^2 + \sigma_Y^2 + C_2)}

where \mu_X represents the average value of image X, \mu_Y represents the average of image Y, \sigma_X^2 represents the variance of image X, \sigma_Y^2 represents the variance of image Y, \sigma_{XY} represents the covariance of X and Y, and C_1 and C_2 are constants. The higher the SSIM value, the more similar the reconstructed image is to the original image, and the better the image quality.
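The formula above can be sketched from global image statistics; note this is a windowless simplification for illustration, whereas practical SSIM implementations compute the index over local (typically Gaussian-weighted) windows and average the resulting map. The constants follow the common choice C1 = (0.01 L)^2, C2 = (0.03 L)^2 with L = 255.

```python
import numpy as np

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM from global means, variances, and covariance (windowless sketch)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()   # covariance of X and Y
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

For identical images the numerator and denominator coincide, so the index attains its maximum value of 1.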
4.4. Ablation Studies
This section verifies the influence of several modules and strategies on our CT image reconstruction network. To quantify the effect of information distillation and MAB on the model, we retain or remove the corresponding module and conduct the following ablation experiments.
The effectiveness of information distillation. IDMAN extracts features through information distillation (info-distill) to effectively utilize and learn feature information. In Table 1, Conv denotes feature extraction with a traditional convolution block (Conv-ReLU-Conv), and Info-distill denotes feature extraction with information distillation. To avoid the influence of MAB, the traditional channel attention is used in both settings. As shown in Table 1, compared with the traditional convolution operation, extracting features with information distillation effectively improves the PSNR, which indicates that info-distill makes fuller use of the feature information.
To show the reconstructed results more vividly and intuitively, we select two sets of images and test the performance of IDMAN with and without info-distill. The reconstruction results are shown in Figure 5. From Figure 5a, it is easy to see that the edges of the reconstructed image are broken without info-distill, while the image recovers clearer edges when information distillation is introduced. In Figure 5b, it is obvious that more accurate details can be reconstructed when using information distillation. More details are thus obtained in the reconstruction results with information distillation, which shows that our method makes better use of feature information and improves learning ability. The improvements in both the subjective visual reconstruction results and the objective evaluation index confirm that using information distillation to exploit feature information is effective.
The effectiveness of MAB. To explore the influence of MAB with multiple branches, we designed several sets of contrast experiments to verify whether single or multiple branches with different convolution kernel sizes could improve the network's learning and expression ability by capturing information under different branches. The results on the validation set are shown in Table 2. The convolution kernel sizes 3 × 3, 5 × 5, and 7 × 7 correspond to the three branches. When none of them is present, the traditional channel attention is used (Conv-ReLU-Conv). In addition, to avoid the influence of information distillation, the traditional convolution operation (Conv-ReLU-Conv) was used for feature extraction.
We can draw several conclusions from the data in Table 2. First, under a single branch (number of branches = 1), performance is improved by adding two convolution layers: for instance, the PSNR increased from 33.966 dB to 34.008 dB when we added two 3 × 3 convolution layers on top of the traditional channel attention. This demonstrates that feature information can be better extracted by adding convolution layers. Second, compared with the single-branch MAB, the MAB with multiple branches (two or three) achieves a higher PSNR; in other words, multiple branches outperform a single branch, and feature information is better captured by our multi-branch MAB structure.
The MAB with 3 × 3, 5 × 5, and 7 × 7 branches works better than the MAB with one or two branches. However, these comparison experiments were conducted without information distillation; once information distillation is introduced, the MAB can have at most two branches, 3 × 3 and 5 × 5, due to GPU memory limitations. Therefore, our method adopts the MAB with 3 × 3 and 5 × 5 branches to capture feature information as well as possible.
Table 3 shows the quantitative evaluation results of each module on the validation set. The experimental results show that the PSNR improves when either information distillation or MAB is added. When both modules are added, i.e., features are extracted by information distillation and combined with MAB, the results are the best (PSNR = 34.022 dB).
The effectiveness of weight normalization. EDSR improves network performance by removing batch normalization (BN). Inspired by this, we also adjust the normalization in our network. Weight normalization (WN) accelerates the convergence of the network parameters by reparameterizing the weights, decoupling the norm of the weight vector from its direction. Therefore, we normalize the convolution layers through WN in IDMAN.
Assume the output is y:

y = w \cdot x + b

where x denotes the k-dimensional vector of input features, w denotes the k-dimensional weight vector, and b denotes the scalar bias term. WN uses Formula (22) to re-parameterize the weight vector w:

w = \frac{g}{\|v\|} v

where g denotes a scalar, v denotes a k-dimensional vector, and \|v\| denotes the Euclidean norm of v. Further, we can obtain \|w\| = g, which is independent of the parameters v.
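The reparameterization can be checked numerically; the following is an illustrative NumPy sketch of the formula, not the training-time implementation (frameworks apply it per convolution filter and learn g and v by gradient descent):

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: w = (g / ||v||) * v, so that ||w|| = g."""
    return (g / np.linalg.norm(v)) * v

v = np.array([3.0, 4.0])      # direction parameter, ||v|| = 5
w = weight_norm(v, g=2.0)     # w = [1.2, 1.6]
# The norm of w equals g (here 2.0), independent of the scale of v.
```

Because the norm and direction are decoupled, rescaling v leaves w unchanged while g alone controls the weight magnitude.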
We conduct two experiments on WN. From Table 4, it can be seen that IDMAN with WN achieves a higher PSNR, which confirms that WN has a positive effect on our reconstruction network. In Figure 6, the curve of IDMAN without WN is steeper before 100 epochs, while the training of the network with WN is more stable and converges faster. Therefore, WN accelerates the convergence of IDMAN and helps the network better learn the feature information.
4.5. Analysis of Experimental Results
Quantitative Analysis: In order to evaluate the performance of the model, we select several representative methods for comparison: Bicubic [1], SRCNN [6], FSRCNN [7], VDSR [13], DRRN [14], EDSR [10], MDSR [10], RDN [20], and RCAN [11]. PSNR and SSIM are adopted to evaluate the quality of each SR result.
Table 5 and Table 6 summarize the PSNR and SSIM results of the quantitative evaluation under scaling factors ×2, ×3, and ×4. As shown in Table 5 and Table 6, compared with Bicubic, PSNR and SSIM improve by 8.601~11.683 dB and 17.98~20.63%, respectively. Compared with the deep-learning-based methods, PSNR and SSIM improve by 0.063~1.92 dB and 0.06~0.98% at scaling factor ×2; by 0.026~2.081 dB and 0.01~1.44% at ×3; and by 0.013~2.562 dB and 0~2.46% at ×4. Clearly, Bicubic reconstruction, based on traditional interpolation, performs worst, with the lowest PSNR and SSIM, while the deep-learning-based SR algorithms are clearly better than the interpolation-based method. In all cases, reconstruction at a low scaling factor is better than at a high scaling factor. The PSNR and SSIM values show that IDMAN outperforms the comparative methods, which proves the superiority of our proposed method.
To compare more comprehensively and to further show the generality of IDMAN on other datasets, we also tested our method under different scaling factors on the medical datasets NSCLC Radiogenomics and Lung-PET-CT-Dx.
Table 7 shows the testing results on the NSCLC Radiogenomics dataset. At the ×2 scale, compared with the other methods, the improvements of IDMAN in PSNR and SSIM are 0.076~14.016 dB and 0.02~16.23%. At the ×3 scale, the improvements are 0.05~11.598 dB and 0.02~17.36%. At the ×4 scale, the improvements are 0.062~10.456 dB and 0.01~18.36%.
Table 8 shows the results for the Lung-PET-CT-Dx dataset. The PSNR improvements are 0.097~16.329 dB, 0.094~12.939 dB, and 0.055~11.09 dB for the ×2, ×3, and ×4 scaling factors, and the SSIM improvements are 0.01~14.46%, 0.03~16.5%, and 0.02~18.45%, respectively. For both test sets, our IDMAN achieves almost the best performance at all scaling factors, with improvements of varying degrees.
Furthermore, we compared the convergence curves of our proposed IDMAN and the second-best method, RCAN. As shown in Figure 7, we visualize the PSNR and L1 loss. Figure 7a shows the PSNR curves during training: the PSNR gradually stabilizes and no longer increases significantly between 200 and 300 epochs. Before approximately 200 epochs, IDMAN converges slightly faster than RCAN, and its PSNR value is mostly higher. As seen in Figure 7b, the loss shows an overall decreasing trend as training proceeds. Before about 100 epochs, the loss of RCAN decreases more rapidly, but after about 200 epochs, the losses of RCAN and IDMAN both become stable.
Visual Results: In order to analyze the reconstructed CT images by their subjective visual effect, we select several groups of CT images for comparison in Figure 8, Figure 9 and Figure 10.
On the whole, the IDMAN proposed in this paper reconstructs clearer edges and more realistic textures. Under ×2 scale SR, as shown in Figure 8a, the image reconstructed by IDMAN has a more realistic texture and a better visual effect. In Figure 8b, the reconstructed SR image restores more truthful details than the other methods. In Figure 8c, the images reconstructed by the comparison methods are blurred, while our result has sharper edges and more detail. In Figure 8d, the reconstructed image has a clearer and smoother edge. Under ×3 scale SR, Figure 9a shows that the structures reconstructed by the other algorithms are less clear, while the result of IDMAN is more realistic. In Figure 9b, our reconstructed image has clearer and more accurate contour lines than the other methods. Under ×4 scale SR, we can observe from Figure 10a that our method better restores the original information and is more similar to the original image. In Figure 10b, IDMAN reconstructs a clearer edge, while the images of the other methods appear more blurred. Overall, our model achieves better results on the CT image dataset: the reconstructed CT images contain more details, and PSNR and SSIM also achieve higher scores.
It can be clearly observed that the reconstruction effect of SRCNN, with only three layers, is the worst among the deep-learning-based methods, with problems such as blurring, artifacts, a lack of detail, and unclear edges. The later improved models become more and more complex by deepening the network or using different learning strategies, and their reconstructed results have clearer structures than those of SRCNN. Our IDMAN has a superior learning ability, and its reconstructed results contain more details and sharper edges. These better visual results also confirm that information distillation and multi-scale attention help the network make fuller use of the feature information and capture more information to restore more details.