4.2.1. Effectiveness of RASR for Image SR
To validate the effectiveness of our proposed RASR for the SR problem, we compare three SR variants: bicubic up-sampling, our SR module alone, and the full RASR method. The quantitative image-quality results achieved by the different variants are shown in Table 1. Here, we use the Bicubic down-sampling approach and the Bicubic + Gaussian blur degradation approach to generate LR images.
We first validate our method on Bicubic down-sampling degradation. On the HRSC2016 dataset, the PSNR value achieved by RASR is 0.89 dB and 0.13 dB higher than those of bicubic up-sampling and the SR module, respectively. Similarly, the SSIM value of our RASR is 0.699, which is also higher than the results of the two variants. That is because, with the assistance of image rescaling, the SR module in our RASR can acquire more information from the adapted LR images and thus improve the SR performance. We also verify the effectiveness of our method on the DOTA and NWPU VHR-10 datasets. Specifically, on the DOTA dataset, the PSNR and SSIM of RASR are 26.31 dB and 0.718, which are higher than those of bicubic up-sampling (25.89 dB and 0.672) and the SR module (26.12 dB and 0.698). On the NWPU VHR-10 dataset, the PSNR and SSIM of RASR are 30.72 dB and 0.870, which are higher than those of bicubic up-sampling (30.18 dB and 0.861) and the SR module (30.48 dB and 0.857). The above results demonstrate the effectiveness of our RASR method.
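The PSNR values reported above follow the standard peak signal-to-noise ratio definition; a minimal NumPy sketch (our own helper, assuming images normalized to [0, 1] so the peak value is 1.0) is:

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio (dB) between two images in [0, peak]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# e.g., a uniform error of 0.1 gives MSE = 0.01, i.e., PSNR = 20 dB
```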
Then, we extend our ablation experiments to more complex scenarios. Specifically, besides the bicubic down-sampling operation, we add Gaussian blur to the original HR images when generating LR images. The size of the Gaussian kernel is set to 21 × 21, and sigma is set to 8. In this way, the LR images are more difficult to super-resolve. The results are shown in Table 1. It can be observed that the PSNR and SSIM values of the different variants under Bicubic + Gaussian blur degradation are lower than those under bicubic down-sampling degradation. That is because the Bicubic + Gaussian blur degradation leads to more severe information loss in the LR images. Furthermore, under Bicubic + Gaussian blur degradation, the PSNR and SSIM values increase progressively across the variants (i.e., from Bicubic to the SR module to our RASR), which demonstrates the effectiveness and generality of our method.
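The Bicubic + Gaussian blur degradation described above can be sketched as follows. The helper names are ours; the 21 × 21 Gaussian is applied as two separable 1-D passes, and plain decimation stands in for the bicubic down-sampling step, which in practice would use a bicubic resize (e.g., PIL's `Image.BICUBIC`):

```python
import numpy as np

def gaussian_kernel1d(size=21, sigma=8.0):
    # 1-D Gaussian; a 21x21 2-D Gaussian kernel is separable into two 1-D passes
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def degrade(hr, scale=4, size=21, sigma=8.0):
    """Blur an HR image with a Gaussian kernel, then down-sample it.

    NOTE: simple decimation stands in here for the bicubic down-sampling
    used in the paper; substitute a bicubic resize in practice.
    """
    k = gaussian_kernel1d(size, sigma)
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, hr)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, blurred)
    return blurred[::scale, ::scale]
```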
Figure 3 shows the visualization results of the original HR images and the images restored by the different variants: the HR image, the bicubically up-sampled image, the output of the SR module and the output of the proposed RASR. Taking the bicubically up-sampled image as the baseline, we show the residual image between the bicubically up-sampled image and the output of the SR module. It can be observed that the major differences between the SR image and the bicubically up-sampled image are located at the edges. Moreover, we show the residual image between the outputs of RASR and the SR module; the differences are still concentrated at the edges. That is, the SR module can super-resolve detail information at the edges, and our RASR can further restore this edge detail with the assistance of image rescaling.
To qualitatively illustrate the assistance of image rescaling to image SR, we provide the amplitude-frequency curves of the original HR images and of the outputs of RASR and its variants. We use the Fast Fourier Transform to acquire the frequency images. Specifically, we first convert an RGB image to grayscale, then compute the 2-dimensional Discrete Fourier Transform of the gray image and shift the zero-frequency terms to the center of the frequency domain, and finally use the decadic logarithm of the frequency amplitude to generate the frequency image. Since the frequency map is symmetric about its center, we group its elements uniformly according to their distance from the center. We use the pixel distance between each group and the image center as the abscissa value, and the summation of all amplitude values in each group as the ordinate value, to generate the amplitude-frequency graph. Note that we use the decadic logarithm of the amplitude to narrow the range of frequency values and thus show the differences between curves more clearly. As shown in Figure 4, the frequency can be divided into low, medium and high frequency bands. Information in the low-frequency band is rarely lost, and different SR methods contribute to restoring the information in the medium-frequency band. From Figure 4, we can find that the outputs of image rescaling are closer to the HR images than the outputs of the SR module. That is because the pseudo M-LR images in image rescaling have more beneficial information than the original M-LR images. With the assistance of image rescaling, the amplitude-frequency curve of RASR lies above the curve of the SR module and is closer to that of image rescaling. Although RASR can restore partial information in the medium-frequency band, information in the high-frequency band is hard to restore due to the irreversible information loss in image degradation. Moreover, there is still room to investigate more innovative SR methods, since gaps remain between the curves of the HR images and the RASR outputs in the medium-frequency band.
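The curve-generation procedure described above can be sketched in NumPy as follows. The function name is ours; grouping uses the integer radial distance from the center, and applying the decadic logarithm after the per-group summation is our assumption about the ordering:

```python
import numpy as np

def amplitude_frequency_curve(gray):
    """Radial amplitude-frequency curve of a grayscale image."""
    # 2-D DFT; shift the zero-frequency term to the center of the spectrum
    f = np.fft.fftshift(np.fft.fft2(gray))
    amp = np.abs(f)
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    # integer distance of each frequency element from the spectrum center
    r = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2).astype(int)
    # sum amplitudes within each radial group (abscissa = distance from center)
    radial_sum = np.bincount(r.ravel(), weights=amp.ravel())
    # decadic logarithm narrows the range of the ordinate values
    return np.log10(radial_sum + 1e-8)
```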
4.2.2. Effectiveness of RASR for Ship Detection
To demonstrate the effectiveness of our proposed RASR method for ship detection, we use the outputs of different RASR variants as the inputs of a detector, and choose Faster-RCNN (with ResNet50 as the backbone) and FCOS (with ResNet101 as the backbone) to detect ship targets. The detection performance is shown in Table 2. Consistent with Section 4.2.1, we validate on the Bicubic and Bicubic + Gaussian blur down-sampling degradations separately.
We first validate the effectiveness of our method on Bicubic down-sampling degradation. When taking Faster-RCNN as the evaluation detector on the HRSC2016 dataset, directly using bicubically up-sampled images achieves the lowest detection accuracy. When the SR module is introduced, the accuracy is significantly improved from 0.788 to 0.838. This demonstrates that image SR can provide more useful appearance information than the bicubic up-sampling operation. Furthermore, using the outputs of RASR to detect ship targets further improves the detection performance, achieving 0.859. That is because the adaptation module can transform the original LR images into informative ones under the guidance of image rescaling, so more information is embedded in the outputs of RASR and transferred to the detection network. Note that using the outputs of RASR still cannot match the performance on HR images (i.e., 0.894) due to information loss. Similarly, when taking FCOS as the evaluation detector on the HRSC2016 dataset, directly using bicubically up-sampled images achieves a detection accuracy of 0.478. When the SR module is introduced, the accuracy is improved to 0.626, and using the outputs of RASR further improves the detection performance to 0.735.
We extend our ablation studies to the DOTA and NWPU VHR-10 datasets. With the bicubically up-sampled images as input, Faster-RCNN achieves 0.633 and 0.861 on the DOTA and NWPU VHR-10 datasets, respectively. When the outputs of the SR module are fed to the detector, the detection accuracy improves to 0.655 and 0.875, respectively. It can be observed that the detection performance can be further improved when the outputs of RASR are used, but it is still inferior to the results on HR images. The detection results of FCOS are consistent with those of Faster-RCNN on the DOTA and NWPU VHR-10 datasets. FCOS achieves 0.513 and 0.822 when taking bicubically up-sampled images as inputs on the DOTA and NWPU VHR-10 datasets, respectively. When the SR module is introduced, the values improve to 0.597 and 0.852, respectively. Using the outputs of RASR to detect ship targets further improves the detection performance to 0.695 and 0.898 on the two datasets, respectively. These consistent results across two detectors and three datasets validate the effectiveness of our RASR method on the ship detection task.
Furthermore, we validate the effectiveness of our method on Bicubic + Gaussian blur down-sampling degradation. It can be observed from Table 2 that both evaluation detectors achieve better detection performance with RASR results as inputs than with Bicubic or SR module results as inputs. Taking Faster-RCNN as an example, our SR module achieves 0.876, which is 0.091 better than the Bicubic method, and our RASR achieves a further 0.008 improvement over the SR module. For the DOTA and NWPU VHR-10 datasets, Faster-RCNN on RASR results also achieves 0.008 and 0.030 improvements compared to the results of the SR module, respectively. This demonstrates the effectiveness and generality of our method in more complex scenarios.
We further show the detection performance on targets of different sizes, as shown in Table 3. Note that here we use Faster-RCNN with ResNet50 as the backbone to validate the influence of our method on targets of different scales, and we use bicubic down-sampling to generate the LR images. It can be observed that the performance improvement on small targets is the largest. Taking the DOTA dataset as an example, compared with using the outputs of the SR module, using the outputs of RASR achieves a 0.086 improvement on small ships, a 0.065 improvement on medium targets and a 0.053 improvement on large targets. For the HRSC2016 dataset, Faster-RCNN achieves improvements of 0.052, 0.021 and 0.027 on small, medium and large targets, respectively, when using RASR as pre-processing compared to using the SR module. The improvement on small targets is again higher than that on medium and large targets. That is because more information is lost for smaller targets than for larger ones; thus, the detail information recovered by our RASR is more beneficial for small target detection.