Deep learning has been widely applied to image super-resolution (SR) tasks and has achieved superior performance over traditional methods due to its excellent feature learning capabilities. However, most of these deep learning-based methods require training image sets to pre-train SR network parameters. In this paper, we propose a new single image SR network without the need for any pre-training. The proposed network is optimized to achieve the SR reconstruction only from a low-resolution observation rather than from training image sets, and it focuses on improving the visual quality of reconstructed images. Specifically, we design an attention-based encoder-decoder network for predicting the SR reconstruction, in which a residual spatial attention (RSA) unit is deployed in each layer of the decoder to capture key information. Moreover, we adopt a perceptual metric consisting of an L1 metric and a multi-scale structural similarity (MS-SSIM) metric to learn the network parameters. Unlike the conventional MSE (mean squared error) metric, the perceptual metric coincides well with the perceptual characteristics of the human visual system. Under the guidance of the perceptual metric, the RSA units are capable of predicting the visually sensitive areas at different scales. The proposed network can thus pay more attention to these areas, preserving visually informative structures at multiple scales. Experimental results on the Set5 and Set14 image sets demonstrate that the combination of the perceptual metric and RSA units can significantly improve reconstruction quality. In terms of PSNR and structural similarity (SSIM) values, the proposed method achieves better reconstruction results than related works, and it is even comparable to some pre-trained networks.
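A common way to combine the two terms of such a perceptual metric is a convex blend of the L1 error and the MS-SSIM dissimilarity; the abstract does not specify the exact weighting, so the blending weight α below is an illustrative assumption, not the paper's stated formulation:

```latex
% Perceptual training metric (sketch): hat{x} is the SR prediction,
% x the reference, and alpha in [0,1] an assumed blending weight.
\mathcal{L}_{\mathrm{perc}}(\hat{x}, x)
  = \alpha \, \ell_1(\hat{x}, x)
  + (1 - \alpha)\bigl(1 - \mathrm{MS\text{-}SSIM}(\hat{x}, x)\bigr)
```

Minimizing the second term pushes MS-SSIM toward 1, favoring structural fidelity at multiple scales, while the L1 term penalizes per-pixel deviations without the over-smoothing typically associated with MSE.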
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.