The Research on Enhancing the Super-Resolving Effect of Noisy Images through Structural Information and Denoising Preprocessing

Abstract: Both noise and structure matter in single image super-resolution (SISR). Recent research has benefited from generative adversarial networks (GANs), which promote the development of SISR by recovering photo-realistic images. However, noise and structural distortion are detrimental to SISR. In this paper, we focus on eliminating noise and geometric distortion when super-resolving noisy images. The proposed method includes a denoising preprocessing module and a structure-keeping branch, while the advantages of the GAN are still exploited to generate satisfying details. Specifically, a gradient branch is added to the original SISR network, and the denoising preprocessing module is placed before the SR branch. The denoising preprocessing eliminates noise by learning the noise distribution and removing it through a residual skip. By restoring the high-resolution (HR) gradient maps and combining the gradient loss with the spatial loss to guide parameter optimization, the gradient branch imposes additional structural constraints. Experimental results show better Perceptual Index (PI) and Learned Perceptual Image Patch Similarity (LPIPS) performance on noisy images, while the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) are comparable with the best reported SR methods combined with DNCNN. Taking the Urban100 dataset with noise intensity 25 as an example, the four indexes of the proposed method are 3.6976 (PI), 0.1124 (LPIPS), 24.652 (PSNR) and 0.9481 (SSIM). As the box-and-whisker plots over different noise intensities and datasets show, the PI and LPIPS values are the best among all comparison methods, and PSNR and SSIM achieve comparable results.
The visual results also show that the proposed method of enhancing the super-resolving effect of noisy images through structural information and denoising preprocessing (SNS) is not affected by the noise while preserving the geometric structure during SR processing.


Introduction
Image denoising and super-resolution (SR), which aim to reconstruct high-quality images from low-quality observations, are fundamental to complex image-processing tasks such as image mosaicking, 3D reconstruction, and simultaneous localization and mapping (SLAM). Most methods for obtaining realistic results use the generative adversarial network (GAN) [1] but, unfortunately, come with two intractable drawbacks. On the one hand, when the input images contain noise, the results often lose many details, because the noise corrupts the complex mapping between high- and low-quality images that the network learns. On the other hand, pursuing higher perceptual quality tends to sacrifice structural quality, and balancing the two relies on their interaction. SPSR [3] alleviates the structural problem by introducing a contour gradient constraint, inspired by the work of Martin et al. [34]. Nevertheless, its gradient covers only two directions and is very sensitive to noise.
Denoising combined with single image super-resolution. SR of noisy images has long been a research focus. Singh et al. [35] separate denoising and SR and obtain the result by combining SR of the noisy images with SR of the denoised images. Laghrib et al. [36] propose a new filter-based algorithm that achieves SR while removing noise and retaining edges. Hu et al. [37] propose a simultaneous SR and denoising method based on the noise-reduction property of the multi-scale pyramid. These methods do not take advantage of neural networks (NNs), nor do they make full use of edge-information constraints. Chen et al. [38] use a GAN to fuse denoising and SR into one task, exploiting a residual network to subtract the learned noise map directly from the original image. Similarly, this method does not make full use of edge-information constraints.
Structure-relevant methods. Structural information has been used in previous works [39,40]. Fattal [41] proposes a method based on edge statistics across resolutions. Yan et al. [42] rely on the edge contour sharpness extracted by a gradient description model. Sun et al. [43] rely on gradient profiles to enhance image sharpness. Liu et al. [44] combine multi-scale image features with blur and defocus features and use the gradient-based Sobel operator to improve the visual effect. These methods model the correspondence between parameters of the LR and HR images, and the per-pixel computation involved is complicated. NNs have unique advantages in learning the probability distribution of pixels. Yang et al. [45] propose to reconstruct details by feeding edges extracted by an off-the-shelf edge detector into a residual network [14]. Although many methods utilize edge constraints, they are primarily PSNR-oriented.
The purpose of this paper is to use edge information and denoising preprocessing to improve GAN-based SR of noisy images. We introduce edge constraints to reduce the geometric distortion generated during denoising and GAN-based super-resolving. GAN has excellent performance in data generation; further, edge information adds geometric constraints to a network that originally contained only spatial constraints. As far as we know, no existing method performs denoising preprocessing before GAN-based SR while using structural information to preserve the geometric structure.

GAN
GAN generates samples by learning the data distribution. According to Goodfellow's [1] definition, a GAN consists of two parts: a generator G and a discriminator D. The structure of the GAN used in this paper is shown in Figure 1: G produces images from the input signal to deceive D, and D tries to distinguish the real data from the generated data. Since D rewards similarity between the generated and real images, the objectives of G and D are opposed. In SNS, we take an LR image (I^LR) as input and generate an SR image (I^SR), with the HR counterpart (I^HR) as ground truth. The generator G with parameters θ_G maps I^LR to I^SR, with the goal of making I^SR as similar as possible to I^HR. Writing the loss used for parameter optimization as ω, the optimization can be formulated as

\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} \omega\big(G_{\theta_G}(I^{LR}_n), I^{HR}_n\big). (1)

The overall framework is shown in Figure 2. The generator consists of two branches: the SR branch, which contains a denoising preprocessing module for the initial filtering of noise, and the gradient branch, which maintains the structure. The SR branch takes I^LR as input and, after denoising preprocessing, recovers I^SR by incorporating the gradient map produced by the gradient branch. The purpose of the gradient branch is to super-resolve the LR gradient map to the HR gradient map; it fully reuses the parameters shared with the SR branch and finally guides the SR process through the fusion block.
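The opposing objectives of G and D described above can be sketched numerically. The following is a minimal NumPy illustration; the function names and the binary cross-entropy form are standard GAN choices on our part, not code from the paper:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    # D wants D(real) -> 1 and D(generated) -> 0
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_adv_loss(d_fake, eps=1e-12):
    # G wants D(generated) -> 1, i.e. to deceive the discriminator
    return -np.mean(np.log(d_fake + eps))
```

When D scores the generated images near 1 (i.e. it is fooled), the generator's loss is small while the discriminator's loss is large, which is exactly the opposition described above.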

Denoising Preprocessing
DNCNN [2] uses a convolutional network with a depth of 17 to extract image noise, indicating that the noise distribution is easier for a network to learn than the image features. On this basis, a denoising preprocessing method for noisy LR images is proposed. In order to keep the overall computational complexity of the network comparable, only seven convolutional layers are used to extract the noise distribution in the feature domain, as shown in Figure 3; relative to the original network, this adds roughly the depth of one residual-in-residual dense block (RRDB). Each convolutional layer adopts a 3 × 3 kernel and the rectified linear unit (ReLU) as the activation function. Denoised features are obtained through a residual skip with coefficient −1 between the original features and the extracted noise, which removes the extracted noise distribution.
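The residual skip with coefficient −1 amounts to subtracting the predicted noise from the input features. A minimal NumPy sketch, with illustrative array shapes and names of our own choosing rather than the paper's code:

```python
import numpy as np

def residual_skip_denoise(features, predicted_noise):
    # the coefficient of the residual skip is -1: keep the features,
    # remove the noise distribution predicted by the 7-layer CNN
    return features + (-1.0) * predicted_noise

# toy check: if the predicted noise equals the true noise,
# the clean features are recovered exactly
rng = np.random.default_rng(0)
clean = rng.random((8, 8))
noise = rng.normal(0.0, 0.1, (8, 8))
restored = residual_skip_denoise(clean + noise, noise)
```

In the network itself the prediction is only approximate, so the skip suppresses rather than perfectly removes the noise.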

Gradient Branch
Inspired by SPSR [3], a gradient branch is introduced into the traditional GAN-based SR network to learn the mapping from LR to HR edge images and thereby improve reconstruction quality. SPSR [3] ignores the direction information and fuses the intensities of the horizontal and vertical gradients into the gradient map of the image, assuming that the intensity information is sufficient to reflect the sharpness of the image. We likewise ignore the direction of the gradient and use only its intensity. However, comparison shows that considering the intensity in only two directions loses edge details. Therefore, on the basis of SPSR [3], gradients in two additional directions (45°/135°) are introduced into the gradient map. SPSR [3] also suggests that the mapping between the two resolutions can be learned between gradient images just as between images. In order to focus more on the spatial relationships of the contour, convolution kernels in the 0°, 45°, 90° and 135° directions are designed so that the resulting edge image remains close to zero over most of its area.
Compared with SPSR [3], additional gradients in the 45°/135° directions are introduced, and the intensities of the gradients in the four directions constitute the final gradient map. Introducing the 45° and 135° gradients increases the sharpness of image edges; since the gradient branch is an additional constraint on SR reconstruction, this enhancement of edge sharpness also benefits SR reconstruction. Figure 4 compares the gradient maps built from two directions with those built from four. The convolutional layers of the SR branch contain rich structural information, including features at different depths. These features are converted into gradient information through gradient blocks and fused in the gradient branch, which constrains the reconstruction of the gradient branch and makes full use of the SR branch's network parameters. The reconstructed HR gradient map is then integrated into the SR branch to guide SR reconstruction, and the gradient branch's final output is also used as part of the objective loss function to optimize the network.
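The fusion of the four directional gradient intensities can be sketched as follows. The exact kernels and fusion rule used in the paper are not specified; the 3 × 3 difference kernels and the root-sum-of-squares fusion below are plausible stand-ins:

```python
import numpy as np

# Directional difference kernels for 0°, 45°, 90° and 135° (illustrative)
K0   = np.array([[0, 0, 0], [-1, 0, 1], [0, 0, 0]], float)   # horizontal
K90  = np.array([[0, -1, 0], [0, 0, 0], [0, 1, 0]], float)   # vertical
K45  = np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], float)   # diagonal
K135 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, -1]], float)   # anti-diagonal

def conv2d_valid(img, k):
    """Plain 'valid' 2D correlation with a 3x3 kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

def gradient_map(img):
    """Fuse the intensities of the four directional gradients."""
    mags = [np.abs(conv2d_valid(img, k)) for k in (K0, K45, K90, K135)]
    return np.sqrt(sum(m ** 2 for m in mags))
```

On a flat region the map is zero, while edges in any of the four directions produce a strong response, which is why the map stays close to zero over most of a natural image.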

SR Branch
As the input of the SR branch, the denoising preprocessing result significantly reduces the noise that the SR branch would otherwise amplify, compared with unpreprocessed input. The overall network structure of this paper contains a common SR structure and also integrates the edge information.
For the former, the residual-in-residual dense block (RRDB) proposed in ESRGAN [31] is adopted, and the output of every five RRDBs in the SR branch is fed into the gradient branch. In order to keep the overall depth of the network unchanged, the last RRDB is removed. Unlike in the conventional SR task, the last convolutional reconstruction layer is removed and the output features are passed to the subsequent part. The latter is the SR gradient map obtained by the gradient branch described above. The fusion block fuses the features of the two branches to realize the fusion of structural information. Figure 5 shows the dense block of the RRDB used in ESRGAN [31].

Loss Function
SR loss: The mean square error (MSE) is widely used to measure the difference between two images. Minimizing a loss that only represents the distance between pixels causes the image to lose visual sharpness, but its easy convergence is still exploited in this paper. This loss is expressed by Equation (2):

l^{MSE}_{SR} = E\big[\, \| G(I^{LR}) - I^{HR} \|_2^2 \,\big], (2)

where G(·) stands for the generator, I^{LR} for the LR input and I^{HR} for the ground truth. This formula directly optimizes the per-pixel difference without improving the perceptual quality of the image. The perceptual loss proposed by [46] extracts features containing semantic information through a pre-trained VGG network [47]; comparing the distances between feature maps often yields satisfactory results, as in Equation (3):

l^{VGG}_{SR} = E\big[\, \| Q_i(G(I^{LR})) - Q_i(I^{HR}) \|_2^2 \,\big], (3)

where Q_i(·) denotes the output of the i-th layer of the VGG model. The original intention of the GAN design is to rely on the game between the generator and the discriminator to improve the performance of the network. For the generator, the following objective function is used:

l^{Adv}_{SR} = -E\big[\, \log D(G(I^{LR})) \,\big], (4)

where D(·) represents the discriminator's judgment of the generated image.

Gradient loss: In order to introduce an edge constraint into image recovery, the gradient loss is the distance between the gradient map extracted from the SR image and that extracted from the corresponding HR image:

l^{MSE}_{GM} = E\big[\, \| Grad(G(I^{LR})) - Grad(I^{HR}) \|_2^2 \,\big], (5)

where Grad(·) stands for the function extracting the gradient map. As part of the GAN, the adversarial loss on the gradient is also part of the overall objective function:

l^{Adv}_{GM} = -E\big[\, \log D_{Grad}(Grad(G(I^{LR}))) \,\big], (6)

where D_{Grad} is the discriminator's judgment of the generated gradient map.

Overall loss: All of the above are objective functions of the generator. The discriminators of the SR branch and the gradient branch take the following form:

l_D = -E\big[\log D(I^{HR})\big] - E\big[\log(1 - D(G(I^{LR})))\big],
l_{D_{Grad}} = -E\big[\log D_{Grad}(Grad(I^{HR}))\big] - E\big[\log(1 - D_{Grad}(Grad(G(I^{LR}))))\big]. (7)

For the generator of the SR branch, the losses above supervise the generation of the images; for the gradient branch, the pixel-wise loss l^{MSE}_{Grad} between the reconstructed and the HR gradient maps is used. The loss functions of the SR branch and the gradient branch constitute the final generator objective function:

l_G = l^{MSE}_{SR} + α_{VGG} l^{VGG}_{SR} + α_G l^{Adv}_{SR} + β_{MSE} l^{MSE}_{GM} + β_G l^{Adv}_{GM} + γ l^{MSE}_{Grad},

where α_{VGG}, α_G, β_{MSE}, β_G and γ are the weights of the different loss terms in the total loss function. The α terms weight the constraints of the SR branch on the SR image, the β terms those on the SR gradient image, and γ denotes the weight of the gradient-branch loss in the total objective function.
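The total generator objective combines the six loss terms linearly. A small sketch, assuming the individual loss values have already been computed and using the weights reported in the experimental settings (the association of γ with the gradient-branch term follows the text's description):

```python
# loss weights from the paper's training setup
ALPHA_VGG, ALPHA_G = 0.01, 0.005
BETA_MSE, BETA_G = 0.01, 0.005
GAMMA = 0.5

def total_generator_loss(l_mse_sr, l_vgg_sr, l_adv_sr,
                         l_mse_gm, l_adv_gm, l_mse_grad):
    """Weighted sum of the SR-branch and gradient-branch losses."""
    return (l_mse_sr
            + ALPHA_VGG * l_vgg_sr + ALPHA_G * l_adv_sr
            + BETA_MSE * l_mse_gm + BETA_G * l_adv_gm
            + GAMMA * l_mse_grad)
```

Note that the pixel-wise SR loss carries weight 1, so the adversarial and gradient terms act as comparatively small corrective constraints.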

Experimental Settings
In order to evaluate the effectiveness of the proposed SNS method, DIV2K is used as the training set and four common benchmarks are used for testing: Set5 [48], Set14 [49], BSD100 [34] and Urban100 [50]. The LR images are obtained by downsampling the HR images with bicubic interpolation, and only a scaling factor of four is considered. The noisy images are obtained by adding noise to the LR images. The intensity of the added noise is denoted σ, and the noise is composed of Gaussian noise, salt-and-pepper noise and Poisson noise: noise of intensity σ consists of Gaussian noise with intensity σ, salt-and-pepper noise with intensity σ × 0.01 and Poisson noise with intensity σ. Perceptual Index (PI) [50], Learned Perceptual Image Patch Similarity (LPIPS) [51], PSNR and Structural Similarity (SSIM) [52] are selected as the evaluation metrics; lower PI and LPIPS values indicate higher perceptual quality. The SR branch borrows ESRGAN's architecture, and the RRDB [31] is used as the Grad Block. The noise is added to the LR images with Matlab functions, and the noise intensities measure the noise level. Because the GAN is difficult to train and its loss function is difficult to converge, a small change can significantly affect the results of the network; therefore, in the actual training process, MSE is used as the loss function for pre-training, and the pre-trained parameters serve as the initial values of the network. In training, each mini-batch contains 16 pairs of matched images. In pre-training, MSE is set as the loss function, the number of iterations is set to 400,000, the initial learning rate is 0.0001 and the learning rate is divided by 10 at 200,000 iterations. α_{VGG} and α_G are set to 0.01 and 0.005, respectively, following the loss-weight settings of SPSR.
For the edge loss introduced later, β_{MSE} and β_G are also set to 0.01 and 0.005, respectively, and γ = 0.5 is used to optimize the gradient-branch generation function. In the actual training, the number of iterations is set to 200,000, the initial learning rate is also 0.0001, and the learning rate is divided by 10 at 100,000 iterations, by which point the model has generally converged. All experiments are implemented in Tensorflow on two NVIDIA GTX 1080Ti GPUs.
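The composite noise described above can be emulated outside Matlab. The sketch below is one plausible NumPy equivalent; the exact parameterization of the Matlab functions, especially how "intensity σ" maps onto the Poisson component, is an assumption on our part:

```python
import numpy as np

def add_composite_noise(img, sigma, seed=None):
    """img: float array in [0, 255]. Adds Gaussian noise (std sigma),
    salt-and-pepper noise (corruption ratio sigma * 0.01) and
    signal-dependent Poisson noise, then clips back to [0, 255]."""
    rng = np.random.default_rng(seed)
    out = img + rng.normal(0.0, sigma, img.shape)   # Gaussian component
    mask = rng.random(img.shape)
    p = sigma * 0.01 / 2.0                          # half salt, half pepper
    out[mask < p] = 0.0                             # pepper
    out[mask > 1.0 - p] = 255.0                     # salt
    out = np.clip(out, 0.0, 255.0)
    out = rng.poisson(out).astype(np.float64)       # Poisson, mean = pixel value
    return np.clip(out, 0.0, 255.0)
```

Applying this to the bicubically downsampled LR images reproduces the kind of mixed degradation the paper trains and evaluates on.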

Quantitative Comparison
The model used in this article was trained on our equipment for about 45 h. This paper focuses on the SR task for noisy images, and the denoising preprocessing in SNS refers to DNCNN [2]. Therefore, before comparison, the noisy images are preprocessed with DNCNN [2] for the state-of-the-art SR methods SFTGAN [53], ESRGAN [31], IMDN [46], SRMD [47] and SPSR [3]; SNS itself is not preprocessed by DNCNN [2]. The data in Table 1 are calculated at noise intensity 25, with Gaussian, salt-and-pepper and Poisson noise added artificially. In each row, the best results are highlighted in red and the second best in blue. SNS achieves the best PI [51] and LPIPS [52] performance on the noisy images in all test datasets, while its PSNR and SSIM [53] values in most datasets are close to the best results. In addition, the denoising preprocessing module designed in this research adds only a limited number of network parameters to the SR branch, yet gives SNS better performance than SPSR and ESRGAN on noisy images. In order to measure performance across datasets and noise intensities, the indexes of each method are presented as box-and-whisker diagrams; for the sake of scale, the four indicators are split over two pictures. Lower PI [51] and LPIPS [52] values indicate higher perceptual quality. Figure 6 shows that under different datasets and noise intensities, the proposed method still matches or exceeds the state-of-the-art stepwise methods. Therefore, the results show that the SNS method causes less distortion while obtaining good perceptual quality for noisy images.

Table 1. Comparison with advanced super-resolution (SR) networks on benchmark datasets.
The best performance is highlighted in red (1st best) and blue (2nd best). The SNS obtains the best PI and LPIPS values as well as comparable PSNR and SSIM values.

Figure 6. The performance distribution of the proposed method under different datasets and noise intensities.
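Of the four metrics, PSNR is simple enough to state exactly; the following is a minimal NumPy implementation (PI, LPIPS and SSIM require learned models or substantially more machinery and are not sketched here):

```python
import numpy as np

def psnr(reference, estimate, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    diff = np.asarray(reference, float) - np.asarray(estimate, float)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Because PSNR is a monotone function of the per-pixel MSE, it rewards pixel fidelity rather than perceptual quality, which is why the paper reports PI and LPIPS alongside it.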

Qualitative Comparison
The proposed SNS is visually compared with state-of-the-art SR approaches. For the general SR networks, noise is first removed with DNCNN [2], while SNS processes the noisy image end to end. As shown in Figure 7, the results of this paper are more natural and realistic than those of the two-step approaches. The experiments are divided into three groups according to the added noise intensity: 15, 25 and 50. In the first image, with a noise intensity of 50, SNS correctly recovers the shape of the window, indicating that SNS can eliminate the noise in the LR image while capturing the structural features of the object. At the other noise levels, SNS also recovers better textures than the step-by-step SR methods. In our results, the noise is completely removed and the distortion-free structure is preserved, whereas the other methods cannot produce satisfactory-looking objects. The gradient maps of an image from Set14 are shown in Figure 8; the proposed SNS method exhibits minimal structural degradation compared with the other methods. This qualitative comparison shows that, after denoising preprocessing, the proposed SNS method can obtain more structural information from the gradient space and generate realistic SR images while preserving the geometric structure.

Denoising Preprocessing
SNS is designed for the SR of noisy images, so denoising preprocessing is added on the basis of SPSR [3]. The denoising preprocessing performs an initial filtering of the noise before the SR branch. As Figure 9 shows, the denoising preprocessing has a significant effect on the SR of noisy images: without it, the noise in the image leads to a wrong mapping during SR, making the result extremely blurry. Figure 9. The top row shows the SR results of SPSR [3] on noisy images without denoising preprocessing; the second row shows the results with denoising preprocessing.

Conclusions
In conclusion, this paper proposes a noisy-image SR method that includes a denoising preprocessing module and a gradient branch. Compared with stepwise denoising-then-SR methods, SNS delivers better perceptual quality and smaller geometric distortion. Across different datasets and noise intensities, the proposed method improves PI by 17%, LPIPS by 11%, and PSNR and SSIM by 5% and 6%, respectively, over the advanced stepwise methods. In the generator, multiple convolutional layers learn the noise distribution, which is removed by a residual skip. The introduced gradient branch combines the edge information of the image, as a new constraint, with the original spatial constraint to improve the reconstruction quality of noisy images. In future work, this research hopes to design a general-purpose lightweight network model for multi-scale SR that can run on mobile devices.