Super-Resolution of Compressed Images Using Residual Information Distillation Network

Abstract: Super-resolution (SR) is a fundamental computer vision task that reconstructs high-resolution images from low-resolution ones. Existing SR methods mainly recover images from clear low-resolution inputs, leading to unsatisfactory results when processing compressed low-resolution images. In this paper, we propose a two-stage SR method for compressed images, which consists of a Compression Artifact Removal Module (CARM) and a Super-Resolution Module (SRM). The CARM reconstructs the clear low-resolution image from the compressed low-resolution image, and the SRM then produces the high-resolution image. In addition, we propose a residual information distillation block to learn the texture details that are lost during compression. The proposed method has been validated against the state of the art, and experimental results show that it outperforms other super-resolution methods in terms of both visual quality and objective evaluation metrics.


Introduction
Images are generally down-sampled and compressed to reduce storage consumption and accelerate transmission. Down-sampling causes a loss of detail relative to the high-resolution image, and the subsequent compression introduces undesirable artifacts such as blocking. Reconstructing images from their down-sampled and compressed versions is therefore important for reducing storage consumption while retaining image details for viewing.
The key problem of compressed image super-resolution is to eliminate compression artifacts and preserve image details while increasing image resolution. Current deep learning-based super-resolution algorithms target uncompressed images, i.e., low-resolution images degraded only by down-sampling or blurring, so they cannot effectively handle compressed images. Performing super-resolution directly on JPEG images aggravates blocking and ringing effects, leading to poor visual quality [23]. Some algorithms have been proposed for compressed image reconstruction, but most of them decompose the task into two independent subtasks that are trained separately and only then optimized jointly. Their results often show pronounced compression artifacts and severe loss of detail.
To address these issues, we propose a novel super-resolution algorithm to reconstruct compressed images. The model uses three kinds of data: compressed low-resolution (C-LR) images as input, and LR and HR images as labels. We propose two modules: a compression artifact removal module (CARM) and a super-resolution module (SRM). To remove the compression artifacts in C-LR images, a two-stage joint loss function is used. The first stage uses the LR image as supervision, which greatly reduces the probability of image scaling errors in the super-resolution stage. To make the generated images clearer, we propose a residual information distillation block that learns more of the image features lost during compression. Finally, we use peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate image quality, and the experimental results verify the effectiveness of the method.
The main contributions of this paper are as follows:
Firstly, we design a compressed image super-resolution network consisting of two modules, the Compression Artifact Removal Module (CARM) and the Super-Resolution Module (SRM), and train it with a joint loss function.
Secondly, we propose a novel residual information distillation block to efficiently learn the image features lost during compression.
Finally, experimental results show that the proposed model obtains better results than the state of the art on common evaluation metrics, including PSNR.

Problem Formulation
The traditional SR problem can be formulated as

$$Y = HX + n, \tag{1}$$

where X represents the HR image, Y is the observed LR image, H denotes the down-sampling and blur kernel, and n represents the additive noise. For super-resolution of JPEG images, the degradation process is shown in Figure 1. Compared with the traditional super-resolution problem, the degradation of a JPEG-compressed low-resolution image adds a JPEG compression step. This work mainly studies compression artifacts, so we ignore the additive noise, and (1) can be converted into (2):

$$Z = CY = CHX, \tag{2}$$

where Y is the low-resolution image defined by (1), C represents the compression kernel, and Z denotes the compressed low-resolution image. Our goal is to generate the high-resolution image X from Z.
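To make the composition in (2) concrete, the following minimal Python sketch applies a toy down-sampling operator followed by a toy compression operator to a small grayscale patch. Both operators are simplified stand-ins chosen only for illustration (block averaging instead of bicubic interpolation, uniform quantization instead of a real JPEG encoder), and the function names are hypothetical.

```python
def downsample(image, s):
    """Toy H: average each s x s block (stand-in for bicubic down-sampling)."""
    h, w = len(image) // s, len(image[0]) // s
    return [[sum(image[i * s + di][j * s + dj]
                 for di in range(s) for dj in range(s)) / (s * s)
             for j in range(w)]
            for i in range(h)]


def compress(image, step):
    """Toy C: uniform quantization (stand-in for JPEG's lossy quantization)."""
    return [[round(v / step) * step for v in row] for row in image]


# A 4 x 4 "HR" patch X, its 2 x 2 "LR" version Y, and the "C-LR" version Z.
X = [[10, 12, 200, 202],
     [14, 16, 204, 206],
     [90, 94, 50, 54],
     [98, 102, 58, 62]]
Y = downsample(X, 2)   # detail inside each 2 x 2 block is lost
Z = compress(Y, 32)    # remaining values are coarsened further
print(Y)
print(Z)
```

The sketch only illustrates that information is lost in two successive steps, which is why the reconstruction has to undo both the compression and the down-sampling.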

Image Super-Resolution
Many deep learning-based single-image super-resolution (SISR) methods have been proposed and have achieved excellent results in both image quality assessment metrics and visual quality. Dong et al. [24] originally proposed SRCNN, a three-layer convolutional super-resolution network that implements patch extraction and representation, nonlinear mapping, and image reconstruction. Subsequently, Kim et al. [25] alleviated the difficulty of network training by introducing residual learning: their 20-layer model VDSR learns residual images instead of directly learning high-resolution images. To further fuse the shallow and deep features of the image, RDN [26] used dense connectivity to obtain richer reconstruction information and details. Because most algorithms are not suitable for mobile devices, Hui et al. [27] proposed IMDN, a lightweight information distillation network based on IDN [28]. Overall, deep neural network-based super-resolution methods perform well, but they assume a single down-sampling degradation process and cannot be applied directly to the super-resolution of compressed images. Their effectiveness is therefore greatly reduced when applied to compressed images.
Several algorithms have been dedicated to addressing compression artifacts. ARCNN [29] is a deep learning algorithm that addresses JPEG compression artifacts with four convolutional layers for feature extraction, feature enhancement, nonlinear mapping, and reconstruction in sequence. Subsequently, CISRDCNN [30] was the first algorithm to address end-to-end super-resolution of compressed images. It can effectively improve image quality and reduce compression artifacts. To preserve the functionality of each module and the relevance of the two subproblems, each module is first trained individually, and finally the whole network is trained by joint optimization.
However, existing SR methods treat compression artifact removal and up-sampling as two independent stages, which may result in more artifacts or excessive smoothing. Overall, the SR of compressed images still needs to be improved.

Proposed Method
A novel super-resolution reconstruction network is proposed for compressed images in this work, and Figure 2 shows the detailed network framework.

General Framework
This paper aims to leverage compressed LR images to reconstruct HR images. During the training process, each sample includes three types of data: the C-LR image, the HR image, and the LR image. The LR image is generated by down-sampling, and the C-LR image is obtained by applying JPEG compression to the LR image. The LR image is used as the ground truth in the first stage. In the second stage, the HR image is taken as the label image. The two stages of our method are presented separately as follows.
The method consists of two steps: the Compression Artifact Removal Module (CARM) and the Super-Resolution Module (SRM). Since the compression operation makes the low-resolution image lose more image details, reconstructing more image details based on the LR image is the key of the first stage. As shown in Figure 2, the details lost during compression are first recovered from the C-LR image by the compression artifact removal module, which consists of a feature extraction layer, residual information distillation blocks, a feature fusion layer, and a reconstruction block. The main part of this stage is composed of multiple residual information distillation blocks (RIDBs), which are stacked to progressively refine the extracted features. The RIDBs will be described in Section 3.2. Finally, this module uses the feature fusion layer and the reconstruction block to reconstruct a clear low-resolution image.
The image super-resolution stage uses essentially the same network configuration as the first stage except for the final sub-pixel layer. Specifically, the second sub-module takes the clear LR image predicted by the first stage as input and outputs the HR image. This stage uses the same RIDB module as the first stage, mainly because each layer of the network in this module is good at learning pixel-level feature representations. The enlargement from low to high resolution is achieved by sub-pixel layers, which significantly reduces the training time because the image size changes only at the last layer. The final model outputs high-resolution images.
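The rearrangement performed by a sub-pixel layer can be sketched as follows. This is a minimal, dependency-free illustration on nested Python lists, not the implementation used in this work; the channel ordering follows the common pixel-shuffle convention and is an assumption here, and `pixel_shuffle` is a hypothetical helper name.

```python
def pixel_shuffle(features, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r]."""
    c_in, h, w = len(features), len(features[0]), len(features[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # vertical offset inside each r x r cell
            for j in range(r):      # horizontal offset inside each r x r cell
                src = features[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four 2 x 2 feature channels become one 4 x 4 channel for r = 2.
lr_features = [[[k * 10 + y * 2 + x for x in range(2)] for y in range(2)]
               for k in range(4)]
hr = pixel_shuffle(lr_features, 2)
for row in hr[0]:
    print(row)
```

Because all convolutions before this step operate at low resolution, the spatial size grows only in the final rearrangement, which is the property the text above relies on.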

Residual Information Distillation Block
The proposed model is comprised of residual information distillation blocks, which are stacked to gradually refine the extracted features, where the RIDB structure is depicted in Figure 3.
The RIDB is composed of a residual block (RB) layer, a convolutional layer, and a contrast-aware channel attention (CCA) layer. After a channel distillation operation, the input features are split into two parts: one part is retained, while the other part is sent to the next RIDB. The distillation operation compresses the feature channels in a fixed proportion; in this paper, 30% of the features are retained. The retained features on the left side use a 1 × 1 convolution instead of a 3 × 3 convolution to decrease the number of parameters while remaining efficient. The split features on the right side enter the RB for deeper residual learning, where the RB consists of two 3 × 3 convolutions and an excitation module. Since the RB is the body of the RIDB, 3 × 3 convolutions can be used to capture the background information effectively and further refine the features. The RB can learn deeper features through residual learning without introducing any additional parameters.
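The saving from using a 1 × 1 instead of a 3 × 3 convolution on the retained branch can be checked with simple arithmetic. The sketch below is illustrative only: the channel width of 64 and the choice of mapping all input channels to the retained 30% are assumptions, not values stated in this paper, and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Number of learnable parameters of a k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def distill_split(channels, keep_ratio=0.3):
    """Split a channel count into (retained, passed on) parts."""
    retained = int(channels * keep_ratio)
    return retained, channels - retained


channels = 64                      # assumed feature width, for illustration
retained, passed_on = distill_split(channels)
params_1x1 = conv_params(channels, retained, 1)
params_3x3 = conv_params(channels, retained, 3)
print(retained, passed_on, params_1x1, params_3x3)
```

Under these assumptions the 1 × 1 branch needs roughly one ninth of the weights of a 3 × 3 branch with the same input and output widths.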

Loss Functions
As illustrated in Section 3.1, the network is comprised of two stages: the CARM and the SRM. Stage 1 aims to reconstruct the LR image from the C-LR image by recovering the pixel at each corresponding location, i.e., the objective of this stage is to restore the C-LR image to the LR pixel values. Therefore, the L1 loss function, which is prevalent in pixel-level tasks (image denoising, super-resolution, and image deblurring), is chosen in this stage. The loss can be presented as (3):

$$L_{CAR} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left|F_{CAR}(Z)_{i,j} - Y_{i,j}\right|, \tag{3}$$

where H and W respectively denote the height and the width of the C-LR image (H here is the image height, not the kernel in (1)), and F_CAR corresponds to the network of the first stage. In the second stage, the intermediate result of the previous stage is super-resolved to change the resolution of the image, such that a clear high-resolution image is obtained. To make the SRM generate more accurate SR results, the loss function of stage 2 inherits the loss used by previous well-performing SR models, so the loss can be presented as Equation (4):

$$L_{SR} = \frac{1}{s^{2}HW}\sum_{i=1}^{sH}\sum_{j=1}^{sW}\left|F_{SR}(F_{CAR}(Z))_{i,j} - X_{i,j}\right|, \tag{4}$$

where s is the scaling factor and F_SR corresponds to the network of the second stage. To cover most cases, the loss weights of both stages are set equal, so the overall loss function of this algorithm is shown as (5):

$$L = L_{CAR} + L_{SR}. \tag{5}$$
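The two-stage objective described above can be sketched in a few lines of Python. This is a minimal illustration on nested lists, assuming an L1 loss in both stages and unit weights for the equal weighting; the function names and the toy values are hypothetical.

```python
def l1_loss(pred, target):
    """Mean absolute error over all pixels of two equally sized 2-D images."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1
    return total / count


def joint_loss(pred_lr, lr, pred_hr, hr, w_car=1.0, w_sr=1.0):
    """Weighted sum of the stage-1 (CARM) and stage-2 (SRM) losses."""
    return w_car * l1_loss(pred_lr, lr) + w_sr * l1_loss(pred_hr, hr)


# Toy example with scale factor 2: a 1 x 1 LR target and a 2 x 2 HR target.
pred_lr, lr = [[10.0]], [[12.0]]
pred_hr = [[10.0, 11.0], [12.0, 13.0]]
hr = [[10.0, 12.0], [12.0, 15.0]]
loss = joint_loss(pred_lr, lr, pred_hr, hr)
print(loss)
```

Supervising the intermediate LR prediction as well as the final HR output is what distinguishes this objective from training a single SR network on C-LR inputs.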

Dataset and Implementation Settings
The DIV2K dataset, which is widely used in the image super-resolution field, is employed in this work. The 1000 high-quality RGB images in DIV2K are divided into 800, 100, and 100 images for training, validation, and testing, respectively. To obtain the compressed LR image, the HR image is first down-sampled by bicubic interpolation to generate the uncompressed LR image, with the scale factor set to 2 and 4. After that, the clear LR image is compressed by the JPEG encoder in MATLAB to obtain the compressed low-resolution image. The standard JPEG compression method is used in this experiment, and the compression quality factor (QF) is set to 20. The resulting image dataset contains HR images, LR images, and C-LR images. The test data were selected from the widely used Set5 [31] and Set14 [32].
During training, the LR image is randomly cropped to 64 × 64 patches as the model input, the number of training epochs is 1500, and the batch size is 32. The input image is randomly flipped horizontally or rotated by 90 degrees to augment the data. The ADAM optimizer is used for training, with β1 = 0.9 and β2 = 0.999; the learning rate is initially 2 × 10^−4 and is halved every 2 × 10^5 training rounds.
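The learning-rate schedule stated above can be written as a small step function. The sketch below assumes that a "training round" corresponds to one optimizer iteration, which the text does not specify; `learning_rate` is a hypothetical helper name.

```python
def learning_rate(step, base_lr=2e-4, halve_every=200_000):
    """Step decay: start at base_lr and halve it every `halve_every` steps."""
    return base_lr * 0.5 ** (step // halve_every)


# Learning rate at a few representative training steps.
for step in (0, 199_999, 200_000, 400_000):
    print(step, learning_rate(step))
```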

Evaluation Metrics
PSNR [33] and SSIM [34] are utilized to measure the effectiveness of the SR methods. All values are calculated on the Y channel after converting the image from the RGB color space to the YCrCb color space. PSNR is a full-reference image evaluation metric widely used in image restoration tasks such as SR and deblurring; it measures image quality through the magnitude of the global pixel error between the original and the reconstructed image. SSIM evaluates image similarity by combining brightness, structure, and contrast. SSIM takes values from 0 to 1, where a higher value indicates greater similarity.
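A minimal sketch of the Y-channel PSNR computation is given below. It assumes 8-bit RGB input and the ITU-R BT.601 luma conversion commonly used in SR evaluation; the exact conversion used in this work is not stated, so the coefficients are an assumption, and the function names are hypothetical.

```python
import math


def rgb_to_y(rgb_image):
    """Luma (Y) of an 8-bit RGB image, BT.601 with a 16-235 output range."""
    return [[16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0
             for (r, g, b) in row]
            for row in rgb_image]


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 2-D images."""
    squared_error, count = 0.0, 0
    for ref_row, res_row in zip(reference, restored):
        for ref, res in zip(ref_row, res_row):
            squared_error += (ref - res) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf            # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 1 x 2 images: ground truth versus a slightly darker reconstruction.
hr_rgb = [[(255, 255, 255), (0, 0, 0)]]
sr_rgb = [[(250, 250, 250), (0, 0, 0)]]
score = psnr(rgb_to_y(hr_rgb), rgb_to_y(sr_rgb))
print(score)
```

A higher PSNR corresponds to a smaller mean squared error on the luma channel; SSIM requires local window statistics and is omitted from this sketch.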

Experimental Results
To verify the performance of the proposed method, we compare it with state-of-the-art works including Bicubic, ARCNN [29], VDSR [25], RCAN [26], IMDN [27], and CISRDCNN [30]. These algorithms are tested on the Set5 and Set14 datasets using PSNR and SSIM as evaluation metrics. To ensure the fairness of the comparison, the models and codes for the compared methods are obtained from the URLs provided in the relevant papers, and the test datasets are compressed.
The experimental results of each algorithm for down-sampling factors of 2 and 4 are presented in Table 1, where bold font indicates the best result for each item. The CNN-based methods deliver better results than Bicubic on both PSNR and SSIM for both down-sampling factors. VDSR and IMDN use residual blocks to learn high-frequency information from the image and obtain better visual results while improving network efficiency. In terms of PSNR and SSIM, the proposed method surpasses the other methods for down-sampling factors of 2 and 4, meaning that our strategy is able to reduce noise well and recover more high-frequency details in more complex environments. In comparison with the second-best approach, our method improves PSNR by up to about 0.9 dB and SSIM by up to about 0.1, with the largest gain on the Set5 dataset, which is arguably the most frequently used dataset in these experiments.
In addition, much noise and many block artifacts occur in the images, which may contain clues for recovering their textures and details. The results also show that high-performing methods can successfully remove noise and can distinguish between noise and artifacts, which helps retain some complex textures and high-frequency details.
Two images from Set5 are selected for comparison of the experimental results with a down-sampling factor of 2 and a QF of 20. Figure 4 shows the visual reconstruction results of each algorithm on the same image. For better observation, a local area of each image is enlarged in the figure to compare the algorithms. Table 2 reports the PSNR and SSIM of each algorithm on these two images. It is clear from the figure and the table that Bicubic shows more noticeable compression artifacts and a more serious loss of image details than the deep learning-based algorithms, and the eyelash details in the image are difficult to observe. The reconstructed images of ARCNN and VDSR also contain compression artifacts. The IMDN algorithm removes the majority of the compression artifacts, but the image becomes overly smooth. In contrast, the reconstructed images of the proposed method have the fewest compression artifacts and preserve more details, giving the best visual quality.

Ablation Study
The performance of the first-stage CARM is tested here to demonstrate the effectiveness of our strategy. In this section, we remove the CARM from the model to verify its effectiveness. The distorted C-LR image is super-resolved directly, the network architecture of the SRM remains unchanged, and the L1 loss is still employed for training. From Table 3, it can be seen that both PSNR and SSIM are significantly improved after using the CARM. The reason is that, when super-resolution reconstruction is performed directly on the compressed image, the artifacts and noise generated by the compression process are amplified in the super-resolution process, which results in undesirable visual effects. A single model cannot combine the recovery task and the SR task, whereas the CARM can use the LR image as supervision to learn the mapping from C-LR to LR, thus yielding better results in the reconstruction stage.

Conclusions
This paper proposes a two-stage SR reconstruction approach to address the reconstruction of HR images from compressed LR images. The network consists of the CARM and the SRM and incorporates residual information distillation blocks to extract hierarchical features. Extensive experiments are conducted on real low-quality images, and the results show that our method obtains better high-resolution images and better performance on objective evaluation metrics than the state of the art. We demonstrate the application of the method to compressed images, allowing users to view clear images and facilitating subsequent vision tasks on the images. The application of the algorithm can be extended to images and videos of other compression standards, such as JPEG 2000 and HEVC. However, this task is still challenging because our method and the algorithms proposed so far still cannot accurately reconstruct the full texture of compressed images. A direction for future research is to use generative adversarial networks to address this problem.

Figure 1. Degradation process for compressed LR images.

Figure 2. The overall pipeline of the proposed super-resolution reconstruction network.

Figure 4. Visual comparison of different methods on Baby and Bird.

Author Contributions:
Conceptualization, J.L., Y.C. and N.L.; methodology, Y.Z. and C.Y.; software, Y.Z. and C.Y.; data curation, J.L. and Y.C.; writing-original draft preparation, Y.Z. and C.Y.; writing-review and editing, J.L., N.L., Y.C. and C.Y.; investigation, N.L. and Y.C.; supervision, J.L., N.L., Y.C. and C.Y.; validation, Y.Z. and J.L.; funding acquisition, Y.C. and J.L. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Collaborative Innovation Major Project of Zhengzhou (grant number 20XTZX06013) and the National Natural Science Foundation of China (grant number 61972092).

Table 1. The comparative results on PSNR and SSIM.

Table 2. Comparison of the results on PSNR and SSIM on Baby and Bird.

Table 3. Comparison of results for modules with and without CAR.