A Lightweight Fusion Distillation Network for Image Deblurring and Deraining

Recently, deep learning-based image deblurring and deraining have developed rapidly. However, most of these methods fail to distill useful features. Moreover, exploiting detailed image features in a deep learning framework usually requires a massive number of parameters, which inevitably imposes a high computational burden on the network. We propose a lightweight fusion distillation network (LFDN) for image deblurring and deraining to solve the above problems. The proposed LFDN is designed as an encoder–decoder architecture. In the encoding stage, the image features are reduced to various small-scale spaces for multi-scale information extraction and fusion without significant information loss. Then, a feature distillation normalization block is designed at the beginning of the decoding stage, which enables the network to continuously distill and screen valuable channel information of the feature maps. Besides, an information fusion strategy between distillation modules and feature channels is carried out via an attention mechanism. By fusing different information in the proposed approach, our network achieves state-of-the-art image deblurring and deraining results with far fewer parameters and outperforms existing methods in model complexity.


Introduction
Image deblurring and deraining are both typical and essential tasks in image restoration research. Blurry images are caused by the movement of objects or camera shake, and rainy images often distort the color of the background due to the blocking and refraction of sunlight by rain. Both of these degradations affect vision sensors and people's visual perception of detailed image information. Therefore, image deblurring and deraining have become indispensable steps for many computer vision tasks, such as object detection, image classification, and surveillance. However, estimating the blur kernel and rain streaks to restore sharp and clean images from blurry and rainy ones is difficult, since it is a highly ill-posed problem. Most traditional methods [1–6] regularize the solution space by introducing prior information to model the blur kernel and rain streaks. However, because the prior information may not conform to the real scene, the quality of the restored images is not optimal. Recently, some information fusion based methods have also been proposed for the image restoration problem; for instance, Zhu [7] utilized a set of artificial

The main contributions of this work are summarized as follows:
• We propose a multi-scale hierarchical information fusion scheme (MSHF) to encode the image degraded by rain or blur. MSHF extracts and fuses image features in multiple small-scale spaces, which eliminates redundant parameters while maintaining rich image information.
• We propose a very lightweight module named the feature distillation normalization block (FDNB), which can constantly filter out useless feature channel information. To the best of our knowledge, this is the first time a distillation network has been adopted for image deblurring and deraining tasks.
• Two attention mechanism based modules are also presented in the decoding process of our approach to exploit the interdependency between layers and feature channels; together they form the multi-feature fusion distillation module (MFFD). Through MFFD, better information fusion can be achieved to compensate for the potential image detail lost in FDNB.

Related Work
Image Deblurring. In recent years, CNN-based image deblurring has developed rapidly. Nah et al. [9] proposed a multi-scale CNN method for image deblurring called DeepDeblur, which follows a coarse-to-fine strategy to restore the sharp image progressively. While it achieved satisfactory results, DeepDeblur contains 40 convolutional layers at each scale without parameter sharing. Zhang et al. [13] proposed a spatially variant neural network that consists of three CNNs and a recurrent neural network (RNN) for dynamic scene deblurring. While this algorithm is effective, it uses many convolutional layers for feature extraction and weight estimation by RNNs, which increases the number of parameters. Tao et al. [10] employed an encoder-decoder structure to propose a scale-recurrent network (SRN). Compared with DeepDeblur, SRN applies long short-term memory (LSTM) to share weights across scales. Similarly, a parameter sharing scheme was also adopted by Gao et al. [11] to improve the efficiency of their image deblurring network. Zhang et al. [12] investigated a new scheme that exploits deblurring cues at different scales via a hierarchical multi-patch model and proposed a simple yet effective multi-level CNN model called the Deep Multi-Patch Hierarchical Network (DMPHN). Nevertheless, under the coarse-to-fine scheme, most networks use a large number of training parameters due to large filter sizes. Thus, the multi-scale, scale-recurrent, and multi-patch methods result in expensive runtime and struggle to improve deblurring quality. In order to solve this problem and further exploit multi-scale mechanisms, Kupyn et al. [16] introduced a feature pyramid network (FPN) [31] to replace the multi-scale strategy, which can handle both the semantic information and the context details of the blurry image. However, semantic information dilution occurs in its decoding process.
Image Deraining. Deng et al. [18] constructed a specialized detail repair network that utilizes well-designed structure detail context aggregation blocks (SDCAB) for image deraining. However, SDCAB only focuses on local feature fusion, which may lead to suboptimal results. Jiang et al. [19] employed a recurrent manner to capture the global image texture, thus allowing the network to explore complementary and redundant information along the spatial dimension to characterize target rain streaks. However, this method requires both the rainy image and the rain streak information as network input. Thus, it cannot be applied to most benchmark datasets, which only contain rainy images without rain streak information. Recently, an uncertainty guided multi-scale residual learning network (UMRL) [28] was proposed to learn rain pattern information at different scales. At the same time, a cycle spinning framework was used to remove artifacts. While UMRL achieved efficient results, it needs three kinds of images at different scales as input, which inevitably increases model complexity. Ren et al. [22] provided a better and simpler baseline deraining network by simultaneously considering the network architecture, input, output, and loss functions. MPRNet [25] proposed a multi-stage architecture that progressively learns restoration functions for degraded inputs. Specifically, MPRNet adopts an encoder-decoder architecture to learn context-specific features and then combines them with a high-resolution branch to retain local information. However, MPRNet ignores global information. RESCAN [27] defined heavy rain as the accumulation of multiple rain layers. Because the rain streak layers overlap each other, it is not easy to remove the rain completely in a single stage. Therefore, they proposed a deep network combined with a recursive neural network that preserves useful information in the first stage and then facilitates rain removal in the second stage. However, the multi-stage strategy makes RESCAN suffer from high model complexity.
Attention Mechanism. Inspired by its successful applications in natural language processing, the attention mechanism has been widely used in image processing tasks [19,25,29,32–35]. Zhang et al. [32] leveraged the attention mechanism to allow the network to focus on the relationships among spatial image areas. Kuldeep et al. [29] utilized the weighted sum of all location features to selectively aggregate location features. Kim et al. [33] learned the correlation between feature channels through residual blocks and spatial channel attention. MSPFN [19] combined a pyramid structure [31] and a channel attention mechanism [30] to synergistically represent multi-scale rain-pattern information. More recently, Niu et al. [35] proposed a holistic attention network (HAN) to characterize the relationships between network layers. Nevertheless, HAN regards the multiple feature channels of each layer as a whole group and only estimates the correlation between feature channel groups of different layers. Thus, the correlation between inter-layer feature channels is ignored. Moreover, the 3D convolution in HAN dramatically increases the number of parameters and the computational burden.
Distillation Network. The information distillation network is one of the state-of-the-art approaches to reducing the number of parameters and achieving a lightweight network architecture. Zheng et al. [36] proposed an information distillation network (IDN) that reduces computational complexity and memory consumption by using a channel splitting strategy to downscale the feature maps. Based on IDN, a fast and lightweight information multi-distillation network (IMDN) [37] was presented. IMDN extracts features at a granular level by applying the channel splitting strategy multiple times and proposes contrast-aware channel attention (CCA) to connect the extracted features. However, Liu et al. pointed out that IMDN is still inflexible and inefficient [38]. As a result, they introduced the residual feature distillation block (RFDB), which utilizes feature distillation connections instead of the channel splitting strategy. Thus, RFDB can improve performance without introducing additional parameters. While the distillation network has already been employed in some computer vision problems, few studies have adopted it for image deblurring and deraining. Furthermore, the hierarchical information of different distillation layers is neglected in the existing methods. Finally, as shown in StyleGAN [39], normalization is very important for low-level vision tasks, but none of the existing distillation methods contains a normalization layer.

Overview
The overall architecture of the proposed LFDN is shown in Figure 2, where the green block represents the multi-scale hierarchical fusion module (MSHF), the red block represents pixel-shuffle up-sampling, and ⊕ represents elementwise addition. Our LFDN is based on an encoder-decoder structure and consists of two main parts. The features of the blurry or rainy image are first extracted by a convolution layer with a kernel size of 3 × 3 and a stride of 1. The extracted feature map is then fed into the multi-scale hierarchical fusion module (MSHF) in Part I for down-sampling, which acts as the encoder of our model. Through MSHF, the input is encoded into several small-scale features, and the loss of information is compensated by information fusion between different layers. Part II is the multi-feature fusion distillation module (MFFD), which includes several feature distillation normalization blocks (FDNBs) and two attention fusion mechanisms. This part enables the network to screen useful features through a finer-grained feature extraction strategy while reducing the number of parameters. After Part II, the decoder combines residual convolution blocks [40] and pixel-shuffle up-sampling blocks [41] to enlarge the feature map and obtain the restored image.
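To make the data flow concrete, the following minimal PyTorch sketch mirrors the pipeline described above (3 × 3 head convolution → MSHF encoder → MFFD → pixel-shuffle decoder). The stand-in convolutions for MSHF and MFFD, the single up-sampling stage, and the global residual addition are simplifying assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LFDNSkeleton(nn.Module):
    """Structural sketch only: the real MSHF and MFFD modules are stubbed
    with plain convolutions, and one pixel-shuffle stage stands in for the
    full decoder."""
    def __init__(self, ch=50):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, stride=1, padding=1)   # initial 3x3 conv, stride 1
        self.mshf = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # stand-in for Part I (encoder)
        self.mffd = nn.Conv2d(ch, ch, 3, padding=1)            # stand-in for Part II
        self.pre_up = nn.Conv2d(ch, ch * 4, 3, padding=1)      # expand channels for pixel shuffle
        self.up = nn.PixelShuffle(2)                           # pixel-shuffle up-sampling
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, x):
        f = self.head(x)
        f = self.mffd(self.mshf(f))
        f = self.up(self.pre_up(f))
        return self.tail(f) + x  # global elementwise addition (an assumption)

print(LFDNSkeleton()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```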

Multi-Scale Hierarchical Information Fusion Scheme (MSHF)
As shown in Figure 3, MSHF is a down-sampling module that serves as the encoder in our approach. First, we denote the input feature map as f 0 , with size W × H × C. Then, the intermediate features f i (i = 1, 2, 3, 4) are progressively extracted from f 0 using downblock modules [34]. The downblock module consists of two convolution layers: one with a kernel size of 3 × 3 and a stride of 2 to down-sample the feature map, and the other with a kernel size of 3 × 3 and a stride of 1 to resample it. Each downblock module [34] halves the size of the feature map, and the resampling within it further refines the feature. After each downblock module, we simply use a resblock [40] to realize residual learning. In addition, we fuse features of different scales in the small-scale space. The small-scale features after residual learning are up-sampled by pixel-shuffle up-sampling [41] and then elementwise added (denoted by ⊕) to the features of adjacent layers. Finally, a more fine-grained feature at a small scale is obtained.
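A hedged sketch of the downblock and the small-scale fusion step described above is given below. The class names, the ReLU between the two convolutions, and the channel-expanding convolution before pixel shuffle are assumptions; the resblocks after each downblock are omitted, and only two of the four scales are shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownBlock(nn.Module):
    """Two 3x3 convolutions: stride 2 to halve the spatial size, then stride 1
    to resample/refine the down-sampled feature."""
    def __init__(self, ch):
        super().__init__()
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.refine = nn.Conv2d(ch, ch, 3, stride=1, padding=1)

    def forward(self, x):
        return self.refine(F.relu(self.down(x)))  # the activation is an assumption

class MSHFSketch(nn.Module):
    """Two-scale illustration of the hierarchical fusion (the paper uses f1..f4)."""
    def __init__(self, ch):
        super().__init__()
        self.db1, self.db2 = DownBlock(ch), DownBlock(ch)
        self.expand = nn.Conv2d(ch, ch * 4, 3, padding=1)  # for pixel-shuffle up-sampling

    def forward(self, f0):
        f1 = self.db1(f0)                       # W/2 x H/2
        f2 = self.db2(f1)                       # W/4 x H/4
        f2_up = F.pixel_shuffle(self.expand(f2), 2)
        return f1 + f2_up                       # elementwise fusion with the adjacent layer

print(MSHFSketch(50)(torch.randn(1, 50, 64, 64)).shape)  # torch.Size([1, 50, 32, 32])
```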
Many existing methods [11,12,16,35,36] carry out multiple complex convolution operations at each down-sampling layer to prevent the network from losing important detailed information, which overloads the network with parameters. In our work, we only carry out a small number of feature extraction operations and then fuse multi-scale hierarchical information in the small-scale space. This strategy effectively reduces the computational cost. The overall procedure of the proposed MSHF is given in Algorithm 1.

Algorithm 1 Multi-Scale Hierarchical Fusion Algorithm

Feature Distillation Normalization Block (FDNB)
After MSHF, an MFFD module is proposed to make the network continuously filter useful channel feature information and eliminate useless disturbance information. Generally, to reconstruct a sharp image well, it is necessary to increase the number of convolutional layers in the network so that the receptive field can be enlarged to capture more information. However, this strategy is not a good choice in practice because it dramatically increases the number of parameters, which makes the network converge slowly. In order to further reduce the network burden and pursue a lighter and faster network, it is essential to extract features through a distillation block. In our network, feature distillation normalization blocks (FDNBs) are proposed, for the first time in an image deblurring and deraining method, to extract useful features progressively. As shown in Figure 4, FDNB is mainly composed of convolution layers with kernel sizes of 1 × 1 and 3 × 3, a normalization block, and a contrast-aware channel attention layer (CCA-Layer). Before each convolution layer, we perform channel splitting on the input feature, which divides the feature into two parts. In the first part, the distilled feature f_c1_j is obtained by the 1 × 1 convolution, which aims to reduce the number of feature channels and parameters. A normalization layer [39,42] is then added after the 1 × 1 convolution to ensure the stability of network training and make the network converge more easily. In addition, since normalization can alleviate the dependence of a model on a certain dataset, it also helps to improve the generalization ability of our network [43]. In the other part, the coarse feature f_c3_j is obtained through the 3 × 3 convolution and passed on to the next splitting step. At the end of FDNB, the CCA-Layer is utilized to fuse the concatenated distilled features. Specifically, the contrast operation allows us to obtain feature mappings of multiple spatial vectors. Through a series of FDNB operations, the performance of our network can be steadily improved. Overall, the process of FDNB can be described by the following equations:

f_c1_j = Conv_1(F_s(f_j)), f_c3_j = Conv_3(F_s(f_j)), with f_1 = f_in and f_(j+1) = f_c3_j,
f_nc1_j = Norm(f_c1_j), j = 1, ..., n,
f_out_i = CCA([f_nc1_1, f_nc1_2, ..., f_nc1_n, f_c3_n]) + x,

where f_in is the input feature of the FDNB, F_s denotes the channel splitting operation, Conv_i represents the convolution layer with kernel size i × i, f_ci_j denotes the feature obtained by the i × i convolution in the j-th split operation, f_nci_j represents the feature obtained after normalizing f_ci_j, Norm denotes the normalization layer, x denotes the residual (i.e., f_in), CCA indicates the operation of the CCA-Layer, [ ] represents concatenation along the channel dimension, and f_out_i is the output of the i-th FDNB.
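The equations above can be traced in the following sketch. The normalization type (InstanceNorm), the 1:1 split ratio, and the number of splits are assumptions; the CCA layer follows the contrast statistic (per-channel mean plus standard deviation) proposed in IMDN [37].

```python
import torch
import torch.nn as nn

class CCALayer(nn.Module):
    """Contrast-aware channel attention: per-channel mean + std as the statistic."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, x):
        contrast = x.std(dim=(2, 3), keepdim=True) + x.mean(dim=(2, 3), keepdim=True)
        return x * self.body(contrast)

class FDNB(nn.Module):
    """Sketch of one feature distillation normalization block (equations above)."""
    def __init__(self, ch=50, n_splits=3):
        super().__init__()
        d = ch // 2  # distilled channels per split (the ratio is an assumption)
        self.conv1 = nn.ModuleList(nn.Conv2d(d, d, 1) for _ in range(n_splits))
        self.norm = nn.ModuleList(nn.InstanceNorm2d(d, affine=True) for _ in range(n_splits))
        self.conv3 = nn.ModuleList(nn.Conv2d(ch - d, ch, 3, padding=1) for _ in range(n_splits))
        self.fuse = nn.Conv2d(d * n_splits + ch, ch, 1)
        self.cca = CCALayer(ch)

    def forward(self, f_in):
        cur, distilled = f_in, []
        for c1, nrm, c3 in zip(self.conv1, self.norm, self.conv3):
            a, b = torch.split(cur, [c1.in_channels, cur.shape[1] - c1.in_channels], dim=1)
            distilled.append(nrm(c1(a)))   # f_nc1_j: 1x1 conv + normalization
            cur = c3(b)                    # f_c3_j: coarse branch feeds the next split
        out = self.cca(self.fuse(torch.cat(distilled + [cur], dim=1)))
        return out + f_in                  # residual connection (x)

print(FDNB()(torch.randn(1, 50, 32, 32)).shape)  # torch.Size([1, 50, 32, 32])
```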

Fusion Mechanism
Unlike other networks [36–38] that simply stack several distillation modules to hierarchically extract features and employ only the final features for the specific task, we rethink the distillation block in our proposed approach. We argue that although multiple distillation modules (i.e., FDNBs) help the network obtain richer image information with fewer parameters, the correlation between the intermediate features of each FDNB is ignored. At the same time, we also consider how to make full use of the information in the last layer. Thus, two different attention based feature fusion mechanisms are proposed to improve the representation ability of the extracted features. One is the attention layer fusion module (ALFM), which learns the correlation between feature channels obtained by multiple FDNB layers. The other is the attention channel fusion module (ACFM), which describes the dependency between inter-channel and intra-channel information in adjacent feature channels of the last layer.
The structure of ALFM is shown in Figure 5. Given the groups of feature channels obtained by N FDNB layers, with dimension N × C × W × H, we first reshape them along the channel dimension into an NC × WH feature matrix F. The feature matrix is then multiplied by its transpose, and the softmax function is applied to normalize the attention values. After softmax, we obtain an attention matrix that reflects the correlation between channels. Finally, we multiply the attention matrix by the feature matrix, scale the result by a factor θ, and fuse it with the original feature channels through an addition operation. The process can be expressed as Equations (9) and (10):

M = softmax(F ⊗ F^T), (9)
f_ALFM = θ(M ⊗ F) + f_g, (10)

where f_g is the input feature channels and F is its NC × WH reshaped matrix, ⊗ denotes matrix multiplication, M = [m_ij] is the attention matrix, and θ is initialized to 0 and automatically optimized by the network.
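A minimal sketch of ALFM under the formulation of Equations (9) and (10) follows; the batch handling is simplified (a single group of N layer outputs) and the names are illustrative.

```python
import torch
import torch.nn as nn

class ALFM(nn.Module):
    """Attention layer fusion: softmax(F F^T) over the NC x WH feature matrix."""
    def __init__(self):
        super().__init__()
        self.theta = nn.Parameter(torch.zeros(1))  # initialized to 0, learned

    def forward(self, f_g):                        # f_g: (N, C, W, H) FDNB outputs
        n, c, w, h = f_g.shape
        F_mat = f_g.reshape(n * c, w * h)          # (NC, WH)
        attn = torch.softmax(F_mat @ F_mat.t(), dim=-1)  # (NC, NC), Equation (9)
        out = self.theta * (attn @ F_mat) + F_mat        # Equation (10)
        return out.reshape(n, c, w, h)

print(ALFM()(torch.randn(4, 50, 20, 12)).shape)  # torch.Size([4, 50, 20, 12])
```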
Here, it should be noted that our ALFM treats the feature channels of each FDNB layer separately rather than as a whole. Thus, correlations between feature channels from both the same and different layers can be considered. The structure of ACFM is shown in Figure 6. The aim of ACFM is to model the interdependency between feature channels of the last FDNB layer by jointly considering channel and spatial information. Nevertheless, different from Niu et al. [35], who adopted 3D convolution to accomplish this task, our ACFM leverages a Pseudo-3D convolution strategy [44] to reduce the number of parameters. Taking the output of the last FDNB, f_out_n, as input, we first utilize two convolution kernels with sizes of 1 × 3 × 3 and 3 × 1 × 1 to capture the spatial and channel correlations, respectively. Then, the attention matrix W obtained after the sigmoid function is element-wise multiplied by f_out_n. Finally, the weighted f_out_n is scaled by a factor α and fused with the original features.
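A corresponding sketch of ACFM with the Pseudo-3D factorization is shown below; the initialization of α and the composition order of the two factorized convolutions are assumptions.

```python
import torch
import torch.nn as nn

class ACFM(nn.Module):
    """Pseudo-3D attention over the last FDNB output: a 1x3x3 convolution for
    spatial correlation and a 3x1x1 convolution for channel correlation."""
    def __init__(self):
        super().__init__()
        self.spatial = nn.Conv3d(1, 1, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.channel = nn.Conv3d(1, 1, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.alpha = nn.Parameter(torch.zeros(1))  # scale factor, learned

    def forward(self, f_out_n):                    # (B, C, W, H)
        x = f_out_n.unsqueeze(1)                   # channels act as the depth axis
        w = torch.sigmoid(self.channel(self.spatial(x))).squeeze(1)
        return self.alpha * (w * f_out_n) + f_out_n  # weighted fusion with original

print(ACFM()(torch.randn(1, 50, 32, 32)).shape)  # torch.Size([1, 50, 32, 32])
```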
Through the attention mechanism in ALFM and ACFM, a more powerful feature representation can be achieved, which will compensate for the potential information loss in lightweight FDNB.
Overall, the loss function of our network can be expressed by Equation (12):

L = (1 / (w·h)) Σ_{x=1}^{w} Σ_{y=1}^{h} || LFDN(I_Blur/Rain)(x, y) − I(x, y) ||^2, (12)

where I_Blur/Rain represents the input blurred or rainy image, LFDN represents our network, I is the ground-truth sharp image, and w and h are the length and width of the input/output image, respectively.
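Under this reading of Equation (12), the loss is a per-pixel penalty averaged over the w × h grid; the squared (L2) form is an assumption here, being the common choice in deblurring baselines such as DeepDeblur [9].

```python
import torch

def lfdn_loss(restored: torch.Tensor, sharp: torch.Tensor) -> torch.Tensor:
    """Mean squared error over all pixels: (1/(w*h)) * sum ||LFDN(I) - I_gt||^2."""
    return torch.mean((restored - sharp) ** 2)
```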

Experimental Settings
We set the number of FDNB layers in our model to 4. The dimensions of the feature maps in our fusion module are set as N = 4, C = 50, W = 320, H = 180. For model optimization, we adopt Adam with a momentum of 0.9 and a weight decay of 10^-4. The learning rate is initialized to 10^-4 and decreased by a factor of 10 every 5 × 10^5 iterations. All experiments were conducted using PyTorch on NVIDIA 2080Ti GPUs. Following other image deblurring and deraining works [19,22,45,46], PSNR, SSIM, model size, and inference time are adopted to evaluate our method. For the comparative methods in the experiments, the results are quoted directly from the original papers.
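The stated optimization settings translate directly into PyTorch; the placeholder model and the per-iteration scheduler stepping below are assumptions for illustration.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the LFDN network
# Adam with momentum (beta1) 0.9, weight decay 1e-4, initial learning rate 1e-4
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-4)
# decay the learning rate by a factor of 10 every 5e5 iterations
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500_000, gamma=0.1)

for step in range(3):                        # sketch of the per-iteration update
    optimizer.zero_grad()
    loss = model(torch.randn(1, 3, 8, 8)).abs().mean()
    loss.backward()
    optimizer.step()
    scheduler.step()                         # stepped once per iteration
```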

Image Deblurring Dataset
Three benchmark datasets are employed for our image deblurring experiment. GoPro dataset [9] consists of 3214 pairs of blurry and sharp images extracted from 33 sequences captured at 720 × 1280 resolution. The training and testing sets include 2103 and 1111 pairs, respectively.
The Kohler dataset [47] consists of 4 images, each blurred with 12 different kernels, and is a standard benchmark for evaluating blind deblurring algorithms.
The HIDE dataset [48] has 8422 sharp and blurry image pairs. The images are carefully selected from 31 high-fps videos containing realistic outdoor scenes with humans.

Image Deraining Dataset
In our deraining experiment, we use the following synthetic datasets. The clean images used to synthesize Rain100L [49] and Rain100H [49] are selected from BSD200 [50]. Rain100L is a synthesized dataset with only one type of rain streak, while Rain100H contains five streak directions.
Rain14000 [51] collects 1000 clean images from the UCID dataset [52], the BSD dataset [53], and Google image search to synthesize rainy images. Each clean image was used to generate 14 rainy images with different streak orientations and magnitudes.
RainTest100 [27] consists of 100 images, of which 50 are randomly chosen from the last 500 images in the UCID dataset and 50 from the test set of the BSD-500 dataset [54].
In the image deblurring experiment, we use the GoPro [9] dataset, which contains 2103 image pairs for training and 1111 pairs for evaluation. Furthermore, to demonstrate the generalization performance of our method, we take our GoPro-trained model and directly apply it to the test images of the HIDE [48] and Kohler [47] datasets. Detailed information on the benchmark datasets involved in our experiments can be found in Table 1.

Quantitative and Qualitative Evaluation on Deraining Task
Following prior related work [19,22,24,26–28,55], we adopt the PSNR and SSIM metrics to evaluate the results of different methods on the image deraining task. It can be clearly seen from Table 2 that the proposed method outperforms the other approaches on all four datasets. This indicates that our LFDN can effectively remove the image degradation caused by rain. Meanwhile, the results in Table 2 also show that our method does not depend on a specific dataset and generalizes well across various scenarios. Compared to the recent MSPFN [19] method, the improvement of our method on Rain14000 [51] is small, but the performance of our LFDN is much better than MSPFN on the other three benchmark datasets. PReNet [22] and MSPFN [19] also achieve good results on Rain100L [49] and Rain14000 [51]. However, since they contain no feature filtering or distillation mechanism, their performance is inferior to ours. What is more, thanks to the design of ACFM and ALFM in the proposed LFDN, our method outperforms DerainNet [55]. DIDMDN [24] applied a small receptive field to capture small raindrops with small-scale features, while a large receptive field was employed to capture large raindrops with large-scale features. However, due to the lack of a fusion mechanism, its performance is worse than some other methods. UMRL [28] and RESCAN [27] use a multi-stage guidance model to generate sharp images, which also results in suboptimal network performance. In addition, in terms of computational complexity, our model runs at least two orders of magnitude faster than these methods. From the qualitative comparison shown in Figure 7, it can be found that the image recovered by DIDMDN [24] is distorted in color, while the image details of UMRL [28] are still insufficient. In contrast, our method restores the details occluded by rain well. The obvious improvement of our method over these recent works [19,22,24,26–28,55] may be attributed to the design of MSHF and MFFD.

GoPro Dataset
We compare the deblurring performance of our model with some state-of-the-art methods [9–12,19,48] in terms of PSNR, SSIM, model size, and inference time on the GoPro dataset. The quantitative results are shown in Table 3, and visual comparisons are shown in Figure 8. As shown in Table 3, our method has a smaller model size (nearly 1.55 M parameters), which is 300× smaller than DeepDeblur [9]. DeepDeblur exploited a multi-scale CNN to restore the sharp image, which leads to its heavy parameter count. By introducing the strategies of parameter sharing [10,11,13,20], GANs [15,16], hierarchical multi-patch structures [12,45], optical flow [21], and motion offsets [46], the number of parameters is effectively reduced in the other comparison methods. Among them, Gao et al. [11] obtained the smallest number of parameters because they employed a nested skip connection structure for the nonlinear transformation modules to replace stacked convolution layers or residual blocks. However, the authors of [11] only focus on the information of the last block and ignore the rich semantic information contained in the intermediate layers. Different from [11], we utilize the ALFM to obtain the correlation between feature channels from the same and different layers, and the ACFM to establish the interdependency between feature channels of the last FDNB layer. With ALFM and ACFM as fusion methods, the feature information of the intermediate layers and channels can be reused. Meanwhile, these fusion methods reduce the weight of the network while ensuring restoration quality. Therefore, the PSNR and SSIM obtained by our LFDN are superior to Gao et al. [11] and the other approaches. For more details about the influence of the network structure on the performance and model size of our LFDN, please refer to the ablation experiments in Section 4.4. In the visual comparison, it can be seen that most methods cannot recover sharp object contours, cars, and persons from severe motion blur. For the second and fourth images, we can still observe noticeable blur artifacts in the results of some methods (such as [10–12]). Compared with these methods, our model recovers clearer details effectively.

Kohler Dataset
We report the quantitative results on the Kohler dataset in Table 4. Visual comparisons are shown in Figure 9. From these results, we can see that our LFDN achieves the best quantitative (PSNR and SSIM) and qualitative performance. Furthermore, similarly to the GoPro case, the model size and average inference time of our method remain lower than those of the other methods.

HIDE Dataset
To further verify the validity of our method, we evaluate our approach on the HIDE testing set [48]. From the experimental results in Table 5 and Figure 10, we can clearly find that the performance of our method is better than the others, and the deblurred images obtained by our LFDN contain more detailed information.

Ablation Study
In this section, we conduct several experiments to evaluate the effectiveness of each component of our method on both a deblurring and a deraining dataset: GoPro [9] and Rain100H [49]. First, we replace the multi-scale hierarchical fusion module (MSHF) with traditional convolution layers. In this experiment, the stride of the convolution in each layer is set to 2, and two residual modules are utilized to resample the features. Then, the feature distillation normalization blocks (FDNBs) in the decoding stage of our approach are removed, and the same number of traditional residual feature distillation blocks (RFDBs) or plain CNNs are employed for sharp image reconstruction. Finally, we discard the attention based information fusion modules (i.e., ALFM and ACFM) and employ standard concatenation to combine the features obtained by different FDNB layers and the feature channels of the last FDNB. Through the above settings, we obtain five new network structures: the proposed LFDN without MSHF, without FDNB (replaced by RFDB), without FDNB (replaced by CNN), without ALFM/ACFM, and without all of the above components. To compare the proposed model with these different structures fairly, we adopt the same parameter settings to train all the networks.

The results of the ablation experiments on the GoPro [9] and Rain100H [49] datasets are summarized in Table 6. As can be seen from Table 6, without any of MSHF, FDNB, or ALFM/ACFM, the image recovery is not optimal. Without MSHF, the PSNR decreases by 0.5 dB and the model size increases by 0.8 M. This means that MSHF can extract better features with a larger receptive field. Moreover, performing most of the computation in the small-scale space is beneficial for reducing the network parameters, and feature fusion in the small-scale space is necessary to improve accuracy. The effect of FDNB on network performance is the most obvious: when the FDNB in our model is replaced by RFDB or CNN, the accuracy decreases greatly. This is because RFDB has no ability to normalize the feature maps and CNN cannot filter out useless feature channel information. In order to make the network lightweight without loss of accuracy, we employ the ALFM/ACFM mechanism to obtain finer-grained features. In the ablation experiments, the PSNR and SSIM values obtained by our model with ALFM/ACFM are higher than those without them, which justifies the validity of the proposed ALFM/ACFM mechanism. In addition, we provide a qualitative visual comparison in Figure 11.

We also test the sensitivity of our LFDN to the parameters θ and α in the ALFM and ACFM modules. As mentioned in Section 3.3.2, these two parameters are automatically optimized by our network. Through experiments, we find that the optimal θ obtained by our network is −0.0015 for deblurring and −0.0008 for deraining, while the optimal α is 0.1724 for deblurring and 0.1137 for deraining. To justify these optimal parameter values, a comparative experiment is carried out to test the performance of our LFDN under a series of manually set parameter values. From the results in Tables 7 and 8, it can be found that the parameter values optimized by the network achieve the best performance on both deblurring and deraining tasks. Finally, we compare the performance of our LFDN with different numbers of FDNBs.
From Table 9, we find that our method performs worse when the number of FDNB layers is small, while adding more FDNBs does not further increase performance. These results also justify setting the number of FDNB layers to 4 in our model.

Conclusions
In this work, we propose a lightweight fusion distillation network (LFDN) for image deblurring and deraining tasks. In order to give the network fewer parameters and faster speed, the multi-scale hierarchical fusion module (MSHF) and the feature distillation normalization block (FDNB) are adopted in the encoding and decoding stages, respectively. Moreover, two attention based modules are proposed to improve the feature representation power of our approach. Through extensive experiments on several benchmark datasets, the following conclusions can be drawn. Firstly, MSHF can extract and fuse image features in multiple small-scale spaces, which effectively eliminates redundant parameters while maintaining rich image information. Secondly, FDNB helps to improve the convergence speed and generalization ability of our proposed model. Thirdly, the attention based ALFM and ACFM modules also contribute to the performance of our model. Finally, the experimental results demonstrate that our LFDN outperforms several state-of-the-art methods.
In our future work, we will extend the proposed method to some other image restoration tasks (such as image dehazing and denoising) and compare its performance with related approaches. Furthermore, we will also use other image quality assessment (IQA) methods to evaluate the performance of our method.