Image Restoration Based on End-to-End Unrolled Network

Recent studies on image restoration (IR) methods under unrolled optimization frameworks have shown that deep convolutional neural networks (DCNNs) can be used implicitly as priors to solve inverse problems. Due to the ill-conditioned nature of the inverse problem, the selection of prior knowledge is crucial for the IR process. However, the existing methods use a fixed DCNN in each iteration, and so they cannot fully adapt to the image characteristics at each iteration stage. In this paper, we combine deep learning with traditional optimization and propose an end-to-end unrolled network based on deep priors. The entire network contains several iterations, and each iteration is composed of an analytic solution update and a small multiscale deep denoiser network. In particular, we use different denoiser networks at different stages to improve adaptability. Compared with a fixed DCNN, this design greatly reduces the number of computations when the total parameters and the number of iterations are the same, although the practical runtime gains are not as significant as the FLOP counts suggest. The experimental results of our method on three IR tasks, including denoising, deblurring, and lensless imaging, demonstrate that the proposed method achieves state-of-the-art performance in terms of both visual effects and quantitative evaluations.


Introduction
Image restoration (IR) is a classical topic in the field of low-level image processing. Digital images are always degraded during the acquisition process, with issues such as electronic noise caused by the thermal vibration of atoms and blur caused by camera shake [1,2]. Therefore, image restoration is of great significance and is widely used in a variety of applications, e.g., smartphone imaging, medical imaging, and remote sensing. The purpose of image restoration is to recover an unknown latent image from a corrupted observation. In general, IR is an ill-posed inverse problem. The mathematical degradation model can be written as y = Ax + n, where y and x are the degraded measurement and the clean image, respectively, n denotes the additive noise, which is generally assumed to be additive white Gaussian noise (AWGN), and A denotes the system degradation matrix. Although IR problems have been extensively studied, they remain challenging due to the rich variety of natural image content [3] and the diversity of degradations.
Over the past few decades, many methods have been proposed to tackle IR problems, including denoising [4][5][6][7][8][9][10], deblurring [11][12][13][14][15], image super-resolution [16][17][18][19][20], and lensless imaging [21][22][23][24]. Recently, the rapid development of deep learning technology has injected new vitality into IR research. Many works based on deep convolutional neural networks (DCNNs) have achieved excellent results [10,23,25]. From the point of view of linear algebra, the reason that IR is ill-conditioned is that the null space of A is nonzero. Because of this, the prior knowledge of the image is crucial, and we need to select a good estimation of the latent image from the solution space according to this prior knowledge. A common approach is to establish a cost function by maximizing the posterior probability P(x|y):

x̂ = arg max_x log P(y|x) + log P(x) (1)

where log P(x) represents the prior information of the latent image, and log P(y|x) denotes the log-likelihood of the measurement. log P(y|x) is derived from the statistical model of the noise, e.g., an ℓ2-norm for Gaussian noise and an ℓ1-norm for Laplacian noise. Under the AWGN model, the cost function can be reformulated as:

x̂ = arg min_x (1/2)||y − Ax||² + λφ(x) (2)

where φ(x) = −log P(x) is a regularization term and λ is a regularization parameter. There are two paths for obtaining the solution of Equation (2). One involves model-based methods [26][27][28], and the other involves learning-based methods [29][30][31][32]. The former reduce the value of the cost function gradually through optimization principles, whereas the latter mainly rely on DCNNs and datasets. These two types of methods are described in more detail in the following four paragraphs.
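To make the notation above concrete, here is a minimal numerical sketch (not from the paper): a toy degradation y = Ax + n and an evaluation of the cost in Equation (2). The matrix A, the noise level, and the ℓ1 norm standing in for the generic prior φ are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy degradation model y = A x + n.
n_pix = 64
x = rng.random(n_pix)                        # latent "image" (flattened)
A = np.eye(n_pix) * 0.8                      # a simple, well-behaved degradation matrix
noise = 0.01 * rng.standard_normal(n_pix)    # AWGN term n
y = A @ x + noise                            # degraded measurement

def cost(x_est, lam=0.1):
    """Cost of Equation (2): data fidelity + lambda * phi(x).
    Here phi is an l1 norm, standing in for a generic prior."""
    fidelity = 0.5 * np.sum((y - A @ x_est) ** 2)
    prior = lam * np.sum(np.abs(x_est))
    return fidelity + prior

# The true latent image has a much lower cost than the all-zero guess.
print(cost(x), cost(np.zeros(n_pix)))
```

This only fixes notation; the rest of the paper is about minimizing such a cost efficiently.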
Among model-based methods, a large number of models based on various priors have been proposed, including the frequently used total variation (TV) prior [33]; sparse representation prior [34] and dictionary learning [7,35,36]; nonlocal means prior [37] and nonlocal self-similarity (NLSS) [6,38]; the low rank approximation prior [39,40]; and the Markov random field (MRF) [41,42]. The characteristics of various priors are as follows.
The well-known TV prior works well on images with simple textures, but it can lead to distinct blurring in complex areas with rich details. The sparse prior model can represent local image patches as a few atoms in some domains, such as the DCT basis and the discrete wavelet transform basis [43]. Compared with analytical dictionaries, learned dictionaries have a stronger adaptive ability to represent image patches, and they can deal with various tasks more flexibly [3]. In the past two decades, sparse models have made outstanding contributions to IR, and a large number of IR algorithms based on them have been proposed [4,6,7,16,26,34]. Using the redundant information in an image, the nonlocal means can eliminate Gaussian noise well [44]. A more efficient and robust solution is to apply the NLSS to the sparse prior model [26]. Another powerful prior is the low-rank approximation because a matrix with many nonlocal similar patches is essentially of low rank [39]. The soft-threshold function can be used to solve this problem easily and quickly [45]. WNNM [40] improves the flexibility of the nuclear norm and achieves good results in terms of both visual effects and quantitative evaluations. Combined with the MRF model, the Bayesian optimization framework has been applied to low-level vision [41,46,47]. Although the MRF can learn a generic prior that can represent the statistics of natural scenes, its complexity is high, and it is difficult to interpret physically. In summary, model-based methods can deal with all kinds of visual problems flexibly, but they usually incur high time and computational costs, and their effects are not as good as those of the popular DCNNs.
With the rapid development of deep learning, DCNNs have blossomed in the field of low-level vision. Many learning-based methods [8,9,10,29,30,48,49] have been applied to denoising tasks. Burger et al. [29] proposed using a plain multilayer perceptron for denoising. This work showed the great potential of neural networks because a simple perceptron was shown to be able to achieve the effect of the traditional well-known BM3D denoiser [4]. With batch normalization, the DnCNN [30] was established to provide an end-to-end residual learning network for predicting residuals to indirectly eliminate noise. CBD-Net [8] separated noise estimation from nonblind denoising and used two sub-networks to complete these two functions separately. RIDNet [9] exploited channel dependencies by using feature attention and built a blind real image denoising network under a modular architecture. SADNet [10] introduced the deformable convolution to implement spatially adaptive denoising, which can achieve reconstruction with a high signal-to-noise ratio while effectively maintaining the spatial texture and edges of the image. Zhang et al. [49] proposed a novel and efficient RDN that outperforms competing methods for several image restoration tasks.
In addition to noise removal, DCNNs are also widely used in the fields of super-resolution and lensless imaging. The SRCNN [18] mapped low-resolution patches to high-resolution patches with three convolutional layers. After that, many deep networks were proposed for super-resolution, such as the ESPCN [19], DRRN [20], VDSR [32], and SRGAN [25]. DCNNs have also performed well in the field of lensless imaging. Nguyen et al. [22] used a DCNN to restore lensless images while protecting privacy. Khan et al. [23] first carried out model fitting and then used a DCNN to improve image quality.
Although learning-based DCNNs can quickly complete high-quality image reconstruction on GPUs after training, they usually lose the flexibility inherent in model-based methods. Additionally, the improvement in reconstruction quality is due only to the strong fitting ability of the pure DCNN. Therefore, hybrid IR methods under the unrolled framework were proposed [50,51]; Section 2.2 describes IR methods under the unrolled framework in more detail. These methods combine traditional methods with DCNNs so that the advantages of both can be exploited. When solving the inverse problem, they can incorporate the physical models of systems into the networks. First, this structure can make full use of the prior knowledge of the systems, such as the observation matrix in compressed sensing and the point spread function in deconvolution. In addition, the main function of the network is to learn a prior rather than the whole inverse operation; thus, the unrolled structure places lower demands on the network, and its functions are easier to realize. Furthermore, these methods can improve the reconstruction quality compared with pure DCNNs. In spite of their wide application in low-level vision tasks, there is still room for improvement in terms of both the optimization and the networks. For example, the existing methods usually use a fixed DCNN in each iteration, and so they cannot fully adapt to the image characteristics at each iteration stage. In addition, gradient descent converges slowly on the convex subproblem.
In this paper, we propose a deep denoiser-based unrolled network that combines DCNNs with optimization to exploit the advantages of both. The entire end-to-end network can be unfolded into several analytic solution blocks, each followed by a small deep denoiser network. All the parameters are learned through training. On the one hand, we solve the convex problem in the form of an analytic solution, which is faster than gradient descent, whose inexact solutions usually require multiple iterations. On the other hand, each small deep denoiser network adopts an encoder-decoder structure to capture multiscale information from the image. The small deep denoiser networks also differ across stages so that they can better adapt to the image characteristics at each stage. Compared with a fixed DCNN, this design greatly reduces the number of computations when the total parameters and the number of iterations are the same. The experimental results of several IR tasks, including denoising, deblurring, and lensless imaging, demonstrate that our approach is effective and computationally efficient. The visual effects and the objective evaluations indicate that our network achieves excellent performance in high-quality image reconstruction.
The remainder of this paper is organized as follows. Section 2 reviews related works. Section 3 introduces our proposed method. Section 4 shows the numerical results of several IR tasks, and Section 5 concludes this paper.

Related Work
The unrolled network we propose derives mainly from two areas: deep learning and denoiser-based IR methods under unrolled optimization. In this section, we briefly review these two areas.

Deep Learning
With the rapid development of computing power, deep learning technology has led to many breakthroughs in the field of vision, including low-level denoising [10,30,48], deblurring [15,31], and super-resolution [32,52], as well as high-level recognition [53] and segmentation [54]. Generative adversarial networks (GANs) [55] produce richer image textures, and the perceptual loss exploits high-level abstract features to enhance the details of super-resolution images [56]. Training methods for networks have also advanced, with techniques such as batch normalization [57], gradient clipping [58], and Xavier initialization [59].

IR Methods under Unrolled Optimization
By decoupling the deconvolution and denoising, the original complex regularization term can be transferred. Many denoiser-based IR methods have been proposed [11,50,60,61,62] that can integrate the strengths of model-based methods and DCNNs. Under the framework of half-quadratic splitting (HQS), a new auxiliary variable z is introduced into Equation (2). Therefore, the IR problem can be written in the following form:

min_{x,z} (1/2)||y − Ax||² + λφ(z) + (µ/2)||z − x||² (3)

where µ is the penalty parameter. The above equation can be solved alternately by:

x_{k+1} = arg min_x (1/2)||y − Ax||² + (µ/2)||z_k − x||²
z_{k+1} = arg min_z λφ(z) + (µ/2)||z − x_{k+1}||² (4)

As we can see, the former equation is a convex quadratic problem, while the latter is a proximal operator with a special regularization parameter. In practice, we usually treat the latter as a denoising problem, which avoids the explicit expression of priors, and for which there are many successful solutions at present. The decoupling can also be achieved through the ADMM [63]; the principle is similar and will not be repeated here. IR methods under unrolled optimization frameworks can be divided into two categories: deep unfolding networks and plug-and-play methods. The former is an overall end-to-end network, while the latter is not.
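As an illustration of the HQS alternation in Equations (3) and (4), the following sketch takes the simplest case A = I with an ℓ1 prior, so the z-step reduces to a soft threshold. The parameter values and the sparse test signal are arbitrary choices for the sketch, not settings from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 -- a simple stand-in denoiser.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def hqs_denoise(y, lam=0.1, mu=1.0, iters=6):
    """Half-quadratic splitting for A = I (Equation (4)).
    x-step: closed-form quadratic solve; z-step: proximal/denoising step."""
    z = y.copy()
    for _ in range(iters):
        x = (y + mu * z) / (1.0 + mu)    # argmin_x 0.5||y-x||^2 + mu/2 ||z-x||^2
        z = soft_threshold(x, lam / mu)  # argmin_z lam||z||_1 + mu/2 ||z-x||^2
    return x

rng = np.random.default_rng(1)
clean = np.zeros(100); clean[::10] = 1.0          # sparse test signal
y = clean + 0.1 * rng.standard_normal(100)        # noisy observation
x_hat = hqs_denoise(y)
print(np.mean((x_hat - clean) ** 2), np.mean((y - clean) ** 2))
```

The printed mean squared error of the HQS estimate is lower than that of the noisy input, illustrating that alternating a quadratic solve with a denoising step makes progress on the split objective.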
Plug-and-play methods can be flexibly applied to various tasks [64,65] by using one well-trained denoiser. In [11], the well-known BM3D denoiser was used for deblurring based on the generalized Nash equilibrium. The CBM3D denoiser was used for single image super-resolution (SISR) [17]. In [66], Brifman et al. realized SISR by using the NCSR denoiser [26], which combines the traditional sparse prior with the NLSS; the results were better than those of the original NCSR. The TV prior and BM3D prior were used for Fourier ptychographic microscopy [67]. In [68], the denoising-based IR method with the ADMM was used for electron microscope imaging. Additionally, a state-of-the-art DCNN denoiser prior was used in IR tasks [69]. Zhang et al. [61] trained 25 CNN denoisers at different noise levels. There are some theoretical analyses on this topic. Sreehari et al. [68] analyzed the convergence of the plug-and-play approach when the denoiser is a symmetric smoothing filter. Chan's algorithm with a bounded denoiser [70] was proven to be convergent. Ryu's work theoretically established the convergence of the PnP-ADMM algorithm when the denoiser satisfies a certain Lipschitz condition [71]. Although the plug-and-play technique has achieved state-of-the-art results, it usually requires many iterations; the proposed SISR solver in [66] iterates 35 times, and the IRCNN [61] takes 30 iterations to deblur.
In response to this problem, end-to-end deep unfolding networks consisting of only a few iterations were proposed for IR [51,62]. Zhang et al. [72] proposed a deep network for compressed sensing reconstruction. Dong et al. [51] proposed an end-to-end approach named the DPDNN, in which the whole iterative process was carried out six times, with the same denoising network called each time. Despite the small number of iterations, its effect was still outstanding. Jeon et al. [73] achieved hyperspectral reconstruction through a deep unfolding network.

Proposed Algorithm for Image Restoration
In this section, we introduce the principle and process of our method in detail. The general form of the analytic solution is given, and its applications and variations in the three IR tasks are discussed in detail.

Our End-to-End Unrolled Network
Generally, the goal of the IR task is to obtain an output with a lower cost. In our method, the cost function is described in Equation (3). The HQS method converts our cost function into two sub-problems, as described in Equation (4). In this way, the two sub-problems are easy to solve. The first convex equation in Equation (4) can be solved by gradient descent or in the form of an analytic solution. The gradient descent algorithm is a simple and generalized method; this first-order method is often used in various inverse problems because it can obtain a good result after many iterations, while second-order methods are too computationally expensive. As we stated before, the gradient descent algorithm usually requires multiple iterations because each step's solution is not accurate. This leads to high time costs in traditional methods, and the number of iterations is limited by the time and space costs in DCNN-based IR methods. In this paper, we solve the first equation in Equation (4) in the form of an analytic solution. By differentiating with respect to x and setting the derivative to zero, the following formula can be obtained:

x_{k+1} = (AᵀA + µI)⁻¹ (Aᵀy + µz_k) (5)

It is evident that matrix inversion is a stumbling block on the road to an analytic solution because of its computational complexity. We use the singular value decomposition (SVD) of A to reduce the computational complexity of the matrix inversion [74] because the cost of inverting a diagonal matrix is small. Through the SVD of the degradation matrix A, the analytic solution can be written in the following form:

x_{k+1} = V_A (S_AᵀS_A + µI)⁻¹ (S_Aᵀ U_Aᵀ y + µ V_Aᵀ z_k) (6)

where A = U_A S_A V_Aᵀ. As we can see, the updates of the analytic solution can be calculated quickly and efficiently. The overall framework of our proposed approach is shown in Figure 1a. Since the first equation in Equation (4) is convex, we can obtain the optimal solution at each iteration. This lays the foundation for our end-to-end network, consisting of only a few iterations, to achieve excellent results.
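The SVD-based update can be sketched and checked numerically as follows. This is a toy-scale illustration: the paper precomputes the SVD once, whereas here it is computed inside the function for brevity, and the sketch assumes A has at least as many rows as columns.

```python
import numpy as np

def x_update_svd(y, z, A, mu):
    """Closed-form x-step of Equation (6) via the SVD of A (m >= n assumed):
    x = V (S^T S + mu I)^{-1} (S^T U^T y + mu V^T z)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) Vt
    rhs = s * (U.T @ y) + mu * (Vt @ z)                # S^T U^T y + mu V^T z
    return Vt.T @ (rhs / (s ** 2 + mu))                # diagonal inversion only

# Cross-check against the direct normal-equation solve of Equation (5).
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 8))
y, z, mu = rng.standard_normal(12), rng.standard_normal(8), 0.9
x_svd = x_update_svd(y, z, A, mu)
x_direct = np.linalg.solve(A.T @ A + mu * np.eye(8), A.T @ y + mu * z)
print(np.allclose(x_svd, x_direct))   # True: the two solutions agree
```

Once the SVD is cached, each x-update costs only matrix-vector products and an elementwise division, which is what makes the analytic block cheap inside an unrolled network.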
The solution of the latter equation in Equation (4) is a proximal operator, which is as follows:

z_{k+1} = arg min_z (1/2)||z − x_{k+1}||² + τφ(z) (7)

where τ = λ/µ. In this step, the prior information is important. Research on IR methods under unrolled optimization shows that DCNNs can express image priors implicitly. Combined with the DCNN's strong fitting ability, we use a deep denoiser network to solve the latter problem in Equation (4). We use different deep prior networks at different stages to better adapt to the image characteristics of each stage. The proposed end-to-end deep analytic network based on deep priors is summarized in Algorithm 1. In our method, the number of iterations is set to six, and the deep denoiser network at each stage is small. Benefiting from these settings, our overall model is not too large, which allows it to avoid overfitting.
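Putting the two updates together, the unrolled loop of Algorithm 1 might be sketched as below. The per-stage denoisers here are placeholder soft thresholds with stage-dependent strengths, standing in for the paper's six small U-Nets; the initialization and parameter values are illustrative assumptions.

```python
import numpy as np

def unrolled_restore(y, A, denoisers, mu=0.9):
    """Sketch of Algorithm 1: alternate the analytic x-update (Eq. (6))
    with a stage-specific denoiser (Eq. (7)). `denoisers` is a list of
    callables, one per iteration."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    z = A.T @ y                              # simple initialization
    for denoise in denoisers:                # one distinct prior per stage
        rhs = s * (U.T @ y) + mu * (Vt @ z)
        x = Vt.T @ (rhs / (s ** 2 + mu))     # analytic update
        z = denoise(x)                       # learned proximal step
    return z

# Placeholder "denoisers": soft thresholds with stage-dependent strength.
stages = [lambda v, t=0.05 * (k + 1): np.sign(v) * np.maximum(np.abs(v) - t, 0)
          for k in range(6)]
rng = np.random.default_rng(0)
A = np.eye(16)
x_hat = unrolled_restore(rng.standard_normal(16), A, stages)
print(x_hat.shape)
```

In the paper, the six denoisers are trained jointly end to end; this sketch only shows how stage-specific priors slot into the alternation.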

Structure of the Deep Denoiser Network
The well-known U-Net [54] was originally proposed for medical image segmentation. It has since been widely used in visual tasks due to its excellent effects. Inspired by those works, our proposed DCNN is a residual learning network with a U-Net structure. The architecture of our proposed deep network is illustrated in Figure 1b. The network is a four-scale U-Net with a soft-threshold function. The first half is a multiscale encoder for feature extraction, and the second half is a multiscale decoder for image reconstruction based on these features. In each scale of the encoder, we use two convolutional layers to encode spatial features and a max-pooling layer to increase the receptive field. The number of channels in the two convolutional layers in the first two scales of the encoder is 32. In the third scale, the number of channels is 64. After three feature extractions, there are two 64-channel convolutional layers at the top of the DCNN. The kernel size of each convolutional layer is 3 × 3. In each scale of the decoder, there is a transposed convolution layer, a skip connection, and two convolutional layers. The number of channels in the two convolutional layers in the first scale of the decoder is 32. In the other two scales of the decoder, the number of channels is 64. The skip connection combines feature maps of the same size to compensate for the loss of spatial details caused by multiple extraction operations. After the decoder, a soft-threshold function is used to shrink the multichannel image. Then, a convolutional layer restores the image to the original color space. Finally, we establish a long residual connection between the input and output, because residual learning is easier to optimize [30] and more robust [51]. Different from the original U-Net, we adopt the leaky ReLU [75] as the activation function.
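For reference, the two nonlinearities used in the denoiser (the leaky ReLU activation and the final soft-threshold shrinkage) can be written as follows. The slope and threshold values here are illustrative assumptions; the paper does not state them at this point.

```python
import numpy as np

def leaky_relu(v, alpha=0.2):
    """Leaky ReLU activation [75]; the slope alpha is an assumed value."""
    return np.where(v > 0, v, alpha * v)

def soft_threshold(v, t):
    """Soft-threshold (shrinkage) applied to the multichannel output
    before the final convolution, as described above."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

print(leaky_relu(np.array([-1.0, 2.0])))              # [-0.2  2. ]
print(soft_threshold(np.array([-0.5, 0.05, 0.5]), 0.1))  # [-0.4  0.   0.4]
```

Unlike the plain ReLU, the leaky variant keeps a small gradient for negative inputs, which helps deep encoder-decoder stacks train stably.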


Variation in Three Applications
We have introduced the principles and process of our method and given the general form of the analytic solution. In this section, we discuss its specific form and variations in three visual problems (denoising, deblurring, and lensless imaging).

In the denoising problem, the system degradation matrix A is the identity matrix. In this case, the analytic solution degenerates into the following form:

x_{k+1} = (y + µz_k) / (1 + µ) (8)

As we can see, the update of the analytic solution is a basic elementwise operation, which can be completed quickly and efficiently.

For deblurring with a uniform kernel, the former equation in Equation (4) is usually written in a convolutional form:

x_{k+1} = arg min_x (1/2)||y − k * x||² + (µ/2)||z_k − x||² (9)

where k is the blur kernel and * denotes a two-dimensional convolution operation. In this situation, the system degradation matrix A is a large sparse blurring matrix, and it is not wise to solve the equation in matrix form. Hence, we obtain an analytic solution in the frequency domain based on energy equality (Parseval's theorem), as shown below:

x_{k+1} = F⁻¹( (conj(F(k)) ∘ F(y) + µF(z_k)) ⊘ (conj(F(k)) ∘ F(k) + µ) ) (10)

where F and F⁻¹ represent the fast Fourier transform (FFT) and the inverse FFT, respectively, conj(·) denotes the complex conjugate, ∘ is the Hadamard (elementwise) product, and ⊘ denotes elementwise division. We use Equation (10) instead of the analytic updates in Algorithm 1 for deblurring.

The third scenario is a lensless imaging problem named FlatCam [21]. In FlatCam, the system model is:

y = Φ_L x Φ_Rᵀ + n (11)

where Φ_L and Φ_R are the system transfer matrices and n denotes noise. Therefore, the former equation in Equation (4) becomes the following:

x_{k+1} = arg min_x (1/2)||y − Φ_L x Φ_Rᵀ||² + (µ/2)||z_k − x||² (12)

The corresponding analytic solution is as follows:

x_{k+1} = V_L [ ((σ_L σ_Rᵀ) ∘ (U_Lᵀ y U_R) + µ V_Lᵀ z_k V_R) ⊘ ((σ_L σ_Rᵀ) ∘ (σ_L σ_Rᵀ) + µ · ones) ] V_Rᵀ (13)

where Φ_L = U_L S_L V_Lᵀ and Φ_R = U_R S_R V_Rᵀ are the SVDs of the transfer matrices, the vectors σ_L and σ_R are the diagonal entries of S_L and S_R, ones denotes a matrix in which all elements are ones, ∘ is the Hadamard product, and ⊘ denotes elementwise division. The SVDs are carried out in advance, and then the above formula can be calculated efficiently. In the lensless image restoration experiment, Equation (13) is used instead of the analytic update in Algorithm 1.
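The frequency-domain update of Equation (10) can be sketched and verified against a brute-force matrix solve on a tiny image. The sketch assumes circular (periodic) boundary conditions, which make the blur matrix diagonalizable by the FFT; the kernel and image here are random toy data.

```python
import numpy as np

def x_update_fft(y, z, k_pad, mu):
    """Frequency-domain x-step in the spirit of Equation (10), assuming
    circular convolution. k_pad is the kernel zero-padded to image size."""
    K = np.fft.fft2(k_pad)
    num = np.conj(K) * np.fft.fft2(y) + mu * np.fft.fft2(z)
    den = np.conj(K) * K + mu                    # |F(k)|^2 + mu, elementwise
    return np.real(np.fft.ifft2(num / den))

# Cross-check against the explicit normal equations
# (A^T A + mu I) x = A^T y + mu z, with A the circular-convolution matrix.
rng = np.random.default_rng(0)
h, w, mu = 4, 4, 0.9
k_pad = np.zeros((h, w))
k_pad[:2, :2] = rng.random((2, 2))               # small blur kernel

def conv(img):                                   # circular convolution with k
    return np.real(np.fft.ifft2(np.fft.fft2(k_pad) * np.fft.fft2(img)))

A = np.stack([conv(e.reshape(h, w)).ravel() for e in np.eye(h * w)], axis=1)
y, z = rng.random((h, w)), rng.random((h, w))
x_fft = x_update_fft(y, z, k_pad, mu)
x_mat = np.linalg.solve(A.T @ A + mu * np.eye(h * w),
                        A.T @ y.ravel() + mu * z.ravel()).reshape(h, w)
print(np.allclose(x_fft, x_mat))                 # True: both solve Eq. (5)
```

The FFT form avoids ever materializing the large sparse blurring matrix, which is the point of Equation (10).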
As shown in [76], the analytic updates of the lensless model y = Φ_L x Φ_Rᵀ also do not introduce singularities or gradient explosion. This lays a theoretical foundation for our network to successfully complete the training process. In the actual training process, we recorded the PSNR values of the test image for different epochs, as shown in Figure 2. It can be seen that the losses converge smoothly, which indirectly confirms the above analysis.
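A sketch of the separable analytic update of Equation (13), cross-checked against the equivalent vectorized solve with A = Φ_R ⊗ Φ_L, is given below. The sizes are toy-scale and the transfer matrices are random; in practice, the SVDs are precomputed once.

```python
import numpy as np

def flatcam_x_update(y, z, PhiL, PhiR, mu):
    """Separable analytic x-step (Equation (13)) for y = PhiL @ x @ PhiR.T."""
    UL, sL, VLt = np.linalg.svd(PhiL)            # PhiL = UL diag(sL) VLt
    UR, sR, VRt = np.linalg.svd(PhiR)
    S = np.outer(sL, sR)                         # sigma_L sigma_R^T
    num = S * (UL.T @ y @ UR) + mu * (VLt @ z @ VRt.T)
    den = S * S + mu                             # Hadamard square + mu * ones
    return VLt.T @ (num / den) @ VRt

# Verify against the vectorized solve with A = kron(PhiR, PhiL).
rng = np.random.default_rng(0)
n = 5
PhiL, PhiR = rng.standard_normal((n, n)), rng.standard_normal((n, n))
x_true, z = rng.standard_normal((n, n)), rng.standard_normal((n, n))
y, mu = PhiL @ x_true @ PhiR.T, 0.9
A = np.kron(PhiR, PhiL)                          # vec(PhiL X PhiR^T) = A vec(X)
xv = np.linalg.solve(A.T @ A + mu * np.eye(n * n),
                     A.T @ y.flatten('F') + mu * z.flatten('F'))
print(np.allclose(flatcam_x_update(y, z, PhiL, PhiR, mu),
                  xv.reshape((n, n), order='F')))
```

The separable form works on two n × n factors instead of an n² × n² Kronecker matrix, which is what makes the FlatCam update tractable at image resolution.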


Experiments
In this section, we perform experiments on three IR tasks: image denoising, image deblurring, and lensless imaging. We fix and train a model for each specific problem. All models are implemented in TensorFlow [77] and trained on a Linux server with an Intel E5-2678 CPU at 2.5 GHz with 64 GB of memory and four graphics cards (NVIDIA GTX 1080 Ti) with 11 GB of memory each. We train our models with the ADAM optimizer [78] by setting β1 = 0.9, β2 = 0.999, and ε = 10⁻⁸. The ℓ2 loss is used as the loss function in all experiments, i.e., loss = ||z₆ − x_GT||₂², where z₆ is the output of the final (sixth) iteration and x_GT is the ground truth. We train each model for 50 epochs. In addition, the PSNR and SSIM [79] metrics are used for objective evaluation.
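For clarity, the loss and the PSNR evaluation metric can be written as plain NumPy sketches. The peak value of 255 assumes 8-bit images, which matches the experimental setup but is our assumption at this point.

```python
import numpy as np

def l2_loss(z6, gt):
    """Training loss: squared l2 distance between final iterate and ground truth."""
    return np.sum((z6 - gt) ** 2)

def psnr(pred, gt, peak=255.0):
    """PSNR in dB (peak = 255 assumed for 8-bit images)."""
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

gt = np.full((8, 8), 100.0)
pred = gt + 5.0                      # constant error of 5 gray levels
print(round(psnr(pred, gt), 2))      # 20*log10(255/5) = 34.15 dB
```

SSIM is more involved and is taken from the reference implementation of [79] in practice.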

Ablation Study
Dong et al. [51] undertook some research into the deep unfolding IR method and performed a comparative experiment between a deep unfolding network and pure DCNNs. Their results show that the performance of a deep unfolding network, which combines traditional optimization with deep learning, is better than that of pure DCNNs. On this basis, we conducted five groups of deblurring experiments to show the superiority of the analytic solution under the deep unfolding framework. The datasets and training settings used in each group were the same. More details are described in Section 4.3. The 10 commonly used images for the deblurring tests are shown in Figure 3. The results are summarized in Table 1.

DPDNN is a deep unfolding network that uses gradient descent in the first step and a network in the second step. The DPDNN-AS method replaces the gradient descent in the original DPDNN with the analytic solution and keeps the rest unchanged. From Table 1, the average PSNR is increased by over 0.2 dB by using the analytic solution, which demonstrates its advantage. In addition, the PSNR gain of our network over the DPDNN-AS method shows the power of six small deep prior networks that have stronger adaptability at different stages.
We also performed an ablation study on the number of iterations in our method. Two groups of deblurring experiments on the Kodak24 dataset with different kernel sizes were conducted as examples. The results are summarized in Table 2, and comparisons of the computation amounts are shown in Table 3. Ours-1, Ours-2, Ours-4, Ours-6 (Ours), and Ours-8 indicate that the numbers of iterations in our method are 1, 2, 4, 6, and 8, respectively. Considering effectiveness and computational cost, the number of iterations is set to six in our method. Compared with a fixed DCNN (DPDNN), our method greatly reduces the number of FLOPs when the total parameters and the number of iterations are the same.

Image Denoising
In image denoising, the analytic update is as in Equation (8) and x̂₀ = y. To train our network, we built a training dataset from the DIV2K dataset [81]. First, we cut out 27,594 small patches from DIV2K, each of which was 256 × 256 in size. Then, we added zero-mean Gaussian noise with standard deviation σn to these small patches. Finally, we saved the values of these patches as integers from 0 to 255. In this way, the original patch and the patch after adding noise formed a training pair. We performed three groups of experiments for image denoising, with σn set to 15, 25, and 50. In these three experiments, the batch size was 32, the initial value of µ₀ was 0.9, and the learning rate was 0.0005. The learning rate was halved after every five epochs.
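The training-pair construction described above can be sketched as follows. The patch here is random data standing in for a DIV2K crop; only the noise addition and the integer clipping mirror the procedure in the text.

```python
import numpy as np

def make_training_pair(clean_patch, sigma_n, rng):
    """Build one (noisy, clean) training pair: add zero-mean Gaussian noise,
    then store values as integers in [0, 255], as described above."""
    noisy = clean_patch + rng.normal(0.0, sigma_n, clean_patch.shape)
    noisy = np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
    return noisy, np.clip(np.rint(clean_patch), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(256, 256)).astype(np.float64)  # stand-in crop
noisy, clean = make_training_pair(patch, sigma_n=25, rng=rng)
print(noisy.dtype, noisy.shape)
```

Clipping the noisy input to 8-bit integers matters: it matches how the test inputs are quantized, so the network never sees out-of-range values at train time that it would not see at test time.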
To illustrate the excellent effect of our network, we compare it with existing model-based methods, i.e., BM3D [4], EPLL [5], and WNNM [40], and learning-based methods, i.e., TNRD [60], IRCNN [61], DnCNN-S [30], and FFDNet-cl [48]. The BSD68 and Kodak24 datasets are used for testing. The values of the noisy inputs in the tests are also clipped to integers between 0 and 255, and the images of both test sets are tested in grayscale. Table 4 records the average PSNR (dB) and SSIM values of the compared methods on the BSD68 and Kodak24 datasets. The highlighted results show that our network is better than the compared methods. Among them, the data of TNRD and IRCNN are derived from a published paper [61], and the data of the other methods are obtained with public code.

Figure 4 shows the denoising result for the well-known image Lena. As we can see, our result is more delicate and maintains more thin lines in the complex areas than the other results. When the noise level is σn = 25, our results are also better than the compared results. As shown in the green box in Figure 5, our method works well in the high-frequency region and restores more details and textures than the other methods.

Image Deblurring
In order to verify the deblurring ability of our network, we performed five groups of experiments, in which we convolved clear images with blur kernels to obtain the dataset. Three blur kernels were selected: a 25 × 25 Gaussian blur kernel with a standard deviation of 1.6 and two motion blur kernels from [80], one of which was 17 × 17 and the other 19 × 19. For image deblurring, the analytic update in Algorithm 1 is shown in Equation (10), and x_0 = A^T y. To train our deblurring network, we first convolved the images in the DIV2K dataset with a blur kernel and added additive Gaussian noise with a standard deviation of σ_n. In particular, we used zeros to fill the border of the image during the convolution. The convolution results were saved as integers between 0 and 255. Next, we cropped the border of each blurred image by half the length of the blur kernel and extracted 27,468 patches of 256 × 256 pixels from the result. We trained five models for the five experiments; the blur settings are shown in Table 4. In these deblurring experiments, the batch size was 32, the initial value of µ_0 was 0.9, and the learning rate was 0.0005. As with the denoising networks, our deblurring networks also halved the learning rate every five epochs. During the test phase, we employed the 10 commonly used images shown in Figure 3 and the Kodak24 dataset as test images, and the same convolution procedure was used to generate the blurred inputs. All images were processed in grayscale. To demonstrate the performance of our network, we compare it with classical model-based methods (IDD-BM3D [11], EPLL [5], and NCSR [26]), a denoising-based IR method under the plug-and-play framework (IRCNN [61]), and an end-to-end deep unfolding network (DPDNN [51]). The PSNR values of the deblurring results on the 10 commonly used images (see Figure 3) are shown in Table 5.
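The degradation pipeline described above (zero-padded convolution, AWGN, quantization, border cropping by half the kernel size) can be sketched in pure NumPy; this is a hedged illustration with hypothetical helper names, not the paper's code:

```python
import numpy as np

def conv2d_zero_pad(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size 2D convolution with zero padding at the borders."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="constant")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    # True convolution flips the kernel; for symmetric kernels this
    # coincides with correlation.
    return np.einsum("ijkl,kl->ij", windows, kernel[::-1, ::-1])

def make_blurred_patch(clean: np.ndarray, kernel: np.ndarray,
                       sigma: float, rng=None):
    """Degrade a clean image as in the training-set construction:
    blur, add AWGN of std `sigma`, quantize to [0, 255], then crop
    half the kernel size from each border to drop boundary artifacts."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = conv2d_zero_pad(clean, kernel)
    blurred = blurred + rng.normal(0.0, sigma, size=blurred.shape)
    blurred = np.clip(np.round(blurred), 0, 255).astype(np.uint8)
    m = kernel.shape[0] // 2
    return blurred[m:-m, m:-m], clean[m:-m, m:-m]
```

Cropping after blurring ensures no training patch contains the artificial zero-padding transition at the image border.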
The results of the IDD-BM3D, EPLL, NCSR, and IRCNN methods are obtained by restoring the test images with the public codes. If a ringing effect appears during testing, we use the edgetaper function of MATLAB to perform edge-preserving preprocessing on the input. We retrained the DPDNN method on our training set according to its public code. It can be seen from Table 5 that our method is clearly superior to the compared methods; our results are 0.36 dB higher than those of the DPDNN, on average. We selected three groups of images to visually show the deblurring effect, as shown in Figures 6-8. Among them, Figure 6 shows that our result recovers the most object information in the region close to the background. Figure 7 shows the effect of processing high-frequency areas, and the enlarged green boxes show that we restored more hairs and details. In addition to performing well on the motion blur kernels, our method also works well on the Gaussian kernel. As shown in Figure 8, our result has high contrast and sharp edges. We also tested our network on the Kodak24 dataset; the PSNR and SSIM values are summarized in Table 6, and our results are superior to the other results.

Lensless Imaging
In this section, we apply our network to the lensless FlatCam. The imaging model is as in Equation (11), and the corresponding analytic solution is shown in Equation (13), where x_0 is the least-squares estimate with a Tikhonov regularization term.
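As a rough sketch of such an initialization, assuming a generic system matrix A (the paper's Equation (13) for the FlatCam model is not reproduced here, and `tikhonov_init` is a hypothetical name):

```python
import numpy as np

def tikhonov_init(A: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Least-squares estimate with a Tikhonov regularization term:

        x0 = argmin_x ||A x - y||^2 + lam * ||x||^2
           = (A^T A + lam * I)^{-1} A^T y

    The regularizer keeps the normal equations well-conditioned when
    A is ill-conditioned, as in lensless imaging.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
```

With a well-conditioned A and a small `lam`, this reduces to the ordinary least-squares solution; for ill-conditioned systems, larger `lam` trades fidelity for stability.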

Conclusions
In this paper, we propose a DCNN-denoiser-based unrolled network for image restoration. We unfold the tedious iterative process of model-based methods into an end-to-end network consisting of several iterations, each of which has an analytic solution update step and a small multiscale deep denoiser network. Every DCNN serves as a denoiser rather than modeling the whole inverse process, which makes the network's function easier to realize. In this way, our method can exploit the advantages of both optimization and DCNNs. Specifically, we solve the convex subproblem in closed form, which is faster than gradient descent under this framework. In addition, we use different multiscale prior networks in different iterations to better accommodate the image features. Compared with a fixed DCNN, this greatly reduces the number of computations when the total parameters and the number of iterations are the same. Under an unrolled optimization framework, our method incorporates the physical model into the overall network, which provides a guarantee of high-quality image restoration. Visual effects and quantitative evaluations of the method on three IR tasks, including denoising, deblurring, and lensless imaging, indicate that our method achieves excellent performance in high-quality image reconstruction.
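The overall scheme summarized above can be sketched as a loop alternating a closed-form data step with a stage-specific denoiser. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the analytic step shown is the standard half-quadratic-splitting solution for a pure denoising data term, standing in for the paper's Equations (8)/(10), and the per-stage denoisers stand in for the small multiscale DCNNs.

```python
import numpy as np

def unrolled_restore(y: np.ndarray, denoisers, mu0: float = 0.9):
    """Hypothetical sketch of the unrolled scheme for denoising.

    Each iteration applies a stage-specific denoiser (the learned
    prior) and then a closed-form data-fidelity update. Here we use
    the half-quadratic-splitting solution for a denoising data term,
        x = (y + mu * z) / (1 + mu),
    which may differ from the paper's exact Equation (8); `mu0`
    mirrors the initial value 0.9 used in the experiments.
    """
    x = y.copy()                    # x0 = y for denoising
    mu = mu0
    for denoise in denoisers:       # a different denoiser per stage
        z = denoise(x)              # prior (denoiser) step
        x = (y + mu * z) / (1.0 + mu)   # closed-form data step
    return x
```

Because each stage has its own denoiser, the per-stage networks can be kept small while the total parameter count matches a single large fixed DCNN, which is the source of the computational savings discussed above.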

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.